Biological Database of Images and Genomes: tools for community annotations linking image and genomic information
Genomic data and biomedical imaging data are undergoing exponential growth. However, our understanding of the phenotype-genotype connection linking the two types of data is lagging behind. While there are many types of software that enable the manipulation and analysis of image data and genomic data as separate entities, there is no framework established for linking the two. We present a generic set of software tools, BioDIG, that allows linking of image data to genomic data. BioDIG tools can be applied to a wide range of research problems that require linking images to genomes. BioDIG features the following: rapid construction of web-based workbenches, community-based annotation, user management, and web services. By using BioDIG to create websites, researchers and curators can rapidly annotate large numbers of images with genomic information. Here we present the BioDIG software tools that include an image module, a genome module and a user management module. We also introduce a BioDIG-based website, MyDIG, which is being used to annotate images of Mycoplasma.
💡 Research Summary
The paper introduces BioDIG, a generic, open‑source software suite designed to bridge the growing gap between biomedical imaging data and genomic information. While image analysis tools and genome browsers have matured independently, there has been no unified framework that allows researchers to link visual phenotypes directly to underlying genotypes. BioDIG addresses this need by providing three tightly integrated modules: an image module, a genome module, and a user‑management module, all wrapped in a web‑based workbench that can be rapidly deployed for any organism or research question.
The image module supports uploading high‑resolution microscopy, histology, or electron‑microscopy images, storing associated metadata (instrument, acquisition parameters, licensing), and defining Regions of Interest (ROIs). Users can annotate each ROI with free‑text labels or controlled vocabulary tags, which later become the anchors for genomic linkage. The genome module imports sequence data, gene models, functional annotations (GO, KEGG, etc.), and variant calls (VCF) from public repositories such as RefSeq, Ensembl, or UCSC. A dedicated “image‑gene mapping table” stores the many‑to‑many relationships between ROIs and genomic features, enabling both manual curation and automated bulk mapping.
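The many-to-many relationship between ROIs and genomic features can be illustrated with a small relational sketch. This is not BioDIG's actual schema: the table names (`roi`, `gene`, `roi_gene`), the columns, the helper functions, and the `GENE_A`/`GENE_B` identifiers are all hypothetical, and Python's built-in `sqlite3` stands in for the MySQL database mentioned in the summary so that the example runs without a server.

```python
import sqlite3


def build_schema(conn):
    # Hypothetical schema: one table per entity plus a junction table.
    conn.executescript(
        """
        CREATE TABLE roi (
            id INTEGER PRIMARY KEY,
            image_name TEXT NOT NULL,
            label TEXT
        );
        CREATE TABLE gene (
            id INTEGER PRIMARY KEY,
            locus_tag TEXT UNIQUE NOT NULL
        );
        -- Junction table: each row links one ROI to one gene.
        CREATE TABLE roi_gene (
            roi_id INTEGER NOT NULL REFERENCES roi(id),
            gene_id INTEGER NOT NULL REFERENCES gene(id),
            PRIMARY KEY (roi_id, gene_id)
        );
        """
    )


def add_roi(conn, image_name, label):
    cur = conn.execute(
        "INSERT INTO roi (image_name, label) VALUES (?, ?)", (image_name, label)
    )
    return cur.lastrowid


def add_gene(conn, locus_tag):
    # Insert the gene if it is new, then return its id either way.
    conn.execute("INSERT OR IGNORE INTO gene (locus_tag) VALUES (?)", (locus_tag,))
    row = conn.execute(
        "SELECT id FROM gene WHERE locus_tag = ?", (locus_tag,)
    ).fetchone()
    return row[0]


def link(conn, roi_id, gene_id):
    # The composite primary key makes repeated links a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO roi_gene (roi_id, gene_id) VALUES (?, ?)",
        (roi_id, gene_id),
    )


def genes_for_roi(conn, roi_id):
    rows = conn.execute(
        "SELECT g.locus_tag FROM gene g "
        "JOIN roi_gene rg ON rg.gene_id = g.id "
        "WHERE rg.roi_id = ? ORDER BY g.locus_tag",
        (roi_id,),
    )
    return [r[0] for r in rows]


def rois_for_gene(conn, locus_tag):
    rows = conn.execute(
        "SELECT rg.roi_id FROM roi_gene rg "
        "JOIN gene g ON g.id = rg.gene_id "
        "WHERE g.locus_tag = ? ORDER BY rg.roi_id",
        (locus_tag,),
    )
    return [r[0] for r in rows]


# Usage with made-up data: two ROIs share one gene, one ROI has two genes.
conn = sqlite3.connect(":memory:")
build_schema(conn)
roi1 = add_roi(conn, "cell_01.png", "tip structure")
roi2 = add_roi(conn, "cell_02.png", "membrane")
gene_a = add_gene(conn, "GENE_A")
gene_b = add_gene(conn, "GENE_B")
link(conn, roi1, gene_a)
link(conn, roi1, gene_b)
link(conn, roi2, gene_a)
link(conn, roi2, gene_a)  # duplicate link is ignored
```

A junction table of this kind lets the same query pattern answer both "which genes are linked to this region?" and "which regions are linked to this gene?", which is what a many-to-many mapping requires.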
User management implements role‑based access control (RBAC). Administrators configure the system and database schema, curators (or “validators”) review community‑submitted annotations, and ordinary users can upload images and propose annotations. This hierarchy preserves data quality while encouraging large‑scale community participation.
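The role hierarchy described above can be sketched as a permission table plus a review step. This is an illustration under assumptions, not BioDIG's implementation: the action names, the `Annotation` class, and the choice to let curators and administrators also upload and propose are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    USER = "user"
    CURATOR = "curator"
    ADMIN = "administrator"


# Hypothetical permission sets; higher roles are assumed to inherit lower ones.
USER_ACTIONS = {"upload_image", "propose_annotation"}
CURATOR_ACTIONS = USER_ACTIONS | {"review_annotation"}
ADMIN_ACTIONS = CURATOR_ACTIONS | {"configure_system"}

PERMISSIONS = {
    Role.USER: USER_ACTIONS,
    Role.CURATOR: CURATOR_ACTIONS,
    Role.ADMIN: ADMIN_ACTIONS,
}


def can(role, action):
    return action in PERMISSIONS[role]


@dataclass
class Annotation:
    roi_label: str
    gene: str
    submitted_by: str
    status: str = "pending"  # new submissions wait for review


def propose(role, roi_label, gene, username):
    if not can(role, "propose_annotation"):
        raise PermissionError(f"{role.value} may not propose annotations")
    return Annotation(roi_label, gene, username)


def approve(role, annotation):
    # Only roles holding the review permission can change the status.
    if not can(role, "review_annotation"):
        raise PermissionError(f"{role.value} may not review annotations")
    annotation.status = "approved"
    return annotation


def publicly_visible(annotations):
    # Pending submissions stay hidden until a reviewer approves them.
    return [a for a in annotations if a.status == "approved"]


# Usage with made-up names: a user proposes, a curator approves.
first = propose(Role.USER, "tip structure", "GENE_A", "alice")
second = propose(Role.USER, "membrane", "GENE_B", "alice")
approve(Role.CURATOR, first)
visible = publicly_visible([first, second])
```

Keeping the permission check in one function (`can`) means a new role or action only requires a change to the table, not to every call site.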
Technically, BioDIG is built on the Django web framework with a MySQL relational database. Image files reside on the file system, while metadata, annotations, and mapping tables are stored in the database, allowing fast queries even with thousands of high‑resolution images. A RESTful API exposes all core functions—image retrieval, ROI definition, gene lookup, annotation history, and user permissions—in JSON format, facilitating integration with external pipelines, machine‑learning workflows, or other bioinformatics platforms.
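A client of a JSON web service like the one described might look roughly like the sketch below. The endpoint path, query parameter, and response fields are invented for illustration and should not be read as BioDIG's documented API. The example avoids network access by parsing a hard-coded sample payload.

```python
import json
from urllib.parse import urlencode, urljoin


def build_url(base_url, resource, **params):
    # Join the base and resource path, then append sorted query parameters.
    url = urljoin(base_url.rstrip("/") + "/", resource)
    if params:
        url += "?" + urlencode(sorted(params.items()))
    return url


def parse_rois(payload):
    # Map each ROI id to its list of linked gene identifiers.
    data = json.loads(payload)
    return {roi["id"]: roi["genes"] for roi in data["rois"]}


# Hypothetical response body; the field names are assumptions.
sample_payload = json.dumps(
    {
        "image_id": 7,
        "rois": [
            {"id": 1, "genes": ["GENE_A", "GENE_B"]},
            {"id": 2, "genes": []},
        ],
    }
)

# Hypothetical host and path, used only to show URL construction.
url = build_url("https://example.org/api", "images/7/rois", format="json")
links = parse_rois(sample_payload)
```

In a real pipeline the payload would come from an HTTP request to the deployed site, with the actual paths and fields taken from that site's API documentation.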
To demonstrate real‑world applicability, the authors deployed a BioDIG‑powered site called MyDIG focused on the bacterium Mycoplasma. Researchers using MyDIG upload microscopic images of Mycoplasma cells, delineate ROIs, and link each region to specific genes or mutations. Curators then validate these links before they become publicly visible. Within a short period, hundreds of images and thousands of gene‑ROI associations were accumulated, providing a valuable resource for studying how morphological variation correlates with genomic differences in this organism.
The authors highlight several strengths of BioDIG: rapid construction of web‑based annotation workbenches, support for community‑driven curation, fine‑grained user‑role management, and standardized web services that enable programmatic access. They also acknowledge limitations: current support for image formats is modest, scalability to multi‑gigabase genomes will require performance tuning, and there is no built‑in automated confidence scoring for annotations—an area ripe for future machine‑learning integration.
In conclusion, BioDIG represents the first comprehensive, open‑source platform that unifies image and genome data under a single, extensible framework. Its modular design, web‑centric architecture, and community‑annotation workflow make it suitable for a broad spectrum of life‑science domains, from microbiology and pathology to developmental biology. With further development—such as plug‑in extensions, cloud‑based storage, and AI‑driven annotation validation—BioDIG has the potential to become the de‑facto standard for phenotype‑genotype integration in the era of big biomedical data.