A proposal for community driven and decentralized astronomical databases and the Open Exoplanet Catalogue

I present a new kind of astronomical database based on small text files and a distributed version control system. This encourages the community to work collaboratively. It creates a decentralized, completely open and democratic way of managing small to medium sized heterogeneous astronomical databases and catalogues. The use of the XML file format allows an easy to parse and read, yet dynamic and extendable database structure. The Open Exoplanet Catalogue is based on these principles and presented as an example. It is a catalogue of all discovered extra-solar planets. It is the only catalogue that can correctly represent the orbital structure of planets in arbitrary binary, triple and quadruple star systems, as well as orphan planets.


💡 Research Summary

The paper proposes an architecture for astronomical databases that replaces traditional centralized relational systems with a lightweight, file‑based approach managed through a distributed version‑control system (DVCS), specifically Git. Each astronomical object and its attributes are stored in a small, human‑readable XML file. This gives the database a flexible, hierarchical structure that can represent binary, triple, and quadruple star systems as well as free‑floating (orphan) planets. Validating each file against an XML schema helps protect data integrity, while the files remain easy to edit and to parse with standard tools.
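To make the "easy to parse with standard tools" point concrete, here is a minimal sketch using Python's standard library. The tag names (`system`, `star`, `planet`, `name`, `mass`, `period`) and all values are illustrative assumptions, not data or a schema taken from the catalogue itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical file content: tag names and values are placeholders,
# not real catalogue entries.
SYSTEM_XML = """
<system>
  <name>Example System</name>
  <star>
    <name>Example Star</name>
    <mass>1.0</mass>
    <planet>
      <name>Example Star b</name>
      <mass>0.5</mass>
      <period>10.0</period>
    </planet>
  </star>
</system>
"""

def read_planets(xml_text):
    """Return one dict per <planet> element found anywhere in the file."""
    root = ET.fromstring(xml_text.strip())
    planets = []
    for planet in root.iter("planet"):
        planets.append({
            "name": planet.findtext("name"),
            "mass": float(planet.findtext("mass")),
            "period": float(planet.findtext("period")),
        })
    return planets

planets = read_planets(SYSTEM_XML)
print(planets)
```

Because each file is self-contained, a reader of this kind needs no database driver or server connection, only an XML parser.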

Git provides decentralized storage, automatic provenance tracking, and a collaborative workflow. Researchers clone the repository, make local edits, and submit changes via pull requests. Each commit records the author, timestamp, and rationale, creating a complete, tamper‑evident history that supports reproducibility and precise citation of specific data versions. Branching allows parallel development of experimental or curated subsets; when branches are merged, Git combines non‑overlapping edits automatically and flags conflicting ones for manual resolution. Continuous integration pipelines can automatically validate XML against the schema, compute summary statistics, and flag inconsistencies whenever new data are pushed.
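The kind of automated check a continuous integration job might run on each pushed file can be sketched as follows. This is not full schema validation: it only tests well-formedness plus a hypothetical rule (assumed here for illustration) that every `system`, `star`, and `planet` element carries a `name` child.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: each listed element type must have these direct children.
REQUIRED_CHILDREN = {"system": ["name"], "star": ["name"], "planet": ["name"]}

def validate(xml_text):
    """Return a list of problems; an empty list means the file passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    problems = []
    for element in root.iter():
        for child_tag in REQUIRED_CHILDREN.get(element.tag, []):
            # find() with a bare tag name only looks at direct children.
            if element.find(child_tag) is None:
                problems.append(f"<{element.tag}> is missing <{child_tag}>")
    return problems

# Illustrative inputs: one valid file, one with an unnamed star, one truncated.
good = "<system><name>S</name><star><name>A</name></star></system>"
bad = "<system><name>S</name><star><planet><name>b</name></planet></star></system>"
broken = "<system><name>S</name>"

for label, text in [("good", good), ("bad", bad), ("broken", broken)]:
    print(label, validate(text))
```

A CI job could run such a function over every changed file and fail the build when the returned list is non-empty, so that malformed contributions are caught before they are merged.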

The Open Exoplanet Catalogue serves as a concrete implementation of these principles. Unlike existing exoplanet archives that flatten multi‑star systems into a single‑star paradigm, this catalogue encodes the full orbital hierarchy: each star is an XML node, and planets are nested under the star they orbit. This representation preserves the true dynamical architecture of systems like 55 Cancri, enabling direct use in orbital dynamics simulations and facilitating accurate statistical analyses of multi‑star planetary populations. The catalogue is openly licensed, hosted on a public platform, and welcomes contributions from anyone, thereby democratizing data curation.
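The nesting idea can be illustrated with a short recursive walk over a hypothetical system file. The `binary` element used here to group two stars, like the other tag names and all object names, is an assumption made for this sketch; it shows how a planet orbiting one star and a planet orbiting the pair would sit at different levels of the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical system: one planet orbits star A, another orbits the A+B pair.
SYSTEM_XML = """
<system>
  <name>Example</name>
  <binary>
    <star><name>Example A</name>
      <planet><name>Example A b</name></planet>
    </star>
    <star><name>Example B</name></star>
    <planet><name>Example (AB) c</name></planet>
  </binary>
</system>
"""

# Element types treated as bodies or groupings in the hierarchy (assumed names).
BODY_TAGS = {"system", "binary", "star", "planet"}

def outline(element, depth=0):
    """Return indented lines describing how bodies are nested under element."""
    label = element.findtext("name") or element.tag
    lines = ["  " * depth + f"{element.tag}: {label}"]
    for child in element:
        if child.tag in BODY_TAGS:
            lines.extend(outline(child, depth + 1))
    return lines

lines = outline(ET.fromstring(SYSTEM_XML.strip()))
print("\n".join(lines))
```

The depth of each planet in the printed outline shows which body or pair it orbits, information that a flat table with a single host-star column cannot carry.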

Key advantages highlighted include: (1) full transparency and reproducibility through versioned history; (2) low barrier to community contributions via familiar DVCS workflows; (3) extensibility—new parameters or object types can be added by extending the XML schema without redesigning a database schema; (4) cost‑effectiveness, as no dedicated database server is required; and (5) educational outreach, because the data are accessible to students and citizen scientists.

The authors acknowledge limitations: file‑based storage is less efficient for massive datasets such as high‑resolution spectra or large image catalogs, and complex ad‑hoc queries may perform poorly compared to indexed relational databases. Consequently, the proposed model is best suited for “small‑to‑medium” heterogeneous catalogs, and a hybrid architecture (file‑based metadata plus a separate indexing service) could address scalability concerns.
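One possible shape for such a hybrid is sketched below: the XML files stay the source of truth, and a disposable SQLite index is rebuilt from them to serve ad‑hoc queries. The file names, tag names, table layout, and values are all hypothetical, and the files are held in a dictionary rather than read from disk to keep the example self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for files in the repository (placeholder values).
FILES = {
    "example1.xml": (
        "<system><name>Ex1</name><star><name>Ex1 A</name>"
        "<planet><name>Ex1 A b</name><period>3.5</period></planet>"
        "</star></system>"
    ),
    "example2.xml": (
        "<system><name>Ex2</name><star><name>Ex2 A</name>"
        "<planet><name>Ex2 A b</name><period>365.0</period></planet>"
        "</star></system>"
    ),
}

def build_index(files):
    """Build an in-memory SQLite table with one row per planet."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE planet (name TEXT, period REAL, source_file TEXT)")
    conn.execute("CREATE INDEX planet_period ON planet (period)")
    for filename, xml_text in files.items():
        for planet in ET.fromstring(xml_text).iter("planet"):
            period = planet.findtext("period")
            conn.execute(
                "INSERT INTO planet VALUES (?, ?, ?)",
                (planet.findtext("name"),
                 float(period) if period else None,
                 filename),
            )
    return conn

conn = build_index(FILES)
# Indexed query: planets with a period shorter than 10 (units assumed to be days).
short_period = conn.execute(
    "SELECT name, source_file FROM planet WHERE period < ? ORDER BY period",
    (10.0,),
).fetchall()
print(short_period)
```

Since the index is derived entirely from the files, it can be discarded and regenerated at any commit, so the version-controlled text remains the only authoritative copy.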

In summary, the paper demonstrates that a combination of XML and Git provides a decentralized, open, and democratic framework for managing astronomical catalogs. The Open Exoplanet Catalogue validates the concept, showing that intricate orbital configurations can be faithfully captured and that the community can collaboratively maintain a living, versioned dataset. This approach offers a promising blueprint for future astronomical databases across various sub‑fields, fostering greater collaboration, data integrity, and openness in the scientific process.