Geometric Data Science


📝 Original Info

  • Title: Geometric Data Science
  • ArXiv ID: 2512.05040
  • Date: 2025-12-04
  • Authors: Olga D Anosova, Vitaliy A Kurlin

📝 Abstract

This book introduces the new research area of Geometric Data Science, where data can represent any real objects through geometric measurements. The first part of the book focuses on finite point sets. The most important result is a complete and continuous classification of all finite clouds of unordered points under rigid motion in any Euclidean space. The key challenge was to avoid the exponential complexity arising from permutations of the given unordered points. For a fixed dimension of the ambient Euclidean space, the times of all algorithms for the resulting invariants and distance metrics depend polynomially on the number of points. The second part of the book advances a similar classification in the much more difficult case of periodic point sets, which model all periodic crystals at the atomic scale. The most significant result is the hierarchy of invariants from the ultra-fast to complete ones. The key challenge was to resolve the discontinuity of crystal representations that break down under almost any noise. Experimental validation on all major materials databases confirmed the Crystal Isometry Principle: any real periodic crystal has a unique location in a common moduli space of all periodic structures under rigid motion. The resulting moduli space contains all known and not yet discovered periodic crystals and hence continuously extends Mendeleev's table to the full crystal universe.

💡 Deep Analysis

Figure 1

📄 Full Content

Where there is Matter, there is Geometry. (Johannes Kepler, a key figure in the 17th-century Scientific Revolution)

This book introduces the new research area of Geometric Data Science, where data can represent any real objects through geometric measurements. Some of the simplest inputs of real data objects are finite and periodic sets of unordered points.

For example, a molecule can be fully described by the positions of its atoms in 3-dimensional space. However, many descriptions are highly ambiguous, especially to a computer, which operates only with numbers. A photograph, for instance, is ambiguous because any object can have an astronomically large number of photographs.

All attempts to standardise photographs, as in passports, have shifted towards more reliable biometric data. Indeed, the identification of living organisms improved dramatically after the discovery of the structure of DNA. However, geometric structures remain ambiguous for many objects, including proteins and materials, which are still represented by photograph-style inputs that depend on arbitrary coordinate systems.

The major obstacle to progressing from trial-and-error in chemistry and biology to a justified design of materials and drugs has been the absence of rigorous definitions and problem statements. Geometric Data Science fills this gap by developing foundations based on equivalences, invariants, distance metrics, and polynomial-time algorithms.
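To make the four foundational notions concrete, here is our paraphrase (not a quotation from the book) of the conditions such an invariant is usually required to satisfy. We assume an equivalence $\sim$ on data objects (for example, rigid motion), an invariant $I$, a distance metric $d$ on invariant values, and a constant $\lambda$:

```latex
\begin{aligned}
&\text{invariant:} && S \sim Q \;\Rightarrow\; I(S) = I(Q),\\
&\text{complete invariant:} && I(S) = I(Q) \;\Rightarrow\; S \sim Q,\\
&\text{Lipschitz continuous:} && d\big(I(S), I(Q)\big) \le \lambda\,\varepsilon
  \text{ whenever } Q \text{ is obtained from } S \text{ by moving every point by at most } \varepsilon.
\end{aligned}
```

The polynomial-time requirement then asks that both $I$ and $d$ can be computed in time polynomial in the number of points for a fixed ambient dimension.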

The main geo-mapping problem is to analytically describe moduli spaces of geometric structures, which are classes of data objects modulo an equivalence relation. These moduli spaces are prototypes of ’treasure maps’ containing all known objects of a certain type as well as all not yet discovered ones. A discrete example is Mendeleev’s table of chemical elements, which was initially half-empty but guided an efficient search for new elements. A continuous example is a geographic map of the Earth, where any location is unambiguously identified by its latitude and longitude.
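As a toy illustration of our own (not the book's general construction), the simplest nontrivial moduli space is that of unordered 3-point clouds (triangles) under isometry. By the classical SSS congruence criterion, the sorted side lengths act like 'latitude and longitude' here. Congruent triangles receive identical coordinates, non-congruent ones receive different coordinates, and moving every point by at most ε changes each coordinate by at most 2ε. These coordinates cannot distinguish a planar triangle from its mirror image, so they classify triangles under isometry rather than under orientation-preserving rigid motion. The function names are hypothetical.

```python
import math
from itertools import combinations


def triangle_coordinates(points):
    """Sorted side lengths a <= b <= c of an unordered 3-point cloud in any dimension."""
    if len(points) != 3:
        raise ValueError("expected exactly 3 points")
    # Sorting removes the dependence on the arbitrary order of the points.
    return tuple(sorted(math.dist(p, q) for p, q in combinations(points, 2)))


def moduli_distance(cloud1, cloud2):
    """L-infinity distance between the coordinates of two triangles."""
    c1, c2 = triangle_coordinates(cloud1), triangle_coordinates(cloud2)
    return max(abs(x - y) for x, y in zip(c1, c2))


# A 3-4-5 right triangle and a rotated, translated and reordered copy of it.
tri = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
angle = 0.6
moved = [
    (math.cos(angle) * x - math.sin(angle) * y + 5.0,
     math.sin(angle) * x + math.cos(angle) * y - 2.0)
    for x, y in reversed(tri)
]
# Moving one vertex by eps changes each side length by at most eps here.
eps = 0.01
noisy = [(0.0, 0.0), (3.0 + eps, 0.0), (0.0, 4.0)]

print(triangle_coordinates(tri))     # approximately (3.0, 4.0, 5.0)
print(moduli_distance(tri, moved))   # approximately 0: same point of the moduli space
print(moduli_distance(tri, noisy))   # small, because the coordinates are continuous
```

The L-infinity distance on sorted triples is a genuine metric on this toy moduli space: it vanishes exactly when the sorted side lengths coincide, that is, exactly for congruent triangles.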

Geometric Data Science aims to develop universal geographic-style coordinates for all real data objects under practically important equivalences, such as rigid motion. The first part of the book focuses on finite point sets. The most important result is a complete and continuous classification of all finite clouds of unordered points under rigid motion in any Euclidean space. The key challenge was to avoid the exponential complexity arising from permutations of the given unordered points. For a fixed dimension of the ambient Euclidean space, the running times of all algorithms computing the resulting invariants and distance metrics depend polynomially on the number of points.
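To see why permutations are the crux, consider the following minimal sketch. It is not the book's complete invariant but a classical, simpler isometry invariant that sidesteps permutations entirely: the sorted list of all pairwise distances. This invariant needs only O(m² log m) time for m points instead of comparing all m! orderings. It is known to be incomplete in general, because some non-isometric clouds share the same distance list. The helper names are hypothetical.

```python
import math
from itertools import combinations


def sorted_distances(cloud):
    """Sorted list of all m(m-1)/2 pairwise distances of an unordered point cloud.

    Unchanged by rotations, translations, reflections and reordering of the points.
    Not a complete invariant in general.
    """
    return sorted(math.dist(p, q) for p, q in combinations(cloud, 2))


def rotate_about_z(point, angle):
    """Rotate a 3D point about the z-axis: one example of a rigid motion."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
# Rotate, translate and reorder the points: all coordinates change...
moved = [tuple(c + t for c, t in zip(rotate_about_z(p, 1.2), (4.0, -1.0, 2.5)))
         for p in reversed(cloud)]
# ...but the invariant does not.
same = all(math.isclose(a, b, abs_tol=1e-9)
           for a, b in zip(sorted_distances(cloud), sorted_distances(moved)))
print(same)  # True

# Orderings a brute-force matching would try versus pairs the invariant uses, m = 10.
print(math.factorial(10), len(list(combinations(range(10), 2))))  # 3628800 45
```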

The second part of the book advances a similar classification in the much more difficult case of periodic point sets, which model all periodic crystals at the atomic scale. The most significant result is the hierarchy of invariants, ranging from ultra-fast to complete ones. The key challenge was to resolve the discontinuity of crystal representations, which break down under almost any noise. Experimental validation on all major materials databases confirmed the Crystal Isometry Principle: any real periodic crystal has a unique location in a common moduli space of all periodic structures under rigid motion. The resulting moduli space contains all known and not yet discovered periodic crystals and hence continuously extends Mendeleev’s table to the full crystal universe.
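The discontinuity problem and its remedy can already be seen in one dimension. This sketch rests on our own assumptions. As a representative of the ultra-fast end of such a hierarchy, we take the Average Minimum Distance (AMD) from the periodic-geometry literature by Kurlin and co-authors. AMD_k averages, over the motif points, the distance to the k-th nearest neighbour in the whole infinite periodic set. We contrast it with the minimal period, a cell-based descriptor. The 1D restriction and all function names are hypothetical, and this is not the book's implementation.

```python
def amd_1d(motif, period, k):
    """AMD_1..AMD_k of the 1D periodic set {x + n*period : x in motif, n integer}."""
    pts = [x % period for x in motif]  # reduce the motif into [0, period)
    total = [0.0] * k
    for i, x in enumerate(pts):
        # Copies with |n| <= k + 1 suffice: x + n*period for n = 1..k already give
        # k neighbours within distance k*period, while any farther copy lies at
        # distance > (k + 1)*period.
        dists = sorted(
            abs(q + n * period - x)
            for j, q in enumerate(pts)
            for n in range(-(k + 1), k + 2)
            if not (i == j and n == 0)  # skip the point itself
        )
        for r in range(k):
            total[r] += dists[r]
    return [t / len(pts) for t in total]


def minimal_period(motif, period, tol=1e-9):
    """Smallest translation period of the same 1D periodic set."""
    pts = [x % period for x in motif]
    m = len(pts)

    def on_circle(a, b):  # distance between a and b modulo the period
        d = abs(a - b) % period
        return min(d, period - d)

    for d in range(m, 0, -1):  # candidate periods period/d, shortest first
        if m % d == 0:
            s = period / d
            if all(any(on_circle(x + s, y) < tol for y in pts) for x in pts):
                return s
    return period


# The integer lattice described by two different cells gives the same AMD.
print(amd_1d([0.0], 1.0, 4))        # [1.0, 1.0, 2.0, 2.0]
print(amd_1d([0.0, 1.0], 2.0, 4))   # [1.0, 1.0, 2.0, 2.0]
# Tiny noise: the minimal period jumps from 0.5 to 1.0, but AMD barely moves.
print(minimal_period([0.0, 0.5], 1.0), minimal_period([0.0, 0.5001], 1.0))  # 0.5 1.0
print(amd_1d([0.0, 0.5], 1.0, 4))     # [0.5, 0.5, 1.0, 1.0]
print(amd_1d([0.0, 0.5001], 1.0, 4))  # about [0.4999, 0.5001, 1.0, 1.0]
```

The minimal period, like a reduced unit cell, doubles under an arbitrarily small perturbation. The AMD values instead change by about the size of the perturbation and do not depend on which cell was used to describe the set.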

The book was written for research students and professionals who work in mathematics and need rigorously justified and computationally efficient methods for real data, such as crystalline materials and molecules, including proteins. The prerequisite knowledge is undergraduate-level linear algebra, metric geometry, and calculus.

We finish by extending Johannes Kepler’s quote from the 17th century to inspire a transformation from brute-force computations, which currently ‘burn’ our planet, to a 21st-century Maths for Science revolution: where there is Data, there is Geometry.

The initial question that can be asked about any real data object is ‘what is it?’, or (more formally) ‘how is it defined?’, or (more deeply) ‘how can we make sense of this data?’

The first obstacle to achieving these goals is to recognise the differences between real objects and their digital representations. For example, a car is a physical object that is very different from a pixel-based image of this car, which is only a matrix of integers.

The second obstacle is the ambiguity of digital representations in the sense that any real object can have many representations that look very different to a computer.

If measurements have continuous real values, the resulting space of representations is infinite. Even if we fix a finite resolution of physical measurements, all potential data values still live in a huge space. For example, all images of size 2 × 2 pixels and greyscale intensities 0, …, 255 form a huge collection.
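The size of this collection is simple arithmetic. Each of the 2 × 2 = 4 pixels independently takes one of 256 intensities, so there are 256⁴ = 2³² such images. Here is a minimal sketch, where the helper name is hypothetical:

```python
def num_greyscale_images(width, height, levels=256):
    """Number of distinct width x height images with intensities 0..levels-1."""
    # Each of the width*height pixels independently takes one of `levels` values.
    return levels ** (width * height)


count_2x2 = num_greyscale_images(2, 2)
print(count_2x2)  # 4294967296, i.e. 256**4 = 2**32

# Even a tiny 10 x 10 image gives a count with hundreds of decimal digits.
digits_10x10 = len(str(num_greyscale_images(10, 10)))
print(digits_10x10)  # 241
```

Python integers have arbitrary precision, so the exact counts are computed without overflow. At this scale, exhaustive search over representations is hopeless.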


Reference

This content is AI-processed based on open access ArXiv data.
