Geometric Data Science

📝 Original Info

  • Title: Geometric Data Science
  • ArXiv ID: 2512.05040
  • Date: 2025-12-04
  • Authors: Olga D Anosova, Vitaliy A Kurlin

📝 Abstract

This book introduces the new research area of Geometric Data Science, where data can represent any real objects through geometric measurements. The first part of the book focuses on finite point sets. The most important result is a complete and continuous classification of all finite clouds of unordered points under rigid motion in any Euclidean space. The key challenge was to avoid the exponential complexity arising from permutations of the given unordered points. For a fixed dimension of the ambient Euclidean space, the times of all algorithms for the resulting invariants and distance metrics depend polynomially on the number of points. The second part of the book advances a similar classification in the much more difficult case of periodic point sets, which model all periodic crystals at the atomic scale. The most significant result is the hierarchy of invariants from the ultra-fast to complete ones. The key challenge was to resolve the discontinuity of crystal representations that break down under almost any noise. Experimental validation on all major materials databases confirmed the Crystal Isometry Principle: any real periodic crystal has a unique location in a common moduli space of all periodic structures under rigid motion. The resulting moduli space contains all known and not yet discovered periodic crystals and hence continuously extends Mendeleev's table to the full crystal universe.

📄 Full Content

"Where there is Matter, there is Geometry." Johannes Kepler, a key figure in the 17th-century Scientific Revolution.

This book introduces the new research area of Geometric Data Science, where data can represent any real objects through geometric measurements. Some of the simplest inputs of real data objects are finite and periodic sets of unordered points.

For example, a molecule can be fully described by the positions of its atoms in a 3-dimensional space. However, many descriptions are highly ambiguous, especially to a computer, which operates only with numbers. For example, a photograph is ambiguous, because any object can have an astronomically large number of photographs.

All attempts to standardise photographs, as in passports, have shifted towards more reliable biometric data. Indeed, the identification of living organisms improved dramatically after the discovery of the structure of DNA. However, geometric structures have remained ambiguous for many objects, including proteins and materials, which are still represented by photograph-style inputs that depend on arbitrary coordinate systems.

The major obstacle to progress from trial-and-error in chemistry and biology to a justified design of materials and drugs was the absence of rigorous definitions and problem statements. Geometric Data Science fills this gap by developing foundations based on equivalences, invariants, distance metrics, and polynomial-time algorithms.

The main geo-mapping problem is to analytically describe moduli spaces of geometric structures, which are classes of data objects modulo an equivalence relation. These moduli spaces are prototypes of ‘treasure maps’ containing all known objects of a certain type as well as all not yet discovered ones. A discrete example is Mendeleev’s table of chemical elements, which was initially half-empty but guided an efficient search for new elements. A continuous example is a geographic map of the Earth, where any location is unambiguously identified by its latitude and longitude.

Geometric Data Science aims to develop universal geographic-style coordinates for all real data objects under practically important equivalences, such as rigid motion. The first part of the book focuses on finite point sets. The most important result is a complete and continuous classification of all finite clouds of unordered points under rigid motion in any Euclidean space. The key challenge was to avoid the exponential complexity arising from permutations of the given unordered points. For a fixed dimension of the ambient Euclidean space, the times of all algorithms for the resulting invariants and distance metrics depend polynomially on the number of points.
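As a rough illustration of why unordered points need not force a search over all permutations, the sketch below computes the sorted list of pairwise distances of a small cloud. This is a deliberately simple invariant chosen for illustration, not the complete invariant developed in the book: it is preserved by rigid motion and by any reordering of the points, and needs only m(m-1)/2 distance evaluations for m points, but it is known to be incomplete in general because some non-equivalent clouds share the same distances. The function names and the sample cloud are hypothetical.

```python
import math
from itertools import combinations

def sorted_pairwise_distances(points):
    """Sorted list of all pairwise Euclidean distances (illustrative invariant)."""
    return sorted(math.dist(p, q) for p, q in combinations(points, 2))

def rigid_motion_2d(points, angle, shift=(0.0, 0.0)):
    """Rotate a 2D cloud by `angle` (radians) and translate it by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in points]

# Hypothetical sample cloud of four unordered points in the plane.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]

# Apply a rigid motion and then list the points in a different order.
moved = rigid_motion_2d(cloud, 0.7, shift=(5.0, -2.0))
moved.reverse()

a = sorted_pairwise_distances(cloud)
b = sorted_pairwise_distances(moved)

# The two lists agree up to floating-point rounding.
same = all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(a, b))
print(len(a), same)
```

By contrast, comparing two clouds by trying every ordering of the points would mean m! cases: 24 for four points, and more than 3.6 million for ten.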

The second part of the book advances a similar classification in the much more difficult case of periodic point sets, which model all periodic crystals at the atomic scale. The most significant result is the hierarchy of invariants from the ultra-fast to complete ones. The key challenge was to resolve the discontinuity of crystal representations that break down under almost any noise. Experimental validation on all major materials databases confirmed the Crystal Isometry Principle: any real periodic crystal has a unique location in a common moduli space of all periodic structures under rigid motion. The resulting moduli space contains all known and not yet discovered periodic crystals and hence continuously extends Mendeleev’s table to the full crystal universe.
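One way to picture the discontinuity mentioned above is a 1-dimensional toy example, under the assumption that a periodic set is stored as a minimal period (cell) plus a motif of points inside it. The helper `minimal_period` and the sample motifs are hypothetical and are not taken from the book; exact fractions are used to avoid floating-point comparisons.

```python
from fractions import Fraction

def minimal_period(motif, period):
    """Smallest positive translation preserving the set motif + period * Z.

    Illustrative helper, not from the book; uses exact rational arithmetic.
    """
    period = Fraction(period)
    points = {Fraction(p) % period for p in motif}
    m = len(points)
    # A translation by period / k can preserve the set only if k divides m.
    for k in range(m, 0, -1):
        if m % k:
            continue
        shift = period / k
        if {(p + shift) % period for p in points} == points:
            return shift
    return period

eps = Fraction(1, 1000)

# All integer points, written with the non-minimal period 2 and motif {0, 1}.
regular = minimal_period([0, 1], 2)

# The same set after moving every second point by a tiny eps.
perturbed = minimal_period([0, 1 + eps], 2)

print(regular, perturbed)
```

In this sketch the minimal period jumps from 1 to 2 however small `eps` is, so a representation built on the minimal cell changes abruptly under a tiny perturbation of the points.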

The book was written for research students and professionals who work in mathematics and need rigorously justified and computationally efficient methods for real data, such as crystalline materials and molecules, including proteins. The prerequisite knowledge is linear algebra, metric geometry, and calculus at the undergraduate level.

We finish by extending Johannes Kepler’s quote from the 17th century to inspire a transformation from brute-force computations, which currently ‘burn’ our planet, to a 21st-century Maths for Science revolution: where there is Data, there is Geometry.

The initial question that can be asked about any real data object is "what is it?", or (more formally) "how is it defined?", or (more deeply) "how can we make sense of this data?"

The first obstacle to achieving these goals is the difference between real objects and their digital representations. For example, a car is a physical object that is very different from a pixel-based image of this car, which is only a matrix of integers.

The second obstacle is the ambiguity of digital representations in the sense that any real object can have many representations that look very different to a computer.

If measurements have continuous real values, the resulting space of representations is infinite. Even if we fix a finite resolution of physical measurements, all potential data values still live in a huge space. For example, all images of size 2 × 2 pixels and greyscale intensities 0, ..., 255 form a huge collection

…(Full text truncated)…
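The size of the 2 × 2 greyscale example above can be checked with a short count, assuming each of the four pixels independently takes one of 256 intensities. The function `count_images` is a hypothetical helper introduced only for this illustration.

```python
def count_images(width, height, levels=256):
    """Number of distinct images with width * height pixels and `levels` intensities."""
    return levels ** (width * height)

tiny = count_images(2, 2)  # 256 ** 4
print(tiny)                # 4294967296
```

Even this smallest case already gives more than four billion possible images, and the count grows exponentially with the number of pixels.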

Reference

This content is AI-processed based on ArXiv data.
