Breakthrough in Interval Data Fitting I. The Role of Hausdorff Distance
This is the first of two papers describing a process for fitting experimental data under interval uncertainty. Here I present the methodology, designed from the outset as an interval-oriented tool and meant to replace, to a large extent, the famous Least Squares (LSQ) method and other slightly less popular methods. Contrary to its classical counterparts, the presented method requires no poorly justified prior assumptions, such as smallness of the experimental uncertainties or their normal (Gaussian) distribution. Using the interval approach, we are able to fit rigorously and reliably not only simple functional dependencies, with no extra effort when both variables are uncertain, but also cases in which the constitutive equation exists in implicit rather than explicit functional form. The magic word, and the key to the success of the interval approach, turns out to be the Hausdorff distance.
💡 Research Summary
The paper introduces a novel interval‑based data‑fitting methodology that is intended to replace, to a large extent, the classical Least Squares (LSQ) approach and its less popular variants. The motivation stems from the observation that LSQ relies on two fragile assumptions: that measurement uncertainties are small and that they follow a Gaussian distribution. In many experimental contexts these assumptions are violated—uncertainties can be large, asymmetric, or non‑Gaussian—leading to biased parameter estimates and unreliable confidence intervals.
To overcome these limitations, the author builds a fitting framework directly on interval arithmetic. Both the independent variables and the dependent observations are represented as intervals, and the model itself may be given either in explicit form (y = f(x, θ)) or implicit form (g(x, y, θ) = 0). The central concept that makes the interval approach viable is the Hausdorff distance, a metric on sets equal to the largest distance from a point of either set to the nearest point of the other. By defining the objective function as the Hausdorff distance between the interval‑valued model predictions F(θ) and the interval‑valued data Y, the fitting problem becomes:
min_θ d_H(F(θ), Y).
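For two closed intervals on the real line, the Hausdorff distance reduces to a simple closed form: d_H([a, b], [c, d]) = max(|a − c|, |b − d|), the larger of the two endpoint deviations. A minimal sketch (the function name is mine, not the paper's):

```python
def hausdorff_interval(a, b, c, d):
    """Hausdorff distance between the closed intervals [a, b] and [c, d].

    On the real line this is the larger endpoint deviation:
    max(|a - c|, |b - d|).
    """
    if a > b or c > d:
        raise ValueError("each interval must satisfy lower <= upper")
    return max(abs(a - c), abs(b - d))

# Identical intervals are at distance zero.
print(hausdorff_interval(0.0, 1.0, 0.0, 1.0))  # 0.0
# [0, 10] vs [4, 5]: the endpoint 10 lies 5 away from [4, 5].
print(hausdorff_interval(0.0, 10.0, 4.0, 5.0))  # 5.0
```

Note that the distance is zero only when the intervals coincide, which is exactly why minimizing it drives the model's predicted intervals onto the measured ones.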
This formulation has several important consequences. First, it treats the worst‑case discrepancy between model and data, guaranteeing that the resulting parameter interval encloses all admissible solutions consistent with the measured uncertainties. Second, it eliminates the need for any probabilistic assumptions about the error distribution; the interval representation alone captures the full range of possible measurement values. Third, because the Hausdorff distance is defined for arbitrary sets, the method can handle implicit models without the need to algebraically solve for an explicit function.
From an algorithmic standpoint, the paper proposes a hybrid optimization scheme. A global search discretizes the parameter space into hyper‑rectangular cells and computes a fast lower bound of the Hausdorff distance for each cell. Cells whose lower bound exceeds a prescribed tolerance are discarded, focusing computational effort on promising regions. Within the surviving cells, a local refinement is performed using interval Newton or interval Lagrange‑multiplier techniques, which exploit the derivative information of the interval model to contract the parameter intervals iteratively. To prevent excessive over‑conservatism—an inherent risk when intervals are propagated through nonlinear functions—the author introduces a regularisation step that rescales the Hausdorff distance using its own lower bound, thereby keeping the final parameter intervals as tight as the data allow.
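The global stage described above can be sketched as a generic branch-and-prune loop. This is an illustrative skeleton under my own assumptions, not the paper's implementation: `lower_bound` stands for the fast cell-wise lower bound of the Hausdorff objective, and `upper_bound` for its value at a sample point of the cell (e.g. the midpoint).

```python
import heapq
import itertools

def branch_and_prune(lower_bound, upper_bound, box, tol=1e-3, max_steps=10_000):
    """Branch-and-prune over hyper-rectangular parameter cells.

    box:          list of (lo, hi) pairs, one per parameter.
    lower_bound:  guaranteed lower bound of the objective over a cell.
    upper_bound:  objective value at some point of the cell (e.g. its
                  midpoint), hence a valid upper bound on the minimum.
    Returns the cells of width < tol that survived pruning.
    """
    tie = itertools.count()              # tie-breaker for equal bounds
    best = upper_bound(box)              # best objective value seen so far
    heap = [(lower_bound(box), next(tie), box)]
    survivors = []
    for _ in range(max_steps):
        if not heap:
            break
        lb, _, cell = heapq.heappop(heap)
        if lb > best:                    # cell cannot hold the minimizer
            continue
        best = min(best, upper_bound(cell))
        widths = [hi - lo for lo, hi in cell]
        k = widths.index(max(widths))    # bisect the widest coordinate
        if widths[k] < tol:              # small enough: keep as candidate
            survivors.append(cell)
            continue
        lo, hi = cell[k]
        mid = 0.5 * (lo + hi)
        for half in ((lo, mid), (mid, hi)):
            child = cell[:k] + [half] + cell[k + 1:]
            heapq.heappush(heap, (lower_bound(child), next(tie), child))
    return survivors
```

In the paper's scheme the surviving cells would then be handed to the local interval-Newton refinement; here they are simply returned once their width drops below `tol`.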
The methodology is validated on two benchmark problems. In the first, synthetic linear data (y = a x + b) are generated with large, uniformly distributed uncertainties and with non‑Gaussian noise (e.g., Laplace). Classical LSQ yields point estimates that deviate significantly from the true parameters and confidence intervals that fail to contain the true values. The interval‑Hausdorff approach, by contrast, returns parameter intervals that reliably enclose the true a and b, with widths that scale sensibly with the magnitude of the measurement intervals. In the second example, a geometric model defined implicitly by x² + y² = r² is used to estimate the radius r when both x and y are interval‑valued. LSQ would require an explicit reformulation (e.g., solving for r) and would suffer from error propagation during that transformation. The proposed method directly works with the implicit equation, minimizing the Hausdorff distance between the set of points satisfying the circle equation and the measured point intervals, and produces a tight interval for r that includes the ground‑truth value.
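The circle benchmark can be illustrated with a small sketch. This is my own reconstruction under simplifying assumptions, not the paper's code: each measurement is a box [x] × [y], the set of distances from the origin to points of the box is itself an interval [d_lo, d_hi], and the worst-case deviation max(|r − d_lo|, |r − d_hi|) over all boxes is minimized by the midpoint of the smallest interval enclosing every distance endpoint.

```python
import math

def distance_interval(xlo, xhi, ylo, yhi):
    """Interval of Euclidean distances from the origin to the points of
    the box [xlo, xhi] x [ylo, yhi]."""
    # Nearest coordinate to 0 in each direction (0 if the box straddles it).
    nx = 0.0 if xlo <= 0.0 <= xhi else min(abs(xlo), abs(xhi))
    ny = 0.0 if ylo <= 0.0 <= yhi else min(abs(ylo), abs(yhi))
    # Farthest coordinate from 0 in each direction.
    fx = max(abs(xlo), abs(xhi))
    fy = max(abs(ylo), abs(yhi))
    return math.hypot(nx, ny), math.hypot(fx, fy)

def fit_radius(boxes):
    """Radius minimizing the worst-case deviation max(|r - d_lo|, |r - d_hi|)
    over all measurement boxes: the midpoint of the tightest interval
    enclosing every distance endpoint."""
    endpoints = [e for b in boxes for e in distance_interval(*b)]
    return 0.5 * (min(endpoints) + max(endpoints))

# Hypothetical boxes drawn around points of a circle of radius 2.
data = [(1.9, 2.1, -0.1, 0.1), (-0.1, 0.1, 1.9, 2.1), (1.3, 1.5, 1.3, 1.5)]
print(fit_radius(data))  # close to the ground-truth radius 2
```

No explicit reformulation r = sqrt(x² + y²) with point data is needed: the interval of distances is computed directly from the box geometry, mirroring how the paper's method works with the implicit constraint.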
The paper concludes that the Hausdorff‑distance‑driven interval fitting framework offers several decisive advantages over LSQ: (1) it remains robust when uncertainties are large or non‑Gaussian; (2) it naturally propagates uncertainty from inputs to parameters without additional statistical modeling; (3) it accommodates implicit relationships, expanding its applicability to complex physical models; and (4) it provides a conservative yet reliable estimate of the worst‑case fitting error. The author outlines future research directions, including extensions to multivariate interval models, dynamic systems where intervals evolve over time, and real‑time streaming data where online interval contraction techniques will be required. Overall, the work positions interval Hausdorff fitting as a powerful, assumption‑free alternative for scientific data analysis in the presence of genuine measurement uncertainty.