On the general position subset selection problem

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

Let $f(n,\ell)$ be the maximum integer such that every set of $n$ points in the plane with at most $\ell$ collinear contains a subset of $f(n,\ell)$ points with no three collinear. First, we prove that if $\ell \leq O(\sqrt{n})$ then $f(n,\ell)\geq \Omega(\sqrt{\frac{n}{\ln \ell}})$. Second, we prove that if $\ell \leq O(n^{(1-\epsilon)/2})$ then $f(n,\ell) \geq \Omega(\sqrt{n\log_\ell n})$, which implies all previously known lower bounds on $f(n,\ell)$ and improves them when $\ell$ is not fixed. A more general problem is to consider subsets with at most $k$ collinear points in a point set with at most $\ell$ collinear. We also prove analogous results in this setting.
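To make the definition concrete, here is a brute-force sketch that finds a largest subset with no three collinear for one small point set with integer coordinates. It evaluates a single instance only: f(n,ℓ) is the minimum of this quantity over all n-point sets with at most ℓ collinear. The function names are hypothetical, and the search is exponential, so it is only usable on tiny inputs.

```python
from itertools import combinations

Point = tuple[int, int]


def collinear(a: Point, b: Point, c: Point) -> bool:
    """Exact collinearity test via the cross product (integer coordinates)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]) == 0


def in_general_position(pts) -> bool:
    """True if no three of the given points are collinear."""
    return not any(collinear(a, b, c) for a, b, c in combinations(pts, 3))


def largest_general_position_subset(points: list[Point]) -> list[Point]:
    """Exhaustive search over subsets, largest sizes first."""
    for size in range(len(points), 0, -1):
        for subset in combinations(points, size):
            if in_general_position(subset):
                return list(subset)
    return []


# A 3x3 grid has n = 9 points with at most l = 3 on a line.
grid = [(x, y) for x in range(3) for y in range(3)]
best = largest_general_position_subset(grid)
print(len(best))  # prints 6
```

Each grid row holds at most two chosen points, so 6 is also an upper bound for this instance.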

💡 Research Summary

The paper studies the “general‑position subset selection” problem in the Euclidean plane. Given a set P of n points such that no line contains more than ℓ points, let f(n,ℓ) denote the largest integer t for which every such P contains a subset of t points with no three collinear (i.e., in general position). The authors improve the known lower bounds on f(n,ℓ) by developing two probabilistic arguments for nested ranges of ℓ: the second applies to a slightly smaller range of ℓ than the first and gives a stronger bound there.

First, when ℓ ≤ c·√n (the “square‑root regime”), they apply a random sampling argument. Each point is kept independently with probability p. Because the goal is a subset with no three collinear, the bad event (a “collision”) is a collinear triple among the retained points; a line can never receive ℓ + 1 retained points, since the original set has at most ℓ points on any line. The key quantity is therefore the number of collinear triples in the point set, which the bound of at most ℓ points per line keeps under control. Choosing p to balance the expected number of retained points against the expected number of retained collinear triples, and then deleting one point from each surviving triple, leaves a subset with no three collinear of size Ω(√(n/ln ℓ)). This yields the first main result:
f(n,ℓ) ≥ c₁·√(n/ln ℓ) for ℓ ≤ O(√n).
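The arithmetic behind a bound of this shape can be checked with the standard deletion method. This is a sketch, not a reconstruction of the paper's proof: it takes as a hypothesis that the number T of collinear triples in the n-point set satisfies T ≤ C·n²·ln ℓ for some constant C (and T ≥ n/3, so that p ≤ 1). Here X is the number of retained points and Y the number of retained collinear triples; deleting one point per retained triple leaves at least X − Y points with no three collinear, and some sample attains at least the expectation.

```latex
\begin{aligned}
\mathbb{E}[X] &= pn, \qquad \mathbb{E}[Y] = p^{3}T,\\
\mathbb{E}[X-Y] &= pn - p^{3}T,\\
p = \sqrt{\tfrac{n}{3T}} \;\Rightarrow\; \mathbb{E}[X-Y] &= \tfrac{2}{3}\,pn = \frac{2}{3\sqrt{3}}\cdot\frac{n^{3/2}}{\sqrt{T}},\\
T \le C\,n^{2}\ln\ell \;\Rightarrow\; \mathbb{E}[X-Y] &\ge \frac{2}{3\sqrt{3C}}\sqrt{\frac{n}{\ln\ell}}.
\end{aligned}
```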

Second, for the range ℓ ≤ c·n^{(1−ε)/2}, which is slightly smaller than the square‑root regime, the authors aim for a stronger bound, and the simple sampling method is insufficient. They turn to the dependent random choice (DRC) technique, a powerful tool in extremal combinatorics. They first select a relatively large random subset S of size m ≈ c₂·√(n·log_ℓ n). Within S, they examine the family of lines determined by pairs of points. By the DRC lemma, there exists a subfamily of points with many common neighbours, that is, points that share lines with many other points of S. The authors then iteratively prune S: whenever a line contains three or more points of S, they delete a carefully chosen point from that line, destroying the collinear triple while preserving most of S. The total number of deletions stays small enough that the final set S′ still has size Ω(√(n·log_ℓ n)). By construction, no line contains three points of S′, so S′ is in general position. This establishes the second main bound:
f(n,ℓ) ≥ c₃·√(n·log_ℓ n) for ℓ ≤ O(n^{(1−ε)/2}).
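A change of base shows how the two bounds relate. On the range where the second bound applies (which lies inside the range of the first, since ε > 0), it exceeds the first bound by a factor of √(ln n):

```latex
\sqrt{n\log_{\ell} n}
  = \sqrt{\frac{n\ln n}{\ln \ell}}
  = \sqrt{\ln n}\cdot\sqrt{\frac{n}{\ln \ell}},
\qquad
n^{(1-\epsilon)/2} \le \sqrt{n} \quad (0 < \epsilon < 1).
```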

The second bound implies all previously known lower bounds on f(n,ℓ) and improves them when ℓ is not fixed. The paper also treats a natural generalisation: given a set with at most ℓ collinear points, find a large subset in which no line contains more than k points (k‑general position; k = 2 is the original problem). By adapting the same probabilistic framework, forbidding k + 1 collinear points instead of 3 and adjusting the DRC parameters, the authors obtain analogous lower bounds for the k‑general‑position version.
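To make k‑general position concrete, here is a minimal checker, assuming distinct points with integer coordinates. It groups pairs of points by the line they span, using a hypothetical `line_key` normalisation, and reports the largest number of points on one line. The same function measures ℓ for an input set, and k = 2 recovers the no‑three‑collinear condition.

```python
from collections import defaultdict
from itertools import combinations
from math import gcd

Point = tuple[int, int]


def line_key(p: Point, q: Point) -> tuple[int, int, int]:
    """Canonical (a, b, c) with a*x + b*y == c for the line through p != q."""
    a, b = q[1] - p[1], p[0] - q[0]
    g = gcd(a, b)  # positive because the points are distinct
    a, b = a // g, b // g
    # Fix the sign so every pair on the same line yields the same key.
    if a < 0 or (a == 0 and b < 0):
        a, b = -a, -b
    return a, b, a * p[0] + b * p[1]


def max_collinear(points: list[Point]) -> int:
    """Largest number of the (distinct) points lying on a single line."""
    if len(points) <= 2:
        return len(points)
    lines = defaultdict(set)
    for p, q in combinations(points, 2):
        lines[line_key(p, q)].update((p, q))
    return max(len(s) for s in lines.values())


def in_k_general_position(points: list[Point], k: int) -> bool:
    """True if no line contains more than k of the points."""
    return max_collinear(points) <= k


grid = [(x, y) for x in range(3) for y in range(3)]
print(max_collinear(grid))  # prints 3
```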

The technical contributions can be summarised as follows:

Random Sampling with Logarithmic Damping – A careful choice of sampling probability that balances the expected size of the retained set against the expected number of collinear triples it contains, leading to a √(n/ln ℓ) bound.
Dependent Random Choice Adapted to Geometry – Translating the DRC lemma, traditionally used for bipartite graphs, to the geometric setting of point‑line incidences, and showing that a large random subset contains many “rich” lines that can be pruned without destroying most points.
Iterative Pruning Scheme – A deterministic cleaning step that removes points from lines carrying three or more selected points while preserving a large fraction of the original random set, crucial for achieving the √(n·log_ℓ n) bound.
Extension to k‑General Position – Generalising the analysis to allow up to k collinear points, demonstrating that the same probabilistic machinery yields comparable lower bounds for any fixed k ≥ 3.
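The sample-then-prune structure listed above can be illustrated on small inputs. The sketch below keeps each point independently with probability p, then greedily deletes points until no line holds more than k of the kept points. It assumes distinct integer points, all names are hypothetical, and the greedy rule (drop the point on the most over-populated lines) is a swapped-in heuristic: it does not implement dependent random choice or the paper's own pruning rule, and it carries no size guarantee.

```python
import random
from collections import defaultdict
from itertools import combinations
from math import gcd

Point = tuple[int, int]


def line_key(p: Point, q: Point) -> tuple[int, int, int]:
    """Canonical (a, b, c) with a*x + b*y == c for the line through p != q."""
    a, b = q[1] - p[1], p[0] - q[0]
    g = gcd(a, b)
    a, b = a // g, b // g
    if a < 0 or (a == 0 and b < 0):
        a, b = -a, -b
    return a, b, a * p[0] + b * p[1]


def prune_to_k_per_line(points: list[Point], k: int = 2) -> list[Point]:
    """Greedily delete points until no line holds more than k of them."""
    kept = list(points)
    while True:
        lines = defaultdict(set)
        for p, q in combinations(kept, 2):
            lines[line_key(p, q)].update((p, q))
        rich = [s for s in lines.values() if len(s) > k]
        if not rich:
            return kept
        # Heuristic: drop the point lying on the most over-populated lines.
        load = defaultdict(int)
        for s in rich:
            for pt in s:
                load[pt] += 1
        kept.remove(max(load, key=load.get))


def sample_then_prune(points: list[Point], p: float, k: int = 2,
                      seed: int = 0) -> list[Point]:
    """Keep each point independently with probability p, then prune."""
    rng = random.Random(seed)
    sample = [pt for pt in points if rng.random() < p]
    return prune_to_k_per_line(sample, k)


grid = [(x, y) for x in range(6) for y in range(6)]
subset = sample_then_prune(grid, 0.5, k=2, seed=1)
```

The loop always terminates, because each pass removes one point and a set with at most k points has no line holding more than k of them.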

The paper concludes with a discussion of open problems. The most immediate question is whether the √(n·log_ℓ n) bound is tight; the best known upper bounds are still far from matching this growth, especially when ℓ grows with n. Another direction is to explore higher‑dimensional analogues, where points lie in ℝ^d and one forbids large subsets on a common hyperplane. The authors suggest that the combination of random sampling and DRC may be fruitful in those settings as well. Overall, the work significantly advances our understanding of how density constraints (the ℓ‑collinearity limit) influence the size of general‑position subsets, and it introduces techniques that are likely to be useful in a broad range of combinatorial geometry problems.
