Deep Reinforcement Learning for Fano Hypersurfaces


Authors: Marc Truter

Abstract. We design a deep reinforcement learning algorithm to explore a high-dimensional integer lattice with sparse rewards, training a feedforward neural network as a dynamic search heuristic to steer exploration toward reward-dense regions. We apply this to the discovery of Fano 4-fold hypersurfaces with terminal singularities, objects of central importance in algebraic geometry. Fano varieties with terminal singularities are fundamental building blocks of algebraic varieties, and explicit examples serve as a vital testing ground for the development and generalisation of theory. Despite decades of effort, the combinatorial intractability of the underlying search space has left this classification severely incomplete. Our reinforcement learning approach yields thousands of previously unknown examples, hundreds of which we show are inaccessible to known search methods.

1. Introduction

We search a high-dimensional integer lattice directly inspired by the construction of Fano hypersurfaces, where each hypersurface is encoded as a lattice point. Our goal is to discover new Fano 4-fold hypersurfaces with terminal singularities, which correspond to the reward points in our search. The terminal condition gives rise to a reward landscape that is sparse and unknown a priori, yet spatially clustered, and it is this final attribute we will exploit. The search space is the 6-dimensional integer lattice $\mathbb{Z}^6$, a 2-dimensional projection of which is illustrated in Figure 1.

Figure 1. The 6-dimensional dynamic heuristic (deep reinforcement learning) search for terminal Fano 4-fold hypersurfaces projected onto 2 dimensions. While the quasismooth terminal points are fully classified, the search discovers previously unknown nonquasismooth terminal ones. See Figure 9 for full details.
Exhaustive search algorithms have proven effective in low-dimensional cases, as discussed in §3.1. In higher dimensions, however, the combinatorial explosion of the search space renders such methods infeasible for discovering examples with high degrees far from those already known. To overcome this, we introduce two algorithms. The first is a fixed heuristic search. The second is a dynamic heuristic search that builds upon the ideas of the first, in which we use a neural network as our heuristic and continuously update it via deep reinforcement learning. The fixed search is deterministic, whereas the dynamic search is nondeterministic due to a stochastic component that promotes exploration. The use of a compact neural network trained using temporal difference learning allows the dynamic heuristic to smooth over the high variance in the reward signal. The combination of this and the stochastic component allows regions of the search space to be reached that are computationally infeasible for the fixed heuristic to access. In our experiments, we found hundreds of examples lying in such regions.

This paper contributes to a growing body of work applying data science and machine learning to purely mathematical data, with early applications in algebraic geometry [1, 7, 8], subsequently expanding to other areas of mathematics [9, 10].

2. Integer Lattice Search

2.1. Setup.

We begin by describing the setup of our search in the integer lattice $\mathbb{Z}^n$. The relation to searching for Fano 4-fold hypersurfaces with terminal singularities is explained in §3.

Environment: an $n$-dimensional integer lattice, $\mathbb{Z}^n$, with a subset of points that we want to discover, which we will refer to as reward points.

Challenging properties:
(1) Sparse: reward points occupy a negligibly small fraction of the total search space.
(2) Unknown a priori: the reward status of a point cannot be determined without direct evaluation.

Exploitable properties:
(1) Spatially clustered: reward points exhibit spatial locality, such that the presence of a reward point increases the likelihood of neighbouring points also being reward points.

Goal: to find both many and hard-to-reach reward points.

All attributes other than clustering pose challenges for constructing a search algorithm. Based on the clustering, we use previously found reward points in the search to inform where to search next. In both of the following algorithms, we construct heuristics that prioritise searching near denser regions of rewards.

2.2. Fixed Heuristic.

The algorithm begins with a start point. We proceed by searching its neighbouring points and determining whether they are reward points. We add the neighbouring points to a search queue and assign them a priority value dependent on their proximity to previously found reward points. The function that computes this priority value is fixed, making this a fixed heuristic algorithm. The algorithm resets by restarting the process with a point in the queue with the highest priority.

Figure 2. Flow chart of the fixed heuristic search algorithm.

The algorithm depicted in Figure 2 is performed as follows.
(1) (a) Pick a start point $p \in \mathbb{Z}^n$, and set $v(p) = 1$, where $v$ is the priority function defined in (3).
    (b) Set the step count $s = 0$ and fix the maximum step count $s_{\max} \in \mathbb{N}$.
    (c) Initialise the search queue $Q$ as an empty heap.
(2) Increment the step count $s$ by 1. Identify all neighbouring points $N$ of $p$, defined as the set of points exactly distance 1 away under the $L^1$ norm,
$$N := \{ n \in \mathbb{Z}^n \mid \lVert n - p \rVert_1 = 1 \},$$
in other words, all points differing by $\pm 1$ from $p$ in one coordinate.
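The neighbour set in step (2) is straightforward to enumerate. A minimal Python sketch (the function name is ours, not taken from the released code [21]):

```python
def neighbours(p):
    """All points of Z^n at L1 distance exactly 1 from p: one
    coordinate shifted by +1 or -1, the rest unchanged."""
    out = []
    for i in range(len(p)):
        for delta in (1, -1):
            q = list(p)
            q[i] += delta
            out.append(tuple(q))
    return out

# A point of Z^6 has 12 such neighbours.
assert len(neighbours((1, 2, 3, 4, 5, 6))) == 12
```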
(3) For each $n \in N$ determine its priority value
$$v(n) := \begin{cases} 1, & \text{if } n \text{ is a reward point}, \\ \tfrac{1}{2} v(p), & \text{otherwise}, \end{cases}$$
and add $(n, v(n))$ to the search queue $Q$ if $n$ has never been added before.
(4) Determine a point $p'$ such that $(p', v(p')) \in Q$ has the largest value $v(p')$ in the heap. That is, take the first point of the heap ordered by priority values $v$. Set $p = p'$ and remove $(p', v(p'))$ from $Q$. Return to (2) if $s < s_{\max}$; otherwise terminate the algorithm.

We observe in §3.3 that when the algorithm is applied to finding terminal Fano hypersurfaces, it is effective at finding many new examples in reward-dense regions. We build on the ideas of the fixed heuristic search to design a dynamic heuristic search in §2.3 that can find reward points in lower-density areas.

2.3. Dynamic Heuristic (Deep Reinforcement Learning).

The algorithm begins with a chosen start point. We compute its neighbours and, for each, determine the priority values assigned to them by a neural network function. We add these to a search queue ordered by priority values. Next, we assign rewards dependent on whether the neighbours added were reward points or not, and use these to update the neural network using temporal difference learning. The process is then repeated by searching a point with the highest priority in the search queue.

Figure 3. Flow chart of the dynamic heuristic search algorithm.

The algorithm depicted in Figure 3 is performed as follows.
(1) (a) Pick a start point $p \in \mathbb{Z}^n$.
    (b) Set the step count $s = 0$, and fix the maximum step count $s_{\max} \in \mathbb{N}$.
    (c) Initialise a search queue $Q$ as an empty heap.
    (d) Create an MLP neural network $f_\theta : \mathbb{Z}^n \to \mathbb{R}$ with initial parameters $\theta$; this will be our dynamic heuristic that determines priority in the search queue $Q$.
Fix the temporal difference discount factor $\gamma \in (0, 1)$, which affects how we update $f_\theta$ via temporal difference learning.
    (e) Fix a standard deviation $\sigma \in \mathbb{R}_{\geq 0}$; this determines the stochastic component added to the priority value and thereby controls exploration.
    (f) Fix $r_{\mathrm{reward}} \in \mathbb{N}$, the value given for finding a reward point. Set $s_{\mathrm{reward}} = 0$, the number of steps since a reward was last found.
(2) Increment the step count $s$ and the steps since a reward $s_{\mathrm{reward}}$ by 1. Identify all neighbouring points $N$ of $p$, defined as the set of points exactly distance 1 away under the $L^1$ norm,
$$N := \{ n \in \mathbb{Z}^n \mid \lVert n - p \rVert_1 = 1 \},$$
in other words, all points differing by $\pm 1$ from $p$ in one coordinate.
(3) Determine whether any $n \in N$ are reward points, and if so, reset $s_{\mathrm{reward}} = 0$. Compute their reward values
$$r(n) = \begin{cases} r_{\mathrm{reward}}, & \text{if } n \text{ is a reward point}, \\ -\sqrt{s_{\mathrm{reward}}}, & \text{otherwise}. \end{cases}$$
Consider the set of tuples $(p, n, r(n))$ for each $n \in N$. This data is used to train the network via temporal difference (TD) learning [19, §6]. To improve training stability, we fix a copy of the current network parameters, denoting them $\theta'$, which remain frozen during this update step. For each $(p, n, r(n))$ with $n \in N$, we compute the TD target
$$t(n) = r(n) + \gamma f_{\theta'}(n),$$
where $\gamma \in (0, 1)$ is the discount factor controlling the trade-off between short- and long-term rewards. Values of $\gamma$ close to 0 produce greedy, short-term behaviour, whilst values close to 1 encourage more long-term behaviour. We then compute the TD error, measuring the discrepancy between the estimated value of $p$ and the TD target,
$$\delta(\theta, p, n) = f_\theta(p) - t(n).$$
Note that $\theta$ in $f_\theta(p)$ is updated during optimisation, whilst $t(n)$ is held fixed via $\theta'$. Minimising $\delta(\theta, p, n)$ constitutes bootstrapping: future value estimates are refined using past ones. Concretely, we minimise the normalised mean squared error (MSE) loss
$$L(\theta) = \frac{1}{2|N|} \sum_{n \in N} \delta(\theta, p, n)^2,$$
using a gradient-based optimiser such as Adam.
(4) For each $n \in N$, compute its priority value $v(n) = f_\theta(n) + \varepsilon$, where $\varepsilon$ is sampled from $\mathcal{N}(0, \sigma^2)$, a normal distribution with mean 0 and variance $\sigma^2$. The stochastic component, $\varepsilon$, improves exploration. Add $(n, v(n))$ to the search queue $Q$ if it has not been searched before.
(5) Let $p'$ be a point in the search queue $Q$ such that $v(p')$ is the largest value in the heap. That is, take the first point of the heap ordered by priority values $v$. Set the new search point $p = p'$, and return to (2) if $s < s_{\max}$; otherwise terminate the algorithm.

Since the dynamic heuristic search is nondeterministic, rerunning the algorithm can uncover new rewards within the same fixed number of steps. The search is also flexible in its objectives; the reward function can be modified to incentivise the discovery of points with specific properties, such as a high degree.

3. Fano 4-fold Hypersurfaces

3.1. Context.

Algebraic varieties, the geometric shapes defined by polynomial equations, are central objects in mathematics. Among them, hypersurfaces, defined by a single polynomial equation, are the most tractable. A fundamental goal is to classify varieties into basic building blocks [18, §2.2]: Fano, Calabi-Yau, and general type, with terminal singularities [17], a well-known class of mild singularities. Birkar [2] proved that in any fixed dimension only finitely many families of Fano varieties with terminal singularities exist, making a complete classification, in other words building a 'periodic table', a finite problem. In dimension 1 (curves) and dimension 2 (surfaces) the periodic tables are known. In dimension 3, many important elements are known [4, 12, 13, 15]. Very little, however, is known in dimension 4.
In dimension 3, Reid [11, §16.6 Table 5] produced a complete list of all 95 Fano 3-fold hypersurfaces with terminal singularities by a terminating algorithm [3, §2]. Iano-Fletcher [11, §16.7 Table 6] extended this to two equations, using a brute-force search to find 85 families, working exhaustively from the origin of a search space of vectors $(a_1, \ldots, a_6)$ of integers $1 \leq a_1 \leq \ldots \leq a_6$, up to a fixed, arbitrary limit of the degree $d = (\sum a_i) - 1 = 100$, where results seemed to have dried up. It was only much later that Chen, Chen and Chen [6] proved that Iano-Fletcher's list is indeed complete. Such a search, run on hypersurfaces, would take polynomial time $O(d^4)$ in dimension 3, and would recover Reid's list of 95. When moving to dimension 4, it becomes $O(d^5)$ and is no longer viable; the search space is too large, there are significantly more resulting cases, reward points have high degrees, and the complexity of determining terminality increases.

Figure 4. Exhaustive search of Fano 4-fold hypersurfaces with terminal singularities. In total 84,733 terminal examples were found: 7,346 quasismooth and 77,387 nonquasismooth. Each frame shows points in $\mathbb{Z}^2$, obtained by projecting the original $\mathbb{Z}^6$ search space onto consecutive coordinate pairs via $(a_1, \ldots, a_6) \mapsto (a_i, a_{i+1})$.

When running the same exhaustive algorithm up to degree $d = 200$ for Fano 4-fold hypersurfaces with terminal singularities, we found 77,387 new nonquasismooth examples, as illustrated in Figure 4. However, the search was unable to progress beyond this degree due to the polynomial increase in complexity at higher degrees. This computational bottleneck is precisely what both the fixed and dynamic heuristic algorithms of §2 are designed to overcome, by guiding the search rather than exhaustively exploring the space.

Figure 5. Classification of 11,617 quasismooth Fano 4-fold hypersurfaces with terminal singularities. Each frame shows points in $\mathbb{Z}^2$, obtained by projecting the original $\mathbb{Z}^6$ search space onto consecutive coordinate pairs via $(a_1, \ldots, a_6) \mapsto (a_i, a_{i+1})$.

Brown and Kasprzyk [3] proved, however, that if one restricts to the far simpler subclass of quasismooth varieties, a complete classification in dimension 4 can be achieved. They found 11,617 families of quasismooth Fano 4-fold hypersurfaces; the list is on the Graded Ring Database [5]. Not only does quasismoothness make determining terminality easy and quick, using a cheap criterion, but it also provides a series of strong bounding conditions. This permits a terminating tree search algorithm that can be run in parallel, overcoming both the absence of a termination condition and the increase in complexity. Their classification motivates the assumption that nonquasismooth terminal points should exhibit the same clustering behaviour as the quasismooth examples, observable in Figure 5. This is further justified by the result in §3.2, which shows that the criterion for determining terminality in the general setting degenerates to the criterion in the quasismooth case.

Figure 6. The cumulative number of terminal Fano 4-fold hypersurfaces found in the exhaustive search against degree.

In dimension 3, quasismoothness is well known to be an acceptable 'generality' assumption, which rules out few, if any, families. However, in dimension 4 the quasismooth assumption is far too strong: quasismooth Fano 4-folds make up only a small fraction of all Fano 4-fold hypersurfaces. Figure 6 depicts the cumulative number of quasismooth Fano 4-fold hypersurfaces against nonquasismooth ones per hypersurface degree, illustrating the compelling reason why we must study the general case.

3.2. Background.
To ground the general construction, we first illustrate it with a classical example: elliptic curves. The family of all elliptic curves is given by $X_6 \subset \mathbb{P}(1, 2, 3)$. The ambient weighted projective space is $\mathbb{P}(1, 2, 3) = (\mathbb{C}^3 \setminus \{0\}) / \mathbb{C}^*$, where $\lambda \in \mathbb{C}^*$ acts on $\mathbb{C}^3 \setminus \{0\}$ with coordinates $(x, y, z)$ via $\lambda \cdot (x, y, z) = (\lambda x, \lambda^2 y, \lambda^3 z)$. A curve in the family $X_6 : (f_6 = 0)$ is the set of solutions of a homogeneous polynomial $f_6$ of degree 6, which must be of the form
$$f_6 = c_1 z^2 + c_2 y^3 + c_3 x^6 + c_4 x^4 y + c_5 x^2 y^2 + c_6 x^3 z + c_7 xyz$$
for some $c_1, \ldots, c_7 \in \mathbb{C}$, noting that $x$ has weight 1, $y$ has weight 2, and $z$ has weight 3, so that each term does indeed have weight 6. Therefore, the family $X_6$ given by all possible equations $f_6$ is parametrised by its coefficients $[c_1 : \cdots : c_7] \in \mathbb{P}^6$.

Extending the same construction to any weight $d$ and dimension $n$, we can define families of $n$-dimensional hypersurfaces $X_d : (f_d = 0) \subset \mathbb{P}(a_1, \ldots, a_{n+2})$ for weights $1 \leq a_1 \leq \ldots \leq a_{n+2}$. As with the elliptic curve example, the family is parametrised by $\mathbb{P}^{N-1}$, where $N$ is the number of monomials of degree $d$ in weights $a_1, \ldots, a_{n+2}$. We assume $X_d$ is well-formed [11, §6.10], in which case the adjunction number is defined as
$$\alpha = \sum_{i=1}^{n+2} a_i - d,$$
and $X_d$ is Fano if $\alpha > 0$, Calabi-Yau if $\alpha = 0$, and general type if $\alpha < 0$. For example, the elliptic curves $X_6 \subset \mathbb{P}(1, 2, 3)$ have $\alpha = (1 + 2 + 3) - 6 = 0$, and they are Calabi-Yau varieties. In the Fano case, we refer to the adjunction number as the Fano index $i_X = \alpha$. In this paper, we consider the main case of terminal Fano 4-fold hypersurfaces of Fano index $i_X = 1$. By fixing the Fano index, we have $d = \sum a_i - 1$, and so may encode the data as an integer vector $(a_1, \ldots, a_6) \in \mathbb{Z}^6$ bounded by $1 \leq a_1 \leq \ldots \leq a_6$.

Figure 7. (A) The real locus of the affine cone of the member of $X_6 \subset \mathbb{P}(1, 2, 3)$ whose coefficients are all equal to 1, which is quasismooth and hence admits only quotient singularities. (B) The real locus of the affine cone of the member of $X_{10} \subset \mathbb{P}(1, 3, 4)$ whose coefficients are all equal to 1, which is nonquasismooth; the hyperquotient singularity is visible as the line passing through the origin.

Next we come to the analysis of terminal singularities. On a hypersurface $X$, singularities can occur for two distinct reasons: either the derivative of the equation $f$ drops rank at a point $P \in X$, that is, all derivatives vanish at $P$, or the $\mathbb{C}^*$ quotient defining the ambient space has a nontrivial stabiliser at $P$. The latter case makes $P \in X$ a quotient singularity, and in this case we may use a computationally cheap criterion to determine terminality. By definition, quasismooth varieties have only such quotient singularities. Figure 7(A) depicts such an example. In the former case, when the equation itself has a singularity, we refer to $P \in X$ as a hypersurface singularity. Our main concern, however, is when both occur: $P \in X$ is an equation singularity and the $\mathbb{C}^*$ quotient has a nontrivial stabiliser at $P$. We then say $P \in X$ is a hyperquotient singularity, and think of it as composed of a hypersurface equation singularity sitting inside the ambient quotient space singularity. Such a singularity appears in Figure 7(B) as the line passing through the origin.

We will search for general members of $X_d \subset \mathbb{P}(a_1, \ldots, a_{n+2})$. Assuming generality means we study hypersurfaces corresponding to a dense open subset of the parameter space. This ensures that all members of the family share the same singularity structure, so we can compute a definitive list of singular points and analyse their terminality uniformly.
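The Fano/Calabi-Yau/general type trichotomy via the adjunction number is a one-line computation; a minimal Python sketch (function and label names are ours, not from the released code [21]):

```python
def adjunction_number(weights, d):
    """alpha = (sum of weights) - degree, for a well-formed X_d in P(weights)."""
    return sum(weights) - d

def classify(weights, d):
    a = adjunction_number(weights, d)
    if a > 0:
        return "Fano"          # Fano index i_X = alpha
    if a == 0:
        return "Calabi-Yau"
    return "general type"

# Elliptic curves X_6 in P(1,2,3): alpha = (1+2+3) - 6 = 0.
assert classify((1, 2, 3), 6) == "Calabi-Yau"
# An index-1 Fano 4-fold from this paper: d = (sum a_i) - 1, so alpha = 1.
assert classify((1, 15, 32, 139, 340, 494), 1020) == "Fano"
```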
If $P$ is a quotient singularity [17, §4], then it is a singularity of type $\frac{1}{r}(b_1, \ldots, b_n)$ for some $r \geq 1$ and $b_i \geq 0$ such that $b_i \leq r - 1$. The Reid–Shepherd-Barron–Tai criterion [16, §3.1][20, §3.2] says that $P$ is terminal if and only if
$$\frac{1}{r} \sum_{i=1}^{n} \overline{k b_i} - 1 > 0 \quad \text{for all } 1 \leq k \leq r - 1,$$
where $\overline{k b_i} \in \{0, \ldots, r-1\}$ denotes the residue of $k b_i$ modulo $r$.

If $P$ is a hyperquotient singularity [17, §4], then it is a singularity of type $\frac{1}{r}(b_1, \ldots, b_{n+1}; e)$ for some $r \geq 1$ and $b_i \geq 0$ such that $b_i, e \leq r - 1$. We approximate terminality by applying Mori's criterion [14] restricted to the lattice points inside the unit cube. That is, we approximate $P$ to be terminal if either $r = 1$, in which case it is a hypersurface singularity, or $r \geq 2$ and
$$\frac{1}{r} \sum_{i=1}^{n+1} \overline{k b_i} - \min\left\{ \frac{1}{r} \sum_{i=1}^{n+1} m_i \cdot \overline{k b_i} \;\middle|\; x_1^{m_1} \cdots x_{n+1}^{m_{n+1}} \in f' \right\} - 1 > 0 \quad \text{for all } 1 \leq k \leq r - 1,$$
where $f'$ is the local equation of $f$ on an affine patch that contains $P$. Notably, when $X_d$ is quasismooth, we have $f' = x_i + \cdots$ for some $1 \leq i \leq n + 1$, and therefore Mori's criterion degenerates to the Reid–Shepherd-Barron–Tai criterion.

3.3. Analysis.

We apply both the fixed and dynamic heuristic algorithms of §2 to discover new Fano 4-fold hypersurfaces with terminal singularities of Fano index 1. The hypersurfaces are of the form $X_d \subset \mathbb{P}(a_1, \ldots, a_6)$, where $d = (\sum a_i) - 1$ and $1 \leq a_1 \leq \ldots \leq a_6$, and are encoded in our search as integer vectors $(a_1, \ldots, a_6) \in \mathbb{Z}^6$. Our goal is twofold: to identify as many new examples as possible and to uncover hard-to-reach ones. We show that the fixed heuristic search is particularly successful at the former, whilst the dynamic heuristic search achieves both.
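The Reid–Shepherd-Barron–Tai criterion of §3.2 is cheap to evaluate directly. A minimal Python sketch (our own illustrative code, not taken from the released implementation [21]):

```python
def is_terminal_quotient(r, b):
    """Reid-Shepherd-Barron-Tai criterion for a quotient singularity of
    type (1/r)(b_1, ..., b_n): terminal if and only if
    (1/r) * sum_i (k * b_i mod r) > 1 for every k = 1, ..., r - 1."""
    return all(sum(k * bi % r for bi in b) > r for k in range(1, r))

# 1/2(1,1,1,1) is terminal: the single sum is 4/2 = 2 > 1.
assert is_terminal_quotient(2, (1, 1, 1, 1))
# 1/2(1,1,0,0) is not: 2/2 = 1 fails the strict inequality.
assert not is_terminal_quotient(2, (1, 1, 0, 0))
# r = 1 is the smooth case; the condition is vacuous.
assert is_terminal_quotient(1, (0, 0, 0, 0))
```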
To overcome the high-degree obstruction faced by the exhaustive search, as discussed in §3.1, we begin both the fixed and dynamic searches from the quasismooth terminal classification, which comprises 11,617 cases. In practice, this means we force the first 11,617 searched points in both algorithms to be the terminal quasismooth ones, and progress normally from then on. We run both the fixed and dynamic heuristic algorithms for 10,000,000 steps. In the dynamic search, we use the hyperparameters in Table 1. Let $F$ and $D$ be the sets of terminal points found by the fixed and dynamic searches, respectively. The fixed and dynamic searches are depicted in Figures 8 and 9 respectively.

Table 1. Hyperparameters used in the dynamic heuristic (deep reinforcement learning) search.

    Hyperparameter                       Value
    MLP neural network layers            (40,)
    Activation function                  LeakyReLU
    LeakyReLU slope                      0.01
    Optimiser                            Adam
    Optimiser learning rate              0.001
    TD discount factor, $\gamma$         0.2
    Standard deviation, $\sigma$         2
    Search reward, $r_{\mathrm{reward}}$ 1

In the fixed heuristic search, we find $|F| = 113{,}996$ nonquasismooth Fano 4-fold hypersurfaces with terminal singularities. The algorithm is deterministic, so it will discover the same examples on a rerun. It is particularly effective at finding a large quantity of new examples. It does, however, have limitations. As shown by the histogram in Figure 10(A), the fixed search is unable to stray far from previously known reward points.

The dynamic search found $|D| = 85{,}262$. Since the search is nondeterministic, each run will find a different set $D$. Due to the more exploratory nature of the dynamic search, one expects fewer examples than the fixed one in the same step count, as a greater number of steps are spent in unprofitable regions during exploration. This is seen in Figure 10(C). The histogram in Figure 10(B) shows the upshot of this, however.
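As a concrete reference for Table 1, the value network and a single TD update of §2.3 can be sketched in a few lines of NumPy. This is a hypothetical minimal implementation under the stated hyperparameters, with plain gradient descent standing in for Adam; the released code [21] is the authoritative version.

```python
import numpy as np

rng = np.random.default_rng(0)

# f_theta: Z^6 -> R, one hidden layer of 40 LeakyReLU units (Table 1).
W1 = rng.normal(0.0, 0.1, (40, 6)); b1 = np.zeros(40)
W2 = rng.normal(0.0, 0.1, (1, 40)); b2 = np.zeros(1)
theta = (W1, b1, W2, b2)

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def f(theta, x):
    W1, b1, W2, b2 = theta
    return float(W2 @ leaky_relu(W1 @ x + b1) + b2)

def td_step(theta, theta_frozen, p, nbrs, rewards, gamma=0.2, lr=0.001):
    """One TD update: targets t(n) = r(n) + gamma * f_{theta'}(n) use the
    frozen copy theta'; theta descends the gradient of
    L = 1/(2|N|) * sum_n (f_theta(p) - t(n))^2."""
    W1, b1, W2, b2 = theta
    targets = [r + gamma * f(theta_frozen, n) for n, r in zip(nbrs, rewards)]
    z = W1 @ p + b1
    h = leaky_relu(z)
    v = float(W2 @ h + b2)
    delta = sum(v - t for t in targets) / len(targets)  # dL/dv, the mean TD error
    dz = (delta * W2[0]) * np.where(z > 0, 1.0, 0.01)   # backprop through the hidden layer
    return (W1 - lr * np.outer(dz, p), b1 - lr * dz,
            W2 - lr * delta * h[None, :], b2 - lr * np.array([delta]))

# One step from a toy state: pretend the first of the 12 neighbours is a
# reward point (r_reward = 1) and one step has passed since the last reward.
p = np.array([1.0, 1.0, 2.0, 3.0, 5.0, 8.0])
nbrs = [p + e for e in np.eye(6)] + [p - e for e in np.eye(6)]
rewards = [1.0] + [-1.0] * 11
new_theta = td_step(theta, theta, p, nbrs, rewards)
# Queue priority of a neighbour: v(n) = f_theta(n) + eps, sigma = 2 (Table 1).
priority = f(new_theta, nbrs[0]) + rng.normal(0.0, 2.0)
```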
Figure 10(B) shows hundreds of examples found by the dynamic search that are computationally inaccessible to the fixed one. To measure the inaccessibility of points found exclusively by the dynamic search, we analyse two sets: $F \setminus D$, the 31,480 points found by the fixed but not the dynamic search, and $D \setminus F$, the 3,106 points found by the dynamic but not the fixed search. For each point in $F \setminus D$, we compute the shortest distance under the $L^1$ norm to the nearest point in $D$. For each point in $D \setminus F$, we compute the shortest distance to the nearest point in $F$. From these distances we derive a lower bound on the number of steps required to reach a point from its nearest neighbour. This allows us to show that hundreds of points found by the dynamic search would be computationally expensive to find using the fixed search alone. Combined with the fact that the search was initialised from 11,617 starting points, this demonstrates that such points are effectively computationally inaccessible to the fixed search.

Figure 8. Fixed heuristic search. The search found 113,996 nonquasismooth Fano 4-fold hypersurfaces with terminal singularities. Each frame shows points in $\mathbb{Z}^2$, obtained by projecting the original $\mathbb{Z}^6$ search space onto consecutive coordinate pairs via $(a_1, \ldots, a_6) \mapsto (a_i, a_{i+1})$.

Figure 9. Dynamic heuristic (deep reinforcement learning) search. The search found 85,262 nonquasismooth Fano 4-fold hypersurfaces with terminal singularities. Each frame shows points in $\mathbb{Z}^2$, obtained by projecting the original $\mathbb{Z}^6$ search space onto consecutive coordinate pairs via $(a_1, \ldots, a_6) \mapsto (a_i, a_{i+1})$.

Explicitly, for a point $p \in F \setminus D$ (resp. $D \setminus F$), we define the shortest distance
$$D(p) := \min \{ \lVert p - q \rVert_1 \mid q \in D \text{ (resp. } F\text{)} \}.$$
Let $q$ denote the nearest neighbour of $p$, so that $\lVert p - q \rVert_1 = D(p)$.
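Computing $D(p)$ is a plain nearest-neighbour scan under the $L^1$ metric. A minimal sketch (the function name is ours; at the set sizes in this section a vectorised or tree-based implementation would be preferable):

```python
def l1_nearest_distance(p, points):
    """D(p): the shortest L1 distance from p to any point of `points`."""
    return min(sum(abs(a - b) for a, b in zip(p, q)) for q in points)

# The example discussed below: the weights of X_1020, found only by the
# dynamic search, whose nearest fixed-search point is the quasismooth
# start point giving X_1011.
p = (1, 15, 32, 139, 340, 494)
found = [(1, 10, 31, 143, 337, 490), (1, 1, 1, 1, 1, 1)]
assert l1_nearest_distance(p, found) == 17
```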
We now derive lower and upper bounds on the number of steps required to find $p$ starting from $q$. We first assume that $p$ is the closest point to $q$ in the relevant set. Relaxing this assumption leaves the lower bound valid (it only weakens it), but invalidates the upper bound, since the search may be steered away from $p$ by a closer point. Alternatively, replacing the priority function in the fixed search with the constant function $v(n) := 1$ ensures both bounds remain valid.

The fixed search exhaustively expands points in order of increasing $L^1$ distance from $q$, visiting all points at distance 1, then 2, and so on. Consequently, to reach a point at distance $D(p)$, the search must first visit at least one point at distance $D(p) - 1$, and must have already visited all points at distance $\leq D(p) - 2$. This gives a lower bound on the number of steps for $p$ to be found from $q$,
$$s_L(p) = \#\left\{ (a_1, \ldots, a_6) \in B(q, D(p) - 2) \cap \mathbb{Z}^6 \;\middle|\; a_1 \geq 1,\ a_i \leq a_{i+1} \text{ for } 1 \leq i \leq 5 \right\} + 1.$$
Assuming either that $p$ is the closest point to $q$, or using $v(n) := 1$ as the priority value, we must have found $p$ after searching all points of distance $D(p) - 1$, and so obtain an upper bound
$$s_U(p) = \#\left\{ (a_1, \ldots, a_6) \in B(q, D(p) - 1) \cap \mathbb{Z}^6 \;\middle|\; a_1 \geq 1,\ a_i \leq a_{i+1} \text{ for } 1 \leq i \leq 5 \right\}.$$
Moreover, the likelihood that the lower bound becomes weaker grows with $D(p)$. We establish both bounds under the assumption that $p$ is the closest point to $q$; under this assumption, $p$ is guaranteed to be found within $s_U(p)$ steps, which, as noted above, would be even larger without this assumption. The probability of finding $p$ in $s_L(p) \leq s \leq s_U(p)$ steps is then given by
$$P(s) = \frac{s - (s_L(p) - 1)}{s_U(p) - (s_L(p) - 1)},$$
as $p$ must be found from a point of distance $D(p) - 1$, of which there are $s_U(p) - (s_L(p) - 1)$.
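The approximate count used below, lattice points of an $L^1$ ball in $\mathbb{Z}^6$ subject only to $a_1 \geq 0$, can be reproduced in a few lines. This is our own recursive implementation (names are ours), cross-checked by brute force at small radius; it reproduces the quoted value for $D(p) = 5$.

```python
import itertools
from functools import lru_cache

@lru_cache(maxsize=None)
def ball_count(dim, radius):
    """Number of lattice points of Z^dim with L1 norm <= radius."""
    if radius < 0:
        return 0
    if dim == 0:
        return 1
    return sum(ball_count(dim - 1, radius - abs(t))
               for t in range(-radius, radius + 1))

def s_lower_approx(dist):
    """Approximate s_L(p): points of the L1 ball of radius D(p) - 2
    in Z^6 satisfying a_1 >= 0, plus one."""
    r = dist - 2
    return sum(ball_count(5, r - a1) for a1 in range(r + 1)) + 1

# Cross-check the recursion against brute force at radius 3 (D(p) = 5),
# reproducing the value 305 quoted in the text.
brute = sum(1 for pt in itertools.product(range(-3, 4), repeat=6)
            if sum(map(abs, pt)) <= 3 and pt[0] >= 0)
assert s_lower_approx(5) == brute + 1 == 305
```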
Therefore, the probability of the lower bound being achieved is $1 / (s_U(p) - (s_L(p) - 1))$, a quantity that shrinks as $D(p)$ grows. Almost all examples have weight $a_1 = 1$, weakening the lower bound. The reduction of the bound caused by cases where $a_i - D(p) < 1$ for $i \geq 2$, and $a_{i+1} - a_i - D(p) < 0$ for $i \geq 1$, was in practice found to be negligible. Overlooking this allows us to give an approximation for the lower bound. Assuming $a_1 = 1$, $a_i - D(p) \geq 1$, and $a_{i+1} - a_i - D(p) > 0$,
$$s_L(p) = \#\left\{ (a_1, \ldots, a_6) \in B(0, D(p) - 2) \cap \mathbb{Z}^6 \;\middle|\; a_1 \geq 0 \right\} + 1.$$
Using this, when $D(p) = 5$ we obtain $s_L(p) = 305$, whereas for $D(p) = 15$ we get $227{,}305$, for $D(p) = 16$ it is $528{,}865$, and for $D(p) = 17$ it is $774{,}912$. To put the expense of these large-$D(p)$ points into perspective, one should note that executing the full 10,000,000-step fixed search was itself costly.

Figure 10. (A) A histogram showing the distribution of distances from points found in the fixed but not the dynamic search to their closest neighbours. (B) A histogram showing the distribution of distances from points found in the dynamic but not the fixed search to their closest neighbours. The dynamic search finds points at greater distances than the fixed search, corresponding to points that are increasingly inaccessible to the latter. (C) A graph showing the number of nonquasismooth terminal examples found against the number of steps taken by the search. As expected, the dynamic search finds fewer terminal points in the same number of steps as the fixed one, since more steps are spent in unprofitable territory in order to reach terminal points at greater distances.

Consider $X_{1020} \subset \mathbb{P}(1, 15, 32, 139, 340, 494)$, a point found in the dynamic but not the fixed search. Its closest point is $X_{1011} \subset \mathbb{P}(1, 10, 31, 143, 337, 490)$, a quasismooth start point, at distance 17.
Our approximation predicts at least 774,912 steps are required to reach it. By setting $v(n) := 1$ in the fixed search we located it in 1,041,501 steps. However, using the original priority value, the search is directed away from $X_{1020}$, and even after 10,000,000 steps starting from $X_{1011}$ alone, it remains out of reach.

As Figure 10(B) illustrates, among all points found by the dynamic search but not the fixed one, hundreds lie far from any other point, and are therefore beyond the computational reach of the fixed search.

Code and Data Availability

All code required to replicate the results is available on GitHub [21] under an MIT license, along with all datasets.

Acknowledgements

I am grateful to Gavin Brown, Alexander Kasprzyk, Hefin Lambley and Martin Lotz for valuable feedback during the writing of this paper. The author was supported by the Warwick Mathematics Institute Centre for Doctoral Training, and gratefully acknowledges funding from the UK Engineering and Physical Sciences Research Council (grant number EP/W523793/1).

References

[1] P. Berglund, Y. He, E. Heyes, E. Hirst, V. Jejjala, and A. Lukas. New Calabi-Yau manifolds from genetic algorithms. Phys. Lett. B, 850:Paper No. 138504, 2024.
[2] C. Birkar. Singularities of linear systems and boundedness of Fano varieties. Ann. of Math. (2), 193(2):347–405, 2021.
[3] G. Brown and A. Kasprzyk. Four-dimensional projective orbifold hypersurfaces. Exp. Math., 25(2):176–193, 2016.
[4] G. Brown, A. Kasprzyk, and M. Reid. Kawamata bounds for Fano threefolds and the Graded Ring Database. 23 pp., 2022.
[5] G. Brown and A. M. Kasprzyk. The Graded Ring Database. https://grdb.co.uk, 2007–present.
[6] J. Chen, J. A. Chen, and M. Chen. On quasismooth weighted complete intersections. J. Algebraic Geom., 20(2):239–262, 2011.
[7] T. Coates, A. M. Kasprzyk, and S. Veneziale. Machine learning detects terminal singularities. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 67183–67194. Curran Associates, Inc., 2023.
[8] T. Coates, A. M. Kasprzyk, and S. Veneziale. Machine learning the dimension of a Fano variety. Nature Communications, 14(1):5526, 2023.
[9] A. Davies, P. Veličković, L. Buesing, S. Blackwell, D. Zheng, N. Tomašev, R. Tanburn, P. Battaglia, C. Blundell, A. Juhász, M. Lackenby, G. Williamson, D. Hassabis, and P. Kohli. Advancing mathematics by guiding human intuition with AI. Nature, 600(7887):70–74, 2021.
[10] Y. He. AI-driven research in pure mathematics and theoretical physics. Nature Reviews Physics, 6(9):546–553, 2024.
[11] A. R. Iano-Fletcher. Working with weighted complete intersections. In Explicit birational geometry of 3-folds, volume 281 of London Math. Soc. Lecture Note Ser., pages 101–173. Cambridge Univ. Press, Cambridge, 2000.
[12] V. A. Iskovskih. Fano threefolds. I. Izv. Akad. Nauk SSSR Ser. Mat., (3):516–562, 717, 1977.
[13] V. A. Iskovskih. Fano threefolds. II. Izv. Akad. Nauk SSSR Ser. Mat., (3):506–549, 1978.
[14] S. Mori. On 3-dimensional terminal singularities. Nagoya Math. J., 98:43–66, 1985.
[15] S. Mori and S. Mukai. On Fano 3-folds with $B_2 \geq 2$. In Algebraic varieties and analytic varieties (Tokyo, 1981), volume 1 of Adv. Stud. Pure Math., pages 101–129. North-Holland, Amsterdam, 1983.
[16] M. Reid. Canonical 3-folds. In Journées de Géométrie Algébrique d'Angers, Juillet 1979/Algebraic Geometry, Angers, 1979, pages 273–310. Sijthoff & Noordhoff, Alphen aan den Rijn–Germantown, Md., 1980.
[17] M. Reid. Young person's guide to canonical singularities. In Algebraic geometry, Bowdoin, 1985 (Brunswick, Maine, 1985), volume 46, Part 1 of Proc. Sympos. Pure Math., pages 345–414. Amer. Math. Soc., Providence, RI, 1987.
[18] M. Reid. Update on 3-folds. In Proceedings of the International Congress of Mathematicians, Vol. II (Beijing, 2002), pages 513–524. Higher Ed. Press, Beijing, 2002.
[19] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 2nd edition, 2018.
[20] Y. Tai. On the Kodaira dimension of the moduli space of abelian varieties. Invent. Math., 68(3):425–439, 1982.
[21] M. Truter. Deep reinforcement learning for Fano hypersurfaces: source code and datasets, 2026. https://github.com/marctruter/deep_fano_hypersurface.

Mathematics Institute, University of Warwick, Coventry, CV4 7AL, UK
Email address: Marc.Truter@warwick.ac.uk
