Deep Reinforcement Learning for Fano Hypersurfaces


Authors: Marc Truter

Abstract. We design a deep reinforcement learning algorithm to explore a high-dimensional integer lattice with sparse rewards, training a feedforward neural network as a dynamic search heuristic to steer exploration toward reward-dense regions. We apply this to the discovery of Fano 4-fold hypersurfaces with terminal singularities, objects of central importance in algebraic geometry. Fano varieties with terminal singularities are fundamental building blocks of algebraic varieties, and explicit examples serve as a vital testing ground for the development and generalisation of theory. Despite decades of effort, the combinatorial intractability of the underlying search space has left this classification severely incomplete. Our reinforcement learning approach yields thousands of previously unknown examples, hundreds of which we show are inaccessible to known search methods.

1. Introduction

We search a high-dimensional integer lattice directly inspired by the construction of Fano hypersurfaces, where each hypersurface is encoded as a lattice point. Our goal is to discover new Fano 4-fold hypersurfaces with terminal singularities, which correspond to the reward points in our search. The terminal condition gives rise to a reward landscape that is sparse and unknown a priori, yet spatially clustered, and it is this final attribute we will exploit. The search space is the 6-dimensional integer lattice $\mathbb{Z}^6$, a 2-dimensional projection of which is illustrated in Figure 1.

Figure 1. The 6-dimensional dynamic heuristic (deep reinforcement learning) search for terminal Fano 4-fold hypersurfaces projected onto 2 dimensions. While the quasismooth terminal points are fully classified, the search discovers previously unknown nonquasismooth terminal ones. See Figure 9 for full details.
Exhaustive search algorithms have proven effective in low-dimensional cases, as discussed in §3.1. In higher dimensions, however, the combinatorial explosion of the search space renders such methods infeasible for discovering examples with high degrees far from those already known. To overcome this, we introduce two algorithms. The first is a fixed heuristic search. The second is a dynamic heuristic search that builds upon the ideas of the first, in which we use a neural network as our heuristic and continuously update it via deep reinforcement learning. The fixed search is deterministic, whereas the dynamic search is nondeterministic due to a stochastic component that promotes exploration. The use of a compact neural network trained using temporal difference learning allows the dynamic heuristic to smooth over the high variance in the reward signal. The combination of this and the stochastic component allows regions of the search space to be reached that are computationally infeasible for the fixed heuristic to access. In our experiments, we found hundreds of examples lying in such regions.

This paper contributes to a growing body of work applying data science and machine learning to purely mathematical data, with early applications in algebraic geometry [1, 7, 8], subsequently expanding to other areas of mathematics [9, 10].

2. Integer Lattice Search

2.1. Setup.

We begin by describing the setup of our search in the integer lattice $\mathbb{Z}^n$. The relation to searching for Fano 4-fold hypersurfaces with terminal singularities is explained in §3.

Environment: an $n$-dimensional integer lattice, $\mathbb{Z}^n$, with a subset of points that we want to discover, which we will refer to as reward points.

Challenging properties:
(1) Sparse: reward points occupy a negligibly small fraction of the total search space.
(2) Unknown a priori: the reward status of a point cannot be determined without direct evaluation.

Exploitable properties:
(1) Spatially clustered: reward points exhibit spatial locality, such that the presence of a reward point increases the likelihood of neighbouring points also being reward points.

Goal: to find both many and hard-to-reach reward points.

All attributes other than clustering pose challenges for constructing a search algorithm. Based on the clustering, we use previously found reward points in the search to inform where to search next. In both of the following algorithms, we construct heuristics that prioritise searching near denser regions of rewards.

2.2. Fixed Heuristic.

The algorithm begins with a start point. We proceed by searching its neighbouring points and determining whether they are reward points. We add the neighbouring points to a search queue and assign them a priority value dependent on their proximity to previously found reward points. The function that computes this priority value is fixed, making this a fixed heuristic algorithm. The algorithm resets by restarting the process with a point in the queue with the highest priority.

Figure 2. Flow chart of the fixed heuristic search algorithm.

The algorithm depicted in Figure 2 is performed as follows.
(1) (a) Pick a start point $p \in \mathbb{Z}^n$, and set $v(p) = 1$, where $v$ is the priority function defined in (3).
    (b) Set the step count $s = 0$ and fix the maximum step count $s_{\max} \in \mathbb{N}$.
    (c) Initialise the search queue $Q$ as an empty heap.
(2) Increment the step count $s$ by 1. Identify all neighbouring points $N$ of $p$, defined as the set of points exactly distance 1 away under the $L^1$ norm,
$$N := \{ n \in \mathbb{Z}^n \mid \lVert n - p \rVert_1 = 1 \},$$
in other words, all points differing by $\pm 1$ from $p$ in one coordinate.
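The neighbour set in step (2) is straightforward to enumerate. A minimal Python sketch (the function name is ours, not taken from the released code [21]):

```python
def neighbours(p):
    """All points of Z^n at L1 distance exactly 1 from p: one
    coordinate shifted by +1 or -1, the rest unchanged."""
    out = []
    for i in range(len(p)):
        for delta in (1, -1):
            q = list(p)
            q[i] += delta
            out.append(tuple(q))
    return out

# A point of Z^6 has 12 such neighbours.
assert len(neighbours((1, 2, 3, 4, 5, 6))) == 12
```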
(3) For each $n \in N$ determine its priority value
$$v(n) := \begin{cases} 1, & \text{if } n \text{ is a reward point}, \\ \tfrac{1}{2} v(p), & \text{otherwise}, \end{cases}$$
and add $(n, v(n))$ to the search queue $Q$ if $n$ has never been added before.
(4) Determine a point $p'$ such that $(p', v(p')) \in Q$ has the largest value $v(p')$ in the heap. That is, take the first point of the heap ordered by priority values $v$. Set $p = p'$ and remove $(p', v(p'))$ from $Q$. Return to (2) if $s < s_{\max}$; otherwise terminate the algorithm.

We observe in §3.3 that when the algorithm is applied to finding terminal Fano hypersurfaces, it is effective at finding many new examples in reward-dense regions. We build on the ideas of the fixed heuristic search to design a dynamic heuristic search in §2.3 that can find reward points in lower-density areas.

2.3. Dynamic Heuristic (Deep Reinforcement Learning).

The algorithm begins with a chosen start point. We compute its neighbours and, for each, determine the priority values assigned to them by a neural network function. We add these to a search queue ordered by priority values. Next, we assign rewards dependent on whether the neighbours added were reward points or not, and use these to update the neural network using temporal difference learning. The process is then repeated by searching a point with the highest priority in the search queue.

Figure 3. Flow chart of the dynamic heuristic search algorithm.

The algorithm depicted in Figure 3 is performed as follows.
(1) (a) Pick a start point $p \in \mathbb{Z}^n$.
    (b) Set the step count $s = 0$, and fix the maximum step count $s_{\max} \in \mathbb{N}$.
    (c) Initialise a search queue $Q$ as an empty heap.
    (d) Create an MLP neural network $f_\theta : \mathbb{Z}^n \to \mathbb{R}$ with initial parameters $\theta$; this will be our dynamic heuristic that determines priority in the search queue $Q$.
Fix the temporal difference discount factor $\gamma \in (0, 1)$, which affects how we update $f_\theta$ via temporal difference learning.
    (e) Fix a standard deviation $\sigma \in \mathbb{R}_{\geq 0}$; this determines the stochastic component added to the priority value and thereby controls exploration.
    (f) Fix $r_{\mathrm{reward}} \in \mathbb{N}$, the value given for finding a reward point. Set $s_{\mathrm{reward}} = 0$, the number of steps since a reward was last found.
(2) Increment the step count $s$ and the steps since a reward $s_{\mathrm{reward}}$ by 1. Identify all neighbouring points $N$ of $p$, defined as the set of points exactly distance 1 away under the $L^1$ norm,
$$N := \{ n \in \mathbb{Z}^n \mid \lVert n - p \rVert_1 = 1 \},$$
in other words, all points differing by $\pm 1$ from $p$ in one coordinate.
(3) Determine whether any $n \in N$ are reward points, and if so, reset $s_{\mathrm{reward}} = 0$. Compute their reward values
$$r(n) = \begin{cases} r_{\mathrm{reward}}, & \text{if } n \text{ is a reward point}, \\ -\sqrt{s_{\mathrm{reward}}}, & \text{otherwise}. \end{cases}$$
Consider the set of tuples $(p, n, r(n))$ for each $n \in N$. This data is used to train the network via temporal difference (TD) learning [19, §6]. To improve training stability, we fix a copy of the current network parameters, denoting them $\theta'$, which remain frozen during this update step. For each $(p, n, r(n))$ with $n \in N$, we compute the TD target
$$t(n) = r(n) + \gamma f_{\theta'}(n),$$
where $\gamma \in (0, 1)$ is the discount factor controlling the trade-off between short- and long-term rewards. Values of $\gamma$ close to 0 produce greedy, short-term behaviour, whilst values close to 1 encourage more long-term behaviour. We then compute the TD error, measuring the discrepancy between the estimated value of $p$ and the TD target,
$$\delta(\theta, p, n) = f_\theta(p) - t(n).$$
Note that $\theta$ in $f_\theta(p)$ is updated during optimisation, whilst $t(n)$ is held fixed via $\theta'$. Minimising $\delta(\theta, p, n)$ constitutes bootstrapping: future value estimates are refined using past ones. Concretely, we minimise the normalised mean squared error (MSE) loss
$$L(\theta) = \frac{1}{2|N|} \sum_{n \in N} \delta(\theta, p, n)^2,$$
using a gradient-based optimiser such as Adam.
(4) For each $n \in N$, compute its priority value $v(n) = f_\theta(n) + \varepsilon$, where $\varepsilon$ is sampled from $\mathcal{N}(0, \sigma^2)$, a normal distribution with mean 0 and variance $\sigma^2$. The stochastic component, $\varepsilon$, improves exploration. Add $(n, v(n))$ to the search queue $Q$ if it has not been searched before.
(5) Let $p'$ be a point in the search queue $Q$ such that $v(p')$ is the largest value in the heap. That is, take the first point of the heap ordered by priority values $v$. Set the new search point $p = p'$, and return to (2) if $s < s_{\max}$; otherwise terminate the algorithm.

Since the dynamic heuristic search is nondeterministic, rerunning the algorithm can uncover new rewards within the same fixed number of steps. The search is also flexible in its objectives; the reward function can be modified to incentivise the discovery of points with specific properties, such as a high degree.

3. Fano 4-fold Hypersurfaces

3.1. Context.

Algebraic varieties, the geometric shapes defined by polynomial equations, are central objects in mathematics. Among them, hypersurfaces, defined by a single polynomial equation, are the most tractable. A fundamental goal is to classify varieties into basic building blocks [18, §2.2]: Fano, Calabi-Yau, and general type, with terminal singularities [17], a well-known class of mild singularities. Birkar [2] proved that in any fixed dimension only finitely many families of Fano varieties with terminal singularities exist, making a complete classification, in other words building a 'periodic table', a finite problem. In dimension 1 (curves) and dimension 2 (surfaces) the periodic tables are known. In dimension 3, many important elements are known [4, 12, 13, 15]. Very little, however, is known in dimension 4.
In dimension 3, Reid [11, §16.6 Table 5] produced a complete list of all 95 Fano 3-fold hypersurfaces with terminal singularities by a terminating algorithm [3, §2]. Iano-Fletcher [11, §16.7 Table 6] extended this to two equations, using a brute-force search to find 85 families, working exhaustively from the origin of a search space of vectors $(a_1, \ldots, a_6)$ of integers $1 \leq a_1 \leq \ldots \leq a_6$, up to a fixed, arbitrary limit of the degree $d = (\sum a_i) - 1 = 100$, where results seemed to have dried up. It was only much later that Chen, Chen and Chen [6] proved that Iano-Fletcher's list is indeed complete. Such a search, run on hypersurfaces, would take polynomial time $O(d^4)$ in dimension 3, and would recover Reid's list of 95. When moving to dimension 4, it becomes $O(d^5)$ and is no longer viable; the search space is too large, there are significantly more resulting cases, reward points have high degrees, and the complexity of determining terminality increases.

Figure 4. Exhaustive search of Fano 4-fold hypersurfaces with terminal singularities. In total 84,733 terminal examples were found: 7,346 quasismooth and 77,387 nonquasismooth. Each frame shows points in $\mathbb{Z}^2$, obtained by projecting the original $\mathbb{Z}^6$ search space onto consecutive coordinate pairs via $(a_1, \ldots, a_6) \mapsto (a_i, a_{i+1})$.

When running the same exhaustive algorithm up to degree $d = 200$ for Fano 4-fold hypersurfaces with terminal singularities, we found 77,387 new nonquasismooth examples, as illustrated in Figure 4. However, the search was unable to progress beyond this degree due to the polynomial increase in complexity at higher degrees. This computational bottleneck is precisely what both the fixed and dynamic heuristic algorithms of §2 are designed to overcome, by guiding the search rather than exhaustively exploring the space.

Figure 5. Classification of 11,617 quasismooth Fano 4-fold hypersurfaces with terminal singularities. Each frame shows points in $\mathbb{Z}^2$, obtained by projecting the original $\mathbb{Z}^6$ search space onto consecutive coordinate pairs via $(a_1, \ldots, a_6) \mapsto (a_i, a_{i+1})$.

Brown and Kasprzyk [3] proved, however, that if one restricts to the far simpler subclass of quasismooth varieties, a complete classification in dimension 4 can be achieved. They found 11,617 families of quasismooth Fano 4-fold hypersurfaces; the list is on the Graded Ring Database [5]. Not only does quasismoothness make determining terminality easy and quick, using a cheap criterion, but it also provides a series of strong bounding conditions. This permits a terminating tree search algorithm that can be run in parallel, overcoming both the absence of a termination condition and the increase in complexity. Their classification motivates the assumption that nonquasismooth terminal points should exhibit the same clustering behaviour as the quasismooth examples, observable in Figure 5. This is further justified by the result in §3.2, which shows that the criterion for determining terminality in the general setting degenerates to the criterion in the quasismooth case.

Figure 6. The cumulative number of terminal Fano 4-fold hypersurfaces found in the exhaustive search against degree.

In dimension 3, quasismoothness is well known to be an acceptable 'generality' assumption, which rules out few, if any, families. However, in dimension 4 the quasismooth assumption is far too strong: quasismooth Fano 4-folds make up only a small fraction of all Fano 4-fold hypersurfaces. Figure 6 depicts the cumulative number of quasismooth Fano 4-fold hypersurfaces against nonquasismooth ones per hypersurface degree, illustrating the compelling reason why we must study the general case.

3.2. Background.
To ground the general construction, we first illustrate it with a classical example: elliptic curves. The family of all elliptic curves is given by $X_6 \subset \mathbb{P}(1, 2, 3)$. The ambient weighted projective space is $\mathbb{P}(1, 2, 3) = (\mathbb{C}^3 \setminus \{0\}) / \mathbb{C}^*$, where $\lambda \in \mathbb{C}^*$ acts on $\mathbb{C}^3 \setminus \{0\}$ with coordinates $(x, y, z)$ via $\lambda \cdot (x, y, z) = (\lambda x, \lambda^2 y, \lambda^3 z)$. A curve in the family $X_6 : (f_6 = 0)$ is the set of solutions of a homogeneous polynomial $f_6$ of degree 6, which must be of the form
$$f_6 = c_1 z^2 + c_2 y^3 + c_3 x^6 + c_4 x^4 y + c_5 x^2 y^2 + c_6 x^3 z + c_7 xyz$$
for some $c_1, \ldots, c_7 \in \mathbb{C}$, noting that $x$ has weight 1, $y$ has weight 2, and $z$ has weight 3, so that each term does indeed have weight 6. Therefore, the family $X_6$ given by all possible equations $f_6$ is parametrised by its coefficients $[c_1 : \cdots : c_7] \in \mathbb{P}^6$.

Extending the same construction to any weight $d$ and dimension $n$, we can define families of $n$-dimensional hypersurfaces $X_d : (f_d = 0) \subset \mathbb{P}(a_1, \ldots, a_{n+2})$ for weights $1 \leq a_1 \leq \ldots \leq a_{n+2}$. As with the elliptic curve example, the family is parametrised by $\mathbb{P}^{N-1}$, where $N$ is the number of monomials of degree $d$ in weights $a_1, \ldots, a_{n+2}$. We assume $X_d$ is well-formed [11, §6.10], in which case the adjunction number is defined as
$$\alpha = \sum_{i=1}^{n+2} a_i - d,$$
and $X_d$ is Fano if $\alpha > 0$, Calabi-Yau if $\alpha = 0$, and general type if $\alpha < 0$. For example, the elliptic curves $X_6 \subset \mathbb{P}(1, 2, 3)$ have $\alpha = (1 + 2 + 3) - 6 = 0$, and they are Calabi-Yau varieties. In the Fano case, we refer to the adjunction number as the Fano index $i_X = \alpha$. In this paper, we consider the main case of terminal Fano 4-fold hypersurfaces of Fano index $i_X = 1$. By fixing the Fano index, we have $d = \sum a_i - 1$, and so may encode the data as an integer vector $(a_1, \ldots, a_6) \in \mathbb{Z}^6$ bounded by $1 \leq a_1 \leq \ldots \leq a_6$.

Figure 7. (A) The real locus of the affine cone of the member of $X_6 \subset \mathbb{P}(1, 2, 3)$ whose coefficients are all equal to 1, which is quasismooth and hence admits only quotient singularities. (B) The real locus of the affine cone of the member of $X_{10} \subset \mathbb{P}(1, 3, 4)$ whose coefficients are all equal to 1, which is nonquasismooth; the hyperquotient singularity is visible as the line passing through the origin.

Next we come to the analysis of terminal singularities. On a hypersurface $X$, singularities can occur for two distinct reasons: either the derivative of the equation $f$ drops rank at a point $P \in X$, that is, all derivatives vanish at $P$, or the $\mathbb{C}^*$ quotient defining the ambient space has a nontrivial stabiliser at $P$. The latter case makes $P \in X$ a quotient singularity, and in this case we may use a computationally cheap criterion to determine terminality. By definition, quasismooth varieties have only such quotient singularities. Figure 7(A) depicts such an example. In the former case, when the equation itself has a singularity, we refer to $P \in X$ as a hypersurface singularity. Our main concern, however, is when both occur: $P \in X$ is an equation singularity and the $\mathbb{C}^*$ quotient has a nontrivial stabiliser at $P$. We then say $P \in X$ is a hyperquotient singularity, and think of it as composed of a hypersurface equation singularity sitting inside the ambient quotient space singularity. Such a singularity appears in Figure 7(B) as the line passing through the origin.

We will search for general members of $X_d \subset \mathbb{P}(a_1, \ldots, a_{n+2})$. Assuming generality means we study hypersurfaces corresponding to a dense open subset of the parameter space. This ensures that all members of the family share the same singularity structure, so we can compute a definitive list of singular points and analyse their terminality uniformly.
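The Fano/Calabi-Yau/general type trichotomy via the adjunction number is a one-line computation; a minimal Python sketch (function and label names are ours, not from the released code [21]):

```python
def adjunction_number(weights, d):
    """alpha = (sum of weights) - degree, for a well-formed X_d in P(weights)."""
    return sum(weights) - d

def classify(weights, d):
    a = adjunction_number(weights, d)
    if a > 0:
        return "Fano"          # Fano index i_X = alpha
    if a == 0:
        return "Calabi-Yau"
    return "general type"

# Elliptic curves X_6 in P(1,2,3): alpha = (1+2+3) - 6 = 0.
assert classify((1, 2, 3), 6) == "Calabi-Yau"
# An index-1 Fano 4-fold from this paper: d = (sum a_i) - 1, so alpha = 1.
assert classify((1, 15, 32, 139, 340, 494), 1020) == "Fano"
```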
If $P$ is a quotient singularity [17, §4], then it is a singularity of type $\frac{1}{r}(b_1, \ldots, b_n)$ for some $r \geq 1$ and $b_i \geq 0$ such that $b_i \leq r - 1$. The Reid–Shepherd-Barron–Tai criterion [16, §3.1][20, §3.2] says that $P$ is terminal if and only if
$$\frac{1}{r} \sum_{i=1}^{n} \overline{k b_i} - 1 > 0 \quad \text{for all } 1 \leq k \leq r - 1,$$
where $\overline{k b_i} \in \{0, \ldots, r-1\}$ denotes the residue of $k b_i$ modulo $r$.

If $P$ is a hyperquotient singularity [17, §4], then it is a singularity of type $\frac{1}{r}(b_1, \ldots, b_{n+1}; e)$ for some $r \geq 1$ and $b_i \geq 0$ such that $b_i, e \leq r - 1$. We approximate terminality by applying Mori's criterion [14] restricted to the lattice points inside the unit cube. That is, we approximate $P$ to be terminal if either $r = 1$, in which case it is a hypersurface singularity, or $r \geq 2$ and
$$\frac{1}{r} \sum_{i=1}^{n+1} \overline{k b_i} - \min\left\{ \frac{1}{r} \sum_{i=1}^{n+1} m_i \cdot \overline{k b_i} \;\middle|\; x_1^{m_1} \cdots x_{n+1}^{m_{n+1}} \in f' \right\} - 1 > 0 \quad \text{for all } 1 \leq k \leq r - 1,$$
where $f'$ is the local equation of $f$ on an affine patch that contains $P$. Notably, when $X_d$ is quasismooth, we have $f' = x_i + \cdots$ for some $1 \leq i \leq n + 1$, and therefore Mori's criterion degenerates to the Reid–Shepherd-Barron–Tai criterion.

3.3. Analysis.

We apply both the fixed and dynamic heuristic algorithms of §2 to discover new Fano 4-fold hypersurfaces with terminal singularities of Fano index 1. The hypersurfaces are of the form $X_d \subset \mathbb{P}(a_1, \ldots, a_6)$, where $d = (\sum a_i) - 1$ and $1 \leq a_1 \leq \ldots \leq a_6$, and are encoded in our search as integer vectors $(a_1, \ldots, a_6) \in \mathbb{Z}^6$. Our goal is twofold: to identify as many new examples as possible and to uncover hard-to-reach ones. We show that the fixed heuristic search is particularly successful at the former, whilst the dynamic heuristic search achieves both.
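The Reid–Shepherd-Barron–Tai criterion of §3.2 is cheap to evaluate directly. A minimal Python sketch (our own illustrative code, not taken from the released implementation [21]):

```python
def is_terminal_quotient(r, b):
    """Reid-Shepherd-Barron-Tai criterion for a quotient singularity of
    type (1/r)(b_1, ..., b_n): terminal if and only if
    (1/r) * sum_i (k * b_i mod r) > 1 for every k = 1, ..., r - 1."""
    return all(sum(k * bi % r for bi in b) > r for k in range(1, r))

# 1/2(1,1,1,1) is terminal: the single sum is 4/2 = 2 > 1.
assert is_terminal_quotient(2, (1, 1, 1, 1))
# 1/2(1,1,0,0) is not: 2/2 = 1 fails the strict inequality.
assert not is_terminal_quotient(2, (1, 1, 0, 0))
# r = 1 is the smooth case; the condition is vacuous.
assert is_terminal_quotient(1, (0, 0, 0, 0))
```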
To overcome the high-degree obstruction faced by the exhaustive search, as discussed in §3.1, we begin both the fixed and dynamic searches from the quasismooth terminal classification, which comprises 11,617 cases. In practice, this means we force the first 11,617 searched points in both algorithms to be the terminal quasismooth ones, and progress normally from then on. We run both the fixed and dynamic heuristic algorithms for 10,000,000 steps. In the dynamic search, we use the hyperparameters in Table 1. Let $F$ and $D$ be the sets of terminal points found by the fixed and dynamic searches, respectively. The fixed and dynamic searches are depicted in Figures 8 and 9 respectively.

Table 1. Hyperparameters used in the dynamic heuristic (deep reinforcement learning) search.

    Hyperparameter                       Value
    MLP neural network layers            (40,)
    Activation function                  LeakyReLU
    LeakyReLU slope                      0.01
    Optimiser                            Adam
    Optimiser learning rate              0.001
    TD discount factor, $\gamma$         0.2
    Standard deviation, $\sigma$         2
    Search reward, $r_{\mathrm{reward}}$ 1

In the fixed heuristic search, we find $|F| = 113{,}996$ nonquasismooth Fano 4-fold hypersurfaces with terminal singularities. The algorithm is deterministic, so it will discover the same examples on a rerun. It is particularly effective at finding a large quantity of new examples. It does, however, have limitations. As shown by the histogram in Figure 10(A), the fixed search is unable to stray far from previously known reward points.

The dynamic search found $|D| = 85{,}262$. Since the search is nondeterministic, each run will find a different set $D$. Due to the more exploratory nature of the dynamic search, one expects fewer examples than the fixed one in the same step count, as a greater number of steps are spent in unprofitable regions during exploration. This is seen in Figure 10(C). The histogram in Figure 10(B) shows the upshot of this, however.
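As a concrete reference for Table 1, the value network and a single TD update of §2.3 can be sketched in a few lines of NumPy. This is a hypothetical minimal implementation under the stated hyperparameters, with plain gradient descent standing in for Adam; the released code [21] is the authoritative version.

```python
import numpy as np

rng = np.random.default_rng(0)

# f_theta: Z^6 -> R, one hidden layer of 40 LeakyReLU units (Table 1).
W1 = rng.normal(0.0, 0.1, (40, 6)); b1 = np.zeros(40)
W2 = rng.normal(0.0, 0.1, (1, 40)); b2 = np.zeros(1)
theta = (W1, b1, W2, b2)

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def f(theta, x):
    W1, b1, W2, b2 = theta
    return float(W2 @ leaky_relu(W1 @ x + b1) + b2)

def td_step(theta, theta_frozen, p, nbrs, rewards, gamma=0.2, lr=0.001):
    """One TD update: targets t(n) = r(n) + gamma * f_{theta'}(n) use the
    frozen copy theta'; theta descends the gradient of
    L = 1/(2|N|) * sum_n (f_theta(p) - t(n))^2."""
    W1, b1, W2, b2 = theta
    targets = [r + gamma * f(theta_frozen, n) for n, r in zip(nbrs, rewards)]
    z = W1 @ p + b1
    h = leaky_relu(z)
    v = float(W2 @ h + b2)
    delta = sum(v - t for t in targets) / len(targets)  # dL/dv, the mean TD error
    dz = (delta * W2[0]) * np.where(z > 0, 1.0, 0.01)   # backprop through the hidden layer
    return (W1 - lr * np.outer(dz, p), b1 - lr * dz,
            W2 - lr * delta * h[None, :], b2 - lr * np.array([delta]))

# One step from a toy state: pretend the first of the 12 neighbours is a
# reward point (r_reward = 1) and one step has passed since the last reward.
p = np.array([1.0, 1.0, 2.0, 3.0, 5.0, 8.0])
nbrs = [p + e for e in np.eye(6)] + [p - e for e in np.eye(6)]
rewards = [1.0] + [-1.0] * 11
new_theta = td_step(theta, theta, p, nbrs, rewards)
# Queue priority of a neighbour: v(n) = f_theta(n) + eps, sigma = 2 (Table 1).
priority = f(new_theta, nbrs[0]) + rng.normal(0.0, 2.0)
```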
Figure 10(B) shows hundreds of examples found by the dynamic search that are computationally inaccessible to the fixed one. To measure the inaccessibility of points found exclusively by the dynamic search, we analyse two sets: $F \setminus D$, the 31,480 points found by the fixed but not the dynamic search, and $D \setminus F$, the 3,106 points found by the dynamic but not the fixed search. For each point in $F \setminus D$, we compute the shortest distance under the $L^1$ norm to the nearest point in $D$. For each point in $D \setminus F$, we compute the shortest distance to the nearest point in $F$. From these distances we derive a lower bound on the number of steps required to reach a point from its nearest neighbour. This allows us to show that hundreds of points found by the dynamic search would be computationally expensive to find using the fixed search alone. Combined with the fact that the search was initialised from 11,617 starting points, this demonstrates that such points are effectively computationally inaccessible to the fixed search.

Figure 8. Fixed heuristic search. The search found 113,996 nonquasismooth Fano 4-fold hypersurfaces with terminal singularities. Each frame shows points in $\mathbb{Z}^2$, obtained by projecting the original $\mathbb{Z}^6$ search space onto consecutive coordinate pairs via $(a_1, \ldots, a_6) \mapsto (a_i, a_{i+1})$.

Figure 9. Dynamic heuristic (deep reinforcement learning) search. The search found 85,262 nonquasismooth Fano 4-fold hypersurfaces with terminal singularities. Each frame shows points in $\mathbb{Z}^2$, obtained by projecting the original $\mathbb{Z}^6$ search space onto consecutive coordinate pairs via $(a_1, \ldots, a_6) \mapsto (a_i, a_{i+1})$.

Explicitly, for a point $p \in F \setminus D$ (resp. $D \setminus F$), we define the shortest distance
$$D(p) := \min \{ \lVert p - q \rVert_1 \mid q \in D \text{ (resp. } F\text{)} \}.$$
Let $q$ denote the nearest neighbour of $p$, so that $\lVert p - q \rVert_1 = D(p)$.
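Computing $D(p)$ is a plain nearest-neighbour scan under the $L^1$ metric. A minimal sketch (the function name is ours; at the set sizes in this section a vectorised or tree-based implementation would be preferable):

```python
def l1_nearest_distance(p, points):
    """D(p): the shortest L1 distance from p to any point of `points`."""
    return min(sum(abs(a - b) for a, b in zip(p, q)) for q in points)

# The example discussed below: the weights of X_1020, found only by the
# dynamic search, whose nearest fixed-search point is the quasismooth
# start point giving X_1011.
p = (1, 15, 32, 139, 340, 494)
found = [(1, 10, 31, 143, 337, 490), (1, 1, 1, 1, 1, 1)]
assert l1_nearest_distance(p, found) == 17
```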
We now derive lower and upper bounds on the number of steps required to find $p$ starting from $q$. We first assume that $p$ is the closest point to $q$ in the relevant set. Relaxing this assumption leaves the lower bound valid (it only weakens it), but invalidates the upper bound, since the search may be steered away from $p$ by a closer point. Alternatively, replacing the priority function in the fixed search with the constant function $v(n) := 1$ ensures both bounds remain valid.

The fixed search exhaustively expands points in order of increasing $L^1$ distance from $q$, visiting all points at distance 1, then 2, and so on. Consequently, to reach a point at distance $D(p)$, the search must first visit at least one point at distance $D(p) - 1$, and must have already visited all points at distance $\leq D(p) - 2$. This gives a lower bound on the number of steps for $p$ to be found from $q$,
$$s_L(p) = \#\left\{ (a_1, \ldots, a_6) \in B(q, D(p) - 2) \cap \mathbb{Z}^6 \;\middle|\; a_1 \geq 1,\ a_i \leq a_{i+1} \text{ for } 1 \leq i \leq 5 \right\} + 1.$$
Assuming either that $p$ is the closest point to $q$, or using $v(n) := 1$ as the priority value, we must have found $p$ after searching all points of distance $D(p) - 1$, and so obtain an upper bound
$$s_U(p) = \#\left\{ (a_1, \ldots, a_6) \in B(q, D(p) - 1) \cap \mathbb{Z}^6 \;\middle|\; a_1 \geq 1,\ a_i \leq a_{i+1} \text{ for } 1 \leq i \leq 5 \right\}.$$
Moreover, the likelihood that the lower bound becomes weaker grows with $D(p)$. We establish both bounds under the assumption that $p$ is the closest point to $q$; under this assumption, $p$ is guaranteed to be found within $s_U(p)$ steps, which, as noted above, would be even larger without this assumption. The probability of finding $p$ in $s_L(p) \leq s \leq s_U(p)$ steps is then given by
$$P(s) = \frac{s - (s_L(p) - 1)}{s_U(p) - (s_L(p) - 1)},$$
as $p$ must be found from a point of distance $D(p) - 1$, of which there are $s_U(p) - (s_L(p) - 1)$.
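The approximate count used below, lattice points of an $L^1$ ball in $\mathbb{Z}^6$ subject only to $a_1 \geq 0$, can be reproduced in a few lines. This is our own recursive implementation (names are ours), cross-checked by brute force at small radius; it reproduces the quoted value for $D(p) = 5$.

```python
import itertools
from functools import lru_cache

@lru_cache(maxsize=None)
def ball_count(dim, radius):
    """Number of lattice points of Z^dim with L1 norm <= radius."""
    if radius < 0:
        return 0
    if dim == 0:
        return 1
    return sum(ball_count(dim - 1, radius - abs(t))
               for t in range(-radius, radius + 1))

def s_lower_approx(dist):
    """Approximate s_L(p): points of the L1 ball of radius D(p) - 2
    in Z^6 satisfying a_1 >= 0, plus one."""
    r = dist - 2
    return sum(ball_count(5, r - a1) for a1 in range(r + 1)) + 1

# Cross-check the recursion against brute force at radius 3 (D(p) = 5),
# reproducing the value 305 quoted in the text.
brute = sum(1 for pt in itertools.product(range(-3, 4), repeat=6)
            if sum(map(abs, pt)) <= 3 and pt[0] >= 0)
assert s_lower_approx(5) == brute + 1 == 305
```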
Therefore, the probability of the lower bound being achieved is $1 / (s_U(p) - (s_L(p) - 1))$, a quantity that shrinks as $D(p)$ grows. Almost all examples have weight $a_1 = 1$, weakening the lower bound. The reduction of the bound caused by cases where $a_i - D(p) < 1$ for $i \geq 2$, and $a_{i+1} - a_i - D(p) < 0$ for $i \geq 1$, was in practice found to be negligible. Overlooking this allows us to give an approximation for the lower bound. Assuming $a_1 = 1$, $a_i - D(p) \geq 1$, and $a_{i+1} - a_i - D(p) > 0$,
$$s_L(p) = \#\left\{ (a_1, \ldots, a_6) \in B(0, D(p) - 2) \cap \mathbb{Z}^6 \;\middle|\; a_1 \geq 0 \right\} + 1.$$
Using this, when $D(p) = 5$ we obtain $s_L(p) = 305$, whereas for $D(p) = 15$ we get $227{,}305$, for $D(p) = 16$ it is $528{,}865$, and for $D(p) = 17$ it is $774{,}912$. To put the expense of these large-$D(p)$ points into perspective, one should note that executing the full 10,000,000-step fixed search was itself costly.

Figure 10. (A) A histogram showing the distribution of distances from points found in the fixed but not the dynamic search to their closest neighbours. (B) A histogram showing the distribution of distances from points found in the dynamic but not the fixed search to their closest neighbours. The dynamic search finds points at greater distances than the fixed search, corresponding to points that are increasingly inaccessible to the latter. (C) A graph showing the number of nonquasismooth terminal examples found against the number of steps taken by the search. As expected, the dynamic search finds fewer terminal points in the same number of steps as the fixed one, since more steps are spent in unprofitable territory in order to reach terminal points at greater distances.

Consider $X_{1020} \subset \mathbb{P}(1, 15, 32, 139, 340, 494)$, a point found in the dynamic but not the fixed search. Its closest point is $X_{1011} \subset \mathbb{P}(1, 10, 31, 143, 337, 490)$, a quasismooth start point, at distance 17.
Our approximation predicts at least 774,912 steps are required to reach it. By setting $v(n) := 1$ in the fixed search we located it in 1,041,501 steps. However, using the original priority value, the search is directed away from $X_{1020}$, and even after 10,000,000 steps starting from $X_{1011}$ alone, it remains out of reach.

As Figure 10(B) illustrates, among all points found by the dynamic search but not the fixed one, hundreds lie far from any other point, and are therefore beyond the computational reach of the fixed search.

Code and Data Availability

All code required to replicate the results is available on GitHub [21] under an MIT license, along with all datasets.

Acknowledgements

I am grateful to Gavin Brown, Alexander Kasprzyk, Hefin Lambley and Martin Lotz for valuable feedback during the writing of this paper. The author was supported by the Warwick Mathematics Institute Centre for Doctoral Training, and gratefully acknowledges funding from the UK Engineering and Physical Sciences Research Council (grant number EP/W523793/1).

References

[1] P. Berglund, Y. He, E. Heyes, E. Hirst, V. Jejjala, and A. Lukas. New Calabi-Yau manifolds from genetic algorithms. Phys. Lett. B, 850:Paper No. 138504, 2024.
[2] C. Birkar. Singularities of linear systems and boundedness of Fano varieties. Ann. of Math. (2), 193(2):347–405, 2021.
[3] G. Brown and A. Kasprzyk. Four-dimensional projective orbifold hypersurfaces. Exp. Math., 25(2):176–193, 2016.
[4] G. Brown, A. Kasprzyk, and M. Reid. Kawamata bounds for Fano threefolds and the Graded Ring Database. 23 pp., 2022.
[5] G. Brown and A. M. Kasprzyk. The Graded Ring Database. https://grdb.co.uk, 2007–present.
[6] J. Chen, J. A. Chen, and M. Chen. On quasismooth weighted complete intersections. J. Algebraic Geom., 20(2):239–262, 2011.
[7] T. Coates, A. M. Kasprzyk, and S. Veneziale. Machine learning detects terminal singularities. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 67183–67194. Curran Associates, Inc., 2023.
[8] T. Coates, A. M. Kasprzyk, and S. Veneziale. Machine learning the dimension of a Fano variety. Nature Communications, 14(1):5526, 2023.
[9] A. Davies, P. Veličković, L. Buesing, S. Blackwell, D. Zheng, N. Tomašev, R. Tanburn, P. Battaglia, C. Blundell, A. Juhász, M. Lackenby, G. Williamson, D. Hassabis, and P. Kohli. Advancing mathematics by guiding human intuition with AI. Nature, 600(7887):70–74, 2021.
[10] Y. He. AI-driven research in pure mathematics and theoretical physics. Nature Reviews Physics, 6(9):546–553, 2024.
[11] A. R. Iano-Fletcher. Working with weighted complete intersections. In Explicit birational geometry of 3-folds, volume 281 of London Math. Soc. Lecture Note Ser., pages 101–173. Cambridge Univ. Press, Cambridge, 2000.
[12] V. A. Iskovskih. Fano threefolds. I. Izv. Akad. Nauk SSSR Ser. Mat., (3):516–562, 717, 1977.
[13] V. A. Iskovskih. Fano threefolds. II. Izv. Akad. Nauk SSSR Ser. Mat., (3):506–549, 1978.
[14] S. Mori. On 3-dimensional terminal singularities. Nagoya Math. J., 98:43–66, 1985.
[15] S. Mori and S. Mukai. On Fano 3-folds with $B_2 \geq 2$. In Algebraic varieties and analytic varieties (Tokyo, 1981), volume 1 of Adv. Stud. Pure Math., pages 101–129. North-Holland, Amsterdam, 1983.
[16] M. Reid. Canonical 3-folds. In Journées de Géométrie Algébrique d'Angers, Juillet 1979/Algebraic Geometry, Angers, 1979, pages 273–310. Sijthoff & Noordhoff, Alphen aan den Rijn–Germantown, Md., 1980.
[17] M. Reid. Young person's guide to canonical singularities. In Algebraic geometry, Bowdoin, 1985 (Brunswick, Maine, 1985), volume 46, Part 1 of Proc. Sympos. Pure Math., pages 345–414. Amer. Math. Soc., Providence, RI, 1987.
[18] M. Reid. Update on 3-folds. In Proceedings of the International Congress of Mathematicians, Vol. II (Beijing, 2002), pages 513–524. Higher Ed. Press, Beijing, 2002.
[19] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 2nd edition, 2018.
[20] Y. Tai. On the Kodaira dimension of the moduli space of abelian varieties. Invent. Math., 68(3):425–439, 1982.
[21] M. Truter. Deep reinforcement learning for Fano hypersurfaces: source code and datasets, 2026. https://github.com/marctruter/deep_fano_hypersurface.

Mathematics Institute, University of Warwick, Coventry, CV4 7AL, UK
Email address: Marc.Truter@warwick.ac.uk
