Fast Integer Multiplication using Modular Arithmetic

F ast In teger Multiplication Using Mo dular Arithmetic Anindy a De, Piyush P Kurur ∗ , Chandan Saha Dept. of Computer Science and Engineering Indian Institute of T ec hnology , Kanpur Kanpur, UP , India, 208016 { anindya,p pk,csaha } @cse .iitk.ac.in Ramprasad Saptharishi † Chennai Mathematical Institute Plot H1, SIPCOT IT P ark P adur PO, Siruseri, India, 60310 3 and Dept. of Computer Science and Engineering Indian Institute of T ec hnology , Kanpur Kanpur, UP , India, 208016 ramprasa d@cmi.ac.in No v em b er 26, 2024 Abstract W e give an O ( N · log N · 2 O (log ∗ N ) ) a lgorithm for multiplying tw o N -bit integers that improv es the O ( N · log N · log log N ) algor ithm b y Sch¨ onhage-Stra ssen [SS71]. Both these algorithms use mo dular arithmetic. Recen tly , F ¨ urer [F ¨ ur0 7] g av e an O ( N · log N · 2 O (log ∗ N ) ) algorithm whic h how ever uses ar ithmetic ov er complex num b ers as opp os ed to mo dular a rithmetic. In this pap er, we use m ultiv ariate polynomia l multiplication along with ideas from F ¨ urer’s algorithm to achiev e this improv ement in the mo dular setting. Our a lgorithm can also be view ed as a p -a dic version of F ¨ urer’s algo rithm. Thu s, we show that the t w o seemingly diﬀ erent approaches to in teger m ultiplication, modular and complex a rithmetic, are similar. 1 In tro duction Computing the pr o duct of t wo N -bit in tegers is an imp ortan t problem in algorithmic n umb er theory and algebra. A naiv e app roac h leads to an algorithm that uses O ( N 2 ) bit op erations. Karatsuba [KO63] sh o we d th at s ome m ultiplication op eratio ns of such an algorithm can b e replaced b y less costly addition op erations whic h reduces the ov eral l run ning time of the algorithm to O ( N log 2 3 ) bit op erations. Shortly afterw ards this result was impro v ed b y T oom [T o o63 ] who sho we d that for an y ε > 0, inte ger m ultiplicatio n can b e done in O ( N 1+ ε ) time. Th is led to the question as ∗ Researc h sup p orted through Researc h I F oundation pro ject NR NM/CS/2003 0163 † Researc h done while visiting I IT Kanpur und er Pro ject FL W/DST/CS/200602 25 1 to whether the time complexit y can b e impro v ed fu r ther b y replacing the term O ( N ǫ ) b y a p oly- logarithmic factor. In a ma jor b reakthrough, Sc h¨ o nh age and Strassen [SS71] ga v e t w o eﬃcien t algorithms for multiplying int egers u sing fast p olynomia l multi plication. One of the algorithms ac hiev ed a run n ing time of O ( N · log N · log lo g N . . . 2 O (log ∗ N ) ) using arithmetic ov er complex n umb ers (appro ximated to suitable pr ecision), wh ile the other used arithmetic mo du lo carefully c hosen integ ers to impr ov e the complexit y further to O ( N · l og N · log log N ). Despite m an y eﬀorts, the mo dular algo rithm remained the b est u n til a recen t remark able resu lt by F ¨ urer [F ¨ ur 07]. F ¨ urer ga v e an alg orithm that uses arithmetic o ver complex n umb ers and ru n s in O ( N · log N · 2 O (log ∗ N ) ) time. Till date this is the b est time complexit y result kno wn for integ er m ultiplication. Sc h¨ onhage and S trassen introd uced t wo seemingly diﬀerent approac hes to in teger m ultiplica- tion – u sing complex and mo du lar arithmetic. F¨ urer’s algorithm imp ro v es the time complexit y in the complex arithmetic setting by clev erly reducing some costly multiplic ations to simple shift. Ho w ev er, the algorithm needs to approxima te the complex n umb ers to certain precisions during computation. This in tro d uces the added task of b oun ding the total truncation errors in the analysis of the algorithm. On the cont rary , in the mo dular setting the error analysis is virtually absent. In addition, mo du lar arithmetic giv es a discrete appr oac h to a discrete problem lik e inte ger mul- tiplicatio n. Therefore it is natural to ask w hether we can ac hiev e a similar impr o v emen t in time complexit y of this pr oblem in the mo dular arithmetic setting. In this pap er, w e answer this qu es- tion aﬃrmativ ely . W e give an O ( N · log N · 2 O (log ∗ N ) ) algorithm for int eger m ultiplication using mo dular arithmetic, thus matc hing the impro ve ment made by F¨ urer. Ov erview of our result As is the case in b oth Sc h¨ onhage-Strassen’s and F ¨ urer’s algorithms, we start by redu cing the prob- lem to p olynomial multiplic ation ov er a ring R by p rop erly enco ding the giv en integ ers. P olynomials can b e m ultiplied eﬃcie ntly us in g Discrete F our ier T ransforms (DFT), whic h u s es sp ecial ro ots of unit y . F or instance, to multi ply t w o p olynomials of degree less than M using the F ourier transform , w e r equ ir e a princip al 2 M -th ro ot of u nit y (see Deﬁnition 2.1 for princip al r o ot ). An eﬃcien t w a y of computing the DFT of a p olynomial is through the F ast F ourier T r ansform (FFT). In addition, if m ultiplications b y these ro ots are eﬃcien t, we get a faster algorithm. S ince m ultiplication by 2 is a s hift, it would b e go o d to hav e a ring with 2 as a ro ot of un it y . One w a y to constru ct suc h a ring in the m o dular setting is to consider rings of th e form R = Z / (2 M + 1) Z as is the case in Sc h¨ onhage and Strassen [SS71]. Ho w ev er, th is mak es the size of R equal to 2 M , whic h although w orks in case of Sch¨ onhage and Strassen’s algorithm, is a litt le to o large to handle in our case. W e w ould lik e to ﬁn d a ring w h ose size is b ounded b y some p olynomial in M and w hic h also con tains a principal 2 M -th ro ot of unit y . In f act, it is this c h oice of ring that p oses the primary c h allenge in adapting F ¨ u rer’s algorit hm and making it wo rk in the discrete setting. In order to o v ercome this h urd le we c ho ose the ring to b e R = Z /p c Z , for a prime p and a constant c suc h that p c = pol y ( M ). The ring Z /p c Z , has a principal 2 M -th ro ot of unit y if and only if 2 M divides p − 1, whic h means that w e n eed to searc h for a prime p in the arithmetic progression { 1 + i · 2 M } i> 0 . T o mak e this searc h computatio nally eﬃcien t, w e need the degree of the p olynomials M to b e su ﬃcien tly small compared to the in put size. It tur ns out that th is can b e ac hiev ed b y considering multiv ariate p oly- nomials instead of u niv ariate p olynomials. W e use enough v ariables to mak e sure that th e searc h for suc h a pr ime do es not aﬀect the ov erall running time; the n umb er of v ariables ﬁnally c hosen is a constan t as well . In fact, the use of m ultiv ariate p olynomial multiplic ations and a small ring are the main steps where our algorit hm diﬀers from earlier algorit hm s by Sc h¨ onhage-Strassen and F¨ urer. 2 The use of inner a nd outer DFT pla ys a cen tral role in b oth F ¨ urer’s as we ll as our algo- rithm. T o wa rd s understandin g the notion of in n er and outer DFT in the con text of multiv ariate p olynomials, we pr esen t a group theoret ic int erpr etation of Discre te F ourier T ransform (DFT). Arguing along the line of F ¨ urer [F ¨ u r07] we show that r ep eated use of eﬃcien t computatio n of inner DFT’s using some sp ecial ro ots of un it y in R makes the ov erall pro cess eﬃcien t and leads to an O ( N · log N · 2 O (log ∗ N ) ) time algorithm. 2 The Ring, the Prime and the Ro ot of Unit y W e w ork with the ring R = Z [ α ] / ( p c , α m + 1) for some m , a constant c and a prime p . Elemen ts of R are thus m − 1 degree p olynomials o v er α with co eﬃcients from Z /p c Z . By construction, α is a 2 m -th ro ot of unit y and m ultiplication of any elemen t in R b y an y p o w er of α can b e ac hiev ed b y s h ifting op eratio ns — this pr op ert y is cru cial in m aking some m ultiplications in the FFT less costly (Section 4.2). Giv en an N -bit n umber a , w e enco d e it as a k -v ariate p olynomial ov er R with degree in eac h v ariable less than M . The parameters M and m are p o w ers of t wo suc h that M k is roughly N log 2 N and m is roughly log N . The p arameter k will ultimately b e c hosen a constan t (see Section 5). W e no w explain the details of this enco ding. 2.1 Enco ding Integers into multiv ariate Pol ynomials Giv en an N -bit in teger a , we ﬁrst break these N b its in to M k blo c ks of roughly N M k bits eac h. Th is corresp onds to repr esen ting a in b ase q = 2 N M k . Let a = a 0 + . . . + a M k − 1 q M k − 1 where a i < q . T he n umb er a is con v erted in to a p olynomial as follo ws: 1. Express i in base M as i = i 1 + i 2 M + · · · + i k M k − 1 . 2. Enco de eac h term a i q i as the monomial a i · X i 1 1 · · · X i k k . As a result, the n umber a gets con v erted to the p olynomial P M k − 1 i =0 a i · X i 1 1 · · · X i k k . F urther, w e br eak eac h a i in to m 2 equal sized blo cks wh ere the num b er of bits in eac h b lo c k is u = 2 N M k · m . Eac h co eﬃcien t a i is then enco ded as p olynomial in α of degree less than m 2 . The p olynomials are then padded w ith zero es to stretc h their degrees to m . Thus, th e N -bit num b er a is con v erted to a k -v ariate p olynomial a ( X ) o v er Z [ α ] / ( α m + 1). Giv en integ ers a and b , eac h of N bits, w e enco de them as p olynomials a ( X ) and b ( X ) and compute the pro d uct p olynomial. The pro duct a · b can b e reco ve red by su bstituting X s = q M s − 1 , for 1 ≤ s ≤ k , and α = 2 u in the p olynomial a ( X ) · b ( X ). The co eﬃcien ts in the p ro du ct p olynomial could b e as large as M k · m · 2 2 u and hence it is su ﬃcien t to do arithmetic mo d ulo p c where p c > M k · m · 2 2 u . Therefore, a ( X ) can indeed b e considered as a p olynomial o ver R = Z [ α ] / ( p c , α m + 1). Our c hoice of the p rime p ens ures that c is in fact a constan t (see Section 5). 2.2 Cho osing the prime The prime p should b e c hosen suc h that the ring Z /p c Z has a princip al 2 M -th root of unit y , which is required for p olynomial multi plication using FFT. A principal ro ot of u nit y is deﬁned as follo ws. 3 Deﬁnition 2.1. An n -th r o ot of u nity ζ ∈ R is said to b e primitiv e if it g ener ates a cyclic gr oup of or der n under multiplic ation. F urthermor e, it is said to b e p rincipal if n is c oprime to the char acteristic of R and ζ satisﬁes P n − 1 i =0 ζ ij = 0 for al l 0 < j < n . In Z /p c Z , a 2 M -th ro ot of un ity is prin cipal if and only if 2 M | p − 1 (see also Section 6). As a result, we need to choose the prime p from the arithmetic progression { 1 + i · 2 M } i> 0 , which is the main b ottlenec k of our approac h. W e no w explain how this b ottlenec k can b e circumv en ted. An upp er b ound for the least prime in an arithmetic progression is giv en by the follo wing theorem [Lin44]: Theorem 2.2 (Linnik) . Ther e exist absolute c onstants ℓ and L such that for any p air of c oprime inte gers d and n , the le ast prime p such that p ≡ d mo d n is less than ℓn L . Heath-Bro wn [HB92] show ed that the Linnik c onstant L ≤ 5 . 5. Recall th at M is c hosen suc h that M k is Θ  N log 2 N  . If we c ho ose k = 1, that is if w e use u niv ariate p olynomia ls to enco de in tegers, then the parameter M = Θ  N log 2 N  . Hence the least prime p ≡ 1 (mo d 2 M ) could b e as large as N L . Since all kno wn deterministic sieving pro cedures tak e at least N L time this is clearly infeasible (for a ran d omized appr oac h see S ection 5.1). Ho w ev er, by c ho osing a larger k we can ensure that the least p rime p ≡ 1 (mo d 2 M ) is O ( N ε ) for some constan t ε < 1. Remark 2.3. If k is an y in teger greater than L + 1, then M L = O  N L L +1  and hence th e least prime p ≡ 1 mo d 2 M can b e found in o ( N ) time. 2.3 The Ro ot of Unit y W e require a pr incipal 2 M -th ro ot of unit y in R to compute the F ourier transforms. This ro ot ρ ( α ) should also hav e the p r op ert y that its  M m  -th p o w er is α , so as to make some multipli cations in the FFT eﬃcient (Lemma 4.9). Such a ro ot can b e computed b y interp olatio n in a wa y similar to that in F ¨ ur er’s algo rithm [F¨ ur07, Section 3], but w e brieﬂy sk etc h the p ro cedure for completeness. W e ﬁ rst obtain a ( p − 1)-th ro ot of unit y ζ in Z /p c Z b y lifting a generator of F ∗ p . The  p − 1 2 M  -th p ow er of ζ give s us a 2 M -th ro ot of unit y ω . A generator of F ∗ p can b e computed b y brute force, as p is suﬃcien tly small. Ha ving obtained a generator, we can use Hensel Lifting [NZM91, Theorem 2.23]. Lemma 2.4. L et ζ s b e a primitive ( p − 1) -th r o ot of unity i n Z /p s Z . Then ther e exists a unique primitive ( p − 1) -th r o ot of u nity ζ s +1 in Z /p s +1 Z such that ζ s +1 ≡ ζ s (mo d p s ) . This unique r o ot is given by ζ s +1 = ζ s − f ( ζ s ) f ′ ( ζ s ) wher e f ( X ) = X p − 1 − 1 . W e n eed the follo w ing claims to compute the ro ot ρ ( α ). Claim 2.5. L et ω b e a princip al 2 M -th r o ot of unity in Z /p c Z . (a) If σ = ω M m , then σ is a princip al 2 m -th r o ot of unity. (b) The p olynom ial x m + 1 = Q m i =1 ( x − σ 2 i − 1 ) in Z /p c Z . M or e over, for any 0 ≤ i < j ≤ 2 m , the ide als gener ate d by ( x − σ i ) and ( x − σ j ) ar e c omaxima l in Z [ x ] /p c Z . 4 (c) The r o ots  σ 2 i − 1  1 ≤ i ≤ m ar e distinct mo dulo p and ther ef or e the diﬀer enc e of any two of them is a unit in Z [ x ] /p c Z . W e then, through in terp olation, solv e for a p olynomial ρ ( α ) such that ρ ( σ 2 i +1 ) = ω 2 i +1 for all 1 ≤ i ≤ m . Th en, ρ ( σ 2 i +1 ) = ω 2 i +1 1 ≤ i ≤ m = ⇒  ρ ( σ 2 i +1 )  M /m = ω (2 i +1) M/m = σ 2 i +1 = ⇒ ( ρ ( α )) M /m = α (mo d α − σ 2 i +1 ) 1 ≤ i ≤ m = ⇒ ( ρ ( α )) M /m = α (mo d α m + 1) The ﬁ rst t wo parts of the claim ju s tify the Chinese Remaindering. Finally , computing a p olynomial ρ ( α ) such that ρ ( σ 2 i +1 ) = ω 2 i +1 can b e done th r ough in terp olation. ρ ( α ) = m X i =1 ω 2 i +1 Q j 6 = i ( α − σ 2 j + 1 ) Q j 6 = i ( σ 2 i +1 − σ 2 j + 1 ) The division by ( σ 2 i +1 − σ 2 j + 1 ) is justiﬁed as it is a unit in Z /p c Z (part (c) of Claim 2.5). 3 The In teger Multiplication Algorithm W e are giv en tw o inte gers a, b < 2 N to m ultiply . W e ﬁx constan ts k and c whose v alues are giv en in Section 5. Th e algorithm is as follo ws: 1. Cho ose M and m as p o wers of tw o such that M k ≈ N log 2 N and m ≈ log N . Find the least prime p ≡ 1 (mo d 2 M ) (Remark 2.3). 2. Enco de the inte gers a and b as k -v ariate p olynomials a ( X ) and b ( X ) r esp ectiv ely o v er the ring R = Z [ α ] / ( p c , α m + 1) (Sectio n 2.1). 3. Compute the ro ot ρ ( α ) (Sectio n 2.3). 4. Use ρ ( α ) as th e pr incipal 2 M -th ro ot of unity to compute the F ourier transforms of the k - v ariate p olynomials a ( X ) and b ( X ). Multiply comp onent- wise and tak e the in v erse F ourier transform to obtain the pr o duct p olynomial . 5. Ev aluate the pro duct p olynomial at appropriate p o we rs of t w o to reco ver the integ er pro duct and return it (Section 2.1). The only missing piece is th e F ourier transforms for multiv ariate p olynomia ls. The follo wing section giv es a group theoretic description of FFT. 4 F ourier T ransform A con v enien t w a y to study p olynomial m ultiplication is to int erpret it as multi plication in a gr oup algebr a . 5 Deﬁnition 4.1 (Group Algebra) . L et G b e a gr oup. The group algebra of G over a ring R is the set of forma l sums P g ∈ G α g g wher e α g ∈ R with addition deﬁne d p oint-wise and multiplic ation deﬁne d via c onvolution as fol lows X g α g g ! X h β h h ! = X u   X g h = u α g β h   u Multiplying univ ariate p olynomials o v er R of degree less than n can b e seen as multi plication in the group alg ebra R [ G ] where G is the cyclic group of order 2 n . Similarly , multi plying k -v ariate p olynomials of degree less than n in eac h v ariable can b e seen as m ultiplying in the group algebra R [ G k ], where G k denotes the k -fold p ro du ct group G × . . . × G . In this section, we stud y the F ourier transform o v er the group algebra R [ E ] where E is an additive ab elian gr oup . Most of this, alb eit in a diﬀeren t form, is w ell kno wn but is provi ded here for completeness. [Sha99, Ch apter 17] In order to simplify our pr esentat ion, we will ﬁx the base ring to b e C , the ﬁ eld of complex n umb ers. Let n b e the e xp onent of E , that is the maxim um order of any elemen t in E . A similar approac h can b e follo wed for any other base ring as long as it has a principal n -th ro ot of unit y . W e consider C [ E ] as a vect or space with basis { x } x ∈ E and use th e Dirac notatio n to represent elemen ts of C [ E ] — the ve ctor | x i , x in E , denotes the element 1 .x of C [ E ]. Deﬁnition 4.2 (Characters) . L et E b e an additive ab elian gr oup. A c haracter of E is a homo- morphis m fr om E to C ∗ . An example of a charact er of E is the trivial charact er, whic h w e will denote by 1, that assigns to ev ery elemen t of E the complex num b er 1. If χ 1 and χ 2 are t wo c haracters of E then their pro du ct χ 1 .χ 2 is deﬁn ed as χ 1 .χ 2 ( x ) = χ 1 ( x ) χ 2 ( x ). Prop osition 4.3. [Sha99, Chapter 17, The or em 1] L e t E b e an additive ab elian gr oup of exp onent n . Then the values taken by any char acter of E ar e n -th r o ots of unity. F urthermor e, the char acters form a multiplicat ive ab elian group ˆ E which is isomo rphic to E . An imp ortant prop ert y that the charact ers satisfy is the follo wing [Is a94 , Corollary 2.14]. Prop osition 4.4 (Sc hur’s Orthogonalit y) . L et E b e an additive ab elian gr oup. Then X x ∈ E χ ( x ) = ( 0 if χ 6 = 1 , # E otherwise X χ ∈ ˆ E χ ( x ) = ( 0 if x 6 = 0 , # E otherwise. It follo ws from Sch ur’s orthogonalit y that the collection of v ectors | χ i = P x χ ( x ) | x i forms a basis of C [ E ]. W e will call this basis the F ourier b asis of C [ E ]. Deﬁnition 4.5 (F ourier T ransform) . L e t E b e an additive ab elian gr oup and let x 7→ χ x b e an isomorp hism b etwe e n E and ˆ E . The F ourier transform over E is the line ar map fr om C [ E ] to C [ E ] that sends | x i to | χ x i . 6 Th us , the F ourier transform is a change of basis from the p oint b asis {| x i} x ∈ E to the F ourier basis {| χ x i} x ∈ E . Th e F ourier transform is unique only up to the choi ce of the isomorphism x 7→ χ x . This isomorphism is d etermined b y the c hoice of the prin cipal root of unit y . Remark 4.6. Given an elemen t | f i ∈ C [ E ], to compute its F ourier transf orm it is suﬃcien t to compute the F ourier c o eﬃcients {h χ | f i} χ ∈ ˆ E . 4.1 F ast F ourier T ransform W e no w describ e the F ast F ourier T ransform for general ab elian group s in the c haracter th eoretic setting. F or the rest of the section ﬁx an additiv e ab elia n group E o ver wh ic h we w ould lik e to compute the F ourier transform. Let A b e any subgroup of E and let B = E / A . F or any suc h pair of ab elian groups A and B , we hav e an appropriate F ast F ourier transformation, whic h w e describ e in the rest of th e section. Prop osition 4.7. 1. Every char acter λ of B c an b e “lifte d” to a char acter of E (which wil l also b e denote d by λ ) deﬁne d as fol low s λ ( x ) = λ ( x + A ) . 2. L et χ 1 and χ 2 b e two char acters of E that when r estricte d to A ar e identic al. Then χ 1 = χ 2 λ for some char acter λ of B . 3. The gr oup ˆ B is (isomor phic to) a sub gr oup of ˆ E with the quotient gr oup ˆ E / ˆ B b eing (isomor- phic to) ˆ A . W e no w consider the task of computing the F ourier transform of an elemen t | f i = P f x | x i presen ted as a list of co eﬃcien ts { f x } in the p oin t basis. F or this, it is su ﬃcien t to compu te the F ourier co eﬃcien ts {h χ | f i} for eac h c haracter χ of E (Remark 4.6). T o describ e the F ast F ourier transform we ﬁ x t w o sets of cosets representa tiv es, one of A in E and one of ˆ B in ˆ E as follo ws. 1. F or eac h b ∈ B , b b eing a coset of A , ﬁx a coset repr esen tativ e x b ∈ E suc h b = x b + A . 2. F or eac h c haracter ϕ of A , ﬁx a charact er χ ϕ of E suc h that χ ϕ restricted to A is the charact er ϕ . The c haracters { χ ϕ } form (can b e thought of as) a s et of coset representa tiv es of ˆ B in ˆ E . Since { x b } b ∈ B forms a set of coset r epresen tativ es, an y | f i ∈ C [ E ] can b e wr itten uniquely as | f i = P f b,a | x b + a i . Prop osition 4.8. L et | f i = P f b,a | x b + a i b e an element of C [ E ] . F or e ach b ∈ B and ϕ ∈ ˆ A let | f b i ∈ C [ A ] and | f ϕ i ∈ C [ B ] b e deﬁne d as fol lows . | f b i = X a ∈ A f b,a | a i | f ϕ i = X b ∈ B χ ϕ ( x b ) h ϕ | f b i | b i Then for any char acter χ of E , which c an b e expr esse d as χ = λ · χ ϕ , the F ourier c o eﬃcie nt h χ | f i = h λ | f ϕ i . 7 Pr o of. Recall that λ ( x + A ) = λ ( x ), and ϕ is a restrictio n of the χ to th e su bgroup A . h χ | f i = X b X a χ ( x b + a ) f b,a = X b λ ( x b + a ) X a χ ϕ ( x b + a ) f b,a = X b λ ( b ) χ ϕ ( x b ) X a ϕ ( a ) f b,a = X b λ ( b ) χ ϕ ( x b ) h ϕ | f b i = h λ | f ϕ i W e are now ready to describ e the F ast F ourier transform give n an elemen t | f i = P f x | x i . 1. F or eac h b ∈ B compute the F ourier transforms of | f b i . This requires # B m an y F ourier transforms o ve r A . 2. As a r esult of the pr evious step we ha v e f or eac h b ∈ B and ϕ ∈ ˆ A the F our ier co eﬃ- cien ts h ϕ | f b i . Compu te for eac h ϕ th e v ectors | f ϕ i = P b ∈ B χ ϕ ( x b ) h ϕ | f b i | b i . Th is r equires # ˆ A. # B = # E many m ultiplications by roots of unit y . 3. F or eac h ϕ ∈ ˆ A compute the F ourier trans form of | f ϕ i . This requires # ˆ A = # A many F ourier transforms o ve r B . 4. An y c haracter χ of E is of th e form χ ϕ λ for some ϕ ∈ ˆ A and λ ∈ ˆ B . Using Prop osition 4.8 w e h a v e at the end of Step 3 all the F our ier co eﬃcien ts h χ | f i = h λ | f ϕ i . If the quotien t group B itself has a sub group that is isomorphic to A then w e can apply this pro cess recursiv ely on B to obtain a divide and conquer pro cedure to compute the F ourier transform. In the standard FFT we use E = Z / 2 n Z . The subgroup A is 2 n − 1 E wh ic h is isomorph ic to Z / 2 Z and the quotien t group B is Z / 2 n − 1 Z . 4.2 Analysis of the F ourier T ransform Our goal is to m ultiply k -v ariate p olynomia ls ov er R , with the degree in eac h v ariable less than M . This can b e achie ved by embedd ing the p olynomial s int o the algebra of the pr o duct group E =  Z 2 M · Z  k and m ultiplying them as elemen ts of the alg ebra. Since the exp onen t of E is 2 M , we require a pr incipal 2 M -th ro ot of unity in the rin g R . W e shall use the ro ot ρ ( α ) (as deﬁned in Section 2.3) for the F ourier transform o ve r E . F or ev ery subgroup A of E , w e hav e a corresp ond in g FFT. W e choose the subgroup A as  Z 2 m · Z  k and let B b e the quotien t group E / A . The group A has exp onent 2 m and α is a principal 2 m -th ro ot of unity . Since α is a p ow er of ρ ( α ), we can u s e it f or the F ourier transform o ver A . As m ultiplications by p o w ers of α are just sh ifts, this makes F ourier transform ov er A eﬃcien t. Let F (2 M , k ) denote the complexit y of computing the F ourier tran s f orm ov er  Z 2 M · Z  k . W e ha v e F (2 M , k ) =  M m  k F (2 m, k ) + M k M R + (2 m ) k F  M m , k  (1) 8 where M R denotes the complexit y of m ultiplications in R . The ﬁ rst term comes from the # B m an y F ourier transforms o v er A (Step 1 of FFT), the second term corresp onds to the multiplic ations b y ro ots of u nit y (Step 2) and the last term come s f r om the # A many F ourier transforms o ve r B (Step 3). Since A is a subgroup of B as well, F ourier transforms o v er B can b e recursivel y computed in a similar wa y , with B pla ying the role of E . Th erefore, by simplifying the recurrence in Equ ation 1 w e get: F (2 M , k ) = O  M k log M m k log m F (2 m, k ) + M k log M log m M R  (2) Lemma 4.9. F (2 m, k ) = O ( m k +1 log m · log p ) Pr o of. Th e FFT o ver a group of size n is usually done b y taking 2-p oint FFT’s follo w ed b y n 2 -p oin t FFT’s. T his inv olv es O ( n log n ) m ultiplications by ro ots of unit y and add itions in base ring. Using this metho d, F ourier transforms ov er A can b e computed with O ( m k log m ) multiplica tions and additions in R . S ince eac h multiplica tion is b et ween an elemen t of R and a p ow er of α , this can b e eﬃcient ly ac hiev ed thr ough shifting op erations. Th is is dominated by the add ition op eratio n, whic h tak es O ( m log p ) time, since this inv olv es adding m co eﬃcients from Z /p c Z . Therefore, from Equation 2, F (2 M , k ) = O  M k log M · m · log p + M k log M log m M R  (3) 5 Complexit y A n alysis The c hoice of parameters should ensure that the follo wing constrain ts are satisﬁed: 1. M k = Θ  N log 2 N  and m = O (log N ). 2. M L = O ( N ε ) w h ere L is the L innik constant (Theorem 2.2) and ε is an y constant less than 1. Recall that this mak es pic king the prime b y b r ute force feasible (see Remark 2.3). 3. p c > M k · m · 2 2 u where u = 2 N M k m . This is to preven t o ve rﬂ o ws d uring mo dular arithmetic (see Section 2.1). It is straigh tforward to c hec k that k > L + 1 and c > 5( k + 1) satisfy the ab ov e constrain ts. Heath- Bro wn [HB92] sho w ed that L ≤ 5 . 5 and therefore c = 42 clearly suﬃces. Let T ( N ) denote the time complexit y of multiplying t w o N b it in tegers. T h is consists of: (a) T ime required to pic k a suitable prime p , (b) C omputing the ro ot ρ ( α ), (c) E n co ding the inp ut in tegers as p olynomials, (d) Multiplying the enco ded p olynomia ls, 9 (e) E v aluating the pr o duct p olynomial. As argued b efore, the pr ime p can b e c hosen in o ( N ) time. T o compute ρ ( α ), we need to lift a generator of F ∗ p to Z /p c Z follo we d by an interpolation. Since c is a constant and p is a p rime of O (log N ) bits, the time required for Hensel Lifting and interpolation is o ( N ). The enco ding in vo lve s dividing bits int o smaller b lo c ks, and expressing the exp onents of q in base M (Section 2.1) and all these tak e O ( N ) time since M is a p o wer of 2. Similarly , ev aluation of the pr o duct p olynomial take s linear time as w ell. Therefore, the time complexit y is domin ated b y the time tak en for p olynomia l multiplic ation. Time complexity of P olynomial Multiplication F rom Equation 3, the complexit y of F ourier transform is giv en b y F (2 M , k ) = O  M k log M · m · log p + M k log M log m M R  Prop osition 5.1. [Sch82] Multiplic ation in the ring R r e duc es to multiplying O (log 2 N ) bit inte gers and ther efor e M R = T  O (log 2 N )  . Pr o of. Element s of R can b e seen as p olynomials in α o ve r Z /p c Z with d egree at most m . Giv en t w o suc h p olynomials f ( α ) and g ( α ), enco d e them as follo ws: Replace α by 2 d , transforming the p olynomials f ( α ) and g ( α ) to the in tegers f (2 d ) and g (2 d ) resp ecti vel y . The parameter d is c h osen suc h that the co eﬃcien ts of the pro duct h ( α ) = f ( α ) g ( α ) can b e r eco v ered from the pro d uct f (2 d ) · g (2 d ). F or this, it is suﬃcient to ensure that the maxim um co eﬃcien t of h ( α ) is less than 2 d . Since f and g are p olynomials of degree m , we w ould w an t 2 d to b e greater than m · p 2 c , w hic h can b e ensured by choosing d = Θ (log N ). Th e int egers f (2 d ) and g (2 d ) are b oun d ed b y 2 md and hence the task of m ultiplying in R reduces to O (log 2 N ) bit in teger multi plication. Multiplicati on of tw o p olynomials in v olv e a F ourier transform follo wed b y comp onent- wise mul- tiplicatio ns and an inv erse F our ier transform. Since the num b er of component-wise m ultiplications is only M k , the time tak en is M k · M R whic h is clearly sub s umed in F ( M , k ). Therefore, the time tak en for multi plying the p olynomials is O ( F ( M , k )). Th us, the complexit y of our intege r m ultiplication algorithm T ( N ) is give n b y , T ( N ) = O ( F ( M , k )) = O  M k log M · m · log p + M k log M log m M R  = O  N log N + N log N · log log N T ( O (log 2 N ))  The ab ov e recurrence leads to the follo wing theorem. Theorem 5.2. Given two N bit inte g ers, their pr o duct c an b e c ompute d in O ( N · log N · 2 O (log ∗ N ) ) time. 10 5.1 Cho osing the Prime Randomly T o ensure that th e searc h for a prime p ≡ 1 (mod 2 M ) do es not aﬀec t the o ve rall time complexit y of the algorithm, we considered m ultiv ariate p olynomials to restrict the v alue of M ; an alternati ve is to use r and omization. Prop osition 5.3. A ssuming ERH, a prime p such that p ≡ 1 (mo d 2 M ) c an b e c ompute d by a r andomize d algorithm with exp e cte d running time ˜ O (log 3 M ) . Pr o of. Titc hm ars h [Tit30] (see also Tianxin [Tia90]) show ed, assuming ERH, that the num b er of primes less than x in the arithmetic progression { 1 + i · 2 M } i> 0 is giv en b y , π ( x, 2 M ) = Li ( x ) ϕ (2 M ) + O ( √ x log x ) for 2 M ≤ √ x · (log x ) − 2 , wh ere Li ( x ) = Θ( x log x ) and ϕ is the Euler totien t function. In our case, since M is a p o w er of t w o, ϕ (2 M ) = M , an d h ence for x ≥ 4 M 2 · log 6 M , w e ha ve π ( x, 2 M ) = Ω  x M log x  . Therefore, for an i chosen uniformly randomly in the range 1 ≤ i ≤ 2 M · log 6 M , the probabilit y that i · 2 M + 1 is a prime is at least d log x for a constan t d . F ur thermore, primalit y test of an O (log M ) b it n umb er can b e done in ˜ O (log 2 M ) time using Rabin-Miller p rimalit y test [Mil76, Rab80]. Hence, with x = 4 M 2 · log 6 M a su itable prime for our algorithm can b e found in exp ected ˜ O (log 3 M ) time. 6 A Diﬀeren t P ersp ectiv e Our algorithm can b e seen as a p -adic v ersion of F¨ urer’s integ er multiplica tion algorithm, wh ere the ﬁeld C is replaced by Q p , the ﬁeld of p -adic num b ers (for a quic k in tro d uction, see Bak er’s online notes [Bak07]). Much lik e C , where representing a general elemen t (sa y in base 2) tak es inﬁn itely man y bits, represen ting an elemen t in Q p tak es in ﬁ nitely man y p -adic digits. Since we cannot w ork with inﬁ nitely many digits, all arithmetic has to b e done with ﬁnite precision. Mo d ular arithmetic in the base ring Z [ α ] / ( p c , α m + 1), can b e view ed as arithmetic in the ring Q p [ α ] / ( α m + 1) ke eping a precision of ε = p − c . Arithmetic with ﬁ nite precision naturally introduces some err ors in computation. Ho w eve r, the nature of Q p mak es the error analysis simp ler. The ﬁeld Q p comes with a norm | · | p called the p -adic norm, whic h satisﬁes the stronger triangle inequalit y | x + y | p ≤ max  | x | p , | y | p  [Bak07, Prop osi- tion 2.6]. As a r esult, unlike in C , the errors in computation do not comp ou n d. Recall that the eﬃciency of FFT crucially dep ends on a sp ecial p rincipal 2 M -th ro ot of u nit y in Q p [ α ] / ( α m + 1). Suc h a ro ot is constructed with the help of a primitiv e 2 M -th r o ot of unit y in Q p . The ﬁeld Q p has a primitiv e 2 M -th ro ot of un it y if and only if 2 M divides p − 1 [Bak07, Th eorem 5.12]. Also, if 2 M divides p − 1, a 2 M -th root can b e obtained from a ( p − 1)-th ro ot of unity b y taking a suitable p o w er. A primitiv e ( p − 1)-th ro ot of un it y in Q p can b e constru cted, to suﬃcient precision, using Hensel Lifting starting fr om a generator of F ∗ p . 11 7 Conclusion There are tw o approac hes for m ultiplying in tegers, on e using arithmetic ov er complex num b ers, and the other using mo dular arithmetic. Using complex num b ers, Sc h¨ onhage and Strassen [SS71] ga v e an O ( N · log N · log log N . . . 2 O (log ∗ N ) ) algorithm. F¨ urer [F ¨ ur07] impro ve d this complexit y to O ( N · log N · 2 O (log ∗ N ) ) using some sp ecial ro ots of u nit y . Th e other approac h, that is mo dular arithmetic, can b e seen as arithmetic in Q p with certain precision. A d irect adaptation of the S c h¨ onhag e- Strassen’s algo rithm in the mo dular setting leads to an O ( N · log N · log log N . . . 2 O (log ∗ N ) ) algo rithm. In this p ap er, we sh o we d that by choosing an appropriate prime and a sp ecial ro ot of unit y , a running time of O ( N · log N · 2 O (log ∗ N ) ) can b e ac hiev ed through m o dular arithmetic as w ell. T herefore, in a wa y , w e hav e uniﬁed the t wo paradigms. Ac kno wledgemen t W e thank V. Vina y , Srik anth Sriniv asan and the anonymo us referees for many helpful suggestions that impro ved the o v erall presen tation of this pap er. References [Bak07] Alan J. Bak er. An introdu ction to p -adic num b ers and p -adic analysis. Online Notes , 2007. h ttp://www.maths.gla.ac.uk/ ∼ a jb/dvi-ps/padicnotes.p df . [F ¨ ur 07] Martin F ¨ ur er. F aster in teger multiplic ation. Pr o c e e dings of the 39 th ACM Symp osium on The ory of Computing , pages 57–66, 2007. [HB92] D. R. Heath-Bro wn . Z ero-free regions for Diric hlet L-functions, and the least prime in an arithmetic p rogression. In Pr o c e e dings of the L ondon Mathematic al So ciety, 64(3) , pages 265–3 38, 1992. [Isa94] I. Martin Isaacs. Char acter the ory of ﬁnite gr oups . Do v er pu blications Inc., New Y ork, 1994. [K O63] A Karatsuba and Y Ofman. Multiplication of m ultidigit num b ers on automata. English T r anslatio n in Soviet Physics Doklad y , 7:595 –596, 1963. [Lin44] Y uri V. Linnik. On the least p rime in an arithmetic progression, I. The basic theorem, I I. The Deuring-Heilbronn’s phenomenon. R e c. Math. (Mat. Sb ornik) , 15:139–17 8 and 347–3 68, 1944. [Mil76] G. L. Miller. Riemann’s hyp othesis and tests for primalit y . Journal of Computer and System Scienc es , 13:300–31 7, 1976. [NZM91] Iv an Niv en, Herb ert S. Zuc kerman, and Hugh L. Montg omery . An Intr o duction to the The ory of Numb ers . John Wiley and Sons , Singap ore, 1991. [Rab80] Mic hael O. Rabin. Probabilistic algorithm for testing primalit y . Journal of N umb er The ory , 12:128–13 8, 1980. 12 [Sc h82] Arnold S ch¨ onhage. Asymptotically fast algorithms for the n umerical m ultiplication and division of p olynomials w ith complex coeﬃcien ts. In Computer Algebr a, EUROCAM , v olume 144 of L e c tur e Notes in Computer Scienc e , pages 3–15, 198 2. [Sha99] Igor R. Shafarevic h. Basic Notions of A lgebr a . Spr in ger V erlag, USA, 1999. [SS71] A Sc h¨ onhage and V Strassen. S c hnelle Multiplik atio n grosser Zahlen. Computing , 7:281– 292, 1971. [Tia90] Cai T ianxin. Primes represent able b y p olynomials and the lo w er b oun d of the least primes in arithmetic progressions. A cta Mathematic a Sinic a, New Series , 6:289–29 6, 1990. [Tit30] E. C. Titc hmarsh . A divisor p roblem. R end. Cir c. M at. Palerme , 54:414–42 9, 1930. [T o o63] A L . T o om. Th e complexit y of a scheme of f unctional elemen ts simula ting the m ultipli- cation of in tegers. English T r anslat ion in Soviet Mathematics , 3:714–7 16, 1963. 13

Fast Integer Multiplication using Modular Arithmetic

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment