Multiplication with Fourier Optics Simulating 16-bit Modular Multiplication

This paper will describe a simulator developed by the authors to explore the design of Fourier transform based multiplication using optics. Then it will demonstrate an application to the problem of constructing an all-optical modular multiplication c…

Authors: Abigail Timmel, John Daly

Multiplication with Fourier Optics Simulating 16-bit Modular   Multiplication
Multiplication with F ourier Optics Simulating 16-bit Modular Multiplication Abigail N. T immel Booz Allen Hamilton 308 Sentinel Driv e Annapolis Junction, MD 20701 Email: timmel.abigail@bah.com John T . Daly Laboratory for Physical Sciences 5520 Research Park Dri ve Catonsville, MD 21128 Email: jtdaly3@lps.umd.edu Abstract —This paper will describe a simulator de veloped by the authors to explore the design of Fourier transform based multi- plication using optics. Then it will demonstrate an application to the problem of constructing an all-optical modular multiplication circuit. That circuit implements a novel approximate version of the Montgomery multiplication algorithm that enables the calculation to be performed entirely in the analog domain. The results will be used to corroborate the feasibility of scaling the design up to 16-bits without the need for analog to digital con versions at intermediate steps. Index T erms —optical computing, con volution processor , Fourier optics, optical simulation, Montgomery multiplication I . B A C K G RO U N D Optical computing, performing calculations using photons instead of electrons, is an idea that has been around for decades [1], [2], [3], [4]. Like many competing non-CMOS technologies howe ver , it was ultimately beat out by Moore’ s Law scaling of transistors. Optics, being fundamentally limited by the wav elength of light, cannot attain the density of transistors which, in 2017, achiev ed a 10nm process tech- nology in commercial electronics [5]. Furthermore, as an analog technology, which is typically limited to 8-12 bits of resolution [6], optical computing does not compete with digital technologies in terms of computational accuracy . In spite of the hegemon y of CMOS technology for computing, there continues to be acti ve research in methods of lev eraging optics for computation. With the recent slowing of Moore’ s Law scaling accompanied by adv ances in nanophotonics [7] and some initial commercial adoption of optical computing technologies [8], the research community has demonstrated renewed interest in e xploring the role of optics in post-Moore’ s era computing [9]. A key strength in optical computing is the use of Fourier transforms to simplify conv olution problems. In particular , Fourier transforms can be used to turn multiplication from an O ( n 2 ) problem into an O ( n ) one. This is due to the distributi ve property: multiplying ( a + b )( c + d ) giv es the same result as adding ( ac + ad + bc + bd ) . If we can separate out the result of the former , we get four multiplies by only performing one. This suggests that any multiplication problem can be simplified by summing the digits of each n -digit factor and then multiplying. Separating out the 2 n multiplies in the result is still an issue, but we can resolve this by using a Fourier transform to sum the digits of our factors. If we con vert each factor into a function that takes on the value of each digit for some interval, the Fourier transform, ef fectively gi ves us the sum of the digits, each multiplied by a factor of e − ixy . F ( y ) = Z ∞ −∞ f ( x )e − ixy d x (1) W e can then do a point-wise multiply (multiply each point of one Fourier transformed function by the corresponding point on the other Fourier transformed function) to obtain the sum of products in each point of the result. T o separate out these products, we do the in verse Fourier transform: f ( x ) = Z ∞ −∞ F ( y )e ixy d x (2) This gi ves us a function that represents the product of the numbers encoded in the original two functions. The reason this technique is not widely used on digital comput- ers is that Fourier transforms are expensiv e to compute [10], [11]. There is only an advantage for suf ficiently large mul- tiplication problems. Howe ver , it can be shown that Fourier transforms can easily be achieved optically by using lenses and masks [12], [13]. Optical correlators designed to perform multiplication, particularly vector -matrix multiplication, are not ne w [14]. Neither are the optical simulations used to design such systems. Howe ver , the authors believ e this to be the first time that such correlators have been demonstrated by simulation as a viable design for modular multiplication in optics. Therefore, the contributions of this work are the end-to-end simulation of a novel application of Fourier optics to the implementation of high precision arithmetic and a new approach to computing modular multiplication based on discrete con volution. c  2018 IEEE I I . L E N S / M A S K C O N FI G U R A T I O N The equation that gov erns the electromagnetic field U 1 in a plane behind a mask is as follows [15], [16], [17]: U 1 ( x 1 , y 1 ) = Z ∞ −∞ Z ∞ −∞ U 0 ( x 0 , y 0 )e − ikr d x 0 d y 0 (3) U 0 is the field in the plane of the mask, z is the distance from the mask to the plane where U 1 is being calculated, and r is the distance from the point ( x 0 , y 0 ) to the point ( x 1 , y 1 ) . W e can take the binomial expansion of the exponent, keeping only the first two terms: r = p z 2 + ( x 0 − x 1 ) 2 + ( y 0 − y 1 ) 2 ≈ z " 1 + 1 2  x 0 − x 1 z  2 + 1 2  y 0 − y 1 z  2 # Equation 3 becomes: U 1 ( x 1 , y 1 ) = Z ∞ −∞ Z ∞ −∞ U 0 ( x 0 , y 0 ) − z iλr 2 e − ikz e − ik 2 z [ ( x 0 − x 1 ) 2 +( y 0 − y 1 ) 2 ] d x 0 d y 0 (4) This is a form that will be useful in analyzing the expo- nent. If we take the exponential term of Equation 4 and multiply out the exponent, we get: e − ik 2 z [ ( x 0 − x 1 ) 2 +( y 0 − y 1 ) 2 ] = e − ik 2 z [ x 2 0 + y 2 0 ] e − ik 2 z [ x 2 1 + y 2 1 ] e − ik 2 z [ x 0 x 1 + y 0 y 1 ] For an exact Fourier transform, we only want to have the e − ik z [ x 0 x 1 + y 0 y 1 ] term inside the integral. The e − ik 2 z [ x 2 1 + y 2 1 ] term can go outside the integral, b ut we still have to deal with e − ik 2 z [ x 2 0 + y 2 0 ] . Putting a lens behind the mask results in a phase shift giv en by: A ( x 0 , y 0 ) = e − ikn ∆ e ik 2 z [ x 2 0 + y 2 0 ] f is the focal length of the lens. This exactly cancels e − ik z [ x 2 0 + y 2 0 ] if z = f . The resulting field is gi ven by: U 1 ( x 1 , y 1 ) = e − ik 2 z [ x 2 1 + y 2 1 ] e − ikn ∆ − z iλr 2 Z ∞ −∞ Z ∞ −∞ U 0 ( x 0 , y 0 )e − ik z [ x 0 x 1 + y 0 y 1 ] d x 0 d y 0 (5) So, the focal plane of a lens behind a mask is an exact Fourier transform of the plane containing the mask. The inv erse transform can be performed in the same manner as the initial transform. This is due to the symmetry of the Fourier transform plane: U 0 ( x 0 , y 0 ) = U 0 ( − x 0 , − y 0 ) , allowing us to rev erse the sign in the exponent. Howe ver , there is still the factor of e − ik z [ x 0 x 1 + y 0 y 1 ] outside the integral that will create problems. Our next lens must cancel both e − ik z 1 [ x 0 x 1 + y 0 y 1 ] and e − ik z 2 [ x 0 x 1 + y 0 y 1 ] . T o do this, we must satisfy the equations: 1 z 1 − 1 z 2 = − 1 f 2 (6) Point-wise multiplying the field values after the first lens cause the term outside the integral to be squared: e ik z ( x 2 1 + y 2 1 ) . T o achiev e an exact inv erse Fourier transform, we must therefore satisfy the equation: 2 z 1 − 1 z 2 = − 1 f 2 (7) In general, the i th lens must be at a distance: z i = − 1 1 f i − n +1 z i − 1 (8) n is the number of multiplies done between steps i and i − 1 . Fig. 1. Lens configuration to produce an exact Fourier transform I I I . C RO P P I N G An experiment was done by simulation to explore the effect of different parameters on how much information is lost when a Fourier transform image is cropped. In this experiment, we put light in a checkerboard pattern through two lenses a distance 2 f apart and put the output screen a distance 2 f from the second lens. Though this is not an exact Fourier transform, it nev ertheless returns the original image. Fig. 2. Lens configuration to produce an inexact Fourier transform W e ran the experiment a number of times with different values of wav elength and focal length and compared the output images. The field at the second lens was cropped to different sizes. As expected, larger cropping sizes yielded a much clearer output image than the smaller ones. As wa velength decreased, the output image also became clearer . This verifies the expectation that a shorter wa velength results in a more compact Fourier transform. The output image improved for a shorter focal length, sho wing that a lar ge focal length contributes to the spreading out of a Fourier transform. These results are shown in the table below . I V . P O I N T - W I S E M U LT I P L I C A T I O N Performing a point-wise multiply optically remains a chal- lenge. W e must encode the digits of our numbers spatially in the amplitudes of the input field in order to obtain the Fourier transform. Howe ver , combining information contained x = 4 mm x = 3 mm x = 2 mm x = 1 mm Fig. 3. Ef fects of changing cropping size ( λ = 200 nm , f = 100 mm ) λ = 100 nm λ = 200 nm λ = 600 nm λ = 1200 nm Fig. 4. Ef fects of changing wav elength ( x = 2 mm , f = 100 mm ) in the amplitudes of two fields results in a sum offset by phase differences, not a product. A product could be obtained by passing light from one transformed input through an amplitude mask representing the other transformed input. For example, if the amplitude at a specific point on one of the inputs was 30% of the peak amplitude, it would correspond to a point on the mask that let 30% of the light through. If the corresponding point in the other input was 70% of the peak amplitude, after going through the mask it would be 21% of the peak, effecti vely representing 3 × 7 . The problem with this is that we would have to make a new mask for e very pair of inputs. It is feasible to use some photochromic material, exposed to the dif fraction pattern of a light source, to create such a mask during computation. Ho wever , this process is likely to be slow and would create a bottleneck in computation. If our goal is to raise a number to a power , it may be worthwhile to make a mask and then pass the input through f = 50 mm f = 100 mm f = 200 mm f = 400 mm Fig. 5. Ef fects of changing focal length ( x = 2 mm , λ = 200 nm ) many times. Howe ver , without the ability to do carries, the number of distinct integers stored in each point would quickly grow to exceed our ability to distinguish with a detector . Physically , passing the same ray of light through a mask many times will cause it to grow fainter and fainter until the signal is lost. Using phase differences to compute multiplication is another option. Howe ver , it would be a difficult task of itself to calculate how much to shift the phase in order to obtain the correct combined value. W e need to solve the equation: A e − ikt + B e − i ( kt + ϕ ) = AB e − ikt B e − i ( kt + ϕ ) = ( AB − A )e − ikt e − i ( kt + ϕ ) = A B ( B − 1)e − ikt − i ( k t + ϕ ) = ln  A B ( B − 1)e − ikt  ϕ = i ln  A B ( B − 1)  This inv olves a fair amount of computation that would need to be handled by an auxiliary computer , so it is easier to consider simply doing the point-wise multiply with a detector and electronic computer . The result of the multiply would then be encoded in light to be put through another lens to do the in verse Fourier transform. Phase information is essential for correctly performing an in verse Fourier transform, so the output beam would need to incorporate phase. V . E N C O D I N G D I G I T S There are several options for how to encode numbers into light patterns. The most obvious way is to set equally sized bands to different amplitudes. This works reasonably well, and the results of several multiplication problems are shown below . Note that the output plane is the mirror image of the answer , so digits must be read from right to left. It is possible to see 010010 011110 0011121110 010011 111010 01111321110 101011001000 101101001011 010212313313223122011000 Fig. 6. Results of multiplication using banded encoding 010010 011110 0011121110 010011 111010 01111321110 101011001000 101101001011 010212313313223122011000 Fig. 7. Results of multiplication using diagonal encoding the digits in the bands of the output, but it is rather dif ficult to tell where each digit starts and ends. The picture becomes much clearer when we utilize both dimensions of the plane and encode our digits diagonally . For the remainder of this paper , we will encode binary digits numerically . When we apply this technique to the modular multiplication problem below , we will add additional spacing between digits to prev ent adjacent digits from blurring together . V I . S I M U L A T I O N C O D E Much good work has already been done in creating simulators for Fourier optics [18], [19], [20]. Our simulator consists of approximately 1800 lines of C code compiled on a Linux workstation using gcc. It is parallelized using the OpenMP library to distribute the 2D conv olution workload over threads. Profiling with gprof demonstrates that the code spends at least 95% of its time computing the Fresnel approximation for lenses. Runtimes vary with the pix el count of the image. Running on 20 threads of a dual socket Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz node, the simulation time ranges from around 25 seconds per lens at 250 × 250 pixels resolution to 80 minutes per lens at 1000 × 1000 pixels resolution. The simulation is configured using an instruction file lists the devices to be modeled during each step of the execution. An example instruction file is shown in Figure 10 below . The code can maintain arbitrarily many optical paths that can be split and recombined over the course of the simulation. The instruction file also includes the ability to specify “tap” files to write out intermediate data values. The simulation results were validated by comparison with rectangular aperture results documented by Abedin [21]. V I I . M O D U L A R M U LT I P L IC AT I O N Modular multiplication in volves solving problems of the form c = a · b mod m for integer v alues of a , b and m . T ypically , the value of each of these will be of similar magnitude. What makes solving this problem difficult on modern computers is the fact that the modulo, or remainder, operation requires a division which is an expensi ve operation. Howe ver , for cases in which the modulus m is odd, Montgomery has proposed an alternate formulation of the problem that replaces di vision by m with division by an alternate base r that is chosen to be a power of two greater than m . For binary arithmetic, this change reduces the costly division problem to one of inexpensi ve masks and shifts [22], [23]. A. Montgomery multiplication Choose a v alue r = 2 k to be the next po wer of two greater than m and compute M = − m − 1 mo d r . Montgomery observed that when the v alues of a and b are transformed to new values ¯ a = a · r mo d m and ¯ b = b · r mo d m then we can calculate the corresponding value of ¯ c as follo ws: ¯ c = ¯ a ¯ b +  ¯ a ¯ bM mod r  m r (9) W e observe that for an arbitrary integer x , the value of x mo d r is the lower k bits of the binary representation of x while di vision by r is the upper | x | − r bits. T o recover the solution c one needs to conv ert the result back out of the Montgomery domain which is accomplished by setting c = ¯ c · R mo d m where R = r − 1 mo d m . The example belo w illustrates the Montgomery multiplication on 16-bit values of a , b and m . B. Montgomery example Compute a · b mo d m using a = 28510 , b = 38762 and m = 36057 . Seeing that a , b and m are all 16-bit values, choose r = 2 17 = 131072 and pre-compute M and R . M = − 36057 − 1 mo d 131072 = 52375 R = 131072 − 1 mo d 36057 = 14408 Next, conv ert a and b into the Montgomery basis. ¯ a = 28510 · 131072 mo d 36057 = 23411 ¯ b = 38762 · 131072 mo d 36057 = 31495 Now , the steps of the Montgomery multiplication proceed as shown below . k 1 = ¯ a ¯ b = 23411 · 31495 = 737329445 k 2 = k 1 · M = 737329445 · 52375 = 38617629681875 k 3 = k 2 mo d r = AND (38617629681875 , 131071) = 92371 k 4 = k 3 · m = 92371 · 36057 = 3330621147 k 5 = k 1 + k 4 = 737329445 + 3330621147 = 4067950592 ¯ c = k 5 /r = SHIFT (4067950592 , 17) = 31036 Con vert the result back out of the Montgomery domain. c = ¯ c · R mo d m = 31036 · 14408 mo d 36057 = 23831 we see that this is in fact the correct solution to our initial problem: 28510 · 38762 mo d 36057 = 23831 . Looking back at the calculation above, there are a few important simplifications which will make the problem easier to implement in optics. They will be discussed in the next section. C. Distinguishing high and low or der bits As described by W arren [23], a closer inspection of Equation 9 rev eals that the lower k bits of the numerator are not required, since they will be truncated in the division by r . Thus it is suf ficient to compute the high bits of each of the values in the numerator, with two additional bits to address carry . Additionally , the mo d r in the numerator will eliminate the high bits of ¯ a ¯ bM , so they do not need to be calculated either . T aking these observations into account, the example above now proceeds as follows. k 1 = ¯ a ¯ b = 23411 · 31495 = 737329445 k 1 lo = AND (737329445 , 131071) = 49455 k 1 hi = RSHIFT (737329445 , 15) = 22501 k 2 = k 1 lo · M = 49455 · 52375 = 2589681875 k 3 = k 2 mo d r = AND (2589681875 , 131071) = 92371 k 4 = k 3 · m = 92371 · 36057 = 3330621147 k 4 hi = SHIFT (3330621147 , 15) = 101642 k 5 hi = k 1 hi + k 4 hi + 1 = 22501 + 101642 + 1 = 124144 ¯ c = k 5 hi / 2 2 = SHIFT (124144 , 2) = 31036 Notice that k 1 hi and k 4 hi are being shifted by only 15 bits, which is two fewer bits than the size of r . Those two ov erlap bit pre vent carry errors from propagating from the less significant bits during addition. Howe ver , those two additional bits still need to be truncated in the final step when computing ¯ c . Also, note that when computing k 5 hi there is an extra one being added. This serves the purpose of setting the carry bit resulting from the addition of the less significant bits. Where this technique differs from the standard approach described by [23] is in the treatment of the addition to compute k 5 from the example above. W arren describes the addition as in volving both upper and lower bits of k 1 , while we hav e discarded all but a few “overlap” digits by the time we reach this step. This will be significant in optical implementation, described in the next section, because we must demonstrate the ability to perform this approximation not for a multiplication but for a con volution instead. D. Appr oximate implementation The simplifications of the previous section will form the basis of an optical approach to computing the modular multiplica- tion. Limiting the number of bits carried from one operation to the next reduce the optical resolution required to accurately compute a solution, thus enabling the computation of larger numbers of digits. The intuition behind this is that the fre- quency of the Fourier transform will scale with the number of digits in the optically encoded representation of each number , and the resolution required to accurately ev aluate the point- wise multiplication will be bounded by the Nyquist frequency and the size of the image in the transform plane. Also note that there are some important distinctions between Montgomery in its digital implementation and the new hybrid optical approach. The most obvious difference is the fact that “multiplications” in optics are not actually multiplications but con volutions, as illustrated in Figure 8. Con volution requires one fewer digit in its representation than multiplication. How- ev er , the value of the digits gro w beyond the magnitude of the base because the conv olution performs no carries from one digit to the next. The consequence is that the two “bits” of o verlap required previously to pre vent carry errors from propagating will now translate to a number of “digits” of ov erlap. Since those digits contain unresolved carries, more digits will be needed, b ut this is a complicated calculation and no formal upper bound has yet been established. For the purposes of the example in this paper we performed experiments to determine by trial-and-error that six digts would be a sufficient to prev ent carry errors. Fig. 8. Comparing multiplication on the left and con volution on the right in terms of the high and low k -digits of the result. When masking high order bits of the con volution operation, additional bits are necessary to capture the unpropagated carries from the low bits. Futher, the middle bit is counted as both a high and low bit, which will result in the shift of one additional digits when computing our approximate ¯ c belo w . In terms of the optical computation, this approach trades off resolution for increased dynamic range. It must also contend with the fact that shifting and masking does not translate directly to the results of conv olution arithmetic. For clarity , values are written as a vector of their binary digits. ¯ a = (1 , 0 , 1 , 1 , 0 , 1 , 1 , 0 , 1 , 1 , 1 , 0 , 0 , 1 , 1) ¯ b = (1 , 1 , 1 , 1 , 0 , 1 , 1 , 0 , 0 , 0 , 0 , 0 , 1 , 1 , 1) m = (1 , 0 , 0 , 0 , 1 , 1 , 0 , 0 , 1 , 1 , 0 , 1 , 1 , 0 , 0 , 1) M = (1 , 1 , 0 , 0 , 1 , 1 , 0 , 0 , 1 , 0 , 0 , 1 , 0 , 1 , 1 , 1) Now reproduce the steps from the preceding section as fol- lows, using six ov erlap digits instead of two. The notation x ⊗ y will denote the digit con volution of x and y , and x ⊕ y will denote the digit sum. k 1 = ¯ a ⊗ ¯ b = (1 , 1 , 2 , 3 , 2 , 4 , 4 , 3 , 5 , 4 , 4 , 5 , 4 , 4 , 6 , 6 , 5 , 3 , 3 , 4 , 3 , 2 , 3 , 2 , 1 , 1 , 2 , 2 , 1) = 737329445 k 1 hi = (1 , 1 , 2 , 3 , 2 , 4 , 4 , 3 , 5 , 4 , 4 , 5 , 4 , 4 , 6 , 6 , 5 , 3 , 3) = 720045 k 1 lo = (4 , 4 , 6 , 6 , 5 , 3 , 3 , 4 , 3 , 2 , 3 , 2 , 1 , 1 , 2 , 2 , 1) = 573733 k 2 = k 1 lo ⊗ M = (4 , 8 , 10 , 12 , 15 , 16 , 16 , 19 , 22 , 17 , 17 , 22 , 19 , 20 , 25 , 32 , 28 , 25 , 24 , 20 , 16 , 15 , 13 , 11 , 9 , 8 , 6 , 5 , 5 , 5 , 3 , 1) = 30049265875 k 3 = k 2 mo d r = (32 , 28 , 25 , 24 , 20 , 16 , 15 , 13 , 11 , 9 , 8 , 6 , 5 , 5 , 5 , 3 , 1) = 3762387 k 4 = k 3 ⊗ m = (32 , 28 , 25 , 24 , 52 , 76 , 68 , 62 , 87 , 105 , 92 , 115 , 133 , 114 , 102 , 121 , 100 , 86 , 79 , 66 , 51 , 43 , 37 , 30 , 23 , 19 , 14 , 9 , 6 , 5 , 3 , 1) = 135660388059 k 4 hi = (32 , 28 , 25 , 24 , 52 , 76 , 68 , 62 , 87 , 105 , 92 , 115 , 133 , 114 , 102 , 121 , 100 , 86 , 79 , 66 , 51 , 43) = 132480817 k 5 hi = k 1 hi ⊕ k 4 hi ⊕ 1 = (32 , 28 , 25 , 25 , 53 , 78 , 71 , 64 , 91 , 109 , 95 , 120 , 137 , 118 , 107 , 126 , 104 , 92 , 85 , 71 , 54 , 46) = 133200926 ¯ c =  k 5 hi / 2 7  =  1 4 , 7 32 , 25 128 , 25 128 , 53 128 , 39 64 , 71 128 , 1 2 , 91 128 , 109 128 , 95 128 , 15 16 , 137 128 , 59 64 , 107 128 , 63 64 , 13 16 , 23 32 , 85 128 , 71 128 , 27 64 , 23 64  = 1040632 This is a different v alue of ¯ c than obtained in the previous example. Howe ver , conv erting the result back out of the Montgomery domain with c = ¯ c · R mo d m = 1040632 · 14408 mo d 36057 = 23831 demonstrates that it has recov ered the correct value of c . The con volutional approach is an approximate modular multiplica- tion in the sense that it produces a result that will typically be an arbitrary multiple of the modulus. Notice that in the last step of computing ¯ c the last seven digits have been removed, despite the fact that the ov erlap was only six digits. That is because of the one fe wer digits resulting from con volution compared to multiplication. The most significant k digits of the result include one digits from the k least significant digits which must also be remov ed. E. An optical implementation The computation of 28510 · 38762 mo d 36057 = 23831 was simulated in optics using the simulator and con volutional algorithmic approach described abov e. Shifts and masks are accomplished through optical masking and alignment. Addi- tion is performed using a beam-splitter in rev erse to combine values. Multiplications by one are performed to make certain that each value being added has passed through the same number of multiplier steps. This is necessary to keep the phase information consistent across the summands. In general, the most significant bit (MSB) of a number is in the upper right hand corner of the image, and the least significant bit (LSB) is in the lower left hand corner . Howe ver , after each multiplication the resulting image is re versed. The reversal is denoted by an asterisk next to a value in the image. The Fig. 9. The device configuration of a modular multiplication that receiv es inputs ¯ a and ¯ b in the Montgomery domain and produces an output ¯ c , also in the Montgomery domain. Circled numbers represent locations where the image data is depicted in Figure 11 that follo ws. optical configuration to implement the modular multiplication is illustrated in Figure 9 abov e, and the instruction file to run the simulation is shown in Figure 10 below . In general, the point-wise multiplies are assumed to be performed using a spatial light modulator (SLM) which creates an image mask of the Fourier transform of one of the v alues to be multiplied. As discussed above, this step is generally considered to be slow , b ut it is important to point out that only one of the SLMs, corresponding to the Fourier transform of ¯ b actually needs to be updated in real time. The remaining transforms can be precomputed, and thus the setting of the SLM once initialized will remain fixed. The results of the computation are illustrated in Figure 11 that follows. Digit values printed below each image are the result /home/jtdaly3/Software/opmul/mod_mul_16bit/output /home/jtdaly3/Software/opmul/mod_mul_16bit/tap 0.000200 1005 #### Simulation: 28510 * 38672 mod 36057 = 23831 generate 15 67 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 generate 15 67 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 generate 15 67 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 generate 15 67 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 generate 15 67 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 generate 15 67 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 generate 15 67 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 generate 15 67 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 tap lens 1 15.000000 -15.000000 1.500000 2.000000 15.000000 1.000000 1.000000 lens 2 15.000000 -15.000000 1.500000 2.000000 15.000000 1.000000 1.000000 tap pointwise_mul 1 2 lens 1 5.000000 -5.000000 1.500000 2.000000 15.000000 1.000000 1.000000 tap beam_splitter 1 mask 1 -0.500000 0.164179 0.164179 -0.500000 mask 2 -0.044776 0.500000 0.500000 -0.044776 tap lens 2 7.500000 -7.500000 1.500000 2.000000 15.000000 1.000000 1.000000 lens 3 15.000000 -15.000000 1.500000 2.000000 15.000000 1.000000 1.000000 tap pointwise_mul 2 3 lens 2 5.000000 -5.000000 1.500000 2.000000 15.000000 1.000000 1.000000 tap mask 2 -0.500000 0.029851 0.029851 -0.500000 tap lens 2 7.500000 -7.500000 1.500000 2.000000 15.000000 1.000000 1.000000 lens 3 15.000000 -15.000000 1.500000 2.000000 15.000000 1.000000 1.000000 tap pointwise_mul 2 3 lens 2 5.000000 -5.000000 1.500000 2.000000 15.000000 1.000000 1.000000 tap mask 2 -0.500000 0.179104 0.179104 -0.500000 tap lens 3 15.000000 -15.000000 1.500000 2.000000 15.000000 1.000000 1.000000 lens 4 15.000000 -15.000000 1.500000 2.000000 15.000000 1.000000 1.000000 tap pointwise_mul 3 4 lens 3 5.000000 -5.000000 1.500000 2.000000 15.000000 1.000000 1.000000 tap filter 3 0.5 tap pointwise_add 1 3 tap lens 1 7.500000 -7.500000 1.500000 2.000000 15.000000 1.000000 1.000000 lens 3 15.000000 -15.000000 1.500000 2.000000 15.000000 1.000000 1.000000 tap pointwise_mul 1 3 lens 1 5.000000 -5.000000 1.500000 2.000000 15.000000 1.000000 1.000000 tap lens 1 7.500000 -7.500000 1.500000 2.000000 15.000000 1.000000 1.000000 lens 3 15.000000 -15.000000 1.500000 2.000000 15.000000 1.000000 1.000000 tap pointwise_mul 1 3 lens 1 5.000000 -5.000000 1.500000 2.000000 15.000000 1.000000 1.000000 tap pointwise_add 1 2 tap lens 1 7.500000 -7.500000 1.500000 2.000000 15.000000 1.000000 1.000000 lens 1 7.500000 -7.500000 1.500000 2.000000 15.000000 1.000000 1.000000 tap read_out 1 detector 1 12.000000 67 Fig. 10. Instruction file to simulate 16-bit modular multiplication illustrated in Figure 9 at 1 µ m pixel pitch and 15 mm separation between lenses. of integrating the light intensity over the area of each digit and normalizing. The results are then rounded to the nearest integer . The maximum absolute error is 0 . 3419 and the RMS error is 0 . 1442 , meaning that the analog result matches the correct integer solution. The main sources of error were found to be resolution and dynamic range. Specifically , the resolution in the Fourier transform plane needs to be sufficient for the pixel density to be at least twice the Nyquist frequenc y of the transform. This simulation used a 1 mm imaging area at each point-wise multiply and assumed a minimum pixel pitch of 1 µ m × 1 µ m. The dynamic range was assumed to be at least 12 stops (i.e., bits) or approximately 36 dB. The values F ( M ∗ ) , F ( m ) , F (1 ∗ ) and F (1) are precomputed inputs to the point-wise multiply unit. The performance bottleneck for this de vice will be the point-wise multiply by F ( ¯ b ) which must be performed in real time with each modular multiplica- tion. V I I I . C O N C L U S I O N S Optical techniques seem to be very promising in the future of high-performance computing. The most significant area of future work will be to explore ways of performing a point- wise multiply optically . If that can be done, this method may provide huge speed-ups ov er current technology . Even if the point-wise multiply must be done with an auxiliary computer , the computation will be O ( n ) instead of O ( n 2 ) thanks to the nearly instantaneous Fourier transform achiev ed optically . If this path proves to be optimal, more work would need to be done to decrease the load for the electronic computer . A lookup table approach is possible, as are methods of keeping field va lues at whole numbers. Other future work might include finding the limits of ho w much information can be packed into a given area, how small the Fourier transform plane can be cropped, and how many operations can be done in series before the signal becomes unreadable. Based on our simulation results it also appears feasible for an architecture similar to the one analyzed to accurately perform all-optical modular multiplication with greater precision than has been traditionally associated with analog computation [2]. Giv en our estimates of parameters such as size, resolution and dynamic range, such a device could be manufacturable from commercially available components. If these simulation results bear out in the lab then the ability to accurately perform 16-bit arithmetic could be a significant step forward for optical computation. A further contrib ution of this work is the demonstration of an approximate algorithm, implementable in optics, that is capable of correctly computing discrete modular multiplication. Future work should include further refinement to the simulation, optimization of the de vice configurations, particularly with re gards to the lenses, and then fabrication and characterization of an actual device. R E F E R E N C E S [1] Alexander A. Sawchuk and Timothy C. Strand, “Digital Optical Com- puting, ” Proceedings of the IEEE, 72 , 7 (1984). [2] Demetri Psaltis and Ravindra A. Athale, “High accuracy computation with linear analog optical systems: a critical study , ” Applied Optics, 25 , 18 (1986). [3] Dror G. Feitelson, Optical Computing: a Survey for Computer Scientists, MIT Press, Cambridge, MA, USA. (1988). [4] Joe T ouch and Abdel-Hameed Badawy and V olk er J. Sorger , “Optical computing, ” Nanophotonics, 6 , 3 (2017), Retrieved 2 Nov . 2017, from doi:10.1515/nanoph-2016-0185. [5] “Samsung Starts Industry’s First Mass Production of System-on-Chip with 10-Nanometer FinFET T echnology , ” N.p, n.d. W eb, (2017), https://news.samsung.com/global/samsung-starts-industrys-first-mass- production-of-system-on-chip-with-10-nanometer-finfet-technology . [6] J. Hasler and S. Shah, “Reconfigurable Analog PDE computation for Baseband and RF Computation, ” GOMAC, March, 2017. [7] A. Femius K oenderink and Andrea Al ` u and Albert Polman, “Nanopho- tonics: Shrinking light-based technology , ” Science, 348 , 6234 (2015). [8] “Optalysys White Paper , ” N.p, n.d. W eb, 23 February 2015, https://www . scribd.com/document/308816109/Optalysys-White-Paper . [9] IEEE Rebooting Computing, http://rebootingcomputing.ieee.org/. [10] A. Sch ¨ onhage and V . Strassen, “Schnelle Multiplikation großer Zahlen, ” Computing, 7 , 3 (1971). [11] M. F ¨ urer , “Faster Integer Multiplication, ” Proceedings of the 39 th Annual ACM Symposium on Theory of Computing, June, 2007. [12] V inod Chandran and Thomas F . Krile and John F . W alkup, “Optical techniques for real-time binary multiplication, ” Applied Optics, 25 , 24 (1986). [13] Peter Bryanston-Cross, “The lens as a Fourier Transform System. ” N.p, n.d. W eb . Optical Engineering Laboratory , Univ ersity of W arwick, 26 November 2002, http://www .eng.warwick.ac.uk/ ˜ espbc/courses/undergrad/ lec8/fourier lens.htm. [14] Nathan T . Shaked, T al T abib, Gil Simon and Joseph Rosen, “Optical binary-matrix synthesis for solving bounded NP-complete combinatorial problems, ” Optical Engineering 46 , 10 (2007). [15] Roel Baets and Hugo Theinpont, “Fourier Optics, ”’ N.p, n.d. W eb, 3 July 2017, http://www .photonics.intec.ugent.be/download/ocs130.pdf. [16] Joseph W . Goodman, Intr oduction to F ourier Optics, The McGraw-Hill Companies, Inc, Ne w Y ork, NY , USA, (1996). [17] M.K. Kim, Digital Holographic Micr oscopy: Principles, T echniques, and Applications, http://www .springer .com/978-1-4419-7792-2, XVI (2011). [18] Ahmed Louri and Jongwhoa Na, “Modeling and simulation methodol- ogy for digital optical computing systems, ” Applied Optics, 33 , 8 (1994). [19] Seymour Trester , “Computer-Simulated Fresnel Diffraction Using the Fourier T ransform, ” IEEE Computing in Science and Engineering, 1 , 5 (1999). [20] D. Rafferty and U.H. W agner and C.Rau and P .Chang and S.Alcock and R. Dockree and I.R. Robinson, N.p, n.d. W eb, “Development of a computer model to simulate wa vefront propagation, ” (2010), http://www .ucl.ac.uk/ ˜ ucapikr/project.htm. [21] K.M. Abedin, M.R. Islam and A.F .M.Y Haider. “Computer simulation of Fresnel dif fraction from rectangular apertures and obstacles using the Fresnel integrals approach, ” Optics and Laser T echnolo gy, 39 (2007). [22] Peter L. Montgomery , “Modular multiplication without trial division, ” Mathematics of Computation, 44 (1985). [23] Henry W arren, “Montgomery Multiplication, ” N.p, n.d. W eb, http://www . hackersdelight.or g/MontgomeryMultiplication.pdf. #1 (1,0,1,1,0,1,1,0,1,1,1,0,0,1,1) #2 (1,1,1,1,0,1,1,0,0,0,0,0,1,1,1) #3 (1,1,2,3,2,4,4,3,5,4,4,5,4,4,6, 6,5,3,3,4,3,2,3,2,1,1,2,2,1)* #4 (1,1,2,3,2,4,4,3,5,4,4,5,4,4,6,6,5,3,3)* #5 (1)* #6 (1,1,2,3,2,4,4,3,5,4,4,5,5,4,6,6,5,3,3)* #7 (1,1,2,3,2,4,4,3,5,4,4,5,5,4,6,6,5,3,3)* #8 (4,4,6,6,5,3,3,4,3,2,3,2,1,1,2,2,1)* #9 (4,8,10,12,15,16,16,19,22,17,17, 22,19,20,25,32,28,25,24,20,16, 15,13,11,9,8,6,5,5,5,3,1) #10 (32,28,25,24,20,16,15,13,11,9,8,6,5,5,5,3,1) #11 (32,28,25,24,52,76,68,62,87,105,92, 115,133,114,102,121,100,86,79,66, 51,43,37,30,23,19,14,9,6,5,3,1)* #12 (32,28,25,24,52,76,68,62,87,105,92,115, 133,114,102,121,100,86,79,66,51,43)* #13 (32,28,25,25,53,78,71,64,91,109,95,120,137,118,107,126,104,92,85,71,54,46) Fig. 11. Images depict the digit encoding pattern after enumerated steps of the optical modular multiplication illustrated in Figure 9. V ector v alues are the integrated optical intensity for each digit. Those values are normalized and rounded to the nearest integer . An asterisk after a v ector denotes values that will appear in re verse order in its corresponding image.

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment