Digital Electronics and Analog Photonics for Convolutional Neural Networks (DEAP-CNNs)
Viraj Bangari,∗ Bicky A. Marquez, Heidi B. Miller, and Bhavin J. Shastri†
Department of Physics, Engineering Physics & Astronomy, Queen's University, Kingston, ON K7L 3N6, Canada

Alexander N. Tait
National Institute of Standards and Technology (NIST), Boulder, Colorado 80305, USA

Mitchell A. Nahmias, Thomas Ferreira de Lima, Hsuan-Tung Peng, and Paul R. Prucnal
Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA

(Dated: July 3, 2019)

Convolutional Neural Networks (CNNs) are powerful and highly ubiquitous tools for extracting features from large datasets for applications such as computer vision and natural language processing. However, a convolution is a computationally expensive operation in digital electronics. In contrast, neuromorphic photonic systems, which have experienced a surge of interest over the last few years, promise higher bandwidths and energy efficiencies for neural network training and inference. Neuromorphic photonics exploits the advantages of optical electronics, including the ease of analog processing and the busing of multiple signals on a single waveguide at the speed of light. Here, we propose a Digital Electronic and Analog Photonic (DEAP) CNN hardware architecture that has the potential to be 2.8 to 14 times faster while maintaining the same power usage as current state-of-the-art GPUs.

I. INTRODUCTION

The success of CNNs for large-scale image recognition has stimulated research in developing faster and more accurate algorithms for their use. However, CNNs are computationally intensive and therefore incur long processing latency. One of the primary bottlenecks is computing the matrix multiplication required for forward propagation. In fact, over 80% of the total processing time is spent on the convolution [1].
Therefore, techniques that improve the efficiency of even forward-only propagation are in high demand and are researched extensively [2, 3]. In this work, we present a complete digital electronic and analog photonic (DEAP) architecture capable of performing highly efficient CNNs for image recognition. The competitive MNIST handwriting dataset [4] is used as a benchmark test for our DEAP CNN. We first train a standard two-layer CNN offline, after which the network parameters are uploaded to the DEAP CNN. Our scope is limited to forward propagation, but includes power and speed analyses of our proposed architecture.

Due to their speed and energy efficiency, photonic neural networks have been widely investigated through approaches that can be grouped into three categories: (1) reservoir computing [5-8], and reconfigurable architectures based on (2) ring resonators [9-12] and (3) Mach-Zehnder interferometers [13, 14]. Reservoir computing in the discrete photonic domain successfully implements neural networks for fast information processing; however, the predefined random weights of their hidden layers cannot be modified [8].

An alternative approach uses silicon photonics to design fully programmable neural networks [15], using a so-called broadcast-and-weight protocol [10-12]. This protocol is capable of implementing reconfigurable, recurrent and feedforward neural network models, using a bank of tunable silicon microring resonators (MRRs) that recreate on-chip synaptic weights; such a protocol therefore allows it to emulate physical neurons. Mach-Zehnder interferometers have also been used to model synaptic-like connections of physical neurons [14]. The advantage of the former approach over the latter is that it has already demonstrated fan-in, inhibition, time-resolved processing, and autaptic cascadability [12].

∗ viraj.bangari@queensu.ca
† shastri@ieee.org
The DEAP CNN design is therefore compatible with mainstream silicon photonic device platforms. This approach leverages the advances in silicon photonics, which have recently progressed to the level of sophistication required for large-scale integration. Furthermore, the proposed architecture allows the implementation of multi-layer networks, enabling the deep learning framework.

Inspired by the work of Mehrabian et al. [16], which lays out a potential architecture for photonic CNNs with DRAM, buffers, and microring resonators, our design goes a step further by considering a specific input representation, as well as an example of how an algorithm for tasks such as MNIST handwritten digit recognition can be mapped to photonics. Moreover, we consider the summation of multi-channel inputs, multi-dimensional kernels, the limitation of weights to values between 0 and 1, and the architecture for the depth of kernels or inputs.

This work is divided into five sections. Following this introduction, in section (II) we describe convolutions as used in the field of signal processing, and then introduce the silicon photonic devices used to perform convolutions in photonics. Section (III) introduces a hardware-inspired algorithm to perform such fully photonic convolutions. In Section (IV), we utilize our previously described architecture to build a two-layer DEAP CNN for MNIST handwritten digit recognition. Finally, in section (V), we show an energy-speed benchmark test, where we compare the performance of DEAP against the empirical dataset DeepBench [17]. Note that we have made the high-level simulator and mapping tool for the DEAP architecture publicly available [18].

II. CONVOLUTIONS AND PHOTONICS
II.1. Convolutions Background

A convolution of two discrete-domain functions f and g is defined by:

(f ∗ g)[t] = \sum_{\tau=-\infty}^{\infty} f[\tau]\, g[t − \tau],   (1)

where (f ∗ g) represents a weighted average of the function f[\tau] when it is weighted by g[−\tau] shifted by t. The weighting function g[−\tau] emphasizes different parts of the input function f[\tau] as t changes.

In digital image processing, a similar process is followed. The convolution of an image A with a kernel F produces a convolved image O. An image is represented as a matrix of numbers with dimensionality H × W, where H and W are the height and width of the image, respectively. Each element of the matrix represents the intensity of a pixel at that particular spatial location. A kernel is a matrix of real numbers with dimensionality R × R. The value of a particular convolved pixel is defined by:

O_{i,j} = \sum_{k=1}^{R} \sum_{l=1}^{R} F_{k,l} A_{i+k,j+l}.   (2)

Using matrix slicing notation, Eq. (2) can be represented as a dot product of two vectorized matrices:

O_{i,j} = vec(F)^T · vec((A_{m,n})_{m∈[i,i+R], n∈[j,j+R]})^T.   (3)

A convolution reduces the dimensionality of the input image to (H − R + 1) × (W − R + 1), so a padding of zero values is normally applied around the edges of the input image to counteract this. A schematic illustration of a convolution in digital image processing is shown at the top of Fig. 1.

When convolutions are used to perform parallel matrix multiplications in neural networks such as CNNs, the convolution operation is defined as:

O_{i,j} = vec(F)^T · vec((A_{m,n,k})_{m∈[iS,iS+R], n∈[jS,jS+R], k∈[1,D]})^T,   (4)

where the input A has dimensionality H × W × D, the kernel F has dimensionality R × R × D, and D refers to the number of channels within the input image.
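As a sanity check of Eqs. (2) and (3), a convolved pixel computed as a double sum matches the dot product of the two vectorized matrices. A minimal sketch (assuming NumPy; the array and helper names are ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, R = 6, 6, 3
A = rng.random((H, W))   # input image
F = rng.random((R, R))   # kernel

def convolved_pixel_sum(A, F, i, j):
    # Eq. (2): double sum over the kernel window (0-based indexing)
    R = F.shape[0]
    return sum(F[k, l] * A[i + k, j + l] for k in range(R) for l in range(R))

def convolved_pixel_dot(A, F, i, j):
    # Eq. (3): dot product of the vectorized kernel and image patch
    R = F.shape[0]
    patch = A[i:i + R, j:j + R]
    return F.ravel() @ patch.ravel()

assert np.isclose(convolved_pixel_sum(A, F, 1, 2), convolved_pixel_dot(A, F, 1, 2))
```

Both routines visit the same R² products; the second merely exposes the convolution as the vector dot product that the photonic hardware later implements.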
Figure 1. Schematic illustration of a convolution. At the top of the figure, an input image is represented as a matrix of numbers with dimensionality H × W × D, where H, W and D are the height, width and depth of the image, respectively. Each element A_{i,j} of A represents the intensity of a pixel at that particular spatial location. The kernel F is a matrix with dimensionality R × R × D, where each element F_{i,j} is a real number. The kernel is slid over the image using a stride S equal to one. As the image has multiple channels (or depth) D, the same kernel is applied to each channel. Assuming H = W, the overall output dimensionality is (H − R + 1)². The bottom of the figure shows how a convolution operation is generalized into a single matrix-matrix multiplication, where the kernel F is transformed into a vector F with DR² elements, and the image A is transformed into a matrix A of dimensionality DR² × (H − R + 1)².
Therefore, the output is represented by a vector with (H − R + 1)² elements.

Table I. Summary of convolutional parameters
Parameter   Meaning
N           Number of input images
H           Height of input image including padding
W           Width of input image including padding
D           Number of input channels
R           Edge length of kernel
K           Number of kernels
S           Stride

The additional parameter S is referred to as the "stride" of the convolution. This convolution is similar to Eq. (3), except that the outputs from each channel are summed together in the end, and that the stride parameter is always equal to 1 in image processing. The dimensionality of the output feature is:

\lceil (H − R)/S + 1 \rceil × \lceil (W − R)/S + 1 \rceil × K,   (5)

where K is the number of different kernels applied to the image, and ⌈·⌉ is the ceiling function. Table (I) contains a summary of all the convolutional parameters described so far.

One of the challenges with convolutions is that they are computationally intensive operations, taking up 86% to 94% of execution time for CNNs [1]. For heavy workloads, convolutions are typically run on graphics processing units (GPUs), as they are able to perform many mathematical operations in parallel. A GPU is a specialized hardware unit capable of performing a single mathematical operation on large amounts of data at once. This parallelization allows GPUs to compute matrix-matrix multiplications at speeds much higher than a CPU [19]. The convolution operation can be generalized into a single matrix-matrix multiplication [20]. This is shown at the bottom of Fig. 1, where the kernel F is transformed into a matrix of dimensionality K × DR², and the image is transformed into a matrix A of dimensionality DR² × ⌈(H − R)/S + 1⌉⌈(W − R)/S + 1⌉. Therefore, the output is represented by a matrix with ⌈(H − R)/S + 1⌉⌈(W − R)/S + 1⌉K elements; in the particular case of Fig. 1, K = 1, S = 1 and H = W.
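The matrix-multiplication view at the bottom of Fig. 1 can be sketched with a minimal im2col construction (assuming NumPy, with K = 1 and S = 1; the helper name is ours):

```python
import numpy as np

rng = np.random.default_rng(1)
H, W, D, R = 5, 5, 2, 3
A = rng.random((H, W, D))   # multi-channel input image
F = rng.random((R, R, D))   # single kernel (K = 1)

def im2col(A, R):
    # Each column is one vectorized R x R x D patch: shape (D*R^2, (H-R+1)^2)
    H, W, D = A.shape
    cols = [A[i:i + R, j:j + R, :].ravel()
            for i in range(H - R + 1) for j in range(W - R + 1)]
    return np.stack(cols, axis=1)

# One matrix-vector product performs the entire convolution
out = F.ravel() @ im2col(A, R)
assert out.shape == ((H - R + 1) ** 2,)
assert np.isclose(out[0], np.sum(F * A[:R, :R, :]))  # first convolved pixel
```

The `ravel` calls use the same (row, column, channel) ordering for kernel and patch, so each entry of `out` is exactly the channel-summed dot product of Eq. (4).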
II.2. Silicon Photonics Background

An emerging alternative to GPU computing is optical computing using silicon photonics for ultrafast information processing. Silicon photonics is a technology that allows for the implementation of photonic circuits using the existing complementary metal-oxide-semiconductor (CMOS) platform for electronics [21]. In recent years, the silicon-photonic broadcast-and-weight architecture has been shown to perform multiply-accumulate operations at frequencies up to five times faster than conventional electronics [22]. Therefore, there is motivation to explore how photonics can be used to perform convolutions, and how it compares to GPU-based implementations.

MRRs are the essential devices of our approach. An MRR is a circular waveguide coupled to either one or two waveguides. Such silicon waveguides can be manufactured with a width of 500 nm and a thickness of 220 nm. These waveguides have a bend radius of 5 µm and can support TE- and TM-polarized wavelengths between 1.5 µm and 1.6 µm [21]. The single-waveguide configuration is called an all-pass MRR; see Fig. 2(a). Light from the waveguide is transferred into the ring via a directional coupler and then recombined. The effective index of refraction between the waveguide and the MRR, together with the circumference of the MRR, causes the recombined wave to acquire a phase shift, thereby interfering with the intensity of the original light. The transfer function relating the intensity of the light exiting the through port to the light entering the input port of the all-pass resonator is:

T_n(\phi) = \frac{a^2 − 2ra\cos(\phi) + r^2}{1 − 2ra\cos(\phi) + (ra)^2}.   (6)

Figure 2. (a) All-pass MRR and (b) its transfer function: the orange curve represents the Lorentzian line shape described by Eq. (6), centered at the initial phase where the MRR is in resonance with the incoming light. The blue triangle curve shows how this phase can be modified by heating the MRR via the application of a current proportional to A_i.

The parameter r is the self-coupling coefficient, and a defines the propagation loss from the ring and the directional coupler. The phase \phi depends on the wavelength \lambda of the light and the radius d of the MRR [23]:

\phi = \frac{4\pi^2 d\, n_{eff}}{\lambda},   (7)

where n_{eff} is the effective index of refraction between the ring and the waveguide. The value of n_{eff} can be modified to indirectly shift the resonance peak. Such tuning is usually performed by applying a current to the ring proportional to the variable A_i. This process heats the ring, yielding a shift of the resonance peak. Figure 2(b) shows an example of such tuning: the orange curve represents the Lorentzian line shape described by Eq. (6), centered at the initial phase of the ring resonator, indicating that the MRR is in resonance with the incoming light. The blue triangle curve shows how this phase can be modified by heating the MRR. The phase of an all-pass resonator corresponding to a particular intensity modulation value can be computed by inverting Eq. (6):

\phi_i = \arccos\left( \frac{a^2 + r^2 − A_i(1 + (ar)^2)}{2ra(1 − A_i)} \right),   (8)

resulting in a modulated intensity equal to A_i:

I_{mod} = T_n(\phi_i)\,|E_0|^2 = A_i,   (9)

where E_0 is the amplitude of the electric field. An alternative double-waveguide configuration is called the add-drop MRR. The transfer function of the through-port light intensity with respect to the input light is:

T_p(\phi) = \frac{(ar)^2 − 2r^2 a\cos(\phi) + r^2}{1 − 2r^2 a\cos(\phi) + (r^2 a)^2};   (10)

and the transfer function of the drop-port light intensity with respect to the input light is:

T_d(\phi) = \frac{(1 − r^2)^2 a}{1 − 2r^2 a\cos(\phi) + (r^2 a)^2}.   (11)
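The modulation scheme of Eqs. (6)-(9) can be checked numerically. A small sketch (assuming NumPy; we reuse the paper's device values r = a = 0.99) solves Eq. (6) for the phase that yields a target intensity:

```python
import numpy as np

r = a = 0.99  # self-coupling coefficient and loss (values used in the paper)

def T_n(phi):
    # Eq. (6): all-pass MRR intensity transfer function
    return (a**2 - 2*r*a*np.cos(phi) + r**2) / (1 - 2*r*a*np.cos(phi) + (r*a)**2)

def phase_for_intensity(A_i):
    # Invert Eq. (6) for the phase giving modulated intensity A_i;
    # clip guards against floating-point overshoot at the band edges
    cos_phi = (a**2 + r**2 - A_i*(1 + (a*r)**2)) / (2*r*a*(1 - A_i))
    return np.arccos(np.clip(cos_phi, -1.0, 1.0))

A_i = 0.5
phi = phase_for_intensity(A_i)
# Eq. (9): I_mod = T_n(phi_i)|E_0|^2 = A_i (taking |E_0|^2 = 1)
assert np.isclose(T_n(phi), A_i)
```

The round trip confirms that heating the ring to the computed phase reproduces the desired normalized pixel intensity.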
Figure 3. (a) Add-drop configuration with O/E conversion and amplification. (b) Output of the balanced photodiode: the transfer function T_p − T_d. The orange circle and green curves are the drop and through ports, described by Eqs. (11) and (10), respectively. In panels (c) and (d), the phase-shifted (\phi + 0.2) blue curves show positive and negative kernel values from the drop and through outputs, respectively. The orange triangle curves show how those values can be amplified by a factor of two using a TIA at the output of the balanced photodiode. The phase shifts are achieved by the application of a current proportional to A_i.

In the case where the coupling losses are negligible, a ≈ 1, the relationship between the add-drop through and drop transfer functions is T_p = 1 − T_d. In addition, if we connect the through and drop ports to a balanced photodiode and TIA as in Fig. 3(a), we get an effective transfer function of g(T_p − T_d), where g is the gain of the TIA. Therefore, we get a modulation of:

I_{mod} = g\left(T_p(\phi_i) − T_d(\phi_i)\right)|E_0|^2 = A_i.   (12)

At the output of the balanced photodiode, the transfer function T_p − T_d is shown by the blue triangle curve in Fig. 3(b). The orange circle and green curves are Lorentzian line shapes, centered at the initial phase where the MRR is in resonance with the incoming light, described by Eqs. (11) and (10), respectively. In contrast, Figs. 3(c) and (d) are centered at a modified phase (\phi + 0.2), corresponding to a specific value of the current A_i. Here we aim to demonstrate how to represent positive and negative kernel values in analog photonics. This can be achieved by incorporating a balanced PD at the output of the add-drop MRR.
In panels (c) and (d), the blue curves show such positive and negative kernel values from the drop and through outputs, respectively. The orange triangle curves show the TIA transfer function g(T_p − T_d), where g amplifies T_p − T_d by a factor of two.

II.3. Dot Products with Photonics

The fundamental operation of a convolution is the dot product of two vectorized matrices. Therefore, one needs to understand how to compute a vector dot product using photonics before proposing an architecture capable of performing convolutions.

A wavelength-multiplexed signal consists of k electromagnetic waves, each with angular frequency \omega_i, i = 1, …, k. If it is assumed that each wave has an amplitude of E_0 and a power enveloping function \mu_i whose modulation frequency is significantly smaller than \omega_i, then the slowly varying envelope approximation and a short-time Fourier transform can be used to derive an expression for the multiplexed signal in the frequency domain:

E_{mux}(\omega) = \sum_{i=1}^{k} E_0 \sqrt{\mu_i}\, \delta(\omega − \omega_i),   (13)

where \delta(\omega − \omega_i) is the Dirac delta function and \mu_i ≥ 0, since power envelopes are non-negative. If the enveloping function is prevented from amplifying the electric field, \mu_i can further be restricted to the domain 0 ≤ \mu_i ≤ 1. Next, we introduce tunable linear filters H^+(\omega) and H^−(\omega) such that, when they interact with the multiplexed field, the following weighted signals are created:

E_w^−(\omega) = H^−(\omega) E_{mux}(\omega), \quad E_w^+(\omega) = H^+(\omega) E_{mux}(\omega).   (14)

Assuming that the two signals are fed into a balanced photodiode (balanced PD) with spectral response R(\omega), the induced photocurrent is described by:

i_{PD} = \int_{-\infty}^{\infty} d\omega\, R(\omega) \left( |E_w^+(\omega)|^2 − |E_w^−(\omega)|^2 \right)
       = \int_{-\infty}^{\infty} d\omega\, R(\omega) \left( |H^+(\omega)|^2 − |H^−(\omega)|^2 \right) |E_{mux}(\omega)|^2
       = \sum_{i=1}^{k} R(\omega_i) \left( |H^+(\omega_i)|^2 − |H^−(\omega_i)|^2 \right) E_0 \mu_i.   (15)
Assuming that R(\omega) is roughly constant over the spectral region of interest, one can set A_i = E_0 R_0 \mu_i and F_i^* = |H^+(\omega_i)|^2 − |H^−(\omega_i)|^2, resulting in a photocurrent equal to:

i_{PD} = \sum_{i=1}^{k} A_i F_i^* = \vec{A} \cdot \vec{F}^*.   (16)

The through and drop ports of an MRR can be used to implement the linear filters H^+ and H^− such that |H^+|^2 = T_d and |H^−|^2 = T_p. Knowing that T_p = 1 − T_d with minimal losses, we can set a particular weight using:

F_i^* = 2T_d(\phi_i) − 1,   (17)

where the phase \phi_i can be obtained from Eqs. (10) and (11):

\phi_i = \arccos\left( −\frac{1}{2r^2 a}\left[ \frac{2(1 − r^2)^2 a}{F_i^* + 1} − 1 − (r^2 a)^2 \right] \right).   (18)

Figure 4. An electro-optic architecture that performs dot products. A_i (i = 1, …, k) are input elements encoded in intensities, multiplexed by a WDM and linked to the weight banks via a silicon waveguide. F_i are filter values that modulate the MRRs in the PWB. The drop and through output ports are connected to a balanced PD, where the multiplication is performed, followed by a TIA.

Since T_d is a filter that only represents values between 0 and 1, F_i^* can range between −1 and 1. In order to perform a dot product with a weight vector \vec{F} whose components are not limited to the range −1 to 1, a gain g_{TIA} can be applied to the photocurrent such that:

\vec{A} \cdot \vec{F} = g_{TIA}\, \vec{A} \cdot \vec{F}^* = g_{TIA} \sum_{i=1}^{k} A_i F_i^*;   (19)

if

g_{TIA} = \max_{1 \le i \le k} |F_i|,   (20)

then

\vec{F} = g_{TIA} \vec{F}^*,   (21)

assuming that each \phi_i corresponds to a weighting of F_i^*. This electronic gain can be implemented using a transimpedance amplifier (TIA), which can be manufactured in a standard CMOS process [24] and packaged or integrated with the photonic chip [21].
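Eqs. (16)-(21) can be combined into a small end-to-end sketch of a photonic weight-bank dot product (assuming NumPy and lossless rings, a = 1, so that T_p = 1 − T_d holds exactly; the test vectors and helper names are illustrative):

```python
import numpy as np

r, a = 0.99, 1.0  # self-coupling and (negligible) ring loss

def T_d(phi):
    # Drop-port transfer of an add-drop MRR with equal couplers (Eq. 11)
    return ((1 - r**2)**2 * a) / (1 - 2*r**2*a*np.cos(phi) + (r**2*a)**2)

def phase_for_weight(F_star):
    # Invert F* = 2*T_d(phi) - 1 (Eqs. 17-18) for the MRR phase
    Td_target = (F_star + 1) / 2
    cos_phi = (1 + (r**2*a)**2 - (1 - r**2)**2*a/Td_target) / (2*r**2*a)
    return np.arccos(np.clip(cos_phi, -1.0, 1.0))

def pwb_dot(A_vec, F_vec):
    # Photonic weight bank: normalize weights into [-1, 1], set each MRR
    # phase, sum at the balanced PD, then restore the scale with the TIA gain
    g_tia = np.max(np.abs(F_vec))           # Eq. (20)
    F_star = F_vec / g_tia                  # Eq. (21)
    weights = np.array([2*T_d(phase_for_weight(f)) - 1 for f in F_star])
    return g_tia * np.dot(A_vec, weights)   # Eqs. (16) and (19)

A_vec = np.array([0.2, 0.8, 0.5])    # non-negative input intensities
F_vec = np.array([1.5, -0.7, 0.3])   # signed kernel values
assert np.isclose(pwb_dot(A_vec, F_vec), np.dot(A_vec, F_vec))
```

The sketch confirms that phase-tuned drop-port weights plus an electronic gain reproduce an ordinary signed dot product.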
A diagram of the electro-optic architecture described in this section is presented in Fig. 4. From now on, this amalgamation of electronic and optical components is referred to as a photonic weight bank (PWB). PWBs similar to the one in Fig. 4 have been successfully implemented in the past [11, 25, 26].

We can represent negative inputs between −1 and 1 by modifying the power enveloping function to \mu_i = \frac{1}{2}(x_i + 1). If the same set of derivations is followed, we can modify Eq. (21) to be:

\vec{x} \cdot \vec{F} = g\left( \sum_{i=1}^{k} A_i F_i^* + \sum_{i=1}^{k} E_0 R_0 F_i^* \right).   (22)

The second term in this sum is a predictable bias-current term that can conceptually be subtracted before feeding into the TIA. This is a disadvantage of supporting negative inputs, as additional optical or electronic control circuitry would need to be designed. Another trade-off is a loss in precision due to the larger range of inputs that must be represented, analogous to the loss in precision with signed integers in classical computing.

III. PERFORMING CONVOLUTIONS USING PHOTONICS

The goal of this section is to present a photonic architecture capable of performing convolutions for CNNs. This new architecture is called DEAP. Given a maximum number of input channels D_m and a maximum kernel edge length R_m as bounding parameters for DEAP, we express the range of convolutional parameters that a particular implementation of DEAP can support. If a convolutional parameter described in Table (I) does not have a complementary bounding parameter, the DEAP architecture can support arbitrary values of that parameter.

III.1. Producing a Single Convolved Pixel

First, we consider an architecture that can produce one convolved pixel at a time.
To handle convolutions for kernels with dimensionality up to R_m × R_m × D_m, we require R_m² lasers with unique wavelengths, since a particular convolved pixel can be represented as the dot product of two 1 × R_m² vectors. To represent the values of each pixel, we require D_m R_m² modulators (one per input pixel value), where each modulator keeps the intensity of the corresponding carrier wave proportional to the normalized input pixel value. The R_m² lasers are multiplexed together using wavelength-division multiplexing (WDM), and the multiplexed signal is then split into D_m separate lines. On every line there are R_m² all-pass MRRs, resulting in D_m R_m² MRRs in total. Each WDM line will modulate the signals corresponding to a subset of R_m² pixels on channel k, meaning that the modulated wavelengths on a particular line correspond to the pixel inputs (A_{m,n,k})_{m∈[i,i+R_m], n∈[j,j+R_m]} where k ∈ [1, D_m].

The D_m WDM lines will then be fed into an array of D_m PWBs. Each PWB will contain R_m² MRRs with the weights corresponding to the kernel values of a particular channel. For example, the PWB on line k should contain the vectorized weights for the kernel (F_{m,n,k})_{m∈[1,R_m], n∈[1,R_m]}. Each MRR within a PWB should be tuned to a unique resonant wavelength within the multiplexed signal.

Figure 5. Photonic architecture for producing a single convolved pixel. Input images are encoded in intensities A_{l,h}, where the pixel inputs A_{m,n,k} with m ∈ [i, i + R_m], n ∈ [j, j + R_m], k ∈ [1, D_m] are represented as A_{l,h}, l = 1, …, D and h = 1, …, R². Considering the boundary parameters, we set D = D_m and R = R_m.
Likewise, the filter values F_{m,n,k} are represented as F_{l,h} under the same conditions. We use an array of R² lasers with different wavelengths \lambda_h to feed the MRRs. The input and kernel values A_{l,h} and F_{l,h} modulate the MRRs via electrical currents proportional to those values. Once the parallel multiplications are performed, the voltage adder sums all the signals from the weight banks; here, the R symbols denote resistance values. The output is then the convolved feature.

The outputs of the weight-bank array are electrical signals, each proportional to the dot product (F_{m,n,k})_{m∈[1,R_m], n∈[1,R_m]} · (A_{p,q,k})_{p∈[i,i+R_m], q∈[j,j+R_m]}. Finally, the signals from the weight banks need to be added together. This can be achieved using a passive voltage adder. The output from this adder will therefore be the value of a single convolved pixel. Fig. 5 shows a complete picture of what such an architecture would look like.

To perform a convolution with a kernel edge length R less than R_m, one can set (F_{m,n,k})_{m∈[R+1,R_m], n∈[R+1,R_m]} to zero. Similarly, if the depth of the kernel is less than D_m, the modulators (A_{m,n,k})_{m∈[1,H], n∈[1,W]} with k ∈ [D + 1, D_m] should also be set to zero.

III.2. Performing a Full Convolution

In the previous section, we discussed how DEAP can produce a single convolved pixel. In order to perform a convolution of arbitrary size, one needs to stride along the input image and readjust the modulation array. Since the same kernel is applied across the set of inputs, the weight banks do not need to be modified until a new kernel is applied. Fig. 6(a) demonstrates this process on an input with S = 1. To handle S ≥ 1, the inputs passed into DEAP should be strided accordingly.
In this approach, the inputs should have been zero-padded before being passed into DEAP. In pseudocode, performing a convolution with K kernels can be implemented as shown in Algorithm 1.

Algorithm 1 Convolutions for CNNs using DEAP
1: A is the input image
2: F is the kernel
3: R is the edge length of the kernel
4: O is a memory block to store the convolution
5: S is the stride
6: H and W are the height and width of the input image
7: function CONVOLVE(A, F, R, O, S, H, W)
8:     for (k = 1; k ≤ K; k = k + 1) do
9:         load kernel weights from F[:,:,:,k]
10:        for (h = 1; h ≤ H − R + 1; h = h + S) do
11:            for (w = 1; w ≤ W − R + 1; w = w + S) do
12:                load inputs from A[h:min(h+R,H), w:min(w+R,W), :]
13:                perform convolution
14:                store result in O[h/S, w/S, k]
15:            end for
16:        end for
17:    end for
18: end function

The DEAP architecture also allows for parallelization by treating the photonic architecture proposed in the previous section as a single-output "convolutional unit". By creating n_conv instances of these convolutional units, one can produce n_conv pixels per cycle by passing the next set of inputs to each unit. This is demonstrated in Fig. 6(b) for n_conv = 2. The computation of output pixels can be distributed across the convolutional units, resulting in a run-time complexity of O(KHW / (S² n_conv)).

Figure 6. (a) Cycling through a convolution using DEAP. (b) Performing a convolution with two convolutional units.

IV. PHOTONIC CONVOLUTIONAL NEURAL NETWORKS

In this section, we show how DEAP can be used to run a CNN. CNNs are a type of neural network developed for image recognition tasks. A CNN consists of some combination of convolutional, nonlinear, pooling and fully connected layers [27]; see Fig. 7(a). As introduced previously, convolutions perform a highly efficient and parallel matrix multiplication using kernels [3].
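For reference, Algorithm 1 can also be written as a short runnable sketch (assuming NumPy; in the DEAP hardware, the innermost dot product is what a photonic weight bank computes per cycle):

```python
import numpy as np

def convolve(A, F, S=1):
    """Algorithm 1 loop structure: A is H x W x D, F is R x R x D x K."""
    H, W, D = A.shape
    R, _, _, K = F.shape
    Ho = (H - R) // S + 1
    Wo = (W - R) // S + 1
    O = np.zeros((Ho, Wo, K))
    for k in range(K):                       # load kernel weights F[:, :, :, k]
        Fk = F[:, :, :, k].ravel()
        for h in range(0, H - R + 1, S):
            for w in range(0, W - R + 1, S):
                patch = A[h:h + R, w:w + R, :].ravel()   # load inputs
                O[h // S, w // S, k] = Fk @ patch        # one PWB dot product
    return O

rng = np.random.default_rng(3)
A = rng.random((6, 6, 2))
F = rng.random((3, 3, 2, 4))
assert convolve(A, F).shape == (4, 4, 4)
```

With n_conv convolutional units, successive iterations of the two inner loops would simply be distributed across units, giving the O(KHW / (S² n_conv)) run time stated above.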
Furthermore, since kernels are typically smaller than the input images, the feature-extraction operation allows efficient edge detection, thereby reducing the amount of memory required to store those features. CNNs are suitable networks for implementation in photonic hardware, since they demand fewer resources for matrix multiplication and memory usage. The linear operation performed by a convolution extracts a single feature per kernel. Hence, many kernels are required to extract as many features as possible. For this reason, kernels are usually applied in blocks, allowing the network to extract many different features at once and in parallel.

In feed-forward networks, it is typical to use a rectified linear unit (ReLU) activation function. Since ReLUs are piecewise-linear functions that model an overall nonlinearity, they allow CNNs to be easily optimized during training. The pooling layer introduces a stage where a set of neighboring pixels is encompassed in a single operation. Typically, such an operation applies a function that determines the maximum value among neighboring values; an averaging operation can likewise be implemented. These two approaches describe max and average pooling, respectively. This statistical operation allows for a direct down-sampling of the image, since the dimensions of the object are reduced by a factor of two. With this step, we aim to make our network invariant and robust to small translations of the detected features.

The triplet convolution-activation-pooling is usually repeated several times with different kernels, keeping the pooling and activation functions invariant. Once all possible features are detected, a fully connected layer is added for the classification stage. This layer prepares and presents the solutions of the task.
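The max- and average-pooling operations described above can be sketched as follows (assuming NumPy; a 2 × 2 non-overlapping window, which halves each spatial dimension — note that the network in this work instead uses a 2 × 2 average pool with stride one, followed by offline down-sampling):

```python
import numpy as np

def pool2x2(X, mode="max"):
    # Non-overlapping 2x2 pooling: reshape into blocks, reduce each block
    H, W = X.shape
    blocks = X.reshape(H // 2, 2, W // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

X = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 0., 1., 1.],
              [0., 4., 1., 1.]])
assert np.array_equal(pool2x2(X, "max"), [[4., 8.], [4., 1.]])
assert np.array_equal(pool2x2(X, "avg"), [[2.5, 6.5], [1., 1.]])
```

Each output pixel summarizes a 2 × 2 neighborhood, which is what makes the pooled features robust to small translations.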
CNNs are trained by changing the values of the kernels, analogous to how feed-forward neural networks are trained by changing the weighted connections [28]. The estimated kernel and weight values are required in the testing stage. In this work, this stage is performed by our on-chip DEAP CNN. Figure 7(b) shows a high-level overview of the proposed on-chip testing architecture. Here, the testing input values stored in the PC modulate the intensities of a group of lasers with identical powers but unique wavelengths. These modulated inputs would be sent into an array of photonic weight banks, which would then perform the convolution for each channel. The kernels obtained in the training step are used to modulate these weight banks. Finally, the outputs of the weight banks would be summed using a voltage adder, which produces the convolved feature.

Figure 7. Block diagrams describing: (a) a typical CNN, which contains convolutional, activation, pooling and fully connected layers; here we exemplify the diagram with an MNIST-based recognition task that predicts the number 5; and (b) the DEAP architecture. The input image, kernel weights, and convolved features are stored in the computer (PC), along with the commands to implement the activation function off-chip. The input image, kernel weights and convolved features are transferred to the chip via DACs from the SDRAM. The convolution is then performed on-chip. Finally, the output is digitized via an ADC and stored in an SDRAM connected to the computer.
This simulator works using the transfer function of the MRRs, the through-port and drop-port summing equations at the balanced PDs, and the TIA gain term to simulate a convolution. The simulator assumes that the MRRs can only be controlled with 7 bits of precision, as has been empirically observed in a lab setting. The MRR self-coupling coefficient is set equal to the loss, r = a = 0.99 [29], in Eqs. (6), (10), and (11).

The interfacing of optical components with electronics would be facilitated by digital-to-analog converters (DACs) and analog-to-digital converters (ADCs), while the storage of outputs and retrieval of inputs would be achieved using GDDR SDRAM memories. The SDRAM is connected to a computer, where the information is already in a digital representation. There, the ReLU nonlinearity is implemented and the convolved feature is reused to perform the next convolution. The idea is to use the same architecture to implement the convolution-activation-pooling triplet in hardware.

In this work, we trained the CNN to perform image recognition on the MNIST dataset. The training stage uses the ADAM optimizer and the back-propagation algorithm to compute the gradient. The optimized parameters that solve MNIST can be categorized into two groups: (i) two different 5 × 5 × 8 kernels and (ii) two fully connected layers of dimensions 128 × 800 and 128 × 10, together with their respective bias terms. Each kernel is thus defined by eight different 5 × 5 filters. In the following, we use our DEAP CNN simulator to recognize new input images drawn from a set of 500 images reserved for the test step. Our simulator works only at the transfer-function level and does not simulate noise or distortion from analog components. The process of feature extraction performed by the DEAP CNN is illustrated in Fig. 8(a).
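The 7-bit constraint on MRR control can be sketched as uniform quantization of weights onto 2^7 − 1 levels over [-1, 1]. This is a minimal illustration of the assumption, not the simulator's actual quantization code; the symmetric range and rounding scheme are our choices.

```python
import numpy as np

def quantize(weights, bits=7):
    """Quantize weights in [-1, 1] onto 2**bits - 1 uniform levels, mimicking
    the assumption that MRRs can only be set with 7 bits of precision."""
    levels = 2 ** bits - 1
    w = np.clip(weights, -1.0, 1.0)
    return np.round((w + 1.0) / 2.0 * levels) / levels * 2.0 - 1.0

w = np.array([0.5, -0.999, 0.0031, 1.0])
wq = quantize(w)
# The worst-case error is bounded by half a quantization step (1/127 here).
print(np.max(np.abs(w - wq)))
```

Every kernel value the simulated weight banks apply is first passed through such a rounding step, which is why the 98% test accuracy below is a meaningful robustness check.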
As seen in the illustration, a 28 × 28 input image from the test dataset is filtered by the first 5 × 5 × 8 kernel with stride one. The output of this process is a 24 × 24 × 8 convolved feature, with a ReLU activation function already applied. Following the same process, the second group of filters is applied to the convolved feature to generate the second output, i.e., a 20 × 20 × 8 convolved feature.

After the second ReLU is applied to the output, average pooling is used for invariance and down-sampling of the convolved features. The average pooling is implemented by a 2 × 2 kernel whose elements are all 1/4. However, the stride of one is kept, so the pooled feature has dimensionality 19 × 19 × 8. The down-sampling is implemented offline: from the 19 × 19 × 8 output, a simple algorithm extracts the elements with even indexes. The result of this process is a 10 × 10 × 8 pooled output. Finally, the flattened version of the pooled object is fed through the first fully connected layer. The resulting vector feeds the last fully connected layer, where the result of the MNIST classification appears.

The results of the MNIST task solved by our simulated DEAP CNN are shown in Fig. 8(b). For a test set of 500 images, we obtained an overall accuracy of 98%. This performance was compared with the results obtained using a standard two-layer CNN including a max-pooling layer. We found that this standard network achieves an overall accuracy of 98.6%.

Figure 8. (a) An illustrative block diagram of the two-layer DEAP CNN solving MNIST: 28×28 input, Conv + ReLU (24×24×8), Conv + ReLU (20×20×8), Conv-Avg (10×10×8), FC. (b) Results of the MNIST task using a simulated DEAP CNN, shown as prediction percentages over the ten digit classes.
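The stride-one average pooling followed by even-index down-sampling described above can be sketched for a single channel (the function names are ours; the text specifies the 2 × 2 all-1/4 kernel, stride one, and even-index extraction):

```python
import numpy as np

def avg_pool_stride1(x):
    """2x2 average pool (all elements 1/4) with stride 1: a 20x20 input
    becomes a 19x19 pooled feature, as in the text."""
    return (x[:-1, :-1] + x[:-1, 1:] + x[1:, :-1] + x[1:, 1:]) / 4.0

def downsample_even(x):
    """Offline down-sampling: keep only even-indexed rows and columns."""
    return x[::2, ::2]

feature = np.random.rand(20, 20)     # one channel of the 20x20x8 feature
pooled = avg_pool_stride1(feature)   # -> 19x19
out = downsample_even(pooled)        # -> 10x10
print(pooled.shape, out.shape)
```

The two steps together are equivalent to an overlapping pooled map thinned to every other pixel, which yields the 10 × 10 × 8 object fed to the fully connected layers.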
Therefore, we can conclude that our simulator is sufficiently robust despite the 7 bits of precision considered in the DEAP CNN simulation.

V. ENERGY AND SPEED ANALYSES

V.1. Energy Estimation

The power used by a single DEAP convolutional unit depends on the R and D parameters. The 100-wavelength limitation for MRRs constrains the maximum R to be 10, as each multiplexed waveguide will carry R² signals. The number of MRRs used in the modulator array is equal to R²D, meaning that only certain D and R² values are allowed for a finite number of MRRs. Assuming that a maximum of 1024 MRRs can be manufactured in the modulator array, a convolutional unit can support a large kernel size with a limited number of channels (R = 10, D = 12) or a small kernel size with a large number of channels (R = 3, D = 113). We consider both edge cases to obtain a range of power-consumption values.

A convolutional unit requires R² lasers, R² MRRs and DACs in the modulator array, R²D MRRs and D TIAs in the weight bank array, and one ADC to convert back into a digital signal. With 100 mW per laser, 19.5 mW per MRR, 26 mW per DAC, 17 mW per TIA [30], and 76 mW per ADC, we get a power usage of 112 W for the large kernel size and 95 W for the smaller kernel size. Therefore, we estimate a single convolution unit to use around 100 W when 1024 modulators are used to represent inputs.

V.2. DEAP Performance

The time it takes for light to propagate from the WDM to just before the balanced PDs is estimated by the following equation:

t_prop = k 2π r_MRR / c,    (23)

where c is the speed of light, 2π r_MRR is the circumference of the MRR, and k is the number of MRRs. Assuming 100 MRRs with a radius of 10 µm [11, 31], the PWB has a propagation time of around 21 ps and a throughput of 1/t_prop ≈ 50 GS/s.
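Eq. (23) is a one-line calculation; the sketch below reproduces the quoted numbers (100 MRRs of 10 µm radius give roughly 21 ps of transit time and on the order of 50 GS/s):

```python
import math

def propagation_time(k, r_mrr, c=299_792_458.0):
    """Eq. (23): transit time of light past k microrings of radius r_mrr
    (in meters), i.e. k ring circumferences divided by the speed of light."""
    return k * 2.0 * math.pi * r_mrr / c

t = propagation_time(k=100, r_mrr=10e-6)  # 100 MRRs, 10 um radius
throughput = 1.0 / t
print(t * 1e12, throughput / 1e9)  # ~21 ps, ~48 GS/s
```

This optical transit time is not the system bottleneck; as discussed next, the electronic interfaces are.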
The bottlenecks come from the fact that the balanced PDs have a throughput of 25 GS/s [30] and the TIA has a throughput of 10 GS/s [32]. An individual MRR can be modulated at speeds of 128 GS/s [31], meaning that the modulation frequency of the MRRs does not bottleneck the throughput of the PWB. The throughput of a PWB is around 5 GS/s. The DACs [33] and ADCs [34] both operate at 5 GS/s and support up to 7 bits. The GDDR6 SDRAM operates at 16 Gb/s with a 256-bit bus size [35]. Consequently, the speed of the system is limited by the throughput of the DACs/ADCs, resulting in DEAP producing a single convolved pixel at 5 GS/s, or t = 200 ps.

DeepBench [17] is an empirical dataset that records how long various types of GPUs took to perform a convolution for a given set of convolutional parameters. Table II contains the parameters used for each of these benchmarks, and Table III contains the power consumption. The speeds of the various GPUs were taken directly from Ref. [17], while the speed of the convolution was estimated using the following equation:

t_runtime = 200 ps × (N K / n_conv) × ((H − R_h)/S + 1) × ((W − R_w)/S + 1).    (24)

In some of the benchmarks the kernel edge lengths were not equal, hence the parameters R_w and R_h, which correspond to the width and height of the kernels. For each of the selected benchmarks, R²D ≤ 1024, meaning that the convolutional network is compatible with DEAP implementations.

Table II. Benchmarking parameters for DEAP

  W    H    D    N    K   R_w  R_h   S
 700  161    1    4   32    5   20   2
 112  112   64    8  128    3    3   1
   7    7  832   16  256    1    1   1

Table III. Benchmarked GPUs with power consumption

 GPU                       Power Usage (W)
 AMD Vega FE [36]          375
 AMD MI25 [37]             300
 NVIDIA Tesla P100 [38]    250
 NVIDIA GTX 1080 Ti [39]   250
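The runtime model of Eq. (24) can be evaluated directly for the Table II parameters. This is our reading of the reconstructed equation: we assume floor division for the output dimensions and a ceiling when the N K kernel passes are split across n_conv convolutional units.

```python
import math

T_PIXEL = 200e-12  # seconds per convolved pixel (5 GS/s DAC/ADC bottleneck)

def deap_runtime(W, H, N, K, R_w, R_h, S, n_conv=1):
    """Eq. (24): estimated total convolution time for an HxW input, N batches,
    K kernels of size R_h x R_w, stride S, using n_conv convolutional units."""
    out_h = (H - R_h) // S + 1
    out_w = (W - R_w) // S + 1
    return T_PIXEL * math.ceil(N * K / n_conv) * out_h * out_w

# First benchmark row of Table II: W=700, H=161, N=4, K=32, 20x5 kernel, stride 2.
t1 = deap_runtime(W=700, H=161, N=4, K=32, R_w=5, R_h=20, S=2)
print(t1 * 1e3)  # estimated runtime in milliseconds
```

Doubling n_conv halves the estimate, which is the source of the two-unit speedups reported below.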
Figure 9. Estimated DEAP convolutional runtimes compared with actual GPU runtimes from the DeepBench benchmarks.

The estimated DEAP runtimes using one and two convolutional units are plotted against actual DeepBench runtimes in Fig. 9. From this, we can see that using two convolutional units performs slightly better than all the GPU benchmarks. While the mean GPU power consumption is 295 W, DEAP with a single convolutional unit uses about 110 W. Therefore, DEAP can perform convolutions between 1.4 and 7.0 times faster than the mean GPU runtime while using 0.37 times the energy. Using two convolutional units doubles the speed of DEAP, meaning that DEAP can be between 2.8 and 14 times faster than a conventional GPU while using roughly 0.75 times the energy. That DEAP with a single unit performs at a speed similar to the GPUs is expected.

VI. CONCLUSION

We have proposed a photonic network, DEAP, suited for convolutional neural networks. DEAP was estimated to perform convolutions between 2.8 and 14 times faster than a GPU while using roughly 0.75 times the energy. A linear increase in processing speed corresponds to a linear increase in energy consumption, allowing DEAP to be as scalable as electronics. High-level software simulations have shown that DEAP is theoretically capable of performing a convolution, and we demonstrate that our DEAP CNN solves the MNIST handwritten-digit recognition task with an overall accuracy of 98%. The largest bottleneck is the I/O interfacing with digital systems via DACs and ADCs. If photonic DACs [40] and ADCs [41] can be built with higher bit precisions, replacing the electronic components with optical ones could significantly decrease the runtime, and the speedup over GPUs could be even higher.
In order to realize a physical implementation, there are a number of issues that still need to be solved. Packaging a silicon photonic chip with an electronic chip with a high I/O count is a challenging RF engineering task, but it is a central thrust in the roadmap for silicon photonic foundries [21]. There also needs to be control circuitry that routes the outputs of the SDRAM into the relevant DACs and from the ADCs into the SDRAM. Since we assume that the control circuitry can operate significantly faster than a memory access, we believe it will have a negligible impact on the overall throughput. Another issue is that DEAP processes data in the analog domain, whereas GPUs perform floating-point arithmetic. Though floating-point arithmetic does have some degree of error due to rounding in the mantissa, these errors are deterministic and predictable. The errors in photonics, on the other hand, arise from stochastic shot, spectral, Johnson-Nyquist, and flicker noise, as well as quantization noise in the ADC and distortion from the RF signals applied to the modulators. However, artificially adding random noise to CNNs has been shown to reduce over-fitting [42], meaning that some degree of stochastic behaviour is tolerable in the domain of machine-learning problems. Finally, MRRs have only been shown to have up to 7 bits of precision, which is significantly smaller than the precision supported by even half-precision (16-bit) floating-point representations.

In conclusion, photonics has the potential to perform convolutions at speeds faster than top-of-the-line GPUs while having a lower energy consumption. Moving forward, the greatest challenges to overcome have to do with increasing the precision of photonic components so that they are comparable to classical floating-point representations.
Overall, silicon photonics has the potential to outperform conventional electronic hardware for convolutions while having the ability to scale up in the future.

ACKNOWLEDGMENT

Funding for B.J.S., B.A.M., H.B.M., and V.B. was provided by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Queen's Research Initiation Grant (RIG).

[1] X. Li, G. Zhang, H. H. Huang, Z. Wang, and W. Zheng, in 2016 45th International Conference on Parallel Processing (ICPP) (2016) pp. 67–76.
[2] M. Jaderberg, A. Vedaldi, and A. Zisserman, CoRR abs/1405.3866 (2014).
[3] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (The MIT Press, 2016).
[4] Y. LeCun and C. Cortes, (2010).
[5] F. Duport, A. Smerieri, A. Akrout, M. Haelterman, and S. Massar, Scientific Reports 6, 22381 (2016).
[6] D. Brunner, M. C. Soriano, C. R. Mirasso, and I. Fischer, Nature Communications 4, 1364 (2013).
[7] K. Vandoorne, P. Mechet, T. Van Vaerenbergh, M. Fiers, G. Morthier, D. Verstraeten, B. Schrauwen, J. Dambre, and P. Bienstman, Nature Communications 5, 3541 (2014).
[8] L. Larger, M. C. Soriano, D. Brunner, L. Appeltant, J. M. Gutierrez, L. Pesquera, C. R. Mirasso, and I. Fischer, Optics Express 20, 3241 (2012).
[9] P. R. Prucnal and B. J. Shastri, Neuromorphic Photonics (CRC Press, Taylor & Francis Group, Boca Raton, FL, USA, 2017).
[10] A. N. Tait, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, Journal of Lightwave Technology 32, 4029 (2014).
[11] A. N. Tait, A. X. Wu, T. F. de Lima, E. Zhou, B. J. Shastri, M. A. Nahmias, and P. R. Prucnal, IEEE Journal of Selected Topics in Quantum Electronics 22, 312 (2016).
[12] A. N. Tait, T. Ferreira de Lima, M. A. Nahmias, H. B. Miller, H.-T. Peng, B. J. Shastri, and P. R. Prucnal, arXiv e-prints, arXiv:1812.11898 (2018), arXiv:1812.11898 [physics.app-ph].
[13] T. W. Hughes, M. Minkov, Y. Shi, and S. Fan, Optica 5, 864 (2018).
[14] Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, Nat. Photonics 11, 441 (2017).
[15] T. F. de Lima, H. Peng, A. N. Tait, M. A. Nahmias, H. B. Miller, B. J. Shastri, and P. R. Prucnal, Journal of Lightwave Technology 37, 1515 (2019).
[16] A. Mehrabian, Y. Al-Kabani, V. J. Sorger, and T. A. El-Ghazawi, CoRR abs/1807.08792 (2018).
[17] Baidu Research, DeepBench.
[18] V. Bangari, B. Marquez, H. Miller, and B. J. Shastri, DEAP, https://github.com/Shastri-Lab/DEAP (2019).
[19] G. Tan, L. Li, S. Triechle, E. Phillips, Y. Bao, and N. Sun, in Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11 (ACM, New York, NY, USA, 2011) pp. 35:1–35:11.
[20] S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, and E. Shelhamer, CoRR abs/1410.0759 (2014).
[21] A. Rahim, T. Spuesens, R. Baets, and W. Bogaerts, Proceedings of the IEEE 106, 2313 (2018).
[22] M. A. Nahmias, B. J. Shastri, A. N. Tait, T. F. de Lima, and P. R. Prucnal, Opt. Photon. News 29, 34 (2018).
[23] W. Bogaerts, P. De Heyn, T. Van Vaerenbergh, K. De Vos, S. Kumar Selvaraja, T. Claes, P. Dumon, P. Bienstman, D. Van Thourhout, and R. Baets, Laser & Photonics Reviews 6, 47 (2012).
[24] H. Zheng, R. Ma, and Z. Zhu, Analog Integrated Circuits and Signal Processing 90, 217 (2017).
[25] M. Lipson, Journal of Lightwave Technology 23, 4222 (2005).
[26] A. N. Tait, H. Jayatilleka, T. F. D. Lima, P. Y. Ma, M. A. Nahmias, B. J. Shastri, S. Shekhar, L. Chrostowski, and P. R. Prucnal, Opt. Express 26, 26422 (2018).
[27] K. O'Shea and R. Nash, CoRR abs/1511.08458 (2015).
[28] K. Mehrotra, C. K. Mohan, and S. Ranka, Elements of Artificial Neural Networks (MIT Press, Cambridge, MA, USA, 1997).
[29] Y. Tan and D. Dai, Journal of Optics 20, 054004 (2018).
[30] Z. Huang, C. Li, D. Liang, K. Yu, C. Santori, M. Fiorentino, W. Sorin, S. Palermo, and R. G. Beausoleil, Optica 3, 793 (2016).
[31] J. Sun, R. Kumar, M. Sakib, J. B. Driscoll, H. Jayatilleka, and H. Rong, Journal of Lightwave Technology 37, 110 (2019).
[32] M. Atef and H. Zimmermann, Analog Integr. Circuits Signal Process. 76, 367 (2013).
[33] B. Sedighi, M. Khafaji, and J. C. Scheytt, International Journal of Microwave and Wireless Technologies 4, 275 (2012).
[34] J. Fang, S. Thirunakkarasu, X. Yu, F. Silva-Rivas, C. Zhang, F. Singor, and J. Abraham, IEEE Transactions on Circuits and Systems I: Regular Papers 64, 1673 (2017).
[35] Micron Technology, Inc., GDDR6 SGRAM MT61K256M32 8Gb: 2 channels x16/x8 GDDR6 SGRAM.
[36] Advanced Micro Devices, Inc., Radeon Vega Frontier Edition (liquid-cooled).
[37] Advanced Micro Devices, Inc., Radeon Instinct MI25 accelerator.
[38] NVIDIA Corporation, NVIDIA Tesla P100 GPU accelerator.
[39] NVIDIA Corporation, GeForce GTX 1080 Ti.
[40] F. Zhang, B. Gao, X. Ge, and S. Pan, Optical Engineering 55, 031115 (2015).
[41] M. A. Piqueras, P. Villalba, J. Puche, and J. Martí, in 2011 IEEE International Conference on Microwaves, Communications, Antennas and Electronic Systems (COMCAS 2011) (2011) pp. 1–6.
[42] Z. You, J. Ye, K. Li, and P. Wang, CoRR abs/1805.08000 (2018).