English Pages 412 [378] Year 2017
1 Neuromorphic Engineering
“The human brain performs computations inaccessible to the most powerful of today’s computers – all while consuming no more power than a light bulb. Understanding how the brain computes reliably with unreliable elements, and how different elements of the brain communicate, can provide the key to a completely new category of hardware (Neuromorphic Computing Systems) and to a paradigm shift for computing as a whole. The economic and industrial impact is potentially enormous.”
Human Brain Project (2014)
Complexity manifests in our world in countless ways [1, 2]. Social interactions between large populations of individuals [3], physical processes such as protein folding [4], and biological systems such as the function of human brain areas [5] are examples of complex systems, each of which has an enormous number of interacting parts that lead to emergent global behavior. Emergence is a type of behavior that arises only when many elements in a system interact strongly and with variation [6], which is very difficult to capture with reductionist models. Understanding, extracting knowledge about, and creating predictive models of emergent phenomena in networked groups of dynamical units represent some of the most challenging questions facing society and scientific investigation. Emergent phenomena play an important role in gene expression, brain disease, homeland security, and condensed matter physics. Analyzing complex and emergent phenomena requires data-driven approaches in which reams of data are synthesized by computational tools into probable models and predictive knowledge. Most current approaches to complex system and big-data analysis are software solutions that run on traditional von Neumann machines; however, the interconnected structure that leads to emergent behavior is precisely the reason why complex systems are difficult to reproduce in conventional computing frameworks.
Memory and data interaction bandwidths greatly constrain the types of informatic systems that are feasible to simulate. The human brain is believed to be the most complex system in the universe. It has approximately 10¹¹ neurons, and each neuron is connected to up to 10,000 other neurons, communicating with each other via as many as 10¹⁵ synaptic connections. The brain is also indubitably a natural standard for information processing, one that has been compared to artificial processing systems since their earliest inception. It is estimated to perform between 10¹³
Neuromorphic Photonics
and 10¹⁶ operations per second while consuming only 25 W of power [7]. Such exceptional performance is due in part to neuron biochemistry, the brain's underlying architecture, and the biophysics of neuronal computation algorithms. The brain as a processor differs radically from today's computers, at both the physical level and the architectural level. Brain-inspired computing systems could potentially have paradigm-defining degrees of data interconnection throughput (a key correlate of emergent behavior), which could enable the study of new regimes in signal processing, at least some of which (e.g., real-time complex system assurance and big data awareness) reflect a pronounced societal need for new signal processing approaches. Unconventional computing platforms inspired by the human brain could also break performance limitations inherent in traditional von Neumann architectures when solving particular classes of problems.
[Figure 1.1 image omitted. Labels: machine instructions; data; integrated chip technology (Intel Core i7; 22 nm process; 1.86B transistors; 257 mm²).]
Figure 1.1 Information processing with a standard von Neumann architecture. Instructions and data are both stored in memory and pass through a common bus to reach the processor. The interconnect bottleneck limits processor performance.
Conventional digital computers are based on the von Neumann architecture [8] (also called the Princeton architecture). As shown in Fig. 1.1, it consists of a memory that stores both data and instructions, a central processing unit (CPU), and inputs and outputs. Instructions and data stored in the memory unit lie behind a shared multiplexed bus, which means that both cannot be accessed simultaneously. This leads to the well-known von Neumann bottleneck [9], which fundamentally limits the performance of the system, a problem that is aggravated as CPUs become faster and memory units larger. Nonetheless, this computing paradigm has dominated for over 60 years, driven in part by the continual progress dictated by Moore’s law¹ [10] for CPU scaling and Koomey’s law² [11] for energy efficiency (multiply-accumulate (MAC) operations per joule), which compensated for the bottleneck. Over the last several years, though, such scaling has not kept pace and is approaching an asymptote (see Fig. 1.2). The computation efficiency levels off below 10 MMAC/mW (or 10 GMAC/W, or 100 pJ per MAC) [12]. The reasons behind this trend can be traced to both the representation of information at the physical level and the interaction of processing with memory at the architectural level [13].
¹ The number of transistors that can be put on a microchip doubles every 18 to 24 months, doubling its performance.
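The unit equivalences quoted above (10 MMAC/mW, 10 GMAC/W, 100 pJ per MAC) can be checked with a few lines of arithmetic. The sketch below also derives an energy per operation for the brain from the figures given earlier in this chapter (25 W, 10¹³ to 10¹⁶ operations per second). The function and variable names are illustrative, and treating a brain "operation" as comparable to a MAC is an assumption made only for a rough order-of-magnitude comparison.

```python
def macs_per_joule(mmac_per_mw):
    """Convert an efficiency in MMAC/mW to MAC per joule.

    1 MMAC/mW = 1e6 MAC/s per 1e-3 W = 1e9 MAC/J.
    """
    return mmac_per_mw * 1e6 / 1e-3


def joules_per_operation(power_watts, ops_per_second):
    """Energy spent per operation for a processor at steady state."""
    return power_watts / ops_per_second


# Digital efficiency wall quoted in the text: 10 MMAC/mW.
digital_mac_per_joule = macs_per_joule(10)            # about 1e10 MAC/J
digital_gmac_per_watt = digital_mac_per_joule / 1e9   # about 10 GMAC/W
digital_pj_per_mac = 1e12 / digital_mac_per_joule     # about 100 pJ per MAC

# Brain estimates quoted in the text: 25 W, 1e13 to 1e16 operations/s.
# Assumption: one "operation" is loosely compared with one MAC.
brain_pj_per_op_worst = joules_per_operation(25, 1e13) * 1e12
brain_pj_per_op_best = joules_per_operation(25, 1e16) * 1e12

print(f"digital: {digital_gmac_per_watt:.1f} GMAC/W, {digital_pj_per_mac:.1f} pJ/MAC")
print(f"brain:   {brain_pj_per_op_best:.4f} to {brain_pj_per_op_worst:.1f} pJ/op")
```

Under these assumptions the brain's energy per operation falls between roughly one and five orders of magnitude below the digital efficiency wall, which is the gap that motivates the rest of the chapter.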
[Figure residue omitted; plot content not recoverable from the extraction. Surviving labels: Boron Nitride (1 nm); Graphene (0.335 nm).]
g 9 GHz). Reproduced with permission from Zhang et al. Opt. Express 22, 10202–10209 (2014) Ref. [63]. Copyright 2014 Optical Society of America.
One of the key enabling advances for hybrid III-V/Si sources was the development of low-temperature wafer bonding techniques. III-V epitaxial layer structures offer a tremendous and well-controlled design space for quantum well, electronic, and dielectric engineering. Heteroepitaxial growth of III-V on Si is difficult because lattice mismatches result in large defect densities that destroy the material's optical properties. Strongly confining SOI waveguides are extremely sensitive to III-V section placement, so attaching pre-patterned devices to waveguide chips presents an alignment challenge [57]. The wafer bonding approach of transferring an unpatterned III-V epitaxial stack to the SOI wafer is an attractive option since patterning can be performed after the bonding step and lithographically aligned to the underlying SOI circuit. To withstand post-processing steps, strong bonding is essential. Conventional wafer bonding necessitates high-temperature anneals above 1000 °C, which can destroy the III-V/Si interface because of the thermal expansion coefficient mismatch between the two semiconductors [59]. Low-temperature III-V/Si wafer bonding processes fall into two categories based on how the bonding interface is otherwise activated:
1) plasma-assisted activation is used for direct molecular bonding between the native surface materials [55, 64, 65], and
2) polymeric activation is used for adhesive bonding, in which a thin layer of polymer bonds the two materials [54, 66].
Figure 7.13 shows examples of molecular bonded hybrid lasers, and Fig. 7.14 shows examples of adhesively bonded hybrid lasers.
[Figure 7.14 image omitted. Panel (e) is a spectrum plotted against wavelength (µm), with axis ticks from 1.57 to 1.62.]
Figure 7.14 Adhesively bonded hybrid microdisk lasers. (a) Schematic diagram of microdisk cavity entirely in a III-V layer coupled to an underlying SOI waveguide. (b) SEM images of fabricated lasers. (c) SEM cross-section showing metal contacts, III-V layer, underlying Si waveguide, buried oxide, and Si substrate. The III-V/Si interface is filled with a DVS-BCB adhesive polymer. (d) Optical image of wavelength multiplexed hybrid sources coupled to a bus SOI waveguide. (e) Measured spectra out of the bus waveguide when lasers are pumped. The four most prominent lasing peaks correspond to the four microdisk lasers. Republished with permission of John Wiley and Sons, Inc., from Roelkens et al. Laser Photon. Rev. 4, 751–779 (2010), Ref. [61].
Photonic devices that have been demonstrated on hybrid platforms include various types of lasers, in addition to amplifiers [67], photodetectors [68, 69], and electroabsorption modulators [40]. Straight hybrid waveguides terminated with reflective facets can be used to form Fabry-Perot lasers, which are easiest to fabricate but provide poor longitudinal mode control. Gratings etched into the SOI waveguides can form distributed feedback (DFB) cavities for compact and single mode behavior that does not require access to chip facets (Fig. 7.13(c)) [63, 70]. Lasers with two hybrid electrical sections can be biased to produce mode-locked pulse train outputs [71]. Thermal management is a key design consideration affecting hybrid laser performance because of their small active region cross-sections in which heat is dissipated [72, 73]. Large DFB lasers can thus achieve the high output powers required for long reach fiber links, while compact microdisk lasers can achieve the low threshold currents needed for densely integrated multi-wavelength on-chip links
Silicon Photonics
(Fig. 7.14) [61, 74]. In addition to microdisk geometries, microring cavities consisting of bent hybrid waveguide sections have been shown [75].
7.6 REFERENCES
1. B. Jalali and S. Fathpour, “Silicon photonics,” Journal of Lightwave Technology, vol. 24, no. 12, pp. 4600–4615, 2006.
2. R. Soref, “The past, present, and future of silicon photonics,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 12, no. 6, Nov. 2006.
3. W. Bogaerts, R. Baets, P. Dumon, V. Wiaux, S. Beckx, D. Taillaert, B. Luyssaert, J. Van Campenhout, P. Bienstman, and D. Van Thourhout, “Nanophotonic waveguides in silicon-on-insulator fabricated with CMOS technology,” Journal of Lightwave Technology, vol. 23, no. 1, pp. 401–412, 2005.
4. R. G. Beausoleil, “Large-scale integrated photonics for high-performance interconnects,” J. Emerg. Technol. Comput. Syst., vol. 7, no. 2, pp. 6:1–6:54, July 2011.
5. W. Bogaerts, M. Fiers, and P. Dumon, “Design challenges in silicon photonics,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 20, no. 4, pp. 1–8, July 2014.
6. C. Gunn, “CMOS photonics for high-speed interconnects,” IEEE Micro, vol. 26, no. 2, pp. 58–66, Mar. 2006.
7. T. Baehr-Jones, T. Pinguet, P. Lo Guo-Qiang, S. Danziger, D. Prather, and M. Hochberg, “Myths and rumours of silicon photonics,” Nat. Photon, vol. 6, no. 4, pp. 206–208, Apr. 2012.
8. M. Hochberg, N. C. Harris, R. Ding, Y. Zhang, A. Novack, Z. Xuan, and T. Baehr-Jones, “Silicon photonics: The next fabless semiconductor industry,” IEEE Solid-State Circuits Magazine, vol. 5, no. 1, pp. 48–58, winter 2013.
9. L. Chrostowski and M. Hochberg, Silicon Photonics Design: From Devices to Systems. Cambridge University Press, 2015.
10. A.-J. Lim, J. Song, Q. Fang, C. Li, X. Tu, N. Duan, K. K. Chen, R.-C. Tern, and T.-Y. Liow, “Review of silicon photonics foundry efforts,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 20, no. 4, pp. 405–416, July 2014.
11. R. J. Bojko, J. Li, L. He, T. Baehr-Jones, M. Hochberg, and Y. Aida, “Electron beam lithography writing strategies for low loss, high confinement silicon optical waveguides,” Journal of Vacuum Science Technology B, vol. 29, no. 6, 2011.
12. A. Biberman, M. J. Shaw, E. Timurdogan, J. B. Wright, and M. R. Watts, “Ultralow-loss silicon ring resonators,” Optics Letters, vol. 37, no. 20, pp. 4236–4238, Oct. 2012.
13. K. K. Lee, D. R. Lim, L. C. Kimerling, J. Shin, and F. Cerrina, “Fabrication of ultralow-loss Si/SiO2 waveguides by roughness reduction,” Optics Letters, vol. 26, no. 23, pp. 1888–1890, Dec. 2001.
14. S. Pogossian, L. Vescan, and A. Vonsovici, “The single-mode condition for semiconductor rib waveguides with large cross section,” Journal of Lightwave Technology, vol. 16, no. 10, pp. 1851–1853, Oct. 1998.
15. O. Powell, “Single-mode condition for silicon rib waveguides,” Journal of Lightwave Technology, vol. 20, no. 10, p. 1851, Oct. 2002.
16. J. Lousteau, D. Furniss, A. B. Seddon, T. M. Benson, A. Vukovic, and P. Sewell, “The single-mode condition for silicon-on-insulator optical rib waveguides with large cross section,” Journal of Lightwave Technology, vol. 22, no. 8, p. 1923, Aug. 2004.
17. J. Xia, J. Yu, Z. Wang, Z. Fan, and S. Chen, “Low power 2x2 thermo-optic SOI waveguide switch fabricated by anisotropy chemical etching,” Optics Communications, vol. 232, pp. 223–228, 2004.
18. V. R. Almeida, R. R. Panepucci, and M. Lipson, “Nanotaper for compact mode conversion,” Optics Letters, vol. 28, no. 15, pp. 1302–1304, Aug. 2003.
19. A. Khilo, M. A. Popovic, M. Araghchini, and F. X. Kärtner, “Efficient planar fiber-to-chip coupler based on two-stage adiabatic evolution,” Optics Express, vol. 18, no. 15, pp. 15 790–15 806, July 2010.
20. T. Barwicz, N. Boyer, S. Harel, T. Lichoulas, E. Kimbrell, A. Janta-Polczynski, S. Kamlapurkar, S. Engelmann, Y. Vlasov, and P. Fortier, “Automated, self-aligned assembly of 12 fibers per nanophotonic chip with standard microelectronics assembly tooling,” in IEEE 65th Electronic Components and Technology Conference (ECTC), May 2015, pp. 775–782.
21. Y. Wang, X. Wang, J. Flueckiger, H. Yun, W. Shi, R. Bojko, N. A. Jaeger, and L. Chrostowski, “Focusing sub-wavelength grating couplers with low back reflections for rapid prototyping of silicon photonic circuits,” Optics Express, vol. 22, no. 17, pp. 20 652–20 662, 2014.
22. C. Alonso-Ramos, A. Ortega-Moñux, I. Molina-Fernandez, P. Cheben, L. Zavargo-Peche, and R. Halir, “Efficient fiber-to-chip grating coupler for micrometric SOI rib waveguides,” Optics Express, vol. 18, no. 14, pp. 15 189–15 200, July 2010.
23. A. Mekis, S. Gloeckner, G. Masini, A. Narasimha, T. Pinguet, S. Sahni, and P. De Dobbelaere, “A grating-coupler-enabled CMOS photonics platform,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 17, no. 3, pp. 597–608, 2011.
24. D. Taillaert, H. Chong, P. Borel, L. H. Frandsen, R. M. De La Rue, R. Baets et al., “A compact two-dimensional grating coupler used as a polarization splitter,” IEEE Photonics Technology Letters, vol. 15, no. 9, pp. 1249–1251, 2003.
25. R. Halir, A. Ortega-Moñux, J. Schmid, C. Alonso-Ramos, J. Lapointe, D.-X. Xu, J. Wanguemert-Perez, I. Molina-Fernandez, and S. Janz, “Recent advances in silicon waveguide devices using sub-wavelength gratings,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 20, no. 4, pp. 279–291, July 2014.
26. W. D. Sacher, Y. Huang, L. Ding, B. J. F. Taylor, H. Jayatilleka, G.-Q. Lo, and J. K. S. Poon, “Wide bandwidth and high coupling efficiency Si3N4-on-SOI dual-level grating coupler,” Optics Express, vol. 22, no. 9, pp. 10 938–10 947, May 2014.
27. N. Lindenmann, G. Balthasar, D. Hillerkuss, R. Schmogrow, M. Jordan, J. Leuthold, W. Freude, and C. Koos, “Photonic wire bonding: a novel concept for chip-scale interconnects,” Optics Express, vol. 20, no. 16, p. 17667, July 2012.
28. N. Lindenmann, S. Dottermusch, M. L. Goedecke, T. Hoose, M. R. Billah, T. P. Onanuga, A. Hofmann, W. Freude, and C. Koos, “Connecting silicon photonic circuits to multicore fibers by photonic wire bonding,” Journal of Lightwave Technology, vol. 33, no. 4, pp. 755–760, 2015.
29. G. T. Reed, G. Mashanovich, F. Y. Gardes, and D. J. Thomson, “Silicon optical modulators,” Nat. Photon, vol. 4, no. 8, pp. 518–526, Aug. 2010.
30. W. M. J. Green, M. J. Rooks, L. Sekaric, and Y. A. Vlasov, “Ultra-compact, low RF power, 10 Gb/s silicon Mach–Zehnder modulator,” Optics Express, vol. 15, no. 25, pp. 17 106–17 113, Dec. 2007.
31. M. Watts, D. Trotter, R. Young, and A. Lentine, “Ultralow power silicon microdisk modulators and switches,” in 2008 5th IEEE International Conference on Group IV Photonics, Sept. 2008, pp. 4–6.
32. P. Dong, S. Liao, D. Feng, H. Liang, D. Zheng, R. Shafiiha, C. Kung, W. Qian, G. Li, X. Zheng, A. Krishnamoorthy, and M. Asghari, “Low Vpp, ultralow-energy, compact, high-speed silicon electro-optic modulator,” Optics Express, vol. 17, no. 25, pp. 22 484–22 490, Dec. 2009.
33. A. Liu, R. Jones, L. Liao, D. Samara-Rubio, D. Rubin, O. Cohen, R. Nicolaescu, and M. Paniccia, “A high-speed silicon optical modulator based on a metal-oxide-semiconductor capacitor,” Nature, vol. 427, no. 6975, pp. 615–618, Feb. 2004.
34. W. D. Sacher, W. M. J. Green, S. Assefa, T. Barwicz, H. Pan, S. M. Shank, Y. A. Vlasov, and J. K. S. Poon, “Coupling modulation of microrings at rates beyond the linewidth limit,” Optics Express, vol. 21, no. 8, pp. 9722–9733, Apr. 2013.
35. D. Patel, S. Ghosh, M. Chagnon, A. Samani, V. Veerasubramanian, M. Osman, and D. V. Plant, “Design, analysis, and transmission system performance of a 41 GHz silicon photonic modulator,” Optics Express, vol. 23, no. 11, pp. 14 263–14 287, June 2015.
36. Q. Xu, B. Schmidt, S. Pradhan, and M. Lipson, “Micrometre-scale silicon electro-optic modulator,” Nature, vol. 435, no. 7040, pp. 325–327, May 2005.
37. Q. Xu, B. Schmidt, J. Shakya, and M. Lipson, “Cascaded silicon micro-ring modulators for WDM optical interconnection,” Optics Express, vol. 14, no. 20, pp. 9431–9436, Oct. 2006.
38. R. Dube-Demers, J. St-Yves, A. Bois, Q. Zhong, M. Caverley, Y. Wang, L. Chrostowski, S. LaRochelle, D. V. Plant, and W. Shi, “Analytical modeling of silicon microring and microdisk modulators with electrical and optical dynamics,” Journal of Lightwave Technology, vol. 33, no. 20, pp. 4240–4252, Oct. 2015.
39. Y.-H. Kuo, Y. K. Lee, Y. Ge, S. Ren, J. E. Roth, T. I. Kamins, D. A. B. Miller, and J. S. Harris, “Strong quantum-confined Stark effect in germanium quantum-well structures on silicon,” Nature, vol. 437, no. 7063, pp. 1334–1336, Oct. 2005.
40. Y.-H. Kuo, H.-W. Chen, and J. E. Bowers, “High speed hybrid silicon evanescent electroabsorption modulator,” Optics Express, vol. 16, no. 13, pp. 9936–9941, June 2008.
41. M. Liu, X. Yin, E. Ulin-Avila, B. Geng, T. Zentgraf, L. Ju, F. Wang, and X. Zhang, “A graphene-based broadband optical modulator,” Nature, vol. 474, no. 7349, pp. 64–67, June 2011.
42. V. J. Sorger, N. D. Lanzillotti-Kimura, R.-M. Ma, and X. Zhang, “Ultracompact silicon nanophotonic modulator with broadband response,” Nanophotonics, vol. 1, no. 1, pp. 17–22, 2012.
43. L. Chen and M. Lipson, “Ultra-low capacitance and high speed germanium photodetectors on silicon,” Optics Express, vol. 17, no. 10, pp. 7901–7906, May 2009.
44. L. Vivien, J. Osmond, J.-M. Fedeli, D. Marris-Morini, P. Crozat, J.-F. Damlencourt, E. Cassan, Y. Lecunff, and S. Laval, “42 GHz p.i.n. germanium photodetector integrated in a silicon-on-insulator waveguide,” Optics Express, vol. 17, no. 8, pp. 6252–6257, Apr. 2009.
45. S. Assefa, F. Xia, and Y. A. Vlasov, “Reinventing germanium avalanche photodetector for nanophotonic on-chip optical interconnects,” Nature Letters, vol. 464, pp. 80–84, Mar. 2010.
46. L. Alloatti, S. A. Srinivasan, J. S. Orcutt, and R. J. Ram, “Waveguide-coupled detector in zero-change complementary metal–oxide–semiconductor,” Applied Physics Letters, vol. 107, no. 4, 2015.
47. M. W. Geis, S. J. Spector, M. E. Grein, J. U. Yoon, D. M. Lennon, and T. M. Lyszczarz, “Silicon waveguide infrared photodiodes with 35 GHz bandwidth and phototransistors with 50 A W⁻¹ response,” Optics Express, vol. 17, no. 7, pp. 5193–5204, Mar. 2009.
48. K. K. Mehta, J. S. Orcutt, J. M. Shainline, O. Tehar-Zahav, Z. Sternberg, R. Meade, M. A. Popovic, and R. J. Ram, “Polycrystalline silicon ring resonator photodiodes in a bulk complementary metal-oxide-semiconductor process,” Optics Letters, vol. 39, no. 4, pp. 1061–1064, Feb. 2014.
49. H. Jayatilleka, K. Murray, M. Angel Guillen-Torres, M. Caverley, R. Hu, N. A. F. Jaeger, L. Chrostowski, and S. Shekhar, “Wavelength tuning and stabilization of microring-based filters using silicon in-resonator photoconductive heaters,” Optics Express, vol. 23, no. 19, pp. 25 084–25 097, Sep. 2015.
50. J. Mak, W. Sacher, T. Xue, J. Mikkelsen, Z. Yong, and J. Poon, “Automatic resonance alignment of high-order microring filters,” IEEE Journal of Quantum Electronics, vol. 51, no. 11, pp. 1–11, Nov. 2015.
51. J. S. Orcutt, B. Moss, C. Sun, J. Leu, M. Georgas, J. Shainline, E. Zgraggen, H. Li, J. Sun, M. Weaver, S. Urošević, M. Popovic, R. J. Ram, and V. Stojanovic, “Open foundry platform for high-performance electronic-photonic integration,” Optics Express, vol. 20, no. 11, pp. 12 222–12 232, May 2012.
52. M. Heck and J. Bowers, “Energy efficient and energy proportional optical interconnects for multi-core processors: Driving the need for on-chip sources,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 20, no. 4, pp. 332–343, July 2014.
53. J. D. B. Bradley and E. S. Hosseini, “Monolithic erbium- and ytterbium-doped microring lasers on silicon chips,” Optics Express, vol. 22, no. 10, pp. 12 226–12 237, May 2014.
54. G. Roelkens, D. V. Thourhout, R. Baets, R. Nötzel, and M. Smit, “Laser emission and photodetection in an InP/InGaAsP layer integrated on and coupled to a silicon-on-insulator waveguide circuit,” Optics Express, vol. 14, no. 18, pp. 8154–8159, Sep. 2006.
55. A. W. Fang, H. Park, O. Cohen, R. Jones, M. J. Paniccia, and J. E. Bowers, “Electrically pumped hybrid AlGaInAs-silicon evanescent laser,” Optics Express, vol. 14, no. 20, pp. 9203–9210, Oct. 2006.
56. A. W. Fang, H. Park, Y.-H. Kuo, R. Jones, O. Cohen, D. Liang, O. Raday, M. N. Sysak, M. J. Paniccia, and J. E. Bowers, “Hybrid silicon evanescent devices,” Materials Today, vol. 10, no. 7, pp. 28–35, 2007.
57. G. Roelkens, J. V. Campenhout, J. Brouckaert, D. V. Thourhout, R. Baets, P. R. Romeo, P. Regreny, A. Kazmierczak, C. Seassal, X. Letartre, G. Hollinger, J. Fedeli, L. D. Cioccio, and C. Lagahe-Blanchard, “III-V/Si photonics by die-to-wafer bonding,” Materials Today, vol. 10, no. 7–8, pp. 36–43, 2007.
58. H. Park, A. W. Fang, D. Liang, Y.-H. Kuo, H.-H. Chang, B. R. Koch, H.-W. Chen, M. N. Sysak, R. Jones, and J. E. Bowers, “Photonic integration on the hybrid silicon evanescent device platform,” Advances in Optical Technologies, vol. 2008, 2008.
59. D. Liang, A. W. Fang, H.-W. Chen, M. N. Sysak, B. R. Koch, E. Lively, O. Raday, Y.-H. Kuo, R. Jones, and J. E. Bowers, “Hybrid silicon evanescent approach to optical interconnects,” Applied Physics A, vol. 95, no. 4, pp. 1045–1057, 2009.
60. D. Liang, G. Roelkens, R. Baets, and J. E. Bowers, “Hybrid integrated platforms for silicon photonics,” Materials, vol. 3, no. 3, p. 1782, 2010.
61. G. Roelkens, L. Liu, D. Liang, R. Jones, A. Fang, B. Koch, and J. Bowers, “III-V/silicon photonics for on-chip and intra-chip optical interconnects,” Laser and Photonics Reviews, vol. 4, no. 6, pp. 751–779, 2010.
62. G.-H. Duan, C. Jany, A. Le Liepvre, A. Accard, M. Lamponi, D. Make, P. Kaspar, G. Levaufre, N. Girard, F. Lelarge, J. M. Fedeli, A. Descos, B. B. Bakir, S. Messaoudene, D. Bordel, S. Menezo, G. de Valicourt, S. Keyvaninia, G. Roelkens, D. V. Thourhout, D. J. Thomson, F. Y. Gardes, and G. T. Reed, “Hybrid III–V on silicon lasers for photonic integrated circuits on silicon,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 20, no. 4, pp. 158–170, July 2014.
63. C. Zhang, S. Srinivasan, Y. Tang, M. J. Heck, M. L. Davenport, and J. E. Bowers, “Low threshold and high speed short cavity distributed feedback hybrid silicon lasers,” Optics Express, vol. 22, no. 9, pp. 10 202–10 209, 2014.
64. D. Liang, A. W. Fang, H. Park, T. Reynolds, K. Warner, D. Oakley, and J. Bowers, “Low-temperature, strong SiO2-SiO2 covalent wafer bonding for III–V compound semiconductors-to-silicon photonic integrated circuits,” Journal of Electronic Materials, vol. 37, no. 10, pp. 1552–1559, 2008.
65. D. Liang and J. Bowers, “Highly efficient vertical outgassing channels for low-temperature InP-to-silicon direct wafer bonding on the silicon-on-insulator substrate,” Journal of Vacuum Science Technology B: Microelectronics and Nanometer Structures, vol. 26, no. 4, pp. 1560–1568, July 2008.
66. S. Messaoudene, S. Keyvaninia, C. Jany, F. Poingt, F. Lelarge, G. De Valicourt, G. Roelkens, D. Van Thourhout, F. Lelarge, J. Fedeli, and G.-H. Duan, “Low-threshold heterogeneously integrated InP/SOI lasers with a double adiabatic taper coupler,” IEEE Photonics Technology Letters, vol. 24, no. 1, pp. 76–78, 2012.
67. H.-W. Chen, A. Fang, J. Peters, Z. Wang, J. Bovington, D. Liang, and J. Bowers, “Integrated microwave photonic filter on a hybrid silicon platform,” IEEE Transactions on Microwave Theory and Techniques, vol. 58, no. 11, pp. 3213–3219, Nov. 2010.
68. H. Park, Y.-H. Kuo, A. W. Fang, R. Jones, O. Cohen, M. J. Paniccia, and J. E. Bowers, “A hybrid AlGaInAs-silicon evanescent preamplifier and photodetector,” Optics Express, vol. 15, no. 21, pp. 13 539–13 546, Oct. 2007.
69. H. Park, A. W. Fang, R. Jones, O. Cohen, O. Raday, M. N. Sysak, M. J. Paniccia, and J. E. Bowers, “A hybrid AlGaInAs-silicon evanescent waveguide photodetector,” Optics Express, vol. 15, no. 10, pp. 6044–6052, May 2007.
70. A. Fang, M. Sysak, B. Koch, R. Jones, E. Lively, Y.-H. Kuo, D. Liang, O. Raday, and J. Bowers, “Single-wavelength silicon evanescent lasers,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 15, no. 3, pp. 535–544, 2009.
71. B. R. Koch, A. W. Fang, O. Cohen, and J. E. Bowers, “Mode-locked silicon evanescent lasers,” Optics Express, vol. 15, no. 18, pp. 11 225–11 233, Sep. 2007.
72. M. Sysak, D. Liang, R. Jones, G. Kurczveil, M. Piels, M. Fiorentino, R. Beausoleil, and J. Bowers, “Hybrid silicon laser technology: A thermal perspective,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 17, no. 6, pp. 1490–1498, 2011.
73. M. N. Sysak, D. Liang, R. Beausoleil, R. Jones, and J. E. Bowers, “Thermal management in hybrid silicon lasers,” in Optical Fiber Communication Conference/National Fiber Optic Engineers Conference 2013. Optical Society of America, 2013, p. OTh1D.4.
74. G. Duan, C. Jany, A. Le Liepvre, M. Lamponi, A. Accard, F. Poingt, D. Make, F. Lelarge, S. Messaoudene, D. Bordel, J. Fedeli, S. Keyvaninia, G. Roelkens, D. Van Thourhout, D. Thomson, F. Gardes, and G. Reed, “Integrated hybrid III-V/Si laser and transmitter,” in 2012 International Conference on Indium Phosphide and Related Materials (IPRM), Aug. 2012, pp. 16–19.
75. D. Liang, M. Fiorentino, S. Srinivasan, J. Bowers, and R. Beausoleil, “Low threshold electrically-pumped hybrid silicon microring lasers,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 17, no. 6, pp. 1528–1533, 2011.
8 Reconfigurable Analog Photonic Networks
A spiking laser can be viewed as a neuron only in the context of a network. The configurable analog connection strengths between neurons, called weights, are as important to the task of network processing as the dynamical behavior of individual elements. Advances in integrated photonics, while intended for digital applications, could open the door for large-scale analog photonic systems, including weighted networks of dynamical laser neurons. Tait et al. [1] recently proposed an optical neural networking scheme called broadcast-and-weight that uses wavelength-division multiplexing (WDM).
In every neural network model, each node receives signals from many other nodes, performs some process, and transmits copies of a single output signal to multiple receiver neurons (Fig. 8.1(a)). Each input is modulated independently by a reconfigurable multiplier (also known as a weight), which can be positive, negative, or zero. After weighting, all inputs to the neuron are summed before modulating the nonlinear element: in this case, a laser neuron device. The configuration of the system is determined by its weight matrix, where element w_ij signifies the strength of the connection from neuron i to neuron j. Weight configuration happens on timescales much slower than network dynamics.
The problem of neural networking contains prominent one-to-many (multicast) and many-to-one (fan-in) components. The goal of weighted photonic network integration is to support a large number of parallel, asynchronous, reconfigurable, and high-speed connections between a distributed group of photonic processing elements, using the standard device technologies discussed in Chapter 7. An analog photonic network consists of three aspects: a protocol, a node that abides by that protocol, and a network medium that supports multiple connections between these nodes. This chapter will begin with broadcast-and-weight as a WDM protocol in which many signals can coexist in a single waveguide and all nodes have access to all the signals. Section 8.2 introduces the processing-network node (PNN) that performs the physical and logical functions required for broadcast-and-weight networking and neuromorphic processing, respectively. Section 8.3 introduces a broadcast loop (BL), which defines the medium in which a broadcast network exists and physically links a group of PNNs to one another. After the approach is introduced at a high level, aspects of feasibility will be discussed in Section 8.5. Chapters 9–11 will go into more technical depth on the demonstration, implementation, and design of weighted photonic networks for spike processing.
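The weighted-sum model described above can be written out in a few lines. This is a minimal sketch of the functional model only, not of any photonic hardware; the function name and the example values are illustrative assumptions. It follows the index convention stated in the text, where element w_ij is the strength of the connection from neuron i to neuron j.

```python
def weighted_inputs(weights, outputs):
    """Return the summed, weighted input arriving at each neuron.

    weights[i][j] is the connection strength from neuron i to neuron j,
    and outputs[i] is the current output signal of neuron i.
    """
    n = len(outputs)
    return [
        sum(weights[i][j] * outputs[i] for i in range(n))
        for j in range(n)
    ]


# Hypothetical 3-neuron network; weights may be positive, negative, or zero.
W = [
    [0.0, 0.5, -1.0],   # connections leaving neuron 0
    [1.0, 0.0, 0.25],   # connections leaving neuron 1
    [-0.5, 0.0, 0.0],   # connections leaving neuron 2
]
y = [1.0, 2.0, 4.0]     # one output signal per neuron (multicast to all)

s = weighted_inputs(W, y)
print(s)  # [0.0, 0.5, -0.5]
```

Each row of `W` describes a one-to-many multicast from a single neuron, and each column describes the many-to-one fan-in to a single neuron, which are the two components of the networking problem named in the text.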
8.1 BROADCAST-AND-WEIGHT PROTOCOL
WDM channelization of the spectrum is one way to efficiently use the full capacity of a waveguide, which can have usable transmission windows up to 60 nm (7.5 THz bandwidth) [2]. In fiber communication networks, a WDM protocol called broadcast-and-select has been used for decades to create many potential connections between communication nodes. In broadcast-and-select, the active connection is selected, not by altering the intervening medium, but rather by tuning a filter at the receiver to drop the desired wavelength [3]. Broadcast-and-weight also consists of a group of nodes sharing a common broadcast medium in which the output of every node is assigned a unique transmission wavelength (Fig. 8.1(b)). It differs from broadcast-and-select by directing multiple inputs simultaneously into one or two detectors with a continuous range of effective drop strengths between −1 and +1, corresponding to an analog weighting function.
Weighting in a broadcast-and-weight network is accomplished by a tunable spectral filter bank at each node. By tuning continuously between the 0% and 100% drop states, each filter drops a portion of its corresponding wavelength channel, thereby applying a coefficient of transmission analogous to a neural weight. The filters of a given receiver operate in parallel, allowing it to receive multiple inputs simultaneously. An interconnectivity pattern is determined by the local states of the filters and not by a state of the transmission medium between nodes. Routing in this network is transparent, parallel, and switchless, making it ideal for supporting asynchronous spiking and analog signals. The ability to control each connection, that is, each weight, independently is critical for creating differentiation among the processing elements. A great variety of possible weight profiles allows a group of functionally similar units to compute a tremendous variety of functions despite sharing a common set of available input signals.
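Two of the numbers in this section can be illustrated with short calculations. The sketch below first converts a wavelength window into a frequency bandwidth using Δf ≈ cΔλ/λ², assuming a center wavelength of 1550 nm (the text does not state the center wavelength; this value is an assumption that reproduces the quoted 7.5 THz). It then maps a filter drop fraction onto an effective weight between −1 and +1, assuming a balanced two-detector arrangement in which the dropped power is subtracted from the remaining through power. That mapping and the function names are illustrative assumptions, not the authors' stated implementation.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def window_bandwidth_hz(window_nm, center_nm=1550.0):
    """Approximate frequency bandwidth of a wavelength window.

    Uses delta_f ~= c * delta_lambda / lambda**2, valid when the window is
    small compared with the center wavelength (assumed 1550 nm by default).
    """
    window_m = window_nm * 1e-9
    center_m = center_nm * 1e-9
    return SPEED_OF_LIGHT * window_m / center_m ** 2


def effective_weight(drop_fraction):
    """Map a filter drop fraction in [0, 1] to a weight in [-1, +1].

    Assumption: the dropped power reaches one detector and the remaining
    (through) power reaches a second detector whose photocurrent is
    subtracted, so weight = drop - (1 - drop) = 2 * drop - 1.
    """
    if not 0.0 <= drop_fraction <= 1.0:
        raise ValueError("drop_fraction must lie between 0 and 1")
    return 2.0 * drop_fraction - 1.0


bandwidth_thz = window_bandwidth_hz(60.0) / 1e12
print(f"60 nm window near 1550 nm: about {bandwidth_thz:.2f} THz")

for drop in (0.0, 0.5, 1.0):
    print(f"drop {drop:.0%} -> weight {effective_weight(drop):+.1f}")
```

With a single detector only the dropped power is detected, so the same filter would realize weights between 0 and +1; the second detector is what makes negative weights available in this sketch.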
Reconfiguration of the filters’ drop states, corresponding to weight adaptation or learning, intentionally occurs on timescales much slower (µs or ms) than spike signaling (ps). A reconfigurable filter could, for example, be implemented by a microring resonator whose resonance is tuned thermally or electronically. In a group of N nodes with N wavelengths, each node needs a dedicated weighting filter for all (N − 1) possible inputs plus one filter at its own wavelength to add its output to the broadcast medium. The total number of filters in the system would thus scale quadratically, as N². Analysis of scaling and design for filter banks is given in Section 9.4.
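The filter count follows directly from the description above: (N − 1) weighting filters plus one add filter per node, repeated over N nodes. A minimal sketch of that arithmetic, with illustrative function names, is:

```python
def filters_per_node(num_nodes):
    """Filters needed by one node: (N - 1) weighting filters plus
    one filter at its own wavelength to add its output to the medium."""
    return (num_nodes - 1) + 1


def total_filters(num_nodes):
    """Total filters in a fully connected broadcast-and-weight group."""
    return num_nodes * filters_per_node(num_nodes)


for n in (4, 16, 64):
    print(f"N = {n:3d}: {filters_per_node(n):3d} filters per node, "
          f"{total_filters(n):5d} filters in total")
```

Doubling the number of nodes quadruples the total filter count, which is why filter-bank scaling is treated separately in Section 9.4.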
8.2 PROCESSING-NETWORK NODE In a biological nervous system, the complicated structure of physical wires (i.e., axons) connecting neurons largely determines the network interconnectivity pattern, so the role of neurons is predominantly computational (e.g., weighted addition, thresholding, spiking). In Chapter 6, semiconductor excitable lasers capable of some of these behaviors were discussed. However, participating in an optical broadcast network saddles photonic neurons with additional responsibilities of network control (configurable routing, wavelength conversion, WDM signal generation, etc.). A subcircuit that can perform both sets of functions is called a processing-network node (PNN). Figure 8.1 compares the computational and optoelectronic functions in a broadcast network of PNNs, and Fig. 8.2 illustrates signal flow along the ultrafast pathway of a single PNN. A WDM input signal is weighted by spectral filters, which effectively configure the analog network. Weighted WDM signals are summed via O/E conversion in which distinct wavelength information is intentionally destroyed. The generated photocurrent modulates a nonlinear processor, which also serves as an E/O converter. The single wavelength output can then be multiplexed and broadcast to similar PNNs. While the rest of this chapter focuses on a specific version of the PNN, Section 10.3 explores a wide variety of different formulations that abide by the concept of a PNN.
Figure 8.1 An optical broadcast-and-weight network showing parallels with the neural network model. (a) Functional model of a spiking neural network, depicting four neurons. Each neuron has one output signal, which is sent to multiple other neurons. Input signals are independently weighted by an analog coefficient (represented by grayscale value) before summation. The summed signal drives a dynamical processing model, such as spiking leaky integrate-and-fire (represented by the phase portrait of an excitable system). Copyright 2014 IEEE. Adapted and reprinted, with permission, from Tait et al. J. Lightwave Technol. 32, 3427–3439 (2014) Ref. [1]. (b) Broadcast-and-weight network. An array of source lasers outputs distinct wavelengths (represented by solid color). These channels are wavelength multiplexed (WDM) in a single waveguide (multicolor). Independent weighting functions are realized by tunable spectral filters at the input of each unit. Demultiplexing does not occur in the network. Instead, the total optical power of each spectrally weighted signal is detected, yielding the sum of the input channels. The electronic signal is transduced to an optical signal.
Processing Network Node essential features
1. Weighting: the ability to configure and reconfigure the strength of influence that each spiking laser element has on other elements.
2. Fan-in: the ability to combine weighted signals from many sources into a single physical variable, which can then modulate a spiking element.
3. Nonlinear/dynamical operation: for spiking networks, we require a set of properties listed in Section 2.2: integration, thresholding, reset/refractoriness, and clean pulse generation. Spiking dynamics observed in photonic devices have been reviewed in Chapter 6. Non-spiking approaches must meet criteria for cascadability.
4. Cascadable output generation: the ability of one laser element to produce signals that are physically capable of modulating several other elements; this includes sufficient power and the correct wavelength.
8.2.1 WDM WEIGHTED ADDITION The PNN interacts with a WDM waveguide via two tunable filter banks. One filter bank represents the weights of excitatory (positive) input connections while the other controls inhibitory (negative) inputs. These weight profiles could be stored in local co-integrated or off-chip CMOS memory. The two weighted (i.e., spectrally filtered) subsets of the broadcast channels are dropped, without demultiplexing, to a balanced photodiode pair. Photodetectors output a current that represents total optical power, thus computing the weighted sum of WDM inputs in the process of transducing them to an electronic signal, which is capable of modulating a laser device. The balanced photodiode configuration enables inhibitory weighting, which is an essential capability for analog and neural networks based on weights. Fan-in (i.e., many-to-one coupling) is the defining elemental operation in neural networks. The ability to combine multiple signals from many sources within a processing network allows the overall distributed system to be radically more complex than its constituent elements. Usually, addition serves as the fan-in function in artificial neural networks. Coherent mutual interference effects are very relevant for optical neural networks, because of the required fan-in. Weighting that is tunable is also a necessary part of any neural network because this ability to reconfigure the network allows a tremendous variety of different behaviors to be exhibited and tasks to be performed. Tunability within individual processing elements (e.g., laser bias control) is also useful for some computational neuroscience tasks; however, it always takes a back seat to weight configurability.
Figure 8.2 Signal pathway in a PNN. Multiple inputs carried by distinct wavelengths λ1, λ2, λ3 enter a spectral filter with some transmission function T(λ), which effectively applies individual weights w1, w2, w3 to each signal. The result impinges on a photodetector, which responds to the sum of weighted signals. The resulting current signal drives an electro-optic transducer, which could be a laser neuron or modulator.
8.2.2 TOTAL POWER DETECTION Total optical power detection of a still-multiplexed signal is a relatively rare technique because it irreversibly strips WDM signals of any trace of their identifying wavelength. This property has been exploited in several applications including subcarrier optical multiplexing [4], a multi-input OR function [5], and analog RF photonic signal processing [6]; nevertheless, it is counterproductive in the majority of situations. Information about a signal’s origin is desirable in multiwavelength communication systems and is maintained by demultiplexing prior to detection. In the neurocomputing context, however, this destruction of channel information corresponds precisely to the summation function. A photodiode can therefore be viewed as serving a dual purpose, not just as a transducer but also as an additive computational element capable of many-to-one wavelength fan-in.
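The weighted addition performed by the balanced photodiode pair can be sketched numerically. The following Python example is a minimal illustration under several assumptions that are not taken from the text: filters are ideal, both photodiodes share one responsivity, powers are in arbitrary units, and each signed weight in [-1, +1] is realized by dropping that fraction of the channel into either the excitatory bank (positive weights) or the inhibitory bank (negative weights). All function and parameter names are hypothetical.

```python
import numpy as np


def split_weights(weights):
    """Map signed weights in [-1, 1] to excitatory and inhibitory drop fractions."""
    w = np.asarray(weights, dtype=float)
    if np.any(np.abs(w) > 1.0):
        raise ValueError("weights must lie in [-1, 1]")
    drop_exc = np.clip(w, 0.0, None)    # positive part goes to the excitatory bank
    drop_inh = np.clip(-w, 0.0, None)   # negative part goes to the inhibitory bank
    return drop_exc, drop_inh


def balanced_photocurrent(channel_powers, weights, responsivity=1.0):
    """Net current of a balanced photodiode pair (assumed ideal detectors).

    Each photodiode sees only total optical power, so the wavelength identity
    of the channels is lost and the result is a single weighted sum.
    """
    powers = np.asarray(channel_powers, dtype=float)
    drop_exc, drop_inh = split_weights(weights)
    current_exc = responsivity * np.sum(drop_exc * powers)
    current_inh = responsivity * np.sum(drop_inh * powers)
    return current_exc - current_inh


# Three WDM channels with one inhibitory weight.
print(balanced_photocurrent([1.0, 2.0, 0.5], [0.5, -0.25, 1.0]))
```

In this idealized picture the net photocurrent equals the responsivity times the dot product of the weight vector and the channel power vector, which is the summation that a neural fan-in requires.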
Figure 8.3 A processing-network node (PNN) is coupled to a broadcast waveguide. The front end consists of two banks of continuously tunable microring drop filters that partially drop WDM channels that are present. Two waveguide integrated photodetectors (PDs) convert the optical signal to an electronic current and perform summation operations on the weighted excitatory and inhibitory inputs. A short wire subtracts these photocurrents and modulates current injection into an excitable laser neuron, which performs threshold detection and pulse formation in an optical cavity. The output of the laser is coupled back into the broadcast waveguide and sent to other PNNs. Insets represent example spectrograms of the waveguides. (a) Broadcast waveguide with 6 WDM channels: (b) three of these channels are shown partially dropped into the excitatory PD, and (c) two other channels are shown partially dropped into the inhibitory PD. The channel subsets that are dropped are determined by the tuning state of each filter (driving circuitry not shown). Copyright 2014 IEEE. Adapted and reprinted, with permission, from Tait et al. J. Lightwave Technol. 32, 3427–3439 (2014) Ref. [1].
The PNN front-end is not subject to well-known optical-electronic-optical (O/E/O) conversion overhead. The cost, energy, and complexity typically involved in O/E/O are due not, in fact, to the physical transduction itself but instead to the electronic receiver stages (i.e., amplification, sampling, and quantization) that normally follow detection in fiber communication links [7]. The “receiver-less” pathway connecting photodiodes to laser neuron is not significantly affected by dispersion or electromagnetic interference (EMI) in this case because it can be made very short (∼20 µm) regardless of fan-in degree. A structure that met these conditions was proposed in Ref. [8] and the physics of the O/E/O conversion was recently explored in Ref. [9]. The electronic signal from the balanced photodetector pair modulates a laser processor, which performs some dynamical and strongly nonlinear process, described in more detail in Refs. [9, 10]. The modulated laser gain medium is an active optical semiconductor, which acts as a subthreshold temporal integrator with a time constant equal to the carrier recombination lifetime. The laser system itself acts as a threshold detector, rapidly dumping
energy stored in the gain medium into the optical mode when the net gain of the cavity crosses unity, much like a passively Q-switched laser biased below threshold. In this way, it emulates one of the most critical dynamical properties of a spiking neuron – excitability – on picosecond timescales. Although the possibility of WDM was not explicitly discussed in prior work, the lasing wavelengths of an array of excitable distributed feedback (DFB) lasers could be tailored by altering the pitch of their gratings [11]. The wires connecting the photodiodes to the laser are roughly analogous to passive dendritic conduction, with the key difference that there is only one wire regardless of the number of input channels. Although transmission lines can suffer from many effects that render them unsuitable for high-bandwidth interconnects, the sensitivity-bandwidth of the neural front end is not significantly reduced during electronic conversion and passive processing. Distortions introduced by impedance mismatch, attenuation, dispersion, and radiative interference coupling are all negligible for wires much shorter than the transmission line wavelengths of the signals of interest. A cointegrated wire design is employed to keep the electronic connection local. A 20 µm wire exhibits lumped circuit behavior up to nearly 10 THz, and does not introduce significant transmission line distortion for sub-THz signals. This link is demonstrated and further analyzed in Chapter 10.
8.2.3 NONLINEAR E/O CONVERSION Any proposal for a computational primitive must address the issue of practical cascadability, especially if multiple wavelength channels are intended to be used. Cascadable output generation encompasses some logical notions covered by spiking (e.g., logic-level restoration), but also includes physical requirements. One solution to the wavelength cascadability issue is to perform additive fan-in processing and modulation injection in the electronic domain.
By generating clean, stereotyped pulses at a single wavelength, the laser provides an optical signal that can be received by many other PNNs. A nonlinear and/or dynamical process is also required to curtail the propagation of analog noise in a large-scale computation. All linear, analog processors are limited in complexity because signal-to-noise ratio (SNR) degrades in linear stages. There must be some way to increase the SNR at each stage. This can be thought of as the basic idea underlying the nonlinear activation function of every neural model; however, spiking does more than just amplitude noise rejection. When spike codes are to be used, spike energy normalization is the basic nonlinearity, but the pulse shape and width must also be regenerated in order to garner the benefits of spike coding in the first place. The devices discussed above demonstrate the essential properties of spiking dynamics applied to optoelectronic physical variables, one step towards spike processing networks. Other implementations of nonlinear and/or dynamical processes that can be combined with E/O conversion are discussed in Section 10.3. Finally, an output coupler adds the generated signal to the broadcast waveguide. Other wavelengths are nominally unaffected by this coupler, but any incoming signals at the PNN’s assigned wavelength will be completely dropped and terminated, avoiding collision with the newly generated output.
Table 8.1 Correspondence Between Computing and Networking Functions in the Primary Signal Pathway
Element            Process Function           Network Function
Spectral filter    Weight multiplication      WDM circuit routing
Photodetector      Addition/subtraction       Multiwavelength fan-in
O/E link           Temporal integration       Laser modulation
E/O converter      Spiking or nonlinearity    Clean signal generation
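The integration, thresholding, and reset behavior attributed to the laser neuron in this section can be illustrated with an abstract leaky integrate-and-fire model. The sketch below is not a model of the laser rate equations: it assumes a single dimensionless state variable, forward-Euler time stepping, and an instantaneous reset, and every name and parameter value is hypothetical. The state plays the role of the gain-medium energy, the time constant `tau` plays the role of the carrier recombination lifetime, and a threshold crossing stands in for pulse emission.

```python
def simulate_lif(input_current, dt, tau, threshold, v_reset=0.0, refractory=0.0):
    """Abstract leaky integrate-and-fire neuron (dimensionless units).

    Returns the list of spike times and the state trace, one sample per step.
    """
    v = v_reset
    refractory_left = 0.0
    spike_times = []
    trace = []
    for step, i_in in enumerate(input_current):
        if refractory_left > 0.0:
            # Input is ignored while the neuron recovers from a spike.
            refractory_left -= dt
            v = v_reset
        else:
            # Leaky integration: decay with time constant tau, driven by input.
            v += dt * (-v / tau + i_in)
            if v >= threshold:
                spike_times.append(step * dt)
                v = v_reset               # release stored energy as a pulse
                refractory_left = refractory
        trace.append(v)
    return spike_times, trace


steps = 1000
weak_spikes, _ = simulate_lif([0.5] * steps, dt=0.01, tau=1.0, threshold=1.0)
strong_spikes, _ = simulate_lif([2.0] * steps, dt=0.01, tau=1.0, threshold=1.0)
print(len(weak_spikes), len(strong_spikes))
```

With these assumed values the weak input settles below threshold and produces no output, while the strong input produces a regular train of identical events. This is the amplitude-restoring behavior that limits the propagation of analog noise from stage to stage.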
8.3 BROADCAST LOOP The final aspect of the proposed networking architecture is the physical medium that transports WDM optical signals between the output couplers and input spectral filter banks of a group of PNNs. Since routing is already performed by the PNN filters, the broadcast medium must simply implement an all-to-all interconnection, supporting all N² potential (not necessarily actual) connections between participating units. This role can be performed by a single integrated waveguide with ring topology, which we refer to as a broadcast loop (BL). A broadcast-and-weight cell thus consists of several PNN primitives coupled to a BL medium, as illustrated in Fig. 8.4. Its ring shape is reminiscent of metropolitan fiber networks; the implications the BL has for the scalability and modularity of analog networks on chip are considered further in Section 8.4. The BL waveguide is fully multiplexed at all points along its length. Most signal power is allowed to continue through a PNN, even if a portion of it is dropped. This technique, called drop-and-continue, is an instance of lightpath splitting, where the information carried by an optical channel can be copied passively and instantaneously, albeit with a reduction in power [12]. The weight-dependent signal power distribution of drop-and-continue does create an undesirable interdependency between filter weights at different neurons, which could present a control problem in adaptive systems. Drop-and-continue is a physical solution to optical multicasting that can radically reduce network traffic for a given virtual interconnect density [13]. In the BL, this technique reaches its maximum potential, supporting N² independent interconnections in a waveguide with only N channels. An example of a folded layout for tight packing is shown in Figure 8.5. Channel wavelength allocation is the only protocol defining channels in a subnetwork. Due to a lack of other constraints, this allocation can be flexibly altered even in different subnetworks on the same chip. In some cases, it may be advantageous to have a relatively small number of high-speed neurons preprocess an input before a more densely multiplexed network of slower neurons performs higher-level processing on preprocessed information synthesized from many independent sources. The allocation of channels can be controlled by the properties of the filter coupling photonic neuron output into the broadcast subnetwork.
Figure 8.4 Correspondence between fully connected network and optical implementation. The number of ‘synapses’ (i.e., connections from one node to another) puts an upper bound on the computational complexity and reconfigurability of a given network. A wavelength division multiplexed (WDM) networking approach provides an elegant solution to the networking bottleneck. (a) Example of a fully interconnected network with N = 4 nodes (minus self-referent connections). In this configuration, every neuron fans out to three other neurons. Naively, this would require twelve separate wires. (b) In an optical implementation based on WDM, only one optical waveguide is required. A single waveguide could potentially support several hundred virtual nodes simultaneously, corresponding to tens of thousands of simultaneous interconnects.
Spatial Layout A BL waveguide can potentially take any shape to accommodate any layout of a group of PNNs. This contrasts with many approaches to physical neuromorphic architectures (e.g., crossbar arrays or holographic matrix-vector multipliers), where the layout of computational primitives follows from the particular networking approach. In a situation where signals are distinguished based on their position, wire, or wavevector, physical layout inherits the geometrical constraints of the interconnect, which can give rise to tangible limitations to interconnect structure (e.g., Rent’s rule [14]). Biology can avoid multiplexing altogether by using dedicated wires (i.e., axons) for every connection. However, this 3-dimensional approach is not possible with state-of-the-art
Figure 8.5 Example folded layout of a broadcast-and-weight cell showing 5 PNNs (delimited by green areas). The lightpath of one channel (blue, dashed) is shown traversing the BL waveguide and branching into multiple filter banks. Originating and terminating in the leftmost PNN, this signal can be partially dropped into any of the PNNs around the BL. Each processing node must transmit on a unique wavelength channel, and each node’s filter bank can drop a linear superposition of the present channels, resulting in a fully reconfigurable all-to-all interconnect. Inhibitory pathways not shown.
(quasi-)2D fabrication techniques. While the exact implications of this dimensional disparity are beyond the current scope, one can assert provisionally that any conservation of spatial degrees of freedom could be supremely important in integrated system layout. In broadcast-and-select as well, spatial degrees of freedom are essentially undetermined: node identity is distinguishable based on wavelength alone. In terms of length scales, the large bandwidth-distance product of optical waveguides means the corrupting role of dispersion remains small over a wide range of spatial scales, compared to electrical transmission lines [15]. Although WDM and bandwidth-distance properties of optics have been used for decades in communication networks, distributed processing consequences of spatial indeterminacy have not been explored. This is not a matter of oversight but rather context. Fiber telecommunication networks transport signals between geographic locations, a purpose intrinsically tied to space. On the other hand, processing networks transport signals between a group of computational nodes; it makes no essential difference where its nodes or its signals are located. At any spatial scale, BL implementation relies on an identical device repertoire (i.e., filters, photodetectors, and excitable lasers). Spatial invariance in multiplexing protocol, signal transmission, and device technology – in the context of distributed processing – results in the possibility to implement interesting and important structures in multi-BL architectures.
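The interdependency that drop-and-continue creates between filter weights at different nodes can be made concrete with a short calculation. The Python sketch below follows one wavelength channel around the loop under simplifying assumptions that are not from the text: the waveguide and filters are lossless, each PNN has a single filter acting on the channel, and PNNs are visited in a fixed order. The function name and the example values are hypothetical.

```python
def drop_and_continue(launch_power, drop_fractions):
    """Follow one channel past a sequence of PNN filters.

    Each filter drops a fraction of whatever power reaches it and lets the
    rest continue. Returns the power received at each PNN and the power
    left in the waveguide after the last one.
    """
    remaining = float(launch_power)
    received = []
    for fraction in drop_fractions:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("drop fractions must lie in [0, 1]")
        received.append(fraction * remaining)
        remaining *= 1.0 - fraction
    return received, remaining


# The second PNN keeps the same filter setting in both cases, yet the power it
# receives changes when the first PNN retunes its own filter.
print(drop_and_continue(1.0, [0.5, 0.5]))
print(drop_and_continue(1.0, [0.1, 0.5]))
```

In this idealized model the effective weight seen by a downstream node is its own drop fraction multiplied by the transmission of every upstream filter on that channel, so a weight-setting procedure would have to account for upstream settings or compensate for them.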
8.4 MULTIPLE BROADCAST LOOPS Multiple BLs integrated on the same chip could interact by designating interfacial PNNs: nodes that receive inputs from one BL and transmit into another (Fig. 8.6). In this way, a unified processing system consisting of multiple BLs can be created without any additional arbitration, routing, or device technology. BLs interacting via interfacial PNNs constitute distinct broadcast media and can thus reuse the same optical spectrum, much like a cellular telephone network reuses spectrum geographically. Unlike a cellular phone network, however, the operation of these broadcast media is dissociated from their exact geometry, as long as the loop topology is present. The associated spatial freedoms will be seen to yield many possibilities for multi-BL architectures in Chapter 11. Although PNNs in different loops can interact indirectly via interfacial PNNs, a multi-BL system does not exhibit the same all-to-all potential interconnection observed in a single BL. This could cause informatic fragmentation and bottlenecks between different parts of a system with many interfaced BLs, effectively neutralizing the computational usefulness of scaling the node count. We argue that interconnect sparsity resulting from spectral reuse is not necessarily detrimental to overall computational complexity, provided design can follow appropriate principles. When determining structural constraints in distributed processing networks, communication and computation become fundamentally intertwined, so design rules for organizing multi-BL architectures must shift to invoke concepts outside of the field of communication networks [16], which is further discussed in Section 11.4. We find that the ability to incorporate these distributed processing principles in an optical system is made possible by a special topological property of broadcast-and-weight, which we call spatial layout freedom. Fig. 
8.7 illustrates a multi-BL structure, demonstrating key features of hierarchical organization. Each BL reuses the same spectrum and WDM channelization, but can represent different hierarchical levels of organization. A level-1 BL interfaces with other level-1 BLs (via “lateral” PNNs) and a level-2 BL (via “uplink” and “downlink” PNNs). Interfacial PNNs can be thought of as regular PNNs whose input spectral weight bank receives the broadcast signals of a different BL (Fig. 8.5). While similar in some ways to routing interfaces in conventional optical communication networks (which can also have hierarchical organizations), the PNN interfaces are spike processors that intrinsically transform information while transporting it. As a result of the
Figure 8.6 Example of a PNN interface between 2 BLs showing 6 non-interfacial PNNs (green areas) and 2 interfacial PNNs (blue areas). A total of 4 wavelengths are reused twice, although the overall network is no longer all-to-all. Output wavelength of each PNN is indicated by the color of its add coupler immediately following its E/O converter (gray boxes). Inhibitory pathways not shown. b) Logical connection diagram of this 2-BL system, where circle color indicates the wavelength of the corresponding PNN. Non-interfacial PNNs are connected subcomponents (black arrows). Interfacial PNNs receive inputs from one BL and project outputs (blue arrows) to the PNNs in the opposite BL.
processing done in PNN interfaces, network nodes in a given BL cannot directly send their outputs to nodes in other BLs, and multi-BL systems can no longer implement all-to-all interconnects. Instead of attempting to faithfully transfer any one signal from one BL to another, the PNN interfaces create mutual informatic relationships that extend beyond BL boundaries.
Figure 8.7 Hierarchical organization of the waveguide broadcast architecture showing a scalable modular structure. Colored rectangles represent PNNs. Green PNNs indicate input and output coupling to the same broadcast loop. Blue PNNs interface between distinct BLs and are classified as “uplink,” “downlink,” or “lateral” varieties based on their position in the hierarchy. Each transmitting PNN has a unique output wavelength within its given broadcast space, but spectrum is reused between different BLs. Copyright 2014 IEEE. Adapted and reprinted, with permission, from Tait et al. J. Lightwave Technol. 32, 3427–3439 (2014) Ref. [1].
Figure 8.8 Overview of the equivalence between hierarchical broadcast loops and small-world architecture. (a) Sketch of PNN showing optical inputs and outputs as pulses. (b) Sketch of the corresponding neuron. (c) Many PNNs attached to the same broadcast loop can communicate to each other via wavelength-encoded pulses, forming (d) an all-to-all recurrently interconnected neuron cluster. (e) Interfacial PNNs, instead of outputting back into the same broadcast loop from which they receive their inputs, can also output into other broadcast loops, connecting them. This leads to (f) connection between topologically neighboring clusters and (g) hierarchical network organization, allowing for rapid transaction of pulses between topologically distant clusters. (h) A complex interconnection pattern of photonic neurons can be created. (h) From Merolla et al. Science 345, 668–673 (2014) Ref. [17]. Reprinted with permission from AAAS.
At the same time, PNN interfaces do not experience additional buffering or wavelength allocation constraints, and the BL communication load is constant across different levels of the hierarchy instead of growing exponentially as in pure communication networks. Other multi-BL layouts and their corresponding virtual networks are shown in Fig. 8.8. Figure 8.9 shows a layout that corresponds to the network diagram of Fig. 8.7. The lowest level is a tightly packed group of computational primitives connected by a folded loop (Fig. 8.9(c)). Some computational primitives interface with other loops, either directly with nearby first-level loops, or with a second-level loop that connects physically distant components on the chip scale. The second-level loop (Fig. 8.9(d)) has functionality similar to that of the first level, but it occupies a much larger area and represents a more complex dynamical processing network. Although the chip scale corresponds to just the second level in this example, intermediate levels on chip are entirely possible. Continuing in this direction of hierarchical levels, a multi-chip system based on fiber loops (Fig. 8.9(e)) could be considered. Interfacing multiple silicon photonic chips all-optically through an intra-board or intra-rack optical waveguide represents an interesting possibility for future investigation. Spatial layout freedom can be viewed as a powerful tool to combat the sparse interconnection constraints inherent in multi-BL spectral reuse and allow a wide potential variety of system organizations. However, determining particular multi-BL organizations and the number of PNNs allocated at each interface represent significant design challenges. Design parameters that impact network structure fundamentally exceed pure communication theory and must invoke theories of distributed computation, such as functional neural networks and/or cortical organization. These topics will be revisited in Section 11.4.
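The loss of all-to-all connectivity in a multi-BL system can be checked by enumerating potential connections. The Python sketch below encodes a two-loop arrangement like the one described for Fig. 8.6 (six non-interfacial and two interfacial PNNs) using a hypothetical data structure: each PNN records the loop it listens on and the loop it transmits into, and a potential connection exists wherever a transmitter's loop matches a receiver's loop. Node names are invented for illustration.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class PNN:
    name: str
    listens_on: int    # index of the broadcast loop feeding its weight banks
    transmits_on: int  # index of the broadcast loop receiving its output


def potential_links(nodes):
    """Ordered (source, target) pairs that share a broadcast loop."""
    return {
        (a.name, b.name)
        for a, b in permutations(nodes, 2)
        if a.transmits_on == b.listens_on
    }


nodes = (
    [PNN(f"A{i}", 1, 1) for i in range(3)]      # non-interfacial PNNs on loop 1
    + [PNN(f"B{i}", 2, 2) for i in range(3)]    # non-interfacial PNNs on loop 2
    + [PNN("I12", 1, 2), PNN("I21", 2, 1)]      # interfacial PNNs between loops
)

links = potential_links(nodes)
all_to_all = len(nodes) * (len(nodes) - 1)
print(len(links), "of", all_to_all, "directed connections are available")
print(("A0", "B0") in links, ("A0", "I12") in links, ("I12", "B0") in links)
```

In this encoding a node on loop 1 cannot reach a node on loop 2 directly, but it can influence one through an interfacial PNN, which processes the signal rather than relaying it. Each loop still carries four transmitters, so the same four wavelengths can be reused in both loops.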
8.5 DISCUSSION Using dynamical systems for computing relies heavily on the ability to control complex behavior. The physical dynamics of electro-optic systems, laser systems, and nonlinear optical devices constitute a fascinating field of analysis with few parallels in electronics [18–22], yet no optical system has emerged as a clear winner in information processing applications. To solve a computing task, a dynamical system must be complex; it must possess enough state variables and configuration parameters to exhibit an extremely large repertoire of behaviors. This is not often the case for distributed nonlinearity systems, as they eventually become easy to describe in terms of (i.e., reduce to) simpler processes. With a sufficiently large repertoire, there is a high likelihood that one of the possible behaviors performs the computing task at hand; however, developing methods for picking this particular behavior out of an enormous repertoire presents a significant challenge. This is the problem of programmability, which is exacerbated as complexity increases.
Figure 8.9 An example layout strategy for a hierarchical network demonstrating the scale-independent nature of a waveguide BL. (a) An interfacial PNN, whose output is coupled into a different BL waveguide than its inputs. (b) A non-interfacial PNN, which transmits and receives in the same BL. (c) A broadcast-and-weight network constitutes the first level of hierarchy and consists of a group of potentially all-to-all connected PNNs. A folded layout can be used for the sake of packing efficiency. (d) A chip-scale second-level broadcast network interconnects the interfacial PNNs from many first-level BLs. First-level BLs can also interface directly via lateral interfacial PNNs (purple dotted lines). (e) A multi-chip third level network illustrating a compatibility with fiber implementations of a BL. The broadcast-and-weight network is conceptually the same as in other levels, but the BL waveguide consists of coupled fibers and integrated waveguides. Copyright 2014 IEEE. Adapted and reprinted, with permission, from Tait et al. J. Lightwave Technol. 32, 3427–3439 (2014) Ref. [1].
Neural network models are a promising candidate for bridging the gap between dynamical systems and computing, to a large extent due to the amount of study they have received from various fields over many decades. A network is a combinatoric object that provides a modular way to continually increase
the number of dynamical system variables (neuron states) and configurable parameters (network weights). An extensive, interdisciplinary knowledge base surrounds neural network algorithms, application, and programming. Network algorithms continue to be an active area of research in both software [23] and hardware [24]. The current field of neuromorphic electronics relies heavily on the decades of known techniques and strategies for programming neural networks. In a similar way, an optical system capable of providing a configurable analog network model could leverage much of this knowledge.
8.6 SUMMARY The broadcast-and-weight architecture draws together principles of fiber optic communication, techniques of computational neuroscience, and recent technical advances in photonic system manufacturing. A reconfigurable processing-network node was proposed to grant networking functionality to a recently developed excitable laser processor, which behaves dynamically like a spiking neuron model. The PNN is a circuit method: it can be implemented with existing standard devices but could generalize to incorporate more advanced technologies, or even electronic dynamical units. By combining spike processing with WDM, a broadcast loop network exhibits a spatial flexibility that enables scalable spectrum reuse with great potential for organizational variety. An architecture of interfaced BLs appears to address many of the challenges encountered in prior proposals for scalable and feasible optical information processing, due in large part to particular correspondences between physical processes in optoelectronics and behavioral functions in the spiking model. The following three chapters contain a more technical account of some of the central and novel aspects of broadcast-and-weight processing networks. First, Chapter 9 reviews recent progress on silicon photonic weight banks, including control methods, quantitative analysis, and simulation techniques. Chapter 10 further considers the performance of the PNN, focusing particularly on the electrical signal pathway between detector and E/O converter or laser neuron. Finally, Chapter 11 introduces system design principles for analog photonic processing networks and multi-BL systems.
8.7 REFERENCES 1. A. N. Tait, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Broadcast and weight: An integrated network for scalable photonic spike processing,” J. Lightw. Technol., vol. 32, no. 21, pp. 3427–3439, Nov. 2014. 2. K. Preston, N. Sherwood-Droz, J. S. Levy, and M. Lipson, “Performance guidelines for wdm interconnects based on silicon microring resonators,” in CLEO:2011 - Laser Applications to Photonic Applications. Optical Society of America, 2011, p. CThP4. 3. R. Ramaswami, “Multiwavelength lightwave networks for computer communication,” Communications Magazine, IEEE, vol. 31, no. 2, pp. 78–88, 1993.
4. T. Wood and N. K. Shankaranarayanan, “Operation of a passive optical network with subcarrier multiplexing in the presence of optical beat interference,” Journal of Lightwave Technology, vol. 11, no. 10, pp. 1632–1640, 1993. 5. Q. Xu and R. Soref, “Reconfigurable optical directed-logic circuits using microresonator-based optical switches,” Optics Express, vol. 19, no. 6, pp. 5244–5259, Mar. 2011. 6. J. Chang, Y. Deng, M. P. Fok, J. Meister, and P. R. Prucnal, “Photonic microwave finite impulse response filter using a spectrally sliced supercontinuum source,” Applied Optics, vol. 51, no. 19, pp. 4265–4268, July 2012. 7. D. A. B. Miller, “Are optical transistors the logical next step?” Nat. Photon, vol. 4, no. 1, pp. 3–5, Jan. 2010. 8. M. A. Nahmias, A. N. Tait, B. J. Shastri, and P. R. Prucnal, “An evanescent hybrid silicon laser neuron,” in Photonics Conference (IPC), 2013 IEEE, Sept. 2013, pp. 93–94. 9. M. A. Nahmias, A. N. Tait, B. J. Shastri, T. F. de Lima, and P. R. Prucnal, “Excitable laser processing network node in hybrid silicon: analysis and simulation,” Optics Express, vol. 23, no. 20, pp. 26 800–26 813, Oct. 2015. 10. M. A. Nahmias, B. J. Shastri, A. N. Tait, and P. R. Prucnal, “A Leaky Integrate-and-Fire Laser Neuron for Ultrafast Cognitive Computing,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 19, no. 5, 2013. 11. A. Fang, M. Sysak, B. Koch, R. Jones, E. Lively, Y. -H. Kuo, D. Liang, O. Raday, and J. Bowers, “Single-wavelength silicon evanescent lasers,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 15, no. 3, pp. 535–544, 2009. 12. X. Zhang, J. Wei, and C. Qiao, “Constrained multicast routing in WDM networks with sparse light splitting,” Journal of Lightwave Technology, vol. 18, no. 12, pp. 1917–1927, 2000. 13. J. Psota, J. Miller, G. Kurian, H. Hoffman, N. Beckmann, J. Eastep, and A. 
Agarwal, “ATAC: Improving performance and programmability with on-chip optical networks,” in Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS), 2010, pp. 3325–3328. 14. P. Christie and D. Stroobandt, “The interpretation and application of Rent’s rule,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 8, no. 6, pp. 639–648, 2000. 15. D. A. B. Miller, “Device requirements for optical interconnects to silicon chips,” Proceedings of the IEEE, vol. 97, no. 7, pp. 1166–1185, 2009. 16. P. Merolla, J. Arthur, B. Shi, and K. Boahen, “Expandable networks for neuromorphic chips,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 54, no. 2, pp. 301–311, Feb. 2007. 17. P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser, R. Appuswamy, B. Taba, A. Amir, M. D. Flickner, W. P. Risk, R. Manohar, and D. S. Modha, “A million spiking-neuron integrated circuit with a scalable communication network and interface,” Science, vol. 345, no. 6197, pp. 668–673, 2014. 18. N. N. Rosanov, Spatial Hysteresis and Optical Patterns. Springer Series in
Synergetics. Springer-Verlag, Berlin, Heidelberg 2013. 19. B. Romeira, J. Javaloyes, J. Figueiredo, C. Ironside, H. Cantu, and A. Kelly, “Delayed feedback dynamics of Liénard-type resonant tunneling-photo-detector optoelectronic oscillators,” IEEE Journal of Quantum Electronics, vol. 49, no. 1, pp. 31–42, Jan. 2013. 20. M. C. Soriano, S. Ortín, D. Brunner, L. Larger, C. R. Mirasso, I. Fischer, and L. Pesquera, “Optoelectronic reservoir computing: tackling noise-induced performance degradation,” Opt. Express, vol. 21, no. 1, pp. 12–20, Jan. 2013. 21. M. Aono, M. Naruse, S.-J. Kim, M. Wakabayashi, H. Hori, M. Ohtsu, and M. Hara, “Amoeba-inspired nanoarchitectonic computing: Solving intractable computational problems using nanoscale photoexcitation transfer dynamics,” Langmuir, vol. 29, no. 24, pp. 7557–7564, 2013. 22. B. Garbin, J. Javaloyes, G. Tissoni, and S. Barland, “Topological solitons as addressable phase bits in a driven laser,” Nature Communications, pp. 1–7, 2015. 23. D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016. 24. S. Friedmann, N. Fremaux, J. Schemmel, W. Gerstner, and K. Meier, “Reward-based learning under hardware constraints - using a RISC processor embedded in a neuromorphic substrate,” Frontiers in Neuroscience, vol. 7, no. 160, 2013.
9 Photonic Weight Banks Photonic wavelength division multiplexed (WDM) weight banks are the core device associated with networking in broadcast-and-weight systems. Controlling WDM signals with tunable spectral filters enables multicast routing in optical networks; at the same time, configurable weighted addition is the crucial function describing interconnection and fan-in operations in neural networks. Photonic weight bank device performance is therefore closely tied to broadcast-and-weight system bandwidth, scalability, and reconfigurability. Microring resonators (MRRs) are a good candidate to implement photonic weight banks on-chip because of their compactness, ubiquity, and ease of tuning. MRR weight banks consist of parallel-coupled microring weights, each of which controls one of the WDM signals independently (Fig. 9.1). When an optical signal wavelength resonates with an MRR, it is completely rerouted to a “drop” port, otherwise continuing to a “through” port. When followed by a photodetector, this subcircuit can perform WDM weighted addition. The complementary output is exploited for positive/negative weighting using a balanced detector. Intermediate analog weight values are attained by tuning continuously along each filter edge, directing power to drop and through ports in a controlled ratio. The tuning speed of weights is meant to be much slower than signal bandwidth, so even thermal tuning is sufficient to configure an ultrafast network. The ability to weight inputs over the range –1 to +1 is essential for analog and neural processing, including for stability in motor pathway control [1] and coding efficiency in the visual pathway [2, 3]. Inhibition is also a requisite capability for applying the Neural Engineering Framework (NEF), a powerful methodology for compiling practical engineering functions into analog networks [4]. 
Complementary weighting can be a challenge in optical direct detection systems where signals are represented by the power envelope, which is strictly positive. RF photonic circuits based on matched filtering also require some way to effect negative weights (also known as coefficients). This application context has elicited a variety of approaches in fiber [5, 6], which have long included differential detection [7]. MRR weight banks share some similarities with other WDM devices based on MRRs, but several key differences necessitate novel engineering approaches. Figure 9.2 compares circuit diagrams, images, and transmission spectra of an example MRR add-drop (de)multiplexer (mux) and MRR weight bank. Resonator characteristics including Q-factor and finesse are closely tied to the performance of both kinds of MRR-based WDM device. The port terminology of “drop” and “through” is also shared (Figs. 9.2(a,b)). Signals that do not interact with the MRRs leave the device on the through (also known as
Figure 9.1 Photonic WDM weight bank. (a) Microring resonator (MRR) weight bank in a wavelength division multiplexed (WDM) weighted addition circuit. Tuning MRRs between on- and off-resonance switches a continuous amount of optical power between drop and through ports. A balanced photodetector (PD) yields the sum and difference of weighted signals. (b) Optical micrograph of a silicon MRR weight bank, showing a bank of four thermally-tuned MRRs. (c) Wide area micrograph, showing fiber-to-chip grating couplers [8].
“thru”) port, while on-resonance signals leave on a drop port. Unlike muxes, weight banks provide outputs that are still multiplexed up until detection because the detector is meant to compute the analog summation of these channels. This means the weight bank has only one drop spectrum with peak features attributable to all the microrings, whereas the mux has a different drop spectrum for each channel with features of only one microring on each (Figs. 9.2(e,f)). As cross-talk between different ports plays an important role in analysis, this difference was found to motivate unique performance metrics for channel density (Section 9.4.2) [9]. Another essential tenet of a weight bank is its role as a reconfigurable analog processing device (weight multiplication) as opposed to just a WDM signal routing device. In reconfigurable optical add/drop multiplexers (ROADMs), which are commonly used for digital network configuration, each MRR can be understood as a two-state cross-bar switch [10]. MRR weight banks, on the other hand, must be analyzed as continuously tunable processing devices (Section 9.4.1) [9]. The combined need of precise continuous control with
a lack of demultiplexed outputs for power monitoring precludes the use of feedback control. Because of that, calibration steps and feedforward control methods were developed for practical weight control [11]. These are extensively discussed in Section 9.2. Weight banks have two parallel WGs – referred to as “bus” WGs – that are coupled to every MRR. Photonic circuits with this geometry have been studied for tailoring certain features of optical filter responses [12–14]; however, this device topology is uncommon in MRR-based WDM devices, which typically have a single bus WG. If portions of an optical signal couple partially through more than one MRR, multiple optical paths can coherently interfere. Coherent inter-MRR effects involving the bus WGs are further discussed in Section 9.3 and found to have a significant impact on simulation requirements (Section 9.6) and performance analysis (Section 9.4.2) in MRR weight banks. This chapter will first review demonstrations of MRR weight banks, complementarity, and filter edge control. Practical control methodology will be discussed in detail in Section 9.2. Relevant physical effects applicable to weight banks will be explored in Section 9.3, and quantitative analysis incorporating these effects will be derived in Section 9.4. Mathematical modeling and computational methods important for control and analysis will be detailed in Section 9.5 and Section 9.6.2, respectively. Appendices will discuss some further directions for MRR weight bank design (Section 9.7) and practical aspects of MRR characterization and tuning (Section 9.8).
9.1 DEMONSTRATION As discussed in Section 3.3.1, any change in the refractive index of MRRs will have a dramatic effect on their transmission coefficient, due to the circulating buildup effect. In particular, we are interested in shifting the resonance peaks of MRR weight banks individually. This can be done using either of two convenient effects in silicon: the thermo-optic effect or the plasma-dispersion effect. The thermo-optic effect refers to the change in the refractive index due to a change in temperature. The typical coefficient for silicon at room temperature is $dn/dT = 10^{-4}\,\mathrm{K}^{-1}$ [16], which is enough to shift the resonance wavelength considerably due to the high Q-factor of the cavity. The plasma-dispersion effect refers to the change in the refractive index caused by a change in free-carrier (electron/hole plasma) density. This change can be triggered via carrier injection by voltage-biasing a p-i-n junction [17]. Compared to thermo-optic modulation, the plasma-dispersion effect offers faster response and lower power consumption, but requires precise doping profiles in the fabrication process. In this section, we will introduce results based on MRRs modulated by the thermo-optic effect. However, most of these results still apply to other forms of refractive index modulation. Complementary weighting using thermal resonance tuning was demonstrated in Ref. [18]. Four current sources were connected to the heating elements in the MRR weight bank, and a spectrum analyzer was used to tune
Figure 9.2 MRR add-drop (de)mux and MRR weight bank. (a) Concept and ports of MRR add/drop WDM multiplexer. Each DROP port has a demultiplexed channel. (b) Concept and ports of MRR WDM weight bank. The single DROP port is still multiplexed. (c) SEM image of two 20-channel reconfigurable MRR demultiplexers. (d) SEM image of 8-channel MRR weight bank. (e) Transmission spectra of DROP ports of the demultiplexer in (c). Color indicates port number from 1 (violet) to 11 (red). (f) Transmission spectra of drop (black) and through (gray) ports of the weight bank in (d) over two free spectral ranges (FSR). The FSR of a single resonator is indicated in red. The single DROP spectrum has 8 peaks within each FSR. The racetrack resonators in the measured device have perimeters of [30.0, 30.1, ..., 30.7] µm, and all directional coupling gaps between bus and MRR waveguides are 200 nm wide with a 2 µm long straight section. The 8 sets of resonances are roughly evenly spaced, although no thermal tuning is applied. A single filter has a measured FSR of 18.7 nm and Q-factor of 11,070. That means their finesse is 133, which is an important figure for determining the WDM channel count that would be possible using resonators of this design. (c,e) Reproduced with permission from Dahlem et al. Opt. Express 19, 306–316 (2011) Ref. [15]. Copyright 2011 Optical Society of America.
a resonance of each MRR to one of four WDM carrier wavelengths, which are spaced by 200GHz. Since there is only one drop waveguide for all of the MRRs, ascribing spectral peaks to different MRRs first required tuning them individually and recording peak shifts. The drop and through outputs are equalized in fiber to ensure both power and delay matching between positive and negative arms, before being detected in a balanced PD. The setup is further detailed in Section 9.2.1. Figure 9.3 shows resonance tuning of a single channel. Figure 9.3(a) shows that the other MRR filters are minimally affected by tuning channel 2. Electronic subtraction of the drop and through outputs in a balanced detector
Figure 9.3 Resonance tuning for complementary weighting. Current is applied to the thermal elements of the device pictured in Fig. 9.1(b) to bias the 4 filter peaks on resonance with a 4-WDM signal (blue), and to bias MRR 2 slightly off-resonance (red). (a) Power spectrum of the WDM signal (black) and weight bank DROP spectra over one FSR. (b) Magnification around channel 2. WDM channel 2 is then modulated with 800 ps pulses, and DROP and THRU outputs are coupled to a balanced PD. (c) Biasing MRR 2 on-resonance (blue) results in a net positive weight. (d) Detuning MRR 2 from resonance (red) results in a net negative weight. Copyright 2016 IEEE. Adapted, with permission, from Tait et al. IEEE Photon. Technol. Lett. 28, 887–890 (2016) Ref. [20].
results in complementary positive (Fig. 9.3(c)) and negative (Fig. 9.3(d)) effective weights. Pulse spreading in Fig. 9.3(c) is caused by on-resonance dispersion introduced by backscatter coupling in this particular resonance, which results in the visibly split resonance. By detuning continuously between the bias points of Figs. 9.3(c) and 9.3(d), a full complementary range of effective weight is attainable. Continuous tuning along the filter edge in order to effect a precise range of weights was shown in Refs. [19, 20].
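As a rough illustration of how detuning along a filter edge maps to a signed weight, the following sketch assumes an ideal lossless MRR with a Lorentzian drop response and a perfectly balanced detector, so that the effective weight is the drop transmission minus the through transmission. The function names and the 0.14 nm linewidth are illustrative assumptions, not measured values from the experiment.

```python
def drop_transmission(detuning_nm, fwhm_nm=0.14):
    """Drop-port power transmission of an ideal, lossless MRR (assumed Lorentzian)."""
    return 1.0 / (1.0 + (2.0 * detuning_nm / fwhm_nm) ** 2)


def balanced_weight(detuning_nm, fwhm_nm=0.14):
    """Effective weight at an ideal balanced PD: drop power minus through power."""
    t_drop = drop_transmission(detuning_nm, fwhm_nm)
    t_thru = 1.0 - t_drop  # lossless assumption: power not dropped goes to THRU
    return t_drop - t_thru


if __name__ == "__main__":
    # Sweep from on-resonance (weight +1) to far off-resonance (weight near -1).
    for detuning in (0.0, 0.035, 0.07, 0.14, 0.7):
        print(f"detuning {detuning:5.3f} nm -> weight {balanced_weight(detuning):+.3f}")
```

Under these assumptions the weight crosses zero when the carrier sits at the half-maximum point of the filter, which is why a single tuning degree of freedom is enough to span both signs.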
Figure 9.4 Precise, continuous control of a multi-channel MRR weight bank. (a) Two-dimensional weight sweep showing controller accuracy and precision. After the calibration procedure, the target weight is swept 5 times over a grid of values from −1 to 1 (black grid). Black points are measured weight data. Red lines show the mean offset from each target grid point. Blue ellipses indicate one standard deviation around the mean. Mean error magnitude is less than 0.072 over the span. Standard deviation remains below 0.063, with a tendency to be larger for negative weights. From this plot, the weight can be controlled with an accuracy of 3.8 bits. (b) [1-9] Output signals corresponding to points labeled in (a). The expected signal is in red, while measured traces are in blue. All time and voltage axes have identical scales.
Precise, continuous control of a multi-channel MRR weight bank was demonstrated in Ref. [11] using the 4-channel device pictured in Figs. 9.1(b,c). After a calibration procedure (Section 9.2), the commanded weight vector is swept over two dimensions at a time while the actual weight is recorded, as shown in Fig. 9.4. Figure 9.4(b) shows time traces compared to expectation at several weight values. Traces 2 and 6 represent the original inputs and traces 8 and 4 their respective inverses. The sweep in Fig. 9.4(a) is used to analyze accuracy, also known as mean error or repeatable error (red lines), and precision, also known as dynamic error or non-repeatable error (blue ellipses). Mean error is less than 0.072 over the range, corresponding to a weight accuracy of 3.8 bits, and dynamic error was less than 0.062 for a weight precision of 4.0 bits, plus one bit for sign. In other words, every MRR weight can be independently set to 32 distinguishable values between –1 and +1. Feedforward control of the filter edge in an MRR weight bank requires specialized offline calibration procedures. The following section describes techniques for a single channel as well as model-based approaches for dealing with more channels and higher dimensions without exponential growth of calibration times.
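The conversion from a weight error to a resolution in bits can be checked with a short calculation. The sketch below assumes the convention that the number of bits is the base-2 logarithm of the normalized weight range (1.0) divided by the error; the function name is illustrative.

```python
import math


def resolution_bits(error, weight_range=1.0):
    """Resolution in bits, assuming bits = log2(weight range / error)."""
    return math.log2(weight_range / error)


accuracy_bits = resolution_bits(0.072)   # mean (repeatable) error quoted above
precision_bits = resolution_bits(0.062)  # dynamic (non-repeatable) error quoted above

# Magnitude bits plus one sign bit give the number of distinguishable levels.
levels = 2 ** (round(precision_bits) + 1)

if __name__ == "__main__":
    print(f"accuracy  = {accuracy_bits:.1f} bits")
    print(f"precision = {precision_bits:.1f} bits")
    print(f"levels    = {levels}")
```

With the quoted errors this reproduces the 3.8-bit accuracy, 4.0-bit precision, and 32 levels stated in the text.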
9.2 CONTROL OF MRR WEIGHT BANKS Sensitivity to fabrication variations, thermal fluctuations, and thermal crosstalk has made MRR control an important topic for WDM demultiplexers [10], high-order filters [21], modulators [22], and delay lines [23]. Commonly, the goal of MRR control is to track a particular point in the resonance relative to the signal carrier wavelength, such as its center or maximum slope point. Feedback control approaches are well-suited to MRR demultiplexer and modulator control [24, 25], but this is not the case for MRR weight control. An MRR weight must be biased at arbitrary points in the filter roll-off region in order to multiply an optical signal by a continuous range of weight values. WDM signals are never demultiplexed before detection, making it difficult to monitor the complete filter state. Another difficulty is that these approaches rely on having a reference signal with consistent average power. In analog networks, signal activity can depend strongly on the weight values, so these signals cannot be used as references to estimate weight values. These reasons dictate a feedforward control approach based on a pre-calibration stage in which reference signals are known [20]. 9.2.1 SETUP AND METHODS The experimental setup shown in Fig. 9.5(a) consists of a multiwavelength reference input generator [6] that produces statistically independent signals by imparting channel-dependent delays on a 2Gbps pseudo-random bit sequence (PRBS) (Fig. 9.5(b)). A 4-channel, 13-bit digital-to-analog converter (DAC), NI PCI-6723, is used to tune the electrical power dissipated in each MRR heater. The heaters share a common connection to reduce electrical I/O count. Since this common wire is not perfectly conducting, the effective common voltage can fluctuate with total current flow. Current-mode drivers are used to avoid this issue. 
The drop and through outputs of the MRR weight bank are amplified, delay-matched, and detected by a balanced photodiode (PD). A transmission spectrum analyzer (not shown) is also connected to the device to simultaneously monitor the filter resonance peaks, tune them onto resonance with the WDM input signals (Fig. 9.5(c)), and assist in thermal model calibration. Samples were fabricated on silicon-on-insulator wafers at the Washington Nanofabrication Facility through the UBC SiEPIC rapid prototyping group. Silicon thickness is 220 nm, and buried oxide thickness is 3 µm. 500 nm wide WGs were patterned by electron-beam lithography and fully etched to the buried oxide [26]. The weight bank circuit consists of two bus WGs with MRRs in a parallel add/drop configuration (Fig. 9.2(d)). Ti-gold heating contacts were then deposited on top of a 3 µm oxide passivation layer. Ohmic heating in these contacts causes thermo-optic index shifts, so that a heater patterned on top of an MRR can tune its resonant wavelength. Heater power is controlled by the DAC, which is buffered to provide up to 80 mA per channel. The sample is mounted on a temperature-controlled alignment stage. The TE waveguide mode is coupled from the silicon circuit to a fiber array using focusing subwavelength grating coupler arrays [8]. An optical transmission spectrum analyzer measures the transfer functions from IN to DROP ports and from IN to THRU ports (see Fig. 9.2(f)).
Figure 9.5 Setup to test multi-channel MRR weight bank. (a) Experimental setup. An input generator creates uncorrelated signals on different wavelengths by time delaying a single PRBS. DFB: distributed feedback laser; AWG: arrayed-waveguide grating; PPG: pulse pattern generator; MZM: Mach–Zehnder Modulator; FBG: fiber Bragg grating. The microring weight bank is thermally tuned by a current-mode DAC (digital-to-analog converter). Drop and thru outputs are amplified by erbium doped fiber amplifiers (EDFAs) and delay-matched before detection by a balanced photodetector (PD). A computer (CPU) executes the calibration routine. (b) Time domain traces of reference input signals on different wavelength channels. (c) Optical spectrum of WDM inputs (red) and transmission spectra of the drop port when tuning current is off (gray) and tuned onto resonance (blue), measured with a drop port spectrum analyzer (not shown).
Although input signals to the MRR weight bank are not necessarily known during an operation phase, the calibration phase can take advantage of known reference inputs in order to simultaneously measure the effective weight of each channel. In this case, references were delayed PRBS signals, each of which is stored as $x_i(t)$. If channel delays exceed one bit period, then the correlation $\langle x_i(t) \cdot x_j(t) \rangle_t$ approaches zero for a sufficiently long pattern (in this work, $2^7$ bits). All weights $\mu_i$ can then be determined by decomposing a single measurement $m(t)$ in terms of stored references:
$$\mu_i = \frac{\langle x_i(t) \cdot m(t) \rangle_t}{\langle x_i(t) \cdot x_i(t) \rangle_t}. \tag{9.1}$$
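The decomposition in Eq. (9.1) can be sketched numerically. The example below is a minimal illustration, not the authors' calibration code: it assumes the references are zero-mean (AC-coupled, mapped to ±1), uses one period of a standard PRBS7 pattern with circular delays, and models the balanced-PD output as a noiseless weighted sum. The delays and weights are hypothetical.

```python
def prbs7(n_bits=127, seed=0x02):
    """Generate PRBS7 bits with a 7-bit LFSR (polynomial x^7 + x^6 + 1, period 127)."""
    state, bits = seed, []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def time_average(a, b):
    """Time average <a(t) * b(t)>_t over the stored record."""
    return sum(p * q for p, q in zip(a, b)) / len(a)


def estimate_weights(measurement, references):
    """Eq. (9.1): project a single measurement onto each stored reference."""
    return [time_average(x, measurement) / time_average(x, x) for x in references]


def make_references(delays):
    """Circularly delayed copies of one zero-mean (+1/-1) PRBS7 period."""
    base = [2 * b - 1 for b in prbs7()]
    return [base[d:] + base[:d] for d in delays]


def weighted_sum(weights, references):
    """Idealized, noiseless detector output: sum_i mu_i * x_i(t)."""
    return [sum(w * x[t] for w, x in zip(weights, references))
            for t in range(len(references[0]))]


if __name__ == "__main__":
    refs = make_references([0, 9, 23, 41])  # hypothetical delays in bit periods
    true_weights = [0.8, -0.5, 0.3, -1.0]   # hypothetical weights
    estimate = estimate_weights(weighted_sum(true_weights, refs), refs)
    for mu, mu_hat in zip(true_weights, estimate):
        print(f"true {mu:+.2f}  estimated {mu_hat:+.3f}")
```

Because the cross-correlation of a finite pattern is small but not exactly zero, the estimates differ from the true weights by a small residual that shrinks as the pattern length grows.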
The calibration routine estimates a mapping of applied current to weight, $\vec{i} \to \vec{\mu}$. The inverse of this mapping becomes the feedforward control rule for effecting a desired weight vector. We separate the map into physical stages for thermal tuning ($\vec{i} \to \Delta\vec{\lambda}$), MRR bank transmission ($\Delta\vec{\lambda} \to \vec{T}$), and actual detected weight ($\vec{T} \to \vec{\mu}$).
9.2.2 SINGLE CHANNEL CONTINUOUS CONTROL In Refs. [19, 20], continuous weight control of a single channel was shown using an interpolation-based calibration approach. The result of tuning sweep calibration shown in Fig. 9.6(d) indicates that simple resonance tuning with feedforward calibration is sufficient to reliably attain a continuous range of analog weight values. Figure 9.6(i-iii) shows in the time domain that the signal can be inverted by using the balanced PD and only one tuning degree-of-freedom. The zero-weight output in Fig. 9.6(ii) is not exactly cancelled. Although the drop and through port power ratios can be balanced, the dispersive effects of the filter are not accounted for in the present calibration model. This effect might be reduced by using coupling modulation, as opposed to resonance tuning, in order to move the signal away from the highly dispersive filter edge [27].
Filter Edge Transmission Interpolation In order to determine the detuning that effects a desired/commanded weight, two calibration sweeps are performed. First, the filter resonance is detuned far from the carrier wavelength, where almost no input power goes to the drop port, and a signal reference, $r(t)$, is measured and stored. The effective weight, $\mu$, of a measured signal, $m(t)$, is here defined as a normalized projection onto this reference:
$$\mu \equiv \frac{\langle r(t) \cdot m(t) \rangle_t}{\langle r(t) \cdot r(t) \rangle_t} = f(\Delta\lambda), \tag{9.2}$$
where $\langle \cdot \rangle_t$ is a time average. The projection-based approach is used instead of simple RMS in order to reduce the effect of noise when the weight is near zero. Effective weight is an unknown function, $f$, of detuning. To estimate $f$, the tuning current is swept so the filter goes through resonance while the effective weight is monitored (Figs. 9.6(a,b)). A linear interpolation of the measured points provides an estimate of the weight function. The calibrated control rule is then simply the function inverse:
$$\Delta\lambda \leftarrow \hat{f}^{-1}(\hat{\mu}), \tag{9.3}$$
where $\hat{f}$ is the estimated tuning function and $\hat{\mu}$ is the command weight. Because of the sharp nonlinearity of $f$, its initial estimate is poorly sampled, leading to an inaccurate calibration. The calibration is refined by performing a second sweep over $\hat{\mu}$ and updating the estimate of $\hat{f}$ with a more uniformly sampled dataset. Once calibrated, a sweep over command weight is used to assess the accuracy of the controller (Figs. 9.6(c,d)). In Fig. 9.6(b), weight
Figure 9.6 Continuous weight control of a single channel was shown using an interpolation-based calibration approach. (a-b) Initial sweep in detuning: (a) shows filter relative transmission spectra vs. resonance detuning (i.e., $\Delta\lambda = \lambda - \lambda_0$). The input optical signal, which is stronger than the spectrum analyzer’s swept laser source, is visible at $\lambda - \lambda_c = 0$. (b) Measured weight vs. detuning, showing a full range from –1 to +1. Weight values are calculated based on Eq. (9.2). This curve is used as an estimate of the tuning function $\hat{f}$. (c-d) Calibrated sweep in command weight: (c) the shift in filter resonant wavelength is a strongly nonlinear function of the command weight, which roughly reflects the inverse of the tuning function estimate from (b), such that in (d), there is a well-controlled correspondence between command and measured effective weight. The ideal x=y line is plotted in red. (i-iii) Time traces of the weighted output when the effective weight is positive (i), zero (ii), and negative (iii). Cancellation at zero is not perfect because the 5Gbps signal distorts slightly when crossing the MRR. Copyright 2016 IEEE. Reprinted, with permission, from Tait et al. IEEE Photon. Technol. Lett. 28, 887–890 (2016) Ref. [20].
values extend below –1 because the DROP/THRU arms were not exactly balanced in amplitude; however, the values outside –1 to +1 are simply ignored in this case (Fig. 9.6(d)). The transmission effect of each MRR filter edge is treated as an independent function, $f_i : \Delta\lambda_i \to T_i$, and calibrated with an interpolation-based approach originally developed for a single channel [20]. Twenty samples per filter are interpolated to get a continuous estimate of the forward function, $\hat{f}_i(\Delta\lambda_i)$, and inverse, $\hat{f}_i^{-1}(\hat{T}_i)$. This estimate is refined by taking a second set of samples that are nominally uniform in $T_i$. Calibrated edge transmission functions are
shown in Fig. 9.8. The advantage of the interpolation approach is robustness to arbitrary and non-ideal filter edge shapes; however, it requires that the channel spacing be large enough that filters do not interact optically. In this case, a minimum channel spacing of about 150GHz gives sufficient isolation (Fig. 9.5(c)), but future work to increase channel density must reexamine the edge calibration approach.
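The two-pass interpolation calibration described above (Eqs. (9.2) and (9.3)) can be sketched as follows. This is a simplified illustration under stated assumptions: the measurement is replaced by a hypothetical ideal Lorentzian filter edge with balanced detection, only one side of the resonance is swept so that the weight is monotonic in detuning, and the refined estimate simply merges the samples from both passes. All names and numerical values are illustrative.

```python
import bisect

FWHM_NM = 0.14  # hypothetical filter linewidth


def measure_weight(detuning_nm):
    """Stand-in for a real measurement: ideal Lorentzian edge, balanced detection."""
    t_drop = 1.0 / (1.0 + (2.0 * detuning_nm / FWHM_NM) ** 2)
    return 2.0 * t_drop - 1.0


def interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    k = bisect.bisect_right(xs, x)
    frac = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
    return ys[k - 1] + frac * (ys[k] - ys[k - 1])


def build_inverse(samples):
    """Turn {detuning: weight} samples into (weights, detunings) sorted by weight."""
    weights, detunings = [], []
    for w, d in sorted((w, d) for d, w in samples.items()):
        if not weights or w > weights[-1]:  # keep weights strictly increasing
            weights.append(w)
            detunings.append(d)
    return weights, detunings


def calibrate(n_samples=20, max_detuning_nm=0.7):
    """Return the first-pass and refined inverse maps, plus the lowest weight."""
    # Pass 1: samples uniform in detuning (sparse where the edge is steep).
    samples = {}
    for k in range(n_samples):
        d = max_detuning_nm * k / (n_samples - 1)
        samples[d] = measure_weight(d)
    first = build_inverse(samples)
    w_lo = first[0][0]
    # Pass 2: samples nominally uniform in command weight, via the first estimate.
    for k in range(n_samples):
        command = w_lo + (1.0 - w_lo) * k / (n_samples - 1)
        d = interp(command, *first)
        samples[d] = measure_weight(d)
    return first, build_inverse(samples), w_lo


def worst_error(inverse, w_lo, n_test=400):
    """Largest |measured - command| over a sweep of command weights."""
    errors = []
    for k in range(n_test + 1):
        command = w_lo + (1.0 - w_lo) * k / n_test
        errors.append(abs(measure_weight(interp(command, *inverse)) - command))
    return max(errors)


if __name__ == "__main__":
    first, refined, w_lo = calibrate()
    print("worst error after pass 1:", worst_error(first, w_lo))
    print("worst error after pass 2:", worst_error(refined, w_lo))
```

In this idealized setting the second pass mainly adds samples near the top of the resonance, where the first, detuning-uniform sweep leaves the weight function poorly sampled.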
Figure 9.7 Repeatability of calibrated weight control versus commanded weight values, $\hat{\mu}$, showing data from 5 sweeps following a single calibration. Errors that are repeatable between trials manifest as offsets to the mean error ($|\langle\mu\rangle - \hat{\mu}|$, solid blue line), whereas errors that are unrepeatable, such as noise, manifest by widening the standard deviation envelope (RMS$(\mu - \langle\mu\rangle)$, pale blue area). In this experiment, repeatable errors are dominant, which suggests that accuracy can be improved with a more robust calibration algorithm. Copyright 2016 IEEE. Reprinted, with permission, from Tait et al. IEEE Photon. Technol. Lett. 28, 887–890 (2016) Ref. [20].
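The distinction between repeatable and non-repeatable error can be made concrete with a small calculation. The sketch below uses made-up measurements (five trials at each of two command weights) purely for illustration; it computes the mean offset $|\langle\mu\rangle - \hat{\mu}|$ and the RMS deviation about the mean for each command.

```python
import statistics


def repeatability(command, measured):
    """Return (repeatable error, non-repeatable error) for one command weight."""
    mean_weight = statistics.mean(measured)
    repeatable = abs(mean_weight - command)        # |<mu> - mu_hat|
    non_repeatable = statistics.pstdev(measured)   # RMS(mu - <mu>)
    return repeatable, non_repeatable


# Hypothetical data: command weight -> measured weights from 5 sweeps.
sweeps = {
    0.5: [0.54, 0.56, 0.55, 0.53, 0.57],
    -0.5: [-0.58, -0.62, -0.60, -0.57, -0.63],
}

results = {cmd: repeatability(cmd, vals) for cmd, vals in sweeps.items()}
worst_repeatable = max(r for r, _ in results.values())
worst_non_repeatable = max(n for _, n in results.values())

if __name__ == "__main__":
    for cmd, (rep, non_rep) in results.items():
        print(f"command {cmd:+.1f}: repeatable {rep:.3f}, non-repeatable {non_rep:.3f}")
```

In this made-up dataset the mean offsets are several times larger than the scatter, which is the same qualitative situation the figure describes: a better calibration, rather than lower noise, would improve the result most.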
Accuracy and Precision Weight control accuracy can be characterized in terms of the ratio of weight range (normalized to 1.0) to worst-case weight inaccuracy over a sweep and stated in terms of bits or a dynamic range. Figure 9.7 shows a high-resolution test of the stored calibration model over 5 sweeps of 44 distinct command weights, without intermediate recalibration. These data indicate a dynamic range (i.e., range divided by maximum error) of the weight controller of 9.2 dB, in other words, the ability to reliably set the weight with 3.1-bit precision. From this figure, it is also apparent that the error is dominated by repeatable inaccuracies (±0.096), as opposed to noise standard deviation (±0.053). This means that improvements to the controller accuracy will most likely take the
form of more sophisticated calibration methods; for example, iterating more times and/or rejecting outlying data points during function estimation and interpolation. Improvements to the controller algorithm could then yield a noise-limited resolution of 4.2 bits (dynamic range of 12.7 dB). Sources of non-repeatable inaccuracy are most likely dominated by polarization drift in the fiber system impacting fiber-to-chip coupling efficiency, and, to a lesser extent, ambient thermal fluctuations in the temperature-controlled silicon photonic chip. Polarization drift is not an issue for light generated on-chip, and could be ameliorated in fiber experiments by using polarization-maintaining fiber or polarization splitting or transforming fiber-to-chip couplers.
9.2.3 MULTI-CHANNEL SIMULTANEOUS CONTROL Another crucial feature of an MRR weight bank is simultaneous control of all channels. When sources of cross-talk between one weight and another are considered, it is impossible to interpolate the transfer function of each channel independently. Extending the prior interpolation-based approach of measuring a set of weights over the full range would require a number of calibration measurements that scales exponentially with the channel count, since the dimension of the range grows with channel count. Simultaneous control in the presence of cross-talk therefore motivates model-based calibration approaches, which were demonstrated in a 4-channel device in Ref. [11], resulting in a weight accuracy of 3.8 bits. Model-based, as opposed to interpolation-based, calibration involves parameterized models for cross-talk inducing effects. These models must provide a map of tuning current to weight, the inverse weight-to-current map, and a means to observe model parameters based on measurement protocols.
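The difference in scaling between the two calibration strategies can be illustrated with a short count. The sketch below uses the figures quoted in this section for the 4-channel experiment (20-point resolution per channel; roughly 10 heater, 20 filter, and 4 amplifier measurements per channel for the model-based routine); the function names are illustrative.

```python
def interpolation_measurements(points_per_channel, channels):
    """Full-grid interpolation: the sample count grows exponentially with channels."""
    return points_per_channel ** channels


def model_based_measurements(channels, heater=10, filter_edge=20, amplifier=4):
    """Model-based routine: a fixed number of measurements per channel."""
    return channels * (heater + filter_edge + amplifier)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} channels: grid {interpolation_measurements(20, n):,} "
              f"vs model-based {model_based_measurements(n)}")
```

For four channels this gives 160,000 grid measurements against 136 for the model-based routine, and the gap widens rapidly as channels are added.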
The predominant sources of cross-talk are thermal leakage between nearby integrated heaters and, in a lab setup, inter-channel cross-gain saturation in fiber amplifiers, although optical amplifiers are not a concern for fully integrated systems. Thermal cross-talk occurs when heat generated at a particular heater slightly affects the temperature of neighboring devices. In principle, the neighboring channel could counter this effect by slightly reducing the amount of heat its heater generates. A calibration model for thermal effects provides two basic functions: forward modeling (given a vector of applied currents, what will the vector of resultant temperatures be?) and reverse modeling (given a desired vector of temperatures, what currents should be applied?). Models such as this must be calibrated to physical devices by fitting parameters to measurements. In the case of a thermal model that is linear, the parameters are elements of a near-diagonal matrix, K, which maps applied current squared to filter wavelength shift. K represents a series of physical effects (heater effective resistance, heat flow, thermo-optic effect, and resonator properties); however, the underlying physics are not directly observable. K is directly observable through spectrum analyzer measurements. Calibrating a parameterized model requires at least as many measurements as free parameters.
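The forward and reverse functions of a linear thermal model can be sketched for two channels as below. This is a minimal illustration, not the authors' code: the matrix values, units (nm per mA²), and function names are hypothetical, and the closed-form inverse is specific to a 2×2 K.

```python
import math

def forward(K, currents):
    """Applied currents -> resonance shifts: shift = K @ (currents**2)."""
    q = [i * i for i in currents]
    return [sum(k * x for k, x in zip(row, q)) for row in K]

def reverse(K, shifts):
    """Desired shifts -> currents, using the closed-form inverse of a 2x2 K."""
    (a, b), (c, d) = K
    det = a * d - b * c
    q = [(d * shifts[0] - b * shifts[1]) / det,
         (a * shifts[1] - c * shifts[0]) / det]
    if min(q) < 0:
        raise ValueError("target would need negative heater power")
    return [math.sqrt(x) for x in q]

# Hypothetical near-diagonal K (nm per mA^2); off-diagonals are cross-talk.
K = [[7.0e-4, 0.2e-4],
     [0.3e-4, 9.0e-4]]
shifts = forward(K, [30.0, 40.0])   # what the heaters would do
currents = reverse(K, shifts)       # currents that reproduce those shifts
```

Because K has off-diagonal entries, the reverse model lowers each channel's current slightly relative to what a diagonal model would request, which is the compensation described above.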
Model-based calibration is required for the two predominant sources of weight interdependence, thermal cross-talk and cross-gain saturation. In this work, we expand on preliminary results in Ref. [28], developing models whose parameters can be fit (i.e., calibrated) with an O(N) routine of spectral and oscilloscope measurements. Whereas an interpolation-only approach with 20-point resolution would require $20^4 = 160{,}000$ calibration measurements, the presented calibration routine takes roughly 4 × [10 (heater) + 20 (filter) + 4 (amplifier)] = 136 total calibration measurements. We then assess factors affecting weight precision, including the complexity of the thermal cross-talk model. We demonstrate simultaneous 4-channel MRR weight control with an accuracy of 3.8 bits and precision of 4.0 bits (plus 1.0 sign bit) on each channel. While optimal weight resolution is still a topic of discussion in the neuromorphic electronics community [29], several state-of-the-art architectures with dedicated weight hardware have settled on 4-bit resolution [30, 31]. Practical, accurate, and scalable MRR control techniques are a critical step towards large-scale analog processing networks based on MRR weight banks.

Thermal Cross-talk Model
The temperature of an MRR waveguide is affected predominantly by the heater directly above, but heat can also leak between nearby MRRs. The relationship between dissipated electrical power, $\vec{i}^{\,2} R$, and resonant wavelength shift, $\vec{\lambda} - \vec{\lambda}_0$, is linear and can be modeled by a matrix, K [15]. Assuming heater resistance is constant,

$$\vec{\lambda} - \vec{\lambda}_0 = K\,\vec{i}^{\,2}, \tag{9.4}$$

where $\vec{\lambda}_0$ is the resonant wavelength at zero tuning current, and K is a nearly diagonal matrix that describes the thermo-optic effect, heat transfer coupling, and heater resistance. Off-diagonals of K describe unintended heat transfer from a given heater to filters of different channels, also known as thermal cross-talk. Substituting $q_j \equiv i_j^2$ for notational clarity, this equation can be put in a differential form around WDM signal wavelengths, $\vec{\lambda}_{sig}$, and the tuning current needed to bias filters on-resonance with these signals, $\vec{q}_{bias}$,

$$\vec{\lambda}_{sig} - \vec{\lambda}_0 = K\,\vec{q}_{bias} \tag{9.5}$$
$$\vec{\lambda} - \vec{\lambda}_{sig} = K\left(\vec{q} - \vec{q}_{bias}\right) \tag{9.6}$$
$$\Delta\vec{\lambda} = K\,\Delta\vec{q}. \tag{9.7}$$
This linear model is simple to calibrate and invert, but it relies on an assumption of constant heater resistance. In general, heater resistance is also temperature dependent due to thermo-electric self-heating. For a single current-driven heater with ambient resistance of R0 and thermo-electric
coefficient α,

$$R(q) = R_0\left[1 + \alpha q R(q)\right] = \frac{R_0}{1 - \alpha R_0 q}, \tag{9.8}$$
which is certainly not constant, and even has a singularity at $q = (\alpha R_0)^{-1}$, signifying a thermal runaway. Instead of combining the multivariate and nonlinear equations above, we simply note that non-constant resistance means that second and higher-order derivatives of $\Delta\vec{\lambda}$ in terms of $\Delta\vec{q}$ are non-zero but small enough that a Taylor expansion can incorporate the nonlinearities.

$$\Delta\vec{\lambda} \approx \sum_{d=1}^{D} K_d\,\Delta\vec{q}^{\,d}, \tag{9.9}$$
where D is model order, the exponentiation of $\Delta\vec{q}$ is element-wise, and the model now contains D distinct K matrices. The Taylor approximation's main advantage for calibration is a simple method to fit K matrices. The tuning current of each channel is swept over an operating range of interest (∼4 filter linewidths), while the wavelength shift of every filter peak is measured with the spectrum analyzer. Each peak shift function is fit with a D-order polynomial to obtain one element of each K matrix. The process is repeated for every channel. K values found in this experiment are shown in Fig. 9.8. To prevent overfitting, there must be at least D spectrum measurements per channel. We made 5DN measurements for added robustness and found that D = 2 made thermal modelling error sufficiently small so as not to limit overall precision, which is revisited in Section 5.4.

The polynomial mapping $\Delta\vec{q} \to \Delta\vec{\lambda}$ must be inverted to provide a feedforward control rule. While it does not have a closed-form inverse, the following iterative solution converges quickly.

$$\Delta\vec{q}^{\,[0]} = \vec{0} \tag{9.10}$$
$$\vdots$$
$$\Delta\vec{q}^{\,[n+1]} = K_1^{-1}\left[\Delta\vec{\lambda} - \sum_{d=2}^{D} K_d \left(\Delta\vec{q}^{\,[n]}\right)^{d}\right]. \tag{9.11}$$
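The fixed-point iteration of Eqs. (9.10) and (9.11) can be sketched for a 2-channel, D = 2 model as follows. The K matrices below are hypothetical values chosen only so that the second-order term is a small perturbation; the helper names are illustrative, and the 2×2 inverse is written out to keep the sketch free of external libraries.

```python
def matvec(M, v):
    """Matrix-vector product for small nested-list matrices."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def inv2(M):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def higher_order(Ks, dq):
    """Sum over d >= 2 of K_d @ dq**d, with element-wise powers."""
    out = [0.0] * len(dq)
    for d, Kd in enumerate(Ks[1:], start=2):
        term = matvec(Kd, [x ** d for x in dq])
        out = [o + t for o, t in zip(out, term)]
    return out

def forward(Ks, dq):
    """Eq. (9.9): polynomial map from delta-q to delta-lambda."""
    lin = matvec(Ks[0], dq)
    return [a + b for a, b in zip(lin, higher_order(Ks, dq))]

def invert(Ks, dlam, steps=25):
    """Eqs. (9.10)-(9.11): start at zero, then repeatedly solve the linear part."""
    K1_inv = inv2(Ks[0])
    dq = [0.0] * len(dlam)
    for _ in range(steps):
        resid = [t - h for t, h in zip(dlam, higher_order(Ks, dq))]
        dq = matvec(K1_inv, resid)
    return dq

# Hypothetical calibrated matrices [K1, K2] in arbitrary units.
Ks = [[[7.4, 0.2], [0.1, 10.5]],
      [[0.03, 0.01], [0.0, -0.03]]]
dq_true = [0.5, -0.3]
dq_found = invert(Ks, forward(Ks, dq_true))
```

With these values the nonlinear correction is well under one percent of the linear term, so the iterate settles to machine precision within a few steps.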
The iteration takes advantage of the fact that the thermo-electric effects on heater resistance, represented by the $K_{d>1}$ matrices, are relatively small perturbations. As heaters are biased closer to thermal runaway singularities from Eq. (9.8), the thermo-electric effects become stronger. This means more steps are needed to converge, and higher orders of Taylor approximation must be used, necessitating more calibration measurements.

Cross-Gain Saturation Model
EDFAs at the output of the weight bank (Fig. 9.5(a)) are subject to slow timescale cross-gain saturation, which depends on the weight of each channel
[Figure 9.8 diagram: four modeling stages in sequence (Bias, Heater, Filter, Amplifier), annotated with the calibrated heater matrices K1 and K2, the filter transmission functions f1 to f4, and the amplifier parameters B+, C+, B−, and C−. Bias stage values: λsig,1 = 1547 nm, qbias,1 = (51.8 mA)²; λsig,2 = 1549 nm, qbias,2 = (49.0 mA)²; λsig,3 = 1550 nm, qbias,3 = (39.9 mA)²; λsig,4 = 1552 nm, qbias,4 = (11.5 mA)².]
Figure 9.8 Diagram of modeling stages showing calibrated parameter values fit during this experiment. Bias stage puts variables in differential form around the state of all filters being on-resonance with signals, $\vec{\lambda}_{sig}$. Heater stage models thermo-electric, heat transfer, and thermo-optic effects with a predominantly diagonal, linear K1 matrix and nonlinear corrections (order D = 2 shown). Filter stage consists of four independent interpolation-based estimates of the transmission along each MRR filter edge. Amplifier stage models absolute optical powers and fiber amplifier saturation characteristics preceding photodetection.
in addition to absolute power levels that can fluctuate with polarization, ambient temperature, and fiber strain. The present fiber experiment must model this cross-gain saturation to obtain unbiased weight bank results. While optical amplifiers are not yet widely available on silicon PICs, semiconductor and rare earth ion amplifiers in silicon have been investigated [32, 33], and could potentially use a similar model. We model the cross-gain saturation effect for two homogeneously broadened EDFAs in non-depleted pump regimes as

$$\mu_i = P_{in,i}\,T_{c,i}\left[\frac{g^{+}_{i,ss}}{1 + P^{+}_{amp}/P^{+}_{s}}\,T_i - \frac{g^{-}_{i,ss}}{1 + P^{-}_{amp}/P^{-}_{s}}\,\gamma_i\,(1 - T_i)\right] \tag{9.12}$$
$$P^{+}_{amp} = \sum_j P_{in,j}\,T_{c,j}\,T_j \tag{9.13}$$
$$P^{-}_{amp} = \sum_j P_{in,j}\,T_{c,j}\,\gamma_j\,(1 - T_j), \tag{9.14}$$
where i indicates channel number and T is the tunable microring through port transmission. Pin is input power, Tc is net coupling efficiency, γ is drop efficiency, gss is amplifier small-signal gain, and Ps is saturation power, which is not channel-dependent. Pamp signifies total power incident on an EDFA. (+,–) superscripts respectively indicate the amplifiers on through and drop output ports. Not all physical parameters are observable from weight measurements,
but the following parameterization yields a fittable model:

$$\mu_i = \frac{B_i^{+}\,T_i}{1 + \sum_j C_j^{+}\,T_j} - \frac{B_i^{-}\,(1 - T_i)}{1 + \sum_j C_j^{-}\,(1 - T_j)}. \tag{9.15}$$
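As a rough illustration of Eq. (9.15), the forward model can be evaluated directly from the four parameter vectors. The parameter values below are hypothetical and the function name is an assumption made for this sketch.

```python
def weights(T, Bp, Cp, Bm, Cm):
    """Eq. (9.15): through-port term minus drop-port term for every channel.

    Each amplifier's saturation denominator is shared by all channels,
    which is what couples one channel's weight to the others.
    """
    den_p = 1.0 + sum(c * t for c, t in zip(Cp, T))
    den_m = 1.0 + sum(c * (1.0 - t) for c, t in zip(Cm, T))
    return [bp * t / den_p - bm * (1.0 - t) / den_m
            for t, bp, bm in zip(T, Bp, Bm)]

# Hypothetical 2-channel parameters (arbitrary units).
Bp, Cp = [1.2, 2.9], [0.006, 0.056]
Bm, Cm = [2.7, 6.9], [0.040, 0.110]

all_through = weights([1.0, 1.0], Bp, Cp, Bm, Cm)   # both channels at T = 1
only_ch1 = weights([1.0, 0.0], Bp, Cp, Bm, Cm)      # channel 2 sent to drop
```

Channel 1 has T = 1 in both cases, yet its weight differs between them because the through-port amplifier is less saturated when channel 2 is dropped.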
The parameter vectors $\vec{B}^{(+,-)}$ and $\vec{C}^{(+,-)}$ (totaling 4N parameters) can be fit (i.e., calibrated) with a series of 4N measurements at particular tuning states. We introduce a notation $\mu_i^{(xy)}$ to signify the measured weight of channel i when the transmission of channel $T_{j=i}$ is x and the transmission of other channels $T_{j\neq i}$ is y. For example, $\mu_2^{(10)}$ signifies the weight of channel 2 when it is transmitted to the through port (T = 1) and channels 1, 3, and 4 are coupled to the drop port (T = 0). The calibration procedure starts by measuring $\mu_i^{(11)}$ and $\mu_i^{(10)}$:

$$\mu_i^{(11)} = \frac{B_i^{+}}{1 + \sum_j C_j^{+}}, \qquad \mu_i^{(10)} = \frac{B_i^{+}}{1 + C_i^{+}}. \tag{9.16}$$
These equations containing 2N unknown parameters and 2N known measurements can be solved analytically as follows.

$$\frac{\mu_i^{(11)}}{\mu_i^{(10)}} = \frac{1 + C_i^{+}}{1 + \sum_j C_j^{+}} \tag{9.17}$$
$$C_i^{+} = \frac{\mu_i^{(11)}}{\mu_i^{(10)}}\left(1 + \sum_j C_j^{+}\right) - 1. \tag{9.18}$$
By summing this equation over all i and rearranging, the sum of $C^{+}$ can be stated entirely in terms of measured weights,

$$\sum_i C_i^{+} = \frac{N - \sum_j \mu_j^{(11)}/\mu_j^{(10)}}{\sum_j \mu_j^{(11)}/\mu_j^{(10)} - 1}, \tag{9.19}$$
at which point it can be substituted into Eq. (9.18) to recover individual $C^{+}$ parameters. The $B^{+}$ parameters then fall trivially from Eq. (9.16). Drop port amplifier parameters, $C^{-}$ and $B^{-}$, follow an identical procedure upon measuring $\mu_i^{(00)}$ and $\mu_i^{(01)}$. The ability to decompose single measurements of m(t) into all weights via Eq. (9.1) means that $\vec{\mu}^{(11)}$ and $\vec{\mu}^{(00)}$ only require one measurement each, while the dissimilar measurements call for distinct tuning states and therefore N measurements per amplifier. In this derivation, it was assumed that complete switching down to T = 0 is possible, which
is not always the case in practice. A more algebraically complex calibration technique with nonzero $T_{min}$ can be derived similarly, but is omitted here. Calibrated parameter values found for this experiment are shown in Fig. 9.8.

Once the forward model parameters have been calibrated, we must invert the mapping $\vec{T} \to \vec{\mu}$, Eq. (9.15), in order to obtain a feedforward control rule.

$$\mu_i + \frac{B_i^{-}}{1 + \sum_j C_j^{-}(1 - T_j)} = \frac{B_i^{+}}{1 + \sum_j C_j^{+} T_j}\,T_i + \frac{B_i^{-}}{1 + \sum_j C_j^{-}(1 - T_j)}\,T_i \tag{9.20}$$
$$T_i = \frac{\mu_i + \dfrac{B_i^{-}}{1 + \sum_j C_j^{-}(1 - T_j)}}{\dfrac{B_i^{+}}{1 + \sum_j C_j^{+} T_j} + \dfrac{B_i^{-}}{1 + \sum_j C_j^{-}(1 - T_j)}}. \tag{9.21}$$
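This is solved iteratively as follows:

$$T_i^{[0]} = 1 \tag{9.22}$$
$$\vdots$$
$$T_i^{[n+1]} = \frac{\mu_i + \dfrac{B_i^{-}}{1 + \sum_j C_j^{-}\left(1 - T_j^{[n]}\right)}}{\dfrac{B_i^{+}}{1 + \sum_j C_j^{+} T_j^{[n]}} + \dfrac{B_i^{-}}{1 + \sum_j C_j^{-}\left(1 - T_j^{[n]}\right)}} \tag{9.23}$$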
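The iteration of Eqs. (9.22) and (9.23) can be sketched as follows, using the forward model of Eq. (9.15) to generate a target. All parameter values are hypothetical and the function names are illustrative; the fixed step count stands in for a proper convergence test.

```python
def forward(T, Bp, Cp, Bm, Cm):
    """Eq. (9.15): weights produced by a vector of through-port transmissions."""
    den_p = 1.0 + sum(c * t for c, t in zip(Cp, T))
    den_m = 1.0 + sum(c * (1.0 - t) for c, t in zip(Cm, T))
    return [bp * t / den_p - bm * (1.0 - t) / den_m
            for t, bp, bm in zip(T, Bp, Bm)]

def invert(mu, Bp, Cp, Bm, Cm, steps=50):
    """Eqs. (9.22)-(9.23): fixed-point iteration from T = 1 on every channel."""
    T = [1.0] * len(mu)
    for _ in range(steps):
        # Denominators use the previous iterate of all channels.
        den_p = 1.0 + sum(c * t for c, t in zip(Cp, T))
        den_m = 1.0 + sum(c * (1.0 - t) for c, t in zip(Cm, T))
        T = [(m + bm / den_m) / (bp / den_p + bm / den_m)
             for m, bp, bm in zip(mu, Bp, Bm)]
    return T

# Hypothetical calibrated parameters with small C (weak saturation).
Bp, Cp = [1.2, 2.9], [0.006, 0.056]
Bm, Cm = [2.7, 6.9], [0.040, 0.081]

T_true = [0.3, 0.8]
mu_target = forward(T_true, Bp, Cp, Bm, Cm)
T_found = invert(mu_target, Bp, Cp, Bm, Cm)
```

Each update changes T by an amount proportional to the C parameters, so small C values make the map strongly contracting.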
This iteration converges quickly when C parameters are small, as in Fig. 9.8, which is the case when signal powers are less than amplifier saturation power.

Simplified Thermal Physics Models
The effects of using simplified models for thermal physics are shown in Fig. 9.9. When thermal cross-talk and self-heating are completely neglected (i.e., D = 1 and K1 is diagonal), accuracy is reduced to 2.8 bits. A constant resistance model (i.e., D = 1) is used for Fig. 9.9(b), yielding a small improvement to 3.0 bits. In both cases, mean errors in Fig. 9.9 show no clear trend, besides being less accurate towards more negative weight values. Surprisingly, introducing a linear cross-talk model barely improves weight accuracy. This can be explained by the sharp sensitivity of filter transmission to resonant wavelength. The sensitive response of the MRR filter edge necessitates very accurate thermal modeling; in this case, D = 2 provided significant improvement. For the devices in this study, we found D = 3 to yield negligible improvement over D = 2 since other factors limited precision; however, MRR weight banks with
[Figure 9.9 panels: (a) No thermal calibration; (b) Thermal cross-talk calibration; (c) Xtalk + thermo-electric calibration. Each panel plots Channel 2 weight against Channel 1 weight over the range −1 to 1.]
Figure 9.9 Weight sweeps for simplified thermal cross-talk models over 5 iterations. The target grid (black), mean error vectors (red), and standard deviation ellipses (blue) are used as in Fig. 9.4(a). (a) No model of thermal cross-talk is applied, and weight accuracy is 2.8 bits (8.4 dB dynamic range). (b) A first-order D = 1 (i.e., constant resistance) model of thermal cross-talk is applied, and weight accuracy is 3.0 bits (9.0dB dynamic range). (c) A second-order (D = 2) thermal model is applied to account for thermo-electric changes in heater resistance. Weight accuracy is 3.8 bits (11.4 dB dynamic range). In all cases, the amplifier cross-gain calibration model is applied in order to isolate the effect of thermal cross-talk modeling.
different biases, heater designs, materials, etc. may need increased Taylor orders for sufficient thermal model accuracy. An alternative to thermal tuning is depletion modulation [34], which could eliminate thermal cross-talk and the current-squared dependence, yet requires a more involved fabrication process with a partial etch of the top silicon layer and four dopant levels.
9.3 COHERENT EFFECTS BETWEEN MRR CHANNELS
Coherent interactions between filters for different channels are a novel physical effect in a WDM system context, which factors into the design and analysis of weight banks. Weight banks have two parallel bus WGs that are parallel-coupled to every MRR, while other WDM devices based on resonators have only one bus WG (Fig. 9.10). Circuit geometries with two busses have been studied to tailor various filter aspects [14], including high-FSR [13, 35] and flat passband [12]. Double-channel SCISSORs are periodic versions of this structure that can be characterized by Bloch modes [36, 37]. However, these parallel-coupled MRR devices are not used for WDM signal control. To the authors' knowledge, coherent effects affecting WDM channel count are unique to the WDM photonic weight bank in which each MRR filter controls a distinct WDM channel [9]. The performance impact of coherent effects is observed in Section 9.4.2, and their physical origin is formalized in Section 9.6.1. This section describes observation and high-level discussion of coherent interactions. Multi-MRR coherent interactions are especially relevant when resonances are closely spaced, so accounting for them is essential for controlling dense WDM (DWDM) signals and analyzing channel density limits. At the same time, these interactions mean that a weight bank cannot be modeled as a simple combination of individual MRR models. In this type of structure, when a given signal partially couples through a neighboring MRR, instead of causing inter-channel cross-talk, it can couple back through the opposite bus WG and complete a round trip to form a coherent feedback loop involving multiple MRRs (Fig. 9.10(c)). In MRR weight banks with an even number of serial MRRs (i.e., poles) per filter, the character of interference changes from feedback (resonator-like) to feedforward (interferometer-like), depicted in Fig. 9.10(d).
The feedforward interference condition depends on the bus path length difference instead of the sum. This fact can be especially important when considering practical fabrication methods, in which absolute effective optical path lengths are difficult to fabricate exactly. The resonant feedback interaction between 1-pole MRRs depends coherently on the total optical path length of the bus WGs (see mathematical description of this phenomenon in Section 9.6.1). We fabricated a 2-channel silicon weight bank with bus tuning contacts in order to observe how the phase condition of the bus affects multi-MRR interaction. The weight bank contains two racetrack resonators with perimeters of 80.0 and 80.1 µm. Individual Q-factors are 7,750. Additional heaters are patterned over each bus WG, which are 60 µm long. To test this device, we first used the MRR heaters to adjust two resonances to a 0.4 nm separation (2 filter linewidths). We then tuned bus heaters nonuniformly between 0 and 70 mA, such that the data traces were taken at intervals approximately uniform in electrical power. Current is applied equally to each bus heater to prevent creating an asymmetric temperature profile across the device. The heater's "S"-shape is intended to localize heat as much as possible, but thermal cross-talk is still
[Figure 9.10 panels: (a) Add/Drop Multiplexer; (b) Modulator, with modulation signals x1, x2, x3; (c) Weight bank (1-pole), showing the coherent feedback path (resonator-like); (d) Weight bank (2-pole), showing the coherent feedforward path (interferometer-like). Ports are labeled IN, THRU, and DROP; resonators are labeled A, B, and C.]
Figure 9.10 Microresonators in WDM systems. (a) MRR optical add-drop multiplexer. Cross-talk occurs when a portion of one wavelength channel arrives at the incorrect DROP port. (b) WDM modulator with modulation signals xi . Cross-talk occurs when a wavelength channel is partially modulated by a neighboring signal. (c) MRR weight bank, in which there is only one DROP port and one THRU port. The presence of two bus waveguides creates a path for coherent feedback between resonances of similar frequencies. (d) 2-pole MRR weight bank. Instead of feedback paths, feedforward coherent interference is possible within the bus WGs.
present, causing changes in bus current to shift filter resonances. By applying current equally and relying on the (rough) symmetry of the device, both resonators are made to shift together, maintaining their spacing at 2 linewidths. Background shifts are removed in Fig. 9.11(c). Differences in phase between bus WGs would be difficult to study with this technique, but are not expected to have an impact on resonator-like coherent interaction effects. Figure 9.11(c) shows that bus tuning significantly affects the dip between filters, whose depth ranges from −2.7 dB to −25.0 dB relative to peak transmission. The steepness of rolloff regions is also slightly affected. The measurements closely match corresponding simulations in which the effective bus phases were parameterized and swept uniformly, shown in Fig. 9.11(b). This verifies that the parametric transmission simulator can make accurate predictions about weight banks in the dense channel regime. From an intuitive standpoint, it seems that this coherent effect that depends
[Figure 9.11 panels: (a) micrograph of the device with IN and DROP ports, OSA, MRR heater, and bus heater labeled (scale bar 20 µm); (b) Simulated transmission (dB) versus wavelength (linewidth normalized), bus WG L: 0.05 to 0.98 a.u.; (c) Measured transmission, bus heater current: 0 to 70 mA.]
Figure 9.11 Experimental verification of coherent weight interaction within a 2-channel weight bank with bus tuning heaters. (a) SEM of device under test. An optical spectrum analyzer (OSA) measures the transmission between the bank's IN port and DROP port. (b) Simulated response of this device when resonances are nearby and bus waveguide phase is swept over a half period, 0–π. (c) Measured response of this device as bus heater current is applied equally to both bus heaters. MRRs are tuned until separated by 0.4 nm (2 linewidths). Then bus current is varied over 0 mA to 70 mA such that applied power is sampled approximately uniformly. Thermal cross-talk causes the resonances to shift absolutely, but their spacing stays consistent. Absolute wavelength shifts of the central minimum between peaks have been removed.
on bus phase could have an impact on channel density. If the goal is to be able to set the weight/transmission of neighboring WDM channels independently, then it would be disadvantageous to have the responses blur into a single peak like the red traces in Figs. 9.11(b,c). On the other hand, it may be possible to take advantage of the deep isolation between peaks represented by the blue traces. Coherent inter-MRR effects play an important role in the behavior of photonic weight banks. Simulation tools for accurate modeling, control, and quantitative analysis must incorporate these effects. In the following section, we will quantify the degree to which weights can be set independently and use the simulator to study how this metric is affected by channel spacing and MRR coherent interaction.
9.4 QUANTITATIVE ANALYSIS FOR PHOTONIC WEIGHT BANKS
Engineering analysis and design rely on quantifiable descriptions of performance called metrics. The natural questions of "how many channels are possible" and, subsequently, "how many more or fewer channels are garnered by a different design" are typically resolved by studying tradeoffs. Increasing the channel count performance metric will eventually degrade some other aspect
of performance until the minimum specification is violated. A convenient normalization that is used in this section is the filter linewidth or full-width at half maximum (FWHM). Linewidth normalization and linewidth units are somewhat independent of resonator performance, so they emphasize the circuit effect of resonator-based WDM devices. Linewidth normalization applies equally to the frequency and wavelength domains. The channel spacing in these units can be used as a WDM figure of merit. Dividing the finesse of a given resonator design by the linewidth-normalized channel spacing yields the maximum number of WDM channels that can be supported within a single FSR. At first, it would seem that WDM weight bank analysis can proceed similarly to that of wavelength demultiplexers based on MRRs in conventional digital interconnects. In conventional analyses of MRR devices for multiplexing, demultiplexing, and modulating WDM signals, the tradeoff that limits channel spacing is inter-channel cross-talk [38–40]. Unlike MRR demultiplexers where each channel is coupled to a distinct waveguide output [10], MRR weight banks have only two outputs with some portion of every channel coupled to each. All channels are meant to be sent to both detectors in some proportion, so the notion of cross-talk between signals breaks down. Analyses driven by cross-talk metrics are therefore approximate at best when applied to photonic weight banks.

Cross-Talk Driven Analysis
In demultiplexer and modulator analyses, such as in Refs. [38, 40], performance is limited by the inter-channel cross-talk metric, which degrades with increasing channel density. The transfer function of a resonator drop filter is approximated by a Lorentzian function:

$$T(\delta) = \frac{1}{1 + \delta^2}, \quad \text{where } \delta = \frac{Q\,(\omega - \omega_0)}{\omega_0}, \tag{9.24}$$

in which δ is linewidth-normalized frequency, Q is quality factor (Q ≈ 10,300), and ω0 is filter center frequency. Inter-channel cross-talk is the transmission of filter i at neighboring channel wavelengths i + 1 and i − 1. The FSR-limited maximum wavelength count for a silicon WDM link was found to be N = 62 for a transmission window of 50 nm and channel spacing of ∆λ0 = 5.3 linewidths (0.8 nm). While cross-talk driven analysis can be used to estimate the order of magnitude of channel density [41], the differences in weight bank geometry and function motivate a more exact analysis technique, especially for dense channels. Filter peaks merge together eventually, reducing the weight bank's ability to weight neighboring signals independently. To quantify this effect as a power penalty, the cross-weight metric must include a notion of tuning range (Section 9.4.1). After this is described, an example channel density analysis
is carried out to derive the scalability of weight banks that use microresonators of a particular finesse (Section 9.4.2). The implementation details of programs for efficiently dealing with continuous, multidimensional ranges and device modeling will be discussed in Section 9.6.2.

9.4.1 CROSS-WEIGHT POWER PENALTY METRIC
As described in the beginning of Section 9.4, the notion of inter-channel cross-talk is not a meaningful concept in MRR weight banks. This presents a problem for channel scaling analysis since previous analytical methods use inter-channel cross-talk as the driving metric that degrades with channel count. Furthermore, coherent interactions between multiple MRRs in a weight bank make metrics based on isolated add-drop filters inadequate because the bank response must be treated as that of a single transmission element with several degrees of freedom. A new metric that accounts for these unique features is called for. In Ref. [9], we introduced a metric called cross-weight power penalty that quantifies the ability of a real WDM weight bank to independently control signals as compared to an ideal WDM weight bank. In the single-channel case, an ideal tunable weight bank possesses a range of tuning states that include directing an incident optical signal completely to a through port (positive weight), completely to a drop port (negative weight), or to any intermediate ratio of both. If a real weight incurs some loss, its weight range becomes a subset of the ideal. Supposing there is a difference in loss between the drop and through ports, then the attainable weight range will also be unbalanced. It is fair to assume that in the majority of cases where weights are used, they are required to be balanced such that, by some normalization factor, they can effect a range of weights from –1 to +1.
We can then define the usable range of the single-channel weight as the zero-centered interval whose span is determined by the minimum absolute value of the extreme positive weight and the extreme negative weight (Fig. 9.12). Comparing the usable range to the ideal range yields a ratio, W, that quantifies the real device's ability to perform tunable optical weighting.

$$W(\text{1-D}) = \min\left[\max_p(\mu),\ \max_p(-\mu)\right], \tag{9.25}$$
where p is the tuning parameter and µ is the weight. At this point, it should be noted that this balanced normalization factor relies on the concept of the extrema of a weight range. While it is often obvious, in general, we do not know which states in the tuning parameter space correspond to these weight extrema prior to sweeping, searching, or otherwise optimizing. Since weighting is a linear function, the normalization factor between ideal and usable ranges is conveniently stated as a power penalty, meaning the additional optical input power needed to reproduce the power level that would be present with an ideal device. Power penalty is a useful tool for quantitative system analysis because it allows comparison of disparate effects. Single-
[Figure 9.12 panels: (a) tuning range, with axes Detuning 1 and Detuning 2; (b) Weight 2 versus Weight 1 over −1 to 1, with the ideal range, weight boundary, and usable range marked; (c) Rel. Trans. (dB) versus Frequency (linewidth units), with channels ch1 and ch2 marked.]
Figure 9.12 Example of cross-weight power penalty in a 2-channel MRR weight bank. (a) The device has two tuning degrees-of-freedom, which are resonance detunings of each filter. A (red, blue) color vector is used to indicate tuning state, which means that a) depicts (red=x, blue=y). (b) The range of possible weight states attainable by the weight bank relative to the ideal range (outer bounding box). (red,blue) color indicates the tuning state which maps to a particular weight point. The usable range (green box) is graphically the largest square that lies fully within the possible weight range centered at zero. (c) Drop port spectra of the same model over a 5×5 parameter grid, with trace color used to indicate tuning. Frequency is normalized so that the MRR 1 peak has a center of 0 and full-width half-maximum (FWHM) of 1.0. Channel spacing in this simulation is 1.31 linewidths.
channel examples of effects that can all be associated with a power penalty include a responsivity mismatch in the balanced PD, in-ring propagation loss disproportionately affecting dropped signals, and, of course, bus propagation loss affecting all signals. A multi-channel power penalty metric is needed to study the effect of multi-MRR interaction and – most importantly for this analysis – channel spacing.
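The single-channel usable range of Eq. (9.25) and its expression as a power penalty can be sketched as below. The sampled weight values are hypothetical, and the sketch assumes that the tuning sweep is dense enough to contain both weight extrema.

```python
import math

def usable_range_1d(mu_samples):
    """Eq. (9.25): W = min(max(mu), max(-mu)) over the sampled tuning states."""
    return min(max(mu_samples), max(-m for m in mu_samples))

def power_penalty_db(W):
    """Additional input power (dB) needed to match an ideal W = 1 weight."""
    return -10.0 * math.log10(W)

# Hypothetical sweep: the drop port is lossier than the through port, so the
# attainable range (-0.6 to +0.9) is unbalanced and the negative side limits W.
sweep = [-0.6, -0.35, 0.0, 0.4, 0.9]
W = usable_range_1d(sweep)
penalty = power_penalty_db(W)
```

Here the balanced range is ±0.6 even though positive weights up to 0.9 are attainable, which corresponds to a penalty of about 2.2 dB.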
In the N-channel case, the ideal WDM weight bank is able to switch WDM channels completely independently from one another. It can be thought of as a set of N isolated, ideal single-channel weights; however, the N-channel generalization of a non-ideal weight bank is more complex. If a given tuning parameter can affect multiple weight values, then the bank's weight range cannot be linearly separated into any composition of non-ideal single-channel weight ranges. Figure 9.12 depicts this mapping for a simulated 2-channel bank that is parameterized by the MRR detunings. Channel spacing is 1.31 linewidths, and WG loss is 2 dB/cm. The tuning range (Fig. 9.12(a)) is constrained to best represent a realistic N-channel case, such that 0 refers to on-resonance and 1 refers to one channel spacing. While the 2-channel weight range could be expanded by detuning in opposite directions or by detuning by many channel spacings, neither strategy is possible with more channels. A uniform 2D sweep over tuning range results in an irregular distribution of points in 2D weight space (Fig. 9.12(b)). To visualize the correspondence between tuning and weights, a (red, blue) color vector is assigned to each weight point to indicate the corresponding (detuning 1, detuning 2) tuning vector. As in the 1D case, a usable range can be defined as the largest balanced interval (i.e., a zero-centered square in 2D) that is completely covered by the attainable weight range. The usable range (green square in Fig. 9.12(b)) is compared to the theoretical ideal (black bounding box in Fig. 9.12(b)). The ratio of their side lengths, Wx, represents the non-idealities associated with multi-channel weighting (Wx = 0.45 in this example). This ratio is referred to as the cross-weight power penalty and can be stated in decibels as −10 log(Wx) = 3.5 dB. Figure 9.12(c) shows the simulated drop port transmission spectra over a 5×5 parameter sweep.
A (red, blue) coloring of traces indicates correspondence with the tuning vector, although it is difficult to make sense of every trace. The (0,0)-detuning resonance condition (pure black) has two peaks centered at zero and one channel spacing where channels 1 and 2 are respectively located. (0,1)-detuning (pure blue) has the deepest dip at the channel 2 center frequency. (1,0)-detuning (pure red) appears as a single peak centered on channel 2.

Cross-weight penalty can be formulated in more precise terms. The point cloud in Fig. 9.12(b) is actually a sampling of a continuous and smooth manifold. The tuning range is a Cartesian parameterization of this weight manifold, with a mapping described by the transmission theory (Section 9.5). Since its parameterization is bounded, the weight manifold must also have a well-defined boundary. In 2D, that boundary is a simple closed curve (black outline in Fig. 9.12(b)) meaning that this boundary, $B \in \mathbb{R}^2$, can be parameterized by a circle:

$$B(s) = \left[\mu_1(s),\ \mu_2(s)\right], \qquad 0 < s < 2\pi, \tag{9.26}$$
where µi is the channel i weight. We estimate this boundary curve from the discrete point cloud using a conforming boundary algorithm [42, 43]. The conforming boundary is unlike
the convex hull in that it can shrink inwards to better estimate a continuous manifold sampled by discrete points, provided the samples are sufficiently dense. On the curve $B$, there is a single point that limits the usable tuning range, $W_x$, which is found as
$$W_x(\text{2-D}) = \min_{s} \max_{i=1,2} |\mu_i(s)|, \tag{9.27}$$
where the absolute value ensures +/− balance and the inner maximum over channels ensures equal ranges between channels. Graphically, $W_x$ is the size of the largest zero-centered square that just touches the boundary $B$. This definition of cross-weight penalty as the intersection of a manifold boundary with a zero-centered interval can extend conceptually to higher dimensions and WDM weight banks with an arbitrary number of channels. In three dimensions, the boundary is a mapping of a sphere parameterized by ($s_1 = \phi$, $s_2 = \theta$) where $-\pi < s_1 < \pi$ and $0 < \theta < 2\pi$. In N dimensions, the boundary is an (N − 1)-dimensional closed manifold parameterized by the (N − 1)-dimensional vector $\vec{s}$. The cross-weight penalty can then be defined as
$$W_x(N\text{-D}) = \min_{\vec{s}} \max_{i \in 1 \ldots N} |\mu_i(\vec{s})|. \tag{9.28}$$
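As a rough illustration of Eqs. (9.27) and (9.28), the min-max can be evaluated directly on a sampled boundary. The sketch below assumes the weights are normalized so that the ideal range is ±1, and it uses a hypothetical circular boundary of radius 0.6 rather than the simulated boundary of Fig. 9.12(b); the function names are illustrative only.

```python
import math

def cross_weight_penalty(boundary):
    """W_x: minimum over boundary samples of the largest |mu_i| at that sample."""
    return min(max(abs(mu) for mu in point) for point in boundary)

def penalty_db(w_x):
    """Express W_x as a power penalty in decibels."""
    return -10.0 * math.log10(w_x)

# Hypothetical boundary: a circle of radius 0.6 in 2-D weight space,
# sampled at 1-degree steps of the parameter s.
radius = 0.6
boundary = [(radius * math.cos(2 * math.pi * k / 360),
             radius * math.sin(2 * math.pi * k / 360)) for k in range(360)]

w_x = cross_weight_penalty(boundary)  # limited by the diagonal points of the circle
print(w_x, penalty_db(w_x))
```

For this circular boundary the limiting points lie on the diagonals, so the usable square has half-width radius/√2. The same function applies unchanged to N-channel boundary samples, since each sample is just a tuple of N weights.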
$W_x$ quantifies the non-idealities of a photonic weight bank. We assumed that a weight bank must be able to provide complementary and independent control of each WDM channel, but the range over which this is possible is a subset of the full tuning range. Intuitively, $W_x$ can be thought of as an efficiency of implementing weight functions. Its ideal maximum is 1.0, and a value less than 1.0 manifests as a smaller range or weaker weight effect. Supposing $W_x = 0.5$, the weight bank is equivalent to an ideal $W_x = 1.0$ weight bank with an insertion loss of 0.5 (3 dB). $W_x$ can therefore be stated as a power penalty in dB: $-10 \log_{10}(W_x)$ describes the additional input power (in dB) required to make a non-ideal weight bank behave like an ideal weight bank with a given output power.
9.4.2 WEIGHT BANK CHANNEL LIMITS
The final step of channel density analysis is to study the degradation of the limiting metric as WDM channel spacing becomes more dense. A useful figure of merit for discussing the efficacy of a resonator-based circuit at a WDM task is the ratio of finesse to channel count. This figure is equivalent to the linewidth-normalized channel spacing, $\delta\omega$, and is independent of the type or performance of the resonator platform. A theoretical minimum of this figure is 1.0. Furthermore, $\delta\omega^{-1}$ can be thought of as a channel packing efficiency forfeited in order for a WDM circuit to perform its functions. Simulations are performed in linewidth units so that channel packing efficiency can be analyzed first, and a discussion of finesse and resonator implementations will be deferred until the end.
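The equivalence between the finesse-to-channel-count ratio and the linewidth-normalized channel spacing can be seen in one line. This sketch assumes the usual definition of finesse as free spectral range over linewidth and assumes that the N channels are evenly spaced across one free spectral range; the symbols $\mathcal{F}$, $\Delta\omega_{\mathrm{FSR}}$, $\Delta\omega_{\mathrm{lw}}$, and $\Delta\omega_{\mathrm{ch}}$ are introduced here for illustration.

```latex
\frac{\mathcal{F}}{N}
  = \frac{\Delta\omega_{\mathrm{FSR}} / \Delta\omega_{\mathrm{lw}}}
         {\Delta\omega_{\mathrm{FSR}} / \Delta\omega_{\mathrm{ch}}}
  = \frac{\Delta\omega_{\mathrm{ch}}}{\Delta\omega_{\mathrm{lw}}}
  = \delta\omega
```

Under these assumptions the free spectral range cancels, which is why the figure does not depend on the resonator platform.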
The 2-channel weight bank model described in the previous section is modified so that its system operating points (channel spacing, $\delta\omega$, and bus length changes, $\Delta L$) are variable. A 50×50 sweep over operating points is then performed. $\delta\omega$ ranges from 0 to 9 linewidths, and $\Delta L$ ranges from 0 to 0.2 in units relative to the initial bus length. The lengths of both bus WGs are held equal to one another. For each operating point in this sweep, the usable weight range and cross-weight power penalty are calculated from a 300×300 point sweep over MRR detunings constrained between 0 and 1 channel spacing, as described in the previous section. At a given operating point, the cross-weight algorithm holds $\delta\omega$ and $\Delta L$ constant, treating them as fixed system parameters. The 50×50 sweep over 300×300 tuning points and 2 channel frequencies, comprising 450M evaluations of Eqs. (9.41) and (9.42), is completed in roughly 5 minutes due to the optimizations described in Section 9.6.2. Fig. 9.13 shows the resulting power penalty contours of $-10 \log_{10}(W_x)$ vs. $\delta\omega$ and $\Delta L$. We can draw several conclusions from this plot. Firstly, the power penalty has an asymptote where channel spacing hits a wall. As the filter peaks merge together, all frequencies are coupled to the drop port, making it impossible to reach a weight of (0,0). This means there is an absolute minimum channel spacing, regardless of acceptable power penalty. The displayed 10 dB contour (yellow) is very close to this asymptote. Secondly, the cross-weight power penalty decreases smoothly as channel spacing is increased above the absolute cutoff. This represents a system design tradeoff between WDM channel spacing and power penalty. The maximum channel count can be determined based on the power budget allowed for weighting, or the excess power requirement can be set by a given channel specification. The power penalty cannot quite reach 0 dB because of optical losses.
Thirdly, both the channel density wall and the tradeoff between density and power are significantly affected by bus length changes. The resulting approximate periodicity (here, ∼0.12 in arbitrary length units) is indicative of a coherent effect, which also can be expected based on the possibility of resonator-like multi-MRR interactions when resonances are at similar frequencies. Examining the 10 dB contour, the channel spacing wall fluctuates with ∆L between 0.85 and 1.4 linewidths. What is perhaps surprising is that the effect of bus length remains significant even when channels are spaced relatively far apart. The 1 dB contour line (blue) fluctuates between 2.7 and 3.4 linewidths over a period of ∆L. The outsets of Fig. 9.13 depict attainable weight ranges at 5 channel spacings and the best- and worst-case bus lengths. Plot format is as introduced in Fig.9.12(b). This offers further information about some of the mechanisms behind the performance trends. With decreasing channel spacing, the usable range (green) is impacted both by an overall smaller area covered by the possible weight range but also by an increased warping of this range away from square. From row to row, one sees that the top always performs better. The best case for the bus tuning phase does not depend on channel spacing. The
[Figure 9.13: outset panel titles δω = 4.4, δω = 5.5, δω = 6.6]
0. We call $Y_1 = P_1 X$ the first principal component of the vector X, $Y_2$ the second, etc. $P_1$ is the weight vector of the first PC, also known as the principal component loading. This algorithm can be implemented with unsupervised learning rules in a neural network [7], where each neuron in a PCA network computes one principal component. The PCA network learns the principal components by gradually updating each weight vector so that they converge to the principal component loadings while remaining orthogonal to each other. This real-time computing is called online learning.
12.1.2 OJA'S RULE
Diagonalizing a matrix is computationally expensive, especially if the matrix is large, whereas multiplications between matrices and vectors are comparatively fast to compute. Given a known covariance matrix Σ, we can decompose any vector w in its eigenvector basis:
$$w = \sum_i a_i P_i^T. \tag{12.8}$$
Successively multiplying w on the left by Σ yields
$$\Sigma^n w = \sum_i a_i \lambda_i^n P_i^T. \tag{12.9}$$
Note that if $\lambda_1 > \lambda_2$, the vector $\Sigma^n w$ aligns itself with $P_1^T$ more than with the others as n increases. Power iteration rules, such as Hebb's and Oja's, are all based on this characteristic. However, $\lambda_1$ is typically greater than 1, which implies that the magnitude of the vector $\Sigma^n w$ explodes with n. To stabilize the convergence, Hebb's rule can be written with a renormalization term after each weight update:
$$w(m+1) = \frac{w(m) + \eta\, y\, x(m)}{\| w(m) + \eta\, y\, x(m) \|}. \tag{12.10}$$
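A minimal numerical sketch of the renormalized update in Eq. (12.10) is given here. As a simplifying assumption, the instantaneous product $y\,x$ is replaced by its average $\Sigma w$ (the relation stated in Eq. (12.12)), so the update becomes a renormalized power iteration. The 2×2 covariance matrix, the learning rate, and the function name are hypothetical choices for illustration.

```python
import math

def normalized_hebbian_step(w, sigma, eta):
    """One renormalized Hebbian update, with <y x> replaced by Sigma w."""
    dim = len(w)
    sigma_w = [sum(sigma[i][j] * w[j] for j in range(dim)) for i in range(dim)]
    v = [wi + eta * swi for wi, swi in zip(w, sigma_w)]
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm for vi in v]  # unit-length weight vector

sigma = [[3.0, 1.0], [1.0, 2.0]]  # hypothetical covariance matrix
w = [0.0, 1.0]                    # arbitrary starting weight vector
for _ in range(200):
    w = normalized_hebbian_step(w, sigma, eta=0.1)

print(w)  # approaches the eigenvector of sigma with the largest eigenvalue
```

For this matrix the leading eigenvector is proportional to (1, (√5 − 1)/2), so the iterate settles near (0.851, 0.526) while its norm stays fixed at 1.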
One might argue that a normalization factor is too complex an operation. Oja's rule takes care of that with an additive term instead:
$$w(m+1) = w(m) + \eta \left( y\, x(m) - y^2 w(m) \right), \tag{12.11}$$
where $y = w(m) \cdot x(m)$ is the observed output of the neuron at time m and η is a small number called the learning rate. It should satisfy these two properties to guarantee convergence as $m \to \infty$: $\sum_m \eta(m) = \infty$ and $\sum_m \eta^2(m) < \infty$. This is not important for us because we will never reach the ∞ limit. Note that in those conditions, w(m) converges [12]. Note that
$$\langle y x \rangle = \langle (w \cdot x)\, x \rangle = \langle x^T x \cdot w \rangle \approx \Sigma(X) \cdot w, \tag{12.12}$$
so w(m) orients itself towards $P_1^T$ as the iteration order m increases (see Eq. (12.9)). The normalization term in Oja's rule corresponds to a Lagrange multiplier in the maximization of $y^2$ under the constraint of constant $\|w\|_{L2}$. Using the same technique, a simple generalization of Eq. (12.11) for multiple principal components can be derived:
$$w_j(m+1) = w_j(m) + \eta \Big( y_j x(m) - y_j^2 w(m) - 2 \sum_{i\, \cdots} y_i w_i \Big), \tag{12.13}$$
[…] 0. In the examples shown in Fig. 12.1, the directions of second-order uncorrelation do not minimize the mutual information between the two new variables. As a result, an algorithm that attempts to find independent components needs to take into account higher-order moments, such as the kurtosis of the RV distributions ($E[X_i^4] - 3(E[X_i^2])^2$). This is the provenance of a technique called Independent Component Analysis, or ICA [14]. By finding the direction of maximum kurtosis, in many cases one can separate independent signals after linear mixing (Fig. 12.2). Note that for these examples the independent components vastly differ from the principal components.
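To make the kurtosis measure concrete, the sketch below estimates $E[X^4] - 3(E[X^2])^2$ from samples of three zero-mean, unit-variance distributions. The sample size, seed, and choice of distributions are illustrative assumptions; the Laplace samples are drawn as a random sign times an exponential variate.

```python
import math
import random

def kurtosis(samples):
    """Sample estimate of E[X^4] - 3 (E[X^2])^2, assuming zero-mean data."""
    n = len(samples)
    m2 = sum(x ** 2 for x in samples) / n
    m4 = sum(x ** 4 for x in samples) / n
    return m4 - 3.0 * m2 ** 2

rng = random.Random(0)
n = 50000
gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Uniform on [-sqrt(3), sqrt(3)] has unit variance and is flatter than a Gaussian.
uniform = [rng.uniform(-math.sqrt(3), math.sqrt(3)) for _ in range(n)]
# Laplace with scale 1/sqrt(2) has unit variance and is spikier than a Gaussian.
laplace = [rng.choice((-1.0, 1.0)) * rng.expovariate(math.sqrt(2)) for _ in range(n)]

for name, data in (("gaussian", gaussian), ("uniform", uniform), ("laplace", laplace)):
    print(name, kurtosis(data))
```

The Gaussian estimate should sit near zero, the uniform one should be clearly negative, and the Laplace one clearly positive, which is the property a kurtosis-seeking algorithm exploits.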
12.2.1 MATHEMATICAL FORMULATION OF ICA
The key assumption behind the formulation of ICA is that the observed input random variables (X) are a linear mixture of independent, underlying sources (S):
$$X = AS + N, \tag{12.15}$$
where N is the noise term as in Section 12.1.1 and A is a full-rank m × n matrix; m is greater than n. The goal of ICA is to find the unmixing matrix W
$$\hat{S} = WX, \tag{12.16}$$
such that $\hat{S}$ best approximates S. The matrix W can be understood as a pseudo-inverse of matrix A, which can be approached as a singular-value decomposition problem. This process is visualized in Fig. 12.3. The SVD problem is well-conditioned when m (the number of inputs) is greater than n (the number of sources). The matrix A can be decomposed into three others: $A = U\Sigma V^T$. As we will see, PCA alone can be used to find U and Σ, but as shown in Fig. 12.3 (left), the rotation matrix V can only be found by minimizing the mutual information beyond what is explained by variance. Because we assumed that the mixing was linear, this extra mutual information must show up in a form of non-Gaussianity, such as kurtosis, as presented before, or negentropy. Kurtosis basically measures the spikiness of a distribution and is zero for a Gaussian. Distributions spikier or flatter than a Gaussian are called super-Gaussian (kurtosis > 0) or sub-Gaussian (kurtosis < 0), respectively. Negentropy stands for negative entropy, measured relative to a Gaussian of the same variance. It technically requires knowledge of the probability density function of the distribution (the entropy being $-E_X[\log p(x)]$), but can be approximated by suitable nonlinear functions of the realizations. The ICA procedure can be conducted in four steps:
1. Centering: $X \leftarrow X - E[X]$
2. Whitening (PCA): $\Sigma^{-1} U^T$
3. Selecting n principal components.
4. Identifying V. Recovering $\hat{S}$.
Two preprocessing steps make the ICA estimation simpler and better conditioned. First, X is centered around its mean (m = E[X]), so as to make X a zero-mean variable, facilitating the computation of covariance matrices. This is done solely to simplify the ICA algorithms: the mean can be added back to the estimation $\hat{S}$ simply by adding $A^{-1} m$ at the end of the procedure. Secondly, the inputs are whitened via PCA, described in Section 12.1. Whitening means that the variables are linearly transformed so that they become uncorrelated and of unit variance. Note that, although PCA alone cannot be used to estimate the independent components exemplified in Fig. 12.2, it is an important step for ICA. Mathematically, a whitened version of X corresponds to the vector Y normalized by the standard deviations of $Y_i$ that
Figure 12.3 Graphical depiction of the singular value decomposition (SVD) of matrix $A = U\Sigma V^T$ assuming A is full-rank. $V_{n \times n}$ and $U_{m \times m}$ are rotation matrices and $\Sigma_{m \times n}$ is a rectangular diagonal matrix. Red and blue arrows are vectors that correspond to the columns of matrix V (i.e., the basis of the row space of A). Note how the basis rotates, stretches, and rotates during each successive operation. The composition of all three matrix operations is equal to the operation performed by A. The inverse of matrix A defined as $W = V\Sigma^{-1}U^T$ performs each linear operation in the reverse order. Reprinted with permission from author, Ref. [13].
form the diagonal matrix Σ(Y). We can thus write
$$X_{\text{whitened}} = [\Sigma(Y)]^{-1/2} P X. \tag{12.17}$$
In Fig. 12.3, we identify $U^T$ as P. At first inspection, we want to identify Σ as $[\Sigma(Y)]^{1/2}$. However, the matrix Σ has dimension n × m, which is lower than the dimension of $[\Sigma(Y)]^{1/2}$, which is m × m, with m ≥ n as discussed previously. This means that we need, at this third step, to select the first n components of the vector $X_{\text{whitened}}$, which correspond to the first n principal components. We proceed to identify Σ as $I_{n \times m} [\Sigma(Y)]^{1/2}$. It is important to appreciate that PCA not only facilitated the formulation of the rest of the algorithm, but also reduced the complexity of the analysis from inverting an m × m matrix to finding an n × n unitary matrix, which has a very significant practical advantage (Section 12.2.1). Moreover, reducing dimensionality often has the effect of reducing noise. It also prevents overlearning, which can sometimes be observed in ICA [14]. Indeed, after PCA, the ICA task becomes equivalent to finding the best rotation matrix V that corresponds to the direction of maximum kurtosis or negentropy ($\hat{S} = V X_{\text{whitened}}$). Hyvärinen et al. developed an algorithm dubbed FastICA to solve for V [14]. The authors approximated the mutual information between y and x, and then devised a robust iterative algorithm that converges to the right direction vector V. It relies on being able to compute more complicated nonlinear functions of y that approximate the negentropy of the distribution of y. In Section 12.3 we will discuss how the use of local plasticity rules can result in global learning. In Section 12.3.3, we will discuss a neural network implementation of ICA using STDP and IP pioneered by Savin et al. [5].
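The idea that, after whitening, ICA reduces to finding a rotation toward maximum kurtosis can be sketched in two dimensions. This is not FastICA: it is a brute-force scan over candidate directions, shown only for illustration. It assumes two unit-variance Laplace sources mixed by a pure rotation of hypothetical angle π/6, so that the mixed data is already white and the whitening step can be skipped.

```python
import math
import random

def kurtosis(samples):
    """Sample estimate of E[y^4] - 3 (E[y^2])^2, assuming zero-mean data."""
    n = len(samples)
    m2 = sum(y * y for y in samples) / n
    m4 = sum(y ** 4 for y in samples) / n
    return m4 - 3.0 * m2 ** 2

def laplace(rng):
    """Unit-variance Laplace sample (scale 1/sqrt(2))."""
    return rng.choice((-1.0, 1.0)) * rng.expovariate(math.sqrt(2))

rng = random.Random(1)
alpha = math.pi / 6  # hypothetical mixing rotation angle
sources = [(laplace(rng), laplace(rng)) for _ in range(20000)]
# A pure rotation keeps the data white (uncorrelated, unit variance).
mixed = [(math.cos(alpha) * s1 - math.sin(alpha) * s2,
          math.sin(alpha) * s1 + math.cos(alpha) * s2) for s1, s2 in sources]

def projected_kurtosis(theta):
    """Kurtosis of the data projected onto the direction (cos theta, sin theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return kurtosis([c * x1 + s * x2 for x1, x2 in mixed])

# Scan candidate directions over a quarter turn in 1-degree steps.
angles = [math.radians(d) for d in range(90)]
alpha_hat = max(angles, key=projected_kurtosis)
print(math.degrees(alpha_hat))  # expected to land near 30 degrees
```

The scan covers only a quarter turn because the projected kurtosis repeats every 90 degrees here: projecting onto either source direction gives the same maximum. Practical algorithms replace the scan with an iterative update and a smoother contrast function.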
12.3 UNSUPERVISED LEARNING WITH STDP AND IP
12.3.1 SYNAPTIC TIME DEPENDENT PLASTICITY
STDP is a highly parallel, linear gradient descent algorithm that tends to maximize the mutual information between the input and output of a given neuron. It operates independently on each connection and thus scales proportionally with the number of neurons times the average fan-in number. In the context of biological networks, it is an adaptive biochemical system that operates at the synapse (or gap) between communicating neurons. Its generality and simplicity allow it to be utilized for both supervised and unsupervised learning for a large variety of different learning tasks [15], including models of bird song generation and learning [16] and in delay-based machine learning for spatiotemporal pattern representation [17], classification [18], and working memory [19]. The rules governing STDP are as follows: suppose there are two neurons i, j, such that neuron i (pre-synaptic) connects to neuron j (post-synaptic) with weight $w_{ij}$. If neuron i fires at $t_{pre}$ before j fires at $t_{post}$, STDP strengthens the weight, $w_{ij}$, between them. If neuron j fires before i, $w_{ij}$ weakens. Figure 12.4 illustrates the change in weight $w_{ij}$ as a function of the pre- and post-synaptic neuron relative spike timing difference $\Delta T = t_{post} - t_{pre}$. This plasticity curve, asymmetric in ΔT, is not the only kind of learning rule. Hebbian learning is an example of a symmetric kind of plasticity; however, STDP is more difficult to implement at ultrafast time scales because of the sharp discontinuity at ΔT = 0. We therefore focus on STDP because it requires a photonic implementation, whereas other learning rules could be implemented in slower electronic devices.
Figure 12.4 Characteristic of classic STDP. The change in weight wij for a given edge as a function of the pre- and post- spike timings. Reproduced from Tait et al. Photonic Neuromorphic Signal Processing and Computing ch. 8, 183–222 (2014) in Nanophotonic Information Physics by M. Naruse, Ed. Ref. [20]. With permission of Springer-Verlag.
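The asymmetric rule described above can be sketched as a simple window function. The exponential shape, amplitudes, and time constants below are assumptions chosen for illustration (the text specifies only the sign of the weight change and the discontinuity at ΔT = 0), and the function name is hypothetical.

```python
import math

def stdp_delta_w(delta_t, a_plus=1.0, a_minus=1.0, tau_plus=20.0, tau_minus=20.0):
    """Weight change for a spike timing difference delta_t = t_post - t_pre."""
    if delta_t > 0:   # pre-synaptic neuron fired first: strengthen
        return a_plus * math.exp(-delta_t / tau_plus)
    if delta_t < 0:   # post-synaptic neuron fired first: weaken
        return -a_minus * math.exp(delta_t / tau_minus)
    return 0.0        # simultaneous spikes: no change in this sketch

for dt in (-40.0, -5.0, 5.0, 40.0):
    print(dt, stdp_delta_w(dt))
```

The change is largest for nearly coincident spikes and flips sign across ΔT = 0, which is the sharp feature that demands fine temporal resolution from a hardware implementation.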
If a powerful signal travels through a network, the connections along which it travels tend to strengthen. As dictated by the STDP, since the pre-neuron i will fire before the post-neuron j, the strength between them will increase. However, misfires of the post-neuron or spike blockage of the pre-neuron will
tend to decrease the corresponding weight. As a general rule, STDP emphasizes connections between neurons if they are causally related in a decision tree. Alternatively, one can view STDP in terms of mutual information (see Eq. (12.14)). In a given network, a neuron j can receive signals from thousands of other channels. Neurons, however, must compress that data into a single channel for their output. Since STDP strengthens the weight of causally related signals, it will attempt to minimize the difference between the information in the input and output channels of neuron j. It thereby maximizes the mutual information between input and output channels. STDP is naturally suited for unsupervised learning and cluster analysis. After feeding the network ordered input, STDP will correlate neural signals to each other, organizing the network and designating different neurons to fire for different patterns or qualities. For supervised learning, forcing output units at the desired values allows STDP to correlate input and output patterns. Since STDP attempts to change the network connection strengths to reflect correlation between the input and output of each node, it will automatically correlate associated examples and mold the network for the given task. In biological networks, the STDP mechanism is governed by biomolecular protein transmitters and receivers. This mechanism lies within the three-dimensional plastic fabric of the brain, organized in an optimal geometry. Unfortunately, photonic technology cannot match the scalability of biology. Because there are N · k connections for a network of N neurons with mean indegree k, there is correspondingly a need for N · k STDP circuits to adjust each connection. This presents a scaling challenge for photonic STDP because integrated photonic neural primitives themselves already approach the diffraction limit of light, and each may have many inputs.
12.3.2 INTRINSIC PLASTICITY
Intrinsic Plasticity (IP) refers to homeostatic mechanisms through which the neuron regulates the neural spike rate distribution. This can be explained by a persistent change in the excitability property of a neuron due to certain learning tasks. It can be modeled by a set of adaptive algorithms that regulate only the internal dynamics of neurons in order to maintain an exponential distribution of firing rates [21]. It has also been suggested that IP and STDP can work synergistically to find sparse directions in the input [5, 21]. Since IP controls spiking dynamics rather than connection strengths, its complexity scales with the number of neurons, N, and does not present a significant architectural challenge. Photonic neurons exhibit changes in their dynamics with changes in the current injected into the semiconductor, which provides a potential mechanism for photonic IP. The combination of IP algorithms with STDP encourages network stability and allows for a higher diversity of applications, including ICA [5].
12.3.3 INDEPENDENT COMPONENT ANALYSIS WITH STDP AND IP
Savin et al. [5] showed that spiking neurons equipped with an STDP rule, synaptic scaling (or weight normalization), and a special kind of IP can form an efficient ICA solver. The IP rule optimizes the transfer function of the neuron to enforce an exponential distribution of the neuron's firing rate by optimizing three parameters ($r_0$, $u_0$, and $u_a$). See the Methods section of [5] for more details. Additionally, Hebbian synaptic plasticity, implemented by nearest-neighbor STDP, changes incoming weights, and a synaptic scaling mechanism keeps the sum of all incoming weights constant over time.
Figure 12.5 A demixing problem: two rotated Laplace directions. (a) Evolution of the weights ($w_1$ in blue, $w_2$ in red) for different initial conditions, with α = π/6, and L1 weight normalization. (b) Evolution of the instantaneous firing rate g, sampled each 1000 ms, for the initial weights $w_1 = 0.4$, $w_2 = 0.6$. (c) Corresponding changes in transfer function parameters, with $r_0$ in Hz and $u_0$ and $u_a$ in mV. (d) Final weight vector for different rotation angles α (in red). In the first example, normalization was done by $\|w\|_{L1} = 1$ (the estimated rotation angle is $\arctan(w_2/w_1) = 0.5215$, instead of the actual value 0.5236); for the others $\|w\|_{L2} = 1$ was used. In all cases the final weight vector was scaled by a factor of 5, to improve visibility. Reproduced from Savin et al. PLoS Comput. Biol. 6, e1000757 (2010) Ref. [5]. Licensed under CC BY.
A demixing test can be performed with two independent random variables following a Laplace distribution with unit variance. Note that the second order
correlation provides no insight on the directions of independence, so PCA alone would not be able to exploit the input statistics and would just perform a random walk in the input space, thus failing the demixing test. The results of the test are shown in Fig. 12.5(a), which depicts the evolution of synaptic weights for different starting conditions. As the IP rule adapts the neuron parameters to make the output distribution sparse (Figs. 12.5(b,c)), the weight vector aligns itself along the direction of one of the sources. With this simple model, the authors are able to demix a linear combination of two independent sources for different mixing matrices and different weight constraints (Fig. 12.5(d)), as with any other single-unit implementation of ICA. It is noteworthy that the mechanism introduced here differs somewhat from the one discussed in Section 12.2.1, which tries to find good representations of a high-dimensional space by projecting data onto a lower-dimensional space, choosing projection directions that maximize a certain quantity, such as the mutual information or the negentropy of the distribution of the data. The neuron was able to achieve this task by the IP mechanism, which guided the synaptic learning towards the interesting, heavy-tailed directions in the input. This numerical experiment is an example of how local learning rules can accomplish complex signal processing tasks. Future research directions will elucidate how to engineer these formidable properties discussed in the computational neuroscience community into hardware neural systems. In Section 12.4, we will discuss independent experimental demonstrations of the STDP and PCA tasks using optoelectronic circuits.
12.4 EXPERIMENTAL ADVANCES ON PHOTONIC LEARNING CIRCUITS
The photonic neural networks discussed in Chapter 8 use coherent light pulses multiplexed into a broadcast waveguide, with neurons networked in a broadcast-and-weight scheme. This photonic scheme is purposefully designed to support high bandwidth in information flow across processing stages of the network. The synaptic connection can be physically implemented with microring resonator-based filters, where light pulses undergo passive weighting depending on their wavelength. Although the signal bandwidth can exceed GHz rates, learning can happen on a much slower timescale. This concept is compatible with electronic control of the state of the analog photonic network: both the synaptic plasticity and the IP of a neuron are controlled electronically. For example, the microring resonators can be tuned via the plasma dispersion effect, allowing for voltage control of the resonance wavelength of each filter. The behavior of processing-network nodes (PNNs), such as the firing threshold, can be tuned electrically by changing the current pump driving the gain region (see Fig. 10.4). For these reasons, high-density microelectronics can be used to implement adaptive learning rules [22]. Furthermore, successful implementation of PCA
and ICA will allow for a robust, error-tolerant architecture that can, without supervision, separate statistically independent unknown features from noise background or interference introduced by the electronic links. In the context of spiking neurons, we discuss in this section synaptic time-dependent plasticity (STDP) and intrinsic plasticity (IP).
12.4.1 PHOTONIC PCA
Previous work in MWP filtering [23], beamforming [24], and channel estimation [25] has included high-bandwidth analog signal processing. One of the fastest and simplest iterative methods for finding the first PC, celebrated by the neuroscience community, is Oja's rule, discussed in Section 12.1.2. Iterative PCA learning on photonic signals was recently demonstrated using wavelength-division multiplexing weighted addition [26, 27]. A preliminary bench-top experiment was conducted to demonstrate the robustness of the algorithm [27]. The input signals were constructed so as to have a known first PC weight vector that the system should converge to. Then, the system was initialized with the worst-case weight vector for convergence purposes. The algorithm was allowed to run for 40 iteration steps, after which the converged weight vector was compared with the expected first PC loading. We chose the number 40 to be conservative; the convergence step count was typically around 10 to 15. An example of one of these runs can be seen in Fig. 12.6. The experiment was repeated for 290 different inputs, achieving good convergence in over 90% of the cases (Fig. 12.7). We observe that the performance does not depend on the orientation of the PC coefficients. It only depends on the magnitude of R, the ratio between the first and second principal component variances $\lambda_1/\lambda_2$. The presence of noise and the inexact command of effective weight in the weight bank prevent the system from achieving close-to-100% accuracy for all R. Noise stems mostly from the electronic components, since we observed that thermal noise overwhelms amplified spontaneous emission. More importantly, inexact command can cause convergence to a different PC or to a linear combination of PCs in cases where R < 1.2.
Both effects can be minimized by integrating the system onto an optoelectronic chip, with less polarization drift, less variability across WDM channels, and more precise controls over filters and cross-saturation [28]. In contrast to standard DSP, the WDM approach offers the possibility to scale fan-in without severely jeopardizing bandwidth in the weighted addition operation. The C-band and L-band combined could support simultaneous transmission and passive weighting of 200 channels operating at ∼10 GHz each [29, 30]. A single PD performs addition by transforming total optical power of all channels into photocurrent. The final optoelectronic circuit for first PC extraction requires one modulator and one filter per channel and one PD, all with the same bandwidth requirement as each of the input channels.
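The iterative procedure can be mimicked in software with a batch-averaged form of Oja's rule, Eq. (12.11). The sketch below is not the experimental controller of Ref. [27]: the three-channel inputs, the variance ratio R = 4, the learning rate, and the nearly orthogonal starting vector are hypothetical choices standing in for the 8-channel photonic setup, and the accuracy measure (normalized overlap with the known loading) is an assumption.

```python
import math
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def oja_step(w, batch, eta):
    """Batch-averaged Oja update: w += eta * (<y x> - <y^2> w)."""
    ys = [dot(w, x) for x in batch]
    n = len(batch)
    mean_yx = [sum(y * x[i] for y, x in zip(ys, batch)) / n for i in range(len(w))]
    mean_y2 = sum(y * y for y in ys) / n
    return [wi + eta * (yxi - mean_y2 * wi) for wi, yxi in zip(w, mean_yx)]

rng = random.Random(0)
u1 = [1 / math.sqrt(3)] * 3                      # known first PC loading
u2 = [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]  # second PC loading
batch = []
for _ in range(2000):
    g1, g2 = rng.gauss(0, 2.0), rng.gauss(0, 1.0)  # variances 4 and 1, so R = 4
    batch.append([g1 * a + g2 * b for a, b in zip(u1, u2)])

w = [0.743, -0.669, 0.0]  # starts almost orthogonal to u1 (unfavorable case)
for _ in range(40):       # 40 iteration steps, as in the experiment
    w = oja_step(w, batch, eta=0.1)

accuracy = abs(dot(w, u1)) / math.sqrt(dot(w, w))
print(accuracy)
```

Each step uses averages over a whole batch, loosely mirroring a controller that updates weights from measured waveforms rather than sample by sample. Lowering R toward 1 in this sketch slows the separation of the first two components, which is consistent with the dependence on R noted above.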
Figure 12.6 Example of a PCA task on a 13 Gbaud bit pattern yielding 93% accuracy. (a) The modulated waveforms before the weight bank. We show only 3 of the 8 channels for clarity purposes. The circuit was tested with wideband NRZ signals that were correlated bit-by-bit. Although we use NRZ signals with Markov sequence for testing, we treat them as analog waveforms throughout the experiment. (b) After completion of the PCA task, the measured first PC waveform is compared to the one calculated by SVD. Copyright 2016 IEEE. Reprinted, with permission, from Ferreira de Lima et al. IEEE Photon. J. 8 (2016) Ref. [27].
Here, the modulator and PD ultimately impose the bandwidth limitation of each channel. We note, however, that commercially available components already offer tens of GHz of bandwidth. A device like this could be an essential front-end to DSP systems with a large number of high-bandwidth inputs. One can expect the partial correlations of signals coming from an antenna array to drift over time. In the cognitive radio blind-source separation (BSS) context, e.g., this can be caused by moving sources. Thus, the circuit has to learn and self-adjust to these new conditions. In Ref. [27], the controller was programmed into a desktop CPU, which limited the time step duration to about a second. However, a dedicated field-programmable gate array (FPGA) could do the same task in tens of microseconds, bringing the convergence time to the submillisecond scale. An optoelectronic circuit for performing Oja’s rule (Eq. 12.11) in real-time is depicted in Fig. 12.8. A short convergence time will allow optoelectronic circuits to be used to follow non-stationary PCs in real-time [31]. A low convergence time will allow the system to track
0.9) converges to the correct weight for R > 1.2, which translates to the first PC being 20% more “prominent” than the second PC. Low accuracy can be explained by lack of convergence or convergence to the wrong PC. (b) 51 weight vectors contained in the rightmost bin (darker blue) of the histogram in (a). Four examples were highlighted for clarity. The weight vectors were projected onto a convenient plane that highlights the spatial diversity of the explored weight vectors generated by different Markov parameters. Copyright 2016 IEEE. Reprinted, with permission, from Ferreira de Lima et al. IEEE Photon. J. 8 (2016) Ref. [27].
moving sources in phased-array systems. By blending capabilities of silicon photonics together with fast analog electronics on CMOS, we expect the convergence speed to be even faster by three orders of magnitude, thus enabling a wide range of applications in RF signal processing and neuromorphic computing [10].
12.4.2 PHOTONIC STDP
Artificial STDP has been explored in VLSI electronics [33], and more recently, has been proposed [34] and demonstrated [35] in memristive nanodevices. Some ongoing projects in microelectronics seek to develop hardware platforms based on this technology [36, 37]. The analog nature and resilience to noise of neuromorphic processing naturally complements the high variability and failure rates of nanodevices [38]. The technology is predicted to lead to adaptive spiking networks with higher connection density than human cortical tissue [39]. STDP was first explored in the optical domain by Fok et al. [40]. The first optical STDP circuit demonstrated in that paper is illustrated in
Figure 12.8 (a) Schematic of the data flow of the Hebbian learning rule. (b) Diagram of experimental photonic circuit for wideband online PCA. An array of N input electric signals (x(t)) is encoded into multi-wavelength continuous-wave (CW) light via power modulation. Through wavelength division multiplexing (WDM), all single-wavelength lightwaves are channeled into one waveguide. Weighted addition (w(t)x(t)) is carried out by a bank of tunable microring resonator filters as described in Ref. [28], plus two fast balanced photodetectors which sum all weighted outputs into unbiased AC voltage signals. The output y(t) is used to modulate a Mach–Zehnder modulator (MZM) to yield z(t) in a multiplexed lightwave. After demultiplexing, z(t) is fed into N identical weight update circuits (gray boxes) whose inputs are $z_i(t)$ and y(t), and whose output is $w_i(t)$. The detailed weight update circuit is implemented using a programmable system on-chip (PSoC), with an integrated FPGA. This design assumes that the effective principal component does not evolve quickly over a time period of 0.1 ms, which is reasonable for the applications considered. Copyright 2015 IEEE. Adapted with permission, from Ferreira de Lima et al. in Proc. IEEE Photon. Soc. Summer Topical Meeting Series, paper MP2, 97–98 (2015) Ref. [31].
Fig. 12.9. This implementation uses two free carrier integrators (i.e., one SOA and one EAM) and optical summing to create an exponential-like response function. The resulting response is then incident on a photodetector, which regulates an electronic circuit to control the weight between two neurons.
[Figure 12.9. Panel (a): circuit with pre-synaptic and post-synaptic spike inputs, SOA, EAM, filter, and STDP output. Panel (b): output versus spike interval, horizontal axis in ps from −200 to 200.]
Figure 12.9 Photonic STDP. (a) Photonic circuit diagram of STDP and (b) experimental function of output power vs. spike interval. Pre- and post-synaptic inputs cross-phase modulate one another, with time constants determined by the SOA and EAM, to give the characteristic STDP curve. Reproduced with permission from Fok et al. Opt. Lett. 38, 419–421 (2013) Ref. [32]. Copyright 2013 Optical Society of America.
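The exponential-like, asymmetric response described for the photonic STDP circuit can be illustrated with the pair-based STDP window commonly used in the literature. The following is a minimal sketch under stated assumptions, not a model of the SOA/EAM circuit: the function names, amplitudes, and time constants are hypothetical, and the sign convention (potentiation when the pre-synaptic spike leads) may not match the orientation of Fig. 12.9(b).

```python
import math


def stdp_window(dt_ps, a_plus=1.0, a_minus=0.5,
                tau_plus_ps=50.0, tau_minus_ps=80.0):
    """Weight change for one spike pair, with dt_ps = t_post - t_pre (ps).

    Amplitudes and time constants are illustrative placeholders,
    not values measured for the photonic circuit.
    """
    if dt_ps > 0:
        # Pre-synaptic spike leads: potentiation, decaying with the interval.
        return a_plus * math.exp(-dt_ps / tau_plus_ps)
    if dt_ps < 0:
        # Post-synaptic spike leads: depression, decaying with the interval.
        return -a_minus * math.exp(dt_ps / tau_minus_ps)
    # Convention chosen here for exactly coincident spikes.
    return 0.0


def total_weight_change(pre_spikes_ps, post_spikes_ps, **window_params):
    """Sum the window over every pre/post spike pair (all-to-all pairing)."""
    return sum(
        stdp_window(t_post - t_pre, **window_params)
        for t_pre in pre_spikes_ps
        for t_post in post_spikes_ps
    )


if __name__ == "__main__":
    # Sample the window over the same range as the horizontal axis of the figure.
    for dt in (-200, -100, -10, 10, 100, 200):
        print(f"dt = {dt:+5d} ps -> dw = {stdp_window(dt):+.4f}")
```

The two branches use separate time constants, mirroring the idea that two integrators with different responses shape the two halves of the curve. The jump between the branches at a zero interval is the feature that demands fine timing resolution.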
While the electronic adjustment of the weight can happen on slow timescales, the sharp asymmetric response of STDP at ∆t = 0 requires an optical temporal precision on the order of the pulse width to be effective. The shape of the response function can be dynamically adjusted with various control parameters (Fig. 12.10). This photonic STDP design can be integrated, but the need for up to N · k independent units limits the scalability of an overall system. Analogous STDP devices proposed in electronics, such as memristors in nano-crossbar arrays [38], overcome this scaling challenge with extremely small devices (on the order of nanometers). Novel implementations of STDP based on electronic-optical, photonic crystal [41], or plasmonic [42] technologies may become important for scaling up fully adaptive systems. The requirement that every connection adapt without supervision can also be relaxed, for example, by organizing the system as a liquid state machine (LSM) or another reservoir architecture [43]. More recently, a supervised learning scheme was implemented on a photonic neuron equipped with this photonic STDP, in which a teacher determined how the neuron should fire spikes in response to its inputs (Fig. 12.11) [44]. Figure 12.11(a) depicts the weighted outputs before integration, Fig. 12.11(b) shows the photonic neuron outputs, Fig. 12.11(c) shows the teacher signal,