DeepDrum: An Adaptive Conditional Neural Network for Generating Drum Rhythms

Dimos Makris *1, Maximos Kaliakatsos-Papakostas *2, Katia Lida Kermanidis *1

Abstract

Considering music as a sequence of events with multiple complex dependencies, the Long Short-Term Memory (LSTM) architecture has proven very efficient in learning and reproducing musical styles. However, the generation of rhythms requires additional information regarding musical structure and accompanying instruments. In this paper we present DeepDrum, an adaptive Neural Network capable of generating drum rhythms under constraints imposed by Feed-Forward (Conditional) Layers which contain musical parameters along with given instrumentation information (e.g. bass and guitar notes). Results on generated drum sequences are presented, indicating that DeepDrum is effective in producing rhythms that resemble the learned style while at the same time conforming to given constraints that were unknown during the training process.

1. Introduction

Developing computational systems that can be characterized as creative (Deliège & Wiggins, 2006) has long been the focus of research. Such systems range over tasks from melody and chord composition to lyrics and rhythm. With the advent of Deep Learning architectures, numerous research works have been published using different types of Artificial Neural Networks, especially Long Short-Term Memory (LSTM) networks, for composing music, since these are capable of modeling sequences of events (e.g. (Hadjeres & Pachet, 2016; Kalingeri & Grandhe, 2016), and (Briot et al., 2017) for further references).

1 Department of Informatics, Ionian University, Corfu, Greece. 2 Institute for Language and Speech Processing, R.C. "Athena", Athens, Greece. Correspondence to: Dimos Makris <c12makr@ionio.gr>.

Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s).

To the best of our knowledge, there is limited work addressing learning and generating percussion (drum) progressions with such architectures (Hutchings, 2017; Choi et al., 2016). Most of these methods focus on the generation of sequences that imitate a learned style. However, the performance of human drummers is potentially influenced, e.g., by what the guitar and bass players play, while the tempo of a song affects the density of their playing.

In this work we present DeepDrum, a combination of LSTMs and Feed-Forward (FF), or Conditional, modules capable of composing drum sequences based on musical parameters, i.e. guitar and bass performance, tempo, grouping and metrical information. DeepDrum is able to combine implicitly learned information (LSTM) and explicitly defined conditions (FF), allowing the generation of drum rhythms that resemble a musical style through implicit learning and, at the same time, satisfy explicitly declared conditions that are potentially not encountered in the learned style.

2. Data Collection and Architecture Information

The utilised corpus consists of 70 songs from two progressive rock bands that have common musical characteristics, collected manually from web tablature learning sources 1. Following the same methodology for conditional composition as (Makris et al., 2017), the representation of input training data was based on text words with one-hot encodings.

The proposed architecture comprises separate modules for predicting the next drum event. The LSTM module learns sequences of consecutive drum events, while the Conditional (FF) module handles musical information regarding guitar, bass, metrical structure, tempo and grouping. This information is the sum of features, in one-hot encodings, of consecutive time-steps of the Conditional Input space within a moving window giving information about the past, current and future time-steps.
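As a rough illustration of the representation described above, the following sketch encodes drum events as one-hot "words" and sums conditional features over a moving window of past, current and future time-steps. The vocabulary, window size and feature names are invented for illustration; the paper does not specify its exact word vocabulary or window length.

```python
# Toy sketch of one-hot "text word" encoding and the conditional moving
# window; vocabulary and window size are illustrative assumptions.
import numpy as np

# Each drum event at a time-step is a "word", e.g. "kick_snare" or "hihat".
drum_vocab = ["rest", "kick", "snare", "hihat", "kick_snare"]
word_to_idx = {w: i for i, w in enumerate(drum_vocab)}

def one_hot(word, vocab_size=len(drum_vocab)):
    """Encode a single drum word as a one-hot vector."""
    v = np.zeros(vocab_size)
    v[word_to_idx[word]] = 1.0
    return v

def conditional_window(cond_features, t, past=2, future=2):
    """Sum one-hot condition features (e.g. bass/guitar notes) over a
    moving window around time-step t: past, current and future steps."""
    lo, hi = max(0, t - past), min(len(cond_features), t + future + 1)
    return np.sum(cond_features[lo:hi], axis=0)

# Example: toy one-hot condition features for 6 time-steps.
cond = np.eye(6)
window_sum = conditional_window(cond, t=3)  # sums steps 1..5
```

The window sum deliberately discards the ordering of the conditional features inside the window, matching the paper's description of the Conditional Input as a sum over consecutive time-steps.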
DeepDrum has 3 input spaces for different elements of drums, thus leading to 3 LSTM Block modules, while the Conditional input space is separated into 2 FF modules. The Pre-FF carries information for the past time-steps and is merged with each corresponding drum input. The Post-FF contains information for the current and future time-steps, which is merged with each LSTM block output, thus leading to independent softmax outputs. Concerning the configuration, two stacked LSTM layers and single FF layers (linear activation) with 256 Hidden Units were used, along with dropouts of 0.2 in every connection. In our experiments we used the Keras (Chollet et al., 2015) library with the TensorFlow (Abadi et al., 2016) deep learning framework as backend. Figure 1 summarises the proposed architecture.

1 http://www.911tabs.com/

Figure 1. DeepDrum Neural Network Architecture.

3. Experimental Setup

The introduced architecture is examined for producing drum rhythms according to the conditions given by pieces that were not included in the training set, some of them having musical characteristics that have not been encountered in any piece of the training corpus (e.g. time signatures 3/8, 9/8). These four pieces, however, pertain to the learned style of the training corpus (denoted as PT - PF). In addition we used 2 pieces from a different genre (in Disco style, denoted as AB). Multiple generations were produced using initial seed-sentences, at different stages of the learning process, with an adjustable diversity parameter. Interested readers can listen to several excerpts generated with the proposed architecture on the web page of the Humanistic and Social Informatics Lab of the Ionian University 2. Drum rhythm features (Kaliakatsos-Papakostas et al., 2013) were extracted from the generated pieces.
The mean values and standard deviations of these features, covering all the bars of the generated content, were used as global features of each piece. Figure 2 illustrates the two-dimensional reduction of the global features of all pieces using the t-SNE (van der Maaten, 2009) technique. We can notice that: a) the network composes AB pieces that approach the features of this style (late generations are closer), while b) PT - PF pieces composed with unknown conditions of the learned styles cover areas around the corresponding Ground-Truth pieces.

2 https://hilab.di.ionio.gr/index.php/en/deepdrum-an-adaptive-conditional-neural-network-for-generating-drum-rhythms/

Figure 2. Two-dimensional mapping of the features of all DeepDrum compositions. Features of the Ground-Truth (G-T) rhythms are illustrated with × symbols, while the features of rhythms with fewer (early - ◦) and more (late) than 100 epochs are shown separately.

4. Conclusions

This work introduces DeepDrum, an adaptive Neural Network application which learns and generates sequences under given musical constraints. The proposed architecture consists of a Recurrent module with LSTM blocks that learns sequences of consecutive drum events, along with two Feed-Forward (Conditional) Layers handling information for musical instruments, metrical structure, tempo and grouping (phrasing).

The results show the importance of the Conditional Layers, which enable DeepDrum to simulate human drummers in two tasks: responding to create "groove" with other instruments in any musical style, and foreseeing future musical changes (e.g. phrase and tempo changes). In addition, the Conditional Layers keep the entire network "on-track" and enable it to respond to constraints that were not encountered during training (e.g. unknown – to the network – time signatures).
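For concreteness, the architecture of Section 2 could be sketched in Keras roughly as follows. The input dimensionalities, sequence length and exact merge points are assumptions inferred from the textual description, not the authors' released code.

```python
# Minimal sketch of the Figure 1 architecture: three drum input spaces with
# stacked LSTMs, a Pre-FF conditional module merged with each drum input,
# and a Post-FF module merged with each LSTM output before independent
# softmaxes. Dimensionalities below are illustrative assumptions.
from tensorflow.keras.layers import Input, LSTM, Dense, Dropout, Concatenate
from tensorflow.keras.models import Model

SEQ_LEN, UNITS, DROP = 16, 256, 0.2
drum_dims = [32, 32, 32]   # assumed one-hot sizes of the 3 drum input spaces
cond_dim = 64              # assumed size of the Conditional Input space

# Pre-FF (past time-steps) and Post-FF (current/future time-steps) modules:
# single linear FF layers, per the paper's configuration.
pre_in = Input(shape=(SEQ_LEN, cond_dim), name="pre_ff_in")
post_in = Input(shape=(cond_dim,), name="post_ff_in")
pre_ff = Dropout(DROP)(Dense(UNITS, activation="linear")(pre_in))
post_ff = Dropout(DROP)(Dense(UNITS, activation="linear")(post_in))

drum_ins, outs = [], []
for i, dim in enumerate(drum_dims):
    x_in = Input(shape=(SEQ_LEN, dim), name=f"drum_in_{i}")
    drum_ins.append(x_in)
    x = Concatenate()([x_in, pre_ff])           # merge Pre-FF with drum input
    x = LSTM(UNITS, return_sequences=True)(x)   # two stacked LSTM layers
    x = Dropout(DROP)(x)
    x = LSTM(UNITS)(x)
    x = Dropout(DROP)(x)
    x = Concatenate()([x, post_ff])             # merge Post-FF with LSTM output
    outs.append(Dense(dim, activation="softmax", name=f"drum_out_{i}")(x))

model = Model([pre_in, post_in] + drum_ins, outs)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

Each of the three drum streams yields its own softmax output, so training can proceed with one categorical cross-entropy loss per stream, as implied by the "independent softmax outputs" in Section 2.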
Acknowledgements

This research has been financially supported by the General Secretariat for Research and Technology (GSRT) and the Hellenic Foundation for Research and Innovation (HFRI) (Scholarship Code: 953).

References

Abadi, Martín, Barham, Paul, Chen, Jianmin, Chen, Zhifeng, Davis, Andy, Dean, Jeffrey, Devin, Matthieu, Ghemawat, Sanjay, Irving, Geoffrey, Isard, Michael, et al. TensorFlow: A system for large-scale machine learning. In OSDI, volume 16, pp. 265–283, 2016.

Briot, Jean-Pierre, Hadjeres, Gaëtan, and Pachet, François. Deep learning techniques for music generation – a survey. arXiv preprint arXiv:1709.01620, 2017.

Choi, Keunwoo, Fazekas, George, and Sandler, Mark. Text-based LSTM networks for automatic music composition. arXiv preprint arXiv:1604.05358, 2016.

Chollet, François et al. Keras, 2015.

Deliège, Irène and Wiggins, Geraint A. Musical creativity: Multidisciplinary research in theory and practice. Psychology Press, 2006.

Hadjeres, Gaëtan and Pachet, François. DeepBach: a steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010, 2016.

Hutchings, P. Talking drums: Generating drum grooves with neural networks. arXiv preprint, 2017.

Kaliakatsos-Papakostas, M., Floros, A., and Vrahatis, M. N. EvoDrummer: deriving rhythmic patterns through interactive genetic algorithms. In International Conference on Evolutionary and Biologically Inspired Music and Art, pp. 25–36. Springer, 2013.

Kalingeri, Vasanth and Grandhe, Srikanth. Music generation with deep learning. arXiv preprint arXiv:1612.04928, 2016.

Makris, Dimos, Kaliakatsos-Papakostas, Maximos, Karydis, Ioannis, and Kermanidis, Katia Lida. Combining LSTM and feed forward neural networks for conditional rhythm composition. In International Conference on Engineering Applications of Neural Networks, pp. 570–582. Springer, 2017.

van der Maaten, Laurens. Learning a parametric embedding by preserving local structure. RBM, 500(500):26, 2009.