A Functional Taxonomy of Music Generation Systems

DORIEN HERREMANS, Singapore University of Technology and Design & Queen Mary University of London
CHING-HUA CHUAN, University of North Florida
ELAINE CHEW, Queen Mary University of London

Digital advances have transformed the face of automatic music generation since its beginnings at the dawn of computing. Despite the many breakthroughs, issues such as the musical tasks targeted by different machines and the degree to which they succeed remain open questions. We present a functional taxonomy for music generation systems with reference to existing systems. The taxonomy organizes systems according to the purposes for which they were designed. It also reveals the inter-relatedness amongst the systems. This design-centered approach contrasts with predominant methods-based surveys, and facilitates the identification of grand challenges so as to set the stage for new breakthroughs.

CCS Concepts: • Applied computing → Sound and music computing; • Information systems → Multimedia information systems; • Computing methodologies → Artificial intelligence; Machine learning;

Additional Key Words and Phrases: music generation, taxonomy, functional survey, survey, automatic composition, algorithmic composition

ACM Reference Format:
Dorien Herremans, Ching-Hua Chuan and Elaine Chew, 2016. A Functional Taxonomy of Music Generation Systems. ACM Comput. Surv. 50, 5, Article 69 (September 2017), 33 pages.
DOI: 10.1145/3108242

1. INTRODUCTION

The history of automatic music generation is almost as old as that of computers. Visionaries such as Ada Lovelace anticipated, ever since Charles Babbage laid down the initial designs for a general-purpose computing device, that machines could one day generate “elaborate and scientific pieces of music of any degree of complexity and extent” [Lovelace 1843].
Indeed, music generation or automated composition was a task accomplished by one of the first computers built, the ILLIAC I [Hiller Jr and Isaacson 1957]. Today, computer-based composition systems are plentiful. The recent announcement of Google Magenta¹, “a research project to advance the state of the art in machine intelligence for music and art generation,” underscores the importance and popularity of automatic music generation in artificial intelligence.

¹ http://magenta.tensorflow.org/welcome-to-magenta

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 658914.

Authors' addresses: D. Herremans, Information Systems Technology and Design Pillar, Singapore University of Technology and Design, 8 Somapah Road, 1.502-18, Singapore 487372; for part of the work, D. Herremans was at the School of Electronic Engineering and Computer Science, Queen Mary University of London; E. Chew, School of Electronic Engineering and Computer Science, Queen Mary University of London, Mile End Road, E1 4NS London, UK; C.-H. Chuan, School of Computing, University of North Florida, 1 UNF Drive, Jacksonville, FL 32224, US.

© 2017 ACM. 0360-0300/2017/09-ART69 $15.00
Preprint of ACM Computing Surveys, Vol. 50, No. 5, Article 69, Publication date: September 2017.

Despite the enthusiasm of researchers, using computers to generate music remains an ill-defined problem. Although several survey papers on automatic music generation exist [Papadopoulos and Wiggins 1999; Nierhaus 2009; Fernández and Vico 2013], researchers still debate the kinds of musical tasks that can be performed by machines and the degree to which satisfactory outcomes can be achieved. Outstanding questions include: Which compositional tasks are solved and which remain challenges? How is each compositional task modeled, and how do the tasks connect to each other? What is the relationship between systems proposed for different compositional tasks? What is the goal defined for each task, and how can the objective be quantified? How are the systems evaluated? While individual questions or subsets of these questions might be addressed in specific papers, previous surveys fail to provide a systematic comparison of the state of the art.

This paper aims to answer these questions by proposing a functional taxonomy of automatic music generation systems. Focusing on the purpose for which the systems were developed, we examine the manner in which each music composition task was modeled and describe the connection between different tasks within and across systems. In Section 1.1, we propose a concept map for automatic music generation systems based on the functions of the systems. A brief history of early automatic music generation systems is provided in Section 1.2, followed by a discussion on the general approach to evaluating computer-generated music (Section 1.3). A detailed survey of systems designed based on each functional aspect is then presented in Section 2.

1.1. Function and design concepts in automatic music generation systems

The complexity and types of music generation systems are almost as varied as music itself.
It would be a gross simplification to consider and judge all automatic music generation systems in a homogeneous fashion. The easiest way to understand the complexity of these systems and their connections to one another is to examine the functions for which they were designed. Figure 1 illustrates a concept map showing the functional design aspects that form the proposed taxonomy of music generation systems. The map is centered around two basic concepts crucial to music generation systems: the composition (the upper grey node) and the note (the lower grey node), which possesses properties such as pitch, duration, onset time, and instrumentation.

Between the note and the composition lie four essential elements of music composition: melody, harmony, rhythm, and timbre. Systems that focus on any one of the four aspects generate a sequence of notes that fulfills a specific set of goals, which can vary widely amongst the systems. For example, for melody generation, a system could be designed to simply produce a monophonic sequence of notes [Brooks et al. 1957], or be constrained to fit a given accompaniment [Pachet and Roy 2001]. For an automatic harmonization system, the goal could involve generating three lines of music for a given melody without breaking music theoretic rules (e.g., harmonizing chorales [Ebcioğlu 1988]), or producing substitute chord progressions in jazz [Chemillier 2001]. For rhythm generation, a system could focus on producing rhythmic patterns that sound like rock 'n' roll [Tokui and Iba 2000], or on changing the timing of onsets to make the rendering of the piece sound more human-like [Tidemann and Demiris 2008].

Timbre is unique in that it is based only on the acoustic characteristics of music. Timbre can be generated either by playing notes on a real instrument or by artificially synthesizing sounds for a note or several notes.
In automatic music composition, timbre generation surfaces as a problem in orchestration, which is often modeled as a retrieval problem [Psenicka 2003] or a multi-objective search problem [Carpentier et al. 2010].

Fig. 1. Concept map for automatic music generation systems.

The objective of a system, such as matching a target timbre, will directly impact the problem definition and prospective solution techniques, such as multi-objective search or retrieval. Also note that a music generation system can tackle more than one functional aspect (melody, harmony, rhythm, timbre), either by targeting multiple goals at the same time or by focusing on one goal with the other musical aspects held constant and provided by the user.

Returning to Figure 1, three high-level concepts are shown above composition: narrative, interactive composing, and difficulty. Interactive composing refers to an online problem-solving approach to music generation that employs user input and may or may not operate in real time. A system can be designed to generate each of the four essential musical elements, or a combination of them, in an interactive manner. For example, a system can listen to a person's playing, learn her or his style in real time, and improvise with the player in the same style [Pachet 2003; Assayag et al. 2006]. Another type of interactive system incorporates a user's feedback in the music generation process, using it either as critique for reinforcement learning [Franklin 2001] or as a source of parameters in music generation [François et al. 2013]. The narrative contributes to the emotion, tension, and/or story line perceived by the listener when listening to music [Huron 2006]. The concept of difficulty focuses on physical aspects of playing the instrument.
Systems with ergonomic goals must consider the playability of certain note combinations on a particular instrument. To achieve these goals, the long-term and/or hierarchical structure of the music plays an important role. These high-level goals and the long-term structure have been the focus of recent development in automatic music generation, a trend that will persist into the near future.

As shown in Figure 1, automatic music generation evokes a number of computational problems and demonstrates capabilities that span almost the entire spectrum of artificial intelligence. For example, generating music can be described as a sensorless problem (generating a monophonic melody without accompaniment), a partially observable problem (with accompaniment but not the underlying chord progression), or a fully observable problem (accompaniment with labeled chord progression). Different agent types, including model- and knowledge-based [Chuan and Chew 2011], goal-based [Pachet and Roy 2001], utility-based [McVicar et al. 2014], and statistical learning [Tokui and Iba 2000], have been used for music generation. In music generation, states can be defined in terms of discrete (e.g., pitch, interval, duration, chord) as well as continuous (e.g., melodic contour, acoustic brightness and roughness) features. In addition, various techniques, such as stochastic approaches, probabilistic modeling, and combinatorial optimization, have been applied to music generation. In such a rich problem domain, it is especially important to understand the intricacies within each subproblem and the manner in which the subproblems are interconnected.

1.2. Automating composition: early years

The idea of composers relinquishing some degree of creative control and automating certain aspects of composition has been around for a long time.
A popular early example is Mozart's Musikalisches Würfelspiel (Musical Dice Game), whereby small fragments of music are randomly re-ordered by rolling dice to create a musical piece. Mozart was not the only one experimenting with this idea. In fact, the first musical dice game, called Der allezeit fertige Menuetten und Polonaisencomponist (The Ever-Ready Minuet and Polonaise Composer), can be traced back to Johann Philipp Kirnberger [Kirnberger 1757]. According to Hedges [1978], at least twenty musical dice games were published between 1757 and 1812, making it possible for musical novices to compose polonaises, minuets, marches, waltzes, and more.

John Cage, Charles Dodge, Iannis Xenakis and other avant-garde composers have continued the ideas of chance-inspired composition. John Cage's Atlas Eclipticalis was composed by randomly placing translucent paper on a star chart and tracing the stars as notes [Pritchett 1994]. In the piece called “Analogique A”, Xenakis uses statistical (Markov) models to determine how musical sections are ordered [Xenakis 1992]. The composer David Cope began his “Experiments in Musical Intelligence” in 1981 as the result of a composer's block; the aim of his resultant software was to model his own composing style, so that at any given point one could request a next note, next bar, and so on. In later experiments, Cope also modeled the styles of other composers [Cope 1996]. Some of the music composed using this approach proved to be fairly successful. A more extensive overview of such avant-garde composers is given by Cope [2000].

State-of-the-art music generation systems extend these ideas of mimicking styles and pieces, be it in the form of statistical properties of styles or explicitly repeated fragments.
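The dice-game procedure described at the start of this section amounts to one table lookup per bar. The following is a minimal sketch, assuming a minuet of 16 bars with 11 pre-composed options per bar (one for each possible sum of two six-sided dice); the fragment IDs are hypothetical placeholders, not data from any historical game.

```python
import random

NUM_BARS = 16
OUTCOMES = range(2, 13)  # possible sums of two six-sided dice

# Hypothetical lookup table: for each bar, map each dice sum to a fragment ID.
# A printed game would pair such a table with a numbered set of composed bars.
TABLE = [{s: bar * 11 + (s - 2) for s in OUTCOMES} for bar in range(NUM_BARS)]


def roll_two_dice(rng: random.Random) -> int:
    """Sum of two independent six-sided dice (2..12, not uniformly distributed)."""
    return rng.randint(1, 6) + rng.randint(1, 6)


def compose(rng: random.Random) -> list[int]:
    """Pick one pre-composed fragment per bar by rolling two dice."""
    return [TABLE[bar][roll_two_dice(rng)] for bar in range(NUM_BARS)]


if __name__ == "__main__":
    print(compose(random.Random(1757)))
```

Because dice sums are not uniform (a 7 is six times as likely as a 2), the fragments filed under the middle sums are chosen far more often than those at the extremes.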
Before Section 2 describes music generation systems in greater depth in terms of their functions, the next section focuses on how the goals of music generation systems are defined and evaluated, depending on the technique used for generation.

1.3. Measuring success

For automatic music generation systems, unless the end goal is the process rather than the outcome, evaluation of the resulting composition is usually desired, and for some systems it is an essential step in the composition process. The output of music generation systems can be evaluated by human listeners, using music theoretic rules, or using machine-learned models. The choice of evaluation method is primarily influenced by the goal of the music generation system, such as similarity to a corpus or a style (as encapsulated by rules or machine-learned models) versus music that sounds good. All of these goals are interrelated and impact the implementation of the music generation system and the quality of the generated pieces.

While human feedback may arguably be the soundest approach for evaluating post hoc whether the generated pieces sound good [Pearce and Wiggins 2001; Agres et al. 2017], requiring people to rate the output at each step of the process can take an excessive amount of time. This is often referred to as the human fitness bottleneck [Biles 2001]. A second issue with human evaluation is fatigue: continuous listening and evaluating can cause significant psychological strain for the listener [Tokui and Iba 2000]. So while it can be useful, and arguably essential, to let human listeners test the final outcome of a system, human ratings are not practical for guiding the generation process itself.
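Of the three evaluation routes listed above, music theoretic rules are the most direct to encode. As a hypothetical example not drawn from any system cited here, the sketch below turns one classic prohibition of chorale-style writing, parallel perfect fifths and octaves, into a penalty that a fitness function could use. Voices are assumed to be equal-length lists of MIDI pitch numbers.

```python
def interval_class(upper: int, lower: int) -> int:
    """Interval between two MIDI pitches, reduced modulo the octave (0-11)."""
    return (upper - lower) % 12


def count_parallel_perfects(upper: list[int], lower: list[int]) -> int:
    """Count parallel perfect fifths (7) and octaves/unisons (0) between two voices.

    A violation is two consecutive vertical intervals that are the same perfect
    consonance while both voices move in the same direction.
    """
    if len(upper) != len(lower):
        raise ValueError("voices must have the same number of notes")
    violations = 0
    for i in range(1, len(upper)):
        prev_iv = interval_class(upper[i - 1], lower[i - 1])
        curr_iv = interval_class(upper[i], lower[i])
        same_direction = (upper[i] - upper[i - 1]) * (lower[i] - lower[i - 1]) > 0
        if prev_iv == curr_iv and curr_iv in (0, 7) and same_direction:
            violations += 1
    return violations


def rule_fitness(upper: list[int], lower: list[int], weight: float = 1.0) -> float:
    """Higher is better; zero when no parallel perfect consonances occur."""
    return -weight * count_parallel_perfects(upper, lower)


soprano = [72, 74, 76]  # C5 D5 E5
bass = [53, 55, 48]     # F3 G3 C3
print(count_parallel_perfects(soprano, bass))  # 1: the fifth F3/C5 moves to G3/D5
```

Because intervals are reduced modulo 12, compound fifths and octaves are also flagged. A full rule-based fitness function would sum many such weighted terms.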
If the goal of the automatic composition process is to create music similar to a given style or body of work by a particular composer, one could look to music theory for well-known rules, such as those for music in the style of a composer, say Palestrina. These could be incorporated into an expert system or serve as a fitness function, say of a genetic algorithm. The downside to this approach is that existing rule sets are either limited to a few narrowly-defined styles that have been comprehensively analyzed by music theorists or systematically outlined by the composer, which constrains their robustness and wider applicability, or are so generic as to result in music lacking definable characteristics.

The third approach, using machine-learned models, seems to offer a solution to the aforementioned problems. By learning the style of either a corpus of music or a particular piece, music can be generated with characteristics following those in the training pieces. The characteristics may include distributions of absolute or relative pitch sets, durations, intervals, and contours. A large collection of features is suggested by Towsey et al. [2001] and Conklin and Witten [1995]. Markov chains form a class of machine-learned models; they capture the statistical occurrence of features in a particular piece or corpus. Sampling from Markov models results in pieces with similar statistical distributions of the desired musical features. Other machine-learning approaches include neural networks and, more recently, deep-learning methods, which attempt to capture more complex relationships in a music piece.

A concept that directly relates to the task of evaluating the generated music, regardless of which of the above three methods is used, is similarity. In the first case, human listeners have formed their frame of reference through previous listening experiences [Peretz et al. 1998; Krumhansl 2001] and will judge generated pieces based on their similarity to pieces with which they are familiar. Secondly, a piece generated with music theoretic rules will possess attributes characteristic of those in the target style. Finally, pieces generated by machine-learned models will have features distributed in ways similar to the original corpus.

Since similarity is central to metrics of success in music generation systems, an important challenge then becomes one of finding the right balance between similarity and novelty or creativity. In the words of Hiller [1989]: “It is interesting to speculate how much must be changed to create a new work.” For example, music based on fragments of an already existing composition, as is the case with high-order Markov models, runs the risk of crossing the fine line between stylistic similarity and plagiarism [Papadopoulos et al. 2014]. Evaluating the creativity of the generated music, which is sometimes equated to novelty, is a complex topic treated at greater length in Agres et al. [2017].

In order to facilitate the comparison of results from different music generation systems, the authors have set up an online computer-generated music repository². This repository allows researchers to upload both audio files and sheet music generated by their systems. This will facilitate dissemination of results and promote research transparency so as to better assess the impact of different systems. Access to concrete examples through the website will allow visitors to better understand the behavior of the music generation systems that created them.

² http://dorienherremans.com/cogemur

In the remainder of this paper, we will discuss each of the functional areas on which music generation systems can focus.
Rather than aiming to provide an exhaustive list of music generation systems, we choose to focus on research that presented novel ideas which were later adopted and extended by other researchers. This function- and design-based perspective stands in contrast to existing survey papers, which typically categorize generation systems according to the techniques that they employ, such as Markov models, genetic algorithms, rule-based systems, and neural networks; see [Papadopoulos and Wiggins 1999; Nierhaus 2009; Fernández and Vico 2013]. By offering a new taxonomy inspired by the function and design foci of the systems, we aim to provide deeper insights into the problems that existing systems tackle and the current challenges in the field, thereby inspiring future work that pushes the boundaries of the state of the art.

2. A FUNCTIONAL INDEX OF MUSIC GENERATION SYSTEMS

This section explores the functional aspects addressed by different music generation systems, which together form the taxonomy proposed in this paper; example systems are given for each aspect. The functional aspects discussed, in order of appearance, are melody, harmony, rhythm, timbre, interaction, narrative, and difficulty. We also touch upon long-term structure in relation to some of these categories. It is worth pointing out that the aspects, while separate in their own right, can often be conflated; for example, rhythm is inherent in most melodies. Therefore, a system mentioned in the context of one aspect may also touch upon other functional aspects.

Table I gives an overview of the different techniques used within these functional aspects. Systems are classified by their main technique and listed with their most prominent aspect. Typically, music generation systems can belong to more than one category. In this paper (and therefore also in Table I), the most important contribution of each system is emphasized, and only the systems with a clear contribution are listed.
In the next subsections, the individual functional aspects will be discussed in greater detail.

Table I: Functional overview of selected music generation systems by their main technique.

Markov models
  Melody: [Pinkerton 1956; Brooks et al. 1957; Moorer 1972; Conklin and Witten 1995; Pachet and Roy 2001; Davismoon and Eccles 2010; Pearce et al. 2010; Gillick et al. 2010; McVicar et al. 2014; Papadopoulos et al. 2014]
  Harmony: [Hiller Jr and Isaacson 1957; Xenakis 1992; Farbood and Schoner 2001; Allan and Williams 2005; Lee and Jang 2004; Yi and Goldsmith 2007; Simon et al. 2008; Eigenfeldt and Pasquier 2009; De Prisco et al. 2010; Chuan and Chew 2011; Bigo and Conklin 2015]
  Rhythm: [Tidemann and Demiris 2008; Marchini and Purwins 2010; Hawryshkewich et al. 2011]
  Interaction: [Thom 2000]
  Narrative: [Prechtl et al. 2014a,b]
  Difficulty: [McVicar et al. 2014]
Factor oracles
  Interaction: [Assayag et al. 2006; Weinberg and Driscoll 2006; François et al. 2007; Assayag et al. 2010; Dubnov and Assayag 2012; François et al. 2013; Nika et al. 2015]
  Rhythm: [Weinberg and Driscoll 2006]
Incremental parsing
  Interaction: [Pachet 2003]
Reinforcement learning
  Interaction: [Franklin 2001]
Rule/Constraint satisfaction/Grammar-based
  Melody: [Keller and Morrison 2007; Gillick et al. 2010; Herremans and Sörensen 2012]
  Harmony: [Hiller Jr and Isaacson 1957; Steedman 1984; Ebcioğlu 1988; Cope 1996; Assayag et al. 1999b; Cope 2004; Huang and Chew 2005; Anders 2007; Anders and Miranda 2009; Aguilera et al. 2010; Herremans and Sörensen 2012, 2013; Tanaka et al. 2016]
  Narrative: [Rutherford and Wiggins 2002]
  Difficulty: [Lin and Liu 2006]
  Interaction: [Lewis 2000; Chemillier 2001; Morales-Manzanares et al. 2001; Marsden 2004]
  Narrative: [Casella and Paiva 2001; Farbood et al. 2007; Brown 2012; Nakamura et al. 1994]
Neural networks/Restricted Boltzmann machines/LSTM
  Harmony: [Lewis 1991; Hild et al. 1992; Eck and Schmidhuber 2002; Boulanger-Lewandowski et al. 2012; Herremans and Chuan 2017]
  Melody: [Todd 1989; Duff 1989; Mozer 1991; Lewis 1991; Toiviainen 1995; Eck and Schmidhuber 2002; Franklin 2006; Agres et al. 2009; Boulanger-Lewandowski et al. 2012]
  Interaction: [Franklin 2001]
  Narrative: [Browne and Fox 2009]
Evolutionary/Population-based optimization algorithms
  Melody: [Horner and Goldberg 1991; Towsey et al. 2001; WASCHKA II 2007; Herremans and Sörensen 2012]
  Harmony: [McIntyre 1994; Polito et al. 1997; Phon-Amnuaisuk and Wiggins 1999; Geis and Middendorf 2007; WASCHKA II 2007; Herremans and Sörensen 2012]
  Rhythm: [Tokui and Iba 2000; Pearce and Wiggins 2001; Ariza 2002]
  Interaction: [Biles 1998, 2001]
  Difficulty: [Tuohy and Potter 2005; De Prisco et al. 2012]
  Timbre: [Carpentier et al. 2010]
Local search-based optimization
  Melody: [Herremans and Sörensen 2012]
  Harmony: [Herremans and Sörensen 2012; Herremans et al. 2015a]
  Narrative: [Browne and Fox 2009; Herremans and Chew 2016a]
  Timbre: [Carpentier et al. 2010]
Integer programming
  Melody: [Cunha et al. 2016]
Other optimization methods
  Melody: [Davismoon and Eccles 2010]
  Harmony: [Tsang and Aitken 1999; Farbood and Schoner 2001; Bemman and Meredith 2016]
  Timbre: [Hummel 2005; Collins 2012]
  Difficulty: [Radisavljevic and Driessen 2004]

2.1. Melody

Melody constitutes one of the first aspects of music subject to automatic generation. This section explores the range of automatic systems for generating melody. The generation of simple melodies is studied first, followed by the transformation of existing ones, then the more constrained problem of generating melodies that fit an accompaniment or chord sequence.

2.1.1. Melodic generation.
When considering the problem of generating music, the simplest form of the exercise that comes to mind is the composition of monophonic melodies without accompaniment.

Problem description. In most melody generation systems, the objective is to compose melodies with characteristics similar to a chosen style (such as Western tonal music or free jazz) or corpus (such as music for the Ethiopian lyre, the bagana [Herremans et al. 2015b], a selection of nursery rhymes [Pinkerton 1956], or hymn tunes [Brooks et al. 1957]). These systems depend on a function to evaluate the fitness of output sequences or to prune candidates. Such a fitness function, as discussed in Section 1.3, is often based on similarity to a given corpus, style, or piece. The music is often reduced to extracted features; these features can then be compared to those of the exemplar piece or corpus, a model, or existing music theoretic rules. Example features include absolute or relative pitch [Conklin 2003], intervals [Herremans et al. 2015a], durations [Conklin 2003], and contours [Alpern 1995]. Not all studies provide details of the extracted features, which makes it difficult to compare the objectives and results.

Early work. Building on the ideas of the aforementioned avant-garde composers, some early work on melody generation uses stochastic models. These models capture the statistical occurrence of features in a particular song or corpus to generate music having selected feature distributions similar to the target song or corpus. The first attempts at generating melodies with computers date back to 1956, when Pinkerton built a first-order Markov model, the “Banal Tune-Maker”, based on a corpus of 39 simple nursery rhymes. Using a random walk process, he was able to generate new melodies that “sound like nursery rhymes”. The following year, Brooks et al. [1957] built Markov models from order one up to eight based on a dataset of 37 hymns.
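The random-walk idea behind these early systems can be sketched as follows. This is a minimal illustration under stated assumptions, not a reconstruction of Pinkerton's or Brooks et al.'s programs: melodies are lists of pitch names, the corpus is a hypothetical toy example, and a Markov state is the tuple of the previous `order` notes. The helper `longest_copied_run` (also hypothetical) measures the longest stretch of a generated melody that appears verbatim in the corpus, which makes the repetitiveness of higher orders visible.

```python
import random
from collections import defaultdict

# Hypothetical toy corpus (not the nursery rhymes or hymns used historically).
CORPUS = [
    ["C", "D", "E", "C", "C", "D", "E", "C", "E", "F", "G"],
    ["E", "F", "G", "G", "A", "G", "F", "E", "C"],
]


def train(corpus, order):
    """Count next-note occurrences for each context of `order` preceding notes."""
    counts = defaultdict(lambda: defaultdict(int))
    for melody in corpus:
        for i in range(order, len(melody)):
            counts[tuple(melody[i - order:i])][melody[i]] += 1
    return counts


def random_walk(counts, start, length, rng):
    """Sample successive notes in proportion to their observed counts."""
    order = len(start)
    melody = list(start)
    while len(melody) < length:
        options = counts.get(tuple(melody[-order:]))
        if not options:  # context never continued in the corpus: stop early
            break
        notes = list(options)
        melody.append(rng.choices(notes, weights=[options[n] for n in notes])[0])
    return melody


def longest_copied_run(melody, corpus):
    """Length of the longest contiguous slice of `melody` found in some corpus melody."""
    def occurs(seg):
        return any(m[k:k + len(seg)] == seg
                   for m in corpus for k in range(len(m) - len(seg) + 1))

    best = 0
    for i in range(len(melody)):
        j = i + best + 1  # only longer runs than the current best matter
        while j <= len(melody) and occurs(melody[i:j]):
            best = j - i
            j += 1
    return best


rng = random.Random(0)
low = random_walk(train(CORPUS, 1), ("C",), 16, rng)
high = random_walk(train(CORPUS, 3), ("C", "D", "E"), 16, rng)
print(low, longest_copied_run(low, CORPUS))
print(high, longest_copied_run(high, CORPUS))
```

With a corpus this small, the order-3 walk tends to copy long stretches verbatim. This mirrors the repetitiveness reported for higher-order models and is the kind of copying that a bound such as the MaxOrder constraint of Papadopoulos et al. [2014] is meant to limit.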
Using a random walk process, Brooks et al. noted that melodies generated by higher-order models tend to be more repetitive, while those generated by lower-order models have more randomness.

The trade-off between composing pieces similar to existing work and novel, creative input is a delicate one. Although Stravinsky is famously quoted as having said, “good composers borrow and great composers steal” [Raines 2015], machines still lack the ability to distinguish between artful stealing and outright plagiarism. Concepts of irony and humor can also be difficult to quantify. In order to avoid plagiarism and create new and original compositions, an automatic music generation system needs to find the balance between generating pieces similar to a given style, yet not too similar to individual pieces.

Papadopoulos et al. [2014] examined problems of plagiarism arising from higher-order Markov chains. Their resulting system learns a high-order model, but introduces MaxOrder, the maximum allowable subsequence order in a generated sequence, to curb excessive repeats of material from the source music piece. The sequences are generated using finite-domain constraint satisfaction. The idea of adding control constraints when generating music using Markov models had earlier been explored by Pachet and Roy [2001]. Examples of such control constraints include requirements that a sequence be globally ascending or follow an arbitrary pitch contour. Although there have been some tests of using control constraints with monophonic melodies, the research of Pachet and Roy [2001] focuses on the even more constrained problem of generating jazz solos over accompaniment, a topic that is explored in greater detail in Section 2.1.3.

Structure and patterns.
Composing a monophonic melody may seem like a simple task compared to the scoring of a full symphony. Nevertheless, melodies are more than just movements between notes; they normally possess long-term structure. This structure may result from the presence of motives, patterns, and variations of the patterns. Generating music from a Markov model with a random walk or Gibbs sampling typically does not enforce patterns that lead to long-term structure. In recent years, some research has shown the effectiveness of using techniques such as optimization and deep learning to enforce long-term structure.

Davismoon and Eccles [2010] were some of the first researchers to frame music generation as a combinatorial optimization problem with a Markov model integrated in its objective function. In order to evaluate the music generated, their system builds a (second) Markov model based on the generated music so as to enable the system to minimize a Euclidean distance between the original model and the new model. They used simulated annealing, a metaheuristic inspired by a metallurgic technique used to cool a crystalline solid [Kirkpatrick et al. 1983], to solve this distance-minimization problem. This allowed them to pose some extra constraints to control pitch drift and solve end-point problems.

The IDyOM system of Pearce et al. [2010] uses a combination of long- and short-term Markov models. A dataset of modern Western tonal-style music was used to train a long-term model, combined with a short-term model trained incrementally on the piece being generated. The short-term model captures the changes in melodic expectation as they relate to the growing knowledge of the current fragment's structure. Local repeated structures are more likely to recur; this model will therefore recognize and encourage repeated structures within a piece. The result is an increase in the similarity of the piece with itself, which can be considered a precursor to form.
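The Davismoon and Eccles formulation described above, a Markov model placed inside an optimization objective, can be sketched as follows. This is a minimal illustration and not their implementation: the objective is the Euclidean distance between the first-order transition-probability matrices of a target melody and a candidate, a move changes one randomly chosen note, and the geometric cooling schedule and its parameters are assumptions chosen for brevity. Pitches are integer indices into a small hypothetical alphabet.

```python
import math
import random


def transition_matrix(melody, n_states):
    """Row-normalized first-order transition probabilities; unseen rows stay zero."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for a, b in zip(melody, melody[1:]):
        counts[a][b] += 1.0
    for row in counts:
        total = sum(row)
        if total:
            row[:] = [c / total for c in row]
    return counts


def distance(m1, m2):
    """Euclidean distance between two matrices viewed as flat vectors."""
    return math.sqrt(sum((x - y) ** 2
                         for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2)))


def anneal(target, length, n_states, rng, steps=5000, t0=1.0, cooling=0.999):
    """Search for a melody whose transition matrix is close to the target's."""
    goal = transition_matrix(target, n_states)

    def cost(melody):
        return distance(transition_matrix(melody, n_states), goal)

    current = [rng.randrange(n_states) for _ in range(length)]
    current_cost = cost(current)
    best, best_cost = current[:], current_cost
    temp = t0
    for _ in range(steps):
        candidate = current[:]
        candidate[rng.randrange(length)] = rng.randrange(n_states)  # change one note
        cand_cost = cost(candidate)
        delta = cand_cost - current_cost
        # Metropolis criterion: accept improvements; accept worse moves with prob. exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        temp *= cooling  # geometric cooling schedule (an assumption)
    return best, best_cost


# Hypothetical target melody over 6 pitch indices.
target = [0, 2, 4, 2, 0, 2, 4, 5, 4, 2, 0]
best, best_cost = anneal(target, 16, 6, random.Random(0), steps=2000)
print(best, round(best_cost, 3))
```

Extra requirements, such as limits on pitch drift or fixed start and end notes, could be added as penalty terms inside `cost`; this is one possible design, not necessarily how the original system handled them.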
A recent study by Roig et al. [2014] generates melodies by concatenating rhythmic and melodic patterns sampled from a database. Selection is done based on rules combined with a probabilistic method. This approach allows the system to generate melodies with larger-scale structure such as repeated patterns, which gives the piece moments of self-similarity. Cunha et al. [2016] adopt a similar approach, using integer programming with structural constraints to generate guitar solos from short existing licks. The objective function consists of a combination of rules. Bemman and Meredith [2016] mathematically formalized a problem posed by composer Milton Babbitt. Babbitt is famous for composing twelve-tone serial music and formulated the “all-partition array” problem, which consists of finding a rectangular array of pitch-class integers that can be partitioned into regions whereby each region represents a distinct integer partition of 12. There are only very few solutions to this computationally hard and highly structured composition problem, one of which was found by Tanaka et al. [2016] through constraint programming.

Herremans et al. [2015b] investigate the integration of Markov models in an optimization algorithm, exploring multiple ways in which a Markov model can be used to construct an objective function that forces the music to have the same statistical distribution of features as a corpus or piece. This optimization problem is solved using a variable neighborhood search (VNS). The main advantage of this approach is that it allows for the inclusion of any type of constraint. In their paper, the generated piece is constrained to an AABCA structure. The approach was implemented and evaluated by generating music for the bagana, an Ethiopian lyre.
Since this system uses the semiotic pattern from a template piece, the newly generated pieces can be considered as having structure like the template. The MorpheuS system [Herremans and Chew 2016a] expands on the VNS method, adding constraints on recurring (transposed) patterns and adherence to a given tension profile. Repeated patterns are detected using the compression algorithm COSIATEC [Meredith 2013]. COSIATEC finds the locations where melodic fragments are repeated in a template piece, thus supplying higher-level information about repetitions and structural organization. Tonal tension is quantified using measures [Herremans and Chew 2016b] based on the spiral array [Chew 2014].

In recent years, more complex deep learning models such as recurrent neural networks have gained in popularity. The trend is due in part to the fact that such models can learn complex relationships between notes given a large-enough corpus. Some of these models also allow for the generation of music with repeating patterns and notions of structure. The next paragraphs examine research on neural network-based melody generation.

Deep learning and structure. The first computational model based on artificial neural networks (ANNs) was created by McCulloch and Pitts [1943]. Starting in the eighties, more sophisticated models emerged that aim to capture complex properties of music more accurately. The first neural network for music generation was developed by Todd [1989], who designed a three-layered recurrent artificial neural network whose output (one single pitch at a time) forms a melody line. Building on this approach, Duff [1989] created another ANN using relative pitches instead of absolute pitches to compose music in J.S. Bach’s style. Recurrent neural networks are a family of neural networks built for representing sequences [Rumelhart et al. 1988]. They have cyclic connections between nodes that create a memory structure.
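The difference between an absolute-pitch output such as Todd’s and the relative-pitch representation used by Duff [1989] can be shown in a few lines. The Python sketch below assumes melodies are given as MIDI note numbers and encodes a melody as a starting pitch plus successive intervals in semitones; the function names are hypothetical. A transposed melody yields exactly the same interval sequence, which is one motivation for relative encodings.

```python
def to_intervals(pitches):
    """Encode a melody (MIDI note numbers) as successive intervals in semitones."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def from_intervals(start, intervals):
    """Decode an interval sequence back into absolute pitches from a start pitch."""
    pitches = [start]
    for step in intervals:
        pitches.append(pitches[-1] + step)
    return pitches


theme = [60, 62, 64, 65, 67]                    # C4 D4 E4 F4 G4
transposed = [p + 7 for p in theme]             # the same theme a fifth higher
print(to_intervals(theme))                      # [2, 2, 1, 2]
print(to_intervals(transposed))                 # [2, 2, 1, 2]
print(from_intervals(55, to_intervals(theme)))  # [55, 57, 59, 60, 62]
```

A network trained on interval sequences sees both versions of the theme as the same training example, whereas an absolute-pitch network must learn them separately.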
Mozer [1991] implemented a recurrent connectionist network (called CONCERT) that was used in an experiment to generate music that sounds like J.S. Bach’s minuets and marches. Novel in this approach was the representation of pitches in a psychologically grounded multidimensional space. This representation enabled the system to capture a notion of similarity between pitches. Although CONCERT is able to learn some structure, such as that of diatonic scales, its output lacks long-term coherence such as that produced by repetition and the statement of the theme at the beginning and its return near the end. While the internal memory of recurrent neural networks [Rumelhart et al. 1985] can, in principle, deal with the entire sequence history, it remains a challenge to efficiently train long-term dependencies [Bengio et al. 1994].

In the same year, Lewis [1991] designed another ANN framework with a slightly different approach. Instead of training the ANN on a corpus, he mapped a collection of patterns, drawn from music ranging from random to very good, to a musicality score. To create new pieces, the mapping was inverted and the musicality score of random patterns was maximized with a gradient-descent algorithm to reshape the patterns. Due to the high computational cost, the system was only tested on simple and short compositions. Agres et al. [2009] built a recurrent neural network that learned the tonal structure of melodies, and examined the impact of the number of epochs of training on the quality of newly generated melodies. They showed that better-liked melodies were the result of models that had sparser internal representations. Conceptually, this sort of sparse representation may reflect the way in which the human cortex encodes musical structure.
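The inversion idea attributed to Lewis [1991] amounts to holding a scoring function fixed and adjusting its input rather than its parameters. In the Python sketch below, a hypothetical differentiable “musicality” function stands in for the trained network: a fixed quadratic that rewards closeness to an invented reference contour and penalizes large leaps. Gradient ascent on the input (equivalently, gradient descent on the negated score) then reshapes a random pattern. The scoring function, its weights, the step size, and the number of steps are assumptions for illustration only.

```python
import random

# Hypothetical stand-in for a trained musicality network.
REFERENCE = [0.0, 2.0, 4.0, 5.0, 7.0, 5.0, 4.0, 2.0]  # a "good" contour, in semitones
LEAP_WEIGHT = 0.5                                     # penalty on large melodic leaps


def musicality(x):
    """Higher is 'more musical': stay near the reference, avoid big leaps."""
    closeness = -sum((xi - ri) ** 2 for xi, ri in zip(x, REFERENCE))
    leaps = -LEAP_WEIGHT * sum((b - a) ** 2 for a, b in zip(x, x[1:]))
    return closeness + leaps


def gradient(x):
    """Analytic gradient of musicality() with respect to the input pattern x."""
    n = len(x)
    grad = [-2.0 * (x[k] - REFERENCE[k]) for k in range(n)]
    for k in range(n):
        if k >= 1:
            grad[k] -= 2.0 * LEAP_WEIGHT * (x[k] - x[k - 1])
        if k <= n - 2:
            grad[k] += 2.0 * LEAP_WEIGHT * (x[k + 1] - x[k])
    return grad


def invert(start, steps=200, lr=0.05):
    """Gradient ascent on the input; the scoring function itself never changes."""
    x = list(start)
    for _ in range(steps):
        x = [xi + lr * gi for xi, gi in zip(x, gradient(x))]
    return x


rng = random.Random(1)
start = [rng.uniform(-12, 12) for _ in REFERENCE]  # a random initial pattern
shaped = invert(start)
print(round(musicality(start), 1), "->", round(musicality(shaped), 1))
print([round(v) for v in shaped])
```

With a real trained network the gradient would come from backpropagation through the network to its inputs, which helps explain the computational cost reported for the original system.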
Since these initial studies, deep learning networks have increased in popularity. Franklin [2006] developed a Long Short-Term Memory (LSTM) recurrent neural network that generates solos over a reharmonization of chords. She suggests that hierarchical LSTM networks might be able to learn sub-phrase structures in future work. LSTM was developed by Hochreiter and Schmidhuber [1997]. It is a recurrent neural network architecture that introduces a memory structure in its nodes. More recently, Boulanger-Lewandowski et al. [2012] used a piano roll representation to create Recurrent Temporal Restricted Boltzmann Machine (RT-RBM)-based models for polyphonic pieces. An RBM, originally called Harmonium by its developer [Smolensky 1986], is a type of neural network that can learn a probability distribution over its inputs. While the model of Boulanger-Lewandowski et al. [2012] is intended mainly to improve the accuracy of transcription, it can equally be used for generating music. The RBM-based model learns basic harmony and melody, and local temporal coherence. Long-term structure and musical meter are not captured by this model. The capability of RBMs to recognize long-term structures such as motives and phrases is acknowledged in a recent paper by Lattner et al. [2015], in which an RBM is used to segment musical pieces. The model reaches an accuracy rate that competes with current state-of-the-art segmentation models. Recent work by Herremans and Chuan [2017] takes a different approach inspired by linguistics. They use neural networks to evaluate the ability of semantic vector space models (word2vec) to capture musical context and semantic similarity. The results are promising and show that musical knowledge such as tonality can be modeled by solely looking at the context of a musical segment.

2.1.2. Transformation.
Horner and Goldberg [1991], pioneers in applying genetic algorithms (GAs) to music composition, tackle the problem of thematic bridging, the transformation of an initial musical pattern to a final one over a specified duration. A genetic algorithm is a type of metaheuristic that became popular in the 1970s through the work of Holland [1992]. It typically maintains a set (called a population) of solutions and combines solutions from this set to form new ones. In the work of Horner and Goldberg [1991], an initial melodic pattern is transformed, based on a set of operators, to resemble the final pattern using a GA. The final result consists of a concatenation of all patterns encountered during this process. Ralley [1995] uses the same technique (GA) for melodic development, a process in which key characteristics of a given melody are transformed to generate new material. The results are mixed, as no interesting transformed output was found. According to Ralley [1995], the problem lies in the high subjectivity of the desired outcome.

GenDash, a compositional tool developed by composer Rodney Waschka II [Waschka II 2007], is not a fully automated composition system but works in tandem with a human composer. The genetic algorithm does not have any type of fitness function (human or other); it simply evolves measures of music at random. In this process, each measure is treated as a different population for evolution. Using GenDash, Waschka composed the opera Sappho’s Breath by using a population that consists of twenty-six measures from typical Greek and Medieval songs [Dostál 2013].

Recently, Sony Computer Science Labs’ Flow Composer has been used to reorchestrate Ode to Joy, the European Anthem, in seven different styles, including Bach chorales and Penny Lane by The Beatles [Pachet 2016].
The reorchestrations are based on max-entropy models, which are often used in fields such as physics and biology to model probability distributions with observed pairwise correlations [Lezon et al. 2006].

2.1.3. Chord constraints. A melody is most often paired either with counterpoint or with chords that harmonize the melody. While there exists much work on generating chords given a melody (see Section 2.2.3), some studies focus on generating a melody that fits a chord sequence. Moorer [1972], for instance, first generates a chord sequence, then a melodic line against the sequence. The melody notes are restricted to only those in the corresponding chord at any given point in time. At each point, a decision is made, based on a second-order Markov model, to invert melodic fragments based on the chord, or to copy the previous one. The resulting short melodies have a strangely alien sound, which the author attributes to the fact that the “plan” or approach is not one that humans use, and the system does not discriminate against unfamiliar sequences.

The generation of jazz solos over an accompaniment is a popular problem [Pachet and Roy 2001; Toiviainen 1995; Keller and Morrison 2007]. The improvisation system (Impro-Visor) designed by Keller and Morrison [2007] uses probabilistic grammars to generate jazz solos. The model successfully learns the style of a composer, as reflected in an experiment described by Gillick et al. [2010], where human listeners correctly matched 95% of solos composed by Impro-Visor in the style of the famous performer Clifford Brown to the original solo. The accuracy was 90% for Miles Davis, and slightly less, 85%, for Freddie Hubbard. They state that “The combination of contours and note categories seems to balance similarity and novelty sufficiently well to be characterized as jazz”. The system does not capture long-term structure, which the authors suggest might be solved by using the structure of an existing solo as a template.
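The general mechanism behind grammar-based solo generation such as that of Keller and Morrison [2007] is a probabilistic context-free grammar: nonterminal symbols are expanded by randomly chosen weighted rules until only terminal symbols remain. The Python sketch below shows only this mechanism. The rule set, the weights, and the terminal symbols (C for a chord tone, A for an approach tone, R for a rest, with a duration suffix) are hypothetical and are not Impro-Visor’s actual grammar; a separate step, not shown, would map each note category onto a concrete pitch over the chord sounding at that moment.

```python
import random

# Hypothetical grammar: nonterminal -> list of (weight, expansion) rules.
# Terminals: C = chord tone, A = approach tone, R = rest; 4 = quarter, 8 = eighth.
GRAMMAR = {
    "PHRASE": [(0.6, ["MOTIF", "MOTIF"]), (0.4, ["MOTIF", "R4"])],
    "MOTIF": [(0.5, ["C8", "A8", "C4"]),
              (0.3, ["C4", "C4"]),
              (0.2, ["A8", "C8", "MOTIF"])],  # recursive rule
}


def expand(symbol, rng, depth=0, max_depth=10):
    """Expand a symbol by sampling weighted rules until only terminals remain."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal symbol
    rules = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, keep only rules that do not re-invoke this symbol.
        rules = [rule for rule in rules if symbol not in rule[1]]
    weight_list = [weight for weight, _ in rules]
    _, expansion = rng.choices(rules, weights=weight_list, k=1)[0]
    result = []
    for child in expansion:
        result.extend(expand(child, rng, depth + 1, max_depth))
    return result


print(" ".join(expand("PHRASE", random.Random(42))))
```

The recursive MOTIF rule lets phrases grow to varying lengths, while the depth guard keeps every expansion finite.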
Eck and Schmidhuber [2002] tackle a similar problem, the generation of a blues melody following the generation of a chord sequence. They use a Long Short-Term Memory RNN, which the authors claim handles long-term structure well. However, the paper does not provide examples of output for the evaluation of the long-term structure. In the next section, we review music generation systems that focus on harmony.

2.2. Harmony
Besides melody, harmony is another popular aspect for automatic music generation. This section describes automatic systems for harmony generation, focusing on the manner in which harmonic elements such as chords and cadences are computationally modeled and produced in accordance with a specific style.

In the generation of harmonic sequences, the quality of the output depends primarily on similarity to a target style. For example, in chorale harmonization, this similarity is defined explicitly by adherence to voice-leading rules. In popular music, where chord progressions function primarily as accompaniment to a melody, the desired harmonic progression is achieved mostly by producing patterns similar to existing examples having the same context. The context is defined by the vertical relation between melody and harmony (i.e., notes sounding at the same time) as well as horizontal patterns of chord transitions (i.e., the relationship of notes over time).

In addition to direct comparisons of harmonic similarity, the output of a chord generation system can also be evaluated under other criteria such as similarity to a genre or to the music of a particular artist. The system must generate sequences that are recognizably in the target genre or belong to a particular corpus, yet are not “substantially similar” to it [Liebesman 2007], so as to avoid accusations of plagiarism.
It is only a short step from similarity and plagiarism to copyright infringement. On copyright protection of ubiquitous patterns such as harmonic sequences, Gherman [2008] argues that: “When determining whether two musical works are substantially similar . . . the simple, basic harmony or variation should not be protectable as it is functional . . . . The harmony that goes beyond the triviality of primary tonal level and blocked chords is and should be protectable under copyright law.”

The next sections discuss the task of counterpoint generation, followed by harmonization of chorales, general harmonization, and the generation of chord sequences.

2.2.1. Counterpoint. Counterpoint is a specific type of polyphony. It is defined by a strict set of rules that handle the intricacies that occur when writing music that has multiple independent (yet harmonizing) voices [Siddharthan 1999]. In Gradus Ad Parnassum, a pedagogical volume written in 1725, Johann Fux documented a comprehensive set of rules for composing counterpoint music [Fux and Mann 1971], which forms the basis of counterpoint texts up to the present day. Counterpoint, as defined by Fux, consists of different “species”, or levels of increasing complexity, which include more rhythmic possibilities [Norden 1969].

Problem description. The process of generating counterpoint typically begins with a given melody called the cantus firmus (“fixed song”). The task is then to compose one or more melody lines against it. As the rules of counterpoint are strictly defined, it is relatively easy to use rules to generate counterpoint, or to evaluate whether the generated sequence sounds similar to the style of the original counterpoint music. The Palestrina-Pal system developed by Huang and Chew [2005] offers an interactive interface to visualize violations of these harmonic, rhythmic, and melodic rules. Automatic counterpoint composition systems typically handle two to four voices.
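Because counterpoint rules are explicit, checking them is straightforward. The Python sketch below, written in the spirit of a rule-violation display but not taken from Palestrina-Pal or any other cited system, checks two voices given as MIDI note numbers (one note per bar, as in first species) against three well-known first species rules: every vertical interval must be consonant, and parallel perfect fifths and parallel octaves (or unisons) are forbidden. Compound intervals are reduced modulo the octave; real rule sets, such as the eighteen melodic and fifteen harmonic rules mentioned later in this section, are far larger.

```python
CONSONANCES = {0, 3, 4, 7, 8, 9}  # interval classes: unison/octave, 3rds, perfect 5th, 6ths
PERFECT = {0: "octave", 7: "fifth"}  # interval class 0 covers both unisons and octaves


def violations(cantus, counterpoint):
    """Flag a few first species rules; voices are equal-length lists of MIDI notes."""
    found = []
    for i, (cf, cp) in enumerate(zip(cantus, counterpoint)):
        if abs(cp - cf) % 12 not in CONSONANCES:  # e.g. 2nds, 4ths, tritones, 7ths
            found.append((i, "dissonant vertical interval"))
    for i in range(1, len(cantus)):
        prev = abs(counterpoint[i - 1] - cantus[i - 1]) % 12
        curr = abs(counterpoint[i] - cantus[i]) % 12
        cf_step = cantus[i] - cantus[i - 1]
        cp_step = counterpoint[i] - counterpoint[i - 1]
        same_direction = cf_step * cp_step > 0  # both voices move the same way
        if same_direction and curr == prev and curr in PERFECT:
            found.append((i, "parallel " + PERFECT[curr]))
    return sorted(found)


cantus = [62, 65, 64, 62]        # D F E D
counterpoint = [69, 72, 71, 67]  # A C B G
print(violations(cantus, counterpoint))
# [(1, 'parallel fifth'), (2, 'parallel fifth'), (3, 'dissonant vertical interval')]
```

A generator can use such a checker directly as a filter, or turn the number of violations into a penalty inside an objective function.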
The systems for generating four-part counterpoint are grouped together with four-part chorale harmonization in the next section because they follow similar rules. The systems and approaches described below handle fewer than four voices.

Approaches. Three main approaches exist for emulating counterpoint style: the first uses known rules to generate counterpoint; the second uses the rules in an evaluation function of an optimization algorithm; and the last uses machine learning to capture the style. In the first category, Hiller Jr and Isaacson [1957] use rules for counterpoint to generate the first and second movements of the Illiac Suite. David Cope composes first species counterpoint given a cantus firmus in his system “Gradus.” Gradus analyzes a set of first species counterpoint examples and learns the best settings for six general counterpoint goals or rules. These goals are used to sequentially generate the piece, using a rule-based approach [Cope 2004]. Another system, developed by Aguilera et al. [2010], uses logic based on probability rules to generate counterpoint parts in C major over a fixed cantus firmus. In the generation process, the system evaluates only the harmonic characteristics of the counterpoint, not the melodic aspects. The original theory of Johann Fux contains rules that focus on both melodic and harmonic interaction [Fux and Mann 1971].

The second approach, using counterpoint rules as tools for evaluation, is employed in the system called GPmuse, a GA developed by Polito et al. [1997]. GPmuse composes fifth species (mixed rhythm) counterpoint starting from a given cantus firmus. It extracts rules based on the homework problems formulated by Fux and uses the rules to define the fitness functions for the GA. The music generated by GPmuse sounds similar to the original style of counterpoint music.
A problem with the system is that some “obvious” rules were not defined by Fux, such as the need for the performer (singer) to breathe. Since these rules were not explicitly programmed in GPmuse, one example output contained very long phrases consisting solely of eighth notes without any rests.

Strasheela is a generic constraint programming system for composing music. Anders [2007] uses the Strasheela system to compose first species counterpoint based on six rules from music theory. Other constraint programming languages, such as PWConstraints developed at IRCAM, can be used to generate counterpoint, provided the user inputs the correct rules [Assayag et al. 1999b].

Herremans and Sörensen [2012] use a more extensive set of eighteen melodic and fifteen harmonic rules based on Johann Fux’s theory to generate a cantus firmus and first species counterpoint. The authors implement the rules in an objective function and optimize (increase) the adherence to these rules using a variable neighborhood search algorithm (VNS). VNS is a combinatorial optimization algorithm based on local search proposed by Mladenović and Hansen [1997]. Herremans and Sörensen [2012]’s system was also implemented as a mobile app [Herremans and Sorensen 2013], and later extended by adding additional rules based on Fux to generate fifth species counterpoint [Herremans et al. 2015a].

A final approach to the counterpoint generation problem can be seen in the application of a machine-learning method to Palestrina-style counterpoint. Farbood and Schoner [2001] implemented a Hidden Markov Model to capture the different rules of such counterpoint; they found the resulting music to be “musical and comparable to those created by a knowledgeable musician.” Hidden Markov Models, first described by Baum and Petrie [1966], are used to model systems that are Markov processes with unobserved (hidden) states, and have since become known for their application in temporal pattern recognition [Yamato et al. 1992].

2.2.2. Harmonizing chorales. The harmonizing of chorales is one of the most popular music generation tasks pertaining to harmony. Chorale harmonization produces highly structured music that has been widely studied in music theory, and a rich body of theoretical knowledge offers clear directions and guidelines for composing in this style.

Problem definition. The problem of chorale harmonization has been formulated computationally in a variety of different ways. The most common form is to generate three voices designed to harmonize a given melody, usually the soprano voice [Allan and Williams 2005; Ebcioğlu 1988; Geis and Middendorf 2007; Hild et al. 1992; Phon-Amnuaisuk and Wiggins 1999; Tsang and Aitken 1999; Yi and Goldsmith 2007]. In contrast, the Bach-in-a-Box system proposed by McIntyre [1994] aims to harmonize a user-created melody, which can form any one of the four voices. Given a monophonic sequence, the system must generate, using a GA, three other notes to form a chord with each melody note while ensuring that the given melodic notes are not mutated in the process. The quality of a generated four-part sequence is then measured via fitness functions related to the construction of the chord, the pitch range and motion, the beginnings and endings of chords, smoothness of the chord progressions, and chord resolution.

Some systems simplify the process by assuming that all melody notes are chord tones and chords exist at every melody note, i.e.
that the polyphony is homophonic [Phon-Amnuaisuk and Wiggins 1999; McIntyre 1994; Yi and Goldsmith 2007]. In many systems, non-chord tones such as passing notes are added as an afterthought following the establishment of the chord progression; others incorporate explicit considerations of non-chord tones in the generation process [Allan and Williams 2005; Hild et al. 1992].

Solution approach. As described in [Hild et al. 1992], the results of chorale harmonization can be expressed in multiple ways, including as a harmonic skeleton, a chord skeleton, or four full parts complete with passing tones. A harmonic skeleton describes the chord progression as a sequence of symbols, such as roman numerals, that represent the functional role of each chord in the progression; the rhythm is implied or considered as given. A chord skeleton shows the constituent notes of each chord without passing tones. Most chorale harmonization systems aim to generate chord skeletons; few cover all three kinds of abstraction. Some systems generate only a harmonic skeleton. For example, De Prisco et al. [2010]’s system produces functional harmonizations represented as roman numerals with indications of whether the chord is in root position or an inversion. The system proposed by Anders and Miranda [2009] generates the harmonic backbone without requiring melodic input. It is worth noting that the terminology for these abstractions is not used consistently in the literature. For example, Ebcioğlu [1988] describes chord skeletons as sequences of rhythmless chords with fermatas, like the harmonic skeleton in Hild et al. [1992]. In [Ebcioğlu 1988], the actual notes, including passing tones and suspensions, are generated by a fill-in view object that takes the chord skeleton as input.

Search space and context. The complexity of the harmonization problem is defined by the size of the search space for viable chords.
This size is in turn determined by the problem description, which includes the number of chord types and of chords to be generated. The size of the search space is reflected in the number of states in a hidden Markov model [Allan and Williams 2005], the number of nodes in a neural network [Hild et al. 1992], and the length of the chromosome in a genetic algorithm [McIntyre 1994]. In chorale harmonization, for a given key, the basic set of chords consists of I, ii, iii, IV, V, V7, vi, and vii° and their positions (root or inversions). The size of the basic set is significantly increased if details such as secondary dominants, pivot chords, and key modulations are considered. The size of the search space can also be determined by examining composed examples.

Generating chorale harmonization can be approached as an iterative process. Given a melody, the system first generates possible configurations for the chord progression, then modifies the patterns based on certain criteria. For example, the expert system in [Ebcioğlu 1988] takes the generate-and-test approach using three types of rules implemented as first-order logic: production rules, constraints, and heuristics. Systems that use genetic algorithms also follow this iterative nature: the “chromosomes” or “population” are modified iteratively to improve the quality based on fitness functions [McIntyre 1994]. However, this iterative approach becomes computationally expensive as the chorales become longer.

To overcome this problem of search space explosion, many researchers focus on local patterns instead of entire compositions. This is not only a practical solution for computational reasons, but also a reasonable approach because many of the voice-leading rules in chorales are concerned only with local movements in and between individual voices.
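The search-space argument can be made concrete. If each melody note may be harmonized by any of k candidate chords, a phrase of n notes admits k^n chord sequences; with only the eight basic chords listed above, in root position, an eight-note phrase already has 8^8 = 16,777,216 candidates. Restricting the model to local, first-order dependencies lets dynamic programming (the Viterbi algorithm, as used with HMMs for harmonization) find the most probable sequence in time proportional to n·k². The Python sketch below assumes a three-chord vocabulary with invented start, transition, and melody-fit probabilities; the brute-force function is included only to confirm that both methods agree on this toy instance.

```python
import itertools
import math

CHORDS = ["I", "IV", "V"]  # hypothetical toy vocabulary in C major
CHORD_TONES = {"I": {"C", "E", "G"}, "IV": {"F", "A", "C"}, "V": {"G", "B", "D"}}
START = {"I": 0.8, "IV": 0.1, "V": 0.1}  # hypothetical initial probabilities
TRANS = {                                 # hypothetical P(next chord | previous chord)
    "I": {"I": 0.3, "IV": 0.4, "V": 0.3},
    "IV": {"I": 0.3, "IV": 0.2, "V": 0.5},
    "V": {"I": 0.7, "IV": 0.1, "V": 0.2},
}


def emission(chord, note):
    """Hypothetical melody-fit probability: chord tones are much more likely."""
    return 0.3 if note in CHORD_TONES[chord] else 0.02


def viterbi(melody):
    """Most probable chord sequence under a first-order HMM, in log space."""
    score = {c: math.log(START[c] * emission(c, melody[0])) for c in CHORDS}
    backpointers = []
    for note in melody[1:]:
        new_score, pointers = {}, {}
        for c in CHORDS:
            prev = max(CHORDS, key=lambda p: score[p] + math.log(TRANS[p][c]))
            new_score[c] = score[prev] + math.log(TRANS[prev][c] * emission(c, note))
            pointers[c] = prev
        score = new_score
        backpointers.append(pointers)
    last = max(CHORDS, key=score.get)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1], score[last]


def brute_force(melody):
    """Exhaustive check over all len(CHORDS) ** len(melody) chord sequences."""
    def log_prob(seq):
        total = math.log(START[seq[0]] * emission(seq[0], melody[0]))
        for i in range(1, len(seq)):
            total += math.log(TRANS[seq[i - 1]][seq[i]] * emission(seq[i], melody[i]))
        return total
    best = max(itertools.product(CHORDS, repeat=len(melody)), key=log_prob)
    return list(best), log_prob(best)


melody = ["E", "F", "D", "C"]
print(viterbi(melody)[0])  # ['I', 'IV', 'V', 'I']
```

The brute-force cost grows as 3^n here, while the Viterbi cost grows linearly in n, which is why the local-context formulation scales to full chorales.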
The modeling of local or short-term patterns is even more prominent in approaches that use neural networks and Markov models. In general, to determine a chord at the current time point, such systems define the local context by considering the recent chord sequence and the melody notes at the previous, current, and immediately following time points [Allan and Williams 2005]. For example, De Prisco et al. [2010] discuss three models (one considering only the current chord, one incorporating the current and the immediately preceding chord, and one considering the current and the two closest preceding chords) and their combinations to determine the current chord. Eigenfeldt and Pasquier [2009] also proposed a third-order Markov model for chord generation.

Cadences. While the cadence is a key harmonic feature used in the delineation of phrase boundaries, the modeling of cadences is not always explicitly addressed in chorale harmonization systems. Even for systems that account for phrase structure, cadences are handled to varying degrees of detail. The generation of cadences is typically achieved through constraints. For example, ending each phrase with a cadence can be set as a hard constraint for any chord progression [Anders and Miranda 2009]. Cadences can also be induced through heuristics [Ebcioğlu 1988] or preferences, say, in cost functions [Phon-Amnuaisuk and Wiggins 1999; McIntyre 1994] that bias the system towards producing more desirable cadential patterns. For example, in [Phon-Amnuaisuk and Wiggins 1999], wrong cadences are penalized up to 100 points, 10 times more than any other rule governing voice leading, while McIntyre [1994] awarded points for proper tritone resolution, including transitions from V7 and vii° to I or vi. In the work of Tsang and Aitken [1999], cadence formation is realized through four rules (out of a total of nine).
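The relative weighting reported for Phon-Amnuaisuk and Wiggins [1999], where a wrong cadence can cost ten times more than a breach of any other rule, can be illustrated with a weighted penalty function. In the Python sketch below, the progression is a list of roman numerals, phrase-final chords are given by index, and the final pair of each phrase is checked against a small set of accepted cadences. The weights, the accepted cadences, and the single stand-in “other” rule (penalizing an immediately repeated chord) are hypothetical simplifications, not the rules of any cited system.

```python
# Hypothetical weights echoing the 10:1 ratio described in the text.
CADENCE_PENALTY = 100
OTHER_PENALTY = 10
ACCEPTED_CADENCES = {("V", "I"), ("V7", "I"), ("IV", "I")}  # authentic and plagal


def cost(progression, phrase_ends):
    """Lower is better; phrase_ends holds the indices of phrase-final chords."""
    total = 0
    for end in phrase_ends:
        if end < 1 or (progression[end - 1], progression[end]) not in ACCEPTED_CADENCES:
            total += CADENCE_PENALTY
    # A single stand-in 'other' rule: penalize repeating the same chord.
    for a, b in zip(progression, progression[1:]):
        if a == b:
            total += OTHER_PENALTY
    return total


good = ["I", "IV", "V", "I"]
weak = ["I", "V", "IV", "IV"]
print(cost(good, [3]), cost(weak, [3]))  # 0 110
```

With this weighting, an optimizer that minimizes the total cost will accept several minor rule breaches before it accepts a wrong cadence.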
Also using a rule-based approach, Geis and Middendorf [2007] included a resolution rule as a part of the harmonic score calculation. The generated cadence is then constrained through the rules to be similar to the chosen style. The modeling of cadences as constraints or preference rules can be readily incorporated in systems that use combinatorial approaches such as genetic algorithms and constraint programming.

In contrast, cadential closure is discussed almost as a by-product in systems using statistical approaches such as neural networks and Markov models. For example, in [Hild et al. 1992], harmonic closure relies on explicit coding of the beginnings and endings of phrases. Allan and Williams [2005] report that their HMM with the Viterbi algorithm generates plausible cadences similar to those in the chosen corpus; little information is provided regarding how the system ensures correct cadences, especially the ones midstream, when seeking the most likely chord progression. Recently, Yi and Goldsmith [2007] proposed an interesting Markov model-based approach: instead of generating the most probable sequence, the authors modeled the harmonization problem as a Markov decision process so that sequences with the highest rewards, including those considering cadences, are selected. The reward is produced by a utility function, which can be either formulated based on music theory or learned from a dataset. In [Yi and Goldsmith 2007], only two rules are encoded in the utility function: chords for which melodic notes are chord tones are preferred, and authentic cadences are preferred while plagal cadences are acceptable.

2.2.3. General harmonization. The general harmonization problem can be considered as one of determining multiple synchronized note events to fit certain user-defined criteria. Compared to chorale harmonization, the problem of generating harmonic sequences in other genres is less well defined.
Unlike chorales, for which fitness functions can be established based on well-studied music-theoretic rules, the style, cadences, harmonic quality, and even chord labels can often be unclear in the general harmonization problem.

A number of studies focus on generating harmonizations to user-created melodies in a popular style. Most studies adopt data-driven approaches to determine possible chords for a given melodic segment and to ensure that chord-to-chord transitions are commonly observed in the examples. The systems typically produce a harmonic skeleton and use predefined patterns for creating rhythmic textures and instrumental arrangements. Lee and Jang [2004] used a first-order Markov model with dynamic programming to determine the harmonic skeleton for a user-hummed tune; the state transition probabilities are learned from 150 songs. Simon et al. [2008] took a similar approach, training their system on 298 songs from various genres such as jazz, rock, pop, blues, and others. Both systems are evaluated via subjective feedback from listening experiments. A drawback of this approach is that the chord sequences generated tend to be generic and indistinct in style.

To preserve a recognizable style, rather than training on multi-style datasets, Chuan and Chew [2011] focused on music composed by only one artist/band, or even a single piece in the extreme case; the problem of data sparsity is overcome through a chord tone determination module that generates a set of possible chords, and the use of neo-Riemannian operations to fill in missing transitions between chords. The generated chord sequence was evaluated subjectively, and quantitatively using cross entropy. A system for reharmonizing uplifting trance music based on a newly generated chord sequence was developed by Bigo and Conklin [2015].
The system was extensively tested in an empirical study, in which Agres et al. [2016] found that repetitiveness in harmonic structure and tension, not solely rhythmic structure, is a contributor to listener enjoyment in this form of electronic dance music.

2.2.4. Jazz Chord Sequences. The problem of generating chord sequences unconstrained by melodic considerations is more frequently seen in jazz. Research on the generation of jazz chord progressions has focused on chord substitution and variation. Steedman [1984] studied 12-bar blues and defined a small set of rules using a generative grammar that produces recognizable 12-bar blues chord progressions. As noted in the article, there was no explicit attempt to generate good chord progressions or to avoid bad ones. Instead, Steedman examined the harmonically meaningful chord progressions and substitutions in order to generate sequences to accompany melodies. To explain why identifying substitutions to the original sequence is crucial in jazz improvisation, Chemillier [2001] provides a scenario in which a chord sequence of n bars is repeated as a loop with variations, as a foundational jazz accompaniment. Although the substitution module in Chemillier’s system randomly applies Steedman’s rules to the chord sequence to generate variations, the author suggests ways that the user could interact with the system to steer the selection.

In the next section, we discuss systems for rhythm generation, without regard for pitch.

2.3. Rhythm
This section discusses systems that automatically generate rhythm. While some of the above-mentioned systems already include aspects of rhythm, such as duration, the focus of this section lies on research into music generation for percussion instruments. In music generation systems, rhythm is often considered as given or embedded as an attribute of note events. Overall, there exist far fewer systems that solely generate rhythm than systems focusing on melody and harmony, but similar modeling approaches have been applied to rhythm generation.

Tokui and Iba [2000] proposed the CONGA system, which combines genetic algorithms and genetic programming to produce rhythmic patterns that evolve, with user feedback as the fitness function. Short fragments of rhythmic patterns form the chromosome elements in the genetic algorithm; the manner in which these patterns are concatenated into a sequence is determined by genetic programming. For evaluation, participants use the system to produce rhythmic progressions that sound like rock ’n’ roll. A genetic algorithm was also implemented by Ariza [2002] to generate rhythms that consist of a sequence of genetic variations. His fitness function consists of calculating the distance between a rhythm and a user-provided “fit-rhythm” through five distance measures. The results were not evaluated in the paper.

Tidemann and Demiris [2008] used hidden Markov models to learn and generate core patterns and variations similar to examples played by different drummers. Core patterns and variations are defined by the supermaximal repeats (i.e., repeated patterns that are not part of another repeated pattern) in the melody that correspond to structural parts such as the verse, chorus, and bridge. To produce more “human-like” drum patterns, note onset times and velocities are modeled as Gaussian distributions with noise. The generated patterns were evaluated via a classification task to determine whether the generated patterns belong to the same class as the training corpus.

Hawryshkewich et al. [2011] also applied statistical approaches to generate rhythmic patterns. The Beatback system uses variable-length Markov models to store users’ input via a MIDI drum interface and to generate rhythmic patterns consistent with the users’ styles and pattern complexity.
In Beatback, each drum event is described by its duration, velocity, and instrument (such as hi-hat or snare); drum patterns are generated by reproducing highly likely sequences as observed in the user’s playing.
Marchini and Purwins [2010] also used variable-length Markov models to generate percussion sequences, but their system learns and reproduces sequences from audio examples. Percussion sounds are first segmented into note events using onset detection; each event is then mapped to a symbol via hierarchical clustering based on acoustic similarity. Preference is given to symbolic labels that maximize temporal regularity. To generate future events that respect the metrical structure, a temporal grid is created via beat and tempo detection.
Similarity in a social context is explored by Weinberg and Driscoll [2006], who created Haile, an interactive robot that plays the drums. Haile analyses perceptual aspects of human players in real time and, based on this analysis, chooses one of six interaction modes in which to generate rhythms to play along with the human player. The six interaction modes are: imitation, stochastic transformation, perceptual transformation, beat detection, simple accompaniment, and perceptual accompaniment.
The next section moves away from typical music generation systems and deals with the more complex tasks of orchestrating music and of accounting for timbre in music generation.
2.4. Timbre
This section considers the aspect of timbre in music generation. Timbre, also referred to as the tone color of a sound, is the property that distinguishes different voices and musical instruments. In music generation, it is an important consideration when, for instance, composing for an orchestra, because the timbre of each individual voice affects the perception of the composite sound.
The problem of orchestration with a target timbre is often modeled as a combinatorial problem, in which the system searches a database of instrument sound samples for a combination of sounds that produces a similar perceived timbre.
In early systems, timbral closeness was measured only by similarity in the frequency spectrum. For example, Psenicka [2003] proposed the SPORCH system, which provides orchestration for any acoustic instrument ensemble to match an arbitrary sound file as closely as possible. The database contains descriptive instrument features such as pitch range, the loudest and softest dynamic levels, and notation information (e.g., clef and transposition) about the sound samples. The orchestration is then determined through an iterative search for the instrument sample mix whose frequency spectral peaks are most similar to those of the target file.
Similarly, in Hummel [2005], the system iteratively searches for virtual music instruments that, when synthesized together, create speech-like timbres. The algorithm minimizes the difference in spectral envelope between the current sound and the target timbre by iteratively adding the sounds that minimize the residual error.
McCormack [1996] developed an L-grammar that generates polyphonic music. The model learns features based on pitch, duration, and timbre; the details of the timbral characteristics were not disclosed in the paper.
Carpentier et al. [2010] point out that acoustic instrument orchestration is significantly different from, and more complex than, sound synthesis. They model orchestration as a constrained multi-objective search problem in which the system aims to find combinations of sounds similar to a target timbre. Sound samples stored in a database are represented by their sound attributes and features.
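The iterative, residual-driven search described above for SPORCH and Hummel’s system can be sketched in Python as a greedy loop. This is a simplified illustration rather than either author’s implementation: it assumes every sound is summarized by a magnitude spectrum sampled on the same frequency grid, that sounds combine additively with unit gain, and that the residual is measured with a Euclidean norm. The function names, sample names, and spectra are invented for the example.

```python
import math


def norm(vector):
    """Euclidean length of a spectrum (assumed error measure)."""
    return math.sqrt(sum(x * x for x in vector))


def greedy_spectral_match(target, samples, max_sounds=4):
    """Repeatedly add the sample that most reduces the spectral residual."""
    residual = list(target)
    chosen = []
    for _ in range(max_sounds):
        best_name, best_residual = None, residual
        for name, spectrum in samples.items():
            candidate = [r - s for r, s in zip(residual, spectrum)]
            if norm(candidate) < norm(best_residual):
                best_name, best_residual = name, candidate
        if best_name is None:  # no remaining sample brings the mix closer
            break
        chosen.append(best_name)
        residual = best_residual
    return chosen, norm(residual)


# Hypothetical 4-bin magnitude spectra.
target = [1.0, 0.0, 1.0, 0.0]
samples = {
    "flute": [1.0, 0.0, 0.0, 0.0],
    "oboe": [0.0, 0.0, 1.0, 0.0],
    "horn": [0.6, 0.6, 0.6, 0.6],
}
print(greedy_spectral_match(target, samples))  # (['flute', 'oboe'], 0.0)
```

Ignoring gains and allowing a sample to be picked more than once keep the sketch short; a purely spectral distance like this one is also exactly what later systems extended with perceptual features.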
In Carpentier et al.’s model, sound attributes are symbolic labels related to discrete variables in compositions, including pitch, dynamics, and playing style; sound features represent psychoacoustic characteristics, such as brightness and roughness, that can be used to quantify perceptual dissimilarity. To minimize the perceived dissimilarity, the authors used randomly weighted Chebyshev aggregation functions to model dissimilarity as a set of mono-objective problems. Finally, a genetic algorithm is employed to find the optimal combination given constraints on sound attributes as well as perceptual dissimilarity.
More recently, Collins [2012] applied machine learning to the automatic composition of electroacoustic music, taking into account the quality of the final mix. A piece is composed by combining and modifying existing audio segments. The system first analyses features of the audio segments, such as percussive onset patterns and absolute peak amplitude, to produce suitable intermediate material for mixing. The segments are then modified by applying effects including delays, filters, and time stretching. The composition is iteratively refined using a density envelope on structural parameters such as perceived loudness, sensory dissonance, and increased tension to control the level of activity. The best mix is selected as the one most similar to an exemplar piece using dynamic time warping.
A similar approach is employed by Sturm [2006], who uses adaptive concatenative sound synthesis to generate and transform digital sound. Short segments are used to synthesize variations of sound, much like a collage, based on a similarity measure between audio features (the L1-norm of their difference).
In the next three sections, aspects newly associated with music generation are discussed: interaction, narrative, and difficulty. The section that immediately follows explores interactive improvisation systems that are capable of performing together with a human player.
Style replication in real-time jazz and other improvisation systems is also addressed in the upcoming section.
2.5. Interaction
This section considers systems in which two-way communication exists between the computer and human player(s); both the player and the system listen to what is being played, anticipate, and improvise new music in real time. Previous examples addressed music generation in the absence of live interaction with a player, based on the generated music’s similarity to either a target style or a target piece. Here, we focus on similarity in a social context and turn to interactive systems, in which the generation algorithm “improvises” in real time with a player. In this scenario, similarity to the style of the player and self-similarity to what was previously played within the piece become important goals, thus shifting the focus to the requirements of interaction. While there are computer-assisted composition systems that allow the user/composer to interact with the system and iteratively improve generated solutions in a non-performance setting (e.g., [Farbood et al. 2007]), most of these systems have been discussed in their respective sections above. In this section, the focus lies on real-time performance systems.
Early Work. One of the earliest automatic improvisation systems was created by George Lewis in the 1980s. One of his compositions, Voyager, is composed in automatic response to a musician’s playing, as well as to the program’s “own internal processes”. In this early work, the performer is not able to control the system during performance [Lewis 2000].
Structured improvisation. One of the first interactive jazz solo generators, GenJam [Biles 1998], generates melody lines over a given chord progression.
It listens to a human player’s last four bars, maps them to a chromosome representation, and evolves what it “hears” with a GA into what it will play in real time. Fitness evaluation is performed by a human listener who continually gives feedback, rating the output as “good” or “bad”.
Thom [2000] created Band-out-of-the-Box (BoB), an agent built for interactive jazz/blues improvisation of four-bar solos, with the goal of developing a system that is realistic and fun to play with. A probabilistic approach is used, based on a variable tree encoding with multiple features: pitch class, interval, and melodic direction. The model is trained on warm-up sequences prior to the performance; the features extracted in the warm-up are first clustered based on histograms, and the resulting statistics are then used during real-time generation to determine the current musical environment.
Interactive jazz generation is explored further by Franklin [2001], who uses a set of rudimentary rules for jazz and a neural network, in combination with reinforcement learning, to trade fours between a musician and the system. The system, called CHIME, has a stochastic element that allows for out-of-chord changes, which the author suggests could be made more pointed and purposeful in future research. The author also points out that the hard-coded rules do not encompass the development of a statement or the creation of a shape.
Free improvisation. A second type of improvisation system generates music more freely with a performer, in real time, without a fixed, predefined structure. The Continuator by Pachet [2003] uses a Lempel-Ziv parsing algorithm [Assayag et al. 1999a], adapted to properly handle rhythm, beat, harmony, and imprecision, to learn the characteristics of any style.
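The Continuator’s own data structures and adaptations are not reproduced here. As a rough illustration of Lempel-Ziv-style learning in the spirit of Assayag et al. [1999a], the Python sketch below uses classic LZ78 incremental parsing to cut a melody into phrases (each phrase is a previously seen phrase extended by one new symbol), records which symbol completed each phrase prefix, and continues a melody from the longest matching prefix. The function names and the single-letter note symbols are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict


def lz78_parse(sequence):
    """Split a sequence into LZ78 phrases (each = a known phrase + one new symbol)."""
    phrases, seen, current = [], set(), ()
    for symbol in sequence:
        current = current + (symbol,)
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ()
    return phrases  # an unfinished, already-seen tail is discarded


def continuation_table(phrases):
    """Map each phrase prefix to counts of the symbol that completed the phrase."""
    table = defaultdict(Counter)
    for phrase in phrases:
        table[phrase[:-1]][phrase[-1]] += 1
    return table


def continue_melody(table, history, length, rng):
    output = list(history)
    for _ in range(length):
        # Use the longest suffix of the output that is a known phrase prefix;
        # the empty prefix () always exists once at least one phrase was parsed.
        for k in range(len(output), -1, -1):
            context = tuple(output[len(output) - k:])
            if context in table:
                break
        options = table[context]
        symbols = list(options)
        output.append(rng.choices(symbols, weights=[options[s] for s in symbols])[0])
    return output


phrases = lz78_parse("abababcab")
print(phrases)  # [('a',), ('b',), ('a', 'b'), ('a', 'b', 'c')]
table = continuation_table(phrases)
print(continue_melody(table, ["a"], 2, random.Random(1)))  # ['a', 'b', 'c']
```

A practical system would encode durations, beats, and harmony in the symbols rather than bare letters, which is the kind of adaptation the text attributes to the Continuator.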
The Continuator can concurrently learn and generate a stream of music that is similar to a style such as jazz or to a player’s own style; it can generate music as a standalone system, as continuations of a performer’s input, or as an interactive improvisation backup. By aggregating clusters of notes and treating them as units, the Continuator is able to handle polyphonic music.
Using a different data structure, the factor oracle, improvisation systems belonging to the OMax family [Assayag et al. 2006] can also concurrently encode and generate music in a player’s style, and handle polyphonic music. The factor oracle is a finite state automaton, originally designed to efficiently search for substrings (factors) in a text [Allauzen et al. 1999]. More recent extensions further allow the OMax system to handle audio signals instead of symbolic MIDI; the resulting audio-based system is called Ofon. OMax’s approach has also been applied to speech (to simulate rap) and to video frames [Bloch et al. 2008].
Assayag et al. [2010] and Dubnov and Assayag [2012] further experimented with audio-based factor oracles to improvise music, resulting in systems such as the OMax-Ofon system [Assayag et al. 2010]. The system developed by Dubnov and Assayag [2012] produces variations from an audio recording using a graph of repeated factors found in the recording; the system’s main challenge lies in marking and allocating the regions of the original audio that are most promising for the oracle to focus on in order to achieve the desired result.
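The factor oracle itself is compact enough to sketch. The Python below follows the standard online construction of Allauzen et al. [1999], which adds one state per symbol and maintains suffix links. The `improvise` function is a simplified, hypothetical navigation in the spirit of OMax (continue forward with probability `p_continue`, otherwise jump back along a suffix link to an earlier point that shares a common context); it is not OMax’s actual strategy. Single characters stand in for musical events.

```python
import random


def build_factor_oracle(seq):
    """Online factor oracle construction: returns transitions and suffix links."""
    transitions = [dict() for _ in range(len(seq) + 1)]
    suffix = [-1] * (len(seq) + 1)  # suffix[0] = -1 by convention
    for i, symbol in enumerate(seq, start=1):
        transitions[i - 1][symbol] = i  # spine transition spelling out seq
        k = suffix[i - 1]
        while k > -1 and symbol not in transitions[k]:
            transitions[k][symbol] = i  # forward transition to the new state
            k = suffix[k]
        suffix[i] = 0 if k == -1 else transitions[k][symbol]
    return transitions, suffix


def accepts(transitions, pattern):
    """True if the oracle can read the pattern from the initial state."""
    state = 0
    for symbol in pattern:
        if symbol not in transitions[state]:
            return False
        state = transitions[state][symbol]
    return True


def improvise(seq, suffix, length, p_continue, rng):
    """Hypothetical OMax-like walk: replay forward or recombine via suffix links."""
    state, output = 0, []
    for _ in range(length):
        if state == len(seq) or (state > 0 and rng.random() > p_continue):
            state = suffix[state]  # jump to an earlier point with a shared context
        output.append(seq[state])
        state += 1
    return output


trans, sfx = build_factor_oracle("abab")
print(sfx)  # [-1, 0, 0, 1, 2]
print("".join(improvise("abab", sfx, 6, 1.0, random.Random(0))))  # ababab
```

Every factor of the input is recognized by the oracle, which is why it can serve as a fast index of repeated material; lowering `p_continue` makes the walk recombine the learned material more often.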
To identify these regions, Dubnov and Assayag introduce an analysis method based on Information Rate (a concept previously linked to musical anticipation [Dubnov 2006]); in contrast to the previous factor oracle systems, the audio analysis is done in advance, and a performance is in a sense pre-planned.
The earlier OMax systems are agnostic to rhythmic and longer-range structures. The ImproteK system by Nika et al. [2015], also based on the factor oracle, generates both rhythms and harmonies. It was recently built into an architecture that aims to combine reactivity and anticipation in music generation processes steered by a “scenario”, which can be, for example, a four-bar jazz chord sequence. The system composes off-line given a scenario; during the performance, this material can be rewritten to fit the performance.
Performer feedback. Mimi [François et al. 2007], like the OMax family of interactive improvisation systems, is based on the factor oracle. A novelty of this system is that it lets the user see both the recent past and the upcoming generated music, so as to be aware of the musical context and to plan a response. The performer is also the operator of Mimi, controlling her learning rate, switching her learning on and off, and clearing her memory [Schankler et al. 2014]. A variation on Mimi, Mimi4x [François et al. 2013], allows the user to control four interacting Mimi instances and to structure an improvisation by deciding which of the four Mimi-generated streams start and stop, and when, as well as their recombination rates and playback dynamic levels.
Other interactions. Besides the above-mentioned systems, in which music is generated in partnership with a human musician, some systems use other types of interaction. A system created by Marsden [2004] generates melodies based on the movements of a dancer, combined with elaborations based on Schenkerian analysis. The system is given a “background”, which consists of one note per bar, together with key and meter information.
The number of target notes generated depends on the speed of the dancer’s movements. Gait is another determining influence; for example, a crouching gait is associated with high regularity. The output of the system follows harmonic and intervallic patterns found in real music, yet lacks subdivision into meaningful phrases. The association between the dancer’s movements and changes in the melody, which reflects similarity, is credible, although it does not convey the feeling that the dancer is controlling the music, mostly because of a slight lag in the system. The interactive music improvisation system SICIB, developed by Morales-Manzanares et al. [2001], has a similar setup: it detects motion from sensors attached to dancers and uses a rule-based approach to translate this motion into music. Motion characteristics such as the curvature, torsion, velocity, and acceleration of movements are taken into account. The generation is performed by the Escamol system, which uses grammar rules [Morales-Manzanares 1992], and real-time synthesis is performed by Aura [Dannenberg and Brandt 1996].
Another interactive system, the robot drummer Haile developed by Weinberg and Driscoll [2006], is discussed in more detail in Section 2.3, which covers the rhythmic aspects of music generation systems. The next section explores music with a narrative, which includes game music and video background music.
2.6. Narrative
Narrative music is music that tells a story. The narration consists of a set of representational, organizational, and discursive cues that deliver story information to the audience. In this section, we focus on different types of narratives, such as tension profiles, the blending of fragments for game music, leitmotifs, and film music.
Narrative cues create structure within music, including variation in the emotions evoked throughout a piece (synchronized with video or game play), tension profiles, leitmotifs, repeated patterns, and others. Enforcing narrative structure can lead to similarity within a piece (e.g., repeated patterns and motifs) or, at a higher level, between the emotions evoked by the music and simultaneous media such as video or games.
In recent years, there has been increased interest in creating background music for games and video. The goal in these types of composition problems is for the music to match the content or emotional content of a scene or narrative. The idea of program music, music having an extra-musical narrative or purpose, is an old one. For example, Richard Strauss’s “Don Quixote” and Hector Berlioz’s “Symphonie fantastique” both derive inspiration from extra-musical sources. Inherent in music with a narrative is the existence of long-term structure, which was already touched upon in Section 2.1; the discussion continues in the following paragraphs.
Tension. An important tool for evoking emotion is the use of musical tension. Audio features, such as roughness, have been shown to correlate with perceived tension/relaxation patterns in music [Vassilakis 2005]. Farbood [2012] conducted an extensive experiment to build a perceptual tension model that takes into account the dynamic, temporal aspects of listening. Farbood models tension in terms of multiple musical parameters, including both audio and score-based features. Farbood et al. [2007] also created Hyperscore, a graphical, computer-assisted composition system in which users can intuitively edit and visualize musical structures. Both low-level and high-level musical features (such as tone color, melodic shape, dynamics, and harmonic tension) are mapped to graphical elements that users can control to create compositions.
Users can, for instance, draw a tension line for a new composition.
Browne and Fox [2009] used simulated annealing to arrange pre-written motifs according to a pre-specified musical tension profile. The tension profiles were computed using an ANN model, and the Kullback–Leibler divergence was employed to measure the distance between the desired and the observed tension profiles. More recently, Herremans and Chew [2016a] used a tonal tension model based on the spiral array [Herremans and Chew 2016b] to calculate the tension of a polyphonic piece. The algorithm, called MorpheuS, imposes the detected patterns as constraints and generates music that fits a given tension profile as closely as possible. This tension profile can be provided by the user or calculated based on a template piece. The generation process is guided by a variable neighborhood search algorithm.
Breaking the rules that govern Western tonal music elicits tension. In a study of scary music by Rutherford and Wiggins [2002], scarier music was generated by breaking the rules of Western tonal music. The results were verified by human listeners, who assessed the scariness of the generated music.
Blending. Game music is most frequently generated by cross-fading between audio files each time the player shifts from one game state to another [Collins 2008]. An exception is the music for Depression Quest³, which is generated dynamically as the player moves through the different scenarios in the game. Brian Eno, a composer known for creating generative systems for ambient music, collaborated with Maxis/EA Games to create a soundtrack generation system for the game “Spore”, in which the music changes based on a gamer’s style of play [Johnson 2006]. Details of how these systems work are not publicly available.
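The text does not specify which fade curves games use, so the following Python sketch assumes a common choice: an equal-power (cosine/sine) crossfade between two mono clips stored as lists of samples. It also makes the limitation discussed next concrete: a crossfade only changes gains over the overlap, so if the two clips clash rhythmically or harmonically, both are still heard together during that window. The function name and the clip values are hypothetical.

```python
import math


def equal_power_crossfade(a, b, overlap):
    """Fade clip a out and clip b in over `overlap` samples (equal-power gains)."""
    if not 2 <= overlap <= min(len(a), len(b)):
        raise ValueError("overlap must be between 2 and the shorter clip length")
    start = len(a) - overlap
    mixed = []
    for i in range(overlap):
        t = i / (overlap - 1)  # 0.0 at the start of the overlap, 1.0 at the end
        gain_out = math.cos(t * math.pi / 2)
        gain_in = math.sin(t * math.pi / 2)  # cos^2 + sin^2 = 1 keeps power constant
        mixed.append(a[start + i] * gain_out + b[i] * gain_in)
    return a[:start] + mixed + b[overlap:]


clip_a = [1.0] * 6  # stand-ins for two short audio fragments
clip_b = [0.5] * 6
out = equal_power_crossfade(clip_a, clip_b, 3)
print([round(x, 3) for x in out])  # [1.0, 1.0, 1.0, 1.0, 1.061, 0.5, 0.5, 0.5, 0.5]
```

Shrinking `overlap` shortens the window in which the two fragments sound together, at the cost of a more abrupt transition.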
In traditional games that use cross-fading, however, it is not uncommon for the two fragments to clash rhythmically or harmonically. The clash can be mitigated by crossfading quickly, but a quick crossfade can itself be distracting or jarring. Müller and Driedger [2012] devised an automatic DJ system for crossfading that ensures smooth blending, yet still requires the audio fragments to be harmonically and rhythmically similar. Smooth blending can be improved by restricting the range of allowed rhythms and harmonies; however, this would also restrict the musical variation and expressive capacity of the music. To solve this problem, Prechtl et al. [2014a] created a real-time system that generates music from scratch instead of using existing fragments. The music generation process uses a stochastic model and takes into account emotion parameters such as alarm or danger.
Leitmotifs. The system developed by Brown [2012] focuses on leitmotifs: short and distinctive musical fragments associated with game characters and elements, a strategy commonly employed in Western opera. Each of these motifs is embedded in different musical forms; each musical form is associated with different degrees of harmonic tension and formal regularity, thus conveying different amounts of “markedness”. In combination, the leitmotifs and forms correspond to different states of a game’s story. See Collins [2009] for a more complete overview of procedural music in games.
Film music. Music with a narrative is frequently used as background music for films. The effect that music has on perceived emotion in film has been studied by Parke et al. [2007].
When perceived emotion was mapped to a three-dimensional space of stress, activity, and dominance, the geometric center of mass of the perceived emotions (in this space) for film and music combined was found to lie between those of the participants who only listened to the music and those who only watched the film. In the study, the film clips were selected for their ambiguous meanings. Prechtl et al. [2014b] argue for the need for thorough empirical evaluation when generating music purported to communicate particular emotions.
Nakamura et al. [1994] created a prototype system that automatically generates sound effects and background music for short video animations. Music (harmony, melody, and rhythm) is generated for each scene, taking into consideration the mood, the intensity of the mood, and the musical key used in the preceding scene. The characteristics and intensity of the movements on screen determine the sound effects. Another example application for video background music is MAgentA, created by Casella and Paiva [2001], which aims to generate “film-like” music using the mood of the environment in which it is embedded.
The final functional aspect of music generation systems, instrument playing difficulty, is discussed in the next section.
³ https://isaacschankler.bandcamp.com/album/depression-quest-ost
2.7. Difficulty
The difficulty of a piece of music refers to the level of skill required for a musician to play the piece. When automatically composing musical pieces, the manipulation of features such as melody, harmony, rhythm, and timbre often comes to the fore, while ergonomic goals such as ease of playing are often ignored. One could argue that if a model is trained on existing pieces using the appropriate feature set, a new piece that is sampled from this model should be equally playable.
It would be interesting to measure difficulty explicitly to verify this assumption. Generating a piece with playability similar to that of pieces in a corpus, or with a predefined difficulty level for an instrument, could thus be the main goal of a music generation system.
Music generation according to difficulty level. Tuohy and Potter [2005] developed a genetic algorithm that generates playable guitar music by minimizing hand and finger movements. More recently, McVicar et al. [2014] automatically generated lead and rhythm guitar music in tablature notation, based on a given chord and key sequence. Sébastien et al. [2012] implemented a system that measures the difficulty of a piano piece based on seven different characteristics, including harmony, fingering, polyphony, and irregularity of rhythm. Such a system could be improved by incorporating systems for automatically computing piano fingerings [Lin and Liu 2006; De Prisco et al. 2012; Balliauw et al. 2017] and string instrument fingerings [Sayegh 1989; Radisavljevic and Driessen 2004; Radicioni et al. 2004]. Combining fingering and difficulty evaluation systems with music generation systems provides an opportunity to evaluate pieces in a non-traditional, yet essential, way.
3. FUTURE CHALLENGES
Over the last few decades, research in music generation has made tremendous progress in generating well-defined aspects of music such as melody, harmony, rhythm, and timbre. State-of-the-art statistical models, advanced optimization techniques, larger digital databases on which to train models, and increased computing power have all helped the field produce better systems. Why, then, are we not using music generation systems in our day-to-day lives? The above survey shows that an important overarching challenge remains: that of creating music with long-term structure.
Long-term structure, which often takes the form of recurring themes, motifs, and patterns, is an essential part of the music listening experience [Lerdahl and Jackendoff 1983]. Recent music generation systems have tackled this challenge by constraining certain types of long-term structure, such as recurrent patterns [Herremans and Chew 2016a], form [Tanaka et al. 2016; Herremans et al. 2015b], cadence [Cunha et al. 2016], and pitch contour [Pachet and Roy 2001]. In addition, developments in the field of deep learning [Eck and Schmidhuber 2002; Boulanger-Lewandowski et al. 2012] show that neural networks can incorporate memory structures when learning sequential data. The ability of techniques such as RNNs and LSTMs to capture long-term structure should be further investigated.
In order to make computer-generated music systems part of our daily lives, there is a crucial need for more “intelligent” systems in which newly composed music matches higher-level concepts. This intelligence can be expressed in the functional domain of the “narrative”. While there are recent attempts at generating music that has tension [Farbood et al. 2007; Herremans and Chew 2016a], that matches a computer game [Prechtl et al. 2014a], that embodies leitmotifs [Brown 2012], or that accompanies film [Nakamura et al. 1994], among others, there is still ample room for better understanding the connection between music and emotion, so as to integrate this crucial relationship into music generation systems. This could lead to practical, real-life applications such as real-time music generation for games and background music for film and video.
While machine learning techniques can be extremely useful in tasks such as the above-mentioned modeling of emotion in music, they usually require large amounts of data; the field has therefore seen an ongoing need for more data. There is real potential for future work to move towards intelligent systems that do not require copious amounts of data, that are capable of innate reasoning, and that thus better mirror the workings of the human mind. This would also help with the continuing challenge of balancing the regeneration of existing music against the generation of novel fragments without plagiarizing, as touched upon in Section 1.3.
One characteristic of computer-generated music that is often neglected is playing difficulty. While one application would be to tailor novel music to a certain level of musician skill, there is also potential for using detected or calculated playing difficulty as an evaluation measure for generated music.
Making music generation systems usable by the general public depends not only on the quality of the generated musical content, but also on the quality of its rendering. While this is not an aspect that we explicitly surveyed in this paper, it is nevertheless important for creating real-life applications. In recent years, the field of automatic music production has gained increasing traction. Research topics in this field include human-like rendering of MIDI files with expressive timing [Bresin and Friberg 2000; Grachten et al. 2014] and automatic mixing [Deruty 2016]. Furthering the development of systems for realistic rendering of generated music, which is often in MIDI, will increase the attractiveness and usability of automatically generated music.
The success of music generation systems is not only measured through their practical adoption. Over the years, researchers have adopted multiple methods for evaluating the output of systems, as outlined in Section 1.3.
It remains difficult, however, to objectively compare different systems, as they usually take different input parameters, generate different aspects of music, are trained on different styles, or do not have audio examples available for the reader. Furthermore, in listening experiments, the mere-exposure effect [Zajonc 1968] can make listeners prefer existing pieces over new ones, as familiarity increases enjoyment. To address this need for proper comparison of systems to assess the state of the art, the authors of this paper have set up a publicly accessible repository of computer-generated music systems⁴. Besides increasing the visibility of music generation systems and their outputs, this online repository will facilitate the comparison of systems by collecting detailed information such as the nature of each system’s input and output and any manual corrections performed.
4. CONCLUSIONS
This article has presented a taxonomy of the key concepts that form the functional goals of music generation systems. We then provided a survey of the state of the art in music generation systems with respect to this functional taxonomy. By focusing on what current systems can and cannot do, rather than on their algorithmic techniques, we obtain a clearer view of the frontiers of automatic music generation, thus setting the stage for new breakthroughs. This approach has allowed us to identify uncharted areas and challenges for the field of automatic music composition. In line with current efforts by companies such as Google (through the Magenta project) and Jukedeck, music generation systems will become ever more prominent in our day-to-day lives. The functional overview of systems described in this paper shows the areas with opportunities for further advancement to make automatic music generation a viable tool for applications ranging from artistic innovation to the creation of adaptive, copyright-free music for games and videos.
Current challenges of the field include generating music with long-term structure; capturing higher-level content such as emotion and tension; creating models that possess innate reasoning so as to reduce the amount of training data needed; and promoting transparent and objective evaluation methods. To facilitate the latter and to stimulate the visibility and evaluation of current music generation systems, the authors have set up an online repository of computer-generated music results⁴.

⁴ http://dorienherremans.com/cogemur

References

K. Agres, J.E. DeLong, and M. Spivey. 2009. The sparsity of simple recurrent networks in musical structure learning. In Proceedings of the 31st Annual Conference of the Cognitive Science Society. 3099–3104.
K. Agres, J. Forth, and G.A. Wiggins. 2017. Evaluation of Musical Creativity and Musical Metacreation Systems. Computers and Entertainment 14, 3, Article 3 (Jan. 2017), 33 pages.
K. Agres, D. Herremans, L. Bigo, and D. Conklin. 2016. Harmonic Structure Predicts the Enjoyment of Uplifting Trance Music. Frontiers in Psychology 7 (2016), 1999.
G. Aguilera, J. Luis Galán, R. Madrid, A.M. Martínez, Y. Padilla, and P. Rodríguez. 2010. Automated generation of contrapuntal musical compositions using probabilistic logic in Derive. Mathematics and Computers in Simulation 80, 6 (2010), 1200–1211.
M. Allan and C.K.I. Williams. 2005. Harmonising chorales by probabilistic inference. Advances in Neural Information Processing Systems 17 (2005), 25–32.
C. Allauzen, M. Crochemore, and M. Raffinot. 1999. Factor oracle: A new structure for pattern matching. In International Conference on Current Trends in Theory and Practice of Computer Science. Springer, 295–310.
A. Alpern. 1995. Techniques for algorithmic composition of music. On the web: http://hamp.hampshire.edu/~adaF92/algocomp/algocomp95 (1995).
T. Anders. 2007. Composing music by composing rules: Design and usage of a generic music constraint system. Ph.D. Dissertation. Queen's University Belfast.
T. Anders and E.R. Miranda. 2009. A computational model that generalises Schoenberg's guidelines for favourable chord progressions. In Proceedings of the Sound and Music Computing Conference. 48–52.
C. Ariza. 2002. Prokaryotic Groove: Rhythmic Cycles as Real-Value Encoded Genetic Algorithms. In Proceedings of the International Computer Music Conference (ICMC). 561–568.
G. Assayag, G. Bloch, and M. Chemillier. 2006. OMax-ofon. Sound and Music Computing (SMC) 2006 (2006).
G. Assayag, G. Bloch, A. Cont, and S. Dubnov. 2010. Interaction with machine improvisation. In The Structure of Style. Springer, 219–245.
G. Assayag, S. Dubnov, and O. Delerue. 1999a. Guessing the Composer's Mind. In Proceedings of the International Computer Music Conference (ICMC). Beijing, China, 1–1.
G. Assayag, C. Rueda, M. Laurson, C. Agon, and O. Delerue. 1999b. Computer-assisted composition at IRCAM: from PatchWork to OpenMusic. Computer Music Journal 23, 3 (1999), 59–72.
M. Balliauw, D. Herremans, D. Palhazi Cuervo, and K. Sörensen. 2017. A variable neighbourhood search algorithm to generate piano fingerings for polyphonic sheet music. International Transactions of Operations Research, Special Issue on Variable Neighbourhood Search 24, 3 (2017), 509–535.
L.E. Baum and T. Petrie. 1966. Statistical inference for probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics 37, 6 (1966), 1554–1563.
B. Bemman and D. Meredith. 2016. Generating Milton Babbitt's all-partition arrays. Journal of New Music Research 45, 2 (2016), 184–204.
Y. Bengio, P. Simard, and P. Frasconi. 1994. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5, 2 (1994), 157–166.
L. Bigo and D. Conklin. 2015. A viewpoint approach to symbolic music transformation. In International Symposium on Computer Music Multidisciplinary Research. Springer, 213–227.
J.A. Biles. 1998. Interactive GenJam: Integrating real-time performance with a genetic algorithm. In Proceedings of the 1998 International Computer Music Conference. 232–235.
J.A. Biles. 2001. Autonomous GenJam: eliminating the fitness bottleneck by eliminating fitness. In Proceedings of the GECCO-2001 Workshop on Non-routine Design with Evolutionary Systems. Morgan Kaufmann, San Francisco, California, USA.
G. Bloch, S. Dubnov, and G. Assayag. 2008. Introducing video features and spectral descriptors in the OMax improvisation system. In International Computer Music Conference, Vol. 8.
N. Boulanger-Lewandowski, Y. Bengio, and P. Vincent. 2012. Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), John Langford and Joelle Pineau (Eds.). ACM, New York, NY, USA, 1159–1166.
R. Bresin and A. Friberg. 2000. Emotional coloring of computer-controlled music performances. Computer Music Journal 24, 4 (2000), 44–63.
F.P. Brooks, A.L. Hopkins, P.G. Neumann, and W.V. Wright. 1957. An experiment in musical composition. IRE Transactions on Electronic Computers 3 (1957), 175–182.
D. Brown. 2012. Mezzo: An adaptive, real-time composition program for game soundtracks. In Proceedings of the AIIDE Workshop on Musical Metacreativity.
T.M. Browne and C. Fox. 2009. Global Expectation-Violation as fitness function in evolutionary composition. In Applications of Evolutionary Computing. Springer, 538–546.
G. Carpentier, G. Assayag, and E. Saint-James. 2010. Solving the musical orchestration problem using multiobjective constrained optimization with a genetic local search approach. Journal of Heuristics 16, 5 (2010), 1–34.
P. Casella and A. Paiva. 2001. Magenta: An architecture for real time automatic composition of background music. In Intelligent Virtual Agents. Springer, 224–232.
M. Chemillier. 2001. Improvising jazz chord sequences by means of formal grammars. In Journées d'informatique musicale. 121–126.
E. Chew. 2014. Mathematical and Computational Modeling of Tonality: Theory and Applications. Vol. 204. Springer, New York.
C.-H. Chuan and E. Chew. 2011. Generating and evaluating musical harmonizations that emulate style. Computer Music Journal 35, 4 (2011), 64–82.
K. Collins. 2008. Game sound: an introduction to the history, theory, and practice of video game music and sound design. MIT Press.
K. Collins. 2009. An introduction to procedural music in video games. Contemporary Music Review 28, 1 (2009), 5–15.
N. Collins. 2012. Automatic composition of electroacoustic art music utilizing machine listening. Computer Music Journal 36, 3 (2012), 8–23.
D. Conklin. 2003. Music generation from statistical models. In Proceedings of the AISB Symposium on Artificial Intelligence and Creativity in the Arts and Sciences. Aberystwyth, Wales, 30–35.
D. Conklin and I. Witten. 1995. Multiple viewpoint systems for music prediction. Journal of New Music Research 24, 1 (1995), 51–73.
D. Cope. 1996. Experiments in musical intelligence. Vol. 12. A-R Editions, Madison, WI.
D. Cope. 2000. New directions in music. Waveland Press.
D. Cope. 2004. A musical learning algorithm. Computer Music Journal 28, 3 (2004), 12–27.
N. Cunha, A. Subramanian, and D. Herremans. 2016. Uma abordagem baseada em programação linear inteira para a geração de solos de guitarra [An integer linear programming-based approach to the generation of guitar solos]. XLVIII Simpósio Brasileiro de Pesquisa Operacional (SBPO) (Sept. 2016).
R.B. Dannenberg and E. Brandt. 1996. A flexible real-time software synthesis system. In Proceedings of the International Computer Music Conference, International Computer Music Association. 270–273.
S. Davismoon and J. Eccles. 2010. Combining musical constraints with Markov transition probabilities to improve the generation of creative musical structures. In Applications of Evolutionary Computation. Springer, 361–370.
R. De Prisco, A. Eletto, A. Torre, and R. Zaccagnino. 2010. A Neural Network for Bass Functional Harmonization. Applications of Evolutionary Computation 6025 (2010), 351–360.
R. De Prisco, G. Zaccagnino, and R. Zaccagnino. 2012. A differential evolution algorithm assisted by ANFIS for music fingering. In Swarm and Evolutionary Computation. Springer, 48–56.
E. Deruty. 2016. Goal-Oriented Mixing. In Proceedings of the 2nd AES Workshop on Intelligent Music Production, Vol. 13. London.
M. Dostál. 2013. Evolutionary music composition. In Handbook of Optimization. Springer, 935–964.
S. Dubnov. 2006. Spectral anticipations. Computer Music Journal 30, 2 (2006), 63–83.
S. Dubnov and G. Assayag. 2012. Music design with audio oracle using information rate. In Musical Metacreation: Papers from the 2012 AIIDE Workshop.
M.O. Duff. 1989. Backpropagation and Bach's 5th cello suite (Sarabande). In Proceedings of the International Joint Conference on Neural Networks. 575.
K. Ebcioğlu. 1988. An expert system for harmonizing four-part chorales. Computer Music Journal 12, 3 (1988), 43–51.
D. Eck and J. Schmidhuber. 2002. A first look at music composition using LSTM recurrent neural networks. Istituto Dalle Molle Di Studi Sull Intelligenza Artificiale (2002).
A. Eigenfeldt and P. Pasquier. 2009. A realtime generative music system using autonomous melody, harmony, and rhythm agents. In 12th Generative Art Conference GA.
M.M. Farbood. 2012. A parametric, temporal model of musical tension. Music Perception 29, 4 (2012), 387–428.
M. Farbood, H. Kaufman, and K. Jennings. 2007. Composing with Hyperscore: An intuitive interface for visualizing musical structure. In Proceedings of the International Computer Music Conference (ICMC), Vol. 59.
M. Farbood and B. Schoner. 2001. Analysis and synthesis of Palestrina-style counterpoint using Markov chains. In Proceedings of the International Computer Music Conference. 471–474.
J.D. Fernández and F. Vico. 2013. AI methods in algorithmic composition: A comprehensive survey. Journal of Artificial Intelligence Research (2013), 513–582.
A.R.J. François, E. Chew, and D. Thurmond. 2007. Visual feedback in performer-machine interaction for musical improvisation. In Proceedings of the 7th International Conference on New Interfaces for Musical Expression. ACM, 277–280.
A.R.J. François, I. Schankler, and E. Chew. 2013. Mimi4x: An interactive audio-visual installation for high-level structural improvisation. International Journal of Arts and Technology 6, 2 (2013), 138–151.
J.A. Franklin. 2001. Multi-phase learning for jazz improvisation and interaction. In Proceedings of the Eighth Biennial Symposium for Arts & Technology.
J.A. Franklin. 2006. Recurrent neural networks for music computation. INFORMS Journal on Computing 18, 3 (2006), 321–338.
J.J. Fux and A. Mann. 1971. The study of counterpoint from Johann Joseph Fux's Gradus Ad Parnassum - 1725. Norton, New York.
M. Geis and M. Middendorf. 2007. An ant colony optimizer for melody creation with baroque harmony. In IEEE Congress on Evolutionary Computation. 461–468.
S. Gherman. 2008. Harmony and its Functionality: A Gloss on the Substantial Similarity Test in Music Copyrights. Fordham Intell. Prop. Media & Ent. LJ 19 (2008), 483.
J. Gillick, K. Tang, and R.M. Keller. 2010. Machine learning of jazz grammars. Computer Music Journal 34, 3 (2010), 56–66.
M. Grachten, C.E. Cancino Chacón, and G. Widmer. 2014. Analysis and prediction of expressive dynamics using Bayesian linear models. In Proceedings of the 1st International Workshop on Computer and Robotic Systems for Automatic Music Performance. 545–552.
A. Hawryshkewich, P. Pasquier, and A. Eigenfeldt. 2011. Beatback: A real-time interactive percussion system for rhythmic practise and exploration. In Proceedings of the NIME Conference. 100–105.
S.A. Hedges. 1978. Dice music in the eighteenth century. Music & Letters 59, 2 (1978), 180–187.
D. Herremans and E. Chew. 2016a. MorpheuS: Automatic music generation with recurrent pattern constraints and tension profiles. IEEE TENCON (November 2016).
D. Herremans and E. Chew. 2016b. Tension ribbons: Quantifying and visualising tonal tension. In Second International Conference on Technologies for Music Notation and Representation (TENOR). Cambridge, UK.
D. Herremans and C.-H. Chuan. 2017. Modeling Musical Context with Word2vec. First International Workshop On Deep Learning and Music 1 (May 2017), 11–18.
D. Herremans and K. Sörensen. 2012. Composing first species counterpoint with a variable neighbourhood search algorithm. Journal of Mathematics and the Arts 6, 4 (2012), 169–189.
D. Herremans and K. Sörensen. 2013. Composing Fifth Species Counterpoint Music With A Variable Neighborhood Search Algorithm. Expert Systems with Applications 40, 16 (2013), 6427–6437.
D. Herremans and K. Sörensen. 2013. FuX, an Android app that generates counterpoint. In 2013 IEEE Symposium on Computational Intelligence for Creativity and Affective Computing (CICAC). 48–55.
D. Herremans, K. Sörensen, and D. Martens. 2015a. Classification and generation of composer-specific music using global feature models and variable neighborhood search. Computer Music Journal 39 (2015), 91.
D. Herremans, S. Weisser, K. Sörensen, and D. Conklin. 2015b. Generating structured music for bagana using quality metrics based on Markov models. Expert Systems with Applications 42, 21 (2015), 7424–7435.
H. Hild, J. Feulner, and W. Menzel. 1992. HARMONET: A neural net for harmonizing chorales in the style of JS Bach. In Advances in Neural Information Processing Systems. 267–274.
L. Hiller. 1989. Lejaren Hiller: Computer Music Retrospective (1957-1985). Wergo Schallplatten. 79 pages.
L.A. Hiller Jr. and L.M. Isaacson. 1957. Musical composition with a high speed digital computer. In Audio Engineering Society Convention 9. Audio Engineering Society.
S. Hochreiter and J. Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
J.H. Holland. 1992. Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT Press.
A. Horner and D.E. Goldberg. 1991. Genetic algorithms and computer-assisted music composition. Urbana 51, 61801 (1991), 437–441.
C.Z.A. Huang and E. Chew. 2005. Palestrina Pal: a grammar checker for music compositions in the style of Palestrina. In Proceedings of the 5th Conference on Understanding and Creating Music.
T.A. Hummel. 2005. Simulation of human voice timbre by orchestration of acoustic music instruments. In Proceedings of International Computer Music Conference (ICMC). 185.
D.B. Huron. 2006. Sweet anticipation: Music and the psychology of expectation. MIT Press.
S. Johnson. 2006. The long zoom. New York Times Magazine (2006), 50–55.
R.M. Keller and D.R. Morrison. 2007. A grammatical approach to automatic improvisation. In Proceedings, Fourth Sound and Music Conference, Lefkada, Greece, July. 330–336.
S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi. 1983. Optimization by simulated annealing. Science 220, 4598 (1983), 671–680.
J.P. Kirnberger. 1757. Der Allezeit fertige Menuetten- und Polonoisenkomponist [The Ever-Ready Minuet and Polonaise Composer]. Berlin: Germany Winter (1757).
C.L. Krumhansl. 2001. Cognitive foundations of musical pitch. Oxford University Press.
S. Lattner, M. Grachten, K. Agres, and C.E.C. Chacón. 2015. Probabilistic segmentation of musical sequences using restricted Boltzmann machines. In International Conference on Mathematics and Computation in Music. Springer, 323–334.
H.-R. Lee and J.S.R. Jang. 2004. i-Ring: A system for humming transcription and chord generation. In 2004 IEEE International Conference on Multimedia and Expo (ICME'04), Vol. 2. IEEE, 1031–1034.
F. Lerdahl and R. Jackendoff. 1983. An overview of hierarchical structure in music. Music Perception: An Interdisciplinary Journal 1, 2 (1983), 229–252.
G.E. Lewis. 2000. Too many notes: Computers, complexity and culture in Voyager. Leonardo Music Journal 10 (2000), 33–39.
J.P. Lewis. 1991. Creation by refinement and the problem of algorithmic music composition. Music and Connectionism (1991), 212.
T.R. Lezon, J.R. Banavar, M. Cieplak, A. Maritan, and N.V. Fedoroff. 2006. Using the principle of entropy maximization to infer genetic interaction networks from gene expression patterns. Proceedings of the National Academy of Sciences 103, 50 (2006), 19033–19038.
Y.J. Liebesman. 2007. Using Innovative Technologies to Analyze for Similarity Between Musical Works in Copyright Infringement Disputes. AIPLA QJ 35 (2007), 331.
C.-C. Lin and D.S.-M. Liu. 2006. An intelligent virtual piano tutor. In Proceedings of the 2006 ACM International Conference on Virtual Reality Continuum and its Applications. ACM, 353–356.
A. Lovelace. 1843. Notes on L. Menabrea's 'Sketch of the Analytical Engine Invented by Charles Babbage, Esq.'. Taylor's Scientific Memoirs 3 (1843).
M. Marchini and H. Purwins. 2010. Unsupervised generation of percussion sound sequences from a sound example. In Sound and Music Computing Conference, Vol. 220.
A. Marsden. 2004. Novagen: a combination of Eyesweb and an elaboration-network representation for the generation of melodies under gestural control. In COST287-ConGAS Symposium on Gesture Interfaces for Multimedia Systems.
J. McCormack. 1996. Grammar based music composition. Complex Systems 96 (1996), 321–336.
W.S. McCulloch and W. Pitts. 1943. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics 5, 4 (1943), 115–133.
R.A. McIntyre. 1994. Bach in a box: The evolution of four part baroque harmony using the genetic algorithm. In Evolutionary Computation, 1994. IEEE World Congress on Computational Intelligence. IEEE, 852–857.
M. McVicar, S. Fukayama, and M. Goto. 2014. AutoLeadGuitar: Automatic generation of guitar solo phrases in the tablature space. In 2014 12th International Conference on Signal Processing (ICSP). IEEE, 599–604.
D. Meredith. 2013. COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In International Society for Music Information Retrieval Conference.
N. Mladenović and P. Hansen. 1997. Variable neighborhood search. Computers & Operations Research 24, 11 (1997), 1097–1100.
J.A. Moorer. 1972. Music and computer composition. Commun. ACM 15, 2 (1972), 104–113.
R. Morales-Manzanares. 1992. Non-Deterministic Automatons Controlled by Rules for Composition. In Proceedings of the International Computer Music Conference. 400–400.
R. Morales-Manzanares, E.F. Morales, R. Dannenberg, and J. Berger. 2001. SICIB: An interactive music composition system using body movements. Computer Music Journal 25, 2 (2001), 25–36.
M.C. Mozer. 1991. Connectionist music composition based on melodic, stylistic and psychophysical constraints. Music and Connectionism (1991), 195–211.
M. Müller and J. Driedger. 2012. Data-Driven Sound Track Generation. In Multimodal Music Processing. 175–194.
J.-I. Nakamura, T. Kaku, K. Hyun, T. Noma, and S. Yoshida. 1994. Automatic background music generation based on actors' mood and motions. The Journal of Visualization and Computer Animation 5, 4 (1994), 247–264.
G. Nierhaus. 2009. Algorithmic composition: paradigms of automated music generation. Springer.
J. Nika, D. Bouche, J. Bresson, M. Chemillier, and G. Assayag. 2015. Guided improvisation as dynamic calls to an offline model. In Sound and Music Computing (SMC).
H. Norden. 1969. Fundamental Counterpoint. Crescendo Publishing Co., Boston.
F. Pachet. 2003. The continuator: Musical interaction with style. Journal of New Music Research 32, 3 (2003), 333–341.
F. Pachet. 2016. A joyful ode to automatic orchestration. ACM Transactions on Intelligent Systems and Technology (TIST) 8, 2 (2016), 18.
F. Pachet, P. Roy, and G. Barbieri. 2001. Finite-length Markov processes with constraints. transition 6, 1/3 (2001).
A. Papadopoulos, P. Roy, and F. Pachet. 2014. Avoiding Plagiarism in Markov Sequence Generation. In Proceedings of AAAI. Quebec.
G. Papadopoulos and G. Wiggins. 1999. AI methods for algorithmic composition: A survey, a critical view and future prospects. In AISB Symposium on Musical Creativity. Edinburgh, UK, 110–117.
R. Parke, E. Chew, and C. Kyriakakis. 2007. Quantitative and visual analysis of the impact of music on perceived emotion of film. Computers in Entertainment (CIE) 5, 3 (2007), 5.
M.T. Pearce, M.H. Ruiz, S. Kapasi, G.A. Wiggins, and J. Bhattacharya. 2010. Unsupervised statistical learning underpins computational, behavioural, and neural manifestations of musical expectation. NeuroImage 50, 1 (2010), 302–313.
M. Pearce and G. Wiggins. 2001. Towards a framework for the evaluation of machine compositions. In Proceedings of the AISB'01 Symposium on Artificial Intelligence and Creativity in the Arts and Sciences. 22–32.
I. Peretz, D. Gaudreau, and A.-M. Bonnel. 1998. Exposure effects on music preference and recognition. Memory & Cognition 26, 5 (1998), 884–902.
S. Phon-Amnuaisuk and G. Wiggins. 1999. The four-part harmonisation problem: a comparison between genetic algorithms and a rule-based system. In Proceedings of the AISB'99 Symposium on Musical Creativity. 28–34.
R.C. Pinkerton. 1956. Information theory and melody. Scientific American 194, 2 (1956), 77–86.
J. Polito, J. Daida, and T. Bersano-Begey. 1997. Musica ex Machina: Composing 16th-Century Counterpoint with Genetic Programming and Symbiosis. In Evolutionary Programming VI (Lecture Notes in Computer Science), Vol. 1213. Springer, 113–124.
A. Prechtl, R. Laney, A. Willis, and R. Samuels. 2014a. Algorithmic music as intelligent game music. In AISB50: The 50th Annual Convention of the AISB, 1-4 April 2014, London, UK.
A. Prechtl, R. Laney, A. Willis, and R. Samuels. 2014b. Methodological approaches to the evaluation of game music systems. In Proceedings of the 9th Audio Mostly: A Conference on Interaction With Sound. ACM, 26.
J. Pritchett. 1994. The Completion of John Cage's Freeman Etudes. Perspectives of New Music (1994), 264–270.
D. Psenicka. 2003. Sporch: An algorithm for orchestration based on spectral analyses of recorded sounds. In Proceedings of International Computer Music Conference (ICMC). 184.
D. Radicioni, L. Anselma, and V. Lombardo. 2004. An algorithm to compute fingering for string instruments. In Proceedings of the National Congress of the Associazione Italiana di Scienze Cognitive, Ivrea, Italy.
A. Radisavljevic and P. Driessen. 2004. Path difference learning for guitar fingering problem. In Proceedings of the International Computer Music Conference, Vol. 28.
R. Raines. 2015. Composition in the Digital World: Conversations with 21st Century American Composers. Oxford University Press. 241 pages.
D. Ralley. 1995. Genetic algorithms as a tool for melodic development. Urbana 101 (1995), 61801.
C. Roig, L.J. Tardón, I. Barbancho, and A.M. Barbancho. 2014. Automatic melody composition based on a probabilistic model of music style and harmonic rules. Knowledge-Based Systems 71 (2014), 419–434.
D.E. Rumelhart, G.E. Hinton, and R.J. Williams. 1988. Learning representations by back-propagating errors. Cognitive Modeling 5, 3 (1988), 1.
D.E. Rumelhart, G.E. Hinton, and R.J. Williams. 1985. Learning internal representations by error propagation. Technical Report. DTIC Document.
J. Rutherford and G. Wiggins. 2002. An experiment in the automatic creation of music which has specific emotional content. In Proc. for the 7th International Conference on Music Perception and Cognition.
S.I. Sayegh. 1989. Fingering for string instruments with the optimum path paradigm. Computer Music Journal (1989), 76–84.
I. Schankler, E. Chew, and A. François. 2014. Improvising with Digital Auto-Scaffolding: How Mimi Changes and Enhances the Creative Process. In Digital Da Vinci. Springer, 99–125.
V. Sébastien, H. Ralambondrainy, O. Sébastien, and N. Conruyt. 2012. Score Analyzer: Automatically Determining Scores Difficulty Level for Instrumental e-Learning. In ISMIR. 571–576.
R. Siddharthan. 1999. Music, mathematics and Bach. Resonance 4, 5 (1999), 61–70.
I. Simon, D. Morris, and S. Basu. 2008. MySong: automatic accompaniment generation for vocal melodies. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 725–734.
P. Smolensky. 1986. Parallel distributed processing: Explorations in the microstructure of cognition, vol. 1. Chapter Information Processing in Dynamical Systems: Foundations of Harmony Theory. MIT Press, Cambridge, MA, USA 15 (1986), 18.
M.J. Steedman. 1984. A generative grammar for jazz chord sequences. Music Perception (1984), 52–77.
B.L. Sturm. 2006. Adaptive concatenative sound synthesis and its application to micromontage composition. Computer Music Journal 30, 4 (2006), 46–66.
T. Tanaka, B. Bemman, and D. Meredith. 2016. Constraint programming formulation of the problem of generating Milton Babbitt's all-partition arrays. In Proceedings of the 22nd International Conference on Principles and Practice of Constraint Programming. Toulouse, France.
B. Thom. 2000. BoB: an interactive improvisational music companion. In Proceedings of the Fourth International Conference on Autonomous Agents. ACM, 309–316.
A. Tidemann and Y. Demiris. 2008. A Drum Machine That Learns to Groove. In KI 2008: Advances in Artificial Intelligence. Springer, 144–151.
P.M. Todd. 1989. A connectionist approach to algorithmic composition. Computer Music Journal (1989), 27–43.
P. Toiviainen. 1995. Modeling the target-note technique of bebop-style jazz improvisation: An artificial neural network approach. Music Perception (1995), 399–413.
N. Tokui and H. Iba. 2000. Music composition with interactive evolutionary computation. In Proceedings of the Third International Conference on Generative Art, Vol. 17:2. 215–226.
M.W. Towsey, A.R. Brown, S.K. Wright, and J. Diederich. 2001. Towards melodic extension using genetic algorithms. Educational Technology & Society 4, 2 (2001), 54–65.
C.P. Tsang and M. Aitken. 1999. Harmonizing music as a discipline of constraint logic programming. In International Computer Music Conference.
D.R. Tuohy and W.D. Potter. 2005. A genetic algorithm for the automatic generation of playable guitar tablature. In Proceedings of the International Computer Music Conference. sn, 499–502.
P.N. Vassilakis. 2005. An improvisation on the Middle-Eastern mijwiz; auditory roughness profiles and tension/release patterns. The Journal of the Acoustical Society of America 117, 4 (2005), 2476–2476.
R. Waschka II. 2007. Composing with genetic algorithms: GenDash. In Evolutionary Computer Music. Springer, 117–136.
G. Weinberg and S. Driscoll. 2006. Toward robotic musicianship. Computer Music Journal 30, 4 (2006), 28–45.
I. Xenakis. 1992. Formalized Music: Thought and mathematics in composition. Number 6. Pendragon Press.
J. Yamato, J. Ohya, and K. Ishii. 1992. Recognizing human action in time-sequential images using hidden Markov model. In Proceedings of the 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'92). IEEE, 379–385.
L. Yi and J. Goldsmith. 2007. Automatic Generation of Four-part Harmony. In BMA, CEUR Workshop.
R.B. Zajonc. 1968. Attitudinal effects of mere exposure. Journal of Personality and Social Psychology 9, 2p2 (1968), 1–27.

Received August 2016; revised 000; accepted 000
