Statistical Piano Reduction Controlling Performance Difficulty


Authors: Eita Nakamura, Kazuyoshi Yoshii

SIP (2015), page 1 of 12. © The Authors, 2015. The online version of this article is published within an Open Access environment subject to the conditions of the Creative Commons Attribution-NonCommercial-ShareAlike license. The written permission of Cambridge University Press must be obtained for commercial re-use. doi:0000000000

We present a statistical-modelling method for piano reduction, i.e. converting an ensemble score into piano scores, that can control performance difficulty. While previous studies have focused on describing the condition for playable piano scores, that condition depends on the player's skill and can change continuously with the tempo. We thus computationally quantify performance difficulty as well as musical fidelity to the original score, and formulate the problem as optimization of musical fidelity under constraints on difficulty values. First, performance difficulty measures are developed by means of probabilistic generative models for piano scores, and their relation to the rate of performance errors is studied. Second, to describe musical fidelity, we construct a probabilistic model integrating a prior piano-score model and a model representing how ensemble scores are likely to be edited. An iterative optimization algorithm for piano reduction is developed based on statistical inference of the model. We confirm the effect of the iterative procedure; we find that subjective difficulty and musical fidelity monotonically increase with controlled difficulty values; and we show that incorporating sequential dependence of pitches and fingering motion in the piano-score model improves the quality of reduction scores in high-difficulty cases.

I. INTRODUCTION

Music arrangement involving a change of instrumentation (e.g. arrangement for piano, guitar, etc.)
is an important process of music creation that increases the variety of music performances. Arranging a musical piece to change its difficulty, for example to make it playable for beginners, is also widely practiced. To automate these processes, systems for piano arrangement [1–5], guitar arrangement [6–8], and orchestration [9, 10] have been studied. This study aims at a system for piano reduction, i.e. converting an ensemble score (e.g. an orchestral or band score) into a piano score, that can control performance difficulty while retaining as much musical fidelity to the original score as possible (Fig. 1).

Fig. 1. Overview of the proposed system for piano reduction that can control performance difficulty. (Figure: an input ensemble score is converted by piano reduction, through note deletions and octave pitch shifts, into output reduction scores ranging from an "easy" reduction score to a "hard" reduction score along the axes of difficulty and musical fidelity.)

Affiliations: ¹ Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan (Eita Nakamura, Kazuyoshi Yoshii); ² RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan (Kazuyoshi Yoshii). Corresponding author: Eita Nakamura. Email: enakamura@sap.ist.i.kyoto-u.ac.jp

To computationally judge whether a musical score is playable, previous studies have developed conditions on the pitch and rhythmic content. For piano scores, conditions such as 'there can be at most 5 simultaneous notes for each hand' and 'simultaneous pitch spans for each hand must be less than 14 semitones (or so)' have been considered [2, 3]. However, these conditions are neither necessary nor sufficient for playable scores: a piano score can contain chords with more than 5 notes and/or spanning a large pitch interval that are conventionally played as broken chords, and even scores without chords (melodies) can be unplayable at fast tempos. In fact, it is difficult to find a complete description of playable scores that is valid in every situation, because the condition depends on the player's skill and can change continuously with the tempo. A possible solution is to quantify performance difficulty and use it as an indicator of playability in each situation (skill level, tempo, etc.) [11, 12].

As there is generally a trade-off between performance difficulty and musical fidelity to the original score, it is necessary to quantify musical fidelity and develop an optimization method. Music arrangers remove notes and shift pitches in an ensemble score so that the piano reduction score matches a target difficulty level [2, 4]. From a statistical point of view, one can assign probabilities to these edit operations and use them to quantify musical fidelity. Following the analogy with statistical machine translation [13], if one can construct a model for the probability $P(R \mid E)$ of a reduction score $R$ given an ensemble score $E$, the piano reduction problem can be formulated as optimization of $P(R \mid E)$ under constraints on difficulty values. A similar approach without control of performance difficulty has been studied for guitar arrangement [7, 8].

To realize this idea, a statistical-modelling approach for piano reduction that can control performance difficulty was proposed in a recent conference paper [4]. Following the idea that fingering motion is closely related to the cost or difficulty of performance [14–16], quantitative measures of performance difficulty were developed based on a probabilistic generative model of piano scores incorporating fingering motion [17, 18]. To estimate the probability $P(R \mid E)$, a hidden Markov model (HMM) integrating the piano-score model and a model representing how ensemble scores are likely to be edited was constructed, and a piano reduction algorithm was developed based on the Viterbi algorithm. While the potential of the method was suggested by the results of piano reduction for one example piece, formal evaluations and comparisons with other approaches were left for future work. The optimization method also had a problem: owing to the limitation of the Viterbi algorithm, the upper-bound constraints on difficulty values were often not properly satisfied.

In this study, we extend the work of [4] and propose an improved piano reduction method using iterative optimization. We also carry out systematic evaluations of the difficulty measures and the piano reduction method. In particular, we evaluate difficulty measures in terms of their ability to predict performance errors, which is, to our knowledge, the first attempt in the literature to objectively evaluate performance difficulty measures. Piano reduction methods are evaluated both objectively and subjectively: an objective evaluation is conducted to examine the effect of the iterative optimization strategy, and a subjective evaluation is conducted to assess the quality of the generated reduction scores.
The main results are:
• The proposed difficulty measures can be used as indicators of performance errors, and measures incorporating the sequential nature of piano scores can better predict performance errors.
• The proposed iterative optimization method yields better control of difficulty than the method in [4].
• Both subjective difficulty and musical fidelity of generated reduction scores monotonically increase with controlled difficulty values.
• By comparing methods based on different models, it is shown that incorporating sequential dependence of pitches and fingering motion in the piano-score model improves musical naturalness and lowers the rate of unplayable notes of reduction scores in high-difficulty cases.

The following are limitations of the current system:
• Melodic and bass notes are manually indicated.
• Score typesetting, especially estimation of voices within each hand part, is currently done manually.

Automating these processes is an important direction for future work; see section IV.D for a discussion.

The rest of the paper is organized as follows. In the next section, we discuss generative piano-score models and performance difficulty measures. In section III, we present our method for piano reduction. In section IV, we present and discuss the results of evaluating the piano reduction method. We conclude the paper in the last section.

II. QUANTITATIVE MEASURES OF PERFORMANCE DIFFICULTY

We formulate quantitative performance difficulty measures based on probabilistic generative models of piano scores. A generative model incorporating piano fingering and simpler models are described in section II.A, and performance difficulty measures are discussed in section II.B.

A) Generative Models for Piano Scores

1) Models for One Hand

Let us first discuss models for one hand.
A piano score is represented as a sequence of pitches $p_{1:N} = (p_n)_{n=1}^{N}$ and corresponding onset times $t_{1:N} = (t_n)_{n=1}^{N}$, where $N$ is the number of musical notes. A generative model for piano scores (piano-score model) is here defined as a model that yields the probability $P(p_{1:N})$.

Simple piano-score models can be constructed based on the Markov model. The probability $P(p_{1:N})$ is factorized into an initial probability $P(p_1)$ and transition probabilities $P(p_n \mid p_{n-1})$ as

$$P(p_{1:N}) = P(p_1) \prod_{n=2}^{N} P(p_n \mid p_{n-1}). \tag{1}$$

The simplest model is obtained by assuming that the initial and transition probabilities obey a uniform distribution over pitches. Writing $N_p = 88$ for the number of possible pitches, the model yields $P(p_{1:N}) = (1/N_p)^N$. Since this model yields the same probability for any piano score of the same length, it is here called a no-information model.

A more realistic model can be built by incorporating sequential dependence of pitches. For example, a statistical tendency called pitch proximity, that successive pitches tend to be close to each other, can be incorporated in initial and transition probabilities described with Gaussians:

$$P(p_1 = p) \propto \mathrm{Gauss}(p; p_0, \sigma_p^2) + \epsilon, \tag{2}$$
$$P(p_n = p \mid p_{n-1} = p') \propto \mathrm{Gauss}(p; p', \sigma_p^2) + \epsilon. \tag{3}$$

Here, $\mathrm{Gauss}(\cdot; \mu, \sigma^2)$ denotes a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$, $p_0$ is a reference pitch that defines the initial probability, and $\epsilon$ is a small positive constant that smooths the probability of pitch transitions with a large leap. We call this model a Gaussian model.

Although the Gaussian model can capture the tendency of pitch proximity, the simplification can lead to unrealistic consequences.
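A minimal Python sketch of Eqs. (1)-(3), for illustration only. It assumes pitches are MIDI note numbers 21-108 (the 88 piano keys); the value of $\sigma_p$ and the reference pitch 60 are hypothetical, since this section does not give them, while $\epsilon = 4 \times 10^{-4}$ is the value used in section II.B.2. The last lines check that stepwise motion is preferred to large leaps, and they also exhibit the shortcoming noted next: a 10-semitone leap receives a higher probability than an octave.

```python
import math

N_P = 88               # number of possible pitches (N_p in the text)
KEYS = range(21, 109)  # assumption: MIDI note numbers A0 (21) .. C8 (108)

def log_prob_no_information(pitches):
    """ln P(p_{1:N}) = N ln(1/N_p) under the no-information model."""
    return len(pitches) * math.log(1.0 / N_P)

def gauss(x, mean, sigma):
    """Gaussian density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def smoothed_gaussian(mean, sigma, eps):
    """Distribution over KEYS proportional to Gauss(p; mean, sigma^2) + eps (Eqs. (2)-(3))."""
    weights = {p: gauss(p, mean, sigma) + eps for p in KEYS}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}

def log_prob_gaussian(pitches, p0, sigma, eps):
    """ln P(p_{1:N}) for the first-order Markov (Gaussian) model of Eq. (1)."""
    logp = math.log(smoothed_gaussian(p0, sigma, eps)[pitches[0]])
    for prev, cur in zip(pitches, pitches[1:]):
        logp += math.log(smoothed_gaussian(prev, sigma, eps)[cur])
    return logp

SIGMA_P, EPS = 4.0, 4e-4  # SIGMA_P is a hypothetical value

step = smoothed_gaussian(60, SIGMA_P, EPS)
print(step[70] > step[72])  # True: a 10-semitone leap outscores an octave
print(log_prob_gaussian([60, 62, 64, 65], 60, SIGMA_P, EPS)
      > log_prob_gaussian([60, 84, 48, 72], 60, SIGMA_P, EPS))  # True
```

The transition table is recomputed at every step for clarity; a practical implementation would cache it per previous pitch.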
First, pitch transitions involving 10 or 11 semitones have higher probabilities than octave motions, which contradicts reality [18]. Second, since the model does not distinguish white keys and black keys, it yields the same probability for piano scores transposed to any key, which contradicts the fact that "simpler keys" involving fewer black keys are more frequently used. In general, the difficulty or naturalness of a piano score changes when it is transposed to another key, since the geometry of the piano keyboard requires different fingering motions [11]. To solve this, it is necessary to construct a model that describes fingering motions in addition to pitch transitions.

Fig. 2. Piano-score model incorporating fingering motion. (Figure: a piano-score excerpt with the fingering motion 3 4 3 2 3 2 1 2 3 4 5 4 3 2 1, fingers numbered 1 to 5; fingers are generated by $P(f_n \mid f_{n-1})$ and pitches by $P(p_n \mid p_{n-1}, f_{n-1}, f_n)$.)

A model (called fingering model) incorporating fingering motions and the geometry of the piano keyboard has been proposed in [18]. The model is based on an HMM, which was first applied to piano fingering in [17]. In general, we can introduce a stochastic variable $f_n$ representing the finger used to play the $n$th note. The variable $f_n$ takes one of the following five values: 1 = thumb, 2 = index finger, ..., 5 = little finger¹. According to the model (Fig. 2), a fingering motion $f_{1:N} = (f_n)_{n=1}^{N}$ is first generated by an initial probability $P(f_1)$ and transition probabilities $P(f_n \mid f_{n-1})$.
Next, a pitch sequence $p_{1:N}$ is generated conditionally on $f_{1:N}$: the first pitch is generated by $P(p_1 \mid f_1)$, and the succeeding pitches are generated by $P(p_n \mid p_{n-1}, f_{n-1}, f_n)$, which describes the probability that a pitch appears following the previous pitch given the previous and current fingers. Thus, the joint probability of pitches and fingering motion is given as

$$P(p_{1:N}, f_{1:N}) = P(f_1)\, P(p_1 \mid f_1) \prod_{n=2}^{N} P(f_n \mid f_{n-1})\, P(p_n \mid p_{n-1}, f_{n-1}, f_n).$$

¹ In this study, we do not consider the possibility of finger substitutions where two or more fingers are assigned to a note.

In general, the parameters of the fingering model can be learned from music data with pitches and annotated fingerings. For want of a sufficient amount of data, however, the probability $P(p_n \mid p_{n-1}, f_{n-1}, f_n)$, which has $88^2 \cdot 5^2$ parameters, cannot be trained effectively in a direct way. We thus introduce simplifying assumptions to reduce the number of parameters. First, we assume that the probability depends on pitches only through their geometrical positions on the keyboard (Fig. 2). The coordinate on the keyboard of a pitch $p$ is represented as $\ell(p) = (\ell_x(p), \ell_y(p))$. We also assume translational symmetry in the $x$-direction and time-inversion symmetry, which are expressed as

$$\begin{aligned} P(p_n = p \mid p_{n-1} = p', f_{n-1} = f', f_n = f) &= F(\ell_x(p) - \ell_x(p'), \ell_y(p) - \ell_y(p'); f', f) \\ &= F(\ell_x(p') - \ell_x(p), \ell_y(p') - \ell_y(p); f, f'). \end{aligned} \tag{4}$$

We also assume reflection symmetry between the left and right hands. The above model can be extended to include chords by sequencing the contained notes from low pitch to high pitch [16]. With the fingering model, one can estimate the fingering $f_{1:N}$ from a given sequence of pitches $p_{1:N}$ by calculating the maximum of the probability $P(f_{1:N} \mid p_{1:N}) \propto P(p_{1:N}, f_{1:N})$.
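For one hand, this maximization is a standard dynamic-programming problem. The Python sketch below runs it on a deliberately simplified, hypothetical stand-in for the fingering model, not the trained model of [18]: $P(f_1)$, $P(p_1 \mid f_1)$, and $P(f_n \mid f_{n-1})$ are uniform; the keyboard coordinates $\ell(p)$ are replaced by the plain semitone difference (so black and white keys are not distinguished); and the pitch-step probability simply expects about 2 semitones per finger of movement, as for a right hand. It only illustrates the search.

```python
import math

FINGERS = (1, 2, 3, 4, 5)  # 1 = thumb, ..., 5 = little finger
N_KEYS = 88
SIGMA = 2.0                # hypothetical spread of the pitch step, in semitones

def log_step(delta, f_prev, f_cur):
    """Hypothetical stand-in for ln P(p_n | p_{n-1}, f_{n-1}, f_n), right hand only:
    the step delta = p_n - p_{n-1} is centred on 2 semitones per finger moved,
    discretized and normalized over all possible steps on an 88-key keyboard."""
    mu = 2.0 * (f_cur - f_prev)

    def score(d):
        return -0.5 * ((d - mu) / SIGMA) ** 2

    log_z = math.log(sum(math.exp(score(d)) for d in range(-(N_KEYS - 1), N_KEYS)))
    return score(delta) - log_z

def viterbi_fingering(pitches):
    """Return (argmax_f P(p_{1:N}, f_{1:N}), its log probability) for one hand."""
    log_uniform_f = math.log(1.0 / len(FINGERS))  # uniform P(f_1) and P(f_n | f_{n-1})
    # ln P(f_1) + ln P(p_1 | f_1), both uniform in this sketch.
    best = {f: log_uniform_f + math.log(1.0 / N_KEYS) for f in FINGERS}
    backpointers = []
    for prev_p, cur_p in zip(pitches, pitches[1:]):
        delta = cur_p - prev_p
        new_best, pointer = {}, {}
        for f in FINGERS:
            # Best predecessor finger g for current finger f.
            scores = {g: best[g] + log_uniform_f + log_step(delta, g, f) for g in FINGERS}
            g_star = max(scores, key=scores.get)
            new_best[f], pointer[f] = scores[g_star], g_star
        best = new_best
        backpointers.append(pointer)
    # Trace the optimal path backwards from the best final finger.
    last = max(best, key=best.get)
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1], best[last]

print(viterbi_fingering([60, 62, 64, 66, 68])[0])  # [1, 2, 3, 4, 5]
```

Because every factor is evaluated in the log domain, long sequences do not underflow; the cost is $O(N \cdot 5^2)$ step evaluations.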
This maximization can be computed by the Viterbi algorithm [19].

2) Models for Both Hands

A piano-score model with left- and right-hand parts can be obtained by first constructing a model for each hand part and then combining the two models. If musical notes are already assigned to the two hand parts, such a combined model can be obtained directly. On the other hand, if the part assignment is not given, as in the piano reduction problem, the model should be able to describe the probability for all cases of part assignment.

Such a model for piano music with unknown hand parts can be constructed based on the merged-output HMM [18, 20]. The idea is to combine outputs from two component Markov models or HMMs respectively describing the two hand parts. We here describe a model combining two fingering models. First, the hand part (left or right) associated with a note $p_n$ is represented by an additional stochastic variable $\eta_n \in \{L, R\}$. The generative process of $\eta_n$ is described with a Bernoulli distribution: $P(\eta_n = \eta) = \alpha_\eta$ with $\alpha_L + \alpha_R = 1$. Once $\eta_n$ is determined, the pitch is generated by the corresponding component model. For each $\eta \in \{L, R\}$, let $a^{\eta}_{f'f} = P_\eta(f \mid f')$ and $b^{\eta}_{f'f}(p', p) = F_\eta(\ell(p) - \ell(p'); f', f)$ denote the fingering and pitch transition probabilities of the component model. This process can be described as an HMM with a state space indexed by $k = (\eta, f_L, p_L, f_R, p_R)$ with the following transition and output probabilities:

$$P(k_n = k \mid k_{n-1} = k') = \begin{cases} \alpha_L\, a^{L}_{f'_L f_L}\, b^{L}_{f'_L f_L}(p'_L, p_L)\, \delta_{f'_R f_R}\, \delta_{p'_R p_R}, & \eta = L; \\ \alpha_R\, a^{R}_{f'_R f_R}\, b^{R}_{f'_R f_R}(p'_R, p_R)\, \delta_{f'_L f_L}\, \delta_{p'_L p_L}, & \eta = R, \end{cases}$$
$$P(p_n = p \mid k_n = k) = \delta_{p\, p_\eta}, \tag{5}$$

where $\delta$ denotes Kronecker's delta. Using this model, one can estimate the sequence of latent variables $k_{1:N}$ from a pitch sequence $p_{1:N}$.
This can be done by maximizing the probability $P(k_{1:N} \mid p_{1:N}) \propto P(k_{1:N}, p_{1:N})$. The most probable sequence $\hat{k}_{1:N}$ contains the optimal hand assignment $\hat{\eta}_{1:N}$, which yields the two separated hand parts, and the optimal fingering for both hands ($\hat{f}^{L}_{1:N}$ and $\hat{f}^{R}_{1:N}$). For more details, see [18]. The Gaussian model and the no-information model can be similarly extended to models for both hands.

B) Performance Difficulty

1) Difficulty Measures

One can define a quantitative measure of performance difficulty based on the cost of music performance. From the statistical viewpoint, a natural choice is the probabilistic cost, which is the negative logarithm of a probability. To include the dependence on tempo, we define performance difficulty as the time rate of the probabilistic cost:

$$D(t) = -\ln P(p(t)) / \Delta t. \tag{6}$$

Here, $\Delta t$ is a time width, $p(t)$ is the sequence of pitches in the time range $[t - \Delta t/2, t + \Delta t/2]$, and $P(p(t))$ is defined with one of the piano-score models in section II.A. With the fingering model, one can use the joint probability of pitches and fingering to define a difficulty measure [18]:

$$D(t) = -\ln P(p(t), f(t)) / \Delta t, \tag{7}$$

where $f(t)$ denotes the fingering corresponding to the pitches $p(t)$. If the fingering is unknown, one can substitute the maximum-probability estimate $\hat{f}(t)$ in Eq. (7). For each note $n$ with onset time $t_n$, we write $D(n) = D(t_n)$.

The difficulty measure can be defined for each hand part using the pitches in that hand part and a piano-score model for one hand; it is denoted by $D_L(t)$ or $D_R(t)$. In addition, the total difficulty can be defined as the sum of the difficulties for both hands: $D_B(t) = D_L(t) + D_R(t)$.
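The following Python sketch evaluates Eq. (6) at note onsets for each hand and sums the results into $D_B$. It assumes notes are given as (onset in seconds, MIDI pitch) pairs and defaults to the no-information model, for which $-\ln P(p(t))$ is simply the number of notes in the window times $\ln 88$; the log-probability function of any one-hand model (e.g. the Gaussian model) could be passed in instead. Notes with equal onsets are ordered from low to high pitch, mirroring the chord convention of section II.A.1. The function names are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Note = Tuple[float, int]  # (onset time in seconds, MIDI pitch)
LogProb = Callable[[Sequence[int]], float]

def no_information_log_prob(pitches: Sequence[int]) -> float:
    """ln P(p) = -N ln 88 under the no-information model."""
    return -len(pitches) * math.log(88)

def difficulty_at(notes: Sequence[Note], t: float, dt: float = 1.0,
                  log_prob: LogProb = no_information_log_prob) -> float:
    """D(t) = -ln P(p(t)) / dt, Eq. (6); p(t) holds the pitches with onsets in
    [t - dt/2, t + dt/2], ordered by onset and, within a chord, from low to high."""
    window = sorted(note for note in notes if t - dt / 2 <= note[0] <= t + dt / 2)
    return -log_prob([pitch for _, pitch in window]) / dt

def hand_difficulties(left: Sequence[Note], right: Sequence[Note], dt: float = 1.0,
                      log_prob: LogProb = no_information_log_prob
                      ) -> Dict[float, Tuple[float, float, float]]:
    """(D_L, D_R, D_B = D_L + D_R) at every onset time of either hand."""
    onsets = sorted({t for t, _ in list(left) + list(right)})
    result = {}
    for t in onsets:
        d_left = difficulty_at(left, t, dt, log_prob)
        d_right = difficulty_at(right, t, dt, log_prob)
        result[t] = (d_left, d_right, d_left + d_right)
    return result

right = [(0.0, 60), (0.25, 62), (0.5, 64), (0.75, 65)]
left = [(0.0, 48)]
print(hand_difficulties(left, right)[0.0])
```

With the no-information model, packing the same notes into shorter inter-onset intervals puts more of them into each window and raises $D(t)$, which is how even this simplest measure reflects tempo.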
The quantity $D_B(t)$ is relevant in addition to $D_L(t)$ and $D_R(t)$, since the overall difficulty can be high even if the difficulties of the individual hand parts are not so high.

In previous studies [11, 12], features such as playing speed, note density, pitch entropy, hand displacement rate, hand stretch, and fingering complexity have been considered to estimate the difficulty level of piano scores. These features are incorporated in the above difficulty measures, although implicitly. With the no-information model, the difficulty measure takes into account note density and playing speed. With the Gaussian model, pitch entropy, hand displacement rate, and hand stretch are additionally incorporated. With the fingering model, fingering complexity is further incorporated.

2) Evaluation

To formally examine how the proposed measures reflect real performance difficulty, we study their relation with the rate of performance errors. We use a dataset [21] consisting of 90 MIDI piano performance signals of 30 classical musical pieces; for each piece there are performances by three different players, recorded at international piano competitions. In the dataset, musical notes in a performance signal are matched to notes in the corresponding score, and the following three types of performance errors are manually annotated: pitch error (a performed note with a corresponding note in the score but with a different pitch); extra note (a performed note without a corresponding note in the score); and missing note (a note in the score without a corresponding note in the performance). Timing errors are not annotated in the data and are not considered in this study.

Fig. 3.
Relations between the difficulty value $D_B$ and the number of performance errors for the no-information model, the Gaussian model, and the fingering model (horizontal axes: difficulty value; vertical axes: number of errors). Points and bars indicate means and standard deviations. Arrows indicate onsets of performance errors (see text).

We calculate performance difficulty values for each onset time and count the number of performance errors in the time range of width $\Delta t$ around the onset time. In the following, we set $\Delta t$ to 1 s. For the Gaussian model, $\epsilon = 4 \times 10^{-4}$ and $p_0$ is C3 (C5) for the left (right) hand. Other parameters of the Gaussian and fingering models are taken from a previous study [18], where a different dataset was used for training.

Fig. 3 shows the relation between the difficulty value $D_B$ and the rate of performance errors for the three models. We see that for each model there is an onset (roughly 10 for the no-information model, 30 for the Gaussian model, and 40 for the fingering model) below which the average number of errors is almost zero and above which it gradually increases. This suggests that the difficulty measures can be used as indicators of performance errors.

For comparative evaluation, we predict performance errors by thresholding the difficulty values and measure the predictive accuracy for the three models. Using three thresholds $D^{\mathrm{th}}_L$, $D^{\mathrm{th}}_R$, and $D^{\mathrm{th}}_B$, a prediction of performance errors at time $t$ is defined as positive if at least one of the three conditions $D_L(t) > D^{\mathrm{th}}_L$, $D_R(t) > D^{\mathrm{th}}_R$, and $D_B(t) > D^{\mathrm{th}}_B$ is satisfied. We count the number of true positives $N_{\mathrm{TP}}$, false positives $N_{\mathrm{FP}}$, and false negatives $N_{\mathrm{FN}}$, and use the following quantities as evaluation measures:

$$P = \frac{N_{\mathrm{TP}}}{N_{\mathrm{TP}} + N_{\mathrm{FP}}}, \quad R = \frac{N_{\mathrm{TP}}}{N_{\mathrm{TP}} + N_{\mathrm{FN}}}, \quad F = \frac{2PR}{P + R}.$$

Since more frequent errors indicate larger difficulty, we can also define the following weighted quantities:

$$P_w = \frac{N'_{\mathrm{TP}}}{N'_{\mathrm{TP}} + N_{\mathrm{FP}}}, \quad R_w = \frac{N'_{\mathrm{TP}}}{N'_{\mathrm{TP}} + N'_{\mathrm{FN}}}, \tag{8}$$
$$F_w = \frac{2 P_w R_w}{P_w + R_w}, \tag{9}$$

where $N'_{\mathrm{TP}}$ and $N'_{\mathrm{FN}}$ are obtained by weighting $N_{\mathrm{TP}}$ and $N_{\mathrm{FN}}$ with the number of performance errors. The results are shown in Table 1, where the thresholds are optimized with respect to $F_w$ for each model.

Table 1. Accuracies of performance error prediction (%).

Model            Thresholds (D^th_R, D^th_L, D^th_B)   F      P      R      F_w    P_w    R_w
No-information   (9, 10, 14)                           52.4   43.0   67.1   69.8   63.0   78.1
Gaussian         (30, 30, 42)                          54.2   46.3   65.2   71.3   66.4   77.0
Fingering        (41, 39, 53)                          53.9   49.1   59.8   70.6   69.3   73.8

We see that the Gaussian model has the highest F measures, although the differences are rather small. A possible reason is the relatively small size of the data used for training the fingering model. Since the Gaussian model has only one parameter, $\sigma_p$, to train, it generalizes better from such small training data. Such a trade-off between model complexity and the required amount of training data is common in many machine-learning problems. We thus use the difficulty measures defined with the Gaussian model in the following.

III. PIANO REDUCTION METHOD

In the statistical formulation of piano reduction, we try to find the optimal reduction score $\hat{R}$ that maximizes the probability $P(R \mid E)$ for a given ensemble score $E$. In analogy with the statistical approach for machine translation [13], we first construct generative models describing the probabilities $P(R)$ and $P(E \mid R)$, respectively, and integrate them to calculate $P(R, E) \propto P(R \mid E)$. We then derive optimization algorithms for piano reduction that take into account the constraints on performance difficulty values.
Prior to the main processing step, we convert an input ensemble score to a condensed score by removing redundant notes with the same pitch and simultaneous onset time (the number of such redundant notes is stored and used later in the calculation of Eq. (13)). Although the two are strictly different, we call such a condensed score an ensemble score in what follows, and the symbol $E$ also denotes a condensed score.

A) Model for Piano Reduction

To construct a generative model that yields the probability $P(R, E)$, we integrate a piano-score model describing the probability $P(R)$ and an edit model describing the process that yields $P(E \mid R)$. As a piano-score model, we can use either the Gaussian model or the fingering model discussed in section II.A.2, which statistically describes the naturalness of a generated (reduction) score.

Fig. 4. Generative process of the model for piano reduction. (Figure: components labelled "Model for left-hand part", "Model for right-hand part", and "Model for not-played notes" are selected by $\xi \in \{L, R, NP\}$; octave pitch shifts map the piano (reduction) score $R$ to the ensemble (condensed) score $E$.)

For the edit model, we assign probabilities to edit operations applied to musical notes. As in [4], we focus on the two most common edit operations, note deletion and octave pitch shift. As we model the inverse process of generating an ensemble score from a piano score, we introduce probabilities of note addition and octave pitch shift in the edit model. For each note in the ensemble score, the probability that it is an added note, not originating from the piano score, is denoted by $\beta_{\mathrm{NP}}$ ('NP' for not played). In this case, the note's pitch $p$ is drawn from a uniform distribution $c_{\mathrm{unif}}(p)$.
If the note originates from the piano score and the corresponding note has pitch $q$, the probability of the note's pitch $p$, denoted by $c_q(p) = P(p \mid q)$, is assumed to obey

$$c_q(p) = \begin{cases} 1 - 2\gamma_{\mathrm{oct}}, & p = q; \\ \gamma_{\mathrm{oct}}, & p = q \pm 12, \end{cases} \tag{10}$$

where $\gamma_{\mathrm{oct}}$ denotes the probability of an octave shift.

We can integrate the fingering model in section II.A.2 and the edit model in the following fashion based on the merged-output HMM (Fig. 4), which leads to tractable inference algorithms. For each note in the output ensemble score, indexed by $m$, we introduce a stochastic variable $\xi_m$ that takes one of three values $\{\mathrm{NP}, L, R\}$. It is generated from a discrete distribution as $P(\xi_m = \xi) = \beta_\xi$, where the parameters obey $\beta_{\mathrm{NP}} + \beta_L + \beta_R = 1$. If $\xi_m = \mathrm{NP}$, its pitch $p_m$ has probability $P(p_m = p) = c_{\mathrm{unif}}(p)$. If $\xi_m = L$ or $R$, its pitch is generated from the component fingering model of the corresponding hand part and may undergo an octave shift. Writing $f_L$, $p_L$, $f_R$, and $p_R$ for the finger and pitch variables of the two component fingering models, the latent state of the merged-output HMM is described by a set of variables $r = (\xi, f_L, p_L, f_R, p_R)$. The transition and output probabilities are defined as

$$P(r_m = r \mid r_{m-1} = r') = \begin{cases} \beta_{\mathrm{NP}}\, \delta_{f_L f'_L}\, \delta_{f_R f'_R}\, \delta_{p_L p'_L}\, \delta_{p_R p'_R}, & \xi = \mathrm{NP}; \\ \beta_L\, a^{L}_{f'_L f_L}\, b^{L}_{f'_L f_L}(p'_L, p_L)\, \delta_{f_R f'_R}\, \delta_{p_R p'_R}, & \xi = L; \\ \beta_R\, a^{R}_{f'_R f_R}\, b^{R}_{f'_R f_R}(p'_R, p_R)\, \delta_{f_L f'_L}\, \delta_{p_L p'_L}, & \xi = R, \end{cases}$$
$$P(p_m = p \mid r_m = r) = \begin{cases} c_{\mathrm{unif}}(p), & \xi = \mathrm{NP}; \\ c_{p_L}(p), & \xi = L; \\ c_{p_R}(p), & \xi = R. \end{cases} \tag{11}$$

Indeed, the model generates both a piano score specified by $(p^L_m, p^R_m)_m$ and an ensemble score specified by $(p_m)_m$. In the process of piano reduction, which is explained in the next section, the parameter $\beta_{\mathrm{NP}}$ represents how many notes in the ensemble score are removed.
Thus, properly adjusting $\beta_{\mathrm{NP}}$ is crucial to control the performance difficulty of the resulting reduction scores. Roughly speaking, if the note density around a note is high, it is necessary to remove more notes around that note by setting $\beta_{\mathrm{NP}}$ large. In addition, some notes, such as melodic notes and bass notes, are musically more important than others and should have a small probability of deletion, i.e. a small $\beta_{\mathrm{NP}}$ in the present model. These conditions can be realized in the following form of $\beta_{\mathrm{NP}}(m)$, which depends on each note $m$:

$$\beta_{\mathrm{NP}}(m) = \bigl(1 - \zeta(m)\bigr)\, e^{-\kappa h(m)}, \tag{12}$$

where $h(m) \ge 0$ represents the musical importance of note $m$, $\kappa > 0$ is a coefficient to control the effect of $h(m)$, and $\zeta(m) \in [0, 1]$ is a factor to control the overall rate of note deletion. If $\zeta(m) \simeq 1$, then $\beta_{\mathrm{NP}}(m) \simeq 0$ and almost all notes remain in the reduction score. If $\zeta(m) \simeq 0$, then $\beta_{\mathrm{NP}}(m) \simeq 1$ unless $\kappa h(m)$ is large (i.e. note $m$ is musically important), so most musically unimportant notes will be removed.

In addition to the importance of melodic and bass notes, it is natural to expect that pitches in an ensemble score that are played simultaneously by multiple instruments are musically important. Thus, the following form is used to define the musical importance $h(m)$:

$$h(m) = I(m \in \mathcal{M}) + I(m \in \mathcal{B}) + a\, \mathrm{Mult}(m), \tag{13}$$

where $I(C) = 1$ if a condition $C$ is true and 0 otherwise, $\mathcal{M}$ denotes the set of melodic notes, $\mathcal{B}$ denotes the set of bass notes, and $\mathrm{Mult}(m)$ is the multiplicity of note $m$, defined as the number of notes in the ensemble score having the same pitch and onset time as note $m$, excluding $m$ itself. The parameters $\kappa$ and $a$ are adjustable, and $\zeta(m)$ is adjusted according to target difficulty values as explained in the next section.

B) Algorithms for Piano Reduction

Let us derive algorithms for piano reduction based on the model in section III.A and the difficulty measures in section II.B.
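Because the algorithms below operate by adjusting $\zeta(m)$, it may help to see Eqs. (12) and (13) spelled out. This is a minimal Python sketch: $a = 0.01$ and $\kappa = 11$ are the values reported in section IV.A, the even split of the remaining probability between the hands follows the setting $\beta_R(m) = \beta_L(m) = (1 - \beta_{\mathrm{NP}}(m))/2$ given there, and the function names are hypothetical.

```python
import math

def importance(is_melody: bool, is_bass: bool, multiplicity: int, a: float = 0.01) -> float:
    """h(m) = I(m in M) + I(m in B) + a * Mult(m), Eq. (13)."""
    return float(is_melody) + float(is_bass) + a * multiplicity

def deletion_prob(zeta: float, h: float, kappa: float = 11.0) -> float:
    """beta_NP(m) = (1 - zeta(m)) * exp(-kappa * h(m)), Eq. (12)."""
    if not 0.0 <= zeta <= 1.0:
        raise ValueError("zeta(m) must lie in [0, 1]")
    return (1.0 - zeta) * math.exp(-kappa * h)

def state_probs(beta_np: float) -> dict:
    """(beta_NP, beta_L, beta_R) with beta_L = beta_R = (1 - beta_NP) / 2."""
    rest = (1.0 - beta_np) / 2.0
    return {"NP": beta_np, "L": rest, "R": rest}

# Same control factor, different importance: an accompaniment note doubled
# by three other instruments vs. a melodic note.
accompaniment = deletion_prob(0.2, importance(False, False, 3))
melody = deletion_prob(0.2, importance(True, False, 0))
print(accompaniment > melody)  # True
```

Since $e^{-11} \approx 1.7 \times 10^{-5}$, melodic and bass notes are essentially never deleted, whereas $\zeta(m)$ directly governs the deletion rate of unmarked notes.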
The piano reduction problem is here formulated as finding a reduction score $R$ that maximizes $P(R \mid E)$ for a given ensemble score $E$ under constraints on $R$'s performance difficulty values. Specifically, we impose the following constraints for each note $n$ in $R$:

$$[D_L(n) < \tilde{D}_L] \wedge [D_R(n) < \tilde{D}_R] \wedge [D_B(n) < \tilde{D}_B], \tag{14}$$

where $\tilde{D}_L$, $\tilde{D}_R$, and $\tilde{D}_B$ are target difficulty values. Without the constraints (14), finding the maximum of $P(R \mid E)$ is a basic inference problem for HMMs and can be solved with the Viterbi algorithm [19]. However, the constraints (14) cannot be easily treated, because the difficulty value at each note depends on the existence of other notes in the time range of $\Delta t$, which violates the Markovian assumption of the Viterbi algorithm. In other words, if we knew appropriate values of $\zeta(m)$ in Eq. (12) for controlling difficulty values, the optimization problem would be directly solvable, but finding those values is not easy.

In the following, we present two strategies for optimization. In a previous study [4], appropriate values of $\zeta(m)$ were estimated and the Viterbi algorithm was applied once to obtain the result. A slight extension of this one-time optimization method is presented in section III.B.1. On the other hand, if one can apply the Viterbi algorithm iteratively, it is possible to find appropriate values of $\zeta(m)$ from tentative results, by starting from $\zeta(m) = 1$ and gradually decreasing it. This iterative optimization method is developed in section III.B.2.

1) One-time optimization algorithm

In [4], appropriate values of $\zeta(m)$ were estimated by matching the expected difficulty values to the target values with the following equation:

$$\zeta(m) = \min\left( \frac{\tilde{D}_L}{D_L(m)}, \frac{\tilde{D}_R}{D_R(m)} \right), \tag{15}$$

where $D_L(m)$ etc. represent the difficulty values calculated for the ensemble score at its $m$th note.
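Eq. (15) can be sketched per note in Python as below. Two details are assumptions of this sketch rather than statements of [4]: the result is clipped to 1 so that $\zeta(m)$ stays in $[0, 1]$ as required by Eq. (12), and a hand with zero difficulty at note $m$ imposes no constraint (its ratio is treated as infinite). The function name and the example values are hypothetical.

```python
import math

def one_time_zeta(d_left: float, d_right: float,
                  target_left: float, target_right: float) -> float:
    """zeta(m) = min(target_L / D_L(m), target_R / D_R(m)), Eq. (15).

    Clipping to 1 and treating zero difficulty as 'no constraint' are
    assumptions of this sketch."""
    ratios = [
        target / d if d > 0 else math.inf
        for target, d in ((target_left, d_left), (target_right, d_right))
    ]
    return min(1.0, min(ratios))

# Hypothetical per-note ensemble difficulties (D_L(m), D_R(m)), targets 30 and 30.
ensemble_difficulties = [(60.0, 30.0), (10.0, 10.0), (0.0, 40.0)]
print([one_time_zeta(dl, dr, 30.0, 30.0) for dl, dr in ensemble_difficulties])  # [0.5, 1.0, 0.75]
```

A note whose surrounding ensemble texture is twice as difficult as the target thus gets $\zeta(m) = 0.5$, which, through Eq. (12), makes unimportant notes near it roughly even odds for deletion.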
In general, a factor involving $D_B(m)$ can also be included in Eq. (15). We generalize this method by introducing a scaling factor $\rho$ and modifying Eq. (15) to

$$\zeta(m) = \rho \min\left(\frac{\tilde{D}_L}{D_L(m)},\, \frac{\tilde{D}_R}{D_R(m)}\right). \tag{16}$$

By choosing the value of $\rho$, one can control the expected average of the resulting difficulty values. For example, one can use the maximum value of $\rho$ for which the constraints (14) are satisfied for most outcomes.

2) Iterative optimization algorithm

For iterative optimization, the Viterbi algorithm is applied in each iteration to obtain a tentative reduction score $R^{(i)}$ with tentative values $\zeta^{(i)}(m)$, where $i$ indexes the iterations. For each note $n$ in $R^{(i)}$, we calculate the difficulty values $D_L^{(i)}(n)$, $D_R^{(i)}(n)$, and $D_B^{(i)}(n)$. If the constraints (14) are not all satisfied at note $n$, we decrease the values of $\zeta(m)$ for all notes $m$ in the ensemble score within a time window of width $\Delta t$ around $n$ as

$$\zeta^{(i+1)}(m) = \lambda\, \zeta^{(i)}(m) \tag{17}$$

with a constant $0 < \lambda < 1$. The iterative algorithm is initialized with $\zeta^{(1)}(m) = 1$ for all notes $m$. The algorithm ends when the constraints (14) are satisfied at every note in the reduction score or when the number of iterations exceeds a predefined value $i_{\max}$.

For efficient and stable computation, the Viterbi algorithm at iteration $i + 1$ is applied only to those regions of the ensemble score where the constraints (14) are still not satisfied at iteration $i$. Specifically, we first construct the set of notes $m$ in the ensemble score whose onset time $t_m$ lies in the range $[t_n - \Delta t/2,\, t_n + \Delta t/2]$ around some onset time $t_n$ in the reduction score at which the difficulty constraints are not satisfied. This set is then split into a set $\Psi$ of isolated regions of notes. For each such region, the Viterbi algorithm is applied with fixed boundary states at the note just before the beginning of the region and the note just after its end.
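The iterative procedure described above (and summarized as steps (i)–(iii) below) can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: the Viterbi step is abstracted as a hypothetical `decode` callback, the difficulty check as a hypothetical `violating_onsets` callback, ensemble notes are assumed sorted by onset, and "isolated regions" are taken to be runs of consecutive note indices. The toy decoder and checker at the end are invented stand-ins and ignore the fixed boundary states that the real region-wise Viterbi search uses.

```python
from typing import Callable, List, Sequence, Set, Tuple

# decode(zeta, region, kept) -> new set of kept note indices (stand-in for the Viterbi step)
Decoder = Callable[[List[float], List[int], Set[int]], Set[int]]
# violating_onsets(kept) -> onset times t_n of kept notes at which constraint (14) fails
Checker = Callable[[Set[int]], List[float]]


def split_regions(indices: Sequence[int]) -> List[List[int]]:
    """Group ascending note indices into runs of consecutive indices (the set Psi)."""
    regions: List[List[int]] = []
    for m in indices:
        if regions and m == regions[-1][-1] + 1:
            regions[-1].append(m)
        else:
            regions.append([m])
    return regions


def iterative_reduction(onsets: Sequence[float], decode: Decoder, violating_onsets: Checker,
                        dt: float = 1.0, lam: float = 0.85,
                        i_max: int = 50) -> Tuple[Set[int], int]:
    """Steps (i)-(iii); onsets must be sorted in ascending order."""
    zeta = [1.0] * len(onsets)
    kept = decode(zeta, list(range(len(onsets))), set())   # step (i)
    i = 1
    while True:
        bad = violating_onsets(kept)                        # step (ii)
        if not bad or i >= i_max:
            return kept, i
        flagged = [m for m, t in enumerate(onsets)
                   if any(abs(t - t_n) <= dt / 2 for t_n in bad)]
        for region in split_regions(flagged):               # step (iii)
            for m in region:
                zeta[m] *= lam                              # Eq. (17)
            kept = decode(zeta, region, kept)
        i += 1


# Hypothetical toy stand-ins: a note survives if zeta(m) * weight(m) >= 0.5, and the
# "difficulty constraint" allows at most two kept notes within +/- dt/2 of each note.
ONSETS = [0.0, 0.1, 0.2, 0.3, 2.0]
WEIGHTS = [1.0, 0.6, 0.7, 0.55, 0.6]


def toy_decode(zeta: List[float], region: List[int], kept: Set[int]) -> Set[int]:
    new = set(kept)
    for m in region:
        if zeta[m] * WEIGHTS[m] >= 0.5:
            new.add(m)
        else:
            new.discard(m)
    return new


def toy_violations(kept: Set[int], dt: float = 1.0, max_notes: int = 2) -> List[float]:
    return [ONSETS[n] for n in kept
            if sum(abs(ONSETS[k] - ONSETS[n]) <= dt / 2 for k in kept) > max_notes]


kept, n_iter = iterative_reduction(ONSETS, toy_decode, toy_violations)
print(sorted(kept), n_iter)
```

With these toy stand-ins, the first pass keeps all five notes, and two rounds of Eq. (17) remove notes 1 and 3 from the dense cluster at the start while leaving the isolated note at onset 2.0 untouched, because it never falls inside a flagged window.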
The iterative algorithm is summarized as follows.

(i) Initialize $\zeta^{(1)}(m) = 1$ and apply the Viterbi algorithm to the whole ensemble score.
(ii) Calculate the difficulty values and obtain the regions $\Psi$ where the constraints (14) are not satisfied. Exit if $\Psi$ is empty or $i \ge i_{\max}$.
(iii) Update the control factor $\zeta(m)$ as in Eq. (17) and apply the Viterbi algorithm to each region of $\Psi$. Increment $i$ and go back to step (ii).

IV. EVALUATION OF PIANO REDUCTION ALGORITHMS

A) Setup

To evaluate the piano reduction algorithms, we prepared a dataset of orchestral pieces of Western classical music. The dataset consists of 10 pieces by different composers with different instrumentations; each piece is around 20 bars long. The list of pieces is available on the accompanying webpage².

We compare one-time and iterative optimization algorithms based on the Gaussian model and the fingering model; in total, we have four methods, labelled One-time Gaussian, One-time Fingering, Iterated Gaussian, and Iterated Fingering. The parameters of the piano-score models are set as in section II.B.2. The other parameters are set as follows: $a = 0.01$, $\kappa = 11$, $\lambda = 0.85$, $\gamma_{\mathrm{oct}} = 0.001$, and $\beta_R(m) = \beta_L(m) = (1 - \beta_{NP}(m))/2$, where $\beta_{NP}(m)$ is set as in Eq. (12). Difficulty values are calculated with the difficulty measures using the Gaussian model with $\Delta t = 1$ s. These parameter values were fixed after some trials by one of the authors, and there is room for further optimization.

As a baseline, we also implement a method based on a simple piano-score model (called the distance model) that takes into account the distance between each note in the ensemble score and its closest melodic or bass note, but not the sequential dependence of pitches.
Specifically, for each note $m$ in the ensemble score, the closest melodic or bass note $\mathrm{CMB}(m)$ is obtained by first searching in the direction of onset time and then in the direction of pitch. The probability of its pitch $p_m$ is then given as

$$P(p_m) \propto \mathrm{Gauss}\bigl(p_m;\, \mathrm{CMB}(m),\, \sigma_p^2\bigr). \tag{18}$$

Integrating this piano-score model into the piano reduction model in section III.A and using the iterative optimization algorithm yields a baseline method, Iterated Distance.

² http://pianoarrangement.github.io/demo.html

B) Quantitative Evaluation of Difficulty Control

We first examine the effect of the iterative optimization algorithms on controlling the difficulty values of the output reduction scores, in comparison with the one-time optimization algorithms. We run the four algorithms, One-time Gaussian, One-time Fingering, Iterated Gaussian, and Iterated Fingering, on the test dataset with three sets of target difficulty values $(\tilde{D}_L, \tilde{D}_R, \tilde{D}_B) = (15, 15, 30)$, $(30, 30, 40)$, and $(40, 40, 50)$. For the scaling factor $\rho$ of the one-time optimization algorithms, we test values in $\{0.1, 0.2, \ldots, 1.0\}$. For the iterative optimization algorithms, $i_{\max}$ is set to 50.

[Figure 5: three panels, one per target setting (15, 15, 30), (30, 30, 40), and (40, 40, 50), plotting mean difficulty, maximum difficulty, and out-of-range rate (for both hands) against $\rho$.]

Fig. 5. Difficulty metrics for the One-time Gaussian method for varying $\rho$, for three cases of target difficulty values $(\tilde{D}_L, \tilde{D}_R, \tilde{D}_B)$ indicated in the insets. Difficulty values are those for both hands ($\bar{D}_B$, $D_B^{\max}$, etc.), and horizontal lines indicate the corresponding values for the Iterated Gaussian method.

To evaluate a reduction score $R$, we compute difficulty values ($D_L(n)$ etc.)
for each note $n$ in $R$ and calculate the following measures:

• Mean difficulty values $\{\bar{D}_L, \bar{D}_R, \bar{D}_B\}$:
$$\bar{D}_L = \frac{1}{\#R} \sum_{n \in R} D_L(n) \quad \text{etc.} \tag{19}$$
• Maximum difficulty values $\{D_L^{\max}, D_R^{\max}, D_B^{\max}\}$:
$$D_L^{\max} = \max_{n \in R} D_L(n) \quad \text{etc.} \tag{20}$$
• Out-of-range rates (proportion of notes whose difficulty values exceed the target values) $\{A^L_{\mathrm{out}}, A^R_{\mathrm{out}}, A^B_{\mathrm{out}}\}$:
$$A^L_{\mathrm{out}} = \frac{\#\{n \in R \mid D_L(n) > \tilde{D}_L\}}{\#R} \quad \text{etc.} \tag{21}$$
• Additional-note rate $A_{\mathrm{add}}$ (number of notes in the reduction score other than melodic and bass notes, relative to the number of melodic and bass notes; it can therefore exceed 100%):
$$A_{\mathrm{add}} = \frac{\#R - \#M - \#B}{\#M + \#B}. \tag{22}$$

Variations of the difficulty values of the reduction scores obtained by the One-time Gaussian method are shown in Fig. 5, together with the corresponding values for the Iterated Gaussian method. For simplicity, only the difficulty values for both hands ($\bar{D}_B$, $D_B^{\max}$, etc.) are shown. For those values of $\rho$ where $A^B_{\mathrm{out}}$ is equivalent to that of the iterative optimization method, $\bar{D}_B$ and $D_B^{\max}$ are smaller than for the iterative optimization method. This means that, at the same level of satisfaction of the difficulty constraints, the results of the iterative optimization method have larger difficulty values on average, which is a desired property. On the other hand, if $\rho$ is increased sufficiently, the one-time optimization algorithm can achieve the same level of $\bar{D}_B$ as the iterative optimization method, but then $A^B_{\mathrm{out}}$ is larger, meaning that the difficulty constraints are satisfied less strictly. Analyses of the difficulty values for each hand and a comparison between the One-time Fingering and Iterated Fingering methods reveal similar tendencies. The results for all three kinds of difficulty values (for each hand and for both hands) are shown in Table 2.

| Algorithm | Target difficulty | Mean difficulty | Max. difficulty | Out-of-range rate (%) | $A_{\mathrm{add}}$ (%) |
|---|---|---|---|---|---|
| One-time Gaussian | (15, 15, 30) | (10.0, 5.4, 15.4) | (22.5, 14.6, 30.5) | (18.6, 7.3, 2.3) | 7.1 |
| Iterated Gaussian | (15, 15, 30) | (11.0, 6.1, 17.0) | (22.3, 15.5, 31.8) | (18.2, 7.2, 2.2) | 20.3 |
| One-time Fingering | (15, 15, 30) | (12.9, 9.1, 22.0) | (30.7, 27.8, 50.5) | (30.0, 18.0, 20.9) | 30.9 |
| Iterated Fingering | (15, 15, 30) | (12.7, 8.4, 21.1) | (29.0, 23.9, 46.5) | (27.5, 15.7, 14.9) | 31.7 |
| Iterated Distance | (15, 15, 30) | (11.9, 6.1, 18.0) | (28.0, 15.7, 37.3) | (23.4, 7.4, 5.2) | 21.8 |
| One-time Gaussian | (30, 30, 40) | (10.4, 5.5, 15.9) | (23.2, 15.4, 30.7) | (0.7, 0, 0.6) | 11.4 |
| Iterated Gaussian | (30, 30, 40) | (16.2, 8.3, 24.5) | (30.0, 21.2, 39.8) | (0.4, 0, 0.6) | 62.3 |
| One-time Fingering | (30, 30, 40) | (13.2, 9.4, 22.7) | (31.8, 28.6, 51.2) | (6.5, 5.8, 11.6) | 33.4 |
| Iterated Fingering | (30, 30, 40) | (16.3, 10.6, 26.9) | (34.3, 28.6, 50.9) | (3.6, 3.0, 6.3) | 60.1 |
| Iterated Distance | (30, 30, 40) | (17.8, 8.3, 26.0) | (35.9, 21.7, 44.8) | (2.4, 0, 2.3) | 61.9 |
| One-time Gaussian | (40, 40, 50) | (13.4, 7.0, 20.4) | (30.6, 19.2, 40.1) | (0.1, 0, 0.1) | 39.0 |
| Iterated Gaussian | (40, 40, 50) | (20.9, 11.1, 32.0) | (36.8, 27.8, 48.8) | (0, 0, 0) | 98.3 |
| One-time Fingering | (40, 40, 50) | (13.5, 9.5, 22.9) | (32.4, 29.2, 51.7) | (2.8, 3.4, 5.7) | 34.7 |
| Iterated Fingering | (40, 40, 50) | (20.2, 13.6, 33.8) | (40.1, 33.1, 54.9) | (1.7, 1.0, 1.6) | 88.9 |
| Iterated Distance | (40, 40, 50) | (22.1, 10.3, 32.4) | (42.5, 27.3, 53.6) | (0.8, 0, 0.8) | 88.3 |

Table 2. Comparison of average values of difficulty metrics for reduction scores. Triplet values in parentheses indicate, from left to right, the values for the left-hand part, the right-hand part, and both hands.
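The evaluation measures of Eqs. (19)–(22) are simple aggregates over the notes of $R$. A minimal Python sketch follows; the function names are hypothetical, and the per-note difficulty values ($D_L(n)$, $D_R(n)$, or $D_B(n)$) are assumed to be precomputed by the difficulty measures of section II.B.

```python
from typing import Dict, Sequence


def difficulty_metrics(d: Sequence[float], target: float) -> Dict[str, float]:
    """Eqs. (19)-(21) for one part: d[n] is D_L(n), D_R(n) or D_B(n) for each note n in R."""
    n = len(d)
    return {
        "mean": sum(d) / n,                                   # Eq. (19)
        "max": max(d),                                        # Eq. (20)
        "out_of_range": sum(1 for x in d if x > target) / n,  # Eq. (21)
    }


def additional_note_rate(n_reduction: int, n_melody: int, n_bass: int) -> float:
    """A_add = (#R - #M - #B) / (#M + #B), Eq. (22)."""
    return (n_reduction - n_melody - n_bass) / (n_melody + n_bass)


print(difficulty_metrics([10.0, 20.0, 35.0, 5.0], target=30.0))
print(additional_note_rate(150, 60, 40))
```

Because $A_{\mathrm{add}}$ is normalized by $\#M + \#B$ rather than by $\#R$, a score with 150 notes built on 100 melodic and bass notes has $A_{\mathrm{add}} = 0.5$, and values above 1 are possible.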
Here, for the one-time optimization methods, results are shown for the smallest value of $\rho$ such that all three out-of-range rates exceed those of the corresponding iterative optimization methods. In addition to the tendencies found in the above analysis, one can observe that, at the same level of satisfaction of the difficulty constraints, the iterative optimization methods yield larger additional-note rates than the corresponding one-time optimization methods. These results indicate that the iterative optimization methods are more appropriate for controlling difficulty values.

Even for the iterative optimization algorithms, the out-of-range rates can be nonzero, especially for small target difficulty values. One reason is that for some pieces the minimal reduction score, containing only melodic and bass notes, already has difficulty values larger than the target values. Another reason is the greedy-like nature of the iterative optimization algorithms: when some regions of the reduction score are fixed and used as boundary conditions for updates, the Viterbi search sometimes cannot remove notes even for smaller values of $\zeta(m)$.

Comparing the iterative optimization methods for target difficulty values $(30, 30, 40)$ and $(40, 40, 50)$, we find that while the Iterated Gaussian method has the largest additional-note rate, it has the smallest values for most difficulty measures. If the additional-note rate increases with fidelity to the original ensemble score, this indicates that the Iterated Gaussian method can efficiently increase fidelity while retaining low difficulty values. This is probably because the difficulty measures are themselves calculated with the Gaussian model, so this method uses the same model for reduction as for evaluation.

C) Subjective Evaluation

We conduct a subjective evaluation experiment to assess the quality of the reduction scores produced by the proposed algorithms³.
In particular, we examine how many of the additional notes (notes other than melodic and bass notes) are actually playable and how musical qualities such as fidelity and difficulty change with varying target difficulty values. For this, we asked professional piano arrangers to evaluate the piano reductions generated by the Iterated Fingering, Iterated Gaussian, and Iterated Distance methods with three sets of target difficulty values $(15, 15, 30)$, $(30, 30, 40)$, and $(40, 40, 50)$. Two music arrangers participated in the evaluation, and each reduction score was evaluated by one of them. The evaluators were provided with manually typeset reduction scores, the input condensed scores, and the corresponding audio files of the 10 tested musical pieces, which are available on the accompanying demo page⁴. The evaluation metrics are as follows:

• Musical fidelity (10 steps; 1: not faithful at all, …, 10: very faithful): how faithful the reduction score is to the original ensemble score in terms of music acoustics.
• Subjective difficulty (10 steps; 1: very easy, …, 10: very difficult): how difficult the reduction score is to play with two hands.
• Musical naturalness (10 steps; 1: very unnatural, …, 10: very natural): how natural the reduction score is as a piano score.
• Number of unplayable notes $N_{\mathrm{unp}}$: how many notes, and which notes, should be removed from the reduction score to make it playable by a skilled pianist.

We define the unplayable-note rate $A_{\mathrm{unp}}$ as $N_{\mathrm{unp}}$ normalized by the number of additional notes:

$$A_{\mathrm{unp}} = \frac{N_{\mathrm{unp}}}{\#R - \#M - \#B}. \tag{23}$$

³ Readers who wish to have access to the raw experimental data and source code should contact the authors.
⁴ http://pianoarrangement.github.io/demo.html

[Figure 6: four panels plotting (a) musical fidelity, (b) subjective difficulty, (c) musical naturalness, and (d) unplayable-note rate (%) against the additional-note rate (%) for the Iterated Fingering, Iterated Gaussian, and Iterated Distance methods.]

Fig. 6. Subjective evaluation results. For each method, the average results for the three sets of target difficulty values are indicated with points. Bars indicate their standard errors.

Results are summarized in Fig. 6, where statistics (mean and standard deviation) are shown for each evaluation metric and each method. The results in Figs. 6(a) and 6(b) indicate that musical fidelity and subjective difficulty increase monotonically with the additional-note rate, which confirms the ability of the proposed methods to control performance difficulty. For these two quantities, few differences can be found among the three methods. The result in Fig. 6(c) shows that musical naturalness tends to decrease as the additional-note rate increases. This can be understood from the fact that when $A_{\mathrm{add}} \simeq 0$ the reduction score consists mostly of melodic and bass notes, which should have high naturalness, whereas for larger $A_{\mathrm{add}}$ it becomes more demanding for the models to retain naturalness. For the highest-difficulty case, with target difficulty values $(40, 40, 50)$ and $A_{\mathrm{add}} \sim 90\%$–$100\%$, the Iterated Gaussian and Iterated Fingering methods outperform the baseline Iterated Distance method. This suggests the importance of incorporating the sequential dependence of pitches in the piano-score model for improving musical naturalness. The result in Fig.
6(d) shows that, especially in the high-difficulty regime, the unplayable-note rate is reduced by incorporating the sequential dependence of pitches in the piano-score model, and even further by incorporating fingering motion. This suggests that, although the same difficulty measure is used for all methods and it does not perfectly describe the real difficulty of a piano score, a better piano-score model can generate reduction scores with fewer unplayable notes.

D) Example Results and Discussions

Examples of piano reduction scores obtained by the Iterated Fingering method are shown in Fig. 7, together with the results of the subjective evaluation (the evaluation scores are given for the whole piece, including the part not shown in the figure)⁵. Results for larger target difficulty values have more harmonizing notes and receive larger fidelity and subjective-difficulty values. In the cases with $(\tilde{D}_L, \tilde{D}_R, \tilde{D}_B) = (15, 15, 30)$ and $(30, 30, 40)$, all notes in the shown section are playable, and the latter case has a larger musical-naturalness value. On the other hand, there are several unplayable notes in the case with $(\tilde{D}_L, \tilde{D}_R, \tilde{D}_B) = (40, 40, 50)$, which leads to a smaller musical-naturalness value.

We were informed by the evaluators (professional music arrangers) that keeping more notes in a reduction score does not always improve musical naturalness. One reason is that adding too many notes can reduce the flexibility for performance expression. We therefore have two important directions for further improving the piano reduction methods. One is to construct a more precise fingering model and difficulty measures based on it. However, as discussed in section II.B.2, a more complex model typically requires more training data for appropriate learning. Since a large-scale fingering dataset is currently not available, constructing such a dataset is also an important issue.
The other is to incorporate more musical knowledge into the piano reduction model, particularly regarding harmonic aspects (e.g. completion of chordal notes and voice leading) and cognitive aspects (e.g. restricting notes above melodic notes to avoid mishearing of melodies). Other remaining issues are the identification of melodic and bass notes and the typesetting of reduction scores, which are currently done manually.

⁵ See http://pianoarrangement.github.io/demo.html for more examples with sound files.

[Figure 7: a condensed score (input) and three reduction scores (output), with unplayable notes marked. Insets: $(\tilde{D}_L, \tilde{D}_R, \tilde{D}_B) = (40, 40, 50)$: Fidelity = 7, Difficulty = 8, Naturalness = 4, $A_{\mathrm{unp}} = 4.7\%$; $(30, 30, 40)$: Fidelity = 4, Difficulty = 5, Naturalness = 6, $A_{\mathrm{unp}} = 1.6\%$; $(15, 15, 30)$: Fidelity = 3, Difficulty = 4, Naturalness = 5, $A_{\mathrm{unp}} = 4.0\%$.]

Fig. 7. Examples of piano reduction scores obtained by the Iterated Fingering method (Wagner: Prelude to Die Meistersinger von Nürnberg). For clear illustration, only the first 9 bars of a 27-bar excerpt in the test data are shown. Unplayable notes are those identified by the evaluator.

As for the identification of melodic and bass notes, a simple method that takes the instrument part with the highest (lowest) mean pitch in each bar as the melody (bass) part reproduces 40.6% (57.0%) of the indications in our test data. While this calls for a more refined method for automatically estimating melodic and bass notes, we noticed that the choice is also subjective, and it may be important to leave room for user preferences. Finally, since the evaluation is subjective, it is also important to look at multiple ratings given by different evaluators.
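The simple per-bar heuristic for choosing melody and bass parts mentioned above can be sketched as follows. It is a minimal illustration under assumptions: the `(part, bar, pitch)` note format and the function name are hypothetical, ties between parts are broken arbitrarily, and the sketch only selects a part per bar; it does not reproduce how the 40.6% (57.0%) agreement with the test-data indications was measured.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

NoteRow = Tuple[str, int, int]  # (instrument part, bar index, MIDI pitch): hypothetical format


def melody_bass_parts(notes: List[NoteRow]) -> Dict[int, Tuple[str, str]]:
    """For each bar, return (melody part, bass part): the parts with the highest/lowest mean pitch."""
    pitches: Dict[Tuple[int, str], List[int]] = defaultdict(list)
    for part, bar, pitch in notes:
        pitches[(bar, part)].append(pitch)
    mean_pitch: Dict[int, Dict[str, float]] = defaultdict(dict)
    for (bar, part), ps in pitches.items():
        mean_pitch[bar][part] = mean(ps)
    return {bar: (max(parts, key=parts.get), min(parts, key=parts.get))
            for bar, parts in sorted(mean_pitch.items())}


score = [("flute", 0, 84), ("flute", 0, 86), ("violin", 0, 76),
         ("cello", 0, 48), ("cello", 0, 43),
         ("flute", 1, 60), ("violin", 1, 79), ("cello", 1, 50)]
print(melody_bass_parts(score))
```

In bar 1 the flute drops below the violin, so the melody label moves to the violin. This per-bar switching is exactly where such a heuristic can disagree with an arranger's subjective choice.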
A large-scale subjective evaluation with multiple evaluators per reduction score would be valuable for revealing finer relations between human evaluations and the model's predictions.

V. CONCLUSION

We have described quantitative measures of performance difficulty for piano scores and a piano reduction method, based on statistical modelling, that can control the difficulty values. We followed the quantification of performance difficulty using statistical models proposed in [18] and found that the difficulty values can be used as indicators of performance errors. For the current amount of training data, we also found that the difficulty measures based on the Gaussian model yield the best accuracy in predicting performance errors. The problem of piano reduction was formalized as a statistical optimization problem following the framework of [4], and we improved the optimization by proposing an iterative method. We confirmed the efficacy of the iterative optimization method and showed that the algorithms can control subjective difficulty and musical fidelity. We also found that incorporating sequential dependence and fingering motion in the piano-score model, through the Gaussian and fingering models, improves the generated reduction scores in terms of musical naturalness and the rate of unplayable notes. Directions for further improvement were also discussed.

Whereas this study assumed that the same difficulty measures apply universally to all players, they can differ between individual players depending on, for example, the size of their hands. In the present framework, part of such individuality can be expressed by adapting the fingering model to individual players. In principle, this model adaptation can be realized if one has a sufficient amount of musical scores that have already been played by the individual player.
Another interesting direction is to adapt an individual's fingering model using the frequency of errors in his/her performance data, which could reduce the amount of necessary data.

A limitation of the present model is that timing errors and other rhythmic aspects are not considered. Rhythmic features may become important especially in polyrhythmic passages, in which the left- and right-hand parts have contrasting rhythms (e.g. two-against-three rhythms). In such cases, the sum of the difficulty values for the two hands may underestimate the total difficulty. To properly deal with these problems, it would be necessary to incorporate a performance timing model and the interdependence between the two hands into the present framework.

The present formulation, which combines a musical-score model and an edit model, can also be applied to other forms of music arrangement if one replaces the piano fingering model with an appropriate score model of the target instrumentation or style and adapts the edit model to the relevant edit operations. For example, by combining a score model for jazz music with a proper edit model, it would be possible to develop a method for arranging a piece in the rock music style (or other styles) into a piece in the jazz style.

Although this study has focused on piano arrangement, the framework can also be useful for music transcription [22]. In music transcription, musical-score models play an important role in inducing output scores that respect musical grammar, the style of the target music, etc. [23, 24]. Especially in piano transcription, the results of multi-pitch detection contain a significant number of spurious notes (false positives), which often make the transcription results unplayable [25].
By integrating the present piano-score model with an acoustic model (instead of the edit model) and applying the optimization method developed in this study, one can impose constraints on the performance difficulty of transcription results and reduce these spurious notes.

FINANCIAL SUPPORT

This study was partially supported by JSPS KAKENHI Nos. 26700020, 16H01744, 16J05486, 16H02917, and 16K00501, and JST ACCEL No. JPMJAC1602.

STATEMENT OF INTEREST

None.

REFERENCES

[1] Chiu, S.-C.; Shan, M.-K.; Huang, J.-L.: Automatic system for the arrangement of piano reduction, in IEEE International Symposium on Multimedia, San Diego, California, 2009, 459–464.
[2] Onuma, S.; Hamanaka, M.: Piano arrangement system based on composers' arrangement processes, in International Computer Music Conference, New York, 2010, 191–194.
[3] Huang, J.-L.; Chiu, S.-C.; Shan, M.-K.: Towards an automatic music arrangement framework using score reduction. ACM Transactions on Multimedia Computing, Communications, and Applications, 8(1) (2012), 8:1–8:23.
[4] Nakamura, E.; Sagayama, S.: Automatic piano reduction from ensemble scores based on merged-output hidden Markov model, in International Computer Music Conference, Denton, Texas, 2015, 298–305.
[5] Takamori, H.; Sato, H.; Nakatsuka, T.; Morishima, S.: Automatic arranging musical score for piano using important musical elements, in International Sound and Music Computing Conference, Aalto, 2017, 35–41.
[6] Tuohy, D.R.; Potter, W.D.: A genetic algorithm for the automatic generation of playable guitar tablature, in International Computer Music Conference, Barcelona, 2005, 499–502.
[7] Hori, G.; Yoshinaga, Y.; Fukayama, S.; Kameoka, H.; Sagayama, S.: Automatic arrangement for guitars using hidden Markov model, in International Sound and Music Computing Conference, Copenhagen, 2012, 450–456.
[8] Hori, G.; Kameoka, H.; Sagayama, S.: Input-output HMM applied to automatic arrangement for guitars. J. Info. Processing Soc. Japan, 21(2) (2013), 264–271.
[9] Maekawa, H.; Emura, N.; Miura, M.; Yanagida, M.: On machine arrangement for smaller wind-orchestras based on scores for standard wind-orchestras, in International Conference on Music Perception and Cognition, Bologna, 2006, 268–273.
[10] Crestel, L.; Esling, P.: Live orchestral piano, a system for real-time orchestral music generation, in International Sound and Music Computing Conference, Espoo, 2017, 434–442.
[11] Chiu, S.-C.; Chen, M.-S.: A study on difficulty level recognition of piano sheet music, in IEEE International Symposium on Multimedia, Irvine, California, 2012, 17–23.
[12] Sébastien, V.; Ralambondrainy, H.; Sébastien, O.; Conruyt, N.: Score analyzer: Automatically determining scores difficulty level for instrumental e-learning, in International Conference on Music Information Retrieval, Porto, 2012, 571–576.
[13] Brown, P.F.; Pietra, V.J.D.; Pietra, S.A.D.; Mercer, R.L.: The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2) (1993), 263–311.
[14] Parncutt, R.; Sloboda, J.A.; Clarke, E.F.; Raekallio, M.; Desain, P.: An ergonomic model of keyboard fingering for melodic fragments. Music Perception, 14(4) (1997), 341–382.
[15] Hart, M.; Tsai, E.: Finding optimal piano fingerings. The UMAP Journal, 21(1) (2000), 167–177.
[16] Al Kasimi, A.; Nichols, E.; Raphael, C.: A simple algorithm for automatic generation of polyphonic piano fingerings, in International Conference on Music Information Retrieval, Vienna, 2007, 355–356.
[17] Yonebayashi, Y.; Kameoka, H.; Sagayama, S.: Automatic decision of piano fingering based on a hidden Markov models, in International Joint Conference on Artificial Intelligence, Hyderabad, 2007, 2915–2921.
[18] Nakamura, E.; Ono, N.; Sagayama, S.: Merged-output HMM for piano fingering of both hands, in International Conference on Music Information Retrieval, Taipei, 2014, 531–536.
[19] Rabiner, L.: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 77(2) (1989), 257–286.
[20] Nakamura, E.; Yoshii, K.; Sagayama, S.: Rhythm transcription of polyphonic piano music based on merged-output HMM for multiple voices. IEEE/ACM Trans. on Audio, Speech and Language Processing, 25(4) (2017), 794–806.
[21] Nakamura, E.; Yoshii, K.; Katayose, H.: Performance error detection and post-processing for fast and accurate symbolic music alignment, in International Conference on Music Information Retrieval, Suzhou, 2017, 347–353.
[22] Benetos, E.; Dixon, S.; Giannoulis, D.; Kirchhoff, H.; Klapuri, A.: Automatic music transcription: Challenges and future directions. J. Intelligent Information Systems, 41(3) (2013), 407–434.
[23] Raczyński, S.; Vincent, E.; Sagayama, S.: Dynamic Bayesian networks for symbolic polyphonic pitch modeling. IEEE Trans. on Audio, Speech, and Language Processing, 21(9) (2013), 1830–1840.
[24] Ycart, A.; Benetos, E.: Polyphonic music sequence transduction with meter-constrained LSTM networks, in IEEE International Conference on Acoustics, Speech, and Signal Processing, Calgary, 2018, 386–390.
[25] Nakamura, E.; Benetos, E.; Yoshii, K.; Dixon, S.: Towards complete polyphonic music transcription: Integrating multi-pitch detection and rhythm quantization, in IEEE International Conference on Acoustics, Speech, and Signal Processing, Calgary, 2018, 101–105.

Biographies

Eita Nakamura received his Ph.D. degree in physics from the University of Tokyo, Tokyo, Japan, in 2012. After having been a Postdoctoral Researcher at the National Institute of Informatics, Meiji University, and Kyoto University, Kyoto, Japan, he is currently a Research Fellow of the Japan Society for the Promotion of Science. His research interests include music modeling and analysis, music information processing, and statistical machine learning.
Kazuyoshi Yoshii received his M.S. and Ph.D. degrees in informatics from Kyoto University, Kyoto, Japan, in 2005 and 2008, respectively. He is currently a Senior Lecturer at the Graduate School of Informatics, Kyoto University, and concurrently the Leader of the Sound Scene Understanding Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan. His research interests include music analysis, audio signal processing, and machine learning.
