Statistical Piano Reduction Controlling Performance Difficulty

SIP (2015), page 1 of 12 © The Authors, 2015. The online version of this article is published within an Open Access environment subject to the conditions of the Creative Commons Attribution-NonCommercial-ShareAlike license. The written permission of Cambridge University Press must be obtained for commercial re-use. doi:0000000000

Eita Nakamura¹ and Kazuyoshi Yoshii¹˒²

¹ Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan. ² RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan. Corresponding author: Eita Nakamura. Email: enakamura@sap.ist.i.kyoto-u.ac.jp

We present a statistical-modelling method for piano reduction, i.e. converting an ensemble score into piano scores, that can control performance difficulty. While previous studies have focused on describing the condition for playable piano scores, this condition depends on the player's skill and can change continuously with the tempo. We thus computationally quantify performance difficulty as well as musical fidelity to the original score, and formulate the problem as optimization of musical fidelity under constraints on difficulty values. First, performance difficulty measures are developed by means of probabilistic generative models for piano scores, and their relation to the rate of performance errors is studied. Second, to describe musical fidelity, we construct a probabilistic model integrating a prior piano-score model and a model representing how ensemble scores are likely to be edited. An iterative optimization algorithm for piano reduction is developed based on statistical inference of the model. We confirm the effect of the iterative procedure; we find that subjective difficulty and musical fidelity monotonically increase with controlled difficulty values; and we show that incorporating sequential dependence of pitches and fingering motion in the piano-score model improves the quality of reduction scores in high-difficulty cases.

I. INTRODUCTION

Music arrangement involving a change of instrumentation (e.g. arrangement for piano, guitar, etc.) is an important process of music creation that increases the variety of music performances. Arranging a musical piece to change its difficulty, for example, to make it playable for beginners, is also widely practiced. To automate these processes, systems for piano arrangement [1–5], guitar arrangement [6–8], and orchestration [9, 10] have been studied. This study aims at a system for piano reduction, i.e. converting an ensemble score (e.g. orchestral and band scores) into a piano score, that can control performance difficulty and retain as much musical fidelity to the original score as possible (Fig. 1).

To computationally judge whether a musical score is playable, previous studies have developed conditions on the pitch and rhythmic content. For piano scores, conditions such as 'there can be at most 5 simultaneous notes for each hand' and 'simultaneous pitch spans for each hand must be less than 14 semitones (or so)' have been considered [2, 3]. However, these conditions are neither necessary nor sufficient for playable scores: in reality, a piano score can contain chords with more than 5 notes and/or spanning a large pitch interval that are conventionally played as broken chords, and even scores without chords (melodies) can be unplayable at fast tempos. In fact, it is difficult to find a complete description of playable scores that is valid in every situation, because the condition depends on the player's skill and can change continuously with the tempo. A possible solution is to quantify performance difficulty and use it as an indicator of playable scores in each situation of skill level, tempo, etc. [11, 12].

[Figure: an input ensemble score is converted by piano reduction (note deletions, octave pitch shifts) into output reduction scores, ranging from an "easy" to a "hard" reduction score along the axes of difficulty and musical fidelity.]
Fig. 1. Overview of the proposed system for piano reduction that can control performance difficulty.

As there is generally a trade-off between performance difficulty and musical fidelity to the original score, it is necessary to quantify musical fidelity and develop an optimization method. Music arrangers remove notes and shift pitches in an ensemble score so that the piano reduction score matches a target difficulty level [2, 4]. From a statistical point of view, one can assign probabilities to these edit operations and use them to quantify musical fidelity. Following the analogy with statistical machine translation [13], if one can construct a model for the probability $P(R \mid E)$ of a reduction score $R$ given an ensemble score $E$, the piano reduction problem can be formulated as optimization of $P(R \mid E)$ under constraints on difficulty values. A similar approach without control of performance difficulty has been studied for guitar arrangement [7, 8].

To realize this idea, a statistical-modelling approach for piano reduction that can control performance difficulty was proposed in a recent conference paper [4]. Following the idea that fingering motion is closely related to the cost or difficulty of performance [14–16], quantitative measures of performance difficulty were developed based on a probabilistic generative model of piano scores incorporating fingering motion [17, 18]. To estimate the probability $P(R \mid E)$, a hidden Markov model (HMM) integrating the piano-score model and a model representing how ensemble scores are likely to be edited was constructed. A piano reduction algorithm was developed based on the Viterbi algorithm. While the potential of the method was suggested by the results of piano reduction for one example piece, formal evaluations and comparisons with other approaches were left for future work. The optimization method also had a problem: because of a limitation of the Viterbi algorithm, the upper-bound constraints on difficulty values were often not properly satisfied.

In this study, we extend the work of [4] and propose an improved piano reduction method using iterative optimization. We also carry out systematic evaluations of the difficulty measures and the piano reduction method. In particular, we evaluate difficulty measures in terms of their ability to predict performance errors, which is, to our knowledge, the first attempt in the literature to objectively evaluate performance difficulty measures. Piano reduction methods are evaluated both objectively and subjectively: an objective evaluation is conducted to examine the effect of the iterative optimization strategy, and a subjective evaluation is conducted to assess the quality of the generated reduction scores.
The main results are:
• The proposed difficulty measures can be used as indicators of performance errors, and measures incorporating the sequential nature of piano scores can better predict performance errors.
• The proposed iterative optimization method yields better control of difficulty than the method in [4].
• Both subjective difficulty and musical fidelity of generated reduction scores monotonically increase with controlled difficulty values.
• By comparing methods based on different models, it is shown that incorporating sequential dependence of pitches and fingering motion in the piano-score model improves musical naturalness and reduces the rate of unplayable notes of reduction scores in high-difficulty cases.

The following are limitations of the current system:
• Melodic and bass notes are manually indicated.
• Score typesetting, especially estimation of voices within each hand part, is currently done manually.
Automating these processes is an important direction for future work; see section IV.D for discussion.

The rest of the paper is organized as follows. In the next section, we discuss generative piano-score models and performance difficulty measures. In section III, we present our method for piano reduction. In section IV, we present and discuss the results of evaluating the piano reduction method. We conclude the paper in the last section.

II. QUANTITATIVE MEASURES OF PERFORMANCE DIFFICULTY

We formulate quantitative performance difficulty measures based on probabilistic generative models of piano scores. A generative model incorporating piano fingering and simpler models are described in section II.A, and performance difficulty measures are discussed in section II.B.

A) Generative Models for Piano Scores

1) Models for One Hand

Let us first discuss models for one hand. A piano score is represented as a sequence of pitches $p_{1:N} = (p_n)_{n=1}^{N}$ and corresponding onset times $t_{1:N} = (t_n)_{n=1}^{N}$, where $N$ is the number of musical notes.
A generative model for piano scores (piano-score model) is here defined as a model that yields the probability $P(p_{1:N})$. Simple piano-score models can be constructed based on the Markov model. The probability $P(p_{1:N})$ is factorized into an initial probability $P(p_1)$ and transition probabilities $P(p_n \mid p_{n-1})$ as

$$P(p_{1:N}) = P(p_1) \prod_{n=2}^{N} P(p_n \mid p_{n-1}). \tag{1}$$

The simplest model is obtained by assuming that the initial and transition probabilities obey a uniform distribution over pitches. Writing $N_p = 88$ for the number of possible pitches, the model yields $P(p_{1:N}) = (1/N_p)^N$. Since this model yields the same probability for any piano score of the same length, it is here called a no-information model.

A more realistic model can be built by incorporating sequential dependence of pitches. For example, a statistical tendency called pitch proximity, i.e. that successive pitches tend to be close to each other, can be incorporated into initial and transition probabilities described with Gaussians:

$$P(p_1 = p) \propto \mathrm{Gauss}(p; p_0, \sigma_p^2) + \epsilon, \tag{2}$$

$$P(p_n = p \mid p_{n-1} = p') \propto \mathrm{Gauss}(p; p', \sigma_p^2) + \epsilon. \tag{3}$$

Here, $\mathrm{Gauss}(\cdot\,; \mu, \sigma^2)$ denotes a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$, $p_0$ is a reference pitch defining the initial probability, and $\epsilon$ is a small positive constant for smoothing the probability of pitch transitions with a large leap. We call this model a Gaussian model. Although the Gaussian model can capture the tendency of pitch proximity, this simplification can lead to unrealistic consequences.
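To make Eqs. (1)–(3) concrete, the following Python sketch scores a pitch sequence under the no-information model and the Gaussian model. It is a minimal illustration, not the authors' implementation, and rests on assumptions: pitches are MIDI note numbers 21–108 for the 88 keys, $\sigma_p = 4$ semitones is a hypothetical placeholder (the trained value is not given here), and $\epsilon = 4 \times 10^{-4}$ and $p_0 = 72$ follow the right-hand setting reported in section II.B.2, read as C5 under the convention C4 = 60.

```python
import math

KEYS = range(21, 109)  # 88 piano keys as MIDI note numbers (A0 = 21 ... C8 = 108)


def gauss(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)


def smoothed_gaussian(center, sigma, eps):
    """Distribution over the 88 keys proportional to Gauss(p; center, sigma^2) + eps."""
    weights = {p: gauss(p, center, sigma) + eps for p in KEYS}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


def log_prob_gaussian(pitches, p0=72, sigma=4.0, eps=4e-4):
    """ln P(p_{1:N}) under the first-order Markov model of Eqs. (1)-(3)."""
    logp = math.log(smoothed_gaussian(p0, sigma, eps)[pitches[0]])  # initial term, Eq. (2)
    for prev, cur in zip(pitches, pitches[1:]):
        logp += math.log(smoothed_gaussian(prev, sigma, eps)[cur])  # transition terms, Eq. (3)
    return logp


def log_prob_no_information(pitches, n_p=88):
    """ln P(p_{1:N}) = N ln(1 / N_p) under the uniform (no-information) model."""
    return -len(pitches) * math.log(n_p)


stepwise = [72, 74, 76, 77]
leaping = [72, 84, 60, 96]
print(log_prob_gaussian(stepwise) > log_prob_gaussian(leaping))  # True: pitch proximity is favoured
print(log_prob_no_information(stepwise) == log_prob_no_information(leaping))  # True: same length
```

The no-information model gives both sequences the same score because they have the same length, whereas the Gaussian model prefers the stepwise melody, which is exactly the pitch-proximity tendency it is meant to capture.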
First, pitch transitions involving 10 or 11 semitones have higher probabilities than octave motions, which contradicts reality [18]. Second, since the model does not distinguish white keys and black keys, it yields the same probability for piano scores transposed to any key, which contradicts the fact that "simpler keys" involving fewer black keys are more frequently used. In general, the difficulty or naturalness of a piano score changes when it is transposed to another key, since the geometry of the piano keyboard requires different fingering motions [11]. To solve this, it is necessary to construct a model that describes fingering motions in addition to pitch transitions.

[Figure: panels "Fingering motion" and "Piano score"; a short melody annotated with the fingering 3 4 3 2 3 2 1 2 3 4 5 4 3 2 1, with finger transitions $P(f_n \mid f_{n-1})$ and pitch transitions $P(p_n \mid p_{n-1}, f_{n-1}, f_n)$.]
Fig. 2. Piano-score model incorporating fingering motion.

A model (called fingering model) incorporating fingering motions and the geometry of the piano keyboard has been proposed in [18]. The model is based on an HMM, which was first applied to modelling piano fingering in [17]. We introduce a stochastic variable $f_n$ representing the finger used to play the $n$th note. The variable $f_n$ takes one of the following five values: 1 = thumb, 2 = index finger, ..., 5 = little finger.¹ According to the model (Fig. 2), a fingering motion $f_{1:N} = (f_n)_{n=1}^{N}$ is first generated by an initial probability $P(f_1)$ and transition probabilities $P(f_n \mid f_{n-1})$.
Next, a pitch sequence $p_{1:N}$ is generated conditionally on $f_{1:N}$: the first pitch is generated by $P(p_1 \mid f_1)$, and the succeeding pitches are generated by $P(p_n \mid p_{n-1}, f_{n-1}, f_n)$, which describes the probability that a pitch appears following the previous pitch, given the previous and current fingers. Thus, the joint probability of pitches and fingering motion is given as

$$P(p_{1:N}, f_{1:N}) = P(f_1)\, P(p_1 \mid f_1) \prod_{n=2}^{N} P(f_n \mid f_{n-1})\, P(p_n \mid p_{n-1}, f_{n-1}, f_n).$$

¹ In this study, we do not consider the possibility of finger substitutions where two or more fingers are assigned to a note.

In general, the parameters of the fingering model can be learned from music data with pitches and annotated fingerings. For want of a sufficient amount of data, however, the probability $P(p_n \mid p_{n-1}, f_{n-1}, f_n)$, which has $88^2 \cdot 5^2$ parameters, cannot be trained effectively in a direct way. We thus introduce simplifying assumptions to reduce the number of parameters. First, we assume that the probability depends on pitches only through their geometrical positions on the keyboard (Fig. 2). The coordinate on the keyboard of a pitch $p$ is represented as $\ell(p) = (\ell_x(p), \ell_y(p))$. We also assume translational symmetry in the $x$-direction and time-inversion symmetry, which are expressed as

$$P(p_n = p \mid p_{n-1} = p', f_{n-1} = f', f_n = f) = F(\ell_x(p) - \ell_x(p'), \ell_y(p) - \ell_y(p'); f', f) = F(\ell_x(p') - \ell_x(p), \ell_y(p') - \ell_y(p); f, f'). \tag{4}$$

We also assume reflection symmetry between the left and right hands. The above model can be extended to include chords by sequencing the contained notes from low pitch to high pitch [16]. With the fingering model, one can estimate the fingering $f_{1:N}$ from a given sequence of pitches $p_{1:N}$ by maximizing the probability $P(f_{1:N} \mid p_{1:N}) \propto P(p_{1:N}, f_{1:N})$.
This maximization can be computed by the Viterbi algorithm [19].

2) Models for Both Hands

A piano-score model with left- and right-hand parts can be obtained by first constructing a model for each hand part and then combining the two models. If musical notes are already assigned to the two hand parts, such a combined model can be obtained directly. On the other hand, if the part assignment is not given, as in the piano reduction problem, the model should be able to describe the probability for all cases of part assignment.

Such a model for piano music with unknown hand parts can be constructed based on the merged-output HMM [18, 20]. The idea is to combine outputs from two component Markov models or HMMs respectively describing the two hand parts. We here describe a model combining two fingering models. First, the hand part (left or right) associated with a note $p_n$ is represented by an additional stochastic variable $\eta_n \in \{\mathrm{L}, \mathrm{R}\}$. The generative process of $\eta_n$ is described with a Bernoulli distribution: $P(\eta_n = \eta) = \alpha_\eta$ with $\alpha_{\mathrm{L}} + \alpha_{\mathrm{R}} = 1$. Once $\eta_n$ is determined, the pitch is generated by the corresponding component model. For each $\eta \in \{\mathrm{L}, \mathrm{R}\}$, let $a^{\eta}_{f'f} = P^{\eta}(f \mid f')$ and $b^{\eta}_{f'f}(p', p) = F^{\eta}(\ell(p) - \ell(p'); f', f)$ denote the fingering and pitch transition probabilities of the component model. This process can be described as an HMM with a state space indexed by $k = (\eta, f_{\mathrm{L}}, p_{\mathrm{L}}, f_{\mathrm{R}}, p_{\mathrm{R}})$ with the following transition and output probabilities:

$$P(k_n = k \mid k_{n-1} = k') = \begin{cases} \alpha_{\mathrm{L}}\, a^{\mathrm{L}}_{f'_{\mathrm{L}} f_{\mathrm{L}}}\, b^{\mathrm{L}}_{f'_{\mathrm{L}} f_{\mathrm{L}}}(p'_{\mathrm{L}}, p_{\mathrm{L}})\, \delta_{f'_{\mathrm{R}} f_{\mathrm{R}}}\, \delta_{p'_{\mathrm{R}} p_{\mathrm{R}}}, & \eta = \mathrm{L}; \\ \alpha_{\mathrm{R}}\, a^{\mathrm{R}}_{f'_{\mathrm{R}} f_{\mathrm{R}}}\, b^{\mathrm{R}}_{f'_{\mathrm{R}} f_{\mathrm{R}}}(p'_{\mathrm{R}}, p_{\mathrm{R}})\, \delta_{f'_{\mathrm{L}} f_{\mathrm{L}}}\, \delta_{p'_{\mathrm{L}} p_{\mathrm{L}}}, & \eta = \mathrm{R}, \end{cases}$$

$$P(p_n = p \mid k_n = k) = \delta_{p\, p_{\eta}}, \tag{5}$$

where $\delta$ denotes Kronecker's delta. Using this model, one can estimate the sequence of latent variables $k_{1:N}$ from a pitch sequence $p_{1:N}$.
This can be done by maximizing the probability $P(k_{1:N} \mid p_{1:N}) \propto P(k_{1:N}, p_{1:N})$. The most probable sequence $\hat{k}_{1:N}$ contains the optimal configuration of hands $\hat{\eta}_{1:N}$, which yields the two separated hand parts, and the optimal fingering for both hands ($\hat{f}^{\mathrm{L}}_{1:N}$ and $\hat{f}^{\mathrm{R}}_{1:N}$). For more details, see [18]. The Gaussian model and the no-information model can be similarly extended to models for both hands.

B) Performance Difficulty

1) Difficulty Measures

One can define a quantitative measure of performance difficulty based on the cost of music performance. From the statistical viewpoint, a natural choice is the probabilistic cost, i.e. the negative logarithm of a probability. To include the dependence on tempo, we define performance difficulty as the time rate of the probabilistic cost:

$$D(t) = -\ln P(p(t)) / \Delta t. \tag{6}$$

Here, $\Delta t$ is a time width, $p(t)$ is the sequence of pitches in the time range $[t - \Delta t/2, t + \Delta t/2]$, and $P(p(t))$ is defined with one of the piano-score models in section II.A. With the fingering model, one can use the joint probability of pitches and fingering to define a difficulty measure [18]:

$$D(t) = -\ln P(p(t), f(t)) / \Delta t, \tag{7}$$

where $f(t)$ denotes the fingering corresponding to the pitches $p(t)$. If the fingering is unknown, one can substitute the maximum-probability estimate $\hat{f}(t)$ in Eq. (7). For each note $n$ with onset time $t_n$, we write $D(n) = D(t_n)$.

The difficulty measure can be defined for each hand part using the pitches in that hand part and a piano-score model for one hand; we denote these by $D_{\mathrm{L}}(t)$ and $D_{\mathrm{R}}(t)$. In addition, the total difficulty can be defined as the sum of the difficulties for both hands: $D_{\mathrm{B}}(t) = D_{\mathrm{L}}(t) + D_{\mathrm{R}}(t)$. The quantity $D_{\mathrm{B}}(t)$ is relevant in addition to $D_{\mathrm{L}}(t)$ and $D_{\mathrm{R}}(t)$, since the overall difficulty can be high even if the difficulties of the individual hand parts are not so high.
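Eq. (6), together with $D_{\mathrm{B}} = D_{\mathrm{L}} + D_{\mathrm{R}}$, can be sketched in Python with the Gaussian model of section II.A.1. This is an illustrative sketch under stated assumptions, not the authors' code: notes are hypothetical (onset in seconds, MIDI pitch) pairs already assigned to one hand, notes in a window are ordered by onset and then from low to high pitch (as for chords), the first note of each window is scored with the initial probability of Eq. (2), $\sigma_p = 4$ is a placeholder, and the reference pitches 48 and 72 assume C4 = 60.

```python
import math

KEYS = range(21, 109)  # 88 piano keys as MIDI note numbers


def smoothed_gaussian(center, sigma, eps):
    """Distribution over the keys proportional to Gauss(p; center, sigma^2) + eps."""
    norm = math.sqrt(2 * math.pi) * sigma
    w = {p: math.exp(-0.5 * ((p - center) / sigma) ** 2) / norm + eps for p in KEYS}
    z = sum(w.values())
    return {p: v / z for p, v in w.items()}


def difficulty(notes, t, dt=1.0, p0=72, sigma=4.0, eps=4e-4):
    """D(t) = -ln P(p(t)) / dt for one hand, Eq. (6).

    notes: list of (onset_seconds, midi_pitch); p(t) holds the pitches whose
    onsets lie in [t - dt/2, t + dt/2], ordered by onset and then by pitch.
    """
    window = [p for onset, p in sorted(notes) if t - dt / 2 <= onset <= t + dt / 2]
    if not window:
        return 0.0  # no notes, no cost
    cost = -math.log(smoothed_gaussian(p0, sigma, eps)[window[0]])
    for prev, cur in zip(window, window[1:]):
        cost -= math.log(smoothed_gaussian(prev, sigma, eps)[cur])
    return cost / dt


def difficulty_both(left, right, t, dt=1.0):
    """D_B(t) = D_L(t) + D_R(t), with reference pitches C3 = 48 and C5 = 72 (assumed)."""
    return difficulty(left, t, dt, p0=48) + difficulty(right, t, dt, p0=72)


dense = [(i / 10, 72 + i % 3) for i in range(10)]  # ten notes within one second
sparse = dense[::2]                                 # every other note
print(difficulty(dense, 0.5) > difficulty(sparse, 0.5))  # True: higher note density
```

Since every note adds a positive cost, doubling the note density in the window raises $D(t)$; dividing by $\Delta t$ is what makes the same passage harder at a faster tempo.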
In previous studies [11, 12], features such as playing speed, note density, pitch entropy, hand displacement rate, hand stretch, and fingering complexity have been considered to estimate the difficulty level of piano scores. These features are incorporated in the above difficulty measures, although implicitly. If one uses the no-information model, the difficulty measure takes into account the note density and playing speed. With the Gaussian model, pitch entropy, hand displacement rate, and hand stretch are incorporated in addition. With the fingering model, fingering complexity is further incorporated.

2) Evaluation

To formally examine how the proposed measures reflect real performance difficulty, we study their relation to the rate of performance errors. We use a dataset [21] consisting of 90 MIDI piano performance signals of 30 classical musical pieces; for each piece, there are performances by three different players that were recorded in international piano competitions. In the dataset, musical notes in a performance signal are matched to notes in the corresponding score, and the following three types of performance errors are manually annotated: pitch error (a performed note with a corresponding note in the score but with a different pitch); extra note (a performed note without a corresponding note in the score); and missing note (a note in the score without a corresponding note in the performance). Timing errors are not annotated in the data and are not considered in this study.

[Figure: three panels (No-information model, Gaussian model, Fingering model) plotting the number of errors against the difficulty value $D_{\mathrm{B}}$.]
Fig. 3. Relations between the difficulty value $D_{\mathrm{B}}$ and the number of performance errors. Points and bars indicate means and standard deviations. Arrows indicate onsets of performance errors (see text).
We calculate performance difficulty values at each onset time and count the number of performance errors in the time range of width $\Delta t$ around that onset time. In the following, we set $\Delta t$ to 1 s. For the Gaussian model, $\epsilon = 4 \times 10^{-4}$ and $p_0$ is C3 (C5) for the left (right) hand. Other parameters of the Gaussian and fingering models are taken from a previous study [18], where a different dataset was used for training.

Fig. 3 shows the relation between the difficulty value $D_{\mathrm{B}}$ and the number of performance errors for the three models. For each model, there is an onset (roughly 10 for the no-information model, 30 for the Gaussian model, and 40 for the fingering model) below which the average number of errors is almost zero and above which it gradually increases. This suggests that the difficulty measures can be used as indicators of performance errors.

For comparative evaluation, we predict performance errors by thresholding the difficulty values and measure the predictive accuracy for the three models. Using three thresholds $D^{\mathrm{th}}_{\mathrm{L}}$, $D^{\mathrm{th}}_{\mathrm{R}}$, and $D^{\mathrm{th}}_{\mathrm{B}}$, a prediction of performance errors at time $t$ is defined as positive if at least one of the three conditions $D_{\mathrm{L}}(t) > D^{\mathrm{th}}_{\mathrm{L}}$, $D_{\mathrm{R}}(t) > D^{\mathrm{th}}_{\mathrm{R}}$, and $D_{\mathrm{B}}(t) > D^{\mathrm{th}}_{\mathrm{B}}$ is satisfied. We count the number of true positives $N_{\mathrm{TP}}$, false positives $N_{\mathrm{FP}}$, and false negatives $N_{\mathrm{FN}}$, and use the following quantities as evaluation measures:

$$P = \frac{N_{\mathrm{TP}}}{N_{\mathrm{TP}} + N_{\mathrm{FP}}}, \quad R = \frac{N_{\mathrm{TP}}}{N_{\mathrm{TP}} + N_{\mathrm{FN}}}, \quad F = \frac{2PR}{P + R}.$$

Since more frequent errors indicate larger difficulty, we can also define the following weighted quantities:

$$P_w = \frac{N'_{\mathrm{TP}}}{N'_{\mathrm{TP}} + N_{\mathrm{FP}}}, \quad R_w = \frac{N'_{\mathrm{TP}}}{N'_{\mathrm{TP}} + N'_{\mathrm{FN}}}, \tag{8}$$

$$F_w = \frac{2 P_w R_w}{P_w + R_w}, \tag{9}$$

where $N'_{\mathrm{TP}}$ and $N'_{\mathrm{FN}}$ are obtained by weighting $N_{\mathrm{TP}}$ and $N_{\mathrm{FN}}$ with the number of performance errors.
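The thresholding rule and the weighted measures of Eqs. (8)–(9) can be sketched as follows. The sketch assumes one record per onset time holding $(D_{\mathrm{L}}, D_{\mathrm{R}}, D_{\mathrm{B}}, \text{number of errors})$; a record is predicted positive when any difficulty value exceeds its threshold, and recall uses false negatives (onsets with errors that are predicted negative) in the denominator, which is the standard definition. The function name and record layout are hypothetical.

```python
def error_prediction_scores(records, th_l, th_r, th_b):
    """Precision, recall and F measure, unweighted and error-weighted (Eqs. (8)-(9)).

    records: iterable of (d_l, d_r, d_b, n_errors), one per onset time.
    """
    tp = fp = fn = 0      # counts of onset times
    tp_w = fn_w = 0       # the same counts weighted by the number of errors
    for d_l, d_r, d_b, n_errors in records:
        predicted = d_l > th_l or d_r > th_r or d_b > th_b
        if predicted and n_errors > 0:
            tp += 1
            tp_w += n_errors
        elif predicted:
            fp += 1       # false positives carry no errors, so they stay unweighted
        elif n_errors > 0:
            fn += 1
            fn_w += n_errors

    def prf(n_tp, n_fp, n_fn):
        p = n_tp / (n_tp + n_fp) if n_tp + n_fp else 0.0
        r = n_tp / (n_tp + n_fn) if n_tp + n_fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

    return {"unweighted": prf(tp, fp, fn), "weighted": prf(tp_w, fp, fn_w)}


records = [
    (50.0, 0.0, 0.0, 3),  # predicted, 3 errors  -> true positive
    (50.0, 0.0, 0.0, 0),  # predicted, no errors -> false positive
    (0.0, 0.0, 0.0, 1),   # missed, 1 error      -> false negative
    (0.0, 0.0, 0.0, 0),   # true negative (not used by P, R, F)
]
print(error_prediction_scores(records, th_l=30, th_r=30, th_b=42))
# {'unweighted': (0.5, 0.5, 0.5), 'weighted': (0.75, 0.75, 0.75)}
```

In the example, the onset with three errors counts three times in the weighted measures, which is why $P_w$, $R_w$, and $F_w$ exceed their unweighted counterparts, mirroring the pattern in the reported results.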
The results are shown in Table 1, where the thresholds are optimized with respect to $F_w$ for each model.

Table 1. Accuracies of performance error prediction (%). Thresholds are listed as $(D^{\mathrm{th}}_{\mathrm{R}}, D^{\mathrm{th}}_{\mathrm{L}}, D^{\mathrm{th}}_{\mathrm{B}})$.

Model            Thresholds      F      P      R      F_w    P_w    R_w
No-information   (9, 10, 14)     52.4   43.0   67.1   69.8   63.0   78.1
Gaussian         (30, 30, 42)    54.2   46.3   65.2   71.3   66.4   77.0
Fingering        (41, 39, 53)    53.9   49.1   59.8   70.6   69.3   73.8

We see that the Gaussian model has the highest F measures, although the differences are rather small. A possible reason is the relatively small size of the data used for training the fingering model. Since the Gaussian model has only one parameter, $\sigma_p$, to train, it generalizes better from such small training data. Such a trade-off between model complexity and the required amount of training data is common in many machine-learning problems. We thus use the difficulty measures defined with the Gaussian model in the following.

III. PIANO REDUCTION METHOD

In the statistical formulation of piano reduction, we seek the optimal reduction score $\hat{R}$ that maximizes the probability $P(R \mid E)$ for a given ensemble score $E$. In analogy with the statistical approach to machine translation [13], we first construct generative models describing the probabilities $P(R)$ and $P(E \mid R)$ and integrate them to calculate $P(R, E) \propto P(R \mid E)$. We then derive optimization algorithms for piano reduction that take into account the constraints on performance difficulty values.

Prior to the main processing step, we convert an input ensemble score into a condensed score by removing redundant notes with the same pitch and simultaneous onset time (the number of such redundant notes is memorized and used later in the calculation of Eq. (13)). Although the two are strictly different, we call such a condensed score an ensemble score in what follows.
What is actually meant by the symbol $E$ is thus a condensed score.

A) Model for Piano Reduction

To construct a generative model that yields the probability $P(R, E)$, we integrate a piano-score model describing the probability $P(R)$ and an edit model that describes the process yielding $P(E \mid R)$. As a piano-score model, we can use either the Gaussian model or the fingering model discussed in section II.A.2, which statistically describes the naturalness of a generated (reduction) score.

For the edit model, we assign probabilities to edit operations applied to musical notes. As in [4], we focus on the two most common edit operations, note deletion and octave pitch shift. As we model the inverse process of generating an ensemble score from a piano score, we introduce probabilities of note addition and octave pitch shift in the edit model. For each note in the ensemble score, the probability that it is an added note, i.e. one not originating from the piano score, is denoted by $\beta_{\mathrm{NP}}$ ('NP' for not played). In this case, the note's pitch $p$ is drawn from a uniform distribution $c_{\mathrm{unif}}(p)$. If it originates from the piano score and the corresponding note has a pitch $q$, the probability of the note's pitch $p$, denoted by $c_q(p) = P(p \mid q)$, is assumed to be

$$c_q(p) = \begin{cases} 1 - 2\gamma_{\mathrm{oct}}, & p = q; \\ \gamma_{\mathrm{oct}}, & p = q \pm 12, \end{cases} \tag{10}$$

where $\gamma_{\mathrm{oct}}$ denotes the probability of an octave shift.

[Figure: generative process from the piano (reduction) score $R$ to the ensemble (condensed) score $E$; the variable $\xi$ selects among the model for not-played notes (NP), the model for the right-hand part (R), and the model for the left-hand part (L), and notes from the hand parts may undergo octave pitch shifts.]
Fig. 4. Generative process of the model for piano reduction.

We can integrate the fingering model in section II.A.2 and the edit model in the following fashion based on the merged-output HMM (Fig. 4), which leads to tractable inference algorithms. For each note in the output ensemble score, indexed by $m$, we introduce a stochastic variable $\xi_m$ that can take one of three values $\{\mathrm{NP}, \mathrm{L}, \mathrm{R}\}$.
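It is generated from a discrete distribution as $P(\xi_m = \xi) = \beta_\xi$, where the parameters $\beta_\xi$ obey $\beta_{\mathrm{NP}} + \beta_{\mathrm{L}} + \beta_{\mathrm{R}} = 1$. If $\xi_m = \mathrm{NP}$, then its pitch $p_m$ has a probability $P(p_m = p) = c_{\mathrm{unif}}(p)$. If $\xi_m = \mathrm{L}$ or $\mathrm{R}$, then its pitch is generated from the component fingering model of the corresponding hand part and may undergo an octave shift. Writing $f_{\mathrm{L}}$, $p_{\mathrm{L}}$, $f_{\mathrm{R}}$, and $p_{\mathrm{R}}$ for the finger and pitch variables of the two component fingering models, the latent state of the merged-output HMM is described by a set of variables $r = (\xi, f_{\mathrm{L}}, p_{\mathrm{L}}, f_{\mathrm{R}}, p_{\mathrm{R}})$. The transition and output probabilities are defined as

$$P(r_m = r \mid r_{m-1} = r') = \begin{cases} \beta_{\mathrm{NP}}\, \delta_{f_{\mathrm{L}} f'_{\mathrm{L}}}\, \delta_{f_{\mathrm{R}} f'_{\mathrm{R}}}\, \delta_{p_{\mathrm{L}} p'_{\mathrm{L}}}\, \delta_{p_{\mathrm{R}} p'_{\mathrm{R}}}, & \xi = \mathrm{NP}; \\ \beta_{\mathrm{L}}\, a^{\mathrm{L}}_{f'_{\mathrm{L}} f_{\mathrm{L}}}\, b^{\mathrm{L}}_{f'_{\mathrm{L}} f_{\mathrm{L}}}(p'_{\mathrm{L}}, p_{\mathrm{L}})\, \delta_{f_{\mathrm{R}} f'_{\mathrm{R}}}\, \delta_{p_{\mathrm{R}} p'_{\mathrm{R}}}, & \xi = \mathrm{L}; \\ \beta_{\mathrm{R}}\, a^{\mathrm{R}}_{f'_{\mathrm{R}} f_{\mathrm{R}}}\, b^{\mathrm{R}}_{f'_{\mathrm{R}} f_{\mathrm{R}}}(p'_{\mathrm{R}}, p_{\mathrm{R}})\, \delta_{f_{\mathrm{L}} f'_{\mathrm{L}}}\, \delta_{p_{\mathrm{L}} p'_{\mathrm{L}}}, & \xi = \mathrm{R}, \end{cases}$$

$$P(p_m = p \mid r_m = r) = \begin{cases} c_{\mathrm{unif}}(p), & \xi = \mathrm{NP}; \\ c_{p_{\mathrm{L}}}(p), & \xi = \mathrm{L}; \\ c_{p_{\mathrm{R}}}(p), & \xi = \mathrm{R}. \end{cases} \tag{11}$$

The model thus generates a piano score specified by $(p^{\mathrm{L}}_m, p^{\mathrm{R}}_m)_m$ and an ensemble score specified by $(p_m)_m$.

In the process of piano reduction, which is explained in the next section, the parameter $\beta_{\mathrm{NP}}$ represents how many notes in the ensemble score are removed. Thus, properly adjusting $\beta_{\mathrm{NP}}$ is crucial to control the performance difficulty of the resulting reduction scores. Roughly speaking, if the note density around a note is high, it is necessary to remove more notes around that note by setting $\beta_{\mathrm{NP}}$ large. In addition, some notes, such as melodic notes and bass notes, are musically more important than others and should have a small probability of deletion, i.e. a small $\beta_{\mathrm{NP}}$ in the present model.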
These conditions can be realized in the following form of $\beta_{\mathrm{NP}}(m)$, which depends on each note $m$:

$$\beta_{\mathrm{NP}}(m) = \bigl(1 - \zeta(m)\bigr)\, e^{-\kappa h(m)}, \tag{12}$$

where $h(m) \ge 0$ represents the musical importance of note $m$, $\kappa > 0$ is a coefficient controlling the effect of $h(m)$, and $\zeta(m) \in [0, 1]$ is a factor controlling the overall rate of note deletion. If $\zeta(m) \simeq 1$, then $\beta_{\mathrm{NP}}(m) \simeq 0$ and almost all notes remain in the reduction score. If $\zeta(m) \simeq 0$, then $\beta_{\mathrm{NP}}(m) \simeq 1$ unless $\kappa h(m)$ is large (i.e. note $m$ is musically important), so most musically unimportant notes will be removed.

In addition to melodic and bass notes, pitches in an ensemble score that are played simultaneously by multiple instruments are naturally regarded as musically important. Thus, the following form is used for defining the musical importance $h(m)$:

$$h(m) = I(m \in M) + I(m \in B) + a\,\mathrm{Mult}(m), \tag{13}$$

where $I(C) = 1$ if a condition $C$ is true and 0 otherwise, $M$ denotes the set of melodic notes, $B$ denotes the set of bass notes, and $\mathrm{Mult}(m)$ is the multiplicity of note $m$, defined as the number of notes in the ensemble score having the same pitch and onset time as note $m$, excluding $m$ itself. The parameters $\kappa$ and $a$ are adjustable, and $\zeta(m)$ is adjusted according to target difficulty values as explained in the next section.

B) Algorithms for Piano Reduction

Let us derive algorithms for piano reduction based on the model in section III.A and the difficulty measures in section II.B. The piano reduction problem is here formulated as finding a reduction score $R$ that maximizes $P(R \mid E)$ for a given ensemble score $E$ under constraints on $R$'s performance difficulty values. Specifically, we impose the following constraints for each note $n$ in $R$:

$$[D_{\mathrm{L}}(n) < \tilde{D}_{\mathrm{L}}] \wedge [D_{\mathrm{R}}(n) < \tilde{D}_{\mathrm{R}}] \wedge [D_{\mathrm{B}}(n) < \tilde{D}_{\mathrm{B}}], \tag{14}$$

where $\tilde{D}_{\mathrm{L}}$, $\tilde{D}_{\mathrm{R}}$, and $\tilde{D}_{\mathrm{B}}$ are target difficulty values.
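The quantities in Eqs. (12)–(14) are simple to compute once the condensed score and the melody and bass annotations are available. The Python sketch below is illustrative only and makes assumptions of its own: notes are hypothetical (onset, pitch) pairs, instrument information is dropped when condensing, and the defaults $a = 0.01$ and $\kappa = 11$ are the values reported in section IV.A.

```python
import math
from collections import Counter


def condense(notes):
    """Merge notes sharing (onset, pitch); Mult(m) counts the merged duplicates."""
    counts = Counter(notes)
    condensed = sorted(counts)
    mult = {note: counts[note] - 1 for note in condensed}  # excludes the note itself
    return condensed, mult


def importance(note, mult, melody, bass, a=0.01):
    """h(m) = I(m in M) + I(m in B) + a * Mult(m), Eq. (13)."""
    return int(note in melody) + int(note in bass) + a * mult[note]


def beta_not_played(note, zeta, mult, melody, bass, kappa=11.0, a=0.01):
    """beta_NP(m) = (1 - zeta(m)) * exp(-kappa * h(m)), Eq. (12)."""
    return (1.0 - zeta) * math.exp(-kappa * importance(note, mult, melody, bass, a))


def satisfies_constraints(d_l, d_r, d_b, target_l, target_r, target_b):
    """Constraints (14) at one note of the reduction score."""
    return d_l < target_l and d_r < target_r and d_b < target_b


ensemble = [(0.0, 60), (0.0, 60), (0.0, 64), (0.5, 67)]  # C doubled by two instruments
notes, mult = condense(ensemble)
melody, bass = {(0.5, 67)}, {(0.0, 60)}
for note in notes:
    # With zeta = 0, unimportant notes get beta_NP = 1; melody/bass notes stay near 0.
    print(note, mult[note], beta_not_played(note, 0.0, mult, melody, bass))
```

With $\zeta(m) = 0$, the unimportant inner note receives $\beta_{\mathrm{NP}} = 1$ and would be deleted, whereas the melodic note and the doubled bass note keep deletion probabilities close to zero because $\kappa h(m)$ is large.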
Without the constraints (14), finding the maximum of $P(R \mid E)$ is a basic inference problem for HMMs and can be solved with the Viterbi algorithm [19]. However, the constraints (14) cannot be easily treated, because the difficulty value at each note depends on the existence of other notes in the time range of $\Delta t$, which violates the Markovian assumption underlying the Viterbi algorithm. In other words, if we knew appropriate values of $\zeta(m)$ in Eq. (12) for controlling difficulty values, the optimization problem would be directly solvable, but finding those values is not easy.

In the following, we present two strategies for optimization. In a previous study [4], appropriate values of $\zeta(m)$ were estimated and the Viterbi algorithm was applied once to obtain the result. A slight extension of this one-time optimization method is presented in section III.B.1. Alternatively, by applying the Viterbi algorithm iteratively, it is possible to find appropriate values of $\zeta(m)$ from tentative results, starting from $\zeta(m) = 1$ and gradually decreasing it. This iterative optimization method is developed in section III.B.2.

1) One-time optimization algorithm

In [4], appropriate values of $\zeta(m)$ were estimated by matching the expected difficulty values to the target values with the following equation:

$$\zeta(m) = \min\left(\frac{\tilde{D}_{\mathrm{L}}}{D_{\mathrm{L}}(m)}, \frac{\tilde{D}_{\mathrm{R}}}{D_{\mathrm{R}}(m)}\right), \tag{15}$$

where $D_{\mathrm{L}}(m)$ etc. represent the difficulty values calculated for the ensemble score at its $m$th note. In general, one can also include a factor involving $D_{\mathrm{B}}(m)$ in the above equation. We can generalize this method by introducing a scaling factor $\rho$ and modifying Eq. (15) to

$$\zeta(m) = \rho \min\left(\frac{\tilde{D}_{\mathrm{L}}}{D_{\mathrm{L}}(m)}, \frac{\tilde{D}_{\mathrm{R}}}{D_{\mathrm{R}}(m)}\right). \tag{16}$$

By choosing the value of $\rho$, one can control the expected average of the resulting difficulty values. For example, one can use the maximum value of $\rho$ for which the constraints (14) are satisfied for most outcomes.
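Eq. (16) can be evaluated note by note from the difficulty values of the ensemble score. The Python sketch below is a minimal illustration with assumptions not stated in the text: the result is clipped to 1 so that $\zeta(m)$ stays in $[0, 1]$ as required by Eq. (12), a note with zero difficulty is treated as unconstrained, and $\rho > 0$. The function name is hypothetical.

```python
import math


def one_time_zeta(d_l, d_r, target_l, target_r, rho=1.0):
    """zeta(m) = rho * min(D~_L / D_L(m), D~_R / D_R(m)), Eq. (16), clipped to [0, 1].

    d_l[m], d_r[m]: difficulty values of the ensemble score at its m-th note.
    Assumes rho > 0 and non-negative difficulty values.
    """
    zetas = []
    for dl, dr in zip(d_l, d_r):
        ratio_l = target_l / dl if dl > 0 else math.inf  # zero difficulty: no pressure to delete
        ratio_r = target_r / dr if dr > 0 else math.inf
        zetas.append(min(1.0, rho * min(ratio_l, ratio_r)))
    return zetas


# Note 0: left-hand difficulty is twice the target; note 1: both hands are below target.
print(one_time_zeta([60.0, 20.0], [30.0, 10.0], target_l=30.0, target_r=30.0))  # [0.5, 1.0]
print(one_time_zeta([60.0, 20.0], [30.0, 10.0], target_l=30.0, target_r=30.0, rho=0.8))  # [0.4, 1.0]
```

Lowering $\rho$ scales every unclipped $\zeta(m)$ down uniformly, which is how the single parameter shifts the expected average of the resulting difficulty values.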
2) Iterative optimization algorithm

For iterative optimization, the Viterbi algorithm is applied in each iteration to obtain a tentative reduction score $R^{(i)}$ with tentative values $\zeta^{(i)}(m)$, where i is the iteration index. For each note n in $R^{(i)}$, we calculate the difficulty values $D_L^{(i)}(n)$, $D_R^{(i)}(n)$, and $D_B^{(i)}(n)$. If the constraints (14) are not all satisfied at note n, we lessen the values of ζ(m) for all notes m in the ensemble score within the time range of width Δt around n as

$$\zeta^{(i+1)}(m) = \lambda\, \zeta^{(i)}(m) \qquad (17)$$

with some constant 0 < λ < 1. The iterative algorithm is initialized with $\zeta^{(1)}(m) = 1$ for all notes m. The algorithm ends when the constraints (14) are satisfied at every note in the reduction score, or when the number of iterations exceeds a predefined value $i_{\max}$.

For efficient and stable computation, the Viterbi algorithm at iteration i + 1 is applied only to those regions of the ensemble score where the constraints (14) are still not satisfied at iteration i. Specifically, we first construct the set of notes m in the ensemble score whose onset time $t_m$ lies in the range $[t_n - \Delta t/2,\ t_n + \Delta t/2]$ around some onset time $t_n$ in the reduction score at which the difficulty constraints are not satisfied. This set is then split into a set Ψ of isolated regions of notes. For each such isolated region, the Viterbi algorithm is applied with fixed boundary states at one note before the beginning of the region and one note after its end.

The iterative algorithm is summarized as follows.
(i) Initialize $\zeta^{(1)}(m) = 1$ and apply the Viterbi algorithm to the whole ensemble score.
(ii) Calculate the difficulty values and obtain the regions Ψ where the constraints (14) are not satisfied. Exit if Ψ is empty or $i \ge i_{\max}$.
(iii) Update the control factor ζ(m) as in Eq. (17) and apply the Viterbi algorithm to each region of Ψ.
Increment i and go back to step (ii).

IV. EVALUATION OF PIANO REDUCTION ALGORITHMS

A) Setup

To evaluate the piano reduction algorithms, we prepared a dataset of orchestral pieces of Western classical music. The dataset consists of 10 pieces by different composers and with different instrumentations; each piece has a length of around 20 bars. The list of pieces is available on the accompanying webpage².

We compare one-time and iterative optimization algorithms based on the Gaussian model and the fingering model; in total we have four methods, labelled One-time Gaussian, One-time Fingering, Iterated Gaussian, and Iterated Fingering. The parameters of the piano-score models are set as in section II.B.2. The other parameters are set as follows: a = 0.01, κ = 11, λ = 0.85, $\gamma_{\mathrm{oct}} = 0.001$, and $\beta_R(m) = \beta_L(m) = (1 - \beta_{\mathrm{NP}}(m))/2$, where $\beta_{\mathrm{NP}}(m)$ is set as in Eq. (12). Difficulty values are calculated with the difficulty measures using the Gaussian model with Δt = 1 s. These parameter values were fixed after some trials by one of the authors, and there is room for further optimization.

As a baseline, we also implement a method based on a simple piano-score model (called the distance model) that takes into account the distance between each note in the ensemble score and its closest melodic or bass note, but not the sequential dependence of pitches. Specifically, for each note m in the ensemble score, the closest melodic or bass note CMB(m) is obtained by searching first in the direction of onset time and then in the direction of pitch. The probability of its pitch $p_m$ is then given as

$$P(p_m) \propto \mathrm{Gauss}\bigl(p_m;\ \mathrm{CMB}(m),\ \sigma_p^2\bigr). \qquad (18)$$

Integrating this piano-score model into the piano reduction model in section III.A and using the iterative optimization algorithm, we obtain a baseline Iterated Distance method.
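The two-stage search for CMB(m) and the Gaussian of Eq. (18) might be sketched as follows. This is one possible reading, with the assumptions labelled in the comments: pitches are MIDI numbers; "searching first in onset time and then in pitch" is interpreted as restricting to the melodic/bass notes nearest in onset time and then taking the nearest of those in pitch; the Gaussian is normalized over the 88 piano keys (MIDI 21 to 108); and σ_p, whose value is not given in this section, is a free parameter.

```python
import math


def cmb_pitch(m, onsets, pitches, mb_indices):
    """Pitch of CMB(m): nearest melodic/bass note, searched first in onset time, then in pitch.

    mb_indices: indices of melodic and bass notes (M union B); assumed non-empty.
    Ties in pitch distance are broken by index order (assumption).
    """
    dt_min = min(abs(onsets[k] - onsets[m]) for k in mb_indices)
    nearest_in_time = [k for k in mb_indices if abs(onsets[k] - onsets[m]) == dt_min]
    k_best = min(nearest_in_time, key=lambda k: abs(pitches[k] - pitches[m]))
    return pitches[k_best]


def distance_model_prob(p, center, sigma_p, lowest=21, highest=108):
    """Eq. (18): P(p) proportional to Gauss(p; center, sigma_p^2).

    Normalized over the keys lowest..highest (assumption; the paper states only proportionality).
    """
    def weight(q):
        return math.exp(-((q - center) ** 2) / (2.0 * sigma_p ** 2))

    return weight(p) / sum(weight(q) for q in range(lowest, highest + 1))
```

Because the probability depends only on the pitch distance to a single anchor note, this baseline ignores how consecutive pitches of the reduction connect, which is the sequential dependence that the Gaussian and fingering models add.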
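The Iterated methods compared in this section all rely on the iterative procedure of section III.B.2. The following Python sketch shows the control flow of steps (i) to (iii), including the construction of the region set Ψ. It is a hedged illustration, not the authors' implementation: the Viterbi decoding of the reduction model and the difficulty measures are passed in as hypothetical callables (`decode`, `difficulties`), ensemble notes are assumed sorted by onset and identified by index, the reduction score is represented as a set of retained ensemble-note indices (ignoring hand assignment), and "isolated regions" are read as maximal runs of consecutive note indices.

```python
def violating_regions(onsets, red_onsets, violated, dt):
    """Notes m with t_m in [t_n - dt/2, t_n + dt/2] for some violating reduction note n,
    split into maximal runs of consecutive indices (the set Psi)."""
    flagged = sorted({
        m
        for m, t_m in enumerate(onsets)
        for t_n, bad in zip(red_onsets, violated)
        if bad and t_n - dt / 2 <= t_m <= t_n + dt / 2
    })
    regions = []
    for m in flagged:
        if regions and m == regions[-1][-1] + 1:
            regions[-1].append(m)  # extend the current run
        else:
            regions.append([m])    # start a new isolated region
    return regions


def iterative_reduction(onsets, decode, difficulties, targets, dt=1.0, lam=0.85, i_max=50):
    """Control-flow sketch of the iterative optimization of section III.B.2.

    decode(zeta, region, kept) -> set of kept note indices; region=None means the whole
        score. Hypothetical stand-in for Viterbi decoding with fixed boundary states.
    difficulties(kept) -> {m: (D_L, D_R, D_B)} for every kept note m (hypothetical).
    targets: (tilde D_L, tilde D_R, tilde D_B) of constraint (14).
    """
    zeta = [1.0] * len(onsets)                  # step (i): zeta^(1)(m) = 1
    kept = decode(zeta, None, set())
    for i in range(1, i_max + 1):
        diff = difficulties(kept)               # step (ii)
        red = sorted(kept)
        # Constraint (14) holds only if every difficulty is strictly below its target.
        violated = [any(d >= t for d, t in zip(diff[m], targets)) for m in red]
        regions = violating_regions(onsets, [onsets[m] for m in red], violated, dt)
        if not regions or i >= i_max:
            break
        for region in regions:                  # step (iii)
            for m in region:
                zeta[m] *= lam                  # Eq. (17)
            kept = decode(zeta, region, kept)   # re-decode this region only
    return kept, zeta
```

Passing the decoder and difficulty function as callables keeps the sketch independent of the merged-output HMM details. Re-decoding only the regions in Ψ, with the neighbouring states held fixed, is what keeps each iteration cheap, and it is also the source of the greedy character discussed in section IV.B.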
B) Quantitative Evaluation of Difficulty Control

We first examine the effect of the iterative optimization algorithms in controlling the difficulty values of output reduction scores, in comparison with the one-time optimization algorithms. We run the four algorithms, One-time Gaussian, One-time Fingering, Iterated Gaussian, and Iterated Fingering, on the test dataset with three sets of target difficulty values $(\tilde{D}_L, \tilde{D}_R, \tilde{D}_B)$ = (15, 15, 30), (30, 30, 40), and (40, 40, 50). For the scaling factor ρ of the one-time optimization algorithms, we test values in {0.1, 0.2, ..., 1.0}. For the iterative optimization algorithms, $i_{\max}$ is set to 50. To evaluate a reduction score R, we compute the difficulty values ($D_L(n)$ etc.) for each note n in R and calculate the following measures:

• Mean difficulty values {$\bar{D}_L$, $\bar{D}_R$, $\bar{D}_B$}:
$$\bar{D}_L = \frac{1}{\#R} \sum_{n \in R} D_L(n) \quad \text{etc.} \qquad (19)$$
• Maximum difficulty values {$D^{\max}_L$, $D^{\max}_R$, $D^{\max}_B$}:
$$D^{\max}_L = \max_{n \in R} D_L(n) \quad \text{etc.} \qquad (20)$$
• Out-of-range rate (proportion of notes whose difficulty values exceed the target values) {$A^L_{\mathrm{out}}$, $A^R_{\mathrm{out}}$, $A^B_{\mathrm{out}}$}:
$$A^L_{\mathrm{out}} = \frac{\#\{n \in R \mid D_L(n) > \tilde{D}_L\}}{\#R} \quad \text{etc.} \qquad (21)$$
• Additional-note rate (number of notes in the reduction score other than melodic and bass notes, relative to the number of melodic and bass notes) $A_{\mathrm{add}}$:
$$A_{\mathrm{add}} = \frac{\#R - \#M - \#B}{\#M + \#B}. \qquad (22)$$

[Figure 5: three panels, one for each target setting (15, 15, 30), (30, 30, 40), and (40, 40, 50); horizontal axis ρ; curves for mean difficulty, maximum difficulty (difficulty values, 0 to 80), and out-of-range rate (0 to 1), all for both hands.]
Fig. 5. Difficulty metrics for the One-time Gaussian method for varying ρ, for three cases of target difficulty values $(\tilde{D}_L, \tilde{D}_R, \tilde{D}_B)$ indicated in the insets. Difficulty values are those for both hands ($\bar{D}_B$, $D^{\max}_B$, etc.), and horizontal lines indicate the corresponding values for the Iterated Gaussian method.

² http://pianoarrangement.github.io/demo.html

Variations of the difficulty values of the reduction scores obtained by the One-time Gaussian method are shown in Fig. 5, together with the corresponding values for the Iterated Gaussian method. For simplicity, only the difficulty values for both hands ($\bar{D}_B$, $D^{\max}_B$, etc.) are shown. For those values of ρ where $A^B_{\mathrm{out}}$ is equivalent to that of the iterative optimization method, $\bar{D}_B$ and $D^{\max}_B$ are smaller than those of the iterative optimization method. This means that, at the same level of satisfaction of the difficulty constraints, the results of the iterative optimization method have larger difficulty values on average, which is a desired property because the allowed difficulty is used more fully. On the other hand, if ρ is increased sufficiently, the one-time optimization algorithm can achieve the same level of $\bar{D}_B$ as the iterative optimization method, but then $A^B_{\mathrm{out}}$ is larger, meaning that the difficulty constraints are less strictly satisfied. Analyses of the difficulty values for each hand and a comparison between the One-time Fingering and Iterated Fingering methods reveal similar tendencies.

Algorithm | Target difficulty | Mean difficulty | Max. difficulty | Out-of-range rate (%) | $A_{\mathrm{add}}$ (%)
One-time Gaussian | (15, 15, 30) | (10.0, 5.4, 15.4) | (22.5, 14.6, 30.5) | (18.6, 7.3, 2.3) | 7.1
Iterated Gaussian | (15, 15, 30) | (11.0, 6.1, 17.0) | (22.3, 15.5, 31.8) | (18.2, 7.2, 2.2) | 20.3
One-time Fingering | (15, 15, 30) | (12.9, 9.1, 22.0) | (30.7, 27.8, 50.5) | (30.0, 18.0, 20.9) | 30.9
Iterated Fingering | (15, 15, 30) | (12.7, 8.4, 21.1) | (29.0, 23.9, 46.5) | (27.5, 15.7, 14.9) | 31.7
Iterated Distance | (15, 15, 30) | (11.9, 6.1, 18.0) | (28.0, 15.7, 37.3) | (23.4, 7.4, 5.2) | 21.8
One-time Gaussian | (30, 30, 40) | (10.4, 5.5, 15.9) | (23.2, 15.4, 30.7) | (0.7, 0, 0.6) | 11.4
Iterated Gaussian | (30, 30, 40) | (16.2, 8.3, 24.5) | (30.0, 21.2, 39.8) | (0.4, 0, 0.6) | 62.3
One-time Fingering | (30, 30, 40) | (13.2, 9.4, 22.7) | (31.8, 28.6, 51.2) | (6.5, 5.8, 11.6) | 33.4
Iterated Fingering | (30, 30, 40) | (16.3, 10.6, 26.9) | (34.3, 28.6, 50.9) | (3.6, 3.0, 6.3) | 60.1
Iterated Distance | (30, 30, 40) | (17.8, 8.3, 26.0) | (35.9, 21.7, 44.8) | (2.4, 0, 2.3) | 61.9
One-time Gaussian | (40, 40, 50) | (13.4, 7.0, 20.4) | (30.6, 19.2, 40.1) | (0.1, 0, 0.1) | 39.0
Iterated Gaussian | (40, 40, 50) | (20.9, 11.1, 32.0) | (36.8, 27.8, 48.8) | (0, 0, 0) | 98.3
One-time Fingering | (40, 40, 50) | (13.5, 9.5, 22.9) | (32.4, 29.2, 51.7) | (2.8, 3.4, 5.7) | 34.7
Iterated Fingering | (40, 40, 50) | (20.2, 13.6, 33.8) | (40.1, 33.1, 54.9) | (1.7, 1.0, 1.6) | 88.9
Iterated Distance | (40, 40, 50) | (22.1, 10.3, 32.4) | (42.5, 27.3, 53.6) | (0.8, 0, 0.8) | 88.3

Table 2. Comparison of average values of difficulty metrics for reduction scores. Triplets in parentheses give the values for the left-hand part, the right-hand part, and both hands, from left to right.

The results for all three kinds of difficulty values (for each of the two hands and for both hands) are shown in Table 2. Here, for the one-time optimization methods, results are shown for the smallest value of ρ such that all three out-of-range rates reach or exceed those of the corresponding iterative optimization methods. In addition to the tendencies found in the above analysis, one can observe that, at the same level of satisfaction of the difficulty constraints, the iterative optimization methods yield larger additional-note rates than the corresponding one-time optimization methods. These results indicate that the iterative optimization methods are more appropriate for controlling difficulty values.
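The evaluation measures of Eqs. (19) to (22) reduce to simple aggregations over the per-note difficulty values of a reduction score. The Python sketch below computes them for one score; the input layout (a list of $(D_L, D_R, D_B)$ triples, one per note of R, plus the counts #M and #B) is an assumption made for illustration.

```python
def difficulty_metrics(diff, targets, n_melodic, n_bass):
    """Eqs. (19)-(22) for one reduction score R.

    diff: list of (D_L(n), D_R(n), D_B(n)) triples, one per note n in R (non-empty).
    targets: (tilde D_L, tilde D_R, tilde D_B).
    n_melodic, n_bass: #M and #B.
    """
    n_r = len(diff)
    columns = list(zip(*diff))  # (D_L values, D_R values, D_B values)
    mean = tuple(sum(col) / n_r for col in columns)                  # Eq. (19)
    maximum = tuple(max(col) for col in columns)                     # Eq. (20)
    # Eq. (21): a note counts as out of range only if it strictly exceeds the target.
    out_of_range = tuple(
        sum(1 for d in col if d > t) / n_r for col, t in zip(columns, targets)
    )
    additional = (n_r - n_melodic - n_bass) / (n_melodic + n_bass)   # Eq. (22)
    return {"mean": mean, "max": maximum, "out_of_range": out_of_range, "additional": additional}
```

Table 2 reports these quantities averaged over the test pieces, with the rates expressed as percentages.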
Even for the iterative optimization algorithms, the out-of-range rates can be nonzero, especially for small target difficulty values. One reason is that, for some pieces, the minimal reduction score with only melodic and bass notes has difficulty values larger than the target values. Another reason is the greedy nature of the iterative optimization algorithms: when some regions of the reduction score are fixed and used as boundary conditions for updates, the Viterbi search sometimes cannot remove notes even for smaller values of ζ(m). Comparing the iterative optimization methods for target difficulty values (30, 30, 40) and (40, 40, 50), we find that while the Iterated Gaussian method has the largest additional-note rate, it has the smallest values for most difficulty evaluation measures. If the additional-note rate increases with the fidelity to the original ensemble score, this indicates that the Iterated Gaussian method can efficiently increase the fidelity while retaining low difficulty values. This is probably because the Gaussian model is also the one used for calculating the difficulty measures.

C) Subjective Evaluation

We conduct a subjective evaluation experiment to evaluate the quality of the reduction scores produced by the proposed algorithms³. In particular, we examine how many of the additional notes (notes other than melodic and bass notes) are actually playable and how musical qualities such as fidelity and difficulty change with varying target difficulty values. For this, we asked professional piano arrangers to evaluate the piano reductions generated by the Iterated Fingering, Iterated Gaussian, and Iterated Distance methods with three sets of target difficulty values (15, 15, 30), (30, 30, 40), and (40, 40, 50). Two music arrangers participated in the evaluation, and each reduction score was evaluated by one of them.
Evaluators are provided with manually typeset reduction scores, the input condensed scores, and the corresponding audio files of the 10 tested musical pieces, which are uploaded to the accompanying demo page⁴. The evaluation metrics are as follows:
• Musical fidelity (10 steps; 1: not faithful at all, ..., 10: very faithful): how faithful the reduction score is to the original ensemble score in terms of music acoustics.
• Subjective difficulty (10 steps; 1: very easy, ..., 10: very difficult): how difficult the reduction score is to play with two hands.
• Musical naturalness (10 steps; 1: very unnatural, ..., 10: very natural): how natural the reduction score is as a piano score.
• Number of unplayable notes $N_{\mathrm{unp}}$: how many notes, and which notes, should be removed from the reduction score to make it playable by a skillful pianist. We define the unplayable-note rate $A_{\mathrm{unp}}$, a quantity normalized by the number of additional notes:
$$A_{\mathrm{unp}} = \frac{N_{\mathrm{unp}}}{\#R - \#M - \#B}. \qquad (23)$$

³ Readers who wish to have access to the raw experimental data and source code should contact the authors.
⁴ http://pianoarrangement.github.io/demo.html

[Figure 6: four panels with the additional-note rate (%) on the horizontal axis (0 to 120) and, on the vertical axis, (a) musical fidelity, (b) subjective difficulty, (c) musical naturalness, and (d) unplayable-note rate (%); methods: Iterated Fingering, Iterated Gaussian, Iterated Distance.]
Fig. 6. Subjective evaluation results. For each method, the average results for the three sets of target difficulty values are indicated with points. Bars indicate their standard errors.

Results are summarized in Fig. 6, where statistics (mean and standard deviation) are shown for each evaluation metric and each method. The results in Figs. 6(a) and 6(b) indicate that subjective difficulty and musical fidelity monotonically increase with the additional-note rate, which confirms the ability of the proposed methods to control performance difficulty. For these two quantities, the three methods differ little. The result in Fig. 6(c) shows that musical naturalness tends to decrease as the additional-note rate increases. This can be understood from the fact that when $A_{\mathrm{add}} \simeq 0$ the reduction score consists mostly of melodic and bass notes, which should have high naturalness, whereas for larger $A_{\mathrm{add}}$ it becomes more demanding for the models to retain naturalness. For the highest-difficulty case, with target difficulty values (40, 40, 50) and $A_{\mathrm{add}} \sim$ 90%–100%, the Iterated Gaussian and Iterated Fingering methods outperform the baseline Iterated Distance method. This suggests the importance of incorporating the sequential dependence of pitches in the piano-score model for improving musical naturalness. The result in Fig. 6(d) shows that, especially in the high-difficulty regime, the unplayable-note rate is reduced by incorporating the sequential dependence of pitches in the piano-score model, and even further by incorporating the fingering motion. This suggests that, although the same difficulty measure is used for all methods and it is not a perfect measure of the real difficulty of a piano score, a better piano-score model can generate reduction scores with fewer unplayable notes.

D) Example Results and Discussions

Examples of piano reduction scores obtained by the Iterated Fingering method are shown in Fig. 7, together with the results of the subjective evaluation (the evaluation scores are given for the whole piece, including the part not shown in the figure)⁵. Results for larger target difficulty values have more harmonizing notes and are given larger fidelity and subjective-difficulty values.
In the cases with $(\tilde{D}_L, \tilde{D}_R, \tilde{D}_B)$ = (15, 15, 30) and (30, 30, 40), all notes in the shown section are playable, and the latter case has a larger musical-naturalness value. On the other hand, there are several unplayable notes in the case with $(\tilde{D}_L, \tilde{D}_R, \tilde{D}_B)$ = (40, 40, 50), which leads to a smaller musical-naturalness value. We were informed by the evaluators (professional music arrangers) that keeping more notes in a reduction score does not always improve musical naturalness; one reason is that adding too many notes can reduce the flexibility for performance expression. We therefore see two important directions for further improving the piano reduction methods. One is to construct a more precise fingering model and difficulty measures based on it. However, as discussed in section II.B.2, a more complex model typically requires more training data to be learned appropriately. Since a large-scale fingering dataset is currently not available, constructing such a dataset is also an important issue. The other is to incorporate more musical knowledge into the piano reduction model, particularly on harmonic aspects (e.g. completion of chordal notes and voice leading) and cognitive aspects (e.g. restricting notes above melodic notes to avoid mishearing of melodies). Other remaining issues are the identification of melodic and bass notes and score typesetting for reduction scores, which are currently done manually.

⁵ See http://pianoarrangement.github.io/demo.html for more examples with sound files.

[Figure 7: condensed score (input) and three reduction scores (output), with unplayable notes marked.]
Fig. 7. Examples of piano reduction scores obtained by the Iterated Fingering method (Wagner: Prelude to Die Meistersinger von Nürnberg). For clear illustration, only the first 9 bars of a 27-bar excerpt in the test data are shown. Notes marked as unplayable are those identified by the evaluator. Panel annotations:
$(\tilde{D}_L, \tilde{D}_R, \tilde{D}_B)$ = (40, 40, 50): Fidelity = 7, Difficulty = 8, Naturalness = 4, $A_{\mathrm{unp}}$ = 4.7%
$(\tilde{D}_L, \tilde{D}_R, \tilde{D}_B)$ = (30, 30, 40): Fidelity = 4, Difficulty = 5, Naturalness = 6, $A_{\mathrm{unp}}$ = 1.6%
$(\tilde{D}_L, \tilde{D}_R, \tilde{D}_B)$ = (15, 15, 30): Fidelity = 3, Difficulty = 4, Naturalness = 5, $A_{\mathrm{unp}}$ = 4.0%
As for the identification of melodic and bass notes, a simple method of taking the instrument part with the highest (lowest) mean pitch in each bar as the melody (bass) part can reproduce 40.6% (57.0%) of the indications in our test data. While this calls for a more refined method for automatically estimating melodic and bass notes, we noticed that the choice is also subjective, and it may be important to leave room for user preferences. Finally, since the evaluation is subjective, it is also important to consider multiple ratings given by different evaluators. Such a large-scale subjective evaluation would be valuable for revealing finer relations between human evaluations and the model's predictions.

V. CONCLUSION

We have described quantitative measures of performance difficulty for piano scores and a piano reduction method that can control the difficulty values based on statistical modelling. We followed the quantification of performance difficulty using the statistical models proposed in [18] and found that the difficulty values can be used as indicators of performance errors. For the current amount of training data, we also found that the difficulty measures based on the Gaussian model yield the best accuracy in predicting performance errors. The piano reduction problem was formalized as a statistical optimization problem following the framework of [4], and we improved the optimization method by proposing an iterative method. We confirmed the efficacy of the iterative optimization method, and the algorithms were shown to be able to control subjective difficulty and musical fidelity. It was also found that incorporating sequential dependence and fingering motion in the piano-score model, by using the Gaussian and fingering models, improves the generated reduction scores in terms of musical naturalness and the rate of unplayable notes. Directions for further improvements were also discussed.
Whereas this study assumed that the same difficulty measures apply universally to all players, they can differ between individual players depending on, for example, the size of their hands. In the present framework, part of such individuality can be expressed by adapting the fingering model to individual players. This model adaptation can in principle be realized if one has a sufficient number of musical scores that have already been played by an individual player. Another interesting direction is to adapt an individual's fingering model using the frequency of errors in his/her performance data, which could reduce the amount of necessary data.

A limitation of the present model is that timing errors and other rhythmic aspects are not considered. Rhythmic features may become important, especially in polyrhythmic passages in which the left- and right-hand parts have contrasting rhythms (e.g. two-against-three rhythms). In such cases, the sum of the difficulty values for the two hands may underestimate the total difficulty. To properly deal with these problems, it would be necessary to incorporate a performance timing model and the interdependence between the two hands into the present framework.

The present formulation of combining a musical-score model and an edit model can also be applied to other forms of music arrangement if one replaces the piano fingering model with an appropriate score model of the target instrumentation/style and adapts the edit model to the relevant edit operations. For example, combining a score model for jazz music with a suitable edit model could make it possible to develop a method for arranging a given piece in the rock music style (or other styles) into a piece in the jazz style. Although this study has focused on piano arrangement, the framework can also be useful for music transcription [22].
In music transcription, musical-score models play an important role in inducing an output score that respects musical grammar, the style of the target music, etc. [23, 24]. Especially in piano transcription, the results of multi-pitch detection contain a significant number of spurious notes (false positives), which often make the transcription results unplayable [25]. By integrating the present piano-score model with an acoustic model (instead of the edit model) and applying the optimization method developed in this study, one can impose constraints on the performance difficulty of transcription results and reduce these spurious notes.

FINANCIAL SUPPORT

This study was partially supported by JSPS KAKENHI Nos. 26700020, 16H01744, 16J05486, 16H02917, and 16K00501, and JST ACCEL No. JPMJAC1602.

STATEMENT OF INTEREST

None.

REFERENCES

[1] Chiu, S.-C.; Shan, M.-K.; Huang, J.-L.: Automatic system for the arrangement of piano reduction, in IEEE International Symposium on Multimedia, San Diego, California, 2009, 459–464.
[2] Onuma, S.; Hamanaka, M.: Piano arrangement system based on composers' arrangement processes, in International Computer Music Conference, New York, 2010, 191–194.
[3] Huang, J.-L.; Chiu, S.-C.; Shan, M.-K.: Towards an automatic music arrangement framework using score reduction. ACM Transactions on Multimedia Computing, Communications, and Applications, 8(1) (2012), 8:1–8:23.
[4] Nakamura, E.; Sagayama, S.: Automatic piano reduction from ensemble scores based on merged-output hidden Markov model, in International Computer Music Conference, Denton, Texas, 2015, 298–305.
[5] Takamori, H.; Sato, H.; Nakatsuka, T.; Morishima, S.: Automatic arranging musical score for piano using important musical elements, in International Sound and Music Computing Conference, Aalto, 2017, 35–41.
[6] Tuohy, D.R.; Potter, W.D.: A genetic algorithm for the automatic generation of playable guitar tablature, in International Computer Music Conference, Barcelona, 2005, 499–502.
[7] Hori, G.; Yoshinaga, Y.; Fukayama, S.; Kameoka, H.; Sagayama, S.: Automatic arrangement for guitars using hidden Markov model, in International Sound and Music Computing Conference, Copenhagen, 2012, 450–456.
[8] Hori, G.; Kameoka, H.; Sagayama, S.: Input-output HMM applied to automatic arrangement for guitars. J. Info. Processing Soc. Japan, 21(2) (2013), 264–271.
[9] Maekawa, H.; Emura, N.; Miura, M.; Yanagida, M.: On machine arrangement for smaller wind-orchestras based on scores for standard wind-orchestras, in International Conference on Music Perception and Cognition, Bologna, 2006, 268–273.
[10] Crestel, L.; Esling, P.: Live orchestral piano, a system for real-time orchestral music generation, in International Sound and Music Computing Conference, Espoo, 2017, 434–442.
[11] Chiu, S.-C.; Chen, M.-S.: A study on difficulty level recognition of piano sheet music, in IEEE International Symposium on Multimedia, Irvine, California, 2012, 17–23.
[12] Sébastien, V.; Ralambondrainy, H.; Sébastien, O.; Conruyt, N.: Score analyzer: Automatically determining scores difficulty level for instrumental e-learning, in International Conference on Music Information Retrieval, Porto, 2012, 571–576.
[13] Brown, P.F.; Pietra, V.J.D.; Pietra, S.A.D.; Mercer, R.L.: The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2) (1993), 263–311.
[14] Parncutt, R.; Sloboda, J.A.; Clarke, E.F.; Raekallio, M.; Desain, P.: An ergonomic model of keyboard fingering for melodic fragments. Music Perception, 14(4) (1997), 341–382.
[15] Hart, M.; Tsai, E.: Finding optimal piano fingerings. The UMAP Journal, 21(1) (2000), 167–177.
[16] Al Kasimi, A.; Nichols, E.; Raphael, C.: A simple algorithm for automatic generation of polyphonic piano fingerings, in International Conference on Music Information Retrieval, Vienna, 2007, 355–356.
[17] Yonebayashi, Y.; Kameoka, H.; Sagayama, S.: Automatic decision of piano fingering based on hidden Markov models, in International Joint Conference on Artificial Intelligence, Hyderabad, 2007, 2915–2921.
[18] Nakamura, E.; Ono, N.; Sagayama, S.: Merged-output HMM for piano fingering of both hands, in International Conference on Music Information Retrieval, Taipei, 2014, 531–536.
[19] Rabiner, L.: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 77(2) (1989), 257–286.
[20] Nakamura, E.; Yoshii, K.; Sagayama, S.: Rhythm transcription of polyphonic piano music based on merged-output HMM for multiple voices. IEEE/ACM Trans. on Audio, Speech and Language Processing, 25(4) (2017), 794–806.
[21] Nakamura, E.; Yoshii, K.; Katayose, H.: Performance error detection and post-processing for fast and accurate symbolic music alignment, in International Conference on Music Information Retrieval, Suzhou, 2017, 347–353.
[22] Benetos, E.; Dixon, S.; Giannoulis, D.; Kirchhoff, H.; Klapuri, A.: Automatic music transcription: Challenges and future directions. J. Intelligent Information Systems, 41(3) (2013), 407–434.
[23] Raczyński, S.; Vincent, E.; Sagayama, S.: Dynamic Bayesian networks for symbolic polyphonic pitch modeling. IEEE Trans. on Audio, Speech, and Language Processing, 21(9) (2013), 1830–1840.
[24] Ycart, A.; Benetos, E.: Polyphonic music sequence transduction with meter-constrained LSTM networks, in IEEE International Conference on Acoustics, Speech, and Signal Processing, Calgary, 2018, 386–390.
[25] Nakamura, E.; Benetos, E.; Yoshii, K.; Dixon, S.: Towards complete polyphonic music transcription: Integrating multi-pitch detection and rhythm quantization, in IEEE International Conference on Acoustics, Speech, and Signal Processing, Calgary, 2018, 101–105.

Biographies

Eita Nakamura received his Ph.D. degree in physics from the University of Tokyo, Tokyo, Japan, in 2012. After having been a Postdoctoral Researcher at the National Institute of Informatics, Meiji University, and Kyoto University, Kyoto, Japan, he is currently a Research Fellow of the Japan Society for the Promotion of Science. His research interests include music modeling and analysis, music information processing, and statistical machine learning.

Kazuyoshi Yoshii received his M.S. and Ph.D. degrees in informatics from Kyoto University, Kyoto, Japan, in 2005 and 2008, respectively. He is currently a Senior Lecturer at the Graduate School of Informatics, Kyoto University, and concurrently the Leader of the Sound Scene Understanding Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan. His research interests include music analysis, audio signal processing, and machine learning.
