Convergence and error propagation results on a linear iterative unfolding method

Unfolding problems often arise in the context of statistical data analysis. Such problematics occur when the probability distribution of a physical quantity is to be measured, but it is randomized (smeared) by some well understood process, such as a …

Authors: Andras Laszlo

Convergence and error propagation results on a linear iterative   unfolding method
CONVERGENCE AND ERROR PR OP A GA TION RESUL TS O N A LINEAR ITERA TIVE UNF OLDING METHOD ANDR ´ AS L ´ ASZL ´ O ∗ Abstract. Unfolding problems often arise in the con text of statistical data analysis. Such prob- lematics occur when the probability distribution of a p h ysical quan tit y is to be measured, but it is randomized (smeared) b y some we ll understo od process, such as a non-ideal detector response or a we ll describ ed physical phenomenon. In suc h case it is said that the ori ginal probability distribution of int erest is folded b y a kno wn resp onse function. The reconstruction of the original probabilit y distribution from the measured one i s called unfolding. That tec hnically inv olv es ev aluation of the non-bounded inv erse of an integral op erator o ver the space of L 1 functions, whic h is kno wn to be an ill-p osed problem. F or the p ertinent r egularized operator inv ersion, we propose a linear iterative formula and provide pro of of con v ergence in a pr obabilit y theory con text. F urthermore, we provide formulae for error estimates at finite i teration stopping order which are of utmost i mp ortance in practical applications: the approximation error, the propagate d statistical error, and the propagated systematic error can b e quantified. The arguments are based on the Riesz-Thorin theorem mapping the original L 1 problem to L 2 space, and subsequent application of ordinary L 2 spectral theory of op- erators. A library implementation in C of the al gorithm along with corresp onding error propagation is also prov ided. A n umerical example also il lustrates the method in operation. Key words. unfolding; conv ergence; error pr opagation; probability the ory; statistics; functional analysis; Riesz-Thorin theorem AMS sub ject cla ssi fications. 46E30; 46E27; 62H99 1. I n tro duc tion. In analysis of expe r imental data one commonly faces the prob- lem that the proba bilit y densit y function ( p df ) of a g iven ph ysical quantit y of interest is to b e measured, but some r andom physical pro cess, such as the intrinsic b ehavior of the measurement apparatus smears it. The reconstruction of the p ertinent unknown pdf of interest based on the obse r ved smeared pdf and on the known r esp onse function of the measur e men t pro cedure is called unfolding. More spe cifically , one of the mos t co mmon unfolding scenarios turning up in ex- per imental data analysis is the following. L e t x 7→ f ( x ) be the unknown p df which w e int end to r econstruct, ( y , x ) 7→ ρ ( y | x ) be the known resp onse function of the smearing effect, a nd we a ssume that y 7→ g ( y ) = R ρ ( y | x ) f ( x ) d x is the measured p df after the smearing effect, called folding. In practice, actually often only a statistical estimator of g c an be measure d. Or, putting it differently , g often contains an additiona l error term y 7→ e ( y ) or iginating from statistical co un ting and unaccounted sy stematic mea- surement distortions , in which ca se o ne has y 7→ g ( y ) = R ρ ( y | x ) f ( x ) d x + e ( y ) as the measured pdf estimator. The task of unfolding is to provide some clo s e estimate for x 7→ f ( x ), given y 7→ g ( y ) and ( y , x ) 7→ ρ ( y | x ) along with some estimate on y 7→ e ( y ), i.e. to solve the ab ov e linear integral eq uation. It is quite well known in the liter a ture that such a problem is numerically ill-po sed. The pr imary reaso n for this is Banac h’s closed graph theor e m: due to the pertinent theor e m a generic folding o pe r ator ma ps certain dis tant p dfs to clos e ones whos e differences a fter the folding ar e shadow ed by the c o ntribution of the meas urement err or term e . That quite well unders to o d phenomenon is summarize d e.g. in [1, 2 , 3, 4, 5, 6, 7 , 8]. The problematics of unfolding can a lso be formulated using a la nguage poss i- bly mo re familiar to sta tisticians [9, 1 0]. Let x 1 , . . . , x n be statistica l instances of a ∗ Wigner Research Cen tre f or Physics, Konk oly-Thege M. u. 29-33, Budapest, H-1121, Hungary . (e-mail: laszlo.andras@wigner.mta.h u). 1 2 Andr´ as L´ aszl´ o probability v ariable x , i.e . independent identically distributed rando m v ariables , ea ch having the same but unknown p df f . In the exp erimental s etting, mer ely the random v ariables y i = x i + ε x i ,i ( i = 1 , . . . , n ) ar e observed, i.e. the original x i ( i = 1 , . . . , n ) random v ariables corrupted by an x -dep endent, but otherwise indep endent identi- cally distributed e r ror v ar iable ε x , having a known x -dep endent p df ε x 7→ ρ ( ε x + x | x ) for each fix e d v alue o f x a s a condition. Given all these, the task of unfolding is to provide an estimator fo r the p df f of the undistorted probability v ariable x . In some real exp erimental s ituation, it als o happens that the individual obser ved samples y i = x i + ε x i ,i ( i = 1 , . . . , n ) a r e not published, only their p df estimator g is made av a ila ble, for instance b e c ause ther e is some correction pro cedur e on the p df level, e.g. for inefficiencies. Also, our mo del ( y , x ) 7→ ρ ( y | x ) for the r esp onse function mig h t b e systematically inaccura te, for whic h inaccuracy only a n uppe r b ound might be known. Therefore, often not the sample based observ a tio nal mo del, but rather the previously discussed p df estimator based o bserv ational mo del is more pr a ctical to handle. But whichev er wa y the problem is formulated — based on individual samples or on p dfs — the task remains to be ill-p osed. In order to ov erco me the ill-posedness of the unfolding problem, all the methods use r estrictions on the unknown p df, a nd in s o me sp ecia l ca ses prop erties o f the resp onse function can also b e use d to improv e the situation. F or instance , in the field of image or signal pro cessing, the shap e of the respo nse function is tra nslationally inv ar ia nt in an exact manner , i.e. for all x, y , z one has ρ ( y | x + z ) = ρ ( y − z | x ), and th us the unfolding r educes to the problematics of deconv o lution. In the language o f statistical samples, this would co rresp ond to the obser v ational model when y i = x i + ε i ( i = 1 , . . . , n ) ar e observed, with indep endent identically distributed random v ar iables ε i of a k nown distribution, not dep ending o n x . Due to the a pplicational imp ortanc e of the sp ecial case of deconv olution problems, that br anch has a whole strea m of literature [9, 10, 11, 1 2, 13, 1 4, 15, 16]. The statistical dec onv olution metho ds heavily rely on the applicability o f con volution theorem for the F our ier transformed pdfs, which is p ossible due to the translatio nal in v ariance of the shap e of the resp o nse function, i.e. relies o n the fact that the probability v ariables ε i ( i = 1 , . . . , n ) are independent identically distributed and a re independent from x . The ill-p osedness of the pr oblem, similarly to the case o f a n y generic unfolding metho d, is re gularized by finding and approximativ e so lutio n. The optimal approximation is co n trolled b y the application of the minimax pr inc iple : for a given estimate of the true deconvolv ed p df, a loss (pena lty) function is defined, a nd the minimum of the worst ca se exp e c ted loss is lo oked for a s a function of the r egulariza tion par a meters. It is worth to note that mo st of the adv a nced statistical deco nvolution metho ds ca n work on un binned samples , i.e. they do no t need an a priori histograming of the o bserved data. In Section 6 an illustrative numerical unfolding toy mo del a pplication is pr esented, which also tr ie s to clarify that in an exp erimental con text more gener al approaches than deconv olutio n are also needed in or der to handle re a l meas ur ement situations. Also in the ca se of g eneric — i.e. no n- deconv olutio n — unfolding problems a regular iz ation method must b e applied [1, 3, 4, 5, 6, 7, 8, 16, 17, 18, 19] and an a p- proximate solution of the folding in tegral equation within a r e duced set of a llowed p dfs is searched for. The a ppr oximation is c ontrolled by so me r egulariza tion para meters whose particular v alue brings in a certain degr ee of arbitra r iness to the unfolded pdf (approximation error), which is often difficult to quantify . There are basica lly three main widespread wa ys in the literature addressing the problem o f regular ization. (i) In c ertain data a nalysis problems a parametric ansatz for the unknown p df f is Con v ergence and error propagation r esults on a linear iterative unfoldi ng metho d 3 justified. In that case, one can construct the folded version of f by the resp onse function ρ n umer ically , and that can b e fitted to the observed folded p df g , for instance via a ma ximum likelihoo d metho d. Such method is used for instanc e in inclusive pa rticle identification in e x per imental high energ y particle physics (see for instance [20]). Due to the ill-po sedness of the unfolding problem, one may run into a situation in particular ca ses when the fit is ins e nsitive to some details of the parametrica lly given f . In other words: the log-likelihoo d function ( χ 2 ) may be flat in the direction of certain parameters of the ansa tz for f . (ii) Bin-by-bin fitting of the histogr amed f , such that when numerically folding it by ρ the result gets close to the observed folded p df g , e.g. in a max im um likeliho o d sense. This is very similar to a ppr oach (i) with every bin a mplitude of the histogramed f being a fit para meter. This method is basically equiv ale n t to the naive inv ers ion of the discretized folding op erator as a matrix . Due to the ill- po sedness of the unfolding problem, this is not satisfactory in itself. The usual pro cedure is to add so me artificial p enalty function to the log- likelihoo d function ( χ 2 ) in order to suppress the la rge lo cal gradients. If that is perfor med, the metho d can deliver meaningful answers, but the introduced sys tematic bias by the additional penalty function is difficult to quantify . In addition, similarly to the metho d (i), the fit can b e slightly insensitive to the details of f due to the ill- po sedness o f the pr oblem. The so called SVD metho ds [17] are implementations of this idea. (iii) There a re also iterative metho ds which intend to approximate the true pdf f , given the measure d folded p df g and the resp ons e function ρ . One of the most po pular and mos t pr omising metho ds is the method of conv ergent weights, also c a lled iterative Bay esian unfolding. It was firs t discovered and applied b y Richardson [2 1] and Lucy [22] for imag e pro cessing. Later it was re-discov e red and applied to tomography pr oblems by Shepp and V ar di [23], and b y Kondor [24]. The first serious mathematical s crutiny of the method was done b y M ¨ ulthei and Schorr [25, 26]. In the mid-9 0 s d’Ago stini re-discov ered and po pularized the algorithm in the high energy ph ys ics co mm unit y [18]. Recently , Zec h [19] stud- ied p ossible o ptimal iteration stopping cr iteria for the a lgorithm. One of the main adv an tages of the method of conv er gent weigh ts or Bayesian unfolding is, that it takes in to account the non-negativity and the unitness of the integral of the true pdf f in a n exact manner. F urthermore, if the measur ed folded p df g was a his togram, i.e. its v alues fluctuate a ccording to Poisson counting statis- tics, then the itera tive approximants to f hav e incr easing likelihoo d [25], i.e. the algorithm is a re a lization of a maximum-lik eliho o d approximation. Most unfo r - tunately , despite o f the resea rch efforts [25], there ar e no results stating that the metho d is conv er gent, a lthough num erical evidence suggests its co nv erg en t nature. Mo reov e r , ther e are no exact err or propag ation formulae av a ilable. In ca se of a consistent metho d the a ppr oximation error should conv e rge to zer o when the r egulariza tion para meters are relaxed. In case of an itera tiv e metho d, a n approximating sequence ( f N ) N ∈ N 0 to the unknown f is constructed a nd the regula r- ization parameter is mer e ly the iter ation s topping order N max , i.e. a thresho ld index in the approximating sequence. When a n iterative unfolding metho d is co nsistent, the approximation error , i.e. the dista nce of f N to the true unkno wn f m ust con verge to zero with increas ing num b er of iterations N . A lthough the ab ove consistency prop- erty is an obvious minimal require ment for any unfolding metho d, often this is not easy to show analytically . 4 Andr´ as L´ aszl´ o In a previous pap er [1] we pro po sed a linear iterative unfolding metho d, discussed its pros and cons in comparis on to o ther tec hniques, provided deta ile d description fr o m the pra ctical p oint of vie w for exp er imen talists, along with providing a set of r e le v ant application examples . In the present pap er we provide formal mathematica l pro ofs for the claims therein for the pr op osed unfolding metho d: (i) pro of o f consistency , i.e. that the approximation error conv er ges to zero with increasing n umber of iter a tions, (ii) explicit formula for the approximation error at finite iteration o rder, (iii) explicit formula for the propagated statistical errors on the unfolded p df at finite iteration order given the statistical er r ors of the mea sured folde d pdf, (iv) explicit formula for the propagated systematic e rrors on the unfolded pdf at finite itera tion order given the sys tematic erro rs of the measured folded pdf or of the resp ons e function. Because o f (ii)–(iv) the comp eting error terms b eco me calculable, and therefore these can b e used to define an optimal iteration stopping cr iterion. In addition, the per tinent error terms can b e determined at this optimum. The quantification of these are of utmost imp ortance w he n presenting unfolded exp e r imental results, and is gener ally an unresolved task for other wide ly us ed unfolding metho ds. The key mathematical ingredient of the pr o ofs are mapping o ur orig inally L 1 problem to the L 2 space using Riesz-Thorin theorem, and using sp ectral repre sentation of the op era tors therein. The actual iteration for m ula is formally motiv ated by a preco nditioned Neumann- Landweber-Richardson series , but these ar e no t automatically conv erg e nt in case of L 1 problems: our sp ecific pr econditioning makes the iteration conv ergent in the L 1 setting, given some quite generic conditions. The prop ose d metho d a ls o do es not rely on a n inherent discr etization of the p dfs: it do es work als o in the con tin uum limit or with a ny type of densit y estimators. 1 The obtained results can b e particularly interesting a s the prop osed metho d can be considered as the “linea rized” version of the metho d of convergen t w eig h ts or iterative Ba yesian unfolding [18, 19, 21, 2 2, 23, 24, 25, 26]. By understanding the conv er g ence conditions and error propa gation for the prop osed metho d, the studies of M ¨ ulthei and Sc ho r r [25] could even tually be co mpleted on the Bayesian itera tion, which would be a significant improvemen t in the field. The pap er is orga nized a s follows: in Section 2 the pro blem of unfolding is in tro- duced in a mathematically rigo rous wa y , and the basic prop erties of g eneric folding op erators ar e discussed. In Section 3 o ur prop ose d unfolding metho d is introduced and pr o ofs are provided for its ab ov e listed prop erties. In Section 4 we generalize a bit our results for the case o f probability mea sures which are not describ ed by pdfs. In Section 5 we restrict our results to the special case when the unfolding problem is discrete: this presen tation may be b etter understo o d by statisticians o r exp e rimental ph ysicists not sp ecialized in functional ana lysis. In Section 6 a concr e te numerical example is shown. Finally , in Sec tio n 7 we summarize. 2. Mathematical prop erties of folding op erators and the unfolding. In the text w e shall a bbreviate by p df the notion of pr obability density function, by cp df the no tion of conditional probability density function. W e shall rely on the usual terminology in functional ana ly sis a nd mea sure theory [2 7, 28]. As such, the no tion of Leb esgue a lmost everywhere or Lebe s gue almost every , sha ll b e abbreviated b y a.e. 1 Some unfolding methods rely on an inheren t discretization of pdf s in the problem, and use the assumed discretization as an implici t regularization. Our metho d do es not use such tric k. Con v ergence and error propagation r esults on a linear iterative unfoldi ng metho d 5 Let X a nd Y b e finite dimensional r e al vector spa ces equipp ed with the Leb esgue measure — unique up to a glo ba l p ositive normalizatio n factor. Let L 1 ( X ) and L 1 ( Y ) denote the Banach spaces of X → C and Y → C Leb es g ue integrable function equiv- alence cla s ses, re s pec tiv ely , whe r e the equiv alence o f functions is defined b y b eing a .e . equal. As usual in functiona l analys is texts, we shall call these function eq uiv a lence classes simply functions. W e shall also use the notion of essential b ound for such a function w hich is the smallest upp er bound v alid a.e. Definition 2.1. L et ρ : Y × X → R + 0 , ( y , x ) 7→ ρ ( y | x ) b e a cp df over t he pr o duct sp ac e Y × X , i.e. a non-ne gative L eb esgue me asur able function which satisfies ∀ x ∈ X : R ρ ( y | x ) d y = 1 . Then, the line ar op er ator A ρ : L 1 ( X ) → L 1 ( Y ) , ( x 7→ f ( x )) 7→  y 7→ Z ρ ( y | x ) f ( x ) d x  (2.1) is c al le d the fol ding op er ator by ρ , wher e the function ρ is c al le d the r esp onse function of the folding. Remark 2.1. The fol lowing b asic pr op erties of folding op er ators ar e dir e ct c on- se quenc es of the definition. (i) A p ossible u s ual gener alization of the notion of foldi ng op er ator is when ineffi- ciencies ar e also al low e d, i.e. the less r estrictive c ondition ∀ x ∈ X : R ρ ( y | x ) d y ≤ 1 is r e quir e d for the r esp onse fu n ction ρ of the folding op er ator A ρ . The re sults thr oughout the p ap er ar e also valid for that c ase. (ii) By F ubini’s the or em, a folding is a wel l define d line ar op er ator. (iii) It is also qu ite evident [2] that s u ch op er ator is c ontinuous in the L 1 op er a- tor norm (i.e. in pr ob abilistic sense), mor e over k A ρ k L 1 ( X ) → L 1 ( Y ) = 1 , while k A ρ k L 1 ( X ) → L 1 ( Y ) ≤ 1 whenever inefficiencies ar e al lowe d. It is s een that s uch a folding op erator A ρ is quite w ell b ehav ed: it is linear and is contin uous in the probabilistic sense, i.e. close p dfs are mapp e d to close pdfs in the L 1 sense [1]. A quite imp ortant class o f folding o pe rators are conv olutions , in which case the shap e o f the respo nse function is tra nslationally inv ariant. Definition 2.2. A folding op er ator A ρ is c al le d c onvolution whenever the r e- sp onse fun ction ρ is tr anslational ly invariant in the sense that Y = X and ∀ x, y , z ∈ X : ρ ( y | x + z ) = ρ ( y − z | x ) . Remark 2.2. The fol lowing pr op erties of c onvolution op er ators ar e wel l-known r esults [2, 29, 30]. (i) In c ase a folding op er ator A ρ is a c onvolution, the r esp onse funct ion ρ may b e expr esse d by the single p df η := ρ ( ·| 0) in the form ∀ x, y ∈ X : ρ ( y | x ) = η ( y − x ) . The alternative notation η ⋆ f := A ρ f is often use d in such c ase ( f ∈ L 1 ( X ) ). Note that c onvolution is c ommutative, i.e. one has η ⋆ f = f ⋆ η for al l η , f ∈ L 1 ( X ) . (ii) A c onvolution op er ator is not onto, and its image is not close d. (iii) The image of a c onvolution op er ator is dense if and only if the F ourier t r ansform of the c onvolver funct ion is nowher e zer o (Wiener’s appr oximation the or em). (iv) A c onvolution op er ator is one-to-one if and only if the F ourier tr ansform of the c onvolver function is a.e. nonzer o. (v) Conse quent ly, the inverse of a c onvolution op er ator, whenever exists, c annot b e c ontinuous. Thi s is b e c ause a c onvolution is everywher e define d on the clo se d 6 Andr´ as L´ aszl´ o set L 1 ( X ) , it is c ontinuous, and t her efor e it has close d gr aph by Banach’s close d gr aph the or em; but sinc e t he inverse op er ator’s domain is not close d, again by Banach’s close d gr aph t he or em, it c annot b e c ontinuous. Since the conv olution opera tors form a quite large ex ample cla s s of folding op er- ators, we can state that a g eneric folding op era tor’s inverse, whenever exists, is not contin uous. This finding is often referr ed to as : the in v ersion of a generic folding op erator is ill-po s ed. The a rgument go es as fo llows: we hav e a n unknown p df f , a known resp onse function ρ , and a measure d pdf g = A ρ f + e where e represents a small mea surement er r or term. Then, when o ne would set A − 1 ρ g = f + A − 1 ρ e , the err or term e contains mo des not in the domain of A − 1 ρ in which case A − 1 ρ e is not mea n- ingful, or when approximated numerically , this term shall diverge. Note that even if all modes of e were in the do main of A − 1 ρ , the smallness of A − 1 ρ e is not guaranteed even though e is small. The ill-p osedness of a gener ic unfolding pr oblem may a lso b e stated as: if f 1 and f 2 are distant pdfs, then g 1 := A ρ f 1 + e 1 and g 2 := A ρ f 2 + e 2 may b e close p dfs, i.e. we lose discrimination p ow er on pdfs after a folding [1 ]. The presented a rgument a lso warns us against relying solely o n the so ca lled closur e test when v er ifying an unfolding alg orithm: whenever some unfolding metho d gives some estimate ˆ f for the unknown pdf f , it is usually argued tha t A ρ ˆ f ≈ A ρ f confirms the v alidity of the estimate ˆ f . Clearly , in the light o f o ur o bs erv ations this is no t enough, as ˆ f may b e still far from f in the probabilis tic distance. Due to the ill- po sedness of the unfolding pro blem, any unfolding metho d needs to use some k ind of regulariza tio n: some assumption on the original (unkno wn) pdf, and a wa y to sea rch for an appr oximative s olution dep ending on some regula rization parameters . F urthermore, the conv erg e nce to the original pdf when relaxing these parameters can usually b e only achieved in some weak sense, not in the probabilistic norm o f L 1 ( X ). The most commonly applied unfolding strategies are summarized in [1, 3 , 4, 5, 6 , 7, 8, 16, 17, 1 8, 1 9]. 3. A linear iterativ e unfo lding metho d. Since the folding equatio n Eq.(2.1) is linear, it is quite natural to try applying some itera tive inv er s ion metho ds known in functional analysis, when approximating the true so lution f . One such self-sugges ting metho d is Neumann ser ie s [27, 2 8] which g ua rantees that whenever for a contin uous linear op erator A ov er a Banach space one has k I − A k < 1 ( I b eing the identit y o pe r - ator), then A − 1 = P ∞ n =0 ( I − A ) n where the c onv erg ence holds in the op erator no rm. That c o nv erg ence requirement, howev er , cannot be s a tisfied in c a se of a proba bilit y theory folding op erator b ecause for such an opera tor o ne has k I − A ρ k L 1 → L 1 = 2 as shown in [2]. The Richardson iteration, based o n similar requirements, do es not work for the same reaso n. An o ther evident choice would b e the La ndw ebe r iteration [31] known in the theory of F redholm integral eq uations [27, 2 8]. This a ssumes, in first place, that the unknown function f and the r esult of the folding g resides in the space of square in tegr able functions L 2 ( X ), further mo re that the respons e function ρ satis- fies the r e gularity condition R R   ρ ( y | x )   2 d y d x < ∞ . The latter r e gularity condition, unfortunately , is viola ted in case o f a gener ic cp df, on the contrary to the common belie f in the literature. 2 Despite of the fact that neither the Neumann series, nor the Richardson iteration, 2 It is eviden tly seen that this regularity condition do es not hold for an y conv olution. It is also seen at the price of some calculation that this situation canno t be repaired by a compactification mapping, i. e. if we map the support set of our p df s and resp onse function into a compact region of Y and X . Con v ergence and error propagation r esults on a linear iterative unfoldi ng metho d 7 nor the Landweber iteration can b e directly applied to an unfolding problem, they provide a p oss ible starting p oint. Motiv ated b y these algor ithms w e prop osed a linear iterative unfolding method for a probability theory context, i.e. for the L 1 space [1]. The section is contin ued by re c alling notions necessary for studying the p ertinent algorithm. In the followings we shall denote by L p ( X ) the Banach spa ce of X → C functions [27, 28] which are Lebe sgue in tegrable o f the p -th p ow er (1 ≤ p ≤ ∞ ). The sp ecial case L ∞ ( X ) for p = ∞ is defined as the Bana ch space of the X → C es sentially bo unded functions with their no rm b eing the ess ent ial b ound. Remark 3.1. The ar gu mentation in the fol lowings r elies on some known r esults. (i) The Riesz-Th orin the or em [32] states that if 1 ≤ q ≤ r ≤ ∞ and F ⊂ L q ( X ) ∩ L r ( X ) is a dense line ar subsp ac e in b oth L q ( X ) and L r ( X ) , furthermor e a line ar op er ator T : F → L q ( X ) ∩ L r ( X ) is b ounde d b oth in the L q ( X ) and L r ( X ) norm, then for al l q ≤ p ≤ r values F ⊂ L p ( X ) , it is dense in L p ( X ) , T [ F ] ⊂ L p ( X ) and T is b ounde d in the L p ( X ) norm. Thus, T is uniquely extendable as an L p ( X ) → L p ( X ) b ounde d line ar op er ator. In addition we have that k T k L p → L p ≤ max ( k T k L q → L q , k T k L r → L r ) (3.1) holds for the op er ator norms. (ii) An imp ortant c onse quenc e of t he Riesz-Thorin the or em is that a c onvolution op er ator η ⋆ ( · ) by a function η ∈ L 1 ( X ) is wel l define d and c ontinuous in L p ( X ) for al l 1 ≤ p ≤ ∞ and its op er ator n orm is b ounde d by k η k L 1 . This obviously holds for the p = 1 and p = ∞ c ase due to H¨ old er’s ine quality, and then it is implie d for al l 1 < p < ∞ as wel l by the p ert inen t t he or em. As a c onse quenc e, using the c ommu tativity of c onvolution, it also fol lows that if ϕ ∈ L p ( X ) and η ∈ L 1 ( X ) then ϕ ⋆ η ∈ L p ( X ) , i.e. p dfs may b e mapp e d into L p ( X ) via c onvolution by p dfs inte gr able on the p -t h p ower. (iii) We shal l use in the fol lowings t he sp e ctr al r epr esentation [28] of n ormal op er ators over c omplex sep ar able Hilb ert sp ac es. L et T b e a normal op er ator over the p ertinent sp ac e, i.e. a densely define d line ar op er ator with close d gr aph, satisfying T ∗ T = T T ∗ , ( · ) ∗ b eing the adjoi nt. Then ther e exists a unique pr oje ct ion value d me asur e P over the Bor el sets of the sp e ctrum set of T , Sp( T ) , such t hat T = Z λ ∈ Sp( T ) λ d P ( λ ) (3.2) holds, wher e t he int e gr al is define d in t he we ak s ense. That is, for al l elements f , g in the Hilb ert sp ac e one has a c omplex value d Bor el me asu r e h f , P ( · ) g i such that h f , T g i = Z λ ∈ Sp( T ) λ d h f , P ( λ ) g i . (3.3) In add ition, one has that if M is a p olynomia l, then M ( T ) is also normal op er- ator, furthermor e M ( T ) = Z λ ∈ Sp( T ) M ( λ ) d P ( λ ) (3.4) is satisfie d in t he same sense. 8 Andr´ as L´ aszl´ o Throughout the argumentation we will need the notio n of tra nspo se folding which is intro duced below. Definition 3.1. If A ρ is a folding op er ator such that the r esp onse function ρ ( ·| x ) is squar e-inte gr able for al l x ∈ X , then for al l k ∈ L 2 ( Y ) the expr ession A T ρ k :=  x 7→ Z k ( y ) ρ ( y | x ) d y  (3.5) is me aningful and defines a line ar map fr om L 2 ( Y ) to t he L eb esgue me asur able func- tions X → C . We c al l the line ar op er ator A T ρ the tr ansp ose foldi ng. 3.1. The i ter ative appr oximati on . Equipp ed with the listed no tio ns, we can int ro duce the following a ppr oximating sequence for solution of the unfolding problem. Let g = A ρ f b e our unfolding problem where f is to b e determined, with g and ρ being known. W e try to approximate the solution in the for m: K ρ := sup x ∈ X Z Z ρ ( y | z ) ρ ( y | x ) d y d z , f 0 := K − 1 ρ A T ρ g , f N +1 := f N +  f 0 − K − 1 ρ A T ρ A ρ f N  ( N ∈ N 0 ) . (3.6) This is, formally , the itera tive express io n for Neumann s e ries after pr econditioning by K − 1 ρ A T ρ , i.e. for the co mp os ite op era tor K − 1 ρ A T ρ A ρ . 3.2. Conver genc e c onditions . The following theo rem shows that under quite generic conditions the a pproximating sequence ( f N ) N ∈ N 0 in terms of Eq.(3 .6) is w ell- defined and conv erges to f whenever A ρ is one-to-o ne, and it converges to the clo s est po ssible function to f whenever A ρ is not one-to-o ne. Theorem 3. 2. (Conver genc e) L et A ρ b e a folding op er ator and assume that its r esp onse function ρ has the pr op erty that for al l x ∈ X the function ρ ( ·| x ) is squ ar e- inte gr able, furthermor e K ρ < ∞ . Assume that the u nknown p df f in the u nfolding pr oblem g = A ρ f is squar e-inte gr able. Then: (i) F or any c omp act set U ⊂ X : lim N →∞ 1 V olume( U ) Z x ∈ U  f − P Ker( A ρ ) f − f N  ( x ) d x = 0 , (3.7) wher e P Ker( A ρ ) is the L 2 ortho gonal pr oje ction onto the kernel s et of A ρ . (ii) We have t hat lim N →∞   f − P Ker( A ρ ) f − f N   L 2 = 0 (3.8) and the c onver genc e is monotone. Pr o of . It is seen that whenev er the regular it y condition ∀ x ∈ X : ρ ( ·| x ) ∈ L 2 ( Y ) holds, the function α : X × X → R + 0 , ( z , x ) 7→ α ( z , x ) := Z ρ ( y | z ) ρ ( y | x ) d y (3.9) Con v ergence and error propagation r esults on a linear iterative unfoldi ng metho d 9 is well defined. By constructio n, it is symmetric, i.e. ∀ z , x ∈ X : α ( z , x ) = α ( x, z ). F urthermor e, b ecause of K ρ < ∞ and symmetricity , sup x ∈ X Z z ∈ X α ( z , x ) d z = sup z ∈ X Z x ∈ X α ( z , x ) d x = K ρ < ∞ (3.10) holds. With this, we see that the op erator A T ρ A ρ is well defined a s L 1 ( X ) → L 1 ( X ) and is b ounded, its L 1 → L 1 op erator nor m b eing K ρ . This is b ecaus e for any f ∈ L 1 ( X )   A T ρ A ρ f   L 1 = Z     Z α ( z , x ) f ( x ) d x     d z ≤ Z Z α ( z , x ) | f ( x ) | d x d z = Z  Z α ( z , x ) d z  | f ( x ) | d x ≤ sup x ∈ X  Z z ∈ X α ( z , x ) d z  Z x ∈ X | f ( x ) | d x = K ρ k f k L 1 (3.11) due to of mono tonicity o f integration, F ubini’s theorem and H¨ older’s inequalit y . It is also seen that the op erator A T ρ A ρ is well defined as L ∞ ( X ) → L ∞ ( X ) and is bo unded, its L ∞ → L ∞ op erator norm b eing K ρ . Tha t is be cause fo r a ny f ∈ L ∞ ( X )   A T ρ A ρ f   L ∞ = sup z ∈ X     Z α ( z , x ) f ( x ) d x     ≤ sup z ∈ X Z α ( z , x ) | f ( x ) | d x ≤ sup z ∈ X  Z α ( z , x ) d x sup x ∈ X | f ( x ) |  = sup z ∈ X  Z x ∈ X α ( z , x ) d x  sup x ∈ X | f ( x ) | = K ρ k f k L ∞ (3.12) due to monotonicity of in teg ration and H¨ older’s inequa lity . Now, using Riesz-Thor in theor em we hav e that the op erator A T ρ A ρ is well-defined as L 2 ( X ) → L 2 ( X ) and is bounded, its L 2 → L 2 op erator norm b eing bound by K ρ . It is also ea s ily seen that for any f ∈ L 2 ( X ) one has  f , A T ρ A ρ f  = h A ρ f , A ρ f i ≥ 0, therefore it is a s elf adjoint and p ositive op erator in L 2 ( X ). Thus, its sp ectrum lie s within the interv al [0 , K ρ ]. F o r brevity , we introduce the notation A := K − 1 ρ A T ρ A ρ for the r e-normaliz ed comp osite fo lding op er ator. Let us observe that the iterative formula Eq.(3.6) may also be written in the series expansion for m f N = P N n =0 ( I − A ) n f 0 where we have that f 0 = Af , f b eing the unknown pdf. This form is particularly useful beca use then we s ee b y induction that P N n =0 ( I − A ) n A = I − ( I − A ) N +1 , i.e. we ha ve the explicit for m ula f − f N = ( I − A ) N +1 f for the residual term. By the observed prop er ties of A it is q uite evident that Sp( A ) ⊂ [0 , 1]. Thus, there exists a unique pro jection v alued measure P on the Borel sets of [0 , 1] such that A = Z λ ∈ [0 , 1] λ d P ( λ ) (3.13) in the weak sense. This implies that for any h ∈ L 2 ( X ) w e hav e h h, f − f N i = Z λ ∈ [0 , 1] (1 − λ ) N +1 d h h, P ( λ ) f i = Z λ ∈{ 0 } (1 − λ ) N +1 d h h , P ( λ ) f i + 10 Andr´ as L´ aszl´ o Z λ ∈ ]0 , 1] (1 − λ ) N +1 d h h, P ( λ ) f i . (3.14) Since R λ ∈{ 0 } (1 − λ ) N +1 d P ( λ ) = P Ker( A ρ ) for a ll N ∈ N 0 , we ar rive at the identit y  h, f − P Ker( A ρ ) f − f N  = Z λ ∈ ]0 , 1] (1 − λ ) N +1 d h h, P ( λ ) f i , (3.15) and by the monotonicity of in tegr ation    h, f − P Ker( A ρ ) f − f N    ≤ Z λ ∈ ]0 , 1] | 1 − λ | N +1 d |h h, P ( λ ) f i| (3.16) also ho lds, whe r e the symbol | · | when applied to complex v alued measures denotes v ariation, which is a nalogous to absolute v alue of complex v alued functions. The measure h h, P ( · ) f i o n [0 , 1] has finite v aria tion and the function sequence λ 7→ (1 − λ ) N +1 ( N ∈ N 0 ) is bounded indep endently o f N and conv er ges p oint wise to zero on ]0 , 1], ther efore b y Lebesg ue’s theor em of dominated co nv erg ence [2 7, 2 8] we hav e that the sequence of int egrals co n verges to z e r o. Th us, the first part of the theorem is proved b y setting h := 1 V olume( U ) χ U . The second part of the theorem is prov ed b y obs e rving that   f − P Ker( A ρ ) f − f N   2 L 2 = D f ,  ( I − A ) N +1 − P Ker( A ρ )  2 f E = Z λ ∈ ]0 , 1] (1 − λ ) 2 N +2 d h f , P ( λ ) f i (3.17) where h f , P ( · ) f i is a non-neg ative v a lue d finite meas ur e and the integrand which is also non-negative, has a bound indep endent of N , furthermo re it mono tonically decreases at each p oint to zero with increa sing N . Therefor e, by Lebe sgue’s theorem of dominated co nv erge nce a nd by the monotonicity of in tegration we ha ve that the per tinent e xpression conv erges to zer o with increasing N in a monotonically decr easing wa y . Remark 3.2. The fol lowing r emarks clarify the me aning of The or em 3.2 in the c ontext of a pr ob ability the ory setting. (i) F or any folding op er ator A ρ the r esp onse function may b e c onditione d to have the re gu larity c ondition ∀ x ∈ X : ρ ( ·| x ) ∈ L 2 ( X ) by c onvolving it with a squar e- inte gr able p df η whose F ourier t r ansform is nowher e vanishing. Namely, one c an solve t he mo difie d pr oblem η ⋆ g = A η⋆ρ f for f inste ad of the original form g = A ρ f . In that way, t he tr ansp ose foldi ng op er ator c an always b e made wel l- define d. When such a tr e atment is applie d, the iter ation mo difies as K η⋆ρ := sup x ∈ X Z Z ( η ⋆ ρ )( y | z ) ( η ⋆ ρ )( y | x ) d y d z , f 0 := K − 1 η⋆ρ A T η⋆ρ η ⋆ g , f N +1 := f N +  f 0 − K − 1 η⋆ρ A T η⋆ρ A η⋆ρ f N  ( N ∈ N 0 ) . (3.18) with the very same c onver genc e pr op erties as in the pr evious the or em. (ii) The r e gularity c ondition K ρ < ∞ (or K η⋆ρ < ∞ ) holds for a quite lar ge class of r esp onse fun ctions in a pr ob ability the ory c ontext. N amely, it is e asy to che ck Con v ergence and error propagation r esults on a linear iterative unfoldi ng metho d 11 that if A ρ is a c onvolution, t hen K ρ = 1 . F or other pr actic al c ases, this c ondition may b e che cke d numeric al ly as done in [1]. It is shown e.g. that for the r esp onse function of p article ener gy m e asur ement with a typic al c alorimeter devic e, one has K ρ ≈ 1 . 4 . Also the r esp onse fun ction of p article momentum me asur ement using b ending in magnetic field has the p ertinent r e gularity pr op erty. (iii) The r e gularity c ondition for the unknown p df f , i.e. that it has to b e squar e- inte gr able, ho lds for a quite generic class of p dfs. This is automatic for instanc e for any p df which is known to b e essent ial ly b ounde d. (iv) W hen the c onver genc e c ondition is satisfie d, it is se en that if A ρ is one- t o- one, the appr oximating functions ( f N ) N ∈ N 0 c onver ge to the original unknown p df f . When A ρ is not one-to-one, then ( f N ) N ∈ N 0 c onver ge to the closest p ossible function f − P Ker( A ρ ) f . (v) The me aning of c onver genc e r esult (i) in the c ontext of pr ob ability the ory is that the appr oximating functions ( f N ) N ∈ N 0 c onver ge in the sense that the pr ob ability of e ach c omp act set U ⊂ X is r estor e d t o the maximu m p ossible extent, but the r ate of c onver genc e might b e differ ent for differ ent sets . When the p dfs ar e me asur e d or mo dele d by histo gr ams, as usual in statistic al data pr o c essing, this me ans binwise c onver genc e of t he r estor e d histo gr ams, t he c onver genc e r ate b eing p ossibly differ ent for differ ent histo gr am bins. The mor e glob al c onver genc e r esult (ii) do es not have a dir e ct pr ob abili ty the ory interpr etation, but shal l have a r ole in the estimation of appr oximation err or at fi nite iter ation or der N . (vi) N ote t hat whenever our p dfs ar e m o dele d by hi sto gr ams, the op er ation of his- to gr am binning may also b e r e gar de d as p art of the folding op er ator as describ e d in [1], and t hus it is wise to include its effe ct in the folding op er ator A ρ . This might b e done for instanc e by mo deling the true (unknown) p df f and its iter ative appr oximates f N as histo gr ams binne d on much wider domai n with lar ger bin- ning density than the me asur e d p df g . In su ch appr oximation the folding op er ator A ρ may b e thought of as a r e al m atrix which is not squar e. 3.3. Estimation of appr oximation err or . The conv erg ence result means that the r esidual term (approximation err or) f − P Ker( A ρ ) f − f N of the a pproxi- mating sequence defined b y E q.(3.6) decreases to zero w ith increased iteration order N in the sense that it decr eases to zero when av erag ed over any compact set, i.e. we hav e bin w is e conv ergence in the lang uage of histograms . Howev er, it w ould b e very useful to quantify the approximation error a t finite N in or der to define so me sto p- ping cr iterion. T o achiev e this, we need to re call a result from the theory of pro jection v alued measures. Remark 3.3. L et P b e a pr oje ct ion value d me asur e of some sep ar able Hilb ert sp ac e over the Bor el sets of C . Then, whenever α and β ar e C → C me asur able functions, while h and f ar e elements of the Hilb ert sp ac e, one has     Z λ ∈ C α ( λ ) β ( λ ) d h h, P ( λ ) f i     ≤ s Z λ ∈ C | α ( λ ) | 2 d h h, P ( λ ) h i s Z λ ∈ C | β ( λ ) | 2 d h f , P ( λ ) f i (3.19) and the same ine quality also holds when α and β ar e inter change d [28]. This upp er b ound is in the analo gy of the Cauchy-Schwarz ine quality. The follo wing theor e m helps to quan tify the approximation erro r at a finite iter- 12 Andr´ as L´ aszl´ o ation o rder N ∈ N 0 . Theorem 3. 3. (App r oximation err or) T ake the iter ative solution for the unfold- ing pr oblem as in Eq.(3.6) and assume that the c onver genc e c onditions of The or em 3.2 hold. Then, the distanc e of an N -th iter ate f N fr om the closest p ossible fun ction to the t rue unfolde d p df f in the aver age over a c omp act set U ⊂ X has the fol lowing upp er b ounds: (i) One has     1 V olume( U ) Z x ∈ U  f − P Ker( A ρ ) f − f N  ( x ) d x     ≤ 1 p V olume( U )   f − P Ker( A ρ ) f − f N   L 2 . (3.20) (ii) S imilarly, when Ker( A ρ ) is not pr oje cte d out:     1 V olume( U ) Z x ∈ U ( f − f N ) ( x ) d x     ≤ 1 p V olume( U ) k f − f N k L 2 . (3.21) (iii) In addition,     1 V olume( U ) Z x ∈ U  f − P Ker( A ρ ) f − f N  ( x ) d x     ≤   f − P Ker( A ρ ) f   L 2   ξ U − P Ker( A ρ ) ξ U − ξ U,N   L 2 (3.22) is valid, wher e ξ U := 1 V olume( U ) χ U and ξ U,N is the N -th iter at ive appr oximation of ξ U in t erms of Eq.(3 .6). Namely, ξ U, 0 := K − 1 ρ A T ρ ξ U and ξ U,N + 1 := ξ U,N +  ξ U, 0 − K − 1 ρ A T ρ A ρ ξ U,N  . (iv) S imilarly, one has     1 V olume( U ) Z x ∈ U ( f − f N ) ( x ) d x     ≤ k f k L 2   ξ U − ξ U,N   L 2 (3.23) when Ker( A ρ ) is not pr oje cte d out. (v) The identity     1 V olume( U ) Z x ∈ U ( f − f N ) ( x ) d x     =     Z  ξ U − ξ U,N  ( x ) f ( x ) d x     (3.24) also holds. Pr o of . The s e are direct co nsequence of sp ectra l r epresentation of the oper ator A := K − 1 ρ A T ρ A ρ as in the pr o of o f The o rem 3 .2 fro m which    h, f − P Ker( A ρ ) f − f N    =      Z λ ∈ ]0 , 1] 1 (1 − λ ) N +1 d h h , P ( λ ) f i      Con v ergence and error propagation r esults on a linear iterative unfoldi ng metho d 13 ≤ s Z λ ∈ ]0 , 1] | 1 | 2 d h h , P ( λ ) h i s Z λ ∈ ]0 , 1] | (1 − λ ) N +1 | 2 d h f , P ( λ ) f i (3.25) and    h, f − P Ker( A ρ ) f − f N    =      Z λ ∈ ]0 , 1] 1 (1 − λ ) N +1 d h h , P ( λ ) f i      ≤ s Z λ ∈ ]0 , 1] | 1 | 2 d h f , P ( λ ) f i s Z λ ∈ ]0 , 1] | (1 − λ ) N +1 | 2 d h h, P ( λ ) h i (3.26) follows with arbitrary h ∈ L 2 ( X ). These may b e rewritten a s:    h, f − P Ker( A ρ ) f − f N    ≤   h − P Ker( A ρ ) h   L 2    ( I − A ) N +1 − P Ker( A ρ )  f   L 2 (3.27) and    h, f − P Ker( A ρ ) f − f N    ≤   f − P Ker( A ρ ) f   L 2    ( I − A ) N +1 − P Ker( A ρ )  h   L 2 . (3.28) Then by using the fact that  ( I − A ) N +1 − P Ker( A ρ )  f = f − P Ker( A ρ ) f − f N and  ( I − A ) N +1 − P Ker( A ρ )  h = h − P Ker( A ρ ) h − h N where h N is the itera tive appr oxi- mation o f h in ter ms of Eq.(3.6), w e see that    h, f − P Ker( A ρ ) f − f N    ≤   h − P Ker( A ρ ) h   L 2   f − P Ker( A ρ ) f − f N   L 2 (3.29) and    h, f − P Ker( A ρ ) f − f N    ≤   f − P Ker( A ρ ) f   L 2   h − P Ker( A ρ ) h − h N   L 2 . (3.30) By using   h − P Ker( A ρ ) h   L 2 ≤ k h k L 2 and setting h := 1 V olume( U ) χ U we hav e prov ed (i) a nd (iii). Quite obviously , the same a r gument can be re pea ted with the pro jection op erator P Ker( A ρ ) excluded from the equations, which proves (ii) and (iv). Poin t (v) is proved by o bs erving that for any h ∈ L 2 ( X ) one ha s h h, f − f N i =  h, ( I − A ) N +1 f  , since f − f N = ( I − A ) N +1 f . Due to the self-a djo intness of the comp osite folding oper ator A , o ne has that h h, f − f N i =  ( I − A ) N +1 h, f  . Since the identit y ( I − A ) N +1 h = h − h N holds, one a rrives a t h h, f − f N i = h h − h N , f i and th us |h h, f − f N i| = |h h − h N , f i| is v alid. Then, (v) is prov ed by simply substituting h := ξ U . Remark 3.4. The fol lowing r emarks clarify the us ability of The or em 3.3. (i) By st atemen t (i) and (ii) it is implie d t hat the r esidual err or aver age d over a c omp act set U ⊂ X sc ales as 1 √ V olume( U ) . In the language of histo gr ams it me ans that it sc ales as one p er squar e r o ot of the histo gr am bin size. 14 Andr´ as L´ aszl´ o (ii) The upp er b ou n ds (i), (iii) de cr e ase monotonic al ly to zer o with incr e asing N . The upp er b ounds ( ii) and (iv) de cr e ase monotonic al ly to the c orr esp onding limits 1 √ V olume( U )   P Ker( A ρ ) f   L 2 and k f k L 2   P Ker( A ρ ) ξ U   L 2 , r esp e ctively. Sinc e   ξ U − ξ U,N   L 2 is ful ly c alculable, upp er b ound (iv) c an b e us e d to test whether the inverse of A ρ exists, i.e . whether P Ker( A ρ ) = 0 holds, or if not, it may b e use d to quantify the c ontribution of the irr e c over able p art P Ker( A ρ ) f . (iii) Via sp e ctr al r epr esentation it is e asy to se e that k f N k L 2 c onver ges to the limit   f − P Ker( A ρ ) f   L 2 in a monotonic al ly incr e asing way, i.e. may b e use d t o ap- pr oximate this unkn own c o efficient fr om b elow. (iv) Aga in via using sp e ctr al r epr esentation, one c an se e that with fixe d N and M > N , the expr essions k f M − f N k L 2 and   ξ U,M − ξ U,N   L 2 tend to the c orr esp onding limits   f − P Ker( A ρ ) f − f N   L 2 and   ξ U − P Ker( A ρ ) ξ U − ξ U,N   L 2 with incr e asing M , r esp e ctively, in a monotonic al ly incr e asing way. Ther efor e, they c an b e u se d for appr oximation of these u nknown c o efficients fr om b elow. (v) As a c onse quenc e, the appr oximation err or may b e estimate d for a fix e d iter ation or der N in the fol lo wing way. F or any ε > 0 ther e exists an iter ation index thr eshold M ε,N > N such that for al l M > M ε,N     1 V olume( U ) Z x ∈ U  f − P Ker( A ρ ) f − f N  ( x ) d x     ≤ 1 p V olume( U ) (1 + ε ) k f M − f N k L 2 (3.31) is valid . In addition, a closer, U -dep endent estimate may b e c alculate d: for any ε > 0 ther e exists an iter ation index thr eshold M ε,U,N > N for which for al l M > M ε,U,N the upp er b oun d     1 V olume( U ) Z x ∈ U  f − P Ker( A ρ ) f − f N  ( x ) d x     ≤ (1 + ε ) k f M k L 2   ξ U,M − ξ U,N   L 2 (3.32) holds. Alternative ly,     1 V olume( U ) Z x ∈ U ( f − f N ) ( x ) d x     ≤ (1 + ε ) k f M k L 2   ξ U − ξ U,N   L 2 (3.33) is also vali d whenever A ρ is known t o b e one-to-one, which expr ession is sligh tly che ap er to c alculate. (vi) The identity (v) is p articularly useful. In or der to c onstructively evaluate it, one ne e ds t o use the fact that the se quenc e ( f N ) N ∈ N 0 c onver ges to f − P Ker( A ρ ) f in the L 2 sense. Thus, whenever A ρ is invertible, it c onver ges t o f in the L 2 sense. In that c ase, the identity (v) c an b e r ewritten as     1 V olume( U ) Z x ∈ U ( f − f N ) ( x ) d x     = lim M →∞     Z  ξ U − ξ U,N  ( x ) f M ( x ) d x     . (3.34) T e chnic al ly, the right side of this identity may b e appr oximate d by the inte gr al   R  ξ U − ξ U,N  ( x ) f M ( x ) d x   with lar ge enough M . F or lar ge N , even M := N may b e use d for evaluation of this expr ession. Con v ergence and error propagation r esults on a linear iterative unfoldi ng metho d 15 3.4. Estimation of stati stic al err or . Armed with the approximation er ror estimates of Theor e m 3 .3 one can construct p enalty functions which define o ptimal stopping criterion of the iteration, and o ne can q ua nt ify the err o r of the approximation at finite iteratio n order whic h decreas es with increasing itera tion or der. In practice, howev er, the unfolding pr oblem g = A ρ f + e may also contain a small statistical error term e whose exp ectation v alue is zero, its ex act v alue is unknown, but an es tima te to the b ehavior of the random v ariable e ( x ) for each x ∈ X is av a ila ble. Normally , the statistical co v ariance matrix Cov( e ) is kno wn a long with the mea sured p df g and the known resp onse function ρ . If, for instance, g was a result o f a mea surement in the form of a histogr am, then Cov( e ) = Cov( g ) will b e nothing but the diago nal matr ix comp osed o f the histogr am bin entries. The question naturally ar ises: how can one quantify the propa gated statistical er ror of the N -th iterative approximation of f , i.e . of f N . In the followings we show an exact formula for the ca se when g is measured as a histogra m, i.e. when g can b e reg arded as an n -comp onent vector o f real probability v ariables with known cov ariance. Remark 3. 5. The fol lo wing simple facts in pr ob ability the ory wil l aid the ar gu- mentation of the statist ic al err or pr op agation. (i) If v is a n -c omp onent ve ctor of r e al pr ob ability variables, t hen its c ovarianc e Cov( v ) is an n × n re al symmetric p ositive matrix . Ther efor e, for any m ≥ n ther e exists (not ne c essarily uniquely) a r e al n × m matrix Err( v ) such t hat Cov( v ) = E rr( v )E rr( v ) T (3.35) holds, the symb ol ( · ) T denoting matrix t r ansp ose. Inde e d, b e c ause of r e alness, symmetricity and p ositivity of Cov( v ) ther e exists un iquely a r e al s ymm et ric p os- itive n × n matrix satisfyi ng Eq.(3.35), the squar e-r o ot of Cov( v ) , and ther efor e Err( v ) = p Cov( v ) may b e chosen. Then, this may b e ext ende d t o b e n × m ( m ≥ n ) by zer os without affe cting Eq.(3.35). In some sp e cial c ases, however, ther e also ex ists such n × m ( m ≤ n ) r e al matrix Err( v ) such that Eq.(3.35) stil l holds. (ii) If v is an n - c omp onent ve ctor of r e al pr ob ability varia bles and M is a r e al m × n matrix, then the standar d err or pr op agation formula Cov( M v ) = M Cov( v ) M T (3.36) holds. (iii) As a c onse quenc e of t he pr evious observations, one c an expr ess the st andar d err or pr op agation formula also in the form Err( M v ) = M Err( v ) (3.37) wher e Err( v ) is any r e al n × n matrix satisfying Eq.(3.35), and the r esulting r e al m × n matrix Er r( M v ) shal l ob ey Err( M v )E rr( M v ) T = Cov( M v ) . (iv) In our un folding pr oblem the N -th iter ative appr oximation of f , i.e. f N , may b e expr esse d in the form f N = N X n =0  I − K − 1 ρ A T ρ A ρ  n ! K − 1 ρ A T ρ g (3.38) which is manifestly line ar in the me asur e d p df g . This fact may b e use d in or der to c onstruct statistic al err or pr op agation formula in terms of the pr evious observations. 16 Andr´ as L´ aszl´ o Armed with these equalities, we are r eady to state the statistical error pro pagation formula for o ur unfolding metho d. Theorem 3.4. (Statistic al err or) T ake t he iter ative solution for the unfolding pr oblem as in Eq.(3.6) and assum e that the c onver genc e c onditions of The or em 3.2 hold. L et Cov( g ) b e the n × n statistic al c ovaria nc e matrix of t he me asur e d p df g , wher e g is given in the form of an n - bin histo gr am. If f and f N is mo dele d as an m -bin histo gr am, then the m × m c ovarianc e matrix of f N , Cov( f N ) , may b e obtaine d by the fol lowing iter ative formula along with f N : E 0 := K − 1 ρ A T ρ Err( g ) , E N +1 := E N +  E 0 − K − 1 ρ A T ρ A ρ E N  ( N ∈ N 0 ) (3.39 ) wher e E N E N T = Cov( f N ) holds for e ach N . Pr o of . This is a simple co nsequence of the linea rity of the unfolding metho d Eq.(3.6), and of Remar k 3.5 (iv) c o mbin ed with (iii) and then re-e xpressing it via iterative form. Remark 3.6. T he fol lowing re marks add some pie c es of information ab out the pr actic al usage of the statistic al err or pr op agation t he or em. (i) If the me asur e d p df g is a histo gr am, t hen e ach c omp onent ob eys Poisson dis- tribution, and thus Cov( g ) = diag ( g ) . F urthermor e a r e al n × n matrix E rr( g ) , satisfying Er r( g )Err( g ) T = Cov( g ) , may b e c onstructe d by taking the c omp onen- twise squar e-r o ot of dia g( g ) . This c an dir e ctly b e use d in c alculation of E 0 in The or em 3.4. (ii) If f is mo dele d as a histo gr am with m bins then for e ach iter ation or der N the r e al matrix E N is of m × n typ e, i.e. Cov( f N ) = E N E N T shal l b e of m × m typ e. (iii) The squar e-r o ot of the diagonal elements of the c ovaria nc e matrix Cov( f N ) give the exact st atistic al err ors of f N which then may b e use d to define an iter ation stopping criterion, for instanc e the sum of the statistic al err ors may b e r e quir e d to b e under a pr e define d thr eshold. One should n ot for get, however, that this un- folding metho d —just as any other unfolding metho d— int r o duc es pr etty st r ong c orr elations and thus the non-diagonal elements of Cov( f N ) also play an im- p ortant r ole when describing the char acteristics of t he statistic al fl uctuations of f N . 3.5. Estimation of systematic err or . It was shown that in ca se of a statisti- cal unfolding problem o f the form g = A ρ f + e the qua nt ification of the tw o comp eting error terms is p ossible: close upp er bo und to the conv erg ent approximation err or term was given, whe r eas exact er ror pr opagatio n formula to the divergent sta tistical error term w a s shown. A combination, such as the sum of these terms, ma y be considered as p enalty function and the iter ation may be stopp ed when the penalty function is minimal, furthermore these terms may b e qua n tified at this optimal iteration order with the shown fo rmulae. In practice, howev er, one often faces the problem o f sys- tematic error s whenever the measured pdf contains so me systematic distortion not accounted for in our mo del of resp onse function, o r equiv a len tly , whenever o ur mo de l of resp ons e function is slightly inaccurate. F o rmally we ma y write in s uch case that the actually meas ur ed pdf is g + δ g = A ( ρ + δρ ) f + e wher e δ ρ is the devia tio n of the true r esp onse function ρ + δ ρ fro m o ur mo del r esp onse function ρ . Since by definition Con v ergence and error propagation r esults on a linear iterative unfoldi ng metho d 17 g = A ρ f + e would b e the mea s ured pdf in the absence of δ ρ , one arrives at the re la tion δ g = A δρ f b etw een δ g and δ ρ . When applying the iterativ e solution Eq.(3 .6) using ρ to the a ctually measured p df g + δ g , the N -th iterative es timate o f the true unknown pdf f sha ll contain a pro pagated contribution δ f N which needs to b e quantified. In exp erimental practice, the systematic erro r of the actually mea sured pdf is given in terms of some close upper e s timate s g for whic h | δ g | ≤ sg holds , or similar ly as a close upper estimate sρ for which | δ ρ | ≤ sρ is v alid. Our a im is to provide some upp er estimate to | δ f N | based on sg o r sρ , for a n y giv en iteration order N ∈ N 0 . F or this, let us int ro duce the following normaliz ation factors C ρ,sg := s Z  K − 1 ρ A T ρ sg  2 ( x ) d x (3.40) if the systematic errors a re known in terms of sg , and D ρ,sρ := s sup x ∈ X Z Z  K − 1 ρ A T ρ sρ  ( y | z )  K − 1 ρ A T ρ sρ  ( y | x ) d y d z (3.41) if the systematic errors a re known in terms of sρ . Theorem 3. 5. (Systematic err or) T ake the iter ative solution for the unfolding pr oblem as in Eq.(3.6) and assum e that the c onditions of c onver genc e hold. Then, the fol lowing upp er b ounds ar e valid on the systematic devia tion δ f N of t he N -th iter ative appr oximation of f , f N . (i) F or the aver age of δ f N over any c omp act set U ⊂ X one has     1 V olume( U ) Z x ∈ U δ f N ( x ) d x     ≤   Ξ U,N   L 2 C ρ,sg (3.42) wher e ξ U := 1 V olume( U ) χ U and Ξ U,N is define d by the it er ation Ξ U, 0 := ξ U , Ξ U,N + 1 = Ξ U,N +  Ξ U, 0 − K − 1 ρ A T ρ A ρ Ξ U,N  ( N ∈ N 0 ) . (3.43) (ii) Alternatively,     1 V olume( U ) Z x ∈ U δ f N ( x ) d x     ≤   Ξ U,N   L 2 D ρ,sρ k f k L 2 . (3.44) (iii) The upp er b ound     1 V olume( U ) Z x ∈ U δ f N ( x ) d x     ≤ Z   K − 1 ρ A ρ Ξ U,N   ( y ) sg ( y ) d y (3.45) also holds. (iv) Alternatively,     1 V olume( U ) Z x ∈ U δ f N ( x ) d x     ≤ Z  K − 1 ρ A T sρ   A ρ Ξ U,N    ( x ) | f | ( x ) d x. (3.46) 18 Andr´ as L´ aszl´ o (v) Mor e sp e cific al ly,     1 V olume( U ) Z x ∈ U δ f N ( x ) d x     ≤ k f k L 1 sup x ∈ X  K − 1 ρ A T sρ   A ρ Ξ U,N    ( x ) . (3.47) Her e, whenever f was a p df, then k f k L 1 = 1 aut omatic al ly holds. Pr o of . W e b egin the pr o of by reca lling that b eca us e of Eq.(3.38) and its mo dified form f N + δ f N = N X n =0  I − K − 1 ρ A T ρ A ρ  n ! K − 1 ρ A T ρ ( g + δ g ) (3.48) in presence of systematic distor tio ns, we hav e that δ f N = N X n =0  I − K − 1 ρ A T ρ A ρ  n ! K − 1 ρ A T ρ δ g (3.49) holds, wher e δ g is the unaccounted systematic distortion of the measured p df, whic h is rela ted to the unaccounted systematic distor tion of the resp onse function δ ρ by δ g = A δρ f . Again, w e use the notation A := K − 1 ρ A T ρ A ρ and use its sp ectral r epresentation a s in the pro of of Theorem 3.2. With this, one has h h, δ f N i = Z λ ∈ [0 , 1] 1 N X n =0 (1 − λ ) n d  h, P ( λ ) K − 1 ρ A T ρ δ g  (3.50) for a ny h ∈ L 2 ( X ). F rom that, using Remark 3.3 we a rrive a t |h h, δ f N i| ≤ v u u t Z λ ∈ [0 , 1]      N X n =0 (1 − λ ) n      2 d h h, P ( λ ) h i s Z λ ∈ [0 , 1] | 1 | 2 d  K − 1 ρ A T ρ δ g , P ( λ ) K − 1 ρ A T ρ δ g  =      N X n =0 ( I − A ) n h      L 2   K − 1 ρ A T ρ δ g   L 2 = k H N k L 2   K − 1 ρ A T ρ δ g   L 2 (3.51) where the notation H N := P N n =0 ( I − A ) n h was introduced. It is quite evident that H N may be calculated using the iterative form H 0 := h, H N +1 := H N + ( H 0 − AH N ) ( N ∈ N 0 ) (3.52) in order to ev aluate k H N k L 2 . An upp er b ound for   K − 1 ρ A T ρ δ g   L 2 may b e rea dily constructed using the inequal- it y   K − 1 ρ A T ρ δ g   2 L 2 ≤   K − 1 ρ A T ρ sg   2 L 2 = C 2 ρ,sg (3.53) Con v ergence and error propagation r esults on a linear iterative unfoldi ng metho d 19 which is seen to hold using F ubini’s theor em a nd monotonicity of integration, where non-negativity of ρ and sg is ta citly assumed as previously . Now, by setting h := ξ U , part (i) of the theorem is prov ed. Part (ii) may be proved by using the r e la tion δ g = A δρ f which implies that   K − 1 ρ A T ρ δ g   2 L 2 =   K − 1 ρ A T ρ A δρ f   2 L 2 ≤   K − 1 ρ A T ρ A sρ f   2 L 2 (3.54) again b ecause of F ubini’s theorem a nd monotonicity of integration, wher e one should note that ρ , sρ and f is a ssumed to be non-negative as previously . Then, we see that   K − 1 ρ A T ρ A sρ f   2 L 2 =  f , K − 1 ρ A T sρ A ρ K − 1 ρ A T ρ A sρ f  ≤ k f k 2 L 2   K − 1 ρ A T sρ A ρ K − 1 ρ A T ρ A sρ   L 2 → L 2 (3.55) holds. Realizing tha t the L 2 op erator nor m of the p ositive s elf adjoint op era to r K − 1 ρ A T sρ A ρ K − 1 ρ A T ρ A sρ can b e b ound via the Riesz-Thorin theorem similarly as for K − 1 ρ A T ρ A ρ in pro of of Theorem 3.2 we conclude that the p ertinent oper ator norm is bo und by D 2 ρ,sρ . Part (iii) is pro ved by using the self-a djo intness of A and that the a djoin t o f A T ρ is A ρ . Due to that, for any h ∈ L 2 ( X ), o ne has h h, δ f N i =  K − 1 ρ A ρ H N , δ g  (3.56) with the prev ious notations. Due to the monotonicity o f integration, then the identit y |h h, δ f N i| ≤    K − 1 ρ A ρ H N   , sg  is o btained, since | δ g | ≤ sg ho lds . When setting h := ξ U and co r resp ondingly H N := Ξ U,N , this is nothing but (iii). Part (iv) is pr ov ed by using Eq.(3.56) and δ g = A δρ f , furthermore that the a djoint of A δρ is A T δρ . With that, one has h h, δ f N i = D K − 1 ρ A T δρ A ρ H N , f E . Using | δ ρ | ≤ sρ a nd the monotonicity o f in tegration, one a rrives at |h h, δ f N i| ≤  K − 1 ρ A T sρ | A ρ H N | , | f |  . The upper bound (iv) is obtained, whenev er h := ξ U and H N := Ξ U,N is set. Part (v) is a cons e q uence of (iv), applying H¨ older’s inequalit y , in addition. Remark 3. 7. The fol lo wing r emarks pr ovide some mor e explanation ab out the usability of the ab ove r esults on u pp er estimation of the systematic err ors of f N orig- inating fr om the systematic err ors of the me asur e d p df g or of the r esp onse function ρ . (i) F or any given iter ation or der N ∈ N 0 the upp er estimate (i) of The or em 3.5 b ounds the systematic deviation of the unfolde d p df f N aver age d over any c om- p act set, in a manifestly c alculable way if the systematic err ors of the m e asur e d p df ar e given. In t he language of histo gr ams this me ans that bin-by-bin upp er b ound to t he systematic err or of the u nfolde d p df is ava ilable in terms of the systematic err or of the me asur e d p df. (ii) The upp er estimate (ii) of The or em 3. 5 pr ovide s an alternative b ound for the same quantity for t he c ase when the systematic err ors ar e kn own in terms of the systematic err or of the r esp onse fun ction. This, similarly to The or em 3.3 (iv), ne e ds the unknown value of k f k L 2 which may b e cir cumvente d in the analo gy of R emark 3.4 (v). Namely, for any ε > 0 ther e exists an iter ation index thr eshold M ε ∈ N 0 such that for al l M > M ε one has     1 V olume( U ) Z x ∈ U δ f N ( x ) d x     20 Andr´ as L´ aszl´ o ≤   Ξ U,N   L 2 D ρ,sρ (1 + ε ) k f M k L 2 (3.57) whenever A ρ is one-to-one, b e c ause then in the light of R emark 3.4 (iii), k f M k L 2 as a function of M c onver ges to k f k L 2 in a monotonic al l y incr e asing way. (iii) The right side of Eq.(3.46) may b e appr ox imate d by Z  K − 1 ρ A T sρ   A ρ Ξ U,N    ( x ) | f M | ( x ) d x (3.58) due to | f | = f and b e c au s e f M c onver ges to f as M → ∞ in the L 2 sense, whenever A ρ is invertible. F or lar ge N , the appr oximative formula with M := N may b e use d. 4. Generalization to the context of probabil it y measures. In rar e case s one faces the pro ble m that the distr ibutions in question cannot b e describ ed in terms of pdfs, o nly in terms of pro ba bilit y measur es instead. 3 Such practical cases may arise for instanc e when the folding op erator r epresents kinematics of par ticle decays [2]. There fo re, it is interesting to ask the question whether the iterative unfolding metho d E q.(3.6) applies in the framework of probability measures. Remark 4.1. L et us r e c al l some notions in me asur e the ory [33]. (i) A c omplex me asu r e F over X is a c omplex value d σ -additive set function on the Bor el σ -algebr a of the subsets of X . The variation of the c omplex me asu re F is the non- ne gative value d me asur e | F | define d by the r e quir ement: for a Bor el set E the value of | F | ( E ) is t he supr emum of P K k =0 | F ( E k ) | for any splitting E 1 , . . . , E K of E , i.e. for al l such fi nite system of disj oint Bor el sets E 1 , . . . , E K whose union totals u p to E . The me asur es with finite variatio n, i.e. which have | F | ( X ) < ∞ , form a Banach sp ac e with the norm b eing k F k := | F | ( X ) . We shal l denote this sp ac e by M ( X ) . (ii) A pr ob abili ty me asur e F on X is a non-ne gative m e asur e on the Bor el σ -algebr a of X with t he r e quir ement F ( X ) = 1 . Thus, quite natur al ly, a pr ob ability me a- sur e on X r esides in M ( X ) . W e c o nt inu e with the formal definition of folding op erators whose r e sp o nse func- tion is describ ed b y a measure r ather than a function. Definition 4. 1. A mapping Q : X → M ( Y ) , x 7→ Q ( ·| x ) is c al le d folding me asur e if for every x ∈ X the me asu re Q ( ·| x ) is a non-ne gative me asur e on Y with Q ( Y | x ) = 1 ( i.e. Q ( ·| x ) is a pr ob ability me asur e for al l x ∈ X ), and for every Bor el set E in Y the funct ion x 7→ Q ( E | x ) is me asur able. Remark 4.2. A p ossible usual gener alization is when inefficiencies ar e also al- lowe d, i.e. the less r est rictive c ondition Q ( Y | x ) ≤ 1 is re quir e d for al l x ∈ X . The r esults thr ou ghout this p ap er also holds for that c ase. It follows fro m the definition that a folding measur e Q ma y b e viewed as a con- ditional pr obability meas ure ov er the pro duct space Y × X . Quite evidently , if ρ is a resp onse function then Q ρ ( E | x ) := R y ∈ E ρ ( y | x ) d y defines a folding measure. Definition 4.2. L et Q b e a folding me asur e. Then, the line ar map A Q : M ( X ) → M ( Y ) , F 7→  Z Q ( ·| x ) d F ( x )  (4.1) 3 A measure is a s et function of the subsets of the probabilit y base space. A common example of measures is the Dirac delta. Con v ergence and error propagation r esults on a linear iterative unfoldi ng metho d 21 is c al le d the folding op er ator by Q . Remark 4.3. The r emarks b elow fol lo w fr om the definition [2]. (i) A folding op er ator A Q is wel l-define d as for al l p oints x ∈ X and Bor el sets E of Y the ine quality Q ( E | x ) ≤ 1 holds, thus the function x 7→ Q ( E | x ) is inte gr able by any me asur e with finite variation. (ii) The monotonicity of int e gr ation implies that a folding op er ator is c ontinuous and k A Q k M ( X ) → M ( Y ) = 1 , just as in the c ase of L 1 the ory. If inefficiencies ar e al lowe d, k A Q k M ( X ) → M ( Y ) ≤ 1 holds. (iii) The fold ing op er ators define d by foldi ng me asur es is a gener alization of the fold- ing op er ators by r esp onse functions. As in the L 1 theory , the co nvolutions repr esent an impo rtant class of folding op erators . Definition 4.3. A folding op er ator A Q is c al le d a c onvolut ion if its folding me asur e is tr anslational ly invariant in t he sens e that Y = X and for al l x, z ∈ X and Bor el sets E one has Q ( E | x + z ) = Q ( E − z | x ) . Remark 4.4. The fol lowings ar e imp ortant pr op erties of c onvolution op er ators with me asur es [2]. (i) Whenever the fold ing op er ator A Q by a folding me asur e Q is a c onvolution, Q may b e expr esse d by a single pr ob ability me asur e R := Q ( ·| 0) in the form of Q ( E | x ) = R ( E − x ) for al l x ∈ X and Bor el set E . The alternative notation R ⋆ F := A Q F is often use d in such c ase ( F ∈ M ( X ) ). Note that t he c onvolution is c ommutative, i.e. one has R ⋆ F = F ⋆ R for al l R, F ∈ M ( X ) . (ii) F ourier tr ansformation of me asur es in M ( X ) c an also b e define d and has similar pr op erties as in the L 1 c ase, exc ept that the F ourier tr ansform functions do not de c ay at infin ity, i.e. the Riemann-L eb esgue lemma do es not hold. On ly the b ounde dness of F ourier tr ansforms ar e guar ante e d. (iii) Pr op erties of c onvolut ion op er ators ar e similarly r elate d to the F ourier t ra nsform of the underlying pr ob ability me asure, as in t he L 1 the ory. F or instanc e, a c onvolution op er ator is one-to-one if and only if its F ourier tra nsform is nonzer o almost everywher e. (iv) It is e asily se en t hat if ϕ ∈ L 1 ( X ) and F ∈ M ( X ) , t hen ϕ ⋆ F is a function in L 1 ( X ) . Combining this with R emark 3.1 (ii) we c onclude that if ϕ ∈ L p ( X ) ∩ L 1 ( X ) then for al l F ∈ M ( X ) the fun ction ϕ ⋆ F ∈ L p ( X ) ∩ L 1 ( X ) ( 1 ≤ p ≤ ∞ ). That is, pr ob ability me asur es may b e mapp e d into p dfs in L p ( X ) via c onvolution by a p df inte gr able on the p -t h p ower. Armed with the intro duce d no tions w e may try to ask the ques tion whether one can g eneralize the results in Section 3 to probability measures. Remark 4. 5. The fol lowing r esults ar e gener alization of the re sults in Se ction 3 for pr ob ability me asur es. (i) The naive applic ation of Neu mann series fails to work similarly as in the L 1 fr amework. This is b e c ause as pr ove d in [2] one has k I − A Q k M ( X ) → M ( X ) = 2 whenever Q ( { y }| y ) = 0 for any p oint y — which is the generic c ase. (ii) The c onver genc e and err or pr op agation r esu lts of The or em 3.2, 3.3, 3.4, 3.5 m ay b e gener alize d in a similar mann er to R emark 3.2 (i)-(ii). Namely, inste ad of t he original pr oblem G = A Q F one may c onsider the mo difie d version η ⋆ G = A η⋆Q F to b e solve d for F , wher e η is a squar e-inte gr able p df whose F ourier tr ansform is 22 Andr´ as L´ aszl´ o nowher e vanishing. In t his c ase, the folding op er ator A Q is mapp e d to b e a folding op er ator by a r esp onse funct ion A η⋆Q inste ad, as we have η ⋆ A Q F = A η⋆Q F for any F ∈ M ( X ) . F urthermor e, for e ach x ∈ X the p df η ⋆ Q ( ·| x ) is squar e- inte gr able. Then, the iter ation K η⋆Q := sup x ∈ X Z Z ( η ⋆ Q )( y | z ) ( η ⋆ Q )( y | x ) d y d µ ( z ) , F 0 := K − 1 η⋆Q A T η⋆Q η ⋆ G, F N +1 := F N +  F 0 − K − 1 η⋆Q A T η⋆Q A η⋆Q F N  ( N ∈ N 0 ) . (4.2) ob eys the very s ame c onver genc e and err or pr op agation pr op erties as s t ate d in The or em 3.2, 3.3, 3.4, 3.5, whenever K η⋆Q < ∞ and when the unkn own pr ob abil- ity m e asur e F c orr esp onds to a squ ar e-inte gr able p df with r esp e ct t o some a priori given non-ne gative value d me asur e µ over X . This latter r e quir ement me ans t hat F = f µ ne e ds to b e satisfie d with some non- ne gative me asur e µ over X and with some µ -me asur able function f : X → R + 0 for which R | f | 2 ( x ) d µ ( x ) < ∞ ne e ds to hold. The previous o bs erv ations co nclude tha t whenever the unknown distribution is describ ed by a p df which is square-integrable with resp e ct to some v o lume measure, then the folding measure ma y b e conditioned in a wa y that the iterative unfolding Eq.(3.6) applies to it. 5. The discrete case. F or better illustration, we sp ecialize our r esults in Sec- tion 3 and 4 to the case when the unknown probability distribution along with the resp onse function and the measure d probability distribution is discrete. In that case the mea sured p df g and the unknown pdf f is a finite dimensional vector of non- negative entries, and the folding op e r ator A ρ is simply a finite dimensio nal matrix with non-negative entries as well. Our equation to solve is then the matrix equation g = A ρ f fo r f , or in ca s e o f pres ence of measurement err ors e , the matrix equation g = A ρ f + e . W e also assume that the entries of f , A ρ and g are pro babilities, i.e. they are normalized such that P i g i = 1, P i f i = 1 and P j ( A ρ ) j i = 1, or P j ( A ρ ) j i ≤ 1 in case of presence of inefficiencies. Then, the iterative solutio n of our discrete unfolding problem reads as K ρ := max i X j X k ( A ρ ) j i ( A ρ ) j k , f 0 := K − 1 ρ A T ρ g , f N +1 := f N +  f 0 − K − 1 ρ A T ρ A ρ f N  ( N ∈ N 0 ) . (5.1) where A T ρ is the matr ix tr a nsp ose of A ρ . A s imple observ ation shows tha t Eq.(5.1) is nothing but an iterative fo rm of K ρ := max i X j X k ( A ρ ) j i ( A ρ ) j k , f N := N X n =0 ( I − K − 1 ρ A T ρ A ρ ) n K − 1 ρ A T ρ g ( N ∈ N 0 ) . (5.2) Con v ergence and error propagation r esults on a linear iterative unfoldi ng metho d 23 I denoting the iden tit y matrix. Due to the r esults of Section 3 and 4, the conv er g ence of this approximation is monotonic in the l 2 vector norm, and also holds en trywise, how ever with po ssibly quite differen t conv ergence r ates for different vector ent ries. Along with this, all the conv ergence a nd error pro pa gation pr op erties listed in Sec- tion 3 a nd 4 hold, indep endent ly o f the fineness of the discretization. This decoupling from the discretiza tion is quite imp ortant, as it shows that in the presented method the discretization do es not bec ome an impo rtant ingredient of the reg ularization pro - cedure itself in case when f , g and ρ are in reality contin uum distr ibutions, mo delled and meas ured a s histog rams. 6. Numerical example . In this s ection the p erfor mance o f the prop osed metho d is illustra ted on a numerical example. The example calculation is implemented via the C library libunfold [34], also including the automa tic a ppr oximation, statistica l and s ystematic error propag ation formulae presented in the pap er . The shown exam- ple is also shipp ed with the p ertinent libr ary . The illustrative case was delib era tely chosen in a wa y when the res po nse function is not translatio nally inv ariant, i.e. when ordinary decon v olution metho ds are not sufficient . Our sim ulated measurement sce nario is the following. W e w o uld lik e to measure the true pdf of a quantit y , namely of the energy of pro duced charged particles in a hig h energy particle co llision exp eriment. This tr ue pdf used in o ur toy Monte Ca rlo shall be a pa rametrizatio n of a r e al mea surement at the LHC accelerator [35] at CERN. It is of the form E 7→ f ( E ) := χ [0 , ∞ [ ( E ) | E | ( n − 1)( n − 2 ) ( n T ) 2  1 + | E | n T  − n (6.1) with pa rameters n = 6 . 6 and T = 0 . 1 4 5 GeV. The r e sp o nse function ( E measured , E true ) 7→ ρ ( E measured | E true ) (6.2) shall b e such a cpdf that for each fixed v a lue E true > 0 the p df E measured 7→ ρ ( E measured | E true ) (6.3) shall b e a Gaussian p df with a mean of E true and standard deviation of a + √ b E true + c E true , with par ameter v alues a = 0 . 150 GeV, b = 0 . 71 74 GeV , c = 0 . 07 4. This resp onse function mo de ls the behavior of a calorimeter device used for the ener gy measurement of par ticles, namely o f the HCAL calorimeter [3 6] of the CMS ex per i- men t at the LHC acce lerator at CERN. In the sim ulated mea surement scenario 10 4 Monte Carlo samples ac c o rding to the p df Eq.(6.1) was genera ted, and its cor re- sp onding smeared resp onse according to Eq.(6.2) w as generated. These resp onses were assumed to be collected with an inefficiency of E measured 7→ 1 2  1 + tanh  E measured − E ∆  d (6.4) with para meters E = 1 GeV, ∆ = 1 GeV and d = 0 . 05, i.e. with an inefficiency not greater than 5% o n the overall measurement domain. The collected res po nses w ere histogramed, providing the measured p df g with our non-ideal detector . By con- struction, the statis tica l cov ariance matrix of the histog r am g s ha ll b e diag( g ). The inefficiency profile Eq.(6.4) causing a systematic deviation of the measured pdf from the folded p df by E q.(6.2), is assumed to be not known q ua nt itatively and therefor e 24 Andr´ as L´ aszl´ o is not corrected for. It is assumed, how ever, that an o verall 5% upper b ound to this systematic deviatio n is known, being the systematic er ror of the measured p df, i.e. one has sg = 0 . 05 g . With these inputs, the linear iterative unfolding according to Eq.(3.6) was p erfo rmed. The a ppr oximation err ors were quantified using Re ma rk 3.4 (vi). The propaga ted sta tis tica l err ors w ere calcula ted according to Eq.(3.39). The pr opagated systematic errors were qua n tified using Theorem 3.5 (iii). The iteration w a s stopp ed when the combined statistical, approximation and systematic error exceeded a prede- fined thresho ld of 7%. The result o f the numerical test is shown in Fig. 1 . Note, that more o ptimalized s to pping criter ia can also b e inv ented, using the estimates for the approximation err or, statistical erro r and systematic error . A na tural candidate can be a double- threshold criterion: the approximation erro r needs to be be low a thresh- old (sufficient shap e restor ation), wherea s the co mbin ed s tatistical and systematic error m ust s tay below an upper bo und (divergence regulariza tion). Als o , the iteration might be s to pped a t the er r or o ptim um: a t the minimum of the combined appr oxi- mation, statistical a nd systema tic error . No te, how ever, that one often might require a b etter shap e reco nstruction at the exp ense of increase d statis tica l and systematic error s, a s als o se e n in the shown example. 7. Concluding remarks. In this pap er we pres e n ted mathematical pro ofs of conv er g ence and error propa gation for mulae for a linear itera tive unfolding method [1] in the probability theory co n text. It was shown that the pe rtinent metho d is con- vergen t in the ‘bin wise’ sense under q uite gener ic conditions, which do es hold in case of man y pr actical a pplications. F urthermo r e, explicit formulae for the thre e imp orta nt error terms, the approximation er ror, the s tatistical error and the s ystematic errors were der ived. These can b e used to define o ptimal iteratio n stopping cr iterion and quantification of error s therein. The key element of the pro ofs is the Riesz - Thorin theorem mapping the o riginal L 1 problem to L 2 with a subsequen t usa ge o f spectr a l theory of L 2 op erators . The typical use-cases of the metho d are those unfolding pr ob- lems which cannot b e handled by statistica l deco nv olution [9, 10], due to the abs e nce of tr anslational in v ariance of the res po nse function. The p o ssibility for propaga tion of the systematic er r ors is a spe c ial adv antage, which deserves to be emphasized for exp erimental applications. The p ertinent metho d is also av ailable as a C numerical librar y [34]. Using that, the metho d was demonstrated on a numerical exa mple. The a lgorithm co uld b e included in the ROOUnfold pack age [37] in the future, o r in the GNU Scientific Library [3 8]. The pr esent pap er can serve also as a go o d motiv ation to p erform s imilar co nver- gence and error pr o pagation s tudies on an other iterative unfolding metho d [18, 19, 2 1, 22, 23, 24, 25, 26], a lso called the metho d of conv erg ent weigh ts o r iterative Bay es ian unfolding. That method is no n-linear and therefore is somewhat mo re complicated to study , howev er can b e mo r e suitable for unfolding pro ble ms in probability theory as it conserves the integral and non-negativity of proba bilit y densit y functions. Although widely us ed and num erically very promis ing, s o far little is known on the convergence prop erties of that algor ithm, and nothing is known a bo ut its erro r propa gation. Our prop osed metho d can be conside r ed as the “linearized” version of that metho d, and th us the presented results a re e x pec ted to provide clues also for the conv erg e nc e and error propag ation prop erties of the metho d of co nv erg e n t weigh ts or iterative Bayesian unfolding. Ac knowledgmen ts. The author would lik e to thank to T am´ as Matolcsi for v alu- able co mmen ts and for rea ding v arious v er sions of the manuscript, furthermore to Con v ergence and error propagation r esults on a linear iterative unfoldi ng metho d 25 ❘ ❡s ♣ ♦ ♥ s ❡ ❢ ✉ ♥ ❝ t ✐ ♦ ♥ ♦ ❢ t ❤ ❡ t ❡ s t s ② s t ❡♠ ✵ ✵ ✳  ✵ ✳ ✁ ✵ ✳ ✂ ✵ ✳ ✄ ✵ ✳ ☎ ✵ ✳ ✆ ✵ ✳ ✝ ✵ ✳ ✞ ✵ ✳ ✟  P r ✠ ❜ ❛ ❜ ✡ ☛ ✡ ☞ ✌ ❞ ✍ ✎ ✏ ✡ ☞ ✌ ■ ♥ ♣ ✉ t ❡♥ ❡✑ ❣ ② ❬ ✒ ❡❱ ❪ ✵  ✁ ✂ ✄ ✓ ✎ ✏ ☞ r ✔ ✕ ✍ ✎ ☞ ❛ ☛ ☛ ✌ ✕ ✍ ❛ ✏ ✔ r ✍ ❞ ✍ ✎ ✍ r ✖ ✌ ✗ ✘ ✍ ✙ ✚ ✲ ✁ ✲  ✵  ✁ ✂ ✄ ❯ ♥ ❝ ♦ r r ❡❝t ❡❞ s ② s t ❡♠ ❛ t ✐ ❝ ❞ ✐ s t ♦ r t ✐ ♦ ♥ ♦ ❢ t ❤ ❡ s ② s t ❡♠ ■ ♥ s t r ✉ ♠ ❡♥ t ❛ ❧ ❧ ② ♠ ❡❛ s ✉ r ❡❞ ❡♥ ❡r ❣ ② ❬  ❡❱ ❪ ✲✷ ✲✶ ✵ ✶ ✷ ✸ ✹ ❆ ♣ ♣ ✁ ✂ ✄ ☎ ✆ ✝ ✆ ✞ ✄ ✟ ✠ ✞ ✂ ✡ ☎ ✂ ✆ ✞ ☛ ☞ ✞ ✂ ☛ ✌ ☛ ✌ ✍ ☛ ✁ ☎ ✄ ☎ ♣ ☎ ✍ ✎ ✏ ✑ ✵ ✷ ✳ ✒ ✒ ✼ ✳ ✒ ✶ ✵ ▲✐  ❡ ❛ r ✐ ✁ ❡ r ❛ ✁ ✐ ✂ ❡ ✉  ❢ ♦ ❧ ✄ ✐  ❣ ✭ ✺ ✻ ✽ ✐ ✁ ❡r ❛ ✁ ✐ ♦  s ✱ ❝♦ ♠ ❜ ✐  ❡✄ ❡ r r ♦ r ❝♦  ✁ ❡  ✁ ❂✼ ✪ ✮ ❊  ❡ r ❣ ② ❬ ☎ ❡ ❱ ❪ ✲✷ ✲✶ ✵ ✶ ✷ ✸ ✹ ✺ P ✆ ✝ ✞ ✟ ✞ ✠ ✡ ✠ ☛ ☞ ❞ ✌ ♥ ✍ ✠ ☛ ☞ ✎ ✏ ✴ ● ✌ ✑ ✒ ✲✶ ✲ ✵ ✳ ✺ ✵ ✵ ✳ ✺ ✶ ✶ ✳ ✺ ✷ ❙ ✁ ❛ ✁ ✐ s ✁ ✐ ❝ s ❂ ✾ ✽ ✹ ✹ ❡ ✁ r ✐ ❡ s ■  ♣ ✉ ✁ ♣ ✄ ❢ ✭ ▼ ❈ ✁ r ✉ ✁ ❤ ✮ ▼❡ ❛ s ✉ r ❡✄ ❢ ♦ ❧ ✄ ❡ ✄ ♣ ✄ ❢ ✭ ✇ ✐ ✁ ❤ s ✁ ❛ ✁ ✳ ❛  ✄ s ② s ✁ ✳ ❡ r r ♦ r s ✮ ❯  ❢ ♦ ❧ ✄ ❡✄ ♣ ✄ ❢ ✭ ❧ ✐  ❡ ❛ r ✐ ✁ ❡r ❛ ✁ ✐ ✂ ❡ ✉  ❢ ♦ ❧ ✄ ✐  ❣ ✮ Evolution of approximation, statistical and systematic bin pro bability errors Iteration order 0 100 200 30 0 400 500 60 0 Largest error of bin probability 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 Largest approximation error Largest statistical error Largest systematic error Fig. 1 . (Color online) T op left : il lustr ation of the r esp onse function of our t e st e xample. The c olor intensity indic ates the pr ob ability density of t he r esp onse function. T op right: il lustr ation of the unac c ounted systematic distortion applie d to the folde d p df in our test ex ample. The solid curve indic ates the systematic distortion (an ineffici ency, in our example) on the unfolde d p df, which is assume d to b e not exactly quantifiable, and ther e f or e is not c orr e cte d for in t he simulate d me asur ed p df. Only an upp er bo und for the systematic distortion, ca l le d to b e the systematic e rr or, is assume d to b e known for the simulate d me asur e d p df. That is taken to b e a co nstant 5% upp er b ound in the example. Bottom left: the true input p df (solid line), the simulate d me asur e d p df (squar es) and t he unfolde d p df (triangles) by t he pr op ose d metho d. The p dfs ar e shown to g e ther with their bin-by-b in statistic al err ors (err or b ars), systematic err ors (err or b ands), and appr oximation err ors (narr ow err or b ands). Bottom right: evolution of the bin-b y-bin maximum of the appr oximation err or (cir cles), statistic al err or (diamonds), and systematic err or (flipp e d triangles) as a function of the numb er of iter ations. Note that the binwise appr oximation err ors c onver ge to zer o, but not in a monotonic manner, which explains the slight incr e ase of that term after ab out 300 iter ations. If the ite r ation was c ontinue d, that term inde e d co nver ge d to zer o, but with sever al lo c al minima, i .e. “waves” or “jumps” ar e se en in the c onve rgenc e curve. On the other hand, the binwise statistic al and systematic e rr or term ar e se en simply to diver ge, as exp e cted . The c omp etition of these thr e e err or terms gives a p ossibility to define a stopping criterion. Dezs˝ o V a rga for discussions on the physical applications and on the relev ance of err or propaga tion form ulae, in particula r for the systematic errors. This work was sup- 26 Andr´ as L´ aszl´ o po rted in par t by the Mo men tum (‘Lend ¨ ulet’) prog ram of the Hungaria n Academy o f Sciences under the grant num b er LP2013- 60. The author would also like to ac knowl- edge the supp or t of the J´ a nos Bolyai Research Scholarship o f the Hungarian Academy of Sciences. REFERENCES [1] A. L´ aszl´ o : A linear iterativ e unfolding m etho d. J. Phys. Conf. Ser. 36 8 (2012), 012043. [2] A. L´ aszl´ o : A r obust iterativ e unfolding metho d f or signal pro cessing. J. Phys. A39 (2006), 13621. Zbl 1107.94004 [3] G. Cowan : Pr o ce e dings of Confer ence on Ad vanc e d Statistic al T e chniques i n Particle Physics (18-22 Mar ch 2002, Durham, Unit ed Kingdom) (2002, IPPP/02/39, Durham) p 248 [4] V. Blob el : Unfolding for HEP Exp eriments ( T alk at DESY Computing Seminar , 2008) http://w ww.desy. de/~blobel/DESYcompsem08.pdf [5] G. Bohm, G. Zech : Int roduction to Statistics and D ata Analysis for Physicists (2010, Hamburg: V erlag D eutsc hes Elektronen-Sync hrotron). [6] M. Kuusela, V. M. Panar etos : Statistical unfolding of element ary particle sp ectra: empi r ical Ba y es estimation and bias- corrected uncertain ty quant ification. Ann. Appl. Stat. 9 (2015), 1671. [7] M. Kuusela, P. B. Stark : Shape-constrained uncertain ty quan tification in unfolding steeply falling element ary particle sp ectra. Pr eprint (2016) [ arXiv:1512.0090 5 ]. [8] H. P. Dembinsk i, M. R oth : An algori thm for automatic unfolding of one-dimensional distribu- tions. Nucl. Instr. Meth. A729 (2013), 410. [9] I. Datt ner, A. Goldenshluger, A. Juditsky : On decon volution of distribution functions. Ann. Stat. 39 (2011), 2477. Zbl 1232.62056 [10] I. Dattner, M. R eiß, M. T r abs : A daptiv e quantile estimation in decon volution with unkno wn error distribution. Bernoul li 22 (2016), 143. Zbl 0654326 6 [11] J. F an : On the opt imal r ates of conv ergence for nonparametric decon volution problems. A n- nals of Stat. 19 (1991), 1257. Zbl 0729.62033 [12] C. H. Hesse : Iterative density estimation f rom cont aminated observ ations. Met rika 64 (2006), 151. Zbl 1100.62042 [13] F. Comte, C. La c our : Anisotropic adaptive k ernel decon volution. Ann. Inst. H. Poinc ar´ e Pr ob. Stat. 49 (2013), 569. Zbl 06171260 [14] M. C. Liu, R . L. T aylor : A consistent nonparametric density estimator for the deconv olution problem. Can. J. Stat. 17 (1989), 427. Zbl 0694.62017 [15] L. A. Stefanski, R. J. Car ol : Deconv oluting k ernel densit y estimators. Statistics 21 (1990), 169. Zbl 0697.62035 [16] J. Kalifa, B. Ro uge : Decon volution by Thresholding i n Mirror W av elet Bases. IEEE T r ans. on Image Pr o c. 12 (2003), 446. [17] A. Ho e cker, V. Kartvelishvili : SVD Approach to data unfolding. Nucl. Instr. Meth. A372 (1996), 469. [18] G. D’A gostini : A multidimensional unfolding method based on Ba yes’ theorem. Nucl. In- str. Meth. A362 (1995), 487. [19] G. Ze c h : Iterativ e unfolding wi th the Richardson-Lucy algorithm. Nucl. Instr. Meth. A716 (2013), 1. [20] C. A lt et al : High T ransverse Momen tum Hadron Spectra at √ s N N = 17 . 3 GeV, i n Pb+Pb and p+p Coll i sions. Phys. R ev . C77 (2008), 034906. [21] W. H. Richar dson : Ba y esi an-based iterativ e metho d of im age restoration. J. O pt. So c. of A mer. A62 (1972), 55. [22] L. B. Lucy : An iterativ e technique for the rectification of observed distributions. Astr onomi- c al Journal 79 (1974), 745. [23] L. A. Shepp, Y. V ar di : Maximum likelihoo d reconstruction for emi s sion tomograph y . IEEE T r ans. Me d. Imag. 1 (1982) , 113. [24] A. Kondor : Method of con vergen t w eigh ts – An iterativ e procedure for solving F redholm’s int egral equations of the firs t kind. Nucl. Instr. Met h. 216 (1983), 177. [25] H. N. M ¨ ulthei, B. Schorr : On an iterative method f or a class of int egral equations of the first kind. Math. Meth. Appl. Sci. 9 (1987), 137. Zbl 0628.65130 [26] H. N. M¨ ulthei, B. Schorr : On an iterativ e metho d f or the unfolding of sp ectra. Nucl. In- str. Meth. A257 (1987), 371. [27] P. D. L ax : F unctional analysis (2002, Chichester: Wiley-Interscience) . Zbl 1009.47001 Con v ergence and error propagation r esults on a linear iterative unfoldi ng metho d 27 [28] W. Rudin : F unctional Analysis (1973, New Y ork: McGra w-Hill). Zbl 0253.46001 [29] G. Arfken : F ourier Con v olution theorem 20.4 Mathematic al Metho ds for Physicists 7rd edn (2013, Amsterdam: Elsevier/Academic Press) pp 985. Zbl 1239.00005 [30] P. Br ac ewel l : Con volution theorem The F ourier T r ansform and Its Applic ations 3rd edn (1999, New Y ork: McGraw-Hill) pp 108. [31] L. Land web er : An i teration formula for F redholm inte gral equations of the fir st kind. Am . J. Math. 73 (1951), 615. Zbl 0043.10602 [32] G. B. F ol land : Real A nalysi s: Modern T echniques and Their Applications 2nd edn (1999, Wiley-In terscience). Zbl 0924.28001 [33] N. Dincule anu : V ector Measures (1967, Elsevier). Zbl 0142.10502 [34] A. L´ aszl´ o : The libunfold pack age (2011, Sour c e c o de ) http://w ww.rmki. kfki.hu/~laszloa/downloads/libunfold.tar.gz [35] V. Khachatryan et al (the CMS c ol lab or ati on) : T ransv erse-Moment um and Pseudorapidit y Distributions of Charged Hadrons in pp Collisions at √ s = 7 T eV. Phys. R ev. L ett . 105 (2010), 022002. [36] E. Y azgan (for the CMS c ol la b or ation) : The CM S barr el calorimeter resp onse to particl e beams from 2 to 350 GeV/c. J. Phys. Conf. Ser. 1 60 (2009), 012056. [37] T. A dye et al : The ROOUnfold pack age http://h epunx.rl .ac.uk/~adye/software/unfold/RooUnfold.html [38] The GNU Scientific Library: http://ww w.gnu.or g/software/gsl

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment