On-line Spot Volatility-Estimation and Decomposition with Nonlinear Market Microstructure Noise Models

On-line Spot V olatility-Estimation and Decomposition with Nonlinear Mar k et Microstr ucture Noise Models Rainer Dahlhaus (Univ ersity of Heidelberg) Jan C . Nedder mey er (DZ BANK A G) July 2012 Summary . A technique f or on-line estimation of spot v olatility f or high-frequency data is de v eloped. The algor ithm works directly on the transaction data and updates the volatility estimate immediately after the occurrence of a new transaction. Fur ther more, a nonlinear market microstr ucture noise model is proposed that reproduces se ver al stylized f acts of high-frequency data. A computationally efﬁcient par ticle ﬁlter is used that allows for the appro ximation of the unknown efﬁcient pr ices and, in combination with a recursive EM al- gorithm, f or the estimation of the v olatility cur ve . W e neither assume that the transaction times are equidistant nor do we use inter polated pr ices. W e also make a distinction be- tween volatility per time unit and volatility per transaction and provide estimators f or both. More precisely we use a model with random time change where spot volatility is decom- posed into spot volatility per transaction times the trading intensity - thus highlighting the inﬂuence of trading intensity on v olatility . Ke ywor ds. Nonlinear state-space model; microstr ucture noise; sequential EM algor ithm; tick-b y-tick data; random time change; volatility decomposition; transaction time; trading intensity . Address f or correspondence: Institute of Applied Mathematics, Univ ersity of Heidelberg, Im Neuenheimer F eld 294, D-69120 Heidelberg, Ger many (e-mail: jc@nedder mey er .net). The work was suppor ted by the University of Heidelberg under F rontier D .801000/08.023. 1 Intr oduction In the last couple of years the modeling of ﬁnancial data obser ved at high-frequency be- came one of the major research topics in the ﬁeld of ﬁnancial econometr ics. It is of high practical rele vance because a rising number of market par ticipants ex ecute trades based on high-frequency strategies and are e xposed to high-frequency market risk. Examples of those trading strategies are statistical arbitrage, the ex ecution of large bloc k trades, and market making. For most strategies the spot v olatility is impor tant f or trading signal gener- ation and risk management. Often the immediate detection of sudden volatility movements is par ticularly relev ant for traders . Usually high-frequency trading strategies are highly au- tomated. In fact, they are often “speed games” and only proﬁtable if one reacts to mar ket changes f aster than other mar ket par ticipants. An e xample is the pr icing of high frequency options which can be traded until a fe w seconds to matur ity . In a high-frequency setting the estimation of spot volatility is much more complicated due to the presence of mar ket microstructure noise. Overall, this causes the need for an on-line spot volatility estimator which ﬁlters out market microstr ucture noise and adapts to volatility mov ements quickly . In addition, it needs to be computationally efﬁcient. In this paper we propose such an estimation method. In the method descr ibed below , the efﬁcient log-price process of a security is treated as a latent state in a nonlinear state-space model. The relation between the efﬁcient log- prices and the transaction pr ices is descr ibed by a class of nonlinear mar ket microstr ucture noise models leading to a par ticular form of the obser v ation equation in the state-space model. A computationally efﬁcient par ticle ﬁlter is dev eloped which allows the estima- tion of the ﬁlter ing distributions of the efﬁcient log-prices given the obser ved transaction prices. Based on the ﬁlter ing distr ibutions the time-varying volatility is estimated by using a sequential Expectation-Maximization (EM) algorithm. The procedure wor ks on-line and updates the v olatility estimate immediately when a new transaction comes in. The method is suitable f or real-time applications because of its computational efﬁciency . Contrary to se v eral other papers we do not assume that the transaction times are equidistant nor do we use interpolated pr ices. Until recently , the main f ocus in the literature has been on the estimation of the inte- grated v olatility . This task has been studied e xtensiv ely under v arious assumptions on the market microstr ucture noise (Zhou 1996; Zhang et al. 2005; Andersen et al. 2006; Bandi and Russell 2006, 2008; Hansen and Lunde 2006; Bar ndorff-Nielsen et al. 2008; Kalnina and Linton 2008; Christensen et al. 2009; Jacod et al. 2009; P odolskij and V etter 2009). 1 Some authors suggested that estimates of the spot volatility can be obtained through lo- calized versions of estimators for the integrated volatility (Harr is 1990; Zeng 2003; F an and W ang 2008; Bos et al. 2009; Kristensen 2010) or by Fourier ser ies methods (Munk and Schmidt-Hieber 2009). A speciﬁc noise-robust estimator is provided with a detailed analysis by Zu and Boswic k (2010). How e v er , these methods are essentially off-line pro- cedures. Foster and Nelson (1996) der ive the rate of conv ergence of rolling regression estimates. They include the case of one-sided k er nel-estimates which can be transf or med to recursiv e estimates. In this ar ticle w e use a diffusion model with r andom time change given by the number of transactions (see Section 4). Conditional on the obser ved trading times t 1 < t 2 < . . . < t T the ev olution of the unobser ved efﬁcient log-pr ice process X t j is given by a random walk in transaction time with possib ly time-v ar ying volatility σ t j , that is X t j = X t j − 1 + Z t j (1) with Z t j ∼ N (0 , σ 2 t j ) (an alter nativ e is a diffusion model in clock time - see Section 4). Drift ter ms are ignored since their eff ect is of lo wer order in high-frequency data. The obser ved transaction data Y t j are then treated as noisy obser vations of the latent process X t j . In Section 4, a transf or mation from transaction time volatility σ 2 t j to cloc k time v olatility is giv en. The basis f or this is that the underlying diffusion model with random time change leads to a decomposition of volatility in clock time into volatility in transaction time and trading intensity . This shows in par ticular the inﬂuence of the local trading intensity on v olatility . In addition, we present a direct cloc k time estimator . In our opinion the main adv antage of the abov e model in compar ison with a continuous time diffusion model is the af orementioned decomposition of the v olatility discussed in Section 4. In addition, volatility in transaction time is more constant than volatility in clock time making the algor ithm more stable (Ané and Geman 2000; Plerou et al. 2001; Gabaix et al. 2003 – see also Section 6.2). The relation between the unobser ved efﬁcient (log-)prices and the observed transaction prices is described through a nonlinear market microstr ucture noise model given by the generaliz ed rounding scheme Y t j = g t j  exp( X t j )  = g t j ; Y t 1: j − 1  exp( X t j )  . (2) Here the function g t j ma y be random or deter ministic , time-inhomogeneous, and depend- ing on the past obser vations Y t 1: j − 1 := { Y t 1 , . . . , Y t j − 1 } and in addition on e xogenous vari- ables such as order book data or mar ket maker quotes. Contrary and complementar y to 2 the additive model log Y t j = X t j + U t j used in the major ity of existing papers this model tries e.g. to descr ibe in detail the rounding-mechanism due to order books and market maker quotes . The par ticle ﬁlter applied in this paper allows f or a fairly general class of nonlinear func- tions g t j  · ) possibly depending on unknown parameters (cf .(13) for an e xample different to the mainstream of this paper) - for computational simplicity and since we belie ve that this is a very good model we restr ict ourselves to the setting (3) where the possible suppor t of exp( x t j ) can be diagnosed from y t j , pre vious obser vations and (say) the order book. A simple deter ministic e xample cov ered by this model is the rounding of exp( X t j ) to the nearest cent. A more complex stochastic e xample is the situation where the ne xt trade is made both with probability 1 / 2 on the closest bid- or ask-le vel of an order book. One might be tempted to write our model in the f or m Y t j = ˜ g t j  exp( X t j ) , U t j  with a deter ministic non- linear ˜ g t j and a random component U t j - but in most situations the U t j would depend in a v er y complicated wa y on the past of the Y t j (e.g. in Example 4 from Section 2). The state equation (1) and the obser v ation equation (2) f or m a nonlinear state-space model (see also (3) and (4)). The (transaction time) spot volatility cur ve is considered as a parameter of this state-space model. The estimation is done through a par ticle ﬁlter and a sequential EM-type algor ithm. V er y roughly speaking the volatility estimator can be view ed as a localized realized volatility estimator based upon the par ticles of the par ticle ﬁlter . In detail the situation is howe ver more complicated because we need a back and f or th between par ticle ﬁlter and v olatility estimator to obtain a decent on-line estimator . The ar ticle is organized as follo ws. Section 2 describes the nonlinear mar ket microstruc- ture noise model. In Section 3, a par ticle ﬁlter and a sequential EM-type algorithm are proposed for on-line estimation of spot volatility in transaction time. In Section 4 the de- composition of clock time v olatility is given and the estimation of spot volatility in clock time is discussed - both in a diffusion model with random time change and in a standard dif- fusion model. Modiﬁcations (e.g. f or data with diur nal patter ns), adaptation issues, and the implementation of the algor ithm are discussed in Section 5. Finally , simulation results and an application to real data are presented in Section 6 follow ed by some conclusions in Section 7. In some par ts of the paper we could replace the notation y t j , x t j by the simpler notation y j , x j etc. Since the time points t j are treated in Section 4 as the realization of a point process we ha v e decided to stick with this notation throughout. 3 2 Nonlinear Market Micr ostructure Noise Models In most e xisting mar ket microstructure models the efﬁcient log-pr ice is assumed to be corrupted b y additiv e stationary noise (cf. Aït-Sahalia et al. 2005; Zhang et al. 2005; Bandi and Russell 2006; Hansen and Lunde 2006; Bar ndorff-Nielsen et al. 2008). The noise v ariables are typically independent of the efﬁcient log-pr ice process. The setting allows for weak (e ven nonparametr ic) assumptions on the noise and f air ly general theoretical results of the estimators. The major weakness of these models is that they cannot reproduce the discreteness of tr ansaction prices. More adequate models which incorporate rounding noise ha ve also been considered (Ball 1988; Delattre and Jacod 1997; Large 2007; Li and Mykland 2007; Rober t and Rosenbaum 2008; Rosenbaum 2009). P opular models are based on additiv e noise f ollow ed by rounding according to the smallest tic k size as in (13). At the end of Section 3.1 we discuss how these models can be used in the frame work of our paper . Hansen and Horel (2009) use a Mar kov chain model f or the ﬁlter ing of discretized prizes and consider realized v olatility estimates based on the ﬁltered ser ies. As already described in (2) the obser v ed price in our model is obtained from the un- known efﬁcient price by application of the generaliz ed rounding function g t j which ma y be deter ministic or stochastic - examples are simple deter ministic and stochastic round- ing (Example 1 below), rounding to the closest liquid bid or ask lev els of an order book (Example 2), rounding to the quotes created by a mar ket maker (Example 3) or to some le v els estimated from pre vious obser vations Y t 1: j − 1 (Example 4) where g t j depends in a complicated wa y on past obser vations . In general the rounding lev els are created from additional exogenous inf or mation or from pre vious obser v ations y t 1: j − 1 (e.g. the lev els in an order book). In most cases we assume that (conditional on y t 1: j − 1 ) these le vels are known. Example 4 is an example where such e xogenous information is unav ailable and the le vels are estimated. Examples 1-4 do not contain unkno wn parameters (although this is not mandator y - cf . (13) which depends on σ 2 U ). Model assumption 1 (observation equation / micr ostructure noise model): (i) The distribution of Y t j = g t j ; Y t 1: j − 1  exp( X t j )  is discrete with suppor t Y . (ii) The conditional distr ibution of Y t j giv en the state X t j and previous obser vations is of the f or m p  y t j   y t 1: j − 1 , exp( x t j )  ∝ 1 A t j  exp( x t j )  1 Y ( y t j ) a.s. (3) where the set A t j depends on y t j and on the conditioning obser vations y t 1 , . . . , y t j − 1 . “a.s. ” means almost surely with respect to the distribution of X t j . 4 2290 2295 2300 2305 2310 46.85 46.87 46.89 46.91 Figure 1: A real data example of estimated ﬁltering distr ibutions based on the market microstructure noise model with deter ministic rounding for the case when market maker quotes are availab le in addition to the transaction data. The details are provided in Section 6.2. The plot shows some transaction pr ices (circles) along with kernel density estimates of the ﬁltering distributions of the efﬁcient pr ices (b lack lines) based on the par ticles produced by the particle ﬁlter . The gr ay vertical lines indicate the assumed suppor t of the ﬁltering dis- tributions. The bid and ask market maker quotes are displa yed b y gra y and b lack horizontal lines, respectiv ely . The x-axis shows transaction time . It is impor tant to note that the concrete speciﬁcation of the set A t j is an impor tant par t of the microstr ucture noise model at hand - see Examples 1-4 below . Note that we need to know A t j only f or the obser ved y t j and not for all possib le realizations of Y t j . If the function g t j = g t j ; Y t 1: j − 1 is deter ministic then A t j := g − 1 t j ( y t j ) = { z : g t j ( z ) = y t j } is the inv erse image of y t j under g t j . The “a.s. ” will be omitted in the rest of the paper . In par ticular Proposition 1 continues to hold if (3) only holds almost surely . Reasons f or choosing the model: W e chose the abov e model for three reasons: (a) The par ticle ﬁlter takes a simple form: The optimal proposal becomes a tr uncated nor- mal distribution and the impor tance weights are also easy to calculate (see Proposition 1). This means that the ﬁlter is more efﬁcient than in the general case and less par ticles (and less computation time) are needed. (b) The model covers sev eral impor tant cases of microstr ucture noise (see the Examples below). (c) Already the simplest model of deter ministic rounding in Example 1 (in combination with order book data or market maker quotes) describes in our opinion in a sufﬁcient wa y se v eral stylized f acts of high-frequency data - namely the discreteness of prices, the bid- ask bounce and time-var ying bid-ask spread and the form of autocorrelations and par tial autocorrelations of real log-retur ns (see Figure 3). At the same time this model is more parsimonious than other models. 5 Identiﬁability and appro ximation of the ﬁltering distr ibution: (11) and (3) imply that the joint ﬁltering distribution p ( x t 1: j | y t 1: j ) is uniquely deﬁned under the model assumptions . A t j and log A t j are the suppor ts of the ﬁlter ing distr ibution of the efﬁcient price p  exp( x t j )   y t 1: j  and the efﬁcient log-price p ( x t j | y t 1: j ) , respectively . It will be shown in Section 3.1 that the ﬁltering distributions can be approximated through a par ticle ﬁlter . A real data e xample is given in Figure 1. It shows the suppor ts A t j (gra y ver tical lines) and ker nel density estimates of the ﬁltering distributions of the efﬁcient prices (blac k lines) which are computed based on the output of the par ticle ﬁlter . In this e xample, mar ket maker quotes are av ailable (see Example 3 below). The details of this e xample are provided in Section 6.2. F or completeness we giv e the state equation again. Model assumption 2 (state equation / efﬁcient price model): The unobser ved efﬁcient price is giv en by exp( x t j ) with p  x t j   x t j − 1  = N  x t j   x t j − 1 , σ 2 t j  . (4) It is assumed that σ 2 t j = σ 2 ( t j ) with a function σ 2 ( · ) which is either constant or smooth in time (sa y Lipschitz-continuous). The smoothness assumption does not need to be speciﬁed any fur ther because we do not use it f or mally . Howe ver , without this assumption the estimation procedure dev eloped in Section 3.2 would not make sense. Any proof of consistency of the estimates of this paper would require in addition some type of in-ﬁll asymptotics . Modeling microstructure noise via the speciﬁcation of A t j : In order to carry out the particle ﬁlter and the v olatility estimate descr ibed later we hav e to specify f or the obser vation y t j at hand the set A t j i.e. the set of the possible efﬁcient prices. This speciﬁcation is an impor tant modeling step. We no w give e xamples. Example 1 (simple deterministic and stochastic r ounding): (i) The simplest e xample is the rounding of exp( x t j ) to the nearest integer (say cent) – i.e. y t j = round (exp( x t j )) . In this case A t j = [ y t j − 0 . 5 , y t j + 0 . 5) and p  y t j   y t 1: j − 1 , exp( x t j )  = 1 A t j  exp( x t j )  1 N ( y t j ) . (ii) A simple stochastic example is where we choose for exp( x t j ) ∈ ( n, n + 1) the values y t j = n and y t j = n + 1 each with probability 1/2. In that case A t j = ( y t j − 1 , y t j + 1) and p  y t j   y t 1: j − 1 , exp( x t j )  = 1 2 1 A t j  exp( x t j )  1 N ( y t j ) for almost all x t j . It seems natural to set y t j = n f or exp( x t j ) = n but with order book data as in Example 2 below this choice is no longer natural. 6 W e now give some e xamples where we model the bid-ask spread of ﬁnancial transaction data. In these cases A t j depends on past obser vations and/or e xogenous data. Example 2 (order book data): Let’ s assume that at each transaction time t j the exchange provides a limit order book with bid and ask le vels given by α k t j ( k = 1 , 2 , . . . , K ) and β ` t j ( ` = 1 , 2 , . . . , L ) respectively (these are the lev els where contract off ers are really av ailable). The order book le vels satisfy α K t j < . . . < α 2 t j < α 1 t j < β 1 t j < β 2 t j < . . . < β L t j and we denote M t j - := { α K t j , . . . , α 2 t j , α 1 t j , β 1 t j , β 2 t j , . . . , β L t j } . M t j - represents the state of the order book immediately before the transaction at time t j occurs. M t j - depends in an unknown wa y on the past obser v ations y t 1: j − 1 and e xogenous inf or mation. Clear ly y t j ∈ M t j - . We now set, corresponding to the deter ministic case (i) in Example 1 A t j := { z ∈ R : argmin γ ∈M t j - | z − γ | = y t j } (5) or equiv alently g t j ; y t 1: j − 1 ( z ) := argmin γ ∈M t j - | z − γ | . Thus the transaction pr ice at time t j is that pr ice from M t j - with the smallest Euclidean distance to the efﬁcient pr ice. This means that the efﬁcient pr ice at time t j is assumed to be closer to the obser ved pr ice y t j than to any other order book le v el. Of course, this cannot be guaranteed and it seems to be more realistic to choose for (sa y) γ ∈ ( α 1 t j , β 1 t j ) y t j = α 1 t j and y t j = β 1 t j each with probability 1 / 2 - i.e. a trade is made with probability 1 / 2 on the bid and on the ask side. This corresponds to the stochastic case (ii) from Example 1 leading to the deﬁnition A t j :=  largest le vel from M t j - below y t j , smallest le vel from M t j - abov e y t j  . Figure 3 indicates that the model with A t j as in (5) (deter ministic rounding) better captures the stylized facts of real transaction data. The e xplanation may be that in the case of a liquid order book often sev eral trades are e x ecuted at the same le vel and theref ore the ﬁrst model giv es a better ﬁt. The situation ma y be diff erent if the stoc k is less heavily tr aded - how ev er we hav e not inv estigated that. In the present situation we could better write instead of (3) p  y t j   M t j - , exp( x t j )  ∝ 1 A t j  exp( x t j )  1 Y ( y t j ) a.s. where M t j - contains implicitly the rele v ant information from y t 1: j − 1 . If the v olume of the trade at time t j is so large that it is e x ecuted on se ver al lev els of the order book then y t j should be set equal to the largest ask le vel (smallest bid le vel) and all low er le vels should be deleted bef ore deter mining A t j . 7 t 1 t 2 t 3 t 4 t 5 t 6 t 7 t 8 t 9 t 10 t 11 10 . 00 10 . 01 10 . 02 10 . 03 10 . 04 y t 1 exp( x t 1 ) A t 1 Figure 2: An example of the market microstructure noise model with deter ministic rounding for the case when order book data are availab le. The ﬁgure shows the transaction pr ices (circles), the (in practice unknown) efﬁcient prices in transaction time (diamonds), the latent efﬁcient price process in clock time (black line), the order book lev els (gra y horizontal lines), and the suppor ts of the ﬁltering distributions of the efﬁcient pr ices (gra y ver tical lines). An e xample of this market microstructure model is visualized in Figure 2. The inter vals A t j are denoted b y thic k ver tical lines . Note, that these are also the suppor ts of the ﬁlter ing distributions. Larger inter v als A t j are usually due to a larger bid-ask spread. Example 3 (market maker quotes): In case where mar ket maker quotes are av ailable instead of order book data, we only ha ve a single bid and a single ask lev el α t j and β t j , respectively , which satisfy α t j < β t j . That is, y t j is either equal to α t j or equal to β t j . Corresponding to deter ministic rounding as in Example 1 (i) we set A t j = [ y t j − ∆ t j , y t j + ∆ t j ) (6) where ∆ t j := 0 . 5 ( β t j − α t j ) . The choice ∆ t j := ( β t j − α t j ) corresponds to stochastic rounding as in Example 1 (ii). F rom a cer tain point of view this choice of A t j seems to be not adequate and one is tempted to chose A t j = (  − ∞ , α t j + ∆ t j  , if y t j = α t j  β t j − ∆ t j , ∞  , if y t j = β t j . T o understand why this is not a proper choice one needs to look in more detail at the behav- ior of the market maker . Of course the mar ket mak er has more (invisib le) le vels which are automatically e x ecuted at the same time if the efﬁcient price makes larger jumps. Fur ther- more, the mar ket maker has additional inf or mation on the efﬁcient price (sa y from trades of correlated securities) and may ha ve already adjusted his le v els towards the efﬁcient price. 8 This last f act violates our model assumptions (in that the function g t j not only depends on past values y t 1: j − 1 and exogenous information but also somehow on exp( x t j ) ) but in par- ticular in this situation our model with the abov e choice of A t j seems to be a reasonable parsimonious model. This example also demonstrates the adv antage of the f act that we just hav e to specify the inv erse image A t j f or the y t j at hand. 0 1000 2000 3000 4000 5000 46.60 46.75 46.90 0 1000 2000 3000 4000 5000 46.7 46.9 47.1 0 1000 2000 3000 4000 5000 46.7 46.9 47.1 0 50 100 150 200 250 46.80 46.86 46.92 0 50 100 150 200 250 46.88 46.92 46.96 47.00 0 50 100 150 200 250 46.88 46.92 46.96 47.00 0 5 10 15 20 25 30 35 −0.4 0.0 0.4 0.8 0 5 10 15 20 25 30 35 −0.4 0.0 0.4 0.8 0 5 10 15 20 25 30 35 −0.4 0.0 0.4 0.8 0 5 10 15 20 25 30 35 −0.4 −0.2 0.0 0 5 10 15 20 25 30 35 −0.4 −0.2 0.0 0 5 10 15 20 25 30 35 −0.4 −0.2 0.0 Figure 3: Compar ison of real transaction data f or Citigroup (left column) with simulated data from the market microstructure noise model with deter ministic rounding (middle column) and stochastic rounding (right col- umn). The plots show (from top to bottom): 5,000 transaction prices; the ﬁrst 250 transaction pr ices and the efﬁcient price process of the simulated data; the autocorrelations and partial autocorrelations of the returns of the transaction prices. Example 4 (transaction data only): W e now consider the situation where no order book data or mar ket maker quotes are av ailable . In this case we tr y to estimate the order book lev els from the data and use afterwards the A t j from Example 2. More precisely we estimate half the bid-ask spread at 9 time t j by ∆ t j = ( 0 . 5 | y t j − y t j − 1 | if y t j 6 = y t j − 1 , ∆ t j − 1 else (7) and set A t j = [ y t j − ∆ t j , y t j + ∆ t j ) (8) (f or stochastic rounding we delete the f actor 0 . 5 abov e). Sur pr isingly , this speciﬁcation does not belong to a deter ministic but to a stochastic mapping g t j (it is not difﬁcult to see that the same x t j lies in A t j constructed from diff erent y t j - hence g t j must be stochastic). The mapping g t j becomes deter ministic (conditionally on y t 1: j − 1 ) if one replaces ∆ t j by ∆ t j − 1 . How e ver ∆ t j − 1 obviously is much worse than ∆ t j as an estimate of the bid-ask spread at time t j - so from a practical point of vie w the abov e speciﬁcation is to be preferred. W e ﬁnally demonstrate in a simulation e xample that the model reproduces the auto- correlations and par tial autocorrelations of real log-retur ns. In addition we compare the deter ministic and the stochastic rounding from Example 1. In Figure 3 transaction data of Citigroup are compared with data simulated from our model with the two different rounding schemes from Example 1(i) and (ii), respectively . The ﬁgure shows the simulated efﬁcient and the obser ved pr ices. The efﬁcient log-prices w ere generated according to (1) such that the obser v ations hav e approximately the same volatility as the Citigroup data. The impor- tant point is that the mar ket microstr ucture noise model with deter ministic rounding auto- matically introduces autocorrelations and par tial autocorrelations of the log-retur ns which are similar to those of the real Citigroup data while the results with stochastic rounding are worse . Another indicator for the super iority of deter ministic vs. stochastic rounding are the results of Section 6.2 (see the paragraph “T ransaction time volatility estimation” and Figure 9). In addition, the model cov ers bid-ask bounces, time-v ar ying bid-ask spreads and price discreteness. Although there is some evidence to use deter ministic rounding, all methods of this paper can also be applied with stochastic models: The stochastic rounding discussed abov e can also be used in combination with order book data (Example 2), mar ket maker quotes (Ex- ample 3) and pure transaction data (Example 4). In par ticular the corresponding volatility estimator with stochastic rounding is included in Figure 9. Other types of stochastic round- ing such as Y t j = round  exp( X t j + U t j )  are discussed at the end of Section 3.1. 10 3 On-Line Estimation of Spot V olatility W e now present on-line algorithms for the estimation of the spot volatility . Because all results also hold in the multivariate case with synchronous trading times we formulate this section for multiv ariate security pr ices. We are aw are of the fact that the main challenge in the multiv ar iate case are non-synchronous trading times. The presented results are, how e ver , the basis for future w ork on non-synchronous trading. W e therefore consider in this section the estimation of the cov ar iance matr ix Σ t j which giv es the volatilities of the individual efﬁcient log-pr ice processes X t = ( X t, 1 , . . . , X t,S ) 0 as well as their cross-volatilities . Thus for S = 1 we hav e Σ t j = σ 2 t j . The multivariate version of the nonlinear state-space model (3) and (4) is giv en by p  y t j   y t 1: j − 1 , exp( x t j )  ∝ 1 A t j  exp( x t j )  1 Y ( y t j ) a.s. , (9) p  x t j   x t j − 1  = N  x t j   x t j − 1 ; Σ t j  , (10) where X t = ( X t, 1 , . . . , X t,S ) 0 , Z t j ∼ N ( 0 , Σ t j ) , and the exp -function is applied component- wise. The set A t j usually is of the form A t j = A t j , 1 × · · · × A t j ,S . We assume that model assumption 1 holds f or all components. F or simplicity we assume as an initial condition that giv en Y t 1 ,s = y t 1 ,s the efﬁcient prices exp( X t 1 ,s ) are unif or mly distributed on A t 1 ,s . W e remar k that (9) and (10) constitute a slightly gener alized state-space model because the obser vations Y t j are not conditional independent of Y t 1: j − 1 giv en X t j as in standard state-space models. Howe ver , this is a standard extension which does not cause any difﬁculty f or estimation. Our objective is the estimation of the cov ariance matr ix Σ t j based on the obser ved pr ices y t 1: j . Because of the nonlinear market microstructure noise this is difﬁcult. It is well kno wn that cr ude estimators that ignore the noise lead to sev erely biased estimates (see, f or in- stance, V oev and Lunde 2007). The idea of our estimation procedure is to appro ximate the conditional distribution of the efﬁcient log-prices X t j giv en all obser ved transaction pr ices y t 1: j up to time t j by an efﬁcient par ticle ﬁlter . Based on this approximation a localized EM-type algorithm is used to constr uct an estimator of Σ t j . W e mention that, for such a state-space model, par ticle ﬁlters hav e been used bef ore (Andrieu and Doucet 2002). An alter native to par ticle ﬁlters would be to use MCMC meth- ods together with an EM algor ithm (Manrique and Shepard 1998). A related model with rounding noise is discussed by Hasbrouck (1999, 2004) who also used MCMC methods f or estimation. In comparison to the e xisting literature we provide a more general model f or mi- crostructure noise and f ocus on localized estimation. Owens and Steigerw ald (2006) ha ve used a Kalman ﬁlter in a linear microstr ucture noise model to derive volatility estimates 11 based on weighted obser vations . Their method can be modiﬁed for on-line estimation of spot v olatility . 3.1 An Efﬁcient P ar ticle Filter P ar ticle ﬁlters are sequential Monte Car lo methods (Doucet et al. 2001) that appro ximate the posterior distributions p ( x t 1: j | y t 1: j ) with clouds of par ticles { x i t 1: j , ω i t j } N i =1 . A par ticle consists of a sample x i t 1: j and an associated weight ω i t j . The par ticle approximation of the target distribution is giv en by p ( x t 1: j | y t 1: j ) ≈ N X i =1 ω i t j δ x i t 1: j ( x t 1: j ) , with δ being the Dirac delta function. A par ticle ﬁlter generates par ticles sequentially in time making use of the relation p ( x t 1: j | y t 1: j ) = p ( y t j , x t 1: j | y t 1: j − 1 ) p ( y t j | y t 1: j − 1 ) = p ( y t j | y t 1: j − 1 , x t j ) p ( x t j | x t j − 1 ) p ( y t j | y t 1: j − 1 ) p ( x t 1: j − 1 | y t 1: j − 1 ) (11) and a general sampling technique kno wn as impor tance sampling. Impor tance sampling is necessar y because direct sampling from (11) is not f easible. In standard state-space mod- els p ( y t j | y t 1: j − 1 , x t j ) fur ther simpliﬁes to p ( y t j | x t j ) . As a result of the violated conditional independence proper ty mentioned ear lier , this is not the case here. In each iteration of the par ticle ﬁlter samples are drawn from an impor tance sampling distribution called proposal. Subsequently , the samples are weighted such that they ap- pro ximate the target distribution. The choice of the proposal is cr ucial f or the efﬁciency of the ﬁlter . In our framew ork it is possible to sample from the proposal p ( x t j | y t 1: j , x t j − 1 ) which is the optimal proposal in the sense that it minimiz es the v ar iance of the impor tance sampling weights (Doucet et al. 2000). The algor ithm can be stated as f ollows: Assume that weighted par ticles { x i t 1: j − 1 , ω i t j − 1 } N i =1 appro ximating p ( x t 1: j − 1 | y t 1: j − 1 ) are giv en; then • F or i = 1 , . . . , N : – Sample from the optimal proposal: x i t j ∼ p ( x t j | y t 1: j , x i t j − 1 ) . – Compute impor tance weights ˘ ω i t j ∝ ω i t j − 1 p ( y t j | y t 1: j − 1 , x i t j ) p ( x i t j | x i t j − 1 ) p ( x i t j | y t 1: j , x i t j − 1 ) = ω i t j − 1 p ( y t j | y t 1: j − 1 , x i t j − 1 ) . • F or i = 1 , . . . , N : – Nor malize impor tance weights ω i t j = ˘ ω i t j / ( P N k =1 ˘ ω k t j ) . • Obtain par ticles { x i t 1: j , ω i t j } N i =1 which appro ximate p ( x t 1: j | y t 1: j ) . 12 It is well-known that this algorithm suff ers from weight degeneracy which means that after some iterations only f ew par ticles will hav e signiﬁcant weight. This issue can be resolved by introducing a resampling step that maps the par ticle system { x i t 1: j , ω i t j } N i =1 onto an equally weighted par ticle system { x i t 1: j , 1 / N } N i =1 . Because resampling is time-consuming, it is carried out only if the effectiv e sample size ESS  { ω i t j } N i =1  = 1 P N i =1 ( ω i t j ) 2 is below some threshold (K ong et al. 1994). Other resampling schemes are discussed in Douc et al. (2005). T o apply this par ticle ﬁlter to the state-space model given by (9) and (10) it is neces- sar y to specify the optimal proposal and the computation of the impor tance weights. The f ollowing result sho ws that both take a v er y simple form. Proposition 1. The optimal proposal is a tr uncated multivariate nor mal distr ibution given by p ( x t j | y t 1: j , x t j − 1 ) ∝ N ( x t j | x t j − 1 ; Σ t j )   log A t j with log A t j = log A t j , 1 × · · · × log A t j ,S and the impor tance weights can be computed through ˘ ω i t j ∝ ω i t j − 1 Z log A t j N ( x t j | x i t j − 1 ; Σ t j ) d x t j . (12) Proof: The proof is straightforw ard. Remark (r ounding with additive noise): Alter native stochastic models with rounding are Y t j = round  exp( X t j ) + U t j  or Y t j = round  exp( X t j + U t j )  (13) with i.i.d. Gaussian U t j . Let A t j = [ y t j − 0 . 5 , y t j + 0 . 5) . For e xample for the second model we consider the corresponding state space model with state variable ˜ X t j =  X t j , U t j  0 . Then the optimal proposal satisﬁes p ( ˜ x t j | y t 1: j , ˜ x t j − 1 ) ∝ p ( y t j | y t 1: j − 1 , ˜ x t j ) p ( ˜ x t j | ˜ x t j − 1 ) = p ( x t j | x t j − 1 ) p ( u t j ) 1 A t j  exp( x t j + u t j )  which can be used easily to sample the par ticles in the ﬁlter step . The impor tance weights are giv en by p ( y t j | y t 1: j − 1 , ˜ x i t j − 1 ) = p ( y t j | y t 1: j − 1 , x i t j − 1 ) = Z Z 1 log A t j  x t j + u t j ) N ( x t j | x i t j − 1 ; σ 2 t j ) N ( u t j | 0; σ 2 U ) dx t j du t j which are how e ver more difﬁcult to compute (compare Section 5). 13 3.2 A Sequential EM-T ype Algorithm In this section, w e discuss the estimation of Σ t j in the time-constant and time-v ar ying case. A stochastic EM algor ithm can be used to obtain the maximum likelihood estimator in the time-constant case Σ t j = Σ (Dempster et al. 1977). The EM algor ithm maximizes the likelihood p Σ ( y t 1: T ) by iterativ ely carr ying out an E-step and an M-step. In the E-step, the e xpectation Q (Σ | ˆ Σ ( m ) ) = E ˆ Σ ( m )  log p Σ ( X t 1: T , y t 1: T ) | y t 1: T  = T X j =1 E ˆ Σ ( m )  log p ( y t j | y t 1: j − 1 , X t j ) | y t 1: T  + E ˆ Σ ( m )  log p ( X t 1 ) | y t 1: T  + T X j =2 E ˆ Σ ( m )  log p Σ ( X t j | X t j − 1 ) | y t 1: T  (14) needs to be appro ximated, where ˆ Σ ( m ) is the current estimator . Note, it is sufﬁcient to consider the sum in (14) because the random variables log p ( y t j | y t 1: j − 1 , X t j ) and p ( X t 1 ) do not depend on Σ . In the M-step , a new parameter estimate ˆ Σ ( m +1) is obtained by maximizing Q (Σ | ˆ Σ ( m ) ) . Below we show how to modify this procedure tow ards an on-line estimator . If Σ t j is time-var ying some regular ization is needed. F or e xample ˆ Σ ( m +1) t j can be ob- tained by maximizing some localiz ed version of (14), e.g. Q t j (Σ | ˆ Σ ( m ) t 1: T ) = 1 T j − 2 X k = j − T 1 b K  k bT  E ˆ Σ ( m ) t 1: T  log p Σ ( X t j − k | X t j − k − 1 ) | y t 1: T  (15) with a kernel K ( · ) and a bandwidth b . If the kernel is an one sided e xponential ker nel this can be written in recursive f or m as Q t j (Σ | ˆ Σ t 1: T ) := { 1 − λ j } Q t j − 1 (Σ | ˆ Σ t 1: T ) + λ j E ˆ Σ t 1: T  log p Σ ( X t j | X t j − 1 ) | y t 1: T  (16) with Q t 2 (Σ | ˆ Σ t 1: T ) = E ˆ Σ t 1: T  log p Σ ( X t 2 | X t 1 ) | y t 1: T  and λ j = 1 bT . This procedure is not an on-line algor ithm because the conditional e xpectation in (16) depends on all obser vations. Theref ore, we replace the conditioning set of variables { y t 1: T } by { y t 1: j } , i.e. E ˆ Σ t 1: T  log p Σ ( X t j | X t j − 1 ) | y t 1: T  is replaced by E ˆ Σ t 1: j − 1  log p Σ ( X t j | X t j − 1 ) | y t 1: j  (we need at this point an estimate for Σ t j to apply the par ticle ﬁlter - see the comment at the end of this section). This leads to the on-line algor ithm Q t j (Σ | ˆ Σ t 1: j − 1 ) := { 1 − λ j } Q t j − 1 (Σ | ˆ Σ t 1: j − 2 ) + λ j E ˆ Σ t 1: j − 1  log p Σ ( X t j | X t j − 1 ) | y t 1: j  (17) 14 with Q t 2 (Σ | ˆ Σ t 1 ) = E ˆ Σ t 1  log p Σ ( X t 2 | X t 1 ) | y t 1:2  . Q t j (Σ | ˆ Σ ( m ) t 1: j − 1 ) can be computed with the ﬁltering par ticles { x i t j − 1: j , ω i t j } N i =1 from the par ticle ﬁlter leading to the approximation E ˆ Σ t 1: j − 1  log p Σ ( X t j | X t j − 1 ) | y t 1: j  ≈ 1 2 N X i =1 ω i t j h S log 2 π + log | Σ | + tr n Σ − 1  x i t j − x i t j − 1  x i t j − x i t j − 1  0 oi . (18) The resulting estimate f or Σ can be obtained from the on-line recursion ˆ Σ t j = { 1 − λ j } ˆ Σ t j − 1 + λ j ˘ Σ t j ( ω t j ) with ˆ Σ t 2 = ˘ Σ t 2 ( ω t 2 ) (19) where ˘ Σ t j ( ω t j ) := N X i =1 ω i t j  x i t j − x i t j − 1  x i t j − x i t j − 1  0 . (20) It can be written in closed form as ˆ Σ t j = j − 3 X k =0 h k − 1 Y ` =0 (1 − λ j − ` ) i λ j − k ˘ Σ t j − k ( ω t j − k ) + h j − 3 Y ` =0 (1 − λ j − ` ) i ˘ Σ t 2 ( ω t 2 ) . (21) The new parameter estimate ˆ Σ t j is used afterwards to calculate the ne xt ﬁltering par ticles and their weights { x i t j +1 , ω i t j +1 } N i =1 f ollow ed by the calculation of ˆ Σ t j +1 via another applica- tion of (19) etc. In contrast to the standard EM algorithm, our sequential variant updates the co variance estimate (which in turn is used in the ne xt step of the par ticle ﬁlter) in e very time step. In the “new E-step”, Q t j (Σ | ˆ Σ t 1: j − 1 ) is appro ximated through (17) and (18) using the par ticles { x i t j − 1: j , ω i t j } N i =1 which are generated as described in Section 3.2. In the “new M-step”, the maximization of ˆ Q t j (Σ | ˆ Σ t 1: j − 1 ) giv es the on-line estimator deﬁned in (19). Note that ˘ Σ t j ( ω t j ) is not an appro ximation of the conditional variance V ar  X t j − X t j − 1   y t 1: j  but an approximation of E  ( X t j − X t j − 1 ) 2   y t 1: j   both are different because E ( X t j − X t j − 1 | y t 1: j ) 6 = 0  . As a consequence of E  E  ( X t j − X t j − 1 ) 2   Y t 1: j  = E  X t j − X t j − 1  2 = V ar  X t j − X t j − 1  , ˆ Σ t j is a descent estimator of Σ t j . 2) Time-constant cov ar iance matr ices: If Σ t j is time-constant the ﬁrst idea is to apply the algorithm (19) with the “constant parameter setting” λ j = 1 / ( j − 1) . This corresponds to the global av erage in (14) where all obser vations hav e equal weights. Howe ver , the situation is different from the classical case in that the “old” estimate ˆ Σ t j − 1 has in addition some bias due to the use of par ticles generated with an estimated cov ar iance instead of the true one. Theref ore we need to put less weight on the ﬁrst ter m in (19). The situation has been carefully inv estigated f or a similar algor ithm in the i.i.d.-case by Cappé 15 and Moulines (2009). Follo wing their recommendation we use in the time-constant case the on-line algorithm ˆ Σ t j = { 1 − ( j − 1) − γ } ˆ Σ t j − 1 + ( j − 1) − γ ˘ Σ t j ( ω t j ) (22) with γ ∈ ( 1 2 , 1) . Cappé and Moulines prov e consistency and asymptotic nor mality of their estimate f or weights λ j := λ 0 j − γ and γ ∈ ( 1 2 , 1) and also for γ = 1 under some restrictions on λ 0 (Theorem 2). Fur thermore, in their simulations it tur ned out that a value of γ = 0 . 6 and λ 0 = 1 has lead to good estimates. F rom our experience we pref er the choice γ = 0 . 9 and λ 0 = 1 (see Figure 6). Even-Dar and Mansour (2003) obtained an optimal value of about 0 . 85 in a related estimation problem. 3) Time-varying cov ariance matr ices: If Σ t j is time-varying we use the algor ithm (19) with time-constant λ j ≡ λ instead of a decaying λ j . The choice of λ depends on the smoothness of the tr ue volatility cur ve . T o adapt locally to this smoothness one may either choose a time varying λ j anyho w (in some wa y dependent on the data) or use the SAGES procedure (see Section 5 below) where the algorithm is r un simultaneously f or L different v alues of λ and the optimal estimate is deter mined in each step as a conv ex combination of these estimates. 3.3 Combining the P ar ticle Filter and the Sequential EM-T ype Algorithm T o summar ize the estimation method f or the transaction time volatility and the ﬁlter ing distribution of the efﬁcient price consists of 3 components: (i) The state-space model with a new mar ket microstructure noise model and the trans- action time model f or the efﬁcient log-price ((9) and (10)); (ii) A par ticle ﬁlter which sequentially approximates the ﬁlter ing distributions of the efﬁ- cient log-prices given the observed transaction prices (Section 3.1); (iii) The on-line EM-type estimator ˆ Σ t j giv en by (19) or (22) which estimates Σ t j based on the par ticle approximation of the ﬁltering distribution (Section 3.2). A ke y aspect of the method is the back and f or th between the par ticle ﬁlter and the EM- type estimator . T o propagate the par ticles from time t j to time t j +1 the par ticle ﬁlter requires an estimator of Σ t j +1 denoted by ˆ Σ pf t j +1 . A simple solution is to use ˆ Σ pf t j +1 := ˆ Σ t j from the pre vious EM-type step . The EM-type estimator then in tur n updates the covariance estimate based on the new particles for time t j +1 generated b y the par ticle ﬁlter . Estimation results of our estimator ˆ Σ t j and a benchmark estimator (see Section 6.1) are presented in Figure 4. Details and a discussion are given in Section 6.1. 16 0 5000 10000 15000 0.00007 0.00010 0.00013 0 5000 10000 15000 0.00010 0.00020 0.00030 Figure 4: Estimation of two time-v ar ying volatility cur ves given by the dashed lines based on simulated data. The plots show the estimator ˆ Σ t j (blac k line) and a benchmar k estimator (light gra y line). The second plot also shows an adaptiv e version of the ﬁrst estimator based on SA GES ˆ Σ S t j (red (dar k gra y) line) described below . F or details see Section 6.1. In the ﬁrst plot this estimator is omitted since it did not lead to additional improv ements. 4 A Decomposition of Cloc k Time V olatility (fr om transaction time to cloc k time) The Basic Relationship W e deﬁne the spot v olatility in clock time b y Σ c ( t ) := lim ∆ t → 0 V ar  X ( t + ∆ t ) − X ( t )  ∆ t . F or e xample in the model d X ( t ) = ˜ Γ( t ) d W t with a Brownian motion W t (multiv ar iate with independent components) we hav e V ar  X ( t + ∆ t ) − X ( t )  = R t +∆ t t ˜ Γ( s ) ˜ Γ( s ) 0 ds and there- f ore Σ c ( t ) = ˜ Γ( t ) ˜ Γ( t ) 0 . At the end of this section we indicate how estimation can be per- f or med directly in this model with a par ticle ﬁlter . In this paper we merely advocate the model X t j = X t j − 1 + Γ( t j ) Z j with Z j iid ∼ N (0 , I ) (see (10)) which can be wr itten in the form d X ( t ) = Γ( t ) d W N ( t ) with N ( t ) = P j I [ t j , ∞ ) ( t ) . If we now assume that the obser vation times t j are realizations of a stochastic point process 17 with intensity function λ I ( t ) (transaction rate) and N ( · ) is independent of W ( · ) (i.e. the dependence only enters via Γ( t j ) ) then V ar  X ( t + ∆ t ) − X ( t )  = Z t +∆ t t Σ( s ) λ I ( s ) ds with Σ( t ) := Γ( t )Γ( t ) 0 and theref ore Σ c ( t ) = Σ( t ) λ I ( t ) . (23) Heuristically this reads as “v ariance per time unit = v ariance per transaction × expected number of transactions per time unit”. This is a decomposition of continuous time volatility which provides a deeper understanding of volatility . An e xample is giv en below . W e now use this relation f or the estimation of Σ c ( t ) . W e mention that both cur ves can be identiﬁed if in addition to the X t j also the random times t j are obser ved (which is fulﬁlled in our setting). A proof of consistency of estimates of λ I ( t ) (e.g. of ˆ λ I ( t j ) from below) would require an in-ﬁll asymptotic setting where N ( t ) has the intensity function λ I  t T  . Models with random time changes are common in ﬁnance (cf. Clark 1973; Ané and Geman 2000; Plerou et al. 2001; Howison and Lamper 2001; Gabaix et al. 2003). The process of random times (here the t j ) is often called directing process and the process X ( t ) is called subordinated to the directing process. Another example is where N ( t ) is replaced by the accum ulated traded v olume. A New Estimator f or the Clock Time Spot V olatility W e now use the relation (23) f or the estimation of Σ c ( t ) . An obvious estimate of the intensity would be ˆ ˆ λ I ( t j ) := |{ ` : t j − ∆ t < t ` ≤ t j }| / ∆ t with some ∆ t . Here we advocate a diff erent estimation method of the intensity function λ I ( t ) which is closer related to our on-line scheme, namely the estimation of λ I ( t ) by the inv erse of the av eraged duration times , leading to the alter native estimate ˆ Σ c alt ( t j ) := ˆ Σ t j ˆ λ I ( t j ) with ˆ λ I ( t j ) := 1 ¯ δ j c bc (24) and ˆ Σ t j as in (19) where ¯ δ j is deﬁned by the recursion ¯ δ j = (1 − λ j ) ¯ δ j − 1 + λ j  t j − t j − 1  with ¯ δ 2 = t 2 − t 1 . (25) (the notation ˆ Σ c alt means “alter native” estimate in compar ison to the more classical estimate deﬁned below). c bc is a bias correction due to the fact that E 1 X 6 = 1 E X . A second order T aylor- e xpansion leads to E 1 X ≈ 1 E X  1 + v ar( X ) ( E X ) 2  and we theref ore use the abov e estimate with c bc = (1 + ˆ d ) − 1 where ˆ d is an estimate of v ar( ¯ δ ) ( E ¯ δ ) 2 . 18 W e mention that the intensity λ I ( t ) of the point process often changes considerab ly over time thus requir ing a large value of λ while Σ( t ) usually is more smooth. For that reason we use diff erent step sizes λ f or the estimators ˆ Σ t j and ¯ δ j (cf . Section 6.2). Estimation of the Clock Time Spot V olatility without Time Chang e W e now deﬁne the estimator of the clock time volatility in the classical model d X ( t ) = ˜ Γ( t ) d W t with the microstr ucture noise model from abov e. If we replace X t j by X ( t j ) we obtain almost the same state space model as in (9) and (10) but with a modiﬁed variance of the transition distribution which is no w giv en by p  x t j   x t j − 1  = N  x t j   x t j − 1 ; | t j − t j − 1 | Σ c ( t j )  . (26) This is the only change needed in the state-space model (9), (10). As an estimate ˆ Σ c t j we can use the on-line estimates (22) and (19) but no w with the update matrix ˘ Σ t j ( ω t j ) replaced by ˘ Σ c t j ( ω c t j ) := N X i =1 ω ci t j  x ci t j − x ci t j − 1  x ci t j − x ci t j − 1  0 | t j − t j − 1 | (27) based on the modiﬁed ﬁltering par ticles { x ci t j − 1: j , ω ci t j } N i =1 . W e conclude this section with a heur istics on the relation between the two estimates: Suppose the same stepsize λ were used f or the calculation of ˆ Σ t j and ¯ δ j . We then had with (21) ˆ Σ c alt ( t j ) = ˆ Σ t j ¯ δ j c bc = λ P j − 3 k =0 (1 − λ ) k ˘ Σ t j − k ( ω t j − k ) + (1 − λ ) j − 2 ˘ Σ t 2 ( ω t 2 ) λ P j − 3 k =0 (1 − λ ) k  t j − k − t j − k − 1  + (1 − λ ) j − 2  t 2 − t 1  c bc . Since ˘ Σ t ` ( ω t ` ) ≈  t ` − t ` − 1  ˘ Σ c t ` ( ω c t ` ) the estimator is of the f or m ˆ Σ c alt ( t j ) ≈ P j − 2 k =0 w k ˘ Σ c t j − k ( ω c t j − k ) P j − 2 k =0 w k c bc , that is ˆ Σ c alt ( t j ) is a weighted av erage of the ˘ Σ c t ` ( ω c t ` ) and therefore a similar estimator as in the clock time model. The “ ≈ ” signs stem from the fact that in ˘ Σ t ` ( ω t ` ) and ˘ Σ c t ` ( ω c t ` ) diff erent par ticle ﬁlters for diff erent models are used. This eff ect usually cannot be neglected. 19 −25 −20 −15 −1 0 10:00 11:00 12:00 13:00 14:00 15:00 16:00 −20 −18 10:00 11:00 12:00 13:00 14:00 15:00 16:00 −20 −18 10:00 11:00 12:00 13:00 14:00 15:00 16:00 −2 −1 0 1 10:00 11:00 12:00 13:00 14:00 15:00 16:00 Figure 5: Real data example: Estimation of time-v ar ying spot volatility in clock time based on the transactions of symbol C f or the 3rd September 2007. The ﬁrst and second plot give the log v olatility estimators ˆ Σ cS t j (blac k line) and ˆ Σ cS alt ( t j ) (green (gray) line) where the second plot uses the same scaling as the follo wing plots; the green (gra y) estimator is the sum of the log volatility estimator in transaction time ˆ Σ S t j (third plot) and the log trading intensity ˆ λ I ( t j ) (last plot). The superscr ipt ‘S’ denotes the SAGES - v ersion - see Section 5. Decomposing Clock Time V olatility Figure 5 shows an e xample based on real data which is discussed in detail in Sec- tion 6.2. W e hav e used a log plot in order to demonstrate the inﬂuence of the two cur v es in the decomposition (24) which now becomes log ˆ Σ cS alt ( t j ) = log ˆ Σ S t j + log ˆ λ I ( t j ) where the superscr ipt ‘S’ denotes the adaptive SA GES - version of the estimators de- scribed below . The ﬁrst and second plot give the log of the volatility estimators ˆ Σ cS t j (blac k line) and ˆ Σ cS alt ( t j ) (green (gra y) line) where the second plot uses the same scaling as the 20 f ollowing plots; the green (gr a y) estimator is the sum of the log volatility estimator in trans- action time ˆ Σ S t j (third plot) and the log trading intensity ˆ λ I ( t j ) (last plot). The decomposition of clock time volatility into transaction time volatility and trading in- tensity (i.e. the additiv e decomposition of the green (gra y) cur ve of Figure 5 into the two low er plots) rev eals that the typical ﬂuctuation of clock time volatility is mainly due to the ﬂuctuation of the trading intensity while the transaction time v olatility in this example is al- most constant after 11:00. The typical U-shape of clock time v olatility is visible - but it is more a patter n of the trading intensity and less of the transaction time volatility . More pre- cisely , the decrease of volatility between 9:30 and 12:30 is a feature of both cur ves while the increase of v olatility between 12:30 and 16:00 is only a f eature of trading intensity . It is wor th mentioning that the b lack and the g reen (g ra y) estimators in Figure 5 coincide in magnitude (which w as not clear bef orehand since diff erent models and diff erent par ticle ﬁlters are used). Figure 10 below compares the estimates f or a small time period. 5 Modiﬁcations, Adaptation, and Implementation Using returns of lag k There e xist some objections against the use of ultra-high frequency data at the ﬁnest le vel av ailable . Here we show how the method can be used on a coarser scale, together with a f ew comments on the situation. It is common to use retur ns of lag k instead of lag 1 , the main reason being that mi- crostructure noise is smaller in the av eraged data. Our effor ts in this paper were to con- struct a better microstr ucture noise model and to remov e the microstr ucture noise b y a par ticle ﬁlter pr ior to the calculation of the volatility estimate. This allows us to inv estigate ultra-high frequency data at a ﬁner lev el. An application is the pricing of high frequency options which can be traded until a f ew seconds to maturity . On the other hand it is likely that our microstr ucture noise model still is not perfectly speciﬁed. For that reason one ma y still want to use returns of lag k instead of lag 1 . In our setting this can be accomplished by using ˘ Σ t j ( ω t j ) := 1 k N X i =1 ω i t j  x i t j − x i t j − k  x i t j − x i t j − k  0 , ˘ Σ c t j ( ω c t j ) := N X i =1 ω ci t j  x ci t j − x ci t j − k  x ci t j − x ci t j − k  0 | t j − t j − k | in (20) and (27) (with the recursions (22) and (19) as before). We think that this is in par ticular impor tant for the continuous time estimator ˘ Σ c t j ( ω c t j ) which explodes for v er y small values of | t j − t j − k | - this happens more often for k = 1 . As a consequence of the 21 larger lag also a larger stepsize is necessar y - also in the SAGES procedure introduced below . Note, that the estimate is still updated with each new obser vation. Fur thermore, the con- ditional distribution of the state is calculated with a new obser vation. When implementing the abov e estimate, special care is needed if a resampling step is carried out between t j and t j − k . In a correctly speciﬁed model (where in par ticular microstr ucture noise is speciﬁed cor- rectly) the variance and the mean squared error get smallest for lag 1. On the other hand the bias due to a misspeciﬁed microstructure noise model gets smaller with larger lag. In principle one ma y test the quality of the microstructure noise model by compar ing the le v el of the estimates f or diff erent lags. Howe ver , this topic is bey ond the scope of this paper . Impro ving estimates in the presence of diurnal patterns Diur nal patter ns like the strong decrease of the volatility at the beginning of the da y in Figure 9 create problems in that an unadjusted look-back local estimator ov erestimates the target. Similar ly , when the volatility is rising, a look-back local estimator underestimates the target. The SAGES procedure below reduces the eff ect but the problem in pr inciple sta ys the same. In the present setting the situation is e ven more critical since those poor v olatility estimates are used afterwards in the par ticle ﬁlter . A common advice is to use a batch of days to estimate the mean diur nal v olatility patter n ov er small bloc ks of time, scale out the patter n yielding diur nally-adjusted data, estimate the local object of interest on the adjusted data, and then rescale back to account for the diur nal patter n. Such a modiﬁcation can also be applied with the procedure of this paper . The ke y difference to other situations is that our volati lity estimator consists of the prod- uct of two cur ves corresponding to the decomposition (23). Both cur ves can be identiﬁed and both cur ves can be adjusted for diur nal patter ns. As an e xample we argue in Sec- tion 6.2 that (at least f or the data set analyzed there) the well known U-shape eff ect at lunchtime is a diur nal patter n merely of the trading intensity λ I ( t ) and not of the trading time v olatility Σ( t ) . When rescaling the estimate of Σ( t ) special care is needed in order not to affect the microstructure noise model: Suppose the mean diurnal v olatility pattern of the s-th compo- nent is σ 2 0 s ( t ) . Let V 0 ( t ) := diag { σ 01 ( t ) , . . . , σ 0 S ( t ) } . Instead of rescaling the obser v ations we use the rescaled (unobser ved) state-variab le ˜ X t j := V 0 ( t j ) − 1 X t j . Provided that the diff erence between σ 0 s ( t j ) and σ 0 s ( t j − 1 ) is negligible we then can use instead of (9) and 22 (10) the modiﬁed state space model p  y t j   y t 1: j − 1 , exp( ˜ x t j )  ∝ 1 A t j  exp( V 0 ( t j ) ˜ x t j )  1 Y ( y t j ) (28) = 1 V 0 ( t j ) − 1 log A t j  ˜ x t j  1 Y ( y t j ) a.s. , (29) p  ˜ x t j   ˜ x t j − 1  = N  ˜ x t j   ˜ x t j − 1 ; V 0 ( t j ) − 1 Σ t j V 0 ( t j ) − 1  (30) f or estimation. This means we can r un the whole procedure in exactly the same wa y where log A t j in the proposal distribution and in the impor tance weights (Proposition 1) is replaced by V 0 ( t j ) − 1 log A t j . The resulting volatility estimator ˆ Σ t j then is an estimator of V 0 ( t j ) − 1 Σ t j V 0 ( t j ) − 1 , i.e. we ﬁnally use V 0 ( t j ) ˆ Σ t j V 0 ( t j ) as an estimator of Σ t j . Rescaling the estimator of λ I ( t ) in (23) is much simpler : Suppose the mean diur nal intensity patter n is λ 0 ( t ) . A natural recursive estimator then is ˆ λ I ( t j ) := λ 0 ( t j ) 1 ¯ δ j c bc where ¯ δ j is deﬁned by the recursion ¯ δ j = (1 − λ j ) ¯ δ j − 1 + λ j  t j − t j − 1  λ 0 ( t j ) and c bc is the corresponding bias correction. Step size selection In the time-constant case w e use the decreasing step size λ j = ( j − 1) − 0 . 9 as proposed in Section 3.2. This choice is empir ically justiﬁed (see Figure 6). The step size in the time-v ar ying case is data dependent and can be obtained through the f ollowing procedure: The mean squared error of ˆ Σ t j is minimized with respect to λ by the cross-v alidation type criter ion crit( λ ) := T − 1 X j =2  ˆ Σ t j − ˘ Σ t j +1 ( ω t j +1 )  2 . (31) This cannot be done on-line. In practice, one will use in an on-line setting a λ from past e xperience with similar data sets. The expectation of the abo ve criterion is approximately T − 1 X j =2 h  E ˆ Σ t j − Σ t j  2 + V ar  ˆ Σ t j  + V ar  ˘ Σ t j +1 ( ω t j +1 )  i . Because the last term does not depend on λ we correctly minimize the approximate mean squared error . 23 Adaptive step size selection using SA GES T o adaptiv ely select non-constant step sizes λ j in the time-v ar ying case we propose to use spatially aggregated exponential smoothing (SAGES) dev eloped by Chen and Spokoin y (2009). In our setting the SA GES method wor ks as follo ws. The basic idea is to run L volatility estimators ˆ Σ ` t j in parallel with diff erent step sizes λ 1 > λ 2 > . . . > λ L . The resulting SAGES estimate ˆ Σ S t j is then a conv ex combination of these estimators. In practice we hav e, say , L = 15 which implies that the computational offset is minimal. In fact, only the recursion (19) needs to be computed L times with diff erent step sizes . F or ev er y time step j the SAGES estimate ˆ Σ S t j is obtained from the estimators ˆ Σ ` t j , ` = 1 , . . . , L , through the f ollowing recursion. (i) Set ˆ Σ S, 1 t j = ˆ Σ 1 t j (ii) F or ` = 2 , . . . , L : Compute ˆ Σ S,` t j = γ ` ˆ Σ ` t j + 1 − γ ` ˆ Σ S,` − 1 t j ! − 1 , where γ ` = K  1 κ ` − 1 λ ` K ( ˆ Σ ` t j , ˆ Σ S,` − 1 t j )  with kernels K ( u ) = { 1 − ( u − 1 / 6) + } + and K (Σ , ˜ Σ) = − 0 . 5  log(Σ / ˜ Σ) + 1 − Σ / ˜ Σ  . (iii) Obtain the SA GES estimate ˆ Σ S t j = ˆ Σ S,L t j . Note that this method can be applied completely on-line. The parameters κ 1 , κ 2 , . . . , κ ` − 1 are critical v alues (independent of the time step j ) which can be calculated beforehand through a Monte Carlo simulation. Note that SAGES is a univariate method. For a more de- tailed descr iption and a theoretical analysis of the SAGES method see Chen and Spokoiny (2009). Implementing the algorithm The par ticle ﬁlter uses the follo wing steps f or j = 2 , . . . , T (see Proposition 1) • F or i = 1 , . . . , N : – Generate x i t j from the optimal proposal N ( x t j | x i t j − 1 ; ˆ Σ pf t j )   log A t j with ˆ Σ pf t j = ˆ Σ t j − 1 . – Compute the impor tance weight ˘ ω i t j as in (12). If S = 1 this is given b y ˘ ω i t j ∝ ω i t j − 1 n Φ  sup log A t j | x i t j − 1 ; ˆ Σ pf t j  − Φ  inf log A t j | x i t j − 1 ; ˆ Σ pf t j  o . 24 • F or i = 1 , . . . , N : Normalize the impor tance weight ω i t j = ˘ ω i t j / P N k =1 ˘ ω k t j . • If the eff ective sample size ESS ( { ω i t j } N i =1 ) < cN (with sa y c = 0 . 2 ), then resample the par ticles using, for instance , the residual resampling scheme (Douc et al. 2005). • Update the estimator ˆ Σ t j − 1 according to (22) or (19). Ov erall the algor ithm is easy to implement in a fe w lines. It is computationally efﬁcient because the complexity of one iteration is linear in the number of par ticles N . In addition resampling is required only rarely because the optimal proposal is used. In our applications resampling was carried out only about e very 15th iteration using a threshold f or the effectiv e sample size of c = 0 . 2 . As a result of the efﬁciency of our par ticle ﬁlter , the number of par ticles N is not a cr itical quantity . T ypically , about 500 par ticles sufﬁce to achiev e a sufﬁcient precision (see Figure 6). Note that in the multivariate case the sampling from the optimal proposal and the e val- uation of the impor tance weights is nontrivial. Howe ver , both the sampling from and the e v aluation of a tr uncated nor mal distribution are standard problems in statistics which hav e been discussed e xtensively in the literature . Relev ant references for the sampling problem are Gewek e (1991) and Robert (1995). More recent approaches based on Gibbs sampling are described by K otecha and Djuric (1999) and Rodriguez-Y am et al. (2004). Also f or the numerical appro ximation of multivariate (rectangular) nor mal probabilities se ver al efﬁcient methods hav e been proposed for instance b y Genz (1992, 2004) and Joe (1995). Initialization Our e xperience from many data sets is that the algorithm stabilizes quickly provided that reasonable star ting values are used – e.g. ˆ Σ t 2 ma y be chosen as yesterda y’ s star ting v olatility or yesterda y’ s ending volatility , after adjustment f or the magnitude of the o vernight close-to-open jump . The par ticle ﬁlter is star ted by simulating the x i t 1 ,s such that the exp( x i t 1 ,s ) are unif or mly distributed on A t 1 ,s . In order to e xclude the eff ect of star ting values we hav e used in the simulations (except from Figure 6) the tr ue matr ix Σ t 2 as the star ting v alue (i.e. ˆ Σ pf t 2 = ˆ Σ t 2 = Σ t 2 ). 6 Sim ulations and Applications 6.1 Results for Sim ulated Data Estimation of time-constant spot v olatility W e ﬁrst consider the estimation of time-constant spot volatility . An efﬁcient log-pr ice pro- cess is simulated from t 1 to t 5000 with squared volatility equal to Σ t = 0 . 0001 2 . The initial 25 N=500 gam=0.7 N=500 gam=0.8 N=500 gam=0.9 N=500 gam=1.0 0.000090 0.000100 N=50 gam=0.9 N=100 gam=0.9 N=200 gam=0.9 N=500 gam=0.9 0.000096 0.000102 N=500 gam=0.9 Benchmark Optimal gam=0.9 Optimal gam=1.0 0.000094 0.000100 0.000106 Figure 6: Box plots for the estimation of a time-constant volatility based on simulated data (5,000 transac- tions). The estimator (22) is applied with diff erent numbers of par ticles N and different γ and compared to the benchmark estimator and the optimal estimator (not availab le in practice). The box plots are based on 500 independent runs. 26 efﬁcient pr ice exp( X t 1 ) is sampled from a uniform distr ibution on [50 − 0 . 005 , 50 + 0 . 005) . The transaction prices are obtained by rounding the efﬁcient prices to the nearest cent (see Example 1 (i) in Section 2). The algor ithm f or time-constant spot v olatility estimation (22) is applied with diff erent numbers of par ticles N and diff erent values of γ . The initial v alue ˆ Σ pf t 2 = ˆ Σ t 2 is dra wn from a uniform distribution on (0 . 00006 2 , 0 . 00014 2 ) which is quite uninf or mative . F or comparison the results of two benchmark algorithms are also repor ted. The ﬁrst benchmark method (“Benchmark” in Figure 6) is a recursiv e estimator with a sim- pler microstr ucture noise correction. It is related to the method in Zumbach et al. (2002) and it is based on the market microstr ucture model log Y t j = X t j + U t j , where the noise v ariables U t j are i.i.d. with V ar U t j = η 2 . The recursive estimator is giv en by ˆ Σ B t j :=  1 − 1 j − 1  ˆ Σ B t j − 1 + max { 0 , 2 ˆ η 2 t j − 1 }  + 1 j − 1 (log y t j − log y t j − 1 ) 2 − max { 0 , 2 ˆ η 2 t j } (32) where ˆ η 2 t j := { 1 − 1 j − 2 } ˆ η 2 t j − 1 − 1 j − 2  log y t j − log y t j − 1  log y t j − 1 − log y t j − 2  (here 1 j − 2 is used instead of 1 j − 1 because the algorithm star ts one time point later). The ter m max { 0 , 2 ˆ η 2 t j } corrects f or the market microstructure noise. This follo ws from the fact that Cov  log Y t j − log Y t j − 1 , log Y t j − 1 − log Y t j − 2  = − η 2 . The second benchmark method is, in some sense, the optimal estimator (“Optimal” in Figure 6). It is unav ailable in practice because it uses the latent efﬁcient log-prices. It is computed analogous to (22) but instead of the par ticles it emplo ys the efﬁcient log-pr ices leading to ˆ Σ Opt t j = { 1 − ( j − 1) − γ } ˆ Σ Opt t j − 1 + ( j − 1) − γ ( x t j − x t j − 1 ) 2 . The simulation results are given in ter ms of box plots which are obtained by 500 indepen- dent r uns (Figure 6). The box plots suggest that our volatility estimator is asymptotically unbiased and that γ = 0 . 9 is a reasonable value . We can also conclude that about 500 par ticles are sufﬁcient which makes our algorithm computationally efﬁcient and suitab le f or real-time applications. In addition, it can be obser ved that the benchmar k estimator has a larger v ariance than our estimator . Estimation of time-varying spot v olatility W e now compare our estimator f or time-varying spot volatility ˆ Σ t j deﬁned in (19) with a benchmark estimator . The efﬁcient log-pr ices are generated with respect to the time- v ar ying volatility given by the gra y dashed lines in Figure 4. The ﬁrst case (upper plot) is more challenging while the second case (lo wer plot) is more realistic f or a volatility cur ve in transaction time - see the real data example in Figure 9. In both cases we use for the 27 initial pr ice exp( X t 1 ) ∼ U [50 − 0 . 005 , 50 + 0 . 005) . Again transaction pr ices (obser vations) are obtained by rounding the efﬁcient prices to the nearest cent. 15,000 transactions are generated which is typical for one trading da y of a liquid stock. The par ticle ﬁlter is ap- plied with N = 500 par ticles. Our estimator ˆ Σ t j uses the constant step size λ obtained by minimizing (31). Analogous to (32) we consider the benchmark estimator given b y ˆ Σ B t j := { 1 − λ }  ˆ Σ B t j − 1 + max { 0 , 2 ˆ η 2 t j − 1 }  + λ  log y t j − log y t j − 1  2 − max { 0 , 2 ˆ η 2 t j } (33) with ˆ η 2 t j := { 1 − 1 j − 2 } ˆ η 2 t j − 1 − 1 j − 2  log y t j − log y t j − 1  log y t j − 1 − log y t j − 2  . λ is obtained by minimizing the criter ion T − 1 X j =2  ˆ Σ B t j + max { 0 , 2 ˆ η 2 t j } − (log y t j +2 − log y t j +1 ) 2  2 (34)  the ter ms ˆ Σ B t j + max { 0 , 2 ˆ η 2 t j } and (log y t j +2 − log y t j +1 ) 2 are independent in the additive microstructure noise model log Y t j = X t j + U t j with U t j i.i.d. - thus by using (log y t j +2 − log y t j +1 ) 2 (34) becomes a decent estimate of the mean squared error (plus a ter m con- stant in λ )  . F or ˆ η 2 t j we use the step sizes 1 j − 2 because η 2 t should be close to a constant function. All estimators use the tr ue v olatility as star ting value. T ypical outcomes of the estimators are given in Figure 4. Note that volatility is plotted (instead of squared volatility). In the second case (lower plot) a constant step size is clearly suboptimal. Theref ore we also computed our estimat or combined with the SAGES method f or adaptive step size selection ˆ Σ S t j as described in Section 5. ˆ Σ S t j is calculated using L = 15 step sizes ranging from 0.05 to 0.00005 (equally spaced). In the ﬁrst case (upper plot) the estimator ˆ Σ S t j didn’t giv e better results than the estimator ˆ Σ t j and is theref ore omitted. We also tr ied to use the SA GES method f or the benchmark estimator . This gav e sur prisingly bad results which are not repor ted here. Because the true Σ( t j ) is known we can compute the mean squared error Σ T − 1 j =2  ˆ Σ( t j ) − Σ( t j )  2 f or the estimators which gives 1 . 21 × 10 − 18 and 1 . 34 × 10 − 18 f or ˆ Σ t j and ˆ Σ B t j , re- spectiv ely , f or the upper plot in Figure 4. F or the estimators ˆ Σ t j , ˆ Σ S t j , and ˆ Σ B t j in the lower plot we obtain 8 . 59 × 10 − 19 , 7 . 52 × 10 − 19 , and 2 . 55 × 10 − 18 . In both plots, our estimators signiﬁcantly outperf or ms the benchmark estimator . The general impression from Figure 4 is that the estimates are a bit undersmoothed. As f or nonparametric path-wise estimation of local volatility the noise is to be expected if the method has a low bias, and additional smoothing will reduce variance at the expense of increased bias. We mention that additional variability comes in from the par ticle ﬁlter where the estimated cov ar iance matrix is used instead of the tr ue one. 28 0 5000 10000 15000 0.00010 0.00020 0.00030 Figure 7: Estimation of the second volatility curve from Figure 4 in case of a pr ice-jump. The plot shows the estimator ˆ Σ t j (blac k line), the adaptive version with SAGES ˆ Σ S t j (red (dark gra y) line), and the benchmark estimator (light gra y line). 4996 4998 5000 5002 5004 5006 50.28 50.32 50.36 50.40 Figure 8: Estimated ﬁltering distribu- tions about the pr ice-jump f or the mar- ket microstructure noise model with de- terministic rounding. Inﬂuence of jumps In par ticular for coarser sampling inter vals there is strong evidence that stock pr ice lev- els exhibit jumps - e.g. so-called rare compound P oisson jumps. T odorov and T auchen (2011) analyze the high-frequency movements in stock market volatility using data of the VIX volatility index sampled at a 5 minute rate and ev en conclude that volatility should be modeled b y a pure jump process with jumps of inﬁnite v ariation. Foster and Nelson (1996) ackno wledge the problems that jumps ma y cause f or local volatility estimation. The model w e hav e given in this paper is a model at a ﬁner time-scale based on volatility in transaction time and trading intensity . The volatility at a larger inter val of say 4 t = 5 minutes would be given by R t +∆ t t Σ( s ) λ I ( s ) ds and it is an interesting question whether par t of the jumps on a larger scale can solely be explained by an increase of the trading intensity on that inter val. Nev er theless jumps may also occur in Σ( · ) and λ I ( · ) - although their occurrence seems to be less frequent. A formal study on the str ucture of the jumps must be def erred to future work. T o inv estigate the inﬂuence of jumps we hav e taken in Figure 7 and 8 the simulated data from the second plot of Figure 4 and added a jump of 8 cents at time 5,000 (i.e. the retur ns show one “outlier” at time 5,000). The plot shows that the volatility estimate and the par ticle ﬁlter quickly recov er after the jump . Fur ther more we can see that the SAGES estimator recov ers a bit better which is due to the adaptive stepsiz e selection. 29 6.2 Results for Real Data The data and data speciﬁc modiﬁcations T o demonstrate the method we hav e used stock data from the T AQ data base. T r ansactions and mar ket maker quotes of the symbol C (Citigroup) for the 3rd September 2007 were e xtracted from this data base . Pr ior to our analysis we hav e carried out the follo wing obvious data cleaning steps which could also be done on-line . Cleaning A: Delete all transactions (quotes) with time stamps outside the main trading period (9:30 AM to 4 PM). Cleaning B: Delete all transactions (quotes) that are not originating from the NYSE. Cleaning C: Delete all transactions with abnor mal sale condition or corrected pr ices (see the T A Q User’ s Guide for details). Since the time stamp precision of these data is limited to one second, sev eral time stamps occur with multiple transactions. Since each of these transactions constitute a single step in the (transaction time) state equation (4) one should nor mally use all these transactions separately (e.g. with an equidistant ex post splitting of the trading times). Howe ver , a closer inspection of these multiple transactions rev ealed that the (time) order ing of these trans- actions was not preser v ed and we theref ore decided to treat this problem like a missing data problem. This means at a time stamp with M transactions and equal trading times t j − M +1 = · · · = t j we used for the transaction time estimator ˆ Σ t j instead of the recursion (22) with ˘ Σ t j ( ω t j ) the corresponding M -step recursion with ˘ Σ M t j ( ω t j ) := 1 M N X i =1 ω i t j  x i t j − x i t j − M  x i t j − x i t j − M  0 and f or the classical clock time estimator ˆ Σ c t j the recursion with ˘ Σ cM t j ( ω c t j ) := N X i =1 ω ci t j  x ci t j − x ci t j − M  x ci t j − x ci t j − M  0 | t j − t j − M | . F or the durations the situation is diff erent since with the number of trades the inf or mation about the trading intensity λ I ( t ) is (almost) fully av ailable . We theref ore apply f or k = j − M + 1 , . . . , j the update ¯ δ k =  1 − λ  ¯ δ k − 1 + λ t j − t j − M M (i.e. M -times the same update - alter natively we ma y also use one update with λ replaced by M λ which for small M λ is almost the same). 30 Estimation of the ﬁltering distributions with real market maker quotes In order to show how our method works in the case when mar ket maker quotes are av ail- able (Example 3 in Section 2) we matched by hand (through an adjustment of the time stamps) the quotes and transactions of symbol C f or a fraction of the trading da y . The par- ticle ﬁlter is used with N = 5 , 000 par ticles and A t j as in (6) where ∆ t j := 0 . 5 ( β t j − α t j ) to estimate the ﬁltering distributions of the unkno wn efﬁcient (log-) pr ices. Figure 1 giv es ker- nel density estimates based on these par ticle appro ximations. The market maker quotes, the transaction prices, and suppor ts of the ﬁlter ing distr ibutions are also shown. From the ﬁgure it can be seen that some ﬁlter ing distr ibutions are highly skew ed. In addition, consecutiv e zero returns lead to v er y uninformative ﬁltering distributions (see transactions 2,300 through 2,309). T ransaction time v olatility estimation W e apply the estimators ˆ Σ t j and ˆ Σ S t j with N = 500 par ticles and the benchmark method ˆ Σ B t j from (33) to estimate the spot volatility f or C. An initial volatility of 0 . 0005 is used. Here we hav e estimated the market maker quotes from the trades, that is we hav e used A t j as giv en in (8) ﬁrst with deter ministic rounding and then with stochastic rounding - see (7). The transaction data of C and the v olatility estimators are shown in Figure 9. At the beginning of the trading da y the volatility is large and highly varying. Later , the volatility settles down and seems to be almost constant. Therefore , the SA GES method for localized step size selection is adv antageous compared to ﬁx ed step sizes. Again the benchmark estimator is rougher than our estimators. Practically , the transaction time v olatility is almost constant after 11:00 am which in our e xperience is a typical feature of transaction data of liquid stocks . Contrar y to this the clock time v olatility is more ﬂuctuating and shows well known features like the U-shape . This has already been discussed at the end of Section 4. The blue (lower gra y) line in Figure 9 shows the estimator ˆ Σ t j with stochastic round- ing. The difference to the blac k estimator with deter ministic rounding is quite large - but can be e xplained heuristically: In some sense all methods decompose the realized volatil- ity into the “true” volatility and the volatility coming from microstr ucture noise. Since the microstructure noise model with stochastic rounding has higher volatility than the deter- ministic one it is obvious that the resulting volatility of the unobser ved efﬁcient price must be smaller . It is remar kable that the ﬁrst estimator has the same le vel as the benchmark- estimator (which uses a completely different linear microstr ucture noise model). In our opinion this is another indicator that the microstr ucture noise model with deter ministic rounding is pref erab le to the model with stochastic rounding. 31 46.6 47.0 47.4 10:00 11:00 12:00 13:00 14:00 15:00 16:00 1e−04 3e−04 5e−04 10:00 11:00 12:00 13:00 14:00 15:00 16:00 1e−04 3e−04 5e−04 09:40 09:50 Figure 9: Real data example: Estimation of time-varying spot volatility in transaction time. The upper plot shows the transaction data of the symbol C for the 3rd September 2007. The middle and lower plot give the volatility estimator with deter ministic rounding ˆ Σ t j (blac k line), the adaptive version with SA GES ˆ Σ S t j (red (dark gra y) line), and the benchmark estimator ˆ Σ B t j (light gra y line). The blue (low er gra y) line in the middle plot shows the estimator ˆ Σ t j with stochastic rounding. Clock time spot v olatility estimation The corresponding cloc k time v olatility estimators ha ve already been display ed in Figure 5 where also the transition from transaction time to clock time has been discussed. In Fig- ure 5 all estimators ha ve been calculated with the SA GES-method. Since volatility in clock time is more v olatile than in transaction time here SA GES requires larger step sizes . We use step sizes equally spaced betw een 0.3 and 0.003. F or the duration estimator ¯ δ j we deter mined the stepsize by minimizing the prediction error Σ T − 1 j =2  ¯ δ j − ( t j +1 − t j )  2 leading to approximately λ = 0 . 08 . (Because of the depen- dence of the durations , ¯ δ j and ( t j +1 − t j ) usually are not independent and minimization of the abov e criter ion therefore is not approximately the same as minimization of the mean squared error . Despite of this we think that the resulting λ is reasonable .) In addition to Figure 5 abov e Figure 10 compares the transaction data and the volatility estimates f or a small time per iod. Note that the estimator needs about one minute to settle down again after the occurrence of a spik e. 32 46.70 46.74 46.78 14:05 14:07 14:09 14:11 14:13 14:15 14:17 −21 −19 14:05 14:07 14:09 14:11 14:13 14:15 14:17 −1.5 −0.5 0.5 14:05 14:07 14:09 14:11 14:13 14:15 14:17 Figure 10: Results from Figure 5 for a small fraction of the trading day . The upper plots show the transaction prices of C; the middle plot gives the log of the volatility estimators ˆ Σ cS t j (blac k line) and ˆ Σ cS alt ( t j ) (green (gray) line); the lower plot sho ws the log of the trading intensity ˆ λ I ( t j ) . 7 Conc luding Remarks W e hav e presented a technique f or the on-line estimation of time-var ying volatility based on noisy tr ansaction data. The algor ithm updates the volatility estimate immediately after a new transaction. On a recent personal computer an efﬁcient implementation of the method requires a f ew milliseconds for a single update of the estimator (including one iteration of the par ticle ﬁlter with 500 par ticles). The paper contains diff erent contributions: First, we have proposed a nonlinear mar ket microstructure noise model that cov ers bid-ask bounces, time-varying bid-ask spreads, and the discreteness of prices obser v ed in real data. Second, the problem of on-line v olatility estimation has been treated in a nonlinear state-space frame work. The ﬁlter ing distribution of the efﬁcient pr ice is approximated with a par ticle ﬁlter and the volatility is estimated as a parameter of the ﬁlter ing distr ibution. Third, we hav e presented a sequential EM-type algorithm which allows the on-line estimation of time-v ar ying volatility . W e also mak e a clear distinction between the (spot) v olatility per time unit Σ c ( t ) and the v olatility per transaction Σ( t ) . We hav e used a diffusion model with random time change 33 giv en by the total number of transactions. This leads to a decomposition of volatility in clock time into v olatility in transaction time and trading intensity . At least for our data set it tur ned out that volatility in transaction time is almost constant (after some steep decrease at the beginning of the trading day), and the ﬂuctuation of clock time volatility is merely a result of the ﬂuctuation of the trading intensity . In our data also the increase of v olatility in the after noon (par t of the U-shape) is a f eature of the trading intensity and not of the transaction time v olatility . W e mention that most components of this method can be used in combination with other models or estimation techniques: F or e xample the par ticle ﬁlter can be used with other pr ice models (e.g. with a dr ift ter m) or other microstr ucture noise models. Like wise the decomposition of clock time volatility into transaction time v olatility and trading intensity can be used with linear microstructure noise models and other estimation techniques. Of course it is desirab le to hav e a complete mathematical theor y for the methods of this paper . Howe ver , we think that this is ver y hard to achiev e. Mathematically exact are the results on the par ticle ﬁlter given that the true v olatility is known (i.e. with ˆ Σ pf t j = Σ t j ) - in par ticular the results from Proposition 1 on the optimal proposal and the impor tance weights . This means that the par ticle ﬁlter determines correctly the conditional distr ibution of the efﬁcient prices giv en the obser vations . In the simpler context of i.i.d.-obser vations conv ergence proper ties of recursive EM-type algorithms hav e been studied in Titter ington (1984), Sato (2000), Wang and Zhao (2006). Cappé and Moulines (2009) derive asymp- totic nor mality with rate of conv ergence λ 1 / 2 f or a similar recursive EM-type algor ithm in an i.i.d setting. In the present situation we may hope for a similar result provided that the model in (9) and (10) is properly rescaled with volatility Σ( t/T ) and the cur v e Σ( · ) is sufﬁ- ciently smooth. A similar result can be found in Dahlhaus and Subba Rao (2007) where the asymptotic proper ties of a recursiv e ARCH-estimator hav e been derived. The optimal rate of conv ergence will howe ver not be attained since the recursive estimator is one-sided. The data-adaptiv e SAGES-procedure will make it e ven more difﬁcult to der ive the asymptotic distribution. F or that reason we recommend a simulation based on the estimated volatility cur ve f or deriving approximate conﬁdence intervals. Ackno wledg ement: We are v er y gr ateful to the Co-Editor Prof essor George T auchen and an anonymous ref eree whose comments helped to improv e the paper considerab ly . Disclaimer: The vie ws e xpressed here are those of the authors and not necessar ily those of its emplo yers . 34 References Aït-Sahalia, Y ., Mykland, P .A., and Zhang, L. (2005) How Often to Sample a Continuous- Time Process in the Presence of Mar ket Microstructure Noise. Re view of Financial Studies , 18, 351-416. Andersen, T .G., Bollerslev , T ., and Meddahi, N. (2006) Realiz ed V olatility F orecasting and Market Microstructure Noise. unpublished manuscript. Andrieu, C ., and Doucet, A. (2002) P ar ticle Filtering f or par tially observed Gaussian state space models. Jour nal of the Ro yal Statistical Society B , 64, 827-836. Ané, T ., and Geman, H. (2000) Order Flow , T r ansaction Clock, and Nor mality of Asset Retur ns. The Jour nal of Finance , 55, 2259-2284. Ball, C .A. (1988) Estimation Bias Induced by Discrete Secur ity Pr ices. The Jour nal of Finance , 43, 841-865. Bandi, F .M., and Russell, J .R. (2006) Seperating microstr ucture noise from volatility . Jour- nal of Financial Economics , 79, 655-692. — (2008) Microstr ucture noise, realized variance, and optimal sampling. Re view of Eco- nomic Studies , 75, 339-369. Bar ndorff-Nielsen, O .E., Hansen, P .R., Lunde, A., and Shephard, N. (2008) Designing Realized K er nels to Measure the Ex-P ost V ariation of Equity Prices in the Presence of Noise. Econometrica , 76, 1481-1536. Bos, C .S., Jan us, P ., and K oopman, S .J. (2009) Spot V ar iance P ath Estimation and its Application to High Frequency Jump T esting. Discussion P aper TI 2009-110/4, Tin- bergen Institute. Cappé, O ., and Moulines, E. (2009) On-line e xpectation-maximization algorithm f or latent data models. Jour nal of the Ro yal Statistical Society , Ser ies B , 71, 593-613. Chen, Y . and Spokoiny , V . (2009) Modeling and estimation f or nonstationar y time ser ies with applications to robust risk management. unpublished manuscript. Christensen, K., P odolskij, M., and V etter , M. (2009) Bias-correcting the realised range- based variance in the presence of mar ket microstr ucture noise. Finance and Stochas- tics , 13, 239-268. Clark, P . (1973) A subordinated stochastic process model with ﬁnite variance f or specula- tiv e prices. Econometr ica , 41, 135-155. Dahlhaus R. and Subba Rao S. (2007) A recursive online algor ithm for the estimation of time-v ar ying ARCH parameters. Ber noulli , 13, 389-422. Delattre, S. and Jacod, J . (1997). A central limit theorem for normalized functions of the increments of a diffusion process, in the presence of round-off errors. Ber noulli , 3, 1ï£¡28. Dempster , A.P ., Laird, N.M., and Rubin, D .B . (1977) Maximum Likelihood from Incomplete Data via the EM Algor ithm. Jour nal of the Roy al Statistical Society , Series B , 39, 1- 38. Douc, R., Cappé, O ., and Moulines, E. (2005) Comparison of resampling schemes f or par ticle ﬁlter ing. In Proceedings of the 4th Inter national Symposium on Image and Signal Processing and Analysis , 64-69. 35 Doucet, A., Godsill, S., and Andrieu, C . (2000) On sequential Monte Car lo sampling meth- ods f or Ba yesian ﬁltering. Statistics and Computing , 10, 197-208. Doucet, A., de F reitas, N., and Gordon, N. (ed.) (2001) Sequential Monte Car lo Methods in Practice . New Y ork: Spr inger . Ev en-Dar , E., and Mansour , Y . (2003) Lear ning Rates for Q-learning. Jour nal of Machine Lear ning Research , 5, 1-25. F an, J., and W ang, Y . (2008) Spot volatility estimation for high-frequency data. Statistics and Its Interf ace , 1, 279-288. F oster , D ., and Nelson, D . (1996) Continuous Record Asymptotics for Rolling Sample Estimators. Econometrica , 64, 139-174. Gabaix, X., Gopikrishnan, P ., Plerou, V ., and Stanley , H.E. (2003) A theor y of power-la w distributions in ﬁnancial market ﬂuctuations . Nature , 423, 267-270. Genz, A. (1992) Numerical computation of multivariate nor mal probabilities. Journal of Computational and Graphical Statistics , 1, 141-149. Genz, A. (2004) Numerical computation of rectangular bivariate and trivariate nor mal and t probabilities. Statistics and Computing , 14, 151-160. Gew eke , J . (1991) Efﬁcient simulation from the m ultivariate normal and student-t distribu- tions subject to linear constraints. In Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface , Ed. E. K eramidas and S. Kaufman , 571-578. American Statistical Association, Alexandria, V A. Hansen, P .R., and Lunde, A. (2006) Realized V ariance and Market Microstructure Noise. Jour nal of Business and Economics Statistics , 24, 127-161. Hansen, P .R. and Horel, G. (2009). Quadratic V ar iation by Mar ko v Chains. CREA TES Research P aper 2009-13. Harris, L. (1990) Estimation of Stock Pr ice V ar iances and Serial Cov ariances from Dis- crete Obser vations . Jour nal of Financial and Quantitative Analysis , 25, 291-306. Hasbrouck, J. (1999) Security Bid/Ask Dynamics with Discreteness and Cluster ing. Jour- nal of Financial Markets , 2, 1-28. Hasbrouck, J. (2004) Liquidity in the Futures Pits: Inferring Market Dynamics from Incom- plete Data. Jour nal of Financial and Quantitative Analysis , 39, 2. Howison, S., and Lamper , D . (2001) T rading volume in models of ﬁnancial derivativ es. Applied Mathematical Finance , 8, 119-135. Jacod, J., Li, Y ., Mykland, P .A., P odolskij, M., and V etter , M. (2009) Microstr ucture noise in the continuous case: The pre-av eraging approach. Stochastic Processes and their Applications , 119, 2249-2276. Joe, H. (1995) Approximations to multiv ariate nor mal rectangle probabilities based on conditional expectations . Jour nal of the American Statistical Association , 90, 957- 964. Kalnina, I., and Linton, O. (2008) Estimating quadratic variation consistently in the pres- ence of endogenous and diur nal measurement error . Jour nal of Econometr ics , 147, 47-59. K ong, A., Liu, J., and W ong, W . (1994) Sequential imputation and Ba yesian missing data problems . Jour nal of Amer ican Statistical Association , 89, 278-288. 36 K otecha, J . and Djur ic, P . (1999) Gibbs sampling approach for the generation of truncated multiv ar iate Gaussian random v ariables. Proceedings of the IEEE Inter national Con- f erence on Acoustics, Speech and Signal Processing , 1757-1760. Kristensen, D . (2010) Nonparametric ﬁltering of the realized spot volatility: A K er nel-based Approach. Econometr ic Theor y , 26, 60-93. Large, J. (2007) Estimating Quadratic V ariation When Quoted Pr ices Change By A Con- stant Increment. unpublished manuscript. Li, Y ., and Mykland, P .A. (2007) Are volatility estimators robust with respect to modeling assumptions?. Ber noulli , 13, 601-622. Manrique, A., and Shephard, N. (1998) Simulation-based likelihood inf erence for limited dependent processes. Econometrics Jour nal , 1, C174-C202. Munk, A., and Schmidt-Hieber , J. (2009) Nonpar ametric Estimation of the V olatility Func- tion in a High-F requency Model corr upted by Noise . unpublished manuscript. Owens , J.P . and Steigerwald, D .G. (2006) Noise reduced realized v olatility: a Kalman ﬁlter approach. Advances in Econometr ics 20 (ed. T om Fomb y and Dek T errell), 211-227, Else vier . Plerou, V ., Gopikrishnan, P ., Gabaix, X., A Nunes Amaral, L., and Stanley , H.E. (2001) Price ﬂuctuations, mar ket activity and trading volume . Quantitativ e Finance , 1, 262- 269. P odolskij, M., and V etter , M. (2009) Estimation of v olatility functionals in the sim ultaneous presence of microstructure noise and jumps. Ber noulli , 15, 634-658. Rober t, C . (1995) Simulation of truncated nor mal variables . Statistics and Computing , 5, 121-125. Rober t, C .Y ., and Rosenbaum, M. (2008) Ultra high frequency v olatility and co-volatility estimation in a microstr ucture model with uncer tainty zones . unpublished manuscript. Rodriguez-Y am, G., Davis , R., and Scharf , L. (2004) Efﬁcient gibbs sampling of truncated multiv ar iate nor mal with application to constrained linear regression. T echnical re- por t, Colorado State University , 2004. Rosenbaum, M. (2009) Integrated v olatility and round-off error . Ber noulli , 15, 687-720. Sato , M. (2000) Conv ergence of on-line EM algorithm. In Proc. Int. Conf. on Neural Inf or mation Processing , 1, 476-481. T odorov , V . and T auchen, G. (2011) V olatility Jumps. J. Business and Economic Statistics , 29, 356-371. Titterington, D .M. (1984) Recursive P arameter Estimation Using Incomplete Data. Jour- nal of the Ro yal Statistical Society , Ser ies B , 46, 257-267. V oe v , V ., and Lunde, A. (2007) Integrated Cov ar iance Estimation using High-F requency Data in the Presence of Noise. Jour nal of Financial Econometrics , 5, 68-104. W ang, S., and Zhao , Y . (2006) Almost sure conv ergence of Titter ington’ s recursive esti- mator f or mixture models. Statistics & Probability Letters , 76, 2001-2006. Zeng, Y . (2003) A P ar tially Obser ved Model f or Micromov ement of Asset Prices with Ba yes Estimation via Filtering. Mathematical Finance , 13, 411-444. 37 Zhang, L., Mykland, P .A., and Aït-Sahalia (2005) A T ale of T wo Time Scales: Deter min- ing Integrated V olatility with Noisy High-F requency Data. Journal of the Amer ican Statistical Association , 100, 1394-1411. Zhou, B. (1996) High-F requency Data and V olatility in F oreign-Exchange Rates. Jour nal of Business & Economic Statistics , 14, 45-52. Zu, Y . and Boswijk, P . (2010) Estimating spot v olatility with high frequency ﬁnancial data. Preprint, University of Amsterdam. Zumbach, G, Corsi, F ., and T rapletti, A. (2002) Efﬁcient estimation of volatility using high- frequency data. T echnical Repor t, Olsen & Associates. 38

On-line Spot Volatility-Estimation and Decomposition with Nonlinear Market Microstructure Noise Models

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment