Neural Stochastic Differential Equations with Change Points: A Generative Adversarial Approach
Authors: Zhongchang Sun, Yousef El-Laham, Svitlana Vyetrenko
Zhongchang Sun ⋆† — Yousef El-Laham ⋆ — Svitlana Vyetrenko ⋆
⋆ J.P. Morgan AI Research, † University at Buffalo

ABSTRACT

Stochastic differential equations (SDEs) have been widely used to model real-world random phenomena. Existing works mainly focus on the case where the time series is modeled by a single SDE, which might be restrictive for modeling time series with distributional shift. In this work, we propose a change point detection algorithm for time series modeled as neural SDEs. Given a time series dataset, the proposed method jointly learns the unknown change points and the parameters of distinct neural SDE models corresponding to each change point. Specifically, the SDEs are learned under the framework of generative adversarial networks (GANs) and the change points are detected based on the output of the GAN discriminator in a forward pass. Numerical results on both synthetic and real datasets are provided to validate the performance of the algorithm in comparison to classical change point detection benchmarks, standard GAN-based neural SDEs, and other state-of-the-art deep generative models for time series data.

Index Terms — deep generative models, stochastic differential equations, generative adversarial networks, change point detection

1. INTRODUCTION

Stochastic differential equations (SDEs) are a class of mathematical equations used to model continuous-time stochastic processes [1–3], with applications ranging from finance and physics to biology and engineering. Recently, neural SDEs [4–10] have been proposed as a means to integrate neural networks with SDEs, providing a more flexible approach for modeling sequential data.
In [9], the authors established a novel connection between neural SDEs and generative adversarial networks (GANs), showing that certain classes of neural SDEs can be interpreted as infinite-dimensional GANs. In [10], a variational autoencoder (VAE) framework for identifying latent SDEs from noisy observations was proposed, based on the Euler-Maruyama approximation of SDE solutions. Existing works on neural SDEs mainly focus on the case where the time series is modeled by a single SDE; however, in real-world applications, the underlying dynamics of the data may change over time. For example, financial time series may exhibit sharp distributional shifts due to exogenous factors (e.g., the global financial crisis, the COVID-19 pandemic). To train neural SDEs, it is common to assume that the drift and diffusion terms are Lipschitz continuous. This assumption is restrictive, in the sense that a single neural SDE with Lipschitz-smooth drift and diffusion cannot effectively model time series with sudden distributional shifts. This motivates us to study the change point detection problem for SDEs and to model the time series as multiple SDEs conditioned on the change points.

Change point detection [11, 12] is a critical aspect of time series analysis, especially in domains such as finance, climate science, and sensor data processing, where abrupt shifts in behavior can have profound implications. By identifying change points, we can partition the time series into distinct segments, where each segment is described by a different SDE model. This adaptation allows us to capture the specific characteristics and uncertainties within each segment, leading to a more precise understanding of the underlying processes. In [13, 14], SDEs are applied to detect change points in time series.
However, the drift and diffusion functions in [13, 14] are characterized by a restricted number of parameters instead of neural networks, which constrains the overall model capacity of the SDEs. In [15], latent neural SDEs are introduced to detect changes in time series, where a single SDE in the latent space is assumed and is trained using VAEs. That work assumes a prior SDE with a known diffusion term in the latent space for tractability of the loss function. However, this assumption is too restrictive, since the training data might not necessarily conform to this latent SDE. Neural jump SDEs (JSDEs) were proposed in [16], which combine temporal point processes and neural ordinary differential equations (ODEs) [17] to model both continuous dynamics and abrupt changes. Compared with neural SDEs, the continuous dynamics of neural JSDEs are deterministic, and the randomness comes only from the temporal point process. Similarly, stochastic deep latent state space models [18, 19] combined with ODE-based models are introduced in [20] to increase the modeling capacity of ODEs. However, a prior on the latent variable sequence is needed, and no change detection is involved in this method.

In this work, we develop a novel approach for modeling change points in neural SDEs based on the GAN framework presented in [9], which enhances the expressive capacity of neural SDEs. Our specific contributions are as follows:

• We propose a framework and training algorithm for modeling neural SDEs with change points. The proposed algorithm alternates between detecting change points (while holding the model parameters fixed) and optimizing the GAN parameters (while holding the change points fixed).

• We propose a change point detection scheme for neural SDEs (trained as GANs) by leveraging the learned GAN discriminator as a means to approximate the Wasserstein distance between time series samples.
Specifically, we first partition the training data into multiple segments using a sliding-window approach and then input them sequentially into the GAN discriminator to obtain a sequence of scores. The change point estimate is then updated by identifying the point in the score sequence at which the approximated Wasserstein distance between two consecutive segments is the largest.

• We demonstrate the effectiveness and versatility of our approach through extensive experiments on synthetic and real-world datasets.

2. PROBLEM FORMULATION

We consider SDEs of the following form:

    dX_t = f(t, X_t) dt + g(t, X_t) ∘ dW_t,   (1)

where X_0 ∼ µ is the initial state following the initial distribution µ, X = {X_t}_{t∈[0,T]} is a continuous ℝ^x-valued stochastic process, "∘" denotes that the SDE is understood in the Stratonovich sense, f : [0,T] × ℝ^x → ℝ^x is the drift function, which describes the deterministic evolution of the stochastic process, g : [0,T] × ℝ^x → ℝ^{x×w} is the diffusion function, and W = {W_t}_{t≥0} is a w-dimensional Brownian motion representing the random noise in the sample path. Unlike ODEs, SDEs do not always have unique solutions. We say that X = {X_t}_{t≥0} is a strong solution of the SDE (1) if it satisfies (1) for each sample path of the Wiener process {W_t}_{t≥0} and for all t in the defined time interval almost surely.

Due to the large capacity of neural networks for function approximation, neural SDEs have been proposed, which model the drift and diffusion terms via neural networks. When training neural SDEs, the drift function f and the diffusion function g are assumed to be Lipschitz continuous so that a unique strong solution to the SDE exists [21]. Therefore, when there are changes in the dynamics of the stochastic process, it is not accurate to model the stochastic process as a single neural SDE.
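As a concrete illustration of simulating an SDE of the form (1), the following is a minimal Euler-Maruyama sketch. This is an assumption of this note, not part of the paper; it uses the Itô scheme, which coincides with the Stratonovich form in (1) only when the diffusion does not depend on the state (a Heun-type corrector would otherwise be needed).

```python
import numpy as np

def euler_maruyama(f, g, x0, T=1.0, n_steps=100, rng=None):
    """Simulate one path of dX_t = f(t, X_t) dt + g(t, X_t) dW_t.

    Illustrative sketch only: an Ito Euler-Maruyama scheme for a
    scalar SDE with drift f and diffusion g.
    """
    rng = rng or np.random.default_rng(0)
    dt = T / n_steps
    xs = np.empty(n_steps + 1)
    xs[0] = x0
    t = 0.0
    for k in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt))  # Brownian increment over dt
        xs[k + 1] = xs[k] + f(t, xs[k]) * dt + g(t, xs[k]) * dw
        t += dt
    return xs

# Example: OU-type dynamics with constant diffusion, where the Ito and
# Stratonovich interpretations coincide.
path = euler_maruyama(f=lambda t, x: -0.5 * x, g=lambda t, x: 0.3, x0=1.0)
```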
In this paper, we turn to an alternative approach, where we leverage multiple neural SDE models conditioned on change points to model the dynamics of a continuous-time stochastic process. Our goal is to jointly detect the change in the dynamics of the time series and to model the time series with multiple SDEs conditioned on the change points.

3. BACKGROUND

3.1. Neural SDEs as GANs

In this section, we show that fitting SDEs can be approached using WGANs [9]. WGANs [22] utilize a generator network and a discriminator network, where the loss function is defined using the Wasserstein distance. WGANs enforce Lipschitz continuity on the discriminator through gradient penalties, fostering training stability and convergence while mitigating mode collapse.

Let Y_true be the ground-truth SDE trajectory, which is a random variable on the path space. Let V ∼ N(0, I_v) be a v-dimensional Gaussian noise vector. The generator maps V to a trajectory, which is the solution to the following neural SDE:

    X_0 = ζ_θ(V),
    dX_t = µ_θ(t, X_t) dt + σ_θ(t, X_t) ∘ dW_t,
    Y_t = α_θ X_t + β_θ,   (2)

where ζ_θ, µ_θ, and σ_θ are (Lipschitz) neural networks parameterized by θ, and α_θ and β_θ are vectors that are jointly optimized. The generator networks are optimized so that the generated sample Y_t on path space is close to the ground-truth trajectory Y_true.

For the discriminator, a neural controlled differential equation (CDE) is utilized, since it can take an infinite-dimensional sample path as input and output a scalar score, which in practice measures the realism of the path with respect to the real data. The discriminator has the following form:

    H_0 = ξ_ϕ(Y_0),
    dH_t = f_ϕ(t, H_t) dt + g_ϕ(t, H_t) ∘ dY_t,
    D = m_ϕ · H_T,   (3)

where ξ_ϕ, f_ϕ, and g_ϕ are (Lipschitz) neural networks parameterized by ϕ, H : [0,T] → ℝ^h is the solution to this CDE, and m_ϕ maps the terminal state H_T to a scalar D.
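To make the shape of the generator (2) concrete, here is a purely illustrative discretization in which tiny random-weight MLPs stand in for the trained networks ζ_θ, µ_θ, and σ_θ (here with a diagonal diffusion). All names, sizes, and the simple Euler-Maruyama stepping are assumptions of this sketch; a real implementation, e.g. as in [9], trains these networks and uses a proper SDE solver.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(in_dim, out_dim, hidden=16):
    """Tiny random-weight MLP standing in for a trained (Lipschitz) network."""
    W1, b1 = 0.1 * rng.standard_normal((hidden, in_dim)), np.zeros(hidden)
    W2, b2 = 0.1 * rng.standard_normal((out_dim, hidden)), np.zeros(out_dim)
    return lambda z: W2 @ np.tanh(W1 @ z + b1) + b2

def generate_path(v_dim=4, x_dim=2, T=1.0, n_steps=64):
    """Discretized sketch of the generator (2): noise V -> path Y."""
    zeta = mlp(v_dim, x_dim)        # initial-condition network zeta_theta
    mu = mlp(1 + x_dim, x_dim)      # drift network mu_theta(t, x)
    sigma = mlp(1 + x_dim, x_dim)   # (diagonal) diffusion network sigma_theta
    alpha, beta = rng.standard_normal(x_dim), rng.standard_normal(x_dim)

    V = rng.standard_normal(v_dim)
    x = zeta(V)                     # X_0 = zeta_theta(V)
    dt = T / n_steps
    ys = [alpha * x + beta]         # readout Y_t = alpha * X_t + beta
    for k in range(n_steps):
        z = np.concatenate(([k * dt], x))
        dw = rng.normal(scale=np.sqrt(dt), size=x_dim)
        x = x + mu(z) * dt + sigma(z) * dw  # Euler-Maruyama step
        ys.append(alpha * x + beta)
    return np.stack(ys)

Y = generate_path()
```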
Let Y_θ : (V, {W_t}_{t≥0}) → Y denote the overall action of the generator and D_ϕ : Y → D the overall action of the discriminator. Let ŷ denote samples of the training data. The training loss is defined as in Wasserstein GANs: the generator is trained to minimize

    E_{V,W}[ D_ϕ(Y_θ(V, W)) ],   (4)

and the discriminator is trained to maximize

    E_{V,W}[ D_ϕ(Y_θ(V, W)) ] − E_ŷ[ D_ϕ(ŷ) ].   (5)

The goal is to minimize the Wasserstein distance between the true data distribution and the generated distribution [22]. The loss functions can be optimized using stochastic optimization techniques (e.g., SGD [23], RMSprop [24], and Adam [25]).

3.2. Wasserstein Two-Sample Testing

For SDEs trained as GANs [9], the training loss can be viewed as the Wasserstein distance between the training samples and the generated samples. Therefore, the learned model can be used to approximate the Wasserstein distance between two time series, which motivates us to design a change detection algorithm leveraging the popular Wasserstein two-sample test [26]. In this section, we provide a brief introduction to the Wasserstein two-sample test; the details of our algorithm are presented in the following section.

The Wasserstein two-sample test [26] is a statistical method used to compare two sets of data and determine whether they originate from the same distribution. Unlike traditional tests that focus on comparing means or variances, the Wasserstein two-sample test computes the Wasserstein distance between the empirical distributions of the samples, which measures the minimum cost required to transform one distribution into the other. Specifically, given independent and identically distributed (i.i.d.) samples X_1, …, X_m ∼ P and Y_1, …, Y_n ∼ Q, where P, Q are probability measures on ℝ^d, let P_m, Q_n denote the empirical distributions of X_1, …, X_m and Y_1, …, Y_n, respectively.
Given an exponent p ≥ 1, the p-Wasserstein distance between P_m and Q_n is defined as

    W_p(P_m, Q_n) = ( inf_{π ∈ Π(P_m, Q_n)} ∫_{ℝ^d × ℝ^d} ‖X − Y‖^p dπ )^{1/p},

where Π(P_m, Q_n) is the collection of all joint probability distributions on ℝ^d × ℝ^d with marginal distributions P_m and Q_n.

The Wasserstein two-sample test [26] is particularly useful for high-dimensional data and can provide more informative insights into the dissimilarities between distributions. The Wasserstein distance has also found applications in other aspects of statistical inference, such as goodness-of-fit testing [27] and change detection [28]. In [28], Wasserstein barycenters were used to capture changes in distribution.

4. CHANGE POINT DETECTION IN NEURAL SDES

In this section, we investigate the problem of modeling change points in neural SDE models based on the GAN framework. To make the presentation more concise, we consider the case where there is one change point and later discuss a straightforward extension to the case of multiple change points. Note that since we detect the change point and learn the SDE models in a data-driven manner, and the data are not independent over time, it is challenging to directly detect the change using classical change detection algorithms such as the CuSum algorithm [29]. Observe that since the training loss in (5) is defined to approximate the Wasserstein distance between the training data and the generated samples, given the trained discriminator we can approximate the Wasserstein distance between two time series. Therefore, we propose to detect the change by leveraging the idea of the Wasserstein two-sample test, alternately updating the parameters of the SDEs and the change point estimate.

Fig. 1. Flow diagram of our training algorithm.

Algorithm summary: Our training algorithm is summarized as follows.
Firstly, we initialize the change point estimate ν and the neural network parameters θ_0, θ_1, ϕ for the generator and the discriminator. Secondly, based on the change point estimate ν, we partition the training data, run a different SDE model for each segment, and update the parameters of the GANs. Thirdly, we apply a sliding-window method to obtain multiple segments of the training data and input them sequentially into the discriminator. As we iterate through the time series, a sequence of scores is returned. The difference of scores between two segments can be viewed as the Wasserstein distance between those segments. The change point estimate is then updated by identifying the change point of the score sequence. Figure 1 shows a flow diagram summarizing our training algorithm. Specifically, at each step, we alternate between the following two updates:

Model parameters update: Based on the change point estimate ν, we use sample paths X_{1:ν−1} as training samples to optimize the parameters θ_0 of the neural SDE before the change:

    dX_t = µ_{θ_0}(t, X_t) dt + σ_{θ_0}(t, X_t) ∘ dW_t,   (6)

and use sample paths X_{ν:T} as training samples to optimize the parameters θ_1 of the neural SDE after the change:

    dX_t = µ_{θ_1}(t, X_t) dt + σ_{θ_1}(t, X_t) ∘ dW_t.   (7)

We also update the parameters ϕ of the discriminator D_ϕ based on the generated trajectory Y_{1:T}.

Change point update: After the SDE model parameters are updated, we update the change point estimate. Consider a sliding window of size w; this window size is a hyperparameter of the algorithm that can be tuned in practice. We partition the observed sample path into segments X_{1:w}, X_{2:w+1}, …, X_{T−w+1:T}. We pass each segment X_{t:t+w} into the discriminator and denote the returned score by s_t:

    s_t = D_ϕ(X_{t:t+w}),   t = 1, 2, …, T − w + 1.   (8)
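The offline change point update built on the window scores (equations (9) and (10)) can be sketched as follows. Here `score_fn` is a hypothetical stand-in for the trained discriminator D_ϕ (any function mapping a window to a scalar), and `greedy_multi_change` sketches the multiple-change-point extension of Section 4; both names and the interfaces are assumptions of this note.

```python
import numpy as np

def change_point_from_scores(paths, score_fn, w):
    """Offline change point update, eqs. (9)-(10).

    `paths` has shape (N, T); `score_fn` stands in for the trained
    discriminator D_phi and maps a window of length w+1 to a scalar.
    """
    N, T = paths.shape
    # Average window score over the N training sample paths, eq. (9).
    s_bar = np.array([
        np.mean([score_fn(p[t:t + w + 1]) for p in paths])
        for t in range(T - w)
    ])
    diffs = np.diff(s_bar)          # s_bar[t] - s_bar[t-1]
    nu = int(np.argmax(diffs)) + 1  # eq. (10): index of the largest jump
    return nu, s_bar

def greedy_multi_change(s_bar, w, k):
    """Multi-change extension: keep the k largest score jumps that are
    more than w apart, scanning jumps in descending order."""
    order = np.argsort(np.diff(s_bar))[::-1] + 1
    kept = []
    for t in order:
        if all(abs(t - c) > w for c in kept):
            kept.append(int(t))
        if len(kept) == k:
            break
    return sorted(kept)
```

With a toy score function such as the window mean and a path whose level jumps once, the returned estimate lands at the first window touching the jump; in practice the discriminator score replaces the window mean.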
Algorithm 1: Neural SDEs with Change Points
Require: Initial parameters θ_0, θ_1, ϕ, ν; training samples X^1_{1:T}, …, X^N_{1:T}.
while not converged do
    Update θ_0, θ_1, ϕ by running SGD based on ν.
    Compute s̄_t using (9) based on the current ϕ.
    Update ν according to (10).
end while

The subsequences X_{1:w}, X_{2:w+1}, …, X_{T−w+1:T} are thus converted to a sequence of scores s_1, s_2, …, s_{T−w+1}. We define the average score over all training samples using the arithmetic average:

    s̄_t = (1/N) Σ_{i=1}^N D_ϕ(X^{(i)}_{t:t+w}).   (9)

The difference between two average scores can be viewed as the Wasserstein distance between the two corresponding segments. Sequentially, at each time t, we compare the approximated Wasserstein distance between two consecutive segments, s̄_t − s̄_{t−1}, with a pre-specified threshold γ to distinguish between two hypotheses: H_0: the change happens at time t; and H_1: the change happens after time t. When s̄_t − s̄_{t−1} > γ, we declare that the change happens at time t; otherwise, we proceed to the next time step. In an offline setting, the change point can be estimated as the time index ν at which the change in the average score is the largest:

    ν = arg max_t ( s̄_t − s̄_{t−1} ).   (10)

After the change point is updated, we again update the SDE model parameters and then the change point estimate, repeating this process until convergence. We summarize our algorithm in pseudocode in Algorithm 1.

Extension to multiple change points: Our algorithm can be easily adapted to the case of multiple changes. Assume that there is at most one change within any window of size w. We sort all differences s_t − s_{t−1} in descending order and denote their time indices by ν̂_1, ν̂_2, …. The first change point is declared as ν̂_1. If |ν̂_2 − ν̂_1| ≤ w, we discard ν̂_2 and proceed to the following element until we find an index i such that |ν̂_i − ν̂_1| > w.
Then, ν̂_i is declared as another change point. More change points can be found by repeating this process.

5. SIMULATION RESULTS

5.1. Toy Example: Ornstein-Uhlenbeck Process

We begin by fitting a time-dependent one-dimensional Ornstein-Uhlenbeck (OU) process, defined by the following SDE:

    dX_t = (µt − θX_t) dt + σ ∘ dW_t.   (11)

We consider the cases of one change point, two change points, and three change points. Let the change points be ν_1 = 32, ν_2 = 64, ν_3 = 96. Before ν_1, we set µ_1 = 0.04, θ_1 = 0.1, σ_1 = 0.4. After ν_1 and before ν_2, we set µ_2 = −0.02, θ_2 = 0.1, σ_2 = 0.4. After ν_2 and before ν_3, we set µ_3 = 0.02, θ_3 = 0.1, σ_3 = 0.4. After ν_3, we set µ_4 = −0.02, θ_4 = 0.1, σ_4 = 0.4.

Baselines: We compare our approach with two heuristic change detection approaches. The first detects the change via the mean change of the time series. Specifically, we partition the sample into segments X_{1:w}, X_{2:w+1}, …, X_{T−w+1:T}. Define the average mean of each segment over all training samples as

    µ̄_t = (1/N) Σ_{i=1}^N (1/w) Σ_{s=t}^{t+w−1} X^{(i)}_s.   (12)

The change point using the average mean is then estimated as ν̂_mean = arg max_t (µ̄_t − µ̄_{t−1}). The second approach is based on the maximum mean discrepancy (MMD), which is commonly used to quantify the difference between two distributions. Define the average MMD between two consecutive segments as

    η̄_t = (1/N) Σ_{i=1}^N MMD(X^{(i)}_{t−1:t+w−1}, X^{(i)}_{t:t+w}).   (13)

The change point using the average MMD is then defined as ν̂_MMD = arg max_t η̄_t.

Fig. 2. Simulation results on synthetic OU process data with change points.

Results: We first plot the generated sample paths of our approach and the training data in Fig. 2 for all three cases. It can be seen that our approach detects the change points and fits the training data well, even when there are multiple change points.
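The MMD baseline in (13) can be sketched as follows. The RBF kernel and its bandwidth are illustrative assumptions of this note, as the paper does not specify the kernel, and the (biased) two-sample estimate is one of several standard MMD estimators.

```python
import numpy as np

def mmd_rbf(x, y, bandwidth=1.0):
    """Biased MMD^2 estimate between two 1-D samples with an RBF kernel.

    Illustrative sketch of the MMD baseline; kernel and bandwidth are
    assumed, not taken from the paper.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    def k(a, b):
        d = a[:, None] - b[None, :]
        return np.exp(-d ** 2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def mmd_change_point(paths, w):
    """arg max_t of the average MMD between consecutive windows, eq. (13)."""
    N, T = paths.shape
    eta = [
        np.mean([mmd_rbf(p[t - 1:t + w], p[t:t + w + 1]) for p in paths])
        for t in range(1, T - w)
    ]
    return 1 + int(np.argmax(eta))
```

Because consecutive windows overlap in all but one sample, the per-step MMD is small; the estimate peaks where the distributional change first enters the window.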
To compare our approach with the heuristic approaches, we plot the estimated change points of all approaches along with the training data for the case with three change points in the last panel of Fig. 2. It can be seen that MMD and mean change do not reflect the change in the SDE trajectories, while our approach detects the change accurately.

5.2. Real Data Experiment: ETF Data

We use part of the Exchange-Traded Fund (ETF) data from December 12, 2019 to June 07, 2020, which covers the COVID period, during which a sharp distributional shift occurred. Each sample of the data corresponds to a different underlier of the S&P 500 index. The data is normalized to have zero mean and unit variance.

Baselines and metrics: We compare our approach against two baselines: GAN-based neural SDEs without change detection (denoted SDEGAN) [9] and RTSGAN [30]. For our approach, we consider the cases with one, two, and three change points (denoted CP-SDEGAN_1, CP-SDEGAN_2, CP-SDEGAN_3). We use three metrics to compare their performance.

MMD: We use MMD to measure the difference between the training samples and the generated samples. A smaller value means the generated samples are closer to the training samples.

Prediction: We perform one-step prediction under the train-on-synthetic-test-on-real (TSTR) protocol [31]. We train a 2-layer LSTM predictor on the generated samples and test its performance on the real data. A smaller loss means the generated samples are able to capture the temporal dynamics of the training samples.

Classification: We train a 2-layer LSTM to distinguish between the real data and the generated samples and report the classification loss on the test set. A larger loss means it is more difficult to distinguish the real and synthetic data.

Table 1. Results for ETF data.

Model         MMD ↓            Classification ↑   Prediction ↓
RTSGAN        0.2942 ± 0.0000  0.0680 ± 0.0774    1.0416 ± 0.0005
SDEGAN        0.6028 ± 0.0000  0.1716 ± 0.0714    0.8189 ± 0.0001
CP-SDEGAN_1   0.2144 ± 0.0000  0.2038 ± 0.0682    0.8316 ± 0.0002
CP-SDEGAN_2   0.1548 ± 0.0000  0.1847 ± 0.0351    0.8173 ± 0.0001
CP-SDEGAN_3   0.1464 ± 0.0000  0.2816 ± 0.0867    0.8167 ± 0.0002

Results: We summarize our results in Table 1. For this dataset, our approach outperforms RTSGAN [30] and neural SDEs without change detection [9]. However, assuming different numbers of change points leads to different performance of our algorithm on all three metrics. In real-world applications, depending on the specific task, the best number of change points for training neural SDEs can be determined by model selection.

6. CONCLUSION

In this paper, we proposed a novel approach to detect changes in neural SDEs based on GANs and to further model time series with multiple SDEs conditioned on the change points. Our research contributes to the advancement of more robust and accurate modeling techniques, particularly in the context of financial markets, where the ability to capture dynamic changes is crucial for informed decision-making. Our results show that the proposed approach outperforms other deep generative models in terms of generative quality on datasets exhibiting distributional shifts.

7. ACKNOWLEDGEMENTS

This paper was prepared for informational purposes by the Artificial Intelligence Research group of JPMorgan Chase & Co. and its affiliates ("JP Morgan"), and is not a product of the Research Department of JP Morgan. JP Morgan makes no representation and warranty whatsoever and disclaims all liability for the completeness, accuracy or reliability of the information contained herein.
This document is not intended as investment research or investment advice, or a recommendation, offer or solicitation for the purchase or sale of any security, financial instrument, financial product or service, or to be used in any way for evaluating the merits of participating in any transaction, and shall not constitute a solicitation under any jurisdiction or to any person, if such solicitation under such jurisdiction or to such person would be unlawful.

8. REFERENCES

[1] T. Lelièvre and G. Stoltz, "Partial differential equations and stochastic methods in molecular dynamics," Acta Numerica, vol. 25, pp. 681–880, 2016.
[2] T. K. Soboleva and A. B. Pleasants, "Population growth as a nonlinear stochastic process," Mathematical and Computer Modelling, vol. 38, no. 11, pp. 1437–1442, 2003.
[3] T. Huillet, "On Wright–Fisher diffusion and its relatives," Journal of Statistical Mechanics: Theory and Experiment, vol. 2007, no. 11, p. 11006, Nov. 2007.
[4] B. Tzen and M. Raginsky, "Theoretical guarantees for sampling and inference in generative models with latent diffusions," in Conference on Learning Theory. PMLR, 2019, pp. 3084–3114.
[5] X. Li, T.-K. L. Wong, R. T. Chen, and D. K. Duvenaud, "Scalable gradients and variational inference for stochastic differential equations," in Symposium on Advances in Approximate Bayesian Inference. PMLR, 2020, pp. 1–28.
[6] P. Gierjatowicz, M. Sabate-Vidales, D. Siska, L. Szpruch, and Z. Zuric, "Robust pricing and hedging via neural stochastic differential equations," Journal of Computational Finance, vol. 26, no. 3, 2022.
[7] X. Liu, T. Xiao, S. Si, Q. Cao, S. Kumar, and C.-J. Hsieh, "Neural SDE: Stabilizing neural ODE networks with stochastic noise," arXiv preprint, 2019.
[8] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, "Score-based generative modeling through stochastic differential equations," arXiv preprint arXiv:2011.13456, 2020.
[9] P. Kidger, J. Foster, X. Li, and T. J. Lyons, "Neural SDEs as infinite-dimensional GANs," in Proc. International Conference on Machine Learning (ICML). PMLR, 2021, pp. 5453–5463.
[10] A. Hasan, J. M. Pereira, S. Farsiu, and V. Tarokh, "Identifying latent stochastic differential equations," IEEE Transactions on Signal Processing, vol. 70, pp. 89–104, 2022.
[11] I. Nikiforov, "A generalized change detection problem," IEEE Transactions on Information Theory, vol. 41, no. 1, pp. 171–187, 1995.
[12] V. V. Veeravalli and T. Banerjee, "Quickest change detection," in Academic Press Library in Signal Processing. Elsevier, 2014, vol. 3, pp. 209–255.
[13] S. M. Iacus and N. Yoshida, "Numerical analysis of volatility change point estimators for discretely sampled stochastic differential equations," Economic Notes, vol. 39, no. 1-2, pp. 107–127, 2010.
[14] M. Kovářík, "Volatility change point detection using stochastic differential equations and time series control charts," International Journal of Mathematical Models and Methods in Applied Sciences, 2013.
[15] A. Ryzhikov, M. Hushchyn, and D. Derkach, "Latent neural stochastic differential equations for change point detection," arXiv preprint arXiv:2208.10317, 2022.
[16] J. Jia and A. R. Benson, "Neural jump stochastic differential equations," Proc. Advances in Neural Information Processing Systems (NeurIPS), vol. 32, 2019.
[17] R. T. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, "Neural ordinary differential equations," Proc. Advances in Neural Information Processing Systems (NeurIPS), vol. 31, 2018.
[18] Y. Rubanova, R. T. Chen, and D. K. Duvenaud, "Latent ordinary differential equations for irregularly-sampled time series," Advances in Neural Information Processing Systems, vol. 32, 2019.
[19] A. Gu, K. Goel, and C. Ré, "Efficiently modeling long sequences with structured state spaces," arXiv preprint arXiv:2111.00396, 2021.
[20] L. Zhou, M. Poli, W. Xu, S. Massaroli, and S. Ermon, "Deep latent state space models for time-series generation," in International Conference on Machine Learning. PMLR, 2023, pp. 42625–42643.
[21] P. E. Kloeden and E. Platen, Stochastic Differential Equations. Springer, 1992.
[22] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein generative adversarial networks," in International Conference on Machine Learning. PMLR, 2017, pp. 214–223.
[23] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[24] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
[25] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint, 2014.
[26] A. Ramdas, N. García Trillos, and M. Cuturi, "On Wasserstein two-sample testing and related families of nonparametric tests," Entropy, vol. 19, no. 2, p. 47, 2017.
[27] M. Hallin, G. Mordant, and J. Segers, "Multivariate goodness-of-fit tests based on Wasserstein distance," Electronic Journal of Statistics, vol. 15, no. 1, pp. 1328–1371, 2021.
[28] K. Faber, R. Corizzo, B. Sniezynski, M. Baron, and N. Japkowicz, "WATCH: Wasserstein change point detection for high-dimensional time series data," in 2021 IEEE International Conference on Big Data (Big Data), 2021, pp. 4450–4459.
[29] E. S. Page, "Continuous inspection schemes," Biometrika, vol. 41, no. 1/2, pp. 100–115, 1954.
[30] H. Pei, K. Ren, Y. Yang, C. Liu, T. Qin, and D. Li, "Towards generating real-world time series data," in 2021 IEEE International Conference on Data Mining (ICDM). IEEE, 2021, pp. 469–478.
[31] C. Esteban, S. L. Hyland, and G. Rätsch, "Real-valued (medical) time series generation with recurrent conditional GANs," arXiv preprint arXiv:1706.02633, 2017.