Unrestrained Simplex Denoising for Discrete Data: A Non-Markovian Approach Applied to Graph Generation
Yoann Boget 1, Alexandros Kalousis 2

Abstract

Denoising models such as Diffusion or Flow Matching have recently advanced generative modeling for discrete structures, yet most approaches operate directly in the discrete state space, causing abrupt state changes. We introduce simplex denoising, a simple yet effective generative framework that operates on the probability simplex. The key idea is a non-Markovian noising scheme in which, for a given clean data point, noisy representations at different times are conditionally independent. While preserving the theoretical guarantees of denoising-based generative models, our method removes unnecessary constraints, thereby improving performance and simplifying the formulation. Empirically, unrestrained simplex denoising surpasses strong discrete diffusion and flow-matching baselines across synthetic and real-world graph benchmarks. These results highlight the probability simplex as an effective framework for discrete generative modeling.

1. Introduction

Denoising models, such as Discrete Diffusion and Discrete Flow Matching, have substantially advanced generative modeling for discrete structures (Austin et al., 2021; Campbell et al., 2022; 2024; Gat et al., 2024), including graphs (Haefeli et al., 2022; Vignac et al., 2023; Qin et al., 2025; Boget, 2025). Yet, surprisingly few works study denoising on simplices for modeling discrete data (Richemond et al., 2022; Avdeyev et al., 2023; Stark et al., 2024). In graph generation specifically, existing generative models do not extend simplex-based parameterizations beyond the 1-simplex (Liu et al., 2025).

1 Department of Computer Science, University of Geneva, Switzerland. 2 DMML Group, Geneva School for Business Administration HES-SO, Switzerland.
Correspondence to: Yoann Boget <yoann.boget@hes-so.ch>. Preprint. March 31, 2026.

Discrete diffusion models such as Austin et al. (2021) and Campbell et al. (2022) impose transitions between discrete states during denoising. Such jumps can abruptly alter the nature of a noisy instance, for example, disconnecting a graph by deleting an edge. By allowing probabilistic superpositions of categories, denoising on the simplex enables smooth transitions between states.

However, naïve noising schemes on the simplex exhibit undesirable behavior. In particular, avoiding abrupt discontinuities in the support of the noisy distributions requires a carefully designed probability path. To mitigate these issues, prior simplex-based denoising methods (e.g., Richemond et al., 2022; Stark et al., 2024) introduce non-trivial theoretical formulations, which may hinder broader adoption.

Moreover, standard approaches such as diffusion or flow matching strongly constrain the reverse dynamics by enforcing a continuous denoising trajectory in the continuous-time limit, i.e., $\lim_{dt \to 0} p_{t+dt \mid t}(x_{t+dt} \mid x_t) = \delta(x_{t+dt} - x_t)$. In discrete diffusion, this constraint has been linked to compounding denoising errors (Lezama et al., 2023; Boget, 2025; Zhang et al., 2025). We argue that similar dynamics operate in the continuous case.

In this work, we introduce simplex denoising, a simple yet effective generative model for discrete data that operates directly on the probability simplex. Our model leverages a non-Markovian diffusion framework that assumes independence across noisy states (conditional on the clean input). By removing dependence on the previous noisy state, our formulation eliminates unnecessary constraints and simplifies the denoising process.

Our key contributions are as follows:

• We introduce a new non-Markovian denoising approach for categorical data operating on the probability simplex (Section 4).
The formulation substantially simplifies standard denoising paradigms such as diffusion or flow matching and addresses the compounding-denoising-errors issue.

• We provide theoretical guarantees, proving convergence of the proposed procedure under the true denoising distribution $p_{1|t}(x \mid x_t)$ (Section 4).

• Leveraging explicit probability paths, we propose new theoretical tools for the analysis of the noising dynamics on the probability simplex. Specifically, we introduce the Voronoi probability and its closed-form formula for the Dirichlet probability path (Section 3).

• Empirically, our unrestrained simplex denoising method surpasses strong discrete diffusion and flow-matching baselines on synthetic and real-world graph benchmarks, advancing the state of the art on multiple datasets (Section 5).

2. Background

In this paper, we focus on modeling multivariate categorical data, with application to graphs that may have discrete node and edge attributes. In this section, we begin by establishing some notation used throughout the paper. We then position our contribution within the literature along three axes: (i) denoising-based generative models for discrete data; (ii) non-Markovian formulations of noising and denoising processes; and (iii) generative models for graphs. We conclude by motivating our choice to operate directly on the probability simplex.

2.1. Notation

We define a graph as a set of nodes and edges, denoted by $G = (V, E)$. We represent each node pair $(\nu_i, \nu_j)$ as a vector $a_{i,j} \in \mathbb{R}^{d_a}$, where $d_a$ is the number of edge types, including the absence of an edge. Similarly, we represent the node attributes as vectors $v_i \in \mathbb{R}^{d_v}$, but we do not need to encode the absence of a node. When the formulation applies unambiguously to node and edge representations, we use $x$ to refer to both $v^{(i)}$ and $a^{(i,j)}$ to facilitate readability.
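As a concrete illustration of this representation, the following sketch (the helper name and toy graph are ours, not from the paper) encodes a small undirected graph as one-hot node and node-pair tensors, with category 0 of the edge vector reserved for the absence of an edge:

```python
import numpy as np

def encode_graph(edges, node_types, n_nodes, d_v, d_a):
    """One-hot encode a graph as in Section 2.1:
    nodes -> (n_nodes, d_v) matrix of one-hot rows v_i,
    pairs -> (n_nodes, n_nodes, d_a) tensor a_{i,j}, where
    category 0 encodes the absence of an edge (a class of its own)."""
    V = np.zeros((n_nodes, d_v))
    V[np.arange(n_nodes), node_types] = 1.0
    A = np.zeros((n_nodes, n_nodes, d_a))
    A[..., 0] = 1.0                           # default: no edge
    for i, j, etype in edges:                 # etype in 1..d_a-1
        A[i, j] = 0.0; A[i, j, etype] = 1.0
        A[j, i] = 0.0; A[j, i, etype] = 1.0   # undirected graph
    return V, A

# toy path graph: 3 nodes of 2 types, one real edge type plus "no edge"
V, A = encode_graph([(0, 1, 1), (1, 2, 1)], [0, 1, 0], 3, d_v=2, d_a=2)
```

Every one-hot row lies on a vertex of the probability simplex, which is exactly the clean state $x_1$ from which the noising process departs.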
We denote by $e_i$ the $i$-th standard basis vector. We adopt the flow-matching convention $t \in [0, 1]$, with $t = 1$ corresponding to the data distribution $x_1 \sim q_{\text{data}}(x)$ and $t = 0$ to the noise distribution $x_0 \sim q_0(x)$. We denote by $dt$ a small but not necessarily infinitesimal time interval such that $t + dt \le 1$.

2.2. Denoising Models for Discrete Data

We group existing approaches for discrete data into three broad families: (1) continuous relaxations, (2) discrete noising, and (3) simplex denoising.

Approaches based on continuous relaxation embed discrete variables into an unconstrained Euclidean space and train conventional continuous diffusion models with, e.g., a standard Gaussian prior (Jo et al., 2022; 2024; Niu et al., 2020). By encoding unordered categories into $\mathbb{R}$, these methods create artefactual orderings and distances, which may bias generation.

Initiated by D3PM (Austin et al., 2021), discrete diffusion corrupts data by injecting discrete noise and models transitions directly in the discrete domain. Subsequent work extends this idea to continuous time (Campbell et al., 2022) and to discrete flow matching (Campbell et al., 2024; Gat et al., 2024). In graph generation, DiGress (Vignac et al., 2023) and DeFoG (Qin et al., 2025) exemplify this line of work. By jumping between discrete states during denoising, these models may alter the nature of the instance abruptly and drastically. For instance, removing one edge may disconnect a graph.

In simplex denoising, categorical variables are represented as points on the probability simplex (vertices for one-hot encodings). One strategy diffuses in $\mathbb{R}^K$ and leverages distributional correspondences, such as Gamma–Dirichlet or normal–logistic–normal, to map noisy samples back onto the simplex (Richemond et al., 2022; Floto et al., 2023). A second strategy generates noise directly on the simplex (Avdeyev et al., 2023; Stark et al., 2024).
In this work, we adopt the latter approach.

2.3. Non-Markovian Diffusion

In statistics and physics, the strict definition of a diffusion process entails the Markov property. In deep generative modeling, however, "diffusion" is used more broadly to denote a variety of forward noising schemes. By defining the forward kernel $q_{t \mid t+dt, 1}$ to depend on both the preceding noisy state and the clean sample, DDIM (Song et al., 2021) is, to our knowledge, the first non-Markovian diffusion model.

Recently, a family of non-Markovian formulations specifies the noising process uniquely as a function of the noise level $\alpha_t$ and the clean data $x_1$, making $q_{t|1}(x_t \mid x_1)$ independent of all other noisy states (Boget, 2025; Chen et al., 2025; Zhang et al., 2025). During denoising, this independence assumption removes direct dependencies on previous noisy states and thereby mitigates error accumulation ("compounding denoising errors") (Lezama et al., 2023). Our work is, to the best of our knowledge, the first to adapt this class of non-Markovian methods to simplex-based denoising.

2.4. Graph Generative Models

Recently, denoising models coupled with equivariant GNNs have become the dominant approach, as they have been shown to be highly effective (for other approaches, see Appendix C.1). Continuous-space methods include score-based diffusion (Yang et al., 2019; Jo et al., 2022), diffusion bridges (Jo et al., 2024), and flow matching (Eijkelboom et al., 2024). Discrete formulations such as diffusion in discrete time (Haefeli et al., 2022; Vignac et al., 2023) or in continuous time (Xu et al., 2024), flow matching (Qin et al., 2025), and non-Markovian discrete diffusion (Boget, 2025) also report strong results.
However, these approaches do not perform smooth diffusion on the probability simplex: they either noise graphs in an unconstrained Euclidean space or operate as jump processes over discrete states.

Beta diffusion (Liu et al., 2025) applies continuous noise on the 1-simplex via the Beta distribution. An extension to the $K$-dimensional probability simplex might appear straightforward. However, no model achieves this extension directly, indicating that it is more challenging than it appears. Liu et al. (2025) generalize their approach by encoding $K$-categorical variables as $K$ binary variables. Eijkelboom et al. (2024) adopt the Dirichlet flow-matching formulation but report poor performance, leading them to relax the noising process to $\mathbb{R}^K$. Recently, the Graph Bayesian Flow Network (Song et al., 2025) proposed a diffusion mechanism that adds Gaussian noise in $\mathbb{R}^K$ followed by a projection onto the simplex; sampling proceeds via a dual update scheme in which the update type is selected based on the amplitude of the modification performed by the previous step. Taken together, these developments suggest that simplex-based denoising is more difficult than one might expect. We address this challenge by unrestraining the denoising process, yielding a simple yet effective model that operates directly on the probability simplex. In the next section, we further motivate our approach.

2.5. Rationale for Denoising on the Simplex

We motivate denoising directly on the probability simplex by contrasting it with discrete denoising approaches and methods that operate in $\mathbb{R}^K$.

Discrete and continuous noising differ fundamentally in the structure of the corrupted data. In discrete schemes, noisy samples remain sparse, whereas continuous perturbations produce fully connected weighted graphs.
Continuous noising avoids discontinuities, most notably graph disconnection, and yields edge weights that naturally encode proximity, providing each node with information about its relative position to all others. More generally, continuous vectors are richer than one-hot representations of the same dimension and supply unique identifiers for nodes and edges, thereby alleviating the expressivity limitations of GNNs (Loukas, 2020). This obviates the need for the costly, hand-crafted structural features typically required by discrete diffusion and flow-matching methods (see Appendix C.2). Importantly, the sparsity of discrete noising does not translate into sparse GNN computation: edge representations and probabilities must still be evaluated for all node pairs, so the dense complexity remains.

Modeling categorical distributions in $\mathbb{R}^K$ introduces limitations that are avoided by operating directly on the probability simplex. First, points on the simplex have only $K - 1$ degrees of freedom; embedding them in $\mathbb{R}^K$ adds a redundant dimension that carries no information and unnecessarily complicates the dynamics. Second, simplex-based noising naturally defines explicit and interpretable probability paths over categorical distributions (Section 3), whereas Euclidean noising requires nonlinear projection or renormalization steps whose behavior is harder to characterize. Finally, points on the simplex admit a direct probabilistic interpretation, which is both conceptually elegant and facilitates the design of the denoising process (Section 4.2).

3. Noising on the Simplex

Noising processes defined on the probability simplex pose specific challenges. In particular, specifying $q_t$ as an interpolant between the endpoint distributions $q_1$ and $q_0$, as is common in discrete diffusion and flow matching, restricts the support of $q_t$, leaving large regions of $\mathcal{S}^K$ out of distribution and creating jump discontinuities in $q_t$.
In this section, we formalize the setting, introduce an explicit, well-behaved parameterization of the noising path, and propose new theoretical tools for the analysis of the noising dynamics.

3.1. Problem Setting

Let $\mathcal{S}^K$ denote the $(K-1)$-dimensional probability simplex, $\mathcal{S}^K = \{ x \in \mathbb{R}^K \mid \mathbf{1}_K^\top x = 1,\ x_i > 0\ \forall i \}$. We represent categorical node/edge attributes as vertices of the simplex, i.e., $x_1 \in \{e_i\}_{i \in [K]}$. A noising process on the simplex is a family of conditional distributions $(q_t)_{t \in [0,1)}$ with $q_t(\cdot \mid x_1)$ supported on $\mathcal{S}^K$ for all $t \in [0,1)$, and boundary conditions $\lim_{t \to 1} q_t(x_t \mid x_1) = \delta(x_t - x_1)$ and $q_0(x_t \mid x_1) = q_0(x_t)$, i.e., the terminal distribution collapses to the clean vertex and the initial distribution is independent of $x_1$.

Such processes can be specified either implicitly via a stochastic interpolant or explicitly by choosing a parametric family on the simplex. The interpolant construction is standard in discrete diffusion and flow matching, and also underlies Simple Iterative Denoising, which can be viewed as the discrete analogue of our approach. It sets $x_t = \alpha_t x_1 + (1 - \alpha_t) x_0$, with $x_1 \sim p_{\text{data}}$, $x_0 \sim q_0$, and $\alpha_t \in [0, 1]$. However, this formulation exhibits what Stark et al. (2024) call pathological properties. In particular, within the interpolant framework, the posterior $p_{1|t}(x_1 \mid x_t) \propto p_{\text{data}}(x_1)\, q_t(x_t \mid x_1)$ has support on at most $K - 1$ vertices for all noise levels $\alpha_t > 1/K$, so that the posterior is trivial for $\alpha_t > 0.5$. For all $t$, $q_t(x_t \mid x_1)$ is a uniform distribution over a subset of $\mathcal{S}^K$, and zero elsewhere, producing support discontinuities. Figure 1 illustrates such a process on the 2-simplex. Such piecewise-constant behavior is ill-suited for approximating posteriors with differentiable neural networks.
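The trivial-posterior pathology is easy to check numerically: under the interpolant, the coordinate inherited from $x_1$ is at least $\alpha_t$, while every other coordinate is at most $1 - \alpha_t$, so for $\alpha_t > 0.5$ the clean class can be read off $x_t$ deterministically. A minimal sketch, assuming a uniform $\mathrm{Dir}(\mathbf{1})$ noise distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
K, n, alpha_t = 5, 10_000, 0.6          # any alpha_t > 0.5 exhibits the effect

x1_idx = rng.integers(K, size=n)         # clean one-hot vertices e_i
x1 = np.eye(K)[x1_idx]
x0 = rng.dirichlet(np.ones(K), size=n)   # noise from the uniform simplex prior
xt = alpha_t * x1 + (1 - alpha_t) * x0   # linear interpolant x_t

# The clean class is recovered exactly from argmax(x_t): the posterior
# p_{1|t}(x_1 | x_t) is a point mass, i.e. trivial, once alpha_t > 0.5.
recovered = xt.argmax(axis=1)
print((recovered == x1_idx).mean())      # -> 1.0
```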
We avoid these pathologies by specifying an explicit, parametric probability path on the simplex.

Figure 1. Upper row: Noising with the linear interpolant creates discontinuities and, for $t > 0.5$, all points remain in their Voronoi region. Lower row: Noising with an explicit parametric probability path avoids discontinuities.

Dirichlet distributions (and their mixtures) offer a natural, tractable choice on $\mathcal{S}^K$. Recall the Dirichlet density function: $\mathrm{Dir}(x; \alpha) = B(\alpha)^{-1} \prod_{i=1}^{K} x_i^{\alpha_i - 1}$, where $B$ is the multivariate Beta function (see Equation 10). For instance, a simple yet useful noising path is defined as $q_t(x_t \mid x_1) = \mathrm{Dir}(x_t; \mathbf{1} + \alpha_t x_1)$, with a non-decreasing noise schedule $\alpha_t \in \mathbb{R}_{\ge 0}$ such that $\alpha_0 = 0$ and $\lim_{t \uparrow 1} \alpha_t = \infty$, e.g., $\alpha_t = -a \log(1 - t)$, where $a$ is a hyperparameter. This way, $q_t$ has full support on $\mathcal{S}^K$ for any $t$, thereby avoiding the support-collapse issues of interpolant-based methods. Figure 1 illustrates noising based on a parametric probability path.

3.2. Voronoi Probabilities

The Voronoi regions of the simplex vertices provide an intuitive lens on how a noising process behaves. Let $V_k \subseteq \mathcal{S}^K$ denote the Voronoi region of vertex $k$, $V_k = \{ x \in \mathcal{S}^K \mid d(x, e_k) \le d(x, e_j)\ \forall j \ne k \}$, with $d(\cdot, \cdot)$ the Euclidean distance. Geometrically, $V_k$ is the polytope of points whose nearest simplex vertex is $e_k$. Equivalently, we have $x \in V_k \iff x_k = \max_j x_j$. Given the conditional noising distribution $q_t(x_t \mid x_1)$, we define the Voronoi probability for vertex $k$ as the conditional probability $P_{v_k}(x_t \in V_k \mid x_1 = e_k)$, i.e., the probability that the closest vertex to $x_t$ remains its origin $x_1$. Intuitively, this quantity summarizes how quickly the unconditional distribution $q_t(x)$ "mixes".
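For the Dirichlet path, the Voronoi probability is straightforward to estimate by Monte Carlo, which also provides a numerical check of the closed form of Proposition 3.1 (a sketch, not the paper's code; the exponent $-\alpha_t - 1$ in our implementation follows from the Gamma representation of the Dirichlet):

```python
import numpy as np
from math import comb

def voronoi_prob_closed_form(K, alpha_t):
    """Closed form of Proposition 3.1 for x ~ Dir(1 + alpha_t * e_i):
    P(x in V_i) = sum_{k=0}^{K-1} (-1)^k C(K-1, k) (k+1)^(-alpha_t - 1)."""
    return sum((-1) ** k * comb(K - 1, k) * (k + 1) ** (-alpha_t - 1.0)
               for k in range(K))

def voronoi_prob_mc(K, alpha_t, n=200_000, seed=0):
    """Monte Carlo estimate: fraction of samples whose nearest vertex
    (equivalently, largest coordinate) is still the clean vertex e_0."""
    rng = np.random.default_rng(seed)
    conc = np.ones(K); conc[0] += alpha_t            # 1 + alpha_t * e_0
    x = rng.dirichlet(conc, size=n)
    return (x.argmax(axis=1) == 0).mean()

print(voronoi_prob_closed_form(3, 0.0))   # alpha_t = 0 gives Dir(1): 1/K
print(voronoi_prob_closed_form(3, 4.0), voronoi_prob_mc(3, 4.0))
```

At $\alpha_t = 0$ the path is the uniform prior and the Voronoi probability is $1/K$; as $\alpha_t \to \infty$ it tends to 1, matching the boundary conditions of the noising process.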
For the stochastic-interpolant construction, for instance, one has $P(x_t \in V_i \mid x_1 = e_i) = 1$ for all $t$ such that $\alpha_t > 0.5$; and conversely, the posterior over vertices collapses to the Voronoi indicator $P_{1|t}(x_1 = k \mid x_t) = \mathbb{1}[x_t \in V_k]$. For the probability path $\mathrm{Dir}(x_t; \mathbf{1} + \alpha_t x_1)$ introduced above, the Voronoi probability admits a closed-form expression:

Proposition 3.1. Assume $x \sim \mathrm{Dir}(\mathbf{1} + \alpha_t e_i)$. Then,

$$P_{v_i}(x \mid e_i) = \sum_{k=0}^{K-1} (-1)^k \binom{K-1}{k} (k+1)^{-\alpha_t - 1}. \quad (1)$$

All proofs are presented in Appendix A. We leverage Voronoi probabilities to design and calibrate explicit probability paths. In Appendix E.2.2, we evaluate our model across multiple values of $a$, demonstrating the usefulness of Voronoi probabilities for calibrating the noise schedule.

4. Method

In this section, we present our Unrestrained Simplex Denoising (UNSIDE) method. As with any other denoising model, it includes a noising process that progressively corrupts the data and a learned backward denoising process. The multivariate forward (noising) process acts independently on each dimension $x^{(i)}$, $i \in [L]$, noising instances on the multi-simplex $\mathcal{S}^L_K$, where $L$ is the number of dimensions. In contrast, during the reverse process used at inference, dependencies across dimensions are captured by a learned denoiser that outputs element-wise logits conditioned on all dimensions $x^{(1:L)}_t$. This setup follows prior work on discrete diffusion and flow matching (Austin et al., 2021; Vignac et al., 2023; Campbell et al., 2024; Stark et al., 2024; Boget, 2025). Our UNSIDE framework can be viewed as a continuous analogue of Simple Iterative Denoising (SID) (Boget, 2025).
Importantly, extending SID to the continuous setting, much like adapting diffusion or flow-matching frameworks across data modalities, requires substantial methodological modifications and represents a significant contribution of this work. A detailed comparison between UNSIDE and SID is provided in Appendix C.3.

4.1. Noising

Our noising mechanism relies on two key components: (i) a probability path that governs corruption at any continuous time $t \in [0, 1]$; and (ii) a conditional-independence assumption across different times.

Probability path. We define a probability path $(q_t)_{t \in [0,1)}$ with boundary conditions $\lim_{t \to 1} q_t(x \mid x_1) = \delta(x - x_1)$ and $q_0(x \mid x_1) = q_0(x)$. Since the forward process operates independently across dimensions, the distribution at time $t$ factorizes as

$$q_t\big(x^{(1:L)} \mid x^{(1:L)}_1\big) = \prod_{i=1}^{L} q_t\big(x^{(i)} \mid x^{(i)}_1\big). \quad (2)$$

In principle, our method accommodates any path $(q_t)_{t \in [0,1)}$. For efficient training, however, we require (a) the ability to easily sample from $q_t$ for all $t$, and (b) well-behaved noise distributions as described in Section 3. We further discuss the choice of the probability path in the context of graph generation in Section 4.6. In practice, Dirichlet distributions (or mixtures thereof) provide a convenient choice, e.g., $q_t(x \mid x_1) = \mathrm{Dir}(\mathbf{1} + \alpha_t x_1)$, where $\alpha_t \in \mathbb{R}_{\ge 0}$ is an increasing function with $\alpha_0 = 0$ and $\lim_{t \to 1} \alpha_t = \infty$.

Independence across time. Unlike standard diffusion models such as DDPM, DDIM, or D3PM, we do not define direct dependencies between $x_t$ and $x_{t+dt}$. Instead, we assume conditional independence of intermediate noisy states given the clean input, as formalized in the following assumption:

Assumption 4.1. For any $s \ne t$, we assume the following conditional-independence property:

$$q_t(x_t \mid x_s, x_1) = q_t(x_t \mid x_1).$$
(3)

Because training typically draws a single noisy sample at a randomly chosen time $t$, this assumption has no practical effect on the training objective. It does, however, have major implications for the denoising process.

4.2. Denoising

Let $P_{1|t}(x^{(i)}_1 \mid x^{(1:L)}_t)$ denote the posterior distribution of a clean element given a noisy instance at time $t$. Since this posterior $P_{1|t}(x^{(i)}_1 \mid x^{(1:L)}_t) = \mathrm{Cat}(x; \pi)$ is a categorical distribution, it is fully specified by its parameter $\pi$. We denote by $p_{t+dt}(x^{(i)}_{t+dt} \mid x^{(1:L)}_t)$ a single denoising transition and obtain the following result.

Proposition 4.2. Given $P_{1|t}(x^{(i)}_1 \mid x^{(1:L)}_t)$ and Assumption 4.1, the one-step denoising kernel satisfies

$$p_{t+dt}(x^{(i)}_{t+dt} \mid x^{(1:L)}_t) = \sum_{x^{(i)}_1} q_{t+dt}(x^{(i)}_{t+dt} \mid x^{(i)}_1)\, P_{1|t}(x^{(i)}_1 \mid x^{(1:L)}_t) = \mathbb{E}_{x^{(i)}_1 \sim P_{1|t}(\cdot \mid x^{(1:L)}_t)}\Big[ q_{t+dt}(x^{(i)}_{t+dt} \mid x^{(i)}_1) \Big]. \quad (4)$$

Thus, each denoising step can be interpreted as (i) sampling a clean element $x^{(i)}_1$ from the posterior $P_{1|t}(x^{(i)} \mid x^{(1:L)}_t)$, followed by (ii) re-noising this sample via $q_{t+dt}(\cdot \mid x^{(i)}_1)$. Equivalently, we sample from the noisy distribution under the expected clean posterior. We emphasize that the simplicity of these formulations relies on the fact that $q_{t+dt}(x^{(i)}_{t+dt} \mid x^{(i)}_1)$ remains on the simplex.

At inference time, we fix the number of function evaluations (NFE) to $T$ and use a uniform step size $dt = 1/T$. The full reverse procedure consists of iterating the transition in Equation 4 $T$ times.

4.3. Convergence and Corrector Sampling

For simplicity, we first consider the univariate case. By fixing the time index in Equation 4, we obtain the Markov kernel $p_{t'|t}(x'_t \mid x_t) = \sum_{x_1} q_t(x'_t \mid x_1)\, P_{1|t}(x_1 \mid x_t)$.

Proposition 4.3.
Assume that $P_{1|t}(x_1 \mid x_t)$ and $q_t(x_t \mid x_1)$ have full support on $[K]$ and $\mathcal{S}^K$, respectively. Further assume that the conditional independence holds: $q_t(x'_t \mid x_t, x_1) = q_t(x'_t \mid x_1)$. Then the Markov kernel $p_{t'|t}(x'_t \mid x_t) = \sum_{x_1} P_{1|t}(x_1 \mid x_t)\, q_t(x'_t \mid x_1)$ converges to the stationary distribution

$$\pi_t(x_t) = \sum_{x_1} p(x_1)\, q_t(x_t \mid x_1). \quad (5)$$

Fixing the time index during denoising therefore yields a simple corrector sampler, analogous in spirit to the corrector step of Gat et al. (2024) for discrete flow matching. The proof extends easily to the multivariate case (see Appendix A). When the true posterior $P_{1|t}(x_1 \mid x_t)$ is used, iteratively sampling from the kernel converges to $\pi_t$. Thus, iterating denoising and corrector steps to stationarity recovers samples from $p(x_1)$.

In practice, the true posterior $P_{1|t}(x_1 \mid x_t)$ is not available and must be approximated by a parameterized model $P^\theta_{1|t}(x_1 \mid x_t)$. The resulting approximation error induces an error in the denoising transition, which is bounded as follows:

Proposition 4.4. The KL divergence between the true and approximate one-step denoising distributions is upper bounded by the KL divergence between the true and approximate posterior distributions:

$$D_{KL}\big( p_{t+dt}(\cdot \mid x_t) \,\big\|\, p^\theta_{t+dt}(\cdot \mid x_t) \big) \le D_{KL}\big( P_{1|t}(\cdot \mid x_t) \,\big\|\, P^\theta_{1|t}(\cdot \mid x_t) \big). \quad (6)$$

This proposition guarantees that the posterior approximation error does not lead to a larger KL error in the induced one-step denoising transition.

4.4. Advantages over Diffusion and Flow Matching

While our procedure resembles standard diffusion and flow matching, there is a key distinction. In classical reverse processes one typically enforces $\lim_{dt \to 0} p_{t+dt|t}(x \mid x_t) = \delta(x - x_t)$. By Assumption 4.1, our model does not impose this constraint.
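Concretely, one reverse pass under Equation 4 with the Dirichlet path can be sketched as follows. Here `toy_posterior` is a hypothetical stand-in for the learned GNN denoiser $P^\theta_{1|t}$ (ours, not the paper's architecture), and the schedule $\alpha_t = -a \log(1 - t)$ is the one suggested in Section 3:

```python
import numpy as np

rng = np.random.default_rng(0)
K, L, T, a = 4, 8, 50, 2.0                # categories, dimensions, NFE, schedule scale
alpha = lambda t: -a * np.log(max(1.0 - t, 1e-9))   # alpha_t = -a log(1 - t)

def toy_posterior(xt, t):
    """Placeholder for the learned posterior P^theta_{1|t}(x_1 | x_t^{(1:L)}):
    here we merely sharpen the noisy coordinates into categorical probabilities."""
    logits = np.log(xt + 1e-9) * (1.0 + 4.0 * t)
    p = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return p / p.sum(axis=-1, keepdims=True)

# Start from the uniform simplex prior q_0 = Dir(1), then iterate Eq. (4):
# (i) sample a clean candidate x_1 from the posterior, (ii) re-noise it at t + dt.
xt = rng.dirichlet(np.ones(K), size=L)
for step in range(T):
    t = step / T
    probs = toy_posterior(xt, t)                       # element-wise posteriors
    x1_idx = np.array([rng.choice(K, p=p) for p in probs])
    conc = 1.0 + alpha(t + 1.0 / T) * np.eye(K)[x1_idx]
    xt = np.vstack([rng.dirichlet(c) for c in conc])   # x_{t+dt} ~ Dir(1 + alpha x_1)

x1_final = xt.argmax(axis=1)   # read off the generated categories at t = 1
```

Note that each iterate is a fresh draw conditioned on the posterior sample, not an interpolation toward the previous state, which is exactly the resampling behavior discussed above.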
Intuitively, diffusion and flow matching update the sample by interpolating between the current noisy state and the denoiser's prediction, whereas our method directly resamples a less noisy state conditioned on the denoiser's prediction.

This observation provides intuition for the sampling benefits of our approach. In diffusion and flow matching, if $x_t$ lies in a region where the learned denoiser $P^\theta_{1|t}$ poorly approximates the true posterior $P_{1|t}$, or in a region highly sensitive to small perturbations (e.g., near equilibrium points), then the next iterate $x_{t+dt}$ remains close to $x_t$, causing approximation errors to compound over time. In contrast, our method resamples from $P^\theta_{1|t}$ at each step, so $x_{t+dt}$ need not stay in the same local region. This reduces error compounding by allowing the trajectory to escape regions that are poorly approximated and/or sensitive to small perturbations. Our experiments demonstrate that UNSIDE consistently outperforms diffusion-based models and Dirichlet Flow Matching by a large margin.

In addition, our formulation constitutes a substantial simplification relative to standard denoising methods such as diffusion and flow matching. Compared to diffusion, Assumption 4.1 yields the simplification $q_{t+dt}(x_{t+dt} \mid x_1, x_t) = q_{t+dt}(x_{t+dt} \mid x_1)$. Compared to flow matching, it removes the need to construct a vector field that realizes the chosen probability path and to evaluate its derivatives at every denoising step, operations that are computationally expensive (see the sampling-time comparison in Appendix E.1). In contrast, our method is both simple and computationally efficient. We therefore regard this simplification as an important contribution.

4.5. Parametrization and Learning

As in discrete diffusion, the posterior $P_{1|t}(x_1 \mid x^{(1:L)}_t)$ is intractable.
We approximate it with a neural network $f_\theta(x^{(1:L)}_t, t) = \pi^{(1:L)}$, referred to as the denoiser, which outputs element-wise logits for the categorical distribution. To respect graph symmetries, we instantiate $f_\theta$ as a Graph Neural Network (GNN), ensuring equivariance to node permutations. Our training objective matches the $x$-prediction of discrete denoising and discrete flow-matching methods, so any of their criteria can be adopted. Following Vignac et al. (2023), we minimize a weighted negative log-likelihood:

$$\mathcal{L}_\theta = \mathbb{E}\Big[ \gamma \sum_{v_1} -\log p_\theta(v_1 \mid G_t) + (1 - \gamma) \sum_{a_1} -\log p_\theta(a_1 \mid G_t) \Big], \quad (7)$$

where the expectation is over $t \sim \mathcal{U}(0, 1)$, $G_1 \sim p_{\text{data}}$, and $G_t \sim q_t(G_t \mid G_1)$, and $\gamma$ is a weighting factor between nodes and edges.

4.6. Probability Path and Prior Choices

A natural choice for the noise prior is the uniform distribution over the simplex, $q_0(x) = \mathrm{Dir}(\mathbf{1})$. However, graphs are typically sparse, with highly imbalanced edge distributions. In discrete diffusion, Vignac et al. (2023) have shown that the empirical marginal distribution is theoretically optimal and empirically effective. Motivated by this result, we consider a marginal-weighted Dirichlet mixture prior: $q^{\text{marg}}_0(x) = \sum_{k=1}^{K} m_k\, \mathrm{Dir}(x; \mathbf{1} + \kappa e_k)$, where $m = (m_1, \dots, m_K)$ denotes the empirical marginal over categories, $e_k$ is the $k$-th standard basis vector, and $\kappa > 0$ is a user-specified concentration parameter. We can directly parametrize the prior as such, but we can also approximate this prior using a model trained with the probability path $q^\star_t(x_t \mid x_1) := \mathrm{Dir}(x_t; \mathbf{1} + \alpha_t x_1)$. In that case, we have, dimension-wise, $\mathbb{E}_{x_1 \sim p_{\text{data}}}[q^\star_t(x_t \mid x_1)] = q^{\text{marg}}_0(x_0) \iff \alpha_t = \kappa$. This identity does not directly extend to the $L$-dimensional joint, since in general the $L$ dimensions are not independent.
Nevertheless, when $q^\star_t$ is sufficiently close to the stationary distribution $q^\star_0$, we can leverage the following approximation:

$$\mathbb{E}_{x^{(1:L)}_1 \sim p_{\text{data}}}\Big[ q^\star_t\big(x^{(1:L)}_t \mid x^{(1:L)}_1\big) \Big] \approx \prod_{l=1}^{L} \mathbb{E}_{x^{(l)}_1}\Big[ q^\star_t\big(x^{(l)}_t \mid x^{(l)}_1\big) \Big] = q^{\text{marg}}_0\big(x^{(1:L)}\big), \quad (8)$$

with $\alpha_t = \kappa$. Operationally, we can thus initialize inference with $x^{(1:L)}_0 \sim q^{\text{marg}}_0$ and then run the reverse process along the path $q^\star_t$ with $\alpha_0 = \kappa$. We show in Section 5 that this initialization improves generative performance.

4.7. Guidance

While high-quality unconditional generation is a prerequisite, conditioning on graph-level properties is necessary for many downstream tasks. In drug discovery, for example, one often seeks molecules that are both synthetically accessible and highly active against specified targets. Denoising-based models typically steer samples via either classifier guidance or classifier-free guidance; our framework supports both, because these conditioning mechanisms are direct adaptations of established techniques. Due to space limitations, we provide implementation details in Appendix B. We demonstrate the effectiveness of our approach for property-conditioned molecular generation in Section 5.

5. Evaluation and Results

We evaluate our model on both molecular and synthetic graph datasets. For molecular data, we adopt two standard benchmark datasets: QM9 and ZINC250K. Regarding QM9, we prefer the more challenging version with explicit hydrogens (QM9H), as the smaller version reaches saturation. The QM9 dataset contains 133,885 molecules with up to 29 atoms of 5 types. The ZINC250K dataset contains 250,000 molecular graphs with up to 38 heavy atoms of 9 types. For generic graphs, we run experiments on the Planar and Stochastic Block Model (SBM) datasets. Both contain 200 unattributed graphs, with 64 and up to 200 nodes, respectively.
Visualizations of generated molecules and graphs are available in Appendix F. Our code will be released upon acceptance.

5.1. Baselines

We compare against two categories of baselines: (i) published models representing different denoising approaches, and (ii) our own implementations for controlled comparisons.

We report results for five representative methods. DiGress (Vignac et al., 2023) implements standard discrete diffusion; GruM (Jo et al., 2024) performs denoising in a continuous space; GraphBFN (GraBFN) is a Bayesian flow-network-based generative model; SID (Boget, 2025) is a non-Markovian discrete diffusion method; and DeFoG (Qin et al., 2025) is a discrete flow-matching model. In general, we reproduce the numbers reported in the original papers, except for the ZINC250K results of DiGress, for which we include the values provided in Jo et al. (2024), as DiGress does not report on this dataset. For the Planar and SBM datasets, we reran experiments for baselines that reported results from a single sampling run. We motivate this choice and describe our reproduction protocol in Appendix D.7. Finally, we also report metrics computed on random samples of the training set, each containing the same number of graphs as the generated samples (Train.).

To isolate the effect of our sampling formulation, we train three models with identical architectures and training hyperparameters for each experiment: a discrete denoiser, a simplex-based denoiser, and a continuous denoiser. The only notable difference is that the discrete denoiser includes the extra input features used by DiGress. For the discrete and the simplex-based denoisers, we evaluate two sampling procedures. For the discrete denoiser, we use standard discrete diffusion (DiscDiff, as in DiGress) and its non-Markovian variant (NM-DD, as in SID).
The continuous denoiser is identical to the simplex denoiser, but its inputs lie in $\mathbb{R}^{K \times L}$ instead of $\mathcal{S}_K^L$; we refer to it as non-Markovian Continuous Diffusion (NM-CD; see Appendix for details). On the simplex, we implement the Dirichlet flow-matching sampler of Stark et al. (2024) (DiriFM), as well as our unrestrained simplex denoising (UnSiDe) sampler. Experimental and implementation details are provided in Appendix D.5.

5.2. Results

We report means over five independent sampling runs. The evaluation protocol, additional metrics, and standard deviations (std.) are provided in Appendices D.5 and E. We highlight in bold all results that fall within the training-set error margin; if none do, we bold the best result.

Molecule Generation. We evaluate generative performance by measuring distributional similarity in chemical space (FCD), subgraph structures (NSPDK, reported ×10³), and chemical validity without correction. Our UnSiDe model surpasses all baselines, often by a significant margin, on almost all metrics, except for FCD on QM9H. In particular, it generates very few invalid molecules.

Table 1. Results on QM9H.
           Valid (%) ↑   FCD ↓    NSPDK ↓
Train.     98.90         0.062    0.121
DiGress    95.4          –        –
DiscDiff   22.29         4.246    41.932
NM-DD      97.97         0.366    1.149
NM-CD      31.29         1.69     22.675
DiriFM     92.20         0.356    0.495
UnSiDe     98.87         0.152    0.487

Unattributed Graph Generation. For unattributed graphs, we measure distributional similarity via Maximum Mean Discrepancy (MMD) over degree distributions, clustering coefficients, orbit counts, and spectral densities. On Planar and SBM, we follow the procedure of Martinkus et al. (2022): MMDs are reported ×10³, and we also report validity, i.e., planarity and statistical consistency with the true stochastic block model. On Enzymes, we follow the procedure of Jo et al. (2022).
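To make these MMD metrics concrete, here is a minimal degree-MMD sketch with a Gaussian kernel over normalized degree histograms. This is an illustration only; the benchmark numbers follow the exact kernels and protocols of Martinkus et al. (2022) and Jo et al. (2022), and the helper names are ours:

```python
import numpy as np

def degree_hist(adj, max_deg):
    """Normalized degree histogram of one graph given its adjacency matrix."""
    deg = adj.sum(axis=1).astype(int)
    h = np.bincount(deg, minlength=max_deg + 1).astype(float)
    return h / h.sum()

def gaussian_mmd2(X, Y, sigma=1.0):
    """Squared MMD between two sets of histograms under a Gaussian kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

# toy usage: comparing a set of graphs with itself yields zero MMD
a = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
b = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.stack([degree_hist(a, 3), degree_hist(b, 3)])
print(round(gaussian_mmd2(X, X), 6))  # → 0.0 (identical sets)
```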
In particular, MMDs are computed with the Earth Mover's Distance instead of the total variation distance.

Table 2. Results on ZINC250K.
           Valid (%) ↑   FCD ↓   NSPDK ↓
Train.     100.00        1.13    0.10
DiGress    94.99         3.48    2.1
GruM       98.65         2.26    1.5
GraBFN     99.22         2.12    1.3
SiD        99.50         2.06    2.01
DiscDiff   74.17         4.78    4.08
NM-DD      99.92         2.65    3.48
NM-CD      57.96         9.51    13.65
DiriFM     97.32         2.79    1.93
UnSiDe     99.98         1.79    1.00

On Planar (Table 3), our model reaches a perfect validity rate while achieving results comparable to the best baseline on the remaining metrics. On SBM (Table 4), we again obtain the highest validity and achieve results on the remaining metrics that are comparable to the strongest baseline. On SBM, we do not report DeFoG: it reports relatively low uniqueness and novelty, as well as a degree MMD significantly below the reference value (between training and test set). Under these conditions, we consider the results for the other metrics not meaningful.

Table 3. Results on Planar.
           Valid ↑   Deg. ↓   Clust. ↓   Orbit ↓   Spect. ↓
Train.     100.0     0.25     36.6       0.8       6.6
DiGress    45.5      0.71     45.4       1.31      8.4
GruM       91.0      0.38     40.7       6.42      7.9
SiD        91.3      5.9      163.4      19.1      7.6
DeFoG      99.5      0.5      50.1       0.6       7.2
DiscDiff   0.0       56.1     294.0      1410.0    85.1
NM-DD      98.0      14.1     363.0      27.2      6.9
NM-CD      0.0       1.2      226.3      45.8      22.0
DiriFM     0.0       6.4      196.5      45.0      21.7
UnSiDe     100.0     0.36     39.9       0.78      7.1

Table 4. Results on SBM.
           Valid ↑   Deg. ↓   Clust. ↓   Orbit ↓   Spect. ↓
Train.     93.5      1.57     50.1       37.0      4.5
DiGress    51.0      1.28     51.5       39.6      5.0
GruM       67.5      2.20     49.9       40.4      5.1
SiD        63.5      11.5     51.4       123.1     5.9
DiscDiff   0.0       1.82     86.4       125.7     11.2
NM-DD      60.5      4.38     50.9       52.9      5.7
NM-CD      57.5      1.89     50.3       41.1      5.4
DiriFM     46.0      4.26     53.0       53.0      5.0
UnSiDe     78.5      1.74     49.9       52.1      5.9

Table 5. Results on Enzymes.
           Deg. ↓   Clust. ↓   Orbit ↓   Spect. ↓
Train.     –        –          –         –
GraphVAE   1.369    0.629      0.191     –
GraphRNN   0.017    0.062      0.046     –
GDSS       0.026    0.061      0.009     –
DiscDiff   0.212    0.631      0.515     0.020
NM-CD      –        –          –         –
NM-DD      0.013    0.039      0.010     0.017
DiriFM     0.007    0.101      0.006     0.020
UnSiDe     0.010    0.052      0.002     0.020

5.3. Ablations and Efficiency

We study the impact of key design choices: the unrestrained denoising formulation, the number of function evaluations (NFE), the noise scheduler, sampling efficiency, and the initialization of the sampling noise. First, DiriFM and our unrestrained simplex denoising use the same denoiser; the only difference lies in the sampling strategy. Across all experiments and metrics, our unrestrained sampler consistently outperforms simplex flow matching, often by a large margin, demonstrating the empirical superiority of our sampling method. Second, we ablate the NFE on ZINC250K. Results are shown in Figure 2 and Table 9. Our model attains state-of-the-art validity in as few as 64 steps, with competitive FCD and NSPDK. As expected, all metrics improve monotonically with larger NFE. Third, Appendix E.1 shows that our model is highly efficient at sampling: it is the fastest among all compared methods and achieves these results using fewer parameters than the strongest baselines. Fourth, Appendix E.2.2 demonstrates that Voronoi probabilities provide an effective tool for calibrating our noise scheduler, and that our model remains robust under substantial variations of the scheduling parameter.
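The Voronoi probabilities used for this calibration admit the closed form derived in Appendix A.1: for $x \sim \mathrm{Dir}(1, \dots, a, \dots, 1)$, $P_{v_i} = \sum_{k=0}^{K-1} (-1)^k \binom{K-1}{k} (k+1)^{-a}$. A quick Monte Carlo sanity check of this identity (the values of $K$ and $a$ below are illustrative, not the paper's settings):

```python
import math
import numpy as np

def voronoi_prob_closed_form(K, a):
    """P(x_i = max_j x_j) for x ~ Dir(1,...,a,...,1), per Appendix A.1."""
    return sum((-1) ** k * math.comb(K - 1, k) / (k + 1) ** a
               for k in range(K))

def voronoi_prob_monte_carlo(K, a, n=200_000, seed=0):
    """Estimate the same probability via the Gamma representation of the Dirichlet."""
    rng = np.random.default_rng(seed)
    g_i = rng.gamma(a, size=n)                 # boosted coordinate, Gamma(a, 1)
    g_rest = rng.gamma(1.0, size=(n, K - 1))   # remaining coordinates, Gamma(1, 1)
    return (g_i >= g_rest.max(axis=1)).mean()

K, a = 4, 3.0
exact = voronoi_prob_closed_form(K, a)
approx = voronoi_prob_monte_carlo(K, a)
```

For $a = 1$ the Dirichlet is uniform and the formula correctly reduces to $1/K$.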
Finally, we compare two noise initializations: the uniform prior $q_0 = \mathrm{Dir}(\mathbf{1})$ versus the marginal mixture $q_0^{\mathrm{marg}}$ with $\kappa = 2$ (Section 4.6). Marginal initialization yields a small but consistent improvement on all metrics (see Figure 5).

5.4. Guidance

We evaluate conditional generation toward a target molecular property using the Quantitative Estimation of Drug-likeness (QED) score (as implemented in RDKit). On QM9H, we apply classifier guidance to steer samples toward prescribed standardized QED values. Concretely, we draw 1,000 target QEDs from the test set, generate 1,000 molecules conditioned on these targets, and compute the QEDs of the generated molecules with RDKit. Guidance effectiveness is quantified by the mean absolute error (MAE) between the target and generated QEDs.

Results are reported in Table 11. We observe that classifier guidance effectively steers generation toward the prescribed QED values (reducing the MAE from 1.06 to 0.42) without hindering the model's ability to generate valid molecules.

Figure 2. Evolution of metrics on ZINC250K as a function of the number of function evaluations (NFE).

6. Conclusion

We introduced Unrestrained Simplex Denoising (UnSiDe), a new denoising paradigm for categorical data that operates directly on the probability simplex and relaxes the Markovian constraints commonly imposed by diffusion and flow-matching methods. On molecular and synthetic graph benchmarks, UnSiDe matches or surpasses strong discrete diffusion and flow-matching baselines, reaches state-of-the-art validity with few denoising steps, and enables effective property-conditioned generation. Ablations demonstrate the benefits of the non-Markovian formulation, a favorable NFE–quality trade-off, and the utility of marginal-based initialization on the simplex.
More broadly, UnSiDe represents a new class of simplex-based generative models, opening promising directions such as extending the framework to additional discrete modalities (e.g., sequences and structured tabular data), exploring alternative probability paths, and integrating hybrid discrete–continuous variables.

Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.

References

Austin, J., Johnson, D. D., Ho, J., Tarlow, D., and van den Berg, R. Structured denoising diffusion models in discrete state-spaces. In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems, volume 34, pp. 17981–17993. Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper_files/paper/2021/file/958c530554f78bcd8e97125b70e6973d-Paper.pdf.

Avdeyev, P., Shi, C., Tan, Y., Dudnyk, K., and Zhou, J. Dirichlet diffusion score model for biological sequence generation. In Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., and Scarlett, J. (eds.), Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pp. 1276–1301. PMLR, 23–29 Jul 2023. URL https://proceedings.mlr.press/v202/avdeyev23a.html.

Bergmeister, A., Martinkus, K., Perraudin, N., and Wattenhofer, R. Efficient and scalable graph generation through iterative local expansion. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=2XkTz7gdpc.

Biswas, S. Various proofs of the fundamental theorem of Markov chains, 2022. URL https://arxiv.org/abs/2204.00784.

Boget, Y. Simple and critical iterative denoising: A recasting of discrete diffusion in graph generation.
In Proceedings of the 42nd International Conference on Machine Learning, Proceedings of Machine Learning Research. PMLR, July 2025.

Boget, Y., Gregorova, M., and Kalousis, A. Discrete graph auto-encoder. Transactions on Machine Learning Research, 2024. ISSN 2835-8856. URL https://openreview.net/forum?id=bZ80b0wb9d.

Boget, Y., Strasser, P., and Kalousis, A. Hierarchical equivariant graph generation, 2025. URL https://openreview.net/forum?id=uEqOYXtn7f.

Campbell, A., Benton, J., De Bortoli, V., Rainforth, T., Deligiannidis, G., and Doucet, A. A continuous time framework for discrete denoising models. In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A. (eds.), Advances in Neural Information Processing Systems, volume 35, pp. 28266–28279. Curran Associates, Inc., 2022.

Campbell, A., Yim, J., Barzilay, R., Rainforth, T., and Jaakkola, T. Generative flows on discrete state-spaces: enabling multimodal flows with applications to protein co-design. In Proceedings of the 41st International Conference on Machine Learning, pp. 5453–5512, 2024.

Chen, X., He, J., Han, X., and Liu, L. Efficient and degree-guided graph generation via discrete diffusion modeling. In Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., and Scarlett, J. (eds.), Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pp. 4585–4610. PMLR, 23–29 Jul 2023. URL https://proceedings.mlr.press/v202/chen23k.html.

Chen, Z., Yuan, H., Li, Y., Kou, Y., Zhang, J., and Gu, Q. Fast sampling via discrete non-Markov diffusion models with predetermined transition time. In Proceedings of the 38th International Conference on Neural Information Processing Systems, NIPS '24, Red Hook, NY, USA, 2025. Curran Associates Inc. ISBN 9798331314385.

Costa, F. and De Grave, K.
Fast neighborhood subgraph pairwise distance kernel. In Fürnkranz, J. and Joachims, T. (eds.), Proceedings of the 27th International Conference on Machine Learning (ICML-10), June 21-24, 2010, Haifa, Israel, pp. 255–262. Omnipress, 2010. URL https://icml.cc/Conferences/2010/papers/347.pdf.

De Cao, N. and Kipf, T. MolGAN: An implicit generative model for small molecular graphs. arXiv:1805.11973 [cs, stat], May 2018. URL https://arxiv.org/abs/1805.11973.

Eijkelboom, F., Bartosh, G., Naesseth, C. A., Welling, M., and van de Meent, J.-W. Variational flow matching for graph generation. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=UahrHR5HQh.

Floto, G., Jonsson, T., Nica, M., Sanner, S., and Zhu, E. Z. Diffusion on the probability simplex, 2023. URL https://arxiv.org/abs/2309.02530.

Gat, I., Remez, T., Shaul, N., Kreuk, F., Chen, R. T. Q., Synnaeve, G., Adi, Y., and Lipman, Y. Discrete flow matching. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=GTDKo3Sv9p.

Goyal, N., Jain, H. V., and Ranu, S. GraphGen: A scalable approach to domain-agnostic labeled graph generation. In Huang, Y., King, I., Liu, T., and van Steen, M. (eds.), WWW '20: The Web Conference 2020, Taipei, Taiwan, April 20-24, 2020, pp. 1253–1263. ACM / IW3C2, 2020. doi: 10.1145/3366423.3380201. URL https://doi.org/10.1145/3366423.3380201.

Haefeli, K. K., Martinkus, K., Perraudin, N., and Wattenhofer, R. Diffusion models for graphs benefit from discrete state spaces. In The First Learning on Graphs Conference, 2022. URL https://openreview.net/forum?id=CtsKBwhTMKg.

Jo, J., Lee, S., and Hwang, S. J. Score-based generative modeling of graphs via the system of stochastic differential equations. Proceedings of the 39th International Conference on Machine Learning, 162:10362–10383, 2022.
URL http://arxiv.org/abs/2202.02514. Code: https://github.com/harryjo97/GDSS.

Jo, J., Kim, D., and Hwang, S. J. Graph generation with diffusion mixture. In Salakhutdinov, R., Kolter, Z., Heller, K., Weller, A., Oliver, N., Scarlett, J., and Berkenkamp, F. (eds.), Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pp. 22371–22405. PMLR, 21–27 Jul 2024. URL https://proceedings.mlr.press/v235/jo24b.html.

Karami, M. HiGen: Hierarchical graph generative networks. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=KNvubydSB5.

Kong, L., Cui, J., Sun, H., Zhuang, Y., Prakash, B. A., and Zhang, C. Autoregressive diffusion model for graph generation. In Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., and Scarlett, J. (eds.), Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pp. 17391–17408. PMLR, 23–29 Jul 2023. URL https://proceedings.mlr.press/v202/kong23b.html.

Krawczuk, I., Abranches, P., Loukas, A., and Cevher, V. GG-GAN: A geometric graph generative adversarial network, 2021. URL https://openreview.net/forum?id=qiAxL3Xqx1o.

Lezama, J., Salimans, T., Jiang, L., Chang, H., Ho, J., and Essa, I. Discrete predictor-corrector diffusion models for image synthesis. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=VM8batVBWvg.

Liao, R., Li, Y., Song, Y., Wang, S., Hamilton, W., Duvenaud, D. K., Urtasun, R., and Zemel, R. Efficient graph generation with graph recurrent attention networks. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.

Liu, J., Kumar, A., Ba, J., Kiros, J., and Swersky, K. Graph normalizing flows.
In Advances in Neural Information Processing Systems, volume 32, 2019.

Liu, X., He, Y., Chen, B., and Zhou, M. Advancing graph generation through beta diffusion. In 13th International Conference on Learning Representations (ICLR 2025), 2025.

Loukas, A. What graph neural networks cannot learn: depth vs width. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=B1l2bp4YwS.

Luo, Y., Yan, K., and Ji, S. GraphDF: A discrete flow model for molecular graph generation. Proceedings of the 38th International Conference on Machine Learning, 139:7192–7203, 2021. URL https://arxiv.org/abs/2102.01189.

Madhawa, K., Ishiguro, K., Nakago, K., and Abe, M. GraphNVP: An invertible flow model for generating molecular graphs, 2019.

Martinkus, K., Loukas, A., Perraudin, N., and Wattenhofer, R. SPECTRE: Spectral conditioning helps to overcome the expressivity limits of one-shot graph generators. In Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., and Sabato, S. (eds.), Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pp. 15159–15179. PMLR, 17–23 Jul 2022. URL https://proceedings.mlr.press/v162/martinkus22a.html.

Morris, C., Ritzert, M., Fey, M., Hamilton, W. L., Lenssen, J. E., Rattan, G., and Grohe, M. Weisfeiler and Leman go neural: Higher-order graph neural networks. In 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, volume 33, pp. 4602–4609. AAAI Press, Jul 2019. ISBN 9781577358091. doi: 10.1609/aaai.v33i01.33014602.

Nguyen, V. K., Boget, Y., Lavda, F., and Kalousis, A. GLAD: Improving latent graph generative modeling with simple quantization.
In ICML 2024 Workshop on Structured Probabilistic Inference & Generative Modeling, 2024. URL https://openreview.net/forum?id=aY1gdSolIv.

Nichol, A. Q. and Dhariwal, P. Improved denoising diffusion probabilistic models. In Meila, M. and Zhang, T. (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp. 8162–8171. PMLR, 18–24 Jul 2021. URL https://proceedings.mlr.press/v139/nichol21a.html.

Niu, C., Song, Y., Song, J., Zhao, S., Grover, A., and Ermon, S. Permutation invariant graph generation via score-based generative modeling. In Proceedings of Machine Learning Research, volume 108, pp. 4474–4484. PMLR, Jun 2020. URL http://proceedings.mlr.press/v108/niu20a.html.

Preuer, K., Renz, P., Unterthiner, T., Hochreiter, S., and Klambauer, G. Fréchet ChemNet distance: A metric for generative models for molecules in drug discovery. Journal of Chemical Information and Modeling, 58(9):1736–1741, 2018. doi: 10.1021/acs.jcim.8b00234. PMID: 30118593.

Qin, Y., Vignac, C., and Frossard, P. Sparse training of discrete diffusion models for graph generation, 2024. URL https://openreview.net/forum?id=oTRekADULK.

Qin, Y., Madeira, M., Thanou, D., and Frossard, P. DeFoG: Discrete flow matching for graph generation. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=KPRIwWhqAZ.

Richemond, P. H., Dieleman, S., and Doucet, A. Categorical SDEs with simplex diffusion, 2022. URL https://arxiv.org/abs/2210.14784.

Shi, C., Xu, M., Zhu, Z., Zhang, W., Zhang, M., and Tang, J. GraphAF: a flow-based autoregressive model for molecular graph generation. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=S1esMkHYPr.

Simonovsky, M. and Komodakis, N. GraphVAE: Towards generation of small graphs using variational autoencoders.
arXiv:1802.03480 [cs], February 2018.

Song, J., Meng, C., and Ermon, S. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=St1giarCHLP.

Song, Y., Shi, J., Gong, J., Xu, M., Ermon, S., Zhou, H., and Ma, W.-Y. Smooth interpolation for improved discrete graph generative models. In Singh, A., Fazel, M., Hsu, D., Lacoste-Julien, S., Berkenkamp, F., Maharaj, T., Wagstaff, K., and Zhu, J. (eds.), Proceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research, pp. 56363–56388. PMLR, 13–19 Jul 2025. URL https://proceedings.mlr.press/v267/song25f.html.

Stark, H., Jing, B., Wang, C., Corso, G., Berger, B., Barzilay, R., and Jaakkola, T. Dirichlet flow matching with applications to DNA sequence design. In Proceedings of the 41st International Conference on Machine Learning, ICML '24. JMLR.org, 2024.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. Advances in Neural Information Processing Systems 30, pp. 5998–6008, 2017. ISSN 10495258.

Vignac, C., Krawczuk, I., Siraudin, A., Wang, B., Cevher, V., and Frossard, P. DiGress: Discrete denoising diffusion for graph generation. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=UaAD-Nu86WX.

Xu, Z., Qiu, R., Chen, Y., Chen, H., Fan, X., Pan, M., Zeng, Z., Das, M., and Tong, H. Discrete-state continuous-time diffusion for graph generation. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=YkSKZEhIYt.

Yang, C., Zhuang, P., Shi, W., Luu, A., and Li, P.
Conditional structure generation through graph variational generative adversarial nets. In Advances in Neural Information Processing Systems, volume 32, 2019.

You, J., Ying, R., Ren, X., Hamilton, W., and Leskovec, J. GraphRNN: Generating realistic graphs with deep auto-regressive models. In Proceedings of the 35th International Conference on Machine Learning, pp. 5708–5717. PMLR, July 2018. URL https://proceedings.mlr.press/v80/you18a.html. ISSN: 2640-3498.

Zang, C. and Wang, F. MoFlow: An invertible flow model for generating molecular graphs. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 10:617–626, Aug 2020. doi: 10.1145/3394486.3403104. URL https://dl.acm.org/doi/10.1145/3394486.3403104.

Zhang, Y., He, S., Levine, D., Zhao, L., Zhang, D., Rizvi, S. A., Zappala, E., Ying, R., and van Dijk, D. Non-Markovian discrete diffusion with causal language models, 2025.

Zhao, L., Ding, X., and Akoglu, L. Pard: Permutation-invariant autoregressive diffusion for graph generation. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=x4Kk4FxLs3.

A. Proofs

A.1. Voronoi probability

Let us recall the Gamma function,
$$\Gamma(a) = \int_0^\infty t^{a-1} e^{-t}\,dt, \qquad a \in \mathbb{R}_{>0}, \tag{9}$$
the multinomial Beta function,
$$B(\boldsymbol{\alpha}) = \frac{\prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma\big(\sum_{i=1}^{K} \alpha_i\big)}, \tag{10}$$
the density of the Dirichlet distribution at a point $x \in \mathcal{S}_K$,
$$f(x) = \frac{1}{B(\boldsymbol{\alpha})} \prod_{i=1}^{K} x_i^{\alpha_i - 1}, \qquad \alpha_0 = \sum_i \alpha_i, \tag{11}$$
and the $\mathrm{Gamma}(a, b)$ density,
$$f(x; a, b) = \frac{b^a}{\Gamma(a)} x^{a-1} e^{-bx}, \qquad x > 0,\ a > 0,\ b > 0. \tag{12}$$

We show that if $x \sim \mathrm{Dir}(1, \dots, a, \dots, 1)$, i.e.,
all parameters equal one except the $i$-th, which equals $a$, then
$$P_{v_i}(x \mid e_i) = \sum_{k=0}^{K-1} (-1)^k \binom{K-1}{k} (k+1)^{-a}. \tag{13}$$
We recall that the Voronoi region of vertex $i$ is defined as
$$V_i = \big\{\, x \in \mathcal{S}_K \mid d(x, e_i) \le d(x, e_j)\ \forall j \neq i \,\big\}, \tag{14}$$
with $d(\cdot,\cdot)$ the Euclidean distance. We first note that
$$x \in V_i \iff x_i = \max_{j \in [K]} x_j, \tag{15}$$
so that
$$P_{v_i}(x \mid e_i) = P\Big(x_i = \max_{j \in [K]} x_j\Big). \tag{16}$$
It is well known that, for any $b > 0$,
$$g_1 \sim \mathrm{Gamma}(\alpha_1, b), \dots, g_K \sim \mathrm{Gamma}(\alpha_K, b) \implies [x_1, \dots, x_K]^\top \sim \mathrm{Dir}(\alpha_1, \dots, \alpha_K), \tag{17}$$
where $x_i = g_i / \sum_{j=1}^{K} g_j$. Since the denominator is common to all coordinates, normalization does not change the ordering:
$$g_i = \max_{j \in [K]} g_j \iff x_i = \max_{j \in [K]} x_j. \tag{18}$$
In the following, we choose $b = 1$ and denote
$$g_i \sim \mathrm{Gamma}(a, 1), \qquad g_j \sim \mathrm{Gamma}(1, 1) \quad \forall j \neq i. \tag{19}$$
From Equations 16 and 18 and the independence of the $g_j$, we get
$$P_{v_i}(x \mid e_i) = P(g_i \ge g_1, \dots, g_{i-1}, g_{i+1}, \dots, g_K). \tag{20}$$
Conditioning on $g_i = t$, the $K-1$ comparisons are independent, so
$$P(g_i \ge g_j\ \forall j \neq i \mid g_i = t) = F(t; 1, 1)^{K-1}, \tag{21}$$
where the cumulative distribution function of $\mathrm{Gamma}(1, 1)$ is
$$F(t; 1, 1) = 1 - e^{-t}. \tag{22}$$
Hence,
$$P(g_i \ge g_j\ \forall j \neq i \mid g_i = t) = (1 - e^{-t})^{K-1}, \tag{23}$$
and, integrating against the $\mathrm{Gamma}(a, 1)$ density of $g_i$,
$$P(g_i \ge g_1, \dots, g_{i-1}, g_{i+1}, \dots, g_K) = \frac{1}{\Gamma(a)} \int_0^\infty t^{a-1} e^{-t} (1 - e^{-t})^{K-1}\,dt. \tag{24}$$
By the binomial theorem, we have
$$(1 - e^{-t})^{K-1} = \sum_{k=0}^{K-1} (-1)^k \binom{K-1}{k} e^{-kt}. \tag{25}$$
Substituting this expression into Equation 24 and rearranging the terms, we get
$$P_{v_i}(x \mid e_i) = \frac{1}{\Gamma(a)} \sum_{k=0}^{K-1} (-1)^k \binom{K-1}{k} \int_0^\infty t^{a-1} e^{-(k+1)t}\,dt. \tag{26}$$
Setting $u = (k+1)t$, i.e., $t = \frac{u}{k+1}$:
$$= \frac{1}{\Gamma(a)} \sum_{k=0}^{K-1} (-1)^k \binom{K-1}{k} \int_0^\infty \Big(\frac{u}{k+1}\Big)^{a-1} e^{-u} \frac{1}{k+1}\,du \tag{27}$$
$$= \frac{1}{\Gamma(a)} \sum_{k=0}^{K-1} (-1)^k \frac{\binom{K-1}{k}}{(k+1)^a} \int_0^\infty u^{a-1} e^{-u}\,du \tag{28}$$
$$= \frac{1}{\Gamma(a)} \sum_{k=0}^{K-1} (-1)^k \frac{\binom{K-1}{k}}{(k+1)^a}\, \Gamma(a) \tag{29}$$
$$= \sum_{k=0}^{K-1} (-1)^k \frac{\binom{K-1}{k}}{(k+1)^a}. \tag{30}$$

A.2. Denoising kernel

Given $P_{1|t}(x_1 \mid x_t^{(1:L)})$ and Assumption 4.1, the one-step denoising kernel satisfies
$$p_{t+dt}(x \mid x_t^{(1:L)}) = \sum_{x_1} q_{t+dt}(x \mid x_1)\, P_{1|t}(x_1 \mid x_t^{(1:L)}) \tag{31}$$
$$= \mathbb{E}_{x_1 \sim P_{1|t}(\cdot \mid x_t^{(1:L)})}\big[q_{t+dt}(x \mid x_1)\big]. \tag{32}$$
Indeed, by the law of total probability (first equality) and Assumption 4.1 (second equality), we have
$$p_{t+dt}(x \mid x_t^{(1:L)}) = \sum_{x_1} p_{t+dt}(x \mid x_t^{(1:L)}, x_1)\, P_{1|t}(x_1 \mid x_t^{(1:L)}) \tag{33}$$
$$= \sum_{x_1} q_{t+dt}(x \mid x_1)\, P_{1|t}(x_1 \mid x_t^{(1:L)}). \tag{34}$$

A.3. Convergence

Proposition A.1 (Convergence to the stationary distribution). Assume that $P_{1|t}(x_1 \mid x_t)$ and $q(x_t \mid x_1)$ have full support on $\mathcal{S}^L$ and $\mathcal{S}_K^L$, respectively, and that the conditional independence $q_t(x'_t \mid x_t, x_1) = q_t(x'_t \mid x_1)$ holds. Then the Markov kernel
$$p_{t'|t}(x'_t \mid x_t) = \sum_{x_1 \in \mathcal{S}^L} P_{1|t}(x_1 \mid x_t)\, q(x'_t \mid x_1) \tag{35}$$
converges to the stationary distribution
$$\pi(x_t) = \sum_{x_1 \in \mathcal{S}^L} p(x_1)\, q(x_t \mid x_1). \tag{36}$$

Proof. Under these assumptions,
$$p_{t'|t}(x'_t \mid x_t) = \sum_{x_1 \in \mathcal{S}^L} P_{1|t}(x_1 \mid x_t)\, q(x'_t \mid x_1) \tag{37}$$
defines a valid Markov kernel on $\mathcal{S}_K^L$.
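The structure of this kernel can also be checked numerically on a toy finite state space: iterating any initial distribution under the kernel lands on a mixture of the form $\sum_{x_1} p(x_1)\, q(\cdot \mid x_1)$. In the sketch below, all distributions are random full-support stand-ins, not the model's actual $P_{1|t}$ and $q$:

```python
import numpy as np

rng = np.random.default_rng(0)
n_clean, n_noisy = 3, 5                        # toy sizes for the x_1 and x_t spaces

# random full-support stand-ins for P_{1|t}(x_1 | x_t) and q(x_t | x_1)
P1_given_t = rng.dirichlet(np.ones(n_clean), size=n_noisy)   # rows indexed by x_t
q_t_given_1 = rng.dirichlet(np.ones(n_noisy), size=n_clean)  # rows indexed by x_1

# kernel p(x' | x) = sum_{x_1} P(x_1 | x) q(x' | x_1); rows indexed by x
kernel = P1_given_t @ q_t_given_1

# iterate an arbitrary initial distribution to (numerical) stationarity
mu = rng.dirichlet(np.ones(n_noisy))
for _ in range(200):
    mu = mu @ kernel

# at stationarity, mu equals the mixture sum_{x_1} p(x_1) q(. | x_1),
# where p(x_1) is the x_1-marginal induced by mu
p_x1 = mu @ P1_given_t
pi = p_x1 @ q_t_given_1
```

After a few hundred iterations, `pi` and `mu` coincide to machine precision, illustrating the mixture form of the stationary distribution proved below.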
Stationarity. A distribution $\pi$ on $\mathcal{S}_K^L$ is stationary for $p_{t'|t}$ if
$$\pi(x'_t) = \int_{\mathcal{S}_K^L} p_{t'|t}(x'_t \mid x_t)\, \pi(x_t)\,dx_t. \tag{38}$$
Expanding the kernel and interchanging sum and integral,
$$\pi(x'_t) = \int_{\mathcal{S}_K^L} \pi(x_t) \sum_{x_1 \in \mathcal{S}^L} P_{1|t}(x_1 \mid x_t)\, q(x'_t \mid x_1)\,dx_t \tag{39}$$
$$= \sum_{x_1 \in \mathcal{S}^L} \Big[\int_{\mathcal{S}_K^L} \pi(x_t)\, P_{1|t}(x_1 \mid x_t)\,dx_t\Big]\, q(x'_t \mid x_1). \tag{40}$$
By the law of total probability,
$$\int_{\mathcal{S}_K^L} \pi(x_t)\, P_{1|t}(x_1 \mid x_t)\,dx_t = p(x_1), \tag{41}$$
the marginal distribution of $x_1$. Hence
$$\pi(x'_t) = \sum_{x_1 \in \mathcal{S}^L} p(x_1)\, q(x'_t \mid x_1). \tag{42}$$
Thus the stationary distribution of $p_{t'|t}$ is the marginal
$$\pi = \sum_{x_1 \in \mathcal{S}^L} p(x_1)\, q(\cdot \mid x_1). \tag{43}$$

Convergence. Since both $P_{1|t}$ and $q(\cdot \mid x_1)$ have full support, the kernel $p_{t'|t}$ has a strictly positive density on the simplex. This ensures irreducibility and aperiodicity. By the Fundamental Theorem of Markov Chains (Biswas, 2022), the distribution of $x_t$ therefore converges to $\pi$ for any initialization.

A.4. Bound on the approximation error

The KL divergence between the true and approximate one-step denoising distributions is upper-bounded by the KL divergence between the true and approximate posterior distributions:
$$D_{\mathrm{KL}}\big(p_{t+dt}(\cdot \mid x_t)\,\|\, p^\theta_{t+dt}(\cdot \mid x_t)\big) \le D_{\mathrm{KL}}\big(P_{1|t}(\cdot \mid x_t)\,\|\, P^\theta_{1|t}(\cdot \mid x_t)\big). \tag{44}$$
We first define the joint laws
$$\pi(x_1, x_{t+dt} \mid x_t) = P_1(x_1 \mid x_t)\, p_{t+dt}(x_{t+dt} \mid x_1) \tag{45}$$
and
$$\pi^\theta(x_1, x_{t+dt} \mid x_t) = P^\theta_1(x_1 \mid x_t)\, p_{t+dt}(x_{t+dt} \mid x_1). \tag{46}$$
Their marginals in $x_{t+dt}$ are $p_{t+dt}(\cdot \mid x_t)$ and $p^\theta_{t+dt}(\cdot \mid x_t)$.
Moreover,
$$D_{\mathrm{KL}}(\pi \,\|\, \pi^\theta) = \sum_{x_1 \in \mathcal{X}} \int \pi(x_1, x_{t+dt} \mid x_t) \log \frac{\pi(x_1, x_{t+dt} \mid x_t)}{\pi^\theta(x_1, x_{t+dt} \mid x_t)}\,dx_{t+dt} \tag{47}$$
$$= \sum_{x_1 \in \mathcal{X}} \int P_1(x_1 \mid x_t)\, p_{t+dt}(x_{t+dt} \mid x_1) \log \frac{P_1(x_1 \mid x_t)}{P^\theta_1(x_1 \mid x_t)}\,dx_{t+dt} \tag{48}$$
$$= \sum_{x_1 \in \mathcal{X}} P_1(x_1 \mid x_t) \log \frac{P_1(x_1 \mid x_t)}{P^\theta_1(x_1 \mid x_t)} \int p_{t+dt}(x_{t+dt} \mid x_1)\,dx_{t+dt} \tag{49}$$
$$= D_{\mathrm{KL}}\big(P_1(\cdot \mid x_t)\,\|\, P^\theta_1(\cdot \mid x_t)\big), \tag{50}$$
since $p_{t+dt}(\cdot \mid x_1)$ integrates to 1 for every $x_1$. The log-sum inequality yields, for each fixed $x_{t+dt}$,
$$\sum_{x_1 \in \mathcal{X}} \pi(x_1, x_{t+dt} \mid x_t) \log \frac{\sum_{x_1 \in \mathcal{X}} \pi(x_1, x_{t+dt} \mid x_t)}{\sum_{x_1 \in \mathcal{X}} \pi^\theta(x_1, x_{t+dt} \mid x_t)} \le \sum_{x_1 \in \mathcal{X}} \pi(x_1, x_{t+dt} \mid x_t) \log \frac{\pi(x_1, x_{t+dt} \mid x_t)}{\pi^\theta(x_1, x_{t+dt} \mid x_t)}. \tag{51}$$
Using the definition of the marginals, this becomes
$$p_{t+dt}(x_{t+dt} \mid x_t) \log \frac{p_{t+dt}(x_{t+dt} \mid x_t)}{p^\theta_{t+dt}(x_{t+dt} \mid x_t)} \le \sum_{x_1 \in \mathcal{X}} \pi(x_1, x_{t+dt} \mid x_t) \log \frac{\pi(x_1, x_{t+dt} \mid x_t)}{\pi^\theta(x_1, x_{t+dt} \mid x_t)}. \tag{52}$$
Integrating over $x_{t+dt}$ gives
$$D_{\mathrm{KL}}\big(p_{t+dt}(\cdot \mid x_t)\,\|\, p^\theta_{t+dt}(\cdot \mid x_t)\big) \le D_{\mathrm{KL}}(\pi \,\|\, \pi^\theta). \tag{53}$$
Combining this with the previous identity yields
$$D_{\mathrm{KL}}\big(p_{t+dt}(\cdot \mid x_t)\,\|\, p^\theta_{t+dt}(\cdot \mid x_t)\big) \le D_{\mathrm{KL}}\big(P_1(\cdot \mid x_t)\,\|\, P^\theta_1(\cdot \mid x_t)\big). \tag{54}$$

B. Guidance

Classifier and classifier-free guidance are two standard mechanisms for steering generation toward desired properties. Classifier guidance augments an unconditional generative model with the gradient of a separately trained classifier (or regressor) for the target property. Classifier-free guidance does not require an auxiliary classifier, but it does require both a conditional and an unconditional generative model, typically obtained by jointly training with masked/unmasked conditioning vectors. Below we first present the high-level ideas, then provide the derivations.

B.1. Key ideas

Let $y$ denote the target property and let $p_y(y \mid x)$ be a discriminative model (classifier/regressor) of $y$ given $x$. Let $p_{t+dt|t}(x_{t+dt} \mid x_t)$ denote the unconditional reverse transition at time $t$. We modulate the strength of conditioning with a scale $\omega > 0$ via
$$p^\omega_{t+dt}(x_{t+dt} \mid x_t, y) \propto p_y(y \mid x_{t+dt})^\omega\, p_{t+dt|t}(x_{t+dt} \mid x_t). \tag{55}$$

Classifier guidance. Here $p_y(y \mid x)$ is provided by a classifier trained at all noise levels. During sampling, one combines a reverse step with an update in the direction of the classifier gradient, $\nabla_{x_t} \log p^\phi_{y|x}(y \mid x_t)$.

Classifier-free guidance. Within this approach, no external classifier is needed; instead, one uses a conditional reverse model $p_{t+dt|t,y}(x_{t+dt} \mid x_t, y)$ and an unconditional one $p_{t+dt|t}(x_{t+dt} \mid x_t)$, typically produced by a single network via masking. A standard identity yields the log-linear interpolation
$$\log p^\omega_{t+dt|t,y}(x_{t+dt} \mid x_t, y) \propto \omega \log p_{t+dt|t,y}(x_{t+dt} \mid x_t, y) + (1 - \omega) \log p_{t+dt|t}(x_{t+dt} \mid x_t). \tag{56}$$

B.2. Derivations

Assume the corruption (noising) process is independent of the conditioning variable $y$: $q_t(x_t \mid x_{t+dt}, y) = q_t(x_t \mid x_{t+dt})$. By Bayes' rule and this independence,
$$p_y(y \mid x_t, x_{t+dt}) = \frac{q_{t|y,t+dt}(x_t \mid x_{t+dt}, y)\, p_y(y \mid x_{t+dt})}{q_{t|y,t+dt}(x_t \mid x_{t+dt})} \tag{57}$$
$$= \frac{q_{t|y,t+dt}(x_t \mid x_{t+dt})\, p_y(y \mid x_{t+dt})}{q_{t|y,t+dt}(x_t \mid x_{t+dt})} \tag{58}$$
$$= p_y(y \mid x_{t+dt}). \tag{59}$$
Consequently,
$$p_{t+dt|t,y}(x_{t+dt} \mid x_t, y) = \frac{p_y(y \mid x_{t+dt}, x_t)\, p_{t+dt|t}(x_{t+dt} \mid x_t)}{p_y(y \mid x_t)} \tag{60}$$
$$\propto p_y(y \mid x_{t+dt})\, p_{t+dt|t}(x_{t+dt} \mid x_t). \tag{61}$$
Introducing $\omega > 0$ sharpens the guidance:
$$p^\omega_{t+dt|t,y}(x_{t+dt} \mid x_t, y) \propto p_y(y \mid x_{t+dt})^\omega\, p_{t+dt|t}(x_{t+dt} \mid x_t). \tag{62}$$

B.2.1. Classifier-free guidance

Applying Bayes' rule again to the last equation, we obtain:
$$p^\omega_{t+dt|t,y}(x_{t+dt} \mid x_t, y) \propto p_y(y \mid x_{t+dt})^\omega\, p_{t+dt|t}(x_{t+dt} \mid x_t) \tag{63}$$
$$= \bigg(\frac{p(x_{t+dt} \mid y, x_t)\, p(y \mid x_t)}{p(x_{t+dt} \mid x_t)}\bigg)^\omega p_{t+dt|t}(x_{t+dt} \mid x_t) \tag{64}$$
$$= p(x_{t+dt} \mid y, x_t)^\omega\, p(x_{t+dt} \mid x_t)^{1-\omega}\, p(y \mid x_t)^\omega. \tag{65}$$
Taking the logarithm and absorbing the last term, which does not depend on $x_{t+dt}$, into the normalizing constant, we get:
$$\log p^\omega_{t+dt|t,y}(x_{t+dt} \mid x_t, y) \propto \omega \log p_{t+dt|t,y}(x_{t+dt} \mid x_t, y) + (1 - \omega) \log p_{t+dt|t}(x_{t+dt} \mid x_t). \tag{66}$$

B.2.2. Classifier guidance

For classifier guidance, we first apply Bayes' rule twice to Equation 62:
$$p^\omega_{t+dt|t,y}(x_{t+dt} \mid x_t, y) \propto p_y(y \mid x_{t+dt})^\omega\, p_{t+dt|t}(x_{t+dt} \mid x_t) \tag{67}$$
$$= \bigg(\frac{p(x_{t+dt} \mid y, x_t)\, p(y \mid x_t)}{p(x_{t+dt} \mid x_t)}\bigg)^\omega p_{t+dt|t}(x_{t+dt} \mid x_t) \tag{68}$$
$$= p_{t+dt|t}(x_{t+dt} \mid y, x_t)^\omega\, p(x_{t+dt} \mid x_t)^{1-\omega}\, p(y \mid x_t)^\omega \tag{69}$$
$$\propto p_{t+dt|t}(x_{t+dt} \mid x_t)\, p_y(y \mid x_{t+dt})^\omega. \tag{70}$$
We then use a first-order Taylor approximation around $x_t$ to evaluate $p_y(y \mid x_{t+dt})$:
$$\log \tilde{p}_{y|t+dt}(y \mid x_{t+dt}) = x_{t+dt}^\top \nabla_z \log p_{y|z}(y \mid z)\big|_{z=x_t} + C = \sum_{l=1}^{L} \sum_j x^{(l)}_{t+dt,j}\, \frac{\partial}{\partial z^{(l)}_j} \log p_{y|z}(y \mid x_t) + C, \tag{71–73}$$
which gives:
$$\log p^\omega_{t+dt|t,y}(x_{t+dt} \mid x_t, y) \propto \log p_{t+dt|t}(x_{t+dt} \mid x_t) + \omega\, x_{t+dt}^\top \nabla_{x_t} \log p_{y|z}(y \mid x_t). \tag{74}$$
This shows that classifier guidance amounts to a gradient-ascent adjustment of the unconditional reverse kernel toward a higher discriminative likelihood of the target property.

C. Complementary texts

C.1. Extended Related Work on Graph Generation

A central challenge in generative graph modeling stems from the $n!$ different ways to represent a graph under node permutations.
This has motivated two dominant families of methods: sequential (autoregressive) approaches, which generate graphs step by step by adding nodes, edges, or higher-order motifs according to a fixed or learned ordering (You et al., 2018; Shi et al., 2020; Luo et al., 2021; Liao et al., 2019; Goyal et al., 2020; Kong et al., 2023; Zhao et al., 2024), and permutation-equivariant models, which enforce equivariance by design. Permutation-equivariant architectures have been instantiated within several generative frameworks, including variational autoencoders (Simonovsky & Komodakis, 2018), generative adversarial networks (De Cao & Kipf, 2018; Krawczuk et al., 2021; Martinkus et al., 2022), normalizing flows (Madhawa et al., 2019; Zang & Wang, 2020; Liu et al., 2019), and vector-quantized autoencoders (Boget et al., 2024; Nguyen et al., 2024).

Orthogonal to this point, a recent line of work targets scalability. A central limitation of standard models is their reliance on dense representations and all-pairs computations, which impedes scaling to large graphs. To address this, prior work has proposed scalable denoising architectures and sampling strategies (Qin et al., 2024; Chen et al., 2023; Karami, 2024; Bergmeister et al., 2024; Boget et al., 2025). Notably, Qin et al. (2024) and Boget et al. (2025) introduce methods that enable permutation-equivariant models to scale to large graphs. These techniques are complementary to our approach and could be used to scale our unrestrained simplex denoising framework; we leave this exploration to future work.

C.2. Denoiser Expressivity in Discrete and Continuous Settings

Graph neural networks (GNNs) exhibit limited expressivity due to inherent graph symmetries (Morris et al., 2019). These limitations, however, disappear once each node is equipped with a unique identifier.
In particular, Loukas (2020) demonstrates that sufficiently deep message-passing architectures become universal when nodes can be uniquely distinguished. Continuous noise naturally provides such distinguishing information, effectively acting as a unique identifier for each node. Consequently, injecting continuous noise enhances the expressive power of GNN-based denoisers.

In practice, discrete denoisers often require additional hand-crafted features to overcome symmetry-induced limitations. Common examples include spectral embeddings derived from Laplacian eigenvectors (Vignac et al., 2023) and relative random-walk probability features (Qin et al., 2025). These computations must be performed before each training and sampling pass, introducing a non-negligible computational overhead.

A typical criticism of continuous models is that adding continuous noise to edge attributes may collapse the graph into a fully connected weighted structure, ostensibly erasing the discrete adjacency information. However, the graph operations underlying the aforementioned extra features, such as spectral decompositions and random-walk statistics, remain well-defined on weighted graphs. Thus, continuous perturbations do not destroy the structure required for these computations. Moreover, since continuous noise serves as an intrinsic node identifier, it eliminates the need for auxiliary feature computations.

C.3. Comparison with Simple Iterative Denoising

Our Unside framework can be viewed as a continuous analogue of Simple Iterative Denoising (SID). Below we outline the key similarities and differences.

Noising. In SID, the corruption process follows discrete diffusion for categorical data (Austin et al., 2021), where intermediate noisy distributions linearly interpolate between the data distribution and a stationary prior:

$$q_{t \mid 1}(x \mid x_1) = \alpha_t\, \delta_{x_1}(x) + (1 - \alpha_t)\, q_0(x). \quad (75)$$

As discussed in Section 3, this interpolation is not suitable in the continuous setting on the simplex. Accordingly, Unside adopts an explicit probability path within the simplex.

Independence assumption. Both SID and Unside rely on the same conditional-independence assumption (Assumption 4.1), which is central to the formulation.

Denoising. The denoising rules in Unside follow directly from the conditional-independence assumption and are closely related in spirit to SID. A notable distinction is our original hybrid construction coupling a continuous kernel $q_t$ on the simplex with a discrete posterior $P_{1 \mid t}$.

Convergence. Proposition 4.3 and its proof are original to this work. SID provides no analogous convergence result. We regard this result as a core theoretical contribution.

Advantage over Diffusion and Flow Matching. Although there are conceptual similarities at a high level, including the connection to compounding denoising errors, the underlying intuition differs substantially between the continuous and discrete settings. In particular, the observation that $x_{t+dt}$ remains close to $x_t$, and the reasoning that follows from it, holds only in the continuous case and does not translate directly to discrete models.

Parametrization and learning. Both approaches employ standard parameterizations and training objectives common to discrete diffusion and flow-matching methods.

Probability paths and priors. Our choices and analysis of probability paths and priors are specific to continuous simplex noise and have no direct counterpart in SID.

Guidance. As explained in the main text, guidance in Unside is obtained via direct adaptations of standard diffusion techniques (classifier and classifier-free guidance). SID does not include a guidance mechanism.
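Concretely, the classifier-free guidance rule of Appendix B reduces to a log-linear interpolation of conditional and unconditional logits before normalizing. The following minimal sketch illustrates the mechanics on a toy categorical posterior; all names and numbers are illustrative, not taken from the paper's implementation:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cfg_logits(cond_logits, uncond_logits, omega):
    """Log-linear interpolation of Eq. (56):
    omega * log p_cond + (1 - omega) * log p_uncond (up to normalization)."""
    return omega * cond_logits + (1.0 - omega) * uncond_logits

# Toy posterior over 4 categories.
cond = np.log(np.array([0.1, 0.6, 0.2, 0.1]))        # conditional posterior
uncond = np.log(np.array([0.25, 0.25, 0.25, 0.25]))  # unconditional posterior
guided = softmax(cfg_logits(cond, uncond, omega=2.0))
```

With a uniform unconditional posterior, $\omega = 2$ simply squares and renormalizes the conditional probabilities, sharpening the distribution around the conditional mode.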
Beyond this element-wise comparison, our primary contribution is to integrate these components into a simple and efficient framework for graph generation that achieves state-of-the-art performance across multiple datasets.

C.4. Use of LLMs

We employed large language models (LLMs) only for editorial purposes: identifying typographical errors and formatting tables. No scientific content, code, analyses, or results were generated by LLMs.

D. Technical Report

D.1. GNN Architecture

Our model is built on Simple Iterative Denoising (Boget, 2025). We use the same architecture and reproduce the architecture description. The denoisers are Graph Neural Networks inspired by the general, powerful, scalable (GPS) graph Transformer. A single layer is described as:

$$\tilde{X}^{(l)}, \tilde{E}^{(l)} = \mathrm{MPNN}(X^{(l)}, E^{(l)}), \quad (76)$$

$$X^{(l+1)} = \mathrm{MultiheadAttention}(\tilde{X}^{(l)} + X^{(l)}) + \tilde{X}^{(l)}, \quad (77)$$

$$E^{(l+1)} = \tilde{E}^{(l)} + E^{(l)}, \quad (78)$$

where $X^{(l)}$ and $E^{(l)}$ are the node and edge hidden representations after the $l$-th layer. The multi-head attention layer is the classical multi-head attention from Vaswani et al. (2017), and MPNN is a Message-Passing Neural Network layer described hereafter. The MPNN operates on the node and edge representations as follows:

$$h^l_{i,j} = \mathrm{ReLU}\big(W^l_{\mathrm{src}} x^l_i + W^l_{\mathrm{trg}} x^l_j + W^l_{\mathrm{edge}} e^l_{i,j}\big), \quad (79)$$

$$e^{l+1}_{i,j} = \mathrm{LayerNorm}\big(f_{\mathrm{edge}}(h^l_{i,j})\big), \quad (80)$$

$$x^{l+1}_i = \mathrm{LayerNorm}\Big(x^l_i + \sum_{j \in \mathcal{N}(i)} f_{\mathrm{node}}(h^l_{i,j})\Big), \quad (81)$$

with $W^l_{\mathrm{src}}$, $W^l_{\mathrm{trg}}$, and $W^l_{\mathrm{edge}}$ denoting trainable weight matrices, and $f_{\mathrm{node}}$ and $f_{\mathrm{edge}}$ being small neural networks. The node hidden states $x_i$ and the outputs of $f_{\mathrm{node}}$ have dimension $d_h$, a tunable hyperparameter (see Table 6). The edge hidden states $e_{i,j}$, intermediate messages $h^l_{i,j}$, and the outputs of $f_{\mathrm{edge}}$ have dimension $d_h / 4$.
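The message-passing step of Eqs. (79)-(81) can be sketched in dense NumPy. This is a minimal illustration under simplifying assumptions, not the authors' code: $f_{\mathrm{node}}$ and $f_{\mathrm{edge}}$ are reduced to single linear maps instead of small MLPs, the layer normalization has no learnable parameters, and the neighborhood sum runs over all node pairs, as on a dense (fully connected, weighted) graph:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def layer_norm(z, eps=1e-5):
    # Parameter-free layer normalization over the feature axis.
    m = z.mean(axis=-1, keepdims=True)
    v = z.var(axis=-1, keepdims=True)
    return (z - m) / np.sqrt(v + eps)

def mpnn_layer(x, e, params):
    """One message-passing step, Eqs. (79)-(81), on a dense graph.
    x: (n, d_h) node states; e: (n, n, d_e) edge states, d_e = d_h // 4."""
    W_src, W_trg, W_edge, F_node, F_edge = params
    # h_ij = ReLU(W_src x_i + W_trg x_j + W_edge e_ij)            (79)
    h = relu((x @ W_src)[:, None, :] + (x @ W_trg)[None, :, :] + e @ W_edge)
    e_new = layer_norm(h @ F_edge)                                # (80)
    # x_i <- LayerNorm(x_i + sum_j f_node(h_ij))                  (81)
    x_new = layer_norm(x + (h @ F_node).sum(axis=1))
    return x_new, e_new

# Toy forward pass with random weights.
rng = np.random.default_rng(0)
n, d_h = 5, 16
d_e = d_h // 4
params = (rng.standard_normal((d_h, d_e)), rng.standard_normal((d_h, d_e)),
          rng.standard_normal((d_e, d_e)), rng.standard_normal((d_e, d_h)),
          rng.standard_normal((d_e, d_e)))
x_out, e_out = mpnn_layer(rng.standard_normal((n, d_h)),
                          rng.standard_normal((n, n, d_e)), params)
```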
Inputs and Outputs. In the input, we concatenate the node attributes, extra features, and time step as node features, copying graph-level information (e.g., time step or graph size) to each node. The node and edge input vectors are then projected to their respective hidden dimensions, $d_h$ for nodes and $d_h/4$ for edges. Similarly, the outputs of the final layer are projected to their respective dimensions, $d_x$ for nodes and $d_e$ for edges (or to a scalar in the case of the Critic). To enforce edge symmetry, we compute $e_{i,j} = \frac{e_{i,j} + e_{j,i}}{2}$. Finally, we ensure the outputs can be interpreted as probabilities by applying either a softmax or sigmoid function, as appropriate.

D.2. Probability Path and Schedulers

For all experiments, we use a probability path parametrized via the Dirichlet distribution: $\mathrm{Dir}(x_t \mid 1 + \alpha_t x_1)$, where $\alpha_t = -a \log(1 - t)$. The hyperparameter $a$ defines the noising dynamics (see Figure 3). We use $a = 3$ in all our experiments, except on SBM, where we use $a = 2$.

Figure 3. Voronoi probabilities over time for $x_t \in S^3 \sim \mathrm{Dir}(1 + \alpha_t x_1)$, with $\alpha_t = -a \log(1 - t)$ for various values of $a$. For $a = 1$, $P_{v_k}$ increases rapidly as $t \to 1$; for $a = 10$, $P_{v_k}$ is already close to 1 by $t = 0.6$. Suitable choices of $a$ therefore likely lie between 1 and 10.

D.3. Hyperparameters

Table 6. Hyperparameters

                          Planar   SBM      Qm9H     Zinc250K
GNN layers                8        8        8        8
Hidden layers in MLPs     2        2        2        2
Node representation size  256      256      256      256
Edge representation size  64       128      128      128
Diffusion steps           128      512      1024     1024
Learning rate             0.0002   0.0005   0.0005   0.0005
Optimizer                 AdamW    AdamW    AdamW    AdamW

For discrete diffusion and SID, we use the cosine scheduler (Nichol & Dhariwal, 2021) and the marginal noise distribution.

D.4. Extra Features

For Discrete Diffusion (Markovian and non-Markovian, i.e., DiscDif and NM-DD), we follow a common practice (Vignac et al., 2023; Qin et al., 2025; Boget et al.
, 2025), and enhance the discrete graph representation with synthetic extra node features. For our Unside and for Dirichlet flow matching, we only add the graph size. We use the following extra features: eigenfeatures (ZINC250k, Planar, SBM), graph size, molecular features (ZINC250k), cycle information (Planar), and the Relative Random Walk Probabilities (RRWP; Qm9 and Qm9H). All these features are concatenated to the input node attributes, and to the edge attributes for the RRWPs.

Spectral features. We use the eigenvectors associated with the $k$ lowest eigenvalues of the graph Laplacian. Additionally, we concatenate the corresponding $k$ lowest eigenvalues to each node.

Graph size encoding. The graph size is encoded as the ratio between the size of the current graph and the largest graph in the dataset, $n / n_{\max}$. This value is concatenated to all nodes in the graph.

Molecular features. For molecular datasets, we use the charge and valency of each atom as additional features.

Cycles. Following Vignac et al. (2023), we count the number of cycles of size 3, 4, and 5 that each node is part of, and use these counts as features.

Relative Random Walk Probabilities. These are the probabilities of reaching a target node from a source node in a given number of steps. We compute them for all numbers of steps between 1 and $k$.

D.5. Evaluation

D.5.1. Molecule Generation

For the molecular graph dataset Zinc250K, we adopt the evaluation procedure followed by Jo et al. (2024), from which we took the baseline model results, and which was originally established in Jo et al. (2022). We assess performance using three standard metrics: (i) Fréchet ChemNet Distance (FCD) (Preuer et al.
, 2018), which measures similarity in chemical feature space; (ii) Neighborhood Subgraph Pairwise Distance Kernel (NSPDK) (Costa & Grave, 2010), which evaluates graph-structural similarity; and (iii) validity, the fraction of chemically valid molecules without any post hoc correction or resampling. We report means over five sampling runs, each generating 10,000 molecules. For completeness, we also provide standard deviations, as well as uniqueness and novelty statistics, in Appendix E.

QM9H. For QM9H, we generally follow the above procedure used for QM9. For evaluation, we rely on SMILES with explicit hydrogen atoms to compute validity, uniqueness, and FCD. Under this evaluation procedure, we observe that a small fraction of molecules in the dataset are invalid. We use the kekulized version of the dataset (i.e., with three possible bond types: single, double, and triple). We note that some concurrent implementations use a variant of the dataset with explicit aromatic bonds; in such cases, evaluation metrics, in particular validity and FCD, are not directly comparable to ours.

D.5.2. Unattributed Synthetic Graph Generation

For unattributed graphs, we follow the procedure of Martinkus et al. (2022): an 80/20 train-test split, with 20% of the training set used for validation. We measure distributional similarity via Maximum Mean Discrepancy (MMD) over degree distributions, clustering coefficients, orbit counts, and spectral densities. We additionally report validity. For the Planar dataset, a valid graph must be planar and connected. For the Stochastic Block Model (SBM) dataset, validity indicates consistency with the data-generating block model (intra-community edge density 0.3, inter-community edge density 0.005).
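As a concrete illustration of the degree-distribution MMD statistic described above, the sketch below compares normalized degree histograms with a squared-MMD estimator. Note that the evaluation in this paper follows Jo et al. (2022) and uses an EMD-based kernel; the Gaussian (RBF) kernel here is a simplification for illustration only:

```python
import numpy as np

def degree_hist(adj, max_deg):
    """Normalized degree histogram of a simple graph (0/1 adjacency matrix)."""
    deg = adj.sum(axis=1).astype(int)
    h = np.bincount(deg, minlength=max_deg + 1).astype(float)
    return h / h.sum()

def mmd2(X, Y, sigma=1.0):
    """Biased estimator of the squared MMD between two sets of histograms,
    using an RBF kernel. X, Y: (n_graphs, n_bins) arrays."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()
```

The statistic is zero when the two sets of histograms coincide and grows as the degree distributions of generated and reference graphs diverge.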
We omit uniqueness because all models reach 100% on both datasets, and we report novelty only for SBM (novelty is 100% on Planar). Uniqueness is the fraction of distinct graphs among generated samples; novelty is the fraction of unique graphs not present in the training set. We report means over five runs, each generating 40 graphs.

D.5.3. Real-World Graph Generation

Enzymes is a dataset containing 587 protein graphs that represent the tertiary structures of enzymes. It is extracted from the BRENDA database. The graphs in this dataset have between 10 and 125 nodes. For consistency, we follow Jo et al. (2022) in using the Earth Mover's Distance (EMD) to compute the MMDs. We used 20% of the data for the test set and the remainder for the training set, from which we additionally reserve 20% for the validation set. MMDs are computed between the test set and a batch of generated graphs of the same size.

D.6. Non-Markovian Continuous Diffusion

To further assess the role of the simplex parameterization itself, we implemented an additional baseline that uses the same non-Markovian resampling rule, but with a parameterization in the ambient space rather than on the simplex. Concretely, we define the probability path as

$$x_t = \alpha_t (2 x_1 - 1) + (1 - \alpha_t) x_0, \quad (82)$$

with $x_1 \sim p_{\mathrm{data}}$ and $x_0 \sim \mathcal{N}(0, I)$. We used a linear scheduler $\alpha_t = t$, as we found it more effective. Exactly as for Unside, we learn $P^\theta_{1 \mid t}(x_1 \mid x_t)$ and use Equation 4 for denoising.

D.7. Baselines

As explained in the main text, we report baseline results as presented in the corresponding original papers, except for GruM and DiGress on Planar and SBM, for which we reran the experiments using the official repositories. Below, we explain the motivation for this choice and detail the procedure we followed, as we were unable to reproduce the reported results.

D.7.1.
Motivation for Multiple Sampling Runs

On unattributed graph datasets such as Planar and SBM, the test sets are relatively small, and sampling-based metrics exhibit high variance across runs. Moreover, evaluating many models, checkpoints, or seeds increases the risk of overfitting to the test set. Multiple sampling runs are therefore strongly recommended, and the corresponding variability should be assessed via standard deviations.

Several baselines report results from a single sampling run, including GruM, DiGress, and GraphBFN. These reported results even outperform samples drawn directly from the training set, indicating clear overfitting. For this reason, we reran the experiments for GruM and DiGress. Unfortunately, the official GraphBFN repository is currently empty, preventing us from reproducing that baseline.

D.7.2. Procedure for Result Reproduction

We were unable to reproduce the published results for either GruM or DiGress. Below we describe the exact procedure we followed.

GruM. We used the pretrained models available in the official repository: https://github.com/harryjo97/GruM/tree/master/GruM_2D . We performed five sampling runs, varying only the sampling seed, and computed metrics using the code provided by the authors. We note that the default seed yields substantially better performance than the others.

DiGress. Since no pretrained models are currently available, we retrained DiGress using the official repository ( https://github.com/cvignac/DiGress/ ) and the configurations specified in the corresponding YAML files. As specified, we evaluated the model every 400 epochs and computed metrics against the validation set. We increased the sampling size during training to 40 samples for stability.
We trained for 100,000 epochs on Planar and 20,000 epochs on SBM, selecting the checkpoint with the highest validity for final evaluation. Note that we corrected a minor inconsistency in src/datasets/spectre_dataset.py, where preprocessed graphs were added twice to their respective sets (lines 93 and 100).

E. Additional Results

In this section, we provide the following additional results:

• Model sizes and sampling times.
• Results for Enzymes.
• Ablations:
  – NFE.
  – Noise schedule.
  – Comparison between priors.
• Guidance.
• Results with standard deviations.

E.1. Model Sizes and Sampling Times

We report the number of parameters for each model, as well as the wall-clock time required to run 100 sampling iterations for 40 Planar graphs and 40 SBM graphs. All experiments were conducted on a single server with one GPU (NVIDIA GeForce RTX 3090) and 128 CPU cores. Our models are the most computationally efficient. Since Dirichlet Flow Matching (DiriFM) and our UnSiDe share the same denoiser architecture, differences in sampling speed arise solely from the sampling procedure itself. Likewise, our implementation of SID (NM-DD) uses the same architecture as UnSiDe; the additional overhead in NM-DD is primarily due to the computation of extra structural features required by the discrete denoiser.

Table 7. Model Size and Sampling Time.

           Planar             SBM                Qm9H     Zinc250K
           Time        Params Time        Params Params   Params
DiGress    16.9 ± 0.0  8.9M   96.9 ± 0.1  7.1M   3.6M     8.2M
GruM       29.3 ± 0.0  7.1M   84.2 ± 0.0  7.1M   -        8.2M
NM-DD       9.9 ± 0.4  5.4M   84.1 ± 0.3  6.2M   6.2M     6.2M
DiriFM      9.4 ± 0.1  5.4M   89.0 ± 0.3  6.2M   6.2M     6.2M
UnSiDe      6.3 ± 0.0  5.4M   72.6 ± 0.5  6.2M   6.2M     6.2M

NB.
We assume that GruM uses the same architecture for its implementation of DiGress on Zinc250K as for its own model.

Table 8. Results on Enzymes.

          Deg. ↓            Clust. ↓           Orbit ↓           Spect. ↓
GraphVAE  1.369             0.629              0.191             -
GraphRNN  0.017             0.062              0.046             -
GDSS      0.026             0.061              0.009             -
DiscDif   212.663 ± 11.824  631.122 ± 45.654   515.520 ± 54.116  19.607 ± 0.225
NM-DD     13.581 ± 4.119    39.857 ± 5.231     10.368 ± 0.883    17.449 ± 0.020
DiriFM    7.437 ± 2.414     100.648 ± 17.482   6.484 ± 2.690     19.753 ± 0.151
UnSiDe    10.151 ± 4.178    52.036 ± 15.880    2.686 ± 1.094     19.537 ± 0.049

E.2. Ablation

We conduct all ablation studies on Zinc250K, as it is by far the largest dataset and provides the largest test set, yielding more precise and robust evaluation metrics. This dataset is therefore the most sensitive to small effects.

E.2.1. NFE

When ablating the number of function evaluations (NFE), we observe that our model achieves state-of-the-art validity within just 64 denoising steps and reaches state-of-the-art performance on the remaining metrics by 256 steps. This confirms the strong efficiency and overall effectiveness of our approach.

Table 9. Effect of NFE on generation: Zinc250K results.

NFE    Invalid (%)   FCD    NSPDK (10³)
16     5.71          6.54   7.97
32     1.05          4.32   4.00
64     0.36          3.26   2.79
128    0.08          2.63   2.08
256    0.09          2.17   1.59
512    0.01          1.92   1.34
1024   0.02          1.71   0.87

E.2.2. Noise Schedule

We assess the effect of the noise scheduler $\alpha_t$, which parametrizes our probability paths $\mathrm{Dir}(x_t \mid 1 + \alpha_t x_1)$ with $\alpha_t = -a \log(1 - t)$. Concretely, we evaluate our model across a range of values for the hyperparameter $a$. We find that our method is remarkably robust to variations in $a$.
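The Voronoi probability used to calibrate $a$ (the probability that the noisy simplex point already lies in the Voronoi cell of the clean category, i.e., that its argmax is the clean class) can be estimated by simple Monte Carlo. A minimal sketch, with sample counts and seeds chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)

def voronoi_prob(t, a, K, n_samples=20000):
    """Monte Carlo estimate of P(argmax x_t = k) for
    x_t ~ Dir(1 + alpha_t * e_k), with alpha_t = -a * log(1 - t).
    The clean category k is taken to be k = 0 without loss of generality."""
    alpha = -a * np.log(1.0 - t)
    conc = np.ones(K)
    conc[0] += alpha
    x = rng.dirichlet(conc, size=n_samples)
    return (x.argmax(axis=1) == 0).mean()
```

Near $t = 0$ the estimate is close to the chance level $1/K$; as $t \to 1$ it approaches 1, and it does so earlier for larger $a$, matching the behavior described for Figures 3 and 4.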
Figure 4 illustrates the corresponding Voronoi probabilities for each tested scheduler. Since the Voronoi probability depends on the number of categories $K$, we report it separately for node attributes ($K = 9$) and edge attributes ($K = 2$). From Figure 4, we observe that schedulers with low values of $a$ (in particular $a = 1$) induce a probability path that becomes too sharp near the end of the trajectory (i.e., as $t \to 1$), making the denoising task overly difficult at the final steps. Conversely, for large values of $a$, the probability path becomes too flat toward the end of the process, leaving the denoiser with little useful signal to correct.

The results in Table 10 align with these expectations. In particular, when $a = 1$, model performance degrades substantially. For large values of $a$, performance also decreases, but only marginally. Overall, the model appears highly robust to variations in $a$, with all tested values yielding significant improvements over the baselines. We draw two main conclusions from this experiment: (1) the Voronoi-probability analysis provides a reliable tool for calibrating the noise scheduler, and (2) our model is robust to a wide range of scheduler choices.

Figure 4. Voronoi probabilities for various values $a$ of the scheduler $\alpha_t = -a \log(1 - t)$, with $K = 9$ (left) and $K = 2$ (right).

Table 10. Metrics for models with various values $a$ of the scheduler $\alpha_t = -a \log(1 - t)$.

a    Valid (%)      FCD           NSPDK (10³)   Unique          Novel
1    98.87 ± 0.16   4.13 ± 0.08   2.61 ± 0.11   99.86 ± 0.01    99.99 ± 0.01
2    99.89 ± 0.03   1.89 ± 0.01   0.98 ± 0.06   99.98 ± 0.00    99.96 ± 0.01
3    99.98 ± 0.01   1.79 ± 0.03   1.00 ± 0.06   100.00 ± 0.00   99.97 ± 0.01
4    99.94 ± 0.02   1.82 ± 0.05   1.05 ± 0.04   99.99 ± 0.00    99.98 ± 0.01
6    99.92 ± 0.02   1.66 ± 0.03   0.82 ± 0.03   99.99 ± 0.01    99.97 ± 0.02
10   99.91 ± 0.00   1.95 ± 0.03   1.25 ± 0.07   99.99 ± 0.01    99.99 ± 0.00

E.2.3. Prior Choice

We show that the marginal-weighted mixture of Dirichlet priors improves sampling performance slightly but consistently across all metrics.

Figure 5. Comparison between the uniform prior and the marginal-weighted mixture of Dirichlet priors.

E.3. Guidance

The sole purpose of this experiment is to show that our model supports classifier guidance without affecting its ability to generate valid (molecular) graphs.

Table 11. Classifier guidance effect on QED values.

                     Valid   MAE
Unconditional        99.6    1.06
Classifier guidance  99.5    0.42

E.4. Detailed Results

Table 12. Graph generation results on Planar.

            Valid (%)      Degree         Cluster        Orbit            Spectral
Train. set  100.0 ± 0.0    0.25 ± 0.18    36.6 ± 3.5     0.78 ± 0.37      6.62 ± 1.34
DiGress     45.5 ± 7.0     0.71 ± 0.40    45.4 ± 13.6    1.31 ± 0.78      8.40 ± 1.16
GruM        91.0 ± 2.6     0.38 ± 0.15    40.5 ± 4.8     6.42 ± 1.96      7.87 ± 1.17
DeFoG       99.5 ± 1.0     0.5 ± 0.2      50.1 ± 14.9    0.6 ± 0.4        7.2 ± 1.1
SID         91.3 ± 4.1     5.93 ± 1.26    163.4 ± 31.9   19.1 ± 4.14      7.62 ± 1.34
DiscDif     0.0 ± 0.0      56.1 ± 3.70    294.0 ± 3.4    1410.0 ± 44.5    85.1 ± 1.6
NM-DD       98.0 ± 1.0     14.1 ± 2.58    363.0 ± 45.6   27.2 ± 5.6       6.9 ± 0.8
NM-CD       0.0 ± 0.0      1.23 ± 0.50    226.3 ± 8.4    45.77 ± 6.20     21.96 ± 0.74
DiriFM      0.0 ± 0.0      6.44 ± 1.42    196.6 ± 22.5   45.04 ± 5.83     21.76 ± 1.30
UnSiDe      100.0 ± 0.0    0.36 ± 0.24    39.9 ± 8.7     0.78 ± 0.49      7.12 ± 0.45

Table 13. Graph generation results on Stochastic Block Model.

            Valid (%)      Degree (×10³)  Cluster (×10³)  Orbit (×10³)   Spectral (×10³)  Novel (%)
Train. set  93.50 ± 2.00   1.57 ± 0.55    50.11 ± 0.51    37.0 ± 10.7    4.58 ± 0.49      -
DiGress     51.00 ± 9.62   1.28 ± 0.48    51.49 ± 1.31    39.6 ± 7.7     5.04 ± 0.63      100 ± 0.00
GruM        67.54 ± 2.98   2.20 ± 0.76    49.88 ± 0.62    40.4 ± 5.6     5.06 ± 0.72      100 ± 0.00
SID         63.5 ± 3.7     11.5 ± 2.7     51.4 ± 1.5      123. ± 5.4     5.93 ± 1.18      100 ± 0.00
DeFoG       90.0 ± 5.2     0.6 ± 2.3      51.7 ± 1.2      55.6 ± 73.9    5.40 ± 1.20      90.0 ± 5.1
DiscDif     0.00 ± 0.00    1.82 ± 0.88    86.38 ± 6.37    125.7 ± 3.3    11.28 ± 1.06     100 ± 0.00
NM-DD       60.50 ± 4.30   4.38 ± 1.65    50.92 ± 0.89    52.9 ± 6.3     5.70 ± 0.30      100 ± 0.00
NM-CD       57.50 ± 5.40   1.89 ± 0.50    50.29 ± 0.45    41.08 ± 1.00   5.38 ± 1.30      100 ± 0.00
DiriFM      46.00 ± 4.36   4.26 ± 0.54    53.04 ± 1.05    53.0 ± 8.9     5.02 ± 1.14      100 ± 0.00
UnSiDe      78.50 ± 4.64   1.74 ± 0.60    49.94 ± 1.07    52.1 ± 1.1     5.90 ± 1.05      100 ± 0.00

Table 14. Molecule generation results on Qm9H.

            Valid (%)      FCD             NSPDK (10³)     Unique (%)
Train. set  98.90 ± 0.05   0.062 ± 0.002   0.121 ± 0.016   99.81 ± 0.04
DiGress     95.4 ± 1.1     -               -               97.6 ± 0.4
DiscDif     22.29 ± 0.62   4.246 ± 0.454   41.932 ± 0.741  73.72 ± 0.56
NM-DD       97.97 ± 0.07   0.366 ± 0.021   1.149 ± 0.047   95.23 ± 0.17
NM-CD       31.29 ± 0.15   1.688 ± 0.031   22.675 ± 0.099  91.45 ± 0.27
DiriFM      92.20 ± 0.19   0.356 ± 0.029   0.495 ± 0.010   97.53 ± 0.13
UnSiDe      98.87 ± 0.07   0.152 ± 0.011   0.487 ± 0.068   96.33 ± 0.14

Table 15. Molecular generation on Zinc250K results.

            Valid (%)       FCD             NSPDK (10³)    Unique (%)      Novel (%)
Train. set  100.00 ± 0.00   1.128 ± 0.009   0.10 ± 0.00    99.97 ± 0.02    -
DiGress     94.99           3.482           2.1            -               -
GruM        98.65           2.257           1.5            -               -
SID         99.50 ± 0.06    2.06 ± 0.05     2.01 ± 0.01    99.84 ± 0.02    99.97 ± 0.00
DiscDif     74.17 ± 0.65    4.78 ± 0.14     4.08 ± 0.13    100.00 ± 0.00   100.00 ± 0.00
NM-CD       57.96 ± 0.16    9.51 ± 0.04     13.65 ± 0.07   99.92 ± 0.02    99.99 ± 0.00
NM-DD       99.92 ± 0.01    2.65 ± 0.06     3.48 ± 0.06    99.88 ± 0.04    99.95 ± 0.03
DiriFM      97.32 ± 0.12    2.79 ± 0.01     1.92 ± 0.07    99.99 ± 0.01    99.99 ± 0.01
Ours        99.98 ± 0.01    1.79 ± 0.03     1.00 ± 0.06    100.00 ± 0.00   99.97 ± 0.01

F. Visualizations

We present here some visualizations of the generated and reference graphs.

Figure 6. Qm9H: generated molecules and real molecules.

Figure 7. ZINC250K: generated molecules and real molecules.

Figure 8. Planar: generated graphs and real graphs.

Figure 9. Stochastic Block Model: generated graphs and real graphs.