Digital Twin--Driven Adaptive Wavelet Strategy for Efficient 6G Backbone Network Telemetry



Alexandre Barbosa de Lima, Pontifical Catholic University of São Paulo, Brazil, ablima@pucsp.br
Xavier Hesselbach, Universitat Politècnica de Catalunya, Spain, xavier.hesselbach@upc.edu
José Roberto de Almeida Amazonas, University of São Paulo, Brazil, jose.amazonas@usp.br

February 24, 2026

ABSTRACT

Classical orthogonal wavelets guarantee perfect reconstruction but rely on fixed bases optimized for polynomial smoothness, achieving suboptimal compression on signals with fractal spectral signatures. Conversely, learned methods offer adaptivity but typically enforce orthogonality via soft penalties, sacrificing structural guarantees. This work establishes a rigorous equivalence between Multiscale Entanglement Renormalization Ansatz (MERA) tensor networks and paraunitary filter banks. The resulting framework learns adaptive wavelets while enforcing exact orthogonality through manifold-constrained optimization, guaranteeing perfect reconstruction and energy conservation throughout training. Validation on Long-Range Dependent (LRD) network traffic demonstrates that learned filters outperform classical wavelets by 0.5–3.8 dB PSNR on six MAWI backbone traces (2020–2025, 314 Mbps–1.75 Gbps) while preserving the Hurst exponent within estimation uncertainty (|ΔH| ≤ 0.03). These results establish MERA-inspired wavelets as a principled approach for telemetry compression in 6G digital twin synchronization.

Keywords: Digital twin synchronization, adaptive wavelets, semantic telemetry, 6G networks, long-range dependence, paraunitary filter banks, network telemetry compression.

1 Introduction

Network traffic in future 6G systems is expected to exhibit long-range dependence (LRD), characterized by power-law correlation decay: ϕ(k) ∼ k^{−β}, 0 < β < 1 [1–3]. This fractal structure has profound implications for network management: buffer overflow probabilities decay polynomially – not exponentially – with buffer size, fundamentally challenging classical queueing models [4, 5]. For digital twins (DT) driving closed-loop optimization, capturing these self-similar dynamics is not merely a statistical exercise but a prerequisite for stability. As highlighted in surveys on machine learning for networking [6, 7], these dynamics call for adaptive multiscale representations capable of capturing traffic correlations across scales while preserving physical interpretability and robustness.

This work focuses on backbone aggregation telemetry, where traffic from thousands of edge cells converges into high-capacity core links. Statistical aggregation theorems establish that the superposition of heterogeneous sources with heavy-tailed distributions preserves or amplifies LRD at such aggregation scales [8, 9], making backbone traces a natural testbed for validating LRD-preserving transforms. This setting motivates the development of adaptive wavelets that exploit traffic-specific correlation structures while maintaining the mathematical guarantees (perfect reconstruction, energy conservation) required for reliable signal processing.
While 6G networks will encompass heterogeneous access technologies – from millimeter-wave massive MIMO to satellite non-terrestrial networks – wireless edge telemetry, with its distinct statistical properties induced by channel fading and mobility, represents a complementary challenge identified as future work (Section 8).

While the focus of this study is on backbone aggregation telemetry rather than access-level radio traffic, this choice is deliberate. Digital twin synchronization fundamentally depends on preserving the statistical invariants of aggregated traffic flows – most notably LRD – which arise from multiplexing heterogeneous sources and persist independently of the underlying access technology (4G, 5G, or beyond). As such, backbone telemetry provides a technology-agnostic and representative testbed for validating synchronization-preserving compression mechanisms.

The discrete wavelet transform (DWT) remains a cornerstone for multiscale analysis of LRD traffic [10–12]. Conventional orthonormal wavelets provide the mathematical safety net required for control systems, offering perfect reconstruction (PR, i.e., x̂[n] = x[n] with no aliasing or distortion) and Parseval energy conservation. However, they rely on fixed, a priori designed filter banks (e.g., Haar, Daubechies) that cannot adapt to the evolving correlation structure of real network traffic, leading to suboptimal compression. Conversely, recent machine learning approaches introduce data-driven adaptability through neural or statistical representations [6, 7, 13, 14]. Yet these "black-box" methods often relax structural guarantees, introducing approximation errors that can lead to unpredictable behavior under perturbations or resource constraints. This tension between data-driven adaptability and mathematical rigor defines the central challenge for application-aware telemetry: can we learn multiscale representations that remain orthonormal and provably stable while adapting to the statistics of network traffic?

Existing learned transforms generally fall into three categories: (i) unconstrained models that abandon orthogonality for flexibility, losing PR guarantees [15, 16]; (ii) soft-constrained approaches that enforce properties via loss penalties, which hold only approximately and require laborious hyperparameter tuning [17, 18]; or (iii) structural methods that impose conjugate quadrature filter (CQF) constraints [19] but do not guarantee exact orthogonality at intermediate training steps. Such approximations are insufficient for mission-critical DT: Parseval violations corrupt energy budgets, and imperfect reconstruction distorts the traffic's LRD signature, degrading the twin's predictive stability. Lezcano-Casado and Martínez-Rubio [20] do maintain exact orthogonality via exponential parametrization, but their framework targets recurrent neural networks (RNN) for temporal sequence modeling rather than multiscale signal decomposition. Although neural autoencoders achieve impressive rate-distortion on generic signals, they lack the interpretable multiscale structure and LRD-preservation guarantees that network state synchronization requires. This work answers affirmatively by introducing a manifold-constrained optimization scheme in which the framework enforces orthogonality at every training iteration through polar projection onto the orthogonal manifold [21–23], ensuring that PR and Parseval energy conservation hold throughout learning.
The mathematical foundation draws from the multiscale entanglement renormalization ansatz (MERA) [24, 25] – a hierarchical tensor network (TN) from quantum many-body physics, reformulated here as a trainable cascade of local 2 × 2 orthogonal transformations. While the physics literature has treated wavelets as a mathematical analogy when synthesizing quantum states [26, 27], this work inverts the paradigm: by imposing constraints on MERA tensors and interpreting the resulting decomposition as a learnable filter bank, Theorem 1 (Section 5) establishes that MERA layers are mathematically equivalent to two-channel paraunitary filter banks at every decomposition level. This equivalence is not approximate or asymptotic – it holds exactly, enabling a framework that unifies data-driven adaptability with the mathematical rigor necessary for reliable closed-loop operation. With polar projection ensuring orthogonality at every training iteration, the framework guarantees energy preservation and invertibility at all scales while retaining full adaptability to data statistics. When used for compression, it yields interpretable rate-distortion trade-offs by retaining a fraction ρ of coefficients, offering empirical validation of its energy compaction properties.

Having motivated the need for adaptive wavelets with structural guarantees, the main contributions of this work are summarized as follows:

• A formal equivalence between MERA TN and orthonormal paraunitary wavelet filter banks is established (Theorem 1), bridging concepts from quantum many-body theory and multirate signal processing.

• A learning framework operating directly on the Stiefel manifold O(2) (the group of 2 × 2 orthogonal matrices) is introduced, enforcing PR and Parseval energy preservation via polar projection at every iteration. This eliminates the approximation errors inherent to soft-penalty methods [15, 18] and ensures orthogonality throughout training, unlike CQF-based approaches [19] that allow intermediate coefficient drift or exponential parametrizations [20] designed for RNN stability.

• Experimental validation on six real-world backbone traffic traces spanning 2020–2025 (314 Mbps–1.75 Gbps) demonstrates 0.5–3.8 dB Peak Signal-to-Noise Ratio (PSNR) gains over fixed wavelet bases while preserving Hurst exponents within 95% confidence intervals at 90% compression, establishing superior LRD retention.

Roadmap. The remainder of this paper is organized as follows:

• Section 2: DT telemetry compression – requirements, bottleneck analysis, and scope.
• Section 3: Mathematical foundations.
• Section 4: MERA-inspired wavelet architecture.
• Section 5: Equivalence to paraunitary filter banks (Theorem 1).
• Section 6: Learning framework with manifold optimization.
• Section 7: Validation on real backbone traces.
• Section 8: Conclusion and future directions.

2 Digital Twin Synchronization: A Layered Perspective

This section establishes the context and requirements that motivate the proposed framework. Fig. 1 illustrates the layered architecture, highlighting the telemetry compression layer addressed by this work.
2.1 The Network Digital Twin Paradigm

Originally proposed in [28] as a digital representation to support the design and development of manufactured components, the concept of a Digital Twin (DT) has evolved into an essential building block of next-generation networks and services. The DT paradigm refers to the construction of continuously updated digital counterparts capable of mirroring the behavior and state of physical entities (or even purely virtual, or physical-virtual hybrid, entities). DT methodologies have been extended to the domain of communication infrastructures, giving rise to Digital Twin Network (DTN) architectures [29] and Network Digital Twin (NDT) instances [30].

An NDT constitutes a virtualized replica of an Original Network (ON), whether physical or virtual, and remains tightly coupled with it through information exchange that enables near real-time state synchronism. NDTs are therefore able to support advanced functionalities such as online simulation, network design, optimization, and AI-driven control and orchestration mechanisms, which the ON can exploit to enhance operational efficiency and overall performance. Moreover, the flexible and interoperable design of NDTs makes them suitable for deploying new network services.

An NDT does not require dedicated physical equipment to be realized; it can be instantiated either on specific computing resources or through virtualized resources and service infrastructures. Fixed hardware solutions can guarantee an exact replica of the original, but at a cost comparable to the original. In comparison, virtualization-based approaches enable significantly more dynamic control, allow the NDT to be tailored more easily to the requirements, and usually come at a reduced cost.

Network Digital Twins represent an emerging paradigm for 6G network management, where a dynamic virtual replica of the physical infrastructure enables simulation-based optimization, capacity planning, and "what-if" analysis before deploying changes to production systems [31–34]. The DT continuously ingests telemetry data from the physical network – capturing traffic characteristics, queue states, and resource utilization – to maintain synchronization between the virtual model and real-world dynamics [32].

Typically, a single Digital Twin is associated with one Original (a 1:1 relation). However, multiple Digital Twins can also be instantiated in parallel (1:N), forming a DT farm in which each instance focuses on analyzing a different aspect of the system or evaluating an alternative strategy to be compared. A DT farm thus enables the distribution of analytical tasks across several specialized DT instances, each tailored to a particular target. DT farms allow the analysis and comparison of alternative strategies under identical baseline conditions: because all DTs originate from the same synchronized state of the original, their outcomes can be compared without affecting the source, reducing risk and accelerating analysis.

From a DT perspective, backbone telemetry constitutes the dominant synchronization bottleneck, as it aggregates traffic originating from radio, edge, and core domains into a unified stochastic process. As a result, distortions introduced at this layer propagate directly into the virtual model, affecting the fidelity of downstream simulation and optimization tasks.
The effectiveness of a DT hinges on synchronization fidelity: the degree to which the virtual model accurately reflects the statistical and temporal properties of the physical network. This fidelity directly impacts the reliability of simulations used for critical decisions such as congestion control, routing optimization, and service-level agreement (SLA) enforcement.

High-fidelity synchronization requires continuous telemetry ingestion at temporal resolutions sufficient to capture traffic dynamics ranging from millisecond-scale microbursts to hour-scale session patterns. The backbone traces employed in this work (Section 7, Table 1) illustrate typical data volumes: at 1 ms sampling granularity, monitoring hundreds of concurrent links can generate gigabytes of raw telemetry per collection cycle (for example, 1 ms sampling over 15-minute windows yields about 9 × 10^5 samples per trace; at 64-bit precision, a single link generates roughly 7.2 MB per interval, and scaling to 200 links produces about 1.4 GB per cycle [33]). While such overhead is negligible in overprovisioned core networks, it becomes relevant in bandwidth-constrained scenarios such as satellite backhaul or disaggregated RAN fronthaul [34]. Beyond volume reduction, a more fundamental requirement is statistical fidelity: compression schemes must preserve the invariants that govern network performance models – most critically, the LRD structure of traffic (Section 7.1). This motivates the development of adaptive transforms that maintain structural guarantees while exploiting signal-specific correlations.

[Figure 1] Layers, top to bottom: Digital Twin Application Layer (simulation, optimization, what-if analysis); DT Synchronization Layer (state updates, model calibration); Telemetry Compression Layer (this work: MERA-wavelet codec, adaptive, LRD-preserving); Data Collection Layer (in-band network telemetry, streaming telemetry); Physical Network Layer (routers, switches, links, queues). Figure 1: Layered architecture for NDT synchronization. The telemetry compression layer (highlighted) provides the interface between raw network measurements and the virtual model. This work contributes the adaptive MERA-wavelet codec operating at this layer.

2.2 Technical Requirements and Scope Delimitation

This work addresses the telemetry compression layer (Fig. 1) within the broader DT synchronization pipeline. The contribution is a signal processing solution that operates on time-series telemetry streams (e.g., byte-rate samples at 1 ms granularity) and produces compressed representations suitable for transmission to DT infrastructure. The compression method must simultaneously achieve:

1. Rate-distortion efficiency: maximize reconstruction fidelity (PSNR) under bandwidth constraints.
2. Statistical fidelity: preserve the Hurst exponent H within estimator confidence intervals, ensuring that decompressed telemetry retains the LRD structure necessary for accurate queueing analysis.
3. Structural guarantees: provide PR and Parseval energy conservation for predictable, deterministic behavior in mission-critical operations.

Standard Wavelets vs. Learned Approaches. Classical orthogonal wavelets (Haar, Daubechies, Coiflets) satisfy requirement (3) through their paraunitary properties but achieve suboptimal performance on requirements (1) and (2) due to fixed filter designs optimized for polynomial smoothness rather than power-law correlations. Conversely, fully learned neural approaches (e.g., autoencoders) may excel at (1) but sacrifice (3) by relaxing orthogonality constraints, introducing approximation errors incompatible with safety-critical DT applications.
The proposed MERA-inspired adaptive wavelets reconcile this trade-off by learning filter banks from data while maintaining paraunitary guarantees through manifold-constrained optimization (Section 6). By adapting to the specific spectral characteristics of backbone traffic, the method achieves superior energy compaction (up to 3.8 dB PSNR gain over fixed wavelets, Section 7) while preserving H within 95% confidence intervals at 90% compression (Section 7.4).

Architectural Positioning. The codec is DT-agnostic: it interfaces with any simulator or emulator that ingests traffic time series as input – including discrete-event simulators (e.g., NS-3), queueing models, fluid-flow approximations, or hardware-in-the-loop testbeds (e.g., MININET). Rather than prescribing a particular DT architecture, it provides a reusable compression module that preserves the statistical properties required across diverse modeling frameworks.

Validation Strategy. Following standard practice in source coding research – where video codecs are validated using rate-distortion metrics on benchmark datasets without implementing full streaming protocol stacks – the proposed codec is validated using PSNR (rate-distortion efficiency) and Hurst exponent deviation |ΔH| (statistical fidelity) on six years of MAWI backbone traces [35] spanning 314 Mbps–1.75 Gbps with H ∈ [0.77, 0.93]. Baselines include classical orthogonal and biorthogonal wavelets: Haar, Daubechies-4, Coiflet-3, Symmlet-8, and Biorthogonal-4.4. This validation demonstrates that compressed telemetry retains the statistical properties necessary for downstream DT models, independent of specific simulator implementations. Integration with full DT frameworks – including topology modeling, routing protocols, and closed-loop control – represents important future work (Section 8) but falls outside the scope of this signal processing contribution.

3 Mathematical Background

This section establishes the mathematical foundations underlying the proposed framework: orthogonal transformations that preserve signal energy, the MERA TN architecture that organizes these transformations hierarchically, and the Stiefel manifold on which constrained optimization is performed to maintain orthogonality throughout learning.

3.1 Unitary and Orthogonal Transformations

Definition 1 (Unitary Transformation). A linear operator U : H → H on a complex inner product space H is unitary if it preserves inner products: ⟨Ux, Uy⟩ = ⟨x, y⟩ for all x, y ∈ H; equivalently, U†U = I, where † denotes the conjugate transpose (Hermitian adjoint).

Definition 2 (Orthogonal Transformation). For real-valued spaces H = R^n, a matrix U ∈ R^{n×n} is orthogonal if U†U = I. The set of all such matrices forms the orthogonal group O(n) = {U ∈ R^{n×n} | U†U = I}.

Remark 1 (Notational Convention). The dagger symbol A† denotes the conjugate transpose, following conventions in quantum TN [26, 36]. For real matrices, A† = A^T. This notation is retained throughout to emphasize the structural connection to the MERA formalism, while acknowledging that U† = U^T in the real-valued implementation (U_ℓ ∈ O(2)).
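Definitions 1 and 2 can be illustrated numerically. The following minimal Julia sketch (not part of the released implementation; the matrix and tolerances are illustrative) checks that a Haar-type 2 × 2 matrix satisfies U†U = I and preserves inner products and norms:

```julia
using LinearAlgebra

# Haar-type orthogonal matrix (Definition 2): for real U the dagger of
# Remark 1 reduces to the transpose, so U' * U must equal the identity.
U = [1 1.0; 1 -1] / sqrt(2)

@assert isapprox(U' * U, Matrix{Float64}(I, 2, 2); atol=1e-12)   # orthogonality
x, y = randn(2), randn(2)
@assert isapprox(dot(U * x, U * y), dot(x, y); atol=1e-10)        # inner products preserved
@assert isapprox(norm(U * x), norm(x); atol=1e-10)                # energy conservation
```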
Orthogonality ensures three critical properties for adaptive wavelets: (i) energy conservation via the Parseval identity, (ii) PR through U†, and (iii) numerical stability under composition (∥U∥ = 1). These guarantees are maintained throughout optimization via polar projection onto O(2) (Section 6), distinguishing the proposed framework from approaches where orthogonality is imposed only approximately [15, 19].

3.2 MERA Tensor Networks

MERA TN were introduced by Vidal [24] to efficiently represent quantum systems exhibiting scale-invariant correlations with power-law decay – a property that directly parallels LRD in network traffic. MERA organizes computation into hierarchical layers, each applying:

1. Disentanglers: local unitary transformations removing short-range correlations before coarse-graining.
2. Isometries: linear maps satisfying U†U = I that reduce degrees of freedom (typically by a factor of two) while preserving large-scale structure.

This alternating disentangle–coarsen procedure across L layers directly parallels dyadic wavelet decomposition: each MERA layer corresponds to a resolution level, with isometries playing the role of analysis filters. As shown by Reyes and Stoudenmire [37], MERA can learn hierarchical correlations across resolutions, bridging quantum renormalization with deep-learning principles. Section 5 formalizes the equivalence between MERA layers and paraunitary filter banks (Theorem 1), enabling adaptive wavelets with exact PR and energy conservation guarantees.

3.3 The Stiefel Manifold

The orthogonality requirements of paraunitary filter banks frame learning as a constrained optimization problem on smooth manifolds. Specifically, the Stiefel manifold St(n, k) is defined as the set of matrices with orthonormal columns:

$$\mathrm{St}(n, k) = \{ U \in \mathbb{C}^{n \times k} \mid U^{\dagger} U = I_k \}. \qquad (1)$$

Standard optimizers (SGD, Adam) compute updates in ambient Euclidean space. A linear update

$$\tilde{U}_{t+1} = U_t - \eta\, \nabla \mathcal{L}(U_t) \qquad (2)$$

generally violates orthogonality, since the gradient may contain a nonzero component normal to the manifold. To maintain structural integrity, the Euclidean step is followed by a polar retraction [38]:

$$U_{t+1} = \mathcal{R}(\tilde{U}_{t+1}) = \tilde{U}_{t+1}\left(\tilde{U}_{t+1}^{\dagger}\,\tilde{U}_{t+1}\right)^{-1/2}, \qquad (3)$$

which projects onto St(n, k), ensuring exact orthogonality at every iteration (Section 6).

It is emphasized that the architecture considered in this work corresponds to a disentangler-free, tree-structured MERA-inspired network, rather than a full MERA in the strict tensor-network-theoretic sense. This restriction is deliberate, as it preserves the exact paraunitary structure required for perfect reconstruction.
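The polar retraction of Eq. (3) has a direct numerical form. A minimal Julia sketch is given below; it assumes the SVD-based expression of the orthogonal polar factor, which coincides with Eq. (3) for full-rank matrices, and the function name is illustrative rather than taken from the released codebase:

```julia
using LinearAlgebra

# Polar retraction of Eq. (3): project a square matrix onto the orthogonal
# manifold by keeping only the orthogonal polar factor. For full-rank U,
# U * (U'U)^(-1/2) equals W * Vt from the SVD U = W * S * Vt.
function polar_project(U::AbstractMatrix)
    F = svd(U)
    return F.U * F.Vt
end

# Example: a perturbed Haar matrix is pulled back onto O(2).
Ut = [1 1.0; 1 -1] / sqrt(2) .+ 0.05 .* randn(2, 2)
Uproj = polar_project(Ut)
@assert isapprox(Uproj' * Uproj, Matrix{Float64}(I, 2, 2); atol=1e-10)
```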
4 MERA-Inspired Wavelet Architecture

This section introduces a unified framework for adaptive orthonormal wavelets tailored to LRD network traffic. The key idea is to reinterpret the MERA architecture as a structured parameterization of paraunitary filter banks, thereby enabling learnable multiscale representations that remain orthonormal by design. Section 4.1 motivates the need for adaptive multiscale models in the presence of LRD traffic, while Section 4.2 introduces the MERA-inspired orthogonal layers (Definition 3) that provide the architectural foundation.

4.1 Adaptive Multiscale Models for LRD Traffic

An accurate characterization of the Hurst exponent H from traffic measurements is critical for capacity planning, queueing analysis, and DT synchronization in 6G systems. The DWT provides a natural framework for both analyzing and representing LRD signals through multiresolution decomposition, whose hierarchical structure directly mirrors the scale-invariant correlation patterns characteristic of fractal network traffic. Fig. 2 illustrates the Mallat pyramid: at each scale ℓ (ℓ = 1, 2, 3), the signal is recursively split into approximation coefficients a_ℓ (low-pass filter g) and detail coefficients d_ℓ (high-pass filter h), with downsampling by two (↓2) at each stage. This dyadic decomposition not only aligns naturally with the self-similar structure of LRD processes but also enables robust Hurst exponent estimation via wavelet variance scaling [11], making wavelets the standard tool for LRD traffic analysis and compression.

Figure 2: Multiresolution analysis (MRA) illustrating recursive approximation/detail splitting with decimation by two. The input discrete-time signal x_n is successively filtered by the low-pass filter g (scaling function) and high-pass filter h (wavelet function), followed by downsampling by a factor of two. The approximation stream propagates upward through all levels, while the detail streams are extracted at each corresponding scale. At each stage, the signal length is halved (N → N/2 → N/4 → N/8), forming the dyadic tree structure characteristic of the DWT [12].

Despite this structural alignment, the DWT relies on fixed, pre-designed filter banks (Haar, Daubechies, Coiflets, Symmlets) optimized for generic smoothness assumptions (Mallat [12] shows that classical wavelets achieve optimal approximation rates for functions in Besov spaces – those with bounded derivatives admitting local polynomial approximations – an assumption that network traffic violates due to its impulsive, fractal structure). While these bases provide rigorous guarantees, their filters are hand-crafted to maximize vanishing moments, not to capture the power-law correlations ϕ(k) ∼ k^{−β} characteristic of LRD traffic. Consequently, fixed wavelets achieve suboptimal energy compaction on backbone traces: detail coefficients retain significant energy that could be concentrated into approximations, degrading rate-distortion performance under bandwidth constraints. The central challenge is thus to learn wavelet filters adapted to traffic-specific correlation structures while preserving the mathematical guarantees that make wavelets reliable for mission-critical telemetry.

4.2 MERA-Inspired Orthogonal Layers

A hierarchical architecture inspired by MERA TN is introduced to provide a structured parameterization of orthonormal wavelets. The design is guided by three principles: (1) Locality – transformations operate on disjoint pairs; (2) Orthogonality – all operations preserve inner products; and (3) Hierarchy – layers correspond to dyadic scales.

Definition 3 (MERA-Inspired Orthogonal Layer). Let x ∈ R^N be a discrete-time signal with N = 2^L, where N is the number of samples and L is the maximum decomposition level. A MERA layer at scale ℓ applies a 2 × 2 orthogonal matrix U_ℓ ∈ O(2) to disjoint pairs of samples (implicit downsampling by two, ↓2):

$$\begin{bmatrix} a^{(\ell)}_k \\ d^{(\ell)}_k \end{bmatrix} = U_\ell \begin{bmatrix} x_{2k} \\ x_{2k+1} \end{bmatrix}, \qquad k = 0, \ldots, \frac{N}{2^\ell} - 1, \qquad (4)$$

where the outputs a = {a^{(ℓ)}_k} and d = {d^{(ℓ)}_k} have length N/2^ℓ each and are the decimated approximation and detail coefficients, respectively.
Example 1 (MERA Layer Computation). Consider the input x = [1, 2, 3, 4]^⊤ and the orthogonal matrix

$$U_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \quad \text{(Haar)}.$$

Applying Eq. (4):

$$\begin{bmatrix} a^{(1)}_0 \\ d^{(1)}_0 \end{bmatrix} = U_1 \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 3 \\ -1 \end{bmatrix}, \qquad \begin{bmatrix} a^{(1)}_1 \\ d^{(1)}_1 \end{bmatrix} = U_1 \begin{bmatrix} x_2 \\ x_3 \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 3 \\ 4 \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 7 \\ -1 \end{bmatrix},$$

yielding the approximation a = (1/√2)[3, 7]^⊤ and detail d = (1/√2)[−1, −1]^⊤. Energy conservation: ∥x∥² = 30 = ∥a∥² + ∥d∥² = 29 + 1 = 30. ✓

Orthogonality U†_ℓ U_ℓ = I (Definition 2) ensures (i) local pairwise energy conservation a²_k + d²_k = x²_{2k} + x²_{2k+1}; (ii) the Parseval identity ∥x∥² = ∥a^{(L)}∥² + Σ_{ℓ=1}^{L} ∥d^{(ℓ)}∥²; and (iii) PR via U†_ℓ. A complete L-level (L = 4) cascade (Fig. 3) applies these layers recursively, producing {a^{(L)}, d^{(L)}, ..., d^{(1)}}. The resulting analysis operator A ∈ O(N) inherits all guarantees with O(N) complexity via decimation, matching fast wavelet transforms.

Figure 3: MERA-inspired wavelet circuit with four dyadic levels (L = 4). The input signal samples x_1, ..., x_16 feed the first layer, which consists of parallel 2 × 2 orthogonal blocks U_1 acting on disjoint pairs. At each level, the approximation outputs a^{(ℓ)} propagate upward through the hierarchy, while the detail outputs d^{(ℓ)} are extracted at their respective scales. This hierarchical structure parallels the DWT (Fig. 2) but employs learnable transformation blocks U_ℓ, ℓ = 1, 2, 3, 4.

Summary. This section established the MERA-inspired architecture for adaptive wavelets. Section 4.1 motivated the need for adaptive multiscale models tailored to LRD traffic. Section 4.2 introduced the orthogonal layer structure (Definition 3) that provides hierarchical decomposition while preserving energy conservation; a numerical sketch of the full cascade is given below. Section 5 establishes the exact mathematical equivalence between these layers and classical paraunitary filter banks.
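As a concrete illustration of Definition 3 and the L-level cascade, the following minimal Julia sketch runs a two-level Haar-initialized analysis on the signal of Example 1 and checks the Parseval identity and perfect reconstruction. It is illustrative only; function names are not those of the released implementation.

```julia
using LinearAlgebra

# One MERA layer per level (Eq. (4)): the same 2x2 orthogonal block acts on
# disjoint pairs; columns of reshape(a, 2, :) are the pairs (x_{2k}, x_{2k+1}).
function mera_analyze(x::Vector{Float64}, Us::Vector{Matrix{Float64}})
    a, details = copy(x), Vector{Vector{Float64}}()
    for U in Us
        y = U * reshape(a, 2, :)
        a = y[1, :]                       # approximation, length halved
        push!(details, y[2, :])           # detail coefficients at this scale
    end
    return a, details
end

# Synthesis: invert each level with the transpose of the orthogonal block.
function mera_synthesize(a::Vector{Float64}, details, Us)
    for l in length(Us):-1:1
        a = vec(Us[l]' * vcat(a', details[l]'))
    end
    return a
end

x  = [1.0, 2.0, 3.0, 4.0]
Us = [[1 1.0; 1 -1] / sqrt(2) for _ in 1:2]     # Haar blocks, L = 2 levels
a, ds = mera_analyze(x, Us)

@assert isapprox(sum(abs2, x), sum(abs2, a) + sum(sum(abs2, d) for d in ds))  # Parseval
@assert isapprox(mera_synthesize(a, ds, Us), x)                               # PR
```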
5 Equivalence to Paraunitary Filter Banks

This section establishes the exact equivalence between MERA-inspired layers and two-channel paraunitary wavelet filter banks. Section 5.1 introduces the polyphase theory background necessary for this equivalence. Section 5.2 presents the main result (Theorem 1), demonstrating that MERA layers with constant orthogonal matrices are mathematically equivalent to two-tap paraunitary filter banks. Section 5.3 formulates the manifold-constrained learning objective. Together, these results provide the theoretical foundation upon which the variational learning and optimization procedures developed in subsequent sections are built.

5.1 Polyphase Theory Background

Definition 4 (Paraunitary Filter Bank). A two-channel filter bank with polyphase matrix

$$E(z) = \begin{bmatrix} G_0(z) & G_1(z) \\ H_0(z) & H_1(z) \end{bmatrix} \qquad (5)$$

is said to be paraunitary if

$$E(z)\,E^{\dagger}(z^{-1}) = I. \qquad (6)$$

This condition ensures:

1. PR: the synthesis filters G̃(z) ≜ G(z^{−1}) and H̃(z) ≜ H(z^{−1}) satisfy x̂(z) = x(z);
2. Parseval energy conservation: ∥x∥² = ∥a^{(L)}∥² + Σ_{ℓ=1}^{L} ∥d^{(ℓ)}∥²;
3. Frequency-domain power complementarity: |G(ω)|² + |H(ω)|² = 2.

In multirate signal processing (Fig. 4), PR requires the polyphase matrix (5) to satisfy the paraunitary condition (6) [39].

Figure 4: Two-channel CQF system. (a): The analysis stage splits the input a_{ℓ−1}(m) via the filters g(−m), h(−m) and downsampling by a factor of two (↓2), producing a_ℓ(n) and d_ℓ(n). (b): The synthesis stage upsamples by a factor of two (↑2), filters via g(n), h(n), and sums to reconstruct a_{ℓ−1}(m). PR is characterized in the polyphase domain by a paraunitary matrix, a condition that will be guaranteed by orthogonal U_ℓ (Theorem 1).

Intuition: A MERA-inspired layer applies the same 2 × 2 orthogonal matrix U_ℓ to disjoint pairs of samples – that is, it intrinsically operates in the polyphase domain, where filtering and decimation collapse into a single matrix multiplication. This pairwise block transform is equivalent to a two-channel paraunitary filter bank with two-tap finite impulse response (FIR) analysis filters. The special case where U_ℓ exhibits quadrature mirror filter (QMF) structure, combined with maximum DC gain, uniquely yields the Haar wavelet. (Strictly speaking, the classical QMF structure H(z) = G(−z) cannot simultaneously achieve PR and linear phase with FIR filters, except for the trivial Haar case, as proven by Vaidyanathan [39]. The filters employed in this work belong to the class of CQF introduced by Smith and Barnwell [40], also referred to as paraunitary QMF banks by Vaidyanathan.)

Polyphase Decomposition. In multirate filter bank theory, the type-1 polyphase decomposition represents a filter G(z) = Σ_n g[n] z^{−n} by separating its even- and odd-indexed coefficients. Following the notation of Vaidyanathan [39], G(z) denotes the full analysis filter, while G_0(z) and G_1(z) denote its even and odd polyphase components, respectively, defined through

$$G(z) = G_0(z^2) + z^{-1} G_1(z^2), \qquad (7)$$

with G_0(z) = Σ_k g[2k] z^{−k} and G_1(z) = Σ_k g[2k+1] z^{−k}.

For a 2 × 2 polyphase matrix, the entries admit two equivalent representations. The first expresses the matrix directly in terms of the polyphase components of the analysis filters,

$$E(z) = \begin{bmatrix} G_0(z) & G_1(z) \\ H_0(z) & H_1(z) \end{bmatrix}, \qquad (8)$$

whereas an alternative, more structural representation employs generic polyphase entries,

$$E(z) = \begin{bmatrix} E_{00}(z) & E_{01}(z) \\ E_{10}(z) & E_{11}(z) \end{bmatrix}. \qquad (9)$$

Although (8) and (9) are algebraically equivalent, the generic notation in (9) emphasizes the polyphase matrix as the primary structural object. The correspondence between the two notations is given by E_{00}(z) ≡ G_0(z), E_{01}(z) ≡ G_1(z), E_{10}(z) ≡ H_0(z), and E_{11}(z) ≡ H_1(z). Applying the polyphase decomposition (7) to both analysis filters yields:

$$G(z) = E_{00}(z^2) + z^{-1} E_{01}(z^2) = G_0(z^2) + z^{-1} G_1(z^2), \qquad (10)$$
$$H(z) = E_{10}(z^2) + z^{-1} E_{11}(z^2) = H_0(z^2) + z^{-1} H_1(z^2). \qquad (11)$$

The usefulness of this representation follows from the Noble identities [39], which establish that filtering by H(z²) followed by decimation by 2 (↓2) is equivalent to decimation by 2 (↓2) followed by filtering by H(z). This commutation property allows the analysis outputs A(z) and D(z) to be computed directly in the decimated (polyphase) domain:

$$\begin{bmatrix} A(z) \\ D(z) \end{bmatrix} = E(z) \begin{bmatrix} X_0(z) \\ X_1(z) \end{bmatrix}, \qquad (12)$$

where X_0(z) and X_1(z) denote the even and odd polyphase components of the input X(z) = X_0(z²) + z^{−1} X_1(z²). When E(z) ≡ U is constant (i.e., z-independent), so that all polyphase entries satisfy E_{ij}(z) ≡ u_{ij}, equations (10)–(11) reduce to length-2 FIR analysis filters,

$$G(z) = u_{00} + u_{01} z^{-1}, \qquad H(z) = u_{10} + u_{11} z^{-1}. \qquad (13)$$

This constant-polyphase structure is precisely the one induced by MERA layers, as demonstrated next.
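Before stating Theorem 1 formally, the power complementarity implied by Definition 4 can be checked numerically for the two-tap filters of Eq. (13). The Julia sketch below uses an arbitrary rotation matrix as a stand-in for a learned U_ℓ (the angle is illustrative, not a learned filter):

```julia
# For a constant polyphase matrix E(z) ≡ U ∈ O(2), the two-tap filters of
# Eq. (13) satisfy |G(ω)|² + |H(ω)|² = 2 at every frequency (Definition 4, item 3).
θ = 0.3
U = [cos(θ) sin(θ); -sin(θ) cos(θ)]            # any U ∈ O(2)
G(ω) = U[1, 1] + U[1, 2] * cis(-ω)              # G(z) = u00 + u01 z⁻¹ at z = e^{jω}
H(ω) = U[2, 1] + U[2, 2] * cis(-ω)              # H(z) = u10 + u11 z⁻¹

for ω in range(0, 2π; length=64)
    @assert isapprox(abs2(G(ω)) + abs2(H(ω)), 2.0; atol=1e-12)
end
```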
5.2 Main Equivalence Result

Theorem 1 (Architectural Equivalence). A MERA-inspired layer (Definition 3) is equivalent to a two-channel paraunitary filter bank whose polyphase representation is a constant orthonormal matrix E(z) ≡ U_ℓ:

$$\begin{bmatrix} A(z) \\ D(z) \end{bmatrix} = \begin{bmatrix} g_0 & g_1 \\ h_0 & h_1 \end{bmatrix} \begin{bmatrix} X_0(z) \\ X_1(z) \end{bmatrix}, \qquad (14)$$

where z = e^{jω}, X_0(z) = Σ_k x_{2k} z^{−k} and X_1(z) = Σ_k x_{2k+1} z^{−k} are the even/odd polyphase components, and U_ℓ = [g_0 g_1; h_0 h_1] ∈ O(2). The proof is given in Appendix A.

Corollary 1 (PR QMF as a special case). The PR QMF constraint h[n] = (−1)^n g[N − 1 − n] (N = 2 for a two-tap FIR filter) [12, 39] arises as a special case of the framework when additional symmetry is imposed. This condition ensures that the highpass filter H(z) is derived from the lowpass G(z) through frequency reversal, thereby guaranteeing both orthogonality and alias cancellation. Under the assumptions of Theorem 1, imposing the quadrature-mirror symmetry with N = 2 gives

$$h[n] = (-1)^n g[1 - n] \iff H(z) = -z^{-1} G(-z^{-1}). \qquad (15)$$

For two-tap FIR analysis filters G(z) = g_0 + g_1 z^{−1} and H(z) = h_0 + h_1 z^{−1}, the QMF relation yields

$$h_0 = g_1, \qquad h_1 = -g_0, \qquad (16)$$

so that the polyphase matrix takes the form

$$U_\ell = \begin{bmatrix} g_0 & g_1 \\ g_1 & -g_0 \end{bmatrix}. \qquad (17)$$

Furthermore, the Haar wavelet is the unique real two-tap FIR filter bank satisfying PR, QMF paraunitarity, and maximal DC gain (equivalently, g_0 = g_1); see Appendix B for the proof.

5.3 Manifold-Constrained Learning Objective

This subsection formulates the learning problem associated with the MERA-inspired paraunitary architecture. The optimization objective is to learn scale-dependent orthogonal transformations that maximize energy compaction in LRD traffic while guaranteeing PR and Parseval energy conservation.

Let θ = {U_ℓ}_{ℓ=1}^{L} denote the learnable parameters, where each U_ℓ ∈ O(2). For an input signal x, the analysis transform A_θ produces the coefficient set {a^{(L)}, d^{(1)}, ..., d^{(L)}}. Signal reconstruction is given by x̂ = S_θ(A_θ(x)), where S_θ = A†_θ. The loss function L({U_ℓ}) promotes sparse multiscale representations by concentrating signal energy into a small number of approximation coefficients while penalizing the aggregate magnitude of detail coefficients across all scales:

$$\min_{\{U_\ell \in O(2)\}} \; \mathcal{L} = \underbrace{\lambda_{\mathrm{sparse}} \sum_{\ell=1}^{L} \frac{1}{N_\ell}\, \big\| d^{(\ell)} \big\|_1}_{\text{sparsity term}} \;+\; \underbrace{\frac{\lambda_{\mathrm{MSE}}}{N}\, \big\| x - \hat{x} \big\|_2^2}_{\text{reconstruction term (MSE)}} \qquad (18)$$

where N_ℓ ≜ card(d^{(ℓ)}) = N/2^ℓ denotes the number of detail coefficients at scale ℓ.

Loss function terms. The sparsity term (ℓ1 norm) promotes energy compaction into few large-magnitude coefficients. The reconstruction term (MSE) penalizes mismatch between the input signal and its reconstruction from the full coefficient set. This term is optional: when λ_MSE = 0, training minimizes only the sparsity objective Σ_ℓ ∥d^{(ℓ)}∥_1. Since orthogonality is enforced by projection after each update, PR is guaranteed regardless of whether the MSE term is active.

Normalization. The sparsity term is normalized by N_ℓ to ensure that λ_sparse remains invariant with respect to changes in the decomposition depth L and the window size N. Without this normalization, the effective weight of the sparsity penalty would decrease as decimation reduces the number of coefficients at coarser scales, requiring λ_sparse to be retuned for different configurations.
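A minimal Julia sketch of Eq. (18), taking the coefficient set and the reconstruction as inputs, is shown below. The function name, argument layout, and toy values are illustrative assumptions, not the released implementation:

```julia
# Eq. (18) as a function of the detail set {d^(1..L)}, the input x, and its
# reconstruction xhat. λ_mse = 0 reproduces the sparsity-only objective used
# in the experiments (Table 2).
function mera_loss(ds::Vector{Vector{Float64}}, x, xhat; λ_sparse=1.0, λ_mse=0.0)
    Lsp  = sum(sum(abs, d) / length(d) for d in ds)   # per-scale normalized l1 term
    Lmse = sum(abs2, x .- xhat) / length(x)           # mean squared reconstruction error
    return λ_sparse * Lsp + λ_mse * Lmse
end

# Toy example with two dummy detail vectors (illustrative values only):
ds = [[0.5, -0.25], [1.0]]
x  = [1.0, 2.0, 3.0, 4.0]
mera_loss(ds, x, x)          # = 0.75/2 + 1.0/1 = 1.375 (MSE term disabled)
```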
Manifold Constraint. The constraint U_ℓ ∈ O(2) defines a smooth Riemannian manifold. This formulation differs from unconstrained empirical risk minimization, as orthogonality is enforced structurally via polar projection onto the orthogonal group (Section 6) rather than through soft penalty terms. As a result, perfect reconstruction and Parseval energy conservation are satisfied by construction at every training iteration.

Summary. This section established the exact equivalence between MERA-inspired layers and paraunitary filter banks. Section 5.1 introduced the polyphase theory framework, showing how two-channel filter banks operate in the decimated domain via constant polyphase matrices. Section 5.2 proved the main result (Theorem 1): MERA layers with orthogonal matrices U_ℓ ∈ O(2) are mathematically equivalent to two-tap paraunitary filter banks, inheriting perfect reconstruction and energy conservation guarantees. Corollary 1 showed that the Haar wavelet arises as the unique QMF filter maximizing DC gain. Section 5.3 formulated the manifold-constrained learning objective, which promotes sparsity while enforcing orthogonality via polar projection at every training iteration. Section 6 presents the optimization algorithm; Section 7 validates performance on real network traces.

6 Learning Framework

Having established in Section 5 that MERA-inspired layers form paraunitary filter banks, the practical optimization pipeline is described next (the implementation is inspired by the MERA Julia code example released by Evenbly [41]). This pipeline learns the scale isometries {U_ℓ}_{ℓ=1}^{L} directly from data while preserving PR, energy conservation, and numerical stability.

6.1 Optimization Pipeline

The core learning procedure, detailed in Algorithm 1, implements a variational loop that optimizes the MERA-inspired filter banks on windowed traffic segments. The algorithm requires a real-valued signal window x ∈ R^N, the number of decomposition levels L, and non-negative loss weights λ_sparse and λ_MSE. The output is a collection of scale isometries U = {U_1, ..., U_L} constrained to remain orthonormal throughout training.

Algorithm 1 MERA-Wavelet Optimization
Require: x, L, numiter, η, λ_sparse, λ_MSE, U_0 (optional)
 1: U ← U_0 if provided; otherwise initialize randomly and project onto O(2)
 2: for k = 1 to numiter do
 3:   (a, {d^{(ℓ)}}_{ℓ=1}^{L}) ← MERA-ANALYZE(x, U)              ▷ Forward transform
 4:   L_sparse ← (1/n_d) Σ_{ℓ=1}^{L} ∥d^{(ℓ)}∥_1                 ▷ Mean ℓ1 norm
 5:   if λ_MSE > 0 then
 6:     x̂ ← MERA-SYNTHESIZE(a, {d^{(ℓ)}}, U)
 7:     L_MSE ← (1/N) ∥x̂ − x∥²_2
 8:   else
 9:     L_MSE ← 0
10:   end if
11:   L ← λ_sparse · L_sparse + λ_MSE · L_MSE
12:   ∇_U ← BACKPROPAGATE(L, U, x)                               ▷ Euclidean gradient
13:   U ← ADAM-STEP(U, ∇_U, η)                                    ▷ Update in R^{2×2}
14:   for ℓ = 1 to L do
15:     U_ℓ ← U_ℓ (U†_ℓ U_ℓ)^{−1/2}                               ▷ Polar projection onto O(2)
16:   end for
17: end for
18: return U

Each iteration proceeds in three phases (a simplified end-to-end sketch follows the list):

1. Forward analysis (line 3): decompose x into multiscale coefficients {a^{(L)}, d^{(1)}, ..., d^{(L)}}.
2. Loss evaluation (lines 4–11): compute the composite objective combining sparsity promotion and (optionally) reconstruction fidelity.
3. Constrained update (lines 12–15): apply an Adam gradient step followed by polar projection to restore orthogonality.
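The following Julia sketch mirrors the structure of Algorithm 1 under simplifying assumptions: gradients are obtained by central finite differences on each matrix entry (standing in for the automatic differentiation and Adam update of lines 12–13), levels are updated one at a time, and only the sparsity term of Eq. (18) is used. It is an illustration, not the released implementation.

```julia
using LinearAlgebra

function analyze(x, Us)                         # Eq. (4) applied level by level
    a, ds = copy(x), Vector{Vector{Float64}}()
    for U in Us
        y = U * reshape(a, 2, :)
        a = y[1, :]; push!(ds, y[2, :])
    end
    return a, ds
end

# Sparsity-only objective (λ_MSE = 0), normalized per scale as in Eq. (18).
sparsity_loss(x, Us) = sum(sum(abs, d) / length(d) for d in analyze(x, Us)[2])

polar(U) = (F = svd(U); F.U * F.Vt)             # nearest orthogonal matrix, Eq. (3)

function train!(Us, x; iters=100, η=5e-3, h=1e-6)
    for _ in 1:iters
        for U in Us                             # update one level at a time
            g = zero(U)
            for i in eachindex(U)               # finite-difference Euclidean gradient
                u0 = U[i]
                U[i] = u0 + h; fp = sparsity_loss(x, Us)
                U[i] = u0 - h; fm = sparsity_loss(x, Us)
                U[i] = u0
                g[i] = (fp - fm) / (2h)
            end
            U .= polar(U .- η .* g)             # Euclidean step + polar projection
        end
    end
    return Us
end

# Haar warm start on a toy window of length 2^3.
Us = [[1 1.0; 1 -1] / sqrt(2) for _ in 1:3]
x  = abs.(randn(8))
train!(Us, x)
@assert all(isapprox(U' * U, Matrix{Float64}(I, 2, 2); atol=1e-8) for U in Us)
```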
Gradient flow and manifold projection. Line 12 computes the Euclidean gradient ∇_U L in the ambient space R^{2×2} via automatic differentiation (AD). Line 13 applies Adam [42] with learning rate η. This Euclidean step violates the orthogonality constraint U†_ℓ U_ℓ = I. Line 15 restores the constraint via polar projection: for each U_ℓ, the nearest orthogonal matrix in Frobenius norm is U_ℓ (U†_ℓ U_ℓ)^{−1/2}. By enforcing U_ℓ ∈ O(2) at every iteration, the algorithm guarantees that paraunitarity, the Parseval identity, and PR hold throughout training – not merely as soft approximations.

Loss function. The sparsity term L_sparse (line 4) promotes energy compaction by penalizing the mean absolute value of detail coefficients. The normalization factor n_d denotes the total number of detail coefficients across all scales, ensuring that the effective weight of the sparsity penalty remains invariant with respect to the decomposition depth L and the signal length N.

Gradient-Based Optimization. The framework employs AD to compute gradients ∇_U L with respect to the filter parameters. Gradients are computed via the chain rule applied through the computational graph of the MERA transform – a process commonly termed backpropagation in the machine learning literature [43]. The Adam optimizer [42] adapts learning rates per parameter using exponential moving averages of the first (m) and second (v) gradient moments:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\,\nabla_U \mathcal{L}, \qquad (19)$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\,(\nabla_U \mathcal{L})^2, \qquad (20)$$
$$U_t = U_{t-1} - \eta\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}, \qquad (21)$$

where m̂_t and v̂_t are bias-corrected estimates. This adaptive scheme provides faster convergence and reduced sensitivity to hyperparameter selection compared to vanilla stochastic gradient descent. The default parameters (β_1 = 0.9, β_2 = 0.999, ϵ = 10^{−8}) are used throughout (Table 2).

6.2 Initialization

The framework adopts Haar warm-start initialization: each U_ℓ is set to

$$U_{\mathrm{Haar}} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$

This provides coarse energy compaction from the outset, and subsequent Adam updates refine this prior to match trace-specific correlations. Empirically, Haar initialization accelerates convergence compared to random starting points while achieving identical final performance. For random initialization, each U_ℓ can be drawn from a Gaussian distribution and immediately projected onto O(2), providing a neutral baseline that does not bias the learned filters toward any wavelet family.

6.3 Computational Complexity

The proposed framework preserves the efficiency of classical wavelet transforms:

Inference. Analysis and synthesis have complexity O(N), identical to the DWT. Each level ℓ processes N/2^ℓ samples with constant-cost 2 × 2 operations.

Training. Each iteration requires O(N) for the forward/backward passes. Polar projection operates on 2 × 2 matrices with O(1) cost per level, contributing O(L) = O(log N) overhead. The total training cost is O(T · N) for T iterations.

Parameter efficiency. The learned transform requires only 4L scalar parameters (one 2 × 2 orthogonal matrix per level). For L = 5, this amounts to 20 trainable parameters – enabling rapid adaptation without risk of overfitting.

Summary. This section presented the MERA-inspired wavelet learning framework. Algorithm 1 integrates AD, Adam optimization, and manifold-constrained projection into a unified pipeline. Section 7 validates the framework on six years of backbone traffic, demonstrating that learned orthonormal wavelets adapt to traffic-specific correlation structures while preserving the mathematical guarantees essential for 6G telemetry.
7 Experimental Results

This section presents an empirical validation of the proposed adaptive MERA-inspired wavelet framework. The evaluation assesses whether learned orthonormal wavelets simultaneously achieve improved rate-distortion performance and preserve the LRD properties critical for DT synchronization.

7.1 The LRD Preservation Requirement

As mentioned in Section 1, backbone traffic exhibits LRD characterized by power-law autocorrelation decay:

$$\phi(k) \sim k^{-\beta}, \qquad 0 < \beta < 1, \qquad (22)$$

where the decay exponent β relates to the Hurst parameter H ∈ (0.5, 1) via β = 2 − 2H. This slow correlation decay has profound implications for the network models commonly employed in DT frameworks:

Queueing Analysis. For finite buffers of size B packets, the overflow probability under LRD input decays polynomially rather than exponentially [4, 5]:

$$P(\mathrm{overflow}) \sim B^{-(2-2H)} = B^{-\beta}. \qquad (23)$$

In contrast, Markovian (memoryless) models predict P(overflow) ∼ e^{−λB}. This disparity leads to 10²–10³× errors in buffer dimensioning when H is underestimated, fundamentally altering capacity provisioning rules for ultra-reliable low-latency communications (URLLC) in 6G systems.

Capacity Planning. The effective bandwidth required to meet target loss rates scales differently under LRD traffic compared to Poisson or exponential models [1]. DT-driven optimization algorithms that rely on traffic statistics as input parameters will generate invalid provisioning decisions if the telemetry compression distorts H.

Consequence. If telemetry compression degrades the LRD signature (e.g., by attenuating long-timescale correlations), the DT's predictions become statistically inconsistent with the physical network. This can lead to mis-provisioning, SLA violations, or instability in closed-loop control scenarios.
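To make the contrast in Eq. (23) concrete, both scaling laws can be inverted for a target overflow probability ε. This is a back-of-the-envelope derivation: proportionality constants are omitted, so only the scaling behavior (not absolute buffer sizes) is meaningful.

```latex
% LRD input:       P(overflow) ~ B^{-(2-2H)}  =>  B(\varepsilon) ~ \varepsilon^{-1/(2-2H)}
% Markovian input: P(overflow) ~ e^{-\lambda B} =>  B(\varepsilon) ~ (1/\lambda)\ln(1/\varepsilon)
%
% Tightening the target from \varepsilon to \varepsilon/10 multiplies the LRD buffer
% requirement by 10^{1/(2-2H)} (about 10^{4.2} for H = 0.88), whereas the Markovian
% requirement only grows by the additive term (\ln 10)/\lambda.
B_{\mathrm{LRD}}(\varepsilon) \;\propto\; \varepsilon^{-\frac{1}{2-2H}},
\qquad
B_{\mathrm{Markov}}(\varepsilon) \;\propto\; \frac{1}{\lambda}\,\ln\frac{1}{\varepsilon}.
```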
7.2 Experimental Setup

7.2.1 Dataset and Preprocessing

The evaluation utilizes trans-Pacific backbone traces from the MAWI (Measurement and Analysis on the WIDE Internet) Working Group Traffic Archive [35]. Six captures spanning 2020–2025 (Samplepoint-F) were selected to represent heterogeneous operating conditions, with traffic loads ranging from 314 Mbps to 1.75 Gbps. Packet-level metadata were aggregated into byte-per-millisecond time series. Table 1 summarizes the characteristics of the traces.

At the time of this study, publicly available, large-scale, millisecond-resolution traffic traces from operational 5G or beyond-5G networks are not available. This limitation is widely acknowledged in the literature. Consequently, this work validates the proposed framework on backbone aggregation traces, which capture the emergent statistical properties – particularly LRD – that digital twins must preserve for stable closed-loop optimization.

Table 1: MAWI trace characteristics (Samplepoint-F, 15-min captures).

Trace          Duration   Packets   Avg. rate
202004081229   900 s      81 M      314 Mbps
202103181400   900 s      86 M      416 Mbps
202204131100   900 s      119 M     769 Mbps
202301131400   900 s      108 M     776 Mbps
202406192000   900 s      194 M     1.75 Gbps
202504090300   900 s      126 M     885 Mbps

Scope limitation: These backbone traces capture aggregated traffic from thousands of sources, exhibiting the LRD structure characteristic of statistical multiplexing. The framework's performance on wireless edge telemetry – where individual user dynamics, channel fading, and mobility introduce distinct correlation structures – is deferred to future investigation (Section 8).

7.2.2 Training Configuration

Experiments employ a two-stage training schedule optimizing MERA-wavelet parameters on 1024-sample non-overlapping windows. The optimization utilizes the Adam solver with a sparsity-driven objective (λ_sparse = 1.0, λ_MSE = 0), ensuring that the learned filters prioritize energy compaction into approximation coefficients. Table 2 details the complete hyperparameter configuration.

Reproducibility: All experiments ran on a single Apple M3 Pro laptop using Julia 1.11 with CPU-only execution. Random seed 12345 ensures deterministic initialization. The complete codebase, including hyperparameter configuration files, training scripts, and learned filters, is available at https://github.com/alexandreblima/MERA-wavelets.

7.2.3 Baselines

Performance is compared against fixed wavelet bases: Haar (length-2), Daubechies-4 (db4), Coiflet-3, Symmlet-8, and Biorthogonal 4.4. These baselines isolate the benefits of data-driven adaptivity under strict paraunitary constraints.

[Figure 5] Panels (a)–(f) plot ΔPSNR (dB) versus retention ratio ρ for the six traces: (a) 2020 (314 Mbps, H = 0.89), (b) 2021 (416 Mbps, H = 0.77), (c) 2022 (769 Mbps, H = 0.93), (d) 2023 (776 Mbps, H = 0.86), (e) 2024 (1.75 Gbps, H = 0.88), (f) 2025 (885 Mbps, H = 0.83); each panel compares MERA against Haar, DB4, Coiflet-3, Symmlet-8, and Biorthogonal-4.4. Figure 5: PSNR gains of MERA-learned wavelets over fixed baselines as a function of retention ratio ρ. A retention ratio of ρ = 0.1 corresponds to 90% compression (retaining only 10% of coefficients by magnitude). The learned filters consistently outperform classical wavelets across all compression levels and traffic conditions.

7.2.4 Evaluation Metrics

Reconstruction fidelity is quantified using PSNR:

$$\mathrm{PSNR} = 10 \cdot \log_{10}\!\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right), \qquad \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{x}_i)^2, \qquad (24)$$

where MAX_I is the peak magnitude of the window. Statistical fidelity is assessed by the preservation of the Hurst exponent (H), estimated via Abry–Veitch wavelet regression [11]. The error metric is ΔH = H_compressed − H_orig.
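A minimal Julia sketch of the PSNR metric of Eq. (24) for one window is shown below. It assumes MAX_I is taken as the peak magnitude of the original window, as stated above; the function name is illustrative.

```julia
# PSNR of Eq. (24) for a single window.
function psnr(x::Vector{Float64}, xhat::Vector{Float64})
    mse  = sum(abs2, x .- xhat) / length(x)
    maxi = maximum(abs, x)                 # MAX_I: peak magnitude of the window
    return 10 * log10(maxi^2 / mse)
end
```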
Table 2: MERA-Wavelet training hyperparameters for all experiments (Section 7).

Parameter                 Value                       Justification
Architecture
  Decomposition levels    L = 5                       Captures scales 2^1–2^5 (2–32 ms)
  Initialization          Haar warm-start             Leverages wavelet prior
Optimization
  Total iterations        100 (50 + 50)               Two-stage schedule
  Stage 1 learning rate   η_1 = 5 × 10^−3             Coarse adaptation
  Stage 2 learning rate   η_2 = 2.5 × 10^−3           Fine-tuning (halved η_1)
  Adam parameters         β_1 = 0.9, β_2 = 0.999      Standard defaults
  Adam epsilon            ϵ = 10^−8                   Numerical stability
Loss function
  Sparsity weight         λ_sparse = 1.0              ℓ1 penalty on detail coefficients
  MSE weight              λ_MSE = 0.0                 Disabled (no improvement observed)
Data processing
  Window size             1024 samples                Power-of-two for dyadic DWT
  Window stride           1024 samples                Non-overlapping windows
  Retention ratios        ρ ∈ {0.01, ..., 0.80}       Rate-distortion evaluation
Implementation
  Random seed             12345                       Reproducibility
  Parametrization         MERA (polar proj.)          Algorithm 1, Section 6
  Hardware                Apple M3 Pro                CPU-only execution

7.3 Compression Performance

After training, the learned filters are evaluated under varying bandwidth constraints by retaining only a fraction ρ ∈ (0, 1] of the wavelet coefficients ranked by magnitude. Specifically, given the full coefficient vector c = [a^{(L)}, d^{(L)}, ..., d^{(1)}], the compressed representation retains the ⌈ρ · |c|⌉ coefficients with largest absolute values, setting the remainder to zero. Reconstruction is then performed via the inverse MERA transform S_θ. This coefficient-thresholding approach follows standard practice in wavelet compression [12] and enables direct comparison across retention ratios (a sketch of the retention step follows the analysis below).

Note that ρ is an evaluation parameter – it does not appear in the training objective (18). The learned filters are optimized for general sparsity (minimizing detail coefficient magnitudes), and the retention ratio is varied at test time to characterize rate-distortion performance across different compression levels. To facilitate direct comparison with fixed wavelet baselines, performance is reported in terms of ΔPSNR, defined as

$$\Delta\mathrm{PSNR}(\rho) \triangleq \mathrm{PSNR}_{\mathrm{MERA}}(\rho) - \mathrm{PSNR}_{\mathrm{baseline}}(\rho). \qquad (25)$$

Fig. 5 presents the rate-distortion performance for all six MAWI traces. The proposed MERA-inspired wavelet framework consistently outperforms fixed baselines across the full range of retention ratios (ρ).

Rate-Distortion Analysis. The learned filters achieve PSNR gains ranging from 0.5 dB to 3.8 dB compared to the best fixed alternative.

• Peak performance (2024): The largest gains are observed in the 2024 trace (Fig. 5e), where MERA achieves a 3.8 dB improvement over Coiflet-3, Symmlet-8, and Biorthogonal-4.4. This trace corresponds to the highest network load (1.75 Gbps) and strong LRD (H ≈ 0.88), indicating that adaptive filters effectively capture the bursty dynamics of saturated links.

• Convergence of baselines: Higher-order fixed wavelets (db4, Coiflet, Symmlet) tend to cluster within a narrow performance band (< 0.3 dB difference). MERA breaks this ceiling, demonstrating that optimizing the spectral tilt of the filter bank yields benefits beyond simply increasing the number of vanishing moments.

• Haar comparison: While Haar performs robustly due to its short support, MERA consistently surpasses it by 0.6–3.1 dB, showing that the learned filters successfully balance time-domain localization with frequency selectivity.
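The magnitude-based retention step described at the start of this subsection can be sketched in a few lines of Julia (illustrative function name; not the released code):

```julia
# Keep the ceil(ρ·|c|) largest-magnitude entries of the flattened coefficient
# vector c = [a^(L); d^(L); ...; d^(1)] and zero the rest (Section 7.3).
function retain(c::Vector{Float64}, ρ::Real)
    k    = ceil(Int, ρ * length(c))
    keep = sortperm(abs.(c); rev=true)[1:k]     # indices of the k largest magnitudes
    out  = zeros(length(c))
    out[keep] = c[keep]
    return out
end
```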
7.4 Statistical Fidelity (LRD Preservation)

Beyond pointwise error (MSE), 6G DT will require preservation of the self-similar traffic structure. Table 3 lists the reference global Hurst exponents (H) for the raw traces, confirming persistent LRD (0.77 ≤ H ≤ 0.93) across all years.

Table 3: Global Hurst exponent estimates from MAWI traces (Abry–Veitch regression, 95% confidence interval).

Trace (MAWI)    Ĥ        95% CI
202004081229    0.8897   [0.853, 0.926]
202103181400    0.7674   [0.684, 0.851]
202204131100    0.9313   [0.883, 0.979]
202301131400    0.8641   [0.817, 0.911]
202406192000    0.8771   [0.825, 0.929]
202504090300    0.8329   [0.787, 0.878]

Hurst Exponent Preservation. Table 4 reports the deviation ΔH in the reconstructed signal. At a retention ratio of ρ = 0.1 (90% compression), the method maintains |ΔH| ≤ 0.03 for all traces. This demonstrates that the learned basis functions preserve the power-law decay of the autocorrelation function even at high compression rates.

Importantly, increasing the retention factor ρ does not necessarily improve the stability of the Hurst exponent. While larger ρ preserves more coefficients, it also reintroduces small-amplitude detail components primarily associated with high-frequency fluctuations. Since Hurst exponent estimation depends on the stability of multiscale scaling behavior rather than local reconstruction fidelity, these weak high-frequency contributions may increase estimator sensitivity and perturb the slope of the wavelet logscale diagram. Conversely, moderate sparsification suppresses such weak detail coefficients, effectively acting as a structural denoising mechanism that stabilizes scaling statistics. Therefore, the observed variations of ΔH with ρ reflect estimator sensitivity rather than degradation of the reconstructed signal.

The threshold |ΔH| ≤ 0.03 was selected based on the statistical precision of the estimator. As derived from the 95% confidence intervals for the raw traces (Table 3), the intrinsic uncertainty of the Abry–Veitch estimator for these finite-length windows ranges from ±0.036 (trace 2020) to ±0.083 (trace 2021). Even for the critical high-load scenario (trace 2024), the measurement error is approximately ±0.052. Consequently, maintaining compression deviations within 0.03 ensures that the LRD structure of the reconstructed telemetry remains statistically indistinguishable from the original source, preserving the validity of the data for queueing analysis within the limits of measurement precision.

The spectral analysis in Fig. 6 corroborates this, showing that the energy distribution across scales ℓ maintains linearity. While deviations occur at very large scales (ℓ > 15) due to finite-size effects and non-stationarity, the primary scaling region essential for LRD modeling is preserved.

Figure 6: Wavelet energy spectra S_ℓ vs. scale ℓ for the MAWI traces. The linear growth confirms power-law scaling (LRD). MERA filters are optimized to match this spectral tilt.

Table 4: Hurst exponent deviations across compression levels (learned MERA). Values with |ΔH| ≤ 0.03 indicate strong preservation of LRD.

Trace   H_orig   ΔH (ρ=0.1)   ΔH (ρ=0.2)   ΔH (ρ=0.4)   ΔH (ρ=0.8)
2020    0.890    +0.027       −0.011       −0.038       −0.049
2021    0.767    +0.011       −0.017       −0.039       −0.051
2022    0.931    +0.002       −0.028       −0.055       −0.064
2023    0.864    +0.020       −0.012       −0.039       −0.048
2024    0.877    −0.010       −0.041       −0.065       −0.079
2025    0.833    −0.004       −0.046       −0.078       −0.086
Mean    0.846    +0.009       −0.026       −0.052       −0.063
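For orientation, the logscale-diagram idea behind the Hurst estimates above can be sketched as follows. This is a simplified, unweighted version in the spirit of the Abry–Veitch estimator [11]: it regresses the log2 of the per-scale detail energy on the scale index and maps the slope α to H = (α + 1)/2, omitting the weighted regression and small-sample bias corrections used by the reference estimator.

```julia
using Statistics

# Simplified logscale-diagram Hurst estimate from the detail coefficients of an
# L-level decomposition: ds[j] holds the details at scale j (fine to coarse).
function hurst_logscale(ds::Vector{Vector{Float64}})
    j = collect(1:length(ds))
    y = [log2(mean(abs2, d)) for d in ds]    # log2 of per-scale detail energy
    α = cov(j, y) / var(j)                   # least-squares slope of the diagram
    return (α + 1) / 2
end
```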
7.5 Learned Filter Analysis

To understand the adaptation mechanism, we examine the filters learned from the 2024 trace (Fig. 7). Starting from a Haar initialization, the optimization converges to an asymmetric structure (Table 5) that increases the effective support of the basis functions. The frequency response reveals that the learned filters introduce specific passband ripples that deviate from the "maximum flatness" criterion of Daubechies wavelets. These deviations are not artifacts but data-driven adaptations that maximize energy compaction for the specific spectral signature of internet traffic, validating the use of the MERA framework for discovering domain-specific orthogonal bases.

Table 5: Learned filter coefficients for trace 202406192000. Deviations from Haar (length-2) indicate adaptation to the traffic structure.

Level ℓ | ∥g_ℓ − g_Haar∥_2 | g_ℓ (low-pass) | h_ℓ (high-pass)
1 | 0.0177 | [0.7195, 0.6945]^T | [0.6945, −0.7195]^T
2 | 0.0505 | [0.7419, 0.6705]^T | [0.6705, −0.7419]^T
3 | 0.0473 | [0.7398, 0.6729]^T | [0.6729, −0.7398]^T
4 | 0.0331 | [0.6833, 0.7301]^T | [0.7301, −0.6833]^T
5 | 0.0222 | [0.7226, 0.6913]^T | [0.6913, −0.7226]^T

Summary

The experimental validation confirms that the proposed MERA-wavelet framework effectively bridges the gap between theoretical orthogonality and data-driven adaptation. The key takeaways for 6G DT implementations are:

• Rate-distortion superiority: The learned filters achieve consistent PSNR gains of 0.5–3.8 dB over standard wavelet families. The advantage is most pronounced in high-load, bursty scenarios (e.g., the 2024 trace at 1.75 Gbps), confirming that adaptive bases successfully capture the non-stationary dynamics of modern backbone traffic.

• Statistical preservation: Crucially for predictive modeling, the method preserves the self-similar nature of the traffic. The Hurst-exponent deviations remain negligible (|∆H| ≤ 0.03) even at 90% compression (ρ = 0.1), ensuring that the reconstructed telemetry retains the correlation structure necessary for accurate network simulation.

• Structural guarantees: Unlike unconstrained deep-learning approaches, the MERA-based optimization converges to interpretable, perfectly reconstructing filter banks. The results demonstrate that strict paraunitary constraints can be maintained without sacrificing the flexibility required to adapt to diverse spectral signatures.

These findings position the MERA wavelet not merely as a compression tool, but as a reliable interface for high-fidelity data synchronization in 6G architectures.

Figure 7: Frequency response of the analysis filters (trace 2024). Left: learned paraunitary filters. Right: Haar initialization. Top: low-pass cascades G_ℓ(ω). Bottom: high-pass filters H_ℓ(ω). Haar itself exhibits ripple due to the length-2 basis. Both learned cascades keep this ripple structure; however, the low-pass responses shift their amplitude and zero locations slightly across levels, especially in the pass-band and transition regions.
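As a concrete check of the coefficients in Table 5 and of the ripple behavior visible in Fig. 7, the short sketch below evaluates the level-1 learned filters on a frequency grid and verifies the power-complementarity property |G(ω)|^2 + |H(ω)|^2 = 2 implied by paraunitarity. The grid size and numerical tolerance are illustrative choices, not values used in the paper.

import numpy as np

g = np.array([0.7195, 0.6945])           # learned low-pass, level 1 (Table 5)
h = np.array([0.6945, -0.7195])          # learned high-pass, level 1 (Table 5)

w = np.linspace(0.0, np.pi, 512)
G = g[0] + g[1] * np.exp(-1j * w)        # G(omega) = g0 + g1 e^{-j omega}
H = h[0] + h[1] * np.exp(-1j * w)        # H(omega) = h0 + h1 e^{-j omega}

power = np.abs(G) ** 2 + np.abs(H) ** 2
print(np.allclose(power, 2.0, atol=1e-3))   # power complementarity |G|^2 + |H|^2 = 2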
8 Conclusion

This work addressed a central challenge in the design of DTs for 6G networks: achieving high-fidelity telemetry compression while preserving the strict structural guarantees required for reliable closed-loop operation. Instead of treating compression as a generic rate-distortion problem, the proposed approach framed telemetry as a synchronization mechanism, in which violations of invertibility, energy conservation, or LRD directly compromise the predictive stability of the DT. A rigorous and exact equivalence between MERA tensor networks and two-channel paraunitary wavelet filter banks was established, enabling a learning framework that overcomes the limitations of fixed wavelet designs.

Experimental validation on real-world backbone aggregation traces demonstrated consistent rate-distortion gains of up to 3.8 dB over classical orthogonal and biorthogonal wavelets, while preserving the self-similar structure of the traffic within strict Hurst-exponent bounds. These improvements were obtained without relaxing the paraunitary constraints, ensuring PR and Parseval energy conservation at all scales.

Beyond compression performance, the results position the MERA-wavelet framework as a principled synchronization interface between physical networks and their DTs. By preserving the multiscale statistical invariants that underpin traffic modeling, the proposed method provides a technology-agnostic foundation for telemetry pipelines in bandwidth-constrained 6G architectures. Extensions to wireless and edge environments, where mobility and radio-induced non-stationarity introduce additional challenges, constitute a natural direction for future investigation.

A Proof of Theorem 1 (Architectural Equivalence)

Theorem 1 (Architectural Equivalence). A MERA-inspired layer (Definition 3) is equivalent to a two-channel paraunitary filter bank whose polyphase representation is a constant orthonormal matrix E(z) ≡ U_ℓ:

\begin{bmatrix} A(z) \\ D(z) \end{bmatrix} = \begin{bmatrix} g_0 & g_1 \\ h_0 & h_1 \end{bmatrix} \begin{bmatrix} X_0(z) \\ X_1(z) \end{bmatrix}.   (26)

Proof of Theorem 1. Both directions of the equivalence are derived.

(Sufficiency) Assume a MERA-inspired local operator U_ℓ ∈ O(2) acts pointwise on adjacent pairs of samples,

\begin{bmatrix} a_k \\ d_k \end{bmatrix} = U_\ell \begin{bmatrix} x_{2k} \\ x_{2k+1} \end{bmatrix} = \begin{bmatrix} g_0 & g_1 \\ h_0 & h_1 \end{bmatrix} \begin{bmatrix} x_{2k} \\ x_{2k+1} \end{bmatrix}, \quad k \in \mathbb{Z}.   (27)

It is shown next that this local transformation induces a paraunitary filter bank.

Step 1 (Time-domain output): Expanding (27) yields the component-wise relations

a_k = g_0 x_{2k} + g_1 x_{2k+1}, \qquad d_k = h_0 x_{2k} + h_1 x_{2k+1}.   (28)

Step 2 (Polyphase decomposition): Define the decimated z-transforms of the outputs

A(z) = \sum_k a_k z^{-k}, \qquad D(z) = \sum_k d_k z^{-k},   (29)

and the even/odd polyphase components of the input

X_0(z) = \sum_k x_{2k} z^{-k}, \qquad X_1(z) = \sum_k x_{2k+1} z^{-k}.   (30)

Step 3 (z-transform substitution): Substituting the expressions for a_k and d_k from Step 1 into the z-transforms gives

A(z) = g_0 X_0(z) + g_1 X_1(z),   (31)
D(z) = h_0 X_0(z) + h_1 X_1(z),   (32)

which can be written compactly in matrix form as

\begin{bmatrix} A(z) \\ D(z) \end{bmatrix} = \begin{bmatrix} g_0 & g_1 \\ h_0 & h_1 \end{bmatrix} \begin{bmatrix} X_0(z) \\ X_1(z) \end{bmatrix} = E(z) \begin{bmatrix} X_0(z) \\ X_1(z) \end{bmatrix},   (33)

where E(z) ≡ U_ℓ is the constant polyphase matrix.

Step 4 (Two-tap FIR filters): Applying (10)–(11), the analysis filters are

G(z) = E_{00}(z^2) + z^{-1} E_{01}(z^2), \qquad H(z) = E_{10}(z^2) + z^{-1} E_{11}(z^2).   (34)

Since E(z) ≡ U_ℓ is constant (z-independent), substituting the scalar entries E_{00} = g_0, E_{01} = g_1, E_{10} = h_0, E_{11} = h_1 yields

G(z) = g_0 + g_1 z^{-1}, \qquad H(z) = h_0 + h_1 z^{-1},   (35)

which are length-2 FIR analysis filters parameterized by the entries of U_ℓ.
Step 5 (Paraunitarity): The orthogonality condition U_ℓ ∈ O(2) directly implies paraunitarity of the polyphase matrix:

E(z) E^{\dagger}(z^{-1}) = U_\ell U_\ell^{\dagger} = I.   (36)

This ensures power complementarity in the frequency domain: |G(ω)|^2 + |H(ω)|^2 = 2.

Step 6 (Perfect reconstruction): Choosing the synthesis polyphase matrix R(z) = E^{\dagger}(z^{-1}) = U_\ell^{\dagger} ensures R(z) E(z) = I, guaranteeing alias cancellation. In the time domain, this yields

U_\ell^{\dagger} \left( U_\ell \begin{bmatrix} x_{2k} \\ x_{2k+1} \end{bmatrix} \right) = (U_\ell^{\dagger} U_\ell) \begin{bmatrix} x_{2k} \\ x_{2k+1} \end{bmatrix} = \begin{bmatrix} x_{2k} \\ x_{2k+1} \end{bmatrix},   (37)

confirming perfect reconstruction of the input samples.

Conclusion: A MERA layer with U_ℓ ∈ O(2) induces a critically sampled, two-channel, two-tap paraunitary filter bank with constant polyphase matrix E(z) ≡ U_ℓ, inheriting guarantees such as perfect reconstruction, energy conservation (Parseval identity), and O(N) complexity.

(Necessity) Suppose now that the analysis stage of a two-channel paraunitary filter bank has a constant polyphase matrix

E(z) \equiv U = \begin{bmatrix} g_0 & g_1 \\ h_0 & h_1 \end{bmatrix}, \qquad U U^{\dagger} = I.   (38)

It is established next that this filter bank necessarily implements a MERA-inspired layer.

Step 1 (Polyphase representation): By the polyphase decomposition of the two-channel analysis bank, the output transforms are

\begin{bmatrix} A(z) \\ D(z) \end{bmatrix} = E(z) \begin{bmatrix} X_0(z) \\ X_1(z) \end{bmatrix},   (39)

where X_0(z) = \sum_k x_{2k} z^{-k} and X_1(z) = \sum_k x_{2k+1} z^{-k} are the even and odd polyphase components of the input signal.

Step 2 (Time-domain relation): Since E(z) ≡ U is constant (z-independent), all polyphase entries are scalars. Matching coefficients in the z-transform yields the time-domain relation

\begin{bmatrix} a_k \\ d_k \end{bmatrix} = U \begin{bmatrix} x_{2k} \\ x_{2k+1} \end{bmatrix}, \quad k \in \mathbb{Z}.   (40)

This shows that the analysis operation applies the same matrix U to each pair of adjacent samples (x_{2k}, x_{2k+1}) independently, followed by implicit downsampling.

Step 3 (Equivalence to MERA layer): The pairwise transformation in the previous step is precisely the definition of a MERA-inspired layer (Definition 3):

\begin{bmatrix} a_k \\ d_k \end{bmatrix} = U_\ell \begin{bmatrix} x_{2k} \\ x_{2k+1} \end{bmatrix}.   (41)

Thus, the filter-bank analysis coincides exactly with the action of a MERA layer with U_ℓ = U.

Step 4 (Paraunitarity verification): The paraunitarity condition E(z) E^{\dagger}(z^{-1}) = I reduces to U U^{\dagger} = I for constant E(z) ≡ U, confirming that U ∈ O(2). This ensures perfect reconstruction via U^{\dagger} and energy conservation (Parseval identity).

Step 5 (Two-tap FIR structure): Applying (10)–(11) to the constant polyphase matrix yields the analysis filters

G(z) = E_{00}(z^2) + z^{-1} E_{01}(z^2) = g_0 + g_1 z^{-1},   (42)
H(z) = E_{10}(z^2) + z^{-1} E_{11}(z^2) = h_0 + h_1 z^{-1},   (43)

which are two-tap FIR filters parameterized by the entries of U.

Conclusion: Any two-channel paraunitary filter bank with constant polyphase matrix E(z) ≡ U ∈ O(2) necessarily implements a MERA-inspired layer with two-tap FIR analysis filters. This completes the proof of equivalence.
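The equivalence can also be checked numerically. The sketch below is written for the QMF reflection form of U_ℓ used elsewhere in the paper (h_0 = g_1, h_1 = −g_0), whereas Theorem 1 allows any U ∈ O(2); the random seed and signal length are arbitrary. It applies a random orthogonal pair operator to adjacent samples, recomputes the same outputs as decimated two-tap filtering expressed through the even/odd polyphase components (eqs. (28), (31)–(32)), and confirms perfect reconstruction via the synthesis matrix of Step 6.

import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi)
g0, g1 = np.cos(theta), np.sin(theta)
U = np.array([[g0, g1], [g1, -g0]])           # constant polyphase matrix E(z) = U, QMF reflection form

x = rng.standard_normal(1024)
pairs = x.reshape(-1, 2)                      # rows (x_{2k}, x_{2k+1})

# MERA-layer view: apply U to each adjacent pair (eq. (27)).
a_mera, d_mera = (pairs @ U.T).T

# Filter-bank view: decimated two-tap filtering via the even/odd polyphase components (eq. (28)).
a_fb = g0 * x[0::2] + g1 * x[1::2]
d_fb = g1 * x[0::2] - g0 * x[1::2]
print(np.allclose(a_mera, a_fb), np.allclose(d_mera, d_fb))   # architectural equivalence

# Perfect reconstruction through the synthesis matrix U^T (Step 6); rows recover (x_{2k}, x_{2k+1}).
x_rec = (np.stack([a_mera, d_mera], axis=1) @ U).reshape(-1)
print(np.allclose(x_rec, x))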
B Proof of Corollary 1 (Uniqueness of Haar for Two-Tap QMF)

Proof of Corollary 1. It is shown that the Haar wavelet is the unique real two-tap FIR filter bank satisfying both PR and QMF paraunitarity.

Step 1 (Orthogonality): The paraunitarity condition U_ℓ U_ℓ^{\dagger} = I applied to (17) yields

\begin{bmatrix} g_0 & g_1 \\ g_1 & -g_0 \end{bmatrix} \begin{bmatrix} g_0 & g_1 \\ g_1 & -g_0 \end{bmatrix}^{\dagger} = \begin{bmatrix} g_0^2 + g_1^2 & 0 \\ 0 & g_0^2 + g_1^2 \end{bmatrix} = I.   (44)

This immediately gives the normalization constraint

g_0^2 + g_1^2 = 1.   (45)

Step 2 (Parameterization): Eq. (45) parameterizes all solutions as points on the unit circle:

g_0 = \cos\theta, \qquad g_1 = \sin\theta, \qquad \theta \in [0, 2\pi).   (46)

Step 3 (DC response maximization): Among all orthonormal solutions, the Haar wavelet uniquely maximizes the DC response |G(0)|:

|G(0)| = |g_0 + g_1| = |\cos\theta + \sin\theta|.   (47)

This is maximized when \cos\theta = \sin\theta, i.e., \theta = \pi/4 (the antipodal solution \theta = 5\pi/4 yields -U_{\mathrm{Haar}}, the same filter bank up to an overall sign), yielding

g_0 = g_1 = \tfrac{1}{\sqrt{2}}, \qquad |G(0)| = \sqrt{2}.   (48)

Step 4 (Uniqueness): Combining orthogonality (45) with the symmetry requirement g_0 = g_1 gives the unique solution

g_0 = g_1 = \tfrac{1}{\sqrt{2}}, \qquad h_0 = g_1 = \tfrac{1}{\sqrt{2}}, \qquad h_1 = -g_0 = -\tfrac{1}{\sqrt{2}},   (49)

corresponding to the Haar filters

G(z) = \tfrac{1}{\sqrt{2}}(1 + z^{-1}), \qquad H(z) = \tfrac{1}{\sqrt{2}}(1 - z^{-1}),   (50)

with polyphase matrix

U_{\mathrm{Haar}} = \tfrac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.   (51)

Conclusion: Therefore, for two-tap filters, the QMF-paraunitary family forms a one-parameter manifold M = { U(θ) : θ ∈ [0, 2π) } with

U(\theta) = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}.

The Haar filter bank corresponds to θ = π/4, yielding the computationally simplest coefficients g_0 = g_1 = 1/\sqrt{2} and uniquely maximizing the DC gain |G(0)| = \sqrt{2} among all members of M. This makes Haar the canonical choice for initialization, while the learnable angles θ_ℓ explored in this work span the full QMF-paraunitary family.

Remark (Relationship to QMF). For two-tap filters, the QMF-paraunitary family forms a one-parameter manifold within the reflection component of O(2) (i.e., det(U) = −1). With Haar initialization (θ_ℓ = π/4) and polar projection, the learned filters remain in this QMF-paraunitary family throughout training. The PSNR gains in Figs. 5a–5f arise from learning optimal rotation angles θ_ℓ ≠ π/4 that better match trace-specific LRD statistics, while preserving both the QMF structure and perfect reconstruction. Extending to rotations (det = +1) would exit the QMF family; longer filters (N > 2) would enlarge the design space further.
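As an illustration of this one-parameter family, the following sketch constructs U(θ), confirms orthogonality over a grid of angles, and locates the maximizer of the DC gain |G(0)| at θ = π/4. The grid resolution and the restriction to θ ∈ [0, π) (to obtain a single maximizer up to sign) are illustrative choices.

import numpy as np

def qmf_paraunitary(theta):
    # U(theta) from the Remark: the reflection branch of O(2) (det = -1) spanned by the learnable angles.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [s, -c]])

thetas = np.linspace(0.0, np.pi, 360, endpoint=False)   # theta in [0, pi), illustrative grid
assert all(np.allclose(qmf_paraunitary(t) @ qmf_paraunitary(t).T, np.eye(2)) for t in thetas)

dc_gain = np.abs(np.cos(thetas) + np.sin(thetas))       # |G(0)| = |g0 + g1|, eq. (47)
print(thetas[np.argmax(dc_gain)], np.pi / 4)            # maximizer is theta = pi/4 (Haar), eq. (48)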
References

[1] W. Leland, M. Taqqu, W. Willinger, and D. Wilson, "On the self-similar nature of Ethernet traffic (extended version)," IEEE/ACM Transactions on Networking, vol. 2, no. 1, pp. 1–15, 1994.
[2] V. Paxson and S. Floyd, "Wide-area traffic: The failure of Poisson modeling," in Proceedings of ACM SIGCOMM '94. ACM, 1994, pp. 257–268.
[3] G. Millán, "On the LRD of the aggregated traffic flows in high-speed computer networks," arXiv preprint arXiv:2103.03981, 2021.
[4] I. Norros, "On the use of fractional Brownian motion in the theory of connectionless networks," IEEE Journal on Selected Areas in Communications, vol. 13, no. 6, pp. 953–962, 1995.
[5] M. Parulekar and A. M. Makowski, "Tail probabilities for a multiplexer with self-similar traffic," in Proceedings of IEEE INFOCOM '96, Conference on Computer Communications, vol. 3. IEEE, 1996, pp. 1452–1459.
[6] R. Boutaba, M. A. Salahuddin, N. Limam, S. Ayoubi, N. Shahriar, F. Estrada-Solano, and O. M. Caicedo, "A comprehensive survey on machine learning for networking: Evolution, applications and research opportunities," Journal of Internet Services and Applications, vol. 9, no. 1, pp. 1–99, 2018.
[7] O. Aouedi, V. A. Le, K. Piamrat, and Y. Ji, "Deep learning on network traffic prediction: Recent advances, analysis, and future directions," ACM Computing Surveys, vol. 57, no. 6, pp. 1–37, 2025.
[8] M. S. Taqqu, W. Willinger, and R. Sherman, "Proof of a fundamental result in self-similar traffic modeling," ACM SIGCOMM Computer Communication Review, vol. 27, no. 2, pp. 5–23, 1997.
[9] W. Willinger, V. Paxson, and M. S. Taqqu, "Self-similarity and heavy tails: Structural modeling of network traffic," A Practical Guide to Heavy Tails: Statistical Techniques and Applications, vol. 23, no. 1, pp. 27–53, 1998.
[10] I. Daubechies, Ten Lectures on Wavelets. SIAM, 1992.
[11] P. Abry, D. Veitch, and P. Flandrin, "Long-range dependence: Revisiting aggregation with wavelets," Journal of Time Series Analysis, vol. 19, no. 3, pp. 253–266, 1998.
[12] S. Mallat, A Wavelet Tour of Signal Processing: The Sparse Way, 3rd ed. Amsterdam, Boston: Academic Press, 2009.
[13] D. Szostak, A. Włodarczyk, and K. Walkowiak, "Machine learning classification and regression approaches for optical network traffic prediction," Electronics, vol. 10, no. 13, p. 1578, 2021. [Online]. Available: https://www.mdpi.com/2079-9292/10/13/1578
[14] I. Lohrasbinasab, A. Shahraki, A. Taherkordi, and A. Delia Jurcut, "From statistical- to machine learning-based network traffic prediction," Transactions on Emerging Telecommunications Technologies, vol. 33, no. 4, p. e4394, 2022. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/ett.4394
[15] J. Wang, Z. Wang, J. Li, and J. Wu, "Multilevel wavelet decomposition network for interpretable time series analysis," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 2437–2446.
[16] H. Khan and B. Yener, "Learning filter widths of spectral decompositions with wavelets," Advances in Neural Information Processing Systems, vol. 31, 2018.
[17] W. Ha, C. Singh, F. Lanusse, S. Upadhyayula, and B. Yu, "Adaptive wavelet distillation from neural networks through interpretations," Advances in Neural Information Processing Systems, vol. 34, pp. 20669–20682, 2021.
[18] M. Wolter and J. Garcke, "Adaptive wavelet pooling for convolutional neural networks," in International Conference on Artificial Intelligence and Statistics. PMLR, 2021, pp. 1936–1944.
[19] G. Michau, G. Frusque, and O. Fink, "Fully learnable deep wavelet transform for unsupervised monitoring of high-frequency time series," Proceedings of the National Academy of Sciences, vol. 119, no. 8, p. e2106598119, 2022.
[20] M. Lezcano-Casado and D. Martínez-Rubio, "Cheap orthogonal constraints in neural networks: A simple parametrization of the orthogonal and unitary group," in Proceedings of the 36th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, K. Chaudhuri and R. Salakhutdinov, Eds., vol. 97. PMLR, 2019, pp. 3794–3803. [Online]. Available: https://proceedings.mlr.press/v97/lezcano-casado19a.html
[21] H. Sato, Riemannian Optimization and Its Applications. Springer, 2021, vol. 670.
[22] N. Boumal, An Introduction to Optimization on Smooth Manifolds. Cambridge, UK: Cambridge University Press, 2023.
[23] Y. Fei, Y. Liu, C. Jia, Z. Li, X. Wei, and M. Chen, "A survey of geometric optimization for deep learning: From Euclidean space to Riemannian manifold," ACM Computing Surveys, vol. 57, no. 5, pp. 1–37, 2025.
[24] G. Vidal, "Entanglement renormalization," Physical Review Letters, vol. 99, no. 22, p. 220405, 2007.
[25] R. Orús, "A practical introduction to tensor networks: Matrix product states and projected entangled pair states," Annals of Physics, vol. 349, pp. 117–158, 2014.
[26] G. Evenbly and S. R. White, "Entanglement renormalization and wavelets," Physical Review Letters, vol. 116, no. 14, p. 140403, 2016.
[27] J. Haegeman, B. Swingle, M. Walter, J. Cotler, G. Evenbly, and V. B. Scholz, "Rigorous free-fermion entanglement renormalization from wavelet theory," Physical Review X, vol. 8, no. 1, p. 011003, 2018.
[28] M. Grieves, "Digital twin: Manufacturing excellence through virtual factory replication," pp. 1–7, Mar. 2014. [Online]. Available: https://www.researchgate.net/publication/275211047_Digital_Twin_Manufacturing_Excellence_through_Virtual_Factory_Replication
[29] C. Zhou, H. Yang, and X. Duan, "Concepts of Digital Twin Network," Internet Engineering Task Force, Internet-Draft draft-zhou-nmrg-digitaltwin-network-concepts-00, 2020, work in progress. [Online]. Available: https://datatracker.ietf.org/doc/draft-zhou-nmrg-digitaltwin-network-concepts/00/
[30] C. Zhou, H. Yang, X. Duan, D. Lopez, A. Pastor, Q. Wu, M. Boucadair, and C. Jacquenet, "Network Digital Twin: Concepts and Reference Architecture," Internet Engineering Task Force, Internet-Draft draft-irtf-nmrg-network-digital-twin-arch-10, 2025, work in progress. [Online]. Available: https://datatracker.ietf.org/doc/draft-irtf-nmrg-network-digital-twin-arch/10/
[31] X. Hesselbach and X. Calle-Heredia, "Digital Twin Networks requirements: Towards an ultra-reliable infrastructure," in 2025 25th Anniversary International Conference on Transparent Optical Networks (ICTON). IEEE, 2025, pp. 1–4.
[32] M. Tariq, F. Naeem, and H. V. Poor, "Toward experience-driven traffic management and orchestration in digital-twin-enabled 6G networks," arXiv preprint arXiv:2201.04259, 2022.
[33] N. P. Kuruvatti, M. A. Habibi, S. Partani, B. Han, A. Fellan, and H. D. Schotten, "Empowering 6G communication systems with digital twin technology: A comprehensive survey," IEEE Access, vol. 10, pp. 112158–112186, 2022.
[34] Z. Wang, D. Jiang, and S. Mumtaz, "Network-wide data collection based on in-band network telemetry for digital twin networks," IEEE Transactions on Mobile Computing, vol. 24, no. 1, pp. 86–101, 2025.
[35] WIDE Project and MAWI Working Group, "MAWI working group traffic archive (WIDE project)," 2026. [Online]. Available: https://mawi.wide.ad.jp/mawi/. Accessed: Jan. 21, 2026.
[36] G. Vidal, "Class of quantum many-body states that can be efficiently simulated," Physical Review Letters, vol. 101, no. 11, p. 110501, 2008.
[37] J. A. Reyes and E. M. Stoudenmire, "Multi-scale tensor network architecture for machine learning," Machine Learning: Science and Technology, vol. 2, no. 3, p. 035036, 2021.
[38] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds. Princeton, NJ: Princeton University Press, 2008.
[39] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, NJ: Prentice Hall, 1993.
[40] M. Smith and T. Barnwell, "Exact reconstruction techniques for tree-structured subband coders," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, no. 3, pp. 434–441, 1986.
[41] G. Evenbly, "MERA Julia code example," https://www.tensors.net/mera, Tensors.net, site maintained by Glen Evenbly. Accessed: Nov. 6, 2025.
[42] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[43] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, http://www.deeplearningbook.org.
