Digital Twin--Driven Adaptive Wavelet Strategy for Efficient 6G Backbone Network Telemetry



Alexandre Barbosa de Lima, Pontifical Catholic University of São Paulo, Brazil, ablima@pucsp.br
Xavier Hesselbach, Universitat Politècnica de Catalunya, Spain, xavier.hesselbach@upc.edu
José Roberto de Almeida Amazonas, University of São Paulo, Brazil, jose.amazonas@usp.br

February 24, 2026

ABSTRACT

Classical orthogonal wavelets guarantee perfect reconstruction but rely on fixed bases optimized for polynomial smoothness, achieving suboptimal compression on signals with fractal spectral signatures. Conversely, learned methods offer adaptivity but typically enforce orthogonality via soft penalties, sacrificing structural guarantees. This work establishes a rigorous equivalence between Multiscale Entanglement Renormalization Ansatz (MERA) tensor networks and paraunitary filter banks. The resulting framework learns adaptive wavelets while enforcing exact orthogonality through manifold-constrained optimization, guaranteeing perfect reconstruction and energy conservation throughout training. Validation on Long-Range Dependent (LRD) network traffic demonstrates that learned filters outperform classical wavelets by 0.5–3.8 dB PSNR on six MAWI backbone traces (2020–2025, 314 Mbps–1.75 Gbps) while preserving the Hurst exponent within estimation uncertainty (|ΔH| ≤ 0.03). These results establish MERA-inspired wavelets as a principled approach for telemetry compression in 6G digital twin synchronization.

Keywords: Digital twin synchronization, adaptive wavelets, semantic telemetry, 6G networks, long-range dependence, paraunitary filter banks, network telemetry compression.

1 Introduction

Network traffic in future 6G systems is expected to exhibit long-range dependence (LRD), characterized by power-law correlation decay: ϕ(k) ∼ k^{−β}, 0 < β < 1 [1–3]. This fractal structure has profound implications for network management: buffer overflow probabilities decay polynomially – not exponentially – with buffer size, fundamentally challenging classical queueing models [4, 5]. For digital twins (DT) driving closed-loop optimization, capturing these self-similar dynamics is not merely a statistical exercise but a prerequisite for stability. As highlighted in surveys on machine learning for networking [6, 7], these dynamics call for adaptive multiscale representations capable of capturing traffic correlations across scales while preserving physical interpretability and robustness.

This work focuses on backbone aggregation telemetry, where traffic from thousands of edge cells converges into high-capacity core links. Statistical aggregation theorems establish that the superposition of heterogeneous sources with heavy-tailed distributions preserves or amplifies LRD at such aggregation scales [8, 9], making backbone traces a natural testbed for validating LRD-preserving transforms. This setting motivates the development of adaptive wavelets that exploit traffic-specific correlation structures while maintaining the mathematical guarantees (perfect reconstruction, energy conservation) required for reliable signal processing.
While 6G networks will encompass heterogeneous access technologies – from millimeter-wave massive MIMO to satellite non-terrestrial networks – wireless edge telemetry, with its distinct statistical properties induced by channel fading and mobility, represents a complementary challenge identified as future work (Section 8).

While the focus of this study is on backbone aggregation telemetry rather than access-level radio traffic, this choice is deliberate. Digital twin synchronization fundamentally depends on preserving the statistical invariants of aggregated traffic flows – most notably LRD – which arise from multiplexing heterogeneous sources and persist independently of the underlying access technology (4G, 5G, or beyond). As such, backbone telemetry provides a technology-agnostic and representative testbed for validating synchronization-preserving compression mechanisms.

The discrete wavelet transform (DWT) remains a cornerstone for multiscale analysis of LRD traffic [10–12]. Conventional orthonormal wavelets provide the mathematical safety net required for control systems, offering perfect reconstruction (PR, i.e., x̂[n] = x[n] with no aliasing or distortion) and Parseval energy conservation. However, they rely on fixed, a priori designed filter banks (e.g., Haar, Daubechies) that cannot adapt to the evolving correlation structure of real network traffic, leading to suboptimal compression. Conversely, recent machine learning approaches introduce data-driven adaptability through neural or statistical representations [6, 7, 13, 14]. Yet these "black-box" methods often relax structural guarantees, introducing approximation errors that can lead to unpredictable behavior under perturbations or resource constraints. This tension between data-driven adaptability and mathematical rigor defines the central challenge for application-aware telemetry: can we learn multiscale representations that remain orthonormal and provably stable while adapting to the statistics of network traffic?

Existing learned transforms generally fall into three categories: (i) unconstrained models that abandon orthogonality for flexibility, losing PR guarantees [15, 16]; (ii) soft-constrained approaches that enforce properties via loss penalties, which hold only approximately and require laborious hyperparameter tuning [17, 18]; or (iii) structural methods that impose conjugate quadrature filter (CQF) constraints [19] but do not guarantee exact orthogonality at intermediate training steps. Such approximations are insufficient for mission-critical DT: Parseval violations corrupt energy budgets, and imperfect reconstruction distorts the traffic's LRD signature, degrading the twin's predictive stability. Lezcano-Casado and Martínez-Rubio [20] do maintain exact orthogonality via exponential parametrization, but their framework targets recurrent neural networks (RNN) for temporal sequence modeling rather than multiscale signal decomposition. Although neural autoencoders achieve impressive rate-distortion on generic signals, they lack the interpretable multiscale structure and LRD-preservation guarantees that network state synchronization requires. This work answers affirmatively by introducing a manifold-constrained optimization scheme in which the framework enforces orthogonality at every training iteration through polar projection onto the orthogonal manifold [21–23], ensuring that PR and Parseval energy conservation hold throughout learning.
The mathematical foundation draws from the multiscale entanglement renormalization ansatz (MERA) [24, 25] – a hierarchical tensor network (TN) from quantum many-body physics, reformulated here as a trainable cascade of local 2 × 2 orthogonal transformations. While the physics literature has treated wavelets as a mathematical analogy when synthesizing quantum states [26, 27], this work inverts the paradigm: by imposing constraints on MERA tensors and interpreting the resulting decomposition as a learnable filter bank, Theorem 1 (Section 5) establishes that MERA layers are mathematically equivalent to two-channel paraunitary filter banks at every decomposition level. This equivalence is not approximate or asymptotic – it holds exactly, enabling a framework that unifies data-driven adaptability with the mathematical rigor necessary for reliable closed-loop operation. With polar projection ensuring orthogonality at every training iteration, the framework guarantees energy preservation and invertibility at all scales while retaining full adaptability to data statistics. When used for compression, it yields interpretable rate-distortion trade-offs by retaining a fraction ρ of coefficients, offering empirical validation of its energy compaction properties.

Having motivated the need for adaptive wavelets with structural guarantees, the main contributions of this work are summarized as follows:

• A formal equivalence between MERA TN and orthonormal paraunitary wavelet filter banks is established (Theorem 1), bridging concepts from quantum many-body theory and multirate signal processing.

• A learning framework operating directly on the Stiefel manifold O(2) (the group of 2 × 2 orthogonal matrices) is introduced, enforcing PR and Parseval energy preservation via polar projection at every iteration. This eliminates the approximation errors inherent to soft-penalty methods [15, 18] and ensures orthogonality throughout training, unlike CQF-based approaches [19] that allow intermediate coefficient drift or exponential parametrizations [20] designed for RNN stability.

• Experimental validation on six real-world backbone traffic traces spanning 2020–2025 (314 Mbps–1.75 Gbps) demonstrates 0.5–3.8 dB Peak Signal-to-Noise Ratio (PSNR) gains over fixed wavelet bases while preserving Hurst exponents within 95% confidence intervals at 90% compression, establishing superior LRD retention.

Roadmap. The remainder of this paper is organized as follows:

• Section 2: DT telemetry compression – requirements, bottleneck analysis, and scope.
• Section 3: Mathematical foundations.
• Section 4: MERA-inspired wavelet architecture.
• Section 5: Equivalence to paraunitary filter banks (Theorem 1).
• Section 6: Learning framework with manifold optimization.
• Section 7: Validation on real backbone traces.
• Section 8: Conclusion and future directions.

2 Digital Twin Synchronization: A Layered Perspective

This section establishes the context and requirements that motivate the proposed framework. Fig. 1 illustrates the layered architecture, highlighting the telemetry compression layer addressed by this work.
2.1 The Network Digital Twin Paradigm

Originally proposed in [28] as a digital representation to support the design and development of manufactured components, the concept of a Digital Twin (DT) has evolved into an essential building block of next-generation networks and services. The DT paradigm refers to the construction of continuously updated digital counterparts capable of mirroring the behavior and state of physical entities (or even purely virtual, or physical-virtual hybrid, entities). DT methodologies have been extended to the domain of communication infrastructures, giving rise to Digital Twin Network (DTN) architectures [29] and Network Digital Twin (NDT) instances [30].

An NDT constitutes a virtualized replica of an Original Network (ON), whether physical or virtual, and remains tightly coupled with it through information exchange that enables near real-time state synchronism. NDTs are therefore able to support advanced functionalities such as online simulation, network design, optimization, and AI-driven control and orchestration mechanisms, which the ON can exploit to enhance operational efficiency and overall performance. Moreover, the flexible and interoperable design of NDTs makes them suitable for deploying new network services.

An NDT does not require dedicated physical equipment to be realized; it can be instantiated either on specific computing resources or through virtualized resources and service infrastructures. Fixed hardware solutions can guarantee an exact replica of the original, but at a cost comparable to the original. In comparison, virtualization-based approaches enable significantly more dynamic control, allow the NDT to be tailored more easily to the requirements, and usually come at a reduced cost.

Network Digital Twins represent an emerging paradigm for 6G network management, where a dynamic virtual replica of the physical infrastructure enables simulation-based optimization, capacity planning, and "what-if" analysis before deploying changes to production systems [31–34]. The DT continuously ingests telemetry data from the physical network – capturing traffic characteristics, queue states, and resource utilization – to maintain synchronization between the virtual model and real-world dynamics [32].

Typically, a single Digital Twin is associated with one Original (a 1:1 relation). However, multiple Digital Twins can also be instantiated in parallel (1:N), forming a DT farm in which each instance focuses on analyzing a different aspect of the system or evaluating an alternative strategy to be compared. A DT farm thus enables the distribution of analytical tasks across several specialized DT instances, each tailored to a particular target. DT farms allow the analysis and comparison of alternative strategies under identical baseline conditions: because all DTs originate from the same synchronized state of the original, their outcomes can be compared without affecting the source, reducing risk and accelerating analysis.

From a DT perspective, backbone telemetry constitutes the dominant synchronization bottleneck, as it aggregates traffic originating from radio, edge, and core domains into a unified stochastic process. As a result, distortions introduced at this layer propagate directly into the virtual model, affecting the fidelity of downstream simulation and optimization tasks.
The effectiveness of a DT hinges on synchronization fidelity: the degree to which the virtual model accurately reflects the statistical and temporal properties of the physical network. This fidelity directly impacts the reliability of simulations used for critical decisions such as congestion control, routing optimization, and service-level agreement (SLA) enforcement.

High-fidelity synchronization requires continuous telemetry ingestion at temporal resolutions sufficient to capture traffic dynamics ranging from millisecond-scale microbursts to hour-scale session patterns. The backbone traces employed in this work (Section 7, Table 1) illustrate typical data volumes: at 1 ms sampling granularity, monitoring hundreds of concurrent links can generate gigabytes of raw telemetry per collection cycle (for example, 1 ms sampling over 15-minute windows yields about 9 × 10^5 samples per trace; at 64-bit precision, a single link generates roughly 7.2 MB per interval, and scaling to 200 links produces about 1.4 GB per cycle [33]). While such overhead is negligible in overprovisioned core networks, it becomes relevant in bandwidth-constrained scenarios such as satellite backhaul or disaggregated RAN fronthaul [34]. Beyond volume reduction, a more fundamental requirement is statistical fidelity: compression schemes must preserve the invariants that govern network performance models – most critically, the LRD structure of traffic (Section 7.1). This motivates the development of adaptive transforms that maintain structural guarantees while exploiting signal-specific correlations.

[Figure 1] Layers, top to bottom: Digital Twin Application Layer (simulation, optimization, what-if analysis); DT Synchronization Layer (state updates, model calibration); Telemetry Compression Layer (this work: MERA-wavelet codec, adaptive, LRD-preserving); Data Collection Layer (in-band network telemetry, streaming telemetry); Physical Network Layer (routers, switches, links, queues). Figure 1: Layered architecture for NDT synchronization. The telemetry compression layer (highlighted) provides the interface between raw network measurements and the virtual model. This work contributes the adaptive MERA-wavelet codec operating at this layer.

2.2 Technical Requirements and Scope Delimitation

This work addresses the telemetry compression layer (Fig. 1) within the broader DT synchronization pipeline. The contribution is a signal processing solution that operates on time-series telemetry streams (e.g., byte-rate samples at 1 ms granularity) and produces compressed representations suitable for transmission to DT infrastructure. The compression method must simultaneously achieve:

1. Rate-distortion efficiency: maximize reconstruction fidelity (PSNR) under bandwidth constraints.
2. Statistical fidelity: preserve the Hurst exponent H within estimator confidence intervals, ensuring that decompressed telemetry retains the LRD structure necessary for accurate queueing analysis.
3. Structural guarantees: provide PR and Parseval energy conservation for predictable, deterministic behavior in mission-critical operations.

Standard Wavelets vs. Learned Approaches. Classical orthogonal wavelets (Haar, Daubechies, Coiflets) satisfy requirement (3) through their paraunitary properties but achieve suboptimal performance on requirements (1) and (2) due to fixed filter designs optimized for polynomial smoothness rather than power-law correlations. Conversely, fully learned neural approaches (e.g., autoencoders) may excel at (1) but sacrifice (3) by relaxing orthogonality constraints, introducing approximation errors incompatible with safety-critical DT applications.
The proposed MERA-inspired adaptive wavelets reconcile this trade-off by learning filter banks from data while maintaining paraunitary guarantees through manifold-constrained optimization (Section 6). By adapting to the specific spectral characteristics of backbone traffic, the method achieves superior energy compaction (up to 3.8 dB PSNR gain over fixed wavelets, Section 7) while preserving H within 95% confidence intervals at 90% compression (Section 7.4).

Architectural Positioning. The codec is DT-agnostic: it interfaces with any simulator or emulator that ingests traffic time series as input – including discrete-event simulators (e.g., NS-3), queueing models, fluid-flow approximations, or hardware-in-the-loop testbeds (e.g., MININET). Rather than prescribing a particular DT architecture, it provides a reusable compression module that preserves the statistical properties required across diverse modeling frameworks.

Validation Strategy. Following standard practice in source coding research – where video codecs are validated using rate-distortion metrics on benchmark datasets without implementing full streaming protocol stacks – the proposed codec is validated using PSNR (rate-distortion efficiency) and Hurst exponent deviation |ΔH| (statistical fidelity) on six years of MAWI backbone traces [35] spanning 314 Mbps–1.75 Gbps with H ∈ [0.77, 0.93]. Baselines include classical orthogonal and biorthogonal wavelets: Haar, Daubechies-4, Coiflet-3, Symmlet-8, and Biorthogonal-4.4. This validation demonstrates that compressed telemetry retains the statistical properties necessary for downstream DT models, independent of specific simulator implementations. Integration with full DT frameworks – including topology modeling, routing protocols, and closed-loop control – represents important future work (Section 8) but falls outside the scope of this signal processing contribution.

3 Mathematical Background

This section establishes the mathematical foundations underlying the proposed framework: orthogonal transformations that preserve signal energy, the MERA TN architecture that organizes these transformations hierarchically, and the Stiefel manifold on which constrained optimization is performed to maintain orthogonality throughout learning.

3.1 Unitary and Orthogonal Transformations

Definition 1 (Unitary Transformation). A linear operator U : H → H on a complex inner product space H is unitary if it preserves inner products: ⟨Ux, Uy⟩ = ⟨x, y⟩ for all x, y ∈ H; equivalently, U†U = I, where † denotes the conjugate transpose (Hermitian adjoint).

Definition 2 (Orthogonal Transformation). For real-valued spaces H = R^n, a matrix U ∈ R^{n×n} is orthogonal if U†U = I. The set of all such matrices forms the orthogonal group O(n) = {U ∈ R^{n×n} | U†U = I}.

Remark 1 (Notational Convention). The dagger symbol A† denotes the conjugate transpose, following conventions in quantum TN [26, 36]. For real matrices, A† = A^T. This notation is retained throughout to emphasize the structural connection to the MERA formalism, while acknowledging that U† = U^T in the real-valued implementation (U_ℓ ∈ O(2)).
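Definitions 1 and 2 can be illustrated numerically. The following minimal Julia sketch (not part of the released implementation; the matrix and tolerances are illustrative) checks that a Haar-type 2 × 2 matrix satisfies U†U = I and preserves inner products and norms:

```julia
using LinearAlgebra

# Haar-type orthogonal matrix (Definition 2): for real U the dagger of
# Remark 1 reduces to the transpose, so U' * U must equal the identity.
U = [1 1.0; 1 -1] / sqrt(2)

@assert isapprox(U' * U, Matrix{Float64}(I, 2, 2); atol=1e-12)   # orthogonality
x, y = randn(2), randn(2)
@assert isapprox(dot(U * x, U * y), dot(x, y); atol=1e-10)        # inner products preserved
@assert isapprox(norm(U * x), norm(x); atol=1e-10)                # energy conservation
```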
Orthogonality ensures three critical properties for adaptive wavelets: (i) energy conservation via the Parseval identity, (ii) PR through U†, and (iii) numerical stability under composition (∥U∥ = 1). These guarantees are maintained throughout optimization via polar projection onto O(2) (Section 6), distinguishing the proposed framework from approaches where orthogonality is imposed only approximately [15, 19].

3.2 MERA Tensor Networks

MERA TN were introduced by Vidal [24] to efficiently represent quantum systems exhibiting scale-invariant correlations with power-law decay – a property that directly parallels LRD in network traffic. MERA organizes computation into hierarchical layers, each applying:

1. Disentanglers: local unitary transformations removing short-range correlations before coarse-graining.
2. Isometries: linear maps satisfying U†U = I that reduce degrees of freedom (typically by a factor of two) while preserving large-scale structure.

This alternating disentangle–coarsen procedure across L layers directly parallels dyadic wavelet decomposition: each MERA layer corresponds to a resolution level, with isometries playing the role of analysis filters. As shown by Reyes and Stoudenmire [37], MERA can learn hierarchical correlations across resolutions, bridging quantum renormalization with deep-learning principles. Section 5 formalizes the equivalence between MERA layers and paraunitary filter banks (Theorem 1), enabling adaptive wavelets with exact PR and energy conservation guarantees.

3.3 The Stiefel Manifold

The orthogonality requirements of paraunitary filter banks frame learning as a constrained optimization problem on smooth manifolds. Specifically, the Stiefel manifold St(n, k) is defined as the set of matrices with orthonormal columns:

$$\mathrm{St}(n, k) = \{ U \in \mathbb{C}^{n \times k} \mid U^{\dagger} U = I_k \}. \qquad (1)$$

Standard optimizers (SGD, Adam) compute updates in ambient Euclidean space. A linear update

$$\tilde{U}_{t+1} = U_t - \eta\, \nabla \mathcal{L}(U_t) \qquad (2)$$

generally violates orthogonality, since the gradient may contain a nonzero component normal to the manifold. To maintain structural integrity, the Euclidean step is followed by a polar retraction [38]:

$$U_{t+1} = \mathcal{R}(\tilde{U}_{t+1}) = \tilde{U}_{t+1}\left(\tilde{U}_{t+1}^{\dagger}\,\tilde{U}_{t+1}\right)^{-1/2}, \qquad (3)$$

which projects onto St(n, k), ensuring exact orthogonality at every iteration (Section 6).

It is emphasized that the architecture considered in this work corresponds to a disentangler-free, tree-structured MERA-inspired network, rather than a full MERA in the strict tensor-network-theoretic sense. This restriction is deliberate, as it preserves the exact paraunitary structure required for perfect reconstruction.
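The polar retraction of Eq. (3) has a direct numerical form. A minimal Julia sketch is given below; it assumes the SVD-based expression of the orthogonal polar factor, which coincides with Eq. (3) for full-rank matrices, and the function name is illustrative rather than taken from the released codebase:

```julia
using LinearAlgebra

# Polar retraction of Eq. (3): project a square matrix onto the orthogonal
# manifold by keeping only the orthogonal polar factor. For full-rank U,
# U * (U'U)^(-1/2) equals W * Vt from the SVD U = W * S * Vt.
function polar_project(U::AbstractMatrix)
    F = svd(U)
    return F.U * F.Vt
end

# Example: a perturbed Haar matrix is pulled back onto O(2).
Ut = [1 1.0; 1 -1] / sqrt(2) .+ 0.05 .* randn(2, 2)
Uproj = polar_project(Ut)
@assert isapprox(Uproj' * Uproj, Matrix{Float64}(I, 2, 2); atol=1e-10)
```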
4 MERA-Inspired Wavelet Architecture

This section introduces a unified framework for adaptive orthonormal wavelets tailored to LRD network traffic. The key idea is to reinterpret the MERA architecture as a structured parameterization of paraunitary filter banks, thereby enabling learnable multiscale representations that remain orthonormal by design. Section 4.1 motivates the need for adaptive multiscale models in the presence of LRD traffic, while Section 4.2 introduces the MERA-inspired orthogonal layers (Definition 3) that provide the architectural foundation.

4.1 Adaptive Multiscale Models for LRD Traffic

An accurate characterization of the Hurst exponent H from traffic measurements is critical for capacity planning, queueing analysis, and DT synchronization in 6G systems. The DWT provides a natural framework for both analyzing and representing LRD signals through multiresolution decomposition, whose hierarchical structure directly mirrors the scale-invariant correlation patterns characteristic of fractal network traffic. Fig. 2 illustrates the Mallat pyramid: at each scale ℓ (ℓ = 1, 2, 3), the signal is recursively split into approximation coefficients a_ℓ (low-pass filter g) and detail coefficients d_ℓ (high-pass filter h), with downsampling by two (↓2) at each stage. This dyadic decomposition not only aligns naturally with the self-similar structure of LRD processes but also enables robust Hurst exponent estimation via wavelet variance scaling [11], making wavelets the standard tool for LRD traffic analysis and compression.

Figure 2: Multiresolution analysis (MRA) illustrating recursive approximation/detail splitting with decimation by two. The input discrete-time signal x_n is successively filtered by the low-pass filter g (scaling function) and high-pass filter h (wavelet function), followed by downsampling by a factor of two. The approximation stream propagates upward through all levels, while the detail streams are extracted at each corresponding scale. At each stage, the signal length is halved (N → N/2 → N/4 → N/8), forming the dyadic tree structure characteristic of the DWT [12].

Despite this structural alignment, the DWT relies on fixed, pre-designed filter banks (Haar, Daubechies, Coiflets, Symmlets) optimized for generic smoothness assumptions (Mallat [12] shows that classical wavelets achieve optimal approximation rates for functions in Besov spaces – those with bounded derivatives admitting local polynomial approximations – an assumption that network traffic violates due to its impulsive, fractal structure). While these bases provide rigorous guarantees, their filters are hand-crafted to maximize vanishing moments, not to capture the power-law correlations ϕ(k) ∼ k^{−β} characteristic of LRD traffic. Consequently, fixed wavelets achieve suboptimal energy compaction on backbone traces: detail coefficients retain significant energy that could be concentrated into approximations, degrading rate-distortion performance under bandwidth constraints. The central challenge is thus to learn wavelet filters adapted to traffic-specific correlation structures while preserving the mathematical guarantees that make wavelets reliable for mission-critical telemetry.

4.2 MERA-Inspired Orthogonal Layers

A hierarchical architecture inspired by MERA TN is introduced to provide a structured parameterization of orthonormal wavelets. The design is guided by three principles: (1) Locality – transformations operate on disjoint pairs; (2) Orthogonality – all operations preserve inner products; and (3) Hierarchy – layers correspond to dyadic scales.

Definition 3 (MERA-Inspired Orthogonal Layer). Let x ∈ R^N be a discrete-time signal with N = 2^L, where N is the number of samples and L is the maximum decomposition level. A MERA layer at scale ℓ applies a 2 × 2 orthogonal matrix U_ℓ ∈ O(2) to disjoint pairs of samples (implicit downsampling by two, ↓2):

$$\begin{bmatrix} a^{(\ell)}_k \\ d^{(\ell)}_k \end{bmatrix} = U_\ell \begin{bmatrix} x_{2k} \\ x_{2k+1} \end{bmatrix}, \qquad k = 0, \ldots, \frac{N}{2^\ell} - 1, \qquad (4)$$

where the outputs a = {a^{(ℓ)}_k} and d = {d^{(ℓ)}_k} have length N/2^ℓ each and are the decimated approximation and detail coefficients, respectively.
Example 1 (MERA Layer Computation). Consider the input x = [1, 2, 3, 4]^⊤ and the orthogonal matrix

$$U_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \quad \text{(Haar)}.$$

Applying Eq. (4):

$$\begin{bmatrix} a^{(1)}_0 \\ d^{(1)}_0 \end{bmatrix} = U_1 \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 3 \\ -1 \end{bmatrix}, \qquad \begin{bmatrix} a^{(1)}_1 \\ d^{(1)}_1 \end{bmatrix} = U_1 \begin{bmatrix} x_2 \\ x_3 \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 3 \\ 4 \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 7 \\ -1 \end{bmatrix},$$

yielding the approximation a = (1/√2)[3, 7]^⊤ and detail d = (1/√2)[−1, −1]^⊤. Energy conservation: ∥x∥² = 30 = ∥a∥² + ∥d∥² = 29 + 1 = 30. ✓

Orthogonality U†_ℓ U_ℓ = I (Definition 2) ensures (i) local pairwise energy conservation a²_k + d²_k = x²_{2k} + x²_{2k+1}; (ii) the Parseval identity ∥x∥² = ∥a^{(L)}∥² + Σ_{ℓ=1}^{L} ∥d^{(ℓ)}∥²; and (iii) PR via U†_ℓ. A complete L-level (L = 4) cascade (Fig. 3) applies these layers recursively, producing {a^{(L)}, d^{(L)}, ..., d^{(1)}}. The resulting analysis operator A ∈ O(N) inherits all guarantees with O(N) complexity via decimation, matching fast wavelet transforms.

Figure 3: MERA-inspired wavelet circuit with four dyadic levels (L = 4). The input signal samples x_1, ..., x_16 feed the first layer, which consists of parallel 2 × 2 orthogonal blocks U_1 acting on disjoint pairs. At each level, the approximation outputs a^{(ℓ)} propagate upward through the hierarchy, while the detail outputs d^{(ℓ)} are extracted at their respective scales. This hierarchical structure parallels the DWT (Fig. 2) but employs learnable transformation blocks U_ℓ, ℓ = 1, 2, 3, 4.

Summary. This section established the MERA-inspired architecture for adaptive wavelets. Section 4.1 motivated the need for adaptive multiscale models tailored to LRD traffic. Section 4.2 introduced the orthogonal layer structure (Definition 3) that provides hierarchical decomposition while preserving energy conservation; a numerical sketch of the full cascade is given below. Section 5 establishes the exact mathematical equivalence between these layers and classical paraunitary filter banks.
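As a concrete illustration of Definition 3 and the L-level cascade, the following minimal Julia sketch runs a two-level Haar-initialized analysis on the signal of Example 1 and checks the Parseval identity and perfect reconstruction. It is illustrative only; function names are not those of the released implementation.

```julia
using LinearAlgebra

# One MERA layer per level (Eq. (4)): the same 2x2 orthogonal block acts on
# disjoint pairs; columns of reshape(a, 2, :) are the pairs (x_{2k}, x_{2k+1}).
function mera_analyze(x::Vector{Float64}, Us::Vector{Matrix{Float64}})
    a, details = copy(x), Vector{Vector{Float64}}()
    for U in Us
        y = U * reshape(a, 2, :)
        a = y[1, :]                       # approximation, length halved
        push!(details, y[2, :])           # detail coefficients at this scale
    end
    return a, details
end

# Synthesis: invert each level with the transpose of the orthogonal block.
function mera_synthesize(a::Vector{Float64}, details, Us)
    for l in length(Us):-1:1
        a = vec(Us[l]' * vcat(a', details[l]'))
    end
    return a
end

x  = [1.0, 2.0, 3.0, 4.0]
Us = [[1 1.0; 1 -1] / sqrt(2) for _ in 1:2]     # Haar blocks, L = 2 levels
a, ds = mera_analyze(x, Us)

@assert isapprox(sum(abs2, x), sum(abs2, a) + sum(sum(abs2, d) for d in ds))  # Parseval
@assert isapprox(mera_synthesize(a, ds, Us), x)                               # PR
```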
5 Equivalence to Paraunitary Filter Banks

This section establishes the exact equivalence between MERA-inspired layers and two-channel paraunitary wavelet filter banks. Section 5.1 introduces the polyphase theory background necessary for this equivalence. Section 5.2 presents the main result (Theorem 1), demonstrating that MERA layers with constant orthogonal matrices are mathematically equivalent to two-tap paraunitary filter banks. Section 5.3 formulates the manifold-constrained learning objective. Together, these results provide the theoretical foundation upon which the variational learning and optimization procedures developed in subsequent sections are built.

5.1 Polyphase Theory Background

Definition 4 (Paraunitary Filter Bank). A two-channel filter bank with polyphase matrix

$$E(z) = \begin{bmatrix} G_0(z) & G_1(z) \\ H_0(z) & H_1(z) \end{bmatrix} \qquad (5)$$

is said to be paraunitary if

$$E(z)\,E^{\dagger}(z^{-1}) = I. \qquad (6)$$

This condition ensures:

1. PR: the synthesis filters G̃(z) ≜ G(z^{−1}) and H̃(z) ≜ H(z^{−1}) satisfy x̂(z) = x(z);
2. Parseval energy conservation: ∥x∥² = ∥a^{(L)}∥² + Σ_{ℓ=1}^{L} ∥d^{(ℓ)}∥²;
3. Frequency-domain power complementarity: |G(ω)|² + |H(ω)|² = 2.

In multirate signal processing (Fig. 4), PR requires the polyphase matrix (5) to satisfy the paraunitary condition (6) [39].

Figure 4: Two-channel CQF system. (a): The analysis stage splits the input a_{ℓ−1}(m) via the filters g(−m), h(−m) and downsampling by a factor of two (↓2), producing a_ℓ(n) and d_ℓ(n). (b): The synthesis stage upsamples by a factor of two (↑2), filters via g(n), h(n), and sums to reconstruct a_{ℓ−1}(m). PR is characterized in the polyphase domain by a paraunitary matrix, a condition that will be guaranteed by orthogonal U_ℓ (Theorem 1).

Intuition: A MERA-inspired layer applies the same 2 × 2 orthogonal matrix U_ℓ to disjoint pairs of samples – that is, it intrinsically operates in the polyphase domain, where filtering and decimation collapse into a single matrix multiplication. This pairwise block transform is equivalent to a two-channel paraunitary filter bank with two-tap finite impulse response (FIR) analysis filters. The special case where U_ℓ exhibits quadrature mirror filter (QMF) structure, combined with maximum DC gain, uniquely yields the Haar wavelet. (Strictly speaking, the classical QMF structure H(z) = G(−z) cannot simultaneously achieve PR and linear phase with FIR filters, except for the trivial Haar case, as proven by Vaidyanathan [39]. The filters employed in this work belong to the class of CQF introduced by Smith and Barnwell [40], also referred to as paraunitary QMF banks by Vaidyanathan.)

Polyphase Decomposition. In multirate filter bank theory, the type-1 polyphase decomposition represents a filter G(z) = Σ_n g[n] z^{−n} by separating its even- and odd-indexed coefficients. Following the notation of Vaidyanathan [39], G(z) denotes the full analysis filter, while G_0(z) and G_1(z) denote its even and odd polyphase components, respectively, defined through

$$G(z) = G_0(z^2) + z^{-1} G_1(z^2), \qquad (7)$$

with G_0(z) = Σ_k g[2k] z^{−k} and G_1(z) = Σ_k g[2k+1] z^{−k}.

For a 2 × 2 polyphase matrix, the entries admit two equivalent representations. The first expresses the matrix directly in terms of the polyphase components of the analysis filters,

$$E(z) = \begin{bmatrix} G_0(z) & G_1(z) \\ H_0(z) & H_1(z) \end{bmatrix}, \qquad (8)$$

whereas an alternative, more structural representation employs generic polyphase entries,

$$E(z) = \begin{bmatrix} E_{00}(z) & E_{01}(z) \\ E_{10}(z) & E_{11}(z) \end{bmatrix}. \qquad (9)$$

Although (8) and (9) are algebraically equivalent, the generic notation in (9) emphasizes the polyphase matrix as the primary structural object. The correspondence between the two notations is given by E_{00}(z) ≡ G_0(z), E_{01}(z) ≡ G_1(z), E_{10}(z) ≡ H_0(z), and E_{11}(z) ≡ H_1(z). Applying the polyphase decomposition (7) to both analysis filters yields:

$$G(z) = E_{00}(z^2) + z^{-1} E_{01}(z^2) = G_0(z^2) + z^{-1} G_1(z^2), \qquad (10)$$
$$H(z) = E_{10}(z^2) + z^{-1} E_{11}(z^2) = H_0(z^2) + z^{-1} H_1(z^2). \qquad (11)$$

The usefulness of this representation follows from the Noble identities [39], which establish that filtering by H(z²) followed by decimation by 2 (↓2) is equivalent to decimation by 2 (↓2) followed by filtering by H(z). This commutation property allows the analysis outputs A(z) and D(z) to be computed directly in the decimated (polyphase) domain:

$$\begin{bmatrix} A(z) \\ D(z) \end{bmatrix} = E(z) \begin{bmatrix} X_0(z) \\ X_1(z) \end{bmatrix}, \qquad (12)$$

where X_0(z) and X_1(z) denote the even and odd polyphase components of the input X(z) = X_0(z²) + z^{−1} X_1(z²). When E(z) ≡ U is constant (i.e., z-independent), so that all polyphase entries satisfy E_{ij}(z) ≡ u_{ij}, equations (10)–(11) reduce to length-2 FIR analysis filters,

$$G(z) = u_{00} + u_{01} z^{-1}, \qquad H(z) = u_{10} + u_{11} z^{-1}. \qquad (13)$$

This constant-polyphase structure is precisely the one induced by MERA layers, as demonstrated next.
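Before stating Theorem 1 formally, the power complementarity implied by Definition 4 can be checked numerically for the two-tap filters of Eq. (13). The Julia sketch below uses an arbitrary rotation matrix as a stand-in for a learned U_ℓ (the angle is illustrative, not a learned filter):

```julia
# For a constant polyphase matrix E(z) ≡ U ∈ O(2), the two-tap filters of
# Eq. (13) satisfy |G(ω)|² + |H(ω)|² = 2 at every frequency (Definition 4, item 3).
θ = 0.3
U = [cos(θ) sin(θ); -sin(θ) cos(θ)]            # any U ∈ O(2)
G(ω) = U[1, 1] + U[1, 2] * cis(-ω)              # G(z) = u00 + u01 z⁻¹ at z = e^{jω}
H(ω) = U[2, 1] + U[2, 2] * cis(-ω)              # H(z) = u10 + u11 z⁻¹

for ω in range(0, 2π; length=64)
    @assert isapprox(abs2(G(ω)) + abs2(H(ω)), 2.0; atol=1e-12)
end
```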
5.2 Main Equivalence Result

Theorem 1 (Architectural Equivalence). A MERA-inspired layer (Definition 3) is equivalent to a two-channel paraunitary filter bank whose polyphase representation is a constant orthonormal matrix E(z) ≡ U_ℓ:

$$\begin{bmatrix} A(z) \\ D(z) \end{bmatrix} = \begin{bmatrix} g_0 & g_1 \\ h_0 & h_1 \end{bmatrix} \begin{bmatrix} X_0(z) \\ X_1(z) \end{bmatrix}, \qquad (14)$$

where z = e^{jω}, X_0(z) = Σ_k x_{2k} z^{−k} and X_1(z) = Σ_k x_{2k+1} z^{−k} are the even/odd polyphase components, and U_ℓ = [g_0 g_1; h_0 h_1] ∈ O(2). The proof is given in Appendix A.

Corollary 1 (PR QMF as a special case). The PR QMF constraint h[n] = (−1)^n g[N − 1 − n] (N = 2 for a two-tap FIR filter) [12, 39] arises as a special case of the framework when additional symmetry is imposed. This condition ensures that the highpass filter H(z) is derived from the lowpass G(z) through frequency reversal, thereby guaranteeing both orthogonality and alias cancellation. Under the assumptions of Theorem 1, imposing the quadrature-mirror symmetry with N = 2 gives

$$h[n] = (-1)^n g[1 - n] \iff H(z) = -z^{-1} G(-z^{-1}). \qquad (15)$$

For two-tap FIR analysis filters G(z) = g_0 + g_1 z^{−1} and H(z) = h_0 + h_1 z^{−1}, the QMF relation yields

$$h_0 = g_1, \qquad h_1 = -g_0, \qquad (16)$$

so that the polyphase matrix takes the form

$$U_\ell = \begin{bmatrix} g_0 & g_1 \\ g_1 & -g_0 \end{bmatrix}. \qquad (17)$$

Furthermore, the Haar wavelet is the unique real two-tap FIR filter bank satisfying PR, QMF paraunitarity, and maximal DC gain (equivalently, g_0 = g_1); see Appendix B for the proof.

5.3 Manifold-Constrained Learning Objective

This subsection formulates the learning problem associated with the MERA-inspired paraunitary architecture. The optimization objective is to learn scale-dependent orthogonal transformations that maximize energy compaction in LRD traffic while guaranteeing PR and Parseval energy conservation.

Let θ = {U_ℓ}_{ℓ=1}^{L} denote the learnable parameters, where each U_ℓ ∈ O(2). For an input signal x, the analysis transform A_θ produces the coefficient set {a^{(L)}, d^{(1)}, ..., d^{(L)}}. Signal reconstruction is given by x̂ = S_θ(A_θ(x)), where S_θ = A†_θ. The loss function L({U_ℓ}) promotes sparse multiscale representations by concentrating signal energy into a small number of approximation coefficients while penalizing the aggregate magnitude of detail coefficients across all scales:

$$\min_{\{U_\ell \in O(2)\}} \; \mathcal{L} = \underbrace{\lambda_{\mathrm{sparse}} \sum_{\ell=1}^{L} \frac{1}{N_\ell}\, \big\| d^{(\ell)} \big\|_1}_{\text{sparsity term}} \;+\; \underbrace{\frac{\lambda_{\mathrm{MSE}}}{N}\, \big\| x - \hat{x} \big\|_2^2}_{\text{reconstruction term (MSE)}} \qquad (18)$$

where N_ℓ ≜ card(d^{(ℓ)}) = N/2^ℓ denotes the number of detail coefficients at scale ℓ.

Loss function terms. The sparsity term (ℓ1 norm) promotes energy compaction into few large-magnitude coefficients. The reconstruction term (MSE) penalizes mismatch between the input signal and its reconstruction from the full coefficient set. This term is optional: when λ_MSE = 0, training minimizes only the sparsity objective Σ_ℓ ∥d^{(ℓ)}∥_1. Since orthogonality is enforced by projection after each update, PR is guaranteed regardless of whether the MSE term is active.

Normalization. The sparsity term is normalized by N_ℓ to ensure that λ_sparse remains invariant with respect to changes in the decomposition depth L and the window size N. Without this normalization, the effective weight of the sparsity penalty would decrease as decimation reduces the number of coefficients at coarser scales, requiring λ_sparse to be retuned for different configurations.
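A minimal Julia sketch of Eq. (18), taking the coefficient set and the reconstruction as inputs, is shown below. The function name, argument layout, and toy values are illustrative assumptions, not the released implementation:

```julia
# Eq. (18) as a function of the detail set {d^(1..L)}, the input x, and its
# reconstruction xhat. λ_mse = 0 reproduces the sparsity-only objective used
# in the experiments (Table 2).
function mera_loss(ds::Vector{Vector{Float64}}, x, xhat; λ_sparse=1.0, λ_mse=0.0)
    Lsp  = sum(sum(abs, d) / length(d) for d in ds)   # per-scale normalized l1 term
    Lmse = sum(abs2, x .- xhat) / length(x)           # mean squared reconstruction error
    return λ_sparse * Lsp + λ_mse * Lmse
end

# Toy example with two dummy detail vectors (illustrative values only):
ds = [[0.5, -0.25], [1.0]]
x  = [1.0, 2.0, 3.0, 4.0]
mera_loss(ds, x, x)          # = 0.75/2 + 1.0/1 = 1.375 (MSE term disabled)
```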
Manifold Constraint. The constraint U_ℓ ∈ O(2) defines a smooth Riemannian manifold. This formulation differs from unconstrained empirical risk minimization, as orthogonality is enforced structurally via polar projection onto the orthogonal group (Section 6) rather than through soft penalty terms. As a result, perfect reconstruction and Parseval energy conservation are satisfied by construction at every training iteration.

Summary. This section established the exact equivalence between MERA-inspired layers and paraunitary filter banks. Section 5.1 introduced the polyphase theory framework, showing how two-channel filter banks operate in the decimated domain via constant polyphase matrices. Section 5.2 proved the main result (Theorem 1): MERA layers with orthogonal matrices U_ℓ ∈ O(2) are mathematically equivalent to two-tap paraunitary filter banks, inheriting perfect reconstruction and energy conservation guarantees. Corollary 1 showed that the Haar wavelet arises as the unique QMF filter maximizing DC gain. Section 5.3 formulated the manifold-constrained learning objective, which promotes sparsity while enforcing orthogonality via polar projection at every training iteration. Section 6 presents the optimization algorithm; Section 7 validates performance on real network traces.

6 Learning Framework

Having established in Section 5 that MERA-inspired layers form paraunitary filter banks, the practical optimization pipeline is described next (the implementation is inspired by the MERA Julia code example released by Evenbly [41]). This pipeline learns the scale isometries {U_ℓ}_{ℓ=1}^{L} directly from data while preserving PR, energy conservation, and numerical stability.

6.1 Optimization Pipeline

The core learning procedure, detailed in Algorithm 1, implements a variational loop that optimizes the MERA-inspired filter banks on windowed traffic segments. The algorithm requires a real-valued signal window x ∈ R^N, the number of decomposition levels L, and non-negative loss weights λ_sparse and λ_MSE. The output is a collection of scale isometries U = {U_1, ..., U_L} constrained to remain orthonormal throughout training.

Algorithm 1 MERA-Wavelet Optimization
Require: x, L, numiter, η, λ_sparse, λ_MSE, U_0 (optional)
 1: U ← U_0 if provided; otherwise initialize randomly and project onto O(2)
 2: for k = 1 to numiter do
 3:   (a, {d^{(ℓ)}}_{ℓ=1}^{L}) ← MERA-ANALYZE(x, U)              ▷ Forward transform
 4:   L_sparse ← (1/n_d) Σ_{ℓ=1}^{L} ∥d^{(ℓ)}∥_1                 ▷ Mean ℓ1 norm
 5:   if λ_MSE > 0 then
 6:     x̂ ← MERA-SYNTHESIZE(a, {d^{(ℓ)}}, U)
 7:     L_MSE ← (1/N) ∥x̂ − x∥²_2
 8:   else
 9:     L_MSE ← 0
10:   end if
11:   L ← λ_sparse · L_sparse + λ_MSE · L_MSE
12:   ∇_U ← BACKPROPAGATE(L, U, x)                               ▷ Euclidean gradient
13:   U ← ADAM-STEP(U, ∇_U, η)                                    ▷ Update in R^{2×2}
14:   for ℓ = 1 to L do
15:     U_ℓ ← U_ℓ (U†_ℓ U_ℓ)^{−1/2}                               ▷ Polar projection onto O(2)
16:   end for
17: end for
18: return U

Each iteration proceeds in three phases (a simplified end-to-end sketch follows the list):

1. Forward analysis (line 3): decompose x into multiscale coefficients {a^{(L)}, d^{(1)}, ..., d^{(L)}}.
2. Loss evaluation (lines 4–11): compute the composite objective combining sparsity promotion and (optionally) reconstruction fidelity.
3. Constrained update (lines 12–15): apply an Adam gradient step followed by polar projection to restore orthogonality.
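The following Julia sketch mirrors the structure of Algorithm 1 under simplifying assumptions: gradients are obtained by central finite differences on each matrix entry (standing in for the automatic differentiation and Adam update of lines 12–13), levels are updated one at a time, and only the sparsity term of Eq. (18) is used. It is an illustration, not the released implementation.

```julia
using LinearAlgebra

function analyze(x, Us)                         # Eq. (4) applied level by level
    a, ds = copy(x), Vector{Vector{Float64}}()
    for U in Us
        y = U * reshape(a, 2, :)
        a = y[1, :]; push!(ds, y[2, :])
    end
    return a, ds
end

# Sparsity-only objective (λ_MSE = 0), normalized per scale as in Eq. (18).
sparsity_loss(x, Us) = sum(sum(abs, d) / length(d) for d in analyze(x, Us)[2])

polar(U) = (F = svd(U); F.U * F.Vt)             # nearest orthogonal matrix, Eq. (3)

function train!(Us, x; iters=100, η=5e-3, h=1e-6)
    for _ in 1:iters
        for U in Us                             # update one level at a time
            g = zero(U)
            for i in eachindex(U)               # finite-difference Euclidean gradient
                u0 = U[i]
                U[i] = u0 + h; fp = sparsity_loss(x, Us)
                U[i] = u0 - h; fm = sparsity_loss(x, Us)
                U[i] = u0
                g[i] = (fp - fm) / (2h)
            end
            U .= polar(U .- η .* g)             # Euclidean step + polar projection
        end
    end
    return Us
end

# Haar warm start on a toy window of length 2^3.
Us = [[1 1.0; 1 -1] / sqrt(2) for _ in 1:3]
x  = abs.(randn(8))
train!(Us, x)
@assert all(isapprox(U' * U, Matrix{Float64}(I, 2, 2); atol=1e-8) for U in Us)
```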
Gradient flow and manifold projection. Line 12 computes the Euclidean gradient ∇_U L in the ambient space R^{2×2} via automatic differentiation (AD). Line 13 applies Adam [42] with learning rate η. This Euclidean step violates the orthogonality constraint U†_ℓ U_ℓ = I. Line 15 restores the constraint via polar projection: for each U_ℓ, the nearest orthogonal matrix in Frobenius norm is U_ℓ (U†_ℓ U_ℓ)^{−1/2}. By enforcing U_ℓ ∈ O(2) at every iteration, the algorithm guarantees that paraunitarity, the Parseval identity, and PR hold throughout training – not merely as soft approximations.

Loss function. The sparsity term L_sparse (line 4) promotes energy compaction by penalizing the mean absolute value of detail coefficients. The normalization factor n_d denotes the total number of detail coefficients across all scales, ensuring that the effective weight of the sparsity penalty remains invariant with respect to the decomposition depth L and the signal length N.

Gradient-Based Optimization. The framework employs AD to compute gradients ∇_U L with respect to the filter parameters. Gradients are computed via the chain rule applied through the computational graph of the MERA transform – a process commonly termed backpropagation in the machine learning literature [43]. The Adam optimizer [42] adapts learning rates per parameter using exponential moving averages of the first (m) and second (v) gradient moments:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\,\nabla_U \mathcal{L}, \qquad (19)$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\,(\nabla_U \mathcal{L})^2, \qquad (20)$$
$$U_t = U_{t-1} - \eta\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}, \qquad (21)$$

where m̂_t and v̂_t are bias-corrected estimates. This adaptive scheme provides faster convergence and reduced sensitivity to hyperparameter selection compared to vanilla stochastic gradient descent. The default parameters (β_1 = 0.9, β_2 = 0.999, ϵ = 10^{−8}) are used throughout (Table 2).

6.2 Initialization

The framework adopts Haar warm-start initialization: each U_ℓ is set to

$$U_{\mathrm{Haar}} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$

This provides coarse energy compaction from the outset, and subsequent Adam updates refine this prior to match trace-specific correlations. Empirically, Haar initialization accelerates convergence compared to random starting points while achieving identical final performance. For random initialization, each U_ℓ can be drawn from a Gaussian distribution and immediately projected onto O(2), providing a neutral baseline that does not bias the learned filters toward any wavelet family.

6.3 Computational Complexity

The proposed framework preserves the efficiency of classical wavelet transforms:

Inference. Analysis and synthesis have complexity O(N), identical to the DWT. Each level ℓ processes N/2^ℓ samples with constant-cost 2 × 2 operations.

Training. Each iteration requires O(N) for the forward/backward passes. Polar projection operates on 2 × 2 matrices with O(1) cost per level, contributing O(L) = O(log N) overhead. The total training cost is O(T · N) for T iterations.

Parameter efficiency. The learned transform requires only 4L scalar parameters (one 2 × 2 orthogonal matrix per level). For L = 5, this amounts to 20 trainable parameters – enabling rapid adaptation without risk of overfitting.

Summary. This section presented the MERA-inspired wavelet learning framework. Algorithm 1 integrates AD, Adam optimization, and manifold-constrained projection into a unified pipeline. Section 7 validates the framework on six years of backbone traffic, demonstrating that learned orthonormal wavelets adapt to traffic-specific correlation structures while preserving the mathematical guarantees essential for 6G telemetry.
7 Experimental Results

This section presents an empirical validation of the proposed adaptive MERA-inspired wavelet framework. The evaluation assesses whether learned orthonormal wavelets simultaneously achieve improved rate-distortion performance and preserve the LRD properties critical for DT synchronization.

7.1 The LRD Preservation Requirement

As mentioned in Section 1, backbone traffic exhibits LRD characterized by power-law autocorrelation decay:

$$\phi(k) \sim k^{-\beta}, \qquad 0 < \beta < 1, \qquad (22)$$

where the decay exponent β relates to the Hurst parameter H ∈ (0.5, 1) via β = 2 − 2H. This slow correlation decay has profound implications for the network models commonly employed in DT frameworks:

Queueing Analysis. For finite buffers of size B packets, the overflow probability under LRD input decays polynomially rather than exponentially [4, 5]:

$$P(\mathrm{overflow}) \sim B^{-(2-2H)} = B^{-\beta}. \qquad (23)$$

In contrast, Markovian (memoryless) models predict P(overflow) ∼ e^{−λB}. This disparity leads to 10²–10³× errors in buffer dimensioning when H is underestimated, fundamentally altering capacity provisioning rules for ultra-reliable low-latency communications (URLLC) in 6G systems.

Capacity Planning. The effective bandwidth required to meet target loss rates scales differently under LRD traffic compared to Poisson or exponential models [1]. DT-driven optimization algorithms that rely on traffic statistics as input parameters will generate invalid provisioning decisions if the telemetry compression distorts H.

Consequence. If telemetry compression degrades the LRD signature (e.g., by attenuating long-timescale correlations), the DT's predictions become statistically inconsistent with the physical network. This can lead to mis-provisioning, SLA violations, or instability in closed-loop control scenarios.
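To make the contrast in Eq. (23) concrete, both scaling laws can be inverted for a target overflow probability ε. This is a back-of-the-envelope derivation: proportionality constants are omitted, so only the scaling behavior (not absolute buffer sizes) is meaningful.

```latex
% LRD input:       P(overflow) ~ B^{-(2-2H)}  =>  B(\varepsilon) ~ \varepsilon^{-1/(2-2H)}
% Markovian input: P(overflow) ~ e^{-\lambda B} =>  B(\varepsilon) ~ (1/\lambda)\ln(1/\varepsilon)
%
% Tightening the target from \varepsilon to \varepsilon/10 multiplies the LRD buffer
% requirement by 10^{1/(2-2H)} (about 10^{4.2} for H = 0.88), whereas the Markovian
% requirement only grows by the additive term (\ln 10)/\lambda.
B_{\mathrm{LRD}}(\varepsilon) \;\propto\; \varepsilon^{-\frac{1}{2-2H}},
\qquad
B_{\mathrm{Markov}}(\varepsilon) \;\propto\; \frac{1}{\lambda}\,\ln\frac{1}{\varepsilon}.
```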
7.2 Experimental Setup

7.2.1 Dataset and Preprocessing

The evaluation utilizes trans-Pacific backbone traces from the MAWI (Measurement and Analysis on the WIDE Internet) Working Group Traffic Archive [35]. Six captures spanning 2020–2025 (Samplepoint-F) were selected to represent heterogeneous operating conditions, with traffic loads ranging from 314 Mbps to 1.75 Gbps. Packet-level metadata were aggregated into byte-per-millisecond time series. Table 1 summarizes the characteristics of the traces.

At the time of this study, publicly available, large-scale, millisecond-resolution traffic traces from operational 5G or beyond-5G networks are not available. This limitation is widely acknowledged in the literature. Consequently, this work validates the proposed framework on backbone aggregation traces, which capture the emergent statistical properties – particularly LRD – that digital twins must preserve for stable closed-loop optimization.

Table 1: MAWI trace characteristics (Samplepoint-F, 15-min captures).

Trace          Duration   Packets   Avg. rate
202004081229   900 s      81 M      314 Mbps
202103181400   900 s      86 M      416 Mbps
202204131100   900 s      119 M     769 Mbps
202301131400   900 s      108 M     776 Mbps
202406192000   900 s      194 M     1.75 Gbps
202504090300   900 s      126 M     885 Mbps

Scope limitation: These backbone traces capture aggregated traffic from thousands of sources, exhibiting the LRD structure characteristic of statistical multiplexing. The framework's performance on wireless edge telemetry – where individual user dynamics, channel fading, and mobility introduce distinct correlation structures – is deferred to future investigation (Section 8).

7.2.2 Training Configuration

Experiments employ a two-stage training schedule optimizing MERA-wavelet parameters on 1024-sample non-overlapping windows. The optimization utilizes the Adam solver with a sparsity-driven objective (λ_sparse = 1.0, λ_MSE = 0), ensuring that the learned filters prioritize energy compaction into approximation coefficients. Table 2 details the complete hyperparameter configuration.

Reproducibility: All experiments ran on a single Apple M3 Pro laptop using Julia 1.11 with CPU-only execution. Random seed 12345 ensures deterministic initialization. The complete codebase, including hyperparameter configuration files, training scripts, and learned filters, is available at https://github.com/alexandreblima/MERA-wavelets.

7.2.3 Baselines

Performance is compared against fixed wavelet bases: Haar (length-2), Daubechies-4 (db4), Coiflet-3, Symmlet-8, and Biorthogonal 4.4. These baselines isolate the benefits of data-driven adaptivity under strict paraunitary constraints.

[Figure 5] Panels (a)–(f) plot ΔPSNR (dB) versus retention ratio ρ for the six traces: (a) 2020 (314 Mbps, H = 0.89), (b) 2021 (416 Mbps, H = 0.77), (c) 2022 (769 Mbps, H = 0.93), (d) 2023 (776 Mbps, H = 0.86), (e) 2024 (1.75 Gbps, H = 0.88), (f) 2025 (885 Mbps, H = 0.83); each panel compares MERA against Haar, DB4, Coiflet-3, Symmlet-8, and Biorthogonal-4.4. Figure 5: PSNR gains of MERA-learned wavelets over fixed baselines as a function of retention ratio ρ. A retention ratio of ρ = 0.1 corresponds to 90% compression (retaining only 10% of coefficients by magnitude). The learned filters consistently outperform classical wavelets across all compression levels and traffic conditions.

7.2.4 Evaluation Metrics

Reconstruction fidelity is quantified using PSNR:

$$\mathrm{PSNR} = 10 \cdot \log_{10}\!\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right), \qquad \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{x}_i)^2, \qquad (24)$$

where MAX_I is the peak magnitude of the window. Statistical fidelity is assessed by the preservation of the Hurst exponent (H), estimated via Abry–Veitch wavelet regression [11]. The error metric is ΔH = H_compressed − H_orig.
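A minimal Julia sketch of the PSNR metric of Eq. (24) for one window is shown below. It assumes MAX_I is taken as the peak magnitude of the original window, as stated above; the function name is illustrative.

```julia
# PSNR of Eq. (24) for a single window.
function psnr(x::Vector{Float64}, xhat::Vector{Float64})
    mse  = sum(abs2, x .- xhat) / length(x)
    maxi = maximum(abs, x)                 # MAX_I: peak magnitude of the window
    return 10 * log10(maxi^2 / mse)
end
```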
Table 2: MERA-Wavelet training hyperparameters for all experiments (Section 7).

Parameter                 Value                       Justification
Architecture
  Decomposition levels    L = 5                       Captures scales 2^1–2^5 (2–32 ms)
  Initialization          Haar warm-start             Leverages wavelet prior
Optimization
  Total iterations        100 (50 + 50)               Two-stage schedule
  Stage 1 learning rate   η_1 = 5 × 10^−3             Coarse adaptation
  Stage 2 learning rate   η_2 = 2.5 × 10^−3           Fine-tuning (halved η_1)
  Adam parameters         β_1 = 0.9, β_2 = 0.999      Standard defaults
  Adam epsilon            ϵ = 10^−8                   Numerical stability
Loss function
  Sparsity weight         λ_sparse = 1.0              ℓ1 penalty on detail coefficients
  MSE weight              λ_MSE = 0.0                 Disabled (no improvement observed)
Data processing
  Window size             1024 samples                Power-of-two for dyadic DWT
  Window stride           1024 samples                Non-overlapping windows
  Retention ratios        ρ ∈ {0.01, ..., 0.80}       Rate-distortion evaluation
Implementation
  Random seed             12345                       Reproducibility
  Parametrization         MERA (polar proj.)          Algorithm 1, Section 6
  Hardware                Apple M3 Pro                CPU-only execution

7.3 Compression Performance

After training, the learned filters are evaluated under varying bandwidth constraints by retaining only a fraction ρ ∈ (0, 1] of the wavelet coefficients ranked by magnitude. Specifically, given the full coefficient vector c = [a^{(L)}, d^{(L)}, ..., d^{(1)}], the compressed representation retains the ⌈ρ · |c|⌉ coefficients with largest absolute values, setting the remainder to zero. Reconstruction is then performed via the inverse MERA transform S_θ. This coefficient-thresholding approach follows standard practice in wavelet compression [12] and enables direct comparison across retention ratios (a sketch of the retention step follows the analysis below).

Note that ρ is an evaluation parameter – it does not appear in the training objective (18). The learned filters are optimized for general sparsity (minimizing detail coefficient magnitudes), and the retention ratio is varied at test time to characterize rate-distortion performance across different compression levels. To facilitate direct comparison with fixed wavelet baselines, performance is reported in terms of ΔPSNR, defined as

$$\Delta\mathrm{PSNR}(\rho) \triangleq \mathrm{PSNR}_{\mathrm{MERA}}(\rho) - \mathrm{PSNR}_{\mathrm{baseline}}(\rho). \qquad (25)$$

Fig. 5 presents the rate-distortion performance for all six MAWI traces. The proposed MERA-inspired wavelet framework consistently outperforms fixed baselines across the full range of retention ratios (ρ).

Rate-Distortion Analysis. The learned filters achieve PSNR gains ranging from 0.5 dB to 3.8 dB compared to the best fixed alternative.

• Peak performance (2024): The largest gains are observed in the 2024 trace (Fig. 5e), where MERA achieves a 3.8 dB improvement over Coiflet-3, Symmlet-8, and Biorthogonal-4.4. This trace corresponds to the highest network load (1.75 Gbps) and strong LRD (H ≈ 0.88), indicating that adaptive filters effectively capture the bursty dynamics of saturated links.

• Convergence of baselines: Higher-order fixed wavelets (db4, Coiflet, Symmlet) tend to cluster within a narrow performance band (< 0.3 dB difference). MERA breaks this ceiling, demonstrating that optimizing the spectral tilt of the filter bank yields benefits beyond simply increasing the number of vanishing moments.

• Haar comparison: While Haar performs robustly due to its short support, MERA consistently surpasses it by 0.6–3.1 dB, showing that the learned filters successfully balance time-domain localization with frequency selectivity.
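The magnitude-based retention step described at the start of this subsection can be sketched in a few lines of Julia (illustrative function name; not the released code):

```julia
# Keep the ceil(ρ·|c|) largest-magnitude entries of the flattened coefficient
# vector c = [a^(L); d^(L); ...; d^(1)] and zero the rest (Section 7.3).
function retain(c::Vector{Float64}, ρ::Real)
    k    = ceil(Int, ρ * length(c))
    keep = sortperm(abs.(c); rev=true)[1:k]     # indices of the k largest magnitudes
    out  = zeros(length(c))
    out[keep] = c[keep]
    return out
end
```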
7.4 Statistical Fidelity (LRD Preservation)

Beyond pointwise error (MSE), 6G DT will require preservation of the self-similar traffic structure. Table 3 lists the reference global Hurst exponents (H) for the raw traces, confirming persistent LRD (0.77 ≤ H ≤ 0.93) across all years.

Table 3: Global Hurst exponent estimates from MAWI traces (Abry–Veitch regression, 95% confidence interval).

Trace (MAWI)    Ĥ        95% CI
202004081229    0.8897   [0.853, 0.926]
202103181400    0.7674   [0.684, 0.851]
202204131100    0.9313   [0.883, 0.979]
202301131400    0.8641   [0.817, 0.911]
202406192000    0.8771   [0.825, 0.929]
202504090300    0.8329   [0.787, 0.878]

Hurst Exponent Preservation. Table 4 reports the deviation ΔH in the reconstructed signal. At a retention ratio of ρ = 0.1 (90% compression), the method maintains |ΔH| ≤ 0.03 for all traces. This demonstrates that the learned basis functions preserve the power-law decay of the autocorrelation function even at high compression rates.

Importantly, increasing the retention factor ρ does not necessarily improve the stability of the Hurst exponent. While larger ρ preserves more coefficients, it also reintroduces small-amplitude detail components primarily associated with high-frequency fluctuations. Since Hurst exponent estimation depends on the stability of multiscale scaling behavior rather than local reconstruction fidelity, these weak high-frequency contributions may increase estimator sensitivity and perturb the slope of the wavelet logscale diagram. Conversely, moderate sparsification suppresses such weak detail coefficients, effectively acting as a structural denoising mechanism that stabilizes scaling statistics. Therefore, the observed variations of ΔH with ρ reflect estimator sensitivity rather than degradation of the reconstructed signal.

The threshold |ΔH| ≤ 0.03 was selected based on the statistical precision of the estimator. As derived from the 95% confidence intervals for the raw traces (Table 3), the intrinsic uncertainty of the Abry–Veitch estimator for these finite-length windows ranges from ±0.036 (trace 2020) to ±0.083 (trace 2021). Even for the critical high-load scenario (trace 2024), the measurement error is approximately ±0.052. Consequently, maintaining compression deviations within 0.03 ensures that the LRD structure of the reconstructed telemetry remains statistically indistinguishable from the original source, preserving the validity of the data for queueing analysis within the limits of measurement precision.

The spectral analysis in Fig. 6 corroborates this, showing that the energy distribution across scales ℓ maintains linearity. While deviations occur at very large scales (ℓ > 15) due to finite-size effects and non-stationarity, the primary scaling region essential for LRD modeling is preserved.

Figure 6: Wavelet energy spectra S_ℓ vs. scale ℓ for the MAWI traces. The linear growth confirms power-law scaling (LRD). MERA filters are optimized to match this spectral tilt.

Table 4: Hurst exponent deviations across compression levels (learned MERA). Values with |ΔH| ≤ 0.03 indicate strong preservation of LRD.

Trace   H_orig   ΔH (ρ=0.1)   ΔH (ρ=0.2)   ΔH (ρ=0.4)   ΔH (ρ=0.8)
2020    0.890    +0.027       −0.011       −0.038       −0.049
2021    0.767    +0.011       −0.017       −0.039       −0.051
2022    0.931    +0.002       −0.028       −0.055       −0.064
2023    0.864    +0.020       −0.012       −0.039       −0.048
2024    0.877    −0.010       −0.041       −0.065       −0.079
2025    0.833    −0.004       −0.046       −0.078       −0.086
Mean    0.846    +0.009       −0.026       −0.052       −0.063
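For orientation, the logscale-diagram idea behind the Hurst estimates above can be sketched as follows. This is a simplified, unweighted version in the spirit of the Abry–Veitch estimator [11]: it regresses the log2 of the per-scale detail energy on the scale index and maps the slope α to H = (α + 1)/2, omitting the weighted regression and small-sample bias corrections used by the reference estimator.

```julia
using Statistics

# Simplified logscale-diagram Hurst estimate from the detail coefficients of an
# L-level decomposition: ds[j] holds the details at scale j (fine to coarse).
function hurst_logscale(ds::Vector{Vector{Float64}})
    j = collect(1:length(ds))
    y = [log2(mean(abs2, d)) for d in ds]    # log2 of per-scale detail energy
    α = cov(j, y) / var(j)                   # least-squares slope of the diagram
    return (α + 1) / 2
end
```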
7.5 Learned Filter Analysis

To understand the adaptation mechanism, we examine the filters learned from the 2024 trace (Fig. 7). Starting from a Haar initialization, the optimization converges to an asymmetric structure (Table 5) that increases the effective support of the basis functions. The frequency response reveals that the learned filters introduce specific passband ripples that deviate from the "maximum flatness" criterion of Daubechies wavelets. These deviations are not artifacts but data-driven adaptations that maximize energy compaction for the specific spectral signature of internet traffic, validating the use of the MERA framework for discovering domain-specific orthogonal bases.

Table 5: Learned filter coefficients for trace 202406192000. Deviations from Haar (length-2) indicate adaptation to the traffic structure.

Level ℓ | ∥g_ℓ − g_Haar∥_2 | g_ℓ (low-pass) | h_ℓ (high-pass)
1 | 0.0177 | [0.7195, 0.6945]^T | [0.6945, −0.7195]^T
2 | 0.0505 | [0.7419, 0.6705]^T | [0.6705, −0.7419]^T
3 | 0.0473 | [0.7398, 0.6729]^T | [0.6729, −0.7398]^T
4 | 0.0331 | [0.6833, 0.7301]^T | [0.7301, −0.6833]^T
5 | 0.0222 | [0.7226, 0.6913]^T | [0.6913, −0.7226]^T

Summary

The experimental validation confirms that the proposed MERA-wavelet framework effectively bridges the gap between theoretical orthogonality and data-driven adaptation. The key takeaways for 6G DT implementations are:

• Rate-distortion superiority: The learned filters achieve consistent PSNR gains of 0.5–3.8 dB over standard wavelet families. The advantage is most pronounced in high-load, bursty scenarios (e.g., the 2024 trace at 1.75 Gbps), confirming that adaptive bases successfully capture the non-stationary dynamics of modern backbone traffic.

• Statistical preservation: Crucially for predictive modeling, the method preserves the self-similar nature of the traffic. The Hurst-exponent deviations remain negligible (|∆H| ≤ 0.03) even at 90% compression (ρ = 0.1), ensuring that the reconstructed telemetry retains the correlation structure necessary for accurate network simulation.

• Structural guarantees: Unlike unconstrained deep-learning approaches, the MERA-based optimization converges to interpretable, perfectly reconstructing filter banks. The results demonstrate that strict paraunitary constraints can be maintained without sacrificing the flexibility required to adapt to diverse spectral signatures.

These findings position the MERA wavelet not merely as a compression tool, but as a reliable interface for high-fidelity data synchronization in 6G architectures.

Figure 7: Frequency response of the analysis filters (trace 2024). Left: learned paraunitary filters. Right: Haar initialization. Top: low-pass cascades G_ℓ(ω). Bottom: high-pass filters H_ℓ(ω). Haar itself exhibits ripple due to the length-2 basis. Both learned cascades keep this ripple structure; however, the low-pass responses shift their amplitude and zero locations slightly across levels, especially in the pass-band and transition regions.
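As a concrete check of the coefficients in Table 5 and of the ripple behavior visible in Fig. 7, the short sketch below evaluates the level-1 learned filters on a frequency grid and verifies the power-complementarity property |G(ω)|^2 + |H(ω)|^2 = 2 implied by paraunitarity. The grid size and numerical tolerance are illustrative choices, not values used in the paper.

import numpy as np

g = np.array([0.7195, 0.6945])           # learned low-pass, level 1 (Table 5)
h = np.array([0.6945, -0.7195])          # learned high-pass, level 1 (Table 5)

w = np.linspace(0.0, np.pi, 512)
G = g[0] + g[1] * np.exp(-1j * w)        # G(omega) = g0 + g1 e^{-j omega}
H = h[0] + h[1] * np.exp(-1j * w)        # H(omega) = h0 + h1 e^{-j omega}

power = np.abs(G) ** 2 + np.abs(H) ** 2
print(np.allclose(power, 2.0, atol=1e-3))   # power complementarity |G|^2 + |H|^2 = 2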
8 Conclusion

This work addressed a central challenge in the design of DTs for 6G networks: achieving high-fidelity telemetry compression while preserving the strict structural guarantees required for reliable closed-loop operation. Instead of treating compression as a generic rate-distortion problem, the proposed approach framed telemetry as a synchronization mechanism, in which violations of invertibility, energy conservation, or LRD directly compromise the predictive stability of the DT. A rigorous and exact equivalence between MERA tensor networks and two-channel paraunitary wavelet filter banks was established, enabling a learning framework that overcomes the limitations of fixed wavelet designs.

Experimental validation on real-world backbone aggregation traces demonstrated consistent rate-distortion gains of up to 3.8 dB over classical orthogonal and biorthogonal wavelets, while preserving the self-similar structure of the traffic within strict Hurst-exponent bounds. These improvements were obtained without relaxing the paraunitary constraints, ensuring PR and Parseval energy conservation at all scales.

Beyond compression performance, the results position the MERA-wavelet framework as a principled synchronization interface between physical networks and their DTs. By preserving the multiscale statistical invariants that underpin traffic modeling, the proposed method provides a technology-agnostic foundation for telemetry pipelines in bandwidth-constrained 6G architectures. Extensions to wireless and edge environments, where mobility and radio-induced non-stationarity introduce additional challenges, constitute a natural direction for future investigation.

A Proof of Theorem 1 (Architectural Equivalence)

Theorem 1 (Architectural Equivalence). A MERA-inspired layer (Definition 3) is equivalent to a two-channel paraunitary filter bank whose polyphase representation is a constant orthonormal matrix E(z) ≡ U_ℓ:

\begin{bmatrix} A(z) \\ D(z) \end{bmatrix} = \begin{bmatrix} g_0 & g_1 \\ h_0 & h_1 \end{bmatrix} \begin{bmatrix} X_0(z) \\ X_1(z) \end{bmatrix}.   (26)

Proof of Theorem 1. Both directions of the equivalence are derived.

(Sufficiency) Assume a MERA-inspired local operator U_ℓ ∈ O(2) acts pointwise on adjacent pairs of samples,

\begin{bmatrix} a_k \\ d_k \end{bmatrix} = U_\ell \begin{bmatrix} x_{2k} \\ x_{2k+1} \end{bmatrix} = \begin{bmatrix} g_0 & g_1 \\ h_0 & h_1 \end{bmatrix} \begin{bmatrix} x_{2k} \\ x_{2k+1} \end{bmatrix}, \quad k \in \mathbb{Z}.   (27)

It is shown next that this local transformation induces a paraunitary filter bank.

Step 1 (Time-domain output): Expanding (27) yields the component-wise relations

a_k = g_0 x_{2k} + g_1 x_{2k+1}, \qquad d_k = h_0 x_{2k} + h_1 x_{2k+1}.   (28)

Step 2 (Polyphase decomposition): Define the decimated z-transforms of the outputs

A(z) = \sum_k a_k z^{-k}, \qquad D(z) = \sum_k d_k z^{-k},   (29)

and the even/odd polyphase components of the input

X_0(z) = \sum_k x_{2k} z^{-k}, \qquad X_1(z) = \sum_k x_{2k+1} z^{-k}.   (30)

Step 3 (z-transform substitution): Substituting the expressions for a_k and d_k from Step 1 into the z-transforms gives

A(z) = g_0 X_0(z) + g_1 X_1(z),   (31)
D(z) = h_0 X_0(z) + h_1 X_1(z),   (32)

which can be written compactly in matrix form as

\begin{bmatrix} A(z) \\ D(z) \end{bmatrix} = \begin{bmatrix} g_0 & g_1 \\ h_0 & h_1 \end{bmatrix} \begin{bmatrix} X_0(z) \\ X_1(z) \end{bmatrix} = E(z) \begin{bmatrix} X_0(z) \\ X_1(z) \end{bmatrix},   (33)

where E(z) ≡ U_ℓ is the constant polyphase matrix.

Step 4 (Two-tap FIR filters): Applying (10)–(11), the analysis filters are

G(z) = E_{00}(z^2) + z^{-1} E_{01}(z^2), \qquad H(z) = E_{10}(z^2) + z^{-1} E_{11}(z^2).   (34)

Since E(z) ≡ U_ℓ is constant (z-independent), substituting the scalar entries E_{00} = g_0, E_{01} = g_1, E_{10} = h_0, E_{11} = h_1 yields

G(z) = g_0 + g_1 z^{-1}, \qquad H(z) = h_0 + h_1 z^{-1},   (35)

which are length-2 FIR analysis filters parameterized by the entries of U_ℓ.
Step 5 (Paraunitarity): The orthogonality condition U_ℓ ∈ O(2) directly implies paraunitarity of the polyphase matrix:

E(z) E^{\dagger}(z^{-1}) = U_\ell U_\ell^{\dagger} = I.   (36)

This ensures power complementarity in the frequency domain: |G(ω)|^2 + |H(ω)|^2 = 2.

Step 6 (Perfect reconstruction): Choosing the synthesis polyphase matrix R(z) = E^{\dagger}(z^{-1}) = U_\ell^{\dagger} ensures R(z) E(z) = I, guaranteeing alias cancellation. In the time domain, this yields

U_\ell^{\dagger} \left( U_\ell \begin{bmatrix} x_{2k} \\ x_{2k+1} \end{bmatrix} \right) = (U_\ell^{\dagger} U_\ell) \begin{bmatrix} x_{2k} \\ x_{2k+1} \end{bmatrix} = \begin{bmatrix} x_{2k} \\ x_{2k+1} \end{bmatrix},   (37)

confirming perfect reconstruction of the input samples.

Conclusion: A MERA layer with U_ℓ ∈ O(2) induces a critically sampled, two-channel, two-tap paraunitary filter bank with constant polyphase matrix E(z) ≡ U_ℓ, inheriting guarantees such as perfect reconstruction, energy conservation (Parseval identity), and O(N) complexity.

(Necessity) Suppose now that the analysis stage of a two-channel paraunitary filter bank has a constant polyphase matrix

E(z) \equiv U = \begin{bmatrix} g_0 & g_1 \\ h_0 & h_1 \end{bmatrix}, \qquad U U^{\dagger} = I.   (38)

It is established next that this filter bank necessarily implements a MERA-inspired layer.

Step 1 (Polyphase representation): By the polyphase decomposition of the two-channel analysis bank, the output transforms are

\begin{bmatrix} A(z) \\ D(z) \end{bmatrix} = E(z) \begin{bmatrix} X_0(z) \\ X_1(z) \end{bmatrix},   (39)

where X_0(z) = \sum_k x_{2k} z^{-k} and X_1(z) = \sum_k x_{2k+1} z^{-k} are the even and odd polyphase components of the input signal.

Step 2 (Time-domain relation): Since E(z) ≡ U is constant (z-independent), all polyphase entries are scalars. Matching coefficients in the z-transform yields the time-domain relation

\begin{bmatrix} a_k \\ d_k \end{bmatrix} = U \begin{bmatrix} x_{2k} \\ x_{2k+1} \end{bmatrix}, \quad k \in \mathbb{Z}.   (40)

This shows that the analysis operation applies the same matrix U to each pair of adjacent samples (x_{2k}, x_{2k+1}) independently, followed by implicit downsampling.

Step 3 (Equivalence to MERA layer): The pairwise transformation in the previous step is precisely the definition of a MERA-inspired layer (Definition 3):

\begin{bmatrix} a_k \\ d_k \end{bmatrix} = U_\ell \begin{bmatrix} x_{2k} \\ x_{2k+1} \end{bmatrix}.   (41)

Thus, the filter-bank analysis coincides exactly with the action of a MERA layer with U_ℓ = U.

Step 4 (Paraunitarity verification): The paraunitarity condition E(z) E^{\dagger}(z^{-1}) = I reduces to U U^{\dagger} = I for constant E(z) ≡ U, confirming that U ∈ O(2). This ensures perfect reconstruction via U^{\dagger} and energy conservation (Parseval identity).

Step 5 (Two-tap FIR structure): Applying (10)–(11) to the constant polyphase matrix yields the analysis filters

G(z) = E_{00}(z^2) + z^{-1} E_{01}(z^2) = g_0 + g_1 z^{-1},   (42)
H(z) = E_{10}(z^2) + z^{-1} E_{11}(z^2) = h_0 + h_1 z^{-1},   (43)

which are two-tap FIR filters parameterized by the entries of U.

Conclusion: Any two-channel paraunitary filter bank with constant polyphase matrix E(z) ≡ U ∈ O(2) necessarily implements a MERA-inspired layer with two-tap FIR analysis filters. This completes the proof of equivalence.
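The equivalence can also be checked numerically. The sketch below is written for the QMF reflection form of U_ℓ used elsewhere in the paper (h_0 = g_1, h_1 = −g_0), whereas Theorem 1 allows any U ∈ O(2); the random seed and signal length are arbitrary. It applies a random orthogonal pair operator to adjacent samples, recomputes the same outputs as decimated two-tap filtering expressed through the even/odd polyphase components (eqs. (28), (31)–(32)), and confirms perfect reconstruction via the synthesis matrix of Step 6.

import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi)
g0, g1 = np.cos(theta), np.sin(theta)
U = np.array([[g0, g1], [g1, -g0]])           # constant polyphase matrix E(z) = U, QMF reflection form

x = rng.standard_normal(1024)
pairs = x.reshape(-1, 2)                      # rows (x_{2k}, x_{2k+1})

# MERA-layer view: apply U to each adjacent pair (eq. (27)).
a_mera, d_mera = (pairs @ U.T).T

# Filter-bank view: decimated two-tap filtering via the even/odd polyphase components (eq. (28)).
a_fb = g0 * x[0::2] + g1 * x[1::2]
d_fb = g1 * x[0::2] - g0 * x[1::2]
print(np.allclose(a_mera, a_fb), np.allclose(d_mera, d_fb))   # architectural equivalence

# Perfect reconstruction through the synthesis matrix U^T (Step 6); rows recover (x_{2k}, x_{2k+1}).
x_rec = (np.stack([a_mera, d_mera], axis=1) @ U).reshape(-1)
print(np.allclose(x_rec, x))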
B Proof of Corollary 1 (Uniqueness of Haar for Two-Tap QMF)

Proof of Corollary 1. It is shown that the Haar wavelet is the unique real two-tap FIR filter bank satisfying both PR and QMF paraunitarity.

Step 1 (Orthogonality): The paraunitarity condition U_ℓ U_ℓ^{\dagger} = I applied to (17) yields

\begin{bmatrix} g_0 & g_1 \\ g_1 & -g_0 \end{bmatrix} \begin{bmatrix} g_0 & g_1 \\ g_1 & -g_0 \end{bmatrix}^{\dagger} = \begin{bmatrix} g_0^2 + g_1^2 & 0 \\ 0 & g_0^2 + g_1^2 \end{bmatrix} = I.   (44)

This immediately gives the normalization constraint

g_0^2 + g_1^2 = 1.   (45)

Step 2 (Parameterization): Eq. (45) parameterizes all solutions as points on the unit circle:

g_0 = \cos\theta, \qquad g_1 = \sin\theta, \qquad \theta \in [0, 2\pi).   (46)

Step 3 (DC response maximization): Among all orthonormal solutions, the Haar wavelet uniquely maximizes the DC response |G(0)|:

|G(0)| = |g_0 + g_1| = |\cos\theta + \sin\theta|.   (47)

This is maximized when \cos\theta = \sin\theta, i.e., \theta = \pi/4 (the antipodal solution \theta = 5\pi/4 yields -U_{\mathrm{Haar}}, the same filter bank up to an overall sign), yielding

g_0 = g_1 = \tfrac{1}{\sqrt{2}}, \qquad |G(0)| = \sqrt{2}.   (48)

Step 4 (Uniqueness): Combining orthogonality (45) with the symmetry requirement g_0 = g_1 gives the unique solution

g_0 = g_1 = \tfrac{1}{\sqrt{2}}, \qquad h_0 = g_1 = \tfrac{1}{\sqrt{2}}, \qquad h_1 = -g_0 = -\tfrac{1}{\sqrt{2}},   (49)

corresponding to the Haar filters

G(z) = \tfrac{1}{\sqrt{2}}(1 + z^{-1}), \qquad H(z) = \tfrac{1}{\sqrt{2}}(1 - z^{-1}),   (50)

with polyphase matrix

U_{\mathrm{Haar}} = \tfrac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.   (51)

Conclusion: Therefore, for two-tap filters, the QMF-paraunitary family forms a one-parameter manifold M = { U(θ) : θ ∈ [0, 2π) } with

U(\theta) = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}.

The Haar filter bank corresponds to θ = π/4, yielding the computationally simplest coefficients g_0 = g_1 = 1/\sqrt{2} and uniquely maximizing the DC gain |G(0)| = \sqrt{2} among all members of M. This makes Haar the canonical choice for initialization, while the learnable angles θ_ℓ explored in this work span the full QMF-paraunitary family.

Remark (Relationship to QMF). For two-tap filters, the QMF-paraunitary family forms a one-parameter manifold within the reflection component of O(2) (i.e., det(U) = −1). With Haar initialization (θ_ℓ = π/4) and polar projection, the learned filters remain in this QMF-paraunitary family throughout training. The PSNR gains in Figs. 5a–5f arise from learning optimal rotation angles θ_ℓ ≠ π/4 that better match trace-specific LRD statistics, while preserving both the QMF structure and perfect reconstruction. Extending to rotations (det = +1) would exit the QMF family; longer filters (N > 2) would enlarge the design space further.
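As an illustration of this one-parameter family, the following sketch constructs U(θ), confirms orthogonality over a grid of angles, and locates the maximizer of the DC gain |G(0)| at θ = π/4. The grid resolution and the restriction to θ ∈ [0, π) (to obtain a single maximizer up to sign) are illustrative choices.

import numpy as np

def qmf_paraunitary(theta):
    # U(theta) from the Remark: the reflection branch of O(2) (det = -1) spanned by the learnable angles.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [s, -c]])

thetas = np.linspace(0.0, np.pi, 360, endpoint=False)   # theta in [0, pi), illustrative grid
assert all(np.allclose(qmf_paraunitary(t) @ qmf_paraunitary(t).T, np.eye(2)) for t in thetas)

dc_gain = np.abs(np.cos(thetas) + np.sin(thetas))       # |G(0)| = |g0 + g1|, eq. (47)
print(thetas[np.argmax(dc_gain)], np.pi / 4)            # maximizer is theta = pi/4 (Haar), eq. (48)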
References

[1] W. Leland, M. Taqqu, W. Willinger, and D. Wilson, "On the self-similar nature of Ethernet traffic (extended version)," IEEE/ACM Transactions on Networking, vol. 2, no. 1, pp. 1–15, 1994.
[2] V. Paxson and S. Floyd, "Wide-area traffic: The failure of Poisson modeling," in Proceedings of ACM SIGCOMM '94. ACM, 1994, pp. 257–268.
[3] G. Millán, "On the LRD of the aggregated traffic flows in high-speed computer networks," arXiv preprint arXiv:2103.03981, 2021.
[4] I. Norros, "On the use of fractional Brownian motion in the theory of connectionless networks," IEEE Journal on Selected Areas in Communications, vol. 13, no. 6, pp. 953–962, 1995.
[5] M. Parulekar and A. M. Makowski, "Tail probabilities for a multiplexer with self-similar traffic," in Proceedings of IEEE INFOCOM '96, Conference on Computer Communications, vol. 3. IEEE, 1996, pp. 1452–1459.
[6] R. Boutaba, M. A. Salahuddin, N. Limam, S. Ayoubi, N. Shahriar, F. Estrada-Solano, and O. M. Caicedo, "A comprehensive survey on machine learning for networking: Evolution, applications and research opportunities," Journal of Internet Services and Applications, vol. 9, no. 1, pp. 1–99, 2018.
[7] O. Aouedi, V. A. Le, K. Piamrat, and Y. Ji, "Deep learning on network traffic prediction: Recent advances, analysis, and future directions," ACM Computing Surveys, vol. 57, no. 6, pp. 1–37, 2025.
[8] M. S. Taqqu, W. Willinger, and R. Sherman, "Proof of a fundamental result in self-similar traffic modeling," ACM SIGCOMM Computer Communication Review, vol. 27, no. 2, pp. 5–23, 1997.
[9] W. Willinger, V. Paxson, and M. S. Taqqu, "Self-similarity and heavy tails: Structural modeling of network traffic," A Practical Guide to Heavy Tails: Statistical Techniques and Applications, vol. 23, no. 1, pp. 27–53, 1998.
[10] I. Daubechies, Ten Lectures on Wavelets. SIAM, 1992.
[11] P. Abry, D. Veitch, and P. Flandrin, "Long-range dependence: Revisiting aggregation with wavelets," Journal of Time Series Analysis, vol. 19, no. 3, pp. 253–266, 1998.
[12] S. Mallat, A Wavelet Tour of Signal Processing: The Sparse Way, 3rd ed. Amsterdam, Boston: Academic Press, 2009.
[13] D. Szostak, A. Włodarczyk, and K. Walkowiak, "Machine learning classification and regression approaches for optical network traffic prediction," Electronics, vol. 10, no. 13, p. 1578, 2021. [Online]. Available: https://www.mdpi.com/2079-9292/10/13/1578
[14] I. Lohrasbinasab, A. Shahraki, A. Taherkordi, and A. Delia Jurcut, "From statistical- to machine learning-based network traffic prediction," Transactions on Emerging Telecommunications Technologies, vol. 33, no. 4, p. e4394, 2022. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/ett.4394
[15] J. Wang, Z. Wang, J. Li, and J. Wu, "Multilevel wavelet decomposition network for interpretable time series analysis," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 2437–2446.
[16] H. Khan and B. Yener, "Learning filter widths of spectral decompositions with wavelets," Advances in Neural Information Processing Systems, vol. 31, 2018.
[17] W. Ha, C. Singh, F. Lanusse, S. Upadhyayula, and B. Yu, "Adaptive wavelet distillation from neural networks through interpretations," Advances in Neural Information Processing Systems, vol. 34, pp. 20669–20682, 2021.
[18] M. Wolter and J. Garcke, "Adaptive wavelet pooling for convolutional neural networks," in International Conference on Artificial Intelligence and Statistics. PMLR, 2021, pp. 1936–1944.
[19] G. Michau, G. Frusque, and O. Fink, "Fully learnable deep wavelet transform for unsupervised monitoring of high-frequency time series," Proceedings of the National Academy of Sciences, vol. 119, no. 8, p. e2106598119, 2022.
[20] M. Lezcano-Casado and D. Martínez-Rubio, "Cheap orthogonal constraints in neural networks: A simple parametrization of the orthogonal and unitary group," in Proceedings of the 36th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, K. Chaudhuri and R. Salakhutdinov, Eds., vol. 97. PMLR, 2019, pp. 3794–3803. [Online]. Available: https://proceedings.mlr.press/v97/lezcano-casado19a.html
[21] H. Sato, Riemannian Optimization and Its Applications. Springer, 2021, vol. 670.
[22] N. Boumal, An Introduction to Optimization on Smooth Manifolds. Cambridge, UK: Cambridge University Press, 2023.
[23] Y. Fei, Y. Liu, C. Jia, Z. Li, X. Wei, and M. Chen, "A survey of geometric optimization for deep learning: From Euclidean space to Riemannian manifold," ACM Computing Surveys, vol. 57, no. 5, pp. 1–37, 2025.
[24] G. Vidal, "Entanglement renormalization," Physical Review Letters, vol. 99, no. 22, p. 220405, 2007.
[25] R. Orús, "A practical introduction to tensor networks: Matrix product states and projected entangled pair states," Annals of Physics, vol. 349, pp. 117–158, 2014.
[26] G. Evenbly and S. R. White, "Entanglement renormalization and wavelets," Physical Review Letters, vol. 116, no. 14, p. 140403, 2016.
[27] J. Haegeman, B. Swingle, M. Walter, J. Cotler, G. Evenbly, and V. B. Scholz, "Rigorous free-fermion entanglement renormalization from wavelet theory," Physical Review X, vol. 8, no. 1, p. 011003, 2018.
[28] M. Grieves, "Digital twin: Manufacturing excellence through virtual factory replication," pp. 1–7, Mar. 2014. [Online]. Available: https://www.researchgate.net/publication/275211047_Digital_Twin_Manufacturing_Excellence_through_Virtual_Factory_Replication
[29] C. Zhou, H. Yang, and X. Duan, "Concepts of Digital Twin Network," Internet Engineering Task Force, Internet-Draft draft-zhou-nmrg-digitaltwin-network-concepts-00, 2020, work in progress. [Online]. Available: https://datatracker.ietf.org/doc/draft-zhou-nmrg-digitaltwin-network-concepts/00/
[30] C. Zhou, H. Yang, X. Duan, D. Lopez, A. Pastor, Q. Wu, M. Boucadair, and C. Jacquenet, "Network Digital Twin: Concepts and Reference Architecture," Internet Engineering Task Force, Internet-Draft draft-irtf-nmrg-network-digital-twin-arch-10, 2025, work in progress. [Online]. Available: https://datatracker.ietf.org/doc/draft-irtf-nmrg-network-digital-twin-arch/10/
[31] X. Hesselbach and X. Calle-Heredia, "Digital Twin Networks requirements: Towards an ultra-reliable infrastructure," in 2025 25th Anniversary International Conference on Transparent Optical Networks (ICTON). IEEE, 2025, pp. 1–4.
[32] M. Tariq, F. Naeem, and H. V. Poor, "Toward experience-driven traffic management and orchestration in digital-twin-enabled 6G networks," arXiv preprint arXiv:2201.04259, 2022.
[33] N. P. Kuruvatti, M. A. Habibi, S. Partani, B. Han, A. Fellan, and H. D. Schotten, "Empowering 6G communication systems with digital twin technology: A comprehensive survey," IEEE Access, vol. 10, pp. 112158–112186, 2022.
[34] Z. Wang, D. Jiang, and S. Mumtaz, "Network-wide data collection based on in-band network telemetry for digital twin networks," IEEE Transactions on Mobile Computing, vol. 24, no. 1, pp. 86–101, 2025.
[35] WIDE Project and MAWI Working Group, "MAWI working group traffic archive (WIDE project)," 2026. [Online]. Available: https://mawi.wide.ad.jp/mawi/. Accessed: Jan. 21, 2026.
[36] G. Vidal, "Class of quantum many-body states that can be efficiently simulated," Physical Review Letters, vol. 101, no. 11, p. 110501, 2008.
[37] J. A. Reyes and E. M. Stoudenmire, "Multi-scale tensor network architecture for machine learning," Machine Learning: Science and Technology, vol. 2, no. 3, p. 035036, 2021.
[38] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds. Princeton, NJ: Princeton University Press, 2008.
[39] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, NJ: Prentice Hall, 1993.
[40] M. Smith and T. Barnwell, "Exact reconstruction techniques for tree-structured subband coders," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, no. 3, pp. 434–441, 1986.
[41] G. Evenbly, "MERA Julia code example," https://www.tensors.net/mera, Tensors.net, site maintained by Glen Evenbly. Accessed: Nov. 6, 2025.
[42] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[43] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, http://www.deeplearningbook.org.
