SCONE: A Practical, Constraint-Aware Plug-in for Latent Encoding in Learned DNA Storage
DNA storage has matured from concept to practical stage, yet its integration with neural compression pipelines remains inefficient. Early DNA encoders applied redundancy-heavy constraint layers atop raw binary data - workable but primitive. Recent neural codecs compress data into learned latent representations with rich statistical structure, yet still convert these latents to DNA via naive binary-to-quaternary transcoding, discarding the entropy model’s optimization. This mismatch undermines compression efficiency and complicates the encoding stack. A plug-in module that collapses latent compression and DNA encoding into a single step. SCONE performs quaternary arithmetic coding directly on the latent space in DNA bases. Its Constraint-Aware Adaptive Coding module dynamically steers the entropy encoder’s learned probability distribution to enforce biochemical constraints - Guanine-Cytosine (GC) balance and homopolymer suppression - deterministically during encoding, eliminating post-hoc correction. The design preserves full reversibility and exploits the hyperprior model’s learned priors without modification. Experiments show SCONE achieves near-perfect constraint satisfaction with negligible computational overhead (<2% latency), establishing a latent-agnostic interface for end-to-end DNA-compatible learned codecs.
💡 Research Summary
The paper introduces SCONE (Simplified Constraint‑aware ON‑network Encoder), a plug‑in that fuses the entropy‑coding stage of modern learned compressors directly with DNA‑compatible encoding. Traditional DNA storage pipelines first compress data into binary latent representations, then apply a fixed 2‑bit‑to‑base mapping (00→A, 01→T, 10→G, 11→C). This post‑hoc conversion discards the probability distribution learned by the hyper‑prior entropy model and forces additional redundancy or error‑correction layers to satisfy biochemical constraints such as GC balance and homopolymer limits. Consequently, compression efficiency suffers and the overall stack becomes non‑differentiable.
SCONE replaces the binary arithmetic coder with a quaternary arithmetic coder that operates directly on the four‑symbol DNA alphabet. The latent vector y, after quantization, is fed to a Gaussian conditional model (μ, σ) that yields a probability vector p =
Comments & Academic Discussion
Loading comments...
Leave a Comment