Low-Latency and Low-Complexity MLSE for Short-Reach Optical Interconnects
To meet the high-speed, low-latency, and low-complexity demand for optical interconnects, simplified maximum likelihood sequence estimation (MLSE) is proposed in this paper. Simplified MLSE combines computational simplification and reduced state in MLSE. MLSE with a parallel sliding block architecture reduces latency from linear order to logarithmic order. Computational simplification reduces the number of multipliers from exponential order to linear order. Incorporating the reduced state with computational simplification further decreases the number of adders and comparators. The simplified MLSE is evaluated in a 112-Gbit/s PAM4 transmission over 2-km standard single-mode fiber. Experimental results show that the simplified MLSE significantly outperforms the FFE-only case in bit error ratio (BER) performance. Compared with simplified 1-step MLSE, the latency of simplified MLSE is reduced from 34 delay units in linear order to 7 delay units in logarithmic order. The simplified scheme in MLSE reduces the number of variable multipliers from 512 in exponential order to 33 in linear order without BER performance deterioration, while reducing the number of adders and comparators to 37.2% and 8.4%, respectively, with nearly identical BER performance.
💡 Research Summary
The paper addresses the pressing need for high‑speed, low‑latency, and low‑complexity signal processing in short‑reach optical interconnects, which are increasingly demanded by data‑center workloads such as AI, cloud computing, and IoT. Intensity‑modulation/direct‑detection (IM/DD) transceivers are attractive for their low cost and simple architecture, but bandwidth‑limited front‑end components introduce severe inter‑symbol interference (ISI) that degrades performance. Conventional solutions combine a feed‑forward equalizer (FFE), a post‑filter, and a maximum‑likelihood sequence estimator (MLSE). While MLSE can effectively cancel residual ISI, its implementation based on the Viterbi algorithm suffers from two major drawbacks: (1) exponential growth of the number of branch‑metric (BM) calculations (and thus multipliers) with modulation order and channel memory, and (2) a serial add‑compare‑select (ACS) operation that forces latency to increase linearly with the length of the processing block.
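Both drawbacks show up in a minimal serial Viterbi MLSE sketch. It assumes a PAM‑4 alphabet {−3, −1, 1, 3} and a memory‑1 post‑filter response 1 + αD, so each received sample is y_k = a_k + α·a_{k−1}. Neither assumption is stated in the summary, and the function name `viterbi_mlse` and the toy values are hypothetical.

With four states, every symbol needs 16 branch metrics, so a 32‑symbol block evaluates 512 of them. That coincides with the 512 multipliers quoted in the abstract, although the paper's exact model may differ. The add‑compare‑select loop also runs strictly one symbol after another.

```python
PAM4 = (-3, -1, 1, 3)  # assumed PAM4 symbol alphabet


def viterbi_mlse(y, alpha):
    """Serial Viterbi MLSE for an assumed 1 + alpha*D post-filter response.

    State = previous PAM4 symbol (4 states). Each step evaluates 4 x 4 = 16
    branch metrics (BMs) and runs one add-compare-select (ACS) per state.
    Returns the decided symbols and the number of BMs evaluated.
    """
    n_states = len(PAM4)
    path_metric = [0.0] * n_states  # unknown preceding symbol: all states equal
    survivors = []                  # survivors[k][s] = best predecessor of state s at step k
    bm_count = 0
    for yk in y:
        new_metric = [float("inf")] * n_states
        back = [0] * n_states
        for s, a_k in enumerate(PAM4):          # next state = current symbol
            for p, a_prev in enumerate(PAM4):   # previous state
                bm = (yk - a_k - alpha * a_prev) ** 2  # squared-error BM
                bm_count += 1
                cand = path_metric[p] + bm      # add
                if cand < new_metric[s]:        # compare
                    new_metric[s], back[s] = cand, p  # select
        path_metric = new_metric
        survivors.append(back)
    # Traceback from the best final state. Each ACS step above depends on the
    # previous one, which is why latency grows linearly with block length.
    s = min(range(n_states), key=path_metric.__getitem__)
    decided = []
    for back in reversed(survivors):
        decided.append(PAM4[s])
        s = back[s]
    return decided[::-1], bm_count


if __name__ == "__main__":
    tx = [3, -1, 1, 1, -3, 3, -1, -3] * 4   # 32 hypothetical PAM4 symbols
    alpha = 0.3                              # hypothetical post-filter tap
    # Noiseless toy channel y_k = a_k + alpha*a_{k-1}, with a_{-1} = 1.
    y = [a + alpha * p for a, p in zip(tx, [1] + tx[:-1])]
    decided, bm_count = viterbi_mlse(y, alpha)
    print(decided == tx, bm_count)          # True 512
```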
The authors propose a simplified MLSE that merges two complementary complexity‑reduction techniques—computational simplification and reduced‑state trellis—within a parallel sliding‑block architecture. The key ideas are:
- Computational simplification – By expanding all BM equations for two successive symbols (the first layer) and grouping common terms into four vectors (A, B, C, D), the authors replace most variable multiplications with a few shared multipliers and many shift‑and‑add operations. The only variable multipliers left are one for α² and two per symbol pair for the received‑symbol‑dependent terms. This cuts the total from 512 (exponential order) to 33 (linear order) for the 112‑Gb/s PAM‑4 case. Constant multiplications are implemented as bit shifts, which saves further hardware.
- Reduced‑state trellis – The output of the preceding FFE serves as a pre‑decision value to prune unlikely state transitions, narrowing the trellis. This directly cuts the number of adders and comparators required in the ACS stages.
- Parallel sliding‑block processing – 1‑step MLSE processes the symbols of a block sequentially, with a latency of N + 2 delay units for a block of N symbols. The proposed architecture instead processes the block in a multi‑layer, tree‑like fashion. The first layer aggregates BM pairs, the second layer merges the results of the first, and so on, halving the number of active ACS units at each layer. The ACS latency therefore drops from O(N) to O(log₂ N), giving a total latency of log₂(N) + 2 delay units. In the experimental configuration (N = 32), latency falls from 34 (= 32 + 2) to 7 (= log₂32 + 2) delay units, a reduction of about 79 %.
The experimental validation uses a 112‑Gb/s PAM‑4 signal transmitted over 2 km standard single‑mode fiber. Four configurations are compared: (i) FFE‑only, (ii) conventional 1‑step MLSE, (iii) simplified 1‑step MLSE (computational simplification only), and (iv) the full simplified MLSE (computational simplification + reduced state). Results show:
- BER performance – Simplified MLSE matches the BER of conventional MLSE and outperforms FFE‑only by more than two orders of magnitude, reaching a BER below 10⁻⁶ at the same optical launch power.
- Latency – The proposed scheme reduces processing delay from 34 to 7 delay units compared with 1‑step MLSE, confirming the logarithmic scaling.
- Hardware resources – Variable multipliers drop from 512 to 33 (≈ 94 % reduction) without BER deterioration. Adders and comparators are reduced to 37.2 % and 8.4 % of the original counts, respectively, with nearly identical BER.
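The multiplier counts above can be reproduced with a small model. It assumes a memory‑1 response 1 + αD, which the summary does not state. The regrouping is also one possible expansion, not the paper's A/B/C/D vectors.

Under that model, each branch metric expands to y_k² − 2a_k·y_k − 2a_{k−1}·(α·y_k) + c(a_k, a_{k−1}), with c = a_k² + 2α·a_k·a_{k−1} + α²·a_{k−1}². The y_k² term is common to all 16 branches of a step, so dropping it leaves every ACS decision unchanged. The remaining coefficients are small integers that hardware can apply with shifts and adds. That leaves α·y_k (once per symbol) and α² (once per block) as the only variable multiplications: 32 + 1 = 33 for a 32‑symbol block, versus 16 × 32 = 512 squarings for direct evaluation. The function names are hypothetical.

```python
import math

PAM4 = (-3, -1, 1, 3)  # assumed PAM4 alphabet


def direct_bms(y, alpha):
    """Reference BMs (y_k - a_k - alpha*a_{k-1})^2: one squaring multiplier per branch."""
    bms = [[(yk - a_s - alpha * a_p) ** 2 for a_p in PAM4 for a_s in PAM4] for yk in y]
    return bms, 16 * len(y)  # 4 current x 4 previous symbols = 16 branches per symbol


def simplified_bms(y, alpha):
    """Regrouped BMs with the common y_k^2 term dropped.

    BM' = -2*a_k*y_k - 2*a_{k-1}*(alpha*y_k) + c(a_k, a_{k-1}), where
    c = a_k^2 + 2*alpha*a_k*a_{k-1} + alpha^2*a_{k-1}^2.
    The factors 2*a_k, 2*a_{k-1} and a_k*a_{k-1} are small integer constants
    (shift-and-add in hardware), so only alpha^2 (once per block) and
    alpha*y_k (once per symbol) count as variable multiplications.
    """
    alpha_sq = alpha * alpha  # shared variable multiplier
    const = {(a_p, a_s): a_s * a_s + 2 * alpha * a_s * a_p + alpha_sq * a_p * a_p
             for a_p in PAM4 for a_s in PAM4}  # precomputed once per block
    bms = []
    for yk in y:
        alpha_y = alpha * yk  # one variable multiplier per symbol
        bms.append([-2 * a_s * yk - 2 * a_p * alpha_y + const[a_p, a_s]
                    for a_p in PAM4 for a_s in PAM4])
    return bms, 1 + len(y)


if __name__ == "__main__":
    y = [0.75, -2.5, 3.25, -0.25] * 8  # 32 hypothetical post-filter samples
    alpha = 0.25                       # hypothetical post-filter tap
    direct, n_direct = direct_bms(y, alpha)
    simple, n_simple = simplified_bms(y, alpha)
    # Direct BMs minus the common y_k^2 must equal the simplified BMs.
    matches = all(math.isclose(d - yk * yk, s, abs_tol=1e-9)
                  for yk, drow, srow in zip(y, direct, simple)
                  for d, s in zip(drow, srow))
    print(matches, n_direct, n_simple, f"{100 * (1 - n_simple / n_direct):.1f}")  # True 512 33 93.6
```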
The authors conclude that the simplified MLSE delivers a practical trade‑off: it retains the ISI‑cancellation capability of full‑complexity MLSE while dramatically cutting both computational load and processing latency, making it suitable for latency‑critical data‑center interconnects. Future work is suggested on extending the method to higher‑order PAM‑M formats, longer fiber spans, and ASIC/FPGA implementation to verify real‑time performance and power efficiency.