ChronoSelect: Robust Learning with Noisy Labels via Dynamics Temporal Memory


Training deep neural networks on real-world datasets is often hampered by the presence of noisy labels, which can be memorized by over-parameterized models, leading to significant degradation in generalization performance. While existing methods for learning with noisy labels (LNL) have made considerable progress, they fundamentally suffer from static snapshot evaluations and fail to leverage the rich temporal dynamics of learning evolution. In this paper, we propose ChronoSelect (chrono denoting its temporal nature), a novel framework featuring an innovative four-stage memory architecture that compresses prediction history into compact temporal distributions. Our unique sliding update mechanism with controlled decay maintains only four dynamic memory units per sample, progressively emphasizing recent patterns while retaining essential historical knowledge. This enables precise three-way sample partitioning into clean, boundary, and noisy subsets through temporal trajectory analysis and dual-branch consistency. Theoretical guarantees prove the mechanism’s convergence and stability under noisy conditions. Extensive experiments demonstrate ChronoSelect’s state-of-the-art performance across synthetic and real-world benchmarks.


💡 Research Summary

ChronoSelect tackles the pervasive problem of noisy labels in deep learning by introducing a temporally aware memory mechanism that captures the full evolution of each training sample’s predictions while using only a minimal amount of storage. The core of the method is the Temporal Memory Space (TMS), which maintains four hierarchical memory units per sample: Long‑term, Mid‑term, Short‑term, and Immediate. At every training epoch, the latest prediction is inserted into the Immediate unit, and a carefully designed sliding update propagates information upward through the hierarchy with controlled decay. Mathematically, the update rules (Eq. 5) assign decreasing weights to older information (β coefficients) and increasing weights to newer information (α coefficients), mimicking a biologically inspired forgetting process. This design ensures rapid adaptation in early training while preserving stability in later stages.
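The hierarchical sliding update can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the exact α/β schedule is given by the paper's Eq. 5, so the `alphas` values below are hypothetical placeholders chosen only to show the blend-and-shift mechanics.

```python
import numpy as np

class TemporalMemory:
    """Sketch of the four-stage Temporal Memory Space for one sample.

    units[0..3] = Long-term, Mid-term, Short-term, Immediate.
    The alpha coefficients (weight on newer information) are hypothetical;
    the paper's Eq. 5 defines the actual schedule.
    """

    def __init__(self, num_classes, alphas=(0.6, 0.4, 0.2)):
        # Start every unit at the uniform distribution.
        self.units = np.full((4, num_classes), 1.0 / num_classes)
        self.alphas = alphas

    def update(self, prediction):
        """Slide the latest softmax prediction into the hierarchy with decay."""
        # Each of the three older units blends its previous value
        # (weight beta = 1 - alpha) with the unit one level "newer",
        # read before that newer unit is itself overwritten.
        for i in range(3):
            alpha = self.alphas[i]
            self.units[i] = (1 - alpha) * self.units[i] + alpha * self.units[i + 1]
        # The Immediate unit takes the newest prediction directly.
        self.units[3] = prediction

# Usage: feed one prediction per epoch.
mem = TemporalMemory(num_classes=3)
mem.update(np.array([0.1, 0.8, 0.1]))
mem.update(np.array([0.05, 0.9, 0.05]))
```

Because each update is a convex combination of probability vectors, every unit remains a valid distribution, which is what keeps the per-sample storage at exactly four vectors.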

ChronoSelect operates with a dual‑branch architecture: two separate augmentations (weak and strong) generate two views of each input, each maintaining its own TMS. From the four‑stage memory, two temporal signatures are extracted. The convergence metric Γₜ(x) checks whether loss values across the four stages decrease monotonically, indicating a stable learning trajectory. The consistency metric ψ(x) measures agreement between the two branches throughout training. By combining these signatures, samples are automatically partitioned into three categories without any hand‑tuned thresholds:

  • Clean (D_c) – Γₜ = 1 and ψ = 1, meaning loss consistently drops and both branches predict the same class.
  • Boundary (D_b) – Γₜ = 1 but ψ < 1, indicating a convergent loss but disagreement between views, typical of samples near decision boundaries.
  • Noisy (D_n) – Γₜ = 0, reflecting non‑monotonic loss and low consistency, suggesting mislabeled data.
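Under these definitions, the three-way split reduces to two boolean tests per sample. The sketch below assumes Γₜ is computed as a monotone-decrease check over the four per-stage losses and ψ as agreement between the two branches' predicted classes; function and argument names are hypothetical.

```python
def partition(stage_losses, pred_weak, pred_strong):
    """Three-way sample split from the temporal signatures (illustrative).

    stage_losses: four loss values ordered Long-term -> Immediate.
    pred_weak, pred_strong: predicted class indices from the two branches.
    """
    # Gamma = 1 if the loss decreases monotonically across the four stages.
    gamma = int(all(stage_losses[i] >= stage_losses[i + 1] for i in range(3)))
    # Psi = 1 if the two augmentation branches agree on the class.
    psi = int(pred_weak == pred_strong)

    if gamma == 1 and psi == 1:
        return "clean"      # D_c: convergent and consistent
    if gamma == 1:
        return "boundary"   # D_b: convergent but branches disagree
    return "noisy"          # D_n: non-monotonic trajectory
```

Note that no threshold appears anywhere: both signatures are binary properties of the stored trajectories, which is how the method avoids hand-tuned cutoffs.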

Each subset receives a tailored loss function. Clean samples are trained with standard cross‑entropy to reinforce reliable knowledge. Boundary samples receive a combination of label‑smoothing and regularization losses to preserve useful but ambiguous information while preventing over‑fitting. Noisy samples are down‑weighted or corrected via smoothing‑based pseudo‑labeling, minimizing their harmful influence.
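A minimal per-sample loss dispatcher might look as follows. The smoothing factor and the down-weighting coefficient for noisy samples are hypothetical choices for illustration; the paper defines its own loss terms for each subset.

```python
import numpy as np

def cross_entropy(p, y):
    """Standard cross-entropy against a hard label."""
    return -np.log(p[y] + 1e-12)

def smoothed_target(y, num_classes, eps=0.1):
    """Label-smoothed one-hot target (eps is a hypothetical choice)."""
    t = np.full(num_classes, eps / num_classes)
    t[y] += 1.0 - eps
    return t

def sample_loss(p, y, subset, noisy_weight=0.1):
    """Dispatch a tailored loss by subset (illustrative sketch)."""
    if subset == "clean":
        return cross_entropy(p, y)                       # reinforce reliable labels
    if subset == "boundary":
        t = smoothed_target(y, len(p))
        return -np.sum(t * np.log(p + 1e-12))            # soften ambiguous labels
    # Noisy: smoothed pseudo-label from the model's own prediction, down-weighted.
    pseudo = smoothed_target(int(np.argmax(p)), len(p))
    return noisy_weight * -np.sum(pseudo * np.log(p + 1e-12))
```

The key design point survives even in this toy form: noisy samples still contribute gradient signal, but through their own (smoothed) predictions and at a fraction of the weight of trusted samples.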

The authors provide rigorous theoretical guarantees. Theorem 3.1 proves that, as the number of training epochs t → ∞, every memory unit converges to the model’s steady‑state prediction p* with an error that decays as O(1/t). Theorem 3.2 shows that any perturbation ε in the predictions shifts the memory by at most 4ε/(t+1) + O(1/t²). Together, these results confirm that the memory system is both convergent and robust to label noise.
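Writing m_t for a memory unit at epoch t, the two guarantees can be restated compactly (this is a paraphrase of the summary above, with the constant C and the norm left generic since the summary does not specify them):

```latex
% Convergence (Thm. 3.1): each unit approaches the steady-state prediction p*
\lVert m_t - p^{*} \rVert \;\le\; \frac{C}{t}, \qquad t \to \infty

% Stability (Thm. 3.2): a perturbation of size eps in the predictions
% displaces the memory by at most
\lVert \Delta m_t \rVert \;\le\; \frac{4\varepsilon}{t+1} + O\!\left(\frac{1}{t^{2}}\right)
```

Both bounds shrink as training progresses, so late-epoch memories are simultaneously more accurate and less sensitive to individual noisy updates.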

Extensive experiments on synthetic and real‑world benchmarks—including CIFAR‑10/100 with up to 80% symmetric noise, Clothing1M, WebVision, and Food‑101N—demonstrate that ChronoSelect consistently outperforms state‑of‑the‑art noisy‑label methods such as Co‑Teaching+, JoCoR, DivideMix, and ELR+. The method achieves higher accuracy across all noise levels, particularly excelling in high‑noise regimes where accurate boundary detection is crucial. Moreover, the memory footprint is modest: only four probability vectors per sample are stored, a drastic reduction compared to sliding‑window approaches that keep k recent predictions. Ablation studies confirm the importance of the four‑stage hierarchy, the sliding decay, and the dual‑branch consistency; removing any component leads to measurable performance drops.

In summary, ChronoSelect contributes (1) a compact yet expressive temporal memory that captures the full learning trajectory, (2) a sliding update with controlled forgetting that balances adaptation and stability, (3) a threshold‑free, three‑way sample partitioning based on convergence and consistency, and (4) formal convergence and stability proofs. These innovations collectively advance the robustness of deep learning models trained on noisy datasets and open avenues for future work such as adaptive memory depth, multimodal extensions, and integration with semi‑supervised frameworks.

