cs.CL
APE-Bench: Evaluating Automated Proof Engineering for Formal Math Libraries
VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse
CrossCheck-Bench: Diagnosing Compositional Failures in Multimodal Conflict Resolution
Latent Debate: A Surrogate Framework for Interpreting LLM Thinking
PIRA: Preference-Oriented Instruction-Tuned Reward Models with Dual Aggregation
MemBuilder: Reinforcing LLMs for Long-Term Memory Construction via Attributed Dense Rewards
Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR
X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests
End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning
Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce
AdaDetectGPT: Adaptive Detection of LLM-Generated Text with Statistical Guarantees
Do Bias Benchmarks Generalise? Evidence from Voice-based Evaluation of Gender Bias in SpeechLLMs
Mental Health Impacts of AI Companions: Triangulating Social Media Quasi-Experiments, User Perspectives, and Relational Theory
Retrieval-Augmented Generation for Natural Language Art Provenance Searches in the Getty Provenance Index
DP-Fusion: Token-Level Differentially Private Inference for Large Language Models
Learning to Evolve: Bayesian-Guided Continual Knowledge Graph Embedding
Hallucination-Resistant Relation Extraction via Dependency-Aware Sentence Simplification and Two-tiered Hierarchical Refinement
Anchored Supervised Fine-Tuning
LLM-Based Multi-Agent Blackboard System for Information Discovery in Data Science
Draft-based Approximate Inference for LLMs
PAL: Probing Audio Encoders via LLMs -- Audio Information Transfer into LLMs
Mind the Gap: Assessing Wiktionary's Crowd-Sourced Linguistic Knowledge on Morphological Gaps in Two Related Languages