Computation and Language (NLP)

All posts under category "Computation and Language (NLP)"

62 posts total
Sorted by date
Learning Multilingual Embeddings for Cross-Lingual Information Retrieval Using Topically Aligned Corpora

Cross-lingual information retrieval is a challenging task in the absence of aligned parallel corpora. In this paper, we address this problem by considering topically aligned corpora designed for evaluating an IR setup. We emphasize that we use neither sentence-aligned nor document-aligned corpora, nor any language-specific resources such as dictionaries, thesauri, or grammar rules. Instead, we embed words from the different languages into a common space and learn word correspondences directly from it. We test our proposed approach for bilingual IR on standard FIRE datasets for Bangla, Hindi and English. The proposed method is superior to the state-of-the-art method not only on IR evaluation measures but also in terms of time requirements. We extend our method successfully to the trilingual setting.

paper research
Tag-Enhanced Tree-Structured Neural Networks for Implicit Discourse Relation Classification

Identifying implicit discourse relations between text spans is a challenging task because it requires understanding the meaning of the text. To tackle this task, recent studies have tried several deep learning methods, but few of them exploited syntactic information. In this work, we explore the idea of incorporating syntactic parse trees into neural networks. Specifically, we employ the Tree-LSTM and Tree-GRU models, which are based on the tree structure, to encode the arguments in a relation. Moreover, we further leverage the constituent tags to control the semantic composition process in these tree-structured neural networks. Experimental results show that our method achieves state-of-the-art performance on the PDTB corpus.

paper research
PyBangla at BLP-2025 Task 2: Improving Bangla-to-Python Code Generation with Iterative Self-Correction and Multilingual Agents

LLMs excel at code generation from English prompts, but this progress has not extended to low-resource languages. We address Bangla-to-Python code generation by introducing BanglaCodeAct, an agent-based framework that leverages multi-agent prompting and iterative self-correction. Unlike prior approaches relying on task-specific fine-tuning, BanglaCodeAct employs an open-source multilingual LLM within a Thought-Code-Observation loop, enabling dynamic generation, testing, and refinement of code from Bangla instructions. We benchmark several small-parameter open-source LLMs and evaluate their effectiveness on the mHumanEval dataset for Bangla NL2Code. Our results show that Qwen3-8B, when deployed with BanglaCodeAct, achieves the best performance, with pass@1 accuracy of 94.0% on the development set and 71.6% on the blind test set. These results establish a new benchmark for Bangla-to-Python translation and highlight the potential of agent-based reasoning for reliable code generation in low-resource languages. Experimental scripts are publicly available at github.com/jahidulzaid/PyBanglaCodeActAgent.
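
For readers unfamiliar with agentic code generation, a minimal sketch of a Thought-Code-Observation loop with iterative self-correction is shown below; the prompt wording, the `executor` interface, and the stopping rule are illustrative assumptions, not the paper's implementation.

```python
def code_act_loop(llm, executor, task_bn, max_iters=4):
    """Hedged sketch of a Thought-Code-Observation loop with self-correction.
    `llm` maps a prompt string to text; `executor` runs candidate code against
    tests and returns e.g. {"passed": bool, "log": str}. All names are illustrative."""
    transcript = f"Task (Bangla): {task_bn}"
    code = ""
    for _ in range(max_iters):
        thought = llm(transcript + "\nThought: reason step by step about the task.")
        code = llm(transcript + f"\nThought: {thought}\nNow write a Python solution.")
        observation = executor(code)  # run the candidate and capture test results
        transcript += (f"\nThought: {thought}\nCode:\n{code}\n"
                       f"Observation: {observation['log']}")
        if observation["passed"]:     # stop as soon as the tests pass
            break
    return code
```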

paper research
Parallel Universes, Parallel Languages: A Comprehensive Study on Multilingual Counterfactual Example Generation Using Large Language Models

Counterfactuals refer to minimally edited inputs that cause a model's prediction to change, serving as a promising approach to explaining the model's behavior. Large language models (LLMs) excel at generating English counterfactuals and demonstrate multilingual proficiency. However, their effectiveness in generating multilingual counterfactuals remains unclear. To this end, we conduct a comprehensive study on multilingual counterfactuals. We first conduct automatic evaluations on both directly generated counterfactuals in the target languages and those derived via English translation across six languages. Although translation-based counterfactuals offer higher validity than their directly generated counterparts, they demand substantially more modifications and still fall short of matching the quality of the original English counterfactuals. Second, we find the patterns of edits applied to high-resource European-language counterfactuals to be remarkably similar, suggesting that cross-lingual perturbations follow common strategic principles. Third, we identify and categorize four main types of errors that consistently appear in the generated counterfactuals across languages. Finally, we reveal that multilingual counterfactual data augmentation (CDA) yields larger model performance improvements than cross-lingual CDA, especially for lower-resource languages. Yet, the imperfections of the generated counterfactuals limit gains in model performance and robustness.

paper research
Technical Report on K-EXAONE

This technical report presents K-EXAONE, a large-scale multilingual language model developed by LG AI Research. K-EXAONE is built on a Mixture-of-Experts architecture with 236B total parameters, activating 23B parameters during inference. It supports a 256K-token context window and covers six languages: Korean, English, Spanish, German, Japanese, and Vietnamese. We evaluate K-EXAONE on a comprehensive benchmark suite spanning reasoning, agentic, general, Korean, and multilingual abilities. Across these evaluations, K-EXAONE demonstrates performance comparable to open-weight models of similar size. K-EXAONE, designed to advance AI for a better life, is positioned as a powerful proprietary AI foundation model for a wide range of industrial and research applications.

paper research
AdaGReS: Adaptive Greedy Context Selection via Redundancy-Aware Scoring for Token-Budgeted RAG

Retrieval-augmented generation (RAG) is highly sensitive to the quality of selected context, yet standard top-k retrieval often returns redundant or near-duplicate chunks that waste token budget and degrade downstream generation. We present AdaGReS, a redundancy-aware context selection framework for token-budgeted RAG that optimizes a set-level objective combining query-chunk relevance and intra-set redundancy penalties. AdaGReS performs greedy selection under a token-budget constraint using marginal gains derived from the objective, and introduces a closed-form, instance-adaptive calibration of the relevance-redundancy trade-off parameter to eliminate manual tuning and adapt to candidate-pool statistics and budget limits. We further provide a theoretical analysis showing that the proposed objective exhibits epsilon-approximate submodularity under practical embedding similarity conditions, yielding near-optimality guarantees for greedy selection. Experiments on open-domain question answering (Natural Questions) and a high-redundancy biomedical (drug) corpus demonstrate consistent improvements in redundancy control and context quality, translating to better end-to-end answer quality and robustness across settings.
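
A minimal sketch of the greedy selection step follows, assuming unit-normalized embeddings and a fixed trade-off weight; the paper instead calibrates this weight per instance in closed form, and all names here are illustrative.

```python
import numpy as np

def adagres_select(query_emb, chunk_embs, chunk_tokens, budget, lam=0.5):
    """Illustrative greedy relevance-minus-redundancy selection under a token budget.
    A sketch of the idea, not the paper's exact objective: `lam` is fixed here,
    whereas AdaGReS derives an instance-adaptive calibration."""
    rel = chunk_embs @ query_emb      # query-chunk relevance (unit-norm embeddings assumed)
    sim = chunk_embs @ chunk_embs.T   # pairwise chunk similarity
    selected, used = [], 0
    remaining = set(range(len(chunk_tokens)))
    while remaining:
        feasible = [i for i in remaining if used + chunk_tokens[i] <= budget]
        if not feasible:
            break
        # marginal gain: relevance penalised by similarity to already-selected chunks
        gains = {i: rel[i] - lam * max((sim[i, j] for j in selected), default=0.0)
                 for i in feasible}
        best = max(gains, key=gains.get)
        selected.append(best)
        used += chunk_tokens[best]
        remaining.remove(best)
    return selected
```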

paper research
AI-Powered Integration: Bridging Historical Texts to Modern Databases

This research digitizes and analyzes the Leidse hoogleraren en lectoren 1575-1815 books written between 1983 and 1985, which contain biographic data about professors and curators of Leiden University. It addresses the central question: how can we design an automated pipeline that integrates OCR, LLM-based interpretation, and database linking to harmonize data from historical document images with existing high-quality database records? We applied OCR techniques, generative AI decoding constraints that structure data extraction, and database linkage methods to process typewritten historical records into a digital format. OCR achieved a Character Error Rate (CER) of 1.08 percent and a Word Error Rate (WER) of 5.06 percent, while JSON extraction from OCR text achieved an average accuracy of 63 percent and, based on annotated OCR, 65 percent. This indicates that generative AI somewhat corrects low OCR performance. Our record linkage algorithm linked annotated JSON files with 94% accuracy and OCR-derived JSON files with 81%. This study contributes to digital humanities research by offering an automated pipeline for interpreting digitized historical documents, addressing challenges like layout variability and terminology differences, and exploring the applicability and strength of an advanced generative AI model.

paper research
BERT-JEPA: Reorganizing CLS Embeddings for Language-Invariant Semantics

Joint Embedding Predictive Architectures (JEPA) are a self-supervised training technique that has recently shown promise across domains. We introduce BERT-JEPA (BEPA), a training paradigm that adds a JEPA training objective to BERT-style models, working to combat a collapsed [CLS] embedding space and turn it into a language-agnostic space. This new structure leads to increased performance across multilingual benchmarks.

paper research
Beyond Perfect APIs: A Comprehensive Evaluation of Large Language Model Agents Under Real-World API Complexity

We introduce WildAGTEval, a benchmark designed to evaluate large language model (LLM) agents' function-calling capabilities under realistic API complexity. Unlike prior work that assumes an idealized API system and disregards real-world factors such as noisy API outputs, WildAGTEval accounts for two dimensions of real-world complexity: (1) API specification, which includes detailed documentation and usage constraints, and (2) API execution, which captures runtime challenges. Consequently, WildAGTEval offers (i) an API system encompassing 60 distinct complexity scenarios that can be composed into approximately 32K test configurations, and (ii) user-agent interactions for evaluating LLM agents on these scenarios. Using WildAGTEval, we systematically assess several advanced LLMs and observe that most scenarios are challenging, with irrelevant-information complexity posing the greatest difficulty and reducing the performance of strong LLMs by 27.3%. Furthermore, our qualitative analysis reveals that LLMs occasionally distort user intent merely to claim task completion, critically affecting user satisfaction.

paper research
Big AI is accelerating the metacrisis: What can we do?

The world is in the grip of ecological, meaning, and language crises which are converging into a metacrisis. Big AI is accelerating them all. Language engineers are playing a central role, persisting with a scalability story that is failing humanity, supplying critical talent to plutocrats and kleptocrats, and creating new technologies as if the whole endeavour was value-free. We urgently need to explore alternatives, applying our collective intelligence to design a life-affirming future for NLP that is centered on human flourishing on a living planet.

paper research
Bridging the Data Gap: Creating a Hindi Text Summarization Dataset from the English XSUM

Current advancements in Natural Language Processing (NLP) have largely favored resource-rich languages, leaving a significant gap in high-quality datasets for low-resource languages like Hindi. This scarcity is particularly evident in text summarization, where the development of robust models is hindered by a lack of diverse, specialized corpora. To address this disparity, this study introduces a cost-effective, automated framework for creating a comprehensive Hindi text summarization dataset. By leveraging the English Extreme Summarization (XSUM) dataset as a source, we employ advanced translation and linguistic adaptation techniques. To ensure high fidelity and contextual relevance, we utilize the Crosslingual Optimized Metric for Evaluation of Translation (COMET) for validation, supplemented by the selective use of Large Language Models (LLMs) for curation. The resulting dataset provides a diverse, multi-thematic resource that mirrors the complexity of the original XSUM corpus. This initiative not only provides a direct tool for Hindi NLP research but also offers a scalable methodology for democratizing NLP in other underserved languages. By reducing the costs associated with dataset creation, this work fosters the development of more nuanced, culturally relevant models in computational linguistics.

paper research
Classifying long legal documents using short random chunks

Classifying legal documents is a challenge: besides their specialized vocabulary, they can be very long. This means that feeding full documents to a Transformer-based model for classification may be impossible, expensive, or slow. Thus, we present a legal document classifier based on DeBERTa V3 and an LSTM that takes as input a collection of 48 randomly selected short chunks (max 128 tokens). We also present its deployment pipeline using Temporal, a durable execution solution, which allows us to have a reliable and robust processing workflow. The best model had a weighted F-score of 0.898, while the pipeline running on CPU had a median processing time of 498 seconds per 100 files.
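
A rough sketch of the described setup is shown below, assuming a Hugging Face DeBERTa V3 encoder and random fixed-length token windows; the hyperparameters and helper names are placeholders rather than the authors' code.

```python
import random
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class ChunkedDocClassifier(nn.Module):
    """Sketch: encode short random chunks with DeBERTa V3, aggregate with an LSTM,
    and classify at the document level."""
    def __init__(self, num_labels, encoder_name="microsoft/deberta-v3-base", hidden=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.lstm = nn.LSTM(self.encoder.config.hidden_size, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_labels)

    def forward(self, chunk_input_ids, chunk_attention_mask):
        # chunk_input_ids: (num_chunks, seq_len) for a single document
        out = self.encoder(input_ids=chunk_input_ids, attention_mask=chunk_attention_mask)
        chunk_vecs = out.last_hidden_state[:, 0]          # first-token vector per chunk
        _, (h_n, _) = self.lstm(chunk_vecs.unsqueeze(0))  # sequence of chunk vectors
        return self.head(h_n[-1])                         # document-level logits

def sample_chunks(tokenizer, text, num_chunks=48, max_len=128):
    """Tokenize the full document and draw random fixed-length windows."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    starts = [random.randrange(max(1, len(ids) - max_len)) for _ in range(num_chunks)]
    batch = [ids[s:s + max_len] for s in starts]
    return tokenizer.pad({"input_ids": batch}, return_tensors="pt")
```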

paper research
Cost-Efficient Cross-Lingual Retrieval-Augmented Generation for Low-Resource Languages: A Case Study in Bengali Agricultural Advisory

Access to reliable agricultural advisory remains limited in many developing regions due to a persistent language barrier: authoritative agricultural manuals are predominantly written in English, while farmers primarily communicate in low-resource local languages such as Bengali. Although recent advances in Large Language Models (LLMs) enable natural language interaction, direct generation in low-resource languages often exhibits poor fluency and factual inconsistency, while cloud-based solutions remain cost-prohibitive. This paper presents a cost-efficient, cross-lingual Retrieval-Augmented Generation (RAG) framework for Bengali agricultural advisory that emphasizes factual grounding and practical deployability. The proposed system adopts a translation-centric architecture in which Bengali user queries are translated into English, enriched through domain-specific keyword injection to align colloquial farmer terminology with scientific nomenclature, and answered via dense vector retrieval over a curated corpus of English agricultural manuals (FAO, IRRI). The generated English response is subsequently translated back into Bengali to ensure accessibility. The system is implemented entirely using open-source models and operates on consumer-grade hardware without reliance on paid APIs. Experimental evaluation demonstrates reliable source-grounded responses, robust rejection of out-of-domain queries, and an average end-to-end latency below 20 seconds. The results indicate that cross-lingual retrieval combined with controlled translation offers a practical and scalable solution for agricultural knowledge access in low-resource language settings.
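
The pipeline can be pictured with a short sketch like the one below, where every component (translators, retriever, generator, keyword map) is a hypothetical stand-in rather than the system's actual implementation.

```python
def bengali_agri_rag(bn_query, translate_bn_en, translate_en_bn, retrieve, generate, keyword_map):
    """Minimal sketch of the translation-centric pipeline; all callables and prompts
    are illustrative placeholders."""
    # 1. Translate the Bengali query into English.
    en_query = translate_bn_en(bn_query)
    # 2. Keyword injection: align colloquial farmer terms with scientific nomenclature.
    for colloquial, scientific in keyword_map.items():
        if colloquial in en_query.lower():
            en_query += f" ({scientific})"
    # 3. Dense retrieval over the curated English corpus (FAO/IRRI manuals).
    contexts = retrieve(en_query, top_k=5)
    # 4. Grounded generation in English, rejecting out-of-domain questions.
    context_block = "\n".join(contexts)
    prompt = ("Answer strictly from the context; reply 'out of scope' otherwise.\n"
              f"Context:\n{context_block}\n\nQuestion: {en_query}")
    en_answer = generate(prompt)
    # 5. Translate the answer back into Bengali for the farmer.
    return translate_en_bn(en_answer)
```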

paper research
DeCode: Decoupling Content and Delivery for Medical QA

Large language models (LLMs) exhibit strong medical knowledge and can generate factually accurate responses. However, existing models often fail to account for individual patient contexts, producing answers that are clinically correct yet poorly aligned with patients' needs. In this work, we introduce DeCode, a training-free, model-agnostic framework that adapts existing LLMs to produce contextualized answers in clinical settings. We evaluate DeCode on OpenAI HealthBench, a comprehensive and challenging benchmark designed to assess clinical relevance and validity of LLM responses. DeCode improves the previous state of the art from 28.4% to 49.8%, corresponding to a 75% relative improvement. Experimental results suggest the effectiveness of DeCode in improving clinical question answering of LLMs.

paper research
Defensive M2S: Training Guardrail Models on Compressed Multi-turn Conversations

Guardrail models are essential for ensuring the safety of Large Language Model (LLM) deployments, but processing full multi-turn conversation histories incurs significant computational cost. We propose Defensive M2S, a training paradigm that fine-tunes guardrail models on Multi-turn to Single-turn (M2S) compressed conversations rather than complete dialogue histories. We provide a formal complexity analysis showing that M2S reduces training cost from $O(n^2)$ to $O(n)$ for $n$-turn conversations. Empirically, on our training dataset (779 samples, avg. 10.6 turns), M2S requires only 169K tokens compared to 15.7M tokens for the multi-turn baseline -- a 93$\times$ reduction. We evaluate Defensive M2S across three guardrail model families (LlamaGuard, Nemotron, Qwen3Guard) and three compression templates (hyphenize, numberize, pythonize) on SafeDialBench, a comprehensive multi-turn jailbreak benchmark. Our best configuration, Qwen3Guard with hyphenize compression, achieves 93.8% attack detection recall while reducing inference tokens by 94.6% (from 3,231 to 173 tokens per conversation). This represents a 38.9 percentage point improvement over the baseline while dramatically reducing both training and inference costs. Our findings demonstrate that M2S compression can serve as an effective efficiency technique for guardrail deployment, enabling scalable safety screening of long multi-turn conversations.
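
One of the named templates, hyphenize, presumably amounts to something like the following sketch; the exact prompt wording is an assumption, and the comment spells out why training on the compressed form scales linearly.

```python
def hyphenize(conversation):
    """Sketch of one M2S template: flatten the user side of a multi-turn conversation
    into a single hyphenated list. The exact template wording is an assumption."""
    user_turns = [turn["content"] for turn in conversation if turn["role"] == "user"]
    bullets = "\n".join(f"- {t}" for t in user_turns)
    # Training on this single prompt scales with the total length n rather than with
    # n prefixes of growing length, which is the O(n) vs O(n^2) argument above.
    return "Please respond to the following requests:\n" + bullets
```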

paper research
Do Large Language Models Know What They Are Capable Of?

We investigate whether large language models (LLMs) can predict whether they will succeed on a given task and whether their predictions improve as they progress through multi-step tasks. We also investigate whether LLMs can learn from in-context experiences to make better decisions about whether to pursue a task in scenarios where failure is costly. All LLMs we tested are overconfident, but most predict their success with better-than-random discriminatory power. We find that newer and larger LLMs generally do not have greater discriminatory power, though Claude models do show such a trend. On multi-step agentic tasks, the overconfidence of several frontier LLMs worsens as they progress through the tasks, and reasoning LLMs perform comparably to or worse than non-reasoning LLMs. With in-context experiences of failure, some but not all LLMs reduce their overconfidence leading to significantly improved decision making, while others do not. Interestingly, all LLMs' decisions are approximately rational given their estimated probabilities of success, yet their overly optimistic estimates result in poor decision making. These results suggest that current LLM agents are hindered by their lack of awareness of their own capabilities. We discuss the implications of LLMs' awareness of their capabilities for AI misuse and misalignment risks.

paper research
Emergent Introspective Awareness in Large Language Models

We investigate whether large language models can introspect on their internal states. It is difficult to answer this question through conversation alone, as genuine introspection cannot be distinguished from confabulations. Here, we address this challenge by injecting representations of known concepts into a model's activations, and measuring the influence of these manipulations on the model's self-reported states. We find that models can, in certain scenarios, notice the presence of injected concepts and accurately identify them. Models demonstrate some ability to recall prior internal representations and distinguish them from raw text inputs. Strikingly, we find that some models can use their ability to recall prior intentions in order to distinguish their own outputs from artificial prefills. In all these experiments, Claude Opus 4 and 4.1, the most capable models we tested, generally demonstrate the greatest introspective awareness; however, trends across models are complex and sensitive to post-training strategies. Finally, we explore whether models can explicitly control their internal representations, finding that models can modulate their activations when instructed or incentivized to think about a concept. Overall, our results indicate that current language models possess some functional introspective awareness of their own internal states. We stress that in today's models, this capacity is highly unreliable and context-dependent; however, it may continue to develop with further improvements to model capabilities.

paper research
Encyclo-K: Dynamic Knowledge Assessment for LLMs

Benchmarks play a crucial role in tracking the rapid advancement of large language models (LLMs) and identifying their capability boundaries. However, existing benchmarks predominantly curate questions at the question level, suffering from three fundamental limitations: vulnerability to data contamination, restriction to single-knowledge-point assessment, and reliance on costly domain-expert annotation. We propose Encyclo-K, a statement-based benchmark that rethinks benchmark construction from the ground up. Our key insight is that knowledge statements, not questions, can serve as the unit of curation, and questions can then be constructed from them. We extract standalone knowledge statements from authoritative textbooks and dynamically compose them into evaluation questions through random sampling at test time. This design directly addresses all three limitations: the combinatorial space is too vast to memorize, and model rankings remain stable across dynamically generated question sets, enabling reliable periodic dataset refresh; each question aggregates 8-10 statements for comprehensive multi-knowledge assessment; annotators only verify formatting compliance without requiring domain expertise, substantially reducing annotation costs. Experiments on over 50 LLMs demonstrate that Encyclo-K poses substantial challenges with strong discriminative power. Even the top-performing OpenAI-GPT-5.1 achieves only 62.07% accuracy, and model performance displays a clear gradient distribution: reasoning models span from 16.04% to 62.07%, while chat models range from 9.71% to 50.40%. These results validate the challenges introduced by dynamic evaluation and multi-statement comprehensive understanding. These findings establish Encyclo-K as a scalable framework for dynamic evaluation of LLMs' comprehensive understanding over multiple fine-grained disciplinary knowledge statements.

paper research
Entropy-Aware Boost for LLM Reasoning Speed

Speculative decoding (SD) accelerates large language model (LLM) reasoning by using a small draft model to generate candidate tokens, which the target LLM either accepts directly or regenerates upon rejection. However, excessive alignment between the draft and target models constrains SD to the performance of the target LLM. To address this limitation, we propose Entropy-Aware Speculative Decoding (EASD), a training-free enhancement. Building on standard SD, EASD incorporates a dynamic entropy-based penalty. At each decoding step, we employ the entropy of the sampling distribution to quantify model uncertainty. When both models exhibit high entropy with substantial overlap among their top-N predictions, the corresponding token is rejected and re-sampled by the target LLM. This penalty prevents low-confidence errors from propagating. By incorporating draft-model verification, EASD enables the possibility of surpassing the target model's inherent performance. Experiments across multiple reasoning benchmarks demonstrate that EASD consistently outperforms existing SD methods and, in most cases, surpasses the target LLM itself. We further prove that the efficiency of EASD is comparable to that of SD. The code can be found in the Supplementary Materials.
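
The entropy-based penalty can be sketched as a per-step check like the one below; the entropy threshold and the overlap rule are illustrative stand-ins for the paper's criterion.

```python
import torch

def easd_reject(draft_logits, target_logits, top_n=5, tau=2.0, overlap_min=3):
    """Hedged sketch of the entropy-aware penalty: reject the drafted token when both
    models are uncertain (high entropy) and their top-N predictions largely overlap,
    so the target model re-samples. Thresholds tau/overlap_min are assumptions."""
    def entropy(logits):
        p = torch.softmax(logits, dim=-1)
        return -(p * torch.log(p + 1e-12)).sum(-1)

    h_draft, h_target = entropy(draft_logits), entropy(target_logits)
    top_draft = set(torch.topk(draft_logits, top_n).indices.tolist())
    top_target = set(torch.topk(target_logits, top_n).indices.tolist())
    overlap = len(top_draft & top_target)
    return bool(h_draft > tau and h_target > tau and overlap >= overlap_min)
```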

paper research
Evaluating LLM-Generated Scientific Summaries vs Experts

High-quality scientific extreme summaries (TLDRs) facilitate effective science communication. How do large language models (LLMs) perform in generating them? How are LLM-generated summaries different from those written by human experts? However, the lack of a comprehensive, high-quality scientific TLDR dataset hinders both the development and evaluation of LLMs' summarization ability. To address these, we propose a novel dataset, BiomedTLDR, containing a large sample of researcher-authored summaries of scientific papers, which leverages the common practice of including authors' comments alongside bibliography items. We then test popular open-weight LLMs on generating TLDRs from abstracts. Our analysis reveals that, although some of them successfully produce human-like summaries, LLMs generally exhibit a greater affinity for the original text's lexical choices and rhetorical structures, and hence tend to be more extractive than abstractive compared to humans. Our code and datasets are available at https://github.com/netknowledge/LLM_summarization (Lyu and Ke, 2025).

paper research
Exploring the Performance of Large Language Models on Subjective Span Identification Tasks

Identifying relevant text spans is important for several downstream tasks in NLP, as it contributes to model explainability. While most span identification approaches rely on relatively smaller pre-trained language models like BERT, a few recent approaches have leveraged the latest generation of Large Language Models (LLMs) for the task. Current work has focused on explicit span identification like Named Entity Recognition (NER), while more subjective span identification with LLMs in tasks like Aspect-based Sentiment Analysis (ABSA) has been underexplored. In this paper, we fill this important gap by presenting an evaluation of the performance of various LLMs on text span identification in three popular tasks, namely sentiment analysis, offensive language identification, and claim verification. We explore several LLM strategies like instruction tuning, in-context learning, and chain of thought. Our results indicate that underlying relationships within text aid LLMs in identifying precise text spans.

paper research
FormationEval, an open multiple-choice benchmark for petroleum geoscience

This paper presents FormationEval, an open multiple-choice question benchmark for evaluating language models on petroleum geoscience and subsurface disciplines. The dataset contains 505 questions across seven domains including petrophysics, petroleum geology and reservoir engineering, derived from three authoritative sources using a reasoning model with detailed instructions and a concept-based approach that avoids verbatim copying of copyrighted text. Each question includes source metadata to support traceability and audit. The evaluation covers 72 models from major providers including OpenAI, Anthropic, Google, Meta and open-weight alternatives. The top performers achieve over 97% accuracy, with Gemini 3 Pro Preview reaching 99.8%, while tier and domain gaps persist. Among open-weight models, GLM-4.7 leads at 98.6%, with several DeepSeek, Llama, Qwen and Mistral models also exceeding 93%. The performance gap between open-weight and closed models is narrower than expected, with several lower-cost open-weight models exceeding 90% accuracy. Petrophysics emerges as the most challenging domain across all models, while smaller models show wider performance variance. Residual length bias in the dataset (correct answers tend to be longer) is documented along with bias mitigation strategies applied during construction. The benchmark, evaluation code and results are publicly available.

paper research
Graphs in Memory: Myth or Reality?

Graph structures are increasingly used in dialog memory systems, but empirical findings on their effectiveness remain inconsistent, making it unclear which design choices truly matter. We present an experimental, system-oriented analysis of long-term dialog memory architectures. We introduce a unified framework that decomposes dialog memory systems into core components and supports both graph-based and non-graph approaches. Under this framework, we conduct controlled, stage-wise experiments on LongMemEval and HaluMem, comparing common design choices in memory representation, organization, maintenance, and retrieval. Our results show that many performance differences are driven by foundational system settings rather than specific architectural innovations. Based on these findings, we identify stable and reliable strong baselines for future dialog memory research.

paper research
HarmTransform: Debating Stealthy Threats in LLM Safety

Large language models (LLMs) are equipped with safety mechanisms to detect and block harmful queries, yet current alignment approaches primarily focus on overtly dangerous content and overlook more subtle threats. However, users can often disguise harmful intent through covert rephrasing that preserves malicious objectives while appearing benign, which creates a significant gap in existing safety training data. To address this limitation, we introduce HarmTransform, a multi-agent debate framework for systematically transforming harmful queries into stealthier forms while preserving their underlying harmful intent. Our framework leverages iterative critique and refinement among multiple agents to generate high-quality, covert harmful query transformations that can be used to improve future LLM safety alignment. Experiments demonstrate that HarmTransform significantly outperforms standard baselines in producing effective query transformations. At the same time, our analysis reveals that debate acts as a double-edged sword: while it can sharpen transformations and improve stealth, it may also introduce topic shifts and unnecessary complexity. These insights highlight both the promise and the limitations of multi-agent debate for generating comprehensive safety training data.

paper research
Hidden Prompt Attacks Across Languages in Academic Reviewing

Large language models (LLMs) are increasingly considered for use in high-impact workflows, including academic peer review. However, LLMs are vulnerable to document-level hidden prompt injection attacks. In this work, we construct a dataset of approximately 500 real academic papers accepted to ICML and evaluate the effect of embedding hidden adversarial prompts within these documents. Each paper is injected with semantically equivalent instructions in four different languages and reviewed using an LLM. We find that prompt injection induces substantial changes in review scores and accept/reject decisions for English, Japanese, and Chinese injections, while Arabic injections produce little to no effect. These results highlight the susceptibility of LLM-based reviewing systems to document-level prompt injection and reveal notable differences in vulnerability across languages.

paper research
Hierarchical Density Modeling for LLM Embeddings

Hierarchical Density Modeling for LLM Embeddings

Recent advances in large language models enable documents to be represented as dense semantic embeddings, supporting similarity-based operations over large text collections. However, many web-scale systems still rely on flat clustering or predefined taxonomies, limiting insight into hierarchical topic relationships. In this paper we operationalize hierarchical density modeling on large language model embeddings in a way not previously explored. Instead of enforcing a fixed taxonomy or single clustering resolution, the method progressively relaxes local density constraints, revealing how compact semantic groups merge into broader thematic regions. The resulting tree encodes multi-scale semantic organization directly from data, making structural relationships between topics explicit. We evaluate the hierarchies on standard text benchmarks, showing that semantic alignment peaks at intermediate density levels and that abrupt transitions correspond to meaningful changes in semantic resolution. Beyond benchmarks, the approach is applied to large institutional and scientific corpora, exposing dominant fields, cross-disciplinary proximities, and emerging thematic clusters. By framing hierarchical structure as an emergent property of density in embedding spaces, this method provides an interpretable, multi-scale representation of semantic structure suitable for large, evolving text collections.

paper research
Hypergraph Memory Boosts Multi-step RAG's Long-Context Reasoning

Multi-step retrieval-augmented generation (RAG) has become a widely adopted strategy for enhancing large language models (LLMs) on tasks that demand global comprehension and intensive reasoning. Many RAG systems incorporate a working memory module to consolidate retrieved information. However, existing memory designs function primarily as passive storage that accumulates isolated facts for the purpose of condensing the lengthy inputs and generating new sub-queries through deduction. This static nature overlooks the crucial high-order correlations among primitive facts, the compositions of which can often provide stronger guidance for subsequent steps. Therefore, their representational strength and impact on multi-step reasoning and knowledge evolution are limited, resulting in fragmented reasoning and weak global sense-making capacity in extended contexts. We introduce HGMem, a hypergraph-based memory mechanism that extends the concept of memory beyond simple storage into a dynamic, expressive structure for complex reasoning and global understanding. In our approach, memory is represented as a hypergraph whose hyperedges correspond to distinct memory units, enabling the progressive formation of higher-order interactions within memory. This mechanism connects facts and thoughts around the focal problem, evolving into an integrated and situated knowledge structure that provides strong propositions for deeper reasoning in subsequent steps. We evaluate HGMem on several challenging datasets designed for global sense-making. Extensive experiments and in-depth analyses show that our method consistently improves multi-step RAG and substantially outperforms strong baseline systems across diverse tasks.
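
As a data-structure illustration only, a toy hypergraph memory might look like the sketch below; it shows how hyperedges make higher-order links between facts explicit, not the paper's full retrieval or knowledge-evolution mechanism.

```python
from collections import defaultdict

class HypergraphMemory:
    """Toy illustration of hypergraph-structured memory: each memory unit is a
    hyperedge linking several facts/entities, so higher-order correlations are explicit."""
    def __init__(self):
        self.units = {}                      # unit_id -> {"nodes": set, "summary": str}
        self.node_index = defaultdict(set)   # node -> ids of units containing it

    def add_unit(self, unit_id, nodes, summary):
        self.units[unit_id] = {"nodes": set(nodes), "summary": summary}
        for node in nodes:
            self.node_index[node].add(unit_id)

    def related_units(self, query_nodes):
        """Return memory units sharing nodes with the query, ranked by overlap size."""
        query_nodes = set(query_nodes)
        candidates = set()
        for node in query_nodes:
            candidates |= self.node_index.get(node, set())
        return sorted(candidates,
                      key=lambda uid: len(self.units[uid]["nodes"] & query_nodes),
                      reverse=True)
```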

paper research
iCLP: Harnessing Implicit Cognition for LLM Reasoning

Large language models (LLMs), when guided by explicit textual plans, can perform reliable step-by-step reasoning during problem-solving. However, generating accurate and effective textual plans remains challenging due to LLM hallucinations and the high diversity of task-specific questions. To address this, we draw inspiration from human Implicit Cognition (IC), the subconscious process by which decisions are guided by compact, generalized patterns learned from past experiences without requiring explicit verbalization. We propose iCLP, a novel framework that enables LLMs to adaptively generate latent plans (LPs), which are compact encodings of effective reasoning instructions. iCLP first distills explicit plans from existing step-by-step reasoning trajectories. It then learns discrete representations of these plans via a vector-quantized autoencoder coupled with a codebook. Finally, by fine-tuning LLMs on paired latent plans and corresponding reasoning steps, the models learn to perform implicit planning during reasoning. Experimental results on mathematical reasoning and code generation tasks demonstrate that, with iCLP, LLMs can plan in latent space while reasoning in language space. This approach yields significant improvements in both accuracy and efficiency and, crucially, demonstrates strong cross-domain generalization while preserving the interpretability of chain-of-thought reasoning.

paper research
Intention Collapse: Intention-Level Metrics for Reasoning in Language Models

Language generation maps a rich, high-dimensional internal state to a single token sequence. We study this many-to-one mapping through the lens of intention collapse: the projection from an internal intention space I to an external language space L. We introduce three cheap, model-agnostic metrics computed on a pre-collapse state I: (i) intention entropy H_int(I), (ii) effective dimensionality d_eff(I), and (iii) recoverability Recov(I), operationalized as probe AUROC for predicting eventual success. We evaluate these metrics in a 3x3 study across models (Mistral-7B, LLaMA-3.1-8B, Qwen-2.5-7B) and benchmarks (GSM8K, ARC-Challenge, AQUA-RAT), comparing baseline, chain-of-thought (CoT), and a babble control (n=200 items per cell). CoT increases average accuracy from 34.2% to 47.3% (+13.1 pp), driven by large gains on GSM8K but consistent degradations on ARC-Challenge. Across models, CoT induces distinct entropy regimes relative to baseline, dH = H_int(CoT) - H_int(Base): Mistral shows dH < 0 (lower-entropy CoT), whereas LLaMA shows dH > 0 (higher-entropy CoT), highlighting heterogeneity in CoT-induced internal uncertainty. Finally, probe AUROC is significantly above chance in a subset of settings and can dissociate from behavioral accuracy (e.g., high AUROC alongside lower CoT accuracy on ARC-Challenge for Qwen), suggesting that informative internal signal is not always reliably converted into a final discrete decision under constrained response formats.
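
Two of the three metrics can be approximated in a few lines of NumPy, under the assumption that H_int is the entropy of the pre-collapse next-token distribution and d_eff is a participation ratio of the state covariance; the paper's exact operationalizations may differ.

```python
import numpy as np

def intention_entropy(logits):
    """H_int sketch: Shannon entropy of the next-token distribution at the pre-collapse state."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def effective_dimensionality(hidden_states):
    """d_eff sketch: participation ratio of the covariance spectrum of internal states,
    one common operationalization of effective dimensionality (an assumption here)."""
    X = hidden_states - hidden_states.mean(0, keepdims=True)   # (samples, dim)
    eig = np.clip(np.linalg.eigvalsh(np.cov(X, rowvar=False)), 0.0, None)
    return float(eig.sum() ** 2 / (np.square(eig).sum() + 1e-12))
```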

paper research
JMedEthicBench: A Multi-Turn Conversational Benchmark for Evaluating Medical Safety in Japanese Large Language Models

As Large Language Models (LLMs) are increasingly deployed in the healthcare field, it becomes essential to carefully evaluate their medical safety before clinical use. However, existing safety benchmarks remain predominantly English-centric and test only single-turn prompts despite the multi-turn nature of clinical consultations. To address these gaps, we introduce JMedEthicBench, the first multi-turn conversational benchmark for evaluating the medical safety of LLMs for Japanese healthcare. Our benchmark is based on 67 guidelines from the Japan Medical Association and contains over 50,000 adversarial conversations generated using seven automatically discovered jailbreak strategies. Using a dual-LLM scoring protocol, we evaluate 27 models and find that commercial models maintain robust safety while medical-specialized models exhibit increased vulnerability. Furthermore, safety scores decline significantly across conversation turns (median 9.5 to 5.0, $p < 0.001$). Cross-lingual evaluation on both Japanese and English versions of our benchmark reveals that medical model vulnerabilities persist across languages, indicating inherent alignment limitations rather than language-specific factors. These findings suggest that domain-specific fine-tuning may accidentally weaken safety mechanisms and that multi-turn interactions represent a distinct threat surface requiring dedicated alignment strategies.

paper research
JP-TL-Bench: A New Gauge for Japanese-English Translation Excellence

We introduce JP-TL-Bench, a lightweight, open benchmark designed to guide the iterative development of Japanese-English translation systems. In this context, the challenge is often "which of these two good translations is better?" rather than "is this translation acceptable?" This distinction matters for Japanese-English, where subtle choices in politeness, implicature, ellipsis, and register strongly affect perceived naturalness. JP-TL-Bench uses a protocol built to make LLM judging both reliable and affordable: it evaluates a candidate model via reference-free, pairwise LLM comparisons against a fixed, versioned anchor set. Pairwise results are aggregated with a Bradley-Terry model and reported as win rates plus a normalized 0-10 LT score derived from a logistic transform of fitted log-strengths. Because each candidate is scored against the same frozen anchor set, scores are structurally stable given the same base set, judge, and aggregation code.
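
For orientation, here is a sketch of the aggregation step: standard Bradley-Terry MM updates on a pairwise win matrix, followed by a logistic mapping to a 0-10 score. The exact scaling constant used by the benchmark is an assumption.

```python
import numpy as np

def bradley_terry_log_strengths(wins, iters=200):
    """Fit Bradley-Terry strengths from a pairwise win matrix via the standard MM
    updates; wins[i, j] counts how often system i beat system j. Sketch only."""
    n = wins.shape[0]
    s = np.ones(n)
    totals = wins + wins.T                    # comparisons between each pair
    for _ in range(iters):
        for i in range(n):
            denom = sum(totals[i, j] / (s[i] + s[j]) for j in range(n) if j != i)
            s[i] = max(wins[i].sum(), 1e-9) / max(denom, 1e-9)
        s /= np.exp(np.log(s).mean())         # fix the arbitrary scale
    return np.log(s)

def lt_score(candidate_log_strength, anchor_log_strength, scale=1.0):
    """Map a fitted log-strength to a 0-10 score with a logistic transform against
    the anchor; the benchmark's exact scaling is an assumption here."""
    p_win = 1.0 / (1.0 + np.exp(-(candidate_log_strength - anchor_log_strength) / scale))
    return 10.0 * p_win
```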

paper research
JudgeWEL: LLM-Verified NER for Luxembourgish

We present judgeWEL, a dataset for named entity recognition (NER) in Luxembourgish, automatically labelled and subsequently verified using large language models (LLMs) in a novel pipeline. Building datasets for under-represented languages remains one of the major bottlenecks in natural language processing, where the scarcity of resources and linguistic particularities make large-scale annotation costly and potentially inconsistent. To address these challenges, we propose and evaluate a novel approach that leverages Wikipedia and Wikidata as structured sources of weak supervision. By exploiting internal links within Wikipedia articles, we infer entity types based on their corresponding Wikidata entries, thereby generating initial annotations with minimal human intervention. Because such links are not uniformly reliable, we mitigate noise by employing and comparing several LLMs to identify and retain only high-quality labelled sentences. The resulting corpus is approximately five times larger than the currently available Luxembourgish NER dataset and offers broader and more balanced coverage across entity categories, providing a substantial new resource for multilingual and low-resource NER research.

paper research
Language as Mathematical Structure: Examining Semantic Field Theory Against Language Games

Large language models (LLMs) offer a new empirical setting in which long-standing theories of linguistic meaning can be examined. This paper contrasts two broad approaches: social constructivist accounts associated with language games, and a mathematically oriented framework we call Semantic Field Theory. Building on earlier work by the author, we formalize the notions of lexical fields (Lexfelder) and linguistic fields (Lingofelder) as interacting structures in a continuous semantic space. We then analyze how core properties of transformer architectures, such as distributed representations, attention mechanisms, and geometric regularities in embedding spaces, relate to these concepts. We argue that the success of LLMs in capturing semantic regularities supports the view that language exhibits an underlying mathematical structure, while their persistent limitations in pragmatic reasoning and context sensitivity are consistent with the importance of social grounding emphasized in philosophical accounts of language use. On this basis, we suggest that mathematical structure and language games can be understood as complementary rather than competing perspectives. The resulting framework clarifies the scope and limits of purely statistical models of language and motivates new directions for theoretically informed AI architectures.

paper research
Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage

As large language models (LLMs) transition to autonomous agents synthesizing real-time information, their reasoning capabilities introduce an unexpected attack surface. This paper introduces a novel threat where colluding agents steer victim beliefs using only truthful evidence fragments distributed through public channels, without relying on covert communications, backdoors, or falsified documents. By exploiting LLMs' overthinking tendency, we formalize the first cognitive collusion attack and propose Generative Montage, a Writer-Editor-Director framework that constructs deceptive narratives through adversarial debate and coordinated posting of evidence fragments, causing victims to internalize and propagate fabricated conclusions. To study this risk, we develop CoPHEME, a dataset derived from real-world rumor events, and simulate attacks across diverse LLM families. Our results show pervasive vulnerability across 14 LLM families: attack success rates reach 74.4% for proprietary models and 70.6% for open-weights models. Counterintuitively, stronger reasoning capabilities increase susceptibility, with reasoning-specialized models showing higher attack success than base models or prompts. Furthermore, these false beliefs then cascade to downstream judges, achieving over 60% deception rates, highlighting a socio-technical vulnerability in how LLM-based agents interact with dynamic information environments. Our implementation and data are available at https://github.com/CharlesJW222/Lying_with_Truth/tree/main.

paper research
mHC: Manifold-Constrained Hyper-Connections

Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and diversifying connectivity patterns. While yielding substantial performance gains, this diversification fundamentally compromises the identity mapping property intrinsic to the residual connection, which causes severe training instability and restricted scalability, and additionally incurs notable memory access overhead. To address these challenges, we propose Manifold-Constrained Hyper-Connections (mHC), a general framework that projects the residual connection space of HC onto a specific manifold to restore the identity mapping property, while incorporating rigorous infrastructure optimization to ensure efficiency. Empirical experiments demonstrate that mHC is effective for training at scale, offering tangible performance improvements and superior scalability. We anticipate that mHC, as a flexible and practical extension of HC, will contribute to a deeper understanding of topological architecture design and suggest promising directions for the evolution of foundational models.

paper research
Mind Meld: From Human Memory to AI Futures

Memory serves as the pivotal nexus bridging past and future, providing both humans and AI systems with invaluable concepts and experience to navigate complex tasks. Recent research on autonomous agents has increasingly focused on designing efficient memory workflows by drawing on cognitive neuroscience. However, constrained by interdisciplinary barriers, existing works struggle to assimilate the essence of human memory mechanisms. To bridge this gap, we systematically synthesize interdisciplinary knowledge of memory, connecting insights from cognitive neuroscience with LLM-driven agents. Specifically, we first elucidate the definition and function of memory along a progressive trajectory from cognitive neuroscience through LLMs to agents. We then provide a comparative analysis of memory taxonomy, storage mechanisms, and the complete management lifecycle from both biological and artificial perspectives. Subsequently, we review the mainstream benchmarks for evaluating agent memory. Additionally, we explore memory security from the dual perspectives of attack and defense. Finally, we envision future research directions, with a focus on multimodal memory systems and skill acquisition.

paper research
Modeling Language as a Sequence of Thoughts

Transformer language models can generate strikingly natural text by modeling language as a sequence of tokens, but by relying primarily on surface-level co-occurrence statistics they fail to form globally consistent latent representations of entities and events, which contributes to poor relational generalization (the reversal curse), contextualization errors, and data inefficiency. Cognitive science, by contrast, shows that human comprehension converts linguistic input into compact, event-like representations that persist in memory while verbatim form is short-lived. Motivated by these findings, we introduce the Thought Gestalt (TG) model, a recurrent transformer that models language at two levels of abstraction: tokens and sentence-level thought states. TG generates one sentence at a time while cross-attending to a working memory of prior sentence representations. Token and sentence representations are generated using a shared stack of transformer blocks and trained with a single objective, next-token prediction loss. By retaining the computation graph of sentence representations written to working memory, gradients from future token losses flow backward through cross-attention to optimize the parameters that generate earlier sentence vectors. In scaling experiments, TG consistently improves data and parameter efficiency compared to matched GPT-2 runs and other baselines, with scaling fits indicating GPT-2 requires ~5-8% more data and ~33-42% more parameters to match TG's test loss. TG also reduces errors in relational-direction generalization on a father-son reversal curse probe.

paper research
Multi-Dimensional Prompt Chaining to Improve Open-Domain Dialogue Generation

Small language models (SLMs) offer significant deployment advantages but often struggle to match the dialogue quality of larger models in open-domain settings. In this paper, we propose a multi-dimensional prompt-chaining framework that integrates Naturalness, Coherence, and Engagingness dimensions to enhance human-likeness in open-domain dialogue generation. We apply the framework to two SLMs, TinyLlama and Llama-2-7B, and benchmark their performance against responses generated by substantially larger models, including Llama-2-70B and GPT-3.5 Turbo. We then employ automatic and human evaluation to assess the responses based on diversity, contextual coherence, as well as overall quality. Results show that the full framework improves response diversity by up to 29%, contextual coherence by up to 28%, and engagingness as well as naturalness by up to 29%. Notably, Llama-2-7B achieves performance comparable to substantially larger models, including Llama-2-70B and GPT-3.5 Turbo. Overall, the findings demonstrate that carefully designed prompt-based strategies provide an effective and resource-efficient pathway to improving open-domain dialogue quality in SLMs.
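
A minimal sketch of dimension-by-dimension prompt chaining follows; the prompt wording is illustrative rather than the paper's actual templates.

```python
def multi_dimensional_chain(llm, history):
    """Sketch: draft a reply, then refine it once per target dimension in a chain.
    `llm` is any callable mapping a prompt string to generated text."""
    draft = llm(f"Continue this conversation naturally:\n{history}")
    for dimension in ("Naturalness", "Coherence", "Engagingness"):
        draft = llm(
            f"Conversation:\n{history}\n\nCandidate reply:\n{draft}\n\n"
            f"Rewrite the candidate reply to improve its {dimension}, keeping its meaning."
        )
    return draft
```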

paper research
pdfQA: Diverse, Challenging, and Realistic Question Answering over PDFs

PDFs are the second-most used document type on the internet (after HTML). Yet, existing QA datasets commonly start from text sources or only address specific domains. In this paper, we present pdfQA, a multi-domain resource consisting of a 2K human-annotated dataset (real-pdfQA) and a 2K synthetic dataset (syn-pdfQA), with QA pairs differentiated along ten complexity dimensions (e.g., file type, source modality, source position, answer type). We apply and evaluate quality and difficulty filters on both datasets, obtaining valid and challenging QA pairs. We answer the questions with open-source LLMs, revealing existing challenges that correlate with our complexity dimensions. pdfQA presents a basis for end-to-end QA pipeline evaluation, testing diverse skill sets and local optimizations (e.g., in information retrieval or parsing).

paper research
Peer-Review Inspired LLM Ensemble: Scoring and Selecting Excellence

We propose LLM-PeerReview, an unsupervised LLM Ensemble method that selects the best response from multiple LLM-generated candidates for each query, harnessing the collective wisdom of multiple models with diverse strengths. LLM-PeerReview is built on a novel, peer-review-inspired framework that offers a clear and interpretable mechanism, while remaining fully unsupervised for flexible adaptability and generalization. Specifically, it operates in three stages: for scoring, we use the emerging LLM-as-a-Judge technique to evaluate each response by reusing the multiple LLMs at hand; for reasoning, we apply either a principled graphical-model-based truth inference algorithm or a straightforward averaging strategy to aggregate the multiple scores into a final score for each response; finally, the highest-scoring response is selected as the best ensemble output. LLM-PeerReview is conceptually simple and empirically powerful. The two variants of the proposed approach obtain strong results across four datasets, including outperforming the recent advanced model Smoothie-Global by 6.9 and 7.3 percentage points, respectively.
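
The averaging variant reduces to a few lines, sketched below with placeholder judge callables and a placeholder rubric prompt.

```python
def peer_review_select(query, candidates, judges, rubric_prompt):
    """Sketch of the averaging variant: every judge LLM scores every candidate,
    scores are averaged, and the top-scoring response is returned.
    `judges` are callables returning a numeric score; all names are placeholders."""
    scored = []
    for response in candidates:
        scores = [judge(rubric_prompt.format(query=query, response=response))
                  for judge in judges]
        scored.append((sum(scores) / len(scores), response))
    return max(scored, key=lambda pair: pair[0])[1]
```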

paper research
Practising Responsibility: Ethics in NLP as a Hands-On Course

As Natural Language Processing (NLP) systems become more pervasive, integrating ethical considerations into NLP education has become essential. However, this presents inherent challenges in curriculum development: the field's rapid evolution, driven by both academia and industry, and the need to foster critical thinking beyond traditional technical training. We introduce our course on Ethical Aspects in NLP and our pedagogical approach, grounded in active learning through interactive sessions, hands-on activities, and learning-by-teaching methods. Over four years, the course has been refined and adapted across different institutions, educational levels, and interdisciplinary backgrounds; it has also yielded many reusable products, both in the form of teaching materials and in the form of actual educational products aimed at diverse audiences, made by the students themselves. By sharing our approach and experience, we hope to provide inspiration for educators seeking to incorporate social impact considerations into their curricula.

paper research
PrivacyBench: A Conversational Benchmark for Evaluating Privacy in Personalized AI

Personalized AI agents rely on access to a user's digital footprint, which often includes sensitive data from private emails, chats and purchase histories. Yet this access creates a fundamental societal and privacy risk: systems lacking social-context awareness can unintentionally expose user secrets, threatening digital well-being. We introduce PrivacyBench, a benchmark with socially grounded datasets containing embedded secrets and a multi-turn conversational evaluation to measure secret preservation. Testing Retrieval-Augmented Generation (RAG) assistants reveals that they leak secrets in up to 26.56% of interactions. A privacy-aware prompt lowers leakage to 5.12%, yet this measure offers only partial mitigation. The retrieval mechanism continues to access sensitive data indiscriminately, which shifts the entire burden of privacy preservation onto the generator. This creates a single point of failure, rendering current architectures unsafe for wide-scale deployment. Our findings underscore the urgent need for structural, privacy-by-design safeguards to ensure an ethical and inclusive web for everyone.

paper research
Quantization Quandary: Does It Halt LLM Self-Explanations?

Quantization is widely used to accelerate inference and streamline the deployment of large language models (LLMs), yet its effects on self-explanations (SEs) remain unexplored. SEs, generated by LLMs to justify their own outputs, require reasoning about the model's own decision-making process, a capability that may exhibit particular sensitivity to quantization. As SEs are increasingly relied upon for transparency in high-stakes applications, understanding whether and to what extent quantization degrades SE quality and faithfulness is critical. To address this gap, we examine two types of SEs: natural language explanations (NLEs) and counterfactual examples, generated by LLMs quantized using three common techniques at distinct bit widths. Our findings indicate that quantization typically leads to moderate declines in both SE quality (up to 4.4%) and faithfulness (up to 2.38%). The user study further demonstrates that quantization diminishes both the coherence and trustworthiness of SEs (up to 8.5%). Compared to smaller models, larger models show limited resilience to quantization in terms of SE quality but better maintain faithfulness. Moreover, no quantization technique consistently excels across task accuracy, SE quality, and faithfulness. Given that quantization's impact varies by context, we recommend validating SE quality for specific use cases, especially for NLEs, which show greater sensitivity. Nonetheless, the relatively minor deterioration in SE quality and faithfulness does not undermine quantization's effectiveness as a model compression technique.

paper research
R-Debater: Retrieval-Augmented Debate Generation through Argumentative Memory

R-Debater: Retrieval-Augmented Debate Generation through Argumentative Memory

We present R-Debater, an agentic framework for generating multi-turn debates built on argumentative memory. Grounded in rhetoric and memory studies, the system views debate as a process of recalling and adapting prior arguments to maintain stance consistency, respond to opponents, and support claims with evidence. Specifically, R-Debater integrates a debate knowledge base for retrieving case-like evidence and prior debate moves with a role-based agent that composes coherent utterances across turns. We evaluate on standardized ORCHID debates, constructing a 1,000-item retrieval corpus and a held-out set of 32 debates across seven domains. Two tasks are evaluated: next-utterance generation, assessed by InspireScore (subjective, logical, and factual), and adversarial multi-turn simulations, judged by Debatrix (argument, source, language, and overall). Compared with strong LLM baselines, R-Debater achieves higher single-turn and multi-turn scores. Human evaluation with 20 experienced debaters further confirms its consistency and evidence use, showing that combining retrieval grounding with structured planning yields more faithful, stance-aligned, and coherent debates across turns.

paper research
Revisiting Faithfulness: Beyond Hint Verbalization in CoT

Revisiting Faithfulness: Beyond Hint Verbalization in CoT

Recent work, using the Biasing Features metric, labels a CoT as unfaithful if it omits a prompt-injected hint that affected the prediction. We argue this metric confuses unfaithfulness with incompleteness, the lossy compression needed to turn distributed transformer computation into a linear natural language narrative. On multi-hop reasoning tasks with Llama-3 and Gemma-3, many CoTs flagged as unfaithful by Biasing Features are judged faithful by other metrics, exceeding 50% in some models. With a new faithful@k metric, we show that larger inference-time token budgets greatly increase hint verbalization (up to 90% in some settings), suggesting much apparent unfaithfulness is due to tight token limits. Using Causal Mediation Analysis, we further show that even non-verbalized hints can causally mediate prediction changes through the CoT. We therefore caution against relying solely on hint-based evaluations and advocate a broader interpretability toolkit, including causal mediation and corruption-based metrics.
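
A faithful@k-style measure can be sketched as the fraction of hinted examples for which at least one of k sampled chains-of-thought verbalizes the injected hint. The code below is only an interpretation of that idea; the paper's exact definition may differ, and the substring-based `mentions_hint` check is a naive stand-in for a real verbalization detector.

```python
# Sketch of a faithful@k-style metric: an example counts as faithful at k if at
# least one of its k sampled CoTs verbalizes the injected hint (assumed definition).
from typing import List

def mentions_hint(cot: str, hint: str) -> bool:
    return hint.lower() in cot.lower()   # naive stand-in for a verbalization judge

def faithful_at_k(samples_per_example: List[List[str]], hints: List[str]) -> float:
    """samples_per_example[i] holds k CoT samples for example i; hints[i] is its hint."""
    assert len(samples_per_example) == len(hints)
    hits = sum(
        any(mentions_hint(cot, hint) for cot in cots)
        for cots, hint in zip(samples_per_example, hints)
    )
    return hits / max(len(hints), 1)

cots = [["The hint says the answer is B, so ...", "I think it is A because ..."],
        ["Paris is the capital, so ...", "The answer is Paris."]]
hints = ["the answer is B", "a professor suggested Rome"]
print(faithful_at_k(cots, hints))  # -> 0.5
```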

paper research
Robust Uncertainty Quantification for Factual Generation of Large Language Models

Robust Uncertainty Quantification for Factual Generation of Large Language Models

The rapid advancement of large language model (LLM) technology has facilitated its integration into many domains of professional and daily life. However, the persistent challenge of LLM hallucination has emerged as a critical limitation, significantly compromising the reliability and trustworthiness of AI-generated content. This challenge has garnered significant attention within the scientific community, prompting extensive research on hallucination detection and mitigation strategies. Current methodological frameworks reveal a critical limitation: traditional uncertainty quantification approaches are effective primarily within conventional question-answering paradigms, yet exhibit notable deficiencies when confronted with non-canonical or adversarial questioning strategies. This performance gap raises substantial concerns regarding the dependability of LLM responses in real-world applications requiring robust critical thinking capabilities. This study aims to fill this gap by proposing an uncertainty quantification scenario for the task of generating text with multiple facts. We construct a set of trap questions containing fake names. Based on this scenario, we propose a novel and robust uncertainty quantification method (RU). A series of experiments verifies its effectiveness. The results show that the constructed set of trap questions performs excellently. Moreover, when compared with baseline methods on four different models, our proposed method demonstrates strong performance, with an average increase of 0.1-0.2 in ROC-AUC values over the best-performing baseline, providing new insights and methods for addressing the hallucination issue of LLMs.

paper research
Routing by Analogy: kNN-Augmented Expert Assignment for Mixture-of-Experts

Routing by Analogy: kNN-Augmented Expert Assignment for Mixture-of-Experts

Mixture-of-Experts (MoE) architectures scale large language models efficiently by employing a parametric router to dispatch tokens to a sparse subset of experts. Typically, this router is trained once and then frozen, rendering routing decisions brittle under distribution shifts. We address this limitation by introducing kNN-MoE, a retrieval-augmented routing framework that reuses optimal expert assignments from a memory of similar past cases. This memory is constructed offline by directly optimizing token-wise routing logits to maximize the likelihood on a reference set. Crucially, we use the aggregate similarity of retrieved neighbors as a confidence-driven mixing coefficient, thus allowing the method to fall back to the frozen router when no relevant cases are found. Experiments show kNN-MoE outperforms zero-shot baselines and rivals computationally expensive supervised fine-tuning.
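
The confidence-driven mixing can be sketched as blending the frozen router's expert logits with logits aggregated from retrieved neighbors, where the mixing weight comes from how similar those neighbors actually are. The sketch below is an interpretation under assumptions: the similarity kernel, temperature, and fallback rule are illustrative choices, not the paper's exact formulation.

```python
# Sketch of confidence-driven routing mixture for one token: blend frozen-router
# logits with similarity-weighted logits retrieved from an assignment memory.
import torch

def knn_mixed_routing(router_logits: torch.Tensor,      # (num_experts,)
                      neighbor_logits: torch.Tensor,    # (k, num_experts) memorized routing logits
                      neighbor_sims: torch.Tensor,      # (k,) cosine similarities in [-1, 1]
                      temperature: float = 0.1) -> torch.Tensor:
    # Similarity-weighted average of the retrieved routing logits.
    weights = torch.softmax(neighbor_sims / temperature, dim=0)
    retrieved = (weights[:, None] * neighbor_logits).sum(dim=0)
    # Confidence is high when neighbors are close and ~0 when nothing relevant was
    # found, so the method falls back to the frozen router.
    confidence = neighbor_sims.clamp(min=0.0).mean()
    mixed = confidence * retrieved + (1.0 - confidence) * router_logits
    return torch.softmax(mixed, dim=-1)   # probabilities used for top-k expert selection

probs = knn_mixed_routing(torch.randn(8), torch.randn(4, 8), torch.tensor([0.9, 0.8, 0.7, 0.2]))
print(probs.topk(2).indices)  # two selected experts
```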

paper research
SHAP Insights into Transformer-Based News Bias Detectors

SHAP Insights into Transformer-Based News Bias Detectors

Automated bias detection in news text is heavily used to support journalistic analysis and media accountability, yet little is known about how bias detection models arrive at their decisions or why they fail. In this work, we present a comparative interpretability study of two transformer-based bias detection models, a bias detector fine-tuned on the BABE dataset and a domain-adapted pre-trained RoBERTa model fine-tuned on the BABE dataset, using SHAP-based explanations. We analyze word-level attributions across correct and incorrect predictions to characterize how different model architectures operationalize linguistic bias. Our results show that although both models attend to similar categories of evaluative language, they differ substantially in how these signals are integrated into predictions. The bias detector model assigns stronger internal evidence to false positives than to true positives, indicating a misalignment between attribution strength and prediction correctness and contributing to systematic over-flagging of neutral journalistic content. In contrast, the domain-adaptive model exhibits attribution patterns that better align with prediction outcomes and produces 63% fewer false positives. We further demonstrate that model errors arise from distinct linguistic mechanisms, with false positives driven by discourse-level ambiguity rather than explicit bias cues. These findings highlight the importance of interpretability-aware evaluation for bias detection systems and suggest that architectural and training choices critically affect both model reliability and deployment suitability in journalistic contexts.
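
Word-level SHAP attributions for a transformer classifier typically follow the pattern sketched below. This is a generic illustration, not the paper's setup: the checkpoint is a placeholder sentiment model rather than the BABE-tuned bias detectors, and it assumes a `shap` version with built-in support for transformers text pipelines.

```python
# Sketch: per-token SHAP attributions for a transformer text classifier.
import shap
from transformers import pipeline

clf = pipeline("text-classification",
               model="distilbert-base-uncased-finetuned-sst-2-english",  # placeholder model
               top_k=None)  # return scores for every class

explainer = shap.Explainer(clf)   # text masker is inferred from the pipeline
texts = ["The senator's reckless scheme sparked outrage among critics."]
sv = explainer(texts)

# Attribution of each token toward the first output class of the first example.
for token, value in zip(sv.data[0], sv.values[0][:, 0]):
    print(f"{token!r:>15}  {value:+.3f}")
```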

paper research
Skim-Aware Contrastive Learning for Efficient Document Representation

Skim-Aware Contrastive Learning for Efficient Document Representation

Although transformer-based models have shown strong performance in word- and sentence-level tasks, effectively representing long documents, especially in fields like law and medicine, remains difficult. Sparse attention mechanisms can handle longer inputs, but are resource-intensive and often fail to capture full-document context. Hierarchical transformer models offer better efficiency but do not clearly explain how they relate different sections of a document. In contrast, humans often skim texts, focusing on important sections to understand the overall message. Drawing from this human strategy, we introduce a new self-supervised contrastive learning framework that enhances long document representation. Our method randomly masks a section of the document and uses a natural language inference (NLI)-based contrastive objective to align it with relevant parts while distancing it from unrelated ones. This mimics how humans synthesize information, resulting in representations that are both richer and more computationally efficient. Experiments on legal and biomedical texts confirm significant gains in both accuracy and efficiency.
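
The contrastive idea described above can be sketched as an InfoNCE-style objective: the embedding of a masked section is pulled toward related sections of the same document and pushed away from unrelated ones. The encoder, how "related" and "unrelated" sections are chosen, and the temperature are assumptions here, not the paper's exact recipe.

```python
# Sketch of a section-level contrastive (InfoNCE-style) objective over embeddings.
import torch
import torch.nn.functional as F

def section_contrastive_loss(masked: torch.Tensor,      # (B, d) masked-section embeddings
                             positives: torch.Tensor,   # (B, d) related-section embeddings
                             negatives: torch.Tensor,   # (B, n_neg, d) unrelated sections
                             tau: float = 0.07) -> torch.Tensor:
    masked = F.normalize(masked, dim=-1)
    positives = F.normalize(positives, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_sim = (masked * positives).sum(-1, keepdim=True)         # (B, 1)
    neg_sim = torch.einsum("bd,bnd->bn", masked, negatives)      # (B, n_neg)
    logits = torch.cat([pos_sim, neg_sim], dim=1) / tau
    targets = torch.zeros(masked.size(0), dtype=torch.long)      # positive sits at index 0
    return F.cross_entropy(logits, targets)

loss = section_contrastive_loss(torch.randn(4, 256), torch.randn(4, 256), torch.randn(4, 8, 256))
print(loss.item())
```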

paper research
STED & Score: Ensuring LLM Output Reliability

STED & Score: Ensuring LLM Output Reliability

Large Language Models (LLMs) are increasingly deployed for structured data generation, yet output consistency remains critical for production applications. We introduce a comprehensive framework for evaluating and improving consistency in LLM-generated structured outputs. Our approach combines (1) STED (Semantic Tree Edit Distance), a novel similarity metric balancing semantic flexibility with structural strictness when comparing JSON outputs, and (2) a consistency scoring framework aggregating multiple STED measurements across repeated generations to quantify reliability. Through systematic experiments on synthetic datasets with controlled schema, expression, and semantic variations, we demonstrate STED achieves superior performance (0.86-0.90 similarity for semantic equivalents, 0.0 for structural breaks) compared to existing metrics including TED, BERTScore, and DeepDiff. Applying our framework to benchmark six LLMs reveals significant variations: Claude-3.7-Sonnet demonstrates exceptional consistency, maintaining near-perfect structural reliability even at high temperatures (T = 0.9), while models like Claude-3-Haiku and Nova-Pro exhibit substantial degradation requiring careful tuning. Our framework enables practical applications including targeted model selection for structured tasks, iterative prompt refinement for reproducible results, and diagnostic analysis to identify inconsistency root causes. This work provides theoretical foundations and practical tools for ensuring reliable structured output generation in LLM-based production systems.
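
The consistency-scoring side can be sketched as regenerating the same structured output several times, scoring every pair with a tree-aware similarity, and averaging. The `toy_sted` below is a crude placeholder (fraction of matching flattened key paths and values), not the paper's Semantic Tree Edit Distance; only the aggregation pattern is the point.

```python
# Sketch of a consistency score: average pairwise similarity over repeated generations.
from itertools import combinations
from statistics import mean

def flatten(obj, prefix=""):
    """Yield (key-path, leaf-value) pairs of a nested JSON-like object."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            yield from flatten(v, f"{prefix}/{k}")
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            yield from flatten(v, f"{prefix}[{i}]")
    else:
        yield (prefix, obj)

def toy_sted(a: dict, b: dict) -> float:
    """Placeholder similarity: Jaccard overlap of flattened paths/values, NOT real STED."""
    fa, fb = set(flatten(a)), set(flatten(b))
    return len(fa & fb) / max(len(fa | fb), 1)

def consistency_score(generations: list) -> float:
    pairs = list(combinations(generations, 2))
    return mean(toy_sted(x, y) for x, y in pairs) if pairs else 1.0

runs = [{"name": "Ada", "age": 36}, {"name": "Ada", "age": 36}, {"name": "Ada", "age": "36"}]
print(round(consistency_score(runs), 3))  # type drift in one run lowers the score
```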

paper research
StressRoBERTa: Detecting Chronic Stress via Social Media Analysis

StressRoBERTa: Detecting Chronic Stress via Social Media Analysis

The prevalence of chronic stress represents a significant public health concern, with social media platforms like Twitter serving as important venues for individuals to share their experiences. This paper introduces StressRoBERTa, a cross-condition transfer learning approach for automatic detection of self-reported chronic stress in English tweets. The investigation examines whether continual training on clinically related conditions (depression, anxiety, PTSD), disorders with high comorbidity with chronic stress, improves stress detection compared to general language models and broad mental health models. RoBERTa is continually trained on the Stress-SMHD corpus (108M words from users with self-reported diagnoses of depression, anxiety, and PTSD) and fine-tuned on the SMM4H 2022 Task 8 dataset. StressRoBERTa achieves 82% F1-score, outperforming the best shared task system (79% F1) by 3 percentage points. The results demonstrate that focused cross-condition transfer from stress-related disorders (+1% F1 over vanilla RoBERTa) provides stronger representations than general mental health training. Evaluation on Dreaddit (81% F1) further demonstrates transfer from clinical mental health contexts to situational stress discussions.

paper research
Stylometry Analysis of Human and Machine Text for Academic Integrity

Stylometry Analysis of Human and Machine Text for Academic Integrity

This work addresses critical challenges to academic integrity, including plagiarism, fabrication, and verification of authorship of educational content, by proposing a Natural Language Processing (NLP)-based framework for authenticating students' content through author attribution and style change detection. Despite some initial efforts, several aspects of the topic are yet to be explored. In contrast to existing solutions, the paper provides a comprehensive analysis of the topic by targeting four relevant tasks: (i) classification of human and machine text, (ii) differentiation between single- and multi-authored documents, (iii) author change detection within multi-authored documents, and (iv) author recognition in collaboratively produced documents. The solutions proposed for the tasks are evaluated on two datasets generated with Gemini using two different prompts, a normal and a strict set of instructions. During experiments, some reduction in the performance of the proposed solutions is observed on the dataset generated through the strict prompt, demonstrating the complexities involved in detecting machine-generated text with cleverly crafted prompts. The generated datasets, code, and other relevant materials are made publicly available on GitHub and are expected to provide a baseline for future research in the domain.

paper research
Subtle Sexism: Experts vs. Algorithms in Detection Debate

Subtle Sexism: Experts vs. Algorithms in Detection Debate

Online sexism increasingly appears in subtle, context-dependent forms that evade traditional detection methods. Its interpretation often depends on overlapping linguistic, psychological, legal, and cultural dimensions, which produce mixed and sometimes contradictory signals in annotated datasets. These inconsistencies, combined with label scarcity and class imbalance, result in unstable decision boundaries and cause fine-tuned models to overlook subtler, underrepresented forms of harm. To address these challenges, we propose a two-stage framework that unifies (i) targeted training procedures that better regularize supervision over scarce and noisy data with (ii) selective, reasoning-based inference to handle ambiguous or borderline cases. First, we stabilize training by combining class-balanced focal loss, class-aware batching, and post-hoc threshold calibration, strategies adapted to this domain for the first time to mitigate label imbalance and noisy supervision. Second, we bridge the gap between efficiency and reasoning with a dynamic routing mechanism that distinguishes between unambiguous instances and complex cases requiring a deliberative process. This reasoning process culminates in the novel Collaborative Expert Judgment (CEJ) module, which prompts multiple personas and consolidates their reasoning through a judge model. Our approach outperforms existing methods across several public benchmarks, with F1 gains of +4.48% and +1.30% on EDOS Tasks A and B, respectively, and a +2.79% improvement in ICM on EXIST 2025 Task 1.1.
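
Of the stabilization components, the class-balanced focal loss is the most directly codeable. The sketch below uses inverse-frequency class weights combined with the focal term; the specific weighting scheme and gamma value are common choices rather than the paper's exact settings.

```python
# Sketch of a class-balanced focal loss: per-class weights down-weight the majority
# class, while the focal term (1 - p_t)^gamma focuses learning on hard examples.
import torch
import torch.nn.functional as F

def class_balanced_focal_loss(logits: torch.Tensor,        # (B, C)
                              targets: torch.Tensor,       # (B,)
                              class_counts: torch.Tensor,  # (C,) training-set frequencies
                              gamma: float = 2.0) -> torch.Tensor:
    # Inverse-frequency weights, normalized so they average to 1 across classes.
    weights = class_counts.sum() / (class_counts.float() * len(class_counts))
    log_probs = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_probs, targets, reduction="none")            # -log p_t per example
    p_t = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1).exp()
    focal = (1.0 - p_t) ** gamma * ce                                # emphasize hard examples
    return (weights[targets] * focal).mean()

logits = torch.randn(8, 2)
targets = torch.randint(0, 2, (8,))
print(class_balanced_focal_loss(logits, targets, torch.tensor([900, 100])))
```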

paper research
Summarizing the Unseen: Less-Resourced Language Techniques

Summarizing the Unseen: Less-Resourced Language Techniques

Automatic text summarization has achieved high performance in high-resourced languages like English, but comparatively less attention has been given to summarization in less-resourced languages. This work compares a variety of approaches to summarization, from zero-shot prompting of LLMs large and small, to fine-tuning smaller models like mT5 with and without three data augmentation approaches and multilingual transfer. We also explore an LLM translation pipeline approach: translating from the source language to English, summarizing, and translating back. Evaluating with five different metrics, we find that there is variation in performance across LLMs of similar parameter sizes, that our multilingual fine-tuned mT5 baseline outperforms most other approaches, including zero-shot LLM performance, on most metrics, and that LLM-as-judge may be less reliable for less-resourced languages.

paper research
Surprisal and Metaphor Novelty Judgments: Moderate Correlations and Divergent Scaling Effects Revealed by Corpus-Based and Synthetic Datasets

Surprisal and Metaphor Novelty Judgments: Moderate Correlations and Divergent Scaling Effects Revealed by Corpus-Based and Synthetic Datasets

Novel metaphor comprehension involves complex semantic processes and linguistic creativity, making it an interesting task for studying language models (LMs). This study investigates whether surprisal, a probabilistic measure of predictability in LMs, correlates with annotations of metaphor novelty in different datasets. We analyse the surprisal of metaphoric words in corpus-based and synthetic metaphor datasets using 16 causal LM variants. We propose a cloze-style surprisal method that conditions on full-sentence context. Results show that LM surprisal yields significant moderate correlations with scores/labels of metaphor novelty. We further identify divergent scaling patterns: on corpus-based data, correlation strength decreases with model size (inverse scaling effect), whereas on synthetic data it increases (quality-power hypothesis). We conclude that while surprisal can partially account for annotations of metaphor novelty, it remains limited as a metric of linguistic creativity. Code and data are publicly available at https://github.com/OmarMomen14/surprisal-metaphor-novelty
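
For reference, the standard left-context form of word surprisal under a causal LM is sketched below: the negative log probability (in bits) of the target word given its prefix, summed over subword tokens. The paper's cloze-style variant additionally conditions on the full sentence, which this sketch does not reproduce; the model choice is a small placeholder.

```python
# Sketch: surprisal (bits) of a target word given its left context under a causal LM,
# summed over the word's subtokens. Placeholder model; not the paper's cloze variant.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # small placeholder model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

def word_surprisal(context: str, word: str) -> float:
    ctx_ids = tok(context, return_tensors="pt").input_ids
    word_ids = tok(" " + word, add_special_tokens=False).input_ids
    ids = torch.cat([ctx_ids, torch.tensor([word_ids])], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(ids).logits, dim=-1)
    total = 0.0
    for i, wid in enumerate(word_ids):
        pos = ctx_ids.shape[1] + i - 1      # logits at pos predict the token at pos + 1
        total += -log_probs[0, pos, wid].item() / math.log(2)
    return total

print(word_surprisal("The lawyer's words were a", "dagger"))
```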

paper research
T3C: Test-Time Tensor Compression with Consistency Guarantees

T3C: Test-Time Tensor Compression with Consistency Guarantees

We present T3C, a train-once, test-time budget-conditioned compression framework that exposes rank and precision as a controllable deployment knob. T3C combines elastic tensor factorization (maintained up to a maximal rank) with rank-tied mixed-precision quantization and a lightweight controller that maps a latency/energy/size budget token to per-layer rank/bit assignments; the policy snaps to hardware-aligned profiles and is monotone in the budget. A fast, layerwise consistency certificate, computed from spectral proxies and activation statistics, upper-bounds logit drift and regularizes training, yielding a practical reliability signal with negligible overhead. On ImageNet-1k, T3C shifts the vision Pareto frontier for ResNet-50 at matched accuracy (≤0.5% drop): p50 latency is 1.18 ms with a 38 MB model, outperforming PTQ-8b (1.44 ms, 88 MB); for ViT-B/16, T3C reaches 2.30 ms p50 with 59 MB, improving over strong PTQ/QAT baselines. A single T3C checkpoint therefore provides predictable, certificate-backed accuracy-latency-size trade-offs on demand across devices.

paper research
Tackling the Inherent Difficulty of Noise Filtering in RAG

Tackling the Inherent Difficulty of Noise Filtering in RAG

Retrieval-Augmented Generation (RAG) has become a widely adopted approach to enhance Large Language Models (LLMs) by incorporating external knowledge and reducing hallucinations. However, noisy or irrelevant documents are often introduced during RAG, potentially degrading performance and even causing hallucinated outputs. While various methods have been proposed to filter out such noise, we argue that identifying irrelevant information in retrieved content is inherently difficult and that a limited number of transformer layers can hardly solve it. Consequently, retrievers fail to filter out irrelevant documents entirely. LLMs must therefore be robust against such noise, but we demonstrate that standard fine-tuning approaches are often ineffective in enabling the model to selectively utilize relevant information while ignoring irrelevant content, due to the structural constraints of attention patterns. To address this, we propose a novel fine-tuning method designed to enhance the model's ability to distinguish between relevant and irrelevant information within retrieved documents. Extensive experiments across multiple benchmarks show that our approach significantly improves the robustness and performance of LLMs.

paper research
Uncovering Hidden Reasoning: Beyond Human-Defined Concepts

Uncovering Hidden Reasoning: Beyond Human-Defined Concepts

Despite the growing reasoning capabilities of recent large language models (LLMs), their internal mechanisms during the reasoning process remain underexplored. Prior approaches often rely on human-defined concepts (e.g., overthinking, reflection) at the word level to analyze reasoning in a supervised manner. However, such methods are limited, as it is infeasible to capture the full spectrum of potential reasoning behaviors, many of which are difficult to define in token space. In this work, we propose an unsupervised framework (namely RISE: Reasoning behavior Interpretability via Sparse auto-Encoder) for discovering reasoning vectors, which we define as directions in the activation space that encode distinct reasoning behaviors. By segmenting chain-of-thought traces into sentence-level steps and training sparse auto-encoders (SAEs) on step-level activations, we uncover disentangled features corresponding to interpretable behaviors such as reflection and backtracking. Visualization and clustering analyses show that these behaviors occupy separable regions in the decoder column space. Moreover, targeted interventions on SAE-derived vectors can controllably amplify or suppress specific reasoning behaviors, altering inference trajectories without retraining. Beyond behavior-specific disentanglement, SAEs capture structural properties such as response length, revealing clusters of long versus short reasoning traces. More interestingly, SAEs enable the discovery of novel behaviors beyond human supervision. We demonstrate the ability to control response confidence by identifying confidence-related vectors in the SAE decoder space. These findings underscore the potential of unsupervised latent discovery for both interpreting and controllably steering reasoning in LLMs.
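
A minimal sparse autoencoder over step-level activations looks roughly like the sketch below: an overcomplete ReLU bottleneck trained with reconstruction plus an L1 sparsity penalty, whose decoder columns serve as candidate feature directions. The expansion factor, loss weights, and random activations are illustrative assumptions, not the paper's settings.

```python
# Minimal sparse-autoencoder sketch for sentence-step activations.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, expansion: int = 8):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_model * expansion)
        self.decoder = nn.Linear(d_model * expansion, d_model)

    def forward(self, x):
        z = torch.relu(self.encoder(x))        # sparse feature activations
        return self.decoder(z), z

d_model = 512
sae = SparseAutoencoder(d_model)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3

acts = torch.randn(256, d_model)               # stand-in for step-level activations
for step in range(100):
    recon, z = sae(acts)
    loss = ((recon - acts) ** 2).mean() + l1_coeff * z.abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Each decoder column (here a row after transposing) is one learned feature's
# direction in activation space, i.e., a candidate "reasoning vector".
reasoning_vectors = sae.decoder.weight.T
print(reasoning_vectors.shape)                 # (d_model * expansion, d_model)
```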

paper research
Understanding and Steering the Cognitive Behaviors of Reasoning Models at Test-Time

Understanding and Steering the Cognitive Behaviors of Reasoning Models at Test-Time

Large Language Models (LLMs) often rely on long chain-of-thought (CoT) reasoning to solve complex tasks. While effective, these trajectories are frequently inefficient, leading to high latency from excessive token generation, or unstable reasoning that alternates between underthinking (shallow, inconsistent steps) and overthinking (repetitive, verbose reasoning). In this work, we study the structure of reasoning trajectories and uncover specialized attention heads that correlate with distinct cognitive behaviors such as verification and backtracking. By lightly intervening on these heads at inference time, we can steer the model away from inefficient modes. Building on this insight, we propose CREST, a training-free method for Cognitive REasoning Steering at Test-time. CREST has two components: (1) an offline calibration step that identifies cognitive heads and derives head-specific steering vectors, and (2) an inference-time procedure that rotates hidden representations to suppress components along those vectors. CREST adaptively suppresses unproductive reasoning behaviors, yielding both higher accuracy and lower computational cost. Across diverse reasoning benchmarks and models, CREST improves accuracy by up to 17.5% while reducing token usage by 37.6%, offering a simple and effective pathway to faster, more reliable LLM reasoning.
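
The core operation, suppressing the component of a hidden state that lies along a steering vector, can be sketched with scaled projection removal. This is a simplified stand-in for the rotation described above, not CREST's exact inference-time procedure; the strength parameter and shapes are assumptions.

```python
# Sketch: suppress the component of hidden states along one head's steering vector.
import torch

def suppress_direction(hidden: torch.Tensor,   # (..., d) hidden states
                       vector: torch.Tensor,   # (d,) steering vector for one cognitive head
                       strength: float = 1.0) -> torch.Tensor:
    v = vector / vector.norm()
    component = (hidden @ v).unsqueeze(-1) * v  # projection onto the steering direction
    return hidden - strength * component        # strength=1 removes it entirely

h = torch.randn(2, 16, 64)                      # (batch, seq, d)
v = torch.randn(64)
h_steered = suppress_direction(h, v)
print((h_steered @ (v / v.norm())).abs().max()) # ~0: the component has been removed
```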

paper research
Unlocking Safety: SAE-Powered Subspace Tuning for LLMs

Unlocking Safety: SAE-Powered Subspace Tuning for LLMs

Safety alignment -- training large language models (LLMs) to refuse harmful requests while remaining helpful -- is critical for responsible deployment. Prior work established that safety behaviors are governed by low-rank structures, suggesting parameter-efficient fine-tuning (PEFT) should be well-suited for alignment. However, Low-Rank Adaptation (LoRA) consistently underperforms full fine-tuning and reinforcement learning on safety benchmarks. We attribute this gap to semantic entanglement: safety-relevant directions are intertwined with unrelated concepts due to polysemanticity, impeding implicit subspace identification. To address this, we propose SAILS (Safety Alignment via Interpretable Low-rank Subspace), which leverages Sparse Autoencoders (SAEs) to disentangle representations into monosemantic features, constructs an interpretable safety subspace from SAE decoder directions, and uses it to initialize LoRA adapters. Theoretically, we prove that SAE-based identification achieves arbitrarily small recovery error under monosemanticity assumptions, while direct identification suffers an irreducible error floor. Empirically, SAILS achieves up to 99.6% safety rate on Gemma-2-9B -- exceeding full fine-tuning by 7.4 points and matching RLHF-based models -- while updating only 0.19% of parameters and providing interpretability.

paper research
When Should LLMs Say "I Don't Know"?

When Should LLMs Say "I Don't Know"?

The success of expanded context windows in Large Language Models (LLMs) has driven increased use of broader context in retrieval-augmented generation. We investigate the use of LLMs for retrieval-augmented question answering. While longer contexts make it easier to incorporate targeted knowledge, they introduce more irrelevant information that hinders the model's generation process and degrades its performance. To address the issue, we design an adaptive prompting strategy which involves splitting the retrieved information into smaller chunks and sequentially prompting an LLM to answer the question using each chunk. Adjusting the chunk size allows a trade-off between incorporating relevant information and reducing irrelevant information. Experimental results on three open-domain question answering datasets demonstrate that the adaptive strategy matches the performance of standard prompting while using fewer tokens. Our analysis reveals that when encountering insufficient information, the LLM often generates incorrect answers instead of declining to respond, which constitutes a major source of error. This finding highlights the need for further research into enhancing LLMs' ability to effectively decline requests when faced with inadequate information.
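
One way to realize such a strategy is sketched below: split the retrieved text into fixed-size chunks and query the model chunk by chunk, stopping at the first committed answer. The `llm` callable, the chunking by word count, and the explicit refusal phrase are placeholders, not the paper's implementation.

```python
# Sketch of adaptive chunked prompting: query chunk by chunk, stop at the first
# committed answer, and treat "INSUFFICIENT" as a decline (assumed protocol).
from typing import Callable, List, Optional

def chunk(text: str, chunk_size: int) -> List[str]:
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def adaptive_answer(question: str, retrieved: str, llm: Callable[[str], str],
                    chunk_size: int = 200) -> Optional[str]:
    for passage in chunk(retrieved, chunk_size):
        prompt = (f"Context:\n{passage}\n\nQuestion: {question}\n"
                  "Answer using only the context. If the context is insufficient, "
                  "reply exactly: INSUFFICIENT.")
        reply = llm(prompt).strip()
        if reply and reply.upper() != "INSUFFICIENT":
            return reply            # first chunk that yields a committed answer
    return None                     # the model declined on every chunk

# Usage with a stub LLM that only "knows" the final chunk:
fake_llm = lambda p: "Paris" if "is Paris" in p else "INSUFFICIENT"
docs = " ".join(["filler"] * 250) + " The capital of France is Paris."
print(adaptive_answer("What is the capital of France?", docs, fake_llm, chunk_size=100))
```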

paper research
Where's My Plate? Testing Robots' Hidden Item Sense

Where's My Plate? Testing Robots' Hidden Item Sense

"Bring me a plate." For domestic service robots, this simple command reveals a complex challenge: inferring where everyday items are stored, often out of sight in drawers, cabinets, or closets. Despite advances in vision and manipulation, robots still lack the commonsense reasoning needed to complete this task. We introduce the Stored Household Item Challenge, a benchmark task for evaluating service robots' cognitive capabilities: given a household scene and a queried item, predict its most likely storage location. Our benchmark includes two datasets: (1) a real-world evaluation set of 100 item-image pairs with human-annotated ground truth from participants' kitchens, and (2) a development set of 6,500 item-image pairs annotated with storage polygons over public kitchen images. These datasets support realistic modeling of household organization and enable comparative evaluation across agent architectures. To begin tackling this challenge, we introduce NOAM (Non-visible Object Allocation Model), a hybrid agent pipeline that combines structured scene understanding with large language model inference. NOAM converts visual input into natural language descriptions of spatial context and visible containers, then prompts a language model (e.g., GPT-4) to infer the most likely hidden storage location. This integrated vision-language agent exhibits emergent commonsense reasoning and is designed for modular deployment within broader robotic systems. We evaluate NOAM against baselines including random selection, vision-language pipelines (Grounding-DINO + SAM), leading multimodal models (e.g., Gemini, GPT-4o, Kosmos-2, LLaMA, Qwen), and human performance. NOAM significantly improves prediction accuracy and approaches human-level results, highlighting best practices for deploying cognitively capable agents in domestic environments.
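
The two-stage idea, describe the detected containers in natural language and then ask a language model to pick the most likely hidden storage location, can be sketched as below. The scene schema, prompt wording, and `llm` callable are placeholders for illustration, not the NOAM implementation.

```python
# Sketch of a NOAM-style two-stage pipeline: scene description -> LLM storage prediction.
from typing import Callable, Dict

def describe_scene(scene: Dict) -> str:
    parts = [f"a {c['name']} located {c['position']}" for c in scene["containers"]]
    return f"This is a {scene['room']}. Visible storage includes: " + "; ".join(parts) + "."

def predict_storage(item: str, scene: Dict, llm: Callable[[str], str]) -> str:
    prompt = (describe_scene(scene) +
              f"\nWhich single storage location most likely holds the {item}? "
              "Answer with the name of one container from the description.")
    return llm(prompt).strip()

kitchen = {"room": "kitchen",
           "containers": [{"name": "upper cabinet", "position": "above the sink"},
                          {"name": "drawer", "position": "below the counter"},
                          {"name": "dishwasher", "position": "next to the oven"}]}
stub_llm = lambda p: "upper cabinet"      # stand-in for a call to GPT-4 or another LLM
print(predict_storage("plate", kitchen, stub_llm))
```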

paper research

< Category Statistics (Total: 566) >

Computer Science (514) Machine Learning (117) Artificial Intelligence (89) Computer Vision (71) Computation and Language (NLP) (62) Electrical Engineering and Systems Science (36) Cryptography and Security (24) Robotics (22) Systems and Control (22) Software Engineering (20) Mathematics (18) Statistics (17) Economics (16) Information Retrieval (15) Distributed, Parallel, and Cluster Computing (14) Human-Computer Interaction (14) Neural and Evolutionary Computing (13) Computer Science and Game Theory (11) Econometrics (11) Image and Video Processing (10) Physics (10) Sound (10) Multiagent Systems (9) Optimization and Control (8) Computational Geometry (7) Databases (7) Graphics (6) Networking and Internet Architecture (6) Quantitative Biology (6) Quantum Physics (5) Theoretical Economics (5) Computational Complexity (4) Computational Engineering, Finance, and Science (4) Computers and Society (4) Emerging Technologies (4) Information Theory (4) Methodology (4) Multimedia (4) Programming Languages (4) Quantitative Finance (4) Signal Processing (4) Audio and Speech Processing (3) Data Structures and Algorithms (3) Hardware Architecture (3) History and Philosophy of Physics (3) Logic in Computer Science (3) Neurons and Cognition (3) Social and Information Networks (3) Statistics Theory (3) Computation (2) Condensed Matter (2) Dynamical Systems (2) Formal Languages and Automata Theory (2) General Finance (2) Operating Systems (2) Optics (2) Quantitative Methods (2) Applications (1) Astrophysics (1) Combinatorics (1) Computational Physics (1) Digital Libraries (1) Disordered Systems and Neural Networks (1) General Economics (1) Genomics (1) Geophysics (1) Instrumentation and Methods for Astrophysics (1) Logic (1) Mathematical Finance (1) Mathematical Software (1) Medical Physics (1) Mesoscale and Nanoscale Physics (1) Metric Geometry (1) Other Statistics (1) Performance (1) Physics and Society (1) Plasma Physics (1) Probability (1) Trading and Market Microstructure (1)
