Parallel Universes, Parallel Languages: A Comprehensive Study on LLM-based Multilingual Counterfactual Example Generation

Abstract

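Counterfactuals are minimally edited inputs that change a model's prediction, and they are a promising approach to explaining model behavior. Large language models (LLMs) excel at generating English counterfactuals and demonstrate multilingual proficiency, but it remains unclear how effective they are at generating multilingual counterfactuals. This work presents a comprehensive study of multilingual counterfactuals. First, we conduct automatic evaluations of counterfactuals generated directly in six languages and of counterfactuals derived through English translation. Translation-based counterfactuals offer higher validity than directly generated ones, but still fall short of the quality of the original English counterfactuals. Second, we find that the edit patterns applied to counterfactuals in high-resource European languages are highly similar, suggesting that cross-lingual perturbations follow common strategic principles. Third, we identify and categorize four main error types that appear consistently across languages in the generated counterfactuals. Finally, we show that multilingual counterfactual data augmentation (CDA) yields larger improvements in model performance than cross-lingual CDA, especially for low-resource languages, but that imperfections in the generated counterfactuals limit the gains in model performance and robustness.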

Detailed Summary

This paper examines how well large language models (LLMs) generate multilingual counterfactual examples and how their effectiveness varies across languages. Counterfactuals are minimally altered inputs that change a model’s prediction, which makes them a promising tool for explaining model behavior. The study compares counterfactuals generated directly in six languages with counterfactuals derived via English translation. The translation-based counterfactuals achieve higher validity, but they still fall short of the quality of the original English counterfactuals. The edit patterns observed across high-resource European languages are highly similar, which suggests that cross-lingual perturbations follow a common strategic principle. The paper also identifies four main error types that appear consistently, regardless of language, in the generated counterfactuals. Finally, multilingual counterfactual data augmentation (CDA) yields larger performance gains than cross-lingual CDA, especially for low-resource languages, but imperfections in the generated counterfactuals limit the improvements in model performance and robustness.
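
To make the two properties in the definition concrete (the edit changes the prediction, and the edit is small), here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's evaluation code: it assumes validity is measured as the fraction of counterfactuals that flip the classifier's label, and it uses token-level edit distance as a rough proxy for minimality. The names label_flip_rate, token_edit_distance, and toy_sentiment are hypothetical and do not come from the paper.

```python
from typing import Callable, Sequence


def label_flip_rate(
    predict: Callable[[str], str],
    originals: Sequence[str],
    counterfactuals: Sequence[str],
) -> float:
    """Fraction of (original, counterfactual) pairs whose predicted label differs.

    Assumption: this is used here as a stand-in for "validity".
    """
    if len(originals) != len(counterfactuals):
        raise ValueError("originals and counterfactuals must be paired")
    if not originals:
        return 0.0
    flipped = sum(
        predict(orig) != predict(cf)
        for orig, cf in zip(originals, counterfactuals)
    )
    return flipped / len(originals)


def token_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over whitespace tokens (a rough minimality proxy)."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))  # distance from empty prefix of x to each prefix of y
    for i, tok_x in enumerate(x, start=1):
        curr = [i]
        for j, tok_y in enumerate(y, start=1):
            cost = 0 if tok_x == tok_y else 1
            curr.append(min(prev[j] + 1,          # delete tok_x
                            curr[j - 1] + 1,      # insert tok_y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def toy_sentiment(text: str) -> str:
    """Hypothetical stand-in classifier, present only so the sketch runs."""
    return "negative" if "boring" in text.lower() else "positive"


originals = ["The film was boring", "A great movie"]
counterfactuals = ["The film was thrilling", "A great film"]

validity = label_flip_rate(toy_sentiment, originals, counterfactuals)
edits = [token_edit_distance(o, c) for o, c in zip(originals, counterfactuals)]
```

In this toy run only the first pair flips the label, so the flip rate is 0.5, and each counterfactual differs from its original by one token. Whitespace tokenization is a simplification: languages that are not whitespace-segmented would need a proper tokenizer before distances can be compared across languages.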

📜 Original Paper (English)