The CLEF-2026 CheckThat! Lab: Advancing Multilingual Fact-Checking
The CheckThat! lab aims to advance the development of innovative technologies combating disinformation and manipulation efforts in online communication across a multitude of languages and platforms. While early editions focused on the core tasks of the verification pipeline (check-worthiness, evidence retrieval, and verification), the past three editions added further tasks linked to the verification process. In this year’s edition, the verification pipeline is again at the center, with the following tasks: Task 1 on source retrieval for scientific web claims (a follow-up of the 2025 edition), Task 2 on fact-checking numerical and temporal claims, which adds a reasoning component to the 2025 edition, and Task 3, which expands the verification pipeline with the generation of full fact-checking articles. These tasks pose challenging classification, retrieval, and generation problems at the document and span level, including multilingual settings.
💡 Research Summary
The CLEF 2026 CheckThat! Lab presents three new tasks that together cover the full fact‑checking pipeline, with a strong emphasis on multilinguality and realistic evaluation. Task 1 (Source Retrieval for Scientific Web Claims) asks participants to identify the scholarly paper implicitly referenced in a social‑media post. The task is offered in English, German and French, using a large English training set (15,699 pairs) and newly annotated German and French sets (≈1,500 pairs each). Systems are evaluated with Mean Reciprocal Rank at 5 (MRR@5), reflecting the practical need to surface the correct source within the top five results.
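MRR@5 rewards systems that place the correct paper near the top of a short result list: each post scores the reciprocal of the rank at which its gold paper appears, or zero if it is not in the top five. The snippet below is a minimal sketch of that computation; the data layout (`predictions` mapping post ids to ranked paper ids, `gold` mapping post ids to the correct paper id) is illustrative and not taken from the official scorer.

```python
def mrr_at_5(ranked_ids, gold_id):
    """Reciprocal rank of the gold paper if it appears in the top 5, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:5], start=1):
        if doc_id == gold_id:
            return 1.0 / rank
    return 0.0


def mean_mrr_at_5(predictions, gold):
    """Average MRR@5 over all posts (hypothetical dict layout, see lead-in)."""
    scores = [mrr_at_5(predictions[post_id], gold[post_id]) for post_id in gold]
    return sum(scores) / len(scores)


# Example: the gold paper is ranked second for the only post -> MRR@5 = 0.5
print(mean_mrr_at_5({"post_1": ["p9", "p3", "p7"]}, {"post_1": "p3"}))
```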
Task 2 (Fact‑Checking Numerical and Temporal Claims) focuses on claims that contain quantities or time expressions. It is provided in English, Spanish and Arabic, with 8,000 English, 2,808 Spanish and 3,260 Arabic claims. Each claim is accompanied by 20 reasoning traces generated by GPT‑4o‑mini at different temperature settings. Participants must train a verifier that ranks these traces, removes redundancy, and produces a final veracity label. Evaluation combines Recall@5 and MRR@5 for trace ranking with macro‑averaged F1 and class‑wise F1 for the final decision, encouraging both accurate reasoning and correct verdicts.
Task 3 (Generating Full Fact‑Checking Articles) is a new addition that asks systems to write a complete fact‑checking article, including inline citations, given a claim, its truth label, and a set of evidence documents. This task is English‑only. Training data come from the WatClaimCheck corpus (26k examples); the test set (1.2k examples) combines the public ExClaim collection with the private AmbiguousSnopes collection. Evaluation uses three reference‑based metrics: entailment score (textual entailment with the gold article), citation correctness (whether cited text is supported by the evidence), and citation completeness (proportion of evidence correctly cited). A secondary, reference‑free metric employs LLM‑as‑judge Elo ratings to assess overall writing quality.
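Two of these metrics are easy to picture concretely. Citation completeness is described as the proportion of provided evidence that the article actually cites, and the Elo-based judgment aggregates pairwise preferences between articles. The sketch below assumes evidence documents and inline citations are exposed as id lists and uses the standard Elo update rule; these interfaces and the K-factor are hypothetical choices, not the lab's official implementation.

```python
def citation_completeness(cited_ids, evidence_ids):
    """Share of provided evidence documents cited at least once in the article."""
    evidence = set(evidence_ids)
    if not evidence:
        return 0.0
    return len(set(cited_ids) & evidence) / len(evidence)


def elo_update(rating_a, rating_b, score_a, k=32):
    """Standard Elo update after one LLM-as-judge comparison of two articles.
    score_a is 1.0 if article A is preferred, 0.0 if B is preferred, 0.5 for a tie."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b


# Example: an article citing 3 of 4 evidence documents scores 0.75 completeness.
print(citation_completeness(["e1", "e2", "e3"], ["e1", "e2", "e3", "e4"]))
```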
The paper situates these tasks within the broader landscape of fact‑checking shared tasks such as FEVER, the Fake News Challenge, and SemEval, highlighting the novelty of integrating source retrieval, reasoning‑trace ranking, and article generation in a multilingual setting. The authors conclude by outlining future work: extending language coverage to low‑resource languages, strengthening cross‑document and numerical reasoning, improving the handling of implicit evidence, and aligning task design more closely with real‑world journalistic workflows.