Abstract
ELOQUENT is a set of shared tasks for evaluating the quality and usefulness of generative language models. ELOQUENT aims to apply high-level quality criteria, grounded in experience from deploying models in real-life tasks, and to formulate tests for those criteria, preferably implemented in a multilingual setting and designed to require minimal human assessment effort. The tasks for the first year of ELOQUENT were (1) Topical Quiz, in which language models were probed for topical competence; (2) HalluciGen, in which we assessed the ability of models to generate and detect hallucinations; (3) Robustness, in which we assessed the robustness and consistency of model output under variation in the input prompts; and (4) Voight-Kampff, run in partnership with the PAN lab, which aimed to discover whether human-generated text can be automatically distinguished from machine-generated text. This first year of experimentation has shown, as expected, that self-assessment with models judging models is feasible but not entirely straightforward, and that a judicious comparison with human assessment and with the application context is necessary before self-assessed quality judgments can be trusted.
Original language | English |
---|---|
Host publication title | Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2024 |
Editors | Lorraine Goeuriot, Philippe Mulhem, Georges Quénot, Didier Schwab, Giorgio Maria Di Nunzio, Laure Soulier, Petra Galuščáková, Alba García Seco de Herrera, Guglielmo Faggioli, Nicola Ferro |
Number of pages | 20 |
Place of publication | Cham |
Publisher | Springer |
Publication date | 19 Sep 2024 |
Pages | 53-72 |
ISBN (print) | 978-3-031-71907-3 |
ISBN (electronic) | 978-3-031-71908-0 |
DOI | |
Status | Published - 19 Sep 2024 |
MoE publication type | A4 Article in conference proceedings |
Event | International Conference of the CLEF Association - Grenoble, France. Duration: 9 Sep 2024 → 12 Sep 2024. Conference number: 15 |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Volume | 14959 |
ISSN (print) | 0302-9743 |
ISSN (electronic) | 1611-3349 |
Bibliographic note
Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
Fields of science
- 6121 Languages
- 113 Computer and information sciences