Abstract
ELOQUENT is a set of shared tasks for evaluating the quality and usefulness of generative language models. ELOQUENT aims to apply high-level quality criteria, grounded in experiences from deploying models in real-life tasks, and to formulate tests for those criteria, preferably implemented to require minimal human assessment effort and to work in a multilingual setting. The tasks for the first year of ELOQUENT were (1) Topical quiz, in which language models are probed for topical competence; (2) HalluciGen, in which we assessed the ability of models to generate and detect hallucinations; (3) Robustness, in which we assessed the robustness and consistency of model output given variation in the input prompts; and (4) Voight-Kampff, run in partnership with the PAN lab, with the aim of discovering whether it is possible to automatically distinguish human-generated text from machine-generated text. This first year of experimentation has shown, as expected, that using self-assessment with models judging models is feasible but not entirely straightforward, and that a judicious comparison with human assessment and application context is necessary to be able to trust self-assessed quality judgments.
Original language | English |
---|---|
Title of host publication | Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2024 |
Editors | Lorraine Goeuriot, Philippe Mulhem, Georges Quénot, Didier Schwab, Giorgio Maria Di Nunzio, Laure Soulier, Petra Galuščáková, Alba García Seco de Herrera, Guglielmo Faggioli, Nicola Ferro |
Number of pages | 20 |
Place of Publication | Cham |
Publisher | Springer |
Publication date | 19 Sept 2024 |
Pages | 53-72 |
ISBN (Print) | 978-3-031-71907-3 |
ISBN (Electronic) | 978-3-031-71908-0 |
Publication status | Published - 19 Sept 2024 |
MoE publication type | A4 Article in conference proceedings |
Event | International Conference of the CLEF Association, Grenoble, France. Duration: 9 Sept 2024 → 12 Sept 2024. Conference number: 15 |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Volume | 14959 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Fields of Science
- 6121 Languages
- 113 Computer and information sciences
- Evaluation
- Generative language models
- LLM
- Self-assessed quality
- Shared task