Overview of ELOQUENT 2024—Shared Tasks for Evaluating Generative Language Model Quality

Jussi Karlgren, Luise Dürlich, Evangelia Gogoulou, Liane Guillou, Joakim Nivre, Magnus Sahlgren, Aarne Talman, Shorouq Zahra

Research output: Chapter in book/report/conference proceeding; Conference contribution; Scientific; Peer-reviewed

Abstract

ELOQUENT is a set of shared tasks for evaluating the quality and usefulness of generative language models. ELOQUENT aims to apply high-level quality criteria, grounded in experiences from deploying models in real-life tasks, and to formulate tests for those criteria, preferably implemented to require minimal human assessment effort and in a multilingual setting. The tasks for the first year of ELOQUENT were (1) Topical quiz, in which language models are probed for topical competence; (2) HalluciGen, in which we assessed the ability of models to generate and detect hallucinations; (3) Robustness, in which we assessed the robustness and consistency of a model's output given variation in the input prompts; and (4) Voight-Kampff, run in partnership with the PAN lab, with the aim of discovering whether it is possible to automatically distinguish human-generated text from machine-generated text. This first year of experimentation has shown, as expected, that using self-assessment with models judging models is feasible, but not entirely straightforward, and that a judicious comparison with human assessment and application context is necessary to be able to trust self-assessed quality judgments.

Original language: English
Title of host publication: Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2024
Editors: Lorraine Goeuriot, Philippe Mulhem, Georges Quénot, Didier Schwab, Giorgio Maria Di Nunzio, Laure Soulier, Petra Galuščáková, Alba García Seco de Herrera, Guglielmo Faggioli, Nicola Ferro
Number of pages: 20
Place of publication: Cham
Publisher: Springer
Publication date: 19 Sep 2024
Pages: 53-72
ISBN (print): 978-3-031-71907-3
ISBN (electronic): 978-3-031-71908-0
DOI
Status: Published - 19 Sep 2024
MoE publication type: A4 Article in conference proceedings
Event: International Conference of the CLEF Association - Grenoble, France
Duration: 9 Sep 2024 to 12 Sep 2024
Conference number: 15

Publication series

Name: Lecture Notes in Computer Science
Volume: 14959
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.

Fields of science

  • 6121 Languages
  • 113 Computer and information sciences
