ELOQUENT 2024 — Robustness Task

Magnus Sahlgren, Jussi Jerker Karlgren, Luise Dürlich, Evangelia Gogoulou, Aarne Talman, Shorouq Zahra

Research output: Chapter in book/report/conference proceeding › Conference contribution › Scientific › Peer-reviewed

Abstract

ELOQUENT is a set of shared tasks for evaluating the quality and usefulness of generative language models. ELOQUENT aims to apply high-level quality criteria, grounded in experiences from deploying models in real-life tasks, and to formulate tests for those criteria, preferably implemented to require minimal human assessment effort and in a multilingual setting. One of the tasks in the first year of ELOQUENT was the robustness task, in which we assessed the robustness and consistency of model output given variation in the input prompts. We found that consistency did indeed vary, both across prompt items and across models. On a methodological note, we found that using an oracle model to assess the submitted responses is feasible, and we intend to investigate how consistent such assessments are across different oracle models. We intend to run this task in coming editions of ELOQUENT to establish a solid methodology for further assessing consistency, which we believe to be a crucial component of trustworthiness as a top-level quality characteristic of generative language models.

Original language: English
Title of host publication: Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024)
Editors: Guglielmo Faggioli, Nicola Ferro, Petra Galuščáková, Alba García Seco de Herrera
Number of pages: 5
Place of publication: Aachen
Publisher: CEUR-WS.org
Publication date: 2024
Pages: 703-707
Status: Published - 2024
MoE publication type: A4 Article in conference proceedings
Event: Conference and Labs of the Evaluation Forum - Grenoble, France
Duration: 9 Sep 2024 to 12 Sep 2024
Conference number: 15

Publication series

Name: CEUR Workshop Proceedings
Publisher: CEUR-WS.org
Volume: 3740
ISSN (print): 1613-0073

Bibliographical note

Publisher Copyright:
© 2024 Copyright for this paper by its authors.

Fields of science

  • 6121 Languages
  • 113 Computer and information sciences
