Grounded and well-rounded: a methodological approach to the study of cross-modal and cross-lingual grounding

Timothee Mickus, Elaine Zosa, Denis Paperno

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

Abstract

Grounding has been argued to be a crucial component towards the development of more complete and truly semantically competent artificial intelligence systems. The literature has divided into two camps: while some argue that grounding allows for qualitatively different generalizations, others believe it can be compensated for by mono-modal data quantity. Limited empirical evidence has emerged for or against either position, which we argue is due to the methodological challenges that come with studying grounding and its effects on NLP systems. In this paper, we establish a methodological framework for studying what the effects are, if any, of providing models with richer input sources than text only. The crux of it lies in the construction of comparable samples of populations of models trained on different input modalities, so that we can tease apart the qualitative effects of different input sources from quantifiable model performances. Experiments using this framework reveal qualitative differences in model behavior between cross-modally grounded, cross-lingually grounded, and ungrounded models, which we measure both at a global dataset level and for specific word representations, depending on how concrete their semantics is.
Original language: English
Title of host publication: Findings of the Association for Computational Linguistics: EMNLP 2023
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Number of pages: 12
Place of publication: Kerrville
Publisher: The Association for Computational Linguistics
Publication date: 1 Dec 2023
Pages: 11031-11042
ISBN (Electronic): 979-8-89176-061-5
Publication status: Published, 1 Dec 2023
MoE publication type: A4 Article in conference proceedings
Event: Conference on Empirical Methods in Natural Language Processing, Singapore
Duration: 6 Dec 2023 to 10 Dec 2023
https://2023.emnlp.org

Fields of Science

  • 6121 Languages
  • 113 Computer and information sciences
