It Is Not Easy To Detect Paraphrases: Analysing Semantic Similarity With Antonyms and Negation Using the New SemAntoNeg Benchmark

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Scientific > peer-review

Abstract

We investigate to what extent a hundred publicly available, popular neural language models capture meaning systematically. Sentence embeddings obtained from pretrained or fine-tuned language models can be used to perform particular tasks, such as paraphrase detection, semantic textual similarity assessment or natural language inference. Common to all of these tasks is that paraphrastic sentences, that is, sentences that carry (nearly) the same meaning, should have (nearly) the same embeddings regardless of surface form. We demonstrate that performance varies greatly across different language models when a specific type of meaning-preserving transformation is applied: two sentences should be identified as paraphrastic if one of them contains a negated antonym in relation to the other one, such as “I am not guilty” versus “I am innocent”. We introduce and release SemAntoNeg, a new test suite containing 3152 entries for probing paraphrasticity in sentences incorporating negation and antonyms. Among other things, we show that language models fine-tuned for natural language inference outperform other types of models, especially the ones fine-tuned to produce general-purpose sentence embeddings, on the test suite. Furthermore, we show that most models designed explicitly for paraphrasing are rather mediocre in our task.
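The idea that paraphrastic sentences should have nearly the same embeddings can be illustrated with a small sketch. It is not the authors' evaluation code: the candidate-selection setup, the function names `pick_paraphrase` and `toy_embed`, and the hand-made three-dimensional vectors are all assumptions made for illustration, and the vectors are not the output of any real language model. In practice `embed` would wrap whichever sentence-embedding model is being probed.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pick_paraphrase(anchor, candidates, embed):
    """Return the candidate whose embedding is closest to the anchor's.

    `embed` is any callable mapping a sentence to a vector (hypothetical
    interface; a real model wrapper would go here).
    """
    anchor_vec = embed(anchor)
    return max(candidates, key=lambda s: cosine_similarity(anchor_vec, embed(s)))


# Hand-made toy vectors (an assumption for illustration only).
TOY_EMBEDDINGS = {
    "I am not guilty": [0.9, 0.1, 0.2],
    "I am innocent": [0.8, 0.2, 0.1],
    "I am guilty": [-0.7, 0.3, 0.2],
    "I am not innocent": [-0.6, 0.2, 0.3],
}


def toy_embed(sentence):
    """Look up a toy vector; stands in for a real sentence encoder."""
    return TOY_EMBEDDINGS[sentence]


best = pick_paraphrase(
    "I am not guilty",
    ["I am innocent", "I am guilty", "I am not innocent"],
    toy_embed,
)
print(best)
```

A model that captures the negated-antonym relation would place “I am innocent” closest to “I am not guilty”, as the toy vectors do here. A model that relies on surface overlap might instead prefer “I am guilty”, which shares more words with the anchor.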
Original language: English
Title of host publication: Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
Editors: Jasmijn Bastings, Yonatan Belinkov, Yanai Elazar, Dieuwke Hupkes, Naomi Saphra, Sarah Wiegreffe
Number of pages: 14
Place of Publication: Stroudsburg
Publisher: The Association for Computational Linguistics
Publication date: 8 Dec 2022
Pages: 249–262
ISBN (Electronic): 978-1-959429-05-0
Publication status: Published - 8 Dec 2022
MoE publication type: A4 Article in conference proceedings
Event: BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP - [Hybrid event], Abu Dhabi, United Arab Emirates
Duration: 7 Dec 2022 → 7 Dec 2022
Conference number: 5

Fields of Science

  • 6121 Languages
  • 113 Computer and information sciences
