Grouping business news stories based on salience of named entities

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

In news aggregation systems focused on broad news domains, certain stories may appear in multiple articles. Depending on the relative importance of the story, the number of versions can reach dozens or hundreds within a day. The text in these versions may be nearly identical or quite different. Linking multiple versions of a story into a single group brings several important benefits to the end-user: reducing the cognitive load on the reader, as well as signaling the relative importance of the story. We present a grouping algorithm, and explore several vector-based representations of input documents: from a baseline using keywords, to a method using salience, a measure of importance of named entities in the text. We demonstrate that features beyond keywords yield substantial improvements, verified on a manually-annotated corpus of business news stories.
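
As a rough illustration of the idea in the abstract, the sketch below represents each article as a sparse vector of named entities weighted by a salience score, then groups articles by cosine similarity. It is not the algorithm from the paper: the greedy single-pass grouping, the `threshold` value, the function names, and the entity weights are all assumptions made for this example.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_documents(vectors, threshold=0.5):
    """Greedy single-pass grouping (an assumed strategy, not the paper's).

    Each document joins the first group whose first member is at least
    `threshold` similar to it; otherwise it starts a new group.
    """
    groups = []  # each group is a list of document indices
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Hypothetical salience-weighted entity vectors for three articles.
docs = [
    {"Acme Corp": 0.9, "Jane Doe": 0.4},
    {"Acme Corp": 0.8, "Jane Doe": 0.5, "London": 0.1},
    {"Globex": 0.9, "London": 0.3},
]
groups = group_documents(docs)
print(groups)
```

A keyword baseline could reuse the same functions by swapping the entity weights for term weights such as TF-IDF; only the vector construction would change.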
Original language: English
Title of host publication: 15th Conference of the European Chapter of the Association for Computational Linguistics: Proceedings of Conference, Volume 1: Long Papers
Number of pages: 11
Place of Publication: Stroudsburg, PA
Publisher: The Association for Computational Linguistics
Publication date: 2017
Pages: 1096-1106
ISBN (Electronic): 978-1-945626-34-0
Publication status: Published - 2017
MoE publication type: A4 Article in conference proceedings
Event: Conference of the European Chapter of the Association for Computational Linguistics - Valencia, Spain
Duration: 3 Apr 2017 to 7 Apr 2017
Conference number: 15

Fields of Science

  • 113 Computer and information sciences
