Grouping business news stories based on salience of named entities

Research output: Article in book/report/conference proceedings › Conference article › Scientific › Peer-reviewed

Description

In news aggregation systems focused on broad news domains, certain stories may appear in multiple articles. Depending on the relative importance of the story, the number of versions can reach dozens or hundreds within a day. The text in these versions may be nearly identical or quite different. Linking multiple versions of a story into a single group brings several important benefits to the end-user: reducing the cognitive load on the reader, as well as signaling the relative importance of the story. We present a grouping algorithm, and explore several vector-based representations of input documents: from a baseline using keywords, to a method using salience, a measure of importance of named entities in the text. We demonstrate that features beyond keywords yield substantial improvements, verified on a manually-annotated corpus of business news stories.
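
The abstract does not spell out the grouping algorithm or the exact form of the salience vectors, so the following is only an illustrative sketch of the general idea: documents are represented as sparse vectors (keyword counts for a baseline, or entity weights standing in for salience) and grouped by cosine similarity. The function names, the greedy single-pass strategy, the 0.5 threshold, and the example weights are all assumptions made for illustration, not the method described in the paper.

```python
import math
from collections import Counter


def keyword_vector(text):
    """Keyword baseline (assumed form): lowercase token -> count."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_documents(vectors, threshold=0.5):
    """Greedy grouping (hypothetical strategy, not the paper's algorithm).

    Each document joins the first group whose first member is at least
    `threshold` similar to it; otherwise it starts a new group.
    Returns a list of groups, each a list of document indices.
    """
    groups = []
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Hypothetical entity-weight vectors: entity name -> made-up salience score.
salience_docs = [
    {"Nokia": 0.9, "Microsoft": 0.6},
    {"Nokia": 0.8, "Microsoft": 0.7, "Reuters": 0.1},
    {"Tesla": 1.0},
]
groups = group_documents(salience_docs)
print(groups)
```

With these made-up weights, the first two documents share their dominant entities and land in one group, while the third forms its own. Swapping `keyword_vector` output in for the entity dicts gives the keyword-only variant of the same sketch.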
Original language: English
Title of host publication: 15th Conference of the European Chapter of the Association for Computational Linguistics: Proceedings of Conference, Volume 1: Long Papers
Number of pages: 11
Place of publication: Stroudsburg, PA
Publisher: Association for Computational Linguistics
Publication date: 2017
Pages: 1096-1106
ISBN (electronic): 978-1-945626-34-0
DOI (permanent link): https://doi.org/10.18653/v1/e17-1103
Status: Published - 2017
Ministry of Education and Culture publication type: A4 Article in conference proceedings
Event: Conference of the European Chapter of the Association for Computational Linguistics - Valencia, Spain
Duration: 3 April 2017 to 7 April 2017
Conference number: 15

Fields of science

  • 113 Computer and information sciences

Cite this

Escoter, L., Pivovarova, L., Du, M., Katinskaia, A., & Yangarber, R. (2017). Grouping business news stories based on salience of named entities. In 15th Conference of the European Chapter of the Association for Computational Linguistics: Proceedings of Conference, Volume 1: Long Papers (pp. 1096-1106). Stroudsburg, PA: Association for Computational Linguistics. https://doi.org/10.18653/v1/e17-1103
Escoter, Llorenc ; Pivovarova, Lidia ; Du, Mian ; Katinskaia, Anisia ; Yangarber, Roman. / Grouping business news stories based on salience of named entities. 15th Conference of the European Chapter of the Association for Computational Linguistics: Proceedings of Conference, Volume 1: Long Papers. Stroudsburg, PA : Association for Computational Linguistics, 2017. pp. 1096-1106
@inproceedings{5a461571388346838dd4c33028b0d79f,
title = "Grouping business news stories based on salience of named entities",
abstract = "In news aggregation systems focused on broad news domains, certain stories may appear in multiple articles. Depending on the relative importance of the story, the number of versions can reach dozens or hundreds within a day. The text in these versions may be nearly identical or quite different. Linking multiple versions of a story into a single group brings several important benefits to the end-user: reducing the cognitive load on the reader, as well as signaling the relative importance of the story. We present a grouping algorithm, and explore several vector-based representations of input documents: from a baseline using keywords, to a method using salience, a measure of importance of named entities in the text. We demonstrate that features beyond keywords yield substantial improvements, verified on a manually-annotated corpus of business news stories.",
keywords = "113 Computer and information sciences",
author = "Llorenc Escoter and Lidia Pivovarova and Mian Du and Anisia Katinskaia and Roman Yangarber",
note = "Volume: Proceeding volume:",
year = "2017",
doi = "10.18653/v1/e17-1103",
language = "English",
pages = "1096--1106",
booktitle = "15th Conference of the European Chapter of the Association for Computational Linguistics",
publisher = "Association for Computational Linguistics",
address = "International",

}

Escoter, L, Pivovarova, L, Du, M, Katinskaia, A & Yangarber, R 2017, Grouping business news stories based on salience of named entities. in 15th Conference of the European Chapter of the Association for Computational Linguistics: Proceedings of Conference, Volume 1: Long Papers. Association for Computational Linguistics, Stroudsburg, PA, pp. 1096-1106, Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain, 03/04/2017. https://doi.org/10.18653/v1/e17-1103

Grouping business news stories based on salience of named entities. / Escoter, Llorenc; Pivovarova, Lidia; Du, Mian; Katinskaia, Anisia; Yangarber, Roman.

15th Conference of the European Chapter of the Association for Computational Linguistics: Proceedings of Conference, Volume 1: Long Papers. Stroudsburg, PA : Association for Computational Linguistics, 2017. pp. 1096-1106.

TY - GEN

T1 - Grouping business news stories based on salience of named entities

AU - Escoter, Llorenc

AU - Pivovarova, Lidia

AU - Du, Mian

AU - Katinskaia, Anisia

AU - Yangarber, Roman

N1 - Volume: Proceeding volume:

PY - 2017

Y1 - 2017

N2 - In news aggregation systems focused on broad news domains, certain stories may appear in multiple articles. Depending on the relative importance of the story, the number of versions can reach dozens or hundreds within a day. The text in these versions may be nearly identical or quite different. Linking multiple versions of a story into a single group brings several important benefits to the end-user: reducing the cognitive load on the reader, as well as signaling the relative importance of the story. We present a grouping algorithm, and explore several vector-based representations of input documents: from a baseline using keywords, to a method using salience, a measure of importance of named entities in the text. We demonstrate that features beyond keywords yield substantial improvements, verified on a manually-annotated corpus of business news stories.

AB - In news aggregation systems focused on broad news domains, certain stories may appear in multiple articles. Depending on the relative importance of the story, the number of versions can reach dozens or hundreds within a day. The text in these versions may be nearly identical or quite different. Linking multiple versions of a story into a single group brings several important benefits to the end-user: reducing the cognitive load on the reader, as well as signaling the relative importance of the story. We present a grouping algorithm, and explore several vector-based representations of input documents: from a baseline using keywords, to a method using salience, a measure of importance of named entities in the text. We demonstrate that features beyond keywords yield substantial improvements, verified on a manually-annotated corpus of business news stories.

KW - 113 Computer and information sciences

UR - http://eacl2017.org

U2 - 10.18653/v1/e17-1103

DO - 10.18653/v1/e17-1103

M3 - Conference contribution

SP - 1096

EP - 1106

BT - 15th Conference of the European Chapter of the Association for Computational Linguistics

PB - Association for Computational Linguistics

CY - Stroudsburg, PA

ER -

Escoter L, Pivovarova L, Du M, Katinskaia A, Yangarber R. Grouping business news stories based on salience of named entities. In 15th Conference of the European Chapter of the Association for Computational Linguistics: Proceedings of Conference, Volume 1: Long Papers. Stroudsburg, PA: Association for Computational Linguistics. 2017. p. 1096-1106. https://doi.org/10.18653/v1/e17-1103