Analysis of Textual Variation by Latent Tree Structures

Research output: Chapter in book/report/conference proceeding; Conference contribution; Scientific; Peer-reviewed

Abstract

We introduce Semstem, a new method for the reconstruction of so-called stemmatic trees, i.e., trees encoding the copying relationships among a set of textual variants. Our method is based on a structural expectation-maximization (structural EM) algorithm. It is the first computer-based method able to estimate general latent tree structures, unlike earlier methods, which are usually restricted to bifurcating trees where all the extant texts are placed in the leaf nodes. We present experiments on two well-known benchmark data sets, showing that the new method outperforms the current state of the art in terms of both a numerical score and interpretability.
Original language: English
Title of host publication: 2011 IEEE 11th International Conference on Data Mining (ICDM 2011)
Editors: Diane Cook, Jian Pei, Wei Wang, Osmar Zaïane, Xindong Wu
Number of pages: 10
Publisher: IEEE Computer Society
Publication date: 11 Dec 2011
Pages: 567-576
ISBN (print): 9781457720758
Status: Published - 11 Dec 2011
MoE publication type: A4 Article in conference proceedings
Event: Unknown host publication, Canada
Duration: 1 Jan 1800 → …

Fields of science

  • 113 Computer and information sciences
