Analysis of Textual Variation by Latent Tree Structures

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

We introduce Semstem, a new method for the reconstruction of so-called stemmatic trees, i.e., trees encoding the copying relationships among a set of textual variants. Our method is based on a structural expectation-maximization (structural EM) algorithm. It is the first computer-based method able to estimate general latent tree structures, unlike earlier methods that are usually restricted to bifurcating trees where all the extant texts are placed in the leaf nodes. We present experiments on two well-known benchmark data sets, showing that the new method outperforms the current state of the art both in terms of a numerical score and in interpretability.

Original language: English
Title of host publication: 2011 IEEE 11th International Conference on Data Mining (ICDM 2011)
Editors: Diane Cook, Jian Pei, Wei Wang, Osmar Zaïane, Xindong Wu
Number of pages: 10
Publisher: IEEE Computer Society
Publication date: 11 Dec 2011
Pages: 567-576
ISBN (Print): 9781457720758
Publication status: Published - 11 Dec 2011
MoE publication type: A4 Article in conference proceedings
Event: Unknown host publication - , Canada
Duration: 1 Jan 1800 → …

Fields of Science

  • 113 Computer and information sciences
