Abstract
We examine supervised learning for multi-class, multi-label text
classification. We are interested in exploring classification in a
real-world setting, where the distribution of labels may change
dynamically over time. First, we compare the performance of an array of
binary classifiers trained on the label distribution found in the
original corpus against classifiers trained on balanced data, where
we try to make the label distribution as nearly uniform as possible. We
discuss the performance trade-offs between balanced and unbalanced
training, and highlight the advantages of balancing the training set.
Second, we compare the performance of two classifiers, Naive Bayes and
SVM, with several feature-selection methods, using balanced training. We
combine a Named-Entity-based rote classifier with the statistical
classifiers to obtain better performance than either method alone.
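
The balancing step can be illustrated with a minimal sketch. The abstract does not say how the balanced training sets were built, so this assumes the simplest option: one binary (one-vs-rest) training set per label, with the majority class randomly downsampled to the size of the minority class. The function names `balance_binary` and `per_label_training_sets` and the toy corpus layout are hypothetical, chosen only for illustration.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple


def balance_binary(
    docs: Sequence[str], labels: Sequence[bool], seed: int = 0
) -> Tuple[List[str], List[bool]]:
    """Downsample the larger class so positives and negatives are equal in number.

    Assumption: balancing by random downsampling; the paper may use another scheme.
    """
    rng = random.Random(seed)
    pos = [d for d, y in zip(docs, labels) if y]
    neg = [d for d, y in zip(docs, labels) if not y]
    n = min(len(pos), len(neg))
    pairs = [(d, True) for d in rng.sample(pos, n)]
    pairs += [(d, False) for d in rng.sample(neg, n)]
    rng.shuffle(pairs)  # avoid all positives preceding all negatives
    return [d for d, _ in pairs], [y for _, y in pairs]


def per_label_training_sets(
    corpus: Sequence[Tuple[str, Set[str]]], seed: int = 0
) -> Dict[str, Tuple[List[str], List[bool]]]:
    """Build one balanced binary training set per label (one-vs-rest)."""
    texts = [text for text, _ in corpus]
    all_labels = sorted({label for _, doc_labels in corpus for label in doc_labels})
    return {
        label: balance_binary(texts, [label in ls for _, ls in corpus], seed)
        for label in all_labels
    }


# Toy multi-label corpus: each document carries a set of labels.
corpus = [
    ("a", {"x"}), ("b", {"x", "y"}), ("c", set()),
    ("d", {"y"}), ("e", {"x"}), ("f", {"x"}),
]
training_sets = per_label_training_sets(corpus)
```

Each entry of `training_sets` could then be passed to any binary learner, such as Naive Bayes or an SVM. Downsampling discards data from the frequent class, which is one side of the trade-off between balanced and unbalanced training that the abstract mentions.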
Original language | English |
---|---|
Title | Unknown host publication |
Number of pages | 12 |
Publisher | Springer-Verlag |
Publication date | Oct 2014 |
Status | Published - Oct 2014 |
MoE publication type | A4 Article in conference proceedings |
Event | International Conference on Statistical Language and Speech Processing (SLSP 2014) - Grenoble, France Duration: 14 Oct 2014 → 16 Oct 2014 Conference number: 2 |
Publication series
Name | Lecture notes in artificial intelligence |
---|---|
Number | 8791 |
Fields of Science
- 113 Computer and information sciences
- PULS
  Yangarber, R. (Project manager), Du, M. (Participant), Pivovarova, L. (Participant), Pierce, M. (Participant), von Etter, P. (Participant) & Huttunen, S. (Participant)
  01/12/2007 → …
  Project: Research project
- LLL: Language Learning Lab
  Yangarber, R. (Project manager), Katinskaia, A. (Participant), Hou, J. (Participant), Furlan, G. (Participant) & Kylliäinen, I. P. (Participant)
  Project: Research project