Supervised Classification Using Balanced Training

Mian Du, Matthew Pierce, Lidia Pivovarova, Roman Yangarber

Research output: Chapter in book/report/conference proceedings › Conference article › Scientific › Peer-reviewed

Abstract

We examine supervised learning for multi-class, multi-label text
classification. We are interested in exploring classification in a
real-world setting, where the distribution of labels may change
dynamically over time. First, we compare the performance of an array of
binary classifiers trained on the label distribution found in the
original corpus against classifiers trained on balanced data, where
we try to make the label distribution as nearly uniform as possible. We
discuss the performance trade-offs between balanced vs. unbalanced
training, and highlight the advantages of balancing the training set.
Second, we compare the performance of two classifiers, Naive Bayes and
SVM, with several feature-selection methods, using balanced training. We
combine a Named-Entity-based rote classifier with the statistical
classifiers to obtain better performance than either method alone.
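
The abstract does not say how the balanced training set is constructed, so the following is only a minimal sketch of one possible approach, not the authors' procedure. It assumes greedy downsampling with a per-label ceiling; the function name `balance_by_label` and the parameter `cap` are hypothetical and introduced purely for illustration.

```python
import random
from collections import Counter


def balance_by_label(docs, cap, seed=0):
    """Greedily pick a subset of docs in which no label occurs more than `cap` times.

    docs: list of (doc_id, labels) pairs, where labels is a set of strings.
    cap:  hypothetical per-label ceiling (an assumption, not a value from the paper).
    Returns (kept_docs, label_counts).
    """
    rng = random.Random(seed)
    order = list(docs)
    rng.shuffle(order)  # random tie-breaking among equally rare documents

    # Corpus-wide label frequencies, used only to decide the visiting order.
    freq = Counter(label for _, labels in docs for label in labels)

    # Visit documents carrying rare labels first (the stable sort keeps the
    # shuffled order within ties), so frequent labels cannot use up room first.
    order.sort(key=lambda doc: min((freq[l] for l in doc[1]), default=0))

    kept, counts = [], Counter()
    for doc_id, labels in order:
        # Keep a document only if none of its labels has reached the cap;
        # documents without any label are dropped.
        if labels and all(counts[l] < cap for l in labels):
            kept.append((doc_id, labels))
            counts.update(labels)
    return kept, counts
```

Under this assumption, each binary classifier in the array would be trained on the returned subset, in which frequent labels are capped at the same count as rarer ones. Because documents are multi-label, the result is only approximately uniform, which matches the abstract's wording "as nearly uniform as possible".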

Original language: English
Title of host publication: Unknown host publication
Number of pages: 12
Publisher: Springer-Verlag
Publication date: October 2014
Publication status: Published - October 2014
Ministry of Education and Culture (OKM) publication type: A4 Article in conference proceedings
Event: International Conference on Statistical Language and Speech Processing (SLSP 2014) - Grenoble, France
Duration: 14 October 2014 to 16 October 2014
Conference number: 2

Publication series

Name: Lecture notes in artificial intelligence
Number: 8791

Fields of science

  • 113 Computer and information sciences

Cite this

Du, M., Pierce, M., Pivovarova, L., & Yangarber, R. (2014). Supervised Classification Using Balanced Training. In Unknown host publication (Lecture notes in artificial intelligence; No. 8791). Springer-Verlag.
@inproceedings{b257c30a5b0c40599c2648949710a596,
title = "Supervised Classification Using Balanced Training",
keywords = "113 Computer and information sciences",
author = "Mian Du and Matthew Pierce and Lidia Pivovarova and Roman Yangarber",
year = "2014",
month = "10",
language = "English",
series = "Lecture notes in artificial intelligence",
publisher = "Springer-Verlag",
number = "8791",
booktitle = "Unknown host publication",
address = "Germany",

}
