Abstract
Automatic readability assessment is considered a challenging task in NLP due to its high degree
of subjectivity. The majority of prior work on assessing readability has focused on identifying the
level of education necessary for comprehension, without considering text quality, i.e., how
naturally the text flows from the perspective of a native speaker. In this thesis, we therefore aim
to use language models, trained on well-written prose, to measure not only text readability in terms
of comprehension but also text quality.
We develop two word-level metrics based on the concordance of article text with
predictions made by language models to assess text readability and quality. We evaluate both
metrics on a set of corpora used for readability assessment or automated essay scoring (AES) by
measuring the correlation between the scores assigned by our metrics and those assigned by human
raters. According to the experimental results, our metrics correlate strongly with text quality,
achieving correlations of 0.4-0.6 on 7 out of 9 datasets. We demonstrate that GPT-2 surpasses other
language models, including a bigram model, an LSTM, and a bidirectional LSTM, at estimating
text quality in a zero-shot setting, and that a GPT-2 perplexity-based measure is a reasonable
indicator for text quality evaluation.
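As a concrete illustration of the perplexity-based measure referred to above, the sketch below computes GPT-2 perplexity for a passage with the Hugging Face transformers library. The `gpt2_perplexity` helper and the sample sentence are illustrative assumptions, not the thesis's own implementation; the intuition is that lower perplexity indicates text closer to the well-written prose the model was trained on.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def gpt2_perplexity(text: str, model, tokenizer) -> float:
    # Hypothetical helper for illustration; not the thesis's exact metric.
    # Encode the passage as a single sequence of GPT-2 token ids.
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean
        # cross-entropy of each token given its left context.
        loss = model(input_ids, labels=input_ids).loss
    # Perplexity is the exponential of the average negative log-likelihood.
    return torch.exp(loss).item()

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

print(gpt2_perplexity("The quick brown fox jumps over the lazy dog.",
                      model, tokenizer))
```

A quality score of this kind can then be compared against human ratings, e.g., by computing the correlation between per-document perplexities and rater scores across a corpus.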
Original language | English
---|---
Status | Published - 12 Feb 2020
OKM publication type | G2 Master's thesis, polytechnic Master's thesis

Fields of Science
- 113 Computer and information sciences