Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF

Yan Shao, Christian Hardmeier, Jörg Tiedemann, Joakim Nivre

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

We present a character-based model for joint segmentation and POS tagging of Chinese. The bidirectional RNN-CRF architecture for general sequence tagging is adapted and applied with novel vector representations of Chinese characters that capture rich contextual information and lower-than-character-level features. The proposed model is extensively evaluated on CTB5, CTB9 and UD Chinese and compared with a state-of-the-art tagger on each dataset. The experimental results indicate that our model is accurate and robust across datasets of different sizes, genres and annotation schemes. We obtain state-of-the-art performance on CTB5, achieving an F1-score of 94.38 for joint segmentation and POS tagging.
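
In a character-based joint model, the tagger (here a bidirectional RNN-CRF) predicts one label per character, and each label encodes both a word boundary and a POS tag, so segmentation and tagging are solved in a single pass. The Python sketch below assumes the common scheme in which a boundary label (B, I, E, S for begin, inside, end and single-character word) is joined with a POS tag, e.g. `B-NN`; the exact tag set used in the paper is not given in this record. It also assumes the usual joint evaluation, in which a predicted word counts as correct only if both its character span and its POS tag match the gold standard. The function names (`encode`, `decode`, `word_spans`, `joint_f1`) and the example sentence are hypothetical and purely illustrative.

```python
"""Hypothetical sketch: joint segmentation + POS tagging as character tagging."""


def encode(words):
    """Turn (word, pos) pairs into characters and per-character joint tags.

    Boundary labels (assumed scheme): B = word begin, I = inside,
    E = word end, S = single-character word. Each is joined with the POS tag.
    """
    chars, tags = [], []
    for word, pos in words:
        for i, ch in enumerate(word):
            if len(word) == 1:
                boundary = "S"
            elif i == 0:
                boundary = "B"
            elif i == len(word) - 1:
                boundary = "E"
            else:
                boundary = "I"
            chars.append(ch)
            tags.append(f"{boundary}-{pos}")
    return chars, tags


def decode(chars, tags):
    """Rebuild (word, pos) pairs from characters and joint tags.

    Predicted tag sequences may be ill-formed, so a new word is started at
    any B or S tag (or when no word is open), and each word takes the POS
    of its first character.
    """
    words, buf, pos = [], [], None
    for ch, tag in zip(chars, tags):
        boundary, tag_pos = tag.split("-", 1)
        if boundary in ("B", "S") or not buf:
            if buf:  # previous word was never closed with E; flush it
                words.append(("".join(buf), pos))
            buf, pos = [], tag_pos
        buf.append(ch)
        if boundary in ("E", "S"):  # word ends at this character
            words.append(("".join(buf), pos))
            buf, pos = [], None
    if buf:  # flush an unterminated final word
        words.append(("".join(buf), pos))
    return words


def word_spans(words):
    """Represent a tagged sentence as a set of (start, end, pos) spans."""
    spans, start = set(), 0
    for word, pos in words:
        spans.add((start, start + len(word), pos))
        start += len(word)
    return spans


def joint_f1(gold_sents, pred_sents):
    """Micro-averaged F1 where a word is correct only if span AND POS match."""
    correct = n_gold = n_pred = 0
    for gold, pred in zip(gold_sents, pred_sents):
        g, p = word_spans(gold), word_spans(pred)
        correct += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if correct == 0:
        return 0.0
    precision, recall = correct / n_pred, correct / n_gold
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative CTB-style example, not taken from the paper.
    gold = [("我", "PN"), ("喜欢", "VV"), ("音乐", "NN")]
    chars, tags = encode(gold)
    print(tags)  # ['S-PN', 'B-VV', 'E-VV', 'B-NN', 'E-NN']
    assert decode(chars, tags) == gold
    # A prediction that wrongly splits 音乐 into two single-character words.
    pred = [("我", "PN"), ("喜欢", "VV"), ("音", "NN"), ("乐", "NN")]
    print(round(joint_f1([gold], [pred]), 4))  # 0.5714 (P = 2/4, R = 2/3)
```

Scoring at the span level is why the segmentation error in the example lowers the joint F1 even though both resulting pieces carry the correct tag `NN`.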

Original language: English
Title of host publication: The Eighth International Joint Conference on Natural Language Processing: Proceedings of the Conference, Vol. 1 (Long Papers)
Number of pages: 11
Place of publication: Taipei
Publisher: Asian Federation of Natural Language Processing
Publication date: 1 Nov 2017
Pages: 173-183
ISBN (Electronic): 978-1-948087-00-1
Publication status: Published, 1 Nov 2017
MoE publication type: A4 Article in conference proceedings
Event: International Joint Conference on Natural Language Processing, Taipei, Taiwan
Duration: 27 Nov 2017 to 1 Dec 2017
Conference number: 8
http://ijcnlp2017.org/site/page.aspx?pid=901&sid=1133&lang=en

Fields of Science

  • 6121 Languages
  • language technology
  • computational linguistics
  • 113 Computer and information sciences

Cite this

Shao, Y., Hardmeier, C., Tiedemann, J., & Nivre, J. (2017). Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF. In The Eighth International Joint Conference on Natural Language Processing: Proceedings of the Conference, Vol. 1 (Long Papers) (pp. 173-183). Taipei: Asian Federation of Natural Language Processing.
Shao, Yan; Hardmeier, Christian; Tiedemann, Jörg; Nivre, Joakim. / Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF. The Eighth International Joint Conference on Natural Language Processing: Proceedings of the Conference, Vol. 1 (Long Papers). Taipei: Asian Federation of Natural Language Processing, 2017. pp. 173-183
@inproceedings{5e75d7bf1a9a4957a2c012a3f8011869,
title = "Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF",
abstract = "We present a character-based model for joint segmentation and POS tagging for Chinese. The bidirectional RNN-CRF architecture for general sequence tagging is adapted and applied with novel vector representations of Chinese characters that capture rich contextual information and lower-than-character level features. The proposed model is extensively evaluated and compared with a state-of-the-art tagger respectively on CTB5, CTB9 and UD Chinese. The experimental results indicate that our model is accurate and robust across datasets in different sizes, genres and annotation schemes. We obtain state-of-the-art performance on CTB5, achieving 94.38 F1-score for joint segmentation and POS tagging.",
keywords = "6121 Languages, language technology, computational linguistics, 113 Computer and information sciences",
author = "Yan Shao and Christian Hardmeier and J{\"o}rg Tiedemann and Joakim Nivre",
year = "2017",
month = "11",
day = "1",
language = "English",
pages = "173--183",
booktitle = "The Eighth International Joint Conference on Natural Language Processing: Proceedings of the Conference, Vol. 1 (Long Papers)",
publisher = "Asian Federation of Natural Language Processing",
address = "Taipei, Taiwan",

}

Shao, Y, Hardmeier, C, Tiedemann, J & Nivre, J 2017, Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF. in The Eighth International Joint Conference on Natural Language Processing: Proceedings of the Conference, Vol. 1 (Long Papers). Asian Federation of Natural Language Processing, Taipei, pp. 173-183, International Joint Conference on Natural Language Processing, Taipei, Taiwan, 27/11/2017.


TY  - GEN
T1  - Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF
AU  - Shao, Yan
AU  - Hardmeier, Christian
AU  - Tiedemann, Jörg
AU  - Nivre, Joakim
PY  - 2017/11/1
Y1  - 2017/11/1
N2  - We present a character-based model for joint segmentation and POS tagging for Chinese. The bidirectional RNN-CRF architecture for general sequence tagging is adapted and applied with novel vector representations of Chinese characters that capture rich contextual information and lower-than-character level features. The proposed model is extensively evaluated and compared with a state-of-the-art tagger respectively on CTB5, CTB9 and UD Chinese. The experimental results indicate that our model is accurate and robust across datasets in different sizes, genres and annotation schemes. We obtain state-of-the-art performance on CTB5, achieving 94.38 F1-score for joint segmentation and POS tagging.
AB  - We present a character-based model for joint segmentation and POS tagging for Chinese. The bidirectional RNN-CRF architecture for general sequence tagging is adapted and applied with novel vector representations of Chinese characters that capture rich contextual information and lower-than-character level features. The proposed model is extensively evaluated and compared with a state-of-the-art tagger respectively on CTB5, CTB9 and UD Chinese. The experimental results indicate that our model is accurate and robust across datasets in different sizes, genres and annotation schemes. We obtain state-of-the-art performance on CTB5, achieving 94.38 F1-score for joint segmentation and POS tagging.
KW  - 6121 Languages
KW  - language technology
KW  - computational linguistics
KW  - 113 Computer and information sciences
M3  - Conference contribution
SP  - 173
EP  - 183
BT  - The Eighth International Joint Conference on Natural Language Processing
PB  - Asian Federation of Natural Language Processing
CY  - Taipei
ER  - 

Shao Y, Hardmeier C, Tiedemann J, Nivre J. Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF. In The Eighth International Joint Conference on Natural Language Processing: Proceedings of the Conference, Vol. 1 (Long Papers). Taipei: Asian Federation of Natural Language Processing. 2017. p. 173-183