Conference on Empirical Methods in Natural Language Processing / 2019

Attention Is All You Need for Chinese Word Segmentation

Sufeng Duan, Hai Zhao

ML Systems, Popular and Landmark Papers

Taking the greedy decoding algorithm as given, this work focuses on further strengthening the model itself for Chinese word segmentation (CWS), which results in a faster and more accurate CWS model. Our model consists of an attention-only stacked encoder and a lightweight decoder for greedy segmentation, plus two highway connections for smoother training. The encoder is composed of a newly proposed Transformer variant, the Gaussian-masked Directional (GD) Transformer, and a biaffine attention scorer. Thanks to this encoder design, our model needs only unigram features for scoring. Our model is evaluated on the SIGHAN Bakeoff benchmark datasets. The experimental results show that, with the highest segmentation speed, the proposed model achieves new state-of-the-art or comparable performance against strong baselines under the strict closed test setting.
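
The abstract names a biaffine scorer and greedy segmentation but gives no formulas, so the following is only a rough sketch of how the two could fit together, not the authors' implementation. It assumes each gap between adjacent characters gets one biaffine score from a left-side vector and a right-side vector, and that a positive score means "cut here". The names `biaffine_score`, `greedy_segment`, `left_vecs`, and `right_vecs`, the zero threshold, and the toy numbers are all hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def biaffine_score(left: Vector, right: Vector,
                   W: Matrix, u: Vector, bias: float) -> float:
    """Generic biaffine form: left^T W right + u . [left; right] + bias."""
    bilinear = sum(left[i] * W[i][j] * right[j]
                   for i in range(len(left))
                   for j in range(len(right)))
    linear = sum(w * x for w, x in zip(u, list(left) + list(right)))
    return bilinear + linear + bias


def greedy_segment(chars: str,
                   left_vecs: Sequence[Vector],
                   right_vecs: Sequence[Vector],
                   W: Matrix, u: Vector, bias: float) -> List[str]:
    """Decide every gap independently in one left-to-right pass.

    Assumption: a score above zero marks a word boundary. No search or
    dynamic programming is involved, which is what makes it greedy.
    """
    if not chars:
        return []
    words: List[str] = []
    start = 0
    for k in range(len(chars) - 1):
        # Gap k sits between character k and character k + 1.
        if biaffine_score(left_vecs[k], right_vecs[k + 1], W, u, bias) > 0.0:
            words.append(chars[start:k + 1])
            start = k + 1
    words.append(chars[start:])
    return words


if __name__ == "__main__":
    # Toy one-dimensional vectors chosen by hand, not learned features.
    left_vecs = [[1.0], [1.0], [1.0], [1.0]]
    right_vecs = [[0.0], [2.0], [1.0], [-3.0]]
    W, u, bias = [[1.0]], [0.0, 0.0], 0.0
    print(greedy_segment("我爱北京", left_vecs, right_vecs, W, u, bias))
    # prints ['我', '爱', '北京']
```

Each gap costs one score evaluation, so the pass is linear in sentence length. In a real model the per-character vectors would come from the encoder; here they are fixed so the result can be checked by hand.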

40 citations, 2 influential

Discuss in ChatGPT

Uses your own ChatGPT account. The paper context is copied into a tutor prompt before ChatGPT opens.

Preview prompt

You are my AI/ML research paper instructor. I want to deeply understand the paper below.

First, teach it in layers:
1. One-paragraph intuition.
2. Problem statement and why it mattered.
3. Key method, architecture, or algorithm.
4. Important equations or mechanisms, explained intuitively.
5. Experiments and evidence.
6. Limitations, assumptions, and failure modes.
7. How this paper influenced later AI/ML/Deep Learning/GenAI work.
8. A 30-minute study plan with checkpoints.
9. Quiz me with 5 questions and wait for my answers.

When something is not available in the attached context, say what is missing and infer carefully.

### Paper attached as context
Title: Attention Is All You Need for Chinese Word Segmentation
Authors: Sufeng Duan, Hai Zhao
Year: 2019
Venue: Conference on Empirical Methods in Natural Language Processing
Categories: ML Systems, Popular and Landmark Papers
Citations: 40
Paper URL: https://arxiv.org/abs/1910.14537v3
Open PDF: https://arxiv.org/pdf/1910.14537v3

Learning resources:
- PDF: arXiv PDF (https://arxiv.org/pdf/1910.14537v3)
- arXiv: arXiv abstract page (https://arxiv.org/abs/1910.14537v3)
- Google Scholar: Google Scholar references (https://scholar.google.com/scholar?q=Attention%20Is%20All%20You%20Need%20for%20Chinese%20Word%20Segmentation)
- Papers with Code: Papers with Code search (https://paperswithcode.com/search?q=Attention%20Is%20All%20You%20Need%20for%20Chinese%20Word%20Segmentation)
- Semantic Scholar: Semantic Scholar paper page (https://www.semanticscholar.org/paper/d4f68b2c033a79fc02f30d8cffb6cbc532cdbd51)
- YouTube: YouTube explanations (https://www.youtube.com/results?search_query=Attention%20Is%20All%20You%20Need%20for%20Chinese%20Word%20Segmentation+paper+explained)