Binary Tree based Chinese Word Segmentation

Reading time: 3 minute
...

📝 Original Info

  • Title: Binary Tree based Chinese Word Segmentation
  • ArXiv ID: 1305.3981
  • Date: 2013-05-20
  • Authors: Researchers from original ArXiv paper

📝 Abstract

Chinese word segmentation is a fundamental task for Chinese language processing. The granularity mismatch problem is the main cause of the errors. This paper showed that the binary tree representation can store outputs with different granularity. A binary tree based framework is also designed to overcome the granularity mismatch problem. There are two steps in this framework, namely tree building and tree pruning. The tree pruning step is specially designed to focus on the granularity problem. Previous work for Chinese word segmentation such as the sequence tagging can be easily employed in this framework. This framework can also provide quantitative error analysis methods. The experiments showed that after using a more sophisticated tree pruning function for a state-of-the-art conditional random field based baseline, the error reduction can be up to 20%.

💡 Deep Analysis

Deep Dive into Binary Tree based Chinese Word Segmentation.

Chinese word segmentation is a fundamental task for Chinese language processing. The granularity mismatch problem is the main cause of the errors. This paper showed that the binary tree representation can store outputs with different granularity. A binary tree based framework is also designed to overcome the granularity mismatch problem. There are two steps in this framework, namely tree building and tree pruning. The tree pruning step is specially designed to focus on the granularity problem. Previous work for Chinese word segmentation such as the sequence tagging can be easily employed in this framework. This framework can also provide quantitative error analysis methods. The experiments showed that after using a more sophisticated tree pruning function for a state-of-the-art conditional random field based baseline, the error reduction can be up to 20%.

📄 Full Content

Chinese word segmentation is a fundamental task for Chinese language processing. The granularity mismatch problem is the main cause of the errors. This paper showed that the binary tree representation can store outputs with different granularity. A binary tree based framework is also designed to overcome the granularity mismatch problem. There are two steps in this framework, namely tree building and tree pruning. The tree pruning step is specially designed to focus on the granularity problem. Previous work for Chinese word segmentation such as the sequence tagging can be easily employed in this framework. This framework can also provide quantitative error analysis methods. The experiments showed that after using a more sophisticated tree pruning function for a state-of-the-art conditional random field based baseline, the error reduction can be up to 20%.

Reference

This content is AI-processed based on ArXiv data.

Start searching

Enter keywords to search articles

↑↓
ESC
⌘K Shortcut