This paper presents a method for designing a specific high-order dependency factor on the linear-chain conditional random field (CRF) for named entity recognition (NER). Named entities tend to be separated from each other by multiple outside tokens in a text, and thus the first-order CRF, as well as the second-order CRF, may inherently lose transition information between distant named entities. The proposed design uses the outside label in NER as a transmission medium for preceding entity information on the CRF. Empirical results demonstrate that long-distance label dependency can be exploited within the original first-order linear-chain CRF structure for NER, at a lower computational cost than the second-order CRF.
The concept of conditional random fields (CRFs) (Lafferty, McCallum, & Pereira, 2001) has been successfully adapted to many sequence labeling problems (McCallum & Li, 2003; Sha & Pereira, 2003; Lafferty et al., 2001; McDonald & Pereira, 2005). Even in deep-learning architectures, the CRF has been used as a fundamental element in named entity recognition (Lample, Ballesteros, Subramanian, Kawakami, & Dyer, 2016; Liu, Tang, Wang, & Chen, 2017).
One of the primary advantages of applying the CRF to language processing is that it learns transition factors between hidden variables corresponding to the labels of individual words. The fundamental assumption of the model is that the current hidden state is conditioned on the present observation as well as the previous state. For example, a part-of-speech (POS) tag depends on the word itself, as well as on the POS tag transition from the previous word. In this problem, the POS tags are adjacent to each other in a text, forming a tag sequence; therefore, the sequence labeling model can fully capture dependencies between labels.
In contrast, a CRF in named entity recognition (NER) cannot fully capture dependencies between named entity (NE) labels. According to Ratinov & Roth (2009), named entities in a text are separated by successive “outside tokens” (i.e., words that are non-named entities syntactically linking two NEs), and a considerable number of NEs tend to occur at a distance from each other. Therefore, high-order interdependencies between named entities separated by successive outside tokens are not captured by first-order or second-order transition factors.
One major issue in previous studies has been how to exploit long-distance dependencies in NER. Only dependencies between neighboring labels are generally used in practice, because conventional high-order CRFs are known to be intractable in NER (Ye, Lee, Chieu, & Wu, 2009). Previous studies have demonstrated that higher-order CRFs exploiting pre-defined label patterns yield slight performance improvements over the conventional CRF in NER (Cuong, Ye, Lee, & Chieu, 2014; Fersini, Messina, Felici, & Roth, 2014; Sarawagi & Cohen, 2005; Ye et al., 2009). However, these approaches have drawbacks in handling named entity transitions across outside-token spans of arbitrary length.
In an attempt to utilize long-distance transition information between NEs through non-named-entity tokens, this study explores a method that modifies the first-order linear-chain CRF by means of an induction process.
Prior to introducing the new model formulation, we review the general concept of the CRF. As a sequence labeling model, the conventional CRF models the conditional distribution 𝑃(𝒚|𝒙), in which x is the input (e.g., token, word) sequence and y is the label sequence of x. The hidden state value set consists of the target entity labels and a single outside label. By way of illustration, presume {𝐴, 𝐵, 𝑂} as the hidden state value set; 𝐴 or 𝐵 is assigned to NEs, while 𝑂 is assigned to outside words. From this hidden state set, a label sequence forms a linear chain in NER; for example, the sequence 〈𝐴, 𝑂, ⋯, 𝑂, 𝐵〉, in which successive outside words lie between the two NE words. Because the first-order model assumes that state transition dependencies exist only between two adjacent labels, so as to prevent an increase in computational complexity, the first-order CRF learns only bigram label transitions from the sequence: {(𝐴, 𝑂), (𝑂, 𝑂), (𝑂, 𝐵)}. In this example, the dependency (𝐴, 𝐵) is not captured by the model.
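The limitation above can be sketched in a few lines of Python (an illustrative sketch, not code from the paper): a first-order model only ever observes adjacent label pairs, so the entity-to-entity pair never appears among its transitions.

```python
# Illustrative sketch: the bigram label transitions visible to a
# first-order linear-chain CRF in a given label sequence.
def bigram_transitions(labels):
    """Return the set of adjacent label pairs (first-order dependencies)."""
    return {(labels[i], labels[i + 1]) for i in range(len(labels) - 1)}

seq = ["A", "O", "O", "O", "B"]   # two entities separated by outside tokens
trans = bigram_transitions(seq)
print(trans)                       # {('A', 'O'), ('O', 'O'), ('O', 'B')}
print(("A", "B") in trans)         # False: the entity-to-entity dependency is lost
```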
The main purpose of the precursor-induced CRF model introduced in this study is to capture a specific high-order named entity dependency, namely one spanning an outside-word sequence between two NEs. The main idea can be explained as follows:
It mainly focuses on the beneficial use of the outside label as a medium delivering dependency information between separated NEs.
It focuses on label subsequences having the 〈𝑒𝑛𝑡𝑖𝑡𝑦, 𝑜𝑢𝑡𝑠𝑖𝑑𝑒, 𝑒𝑛𝑡𝑖𝑡𝑦〉 pattern. The first outside label in an outside subsequence explicitly has a first-order dependency with its adjacent entity. If the first outside label passes this information on to the next, the information can flow forward.
By this induction process, the information of the first entity can flow through multiple outside labels to the second entity state (Figure 1(c)).
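The induction step can be illustrated with a small sketch (a hypothetical rendering of the idea, not the paper's implementation; the `"O|A"` naming is my own): each outside label is rewritten to carry the label of the most recent preceding entity, so the precursor's identity propagates token by token toward the next entity.

```python
# Hypothetical sketch of the induction idea: each outside label "O" is
# rewritten as "O|e", where e is the most recent preceding entity label,
# so the preceding entity's identity flows forward through outside tokens.
def induce_precursors(labels, outside="O"):
    precursor = None
    out = []
    for y in labels:
        if y == outside and precursor is not None:
            out.append(f"{outside}|{precursor}")  # outside state carries memory
        else:
            out.append(y)
            if y != outside:
                precursor = y                      # remember the last entity seen
    return out

print(induce_precursors(["A", "O", "O", "B"]))     # ['A', 'O|A', 'O|A', 'B']
```

With this relabeling, the first-order transition (𝑂|𝐴, 𝐵) implicitly encodes the long-distance (𝐴, 𝐵) dependency while the chain itself remains first-order.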
In the precursor-induced CRF, the outside state with a memory element behaves as an information transmission medium, delivering information about the presence or absence of the preceding entity forward. This requires expanding the state set. From the original state set, only the entity states are selected; a multiplied outside state set is then derived by pairing each entity state with the outside state. The expanded state set is consequently derived as the union of the entity states and the multiplied outside states.
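The state-set construction described above can be sketched as follows (a minimal sketch under my reading of the text; the `"O|e"` naming of the multiplied states is an assumption, and the handling of outside tokens that precede any entity is not specified in this excerpt):

```python
# Sketch of the expanded state set: pair the single outside state with
# each entity state ("multiplied" outside states), then take the union
# of the entity states and the multiplied outside states.
def expand_states(entity_states, outside="O"):
    multiplied = {f"{outside}|{e}" for e in entity_states}  # O paired with each entity
    return set(entity_states) | multiplied

print(sorted(expand_states({"A", "B"})))  # ['A', 'B', 'O|A', 'O|B']
```

The expanded set therefore grows only linearly in the number of entity states, which is why the model stays first-order and cheaper than a genuine second-order CRF.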
Turning to the formulation,