NeuroNER: an easy-to-use program for named-entity recognition based on neural networks


Authors: Franck Dernoncourt, Ji Young Lee, Peter Szolovits

Abstract

Named-entity recognition (NER) aims at identifying entities of interest in a text. Artificial neural networks (ANNs) have recently been shown to outperform existing NER systems. However, ANNs remain challenging to use for non-expert users. In this paper, we present NeuroNER, an easy-to-use named-entity recognition tool based on ANNs. Users can annotate entities using a graphical web-based user interface (BRAT): the annotations are then used to train an ANN, which in turn predicts entities' locations and categories in new texts. NeuroNER makes this annotation-training-prediction flow smooth and accessible to anyone.

1 Introduction

Named-entity recognition (NER) aims at identifying entities of interest in text, such as locations, organizations, and temporal expressions. Identified entities can be used in various downstream applications such as patient note de-identification and information extraction systems. They can also be used as features for machine learning systems for other natural language processing tasks.

Early systems for NER relied on rules defined by humans. Rule-based systems are time-consuming to develop and cannot be easily transferred to new types of texts or entities. To address these issues, researchers have developed machine-learning-based algorithms for NER, using a variety of learning approaches, such as fully supervised learning, semi-supervised learning, unsupervised learning, and active learning. NeuroNER is based on a fully supervised learning algorithm, which is the most studied approach (Nadeau and Sekine, 2007).

∗ These authors contributed equally to this work.
Fully supervised approaches to NER include support vector machines (SVMs) (Asahara and Matsumoto, 2003), maximum entropy models (Borthwick et al., 1998), and decision trees (Sekine et al., 1998), as well as sequential tagging methods such as hidden Markov models (Bikel et al., 1997), Markov maximum entropy models (Kumar and Bhattacharyya, 2006), and conditional random fields (CRFs) (McCallum and Li, 2003; Tsai et al., 2006; Benajiba and Rosso, 2008; Filannino et al., 2013). Like rule-based systems, these approaches rely on handcrafted features, which are challenging and time-consuming to develop and may not generalize well to new datasets.

More recently, artificial neural networks (ANNs) have been shown to outperform other supervised algorithms for NER (Collobert et al., 2011; Lample et al., 2016; Lee et al., 2016; Labeau et al., 2015; Dernoncourt et al., 2016). The effectiveness of ANNs can be attributed to their ability to learn effective features jointly with model parameters directly from the training dataset, instead of relying on handcrafted features developed for a specific dataset. However, ANNs remain challenging to use for non-expert users.

Contributions NeuroNER makes state-of-the-art named-entity recognition based on ANNs available to anyone, by focusing on usability. To enable users to create or modify annotations for a new or existing corpus, NeuroNER interfaces with the web-based annotation program BRAT (Stenetorp et al., 2012). NeuroNER makes the annotation-training-prediction flow smooth and accessible to anyone, while leveraging the state-of-the-art prediction capabilities of ANNs. NeuroNER is open source and freely available online¹.

¹ NeuroNER can be found online at: http://neuroner.com

2 Related Work

Existing publicly available NER systems geared toward non-experts do not use ANNs. For example, Stanford NER (Finkel et al.
, 2005), ABNER (Settles, 2005), the MITRE Identification Scrubber Toolkit (MIST) (Aberdeen et al., 2010), CliNER (Boag et al., 2015), BANNER (Leaman et al., 2008), and NERsuite (Cho et al., 2010) rely on CRFs. GAPSCORE uses SVMs (Chang et al., 2004). Apache cTAKES (Savova et al., 2010) and GATE's ANNIE (Cunningham et al., 1996; Maynard and Cunningham, 2003) mostly use rules. NeuroNER, the first ANN-based NER system for non-experts, generalizes better to new corpora thanks to the ANNs' ability to learn effective features jointly with model parameters.

Furthermore, many NER systems assume that the user already has a corpus annotated in a specific data format. As a result, users often have to connect their annotation tool with the NER system by reformatting annotated data, which can be time-consuming and error-prone. Moreover, if users want to manually improve the annotations predicted by the NER system (e.g., if they use the NER system to accelerate human annotation), they have to perform additional data conversion. NeuroNER streamlines this process by incorporating BRAT, a widely-used and easy-to-use annotation tool.

3 System Description

NeuroNER comprises two main components: an NER engine and an interface with BRAT. NeuroNER also comes with real-time monitoring tools for training, and with pre-trained models that can be loaded into the NER engine in case the user does not have access to labeled training data. Figure 1 presents an overview of the system.

3.1 NER engine

The NER engine takes as input three sets of data with gold labels: the training set, the validation set, and the test set. Additionally, it can also take as input the deployment set, which refers to any new text without gold labels that the user wishes to label.
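For illustration, a minimal reader for CoNLL-style input (one token per line, whitespace-separated columns with the NER label in the last column, blank lines separating sentences) can be sketched as follows; the sample tokens and tags are illustrative, and NeuroNER's own loader may differ:

```python
def read_conll(lines):
    """Parse CoNLL-style lines into a list of sentences,
    each a list of (token, label) pairs."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:  # blank line marks a sentence boundary
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        # keep the token (first column) and the NER label (last column)
        current.append((columns[0], columns[-1]))
    if current:  # flush the last sentence if there is no trailing blank line
        sentences.append(current)
    return sentences

sample = """EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
""".splitlines()

sents = read_conll(sample)
```

This keeps only what a tagger needs (tokens and labels) and ignores the intermediate part-of-speech and chunk columns.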
The files that comprise each set of data should be in the same format as used by the annotation tool BRAT or by the CoNLL-2003 NER shared task dataset (Tjong Kim Sang and De Meulder, 2003), and organized in the corresponding folder.

The NER engine's ANN contains three layers:

• Character-enhanced token-embedding layer,
• Label prediction layer,
• Label sequence optimization layer.

The character-enhanced token-embedding layer maps each token to a vector representation. The sequence of vector representations corresponding to a sequence of tokens is then input to the label prediction layer, which outputs a sequence of vectors containing the probability of each label for each corresponding token. Lastly, the label sequence optimization layer outputs the most likely sequence of predicted labels based on the sequence of probability vectors from the previous layer. All layers are learned jointly.

The ANN as well as the training process have several hyperparameters, such as the character embedding dimension, the character-based token-embedding LSTM dimension, the token embedding dimension, and the dropout probability. All hyperparameters may be specified in a human-readable configuration file, so that the user does not have to dive into any code. Listing 1 presents an excerpt of the configuration file.

[dataset]
dataset_folder = dat/conll

[character_lstm]
using_character_lstm = True
char_embedding_dimension = 25
char_lstm_dimension = 50

[token_lstm]
token_emb_pretrained_file = glove.txt
token_embedding_dimension = 200
token_lstm_dimension = 300

[crf]
using_crf = True
random_initial_transitions = True

[training]
dropout = 0.5
patience = 10
maximum_number_of_epochs = 100
maximum_training_time = 10
number_of_cpu_threads = 8

Listing 1: Excerpt of the configuration file used to define the ANN as well as the training process.
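An INI-style configuration like the excerpt in Listing 1 can be read with Python's standard configparser module. The snippet below is a minimal sketch: the section and option names are copied from Listing 1, but NeuroNER's actual configuration-loading code may differ.

```python
import configparser

# Illustrative stand-in for a NeuroNER-style configuration file;
# the option names and values follow Listing 1.
config_text = """
[dataset]
dataset_folder = dat/conll

[character_lstm]
using_character_lstm = True
char_embedding_dimension = 25
char_lstm_dimension = 50

[training]
dropout = 0.5
patience = 10
maximum_number_of_epochs = 100
"""

config = configparser.ConfigParser()
config.read_string(config_text)

# Typed accessors convert the raw strings to the types the model needs.
dataset_folder = config.get("dataset", "dataset_folder")
use_char_lstm = config.getboolean("character_lstm", "using_character_lstm")
char_dim = config.getint("character_lstm", "char_embedding_dimension")
dropout = config.getfloat("training", "dropout")
```

The typed accessors (getint, getfloat, getboolean) are what make such a file pleasant for non-programmers: every value is a plain human-readable string in the file, and the code decides how to interpret it.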
Only the dataset_folder parameter needs to be changed by the user: the other parameters have reasonable default values, which the user may optionally tune.

[Figure 1: NeuroNER system overview. The engine trains on the train and validation sets (monitored via learning curves and TensorBoard graphs), evaluates on the test set (test set with predicted entities, confusion matrix, classification report), and labels the deployment set (deployment set with predicted entities).]

In the NeuroNER engine, the training set is used to train the parameters of the ANN, and the validation set is used to determine when to stop training. The user can monitor the training process in real time via the learning curve and TensorBoard. To evaluate the trained ANN, labels are predicted for the test set: the performance metrics can be calculated and plotted by comparing the predicted labels with the gold labels. The evaluation can be done at the same time as the training if the test set is provided along with the training and validation sets, or separately, after the training or using a pre-trained model. Lastly, the NeuroNER engine can label the deployment set, i.e., any new text without gold labels.

3.2 Real-time monitoring for training

As training an ANN may take many hours, or even a few days on very large datasets, NeuroNER provides the user with real-time feedback during training for monitoring purposes. Feedback is given through two different means: plots generated by NeuroNER, and TensorBoard.

Plots NeuroNER generates several plots showing the training progress and outcome at each epoch. Plots include the evolution of the overall F1-score over time, confusion matrices visualizing the number of correct versus incorrect predictions for each class, and classification reports showing the F1-score, precision, and recall for each class.
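The validation-based stopping rule, together with the patience option from the configuration file, suggests classic patience-based early stopping: stop once the best validation F1-score has not improved for a given number of epochs. This is a minimal illustration of that idea, not necessarily NeuroNER's exact criterion:

```python
def stop_epoch(valid_f1_scores, patience):
    """Return the (0-indexed) epoch at which training would stop: the first
    epoch at which the best validation F1 has gone `patience` epochs without
    improving, or the last epoch if that never happens."""
    best, best_epoch = float("-inf"), 0
    for epoch, score in enumerate(valid_f1_scores):
        if score > best:  # strict improvement resets the patience counter
            best, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(valid_f1_scores) - 1

# Hypothetical validation F1 per epoch: best at epoch 1, then a plateau.
scores = [80.0, 85.0, 84.9, 85.0, 84.8, 84.7, 84.6, 84.5]
epoch = stop_epoch(scores, patience=3)
```

With these scores the best F1 (85.0) is reached at epoch 1 and never strictly exceeded, so training stops at epoch 4, three epochs later. The learning-curve plots described above are exactly what lets a user sanity-check this behavior while training runs.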
TensorBoard As NeuroNER is based on TensorFlow, it leverages the functionalities of TensorBoard. TensorBoard is a suite of web applications for inspecting and understanding TensorFlow runs and graphs. It allows the user to view in real time the performance achieved by the ANN being trained. Moreover, since it is web-based, this performance can be conveniently shared with anyone remotely. Lastly, since the graphs generated by TensorBoard are interactive, the user may gain further insight into the ANN's performance.

3.3 Pre-trained models

Some users may prefer not to train any ANN model, either due to time constraints or unavailable gold labels. For example, if the user wants to tag protected health information, they might not have access to a labeled identifiable dataset. To address this need, NeuroNER provides a set of pre-trained models. Users are encouraged to contribute by uploading their own trained models. NeuroNER also comes with several pre-trained token embeddings, trained with either word2vec (Mikolov et al., 2013a,b,c) or GloVe (Pennington et al., 2014), which the NeuroNER engine can load easily once specified in the configuration file.

3.4 Annotations

NeuroNER is designed to integrate smoothly with the freely available web-based annotation tool BRAT, so that non-expert users may create or improve annotations. Specifically, NeuroNER addresses two main use cases:

• creating new annotations from scratch, e.g., if the goal is to annotate a dataset for which no gold label is available;
• improving the annotations of an already labeled dataset: the annotations may have been done by another human or by a previous run of NeuroNER.

In the latter case, the user may use NeuroNER interactively, by iterating between manually improving the annotations and running the NeuroNER engine with the new annotations to obtain more accurate annotations.
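Text-format pre-trained embeddings such as GloVe store one token per line followed by its vector components, separated by spaces. A minimal loader might look as follows; the sample tokens and values are illustrative, not taken from any real embedding file:

```python
def load_embeddings(lines):
    """Parse GloVe-style text lines into a dict mapping
    each token to its vector (a list of floats)."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:  # skip blank or malformed lines
            continue
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings

# Illustrative 3-dimensional vectors (real GloVe files use 50-300 dimensions).
sample = [
    "the 0.418 0.24968 -0.41242",
    "patient 0.013441 0.23682 -0.16899",
]
vectors = load_embeddings(sample)
```

An engine loading such a file would then copy these vectors into the token-embedding lookup table before training, which is what gives pre-trained embeddings their head start on small annotated corpora.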
NeuroNER can take as input datasets in the BRAT format, and it outputs BRAT-formatted predictions, which makes it easy to start training directly from the annotations as well as to visualize and analyze the predictions. We chose BRAT for two main reasons: it is easy to use, and it can be deployed as a web application, which allows crowdsourcing. As a result, the user may quickly gather a vast amount of annotations by using crowdsourcing marketplaces such as Amazon Mechanical Turk (Buhrmester et al., 2011) and CrowdFlower (Finin et al., 2010).

3.5 System requirements

NeuroNER runs on Linux, Mac OS X, and Microsoft Windows. It requires Python 3.5, TensorFlow 1.0 (Abadi et al., 2016), scikit-learn (Pedregosa et al., 2011), and BRAT. A setup script is provided to make the installation straightforward. NeuroNER can use the GPU if available, and the number of CPU threads and GPUs to use can be specified in the configuration file.

3.6 Performance

To assess the quality of NeuroNER's predictions, we use two publicly and freely available datasets for named-entity recognition: CoNLL 2003 and i2b2 2014. CoNLL 2003 (Tjong Kim Sang and De Meulder, 2003) is a widely studied dataset with four usual types of entities: persons, organizations, locations, and miscellaneous names. We use the English version.

Model            CoNLL 2003    i2b2 2014
Best published   90.9          97.9
NeuroNER         90.5          97.7

Table 1: F1-scores (%) on the test set comparing NeuroNER with the best published methods in the literature, viz. (Passos et al., 2014) for CoNLL 2003 and (Dernoncourt et al., 2016) for i2b2 2014.

The i2b2 2014 dataset (Stubbs et al., 2015) was released as part of the 2014 i2b2/UTHealth shared task Track 1. It is the largest publicly available dataset for de-identification, a form of named-entity recognition where the entities are protected health information, such as patients' names and phone numbers.
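The F1-scores in Table 1 are entity-level metrics in the spirit of the CoNLL evaluation, where a predicted entity counts as correct only if both its span and its type match a gold entity exactly. A minimal sketch, with hypothetical gold and predicted spans:

```python
def entity_f1(gold, predicted):
    """Entity-level precision, recall, and F1 over sets of
    (start, end, type) tuples; only exact matches count."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical example: one predicted entity has the wrong type (ORG vs LOC),
# so it counts as an error for both precision and recall.
gold = {(0, 5, "PER"), (10, 18, "ORG"), (25, 30, "LOC")}
pred = {(0, 5, "PER"), (10, 18, "LOC"), (25, 30, "LOC")}
p, r, f1 = entity_f1(gold, pred)
```

This exact-match convention is stricter than token-level accuracy: an entity with even one boundary token wrong contributes nothing, which is why entity-level F1 is the standard headline number for NER.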
22 systems were submitted for this shared task. Table 1 compares NeuroNER with state-of-the-art systems on CoNLL 2003 and i2b2 2014. Although the hyperparameters of NeuroNER were not optimized for these datasets (the default hyperparameters were used), NeuroNER's performance is on par with the state-of-the-art systems.

4 Conclusions

In this article we have presented NeuroNER, an ANN-based NER tool that is accessible to non-expert users and yields state-of-the-art results. Addressing the need of many users who want to create or improve annotations, NeuroNER smoothly integrates with the web-based annotation tool BRAT.

References

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. 2016. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.

John Aberdeen, Samuel Bayer, Reyyan Yeniterzi, Ben Wellner, Cheryl Clark, David Hanauer, Bradley Malin, and Lynette Hirschman. 2010. The MITRE Identification Scrubber Toolkit: design, training, and assessment. International Journal of Medical Informatics 79(12):849–859.

Masayuki Asahara and Yuji Matsumoto. 2003. Japanese named entity extraction with redundant morphological analysis. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1. Association for Computational Linguistics, pages 8–15.

Yassine Benajiba and Paolo Rosso. 2008. Arabic named entity recognition using conditional random fields. In Proc. of Workshop on HLT & NLP within the Arabic World, LREC. Citeseer, volume 8, pages 143–153.

Daniel M Bikel, Scott Miller, Richard Schwartz, and Ralph Weischedel. 1997. Nymble: a high-performance learning name-finder. In Proceedings of the fifth conference on Applied Natural Language Processing.
Association for Computational Linguistics, pages 194–201.

William Boag, Kevin Wacome, Tristan Naumann, and Anna Rumshisky. 2015. CliNER: a lightweight tool for clinical named entity recognition. American Medical Informatics Association (AMIA) Joint Summits on Clinical Research Informatics (poster).

Andrew Borthwick, John Sterling, Eugene Agichtein, and Ralph Grishman. 1998. NYU: description of the MENE named entity system as used in MUC-7. In Proceedings of the Seventh Message Understanding Conference (MUC-7).

Michael Buhrmester, Tracy Kwang, and Samuel D Gosling. 2011. Amazon's Mechanical Turk: a new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science 6(1):3–5.

Jeffrey T Chang, Hinrich Schütze, and Russ B Altman. 2004. GAPSCORE: finding gene and protein names one word at a time. Bioinformatics 20(2):216–225.

HC Cho, N Okazaki, M Miwa, and J Tsujii. 2010. NERsuite: a named entity recognition toolkit. Tsujii Laboratory, Department of Information Science, University of Tokyo, Tokyo, Japan.

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. The Journal of Machine Learning Research 12:2493–2537.

Hamish Cunningham, Yorick Wilks, and Robert J Gaizauskas. 1996. GATE: a general architecture for text engineering. In Proceedings of the 16th Conference on Computational Linguistics - Volume 2. Association for Computational Linguistics, pages 1057–1060.

Franck Dernoncourt, Ji Young Lee, Ozlem Uzuner, and Peter Szolovits. 2016. De-identification of patient notes with recurrent neural networks. Journal of the American Medical Informatics Association.

Michele Filannino, Gavin Brown, and Goran Nenadic. 2013. ManTIME: temporal expression identification and normalization in the TempEval-3 challenge. arXiv preprint arXiv:1304.7942.
Tim Finin, Will Murnane, Anand Karandikar, Nicholas Keller, Justin Martineau, and Mark Dredze. 2010. Annotating named entities in Twitter data with crowdsourcing. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk. Association for Computational Linguistics, pages 80–88.

Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, pages 363–370.

N Kumar and Pushpak Bhattacharyya. 2006. Named entity recognition in Hindi using MEMM. Technical Report, IIT Mumbai.

Matthieu Labeau, Kevin Löser, and Alexandre Allauzen. 2015. Non-lexical neural architecture for fine-grained POS tagging. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Lisbon, Portugal, pages 232–237. http://aclweb.org/anthology/D15-1025.

Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360.

Robert Leaman, Graciela Gonzalez, et al. 2008. BANNER: an executable survey of advances in biomedical named entity recognition. In Pacific Symposium on Biocomputing. volume 13, pages 652–663.

Ji Young Lee, Franck Dernoncourt, Ozlem Uzuner, and Peter Szolovits. 2016. Feature-augmented neural networks for patient note de-identification. COLING Clinical NLP.

Diana Maynard and Hamish Cunningham. 2003. Multilingual adaptations of ANNIE, a reusable information extraction tool. In Proceedings of the Tenth Conference on European Chapter of the Association for Computational Linguistics - Volume 2. Association for Computational Linguistics, pages 219–222.
Andrew McCallum and Wei Li. 2003. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 - Volume 4. Association for Computational Linguistics, pages 188–191.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. pages 3111–3119.

Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013c. Linguistic regularities in continuous space word representations. In HLT-NAACL. pages 746–751.

David Nadeau and Satoshi Sekine. 2007. A survey of named entity recognition and classification. Lingvisticae Investigationes 30(1):3–26.

Alexandre Passos, Vineet Kumar, and Andrew McCallum. 2014. Lexicon infused phrase embeddings for named entity resolution. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning. Association for Computational Linguistics, Ann Arbor, Michigan, pages 78–86. http://www.aclweb.org/anthology/W14-1609.

Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12(Oct):2825–2830.

Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. GloVe: global vectors for word representation. Proceedings of the Empirical Methods in Natural Language Processing (EMNLP 2014) 12:1532–1543.
Guergana K Savova, James J Masanz, Philip V Ogren, Jiaping Zheng, Sunghwan Sohn, Karin C Kipper-Schuler, and Christopher G Chute. 2010. Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications. Journal of the American Medical Informatics Association 17(5):507–513.

Satoshi Sekine et al. 1998. NYU: description of the Japanese NE system used for MET-2. In Proc. Message Understanding Conference.

Burr Settles. 2005. ABNER: an open source tool for automatically tagging genes, proteins, and other entity names in text. Bioinformatics 21(14):3191–3192.

Pontus Stenetorp, Sampo Pyysalo, Goran Topić, Tomoko Ohta, Sophia Ananiadou, and Jun'ichi Tsujii. 2012. BRAT: a web-based tool for NLP-assisted text annotation. In Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, pages 102–107.

Amber Stubbs, Christopher Kotfila, and Özlem Uzuner. 2015. Automated systems for the de-identification of longitudinal clinical narratives: overview of 2014 i2b2/UTHealth shared task Track 1. Journal of Biomedical Informatics 58:S11–S19.

Erik F Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 - Volume 4. Association for Computational Linguistics, pages 142–147.

Richard Tzong-Han Tsai, Cheng-Lung Sung, Hong-Jie Dai, Hsieh-Chuan Hung, Ting-Yi Sung, and Wen-Lian Hsu. 2006. NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition. BMC Bioinformatics 7(5):S11.
