SynsetRank: Degree-adjusted Random Walk for Relation Identification


Authors: Shinichi Nakajima, Sebastian Krause, Dirk Weissenborn, Sven Schmeier, Nico Görnitz, Feiyu Xu

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015

Shinichi Nakajima∗, Sebastian Krause, Dirk Weissenborn, Sven Schmeier, Nico Görnitz, Feiyu Xu

Abstract—In relation extraction, a key process is to obtain good detectors that find relevant sentences describing the target relation. To minimize the necessity of labeled data for refining detectors, previous work successfully made use of BabelNet, a semantic graph structure expressing relationships between synsets, as side information or prior knowledge. The goal of this paper is to enhance the use of graph structure in the framework of random walk with a few adjustable parameters. A straightforward application of random walk actually degrades the performance even after parameter optimization. With the insight from this unsuccessful trial, we propose SynsetRank, which adjusts the initial probability so that high-degree nodes influence their neighbors as strongly as low-degree nodes do. In our experiment on 13 relations of the FB15K-237 dataset, SynsetRank significantly outperforms the baselines and the plain random walk approach.

Index Terms—relation extraction, random walk, PageRank, BabelNet.

I. INTRODUCTION

Many NLP tasks are concerned with recognizing semantic concepts in large amounts of text, including the problem of detecting mentions of real-world events [1], [2] and relations between entities [3], [4], [5]. The high up-front cost of training NLP systems with labeled data has led to the design of supervision paradigms which use only distant or weak guidance from manually created examples [6], [7], [8]. Such methods can benefit from additional clues about the presence of semantic concepts in language fragments, coming from lexical-semantic resources like WordNet [9] and BabelNet [10].
Consider the following example: "Mike and Julie Miller celebrated a fabulous wedding last summer, only two years after they had first met." In the first part of the sentence, the term wedding is a strong indicator for the presence of a marriage relation mention, while the second part has no such indicator. Information about such relation-relevant terms is helpful, e.g., for extracting sentence templates for pattern-based relation extraction, or for pre-filtering texts before fine-grained processing takes place. Existing information-extraction systems typically exploit lexical-semantic repositories for increased lexical coverage, by retrieving synonyms for observed terms or by calculating similarity scores based on the graph structure of these resources [11], [12], [13]. Few approaches explicitly identify the entries which express semantic concepts on the textual level. An exception is the work by Moro et al. (2013) [14], who start with an initial frequency distribution of terms co-occurring with relation examples in a large text collection. Relation-relevant terms are then determined through an ad-hoc combination of this initial distribution with the graph structure of the repository.

In this paper, we improve on Moro et al.'s approach by casting the problem as a ranking problem and applying the random walk approach with a simple modification. We test our approach on a publicly available dataset and compare it to several baselines.

∗ Corresponding author (email: nakajima@tu-berlin.de). Shinichi Nakajima and Nico Görnitz are with Technische Universität Berlin, Machine Learning Group, Marchstr. 23, 10587 Berlin, Germany. Shinichi Nakajima, Sebastian Krause, Dirk Weissenborn, Sven Schmeier and Feiyu Xu are with the Berlin Big Data Center, 10587 Berlin, Germany. Sebastian Krause, Dirk Weissenborn, Sven Schmeier and Feiyu Xu are with DFKI, Language Technology Lab, Alt-Moabit 91c, Berlin, Germany.
We evaluate the model performance in terms of the quality of positively labeled word synsets and reach drastically better performance.

II. BACKGROUND

This paper focuses on the automatic identification of relation-relevant entries in lexical-semantic resources, i.e., we want to obtain relation detectors. As a relation we understand any kind of real-world relationship between persons, locations, etc.; examples are the kinship relations marriage, parent-child, and siblings, or business concepts such as company acquisition and employment tenure.

Lexical-semantic repositories are inventories of word senses, which link words to their meaning and to other words, i.e., these resources have an underlying graph structure. A prominent instance of the many lexical-semantic resources is BabelNet [10] (http://babelnet.org), a large-scale multilingual semantic network which was built automatically through the algorithmic integration of Wikipedia and WordNet. The core components (nodes) are so-called synsets, which are sets of synonymous terms; the edges correspond to synset relationships such as hypernymy and meronymy.

A. Finding Domain-Relevant Terms

A lot of work has dealt with acquiring relevant terms for semantic relations. Nguyen et al. (2010) [15] analyzed the distribution of trigger words for semantic relations in annotated data in order to filter extraction patterns. For a similar reason, Xu et al. (2002) [16] collected relevant terms with a TFIDF-based strategy. Other approaches incorporate lexical knowledge from WordNet. Zhou et al. (2005) [3] presented a feature-based relation extractor which utilizes semi-automatically built trigger-word lists from WordNet. Culotta and Sorensen (2004) [11] used WordNet hypernyms for increased extraction coverage. Stevenson and Greenwood (2005) [12] defined a similarity function for learned linguistic patterns that was built on WordNet information.

None of the above approaches, however, explicitly determines and outputs which parts of the lexical-semantic resource contain the terms that are relevant to a given semantic relation.

B. Extracting Relation-Specific Sub-Graphs

Moro et al. (2013) [14] proposed another approach to the term identification problem. Their algorithm gets as input a set of sentences which have been labeled with relation mentions in a distantly supervised manner. This noisy set of relation mentions is processed by word-sense disambiguation [17] to build links from the word level to the level of synsets in WordNet and BabelNet. This induces a frequency distribution over synsets, from which the most frequent items plus their direct neighbors in the resource are selected to build the final relation-specific sub-graph. Moro et al. (2013) employ these sub-graphs for filtering linguistic patterns in a relation-extraction scenario. While their approach shows good results, it also leaves room for improvement, mainly due to its ad-hoc, heuristic utilization of the available synset links in the sense inventory. We use their approach as one of the baselines against which we compare our proposed model.

C. PageRank: Random Walk for Webpage Ranking

Moro et al. (2013) choose the most frequent synsets and their neighbors as the relevant synsets, which can be naturally cast as information propagation through random walks. Random walk was successfully applied for ranking webpages in the form of PageRank [18], [19]. Our first trial is to apply PageRank for ranking synsets according to their relevance to the target relation.

Consider a graph G = (V, E) with a set V of nodes and a set E of edges connecting two nodes. We denote the number of nodes by N = |V|.
In our application, each node i ∈ V corresponds to a synset, and each edge (i, j) ∈ E corresponds to a semantic connection between two synsets. Each edge has a label l ∈ {1, ..., L}, which specifies the relation between the two synsets, such as hypernymy and meronymy. We formally prepare L graphs {G^(l) = (V, E^(l))} for l = 1, ..., L, which share the nodes V but have L different sets of edges, each of which consists of a single edge label. For each l, we express the existence of edges by an N × N binary matrix E^(l), and we prepare a weight vector w ∈ R_+^L. Then, we construct a transition matrix Q^0 ∈ R^{N×N} as the weighted sum over all edge labels:

    Q^0_{i,j} = \begin{cases}
        \dfrac{\sum_{l=1}^{L} w_l E^{(l)}_{i,j}}{\sum_{j'=1}^{N} \sum_{l=1}^{L} w_l E^{(l)}_{i,j'}} & \text{if } (i, j) \in \bigcup_{l=1}^{L} E^{(l)}, \\
        0 & \text{otherwise.}
    \end{cases}

The BabelNet graph is directed, and the number of edge types is L = 29. Since the semantic edge direction does not necessarily indicate the direction in which the relevance information should flow, we treat the edges in the opposite direction as another edge type. Thus, we have L = 58 edge types in total.

To avoid the dead-end and spider-trap issues, our implementation of PageRank is equipped with taxation and restarting [19], [20]. This can be realized by adding a sink-source node, which absorbs a proportion α ∈ [0, 1] of the flow from all nodes and re-distributes it according to the initial distribution p^(0) ∈ R_+^N. Here, we use as the initial distribution p^(0) the original frequency distribution over synsets, observed from the text corpus (see Section II-B). We also add self-links, with which the random walkers stay at the same node with probability β ∈ [0, 1]. Thus, the distribution after t random walk steps is defined as

    \tilde{p}^{(t)\top} = \tilde{p}^{(t-1)\top} Q,    (1)

where

    Q = \begin{pmatrix} (1 - \alpha)\{(1 - \beta) Q^0 + \beta I_N\} & k \\ p^{(0)\top} & 0 \end{pmatrix},
    \qquad
    k_i = \begin{cases} \alpha & \text{if } \exists j' \text{ s.t. } (i, j') \in \bigcup_{l=1}^{L} E^{(l)}, \\ 1 - (1 - \alpha)\beta & \text{otherwise.} \end{cases}
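For concreteness, the transition matrix and the taxed, restarting random walk of Eq. (1) can be sketched in NumPy. This is a minimal dense-matrix illustration under our own naming, not the authors' implementation; a BabelNet-scale graph (millions of nodes) would require sparse matrices, and reversed-edge types can be modeled by passing each E^(l) transpose as an additional label.

```python
import numpy as np

def make_transition_matrix(E_list, w):
    """Row-normalized weighted adjacency Q0 from per-label binary matrices E^(l).

    E_list: list of L binary (N, N) arrays; w: length-L weight vector.
    Rows with no outgoing edges stay all-zero (handled by the sink-source node).
    """
    A = sum(wl * El for wl, El in zip(w, E_list)).astype(float)
    row_sums = A.sum(axis=1, keepdims=True)
    # Divide only where a row has outgoing mass; dead-end rows remain zero.
    return np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums > 0)

def random_walk(Q0, p0, alpha, beta, t):
    """Distribution over the N original nodes after t steps of Eq. (1).

    A sink-source node (index N) absorbs a fraction alpha of the flow from
    ordinary nodes (and all non-self-loop flow from dead ends) and
    redistributes it according to the initial distribution p0.
    """
    N = Q0.shape[0]
    has_out = Q0.sum(axis=1) > 0
    # k_i: flow sent to the sink-source node from node i.
    k = np.where(has_out, alpha, 1.0 - (1.0 - alpha) * beta)
    Q = np.zeros((N + 1, N + 1))
    Q[:N, :N] = (1 - alpha) * ((1 - beta) * Q0 + beta * np.eye(N))
    Q[:N, N] = k
    Q[N, :N] = p0          # sink-source redistributes as the initial distribution
    p = np.append(p0, 0.0)  # augment p0 with a zero for the sink-source node
    for _ in range(t):
        p = p @ Q
    return p[:N]
```

With p0 summing to one, each row of the augmented matrix Q sums to one, so the walk conserves probability mass between the original nodes and the sink-source node.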
Here, I_N denotes the N × N identity matrix, and \tilde{p}^{(t)} ∈ R^{N+1} denotes the distribution (after t random walk steps) over the original nodes and the (N+1)-th sink-source node. We set \tilde{p}^{(0)} to the initial distribution augmented with a zero for the sink-source node, i.e., \tilde{p}^{(0)} = (p^{(0)\top}, 0)^\top. After the random walk, we rank the synsets based on the distribution \tilde{p}^{(t)}. The unknown parameters α, β, and t are optimized by using the validation data, while the edge weights are fixed to w_l = 1 for all l in this paper.

III. PROPOSED METHOD

As shown in Section IV, PageRank performs worse than Moro et al.'s baseline method even after parameter optimization. This is not very surprising, because the problem of ranking webpages and the problem of ranking synsets are substantially different. Taking this difference into account, we propose a new method.

A. SynsetRank: Degree-Adjusted Random Walk for Synset Ranking

By the nature of random walks, a node with more outgoing edges influences each neighboring node less, since the random walkers are dispersed over many edges. This is what PageRank, which simulates web surfers, intends to do, but it is not appropriate for synset ranking, where neighbors of frequent synsets should be ranked high regardless of the degree (the number of edges) of the frequent node. Our idea is to adjust the original frequency, as well as the restarting probability, to compensate for this undesired phenomenon. In our random walk formulation (1), this can be done simply by replacing the original frequency distribution p^(0) with a re-weighted one:

    \hat{p}^{(0)} = \frac{p^{(0)} * d}{\| p^{(0)} * d \|_1},
    \qquad \text{where} \quad
    d_i = \sum_{j=1}^{N} \sum_{l=1}^{L} w_l E^{(l)}_{i,j}.

Here, * denotes the element-wise product of vectors. This simple modification makes the influence of a node on each neighbor equal, regardless of the degree, and shows a drastic improvement in our experiment in Section IV. We call this degree-adjusted random walk approach SynsetRank.
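The degree adjustment itself is a one-liner. The sketch below (our code, same assumptions and naming as the previous sketch) re-weights the initial frequency vector by the weighted degree and renormalizes; running the walk of Eq. (1) with this vector in place of p^(0) is the only difference between SynsetRank and plain PageRank in this formulation.

```python
import numpy as np

def degree_adjust(p0, E_list, w):
    """SynsetRank's re-weighted initial distribution \\hat{p}^{(0)}.

    Multiplies each synset's frequency by its weighted out-degree
    d_i = sum_j sum_l w_l E^(l)_{i,j}, then renormalizes to sum 1, so that
    a high-degree node pushes as much mass to each of its neighbors as a
    low-degree node does.
    """
    d = sum(wl * El for wl, El in zip(w, E_list)).sum(axis=1)  # weighted degrees
    q = p0 * d                                                 # element-wise product
    return q / q.sum()
```

Note that a synset with no outgoing edges receives zero initial mass under this re-weighting, which is harmless for ranking since it also propagates nothing to neighbors.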
TABLE I
AREA UNDER THE ROC CURVE (AUC) FOR 13 RELATIONS OF THE FB15K-237 DATASET

| Relation | Frequency | Moro | PageRank | PageRank (common) | SynRank | SynRank (common) |
|---|---|---|---|---|---|---|
| /award/award_nominee/award_nominations./award/award_nomination/award_nominee | 0.4120 | 0.4666 | 0.4698 | 0.3794 | 0.4398 | 0.4868 |
| /award/award_winner/awards_won./award/award_honor/award_winner | 0.6204 | 0.4625 | 0.5079 | 0.5378 | 0.4847 | 0.5022 |
| /base/popstra/celebrity/friendship./base/popstra/friendship/participant | 0.3825 | 0.4162 | 0.4216 | 0.4084 | 0.4824 | 0.4232 |
| /education/educational_degree/people_with_this_degree./education/education/institution | 0.7002 | 0.8573 | 0.7128 | 0.6355 | 0.8743 | 0.8646 |
| /education/educational_institution/students_graduates./education/education/major_field_of_study | 0.6803 | 0.7710 | 0.7099 | 0.7239 | 0.7726 | 0.7891 |
| /film/actor/film./film/performance/film | 0.7149 | 0.6519 | 0.7228 | 0.6998 | 0.6694 | 0.6479 |
| /film/director/film | 0.6237 | 0.7116 | 0.5760 | 0.5762 | 0.6873 | 0.6847 |
| /location/location/contains | 0.5856 | 0.6747 | 0.5858 | 0.5682 | 0.6624 | 0.6709 |
| /music/performance_role/regular_performances./music/group_membership/role | 0.6373 | 0.7483 | 0.6717 | 0.6666 | 0.8185 | 0.8137 |
| /organization/organization_member/member_of./organization/organization_membership/organization | 0.9200 | 0.8020 | 0.9126 | 0.8876 | 0.8812 | 0.8230 |
| /people/person/nationality | 0.5507 | 0.7180 | 0.8411 | 0.7413 | 0.8820 | 0.8399 |
| /people/person/place_of_birth | 0.5905 | 0.6977 | 0.7792 | 0.6914 | 0.8315 | 0.7796 |
| /people/person/places_lived./people/place_lived/location | 0.6573 | 0.7203 | 0.7534 | 0.7357 | 0.8238 | 0.7861 |
| Average AUC | 0.6212 | 0.6691 | 0.6665 | 0.6348 | 0.7162 | 0.7009 |

IV. EXPERIMENT

In this section, we show our experimental results.

A. Data and Task

We used the FB15K-237 dataset of Toutanova et al. (2015) [21] for our experiments, and follow the training/validation/test split suggested by the authors.
This dataset provides a large number of relation instances from the factual knowledge base Freebase along with textual mentions, i.e., parses of sentences which contain references to the argument entities of the relation instances. The task is to find the synsets (nodes in BabelNet) that are semantically relevant for the target relation, i.e., whose occurrence is likely to trigger a specific semantic relation (from a knowledge base like Freebase, e.g., the relation /people/person/place_of_birth connecting humans to the place they were born in). Here, 'triggering' means that this synset (word surface form) in a sentence (e.g., the word 'born' in the sentence 'John was born in New York') makes it probable that this sentence refers to this semantic relation (which the sentence actually does, in this example). Such information about relation relevancy is useful for downstream text analytics tasks, where it serves as a further signal for making a relation extraction decision (Does the sentence 'John was born in New York' contain the fact triple?).

The dataset FB15K-237 was created by combining (a) fact triples from Freebase with (b) many sentences which mention entities for which facts are listed in Freebase. As the task for which FB15K-237 was created is, unfortunately, different from ours, we cannot follow the evaluation procedure suggested in Toutanova et al. (2015) [21]. Accordingly, we created our own gold-standard labels by hand-labeling a subset of synsets, as explained shortly.[2]

In order to avoid data sparsity issues, we determined the twenty relations with the highest number of mentions in the training partition, and removed seven of these which were redundant with respect to the other relations or which were semantically lightweight from the point of view of textual mentions (see Appendix A for details of the removed seven relations). We used the 13 relations shown in the first column of Table I.
For each of these relations and each data partition, we build positive/negative sets of textual mentions. The positive mentions are simply the ones that contain the arguments of a relation instance, while the negative ones are constructed following the strategy outlined by Toutanova et al. (2015) [21]. For the data in the training partition, we apply word-sense disambiguation to the positive and negative textual mentions, this way creating an initial synset frequency distribution among positive and negative examples for each relation, similar to Moro et al. (2013) [14].

[2] We will make the evaluation data publicly available upon acceptance.

Furthermore, for each relation and the mentions in the validation and test partitions, we prepare an evaluation dataset with manually annotated labels. We start with the top-50 most frequent synsets for the relation in the respective part of the data, which occur in positive textual mentions but not in negative ones. We do a two-step graph walk on BabelNet which extends these nodes with two randomly selected neighbors for each already included node. The resulting synsets and the corresponding lemmas/words are given to human annotators, who label the synsets as positive/negative with respect to the relation, i.e., they judge whether or not the synset is relevant for the semantics of the relation.

In our experiments, we use BabelNet version 2.5.1, which contains roughly 9M synsets, 11M lexicalizations, and 262M links. There are in total 430k positive textual mentions for the 13 relations in the training partition of the data. The evaluation set has on average 2,857 synsets per relation, with a +/- ratio of 1:35.

B. Result

Table I shows the area under the ROC curve (AUC) on the test partition for the 13 relations. 'Frequency' denotes the baseline method where the synsets are ranked based on the original frequency p^(0).
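The AUC reported in Table I, and the validation-based selection of α, β, and t described in the text, can be sketched in a few lines. This is our own illustration, not the authors' code: the AUC uses the rank-sum (Mann-Whitney) formulation, assuming no ties between positive and negative scores, and `score_fn(alpha, beta, t)` is a hypothetical callable that runs the random walk with those parameters and returns per-synset scores.

```python
import itertools
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) statistic."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)   # rank 1 = lowest score
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def grid_search(score_fn, labels, alphas, betas, ts):
    """Return the (alpha, beta, t) maximizing validation AUC.

    score_fn(alpha, beta, t) -> per-synset scores (hypothetical interface,
    e.g. the ranking distribution produced by the random walk).
    """
    best, best_auc = None, -1.0
    for a, b, t in itertools.product(alphas, betas, ts):
        v = auc(score_fn(a, b, t), labels)
        if v > best_auc:
            best, best_auc = (a, b, t), v
    return best, best_auc
```

In the paper's setting the grids would be alphas = betas = (0.0, 0.2, ..., 1.0) and ts = (1, ..., 5), evaluated on the validation partition per relation.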
We clearly observe that plain PageRank tends to perform worse than Moro et al.'s baseline method, while our proposed degree-adjusted SynsetRank shows drastically better performance than all the others. For PageRank and SynsetRank, the optimal parameters (maximizing the AUC on the validation partition) are grid-searched over α = 0.0, 0.2, ..., 1.0, β = 0.0, 0.2, ..., 1.0, and t = 1, ..., 5 for each relation. For PageRank (common) and SynsetRank (common), the common optimal parameters (maximizing the average AUC over all 13 relations) are used.

SynsetRank improves the average AUC of Moro et al.'s baseline method by roughly 0.05, which is similar to the performance gain of Moro et al. (2013) [14] over the Frequency baseline. Although the necessity of parameter optimization can be a bottleneck of SynsetRank, the second-best result, achieved by SynsetRank (common), implies the possibility of using common parameters for all relations: once we optimize the parameters for some set of relations, we could use the same parameters for new relations.

V. CONCLUSION

Extracting knowledge from the internet is one of the most important near-future goals for researchers in the fields of natural language processing, machine learning, and artificial intelligence. Relation extraction (RE) is a key technology. We cast the problem of finding good detectors as a synset ranking problem, and applied the random walk approach with a simple modification. Our experiment showed promising results. We leave the quality assessment of downstream applications as future work. We also plan to apply the supervised random walk approach [22] to optimize the weights w for each edge label, which would further exploit existing knowledge for better performance.

TABLE II
REMOVED SEVEN RELATIONS
/award/award_nominee/award_nominations./award/award_nomination/nominated_for
/award/award_winning_work/awards_won./award/award_honor/award_winner
/location/location/adjoin_s./location/adjoining_relationship/adjoins
/location/hud_county_place/place
/film/film/release_date_s./film/film_regional_release_date/film_release_region
/film/film/country
/influence/influence_node/influenced_by

APPENDIX A
REMOVED SEVEN RELATIONS

Table II summarizes the seven relations removed in our experiment. The first and the second removed relations are semantically very close to the first and the second evaluated relations, respectively, in Table I. Likewise, the third and the fourth removed relations are close to the eighth evaluated relation, and the fifth and the sixth removed relations are close to the seventh evaluated relation. Here, 'semantically close' means that the synsets relevant for the respective relations are very likely to be the same, and we can expect that the result on each removed relation would be similar to that of its closest evaluated relation. The seventh removed relation is unlikely to be mentioned in text at all, and so is not interesting for relation extraction. Accordingly, we removed those seven relations to reduce the hand-labeling work (~8 person-hours per relation).

ACKNOWLEDGMENT

SN, SK, DW, FX, SC thank the support from the Berlin Big Data Center project (FKZ 01IS14013A). NG was supported by the ALICE II grant (FKZ 01IB15001B).

REFERENCES

[1] E. Alfonseca, D. Pighin, and G. Garrido, "HEADY: News headline abstraction through event pattern clustering," in ACL, 2013, pp. 1243–1253.
[2] C. Zhang, S. Soderland, and D. S. Weld, "Exploiting parallel news streams for unsupervised event extraction," TACL, vol. 3, pp. 117–129, 2015.
[3] G. Zhou, J. Su, J. Zhang, and M. Zhang, "Exploring various knowledge in relation extraction," in ACL. The Association for Computer Linguistics, 2005.
[4] N. Nakashole, G. Weikum, and F. M. Suchanek, "PATTY: A taxonomy of relational patterns with semantic types," in EMNLP-CoNLL. ACL, 2012, pp. 1135–1145.
[5] T. M. Mitchell, W. W. Cohen, E. R. H. Jr., P. P. Talukdar, J. Betteridge, A. Carlson, B. D. Mishra, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. A. Platanios, A. Ritter, M. Samadi, B. Settles, R. C. Wang, D. T. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, and J. Welling, "Never-ending learning," in AAAI. AAAI Press, 2015, pp. 2302–2310.
[6] Z. Zhang, "Weakly-supervised relation classification for information extraction," in CIKM. ACM, 2004, pp. 581–588.
[7] M. Mintz, S. Bills, R. Snow, and D. Jurafsky, "Distant supervision for relation extraction without labeled data," in ACL/IJCNLP. The Association for Computer Linguistics, 2009, pp. 1003–1011.
[8] R. Hoffmann, C. Zhang, X. Ling, L. S. Zettlemoyer, and D. S. Weld, "Knowledge-based weak supervision for information extraction of overlapping relations," in ACL. The Association for Computer Linguistics, 2011, pp. 541–550.
[9] C. Fellbaum, Ed., WordNet: An Electronic Lexical Database, 1998.
[10] R. Navigli and S. P. Ponzetto, "BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network," Artificial Intelligence, vol. 193, pp. 217–250, 2012.
[11] A. Culotta and J. S. Sorensen, "Dependency tree kernels for relation extraction," in ACL, 2004, pp. 423–429.
[12] M. Stevenson and M. Greenwood, "A semantic approach to IE pattern induction," in Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL '05). Association for Computational Linguistics, 2005, pp. 379–386.
[13] G. Zhou and M. Zhang, "Extracting relation information from text documents by exploring various types of knowledge," Inf. Process. Manage., vol. 43, no. 4, pp. 969–982, 2007.
[14] A. Moro, H. Li, S. Krause, F. Xu, R. Navigli, and H. Uszkoreit, "Semantic rule filtering for web-scale relation extraction," in Proc. of ISWC, 2013, pp. 347–362.
[15] Q. L. Nguyen, D. Tikk, and U. Leser, "Simple tricks for improving pattern-based information extraction from the biomedical literature," Journal of Biomedical Semantics, vol. 1, no. 1, pp. 1–17, 2010.
[16] F. Xu, D. Kurz, J. Piskorski, and S. Schmeier, "A domain adaptive approach to automatic acquisition of domain relevant terms and their relations with bootstrapping," in LREC. European Language Resources Association, 2002.
[17] R. Navigli, "Word sense disambiguation: A survey," ACM Comput. Surv., vol. 41, no. 2, pp. 10:1–10:69, 2009.
[18] S. Brin and L. Page, "Anatomy of a large-scale hypertextual web search engine," in Proc. of WWW, 1998, pp. 107–117.
[19] J. Leskovec, A. Rajaraman, and J. D. Ullman, Mining of Massive Datasets, 2nd ed. Cambridge University Press, 2014.
[20] H. Tong, C. Faloutsos, and J. Y. Pan, "Fast random walk with restart and its applications," in Proc. of ICDM, 2006.
[21] K. Toutanova, D. Chen, P. Pantel, H. Poon, P. Choudhury, and M. Gamon, "Representing text for joint embedding of text and knowledge bases," in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal: Association for Computational Linguistics, September 2015, pp. 1499–1509. [Online]. Available: http://aclweb.org/anthology/D15-1174
[22] L. Backstrom and J. Leskovec, "Supervised random walks: Predicting and recommending links in social networks," in Proc. of WSDM, 2011.

Shinichi Nakajima is a senior researcher in the Berlin Big Data Center, Machine Learning Group, Technische Universität Berlin.
He received the master's degree in physics in 1995 from Kobe University, and worked with Nikon Corporation until September 2014 on statistical analysis, image processing, and machine learning. He received the doctoral degree in computer science in 2006 from Tokyo Institute of Technology. His research interest is in the theory and applications of machine learning, in particular Bayesian learning theory, computer vision, and data mining.

Sebastian Krause is a PhD student at the Language Technology Lab of the German Research Center for Artificial Intelligence (DFKI). He got his Diplom degree in Computer Science from the Humboldt University Berlin and has recently worked on natural language processing, in particular on text mining problems.

Dirk Weissenborn has been a researcher and PhD student at DFKI since April 2014. His background is in machine learning with a focus on NLP.

Sven Schmeier is a senior consultant and project leader at the Language Technology Lab at the German Research Center for Artificial Intelligence (DFKI) in Berlin. Sven Schmeier holds a Diploma in Computer Science and a PhD in Computational Linguistics from Saarland University. In 2000 he was co-founder of the DFKI spin-off company Xtramind (now Attensity). In 2005 he was the leader of the research group at Semgine GmbH, now reformed as medx GmbH in Berlin. In 2007 he was co-founder of the company Yocoy Technologies GmbH with Dr. Feiyu Xu and Prof. Hans Uszkoreit.

Nico Görnitz is a research associate in the machine learning group at the Berlin Institute of Technology (TU Berlin, Berlin, Germany), headed by Klaus-Robert Müller. In 2014 he did an internship with the eScience Group, led by David Heckerman (Microsoft Research, Los Angeles, US). Before, he was employed as a research associate from 2010–2014, and during 2010–2012 he was also affiliated with the Friedrich Miescher Laboratory of the Max Planck Society in Tübingen, where he was co-advised by Gunnar Rätsch.
He received a diploma degree (MSc equivalent) in computer engineering (Technische Informatik) from the Berlin Institute of Technology with a thesis on machine learning for computer security in 2010.

Feiyu Xu is Principal Researcher and Head of the Research Group Text Analytics in the Language Technology Lab of DFKI. She also is co-founder of Yocoy Technologies GmbH, a 2007 spin-off from DFKI. Yocoy is developing next-generation mobile language and travel guides. Since 2004, Dr. Xu is vice-director of the Joint Research Laboratory for Language Technology of Shanghai Jiao Tong University and Saarland University. Feiyu Xu studied technical translation at Tongji University in Shanghai, after having been nominated and selected with a waiver of the national admission exam in 1987. She then studied computational linguistics at Saarland University from 1992 to 1998 and graduated with a Diplom (MSc) with distinction. Her PhD thesis is about "bootstrapping relation extraction from semantic seed" in information extraction. In 2014, Feiyu Xu completed a habilitation in big text data analytics. In 2012, Feiyu Xu won a Google Focused Research Award for Natural Language Understanding as co-PI with Hans Uszkoreit and Roberto Navigli. In 2014, Feiyu Xu was honored as a DFKI Research Fellow. She has extensive experience in multilingual information systems, information extraction, text mining, big data analytics, business intelligence, question answering, and mobile applications of NLP technologies. She has successfully led more than 30 national and international research and development projects. She has broad and in-depth experience of the total cycle of innovation in her expert areas, from basic research, to application and development, and finally to products and their commercialization.
