A Critical Examination of RESCAL for Completion of Knowledge Bases with Transitive Relations


Authors: Pushpendre Rastogi, Benjamin Van Durme

Johns Hopkins University

Abstract

Link prediction in large knowledge graphs has received a lot of attention recently because of its importance for inferring missing relations and for completing and improving noisily extracted knowledge graphs. Over the years a number of machine learning researchers have presented various models for predicting the presence of missing relations in a knowledge base. Although all the previous methods are presented with empirical results that show high performance on select datasets, there is almost no previous work on understanding the connection between properties of a knowledge base and the performance of a model. In this paper we analyze the RESCAL method (Nickel et al., 2011) and show that it cannot encode asymmetric transitive relations in knowledge bases.

1 Introduction

Large-scale and highly accurate knowledge bases (KBs), such as Freebase (Bollacker et al., 2008) and YAGO2 (Hoffart et al., 2013), have come to be recognized as essential for high performance on various Natural Language Processing (NLP) tasks. Relation extraction, question answering (Dalton et al., 2014; Fader et al., 2014; Yao and Van Durme, 2014) and entity recognition/disambiguation in informal domains (Ritter et al., 2011; Zheng et al., 2012) are a few examples of tasks where KBs have proved to be invaluable. As these examples demonstrate, increasing the recall of knowledge bases without compromising their precision has a direct impact on several tasks that are the focus of NLP research.
Because of the importance of high recall in knowledge bases, and because the recall of even Freebase, the largest open-source KB, is still quite low,[1] a number of researchers have published heuristics, together with their empirical performance, for automatically inferring the information that is missing from knowledge bases. Unfortunately, the literature on theoretical analysis of these methods is still scarce.

In this paper we analyze RESCAL (Nickel et al., 2011), a widely cited method for inferring missing relations in KBs. The RESCAL method embeds the entities and relations of a KB as vectors and matrices, respectively, and it predicts the true status of an edge between two nodes using these representations. Although RESCAL was introduced in 2011 and has been shown to be effective on a variety of datasets (Toutanova et al., 2015; Nickel et al., 2011; Nickel et al., 2012), there has been no theoretical analysis of the failure modes of this method. We show, both theoretically and experimentally (Sections 2 and 3), that RESCAL is not suitable for predicting missing relations in a KB that contains transitive, asymmetric relations such as the "type of" relation, which is very important in Freebase (Guha, 2015), and the "hypernym" relation, which is important in WordNet (Miller, 1995).

† pushpendre@jhu.edu
[1] It was reported by Dong et al. (2014) in October 2013 that 71% of people in Freebase had no known place of birth and that 75% had no known nationality.

2 Analysis of RESCAL

Notation: A knowledge base contains, but is not equal to, a collection of (subject, relation, object) triples. Each triple encodes the fact that a subject entity is related to an object entity through a particular type of relation. Let 𝒱 and ℛ denote the finite sets of entities and relations, respectively. We assume that ℛ includes a type for the null relation, i.e., "no relation". Let V = |𝒱| and R = |ℛ| denote the number of entities and relations. We use v and r to denote a
generic entity and relation, respectively. The shorthand [n] denotes {x | 1 ≤ x ≤ n, x ∈ ℕ}. Let E be the number of triples known to us and let e denote a generic triple. We denote the subject, object and relation of e by e_sub ∈ 𝒱, e_obj ∈ 𝒱 and e_rel ∈ ℛ respectively, and we denote the entire collection of facts as ℰ = {e_k | k ∈ [E]}.

RESCAL: The RESCAL model associates each entity v with a vector a_v ∈ ℝ^d and represents each relation r by a matrix M_r ∈ ℝ^{d×d}. Let v and v' denote two entities whose relationship is unknown; the RESCAL model predicts the relation between v and v' to be:

    r̂ = argmax_{r ∈ ℛ} s(v, r, v')        (1)
    s(v, r, v') = a_v^T M_r a_{v'}         (2)

Note that in general if the matrix M_r is asymmetric then the score function s is also asymmetric, i.e., s(v, r, v') ≠ s(v', r, v). Let Θ = {a_v | v ∈ 𝒱} ∪ {M_r | r ∈ ℛ}. Clearly Θ parameterizes RESCAL. Therefore, even though the same embedding is used to represent an entity regardless of whether it is the first or the second entity in a relation, the RESCAL model can still handle asymmetric relations if the matrix M_r is asymmetric.

Transitive Relations and RESCAL: In addition to relational information about the binary connections between entities, many KBs contain information about the relations themselves. For example, consider the toy knowledge base depicted in Figure 1. Based on the information that Fluffy is-a Dog, that a Dog is-a Animal, and that is-a is a transitive relation, we can infer missing relations such as Fluffy is-a Animal.

Figure 1: A toy knowledge base (Fluffy → Dog → Animal → Organism) containing only is-a relations. The dashed edges indicate unobserved relations that can be recovered from the observed edges and the fact that is-a is a transitive relation.

Let us now analyze what happens when we encode a transitive, asymmetric relation with RESCAL.
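The scoring rule in Equations (1) and (2) is easy to state concretely. The following numpy sketch is our own illustration (the variable names, shapes, and random toy parameters are assumptions, not from the paper); it also shows that an asymmetric M_r makes the score direction-dependent:

```python
import numpy as np

def score(a_v, M_r, a_vp):
    """RESCAL score s(v, r, v') = a_v^T M_r a_v' (Equation 2)."""
    return a_v @ M_r @ a_vp

def predict_relation(a_v, relation_matrices, a_vp):
    """Equation 1: pick the relation whose matrix gives the highest score."""
    return max(relation_matrices, key=lambda r: score(a_v, relation_matrices[r], a_vp))

rng = np.random.default_rng(0)
d = 4
a, b = rng.normal(size=d), rng.normal(size=d)
M = rng.normal(size=(d, d))            # a generic (asymmetric) relation matrix
# With an asymmetric M the score depends on the direction of the edge:
print(score(a, M, b), score(b, M, a))  # generally two different numbers
```

Replacing M with the symmetric matrix M + M.T would make the two printed scores coincide, which is the property the analysis below exploits.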
Consider the situation where the set ℛ contains only two relations {r_0, r_1}: r_1 denotes the presence of the is-a relation and r_0 denotes its absence. The RESCAL model can follow the chain of transitive relations and infer missing edges from existing information in the graph only if, for all triples of vertices v, v', v'' in 𝒱 for which we have observed (v, is-a, v') and (v', is-a, v''), the following holds:

    s(v, r_1, v') > s(v, r_0, v') ∧ s(v', r_1, v'') > s(v', r_0, v'')
        ⟹ s(v, r_1, v'') > s(v, r_0, v'')

This can be rewritten as:

    ∀ v, v', v'' ∈ 𝒱:
    a_v^T (M_{r_1} − M_{r_0}) a_{v'} > 0 ∧ a_{v'}^T (M_{r_1} − M_{r_0}) a_{v''} > 0
        ⟹ a_v^T (M_{r_1} − M_{r_0}) a_{v''} > 0        (3)

We now define a transitive matrix and state a theorem that we prove in the Appendix.

Definition. We say that a matrix M ∈ ℝ^{d×d} is transitive if every triple of vectors a, b, c ∈ ℝ^d that satisfies a^T M b > 0 and b^T M c > 0 also satisfies a^T M c > 0.

Theorem 1. Every transitive matrix is symmetric.

If we require the constraint in Equation 3 to hold for all possible vectors, and not just a finite number of vectors, then M_{r_1} − M_{r_0} is a transitive matrix. By Theorem 1, M_{r_1} − M_{r_0} must be symmetric. This implies that if the RESCAL model predicts that s(v, r_1, v') > s(v, r_0, v') then it also predicts that s(v', r_1, v) > s(v', r_0, v). In terms of the toy KB shown in Figure 1: if the RESCAL model predicts that Fluffy is-a Animal then it also predicts that Animal is-a Fluffy. Therefore the RESCAL model is not suitable for encoding asymmetric, transitive relations.

3 Experiments

During our analysis in Section 2, we made the assumption that the constraint of Equation 3 holds over all vectors in ℝ^d rather than over just a finite number of vector triples. This assumption was used to draw conclusions about RESCAL via Theorem 1.
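The consequence derived in Section 2 can be illustrated numerically (our own sketch; the entity names echo Figure 1 and the random embeddings are assumptions): once the difference matrix D = M_{r_1} − M_{r_0} is symmetric, the score margin is identical in both directions for every pair of embeddings, so the model cannot assert Fluffy is-a Animal without also asserting Animal is-a Fluffy.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
S = rng.normal(size=(d, d))
D = S + S.T                          # any symmetric matrix, standing in for M_r1 - M_r0
a_fluffy, a_animal = rng.normal(size=d), rng.normal(size=d)

forward = a_fluffy @ D @ a_animal    # s(v, r1, v') - s(v, r0, v')
backward = a_animal @ D @ a_fluffy   # s(v', r1, v) - s(v', r0, v)
# For symmetric D the two margins agree, so both directions get the same prediction:
assert np.isclose(forward, backward)
```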
A fair criticism of our analysis is that, in practice, the RESCAL model only needs to embed a finite number of vertices into vector space, and it is possible that there exists an asymmetric matrix that can correctly make the finite number of deductions that are possible inside a finite KB. This could be especially true when the dimensionality d of the RESCAL embeddings is high. On the other hand, it is intuitive that as the number of entities in a KB increases, our assumptions and analysis become increasingly better approximations of reality. Therefore the performance of the RESCAL model should degrade as the number of entities in the KB increases while the dimensionality of the embeddings remains constant.

3.1 On Simulated Data

To test the applicability of our analysis we performed the following experiment. We started with a complete, balanced, rooted, directed binary tree T, with edges directed away from the root. We then augmented T as follows: for every pair of distinct vertices v, v' we added a new edge to T if there already existed a directed path starting at v and ending at v' in T. We stopped when we could not add any more edges without creating multi-edges. For the rest of the paper we denote this resulting set of ordered pairs of vertices as ℰ, and the ordered pairs of vertices that are not in ℰ as ℰ^c. For example, ℰ contains an edge from the root vertex to every other vertex, and ℰ^c contains an edge from every vertex to the root vertex. For a tree of depth 11, V = 2047, |ℰ| = 18,434 and |ℰ^c| = 4,171,775.

We trained the RESCAL model under two settings. In the first setting we used the entirety of ℰ and ℰ^c as training inputs to the RESCAL model; we denote this setup as FullSet. In the second setting we randomly sampled ℰ^c and selected only |ℰ| edges from it; we denote this training setup as SubSet.
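The tree-plus-transitive-closure construction can be reproduced in a few lines. The sketch below is our own (heap-style vertex numbering is an implementation choice, not the paper's); it confirms the reported counts, with E^c counted as all V^2 ordered pairs not in E, which suggests the reported 4,171,775 includes self-pairs among the non-edges:

```python
# Ancestor -> descendant reachability pairs in a complete binary tree with 11 levels.
def descendant_pairs(levels):
    n = 2 ** levels - 1                  # vertices, numbered 1..n heap-style
    pairs = set()
    for v in range(1, n + 1):
        stack = [2 * v, 2 * v + 1]       # children of v
        while stack:
            u = stack.pop()
            if u <= n:
                pairs.add((v, u))        # v reaches u, so (v, u) is in E
                stack += [2 * u, 2 * u + 1]
    return n, pairs

V, E = descendant_pairs(11)
print(V, len(E), V * V - len(E))  # 2047 18434 4171775
```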
Note, however, that all the edges in ℰ, including all the edges of the original tree, are always used during both FullSet and SubSet training.

For both the FullSet and SubSet settings we trained the RESCAL model 5 times and evaluated the models' predictions on the following three subsets of edges: ℰ, ℰ^c and ℰ^rev. ℰ and ℰ^c were introduced earlier; to recall, ℰ contains all ordered pairs of vertices that are in the transitive relation of being connected, and ℰ^c contains pairs of vertices that are not connected and not in a relation. ℰ^rev denotes the set of ordered pairs whose reverse pair exists in ℰ, i.e., ℰ^rev = {(u, v) | (v, u) ∈ ℰ}. For every edge in each of these subsets we evaluate the model's performance under 0–1 loss. For example, when we evaluate the performance of RESCAL on an edge (v, v') ∈ ℰ, we check whether the model assigns a higher score to (v, r_1, v') than to (v, r_0, v'), and reward the model with 1 point if it makes the right prediction and 0 otherwise. As before, r_1 and r_0 denote the presence and absence of a relationship between v and v'. We note that low performance on ℰ^rev together with high performance on ℰ would indicate exactly the kind of failure that we predicted from our analysis.

As explained earlier, the dimensionality of the RESCAL embedding, d, and the number of entities, V, significantly influence the performance of RESCAL; therefore we vary them and tabulate the results in Tables 1 and 2.

Table 1: Percentage accuracy of RESCAL with FullSet. Every table entry is a triple of numbers measuring the performance of RESCAL on ℰ, ℰ^c and ℰ^rev, respectively. V denotes the number of nodes in the tree and d denotes the number of dimensions used to parameterize the entities.

      d | V = 2047    | V = 4095    | V = 8191
     50 |  66 100 100 |  60 100 100 |  54 100 100
    100 |  76 100 100 |  69 100 100 |  63 100 100
    200 |  86 100 100 |  79 100 100 |  72 100 100
    400 |  94 100 100 |  88 100 100 |  81 100 100
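The 0–1 evaluation just described can be sketched as follows (our own illustrative code; the function and variable names are assumptions). A pair from E counts as correct when r_1 outscores r_0, and a pair from E^c or E^rev counts as correct in the opposite case:

```python
import numpy as np

def accuracy(pairs, A, M1, M0, positive=True):
    """0-1 accuracy: a pair scores a point when the model ranks r1 above r0
    (positive=True, for pairs in E) or r0 above r1 (positive=False, for E^c/E^rev)."""
    correct = 0
    for v, vp in pairs:
        pred_present = A[v] @ M1 @ A[vp] > A[v] @ M0 @ A[vp]
        correct += pred_present == positive
    return correct / len(pairs)

# Toy usage with random (untrained) parameters -- illustrative only.
rng = np.random.default_rng(2)
A = rng.normal(size=(4, 3))          # 4 entity embeddings, d = 3
M1, M0 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
print(accuracy([(0, 1), (1, 2)], A, M1, M0))
```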
Table 2: Percentage accuracy of RESCAL trained with SubSet. As in Table 1, every entry is a triple of accuracies on ℰ, ℰ^c and ℰ^rev.

      d | V = 2047    | V = 4095    | V = 8191
     50 | 100  93  52 | 100  91  48 | 100  89  44
    100 | 100  78  58 | 100  92  56 | 100  89  52
    200 | 100  60  72 | 100  71  61 | 100  90  59
    400 | 100  54  67 | 100  57  70 | 100  65  62

3.2 On WordNet

To test our analysis on real data we performed experiments on the WordNet dataset. WordNet contains vertices called synsets that are arranged in a tree-like hierarchy under the relation of hyponymy. For example, a dog is a hyponym of animal and an animal is a hyponym of living thing; therefore a dog is a hyponym of living thing. To conduct our experiments we extracted all the hyponyms of the living thing synset as a tree and added edges to that tree to form a transitive closure under the hyponym relation. The living thing synset contained 16,255 hyponyms, which were connected with 16,489 edges; after taking the transitive closure the number of edges became 128,241, i.e., V = 16,255 and |ℰ| = 128,241. We performed two experiments under the FullSet and SubSet protocols, in exactly the same way as described in Section 3.1, with this new graph. The results, shown in Table 3, exhibit the same trends as seen in Tables 1 and 2. See the following section for a more thorough discussion of the results.

Table 3: Results from experiments on WordNet. Every entry is a triple of accuracies on ℰ, ℰ^c and ℰ^rev. Specifically, we chose to use the subtree rooted at the living thing synset from the WordNet hierarchy; every synset in the subtree corresponds to a vertex. Consequently, for all our experiments V = 16,413.

      d | FullSet     | SubSet
     50 |  71 100 100 | 100  93  58
    100 |  79 100 100 | 100  94  60
    200 |  84 100 100 | 100  93  63
    400 |  89 100 100 | 100  68  69
4 Related Work

Most previous work on inferring missing information in knowledge bases assumes that a knowledge base is just a graph with labeled vertices and labeled edges (Nickel et al., 2016; Toutanova et al., 2015). These methods either focus on inferring which labeled edge, if any, should be used to connect two previously unconnected vertices, or they try to learn what vertex label/entity type, if any, should be used to annotate an unlabeled entity. The task of predicting missing edges in a KB, which we focus on, has previously been called Link Prediction (Liben-Nowell and Kleinberg, 2007; Nickel et al., 2011), Knowledge Base Completion (KBC) (Socher et al., 2013; West et al., 2014), or, more broadly, Relational Machine Learning (Nickel et al., 2016).[2] Besides the aforementioned papers, the following publications also present models for KBC, which we list without comment: (Bordes et al., 2011; Lao et al., 2011; Gardner and Mitchell, 2015; Lin et al., 2015; Zhao et al., 2015; Wang et al., 2015; He et al., 2015; Wei et al., 2015).

[2] The terminology used for the task of inferring missing vertex labels, which is not the focus of this paper, is even more diverse. This task has been termed Class/Labeled Instance Acquisition (Van Durme and Pasca, 2008; Talukdar and Pereira, 2010), Collective Classification (Sen et al., 2008) and Vertex Nomination (Fishkind et al., 2015).

5 Results and Discussion

Tables 1 and 2 show the performance of the RESCAL model at encoding three subsets of relational information, ℰ, ℰ^c and ℰ^rev, in increasingly large KBs with a single transitive relation, under a broad range of settings.

The results in Table 1 were obtained by feeding RESCAL ℰ ∪ ℰ^c as training data. Note that RESCAL received all possible information during training, so here we are evaluating the training accuracy of the model.
Low accuracy under this setting implies that the model does not have the capacity to learn the rules in the knowledge base. We observe that the accuracy of RESCAL decreases as the number of entities, V, increases, and increases as the dimensionality, d, increases, which is in line with our predictions. We also note that since ℰ^c is much larger than ℰ, the training objective of RESCAL favors good performance on ℰ^c; accordingly, the accuracy of RESCAL on edges in ℰ^c remains high but the performance on ℰ suffers. The high accuracy of RESCAL with V = 2047 and d = 400 suggests that with a high enough embedding dimensionality it is possible to embed a finite database with high accuracy. But increasing the dimensionality of RESCAL embeddings can become infeasible for an extremely large knowledge base. We can also observe that the performance of RESCAL degrades as the number of entities in the KB increases while the dimensionality of the embeddings remains constant.

The results in Table 2 were obtained by training RESCAL with ℰ and a subset of ℰ^c. This training method is closer to the way such embedding-based methods for KBC are usually trained (Nickel et al., 2016). We observe that the accuracy of the RESCAL model on ℰ^rev is substantially lower than its performance on either ℰ or ℰ^c, especially in the upper-triangle region of the table, where V is high and d is low. This result is in accordance with our analysis that, under the RESCAL model, if s(v, r_1, v') > s(v, r_0, v') then s(v', r_1, v) > s(v', r_0, v) as well. Our results also highlight a problem with the commonly employed KBC evaluation protocol of randomly dividing the edge set of a graph into train and test sets to measure knowledge base completion accuracy: for example, with d = 50 the average accuracy on both ℰ and ℰ^c is quite high, but accuracy on ℰ^rev is low even though ℰ^rev is a subset of ℰ^c.
Such a failure would stay undetected under existing evaluation methods.

6 Conclusions

In this paper we investigated a popular KBC algorithm named RESCAL, and through our analysis of the scoring function employed in RESCAL and our experiments on simulated data we showed that the RESCAL method does not perform well at encoding transitive, asymmetric relations; specifically, its inferences about edges that are the reverse of edges present in a knowledge base have a high chance of being incorrect. Although our analysis relied on the somewhat strong assumption that the constraint in Equation 3 holds over all points in the vector space, we showed that the insights gained were useful in practice.

One of the key ideas underlying our work is that knowledge bases should be considered as more than just graphs, since KBs also contain logical structure amongst the predicates. By taking such logical structure (e.g., the constraint that if vertex v connects to v' and v' connects to v'' then v connects to v'') to its logical extreme, we arrived at a well-founded argument about the performance of RESCAL at encoding knowledge bases with transitive relations. We believe that this idea can be gainfully used to analyze other KBC methods as well.

A Proof of Theorem 1

We note that Theorem 1 was first proven by Grinberg (2015); here we give an alternative proof.

Lemma 2. Every transitive matrix is positive semi-definite (PSD).

Proof. Given any x ∈ ℝ^d, consider the triple of vectors c := x, b := Mc, a := Mb. Then a^T(Mb) = ||Mb||^2 ≥ 0 and b^T(Mc) = ||b||^2 ≥ 0, and a^T M c = b^T M b. Either b = 0, or b ≠ 0 and Mb = 0, or both Mb ≠ 0 and b ≠ 0, in which case transitivity implies b^T M b > 0. In all three cases b^T M b ≥ 0. ∎

Lemma 3. Let M_1, M_2 ∈ ℝ^{d×d} \ {0}. If for all x, y: x^T M_1 y > 0 ⟹ x^T M_2 y > 0, then M_1 = λM_2 for some λ > 0.

We defer the proof of this technical lemma to the supplementary material submitted with the paper.

Lemma 4.
If there exist x, y such that x^T M y > 0 but x^T M^T y < 0, then M is not transitive.

Proof. Let x, y be two vectors that satisfy x^T M y > 0 and x^T M^T y < 0. Since x^T M^T y = y^T M x, it follows that y^T M(−x) > 0. If we assume M is transitive, then applying transitivity to the triple x, y, −x gives x^T M(−x) > 0, but Lemma 2 shows such an x cannot exist. ∎

Theorem 1. Every transitive matrix is symmetric.

Proof. By Lemma 4, x^T M y > 0 ⟹ x^T M^T y > 0. Using Lemma 3 we get M = λM^T for some λ > 0. Clearly λ = 1. ∎

References

[Bollacker et al. 2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1247–1250. ACM.

[Bordes et al. 2011] Antoine Bordes, Jason Weston, Ronan Collobert, and Yoshua Bengio. 2011. Learning structured embeddings of knowledge bases. In AAAI Conference on Artificial Intelligence.

[Dalton et al. 2014] Jeffrey Dalton, Laura Dietz, and James Allan. 2014. Entity query feature expansion using knowledge base links. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 365–374. ACM.

[Dong et al. 2014] Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 601–610. ACM.

[Fader et al. 2014] Anthony Fader, Luke Zettlemoyer, and Oren Etzioni. 2014. Open question answering over curated and extracted knowledge bases. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, pages 1156–1165, New York, NY, USA. ACM.
[Fishkind et al. 2015] Donniell E. Fishkind, Vince Lyzinski, Henry Pao, Li Chen, Carey E. Priebe, et al. 2015. Vertex nomination schemes for membership prediction. The Annals of Applied Statistics, 9(3):1510–1532.

[Gardner and Mitchell 2015] Matt Gardner and Tom Mitchell. 2015. Efficient and expressive knowledge base completion using subgraph feature extraction. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1488–1498, Lisbon, Portugal, September. Association for Computational Linguistics.

[Grinberg 2015] Darij Grinberg. 2015. Existence and characterization of transitive matrices? MathOverflow. URL: http://mathoverflow.net/q/212808 (version: 2015-08-01).

[Guha 2015] Ramanathan Guha. 2015. Towards a model theory for distributed representations. In AAAI Spring Symposium Series.

[He et al. 2015] Wenqiang He, Yansong Feng, Lei Zou, and Dongyan Zhao. 2015. Knowledge base completion using matrix factorization. In Web Technologies and Applications, pages 256–267. Springer.

[Hoffart et al. 2013] Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, and Gerhard Weikum. 2013. YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia. Artificial Intelligence, 194:28–61.

[Lao et al. 2011] Ni Lao, Tom Mitchell, and William W. Cohen. 2011. Random walk inference and learning in a large scale knowledge base. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11, pages 529–539, Stroudsburg, PA, USA. Association for Computational Linguistics.

[Liben-Nowell and Kleinberg 2007] David Liben-Nowell and Jon Kleinberg. 2007. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019–1031.

[Lin et al. 2015] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion.
In The 29th AAAI Conference on Artificial Intelligence, pages 2181–2187. AAAI.

[Miller 1995] George A. Miller. 1995. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41.

[Nickel et al. 2011] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2011. A three-way model for collective learning on multi-relational data. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 809–816.

[Nickel et al. 2012] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2012. Factorizing YAGO: scalable machine learning for linked data. In Proceedings of the 21st International Conference on World Wide Web, pages 271–280. ACM.

[Nickel et al. 2016] Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich. 2016. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 104(1):11–33, January.

[Ritter et al. 2011] Alan Ritter, Sam Clark, Oren Etzioni, et al. 2011. Named entity recognition in tweets: an experimental study. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1524–1534. Association for Computational Linguistics.

[Sen et al. 2008] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-Rad. 2008. Collective classification in network data. AI Magazine, 29(3):93.

[Socher et al. 2013] Richard Socher, Danqi Chen, Christopher D. Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems, pages 926–934.

[Talukdar and Pereira 2010] Partha Pratim Talukdar and Fernando Pereira. 2010. Experiments in graph-based semi-supervised learning methods for class-instance acquisition. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1473–1481. Association for Computational Linguistics.
[Toutanova et al. 2015] Kristina Toutanova, Danqi Chen, Patrick Pantel, Hoifung Poon, Pallavi Choudhury, and Michael Gamon. 2015. Representing text for joint embedding of text and knowledge bases. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1499–1509, Lisbon, Portugal, September. Association for Computational Linguistics.

[Van Durme and Pasca 2008] Benjamin Van Durme and Marius Pasca. 2008. Finding cars, goddesses and enzymes: Parametrizable acquisition of labeled instances for open-domain information extraction. In AAAI, volume 8, pages 1243–1248.

[Wang et al. 2015] Quan Wang, Bin Wang, and Li Guo. 2015. Knowledge base completion using embeddings and rules. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI '15, pages 1859–1865. AAAI Press.

[Wei et al. 2015] Zhuoyu Wei, Jun Zhao, Kang Liu, Zhenyu Qi, Zhengya Sun, and Guanhua Tian. 2015. Large-scale knowledge base completion: Inferring via grounding network sampling over selected instances. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, CIKM '15, pages 1331–1340, New York, NY, USA. ACM.

[West et al. 2014] Robert West, Evgeniy Gabrilovich, Kevin Murphy, Shaohua Sun, Rahul Gupta, and Dekang Lin. 2014. Knowledge base completion via search-based question answering. In WWW.

[Yao and Van Durme 2014] Xuchen Yao and Benjamin Van Durme. 2014. Information extraction over structured data: Question answering with Freebase. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 956–966, Baltimore, Maryland, June. Association for Computational Linguistics.

[Zhao et al. 2015] Yu Zhao, Sheng Gao, Patrick Gallinari, and Jun Guo. 2015. Knowledge base completion by learning pairwise-interaction differentiated embeddings.
Data Mining and Knowledge Discovery, 29(5):1486–1504.

[Zheng et al. 2012] Zhicheng Zheng, Xiance Si, Fangtao Li, Edward Y. Chang, and Xiaoyan Zhu. 2012. Entity disambiguation with Freebase. In Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01, pages 82–89. IEEE Computer Society.

Supplementary Material

Before proving Lemma 3, let us present its analogue for vectors.

Lemma 5. Let x, y ∈ ℝ^d \ {0}. If there is no z ∈ ℝ^d such that x^T z > 0 and y^T z < 0, then x = λy for some λ > 0.

Proof. If x = λy then x^T y = λ y^T y; since y^T y > 0, it follows that λ > 0. In the case that x ≠ λy, the Cauchy–Schwarz inequality gives D := (x^T y)^2 − (x^T x)(y^T y) ≠ 0. Consider the vector z = αx + βy with α = −(x^T y + y^T y)/D and β = (x^T y + x^T x)/D. It is easy to check that (αx + βy)^T x = 1 and (αx + βy)^T y = −1, which contradicts the hypothesis. ∎

Lemma 3. Let M_1, M_2 ∈ ℝ^{d×d} \ {0}. If for all x, y: x^T M_1 y > 0 ⟹ x^T M_2 y > 0, then M_1 = λM_2 for some λ > 0.

Proof. Choose an x ∈ ℝ^d for which x^T M_1 ≠ 0; if no such x exists then M_1 = 0, in contradiction to the hypothesis. Note that if x^T M_1 y ≠ 0 then either x^T M_1 y or x^T M_1 (−y) is positive. Since (x^T M_1) y > 0 ⟹ (x^T M_2) y > 0, there is no y for which (x^T M_1) y > 0 but (x^T M_2) y < 0. By Lemma 5, x^T M_1 = λ_x x^T M_2. Furthermore, from the proof of Lemma 5, λ_x = (x^T M_1 M_2^T x)/(x^T M_2 M_2^T x); therefore λ_x is continuous with respect to x. Now we prove that λ_x is constant. Consider vectors x and αx. As shown earlier, (αx)^T M_1 = λ_{αx} (αx)^T M_2. But (αx)^T M_1 = α(x^T M_1) = α λ_x x^T M_2. Therefore λ_{αx} = λ_x. Since λ_x is continuous, λ_{αx} equals the constant λ_0. This implies x^T(M_1 − λ_0 M_2) = 0. Clearly λ = λ_0 > 0. ∎
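Lemma 4 is easy to witness numerically. The following sketch (our own, using a hand-constructed skew-symmetric 2×2 matrix) exhibits x, y with x^T M y > 0 while x^T M^T y < 0; the triple (x, y, −x) then violates the definition of transitivity, exactly as in the proof:

```python
import numpy as np

M = np.array([[0.0, 1.0], [-1.0, 0.0]])   # skew-symmetric, hence asymmetric
x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# Hypothesis of Lemma 4: x^T M y > 0 but x^T M^T y < 0.
assert x @ M @ y > 0 and x @ M.T @ y < 0
# The witnessing triple (x, y, -x): x "relates to" y, and y "relates to" -x, ...
assert x @ M @ y > 0
assert y @ M @ (-x) > 0
# ... so a transitive M would need x^T M (-x) > 0, which fails (it is 0 here).
assert not (x @ M @ (-x) > 0)
```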
