ROOT13: Spotting Hypernyms, Co-Hyponyms and Randoms
Enrico Santus*, Alessandro Lenci§, Tin-Shing Chiu*, Qin Lu*, Chu-Ren Huang*
* The Hong Kong Polytechnic University, Hong Kong
esantus@gmail.com, cstschiu@comp.polyu.edu.hk, {qin.lu, churen.huang}@polyu.edu.hk
§ University of Pisa, Italy
alessandro.lenci@ling.unipi.it
Abstract
In this paper, we describe ROOT13, a supervised system for the classification of hypernyms, co-hyponyms and random words. The system relies on a Random Forest algorithm and 13 unsupervised corpus-based features. We evaluate it with a 10-fold cross-validation on 9,600 pairs, equally distributed among the three classes and involving several parts of speech (i.e. adjectives, nouns and verbs). When all the classes are present, ROOT13 achieves an F1 score of 88.3%, against a baseline of 57.6% (vector cosine). When the classification is binary, ROOT13 achieves the following results: hypernyms-co-hyponyms (93.4% vs. 60.2%), hypernyms-random (92.3% vs. 65.5%) and co-hyponyms-random (97.3% vs. 81.5%). Our results are competitive with state-of-the-art models.
Introduction and Related Work
Distinguishing hypernyms (e.g. dog-animal) from co-hyponyms (e.g. dog-cat) and, in turn, discriminating both from random words (e.g. dog-fruit) is a fundamental task in Natural Language Processing (NLP). Hypernymy in fact represents a key organizing principle of semantic memory (Murphy, 2002), the backbone of taxonomies and ontologies, and one of the crucial inferences supporting lexical entailment (Geffet and Dagan, 2005). Co-hyponymy (or coordination), on the other hand, is the relation holding between words that share a close hypernym and are therefore attributionally similar (Weeds et al., 2014).
The ability to discriminate hypernymy and co-hyponymy from random word pairs has a wide range of potential applications, including automatic thesaurus creation, paraphrasing, textual entailment and sentiment analysis (Weeds et al., 2014). For this reason, in the last decades, numerous methods, datasets and shared tasks have been proposed to improve computers' performance on this discrimination, generally achieving promising results (Weeds et al., 2014; Rimell, 2014; Geffet and Dagan, 2005). Both supervised and unsupervised approaches have been investigated. The former have been shown to outperform the latter by Weeds et al. (2014), even though Levy et al. (2015) have recently claimed that supervised methods may merely learn whether a term y is a prototypical hypernym, regardless of its actual relation with a term x.
In this paper, we propose a supervised method based on a Random Forest algorithm and 13 corpus-based features. In our evaluation, carried out using 10-fold cross-validation on 9,600 pairs, we achieved an accuracy of 88.3% when the three classes are present, and of 92.3% and 97.3% when only two classes are present. Such results are competitive with the state-of-the-art (Weeds et al., 2014).
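As a rough illustration of this evaluation setup, the following sketch runs a 10-fold cross-validation of a Random Forest over precomputed feature vectors. It uses scikit-learn as a stand-in for the Weka implementation actually used in the paper; the .npy file names and label encoding are hypothetical.

```python
# Sketch of the evaluation setup: 10-fold cross-validation of a Random
# Forest over 13-dimensional feature vectors, one vector per word pair.
# scikit-learn stands in for the Weka implementation used in the paper;
# the file names and label encoding below are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X = np.load("root13_features.npy")  # shape (9600, 13): one row per pair
y = np.load("root13_labels.npy")    # 0 = hypernym, 1 = co-hyponym, 2 = random

clf = RandomForestClassifier(random_state=0)  # default settings, as in the paper
scores = cross_val_score(clf, X, y, cv=10, scoring="f1_macro")
print("mean F1 across 10 folds: %.3f" % scores.mean())
```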
Method and Evaluation
ROOT13 uses the Random Forest algorithm implemented in Weka (Breiman, 2001), with the default settings. It relies on the 13 features described below. Each of them is automatically extracted from a window-based Vector Space Model (VSM), built on a combination of the ukWaC and WaCkypedia corpora (around 2.7 billion words) and recording word co-occurrences within the 5 nearest content words to the left and right of each target.
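The following is a minimal sketch of how such a window-based VSM can be populated, assuming the corpus has already been tokenized and filtered down to content words; the toy sentence is only for illustration.

```python
# Minimal sketch of window-based co-occurrence counting: each target is
# paired with the 5 nearest content words on its left and right. Assumes
# the input is already tokenized and filtered to content words.
from collections import defaultdict

def count_cooccurrences(content_words, window=5):
    counts = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(content_words):
        lo = max(0, i - window)
        hi = min(len(content_words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[target][content_words[j]] += 1
    return counts

toy = ["dog", "bark", "loudly", "cat", "chase", "mouse", "garden"]
vsm = count_cooccurrences(toy)
print(dict(vsm["dog"]))  # co-occurrence counts for the target "dog"
```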
FEATURES. The feature set was designed to identify several distributional properties characterizing the terms in the pairs. On top of standard features (e.g. vector cosine, co-occurrence and frequencies), we have added several features capturing the generality of the terms and of their contexts (generality is measured as in Santus et al. (2014a)), plus two unsupervised measures for capturing similarity (Santus et al., 2014b-c). All the features are normalized to the range 0-1 (a sketch of several of them is given after the list):
• Cos: vector cosine (Turney and Pantel, 2010);
• Cooc: co-occurrence frequency;
• Freq 1, 2: two features storing the frequencies of the terms;
• Entr 1, 2: two features storing the entropy of the terms;
• Shared: extent of the intersection between the top 1k most mutually related contexts of the two terms;
• APSyn: for every context in the intersection between the top 1k most mutually related contexts of the two terms, this measure adds 1, divided by the context's average rank (Santus et al., 2014b-c);
• Diff Freqs: difference between the terms' frequencies;
• Diff Entrs: difference between the terms' entropies;
• C-Freq 1, 2: two features storing the average frequency among the top 1k most mutually related contexts for each term;
• C-Entr 1, 2: two features storing the average entropy among the top 1k most mutually related contexts for each term.
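To make the feature definitions concrete, here is a hedged sketch of four of them (Cos, Entr, Shared, APSyn), computed from a {word: {context: weight}} mapping such as the one built above. Ranking contexts by raw weight approximates the paper's "most mutually related contexts"; the exact association measure and the 0-1 normalization step are assumptions left out here.

```python
# Hedged sketch of four of the features above, over vectors stored as
# {context: weight} dicts. Sorting by raw weight approximates the paper's
# "most mutually related contexts"; the exact association measure and the
# 0-1 normalization step are omitted as assumptions.
import math

def top_contexts(vec, n=1000):
    """Top-n contexts of a vector, ranked by descending weight."""
    return [c for c, _ in sorted(vec.items(), key=lambda kv: -kv[1])[:n]]

def cos(v1, v2):
    """Cos: cosine similarity between the two term vectors."""
    dot = sum(w * v2.get(c, 0.0) for c, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def entr(vec):
    """Entr: Shannon entropy of a term's context distribution."""
    total = float(sum(vec.values()))
    return -sum((w / total) * math.log(w / total, 2)
                for w in vec.values() if w > 0)

def shared(v1, v2, n=1000):
    """Shared: size of the intersection of the top-n context lists."""
    return len(set(top_contexts(v1, n)) & set(top_contexts(v2, n)))

def apsyn(v1, v2, n=1000):
    """APSyn: sum of 1 / average-rank over the shared top-n contexts."""
    r1 = {c: i + 1 for i, c in enumerate(top_contexts(v1, n))}
    r2 = {c: i + 1 for i, c in enumerate(top_contexts(v2, n))}
    return sum(2.0 / (r1[c] + r2[c]) for c in set(r1) & set(r2))
```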