Computational Linguistics and Intellectual Technologies:
Proceedings of the International Conference “Dialogue 2017”
Moscow, May 31—June 3, 2017
Detecting Intentional Lexical
Ambiguity in English Puns
Mikhalkova E. V. (e.v.mikhalkova@utmn.ru),
Karyakin Yu. E. (y.e.karyakin@utmn.ru)
Tyumen State University, Tyumen, Russia
The article describes a model for the automatic analysis of puns, in which
a word (the target word) is intentionally used in two meanings at the same
time. We employ Roget’s Thesaurus to discover two groups of words that,
in a pun, form around two abstract bits of meaning (semes). These groups are
turned into a semantic vector, on which an SVM classifier learns to recognize
puns, reaching an F-measure of 0.73. We apply several rule-based methods,
using structural and semantic criteria, to locate the intentionally ambiguous
(target) words. The structural criterion appears to be more effective, although
this may be a property of the tested dataset only. Our results are consistent
with those of other teams at the SemEval-2017 competition (Task 7: Detection
and Interpretation of English Puns) regarding the effects of using supervised
learning models and word statistics.
Keywords: lexical ambiguity, pun, computational humor, thesaurus
Recognizing Intentional
Lexical Ambiguity
in English Puns
Mikhalkova E. V. (e.v.mikhalkova@utmn.ru),
Karyakin Yu. E. (y.e.karyakin@utmn.ru)
Federal State Autonomous Educational Institution of Higher Education
“Tyumen State University”, Tyumen, Russia
1. Concerning puns
Computational humor is a branch of computational linguistics that developed
rapidly in the 1990s. Its two main goals are the interpretation and generation
of all kinds of humor.¹ Recently, attention to this research area has risen
again, especially in the analysis of short genres such as tweets [Davidov et al.
2010; Reyes et al. 2013; Castro et al. 2016]. Furthermore, a number of tasks
at SemEval-2017 (an annual event organized by the Association for Computational
Linguistics) were about analyzing short funny utterances, such as humorous
tweets (Task 6: #HashtagWars: Learning a Sense of Humor) and puns (Task 7:
Detection and Interpretation of English Puns). This article is an extended
review of the algorithm that we used for pun recognition in SemEval Task 7.
In [Miller et al. 2015], Tristan Miller and Iryna Gurevych give a comprehensive
account of what has already been done in the automatic recognition of puns.
They note that the study of puns has focused mainly on phonological and
syntactic, rather than semantic, interpretation. At present, the problem
of intentional lexical ambiguity is viewed more as a word sense disambiguation
(WSD) task; solving it not only helps to detect humor but can also provide new
sense evaluation algorithms for other NLP systems.
The following terminology is basic to our research on puns. A pun is a) a short
humorous genre in which a word or phrase is used intentionally in two meanings;
b) a means of expression, the essence of which is to use a word or phrase
so that, in the given context, it can be understood in two meanings
simultaneously. A target word is a word used in a pun in two meanings.
A homographic pun is a pun that “exploits distinct meanings of the same written
word” [Miller et al. 2015] (these can be meanings of a polysemantic word,
or homonyms, including homonymic word forms). A heterographic pun is a pun
in which the target word resembles another word or phrase in spelling; we will
call the latter the second target word. More data on the classification of puns,
with elaborated examples, can be found in [Hempelmann 2004].
(1) I used to be a banker, but I lost interest.
Ex. 1 (the Banker joke) is a homographic pun; “interest” is the target word.
(2) When the church bought gas for their annual barbecue, proceeds went from the
sacred to the propane.
Ex. 2 (the Church joke) is a heterographic pun; “propane” is the target word,
“profane” is the second target word.
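The terminology above can be captured in a small data structure. The following
is a minimal sketch and not part of the original system; the class name `Pun`
and its fields are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pun:
    """Hypothetical container for one annotated pun (illustration only)."""
    text: str
    target: str                          # the word used in two meanings
    second_target: Optional[str] = None  # set only for heterographic puns

    @property
    def kind(self) -> str:
        # By the definitions above, a pun without a second target word
        # is homographic; otherwise it is heterographic.
        return "homographic" if self.second_target is None else "heterographic"


banker = Pun("I used to be a banker, but I lost interest.", target="interest")
church = Pun(
    "When the church bought gas for their annual barbecue, "
    "proceeds went from the sacred to the propane.",
    target="propane",
    second_target="profane",
)
```

Under these assumptions, `banker.kind` yields "homographic" and `church.kind`
yields "heterographic", matching Ex. 1 and Ex. 2.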
Our model of automatic pun detection is based on the following premise: a pun
contains two groups of words, each of which indicates one of the two meanings
in which the target word or phrase is used. These groups overlap, i.e. they
contain the same words, used in different meanings.
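This premise can be illustrated with a toy sketch. It rests on loudly stated
assumptions: the dictionary `TOY_THESAURUS` is a hand-made stand-in, not real
Roget’s Thesaurus data, and the category labels ("MONEY", "CURIOSITY") and the
functions `group_by_seme` and `overlap` are hypothetical names. The sketch only
shows how words that share a category form a group and how two such groups can
overlap; it is not the authors' implementation.

```python
from collections import defaultdict

# Hand-made stand-in for thesaurus categories (NOT actual Roget's data).
TOY_THESAURUS = {
    "banker": {"MONEY"},
    "interest": {"MONEY", "CURIOSITY"},
    "lost": {"MONEY", "CURIOSITY"},
    "used": {"CURIOSITY"},
}


def group_by_seme(tokens, thesaurus):
    """Map each category (seme) to the set of tokens listed under it."""
    groups = defaultdict(set)
    for token in tokens:
        # Tokens missing from the toy thesaurus contribute to no group.
        for seme in thesaurus.get(token, ()):
            groups[seme].add(token)
    return dict(groups)


def overlap(groups):
    """Return the tokens that belong to more than one group."""
    counts = defaultdict(int)
    for members in groups.values():
        for token in members:
            counts[token] += 1
    return {token for token, n in counts.items() if n > 1}


tokens = "i used to be a banker but i lost interest".split()
groups = group_by_seme(tokens, TOY_THESAURUS)
shared = overlap(groups)
```

With this toy data, both groups contain "lost" and "interest", so `shared`
holds exactly the words used in two meanings in the Banker joke. How a single
target word is then chosen among the overlapping candidates is not shown here.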
¹ In [Mikhalkova 2010] we gave a brief account of the main trends in computational humor up to 2010.
In Ex. 1, the words and collocations “banker” and “lost interest” point at the
professional status of the narrator and his/her career failure. At the same
time, “used to” and “lost interest” tell a story of losing emotional attachment
to the profession: the narrator lost curiosity. We propose an algorithm
of homographic pun recognition that discovers these two groups of words and
collocations, based on the common semes² that the words in these groups share.
When the groups are found, in homographic puns, the nex
…(Full text truncated)…