Detecting Intentional Lexical Ambiguity in English Puns

Reading time: 6 minutes

📝 Original Info

  • Title: Detecting Intentional Lexical Ambiguity in English Puns
  • ArXiv ID: 1707.05468
  • Date: 2017-07-19
  • Authors: Mikhalkova E. V., Karyakin Yu. E.

📝 Abstract

The article describes a model for the automatic analysis of puns, in which a word (the target word) is intentionally used in two meanings at the same time. We employ Roget's Thesaurus to discover two groups of words which, in a pun, form around two abstract bits of meaning (semes). These groups are turned into a semantic vector, on which an SVM classifier learns to recognize puns, reaching an F-measure of 0.73. We apply several rule-based methods to locate intentionally ambiguous (target) words, based on structural and semantic criteria. The structural criterion appears to be more effective, although this may be a property of the tested dataset only. Our results correlate with the results of other teams at the SemEval-2017 competition (Task 7: Detection and Interpretation of English Puns) with regard to the effects of using supervised learning models and word statistics.

📄 Full Content

Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue 2017”, Moscow, May 31–June 3, 2017

Detecting Intentional Lexical Ambiguity in English Puns

Mikhalkova E. V. (e.v.mikhalkova@utmn.ru), Karyakin Yu. E. (y.e.karyakin@utmn.ru)
Tyumen State University, Tyumen, Russia

The article describes a model for the automatic analysis of puns, in which a word (the target word) is intentionally used in two meanings at the same time. We employ Roget’s Thesaurus to discover two groups of words which, in a pun, form around two abstract bits of meaning (semes). These groups are turned into a semantic vector, on which an SVM classifier learns to recognize puns, reaching an F-measure of 0.73. We apply several rule-based methods to locate intentionally ambiguous (target) words, based on structural and semantic criteria. The structural criterion appears to be more effective, although this may be a property of the tested dataset only. Our results correlate with the results of other teams at the SemEval-2017 competition (Task 7: Detection and Interpretation of English Puns) with regard to the effects of using supervised learning models and word statistics.

Keywords: lexical ambiguity, pun, computational humor, thesaurus

Russian-language title (translated): Recognition of Intentional Lexical Ambiguity in English Puns. Mikhalkova E. V., Karyakin Yu. E., Tyumen State University (Federal State Autonomous Educational Institution of Higher Education), Tyumen, Russia.

1. Concerning puns

Computational humor is a branch of computational linguistics that developed rapidly in the 1990s. Its two main goals are the interpretation and the generation of all kinds of humor.¹ Recently we have noticed a new rise of attention to this research area, especially concerning the analysis of short genres like tweets [Davidov et al. 2010; Reyes et al. 2013; Castro et al. 2016]. Furthermore, a number of tasks at SemEval-2017 (an annual event organized by the Association for Computational Linguistics) were about analyzing short funny utterances, such as humorous tweets (Task 6: #HashtagWars: Learning a Sense of Humor) and puns (Task 7: Detection and Interpretation of English Puns). This article is an extended review of the algorithm that we used for pun recognition in SemEval, Task 7.

¹ In [Mikhalkova 2010] we gave a brief account of the main trends in computational humor up to 2010.

In [Miller et al. 2015], Tristan Miller and Iryna Gurevych give a comprehensive account of what has already been done in the automatic recognition of puns. They note that the study of puns has mainly focused on phonological and syntactic, rather than semantic, interpretation. At present, the problem of intentional lexical ambiguity is viewed more as a WSD task, the solution of which is not only helpful in detecting humor but can also provide new algorithms of sense evaluation for other NLP systems.

The following terminology is basic to our research on puns. A pun is a) a short humorous genre, where a word or phrase is used intentionally in two meanings, b) a means of expression, the essence of which is to use a word or phrase so that in the given context it can be understood in two meanings simultaneously. A target word is a word used in a pun in two meanings. A homographic pun is a pun that “exploits distinct meanings of the same written word” [Miller et al. 2015] (these can be meanings of a polysemantic word, or homonyms, including homonymic word forms). A heterographic pun is a pun in which the target word resembles another word or phrase in spelling; we will call the latter the second target word. More data on the classification of puns and elaborated examples can be found in [Hempelmann 2004].

(1) I used to be a banker, but I lost interest.

Ex. 1 (the Banker joke) is a homographic pun; “interest” is the target word.

(2) When the church bought gas for their annual barbecue, proceeds went from the sacred to the propane.

Ex. 2 (the Church joke) is a heterographic pun; “propane” is the target word, “profane” is the second target word.

Our model of automatic pun detection is based on the following premise: in a pun, there are two groups of words and their meanings that indicate the two meanings in which the target word or phrase is used. These groups overlap, i.e. they contain the same words, used in different meanings.

In Ex. 1, the words and collocations “banker” and “lost interest” point at the professional status of the narrator and his/her career failure. At the same time, “used to” and “lost interest” tell a story of losing emotional attachment to the profession: the narrator lost curiosity. We propose an algorithm of homographic pun recognition that discovers these two groups of words and collocations, based on the common semes² which the words in these groups share. When the groups are found, in homographic puns, the nex

…(Full text truncated)…
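
The premise stated in Section 1 (two overlapping groups of words, each organized around a shared seme) can be illustrated with a small Python sketch. Because the full text is truncated, this is not the authors' procedure: the mini-thesaurus, its category labels ("finance", "feeling", "habit"), and the function names are all hypothetical stand-ins, and Roget's Thesaurus itself is not used. The sketch only shows how counting shared categories could yield two word groups, their overlap, and a count vector of the kind that might be passed to a classifier.

```python
from collections import Counter

# Hypothetical mini-thesaurus: word -> set of coarse category labels.
# The labels are invented for illustration; they are NOT Roget's sections.
TOY_THESAURUS = {
    "banker": {"finance"},
    "lost": {"finance", "feeling"},
    "interest": {"finance", "feeling"},
    "used": {"habit"},
}


def category_counts(tokens, thesaurus):
    """Count how many tokens fall under each thesaurus category."""
    counts = Counter()
    for token in tokens:
        for category in thesaurus.get(token, ()):
            counts[category] += 1
    return counts


def two_seme_groups(tokens, thesaurus):
    """Return the two most frequent categories with the tokens they cover."""
    counts = category_counts(tokens, thesaurus)
    # Sort by descending count, then by name, so ties resolve deterministically.
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    return {
        category: [t for t in tokens if category in thesaurus.get(t, ())]
        for category in ranked[:2]
    }


def semantic_vector(tokens, thesaurus, categories):
    """Build a fixed-order count vector over the given categories."""
    counts = category_counts(tokens, thesaurus)
    return [counts[c] for c in categories]


# Ex. 1 (the Banker joke), lowercased and stripped of punctuation.
tokens = "i used to be a banker but i lost interest".split()

groups = two_seme_groups(tokens, TOY_THESAURUS)
group_sets = [set(words) for words in groups.values()]
# Words that belong to both groups are candidates for the ambiguous part.
overlap = set.intersection(*group_sets) if group_sets else set()
vector = semantic_vector(tokens, TOY_THESAURUS, ["feeling", "finance", "habit"])

print(groups)
print(overlap)
print(vector)
```

In this toy setup the two groups share the words "lost" and "interest", which mirrors the paper's observation that "lost interest" belongs to both readings of the Banker joke. With a real thesaurus the category inventory would be far larger, and the vector would be one training instance for a classifier such as the SVM mentioned in the abstract.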

Reference

This content is AI-processed based on ArXiv data.