Stepwise Acquisition of Dialogue Act Through Human-Robot Interaction


📝 Original Info

  • Title: Stepwise Acquisition of Dialogue Act Through Human-Robot Interaction
  • ArXiv ID: 1810.09949
  • Date: 2023-06-15
  • Authors: Qureshi, Nakamura, Yoshikawa, Ishiguro

📝 Abstract

A dialogue act (DA) represents the meaning of an utterance at the illocutionary force level (Austin 1962), such as a question, a request, or a greeting. Since DAs carry the most fundamental part of communication, we believe that elucidating the DA learning mechanism is important for cognitive science and artificial intelligence. The purpose of this study is to verify that scaffolding takes place when a human teaches a robot, and to let a robot learn to estimate DAs and to respond based on them step by step, utilizing scaffolding provided by a human. To realize this, the robot must detect changes in the utterances and rewards given by its partner and continue learning accordingly. Experimental results demonstrated that participants who continued interaction for a sufficiently long time often provided scaffolding for the robot. Although the number of experiments is still insufficient for a definite conclusion, we observed that 1) the robot quickly learned to respond to DAs in most cases if the participants spoke only utterances that matched the situation, 2) for participants who built scaffolding differently from what we assumed, learning did not proceed quickly, and 3) the robot could learn to estimate DAs almost exactly if the participants kept up the interaction for a sufficiently long time, even if the scaffolding was unexpected.

📄 Full Content

A dialogue act (DA) represents the meaning of an utterance at the illocutionary force level [1], for example, questions, requests, and greetings. Understanding DAs is the first step toward intention comprehension and is essential for the agent to properly respond to people, and hence many studies on DA classification have been done. Although many classification methods have been tried [2,3], recently, the technique using deep learning shows good performance [4][5][6] due to the rapid progress of the technology.

On the other hand, from the viewpoint of requiring less computation, DA classification based solely on function words was reported [7]. Though function words such as articles, prepositions, determiners etc. are not important for information retrieval purposes, they contain sufficient information for DA classification. For example, questions often have distinguishing features such as an interrogative (what, who) or an auxiliary verb (can, is) at the beginning. Compared to the above-mentioned conventional research, the problem setting of this research differs in the following four points [8]:
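The idea above can be sketched as a tiny rule-based classifier. This is an illustrative example, not the method of [7]: the cue lists and DA labels here are hypothetical, chosen only to show how leading function words alone can signal a dialogue act.

```python
# Illustrative sketch (not the paper's method): guessing a dialogue act
# from the leading function word of an English utterance.
# The cue sets below are assumptions for demonstration purposes.

INTERROGATIVES = {"what", "who", "where", "when", "why", "how"}
AUXILIARIES = {"can", "could", "is", "are", "do", "does", "will", "would"}

def classify_da(utterance: str) -> str:
    """Estimate a DA from the first word of an utterance."""
    words = utterance.lower().strip("?!. ").split()
    if not words:
        return "other"
    first = words[0]
    if first in INTERROGATIVES or first in AUXILIARIES:
        return "question"
    if first == "please":
        return "request"
    if first in {"hello", "hi"}:
        return "greeting"
    return "other"

print(classify_da("Can you pass the salt?"))  # question
```

Note that the content words ("salt") are never inspected: only function words at the utterance boundary drive the decision, which is what keeps the computation cheap.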

  1. While previous research dealt with supervised learning, this research deals with learning not from labeled examples but from rewards.

  2. In conventional research, a lot of examples were batch processed, but in this research, a robot processes relatively few examples incrementally.

  3. Conventional research aimed at the classification of DAs, but the purpose of this research is to enable a robot to respond appropriately according to DAs.

  4. The way words and rewards are given to the learner varies because of scaffolding.

Because the problem to solve is more difficult due to the differences in settings 1 to 4 above, we adopt a simple approach that estimates DAs only from function words. Furthermore, since we deal with interactions in Japanese in this research, we can estimate DAs almost exactly by focusing only on sentence-final particles, which are one type of function word. This is because, in Japanese, the mental attitude of the speaker is usually expressed by sentence-final particles. For example, a question, which is one DA, is marked by the sentence-final particle ka.
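A minimal sketch of this particle-based estimation, under the assumption of romanized input, might look as follows. Only the mapping for ka comes from the text above; the other particle-to-DA entries are illustrative assumptions, not the paper's actual table.

```python
# Hypothetical sketch: estimating a DA from the Japanese sentence-final
# particle, following the observation that "ka" marks a question.
# Entries other than "ka" are illustrative assumptions.

PARTICLE_TO_DA = {
    "ka": "question",      # e.g., "Kore wa ringo desu ka" (Is this an apple?)
    "ne": "confirmation",  # assumed mapping for demonstration
    "yo": "assertion",     # assumed mapping for demonstration
}

def estimate_da(utterance: str) -> str:
    """Inspect only the final token of a romanized Japanese utterance."""
    tokens = utterance.lower().rstrip("?!. ").split()
    if tokens and tokens[-1] in PARTICLE_TO_DA:
        return PARTICLE_TO_DA[tokens[-1]]
    return "statement"
```

Because the particle sits at a fixed position (the end of the sentence), no parsing or content-word analysis is needed, which suits the incremental, low-computation setting of this study.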

Scaffolding is a process that enables a child or novice to solve a problem, carry out a task or achieve a goal which would be beyond his unassisted efforts [9]. In this paper, we consider changes provided by a teacher to help a learner as scaffolding. As an example of scaffolding, Roy, Frank, and Roy showed that caregivers gradually decreased the length of their utterances containing a particular word type up to the moment of birth of that word, and then gradually increased complexity [10]. We assume that scaffolding will take place even when a person teaches a robot, not a child.

Recently, Qureshi, Nakamura, Yoshikawa, and Ishiguro applied deep Q-learning to the problem of a robot learning social actions (wait, look towards human, wave hand, and handshake) through human-robot interaction [11]. They reported that the robot interpreted human behavior through intention-depicting factors (e.g., human body language, walking trajectory, or any ongoing activity). Hermann et al. presented an agent that learns to interpret language in a simulated 3D environment, where it is rewarded for the successful execution of written instructions [12]. They trained a single agent to execute phrasal instructions pertaining to multiple tasks by employing a curriculum training regime. Both [11] and [12] proposed learning systems based on reinforcement learning, as does this study. [12] used curriculum learning, which can be regarded as a kind of scaffolding. The uniqueness of this study is that the scaffolds were not carefully prepared by researchers but were provided naturally by participants.
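The kind of reward-driven, incremental learning described in points 1 and 2 and in the cited reinforcement-learning work can be sketched with a single-step (bandit-style) value update. This is not the paper's actual model: the DA label, action set, and learning parameters below are all illustrative assumptions.

```python
# Minimal sketch of learning a DA-to-response policy from human rewards,
# in the spirit of the reinforcement-learning setups cited above.
# States, actions, and hyperparameters are illustrative assumptions.
import random
from collections import defaultdict

random.seed(0)  # fixed seed so the simulated run is reproducible

ACTIONS = ["answer", "hand_over", "greet_back"]
ALPHA, EPSILON = 0.5, 0.1  # learning rate, exploration rate

q = defaultdict(float)  # (estimated_da, action) -> learned value

def choose(da: str) -> str:
    """Epsilon-greedy choice of a response to the estimated DA."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(da, a)])

def update(da: str, action: str, reward: float) -> None:
    # Single-step update: no next state, so there is no bootstrap term.
    q[(da, action)] += ALPHA * (reward - q[(da, action)])

# Simulated interaction: the human gives +1 when a question is answered.
for _ in range(50):
    a = choose("question")
    update("question", a, 1.0 if a == "answer" else 0.0)
```

After a few dozen rewarded interactions, the value of "answer" for the DA "question" approaches 1, so the greedy policy responds appropriately; this processes one example at a time from scalar rewards, mirroring differences 1 and 2 from the supervised, batch-processing setting.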

The purpose of this study is to let a robot learn to estimate DAs and to respond based on them, utilizing scaffolds. In the following sections, we report the setting of the first experiment, the robot's first learning model, and the results of the experiment. We then discuss the limitations of the first learning model and propose a revised model, which lets the robot learn step by step utilizing scaffolds provided by a human. Finally, the results of the second experiment are described. Although a summary of the first experiment, the first learning model, and its results has been reported in a short paper [8], these are described fully in Section II because [8] could not explain the learning model in detail and showed only a small part of the results.

In this section, we will describe the experimental setting of the first experiment. The human-robot interaction is performed in the following procedure [8]:

  1. A participant puts one of the fruits (an apple or a banana) in front of a robot. The robot recognizes the fruit with its camera.

  2. The participant speaks to the robot. Participant’s utterances are limited to a c

Reference

This content is AI-processed based on open access ArXiv data.
