Bootstrapping Intrinsically Motivated Learning with Human Demonstrations
📝 Abstract
This paper studies the coupling of internally guided learning and social interaction, and more specifically the improvement, owing to demonstrations, of learning by intrinsic motivation. We present Socially Guided Intrinsic Motivation by Demonstration (SGIM-D), an algorithm for learning in continuous, unbounded and non-preset environments. After introducing social learning and intrinsic motivation, we describe the design of our algorithm, before showing through a fishing experiment that SGIM-D efficiently combines the advantages of social learning and intrinsic motivation to gain a wide repertoire while being specialised in specific subspaces.
📄 Content
S. M. Nguyen, A. Baranes, P.-Y. Oudeyer (2011), Bootstrapping Intrinsically Motivated Learning with Human Demonstrations, in proceedings of the IEEE International Conference on Development and Learning.

Bootstrapping Intrinsically Motivated Learning with Human Demonstrations
Sao Mai Nguyen, Adrien Baranes and Pierre-Yves Oudeyer
Flowers Team, INRIA Bordeaux - Sud-Ouest, France

Abstract—This paper studies the coupling of internally guided learning and social interaction, and more specifically the improvement, owing to demonstrations, of learning by intrinsic motivation. We present Socially Guided Intrinsic Motivation by Demonstration (SGIM-D), an algorithm for learning in continuous, unbounded and non-preset environments. After introducing social learning and intrinsic motivation, we describe the design of our algorithm, before showing through a fishing experiment that SGIM-D efficiently combines the advantages of social learning and intrinsic motivation to gain a wide repertoire while being specialised in specific subspaces.

I. APPROACHES FOR ADAPTIVE PERSONAL ROBOTS

The promise of personal robots operating in human environments and interacting with people on a daily basis highlights the importance of the machine's adaptivity to its environment and users. The robot can no longer simply be fully programmed in advance by engineers and reproduce only actions predesigned in factories. It needs to adapt its behaviour and learn new skills as the environment and the users' needs change.

In order to learn an open-ended repertoire of skills, developmental robots, like animal or human infants, need to be endowed with task-independent mechanisms which push them to explore new activities and new situations [1], [2]. The set of skills that could be learnt is actually infinite and cannot be completely learnt within a lifetime. Thus, deciding how to explore and what to learn becomes crucial.
The exploration strategies, mechanisms and constraints proposed in recent years can be classified into two broad, interacting families: 1) socially guided exploration; 2) internally guided exploration, and in particular intrinsically motivated exploration.

A. Socially Guided Exploration

In order to build a robot that can learn and adapt to a human environment, the most straightforward way is probably to transfer knowledge about tasks or skills from a human into the machine. That is why several works incorporate human input into a machine learning process. Many prior systems are strongly dependent on human guidance and unable to learn in the absence of human interaction, as in some examples of learning by demonstration [3]–[6] or learning by physical guidance [7]. In such systems, the learner scarcely explores on its own to learn tasks or skills beyond what it has observed with a human. Other prior works have given a human trainer control of the reinforcement learning reward [8], [9], let the trainer provide advice [10], or had the trainer tele-operate the agent during training [11]. However, the more dependent the system is on the human, the more challenging learning from interactions with a human becomes, owing to limitations such as human patience, ambiguous human input, correspondence problems [12], etc. Increasing the learner's autonomy from human guidance could address these limitations. This is the case of internally guided exploration methods.

B. Intrinsically Motivated Exploration

Intrinsic motivation, a particular example of an internal mechanism for guiding exploration, has drawn a lot of attention recently, especially for open-ended cumulative learning of skills [1], [13]. The term intrinsic motivation was first used in psychology to describe the capability of humans to be attracted toward different activities for the pleasure that they experience intrinsically. These mechanisms have been shown to be crucial for humans to autonomously learn and discover new capabilities [14]–[16].
This inspired the creation of fully autonomous robots [17]–[22] with meta-exploration mechanisms that monitor the evolution of the robot's learning performance in order to maximise informational gain, and with heuristics defining a notion of interest [23]–[25].

While driving efficient progressive learning in numerous cases, most intrinsic motivation approaches address only partially the challenges of unlearnability and unboundedness [26]. Despite efforts in the case of continuous sensorimotor spaces, computing meaningful measures of interest still requires a sampling density that decreases the efficiency of those approaches as dimensionality grows. Even in bounded spaces, the measure of interest can be cast as a non-stationary regression problem, which may face the curse of dimensionality [27]. Thus, without additional mechanisms, the identification of learnable zones through knowledge or competence progress becomes inefficient in high dimensions. The second limitation relates to unboundedness. Indeed, whatever the measure of interest used, if it is only based on the evaluation of performances of pre
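To make the notion of interest concrete, the idea of measuring learning progress per region of the sensorimotor space can be sketched as follows. This is an illustrative sketch only, not the authors' SGIM-D implementation: the `Region` class, the window size, and the epsilon-greedy region choice are all assumptions made for the example.

```python
import random

class Region:
    """One region of the sensorimotor space: tracks recent prediction errors."""
    def __init__(self, window=10):
        self.errors = []
        self.window = window

    def add_error(self, e):
        self.errors.append(e)

    def interest(self):
        """Learning progress: drop in mean error between the older and newer
        halves of the recent window (higher value = faster improvement)."""
        recent = self.errors[-self.window:]
        if len(recent) < self.window:
            # Not enough data yet: treat unexplored regions as maximally interesting.
            return float("inf")
        half = self.window // 2
        older = sum(recent[:half]) / half
        newer = sum(recent[half:]) / half
        return older - newer

def choose_region(regions, eps=0.2):
    """Epsilon-greedy choice of the region with the highest learning progress."""
    if random.random() < eps:
        return random.choice(regions)
    return max(regions, key=lambda r: r.interest())
```

Under this heuristic, a region where errors are shrinking scores higher than one where errors are flat (already mastered) or stationary and high (unlearnable), which is the mechanism by which learning-progress-based measures avoid both trivial and unlearnable zones.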