This paper presents an algorithm for learning highly redundant inverse models in continuous, non-preset environments. Our Socially Guided Intrinsic Motivation by Demonstrations (SGIM-D) algorithm combines the advantages of both social learning and intrinsic motivation to acquire a wide range of skills while lessening its dependence on the teacher. SGIM-D is evaluated on a fishing-skill learning experiment.
The promise of personal robots operating in human environments and interacting with people on a daily basis highlights the importance of adaptivity: the machine must cope with a changing, unbounded environment, adjusting its behaviour and learning new skills and knowledge as the users' needs change.
In order to learn an open-ended repertoire of skills, developmental robots, like animal or human infants, need to be endowed with task-independent mechanisms to explore new activities and new situations [Weng et al., 2001; Asada et al., 2009]. The set of skills that could be learnt is infinite and cannot be learnt completely within a lifetime. Thus, deciding how to explore and what to learn becomes crucial. Recent exploration strategies can be classified into two families: 1) socially guided exploration; and 2) internally guided exploration, in particular intrinsically motivated exploration.
To build a robot that can learn and adapt to human environments, the most straightforward approach might be to transfer knowledge about tasks or skills from a human to the machine. Several works incorporate human input into a machine learning process, for instance through human guidance to learn by demonstration [Chernova and Veloso, 2009; Lopes et al., 2009; Cederborg et al., 2010; Calinon, 2009] or by physical guidance [Calinon et al., 2007], through human control of the reinforcement learning reward [Blumberg et al., 2002; Kaplan et al., 2002], through human advice [Clouse and Utgoff, 1992], or through human tele-operation during training [Smart and Kaelbling, 2002]. However, heavy dependence on human teaching is limited by human patience, ambiguous human input, the correspondence problem [Nehaniv and Dautenhahn, 2007], etc. Increasing the learner's autonomy from human guidance could address these limitations; this is the aim of internally guided exploration methods.
Intrinsic motivation, an example of internally guided exploration, has drawn attention recently, especially for open-ended cumulative learning of skills [Weng et al., 2001; Lopes and Oudeyer, 2010]. In psychology, the term intrinsic motivation describes the attraction of humans toward different activities for the pleasure they intrinsically provide, which is crucial for autonomous learning and the discovery of new capabilities [Ryan and Deci, 2000; Deci and Ryan, 1985; Oudeyer and Kaplan, 2008]. This inspired the creation of fully autonomous robots [Barto et al., 2004; Oudeyer et al., 2007; Baranes and Oudeyer, 2009; Schmidhuber, 2010; Schembri et al., 2007] equipped with meta-exploration mechanisms that monitor the evolution of the robot's learning performance so as to maximise information gain, using heuristics that define a notion of interest [Fedorov, 1972; Cohn et al., 1996; Roy and McCallum, 2001].
Nevertheless, most intrinsic motivation approaches only partially address the challenges of unlearnability and unboundedness [Oudeyer et al., to appear]. Because interestingness is based on the derivative of the evolution of the performance of acquired knowledge or skills, computing measures of interest requires a certain sampling density, which becomes increasingly costly as the volume of the space grows. Even in bounded spaces, the measures of interest, mostly non-stationary regressions, face the curse of dimensionality [Bishop, 2007]. Thus, without additional mechanisms, identifying learnable zones where knowledge and competence can progress becomes inefficient. The second limit relates to unboundedness: if the measure of interest depends only on the evaluated performance of predictive models or of skills, it is impossible to explore or sample all localities within a lifetime. Complementary mechanisms therefore have to be introduced in order to constrain the growth in size and complexity of the practically explorable spaces, allowing the organism to set self-limits in the unbounded world and/or to be driven rapidly toward learnable subspaces. Such constraining processes include motor synergies, morphological computation, and maturational constraints, as well as social guidance.
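To make the notion of interest concrete, the following is a minimal, hypothetical sketch in the spirit of learning-progress measures such as [Oudeyer et al., 2007], not the exact formula used here: the interest of a region of the sensorimotor space is the recent decrease of its prediction error, and exploration favours the region of highest interest. All names are illustrative.

```python
import numpy as np

def interest(errors, window=10):
    """Hypothetical learning-progress interest: the recent decrease in
    mean prediction error over a sliding window. Positive while the
    learner is improving in this region; near zero when the region is
    already mastered or unlearnable (error curve flat)."""
    if len(errors) < 2 * window:
        return 0.0  # too few samples to estimate a derivative
    older = np.mean(errors[-2 * window:-window])
    recent = np.mean(errors[-window:])
    return older - recent

# Exploration then samples in the region whose error history shows the
# steepest progress, e.g.:
# best_region = max(regions, key=lambda r: interest(r.error_history))
```

The `window` parameter trades off reactivity against noise in the progress estimate; it is an assumption of this sketch, not a parameter from the paper.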
Intrinsic motivation and socially guided learning are traditionally opposed, yet they strongly interact in the daily life of humans. Each approach has its own limits, but combining the two could overcome them. Social guidance can drive a learner into new intrinsically motivating spaces or activities, which it may then continue to explore alone for their own sake but might never have discovered without social guidance. Robots may acquire new strategies for achieving those intrinsically motivated activities through external observation or advice. In reinforcement learning, a human can directly control the actions of a robot agent through tele-operation to provide example task demonstrations [Peters and Schaal, 2008; Kormushev et al., 2010], which initialise the learning process by imitation learning; the policy is subsequently improved by reinforcement learning.
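As an illustration of this demonstration-then-refinement scheme, the sketch below initialises a linear policy from teleoperated demonstrations by least squares and then refines it with a simple random-perturbation policy search. This is a toy stand-in for the actual reinforcement learning algorithms of [Peters and Schaal, 2008; Kormushev et al., 2010]; the function names and the user-supplied `rollout_return` estimate are assumptions of this sketch.

```python
import numpy as np

def imitate(states, actions):
    """Least-squares imitation: fit W so that action ≈ state @ W.
    states: (N, d_s) array, actions: (N, d_a) array of demonstrations."""
    W, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return W  # (d_s, d_a) linear policy initialised from the teacher

def refine(W, rollout_return, iters=200, sigma=0.05, seed=0):
    """Toy policy search: perturb the policy and keep a change only if it
    increases the return measured by rollout_return(policy) -> float."""
    rng = np.random.default_rng(seed)
    best, best_ret = W, rollout_return(W)
    for _ in range(iters):
        candidate = best + sigma * rng.standard_normal(best.shape)
        ret = rollout_return(candidate)
        if ret > best_ret:  # greedy improvement on the estimated return
            best, best_ret = candidate, ret
    return best
```

The demonstrations give the search a sensible starting point, so the subsequent reinforcement learning explores around the teacher's solution instead of from scratch.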