Simple Kinematic Feedback Enhances Autonomous Learning in Bio-Inspired Tendon-Driven Systems
Authors: Ali Marjaninejad, Dario Urbina-Melendez, Francisco J. Valero-Cuevas
Abstract: Error feedback is known to improve performance by correcting control signals in response to perturbations. Here we show how adding simple error feedback can also accelerate and robustify autonomous learning in a tendon-driven robot. We implemented two versions of the General-to-Particular (G2P) autonomous learning algorithm to produce multiple movement tasks using a tendon-driven leg with two joints and three tendons: one with and one without kinematic feedback. As expected, feedback improved performance in simulation and hardware. However, we see these improvements even in the presence of sensory delays of up to 100 ms and when experiencing substantial contact collisions. Importantly, feedback accelerates learning and enhances G2P's continual refinement of the initial inverse map by providing the system with more relevant data to train on. This allows the system to perform well even after only 60 seconds of initial motor babbling.

A. Marjaninejad (marjanin@usc.edu) and D. Urbina-Meléndez (urbiname@usc.edu) are with the University of Southern California, Los Angeles, CA 90089 USA. F. Valero-Cuevas (corresponding author; valero@usc.edu; phone: 213-740-4219) is with the University of Southern California, Los Angeles, CA 90089 USA.

I. INTRODUCTION

The field of robotics in general would benefit greatly from autonomous learning to control movements with minimal prior knowledge and limited experience [1], [2], [3]. Extensive trial-and-error experience in the real world can be very costly in both biological and robotic systems. Not only does it risk injury, but the opportunity cost can be large. Therefore, evolutionary game theory in biological systems favors systems that can function suboptimally, or well enough, with only limited experience, and continue to learn on-the-go from every experience [1]. Biological systems can then use sensory feedback to refine performance as needed.

Such learning from limited experience is also attractive in robotics [2], [1], mostly in situations where optimality is not as critical as adaptability to unstructured environments, unpredictable payloads, or working with systems for which creating accurate models is costly or time consuming. Thus, data-efficient learning that produces suboptimal behavior can be a practical and attractive control strategy, as it does not rely on accurate prior models or extensive expert knowledge [4], [5], [6], [7], [8], [9], [10], [11], or require thousands of hours of learning in simulation [12], [13], [14], [15], [16], [17] (please see [1] for a detailed discussion of how our General-to-Particular (G2P) algorithm relates to the field).

A drawback of learning with limited experience that produces suboptimal behavior is that the performance of the model can degrade when encountering dynamics far from those under which it was trained. On the other hand, systems that heavily depend on feedback error correction do not perform efficiently and are prone to instability, especially in the presence of sensory delays [18], [1].
Moreover, it is important to note that in a tendon-driven system the actuation is not directly connected to the joint; therefore, a simple off-the-shelf PID controller cannot be used without knowing the dynamical equations of the system [1], [19], [20].

Thus, here we explore the combination of a data-efficient learning algorithm, G2P, with simple feedback to maintain the key benefits of learning under limited experience while improving performance and robustness to perturbations (or unmodeled dynamics) as needed. This approach is directly inspired by biological systems that, under certain circumstances, successfully use simple corrective responses triggered by delayed and non-collocated sensory signals [21], [22], [23].

As an initial proof-of-principle, we implemented two versions of the data-efficient autonomous learning algorithm G2P [1] (which was originally designed to control tendon-driven systems): one purely feedforward (open-loop), as published in [1], and one with simple feedback on joint angles (close-loop). Both implementations of G2P find motor commands that produce desired leg kinematics by creating an inverse map. The initial inverse map is generated from "motor babbling" input-output data (i.e., random sequences of input commands to the three motors driving the tendons, which produce time histories of the two joint angles of the leg).

We find that performance is, as expected, better for the close-loop system, as it compensates for errors in the leg joint angles arising from imperfections of the inverse map or from external perturbations (e.g., contact dynamics). However, we also find that, by collecting more task-relevant data, this simple feedback accelerates learning and improves the quality of the inverse map, which enables the system to work with shorter motor babbling sessions and also improves the learning-on-the-go capability (the refinement of the inverse map from each experience) of the system. In addition, we report improved performance even when sensory signals are delayed. We have validated our method on a physical tendon-driven leg to demonstrate its utility in real-world applications.

II. METHODS

In this section, we first discuss the design of the tendon-driven system. Next, we formulate and describe the controller design. Lastly, we discuss the tests we performed in detail.

Fig. 1. Physical tendon-driven robotic limb (left) and the simulated system in the MuJoCo environment (right).

Fig. 2. Schematic of the close-loop system.

A. Tendon-driven leg design

Tendon-driven anatomies are a relevant use case because they are difficult to control: they are simultaneously nonlinear, under-determined, and over-determined [20], [19], [1], [24]. The simulated leg is similar in design to the physical leg used in [1]. It is a 2-DoF planar leg actuated with 3 tendons (the minimal number of tendons needed to control a 2-DoF leg). Unlike [1], the simulation model uses a Hill-type model of skeletal muscle (MuJoCo's built-in force-length and force-velocity model [25], [19]) and has moment arms that can bowstring. The physical system used for validation is a replica of the one used in [1] with an improvement to the data acquisition system (we use a PXI system from National Instruments, Austin, TX, USA). Figure 1 shows the tendon-driven legs for the physical and simulated systems.
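To make this control difficulty concrete, consider the mapping from tendon tensions to net joint torques through a moment-arm matrix. The following sketch is illustrative only (it is not the authors' simulation code), and the moment-arm values are hypothetical placeholders:

```python
import numpy as np

# Hypothetical, constant moment-arm matrix R (2 joints x 3 tendons), in meters.
# On the real leg the moment arms vary with posture (and can bowstring), so R = R(q).
R = np.array([[0.02, -0.02,  0.015],    # proximal joint row
              [0.00,  0.01, -0.015]])   # distal joint row

def joint_torques(f):
    """Net joint torques tau = R @ f from tendon tensions f (pull-only: f >= 0)."""
    return R @ f

# Three tensions map onto only two torques, so R has a one-dimensional null
# space: tension changes along it leave the joint torques unchanged.
null_dir = np.linalg.svd(R)[2][-1]
f = np.array([10.0, 8.0, 6.0])
print(joint_torques(f))                   # some torque vector
print(joint_torques(f + 2.0 * null_dir))  # the same torque vector
```

Any tension change along the null-space direction leaves the joint torques unchanged, while the pull-only constraint (f >= 0) limits which torque vectors are feasible and prevents any tendon from acting on one joint without affecting the other; these features are part of what makes such systems hard to control [19], [20], [24].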
B. Controller design

Our system takes desired movement kinematics (joint angles for each joint and their first two derivatives, i.e., angular velocities and angular accelerations) and outputs activation values that drive the actuators (skeletal muscles in simulation, and brushless DC electric motors connected to the tendons on the physical system) to produce the desired kinematics on the leg.

The feedforward path consists of an inverse map from the desired kinematics to the activations that will ideally (in the case of a flawless inverse map and without any perturbation) replicate the desired kinematics on the plant (Figure 2). This inverse map is created by training a Multi-Layer Perceptron (MLP) Artificial Neural Network (ANN) with 15 hidden-layer neurons on data collected during a short phase (5 minutes) of random movements and their observed corresponding kinematics, called motor babbling (please see [1] or the supplementary code for more details on the feedforward path). Once the activations are calculated, they are fed into the plant and the corresponding kinematics are recorded. These observations can then be used in a feedback loop to compensate for any error in the inverse map or error caused by external perturbations.

Here we only use joint angles as the sensory feedback. Also, we mainly focus on reducing the error in the joint angles (as opposed to their derivatives), since error in joint angle is less forgiving than error in joint velocity or acceleration for the successful completion of most day-to-day tasks, whether manipulation, locomotion, or other movements. However, if desired, the user can substitute the error term with the angular velocity or acceleration, or a weighted combination of them.

We know that for a given joint, joint angle and angular velocity are related by Equation 1:

$$\Delta q = \dot{q}\, dt \tag{1}$$

where $t$ is time. Therefore, we can compensate for the error in position by changing the velocity according to the magnitude and direction of the error. We implement a PI-controller-like method where we add an adjustment term to the desired angular velocity of each joint proportional to its current and cumulative error (see the Discussion for alternative choices). Equations 2-6 describe the relationships between all system variables over a complete loop:

$$a[n]_{(3\times 1)} = ANN\!\left(q_c[n]_{(2\times 1)},\ \dot{q}_c[n]_{(2\times 1)},\ \ddot{q}_c[n]_{(2\times 1)}\right) \tag{2}$$

where $a[n]$ is the activation vector at time sample $n$, and $q_c$, $\dot{q}_c$, and $\ddot{q}_c$ are the control joint angle, control angular velocity, and control angular acceleration, respectively. These control kinematics are calculated as follows:

$$\ddot{q}_c = \ddot{q}_d, \qquad q_c = q_d \tag{3}$$

$$\dot{q}_c = \dot{q}_d + \dot{q}_a \tag{4}$$

$$\dot{q}_a = K_{P(2\times 2)}\, q_e + K_{I(2\times 2)} \int q_e\, dt \tag{5}$$

$$q_e = q_d - q_p \tag{6}$$

where subscripts $d$, $a$, $e$, and $p$ stand for desired, adjustment, error, and plant, respectively. Also, $K_P$ and $K_I$ are diagonal matrices defining the proportional and integral coefficients for each joint. The complete schematic block diagram of the close-loop system is depicted in Figure 2.
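As a concrete illustration of Equations 2-6, the following minimal sketch shows one pass of the close-loop update in Python. It is not the authors' released implementation (see the project repository for that); the gain values, time step, and the stand-in inverse map are hypothetical placeholders:

```python
import numpy as np

dt = 0.005                     # hypothetical control time step (s)
K_P = np.diag([4.0, 4.0])      # placeholder proportional gains (one per joint)
K_I = np.diag([1.0, 1.0])      # placeholder integral gains (one per joint)
integral_error = np.zeros(2)   # running integral of q_e

def closed_loop_step(ann, q_d, qdot_d, qddot_d, q_p):
    """One control sample implementing Eqs. (2)-(6).

    ann      -- trained inverse map: (q_c, qdot_c, qddot_c) -> activations (3,)
    q_d, qdot_d, qddot_d -- desired joint kinematics (each of shape (2,))
    q_p      -- joint angles measured from the plant (shape (2,))
    """
    global integral_error
    q_e = q_d - q_p                             # Eq. (6): joint-angle error
    integral_error = integral_error + q_e * dt
    qdot_a = K_P @ q_e + K_I @ integral_error   # Eq. (5): PI adjustment
    qdot_c = qdot_d + qdot_a                    # Eq. (4): corrected velocity
    q_c, qddot_c = q_d, qddot_d                 # Eq. (3): angle/accel. pass through
    return ann(q_c, qdot_c, qddot_c)            # Eq. (2): motor activations

# Example call with a stand-in inverse map (the real one is the babbling-trained MLP).
ann = lambda q, qd, qdd: np.clip(0.5 * np.ones(3) + 0.05 * (q[0] + qd[0] + qdd[0]), 0.0, 1.0)
a = closed_loop_step(ann, np.array([1.0, 0.5]), np.zeros(2), np.zeros(2), np.array([0.9, 0.55]))
```

Note that setting $K_P$ and $K_I$ to zero recovers the purely feedforward (open-loop) controller of [1], so both conditions can share the same code path.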
C. Studied tasks

To demonstrate the performance of the proposed method and its capabilities, we tested it in a number of different cases, each of which demonstrates at least one of its prominent features. Wherever applicable, we compared the results with those produced by the open-loop method used in [1].

1) Cyclical movements in-air task: During this task, the leg is suspended in the air (i.e., no contact dynamics or external perturbations are involved) and is commanded to perform 50 random cyclical patterns (10 cycles of 2.5 seconds each). Since no external perturbations are applied to the system in this task, an ideal inverse map should be able to perform it flawlessly. However, an inverse map trained with limited experience is almost always imperfect; in this task we study the effect of the proposed close-loop system on reducing these imperfections. The patterns are created by projecting a vector of 10 random values, sampled from a uniform distribution (U(0,1)), into joint angle space as described in [1]. In short, each random number defines the normalized radial distance from the center of the joint angle space along one of 10 equally distributed spokes (each 36 degrees apart); these points are then connected, and the resulting closed cycle is filtered to make it smooth (see [1] for more details, and the illustrative sketch below).
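A minimal sketch of this pattern generator follows. It assumes a simple circular moving-average as the smoothing filter (the exact filter is deferred to [1]) and hypothetical joint-angle ranges:

```python
import numpy as np

def random_cyclical_pattern(n_spokes=10, n_samples=500, seed=None):
    """Closed, smooth 2-D joint-angle cycle from n_spokes random radii."""
    rng = np.random.default_rng(seed)
    # Hypothetical joint-angle workspace (rad); the real ranges come from the leg.
    center = np.array([0.75, 1.0])
    half_range = np.array([0.75, 1.0])

    radii = rng.uniform(0.0, 1.0, n_spokes)                     # U(0,1) distances
    spoke_angles = np.arange(n_spokes) * 2 * np.pi / n_spokes   # 36 deg apart

    # Linearly interpolate the radii around the closed cycle...
    theta = np.linspace(0.0, 2 * np.pi, n_samples, endpoint=False)
    r = np.interp(theta, np.append(spoke_angles, 2 * np.pi),
                  np.append(radii, radii[0]))

    # ...then smooth with a circular moving average (stand-in for the filter in [1]).
    kernel = np.ones(25) / 25.0
    r = np.convolve(np.tile(r, 3), kernel, mode="same")[n_samples:2 * n_samples]

    # Project the polar curve into joint-angle space around the workspace center.
    q = center + half_range * np.column_stack((r * np.cos(theta), r * np.sin(theta)))
    return q  # shape (n_samples, 2): proximal and distal joint angles

q_desired = random_cyclical_pattern(seed=0)
```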
2) Point-to-point movements in-air task: Unlike the continuous and smooth cyclical task, the point-to-point task consists of discrete ramp-and-hold movements. Since the inverse map was trained with full kinematics, it is interesting to study how it performs when the desired task involves maintaining joint angles at specific positions (both angular velocity and acceleration are equal to zero at all of these positions). The point-to-point task is designed to study these cases and involves 50 trials; in each trial the leg is commanded to go to 10 random positions (U(joint_min, joint_max) for each joint) and stay there for a predefined duration (2.5 seconds here).

3) Different cycle period durations task: During motor babbling, the inverse map is exposed to a very sparse set of samples in the 6D kinematics space [1]. Although the babbling phase sweeps fully across the joint angles of both joints, there are many combinations of these angles with angular velocities and accelerations that will not be experienced. Here we study the performance of the system for perfectly cyclical movements (sine and cosine) over a wide range of cycle periods (1/cycle frequency) to investigate how well the open-loop system performs in each case, and to compare it against the proposed feedback controller.

4) Performance in the presence of contact dynamics tasks: Dealing with contact dynamics is a current challenge in robotics [26], [27]. Therefore, it is important to test the performance of the proposed method in the presence of contact dynamics. We have shown that the open-loop system can perform well when introduced to minor contact dynamics [1]; however, its performance has not been studied under the effect of significant contact dynamics, such as those caused by the need to push the system forward/backward against an antagonist force or the need to carry its own weight (note that the system was trained in-air, so adding weight is a major change to its dynamics). Here, we studied the performance of the system during two tasks, both involving contact dynamics.

a) Locomotion with the gantry: In this task, we lowered the chassis (so that the leg can touch the floor) and let it move along the x-axis (forward/backward) with friction. Moreover, we held it up with a spring-damper (built-in features in MuJoCo) so that it can partially compensate for the weight (similar to a gantry). Similar to the "cyclical movements in-air" task, we applied 50 different cyclical movements and studied the performance of the system. Please note that here we simply apply random cyclical movements to compare open-loop and close-loop performance; however, a higher-level controller could also be used to find better movement trajectories that yield larger forward displacement [1].

b) Holding a posture under a weight: In this task, we removed the spring-damper mechanism provided by the gantry and increased the weight of the chassis significantly. The goal for the leg is to stay vertically straight (standing leg position) while reacting to the strong downward force applied to it by the added weight of the gantry.

5) Learning from each experience task: Experience can be very costly in real-world physical systems [1]; therefore, an efficient system should be able to start performing as soon as possible and improve its performance with the data coming from each experience. During this task, we start with an inverse map created using a shorter babbling duration (1 minute) and run the system on a cyclical task for 25 repetitions; after each repetition, we refine the inverse map with the cumulative data from all the experience the system has had so far (including the motor babbling). We repeat this process for 50 different cyclical trajectories (a sketch of this refinement loop follows at the end of this section).

6) Variable feedback delay task: Delay in sensory feedback or in processing information is inevitable in real-world applications. In a system that solely depends on error correction, these delays can inject large errors and even drive the system to instability. We studied the performance of the system over a wide range of loop delays (from 5 to 100 ms, the latter being about the largest delay in the human sensorimotor loop) over 50 random cyclical movements.

All tasks were performed on both the simulation (sim) and physical (phys) systems except the "performance in the presence of contact dynamics" and "variable feedback delay" tasks, which were only performed in simulation due to physical limitations. Also, physical results for the "learning from each experience" task have already been reported in [1].
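The refinement loop of task 5 can be summarized with the following minimal sketch. It assumes a generic MLP regressor with warm-start retraining (the actual G2P implementation is in the project repository); babbling_data and run_trial are hypothetical stand-ins for the recorded babbling set and for one cyclical repetition on the plant:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def babbling_data(n=1000):
    """Stand-in for recorded babbling: kinematics (n x 6) -> activations (n x 3)."""
    return rng.uniform(-1.0, 1.0, (n, 6)), rng.uniform(0.0, 1.0, (n, 3))

def run_trial(inverse_map, n=500):
    """Stand-in for one repetition: returns the kinematics actually observed on
    the plant and the activations that were applied during the trial."""
    kin = rng.uniform(-1.0, 1.0, (n, 6))
    return kin, inverse_map.predict(kin).clip(0.0, 1.0)

# Warm start lets each refinement continue from the current weights,
# mirroring G2P's "refine with all cumulative data" step.
inverse_map = MLPRegressor(hidden_layer_sizes=(15,), warm_start=True, max_iter=500)

kinematics, activations = babbling_data()
inverse_map.fit(kinematics, activations)

for repetition in range(25):
    observed_kin, applied_act = run_trial(inverse_map)
    # Accumulate all experience so far (babbling included) and refine.
    kinematics = np.vstack([kinematics, observed_kin])
    activations = np.vstack([activations, applied_act])
    inverse_map.fit(kinematics, activations)
```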
III. RESULTS

In terms of error (the root-mean-square error between the joint angles and the desired reference trajectories, referred to simply as "error" from here on), we see, as expected, that the close-loop control architecture reduces the error compared to the open-loop one in all cases. Fig. 3 shows the average error for the open-loop and close-loop systems across all tasks.

Fig. 3. The average error for open-loop and close-loop systems across all tasks.

A. Cyclical movements in-air task

Fig. 4 shows a sample trial of the cyclical movements in-air task for the physical system (also see the supplementary video). This figure shows that the error is larger at the distal joint than at the proximal joint. This is because all three tendons cross the proximal joint first; thus errors propagate to, and accumulate at, the distal joint.

Fig. 4. The desired (black), open-loop (blue), and close-loop (orange) joint angles for one trial of the cyclical movements in-air task.

For all "sample run" plots of tasks that were performed on both the simulation and the physical system (Figs. 4, 5, and 7), we observe very similar patterns and therefore only report the physical-system results here. The reader can, however, access all plots in [28].

B. Point-to-point movements in-air task

Figure 5 shows a good example of the limitations of an open-loop system (also see the supplementary video). When the system is commanded to go to a new position, it can do so, except in cases where the commanded change is small; the inverse map may not have sufficient resolution to implement such small changes. Also, note that both the angular velocity and angular acceleration inputs are zero (except during the transitions, which are very short), so the system needs to reach the right position based only on the joint angle values. The close-loop system, however, detects and corrects these errors. Importantly, this also improves the on-the-go training of the inverse map (see the learning from each experience task). Note the unavoidable small fluctuations around the desired location, which are naturally caused by using a simple error-correction feedback strategy. Better tracking could be achieved with more sophisticated close-loop controllers, but that is beyond the objective and scope of this work.

Fig. 5. The desired (black), open-loop (blue), and close-loop (orange) joint angles for one trial of the point-to-point movements in-air task (over one sample run).

Fig. 6. Error values for the different cycle period durations task as a function of cycle duration for the open-loop and close-loop systems (sim: simulation; phys: physical system).

C. Different cycle period durations task

The simple proportional-plus-integral feedback on joint angles has a limited bandwidth. We expect, and see, that its ability to correct errors degrades for cyclical movements with shorter cycle periods (i.e., higher frequencies). Figure 6 shows improved performance of the close-loop system for cycle period durations longer than ~2 seconds for both the simulated and physical systems. As expected, the open-loop system also has problems at short cycle periods (due to the effects of inertia, excitation of the nonlinearities of the double pendulum, the bandwidth of the motors, etc.), but its error does not improve as the cycle periods lengthen. The close-loop system quickly plateaus at a small average error of 0.1-0.2 rad per cycle and then continues to decrease slowly. The open-loop system, in contrast, has an error that is roughly twice as large, with a minimum at periods of ~1.5-2 seconds, which is perhaps closer to the region it experienced during babbling and also close to the system's resonant frequency. Figure 7 shows the desired and actual outcomes of the task (for both open-loop and close-loop systems) over one sample run with a 2.5-second cycle period (also see the supplementary video).

Fig. 7. The desired (black), open-loop (blue), and close-loop (orange) joint angles for one sample run of the different cycle period durations task with a cycle period of 2.5 seconds.
D. Performance in the presence of contact dynamics tasks

1) Locomotion with the gantry: This leg system was designed to ultimately produce locomotion. Therefore, we tested in simulation how the close- and open-loop systems performed this task. When introduced to mild ground contact (barely grazing the ground), both methods performed similarly well and comparably to the in-air task, albeit with a slightly larger error. However, when the simulated gantry was brought lower (and therefore more substantial contact dynamics were introduced), the open-loop system failed to clear the ground and could not complete the movement cycle to match the desired trajectories (see the supplementary video and Figs. 3 and 8). In contrast, the close-loop system was able to complete the swing phase and recover from the ground contact (see the supplementary video), which resulted in very small errors even in the presence of these contact dynamics. This is expected, as contact dynamics can be thought of as physical perturbations that were not included in the motor babbling. Thus the open-loop system naturally performs poorly (even with a well-refined, accurate inverse map). However, it was important to see that even simple feedback was able to compensate for such strong unmodeled perturbations.

Fig. 8. The desired (black), open-loop (blue), and close-loop (orange) joint angles for one sample run of the locomotion with the gantry task.

2) Holding a posture under a weight: In our simulations, we also observed that the open-loop system cannot compensate when a weight is applied to the chassis of the leg (the leg collapses under the weight). In contrast, the close-loop system compensates, as much as the strength of the muscles allows, for the deviation from the desired posture, and maintains the prescribed posture of standing vertically (see the supplementary video).
E. Learning from each experience task

Biological systems subject to Hebbian learning reinforce or attenuate synaptic connections with each experience [29], [30]. Similarly, the G2P algorithm adds the input-output data from each run (i.e., experience) to its database and recalculates (i.e., refines) the inverse map with all available data before the next run (i.e., a warm start of the ANN). Figure 9 shows the mean and standard deviation of the error over 50 random cyclical movement tasks as a function of refinement number for both the open-loop (blue) and close-loop (orange) systems. Both systems exhibit the expected reduction of error with increasing experience. However, this trend is accelerated in the close-loop system, where both the mean and the standard deviation of the error plateau after only 6 refinements.

We believe the more relevant data collected by the close-loop system contributes to this. To test this idea, after each refinement we tested both systems with switched inverse maps. This distinguishes the contribution of the error correction of the feedback signal from the potential contribution of a more precise inverse map. The open-loop system shows accelerated learning and smaller error when using the inverse map trained on the close-loop data (green). Also, although the error for the close-loop system with either inverse map is very small and plateaus fast (after ~5 refinements), it has a smaller mean and standard deviation with the inverse map trained on data collected by the close-loop system. The p-value between the 50 trials of the last refinement of the close-loop systems using close-loop and open-loop inverse maps (orange and red curves, respectively) was 3.0927e-04. The same measure for the open-loop systems (green and blue curves) was 1.0234e-07. These results show that the close-loop system not only reduces error by commanding correction signals, but also enhances the refinement of the inverse map by providing more task-specific data at each attempt.

Fig. 9. Error values for 50 random cyclical movements as a function of repetitions (and consecutive refinements) for the open-loop (blue) and close-loop (orange) systems, as well as the open-loop and close-loop systems with switched inverse maps (green and red, respectively).

To demonstrate that the proposed method does not suffer from over-fitting while still allowing generalization as it improves the inverse map on-the-go, we also tested refinements for both systems on 50 random cyclical movements introduced back-to-back, and saw a similar descending trend in the error even for movement cycles that had not been experienced before. Results for the simulation and physical systems can be accessed in [28] and [1], respectively. Note that the babbling data are always included in the refinements (as in the original version of G2P [1]) to make sure the system does not over-fit to task experience alone.

F. Variable feedback delay task

Figure 10 shows the error over 50 random cyclical movements as a function of feedback delay increasing from 5 to 100 ms (see the delay-buffer sketch at the end of this section). We plot the open-loop error (red wireframe-styled lines) for the same tasks as a reference and see that the close-loop system outperforms it for delays up to 100 ms. At very long delays, naturally, the close-loop system will treat its own corrections as perturbations; performance will degrade and instabilities will likely arise.

Fig. 10. Error values for 50 random cyclical movements as a function of feedback delay for the close-loop system. Error for the open-loop system is also provided (red wireframe-styled lines) for comparison.

G. Sensitivity to proportional-and-integral (PI) feedback gains

The choice of PI gains is traditionally made by trial-and-error, by Bode plots, or, more recently, by search algorithms (e.g., evolutionary algorithms [31]). The choice of optimal PI gains is beyond the scope of this paper. However, we briefly explored the sensitivity to a wide range of PI gains over 50 cyclical movements and found that the system still yields satisfactory performance (see the Supplementary Information section in [28]), albeit with the expected faster rise times and greater overshoot at higher gains, and vice versa at lower gains.
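The variable feedback delay task can be emulated with a simple FIFO buffer on the measured joint angles. This is an illustrative sketch only (the sampling rate is a hypothetical placeholder), showing how a fixed sensory delay would be injected into the loop of Equations 2-6:

```python
from collections import deque
import numpy as np

class DelayedFeedback:
    """FIFO buffer returning joint angles measured `delay_s` seconds ago."""
    def __init__(self, delay_s, dt, q_init):
        n = max(0, int(round(delay_s / dt)))  # delay expressed in control samples
        self.buffer = deque([np.asarray(q_init)] * (n + 1), maxlen=n + 1)

    def __call__(self, q_measured):
        self.buffer.append(np.asarray(q_measured))
        return self.buffer[0]                 # oldest sample in the window

# Example: a 100 ms sensory delay at a hypothetical 5 ms control step.
delayed = DelayedFeedback(delay_s=0.100, dt=0.005, q_init=np.zeros(2))
# Inside the control loop, Eq. (6) would then use: q_e = q_d - delayed(q_p).
```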
IV. SUMMARY OF CONTRIBUTIONS

Our method improves upon the current work in autonomous control of tendon-driven systems [1] because i) it uses the inverse map of the tendon-driven system that it autonomously learned during an initial motor babbling phase, and relies on feedback only to compensate for inaccuracies as needed; and, more importantly, ii) it shows that, by collecting more relevant data during a performance, simple feedback also facilitates and accelerates autonomous learning; naturally, more relevant experience is more useful.

V. DISCUSSION

We chose the input velocity signal as the one to which correction changes are applied, as opposed to the input position signal, since velocity has a direction (it is a vector) and can move the joint to the right position even with an imperfect model (as seen in the point-to-point experiment results, position-only inputs can yield high errors). Also, recall that we chose position as the output on which to define the error, and it is very common in controls to use the derivative of the tracked signal for correction (air conditioning, cruise control, etc.). However, based on the need, the user can choose any other error signal (or a weighted combination) and use PID gains to feed it to the most pertinent input to apply the correction signals.

In this paper, we showed the contributions of simple kinematic feedback to improving both the performance and learning rate of an inverse map generated from limited experience, while remaining robust to sensory delays and to the choice of PI parameters. We performed our tests on both simulation and physical implementations of a tendon-driven robotic limb. However, it would be very interesting in future work to test the proposed system on more complex systems, such as bipeds or quadrupeds, and to compare their performance on more sophisticated tasks, especially in physical implementations.

CODE AVAILABILITY

The code and the supplementary files can be accessed through the project's GitHub repository at: https://github.com/marjanin/G2P_with_fb

ACKNOWLEDGMENTS

This project was supported by NIH Grants R01-052345 and R01-050520, award MR150091 by DoD, and award W911NF1820264 by the DARPA-L2M program. Also, by a USC Provost Fellowship to A.M. and a Consejo Nacional de Ciencia y Tecnología (Mexico) fellowship to D.U.-M.

REFERENCES

[1] A. Marjaninejad, D. Urbina-Meléndez, B. A. Cohn, and F. J. Valero-Cuevas, "Autonomous functional movements in a tendon-driven limb via limited experience," Nature Machine Intelligence, vol. 1, no. 3, p. 144, 2019.
[2] R. Kwiatkowski and H. Lipson, "Task-agnostic self-modeling machines," Sci. Robot., vol. 4, p. eaau9354, 2019.
[3] A. Nagabandi, G. Kahn, R. S. Fearing, and S. Levine, "Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning," in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 7559-7566.
[4] H. Kobayashi and R. Ozawa, "Adaptive neural network control of tendon-driven mechanisms with elastic tendons," Automatica, vol. 39, no. 9, pp. 1509-1519, 2003.
[5] H. G. Marques, A. Bharadwaj, and F. Iida, "From spontaneous motor activity to coordinated behaviour: a developmental model," PLoS Computational Biology, vol. 10, no. 7, p. e1003653, 2014.
[6] J. Morimoto and K. Doya, "Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning," Robotics and Autonomous Systems, vol. 36, no. 1, pp. 37-51, 2001.
[7] A. Gijsberts and G. Metta, "Real-time model learning using incremental sparse spectrum Gaussian process regression," Neural Networks, vol. 41, pp. 59-69, 2013.
[8] T. Geijtenbeek, M. van de Panne, and A. F. van der Stappen, "Flexible muscle-based locomotion for bipedal creatures," ACM Transactions on Graphics (TOG), vol. 32, no. 6, p. 206, 2013.
[9] E. Rombokas, E. Theodorou, M. Malhotra, E. Todorov, and Y. Matsuoka, "Tendon-driven control of biomechanical and robotic systems: A path integral reinforcement learning approach," in 2012 IEEE International Conference on Robotics and Automation. IEEE, 2012, pp. 208-214.
[10] A. Hunt, N. Szczecinski, and R. Quinn, "Development and training of a neural controller for hind leg walking in a dog robot," Frontiers in Neurorobotics, vol. 11, p. 18, 2017.
[11] V. Kumar, Y. Tassa, T. Erez, and E. Todorov, "Real-time behaviour synthesis for dynamic hand-manipulation," in 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 6808-6815.
[12] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, p. 529, 2015.
[13] K. Takahashi, T. Ogata, J. Nakanishi, G. Cheng, and S. Sugano, "Dynamic motion learning for multi-DOF flexible-joint robots using active-passive motor babbling through deep learning," Advanced Robotics, vol. 31, no. 18, pp. 1002-1015, 2017.
[14] N. Heess, S. Sriram, J. Lemmon, J. Merel, G. Wayne, Y. Tassa, T. Erez, Z. Wang, S. Eslami, M. Riedmiller, et al., "Emergence of locomotion behaviours in rich environments," arXiv preprint arXiv:1707.02286, 2017.
[15] M. Andrychowicz, B. Baker, M. Chociej, R. Jozefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray, et al., "Learning dexterous in-hand manipulation," arXiv preprint arXiv:1808.00177, 2018.
[16] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
[17] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, "Trust region policy optimization," in International Conference on Machine Learning, 2015, pp. 1889-1897.
[18] J. Bongard, V. Zykov, and H. Lipson, "Resilient machines through continuous self-modeling," Science, vol. 314, no. 5802, pp. 1118-1121, 2006.
[19] F. J. Valero-Cuevas, Fundamentals of Neuromechanics. Springer, 2016.
[20] A. Marjaninejad and F. J. Valero-Cuevas, "Should anthropomorphic systems be redundant?" in Biomechanics of Anthropomorphic Systems. Springer, 2019, pp. 7-34.
[21] J. G. Milton, T. Ohira, J. L. Cabrera, R. M. Fraiser, J. B. Gyorffy, F. K. Ruiz, M. A. Strauss, E. C. Balch, P. J. Marin, and J. L. Alexander, "Balancing with vibration: a prelude for drift and act balance control," PLoS One, vol. 4, no. 10, p. e7427, 2009.
[22] A. Cetinkaya, T. Hayakawa, and M. A. F. bin Mohd Taib, "Stabilizing unstable periodic orbits with delayed feedback control in act-and-wait fashion," Systems & Control Letters, vol. 113, pp. 71-77, 2018.
[23] A. Cetinkaya and T. Hayakawa, "Sampled-data delayed feedback control for stabilizing unstable periodic orbits," in 2015 54th IEEE Conference on Decision and Control (CDC). IEEE, 2015, pp. 1409-1414.
[24] B. A. Cohn, M. Szedlák, B. Gärtner, and F. J. Valero-Cuevas, "Feasibility theory reconciles and informs alternative approaches to neuromuscular control," Frontiers in Computational Neuroscience, vol. 12, 2018.
[25] E. Todorov, T. Erez, and Y. Tassa, "MuJoCo: A physics engine for model-based control," in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 5026-5033.
[26] N. Fazeli, M. Oller, J. Wu, Z. Wu, J. Tenenbaum, and A. Rodriguez, "See, feel, act: Hierarchical learning for complex manipulation skills with multisensory fusion," Science Robotics, vol. 4, no. 26, p. eaav3123, 2019.
[27] N. Fazeli, S. Zapolsky, E. Drumwright, and A. Rodriguez, "Learning data-efficient rigid-body contact models: Case study of planar impact," arXiv preprint arXiv:1710.05947, 2017.
[28] A. Marjaninejad, D. Urbina-Meléndez, and F. J. Valero-Cuevas, "Simple kinematic feedback enhances autonomous learning in bio-inspired tendon-driven systems," arXiv preprint arXiv:1907.04539, 2019.
[29] D. O. Hebb, The Organization of Behavior: A Neuropsychological Theory. Science Editions, 1962.
[30] S. Grillner, "Biological pattern generation: the cellular and computational logic of networks in motion," Neuron, vol. 52, no. 5, pp. 751-766, 2006.
[31] A. Geramipour, M. Khazaei, A. Marjaninejad, and M. Khazaei, "Design of FPGA-based digital PID controller using Xilinx SysGen for regulating blood glucose level of type-I diabetic patients," Int J Mechatron Electr Comput Technol, vol. 3, no. 7, pp. 56-69, 2013.