In this paper the authors investigate using a human reward signal in combination with environmental rewards for a reinforcement learning agent. In particular, they analyze eight different ways to combine these two reward signals for performance gains. This is an important contribution toward formalizing the impact of social guidance on a reinforcement learning process.
As learning agents move from research labs to the real world, it is increasingly important that human users, including those without programming skills, be able to teach agents desired behaviors. Recently, the TAMER framework was introduced for designing agents that can be interactively shaped by human trainers who give only positive and negative feedback signals. Past work on TAMER showed that shaping can greatly reduce the sample complexity required to learn a good policy, can enable lay users to teach agents the behaviors they desire, and can allow agents to learn within a Markov Decision Process (MDP) in the absence of a coded reward function. However, TAMER does not allow this human training to be combined with autonomous learning based on such a coded reward function. This paper leverages the fast learning exhibited within the TAMER framework to hasten a reinforcement learning (RL) algorithm’s climb up the learning curve, effectively demonstrating that human reinforcement and MDP reward can be used in conjunction with one another by an autonomous agent. We tested eight plausible TAMER+RL methods for combining a previously learned human reinforcement function, H, with MDP reward in a reinforcement learning algorithm. This paper identifies which of these methods are most effective and analyzes their strengths and weaknesses. Results from these TAMER+RL algorithms indicate better final performance and better cumulative performance than either a TAMER agent or an RL agent alone.
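To make the setup concrete, here is a minimal sketch of one plausible combination scheme in the spirit of those the paper compares: reward shaping, where the learned human reinforcement function H is simply added to the MDP reward inside an ordinary tabular Q-learning update. The function name, signature, and the weighting parameter beta are illustrative assumptions, not taken from the paper.

```python
def shaped_q_update(Q, H, s, a, r, s_next, actions,
                    beta=1.0, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update where the MDP reward r is
    augmented by a previously learned human reinforcement function H
    (reward shaping, one plausible TAMER+RL combination scheme)."""
    r_combined = r + beta * H(s, a)  # blend human and MDP reward
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    td_error = r_combined + gamma * best_next - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    return Q[(s, a)]
```

One could also decay beta over time, letting the agent lean on the human signal early in learning and on MDP reward later; varying how and how strongly H enters the update is exactly the kind of design choice the eight-way comparison is probing.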
The Robots podcast describes itself as “the podcast for news and views on robotics. In addition to insights from high-profile professionals, Robots will take you for a ride through the world’s research labs, robotics companies and their latest innovations.”
I find much of Gopnik’s work inspiring for robot learning, and the ideas in this article are a good example. She lays out evidence and findings related to the difference between adult and child learning. In many ways children are much better at learning and exploring than adults. They observe and create theories that are consistent with a keen probabilistic analysis of observed events. These theories guide their “play,” or exploration, in a way that efficiently gathers information about their complex and dynamic world.
The description of adult versus child-like learning sounds like the traditional explore/exploit tradeoff in machine learning. But this raises a question we are often asked with respect to robot learning: do we actually want robots to explore like children? I think the answer is yes and no. We probably don’t want robots to need a babysitter, but we do want robots to exhibit the kind of creativity and experimentation that you see in some of Gopnik’s studies of causal structure, for example.
I’m most excited about the idea that Gopnik ends the article with: “But what children observe most closely, explore most obsessively and imagine most vividly are the people around them. There are no perfect toys; there is no magic formula. Parents and other caregivers teach young children by paying attention and interacting with them naturally and, most of all, by just allowing them to play.”
I think that the importance of social learning in human development is a strong argument for robot learning by demonstration or instruction—that we should be looking for the shortcuts and computational gains we can get from leveraging a partner.
“Robotics off the shelf: stronger, faster, cheaper … now what?”
Like the development of personal computers through the 1970s and 80s, an explosion of ever stronger, faster and cheaper robot platforms is emerging and becoming available as commercial off-the-shelf products. These robots have a growing capability to identify relevant aspects of varied environments, find and manipulate objects of interest, traverse diverse terrain and act in a socially acceptable manner. As these robots make their way into society, there are questions to address: How will society use these robots? What are the uses we have yet to dream up? How does artificial intelligence meet these needs?
Technological revolutions like these are driven by a synergy between hardware platforms that manipulate physics and software that enables user applications, so a robot platform is only as good as the applications where it can be utilized. During my formative years of the 1980s, the personal computer was mostly an expensive novelty device with specialized applications that were often difficult to run with tedious user interfaces. Computing of this era was driven by slow systems with command-line interfaces and floppy disk drives that are a far cry from today’s user-friendly systems. Relatively few were willing to climb the learning curve for applications such as VisiCalc, an early spreadsheet, or Summer Games on an Apple IIe, Commodore 64, or IBM PC. Over time, developments in software created a synergy between hardware and software development, where advances on one side pushed the other side to meet and exceed new requirements. As a result, we now have a wealth of highly relevant and crucial software applications on a variety of computing devices, from desktops to supercomputers to smartphones. More importantly, our modern computing culture has increasingly succeeded in enabling greater populations of people to explore new forms of content and new applications without specialized training in computing. Brooks describes these trends as “exponentials” and provides a more in-depth treatment of the relationship between robotics and general computing exponentials in his recent talks (http://fora.tv/2009/05/30/Rodney_Brooks_Remaking_Manufacturing_With_Robotics) as well as recent robotics roadmapping efforts (http://www.us-robotics.us/).
While I see robotics following a similar evolution to personal computing, there are two issues that make the robotics revolution distinctly challenging: uncertainty and purpose. The growth of personal computing has been due in large part to the “write local, run global” approach to software development. That is, a program written by a developer (write local) will reliably perform the same way when distributed to users across the world (run global) as for the original developer. Write local, run global is enabled due to reliable modeling of information through manipulating the physics of electricity in closed and controlled systems buried deep inside computing devices. In robotics, however, physical interactions are much more messy and uncertain.
Consider the task of taking out the trash, given, say, an iRobot PackBot or a Willow Garage PR2. The steps to do this at a workplace may involve taking a bin from beside your desk in your office to a larger receptacle within the building. At home, the task will be different: the bin may be behind a cabinet door and may need to be taken outside. There may be an elevator at work or stairs at home.
It appears to be a simple task, but the rote programming of such a task requires the ability to recognize the object “trash can” from its appearance; determine how to grasp, carry and unload the bin without making a mess; and apply specific knowledge of the environment. Developing such a robot controller, or software, will surely require specialized training in computer programming as well as a significant cost in time and effort. Even after this controller is developed, our robot would only know how to remove trash in these two specific scenarios, and potentially only for that user. Additional users may have their own distinct desires, such as how certain bins should be carried to avoid damage, separate handling of recycling, or interacting with household pets. And what happens when a user wants to repurpose the robot for a new task that the developer has yet to consider and implement? Will human users be able to adapt to these new capabilities and even develop their own? Just as computer scientists likely have a different vision for a website than graphic artists do, there may be uses for robots that scientists have yet to consider but that, once robotics is made accessible, will become an emerging area for innovation.
Robot “learning from demonstration” (LfD) has emerged as a compelling direction for addressing the above issues by enabling users to create robot controllers and applications through instruction. Through LfD, robots are programmed implicitly from a user’s demonstration (or other forms of guidance) rather than explicitly through an intermediate form (e.g., a hardcoded program) or task-unrelated secondary skills (e.g., computer programming). The intended behavior for a robot is “learned” from demonstrated examples of a human user’s intentions. The key to unlocking the user’s desired robot controller lies in finding the hidden structure within this demonstration data.
Two trends in artificial intelligence give me strong belief that such robot LfD will become a reality. First, our ability to collect and process massive amounts of data for various problems has greatly improved. Successful examples include the use of Google for web search, reCAPTCHA for optical character recognition and emerging tools such as the Amazon Mechanical Turk. Second, progress in robot LfD is increasingly showing signs that many of the algorithmic pieces are in place to learn from human users. For example, my research group has been able to use LfD for various robotic tasks, such as enabling the iRobot PackBot to follow people and recognize their gestures and to acquire soccer skills for Sony AIBO robot dogs. Our work is only a small slice of the accomplishments across the world, which include learning tasks ranging from simple object fetching, to cooperative object stacking with humans, to highly dynamic ball-in-cup games and aerial flight maneuvers. As robot platforms proliferate and demonstration data collection increases, my conjecture is that learning algorithms for robot LfD will truly take hold.
Odest Chadwicke Jenkins
Assistant Professor of Computer Science
While we’re on the topic of biologically-inspired learning, here’s a video that recently hit YouTube from Alex Stoytchev’s lab. Their upper-torso robot is designed to learn by exploring its environment. In particular, they want to figure out how to get it to explore its environment and learn about objects in child-like ways. This is a problem that has been called affordance learning: learning what effects objects in your environment produce when acted upon. In the video they demonstrate learning to classify 20 objects by sound only, with an action set of 5 exploratory actions.
I think affordance learning is an interesting topic for robots. We’re working on a slightly different problem than Alex’s lab, looking at how the robot can use people to help it learn about objects. But the end goal is the same: robots that can dynamically adapt to new environments without having to be pre-programmed with every skill they need.
I recently had a couple of interesting encounters with Amazon.com recommendations that are nice examples of where I think Socially Guided Machine Learning could come into play. Learning how to get things done in dynamic human environments is a hard problem, and maybe the best way to solve it is to let people help.
I have always been really happy with Amazon’s recommendations, I think because until recently I only bought books there, and their similarity metric for books and other such media works pretty well. A couple of months ago I became a parent and started making several non-book purchases on Amazon. And for many such purchases the similarity metric breaks down.
One example was diapers: right after I bought newborn diapers, Amazon recommended that I buy diapers for toddlers. The second example is pictured above. I bought wooden letters to spell my son’s name, one of which was the letter “A,” so Amazon recommends “N.” This one I found particularly amusing. Imagine buying these wooden letters at a store: you pick up the letter “A” and a person comes over, “Oh, if you like the letter ‘A’ you’ll really love ‘N,’ take my word for it!”
These two examples point out a hard problem with statistical machine learning. Coming up with the right similarity metric is often an art, and in the case of the multitude of things that are available on Amazon it is hard to imagine a similarity metric that would work well across their whole site. Sometimes their metric works and sometimes it doesn’t. The reason I bought the letter “A” is not actually very similar to the reason that I might buy the letter “N.” And the person who buys size 1 diapers is not actually that similar (on the timescale of minutes) to the person who will buy size 4 diapers. There is additional knowledge about the world that comes into play in making this decision. And I think the interesting challenge for Socially Guided Machine Learning is to develop ways that people can help machines develop the right similarity metrics for decision making in various contexts.
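A toy sketch shows why a pure co-purchase similarity metric recommends “N” after “A” (the data and function name here are hypothetical, not Amazon’s actual algorithm): parents spelling a name buy many letters in one order, so any two letters look similar, regardless of why anyone wanted them.

```python
from math import sqrt

def copurchase_similarity(purchases, item_a, item_b):
    """Cosine similarity between two items based only on which users
    bought them -- the kind of co-purchase statistic an item-to-item
    recommender might rely on. purchases: {user: set of items}."""
    users_a = {u for u, items in purchases.items() if item_a in items}
    users_b = {u for u, items in purchases.items() if item_b in items}
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / sqrt(len(users_a) * len(users_b))

# Made-up purchase history: letters bought together to spell names.
purchases = {
    "parent1": {"letter_A", "letter_N", "letter_D"},
    "parent2": {"letter_A", "letter_N"},
    "parent3": {"letter_A", "letter_M"},
}
print(copurchase_similarity(purchases, "letter_A", "letter_N"))
```

The similarity comes out high even though nothing about either purchase expresses a preference for the other letter; the world knowledge (“these letters spell one particular name”) lives outside the co-purchase data entirely.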
Update: Another amusing email from Amazon.com, re the wooden letters… They are trying hard to learn more!
IJCAI 2009 is coming up (July 13-16th in Pasadena). If you are going to be there don’t miss the Robot Exhibit. Chad Jenkins and Monica Anderson have put together a great event. There are several “challenge” topics, one of which is Learning by Demonstration.
I think it is great to see robots at IJCAI/AAAI, as it is a well recognized vehicle to push the field of AI forward into real world (what Horvitz has described as open world) problems. The robot exhibit is going to be a yearly event, held in conjunction with various conferences (AAAI, IJCAI, and others). The 2010 event will be held at AAAI, and the challenge topics will be announced during this year’s event.
I’m running one of the AAAI Spring Symposia this year, Agents that Learn from Human Teachers, along with Cynthia Breazeal, Sonia Chernova, Dan Grollman, Charles Isbell, Fisayo Omojokun, and Satinder Singh.
Submissions are due by Oct. 3.
The symposium aims to bring together a multi-disciplinary group of researchers to discuss how we can enable agents to learn from real-time interaction with an everyday human partner, exploring the ways in which machine learning can take advantage of elements of human-like social learning.
Topics of interest include, but are not limited to:
–ML for interactive, real-time learning
–supervised and semi-supervised learning approaches
–active learning approaches
–feature selection techniques
–methods for improving beyond the observed performance of the teacher based on the agent’s own successes and failures
It is geared toward people thinking about robots and software agents that learn with human input. And in addition to the AI and Machine Learning crowd, the organizing committee and I are looking to have a good representation of folks from developmental psych and social psych, to really address the human-side of the teaching-learning equation.
The project is about affordance learning, or learning about the effects of your actions in the world. The system gathers a training dataset by playing with objects, collecting examples of the form (perceptual context, action performed, observed effects). It learns SVM classifiers and is then able to predict effects for given action-context pairs.
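A minimal sketch of that pipeline, with toy placeholder features (the project’s actual perceptual encoding is not described here): each training example pairs context features and an action id with the observed effect label, and an SVM then predicts the effect for a new context-action pair.

```python
# Affordance learning sketch: one SVM over (context, action) features,
# predicting observed-effect labels. Data and encoding are illustrative.
import numpy as np
from sklearn.svm import SVC

# Each row: [roundness, flatness, action_id]; label: observed effect.
X = np.array([[0.9, 0.1, 0], [0.8, 0.2, 0],   # round object + push
              [0.1, 0.9, 0], [0.2, 0.8, 0]])  # flat object + push
y = np.array(["rolled", "rolled", "moved", "moved"])

clf = SVC(kernel="rbf").fit(X, y)

# Predict the effect of pushing a new, roundish object.
print(clf.predict([[0.85, 0.15, 0]])[0])
```

With a learned model like this, the robot can anticipate what an action will do before trying it, which is exactly the predictive knowledge the affordance framing is after.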
We are interested in what would be different when the robot explores an environment by itself versus when it has a human teacher helping it explore the environment. Our intuition is that a teacher should make the process faster and more efficient, and we just finished an experiment that looks in detail at what exactly changes between self and social learning. In the experiment, people helped Junior by placing objects in the workspace for it to play with, helping it learn affordances like “rollable,” “liftable,” and “moveable.”
We’re in the process of analyzing and writing up the results, but we can say that both types of data sets result in reasonable classifiers. Social data sets are different in some interesting ways, like having much higher representation of positive examples. Junior was also able to consistently get people to provide help at just the right time, by using a gazing gesture when it couldn’t quite reach an object.
Details to come, but you can read more about the project here.
Many future applications for autonomous robots bring them into human environments as helpful assistants to untrained users in homes, offices, hospitals, and more. These applications will require robots to be able to quickly learn how to perform new tasks and skills from natural human instruction. The key here is to make it possible for the human to interact with the robot without having to read a manual.
The workshop on Interactive Robot Learning (IRL) was held at the Robotics: Science and Systems 2008 conference. The discussion spanned the breadth of research questions at the intersection of Machine Learning and Human-Robot Interaction.
The workshop began with a keynote speaker, Jeff Orkin from MIT, who has experience from the video game industry. Orkin gave an overview of a project in which he and his colleagues are collecting data from thousands of people playing a game called The Restaurant Game. In the game people act out a normal scene of being a waiter or a customer in a restaurant, thereby “teaching” the computer about social behavior and dialog that are common in this situation. One important lesson from Orkin’s project, as it relates to IRL, is that people will not always give perfect input or examples to your learning system. It is therefore important to collect enough data and to use algorithms that let the anomalies wash out of the model.
In addition to invited keynote speakers we had 7 papers that were submitted, reviewed and selected to present at the workshop. In the morning session we had three of these speakers present. The first was Sylvain Calinon from EPFL, who spoke about a programming by demonstration framework that incorporates natural teaching mechanisms, like paying attention to the pointing and gaze direction of the teacher, and allowing the teacher to physically move the robot during training. Maren Bennewitz then presented a paper about recognizing gestures, such as head nods and hand gestures. Many of the gestures were quite generic and would have broad use in communicating with a robot learning system. Olaf Booij presented work on interactive mapping. In this project a robot learns a semantic map of places in a home by clustering sensor data, and it uses interaction with a human partner to simplify the task. In ambiguous situations, it engages the human partner in a simple dialog, asking the name of the current location.
The afternoon session began with a keynote speaker, Jan Peters from the Max Planck Institute for Biological Cybernetics. Peters works in robotics, nonlinear control, and machine learning. In his talk he covered a framework of motor skill learning for robots: it starts with parameterized motor primitives used for movement generation, and higher-level tasks then involve transforming these movements into motor commands. To achieve this, Peters introduced an EM-based reinforcement learning algorithm, which learns smoothly without dangerous jumps in solution space. Additionally, learning smoothly in the solution space is likely to be the most understandable to a human teacher.
In the afternoon we had three more paper presentations. Two papers were in the realm of assistive robotics. Adriana Tapus presented a robot therapy system that learns and adapts on-line to personalize the therapy and maximize health benefits. Their approach is a novel incremental learning method; positive results were found with both stroke rehabilitation patients and dementia patients. Ayanna Howard also presented a therapy application for interactive learning. Their goal is for the robot to be able to observe and evaluate a therapy patient’s exercises, assisting the job of the therapist. They presented two methods for learning to recognize therapy exercises from visual input. Jure Zabkar presented the final paper of this session, about using qualitative representations in robot learning. Zabkar argued for this approach because it results in models that are intuitive for non-expert humans to inspect and understand, which is a key component of interactive learning.
The workshop ended with a final keynote speaker, Aude Billard from EPFL. Billard has made several contributions over the years in the realm of robot programming by demonstration, and her talk covered many aspects of the work her lab has tackled on the problem of imitation learning for robots, in particular their focus on the complementary problems of “what to imitate” and “how to imitate.” The first is about determining the key components or features that really represent the goal of a task; having a framework for determining “what to imitate” gives the system the means to generalize appropriately. Their approach is based on Gaussian Mixture Models. The “how to imitate” problem involves translating motions seen from a human into motions the robot itself can perform, while also achieving the goal of the task. Their approach is a stable dynamical system, active in a hybrid Cartesian/joint-angle frame of reference, which ends up being able to handle perturbations in the environment and joint angle limits, and adapts to changes in target position.
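The “what to imitate” intuition can be sketched very simply: across several demonstrations of the same task, dimensions with low variance are the constrained, task-relevant ones, while high-variance dimensions were left free by the teacher. Their actual system extracts this structure from Gaussian Mixture Model covariances; the per-dimension variance below, on made-up data, is a drastic simplification of the same idea.

```python
import numpy as np

# Four demonstrations of the same task; columns are [gripper_x, gripper_y].
# x barely varies across demos (the task constrains it); y varies freely.
demos = np.array([
    [0.50, 0.10],
    [0.51, 0.80],
    [0.49, 0.40],
    [0.50, 0.60],
])

variance = demos.var(axis=0)
importance = 1.0 / (variance + 1e-6)  # low variance -> strongly constrained
print(importance.argmax())  # -> 0: the x dimension is "what to imitate"
```

A reproduction that nails x while letting y drift still satisfies the task, which is exactly the generalization that identifying “what to imitate” buys you.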