The paper investigates using a human reward signal in combination with environmental reward for a reinforcement learning agent. In particular, the authors analyze eight different ways to combine these two reward signals for performance gains. This is an important contribution toward formalizing the impact of social guidance on a reinforcement learning process.
As learning agents move from research labs to the real world, it is increasingly important that human users, including those without programming skills, be able to teach agents desired behaviors. Recently, the TAMER framework was introduced for designing agents that can be interactively shaped by human trainers who give only positive and negative feedback signals. Past work on TAMER showed that shaping can greatly reduce the sample complexity required to learn a good policy, can enable lay users to teach agents the behaviors they desire, and can allow agents to learn within a Markov Decision Process (MDP) in the absence of a coded reward function. However, TAMER does not allow this human training to be combined with autonomous learning based on such a coded reward function. This paper leverages the fast learning exhibited within the TAMER framework to hasten a reinforcement learning (RL) algorithm’s climb up the learning curve, effectively demonstrating that human reinforcement and MDP reward can be used in conjunction with one another by an autonomous agent. We tested eight plausible TAMER+RL methods for combining a previously learned human reinforcement function, H, with MDP reward in a reinforcement learning algorithm. This paper identifies which of these methods are most effective and analyzes their strengths and weaknesses. Results from these TAMER+RL algorithms indicate better final performance and better cumulative performance than either a TAMER agent or an RL agent alone.
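To make the combination idea concrete, here is a minimal sketch of one plausible combination method in this spirit: adding a learned human reinforcement prediction H(s, a), scaled by a weight beta, to the MDP reward inside a standard tabular Q-learning update. The function names and this specific additive rule are illustrative assumptions, not a reproduction of any of the paper’s eight methods.

```python
def combined_reward(mdp_reward, h_value, beta=1.0):
    """One plausible combination: MDP reward plus the learned human
    reinforcement prediction H(s, a), scaled by beta. (An illustrative
    assumption, not one of the paper's specific eight methods.)"""
    return mdp_reward + beta * h_value

def q_update(q, state, action, reward, next_q_max, alpha=0.1, gamma=0.99):
    """Standard tabular Q-learning update, driven by the combined reward."""
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * next_q_max - old)

# Example: one update step where H suggests the action was good.
q = {}
r = combined_reward(mdp_reward=1.0, h_value=2.0, beta=0.5)
q_update(q, "s0", "a0", r, next_q_max=0.0)
```

A natural variation is to anneal beta over time, so the agent leans on human guidance early in learning and on MDP reward later.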
Last week I attended the IEEE International Conference on Development and Learning, held at the University of Michigan. This is an interesting conference that I’ve been going to for the past few years. Its goal is to very explicitly mingle researchers working on Machine Learning and Robotics with researchers working on understanding human learning and development.
My lab had two presentations:
“Optimality of Human Teachers for Robot Learners” (M. Cakmak, A. L. Thomaz): Here we take the notion of teaching in Machine Learning Theory and analyze the extent to which people teaching our robot adhere to theoretically optimal strategies. It turns out they teach optimally with positive examples, but not with negative ones, and we can use active learning in the negative space to make up for people’s non-optimality.
“Batch vs. Interactive Learning by Demonstration” (P. Zang, R. Tian, A.L. Thomaz, C. Isbell): We show the computational benefits of collecting LbD examples online rather than in a batch fashion. In an interactive setting people automatically improve their teaching strategy when it is sub-optimal.
And here are some cool things I learned at ICDL.
Keynote speaker Felix Warneken gave a really interesting talk about the origins of cooperative behavior in humans. Are people helpful and good at teamwork because we learn it, or do we have some predisposition? His work takes you through a series of great experiments with young children, showing that helping and cooperation are things we are at least partly hardwired to do.
Chen Yu, from Indiana, does some really nice research looking into how babies look around a scene, and how this differs from adults or even older children. They do this by having the children wear headbands with cameras, then running correlations across multiple video and audio streams to analyze the data. For younger children, visual selection is very tied to manual selection, and the success of word learning is determined by the visual dominance of the named target.
Vollmer et al., from Bielefeld, did an analysis of their motionese video corpus and showed the different ways a child learner gives feedback to an adult teacher. In particular, this feedback shifts from being dominated by gaze behaviors to more complex anticipatory gestures between the ages of 8 and 30 months.
Several papers touched on the topic of intrinsic motivation for robots, as inspired by babies and other natural learners. Over the past few years there has been growing interest in this idea, and people have gone from focusing on curiosity and novelty to competence and mastery. There were papers on this topic from Barto’s lab and from Oudeyer’s. The IM CLeVeR project was also presented; this is a large EU-funded collaboration that aims to address intrinsic motivation for robots.
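As a toy illustration of the novelty end of that spectrum, here is a minimal count-based intrinsic reward: states the agent has visited less often yield a larger bonus, nudging it toward the unfamiliar. This is a generic sketch of one common approach, not the specific mechanism from any of the papers presented.

```python
from collections import Counter
import math

class NoveltyBonus:
    """Count-based intrinsic reward: the bonus for a state decays as
    1/sqrt(visit count), so unfamiliar states look more rewarding."""

    def __init__(self, scale=1.0):
        self.counts = Counter()
        self.scale = scale

    def bonus(self, state):
        self.counts[state] += 1
        return self.scale / math.sqrt(self.counts[state])

bonuses = NoveltyBonus()
first = bonuses.bonus("kitchen")   # largest on the first visit
second = bonuses.bonus("kitchen")  # decays on repeat visits
```

An agent would typically add this bonus to its extrinsic reward, which is part of why the field’s interest has shifted toward richer notions like competence and mastery.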
The Nineteenth Edition of the Robotics Program at AAAI is happening right now in Atlanta GA, July 12-15th, at the Westin Peachtree Plaza. This year’s event is co-chaired by Monica Anderson and Andrea Thomaz, and is sponsored by the NSF, Microsoft Research, and iRobot.
The AAAI Robotics Program has a long tradition of demonstrating innovative research at the intersection of robotics and artificial intelligence. This year, the AAAI-10 Robotics Program will feature a workshop on “Enabling Intelligence through Middleware” (July 12th) and a robotics exhibition (July 13-15th) with the following demonstrations of intelligent robotics on display Tues-Thursday, in the Vinings rooms on the 6th floor (near registration).
Robotic Chess: Small Scale Manipulation Challenge: The AAAI-2010 Small-Scale Manipulation Challenge is designed to highlight advances in embodied intelligence using smaller-than-human-size robots. Robotic chess requires the integration of sensing, planning, and actuation, and provides an opportunity for performance on a common, well-defined task. The chess challenge will run on Tuesday July 13th, with matches at 10am, 1pm, and 4pm. Additionally, the chess robots will be on display throughout the exhibit.
Learning by Demonstration Challenge: This will be the second annual exhibit and challenge on robot Learning by Demonstration (LbD). The purpose of this event is to bring together research and commercial groups to demonstrate complete platforms performing LbD tasks. Our long-term aim is to define increasingly challenging experiments for future LbD events and to build greater scientific understanding of the area. This year 5 teams will bring robots that will learn a sorting task from a human teacher. The LbD challenge will run on Wednesday July 14th at 4pm, and the LbD robots will be on display throughout the exhibit.
Robotics Education Track: This venue offers an accessible and flexible opportunity for undergraduate, early graduate, or pre-college student teams to design, implement, and demonstrate an autonomous robotic system. The tasks involved span physically-embodied AI: exploration, interaction, and learning within an unknown environment. In the long run, we hope to motivate hands-on AI robotics investigation both for its own sake and in service to other academic disciplines and educational goals. This year we have 8 university teams contributing to this part of the exhibit.
I had a unique opportunity yesterday: I was invited to participate in a workshop of PCAST (the President’s Council of Advisors on Science and Technology). The theme of the meeting was Bio/Info/Nano tech: what exciting opportunities are happening in these fields that will create jobs in the US, and what the government can do to spur innovation. I’ll probably have a couple of SWMR posts about the discussion, and thought I’d start off with one about my contribution, since I was the only roboticist in the room.
They had a wide range of discussants. Several of us were early-career researchers, who I think were invited to share our “what’s new and exciting that’s going to create jobs” point of view. Another contingent of the discussion group were more seasoned researchers and entrepreneurs, who had a from-the-trenches perspective on how the government’s support of basic research has changed over the years.
Each discussant had the opportunity in a three minute introduction to make a statement to the council. Here’s a recap of what I said:
The technology opportunity I decided to highlight is service robotics, because service robots have the potential to dramatically impact such a diverse set of societal needs. Robots that are capable of working alongside people will revolutionize workplaces, for example in manufacturing.
Robotics represents perhaps our best opportunity to achieve higher levels of domestic manufacturing agility and overall productivity needed to retain high-value manufacturing jobs in the U.S., provided that the current state of the technology can be significantly advanced.
Today’s industrial robots lack the capabilities required to do more than just blindly execute pre-programmed instructions in structured environments. This makes them expensive to deploy and unsafe for people to work alongside.
There is an opportunity to usher in a new era of agile and innovative manufacturing by developing service robots as co-workers in the manufacturing domain. These capable assistants would work safely in collaboration with and in close proximity to highly skilled workers, for example providing logistical support: automatically fetching parts, packing/unpacking, loading, stacking boxes, emptying bins, and detecting and cleaning spills.
Very similar logistical robotic support could help streamline the operation of hospitals, driving healthcare costs down.
In order to realize this vision, we need to move beyond robots only operating in relatively static structured environments. This presents several research challenges, and I think that the following three are most critical to progress.
– This requires advances in sensing and perception technology, allowing robots to keep track of a dynamically changing workplace.
– Manipulation is a key challenge as well: robots need the flexibility to pick up and use objects in the environment without tedious pre-programming of specialized skills.
– Finally, an important challenge in bringing these robots to fruition is advancing human-robot interaction. We need these robots to work safely and efficiently in collaboration with human workers. People can’t just be seen as obstacles for the robot to navigate around; the robot needs to reason about and understand people as interaction partners.
Recently, over 140 robotics experts across the country have come together to articulate a national robotics initiative, a robotics research roadmap. This roadmap lays out the target areas where we think robotics research efforts need to be supported in order to bring about robot technology that will have the biggest impact on our economy and our society.
The comment I got from one of the council members was interesting, she said (I’m paraphrasing) “Aren’t you leaving out the challenge of Sentience or AI needed?” I only had time for a short answer, and said something to the effect that, yes, I think that the notion of AI cuts across all of the three areas I mentioned, but particularly human-robot interaction. In order for a robot to work side-by-side with a human partner it will need human compatible intelligence capabilities.
But here on SWMR, I’ll give the longer answer….that, no I don’t think we need AI for service robots. Or I don’t think that’s what we should call it. Yes, perception and manipulation and HRI and autonomy in general all fit under the big umbrella term of AI. But the term AI is so vague, and it makes people think of science fiction, which then makes you feel like robots in society is some pipe dream far in the future. So, particularly in settings like PCAST where people want to hear about concrete objectives and job creation, it does our field no good to just lump everything under the term AI.
If instead we talk about the specific intelligence challenges, suddenly it all seems much more achievable, and you can imagine some semi-autonomous form of service robots being deployed in the not so distant future. We see that sensing technology is getting better and better, and with all the academic and industrial partners working on the manipulation problem, that seems achievable too. And in terms of AI for human-robot interaction, yes, we need to make some significant advances in computational models of social intelligence before robots can truly interact with people in unstructured environments. But do we need to solve AI? I don’t think so.
One of the most interesting aspects is the modular design. As seen in the video linked above, the shells are easily changed out to create different looks, changing from male to female for example. Another unique characteristic is the facial features. They had the goal of having “no holes” in the face, which led to a magnetic actuation design for the facial features (lips, brows). This was fun to see working, since it’s an idea we have been playing around with independently.
Looking forward to seeing more about what comes from the FloBi project!
We are excited about how well Simon did at the CHI 2010 Interactive Demo session last week. Our demo got a lot of traffic, especially during the opening event on Monday evening, and even got some coverage on PC World (who did the video below), engadget, and NPR.
This was Simon’s first venture out of the lab, so it has been interesting for forcing us to do more on-board perception and generally putting the platform through its paces. We were doing an interactive learning demo, using active learning, where Simon learns a “cleaning up” task. The human teacher provides examples of what goes where, and Simon uses these examples to build a model of what goes in each location. The teacher can also ask Simon if he has any questions, and the robot will point to or pick up the object it is least certain about. In addition to learning, we added perception capabilities to give Simon more ambient awareness of the environment, with an attention mechanism that attends to visually salient cues (as determined by the Itti saliency model), as well as faces and loud sounds.
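The “any questions?” step above is a classic active-learning move: query the example the learner is least certain about. Here is a minimal sketch of one way such a query could be chosen, using the entropy of each object’s predicted-location distribution. The representation and function names are assumptions for illustration, not Simon’s actual model.

```python
import math

def entropy(probs):
    """Shannon entropy of a probability distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def most_uncertain(predictions):
    """predictions maps each object to its probability distribution over
    candidate locations; return the object with the highest-entropy
    (least certain) distribution -- the best one to ask the teacher about."""
    return max(predictions, key=lambda obj: entropy(predictions[obj]))

# Example: the learner is confident about the cup, torn about the ball.
predictions = {"cup": [0.9, 0.1], "ball": [0.5, 0.5]}
query = most_uncertain(predictions)
```

Asking about the maximum-entropy object is one standard uncertainty-sampling heuristic; the teacher’s answer then gives the model the most informative new example.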
Simon got to interact with hundreds of people, and was particularly popular with kids of the conference attendees.
Simon also got to meet the designer of his head shells, Carla Diana, who finally got to see the finished product in person.
Next summer, AAAI 2010 will be coming to Atlanta. I’m co-chairing the Robotics Exhibit with Monica Anderson. This is both an open exhibit for demonstrations of robotics research that intersects with AAAI, and demonstrations focused on specific challenge problems.
Each challenge is intended to be an experiment designed to motivate and evaluate an individual function of artificial intelligence for robotics, similar to the Semantic Robot Vision Challenge at AAAI-07.
This is the second year that Learning by Demonstration will be one of the topics. Last year we had open demonstrations of LbD systems. This year’s LbD event is being organized by Sonia Chernova, and folks are invited to optionally participate in a LbD challenge problem:
Optional challenge event in which all participants will perform an object relocation task that involves teaching the robot to move an object from one place to another. Each participating team will be provided with sample objects for practice in the weeks before the event. Due to differences in embodiment and learning algorithms, we expect to see a wide variety of approaches for performing the target behavior. A video showcasing the results will be compiled by the event organizers.
Applications for exhibitors aren’t due until later in the Spring, so plenty of time to get your learning robots ready for Atlanta!
This was by far the best robotics conference I’ve been to in a while. The diversity of topics, the quality of the presentations, and the highly engaged audience were all great. Here are some highlights, in no particular order:
Sami Haddadin presented work from DLR about a robot co-worker, including previous work on safe robot control, and newer work that involves sensing a human in the workspace and the interaction schemes that are appropriate for different co-working scenarios.
Russ Tedrake presented his recent work on building robots that fly like birds! (e.g., perching on a string)
There was a presentation about the HRP-4C, which has been all over YouTube for some time now, but I had not yet seen this video, which the speaker announced as the “world’s first robot bride.”
Dillman’s lab at Karlsruhe presented their work on interactive learning in the humanoids session, showing lots of great video of their robots doing kitchen tasks.
Prof. Inaba presented an overview of their lab’s work at the University of Tokyo, and the sheer number and diversity of robots had their American colleagues drooling. This is where it shows that US robotics research doesn’t get near the level or longevity of funding you see in Japan and the EU.
New Scientist covered the recent IJCAI Robotics Event. One of the themes of the workshop and event was maximizing the potential for AI research and Robotics research to co-mingle. A major challenge in this respect is a lack of “out of the box” hardware platforms and software architectures for people to play around with. In the software department, people are talking about this as the need for an Operating System for robots.
This is what Microsoft would like Robotics Developer Studio to be. Willow Garage would like to see standardization around ROS. And as they mention on their wikipage, there have been so many of these kinds of projects in the past aiming for standards in the robot software/hardware interface.
I think it will be great when any of these projects gets the kind of critical mass that creates standardization around it. To some extent, I don’t even care which one it is, but I do feel that an open-source solution like ROS is going to be the most successful. The community of developers that need a robot OS right now are definitely not waiting for someone else to deliver what they need. They are currently rolling their own solutions, and the best way to create a standard is to direct that collective energy toward the same end.
The New Scientist article points to some of the current barriers to a standard OS: each robot out there has unique hardware, is often designed for a specific purpose, and runs software optimized for that purpose.
In addition to this, when I’ve had conversations with people about standardization, the biggest barrier seems to be that everyone has their way of getting things done now that works for them, and they’d prefer everyone standardize to their way. But in the end, as said well by Chad Jenkins and Brian Gerkey, the frustration of endless re-implementation will eventually drive us to standardize.