So, Where’s My Robot?

Thoughts on Social Machine Learning

Is NO the opposite of YES?

This is what I’ve been thinking about for the last couple of days while preparing a paper for the IEEE Symposium on Robot and Human Interactive Communication this summer (in Korea, nice!).

This is a topic I’ve been interested in for quite a while now: the asymmetry of feedback. In Reinforcement Learning it’s common to represent the distinction between good and bad feedback with the sign of a scalar reward signal (positive=good; negative=bad). Since RL algorithms are based on maximizing the sum of rewards over time, this works out: positive feedback increases the sum; negative feedback decreases it.
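To make that assumption concrete, here’s a minimal Q-learning sketch (in Python; my own illustration, not code from the paper) where a trainer’s YES and NO enter the update purely as the sign of a scalar reward:

```python
from collections import defaultdict

# Q[state][action] -> estimated value; unseen pairs default to 0.0.
Q = defaultdict(lambda: defaultdict(float))

def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One temporal-difference update: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    best_next = max(Q[next_state].values(), default=0.0)
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

# Under this scheme, praise and scolding differ only by the sign of the reward:
q_update("kitchen", "stir-bowl", +1.0, "kitchen")   # trainer says YES
q_update("kitchen", "grab-tray", -1.0, "kitchen")   # trainer says NO
```

In other words, the learner treats NO as exactly "negative YES" and nothing more.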

But I’m interested in people giving feedback to learning machines…and it quickly becomes clear that this simplification of a reward signal doesn’t capture what people mean by positive/negative feedback. People’s feedback doesn’t just boil down to a scalar value from -1 to 1. About a year ago we did some experiments with people training an RL agent and showed that almost all of the people in our study had an asymmetric, positively biased distribution of rewards.

This isn’t too surprising, since biological systems clearly don’t have symmetric responses to positive and negative feedback. Evidence from neuroscience shows that positive and negative feedback stimulate physically different locations in the brain.

But the thing I’m interested in is what this should mean for a learning agent. This evidence of asymmetry in nature doesn’t tell us how or why to include asymmetry in a computational learning model, but it does inspire us to look for computational grounds for doing so, with the goal of more efficient and robust learning algorithms.

In the paper we show two implementations (one with Leo, one with Sophie) of strategies for treating good and bad feedback differently. In the first, Leo assumes that a task demonstration followed by negative feedback will lead to a refinement of that example. This lets the robot quickly refine the hypothesis space with the human partner, turning negative examples into positive ones. In the second, the Sophie’s Kitchen agent assumes that negative feedback means it should ‘undo’ or ‘do over’ the last action. In experiments with human trainers, this version has significantly better learning behavior: the state space visited is much smaller, there are significantly fewer failures, and fewer actions are needed to learn the task.
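For a rough sense of the second strategy, here’s a sketch (again my own illustration, with hypothetical names like env.undo(); not the actual Sophie’s Kitchen code) of a training step that treats NO as an ‘undo/do-over’ instruction rather than just a negative number in the reward sum:

```python
# Illustrative sketch: negative human feedback triggers an "undo" of the last
# action in addition to the usual value update. The env/agent/trainer objects
# and their methods are hypothetical stand-ins, not the paper's implementation.

def training_step(env, agent, trainer):
    state = env.current_state()
    action = agent.choose_action(state)
    next_state = env.apply(action)

    feedback = trainer.get_feedback()            # e.g. +1, -1, or None
    if feedback is not None:
        agent.update(state, action, feedback, next_state)

        if feedback < 0:
            # NO is not just the opposite of YES: interpret it as a request
            # to reverse the last action and try again from the same state.
            env.undo(action)
            agent.mark_undesirable(state, action)
```

Because the agent backs up to a familiar state instead of wandering deeper into a bad region, it ends up visiting far fewer states on its way to the goal.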

So, it seems that for a learning agent, NO is not simply the opposite of YES!

May 31st, 2007 Posted by | HRI, Machine Learning
