The folks over at robots.net posted recently about the Intelligent Adaptive Curiosity work of Oudeyer and Kaplan at Sony’s Computer Science Lab in Paris. And as this is one of my favorite topics these days, I thought I’d do a follow-up.
In most machine learning examples, learning is an explicit activity: the system is designed to learn a particular thing at a particular time. People, on the other hand, have a motivation or drive to learn new things and achieve new goals. People are curious about new environments and experiences, and are able to judge their level of mastery in an environment. Learning is not a separate activity, but part of all activity.
Internal motivation drives the exploration process, causes the learner to recognize learning opportunities and take advantage of them, and keeps the right amount of focus on a particular problem. Essentially, these internal motivations help a child, an adult, or an animal learn the right thing at the right time.
Machine Learning researchers would *love* to have a machine that learns flexibly, proactively explores an environment, and comes to understand what kinds of things can be achieved in it. This is why a few researchers have recently focused on incorporating internal motivation into machine learning algorithms, trying to achieve some of the efficient learning by exploration that is seen in humans and animals. The following are two different approaches to Motivated Reinforcement Learning:
Intelligent Adaptive Curiosity is an approach that uses a Progress Drive, where learning progress is measured by how much the error of the prediction model, P(St+1 | St, a), improves. In essence, the agent is 'motivated' to learn about the world completely, since the reward signal is defined by the growth of the agent's own world knowledge.
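A progress drive of this kind can be sketched in a few lines. The sketch below is illustrative only (the class and method names are my own, not from Oudeyer and Kaplan's implementation): the agent keeps a short history of prediction errors per action and prefers the action whose error has been dropping fastest, i.e. where it is currently learning the most.

```python
from collections import deque

class ProgressDriveAgent:
    """Toy progress-drive sketch: intrinsic reward for an action is the
    recent *decrease* in prediction error for that action, so the agent
    prefers actions it is currently getting better at predicting."""

    def __init__(self, actions, window=5):
        self.actions = actions
        # Keep a sliding window of recent prediction errors per action.
        self.errors = {a: deque(maxlen=2 * window) for a in actions}

    def record_error(self, action, error):
        """Store the prediction error observed after taking `action`."""
        self.errors[action].append(error)

    def learning_progress(self, action):
        """Progress = mean error in the older half of the window minus
        mean error in the newer half (positive when error is dropping)."""
        hist = list(self.errors[action])
        if len(hist) < 2:
            return 0.0
        mid = len(hist) // 2
        old, new = hist[:mid], hist[mid:]
        return sum(old) / len(old) - sum(new) / len(new)

    def choose_action(self):
        """Greedy choice: pick the action with the highest progress."""
        return max(self.actions, key=self.learning_progress)
```

Note the consequence of rewarding progress rather than raw error: an action whose outcomes are unpredictable noise keeps a high but *flat* error, yields no progress, and so stops attracting the agent, which is the point of the progress drive.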
Intrinsically Motivated Reinforcement Learning uses intrinsic rewards in combination with extrinsic environmental rewards. In this case, the intrinsic reward is proportional to the novelty of a state transition: (1 - P(St+1 | St)). New 'skills', or options, are learned via Q-learning, where the reward is the combination of the intrinsic reward and any extrinsic reward from the environment. Thus, a novel state change initially boosts the reward received, and this boost diminishes over time until only the extrinsic reward from the environment remains.
I think computational implementations of curiosity are a great start. What drives are needed besides curiosity? Piaget talked about the two competing motivations of novelty (curiosity) and mastery. I think an important next step for these curiosity-driven systems is a multifaceted motivation system that represents the push-pull of novelty and mastery: the system should be driven toward new situations because it seeks novelty, but then pull back to practice known skills because there is also a desire to master the environment.