I recently taught a two-week introduction to HRI for our new crop of Robotics PhD students. It was an interesting exercise to boil down some of the main topics and issues of HRI into six lectures/meetings. My goal was to communicate how some traditional robotics problems become conceptually different when you add a human in the loop, and that there are some fundamental human abilities that make up social intelligence.
One amazing human capability is our propensity to recognize goal-directed motion. Whereas a computer vision algorithm struggles to parse a stream of video and determine whether there are people in it and what they might be doing, humans do this naturally from a very young age.
I think one of the most interesting findings in this realm for a roboticist is the work with infants that reveals some of the principles of recognizing goal-directed action. Csibra performed a series of experiments with infants and children looking at how they interpret simplified intentional action represented by geometric shapes, something like the famous Heider-Simmel video.
People watch this video and see a complex series of social dynamics and goal-directed actions. Most state-of-the-art computer vision systems on robots, by contrast, would see little more than a bunch of pixels moving around. What this says to me is that perhaps we need a completely new approach to activity recognition. In the standard approach, you first find a human, then track the human and identify their moving body parts, and finally compare the way those parts move to pre-existing models of various human activities. But if I can so easily attribute intentional action to squares and triangles, there must be a much simpler feature space in which robots could reason about intentional action.
I’m excited about the work of Dare Baldwin at the University of Oregon, who is working to uncover which low-level spatio-temporal features infants might be attending to when they correctly interpret intentional action. At the very least, this can provide inspiration, if not a detailed roadmap, for building intentional-action recognition into our social robots.
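To make the idea of a simpler feature space concrete, here is a toy sketch of my own (a hypothetical illustration, not Baldwin's or Csibra's actual feature set): score a 2D trajectory, say of a Heider-Simmel-style triangle, for goal-directedness simply by how persistently it closes the distance to a candidate target.

```python
import math

def goal_directedness(trajectory, target):
    """Fraction of time steps in which the moving point gets closer to `target`.

    trajectory: list of (x, y) positions over time
    target: (x, y) position of a candidate goal
    Returns a score in [0, 1]; values near 1.0 suggest goal-directed motion.
    """
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    if len(trajectory) < 2:
        return 0.0
    approaching = sum(
        1 for a, b in zip(trajectory, trajectory[1:])
        if dist(b, target) < dist(a, target)
    )
    return approaching / (len(trajectory) - 1)

# A "triangle" marching straight toward a goal vs. one wandering around:
goal = (10.0, 10.0)
direct = [(float(i), float(i)) for i in range(10)]
wander = [(0, 0), (1, 0), (0, 1), (1, 1), (0, 0)]
print(goal_directedness(direct, goal))   # 1.0
print(goal_directedness(wander, goal))   # 0.5
```

No person detection, no body-part tracking, no activity models: just a scalar over raw motion. Real goal-directed perception is surely richer, but the point is how cheap the features can be.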
So where’s my robot?
Let’s focus on the “my” part of this question:
To me, “my” robot means a robot that just gets me. One that understands me on an intuitive, physical, gut-feeling level. A robot that moves in sync with my rhythm, that anticipates my every move and wish. One that’s exactly where I want it, and precisely when I want it to be there.
So when I envision the future of personal robotics, I imagine machines that move around the home or workplace in perfect synchrony with us; dance a subtle dance with the humans in their surroundings; robots that seem to appear out of thin air just where you want them, and a second before you really devised the request. And not in a creepy sneaking-up-on-you kind of way, but rather in a falling-in-love I-can’t-believe-we-just-said-that-at-the-same-time kind of way.
Whether we’re talking about robotic assistants to surgeons, or a machine that helps you unpack after a move – personal robots will only truly deserve their adjective when they adapt their physical movements to ours. It’s us humans’ fault: we’re just suckers for good timing.
While well-trained human teams can easily display such a beehive-like flutter of coordinated activities (don’t you love watching pit stop teams?), most robots interacting with humans today drag us into what pretty much amounts to a waiting game. It’s slow, choppy, unintuitive, and usually structured in a somewhat tedious turn-taking fashion.
Why is that? What stands between us and my harmonic utopia of synchronized personal robotics? My hunch says it’s the brain. Ours and theirs. Could it be that most people who have been devising robots in academia and industry have been, well, too cerebral?
The pun is well intended. Considering that most robots are made by us mega-geeks, it’s no accident that robots are designed as brains with bodies, as computers with motors and sensors, and not the other way around. We CS types like to think in abstractions, in black boxes, in data and control flow, in learning as rules and as structured information. And most of us don’t like to exercise.
But when we think about humanity’s most important and impressive behaviors, from walking and grasping objects to collaboration, communication, and artistic and athletic performance, we don’t learn them by making rules and categorizing decisions. Or at least we don’t get very good at them that way.
Instead we get good by practicing, by putting our bodies in there, and “there” is usually something repetitive and difficult. One cannot learn to ski in the classroom, and dance ensembles cannot perfect their fluency through written correspondence. You can read all you want about how to play the piano, but to play the piano well you need to actually go over the same passages again and again. And just like the visit to the dentist in Jarmusch’s “Coffee and Cigarettes”, you have to do it yourself, with your own body. You can’t have some intern send you the CliffsNotes for Chopin’s Waltz Op. 34 No. 2.
This is especially true in collaborative activities. You and your partner can practice the tango all week long in your own separate studios; if you don’t rehearse together, using your own co-located bodies, the chance that you’ll be ready for showtime is close to nil.
One of the most fascinating questions to me these days is what this thing called practice really is, and how it is distinct from traditional notions of learning. Why is it embodied, and what does this endless repetition bring?
I believe we are bodies with brains. And so should robots be. When modeling practice, we should try to steer away from information-driven models, and instead make the robots use their bodies, for example by modeling activity in perception and action networks, and by exploring physical repetition. In my own work, I’ve seen that people seem to connect on a deep emotional level to robots that improve through practice in subtle physical ways, anticipating the human’s motions, and moving in sync with them.
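As a minimal sketch of what timing-focused practice might look like computationally (my own toy model, not a published system), consider a robot that rehearses a joint action with a person: after every trial it nudges its predicted onset time toward the human's observed timing, so that over repetitions it shifts from lagging behind the human to moving in sync with them.

```python
class AnticipationTimer:
    """Toy model of 'practice': after each rehearsal, pull the robot's
    predicted action-onset time toward the human's observed onset time.
    With repetition the robot stops reacting late and starts anticipating.
    """

    def __init__(self, initial_guess=2.0, learning_rate=0.3):
        self.predicted_onset = initial_guess  # seconds into the trial
        self.learning_rate = learning_rate

    def observe(self, human_onset):
        # Exponential moving average: each repetition shrinks the
        # remaining timing error by a constant factor.
        self.predicted_onset += self.learning_rate * (
            human_onset - self.predicted_onset
        )

    def error(self, human_onset):
        return abs(self.predicted_onset - human_onset)

# Suppose the human reaches for the shared object ~0.8 s into each trial.
timer = AnticipationTimer()
for _ in range(20):
    timer.observe(0.8)
print(round(timer.predicted_onset, 2))  # 0.8
```

The model is deliberately simplistic (a single scalar, a fixed rhythm), but it captures the shape of the argument: the improvement lives in repeated embodied trials, not in a rule the robot was told.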
Who knows where those future robots will practice. Maybe there will be a robot playground where they have to spend some time between factory and consumer. And maybe some of the practice will happen in customers’ homes. Will consumers have the patience for their new robot to get better at its tasks after they have already bought it?
My non-cerebral embodied gut-feeling says they will.
Georgia Institute of Technology Center for Music Technology