Infants are experts at playing, with an amazing ability to generate novel structured behaviors in unstructured environments that lack clear extrinsic reward signals. We seek to mathematically formalize these abilities using a neural network that implements curiosity-driven intrinsic motivation. Using a simple but ecologically naturalistic simulated environment in which an agent can move and interact with objects it sees, we propose a “world-model” network that learns to predict the dynamic consequences of the agent’s actions. Simultaneously, we train a separate explicit “self-model” that allows the agent to track the error map of its world-model. It then uses the self-model to adversarially challenge the developing world-model. We demonstrate that this policy causes the agent to explore novel and informative interactions with its environment, leading to the generation of a spectrum of complex behaviors, including ego-motion prediction, object attention, and object gathering. Moreover, the world-model that the agent learns supports improved performance on object dynamics prediction, detection, localization and recognition tasks. Taken together, our results are initial steps toward creating flexible autonomous agents that self-supervise in realistic physical environments.

While training deep convolutional neural networks on large-scale image tasks has yielded impressive performance, these networks need millions of labeled examples to train. That is not how humans learn. We do receive some external semantic labels, but in very different ways, and we learn a great deal without them. Further, we, as Scientists in the Crib, interact with the world, exploring, testing, investigating. We play. And play. And play…

Playing child
A curious child playing with toys.

To bring the power of this kind of learning into the realm of artificial intelligence, we endeavor to make an AI Baby. In this proof-of-concept work, we have engineered a 3D virtual environment and a “telekinetic magician” baby agent. The agent can swivel its head, move around the room, and apply forces to objects it is close enough to; after each action, it receives back an image of what happened.

Environment Setup
Our telekinetic magician baby in its 3D virtual environment.
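To make the setup concrete, here is a minimal sketch of the interaction loop, assuming a hypothetical environment object with a gym-style `reset`/`step` interface; the action layout and field names are illustrative, not the actual API of our environment code.

```python
import numpy as np

# Hypothetical action vector: [head_pan, head_tilt, move_forward, move_right,
#                              force_x, force_y, force_z] -- illustrative only.
ACTION_DIM = 7

def random_policy(observation):
    """Baseline: sample a random continuous action, each component in [-1, 1]."""
    return np.random.uniform(-1.0, 1.0, size=ACTION_DIM)

def run_episode(env, policy, num_steps=1000):
    """Roll out one episode: the agent acts, the environment returns the next image."""
    observation = env.reset()  # first camera image from the agent's viewpoint
    trajectory = []
    for _ in range(num_steps):
        action = policy(observation)
        # Applied forces only take effect when the agent is close enough to an object.
        next_observation, step_info = env.step(action)
        trajectory.append((observation, action, next_observation, step_info))
        observation = next_observation
    return trajectory
```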

Under a random policy, interesting interactions with the objects in this environment are very rare, occurring only about 1% of the time. The other 99% of the time, the agent experiences ego motion and nothing else - experience that is, in a sense, boring. We would like to make an agent that plays with the objects just as an infant would. Successful, interesting behavior is therefore measured by the time the agent spends playing with objects and by the distances between the objects. Further, we would like to understand in what ways these play behaviors lead to better representations of the world, both for vision and for physical dynamics prediction. A key feature is that our models use deep convolutional networks trained from scratch on the agent's own experience as it goes.
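As an illustration, both behavioral measures can be computed from logged episode metadata. The field names below (`agent_acted_on_object`, `object_positions`) are assumptions about what the environment reports per step, not our exact logging format.

```python
import numpy as np

def play_fraction(trajectory):
    """Fraction of timesteps on which the agent actually acted on (played with) an object."""
    played = [step_info["agent_acted_on_object"] for (_, _, _, step_info) in trajectory]
    return float(np.mean(played))

def mean_pairwise_object_distance(trajectory):
    """Average distance between objects over an episode (small values = objects gathered together)."""
    distances = []
    for (_, _, _, step_info) in trajectory:
        positions = np.asarray(step_info["object_positions"])  # shape (num_objects, 3)
        if len(positions) < 2:
            continue
        # Pairwise distances over all distinct object pairs.
        diffs = positions[:, None, :] - positions[None, :, :]
        pair_dists = np.linalg.norm(diffs, axis=-1)
        upper = np.triu_indices(len(positions), k=1)
        distances.append(pair_dists[upper].mean())
    return float(np.mean(distances)) if distances else float("nan")
```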

Now, what kind of system might accomplish this? Following a long line of work on curiosity, we use a world-model component that learns to understand the agent's experience in the world, along with a self-model component that learns to interact with the world in a way that exposes the world-model to interesting stimuli. The action choice is adversarial: the agent chooses actions that drive up its world-model loss.

Curious Model Setup
Self-aware model. The world-model learns to understand the world's dynamics. The self-model learns to interact with the world in an interesting way by choosing actions that maximize the world-model's loss.
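In code, the policy can be summarized as follows. This is a minimal sketch: it assumes a `world_model` that predicts dynamics and reports a prediction loss, and a `self_model` trained to regress that loss from an observation and a candidate action. The names, interfaces, and candidate-sampling scheme are ours for illustration, not the exact architecture from the paper.

```python
import numpy as np

ACTION_DIM = 7  # same illustrative action dimensionality as in the earlier sketch

def choose_action(observation, self_model, num_candidates=128):
    """Adversarial action choice: pick the candidate action whose predicted
    world-model loss is highest, i.e. the action expected to be most surprising."""
    candidates = [np.random.uniform(-1.0, 1.0, size=ACTION_DIM)
                  for _ in range(num_candidates)]
    predicted_losses = [self_model.predict_loss(observation, a) for a in candidates]
    return candidates[int(np.argmax(predicted_losses))]

def training_step(observation, action, next_observation, world_model, self_model):
    """Both models learn online from the same piece of experience."""
    # The world-model predicts the dynamic consequences of the action...
    predicted_next = world_model.predict(observation, action)
    world_loss = world_model.update(predicted_next, next_observation)
    # ...and the self-model learns to predict how wrong the world-model will be.
    self_model.update(observation, action, target_loss=world_loss)
```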

First, let’s consider our simplest setup, with a single object in a room, and compare the succession of behaviors of one of these curious agents with a random baseline. At first, both agents explore the room, performing ego motions and learning from them. Then the training loss of the curious agent goes up, corresponding to a steady increase in time spent playing with the object. The curious agent learns from this interaction and gets much better on held-out object-interaction data than the random baseline.

One object experiment
One object experiment. The self-aware agent learns to predict its ego motion, focus on the object, and play with it. No such behaviors emerge under the random baseline.

In a two-object setup, a further behavior emerges: the curious agent learns to gather the two objects together and interact with them simultaneously.

Two object experiment
Two object experiment. In addition to predicting ego motion, detecting objects, and playing with them, the self-aware agent learns to gather objects together. The random baseline does not show this behavior.

The curious agent also exhibits emergent navigation and planning abilities. It learns to search for objects and keep them in view once it finds them, moving towards objects and applying forces to gather them in front of it.

Navigation behavior
Navigation and planning behavior. The self-aware agent learns to navigate towards objects and to plan actions that keep found objects in view.

Interestingly, when we evaluate the learned visual encoding on an object classification transfer task, we see that this more sophisticated behavior leads to better performance. The self-aware agent achieves 39.7% accuracy on a 16-way object recognition task, far higher than the 12.3% achieved under the random policy.
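One way to run such a transfer evaluation is to freeze the agent's visual encoder and fit a lightweight classifier on its features. The sketch below uses scikit-learn's logistic regression as the readout and assumes a hypothetical `encoder.features` function, so it illustrates the evaluation protocol rather than reproducing our exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def evaluate_transfer(encoder, train_images, train_labels, test_images, test_labels):
    """Fit a linear readout on frozen encoder features and report classification accuracy."""
    train_feats = np.stack([encoder.features(img) for img in train_images])
    test_feats = np.stack([encoder.features(img) for img in test_images])
    readout = LogisticRegression(max_iter=1000)
    readout.fit(train_feats, train_labels)
    return readout.score(test_feats, test_labels)  # fraction correct on the 16-way task
```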

In future work, we hope not only to use human-inspired learning to make more robust AI, but also to use these systems as cognitive models to better understand early learning and developmental differences. On the AI side, our next goals include exploring these sorts of behaviors with more realistic agent embodiment, as well as curiosity in the presence of other agents.

Human-centered AI Model Development
Human-centered AI model development. We use observations from human developmental experiments to guide the development of AI algorithms. Conversely, we use AI algorithms to explain observations in those fields.

Thanks so much for reading. Please check out our paper and code repository for more and do not hesitate to contact us if you have any questions!

To cite our paper please use:

@inproceedings{haber2018learning,
  title={Learning to Play with Intrinsically-Motivated Self-Aware Agents},
  author={Haber, Nick and Mrowca, Damian and Fei-Fei, Li and Yamins, Daniel LK},
  booktitle={Advances in Neural Information Processing Systems},
  year={2018}
}