Infants are experts at playing, with an amazing ability to generate novel structured behaviors in unstructured environments that lack clear extrinsic reward signals. We seek to mathematically formalize these abilities using a neural network that implements curiosity-driven intrinsic motivation. Using a simple but ecologically naturalistic simulated environment in which an agent can move and interact with objects it sees, we propose a “world-model” network that learns to predict the dynamic consequences of the agent’s actions. Simultaneously, we train a separate explicit “self-model” that allows the agent to track the error map of its world-model. It then uses the self-model to adversarially challenge the developing world-model. We demonstrate that this policy causes the agent to explore novel and informative interactions with its environment, leading to the generation of a spectrum of complex behaviors, including ego-motion prediction, object attention, and object gathering. Moreover, the world-model that the agent learns supports improved performance on object dynamics prediction, detection, localization and recognition tasks. Taken together, our results are initial steps toward creating flexible autonomous agents that self-supervise in realistic physical environments.
While training deep convolutional neural networks on large-scale image tasks has yielded impressive performance, these networks need millions of labeled examples to learn. This is definitely not how humans learn. We do get some external semantic labels, but in very different ways, and we learn a great deal without them. Further, we, as Scientists in the Crib, interact with the world, exploring, testing, investigating. We play. And play. And play…

To bring the power of this learning into the realm of artificial intelligence, we endeavor to make an AI Baby. In this proof-of-concept work, we have engineered a 3D virtual environment and a “telekinetic magician” baby agent. The agent can swivel its head, move around the room, and apply forces to objects that are close enough; given an action, it receives back images of what happened.
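To make this interface concrete, here is a minimal sketch of what such an agent-environment loop might look like. The environment class, action fields, and image shape are illustrative placeholders, not the actual implementation in our code release.

```python
# Hypothetical sketch of the agent-environment interface (names and shapes
# are illustrative, not the actual implementation).
import numpy as np

class PlayroomEnv:
    """A stand-in for the 3D room: returns an image after each action."""
    def __init__(self, image_shape=(128, 170, 3)):
        self.image_shape = image_shape

    def step(self, action):
        # action: ego-motion (move / turn head) plus a force and torque applied
        # to a nearby object, which only take effect if an object is in reach.
        assert set(action) == {"move", "turn_head", "force", "torque"}
        image = np.random.randint(0, 256, self.image_shape, dtype=np.uint8)
        return image  # what the agent sees after acting

env = PlayroomEnv()
action = {"move": [0.5, 0.0], "turn_head": 5.0,
          "force": [0.0, 0.0, 0.0], "torque": [0.0, 0.0, 0.0]}
observation = env.step(action)
```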

Under a random policy, interesting interactions with the objects in this environment are very rare, happening only about 1% of the time. The other 99% of the time, the agent experiences ego motion but nothing else, experience that is in a sense boring. We would like to make an agent that plays with the objects just as an infant would. Thus, successful and interesting behavior is measured by the time spent playing with objects, as well as by the distances between the objects. Further, we would like to understand in what ways these play behaviors lead to better representations of the world, both visually and in physical dynamics prediction. A key feature is that our models use deep convolutional structures trained from scratch on the experience of the agent as it goes.
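As an illustration, here is one hypothetical way such behavioral statistics could be computed from logged episodes. The field names and the "reach" threshold below are assumptions for the sketch, not the exact metrics reported in the paper.

```python
# Hypothetical behavioral metrics over a logged episode; the array layouts
# and the 2.0-unit "reach" threshold are illustrative assumptions.
import numpy as np

def play_fraction(agent_positions, object_positions, reach=2.0):
    """Fraction of timesteps with at least one object within reach of the agent."""
    dists = np.linalg.norm(object_positions - agent_positions[:, None, :], axis=-1)
    return float((dists.min(axis=1) < reach).mean())

def mean_object_separation(object_positions):
    """Average distance between the two objects (for the two-object setup)."""
    return float(np.linalg.norm(object_positions[:, 0] - object_positions[:, 1], axis=-1).mean())

T = 100                                    # timesteps in the episode
agent = np.random.rand(T, 3) * 10          # agent position per timestep
objects = np.random.rand(T, 2, 3) * 10     # positions of two objects
print(play_fraction(agent, objects), mean_object_separation(objects))
```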
Now, what system might execute this task? Following a long line of work on curiosity, we use a world-model component that learns to predict the dynamic consequences of the agent’s actions, along with a self-model component that learns to predict the world-model’s errors and uses those predictions to expose the world-model to interesting stimuli. The action choice is adversarial: the agent chooses actions it expects will drive up its world-model loss.
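A minimal sketch of this loop is below, assuming simple MLP stand-ins for the world-model and self-model and a toy vector environment. All names, dimensions, and the candidate-action sampling scheme are illustrative assumptions, not our actual architecture or training procedure.

```python
# Sketch of the curiosity loop: a world model predicts the next observation,
# a self model predicts the world model's loss for candidate actions, and the
# policy picks the action whose predicted loss is highest (adversarial choice).
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, N_CANDIDATES = 64, 8, 16

world_model = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 128), nn.ReLU(),
                            nn.Linear(128, OBS_DIM))
self_model = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 128), nn.ReLU(),
                           nn.Linear(128, 1))
wm_opt = torch.optim.Adam(world_model.parameters(), lr=1e-3)
sm_opt = torch.optim.Adam(self_model.parameters(), lr=1e-3)

def env_step(obs, action):
    # Placeholder dynamics; the real agent receives rendered images instead.
    return obs + 0.1 * torch.randn_like(obs)

obs = torch.randn(OBS_DIM)
for step in range(1000):
    # 1. Sample candidate actions and let the self model score them.
    candidates = torch.randn(N_CANDIDATES, ACT_DIM)
    obs_batch = obs.expand(N_CANDIDATES, -1)
    with torch.no_grad():
        predicted_loss = self_model(torch.cat([obs_batch, candidates], dim=1))
    action = candidates[predicted_loss.argmax()]       # adversarial action choice

    # 2. Act, observe, and compute the world model's actual prediction error.
    next_obs = env_step(obs, action)
    pred = world_model(torch.cat([obs, action]))
    wm_loss = ((pred - next_obs) ** 2).mean()

    # 3. Train the world model on the new transition.
    wm_opt.zero_grad(); wm_loss.backward(); wm_opt.step()

    # 4. Train the self model to predict the world model's loss.
    sm_pred = self_model(torch.cat([obs, action]))
    sm_loss = ((sm_pred - wm_loss.detach()) ** 2).mean()
    sm_opt.zero_grad(); sm_loss.backward(); sm_opt.step()

    obs = next_obs
```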

Let’s consider our simplest setup, a single object in a room, and compare the succession of behaviors of one of these curious agents with a random baseline. At first, both agents explore the room, performing ego motions and learning from them. Then, the training loss of the curious agent goes up as it seeks out harder-to-predict experience, corresponding to a steady increase in time spent playing with the object. It learns from this play and gets much better on held-out object interaction data than the random baseline.

In a two-object setup, we see a further behavior emerge: the curious agent learns to gather the two objects together and interact with them simultaneously.

The curious agent also exhibits emergent navigation and planning abilities. It learns to search for objects and keep them in view once it finds them, moving towards them and applying forces to gather them in front of it.

Interestingly, when we evaluate the visual encoding on an object classification transfer task, we see that this more sophisticated behavior leads to better representations. The self-aware agent achieves 39.7% accuracy on a 16-way object recognition task, much higher than the 12.3% achieved under the random policy.
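One hypothetical way to run such a transfer evaluation is to freeze the world-model’s visual encoder and train only a classifier readout on top of its features. The encoder stand-in and placeholder dataset tensors below are assumptions for the sketch, not our actual evaluation pipeline.

```python
# Hypothetical linear-readout transfer evaluation: freeze the world model's
# visual encoder and train only a 16-way classifier on top of its features.
import torch
import torch.nn as nn

FEAT_DIM, N_CLASSES = 256, 16

# Stand-in for the learned convolutional encoder of the world model.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, FEAT_DIM))
for p in encoder.parameters():
    p.requires_grad = False          # frozen: only the readout is trained

readout = nn.Linear(FEAT_DIM, N_CLASSES)
opt = torch.optim.Adam(readout.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder labeled data: images of single objects and their class labels.
images = torch.randn(512, 3, 32, 32)
labels = torch.randint(0, N_CLASSES, (512,))

for epoch in range(10):
    logits = readout(encoder(images))
    loss = loss_fn(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()

accuracy = (readout(encoder(images)).argmax(dim=1) == labels).float().mean()
print(f"16-way readout accuracy: {accuracy:.3f}")
```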
In future work, we hope not only to use human-inspired learning to make more robust AI, but also to use these agents as cognitive models to better understand early learning and developmental differences. In particular, our next AI goals include exploring these sorts of behaviors in the context of realistic agent embodiment, as well as curiosity in the presence of other agents.

Thanks so much for reading. Please check out our paper and code repository for more and do not hesitate to contact us if you have any questions!
To cite our paper, please use:
@inproceedings{haber2018learning,
  title={Learning to Play with Intrinsically-Motivated Self-Aware Agents},
  author={Haber, Nick and Mrowca, Damian and Fei-Fei, Li and Yamins, Daniel LK},
  booktitle={Advances in Neural Information Processing Systems},
  year={2018}
}