The algorithms build on FAIR’s work in January of this year, when an agent was trained in Habitat to navigate unfamiliar environments without a map. Using just a depth-sensing camera, GPS, and compass data, it learned to enter a space much as a human would, and find the shortest possible path to its destination without wrong turns, backtracking, or exploration.
The first of these new algorithms builds a map of the space as the agent navigates, allowing it to remember the environment and move through it faster if it returns. The second improves the agent’s ability to map a space without needing to visit every part of it. Having been trained on enough virtual environments, it can anticipate certain features in a new one; it knows, for example, that there is likely to be empty floor space behind a kitchen island without navigating to the other side to look. Once again, this ultimately allows the agent to move through an environment faster.
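The idea behind that anticipation step can be illustrated with a toy occupancy grid. In this hypothetical sketch (the names and the naive fill-in rule are illustrative, not FAIR's actual model), the agent has observed the near side of a kitchen island but not the floor behind it; a trained model would predict the occupancy of those unseen cells from context, whereas here a simple "assume free floor" prior stands in for it:

```python
import numpy as np

# Toy occupancy grid: 0 = free floor, 1 = obstacle, -1 = unobserved.
# The agent has seen the near side of a "kitchen island" (a row of
# obstacle cells) but has never looked at the floor behind it.
grid = np.full((5, 5), -1)
grid[0:3, :] = 0      # observed free floor in front of the island
grid[2, 1:4] = 1      # observed island (obstacle segment)

def anticipate(grid):
    """Fill in unobserved cells with a naive prior: assume free floor.
    A learned model would instead predict these cells from the layout
    it has seen; this stand-in just shows where prediction happens."""
    predicted = grid.copy()
    predicted[predicted == -1] = 0
    return predicted

completed = anticipate(grid)
# The cells behind the island, never visited, are now marked free,
# so a planner can route through them without exploring first.
```

The payoff is exactly the one described above: the agent can plan paths through space it has never directly observed, instead of detouring to look.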
Finally, the lab also created SoundSpaces, a sound-rendering tool that allows researchers to add highly realistic acoustics to any given Habitat environment. It could render the sounds produced by hitting different pieces of furniture, or the sounds of heels versus sneakers on a floor. The addition gives Habitat the ability to train agents on tasks that require both visual and auditory sensing, like “Get my ringing phone” or “Open the door where the person is knocking.”
Of the three developments, the addition of sound training is the most exciting, says Ani Kembhavi, a robotics researcher at the Allen Institute for Artificial Intelligence, who was not involved in the work. Similar research in the past has focused more on giving agents the ability to see or to respond to text commands. “Adding audio is an essential and exciting next step,” he says. “I see many different tasks where audio inputs would be very useful.” The combination of vision and sound in particular is “an underexplored research area,” says Pieter Abbeel, the director of the Robot Learning Lab at the University of California, Berkeley.
Each of these developments, FAIR’s researchers say, brings the lab incrementally closer to achieving intelligent robotic assistants. The goal is for such companions to be able to move about nimbly and perform sophisticated tasks like cooking.
But it will be a long time before we can let robot assistants loose in the kitchen. One of the many hurdles FAIR will need to overcome: bringing all the virtual training to bear in the physical world, a process known as “sim2real” transfer. When the researchers initially tested their virtually trained algorithms in physical robots, the process didn’t go so well.
Moving forward, the FAIR researchers hope to start adding interaction capabilities into Habitat as well. “Let’s say I’m an agent,” says Kristen Grauman, a research scientist at FAIR and a computer science professor at the University of Texas, Austin, who led some of the work. “I walk in and I see these objects. What can I do with them? Where would I go if I’m supposed to make a soufflé? What tools would I pick up? These kinds of interactions and even manipulation-based changes to the environment would bring this kind of work to another level. That’s something we’re actively pursuing.”