Microsoft Research shows off advanced mixed reality app SemanticPaint

by Will Mason • July 1st, 2015

Painting in virtual reality can be an awe-inspiring experience. There is something special about turning an infinite blank canvas that engulfs your surroundings into a work of art you can explore. But what about transforming a canvas that already exists, like your living room?

SemanticPaint, a new mixed reality application coming out of the labs at Microsoft Research, allows you to do just that – and that is only scratching the surface.

In the demonstration, our narrator brings a “consumer depth camera” into the space, which allows him to scan in a dense 3D model of the environment. Then one simply has to brush along the surface of any object to paint it. But that is only the beginning of the magic here.

From there, voice commands can be used to tell the system to paint and label the objects in the space. For example, you can reach out, touch, and paint part of the surface of the chair in front of you, then command the system to “label chair,” and it will interactively fill in and learn that object.
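To give a rough sense of that interaction, here is a toy sketch (my own, hypothetical; not Microsoft's actual code) of how a brush stroke plus a voice label might propagate across the connected surface the user touched:

```python
# Toy paint-then-label sketch: brushed cells seed a flood fill, and the
# spoken label spreads across every connected cell of the same object.
# A 2D grid stands in for the real dense 3D surface reconstruction.
from collections import deque

def label_region(surface, seeds, label):
    """Flood-fill `label` from the brushed `seeds` across all
    4-connected cells that share the touched object's id."""
    if not seeds:
        return {}
    object_id = surface[seeds[0]]          # what the user touched
    labels, queue = {}, deque(seeds)
    while queue:
        cell = queue.popleft()
        if cell in labels or surface.get(cell) != object_id:
            continue
        labels[cell] = label
        x, y = cell
        queue.extend([(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)])
    return labels

# Two objects in the scanned "room": ids 1 (a chair) and 2 (something else).
surface = {(0, 0): 1, (0, 1): 1, (1, 1): 1, (2, 2): 2}
painted = label_region(surface, seeds=[(0, 0)], label="chair")
print(sorted(painted))   # → [(0, 0), (0, 1), (1, 1)]
```

The real system does this over a live volumetric reconstruction rather than a grid, but the user-facing idea is the same: touch a little, speak a label, and the fill does the rest.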

Yes, you heard that last part correctly: the program learns the environment around you. This is what I meant when I said painting only scratches the surface.

By allowing you to label 3D objects on the fly, the system is able to quickly learn about the environment around you. In the video demonstration, we see the camera pan over to the other side of the room, which had yet to be scanned in. The system is automatically able to detect similar objects in the room and label them accordingly. As the video demonstrates, the system is still improving (and continued machine learning will only drive it further), and certain objects – like the same chair in a different color – can somewhat confuse it. Even in cases like this, however, the user is able to dynamically relabel the objects in the room.

By drawing on objects and using a voice command SemanticPaint can classify an object in the virtual space.

This dynamic, real-time labeling is something wholly new to the computer vision scene. According to the research published by Microsoft, “unlike offline systems, where capture, labeling and batch learning often takes hours or even days to perform, our approach is fully online… provides users with continuous live feedback of the recognition during capture, allowing them to immediately correct errors in the segmentation and/or learning.” This means the program can rapidly scan a dynamic environment and let you interactively segment it almost instantly, rather than waiting hours or days.
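The online-versus-batch distinction is easy to illustrate. Below is a hedged, simplified sketch (a nearest-centroid classifier of my own invention, standing in for the paper's actual machinery) of a model that folds in each user correction immediately and classifies the very next surface patch with the updated knowledge:

```python
# Online-learning sketch: every user-labeled sample updates the model
# right away, so recognition improves frame by frame instead of after
# an hours-long batch job. Features here are toy 2D descriptors.
import math

class OnlineLabeler:
    def __init__(self):
        self.centroids = {}   # label -> (running sum vector, count)

    def observe(self, feature, label):
        """Fold one user-labeled sample into the model immediately."""
        total, count = self.centroids.get(label, ([0.0] * len(feature), 0))
        total = [t + f for t, f in zip(total, feature)]
        self.centroids[label] = (total, count + 1)

    def predict(self, feature):
        """Label an unseen surface patch by its nearest class centroid."""
        best, best_dist = None, math.inf
        for label, (total, count) in self.centroids.items():
            mean = [t / count for t in total]
            dist = sum((f - m) ** 2 for f, m in zip(feature, mean))
            if dist < best_dist:
                best, best_dist = label, dist
        return best

model = OnlineLabeler()
model.observe([0.9, 0.1], "chair")    # user paints and labels a chair
model.observe([0.1, 0.9], "floor")    # user paints and labels the floor
print(model.predict([0.8, 0.2]))      # → chair
```

A mislabeled prediction can be corrected simply by calling `observe` again with the right label – the same "immediately correct errors" loop the quote describes – whereas a batch pipeline would make you wait for the next full retraining pass.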

The research has some fairly broad-reaching applications, “from robot guidance, to aiding partially sighted people, to helping us find objects and navigate our worlds, or experience new types of augmented realities.” The system, with its capacity for learning, could eventually log thousands to millions of unique objects – allowing for instant recognition and segmentation of the objects in, for example, your living room.

You may already see where I am going with this – dynamic room-scale virtual reality.

One of the problems with room-scale VR right now is that you need a decent amount of free space; otherwise you might trip over the couch, coffee table, cat, what have you. Computer vision projects like this will be key in enabling those types of experiences, because they will eventually allow full real-time, reliable detection of the environment – allowing for object overlay within the virtual world, and even dynamic object interaction in the virtual world.

Imagine a game that transformed your living room setup into the bridge of the Starship Enterprise, with the actual chair you are sitting in mapped to the captain’s chair. Or an augmented reality experience that placed Lord of the Rings characters in the seats around you. These are the kinds of things the tech in this research could enable.

At E3, Palmer Luckey confirmed the obvious: that Oculus is going with a camera-based computer vision approach for VR. Oculus also announced a partnership with Microsoft, but has yet to comment on how much further that rabbit hole goes beyond the Xbox controller bundling and Xbox One game streaming. It will be interesting to see how much the computer vision teams at the respective companies share notes.

Either way, the progress that the team at Microsoft is making with computer vision is rather astounding. We are headed towards a whole new mixed reality future, and that future is looking brighter every day.