F8 2017: Facebook And OTOY’s Volumetric Camera System Will Deliver Six Degrees Of Freedom In 2017

by Ian Hamilton • April 19th, 2017

Facebook is working with partners like OTOY, Adobe, Framestore, Foundry and others to build out a new kind of camera system, along with the tools and workflows, capable of volumetric capture. The system should produce a more realistic representation of reality in captured footage. In the end, it will let viewers in VR move their heads around with greater freedom and see the action in a video from different angles. The plan is to release it later this year.

This new soccer ball-sized camera system will be an evolution of the earlier open “Surround 360” camera announced last year, which cost roughly $30,000 in parts. I asked Facebook for details on the pricing of the new six degrees of freedom system, but the company is planning for manufacturing partners to license the new camera designs and turn them into products, so pricing is up to them.

There are two versions: one slightly smaller than a soccer ball and one slightly bigger. The larger one includes 24 cameras while the smaller one has six (they are called x24 and x6, respectively). Both use a variety of techniques to extract depth data from the images they capture and produce a small bubble of space in which a visitor to VR can move their head around and see the scene from different angles. This is potentially an enormous advancement for reality capture that could have broad implications for everything from Hollywood special effects workflows to recording memories with family.
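To make the idea of a "bubble" concrete: a 6DOF video player can only honor head movement inside the volume that was actually captured, so one simple policy is to clamp the tracked head offset. The sketch below is a minimal illustration of that policy; the radius value and function names are assumptions, not details of Facebook's player.

```python
import numpy as np

# Minimal sketch (not Facebook's player): keep the rendered viewpoint
# inside the captured "bubble" by clamping the tracked head offset.
BUBBLE_RADIUS_M = 0.5  # assumed radius of valid head movement, in meters

def clamp_to_bubble(head_offset_m, radius_m=BUBBLE_RADIUS_M):
    """Clamp the viewer's head offset (relative to the camera origin) so
    the requested viewpoint stays inside the captured volume."""
    offset = np.asarray(head_offset_m, dtype=float)
    dist = float(np.linalg.norm(offset))
    if dist <= radius_m:
        return offset
    return offset * (radius_m / dist)  # pull back onto the bubble surface

print(clamp_to_bubble([0.2, 0.1, 0.0]))  # inside: returned unchanged
print(clamp_to_bubble([1.0, 0.0, 0.0]))  # outside: clamped to 0.5 m
```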

“It’s legitimately a light field camera,” said Jules Urbach, CEO of OTOY. “This Facebook camera is basically the sweet spot. It’s the workflow of the future.”

A Potentially Breakthrough Moment For Recording Reality

360-degree videos are currently a major category of content you can see in a VR headset, but many projects are substandard because the format is difficult to capture and deliver well. In addition, early-adopting enthusiasts often complain about 360-degree videos because of the immersion-breaking fact that you can only turn your head in a scene but cannot move around. That’s because most existing cameras just capture a traditional video essentially wrapped around you like an IMAX dome. An animation made by Vincent McCurley for the National Film Board of Canada explains the difference between “true” VR typically made in a game engine and traditional 360-degree videos captured by an array of cameras.
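The distinction is often described as three versus six degrees of freedom: a flat 360 video can react only to head rotation, while volumetric or game-engine content can also react to head position. A minimal sketch of that distinction, with hypothetical names chosen here for illustration:

```python
from dataclasses import dataclass

# Illustrative only: how much of the tracked head pose each format can honor.
@dataclass
class HeadPose:
    yaw: float      # radians, looking left/right
    pitch: float    # radians, looking up/down
    roll: float     # radians, tilting the head
    x: float = 0.0  # meters of head translation
    y: float = 0.0
    z: float = 0.0

def pose_honored_by_360_video(pose: HeadPose) -> HeadPose:
    """A flat 360 video reacts to rotation only, so translation is discarded."""
    return HeadPose(pose.yaw, pose.pitch, pose.roll)

def pose_honored_by_6dof_video(pose: HeadPose) -> HeadPose:
    """Volumetric capture can honor the full pose, translation included."""
    return pose

pose = HeadPose(yaw=0.3, pitch=0.0, roll=0.0, x=0.1, z=-0.2)
print(pose_honored_by_360_video(pose))   # x/y/z zeroed out
print(pose_honored_by_6dof_video(pose))  # full pose preserved
```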

What Facebook and OTOY discovered earlier this year is that by combining their various technologies, which treat captured images more like a stream of data points, they can extract enough depth information to create a small space centered around the camera in which a person can move their head around with complete freedom to see the scene accurately depicted from different angles.
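Neither company details its algorithms here, but the underlying principle of recovering depth from overlapping views is classic stereo triangulation. A minimal sketch of that textbook relationship, shown only to illustrate the idea (Facebook's actual multi-camera depth pipeline is not public):

```python
# Textbook stereo triangulation: depth = focal length * baseline / disparity.
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth (meters) of a point seen by two overlapping cameras.

    focal_px:     focal length in pixels
    baseline_m:   distance between the two camera centers, in meters
    disparity_px: horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("point must be seen with positive disparity")
    return focal_px * baseline_m / disparity_px

# Example: lenses 6 cm apart, 1000 px focal length, 30 px of disparity
print(depth_from_disparity(1000, 0.06, 30))  # -> 2.0 m
```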

“Both of us [Facebook and OTOY] were developing technologies independently of each other,” said Facebook Engineering Director Brian Cabral. “When they matured to a certain point it was clear that they were very complementary.”

Facebook is working with post-production and visual effects companies including Adobe, OTOY, Foundry, Mettle, DXO, Here Be Dragons, Framestore, Magnopus, and The Mill to develop tools and workflows for this new system. Facebook is planning to license the new designs to partners with the aim of releasing a camera product later this year.

Diving Deep Into 6 Degrees Of Freedom VR Video

[Image: Facebook’s x24 and x6 volumetric camera systems]

According to Facebook, the 24-camera system captures full RGB and depth at every pixel for all of the cameras. It “oversamples” 4x at every point and uses depth-estimation algorithms to capture both high-resolution views and depth data. The six-camera system oversamples 3x. OTOY’s Urbach offered us some technical details in an email about how the system works.
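One plausible reading of the oversampling claim: if several cameras see each scene point, the independent per-camera depth estimates can be fused and outliers rejected. The median rule below is an assumption for illustration, not Facebook's published algorithm:

```python
import statistics

# Hypothetical sketch of why oversampling helps: ~4 cameras see each point
# on the x24 and ~3 on the x6, so bad estimates can be voted out.
def fuse_depth_samples(samples_m):
    """Fuse independent per-camera depth estimates (meters) for one point."""
    if not samples_m:
        return None  # no camera saw this point: a hole to fill later
    return statistics.median(samples_m)

print(fuse_depth_samples([2.01, 1.98, 2.02, 9.50]))  # outlier rejected -> 2.015
```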

“The lens arrangement has precise overlaps (which is how depth is optically generated from raw color data), and this overlap of views (~8 on the x24 footage we’ve used) is enough to fill in many occluded holes and minimize depth shadows in the view bubble around a user’s head,” Urbach wrote. “For far field depth shadows, we use heuristics in our player to fill in any gaps not in the camera capture data. A simple workflow approach, which will be discussed [in an] F8 presentation, is to shoot a background plate of the environment before you shoot dynamic elements. Our software will allow you to easily layer these back in through Octane. To that end, OTOY is developing a template for moving the x24 in a path or circle for ~4-6 seconds that will give you a light field set capture.”
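A rough sketch of the plate-layering workflow Urbach describes: wherever the dynamic take has an occluded hole, fall back to the static background plate shot beforehand. The array layout and the NaN-depth convention for holes here are assumptions for illustration:

```python
import numpy as np

# Hypothetical sketch: fill occluded holes in a dynamic RGB-D take using a
# static background plate captured before the dynamic elements were shot.
def fill_holes(dynamic_rgb, dynamic_depth, plate_rgb, plate_depth):
    """Return RGB and depth with holes in the dynamic take filled from the plate."""
    holes = np.isnan(dynamic_depth)        # pixels no camera could see
    out_rgb, out_depth = dynamic_rgb.copy(), dynamic_depth.copy()
    out_rgb[holes] = plate_rgb[holes]      # borrow the plate's color
    out_depth[holes] = plate_depth[holes]  # and the plate's depth
    return out_rgb, out_depth

# Tiny 2x2 example with one hole (NaN) in the dynamic depth map
dyn_depth = np.array([[1.0, np.nan], [2.0, 3.0]])
dyn_rgb, plate_rgb = np.zeros((2, 2, 3)), np.ones((2, 2, 3))
plate_depth = np.full((2, 2), 5.0)
_, filled = fill_holes(dyn_rgb, dyn_depth, plate_rgb, plate_depth)
print(filled)  # the NaN hole is now 5.0, taken from the plate
```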

Diving deeper, we asked Urbach how close an object or subject can be to one of these cameras and he offered the following in-depth response:

For normal 6DOF video playback at 1:1 scale, it’s best to have the nearest part of the scene be >= 1 m from camera origin unless you have multiple overlapping view volumes or plates layered over each other to provide extra near field coverage. Interestingly, the 1 m delimiter is the same distance [John] Carmack recommended to artists for the synthetic Render The Metaverse contest scenes…

Having no bounds around the camera position is the default in ‘snow globe’ mode (where you can shrink an entire scene into the palm of your hand – and have it placed a cm from view origin like a real snow globe). It looks amazing, and is my favorite way to experience this content. It will be the default mode when we launch a 6DOF video bubble in AR/XR devices like Tango (or ODG R9).
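"Snow globe" mode, as described, is essentially a uniform scale-and-place transform on the captured scene. A minimal sketch with illustrative numbers; the function, globe size, and placement defaults are assumptions, not OTOY's values:

```python
# Hypothetical sketch of 'snow globe' mode: uniformly scale the captured
# scene so it fits in a palm-sized globe near the viewer.
def snow_globe_transform(scene_radius_m, globe_radius_m=0.05, distance_m=0.3):
    """Return (uniform scale, placement) that renders a scene_radius_m
    capture as a globe_radius_m globe, distance_m in front of the viewer."""
    scale = globe_radius_m / scene_radius_m
    position = (0.0, 0.0, distance_m)  # straight ahead of the view origin
    return scale, position

# A 50 m diameter (25 m radius) capture shrunk to a 10 cm diameter globe:
print(snow_globe_transform(25.0))  # -> (0.002, (0.0, 0.0, 0.3))
```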

  • This is definitely very good news!

  • Jeff Katz

    Looks exactly like a Panono camera rip-off.

    • VR Geek

      But Panono is for images, not video?

  • Herman Gene

    Why doesn’t anyone talk about the fact that none of the orb style cameras are actually shooting in true stereoscopic 3D? lol. It’s not even close. It’s just a 360° image. VR = 3D stereoscopic so stop calling this stuff VR.

    • Littlewave

      I’m not quite sure you understand how a light field volume works; these cameras are exactly what you’re asking for. Although limited to a relatively small volume (smaller for the smaller camera and larger for the large 24-camera ball), you can completely move around the scenes captured by these cameras. They are not simply a 360 video… you might not be able to walk very far around the volume but you can definitely “lean” around parts of the scene to peek behind objects closer to you.

      • Herman Gene

        No, I understand how light field works – particularly when applied to in-development headset technology in which viewers will be able to automatically change the IPD on the fly depending on perceived spatial distances from objects – which is exactly how the human eye functions. My point still stands: this is in no way a true stereoscopic presentation. While some depth volume will be achieved and perceived, it will be barely noticeable unless you truly have two lenses side by side and spaced at least 65 mm apart. The GoPro 360 3D rigs already achieve this perfectly and give an accurate stereo presentation.

        • Jules Urbach

          The effect is more than stereo at IPD. The video point cloud is a full 3D mesh scene in Unity so you have game engine VR navigation in the entire world. It’s a 50 m diameter scene capture; it could be more, but I clamped the horizon for the demo.

          • Herman Gene

            What do you mean by ‘more than stereo at IPD’?

          • If you capture a photo or video with stereoscopic cameras, you only get a 3D image when you keep your head steady in exactly the position the two cameras were in, but when you tilt your head only a little, you lose the stereoscopic effect and the two images overlap.
            It seems this doesn’t happen with these new cameras.

          • Herman Gene

            That’s because this new rig isn’t in any sense a true stereo experience… Look, I think this rig has amazing promise but VR without authentic 3D isn’t VR. It’s just a 360° tour. That’s not a bad thing per se, but it’s not VR.

          • Jules Urbach

            When I am looking at the 6DOF video scene in Unity it is a 3D object I can light, and run game levels/collisions/physics on top of. I think it will make great photorealistic game level design much simpler, based on my tests so far. I am sharing more on this next week at the Unity Vision conference – and will also bring the Unity demo shown in the above video.

          • Jules Urbach

            You can synthesize more than just left/right eye views (at 65 mm apart – typical IPD) with 6DOF video. You can generate a view from any x/y/z point in space with this data (as you would in a VR video game world), and you are not locked into a fixed mono/stereo viewpoint origin. The quality is scene dependent, but anything 80 cm and further looks good in motion when I move my head with the Rift while seated, and is a much better experience than stereo 360. That said, you can also get up and walk through the scene, and if the camera was blocked, then you may see depth shadows there. Our viewer fills in those holes in real time, and any background plates as well as previous temporal samples can be fed into the hole filling system.

            Light field rendering is a step up from this, and goes through Octane, but the data from the camera is good enough to be rendered into an interesting (small) glossy light field even without cleanup. That is shown with the dino scene (rendered as an LF plate) in the above video.

        • Surykaty

          It’s not a true stereoscopic capture, sure… but the end result is a stereoscopic simulation with a simulated IPD. If the simulation is good enough for human eyes, I say who cares what the acquisition tech was? 3D scanner + the right software > primitive stereoscopic rig with primitive playback software

  • fuyou2

    Just seeing the Facebook logo on the camera makes me cringe….

  • jimrp

    I just want to know $$$ ?