The VR idea maze

Someone said recently, only slightly flippantly, that you can divide the world into people who think VR is part of the future and people who haven't had the demo yet. Unlike 3D, and unlike the primitive VR of the early 1990s, it's extremely hard to have a demo of Oculus (or indeed HTC Vive) without seeing it playing some major role in the future of tech. After that, though, the clarity ends - the state of VR now can be seen more than anything else as a list of questions - an 'idea maze' of possibilities. 

What hardware?

First and most obviously, what actual device at what price? Oculus is at least $1500, if you include a PC capable of driving it (there are only 10-15m of those today, out of 1.5bn PCs on earth), and yet even at $1500 Oculus today is really not what Oculus wants to deliver (as they've said): the graphics are good enough to impress, and to show the future, but they need to be better. And, obviously, it needs to be a lot cheaper. 

There are three roadmap issues here: we need better sensors, better screens and faster GPUs. That is, we need positional tracking (which phones can't do yet but might gain with multi-sensor cameras in the next year or so), we need screens with a high enough pixel density that you can't see the pixels (avoiding the so-called 'screen door effect'), and we need GPUs that can give us 4K (or more) at 60 or 90 frames per second, twice (once for each eye), without needing an expensive new PC (or indeed a PC at all).
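To get a rough sense of what "4K at 60 or 90 frames per second, twice" asks of a GPU, here is a back-of-the-envelope sketch. It assumes that "4K" means 3840 x 2160 pixels per eye, and it compares against a single 1080p display at 60 frames per second purely as an illustrative baseline; neither figure comes from a headset spec, and raw pixel count is only a crude proxy for rendering cost.

```python
# Rough pixel-throughput estimate for the GPU requirement described above.
# Assumption: "4K" means 3840 x 2160 pixels per eye (not an actual headset spec).
WIDTH, HEIGHT = 3840, 2160
EYES = 2

def pixels_per_second(fps, width=WIDTH, height=HEIGHT, eyes=EYES):
    """Pixels that must be rendered each second across all eyes."""
    return width * height * fps * eyes

# Illustrative baseline (an assumption): one 1080p display at 60 frames per second.
BASELINE = pixels_per_second(60, width=1920, height=1080, eyes=1)

for fps in (60, 90):
    total = pixels_per_second(fps)
    print(f"{fps} fps: {total / 1e9:.2f} billion pixels/s, "
          f"{total / BASELINE:.0f}x the 1080p60 baseline")
```

Under those assumptions the target works out to roughly 1 to 1.5 billion pixels per second, or 8 to 12 times the baseline, which gives a sense of why this is a roadmap issue rather than something today's phone chips can simply do.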

In the medium-term that almost certainly takes us from the PC ecosystem to the smartphone ecosystem: to smartphone chips powering something better than PC chips can do now (the screens and sensors are already from the smartphone ecosystem). Quite how long that takes and what it costs is a matter of opinion (and partly depends on how good you think it needs to get), but it means having strong expertise in semiconductors (Samsung and especially Apple) is a strategic advantage. The longer that takes, the more there's an opportunity for a product based on games consoles, which have the advantage of subsidised hardware that many people already own, and Sony is clearly aiming at this. 

Then, how do we wear those components? Saying that VR is probably about the smartphone ecosystem doesn't necessarily mean that the form factor you use is a smartphone per se.

Mechanically, there are three options: a cradle that holds your phone (the Gear VR model), a screen that's tethered to another device (a PC, a games console or a phone or tablet) and a stand-alone device.

  • The advantage of the cradle is that it's a cheap incremental accessory (even cheaper once the necessary sensors are integrated into phones anyway), but hanging the battery, processor, Gorilla Glass and case off the front of your head is uncomfortable, the display panel needed for good VR is higher-density than a phone needs of itself, and you might want a larger or wider panel than is ideal for a phone to get a better angle of viewing. And with so many different Android phones out there, which ones support the cradle, or have the right sensors, or fit into it properly?

  • A stand-alone device is perhaps the most elegant and might allow the most optimised tech (and you can redistribute the components to get better weight balance) but is also obviously the most expensive.

  • A tethered screen gives you the lightest headset and a price somewhere between the other two options. An umbilical to a PC or games console gives you the best performance but at the cost of being tethered to a box on the floor; a cable to a smartphone in your pocket might be much more manageable (once smartphone GPUs are good enough).

Much of this reminds me of discussions 15 years ago about whether the mobile computing of the future would be a single device or several different ones in different pockets, connected by Bluetooth - or indeed of the argument about phones, phablets and tablets (which screen size is right for you?) There probably isn't one answer, just a set of trade-offs that change over time as the technology improves. 

However, the outcome of this question is interrelated with another question: how many?

How many? 

If you'd seen an Xbox One or PlayStation 4 30 years ago it would have blown your mind and you'd have been sure that it was part of the future - but how big a part? Would you have known that it would mean an install base of 50-100m units, and not 500m or a billion? This matters as much as anything because it determines the economies of scale for the components and content for the whole platform.

A huge number of people today look at the graphics of a modern console game and say 'I've got to have that', but a far larger number say politely 'yes, that's very impressive', and then wander off. There is a range of reasons for this that one can argue about, but VR has similar optionality - it might be everyone, or it might not. Is this one per person, one per some-people, one per household or one per some-households? Some of that is the device, some is people's willingness to put something on their heads per se, and some is the content - just how interesting will the 'reality' be?

What content, made by what kind of people?

There's a pretty obvious set of low-hanging fruit for VR content: games (of a certain kind) and events - put the 360-degree VR camera on the stage at the concert or in the best seat at the sports event. Or hang it under a drone above a surfing championship, or fly through the Grand Canyon. 

After that, though, things get a lot less clear. A lot of the first filmed entertainment consisted of shooting a stage play or a vaudeville sketch, and, again, that was the easy low-hanging fruit. But then you get Chaplin and Eisenstein.

The famous "Odessa Steps" sequence from Sergei Eisenstein's Battleship Potemkin, through which he made use of his theory of intellectual montage.

People worked out you can move the camera, and cut, and pan. You can have the soldiers always coming from the left and the rioters always from the right. You can zoom. We created a whole grammar of storytelling and film-making, and most or all of it doesn't work in VR. How do you direct the audience's attention when they can look around and wander off? Where does the crew go when you shout 'action' in a 360-degree shot - does everyone hide under a tarp? How do you do a shot like this if the audience can look the other way?

The greatest and longest Vertigo-shot in cinema, I think. From Goodfellas (1990). Made by Michael Ballhaus.

That in turn leads to another potential avenue in the maze - is the final product ‘video’ at all, or is it an actual 3D world you can move around in? If you can move around in the scene, or move from the side of the court to the centre, then you need to move from a video with depth to a rendered fully-3D environment. That's how many games work and it's also how many special effects shots in new movies are created - they just render out from one viewpoint. But it’s certainly possible to create a real-time photorealistic 3D version of a sports match and then move around it in VR as the game is played. You could experience a match not just from the side of the pitch but through the eyes of any player, live, if you give Moore’s Law enough time. What does THAT do to storytelling?

20 years ago, before Netscape, the leap from 1.44 meg floppies to 650 meg CD ROMs led to a brief period where people thought interactivity on computers was going to be about video on plastic disks. This was the 'interactive movie' bubble: you would shoot a lot of live-action footage, stitch it together with decision trees and rendered graphics, and create a new form of entertainment that was better than games or movies. But this almost never produced anything good: the real answer was Grand Theft Auto (and then YouTube). The way to do interactivity was a game, not a movie, and the people to do it were games people, not movie people. So how does VR content work? Is it a new form for both games and scripted content? Will there be new forms of storytelling? Are those the right terms?  

AR and VR

That leads to another possible branch in the maze - augmented reality, or AR. Where VR is a closed world - you have a display panel in front of your eyes blocking out the world around you - AR is (ultimately) a pair of glasses that place things in the real world. Google Glass was not AR - it was just a HUD. Magic Leap (an a16z portfolio company) can place things in front of you such that they appear really to be there. It’s a pretty good demo. And yes, it is like this.  

Shot directly through Magic Leap technology on 10/14/15, without the use of special effects or compositing.

This is clearly a little further away from commercial shipping than Oculus. It also has some even harder challenges. VR lets you create a world, so what world? But AR can put things in the actual real world right in front of you - so what’s there right now, and what should you do with it, and what should you add, or remove? These are as much questions of AI as storytelling - you don’t have a stage, or rather the stage is the whole world. 

If one can answer those questions, then AR has the potential to be a new computing platform in a way that VR cannot - AR can be with you everywhere whereas VR needs a room, and so AR could be the next universal computing platform after mobile. 

But it’s also possible that AR and VR merge - that you have glasses that can do both. This might be no further away now than the app store was when 3G was launched.

AR & VR
Benedict Evans