Recently, I had a chance to work on a unique GPS-driven, AR laser tag experience for iOS and Android. It was a technically challenging project for several reasons, one of which was the lack of fully developed tools and standards in AR/VR/MR audio. This is especially true if you are developing cross-platform and are using Wwise for implementation. In this article, I’ll be discussing the spatial audio solutions that are available for Wwise users and sharing some of the demos I made while evaluating them.
Current Spatial Audio Solutions
At the moment, there are at least ten different spatial audio solutions that I know of. Some sound better than others. Some are platform-specific while others offer cross-platform support. As of Q4 2017, we're finally starting to see Wwise support become more widespread, albeit with varying degrees of maturity.
As the market is evolving at a rapid pace, I’m not even going to try to keep this chart up to date, but here is where the industry currently stands:
*(Chart excerpt: Oculus (derived from RS3D) and Resonance Audio (Google VR) show support across all of the compared categories.)*
An Aural Comparison
If you’ve spent any time listening to spatial audio algorithms, it’s clear that some solutions sound better than others. There are numerous reasons for this (the quality of the HRTF dataset, vector-based vs. object-based panning, the shape of your own ears, etc.) which are beyond the scope of this article. To keep things simple, I’ll present some demos illustrating the qualitative differences between the solutions as I evaluated them in Wwise. As the team was focused on solutions offering iOS and Android support, this is what was compared:
- Auro3D
- Resonance Audio (GoogleVR)
- RealSpace3D
In each of these examples, tests were done with a 3rd order Ambisonics bus feeding into the plug-ins' default settings and with room spatialization enabled. In hindsight, trying to match the room spatialization would have allowed for a truer comparison, but such is life when you're trying to meet a tight deadline for a developer.
It should also be noted that while Auro3D and GoogleVR both support user-defined pans using Wwise’s panning tool, panning in RealSpace3D must be done within the plug-in itself. Again, not an apples-to-apples comparison, unfortunately, but it’s hopefully enough to give you a sense of the quality of the HRTF.
In this first example, I’m moving the sound of a river in a circular pan around the listener to illustrate the HRTF’s effect on the resulting frequency response. Auro3D is presented first, followed by GoogleVR, and then RealSpace3D. You should be wearing headphones while listening:
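To make a pan test like this repeatable, the emitter positions for a full horizontal orbit can be generated programmatically rather than by hand. Here's a minimal sketch; the `circular_pan_positions` helper and the coordinate convention (x right, y up, z forward) are my own assumptions, not part of any SDK, so adjust them to match your engine:

```python
import math

def circular_pan_positions(radius=2.0, steps=16):
    """Listener-relative (x, y, z) points for one horizontal orbit.

    The listener sits at the origin; x is right, z is forward,
    y is up (a common game-audio convention -- adjust as needed).
    """
    positions = []
    for i in range(steps):
        azimuth = 2.0 * math.pi * i / steps  # 0 = directly ahead
        x = radius * math.sin(azimuth)
        z = radius * math.cos(azimuth)
        positions.append((x, 0.0, z))
    return positions

for x, y, z in circular_pan_positions(steps=8):
    print(f"x={x:+.2f}  y={y:+.2f}  z={z:+.2f}")
```

Feeding these points to your spatializer at a fixed rate gives each plug-in an identical trajectory, which keeps the comparison fair.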
The results are relatively convincing although you can hear clearly how each HRTF imparts a degree of phasiness to the sound.
This second example illustrates a diagonal pan from left-surround to front-right:
In the third example, we have a pan along the z-axis from bottom to top illustrating height:
Here are the same tests, but this time using VO as an example. Since we are now dealing with a more band-limited source, the phasiness induced by the HRTFs is minimized, but so is the binaural effect:
As you can hear from these examples, it’s clear that some solutions sound better than others. While I personally favor RealSpace3D, you can draw your own conclusions. This will hopefully at least provide some guidance on what solution may be best for your project.
One area I have yet to touch on is performance and stability, and convincing a developer to ship with a crash-prone alpha version of an SDK is always going to be an uphill battle. In the end, we chose to go with Auro3D not for the quality but for the very practical reasons of performance and stability. On our min-spec device (a Galaxy S6), we were able to run the game with the Auro3D plug-in, a 2nd order Ambisonics bus (that’s nine channels of audio compared to six channels with a 5.1 mix), and one instance of Wwise’s RoomVerb. Pretty impressive for a mobile device. The GVR and RealSpace3D solutions, by comparison, were still experiencing growing pains at the time (Q2 2017) and were less performant, though it's important to note that their algorithms are also much more complex.
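The channel-count math behind that tradeoff is worth spelling out: a full-sphere Ambisonics mix of order n carries (n + 1)² channels, which is why bus cost grows quickly with order. A quick sketch (the function name is mine, just for illustration):

```python
def ambisonic_channel_count(order: int) -> int:
    """Full-sphere Ambisonics needs (order + 1)^2 channels."""
    return (order + 1) ** 2

for order in (1, 2, 3):
    print(f"order {order}: {ambisonic_channel_count(order)} channels")
# order 2 -> 9 channels (vs. 6 for a 5.1 mix); order 3 -> 16
```

So stepping up from the 2nd order bus we shipped to the 3rd order bus used in the demos nearly doubles the channel count, a real consideration on mobile hardware.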
I suspect that until spatial audio solutions have been more thoroughly vetted in real-world use cases, performance and stability will continue to be an issue that developers will need to carefully evaluate.
Forgoing Audio Middleware?
One obvious question that arises is whether it would be better to forgo Wwise completely and work natively within Unity or Unreal. While this is far from ideal from the sound designer’s point of view, solutions such as Steam Audio and G’Audio offer great-sounding products that aren’t yet available for Wwise. While I recommend having robust sound tools in place before tackling spatial audio, for teams with the resources and the desire to showcase the latest in spatial audio technology, it's definitely a viable (but painful) option.
Looking Towards the Future
If there’s one big takeaway from all this, it’s that these are still early days for spatial audio and there is no perfect solution at the moment. As with any new technology, I expect some clear leaders to emerge over the next 1-2 years, but until then most teams will likely be forced to choose based on performance and cross-platform compatibility rather than quality.
Spatial audio is a complex field that is rapidly evolving and which we're only just beginning to scratch the surface of. As new advancements such as near field rendering and personalized HRTFs become adopted, it will be important for developers to stay abreast of the latest technology to be sure that their project sounds the best that it can. The importance of audio in AR/VR/MR should not be underestimated.