Exclusive: Developing robust voice signals for Spotify’s Car Thing

Spotify’s recently launched Car Thing is an in-car smart player featuring several interesting party tricks, including the ability to respond to voice commands even in a noisy vehicle environment or when music is being played at high volume.

Core to this ability is the integration of DSP Concepts’ Audio Weaver framework, which, says the US-based company, gives developers access to a host of tools, including noise interference cancellation technology. Audio Weaver is, in essence, a toolkit for creating embedded audio products. It is hardware agnostic and removes the need for automotive manufacturers to develop their own proprietary audio software.

Speaking to Automotive Interiors World, DSP’s CTO and co-founder, Paul Beckmann, explained, “When you look at most engineering disciplines, there are tools and frameworks to help you to get your job done. If you’re making a website, you don’t take a notepad and write HTML anymore, that’s what you did in the 90s; there are now a host of great tools for the job. Similarly, if you want to make a touchscreen interface in a car, you can use a system like Qt; it’s the same with AI now as well.

“If somebody came to me and said, ‘I want to do machine learning, but I want to do everything from scratch’, I’d say they were crazy and do they know how long it would take to do that.”

DSP sees its Audio Weaver product as fulfilling a similar role within the audio space, with Beckmann noting there is currently no accepted industry-standard framework available. “We want to step into this void with our technology. We essentially have an end-to-end framework for audio product developers. There are tools for designing and tuning, you can run on just about any processor you want, coupled with highly optimized implementations, and you can also write your own [code] blocks.”

One of the key elements of the Car Thing, beyond the use of Audio Weaver, is that it also implements DSP’s TalkTo system, an audio front end (AFE) that utilizes advanced signal processing techniques to create clean audio signals for voice assistants and ASR engines. It can support up to eight input microphones and is intended to facilitate voice control of devices in noisy environments.

In the case of the Spotify application, there were many challenges that needed to be overcome, as Beckmann outlines: “The Spotify app on your phone will stream directly over Bluetooth to the car, so the Car Thing doesn’t know exactly the music that’s being played.” This means that whereas an echo canceller would normally be used to remove the music from the microphone signal before it reaches the voice recognition engine, that approach cannot be used in the Car Thing’s case.

“In order to do an echo canceller, you need a reference channel, with access to the music. The trouble is, the Car Thing doesn’t have the reference signal so we have this technology called interference canceller, which is able to ignore the sound coming from the loudspeakers and focus in on the driver and the passengers in the car. Even if you’ve got your radio blasting, you can still talk to the Car Thing in a normal conversational voice. We’re one of the few companies who have cracked that nut as it’s a really hard problem.”
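To illustrate why a conventional echo canceller depends on that reference channel, here is a minimal sketch of a textbook normalized least-mean-squares (NLMS) echo canceller. It is a generic illustration, not DSP Concepts’ implementation and not the interference canceller Beckmann describes. The function names, the four-tap speaker-to-microphone path and the synthetic "music" and "voice" signals are all assumptions made for the example.

```python
import math
import random


def nlms_echo_cancel(mic, reference, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo of `reference` from `mic`.

    Textbook NLMS filter; `reference` is the music sent to the loudspeaker.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    cleaned = []
    for mic_sample, ref_sample in zip(mic, reference):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate  # what remains: voice + residual echo
        ref_power = sum(x * x for x in history) + eps
        step = mu * error / ref_power
        weights = [w + step * x for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def mean_power(signal):
    return sum(s * s for s in signal) / len(signal)


def demo(num_samples=4000, seed=0):
    """Return (echo power, residual echo power) on synthetic data."""
    rng = random.Random(seed)
    # Assumption: white noise stands in for music, a quiet sine for the voice.
    music = [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    room = [0.6, 0.3, -0.2, 0.1]  # hypothetical speaker-to-microphone path
    echo = [
        sum(room[k] * music[n - k] for k in range(len(room)) if n - k >= 0)
        for n in range(num_samples)
    ]
    voice = [0.05 * math.sin(2 * math.pi * 0.01 * n) for n in range(num_samples)]
    mic = [e + v for e, v in zip(echo, voice)]

    cleaned = nlms_echo_cancel(mic, music)

    # Measure on the second half, after the filter has had time to adapt.
    tail = num_samples // 2
    residual = [c - v for c, v in zip(cleaned[tail:], voice[tail:])]
    return mean_power(echo[tail:]), mean_power(residual)


if __name__ == "__main__":
    before, after = demo()
    print(f"echo power before: {before:.4f}, residual after: {after:.6f}")
```

The filter can only model the echo because `reference` is passed in alongside the microphone signal. Without it there is nothing to adapt against, which is the situation the Car Thing faces when the phone streams music straight to the car.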

DSP has developed similar solutions for non-automotive applications, such as voice-controlled domestic air-conditioning units, where not only does the noise of the unit have to be accounted for, but also background sounds such as TV sets. However, as Beckmann points out, the automotive space is a unique scenario. There are various vehicle noises to contend with, the aforementioned music and, in the case of the Car Thing, air-vent noise due to an optional vent mounting bracket for the unit. All of these must be accounted for by the sound-processing algorithms. After much development, the result is that the signal from the unit’s four microphones is processed and output as a clean voice signal to the unit’s audio assistant, devoid of any background noise that could impair performance.

Of course, there is no such thing as a ‘standard’ vehicle noise environment, and Spotify had to ensure the device would work regardless of vehicle or installation location. “It was a fairly significant testing effort,” says Beckmann. “What Spotify did is test the device in a host of different locations and record the microphone signals. So they had a whole bunch of recordings and then we could use our software to tune the recordings to optimize performance. They tested different mounting options, different vehicles, and so forth. Even down to having the driver talking, the passenger talking or the rear seat talking.”

Ultimately, the integration of interference cancelling and other technologies gives a small, low-cost device an impressive array of capabilities. Vehicle manufacturers are looking ever more closely at their in-car technology, which seems set to become one of the key differentiators between models. Tools such as those provided by DSP Concepts, which is already working with several companies, allow complex functionality to be incorporated quickly and relatively easily, and are likely to prove popular.

About Author

, web editor

Lawrence has been covering engineering subjects – with a focus on motorsport technology – since 2007 and has edited and contributed to a variety of international titles. Currently, he is responsible for content across UKI Media & Events' portfolio of websites while also writing for the company's print titles.