Music source separation is a tricky task for machines, whereas humans can readily pick out the vocals, bass, or drums in a mix. To help with this task, Facebook AI research scientist Alexandre Defossez has developed Demucs (deep extractor for music sources).
As described by the famous "cocktail party effect", humans have the ability to home in on a single conversation in a loud environment. For machines, though, this kind of sound source separation is difficult. Let's see how AI tools manage the task and what sets Demucs apart.
Spectrograms vs. waveforms
Most commonly, as Defossez points out, AI separates music sources by analyzing spectrograms. While this approach is well suited to instruments that resonate on a single frequency, spectrogram-based methods have their weaknesses. For example, saxophone and guitar frequencies may cancel each other out.
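To make the distinction concrete, here is a minimal sketch (not taken from Demucs) of how a spectrogram-based pipeline starts: the raw waveform is converted into time-frequency bins with a short-time Fourier transform, and the model then operates on those magnitudes. The file name and STFT parameters below are illustrative assumptions.

```python
# Minimal sketch of the spectrogram representation most separation models analyze.
# Assumes torchaudio is installed and "mix.wav" is a placeholder input file.
import torch
import torchaudio

waveform, sample_rate = torchaudio.load("mix.wav")  # (channels, samples)

# Short-time Fourier transform: time-domain samples -> time-frequency bins.
spec = torch.stft(
    waveform,
    n_fft=2048,
    hop_length=512,
    window=torch.hann_window(2048),
    return_complex=True,
)
magnitude = spec.abs()   # spectrogram models typically mask or filter these magnitudes
print(magnitude.shape)   # (channels, freq_bins, time_frames)
```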
This is where Demucs comes into play: an AI-based waveform model designed to work much like computer vision models that detect patterns in images. "It detects patterns in the waveforms and then adds higher-scale structure," as Defossez explains. In other words: "Demucs can re-create the audio that it thinks is there but got lost in the mix."
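For illustration only, here is a tiny waveform-domain encoder/decoder in PyTorch, in the spirit of such models. It is not the actual Demucs architecture; the layer sizes, kernel settings, and source count are arbitrary assumptions chosen to keep the sketch short.

```python
# Illustrative sketch, not the real Demucs: a small U-Net-style 1D convolutional
# encoder/decoder that operates directly on raw waveform samples.
import torch
import torch.nn as nn

class TinyWaveformUNet(nn.Module):
    def __init__(self, sources=4, channels=32):
        super().__init__()
        # Encoder: strided 1D convolutions detect local patterns in the waveform.
        self.enc1 = nn.Sequential(nn.Conv1d(1, channels, 8, stride=4, padding=2), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv1d(channels, channels * 2, 8, stride=4, padding=2), nn.ReLU())
        # Decoder: transposed convolutions re-synthesize audio for each source.
        self.dec2 = nn.Sequential(nn.ConvTranspose1d(channels * 2, channels, 8, stride=4, padding=2), nn.ReLU())
        self.dec1 = nn.ConvTranspose1d(channels, sources, 8, stride=4, padding=2)

    def forward(self, mix):        # mix: (batch, 1, samples), samples divisible by 16
        e1 = self.enc1(mix)
        e2 = self.enc2(e1)
        d2 = self.dec2(e2) + e1    # skip connection, as in U-Net-style designs
        return self.dec1(d2)       # (batch, sources, samples)

model = TinyWaveformUNet()
out = model(torch.randn(1, 1, 16384))  # a short fake mono excerpt
print(out.shape)                       # torch.Size([1, 4, 16384])
```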
Defossez based Demucs on Wave-U-Net, an earlier AI-powered waveform model, and then went on to fine-tune his model. It now not only outperforms Wave-U-Net but is also, in his words, "way beyond" state-of-the-art spectrogram-based methods.
In the future, technology like Demucs may improve the ability of AI assistants to hear voice commands in loud environments. It could also be used in hearing aids or noise-canceling headphones.
If you’d like to experiment with Demucs, you can find further info in the research paper and download the code from GitHub.
See the Tech@Facebook blog post for further details and sound samples.