Brussels / 1 & 2 February 2020


Spleeter by Deezer

Open-Sourcing a Machine-Learning Music Source Separation Software


Source separation, stem separation, and de-mixing are different ways of referring to the same problem: recovering the individual instrument tracks that were mixed together to produce a music file. Recently, the research team at Deezer released free and open-source software, along with trained models, to perform multi-source separation of music with state-of-the-art accuracy. In this presentation we recount our journey to open-sourcing the Spleeter library, from doing the ground research and training the models to releasing them. We emphasize the technological challenges that had to be solved, as well as the practical and legal considerations that came into play.

Released on October 29th, 2019, the Spleeter (https://github.com/deezer/spleeter) GitHub repository received more than 5,000 stars in its first week online, along with much positive feedback and press coverage. This talk will explain how we went from research code to this easy-to-use open-source Python library, which integrates pre-trained models for inference and re-training.
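
For readers who want to try it, basic usage of the Python API in early releases looks roughly like the sketch below (the model name and file paths are illustrative; the repository README is the authoritative reference):

    from spleeter.separator import Separator

    # Load the pre-trained 2-stems model (vocals / accompaniment);
    # model files are fetched automatically on first use.
    separator = Separator('spleeter:2stems')

    # Write the estimated stems as audio files under output/.
    separator.separate_to_file('audio_example.mp3', 'output/')

The same separation is also exposed through a command-line entry point for users who do not want to write any Python.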

While not a broadly known topic, the problem of source separation has interested a large community of music signal researchers for a couple of decades now. It starts from a simple observation: music recordings are usually a mix of several individual instrument tracks (lead vocals, drums, bass, piano, etc.). The task of music source separation is: given a mix, can we recover these separate tracks (sometimes called stems)? This has many potential applications: think remixes, upmixing, active listening, educational purposes, but also pre-processing for other tasks such as transcription.
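
To make the problem statement concrete, here is a toy sketch (plain NumPy, with synthetic placeholder waveforms) of the linear mixing model the task assumes:

    import numpy as np

    rng = np.random.default_rng(0)

    # Placeholder stems: one second of audio at 44.1 kHz each.
    vocals = rng.standard_normal(44100)
    drums = rng.standard_normal(44100)
    bass = rng.standard_normal(44100)

    # A mono mix is, to a first approximation, the sample-wise sum.
    mix = vocals + drums + bass

    # Source separation is the inverse problem: given only `mix`,
    # estimate the stems. Many stem combinations yield the same mix,
    # which is why learned models are needed to resolve the ambiguity.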

Current state-of-the-art systems are starting to give convincing results on very wide catalogs of tracks, but the possibility of training such models remains largely bound by training data availability. In the case of copyrighted material like music, getting access to enough data is a pain point, and a source of inequality between research teams. Besides, an essential feature of good scientific research is that it must be reproducible by others. For these reasons, and to level the playing field, we decided to release not only the code, but also our models, pre-trained on a carefully crafted in-house dataset.

Specific topics on which our presentation will dwell:
- technical aspects of the model architecture and training (a simplified masking sketch follows this list)
- software design, and how to leverage TensorFlow's API in a user-facing Python library
- how to package and version code that relies on pre-trained models and can run on different architectures: CPU and GPU
- licensing and legal concerns
- what we learned along the way
- legacy
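
As a taste of the first bullet: Spleeter, like many spectrogram-domain systems, separates by applying soft masks to the mixture spectrogram. The helper below is a hypothetical, simplified sketch of that final masking step, not Spleeter's actual code; in practice the per-stem magnitude estimates come from neural networks.

    import numpy as np

    def apply_soft_masks(mix_stft, stem_magnitudes):
        # mix_stft:        complex STFT of the mixture, shape (frames, bins)
        # stem_magnitudes: non-negative per-stem magnitude estimates
        #                  (e.g. network outputs), shape (n_stems, frames, bins)
        eps = 1e-10
        # Normalise so the masks of all stems sum to one in every bin.
        masks = stem_magnitudes / (stem_magnitudes.sum(axis=0, keepdims=True) + eps)
        # Re-weight magnitudes while keeping the mixture phase; an inverse
        # STFT of each masked spectrogram yields the estimated stem.
        return masks * mix_stft[None, :, :]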

Speakers

Anis Khlif
Félix Voituret
