A neural voice transformation framework for modification of pitch and intensity

Doctoral thesis of Frederik Bous

UMR9912 STMS | IRCAM - CNRS - Sorbonne Université - Ministère de la Culture | Paris, France

This website contains supplementary material to my doctoral thesis.

Manuscript

Download the latest version (v1.1) of the manuscript directly from here.

Alternatively, you can download the manuscript from HAL.

Viva voce

Chapter 4 – Glottal closure instant extraction

The chapter has been published at EUSIPCO 2020 and the paper is available on arXiv. Implementation and weights are available from the git repository.

Chapter 5 – Neural vocoder

Details about the Multi-Band Excited WaveNet vocoder can be found in the original publication. Sound examples are on the demo page. Implementation and weights of the vocoder are available from the git repository.

Chapter 6 – Bottleneck auto-encoder

Audio samples of the pitch transformation can be found on the demo page. The original paper is open-access and can be found at the journal website.

Chapter 7 – Transformation of perceived voice level

The chapter has been published at ICASSP 2023 and is available on arXiv. Audio samples of the voice level transformation can be found on the demo page. Inference code for voice level estimation can be found in the git repository.

Chapter 8 – Applications

Judith Deschamps: Farinelli voice

The Farinelli voice was exhibited during the show an.other voice at Casino Luxembourg and C-LAB in Taiwan.

Music compositions

Below are a few pieces that were written at IRCAM using the voice transformation software CIRCE. Click on a title to watch a video recording of the performance.

Additional links

The software CIRCE is available on the IRCAM forum to any registered user (registration is free). Find my other publications through my ORCID iD 0000-0002-7477-7600.