Self-Supervised Learning of Speech Representation via Redundancy Reduction
Published in Proceedings of the Doctoral Consortium at KI 2023 (DC@KI2023), Gesellschaft für Informatik e.V., Berlin, pp. 11-19, 2023
This work investigates a novel self-supervised learning (SSL) method for speech representation that leverages redundancy reduction to learn robust representations capturing speaker characteristics. Our proposed approach builds upon the Barlow Twins framework, originally introduced in computer vision, which we adapt to speech processing. The primary objective is to assess the quality of the learned representations through comprehensive evaluations on various downstream tasks, including speaker identification, gender recognition, and emotion recognition. By exploiting the statistical relationships between different views of the same speech input, the proposed method encourages the model to capture speaker-specific information while attenuating the impact of irrelevant variations. This enables the extraction of features that are invariant to non-speaker-related factors such as linguistic content or background noise.
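For readers unfamiliar with the underlying objective, the redundancy-reduction idea can be illustrated with a minimal NumPy sketch of the Barlow Twins loss (Zbontar et al., 2021). This is a generic illustration of that loss, not the exact implementation used in the paper; the batch size, embedding dimension, and trade-off weight `lam` are illustrative choices.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins redundancy-reduction loss (illustrative sketch).

    z_a, z_b: (N, D) embeddings of two augmented views of the same inputs.
    lam: weight trading off the redundancy term against the invariance term.
    """
    N, D = z_a.shape
    # Standardize each embedding dimension over the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-8)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-8)
    # Cross-correlation matrix between the two views.
    c = z_a.T @ z_b / N                                # (D, D)
    # Invariance term: pull diagonal entries toward 1 (views agree).
    invariance = np.sum((np.diag(c) - 1) ** 2)
    # Redundancy term: push off-diagonal entries toward 0 (decorrelated features).
    off_diag = c - np.diag(np.diag(c))
    redundancy = np.sum(off_diag ** 2)
    return invariance + lam * redundancy
```

Minimizing this loss makes the two views' embeddings agree per dimension while decorrelating distinct dimensions, which is what discourages the encoder from encoding the same nuisance factor redundantly.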
Recommended citation: Brima, Yusuf (2023): Self-Supervised Learning of Speech Representation via Redundancy Reduction. In: DC@KI2023: Proceedings of the Doctoral Consortium at KI 2023, Gesellschaft für Informatik e.V., Berlin, pp. 11-19. https://doi.org/10.18420/ki2023-dc-02