I-Vectors

ALIZE, a bridge to Total Variability space

ALIZE/LIA_RAL includes functionalities that allow you to estimate the Total Variability matrix and to project speech utterances onto the Total Variability sub-space. This page proposes a brief overview of the i-vector paradigm. You may also be interested in a full demo which can be downloaded at the end of this page.


Some background

Initially introduced for speaker recognition, i-vectors [1] have become very popular in the field of speech processing, and recent publications show that they are also reliable for text-dependent speaker verification [2], language recognition [4] and speaker diarization [3]. I-vectors convey speaker characteristics along with other information such as the transmission channel, the acoustic environment or the phonetic content of the speech segment. Detailed descriptions of the Total Variability paradigm can be found in [1] [4] [5]. I-vector extraction can be seen as a probabilistic compression process that reduces the dimensionality of speech-session super-vectors according to a linear-Gaussian model. The speaker- and channel-dependent super-vector <math>M_{(s,h)}</math> of concatenated Gaussian Mixture Model (GMM) means is projected onto a low-dimensional space, named the Total Variability space, as follows:

<math>M_{(s,h)} = m + Tw_{(s,h)}</math>

where <math>m</math> is the mean super-vector of a gender-dependent Universal Background Model (UBM), <math>T</math> is called the Total Variability matrix and <math>w_{(s,h)}</math> is the resulting i-vector.
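
In practice, the i-vector is estimated as the posterior mean of a latent variable given the zeroth- and first-order Baum-Welch statistics of the utterance, collected against the UBM. The sketch below illustrates this standard closed-form estimate in Python/NumPy. It is only a minimal illustration of the formula above, not the ALIZE/LIA_RAL implementation; the function name, variable names and array shapes are assumptions made for the example.

<syntaxhighlight lang="python">
import numpy as np

def extract_ivector(T, Sigma, N, F_tilde):
    """Estimate the i-vector w for one utterance (illustrative sketch).

    T       : (C*F, R) Total Variability matrix
    Sigma   : (C*F,)   diagonal of the UBM covariance super-vector
    N       : (C,)     zeroth-order (occupancy) statistics, one per UBM component
    F_tilde : (C*F,)   first-order statistics centered on the UBM mean super-vector m
    Returns the R-dimensional i-vector (posterior mean of the latent variable).
    """
    CF, R = T.shape
    feat_dim = CF // N.shape[0]
    # Repeat each component occupancy over its feature dimensions so it
    # matches the super-vector layout of T and F_tilde.
    N_sv = np.repeat(N, feat_dim)                      # (C*F,)
    T_Sinv = T / Sigma[:, None]                        # Sigma^{-1} T
    # Posterior precision: I + T^t Sigma^{-1} N T
    precision = np.eye(R) + T_Sinv.T @ (N_sv[:, None] * T)
    # Posterior mean: precision^{-1} T^t Sigma^{-1} F_tilde
    return np.linalg.solve(precision, T_Sinv.T @ F_tilde)
</syntaxhighlight>

Under this linear-Gaussian model, the same precision matrix also gives the posterior covariance of <math>w_{(s,h)}</math>, which is what an EM re-estimation of <math>T</math> accumulates over the training utterances.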


Demo

This package contains a full demo of i-vector extraction as well as documentation which summarizes the main options of the required programs. To be run, this demo requires LIA_RAL binaries properly compiled for the current architecture. This demo is released to help ALIZE users and does not guarantee any result.

The data provided as examples are taken from the RSR2015 database [6].

Download the package Media:How_To_generate_iVectors_with_ALIZE.tar.gz


References

  1. Najim Dehak, Patrick Kenny, Reda Dehak, Pierre Dumouchel, and Pierre Ouellet, Front-End Factor Analysis for Speaker Verification, in IEEE Transactions on Audio, Speech, and Language Processing, 19(4), pages 788–798, 2011
  2. Anthony Larcher, Pierre-Michel Bousquet, Kong Aik Lee, Driss Matrouf, Haizhou Li and Jean-Francois Bonastre, I-vectors in the context of phonetically-constrained short utterances for speaker verification, in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2012
  3. Javier Franco-Pedroso, Ignacio Lopez-Moreno, Doroteo T. Toledano and Joaquin Gonzalez-Rodriguez, ATVS-UAM System Description for the Audio Segmentation and Speaker Diarization Albayzin 2010 Evaluation, in FALA "VI Jornadas en Tecnología del Habla" and II Iberian SLTech Workshop, pages 415–418, 2010
  4. David Martinez, Oldrich Plchot, Lukas Burget, Ondrej Glembek and Pavel Matejka, Language Recognition in iVectors Space, in Proceedings of Interspeech, 2011
  5. Ahilan Kanagasundaram, Robbie Vogt, David Dean, Sridha Sridharan and Michael Mason, I-vector Based Speaker Recognition on Short Utterances, in Proceedings of Interspeech, 2011
  6. Anthony Larcher, Kong Aik Lee, Bin Ma and Haizhou Li, RSR2015: Database for Text-Dependent Speaker Verification using Multiple Pass-Phrases, submitted to Interspeech, 2012