Giving Voice to Whispers: A Gender Classification Technique for Whispered Speech

Modern voice recognition and computational algorithms have made possible many important applications, such as voice reconstruction. Whispered or unvoiced speech, which is produced without any vibration of the vocal folds, can be converted to ‘neutral’ or ‘voiced’ speech by artificially inserting the voice information, commonly called ‘pitch’. This is extremely useful, for example, for survivors of laryngeal cancer who have lost their larynges and can typically speak only in whispers.

However, to insert the appropriate pitch, the gender of the speaker must first be determined from the whispered speech sample. A recent paper by Nisha Meenakshi and Dr. Prasanta Kumar Ghosh of the Electrical Engineering Department at the Indian Institute of Science (IISc) reports an extension of existing speech-based gender classification techniques to analyse whispered speech. In the absence of pitch, “vocal tract shape during speech can be used to identify the gender of the speaker”, says Nisha Meenakshi.

The authors therefore use an algorithm to analyse the frequency components contained in a recording. The relative strengths of the different frequency components reveal the resonances of an individual’s vocal tract cavity, called formant frequencies. Their algorithm first reads a batch of recordings from speakers whose gender is known, called the training set, to discover the gender-specific patterns in the frequencies. It then fine-tunes its parameters by running through a second set of known recordings, called the development set. Finally, the optimised algorithm analyses the unknown test set. To gauge the effect of the loss of pitch on the performance of their algorithm, the authors also analyse a batch of neutral speech using two similar passes.
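
The train, tune and test routine described above can be illustrated with a small Python sketch. It is not the authors' method: the two-number feature vectors, the nearest-centroid classifier, the `weight` parameter and every function name here are assumptions invented for illustration, and the data are random numbers rather than real speech.

```python
import random


def make_set(rng, n):
    """Generate synthetic (features, label) pairs. NOT real speech data."""
    data = []
    for _ in range(n):
        label = rng.choice(["female", "male"])
        centre = 1.0 if label == "female" else -1.0
        # Feature 0 is clean, feature 1 is noisy (both are hypothetical).
        features = [rng.gauss(centre, 0.5), rng.gauss(centre, 1.5)]
        data.append((features, label))
    return data


def fit_centroids(train):
    """Training pass: learn one mean feature vector per gender."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features, weight):
    """Pick the nearest centroid; `weight` scales the noisy second feature."""
    def dist(c):
        return (features[0] - c[0]) ** 2 + weight * (features[1] - c[1]) ** 2
    return min(centroids, key=lambda label: dist(centroids[label]))


def accuracy(centroids, data, weight):
    """Fraction of samples in `data` whose label is predicted correctly."""
    hits = sum(predict(centroids, f, weight) == y for f, y in data)
    return hits / len(data)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
train, dev, test = make_set(rng, 300), make_set(rng, 100), make_set(rng, 100)

# 1. Training set: discover the gender-specific patterns.
centroids = fit_centroids(train)

# 2. Development set: fine-tune the free parameter.
candidates = [0.0, 0.1, 0.5, 1.0]
best_weight = max(candidates, key=lambda w: accuracy(centroids, dev, w))

# 3. Test set: evaluate once, with the tuned parameter.
test_accuracy = accuracy(centroids, test, best_weight)
print(f"chosen weight: {best_weight}, test accuracy: {test_accuracy:.2f}")
```

The point of the three-way split is that the test set is touched only once, after the parameter has been chosen on the development set, so the reported accuracy is not inflated by the tuning.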

The authors find that the loss of pitch information slightly reduces the performance of the algorithm: the gender classification accuracy drops from 93% for neutral speech to 89% for whispered speech. Nisha Meenakshi explains that “… the spectral shapes for neutral and whispered speech are different ... and, for neutral speech, the voice encoding in the spectral shape is more compact compared to whispered speech…”, which accounts for the reduced performance. Despite these difficulties, the algorithm shows a remarkably high success rate in automatically classifying the gender of the test samples, and promises to be a highly competent tool for the field of voice reconstruction.

About the authors

Prasanta Kumar Ghosh is an Assistant Professor in the Department of Electrical Engineering (EE), Indian Institute of Science (IISc), Bangalore. Nisha Meenakshi is a PhD student in his lab.


About the paper

The paper appeared in the conference proceedings on IEEE Xplore earlier this month. It was presented at the Twenty First National Conference on Communications (NCC), 2015, held in Mumbai.