India is a land of diversity, and the vast number of languages spoken in the country is testimony to this fact. These languages span four language families; twenty-two of them are scheduled languages, and more than thirty are spoken by over a million people each.
This diversity of languages brings with it a set of challenging tasks. One of them is education, the primary concern being enabling learning in Indian languages. Teaching and learning in one’s mother tongue is known to be highly effective. Moreover, higher education is often out of reach of the vast majority of people due to the barrier of English. Recognising this need and gap, the Government of India, under the Prime Minister’s Science, Technology, and Innovation Advisory Council (PM-STIAC), has established the National Language Translation Mission (NLTM) as one of its core missions.
NLTM aims to make opportunities and developments in science and technology accessible to all by removing the barrier posed by the requirement of high-level proficiency in English. Using a combination of machine and human translation, the mission will eventually enable access to educational material bilingually – in English and one’s native Indian language. The Ministry of Electronics and Information Technology (MeitY) is the Government’s implementing agency for this mission.
One opportunity for speech-to-speech machine translation is to translate the more than 40,000 English-language educational videos on NPTEL and SWAYAM into many Indian languages. This also fits in with the newly formulated National Education Policy (NEP), which lays emphasis on imparting training in Indian languages. Currently, there is an ongoing effort to manually transcreate these videos into Indian languages, which consumes enormous time and resources.
Responding to this challenge, a consortium of institutes consisting of IITB, IITM and IIITH, led by Professors Pushpak Bhattacharyya at the Indian Institute of Technology Bombay, S Umesh and Hema Murthy at the Indian Institute of Technology Madras, and Dipti Misra Sharma at the International Institute of Information Technology Hyderabad, has come together to create a speech-to-speech machine translation (SSMT) system from English to many Indian languages.
SSMT consists of a pipeline of stages: (i) first the spoken utterance is converted to text (ASR), (ii) then the produced text is translated to the target language text (MT), and (iii) finally, the translated text is rendered into speech (TTS).
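The three-stage pipeline above can be sketched in code. The stage functions below are hypothetical placeholders (a real system would invoke trained ASR, MT, and TTS models); the sketch only illustrates how the stages compose:

```python
# Minimal sketch of the SSMT pipeline structure: ASR -> MT -> TTS.
# All three stage functions are hypothetical stand-ins, not real models.

def asr(audio: bytes) -> str:
    """Stage (i): automatic speech recognition -- audio in, English text out."""
    # A real system would run a trained ASR model on the audio here.
    return "welcome to this lecture on machine translation"

def mt(source_text: str, target_lang: str) -> str:
    """Stage (ii): machine translation -- English text in, target-language text out."""
    # A real system would call a trained English-to-Indian-language MT model.
    return f"[{target_lang} translation of: {source_text}]"

def tts(target_text: str) -> bytes:
    """Stage (iii): text-to-speech -- target-language text in, synthesized audio out."""
    # A real system would run a TTS model for the target language.
    return target_text.encode("utf-8")

def ssmt(audio: bytes, target_lang: str) -> bytes:
    """Compose the three stages into the full speech-to-speech pipeline."""
    return tts(mt(asr(audio), target_lang))
```

Because each stage consumes exactly what the previous one produces, the stages can be developed, evaluated, and human-edited independently, which is what makes the hybrid machine-plus-review workflow described below practical.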
SSMT poses several challenges, though: (a) each of ASR, MT and TTS may introduce errors, albeit small; (b) the text from ASR can be disfluent, i.e., contain non-language elements like “uhh”, “umm”, etc.; (c) the tone and accent of English vary from region to region in India; (d) word order changes from English to Indian languages; (e) speakers mix languages, as in Hinglish (Hindi+English), Banglish (Bengali+English), Tanglish (Tamil+English), etc.; (f) finally, the appearance of text and speech need to be synchronized – the so-called lip-sync problem.
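As an illustration of challenge (b), disfluency removal can be approximated with a simple filter. The regex-based sketch below is only a toy; production pipelines use trained disfluency-detection models, and the filler list here is an assumption for illustration:

```python
import re

# Toy disfluency filter: strips common filler tokens such as "uhh" and "umm"
# from ASR output before it is passed to MT. The filler patterns below are
# illustrative assumptions; real systems learn these from data.
FILLERS = re.compile(r"\b(?:u+h+|u+m+|er+|hmm+)\b[,.]?\s*", flags=re.IGNORECASE)

def remove_disfluencies(text: str) -> str:
    """Remove filler tokens and collapse any leftover extra whitespace."""
    cleaned = FILLERS.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

For example, `remove_disfluencies("uhh, so today umm we will study sorting")` yields `"so today we will study sorting"`. Real disfluencies also include repetitions and false starts (“I want– I need to say”), which a pattern list like this cannot catch.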
The good part is that a machine would do the bulk of the translation efficiently, with only a small manual effort required to review and edit the output at different stages of the pipeline. This has been tested through the consortium’s implementation of the SSMT pipeline, and it is envisaged that this hybrid approach can reduce the fully manual translation effort by almost 75%.
The realisation of SSMT is poised to make a host of digital learning content available in many Indian languages, thereby enhancing the accessibility of such content. In addition, as a way forward, if appropriate machine learning and AI models are built on top of it, such a system could also interactively respond to learners’ queries in their own language. Certainly, the future looks promising with the development of these applications, which aim to minimise the learning gap, particularly for learners in Indian languages.
Editor’s Note: This is part of the special lab stories feature we are bringing to you.