Facebook’s parent company Meta has unveiled an AI-powered speech recognition system that can understand and transcribe 1,100 languages in real time.
The new technology, called Massively Multilingual Speech (MMS), is designed to help Facebook users communicate with each other in their native languages, regardless of location or language barriers.
“Many of the world’s languages are in danger of disappearing, and the limitations of current speech recognition and generation technology will only accelerate this trend,” Meta said in a statement.
“We want to make it easier for people to access information and use devices in their preferred language, and today we’re announcing a series of artificial intelligence (AI) models that could help them do just that,” the company said.
Meta said MMS expands its text-to-speech and speech-to-text technology from an initial capability of 100 languages to more than 1,100. It can also identify more than 4,000 spoken languages.
How Meta Developed MMS
To train MMS, Meta said it turned to religious texts such as the Bible, which have been translated into many languages and have publicly available audio recordings of people reading them.
Meta said it created a dataset of readings of the Bible’s New Testament in more than 1,100 languages, which yielded an average of 32 hours of data per language.
“While this data is from a specific domain and is often read by male speakers, our analysis shows that our models perform equally well for male and female voices,” Meta said.
Meta also claims that, despite the nature of the recordings used, the training data doesn’t “bias the model to produce more religious language.”
Applications
Meta’s MMS has the potential to revolutionise the way people communicate with each other. Meta expects the technology to be useful for its forays into augmented and virtual reality.
In addition to facilitating communication between people from different countries and cultures, the technology could also have applications in the fields of education and healthcare.
For example, it could be used to provide real-time translation services for non-native speakers in classrooms or hospitals.
“We’re open-sourcing our models and code so that others in the research community can build on our work and help preserve the world’s languages and bring the world closer together,” the company said.
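The open-sourced checkpoints can be loaded through the Hugging Face `transformers` library. The sketch below shows how speech-to-text with an MMS model might look, assuming a recent `transformers` release with MMS support and 16 kHz mono audio as input; the model id `facebook/mms-1b-all` and the adapter-switching calls follow the public release, but treat this as an illustrative sketch rather than Meta's reference code:

```python
# Sketch: transcribing 16 kHz mono audio with an open-sourced MMS
# checkpoint via Hugging Face `transformers`. Assumes the checkpoint
# "facebook/mms-1b-all"; it is downloaded on first use (several GB).
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC


def transcribe(audio, lang="eng", model_id="facebook/mms-1b-all"):
    """Transcribe a 1-D float array of 16 kHz audio in the given language.

    `lang` is an ISO 639-3 code (e.g. "eng", "fra", "swh"); MMS loads a
    small per-language adapter on top of the shared backbone model.
    """
    processor = AutoProcessor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)

    # Switch the tokenizer vocabulary and the model's language adapter
    # to the requested language.
    processor.tokenizer.set_target_lang(lang)
    model.load_adapter(lang)

    inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    ids = torch.argmax(logits, dim=-1)[0]
    return processor.decode(ids)
```

Passing a different `lang` swaps in a different per-language adapter without retraining or replacing the shared 1-billion-parameter backbone, which is how a single checkpoint covers more than 1,100 languages.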