At the Google I/O 2022 stage today, CEO Sundar Pichai announced that the company is supercharging Google Translate with 24 additional languages.
Translate is already quite a robust product, so the 24 languages being added today are ones used by communities around the world that aren't well represented in today's tech landscape. Even so, the company points out that these languages are spoken by a combined population of about 300 million people.
The model behind these additions is notable because it learned to translate the new languages by looking only at the languages themselves. In other words, it was never shown any actual translation examples involving them. Google says its Zero-Shot Machine Translation model has seen only "monolingual text": by studying text in each of these 24 languages on its own, it appears to have become fluent enough to handle translations.
Impressive! Still, Google cautions that while the new technology is already delivering strong results, it isn't perfect yet.
Finally, here’s the complete list of all 24 new languages being added to Google Translate:
- Assamese, used by about 25 million people in Northeast India
- Aymara, used by about two million people in Bolivia, Chile and Peru
- Bambara, used by about 14 million people in Mali
- Bhojpuri, used by about 50 million people in northern India, Nepal and Fiji
- Dhivehi, used by about 300,000 people in the Maldives
- Dogri, used by about three million people in northern India
- Ewe, used by about seven million people in Ghana and Togo
- Guarani, used by about seven million people in Paraguay, Bolivia, Argentina and Brazil
- Ilocano, used by about 10 million people in the northern Philippines
- Konkani, used by about two million people in Central India
- Krio, used by about four million people in Sierra Leone
- Kurdish (Sorani), used by about eight million people, mostly in Iraq
- Lingala, used by about 45 million people in the Democratic Republic of the Congo, Republic of the Congo, Central African Republic, Angola and the Republic of South Sudan
- Luganda, used by about 20 million people in Uganda and Rwanda
- Maithili, used by about 34 million people in northern India
- Meiteilon (Manipuri), used by about two million people in Northeast India
- Mizo, used by about 830,000 people in Northeast India
- Oromo, used by about 37 million people in Ethiopia and Kenya
- Quechua, used by about 10 million people in Peru, Bolivia, Ecuador and surrounding countries
- Sanskrit, used by about 20,000 people in India
- Sepedi, used by about 14 million people in South Africa
- Tigrinya, used by about eight million people in Eritrea and Ethiopia
- Tsonga, used by about seven million people in Eswatini, Mozambique, South Africa and Zimbabwe
- Twi, used by about 11 million people in Ghana