The world’s most advanced open translation site, Google Translate, just extended its lead over its competitors.
Amharic, Kurmanji Kurdish, Luxembourgish, Samoan, Scottish Gaelic, Shona, Sindhi, Pashto, Corsican, Frisian, Kyrgyz (4.3 million speakers), Hawaiian and Xhosa (8.2 million speakers) will all be available for two-way translation with English, extending the service to 120 million more people worldwide.
Google Translate relies on a combination of curated digital content and a community of volunteers to lay the foundation for translations. The languages will reportedly be added over the next few days.
“As we scan the Web for billions of already translated texts, we use machine learning to identify statistical patterns at enormous scale, so our machines can ‘learn’ the language,” said Sveta Kelman, Senior Program Manager at Google Translate.
The additions of Sindhi (70 million speakers) and Pashto (40 million speakers) bring Google Translate to the attention of tens of millions of Pakistanis and Afghans. Amharic (22 million speakers) is the most widely spoken Semitic language after Standard Arabic and is used throughout Ethiopia. Kurmanji Kurdish is the primary dialect of Kurds in northern Iraq, Syria and Turkey, while Shona is spoken by as many as 15 million people in and around Zimbabwe.
There are still several major languages the service does not yet offer, such as Tigrinya and Tigre, spoken in Ethiopia and Eritrea; the Sorani dialect of Kurdish; the Indian language Odia; and several Chinese languages, including Hong Kong’s primary tongue, Cantonese.
Scottish Gaelic joins Irish Gaelic among national languages that are nonetheless infrequently spoken, while Luxembourgish, Corsican and Frisian add three small, niche Western European languages to Google Translate’s repertoire. Google is playing catch-up on Kyrgyz, which is already offered by Yandex Translate, the translation service of Russian search engine Yandex. Xhosa is widely spoken in South Africa, while Samoan and Hawaiian are prominent minority languages in the United States’ Pacific islands (guess which ones).
The pace at which Google adds more languages will depend on how quickly it can digitize physical sources, gain access to existing digital texts and recruit volunteers for its various language communities.
In Google Translate’s official announcement earlier today, Kelman said, “As already existing documents can’t cover the breadth of a language, we also rely on people like you in Translate Community to help improve current Google Translate languages and add new ones.”