Yandex's Machine Translation Technology

The web is full of pages in a multitude of languages, and some of them contain exactly the answer that a user of the Yandex search engine is looking for. So that users can read an appliance manual or a news story originally published in a language they don’t understand, Yandex has, since 2009, offered a web page translation option in its search results, based on translation technology provided by PROMT.

Early in 2011, Yandex implemented a proprietary machine translation technology. Currently, the system can translate any type of text from English or Ukrainian into Russian and from Russian into either of these languages.

Yandex’s machine translation is based on statistical regularities rather than on sets of rules. Systems of this kind have no knowledge of the grammatical rules of a natural language. For a statistical machine translation system, to ‘learn’ a language means to compare hundreds of thousands of parallel texts: originals and their translations. These could be, for example, the different language versions of the same website. Initially, the system scans the internet for parallel texts using web page addresses that differ only in their language-marking segments, such as «en» or «us» for the English version and «ru» for the Russian one.
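The address-matching step can be illustrated with a minimal Python sketch. It is not Yandex's actual crawler: the marker table, the function names `url_key` and `find_candidate_pairs`, and the `example.com` addresses are all assumptions made for illustration. The sketch masks the first language marker found in each address and groups addresses that become identical once the marker is masked.

```python
import re
from collections import defaultdict

# Hypothetical marker table; the article itself only mentions "en", "us" and "ru".
LANG_MARKERS = {"en": "en", "us": "en", "ru": "ru"}

def url_key(url):
    """Return (language, address with its first language marker masked), or None."""
    # Split on common separators, keeping them so the address can be rebuilt.
    parts = re.split(r"([/.?=&_-])", url)
    for i, part in enumerate(parts):
        lang = LANG_MARKERS.get(part.lower())
        if lang:
            masked = parts[:i] + ["{lang}"] + parts[i + 1:]
            return lang, "".join(masked)
    return None

def find_candidate_pairs(urls):
    """Pair English and Russian addresses that differ only in the language marker."""
    groups = defaultdict(dict)
    for url in urls:
        key = url_key(url)
        if key:
            lang, masked = key
            groups[masked][lang] = url
    return sorted(
        (by_lang["en"], by_lang["ru"])
        for by_lang in groups.values()
        if "en" in by_lang and "ru" in by_lang
    )

urls = [
    "http://example.com/en/manual.html",
    "http://example.com/ru/manual.html",
    "http://example.com/en/news.html",  # no Russian counterpart, so no pair
]
pairs = find_candidate_pairs(urls)
print(pairs)
```

Because only the first marker is masked, an address on a `.ru` domain would need extra handling. The pairs found this way are only candidates, and their content still has to be checked.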

To identify texts as parallel, the system builds a list of unique characteristics for each new pair of texts it ‘learns’. These characteristics include rare words, numbers, and special characters used in a specific sequence. Each new document is compared against the existing set of characteristics created from the previously ‘learnt’ texts.
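One way to picture such characteristics is as an ordered ‘fingerprint’ of language-independent tokens. The sketch below is an assumption-laden illustration, not the system's real feature set: it uses only numbers and a few special characters (rare words would need a word-frequency table, which is left out), and it compares fingerprints with the standard library's `difflib.SequenceMatcher`. The function names and sample sentences are invented.

```python
import re
from difflib import SequenceMatcher

def fingerprint(text):
    """Ordered list of language-independent tokens: numbers and special characters."""
    return re.findall(r"\d+|[%$€@#&]", text)

def similarity(text_a, text_b):
    """Ratio in [0, 1] of how closely the two fingerprints match, order included."""
    return SequenceMatcher(None, fingerprint(text_a), fingerprint(text_b)).ratio()

# Invented sample texts, used only to exercise the functions.
english = "The Model 300 weighs 12 kg and costs $450."
russian = "Модель 300 весит 12 кг и стоит $450."
unrelated = "Founded in 1997, the company employs 5000 people."

print(similarity(english, russian))    # identical fingerprints
print(similarity(english, unrelated))  # nothing in common
```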

The current quality standards for machine translation require a system to process hundreds of millions of phrases in many different languages. Since translation is a highly resource-intensive process that demands a lot of hard disk space or a large amount of RAM, existing machine translation systems are few and far between.

Language learning

Yandex’s machine translation system has three key components: a translation model, a language model and a decoder.

A translation model is a list of all the words and phrases known to the system in one language, together with all their known translations into another language and a probability value for each translation. There is a separate translation model for each pair of languages the system can process. To create a translation model, the system first has to find parallel texts, then find pairs of matching phrases within these texts, and only then find pairs of matching words or word combinations.

To build a translation model for a pair of languages, say, Russian and English, the system analyzes pairs of phrases in both of these languages:

«London stands on the river Thames» — «Лондон стоит на берегу реки Темзы»

«Crossing the river by the Tower Bridge you can see the Tower of London» — «Пересекая реку по Тауэрскому мосту, можно увидеть Тауэр»

When the system is first exposed to the first pair, it doesn’t have enough information to find statistical patterns. So the word stands, the word river or the word on is as good a translation of Лондон into English as the word London.

But the words river and река, used in a different context in the second pair of phrases, become more probable equivalents of each other in English and Russian. Now the system knows that, at least, river is a better translation for река than it is for Лондон.

The system constantly performs this comparing-and-matching process on millions of phrases in hundreds of thousands of texts.

The system compares not only single words, but also sequences of two, three, four or five words. The translation model for each pair of languages processed by Yandex’s machine translation system contains over a billion pairs of words and phrases.
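The comparing-and-matching idea can be tried out on the two example pairs. The sketch below is a deliberately simplified illustration. It swaps in the Dice coefficient over sentence-level co-occurrence counts, which is not necessarily the measure Yandex uses, and production systems rely on more elaborate alignment models. The small `LEMMAS` table is a hypothetical stand-in for morphological analysis, so that реки and реку both count as река.

```python
from collections import Counter
from itertools import product

# Hypothetical lemma table: treats inflected forms of "река" as one word.
LEMMAS = {"реки": "река", "реку": "река"}

def tokens(sentence):
    """Lower-cased set of words in a sentence, with known forms lemmatized."""
    words = sentence.lower().replace(",", "").split()
    return {LEMMAS.get(w, w) for w in words}

pairs = [
    ("London stands on the river Thames",
     "Лондон стоит на берегу реки Темзы"),
    ("Crossing the river by the Tower Bridge you can see the Tower of London",
     "Пересекая реку по Тауэрскому мосту, можно увидеть Тауэр"),
]

ru_count, en_count, joint_count = Counter(), Counter(), Counter()
for english, russian in pairs:
    en, ru = tokens(english), tokens(russian)
    ru_count.update(ru)                      # sentences containing each Russian word
    en_count.update(en)                      # sentences containing each English word
    joint_count.update(product(ru, en))      # sentence pairs containing both words

def dice(ru_word, en_word):
    """Association score in [0, 1]; 1.0 means the two words always occur together."""
    joint = joint_count[(ru_word, en_word)]
    return 2 * joint / (ru_count[ru_word] + en_count[en_word])

print(dice("лондон", "river"))  # seen together in one pair only
print(dice("река", "river"))    # seen together in both pairs
```

With only two sentence pairs a lot of ambiguity remains. London, for instance, also occurs in both English sentences and therefore scores as high as river for река. Only a much larger set of parallel texts separates such candidates.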

To create a language model, another component of Yandex’s machine translation technology, the system scans hundreds of thousands of texts in a language and creates a list of all the words and word combinations it finds in these texts, together with their frequency values. This is the system’s knowledge of the translation target language.
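A frequency list of this kind can be sketched in a few lines of Python. The three-line corpus below is invented for illustration, and the function name `ngram_counts` and the limit of three words per combination are assumptions. A real language model is built from a vastly larger collection and usually stores smoothed probabilities rather than raw counts.

```python
from collections import Counter

def ngram_counts(texts, max_n=3):
    """Count every word sequence of length 1..max_n found in the texts."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts

# Tiny invented corpus in the target language (Russian).
corpus = [
    "быть или не быть вот в чём вопрос",
    "быть или не быть",
    "так не бывает",
]
lm = ngram_counts(corpus)
print(lm["не быть"])    # 2
print(lm["не бывает"])  # 1
```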

Translation

The actual translation is done by the decoder. For every phrase in the source text, it finds potentially matching phrases or their combinations in its translation model database and ranks these translations according to their probability. It may happen that, say, for the English phrase ‘to be or not to be’, the potential Russian match with the highest probability value is the phrase быть или не бывает (to be or is not), while the phrase быть или не быть (to be or not to be) is only second best, and so on. Then the system evaluates all possible translation versions according to the frequency with which they occur in its language model. In this case, the language model will clearly show that быть или не быть (to be or not to be) has a higher frequency than быть или не бывает (to be or is not). Finally, the decoder chooses the version with the winning combination of probability, based on the translation model, and frequency, based on the language model.
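The decoder's final choice can be illustrated with a small sketch. The article does not say how probability and frequency are combined, so the log-linear score used here is an assumption, as are the weight parameter, the function names and all the numbers. They are chosen only to reproduce the ‘to be or not to be’ situation described above.

```python
import math

# Hypothetical translation-model probabilities for the source phrase "to be or not to be".
translation_model = {
    "быть или не бывает": 0.40,
    "быть или не быть": 0.35,
}

# Hypothetical language-model frequencies for the same candidates.
language_model = {
    "быть или не бывает": 2,
    "быть или не быть": 5000,
}

def score(candidate, lm_weight=1.0):
    """Assumed log-linear combination of translation probability and LM frequency."""
    tm = math.log(translation_model[candidate])
    lm = math.log(language_model.get(candidate, 0) + 1)  # +1 avoids log(0) for unseen phrases
    return tm + lm_weight * lm

def decode(candidates):
    """Pick the candidate with the best combined score."""
    return max(candidates, key=score)

best_by_tm = max(translation_model, key=translation_model.get)
best = decode(translation_model)
print(best_by_tm)  # быть или не бывает
print(best)        # быть или не быть
```

The translation model alone prefers the wrong phrase by a small margin, and the much higher frequency in the language model overturns that ranking.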

In addition to isolated text, Yandex’s machine translation system can process entire web pages. A user who types a web address at translate.yandex.ru first sees the original web page. Then the user’s browser parses the page’s HTML code and sends only the text to the server, paragraph by paragraph. So an English text, for instance, turns into a Russian one right before the user’s eyes.

In contrast to other systems, which send the server a link to the whole web page, the Yandex system sends only text. If all a server receives is a web address, it may not be able to process exactly the page that the user sees (for example, pages that require authorization). With the Yandex translation system, the end user sees exactly the page they have accessed, only with all words and phrases translated into another language. In addition, the user does not have to wait until every little thing on the page is translated. They can read as they go.
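The idea of sending only text, one paragraph at a time, can be sketched as follows. In the real service this happens in the user's browser. The Python version below only illustrates the principle with the standard library's `html.parser`. The class `ParagraphExtractor`, the restriction to `<p>` elements and the function `translate_stub`, which stands in for the request to the translation server, are all hypothetical.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collect the plain text of each <p> element, ignoring the markup inside it."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self.paragraphs.append("".join(self._chunks).strip())
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._chunks.append(data)

def translate_stub(text):
    # Placeholder for the server call; a real client would send `text` for translation.
    return f"[ru] {text}"

html = ("<html><body><h1>Manual</h1>"
        "<p>Plug in the <b>device</b>.</p><p>Press start.</p></body></html>")
extractor = ParagraphExtractor()
extractor.feed(html)

# Each paragraph is translated separately, so it can be shown as soon as it is ready.
translated = [translate_stub(p) for p in extractor.paragraphs]
print(translated)
```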

Progress in statistical machine translation

One of the benefits of the statistical method of machine translation is that it progresses together with the language. A statistics-based machine translation system adds a new word, or phrase, or form, or spelling to its language model the moment it finds it. The faster an innovation or a change spreads in a language, the sooner it will appear in the system’s language and translation models.

To improve translation quality, the databases of the Yandex machine translation system are regularly updated. Every new update is tested with BLEU (Bilingual Evaluation Understudy), an algorithm that evaluates the quality of machine-translated text. A set of texts translated with Yandex’s system is compared against a reference set of translations. New additions to the system’s databases that do not improve translation quality are rejected.
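The accept-or-reject test can be illustrated with a simplified BLEU score. The sketch below is not a full implementation. It scores a single sentence against a single reference and uses only one- and two-word sequences, whereas standard BLEU is computed over a whole test set, typically with sequences of up to four words. The function names and sample outputs are assumptions made for illustration.

```python
import math
from collections import Counter

def ngrams(words, n):
    """Multiset of all n-word sequences in a list of words."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU: clipped n-gram precision times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngrams(cand, n), ngrams(ref, n)
        # Counter intersection clips each n-gram count to its count in the reference.
        overlap = sum((cand_ngrams & ref_ngrams).values())
        total = sum(cand_ngrams.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Penalize candidates that are shorter than the reference.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)

reference = "быть или не быть"
old_output = "быть или не бывает"  # hypothetical output before a database update
new_output = "быть или не быть"    # hypothetical output after the update

old_score = bleu(old_output, reference)
new_score = bleu(new_output, reference)
accept_update = new_score > old_score
print(old_score, new_score, accept_update)
```

In this toy case the update is kept, because the new output matches the reference more closely than the old one.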
