
Friday, August 10, 2018

SDL Cracks Russian Neural Machine Translation

This is a reprint of a post published on the SDL website and closely related to a previous post on the difficulties and challenges of building Russian to English MT systems. 

Many may not be aware that several members of the SDL MT team have very significant research and development credentials, having pioneered the data-driven approaches to MT system development. They were the first to commercialize Statistical MT, and the roster of the original research team reads like a Who's Who of the MT research community. The original principals at Google MT and the developer of Moses both come from the original Language Weaver team, and today members of that team have a hand in every major MT initiative in the US.

The SDL MT team also has the unique experience of working closely with linguists and translators on a long-term basis, and thus has unique exposure to ongoing human quality assessments for most of the MT systems it builds. In localization scenarios, MT systems need to be assessed for post-editing (PEMT) suitability, and that is best done by human review and assessment rather than by BLEU scores, the metric of choice for most research teams in the industry.

Thus, this accomplishment with the Russian NMT system rests on deeply trusted human assessments that have been refined over 10 years of ongoing practice. The professional translators who perform these assessments are part of SDL's permanent team of around 2,000 translators. Such assessments are more reliable than automated metrics, which can be gamed or manipulated, or whose test sets may be known to the system when they should by definition be blind, i.e. the system should not have been trained on them. The claim to have cracked the problem rests on a very high level of accomplishment on a metric that the research team has come to trust more than BLEU and the many other automated metrics that are also used in tandem.

======

The SDL Machine Translation (MT) research team recently announced that our latest machine learning innovations and development strategies with Neural MT have resulted in a breakthrough that demonstrates a substantial leap forward. When tested against an extensive suite of comparative experiments to verify and validate the results, our Russian to English MT system outperformed all industry standards, setting a benchmark for Russian to English machine translation. Over 90% of the system’s output has been labelled a perfect translation by professional Russian-English translators.

Those who have been monitoring the progress of Neural MT systems may be aware that Russian to English has been a particularly challenging direction for MT developers.

Adolfo Hernandez, CEO SDL
“It was the Russian language that first inspired the science and research behind machine translation,” said Adolfo Hernandez, CEO, SDL. “Since then it has always been a major challenge for the community. SDL has deployed breakthrough research strategies to master these difficult languages, and support the global expansion of its enterprise customers. We have pushed the boundaries and raised the performance bar even higher, and we are now paving the way for leadership in other complex languages.”
The linguistic properties and intricacies of the Russian language relative to English make it particularly challenging for MT systems to model. Russian is a highly inflected language with different syntax, grammar, and word order compared to English. Given the complexities created by these differences between the Russian and English language, raising the translation quality has been an ongoing focus of the SDL Machine Learning R&D team.

 

SDL Neural MT Russian to English results

Much of the enthusiasm for Neural MT is driven by the fluency and naturalness of its output, and its ability to produce a large number of sentences that read as if they were written by a human. We have often seen that early Neural MT output is judged clearly better by human evaluators, even though established MT evaluation metrics such as the BLEU score may show only nominal improvement, or none at all.

The improvements to human assessment are most noticeable when considering fluency and word order issues with machine translation output. However, the most common automatic quality metric used by MT developers during the R&D phase is still the BLEU score, so it is important to incorporate human assessments into the scoring methodology.
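Since BLEU figures so prominently in this discussion, a minimal sketch may help make the metric concrete. The function below is a simplified single-reference, sentence-level BLEU (clipped n-gram precision with add-one smoothing and a brevity penalty); production implementations such as sacreBLEU use corpus-level statistics and more careful smoothing, so this is illustrative only:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, hypothesis, max_n=4):
    """Simplified single-reference sentence BLEU: geometric mean of
    clipped n-gram precisions, times a brevity penalty."""
    ref, hyp = reference.split(), hypothesis.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # clip each n-gram's count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        # add-one smoothing so one missing n-gram does not zero the score
        log_precisions.append(math.log((overlap + 1) / (total + 1)))
    # penalize hypotheses shorter than the reference
    brevity = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return brevity * math.exp(sum(log_precisions) / max_n)

reference = "the cat sat on the mat"
print(sentence_bleu(reference, "the cat sat on the mat"))      # identical: 1.0
print(sentence_bleu(reference, "a cat was sitting on a mat"))  # fluent paraphrase scores far lower
```

The second hypothesis could be a perfectly acceptable translation of some source, yet it scores poorly because it overlaps little with the single reference; this is exactly why human assessment matters when systems produce fluent but differently-worded output.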

The strategy adopted by the SDL researchers was to use professional human assessment as a primary means to assess the MT quality. We wanted to know the human perceived translation quality of SDL Neural MT, and understand how it compared to the human perceived translation quality of an actual human translation. SDL builds custom MT engines for production use on a regular basis, and has developed an accurate and reliable evaluation methodology to assess the quality of MT output that minimizes human bias.

A team of professional Russian to English translators was shown a blind, randomized set of translations drawn from the following systems, with no indication of which system produced which output:
  • A human translation of the test set by a professional translator
  • State-of-the-art Rule-Based MT output
  • State-of-the-art Statistical MT output
  • SDL Neural MT output
The evaluation team was shown the source Russian sentence and the blind translation output, and was asked to score each output on a scale with “perfect translation” at one end and “completely wrong” at the other. For each system, the percentage of translations scored as high as the human translation was computed. The final results are listed below.
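The final computation described above is straightforward to sketch. All system names and scores below are hypothetical, invented only to show the arithmetic; they are not SDL's data:

```python
# Hypothetical blind ratings on a 1 ("completely wrong") to 5
# ("perfect translation") scale, one score per test sentence.
ratings = {
    "human":  [5, 5, 4, 5, 5],
    "rbmt":   [2, 3, 2, 3, 2],
    "smt":    [3, 3, 4, 3, 3],
    "neural": [5, 4, 4, 5, 5],
}

def pct_at_human_level(system_scores, human_scores):
    """Percentage of sentences rated at least as high as the human translation."""
    hits = sum(s >= h for s, h in zip(system_scores, human_scores))
    return 100.0 * hits / len(human_scores)

for name, scores in ratings.items():
    print(f"{name:7s} {pct_at_human_level(scores, ratings['human']):5.1f}%")
```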
 

The SDL research shows that its Neural MT system outperformed all industry standards, setting a benchmark for Russian to English machine translation, with 95% of the system’s output labelled as equivalent to a human translation in terms of quality by professional Russian-English translators.

Left to Right: Amos Kariuki, Ling Tsou, Dragos Munteanu, Samad Echihabi, Quinn Lam, Wes Feely, and William Tambellini


 "With over fifteen years of research and innovation in machine translation, our scientists and engineers took up the challenge to bring Neural MT to the next level,” said Samad Echihabi, Head of Machine Learning R&D, SDL. “We have been evolving, optimizing and adapting our neural technology to deal with highly complex translation tasks such as Russian to English, with phenomenal results. A machine running SDL’s Neural MT technology can now produce translations of Russian text virtually indistinguishable from what Russian-English bilingual humans can produce.”

SDL’s latest Neural MT technology is optimized for both accuracy and fluency, and provides a powerful paradigm for dealing with morphologically rich and complex languages. While the focus of the SDL tests and measurements was the Russian to English system, the strategies deployed by the SDL team are expected to carry over to, and benefit, other complex and morphologically rich languages.

It is interesting to note that the best Russian-English SMT systems, even after 10+ years of research, were only marginally better than the best Russian-English RBMT systems, if at all. This points to the significant challenge presented by the Russian to English language combination, and explains why RBMT systems were preferred by many industrial users until quite recently. The new SDL Neural MT system is very likely to accelerate the transition away from them.


Why is Russian difficult for MT?

Russian has always been considered to be one of the most difficult languages in MT, mostly because it is very different linguistically from English. Russian differs from English significantly in inflection, morphology, word order and gender associations with nouns.

Inflection

Unlike English, Russian is a highly inflected language. Suffixes on nouns mark 6 distinct cases, which determine the role of the noun in the sentence (whether it’s the subject, the direct object, the indirect object, something being possessed, something used as an instrument, or the object of a preposition). For example, all of these are different forms of the word “book”:

              singular                          plural
nominative    книга (kniga)                     книги (knigi)
genitive      книги (knigi)                     книг (knig)
dative        книге (knige)                     книгам (knigam)
accusative    книгу (knigu)                     книги (knigi)
instrumental  книгой, книгою (knigoj, knigoju)  книгами (knigami)
prepositional книге (knige)                     книгах (knigax)

That’s 12 forms of the same word, which are used depending on what role the word is playing in the sentence. “But they’re not all distinct; you can have the same form for different roles, like the singular genitive & the plural nominative,” says Wes Feely, Senior Computational Linguist, SDL.

Additionally, like Spanish or French every noun has a gender. The word for “book” is feminine, but this is an arbitrary categorization. “There’s no reason why a book (книга kníga) is feminine and why a table (стол stól) is masculine,” explains Wes. “But it matters because the case suffixes are different for each gender (masculine, feminine, or neuter). So while there are 12 different forms of the word “book” and 12 different forms of the word “table”, they don’t share the same set of suffixes. When adjectives modify nouns, they need to agree with the noun, taking the same (or similar) suffix.”

Also, like Spanish or French, verbs conjugate depending on tense (past vs. non-past), person (I vs. you vs. he/she/it), number (singular vs. plural), etc. So one verb may have several different forms, as well.

Word order

In English, we use word order to accomplish the same thing as the suffixes on nouns in Russian. Because Russian has these case markings, their word order is much more free. For example, these are all acceptable ways of saying “I went to the shop”:

Я пошёл в магазин. (ya poshol v magazin)
Я в магазин пошёл. (ya v magazin poshol)
Пошёл я в магазин. (poshol ya v magazin)
Пошёл в магазин я. (poshol v magazin ya)
В магазин я пошёл. (v magazin ya poshol)
В магазин пошёл я. (v magazin poshol ya)

Sample Output from SDL Russian to English Neural MT system

Essentially, all orderings are possible, except that the preposition “to” (в v) must precede the word for “shop” (магазин magazin). You can imagine that as sentences get longer, the number of possible orderings increases. There are some limits on this: some orders in this example sound strange or archaic, and others are used only to emphasize where you’re going or who is going. But there are certainly more ways of saying the same thing than in English, which is stricter in its word order.

SDL’s latest Neural MT technology is able to deal with all the Russian language challenges described above and can produce fluent and accurate translations. Below are some examples from the new SDL Neural MT Russian-English system.

Russian: До Уистлера, расположенного в провинции Британская Колумбия, можно быстро добраться от Ванкувера на автомобиле или самолете.
SDL Neural MT output: Whistler, located in British Columbia, is easily accessible from Vancouver by car or plane.
Human translation: Whistler, British Columbia, is quickly accessible from Vancouver by road or air.

Russian: Фестивали, спа, рестораны и бары сочетаются с бесконечными возможностями досуга на свежем воздухе, делая Уистлер идеальным местом, где вы можете отдохнуть и расслабиться.
SDL Neural MT output: Festivals, spa, restaurants and bars combine with endless outdoor activities, making Whistler the ideal place to relax and unwind.
Human translation: Festivals, spas, restaurants and bars combine with endless outdoor activities to make Whistler the ultimate place to escape and unwind.

Russian: Директор оперативного управления Международного комитета Красного Креста Доминик Стиллхарт сообщил в воскресенье на пресс-конференции в Сане, что с 27 апреля по 13 мая в стране умерло от холеры 115 человек.
SDL Neural MT output: The Director of Operations of the International Committee of the Red Cross, Dominic Stillhart, reported on Sunday at a press conference in Sana’a that 115 people died from cholera from April 27 to May 13.
Human translation: On Sunday, Dominik Stillhart, director of operations of the International Committee of the Red Cross, said during a press conference in Sana’a that 115 people died from cholera in the country between April 27 and May 13.

Russian: Рынок акций США в среду, вероятнее всего, начнет торговую сессию умеренным ростом на 0,4-0,5% в рамках коррекции к падению предыдущего дня, вызванному напряжением в торговых отношениях между США, Китаем и некоторыми другими странами.
SDL Neural MT output: The US stock market is likely to start its trading session on Wednesday 0.4-0.5% as part of an adjustment to the fall of the previous day caused by the tension in trade relations between the United States, China and some other countries.
Human translation: The US stock market on Wednesday, will most likely start a trade session with a moderate growth of 0.4-0.5% as part of the correction to the fall of the previous day caused by tension in trade relations between the USA, China and some other countries.

It is important to qualify that these results reflect only generic MT systems translating generic sentences such as those shown above. The SDL research noted that the generic Neural MT system did not perform as well on domain-specific data. As a supplier of MT solutions to the enterprise, SDL will typically adapt MT systems to the unique needs of each enterprise customer’s domain, an especially challenging task with Neural MT models. In other experiments with domain-adapted MT systems, the SDL research team noted further improvements in perceived quality, documenting that adaptation of the SDL Neural technology provided a 30% improvement over the generic neural engine on domain-specific data.

 

SDL Neural MT for the enterprise

The SDL Neural MT breakthrough comes soon after several other announcements related to the team’s ongoing research and development, and its substantial progress in taking Neural MT from a research environment to a deployable, enterprise-ready technology.

"It is remarkable to see such a leap in translation quality with SDL’s latest neural technology. We are currently working on transitioning this advancement from our R&D lab to our enterprise customers,” said Quinn Lam, Senior Product Manager, SDL. “Planned for release this summer, the latest version of SDL Enterprise Translation Server (ETS) will be powered by this fully productized state-of-the-art Neural technology. Stay tuned!”

Another key requirement for successful MT adoption in the enterprise is the ability to get the system to learn and adapt to enterprise-specific linguistic requirements and preferences. This has been especially challenging with Neural MT technology, where, until now, it has been difficult to do without undermining fluency and output quality. SDL researchers recently figured out how to augment Neural MT with dictionary capabilities. This means enterprises can easily adapt SDL Neural MT across multiple departments that have differing terminology, yet still maintain the translation fluency for which this latest generation of MT technology is acclaimed.

SDL’s latest dictionary feature sets a new industry standard for user control over automated translations, allowing users across the enterprise to use different dictionaries without impacting the quality of the translations. SDL ETS Dictionary capabilities include:
  • Controls that allow an enterprise to enforce multiple terminology and translation preferences for the same word, something that is necessary for different departments who may have unique interpretations for the same word or term.
  • Easy implementation of preferred terminology and personalization by any user with no upfront technical knowledge or training required.
  • Deployment of multiple dictionaries in a single engine at the same time, allowing multiple departments with differing needs to optimize the MT engine differently.
  • Terminology preferences that can be changed and modified on an ongoing basis to accommodate changing business and communication priorities.
The SDL research team has also recently found new ways to address a major issue with Neural MT systems: system speed and throughput. Neural MT deployments can be expensive and slow, and any improvement in the efficiency of private deployments is especially valuable in the enterprise context.

Left to Right: Gonzalo Iglesias, Bill Byrne, Eva Hasler & Adrià De Gispert

SDL now has several other initiatives underway and will continue to introduce new features and capabilities emerging from its Neural MT research over the coming year, as it brings innovative research ideas from the lab to the production deployment arena. As Samad added, “While there is great excitement about Neural MT, it is clear that as we explore the science further, we already see signs that we will continue to make progress, and we look forward to bringing the most relevant innovations to market for our customers.”

Wednesday, June 20, 2018

Why Russian to English is difficult for Machine Translation

When we consider the history of machine translation, the science by which computers automatically translate from one human language to another, we see that much of the science starts with Russian. One of the earliest mentions of automated translation involves the Russian scientist Peter Troyanskii, who, even before computers were available, submitted a proposal that included both a bilingual dictionary and a method for dealing with grammatical roles between languages, based on the grammatical system of Esperanto.

The first set of proposals for computer-based machine translation was presented in 1949 by Warren Weaver, a researcher at the Rockefeller Foundation, in his now famous "Translation memorandum", in which he said: “it is very tempting to say that a book written in Russian is simply a book written in English which was coded into the Russian code.” These proposals were based on information theory, successes in code-breaking during the Second World War, and theories about the universal principles underlying natural language. But Weaver’s memo was not the only driver for this emerging field: what really kick-started research was Cold War fear and US analysts’ desire to easily read and translate Russian technical papers. Warren Weaver inspired the founders of Language Weaver to name their company after him in the early 2000s; the company was the first to commercialize and productize Statistical Machine Translation (SMT) and was the source of much of the subsequent innovation in SMT. Its alumni went on to start Google Translate and Moses and to influence Amazon’s MT/AI initiatives, and the company and its intellectual property are now owned by SDL Plc.


The original Georgetown experiment of 1954, which involved the successful fully automatic translation of more than sixty Russian sentences into English, was one of the earliest recorded MT projects. Researchers of the Georgetown experiment asserted their belief that machine translation would be a solved problem within three to five years. This claim of being able to solve the MT problem in five years has been a frequent refrain of the MT community, and almost seventy years later we see that MT remains a challenging problem. Recent advances with Neural MT are welcome and indeed significant, but MT remains one of the most challenging research areas in AI.


Why is MT such a difficult NLP problem?


As the results of 70 years of ongoing MT research efforts show, the machine translation problem is indeed one of the most difficult problems to solve in the Natural Language Processing (NLP) field. It is worth some consideration why this is so, as it explains why it has taken 70 years to get here, and why it may still take much more time to get to “always perfect” MT, even in these heady NMT breakthrough days.

It is perhaps useful to contrast MT with the automated speech recognition (ASR) challenge to illustrate the difficulty. Take a simple sentence like, “Today, we are pleased to announce a significant breakthrough with our ongoing MT research, especially as it pertains to Russian to English translations.” In the case of ASR, there is really only one correct answer: the computer either identifies the correct word or it does not, and even when it does not, one can often understand from the context and the other correctly predicted words.

Computers perform well when problems have binary outcomes, where things are either right or wrong, and they tend to solve these kinds of problems much more effectively than problems where the “answers” are much less clear. If we consider the sentence above as a translation problem, it is a very different computing challenge. Language is complex and varied, and the exact same thing can be said, and translated, in many different ways, all of which can be considered correct. Add the possibilities of slightly wrong or grossly wrong translations, and you can see there is a large range of permutational possibilities. The sentence in question has many possible correct translations, and herein lies the problem: computers do not really have a way to assess these variations other than through probability calculations and measures of statistical data density, which are almost always completely defined by the data you train on. If you train on a data set that does not contain every possible translation, you will have missed some possibilities, and the truth is that we NEVER train an engine on every possible acceptable translation.

Michael Housman is chief data science officer at RapportBoost.AI and a faculty member of Singularity University. He explained that the ideal scenario for machine learning and artificial intelligence is something with fixed rules and a clear-cut measure of success or failure. He named chess as an obvious example, and noted that machines were also able to beat the best human Go player faster than anyone anticipated, because of the game’s very clear rules and limited, definable set of moves.

Housman elaborated, “Language is almost the opposite of that. There aren’t as clearly-cut and defined rules. The conversation can go in an infinite number of different directions. And then, of course, you need labeled data. You need to tell the machine to do it right or wrong.”

Housman noted that it’s inherently difficult to assign these informative labels. “Two translators won’t even agree on whether it was translated properly or not,” he said. “Language is kind of the wild west, in terms of data.”

Erik Cambria, an academic AI researcher and assistant professor at Nanyang Technological University in Singapore, said, “The biggest issue with machine translation today is that we tend to go from the syntactic form of a sentence in the input language to the syntactic form of that sentence in the target language. That’s not what we humans do. We first decode the meaning of the sentence in the input language and then we encode that meaning into the target language.”

All these hindering factors will remain in effect for the foreseeable future, so we should not expect another big leap forward until we find huge masses of new, high-quality data, or achieve a new breakthrough in pattern detection methodology.

Why are some language combinations more difficult in MT?


In essence (grossly oversimplified), MT is a pattern detection and pattern matching technique where a computer is shown large volumes of clean equivalent sentences in two languages and it “learns” how to “translate” from analyzing these examples. NMT does this differently than SMT, but essentially they are both detecting patterns in the data they are shown, with NMT having a much deeper sense of what a pattern might be. This is why the quality and volume of the “training data” matters, as it defines the patterns that can be learned.

What we have seen over the last 70 years is that languages that are more similar tend to be easier to model (MT is a translation model). Thus it is much easier to build an MT system for Spanish <> Portuguese, because the two languages are very similar and have many equivalent linguistic structures. In contrast, English <> Japanese is much more challenging because there are big differences in linguistic characteristics, orthographic format (Japanese can be written in 3 scripts), morphology, grammar, word order, honorific structure and so on. And while English <> Japanese is difficult, it is much easier to build a model for Japanese <> Korean, since those two languages have much more structural and linguistic similarity.

Thus the basic cause of difficulty is the fundamental linguistic difference between the two languages: linguistic concepts that exist in one language do NOT exist in the other, so equivalencies are very difficult to formulate and model. A recent research paper describes language modeling difficulties using the Europarl corpus, whose existence allows many comparative research experiments. A key finding of this study was that inflectional morphology is a big factor in difficulty: in the chart below we see that DE, HU, and FI are more difficult for this reason, while SV and DA are easier because they are more similar to English.

A prior study based on the Europarl data has been a key reference for which combinations are easy or difficult to model, with roughly equivalent datasets used to build the comparative models.

What this chart shows is that the easiest translation direction is Spanish to French (BLEU score of 40.2), the hardest Dutch to Finnish (10.3).


This shows that having much more Finnish data does not help raise output quality because the linguistic differences are much more significant. The Romance languages outperform many other language combinations with significantly less data.


Why Russian is especially difficult


Russian has always been considered to be one of the most difficult languages in MT, mostly because it is very different linguistically from English. Early NMT attempts were unable to outperform old RBMT models, and SMT models, in general, were rarely able to consistently beat the best RBMT models.  

Russian differs from English significantly in inflection, morphology, word order and gender associations with nouns.

Inflection

Unlike English, Russian is a highly inflected language. Suffixes on nouns mark 6 distinct cases, which determine the role of the noun in the sentence (whether it's the subject, the direct object, the indirect object, something being possessed, something used as an instrument, or the object of a preposition). For example, all of these are different forms of the word "book":

              singular                          plural
nominative    книга (kniga)                     книги (knigi)
genitive      книги (knigi)                     книг (knig)
dative        книге (knige)                     книгам (knigam)
accusative    книгу (knigu)                     книги (knigi)
instrumental  книгой, книгою (knigoj, knigoju)  книгами (knigami)
prepositional книге (knige)                     книгах (knigax)

(from Wiktionary https://en.wiktionary.org/wiki/%D0%BA%D0%BD%D0%B8%D0%B3%D0%B0#Russian)

That's 12 forms of the same word, which are used depending on what role the word is playing in the sentence. But they're not all distinct; you can have the same form for different roles, like the singular genitive & the plural nominative.
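The paradigm above also illustrates, in miniature, why inflection makes Russian hard for data-driven MT: a word-level model sees every surface string as a separate vocabulary item, fragmenting the training statistics for what is really one lemma. (This vocabulary explosion is one reason modern neural systems typically operate on subword units.) A quick check, using книгой for the instrumental singular (книгою is a variant form):

```python
# The 12 paradigm cells of "book" (книга) from the table above,
# in case order: nom, gen, dat, acc, ins, prep.
singular = ["книга", "книги", "книге", "книгу", "книгой", "книге"]
plural   = ["книги", "книг", "книгам", "книги", "книгами", "книгах"]

cells = singular + plural
# Some cells coincide (e.g. singular genitive and plural nominative),
# so there are fewer distinct strings than cells.
print(f"{len(cells)} paradigm cells, {len(set(cells))} distinct strings, 1 lemma")
```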

Additionally, like Spanish or French every noun has a gender. The word for "book" is feminine, but this is an arbitrary categorization; there's no reason why a book (книга kníga) is feminine and why a table (стол stól) is masculine. But it matters because the case suffixes are different for each gender (masculine, feminine, or neuter). So while there are 12 different forms of the word "book" and 12 different forms of the word "table", they don't share the same set of suffixes. When adjectives modify nouns, they need to agree with the noun, taking the same (or similar) suffix.

Also, like Spanish or French, verbs conjugate depending on tense (past vs. non-past), person (I vs. you vs. he/she/it), number (singular vs. plural), etc. So one verb may have several different forms, as well.

Word order


In English, we use word order to accomplish the same thing as the suffixes on nouns in Russian. Because Russian has these case markings, their word order is much more free. For example, these are all acceptable ways of saying "I went to the shop":

Я пошёл в магазин. (ya poshol v magazin)
Я в магазин пошёл. (ya v magazin poshol)
Пошёл я в магазин. (poshol ya v magazin)
Пошёл в магазин я. (poshol v magazin ya)
В магазин я пошёл. (v magazin ya poshol)
В магазин пошёл я. (v magazin poshol ya)

я ya = I
пошёл poshol = went
в v = to
магазин magazin = shop

Essentially, all orderings are possible, except that the preposition "to" (в v) must precede the word for "shop" (магазин magazin). You can imagine that as sentences get longer, the number of possible orderings increases. There are some limits on this: some orders in this example are dispreferred, sounding strange or archaic, and others are used only to emphasize where you're going or who is going. But there are certainly more ways of saying the same thing than in English, which is stricter in its word order.
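The combinatorics described above can be checked mechanically. The sketch below enumerates all permutations of the four words and keeps only those satisfying the one hard constraint stated in the text, that в immediately precede магазин; exactly the six orderings listed survive:

```python
from itertools import permutations

words = ["я", "пошёл", "в", "магазин"]

def valid(order):
    """The preposition 'в' must immediately precede 'магазин'."""
    i = order.index("в")
    return i + 1 < len(order) and order[i + 1] == "магазин"

orders = [" ".join(p) for p in permutations(words) if valid(p)]
print(len(orders), "of 24 permutations satisfy the constraint")
for o in orders:
    print(o)
```

With each additional content word the count of constrained orderings grows factorially, which hints at why free word order inflates the space of patterns an MT system must learn.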


Difficult languages, in general, demand more of the skill of the MT system developer. They are not advisable for the Moses or OpenNMT hacker who wants to see how his data might perform with open-source magic, and generally most of these naive practitioners will stay away from these languages.

There are special challenges for an MT system developer who builds Russian <> English MT systems, e.g.
  •  MT needs to pay more attention to Russian word inflections than to the order of the words, to know where to put the word in the English translation
  • MT needs to be flexible enough to translate a familiar Russian source sentence that appears in an unfamiliar word order
Thus, Russian to English is amongst the most difficult MT combinations one could attempt and only the most competent and skilled MT system developers would be able to build systems that produce output quality that is judged by professional human translators as “human equivalent”.