Tuesday, January 31, 2017

The Driving Forces Behind MT Technology

This is a modified version of a post that was originally published on Caroline Alberoni's blog.


Machine translation (MT) today is as pervasive and ubiquitous as mobile phone technology. While some translators still feel threatened by the technology, or feel the need to disparage it for its less-than-perfect translations, it is useful to understand why it is so widely used. At its annual developer conference in April 2016, Google announced that it translates over 140 billion words a day across 100 languages. Baidu Translate handles 27 languages, a number that is still growing, and processes around 100 million requests every day. Most of this use comes from casual internet users who may be interested in translating a news story or some simple phrases. However, there is a growing impact on the professional translation business as well.

If you add the translation volumes of Microsoft, Baidu, Yandex, and other MT providers, we can certainly expect that more than 500 billion words a day are translated by computers. That is probably more than 95%, and perhaps as much as 99%, of all the translation done on the planet on a daily basis.

As Peter Brantley at Berkeley states in a personal blog:
"Mass machine translation is not a translation of a work, per se, but it is rather, a liberation of the constraints of language in the discovery of knowledge."

The need for translation of business content and other kinds of high-value information on the internet continues to grow, but the increasing use of MT also causes changes that affect translators and agencies alike. The most interesting translation work is increasingly moving beyond the focus of traditional translation work and is likely to do so even more in the future. Thus, the most lucrative and interesting new business translation opportunities, like those at eBay for example, may require very different kinds of skills and competence but will still draw on traditional translation and linguistic competence. Translators and linguists today are often required to be "word corpus analysts" and are increasingly involved in projects to steer MT technology to produce better results.
The professional use of MT is increasingly valid for all of the following:
  • Highly repetitive content where productivity gains with MT can dramatically exceed what is possible with just using TM alone
  • Content that would just not get translated otherwise
  • Content that cannot afford human translation
  • High-value content that is changing every hour and every day
  • Knowledge content that facilitates and enhances the global spread of critical knowledge
  • Content that is created to enhance and accelerate communication with global customers who prefer a self-service model
  • Content that does not need to be perfect but just approximately understandable
The forces that drive the increasing use of MT in the world are largely beyond the control of the professional "translation industry." They continue to build unabated and can be briefly listed as follows:
  • The Explosion of Content Creation: The sheer volume of content that global enterprises, entertainment agencies, educational establishments, governmental agencies and any international commercial venture need to translate continues to grow by the minute. The amount of digital information increases tenfold every five years! In fact, it can even be said that we live in an age where more information is being created annually than has existed in the 500 years prior.
  • The Changing Content Value Equation: While historically corporate marketing communications had a great degree of control, today most consumers distrust this kind of messaging and would rather trust the shared opinions of fellow consumers. Business content increasingly has a very short shelf-life, and thus traditional (slow and expensive) TEP (translate-edit-proof) approaches are increasingly questioned for information that may have little or no value after six months. In fact, the fastest-growing type of content is user-generated content (UGC), found in blogs, Facebook, YouTube, Twitter, and community forums. IDC estimates that 70% of the content on the web is UGC, and much of that is very pertinent and useful to enterprises. This content is now influencing consumer behavior all over the world and is often referred to as word-of-mouth marketing (WOMM). Consumer reviews are often more trusted than "corporate marketing-speak" and even "expert" reviews, which are often funded by the same corporations. We have all experienced Amazon, travel sites, CNET, and other user rating sites. It is useful for both global consumers and global enterprises to make this content multilingual. Given the speed at which this information emerges, MT has to be part of the translation solution because of the volume and sheer rate of creation of this type of content.

How much data is generated every minute? 

A case in point: TripAdvisor, the world's largest travel review platform, receives more than 315 million unique monthly visitors on its website, many of whom contribute reviews. The combined weight of these reviews is considerable, and it influences consumers' final purchase selections to a very great extent. Having translations available online in multiple languages to support a purchase decision greatly enhances the possibility of a global consumer executing a transaction on the site.

Another very descriptive example by Juan Rowda at eBay:

"There are currently more than 800 million listings on eBay (over 1 Billion as of this writing). Considering that each listing has around 300 words, how long do you think it would take any given number of linguists to translate these listings? Did I mention that some of the listings may only be online for a day or a week and that the inventory changes continuously?

So, don’t even pull out your calculator. The answer is simple – human translation is not viable. However, if you really want to know, we estimate it would take 1,000 translators 5 years to translate only the 60 million listings eligible for Russia! For (these) listings, machine translation is clearly a much better fit in this scenario. "
  • Short Product Life and Development Cycles: Product life cycles in electronics, fashion, and many other consumer products get shorter all the time, so rapid, "good enough" product descriptions are increasingly considered sufficient for business requirements. The historical translation quality assurance cycles practiced in the '80s and '90s are not viable today, as they simply cannot keep pace with the rate of new product introduction.
  • Continuously Increasing Volume & Managed Cost Pressures: Enterprises are under continuous pressure to translate more content with the same budgets, and thus they seek out translation agencies who understand how to do this with rapid turnaround. Competent use of MT is a critical element of redefining the cost-time-volume equation for translating ever growing volumes of relevant business content especially given the extremely transient nature of a lot of this information.
  • Changing Internet User Base: As more of the developing world comes online it becomes imperative for these new users to have MT to be able to get some basic understanding of existing web content, especially knowledge content. The need is clear not only to global eCommerce sites but also to many local government agencies around the world who need to provide basic health and justice information and services to a growing population of immigrants who may not speak the dominant local language.
  • Widespread Acceptance of Free Generic Machine Translation: The universal availability and widespread use and acceptance of "free MT" on the internet have raised acceptance of MT in executive management circles too. This also drives the momentum for large new types of projects that would never have been considered in the TEP translation world. The fact that 500+ billion words a day are being translated by MT is a clear indication that it delivers some value to hundreds of millions of internet users. As MT quality continues to improve, albeit slowly, it puts further downward pressure on the price of translation work. It can also be said that for many languages MT has become an aid for translators, as it can function as a dictionary, terminology, or phrase lookup system.
Thus it is safe to presume that MT is going to be a fact of life for many professional translators in the 21st century. What new skills, then, will a translator need to understand and be considered a valued partner in a world where MT deployments and "opportunities" continue to abound?

MT today has already proven itself in professional use scenarios with many Romance languages, but we are still at a transition point in the use of MT in many other language combinations. The MT experience can thus often be less than satisfying for translators in those other languages, especially when working with translation agencies that are not technically competent with MT.

The New Skills in Demand

At a high level, the skills that matter in the professional use of MT, and that we can expect to grow in value to global enterprises and agencies involved in large MT projects, are as follows:
  • Understand the different kinds of MT systems that you may interface with. Translators who understand the different kinds of MT are likely to be much more marketable.
  • Understand the specific output quality of the MT engines that you are working with, and provide articulate linguistic feedback on MT output. Being able to provide articulate feedback on error patterns is perhaps one of the most sought-after skills in professional MT deployment today. This ability to assess the quality of MT output is also beneficial to a freelancer who is trying to decide whether or not to work on a PEMT project.
  • Develop skills with new kinds of tools that are valuable in dealing with corpus level tasks and manipulations. It is much more likely that MT projects will involve much larger volumes of data and data preparation and global pattern modification skills become much more useful and valuable.
  • Develop skills in providing pattern-level feedback and in rapid error pattern identification and correction. Being able to devise rapidly implementable test and evaluation routines that are useful and effective is an urgent market requirement. This paper summarizes the specific linguistic issues with Brazilian Portuguese that provide an idea of what this actually means.
  • Develop a corpus view that involves linguistic steering rather than segment level corrections. This is a fundamental change of mental perspective that is a mandatory requirement for successful professional involvement with MT. Understanding the competence of the translation agencies that you engage with is also a key requirement as it is VERY easy to mismanage an MT project and most translation agencies that attempt to build MT engines on their own are quite likely to be incompetent.

What can a translator do?

  1. Learn and educate yourself on the variants of MT.
  2. Experiment with major public engines from Google, Systran, and Bing and with specialist tools like Lilt, SDL Adaptive MT and SmartCAT that allow easy interaction with MT.
  3. Understand how to rapidly assess MT output quality BEFORE you engage in any MT project.
  4. Don’t work with incompetent translation agencies who know little or nothing about MT but only seek to reduce rates with crappy do-it-yourself engines.
  5. Experiment with corpus management tools.
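Point 3 above, rapidly assessing MT output quality, can be approximated with a few lines of code before committing to a project. The sketch below (the sample segments are hypothetical, invented for illustration) estimates post-edit effort by measuring how far raw MT output is from a translation a reviewer would accept, using Python's standard difflib:

```python
import difflib

def post_edit_distance(mt_output: str, reference: str) -> float:
    """Rough post-edit effort: 1 minus the similarity ratio between raw MT
    and an acceptable translation (0.0 = identical, 1.0 = full rewrite)."""
    return 1.0 - difflib.SequenceMatcher(None, mt_output, reference).ratio()

# Hypothetical sample: raw MT segments paired with acceptable translations.
samples = [
    ("The engine are started automatic.", "The engine starts automatically."),
    ("Press the button to begin.",        "Press the button to begin."),
]

scores = [post_edit_distance(mt, ref) for mt, ref in samples]
print(f"Average estimated post-edit effort: {sum(scores) / len(scores):.2f}")
```

Running something like this over a few dozen sampled segments gives a rough, engine-specific sense of how much editing the output will actually require before agreeing to a PEMT rate.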

While it is quite possible that MT will never be good enough for the translation of literary work and poetry, where linguistic finesse and deep semantic insight are essential, it is clear in 2017 that MT has a definite role in making much more information multilingual in the global enterprise and in international communication. MT technology has evolved over the years and is now beginning to use a new development methodology based on neural networks similar to those formed in human brains. Early results of this Neural Machine Translation are clearly better than the current technology. We are in a period of inflated expectations about what is possible, but there is reason for optimism, and I think we should expect MT to become even more universal and widely used in the years to come.

Thursday, January 26, 2017

An Examination of the Strengths and Weaknesses of Neural Machine Translation

As Neural MT gains momentum, we see more studies that explain why it is being seen as a major step forward, and we are now beginning to understand some of the very specific reasons for this momentum. This summary by Antonio Toral and Víctor M. Sánchez-Cartagena highlights how NMT provides some specific advantages, using well-understood data and comparative systems. Their main findings are presented below, but I saw an additional comment in the paper that I am also including here. The paper also provides BLEU scores for all the systems that were used, which are consistent with the scores shown here; it is interesting that Russian remains a language where rule-based systems produce the highest scores in tests like this. The fact that NMT systems perform so well on translations going out of English should be especially interesting to the localization industry. Now we need some evidence of how NMT systems can be domain-adapted, and SYSTRAN will soon provide some details.

The fact that NMT systems do not do well on very long sentences can be managed by making these sentences shorter. I tend to write really long sentences myself, but 40-45 words in a sentence seems really long to me, and in a localization setting I think this can be managed.
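As a rough illustration of how a localization workflow might pre-empt the long-sentence problem, the sketch below splits over-long source sentences at clause boundaries before they reach the engine. The 40-word threshold and the comma/semicolon heuristic are assumptions for illustration, not anything from the paper; a human pre-editor would do a better job:

```python
import re

MAX_WORDS = 40  # rough threshold beyond which NMT quality tends to degrade

def shorten(sentence: str, max_words: int = MAX_WORDS) -> list[str]:
    """Split an over-long sentence at clause boundaries (commas, semicolons)
    so each chunk stays under the word threshold. A crude pre-editing
    heuristic; a single clause longer than the threshold is left intact."""
    if len(sentence.split()) <= max_words:
        return [sentence]
    chunks, current = [], []
    for clause in re.split(r"(?<=[,;])\s+", sentence):
        if current and len(" ".join(current + [clause]).split()) > max_words:
            chunks.append(" ".join(current))
            current = [clause]
        else:
            current.append(clause)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each returned chunk can then be sent to the engine as its own segment, trading a little discourse context for per-sentence translation quality.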

"The best NMT system clearly outperforms the best PBMT system for all language directions out of English (relative improvements range from 5.5% for EN > RO to 17.6% for EN > FI) and the human evaluation (Bojar et al., 2016, Sec. 3.4) confirms these results. In the opposite direction, the human evaluation shows that the best NMT system outperforms the best PBMT system for all language directions except when the source language is Russian." 


[Table: BLEU scores of the best NMT and PBMT systems for each language pair at WMT16's news translation task, shown separately for directions from English ("From EN") and into English ("Into EN").]




A case study on 9 language directions


In a paper that will be presented at EACL in April 2017, we aim to shed light on the strengths and weaknesses of the newly introduced neural machine translation (NMT) paradigm. To do so we compare the translations produced by the best neural and phrase-based MT systems submitted to the news translation task at WMT16 for 9 language directions across a number of dimensions. The main findings are as follows:
  • Translations produced by NMT are considerably different from those produced by phrase-based systems. In addition, there is higher inter-system variability in NMT, i.e. outputs from pairs of NMT systems differ more from each other than outputs from pairs of phrase-based systems.
  • NMT outputs are more fluent. We corroborate the results of the manual evaluation of fluency at WMT16, which was conducted only for language directions into English, and we show evidence that this finding is true also for directions out of English.

  • NMT systems do more reordering than pure phrase-based ones but less than hierarchical systems. However, NMT reorderings are better than those of both types of phrase-based systems.

  • NMT performs better in terms of inflection and reordering. We confirm that the findings of Bentivogli et al. (2016) apply to a wide range of language directions. Differences regarding lexical errors are negligible. A summary of these findings can be seen in the next figure, which shows the reduction of error percentages by NMT over PBMT. The percentages shown are the averages over the 9 language directions covered.

Reduction of errors by NMT averaged over the 9 language directions covered 

  • NMT performs rather poorly for long sentences. This can be observed in the following figure, where we plot the translation quality obtained by NMT and by phrase-based MT for sentences of different length. Translation quality is measured with chrF, an automatic evaluation metric that operates at the character level. We use it as it has been shown at WMT that it correlates better than BLEU with human judgments for morphologically rich languages (e.g. Finnish), while its correlation is on par with BLEU for languages with poorer morphology, e.g. English. That said, while we only show results based on chrF in the paper, we computed the experiment with BLEU too and the trends are the same, namely NMT degrades with sentence length.  

    Quality of NMT and PBMT for sentences of different length
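For readers curious what chrF actually computes, here is a simplified sketch of the metric: character n-gram precision and recall combined into an F-score, with recall weighted more heavily (beta = 2, as in the original metric). Production implementations such as sacrebleu add smoothing and optional word n-grams; this version, which simply ignores whitespace, is only illustrative:

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    """Character n-grams of the text with whitespace removed."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: average F-score over character n-gram orders 1..max_n,
    with recall weighted beta times more than precision."""
    f_scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # text too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        prec = overlap / sum(hyp.values())
        rec = overlap / sum(ref.values())
        if prec + rec == 0:
            f_scores.append(0.0)
            continue
        f_scores.append((1 + beta**2) * prec * rec / (beta**2 * prec + rec))
    return sum(f_scores) / len(f_scores) if f_scores else 0.0
```

Because it works at the character level, chrF gives partial credit for near-miss inflections (e.g. a correct stem with a wrong ending), which is exactly why it correlates better with human judgments for morphologically rich languages like Finnish.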

Antonio Toral and Víctor M. Sánchez-Cartagena. 2017. A Multifaceted Evaluation of Neural versus Phrase-Based Machine Translation for 9 Language Directions. arXiv, GitHub

Antonio Toral
Antonio Toral is an assistant professor in Language Technology at the University of Groningen and was previously a research fellow in Machine Translation at Dublin City University. He has over 10 years of research experience in academia, is the author of over 90 peer-reviewed publications, and is the coordinator of Abu-MaTran, a 4-year project funded by the European Commission.

Víctor M. Sánchez-Cartagena

Víctor M. Sánchez-Cartagena is a research engineer at Prompsit Language Engineering, where he develops solutions based on automatic inference of linguistic resources, and where he also worked on increasing the low industrial adoption of machine translation through Abu-MaTran, a 4-year project funded by the European Commission. He obtained his Ph.D. from the University of Alicante in 2015 upon completion of his dissertation "Building machine translation systems from scarce resources". He was a predoctoral researcher at the University of Alicante for four years prior to that.

Monday, January 23, 2017

Finding the Needle in the Digital Multilingual Haystack

There are some kinds of translation applications where MT just makes sense. Usually, this is because these applications have some combination of the following factors: 
  • Very large volume of source content that could NOT be translated without MT
  • Rapid turnaround requirement (days)
  • Tolerance for lower quality translations at least in early stages of information review
  • To enable triage requirements and help to identify highest priority content from a large mass of undifferentiated content
  • Cost prohibitions (usually related to volume)
This is a guest post by Pete Afrasiabi of iQwest Information Technologies that goes into some detail on the strategies employed to use MT effectively in a business application area sometimes called eDiscovery (often litigation related), though in a broader sense it could be any application where it is useful to sort through a large amount of multilingual content to find high-value content. In today's world, we are seeing a lot more litigation involving large volumes of multilingual documents, especially in cases that involve patent infringement and product liability. MT serves a very valuable purpose in these scenarios: it enables some degree of information triage. When Apple sues Samsung for patent infringement, it is possible that tens of thousands of documents and emails are made available by Samsung (in Korean) for review by Apple attorneys. It is NOT POSSIBLE to translate them all through traditional means, so MT, or some other volume reduction process, must be used to identify the documents that matter. Because these use-cases are often present in litigation, it is generally considered risky to use the public MT engines, and most prefer to work within a more controlled environment. I think this is an application area that the MT vendors could service much more effectively by working more closely with expert users like the guest author.


Whether you manage your organization's eDiscovery needs, are a litigator working with multi-national corporations, or are a compliance officer, you commonly work with multilingual document collections. If you are an executive who needs to know everything about your organization, you need a triage strategy that helps you get the right information ASAP. If the document count is over 50-100k, you typically employ native-speaking reviewers to perform a linear, one-by-one review of documents, utilize various search mechanisms to help you in this endeavor, or both. What you may find is that most documents being reviewed by these expensive reviewers are irrelevant, or require an expert to review. If the population includes documents in 3 or more languages, the task becomes even more difficult!

There is a better solution, one that, if used wisely, can benefit your organization and save time, money, and a huge amount of headache. I am proposing that in these document populations the first thing you need to do is eliminate non-relevant documents, and if they are in a foreign language, you need to see an accurate translation of each document. In this article, you will learn in detail how to improve the quality of these translations using machines, at a cost hundreds of times less than human translation and naturally much faster.

With the advent of new machine translation technologies comes the challenge of proving their efficacy in various industries. Historically, MT has been looked at not only as inferior but as something to avoid. Unfortunately, the stigma that comes with this technology is not necessarily far from the truth. Adding to that, the incorrect methods various vendors have used to present its capabilities have hindered its active use across most industries. The general feeling is "if we can human translate them, why should we use an inferior method," and that is true for the most part, except that human translation is very expensive, especially when the subject matter is more than a few hundred documents. So is there really a compromise? Is there a point where we can rely on MT to complement existing human translations?

The goal of this article is to look under the hood of these technologies and provide a defensible argument for how MT can be supercharged with human translations. Human beings' innate ability to analyze content provides an opportunity to aid some of these machine learning technologies. Transferring that human analytical information into a training model for these technologies can produce dramatically improved translation results.

Machine translation technologies are based on dictionaries, translation memories, and rules-based grammar that differs from one software solution to another. Newer technologies that utilize statistical analysis and mathematical algorithms to construct these rules have been available for the past several years, but unfortunately, individuals who have the core competencies to utilize them are few and far between. On top of that, these software solutions are not by themselves the whole solution; they are just part of a larger process that entails understanding language translation and how to utilize various aspects of each language and the features of each software solution.

I have personally witnessed most if not all the various technologies utilized in MT and about 5 years ago, developed a methodology that has proven itself in real life situations as well. Here is a link to a case study on a regulatory matter that I worked on.

If followed correctly, these instructions can turn machine-translated documents into documents with minimal post-editing requirements, at a cost hundreds of times less than human translation. They will also read more like their human-translated counterparts, with proper sentence flow and grammatical accuracy far beyond raw machine-translated documents. I have referred to this methodology as "Enhanced Machine Translation": still not human translation, but much improved from where we have been until now.

Language Characteristics

To understand the nuances of language translation we first must standardize our understanding of the simplest components within most if not all languages. I have provided a summary of what this may look like below.
  • Definition
    • Standard/Expanded Dictionaries
  • Meaning
    • Dimensions of a words definition in Context
  • Attributes
    • Stereotypical description of characteristics
  • Relations
    • Between concepts, attributes and definitions
  • Linguistics
    • Part of Speech / Grammar Rules
  • Context
    • Common understanding based on existing document examples

Simply accepting that this base of understanding is common amongst most, if not all languages is important, since the model we will build on makes assumptions that these building blocks will provide a solid foundation for any solution that we propose.

Furthermore, familiarity with various classes of technologies available is also important, with a clear understanding of each technology solution’s pros and cons. I have included a basic summary below.
  • Basic (Linear) rule based Tools
  • Online Tools (Google, Microsoft, etc.)
  • Statistical Tools
  • Tools combining the best of both worlds of rules-based and statistical analysis

Linear Dictionaries & Translation Memories

  • Ability to understand the form of word (noun, verb, etc.) in a dictionary
  • One to one relationship between words/phrases in translation memories
  • Fully customizable based on language
  • Inability to formulate correct sentence structure
  • Ambiguous results, often not understandable
  • Usually a waste of resources in most use cases if relied on exclusively

Statistical Machine Translation

  • Ability to understand co-occurrence of words and building an algorithm to use as reference
  • Capable of comparing sentence structures based on examples given and further building on the algorithm
  • Can be designed to be case-centric
  • Words are not numbers
  • No understanding of form of words
  • Results could be similar to some concept searching tools that often fall off the cliff if relied on too much
Now that we understand what is available, it is crucial to build a model and process that takes advantage of the benefits of the various technologies while minimizing their disadvantages. In order to enhance any and all of these solutions' capabilities, it is important to understand that machines and machine learning by themselves cannot be the only mechanism we build our processes on. This is where human translations come into the picture. If there were some way to utilize the natural ability of human translators to analyze content and build out a foundation for our solutions, would we be able to improve on the resulting translations? The answer is a resounding yes!

BabelQwest: A combination of tools designed to assist in enhancing the quality of MT

To understand how we would accomplish this, we need to review some of the machine based concept analysis terminologies first. In a nutshell, these definitions and solutions are what we have actually based our solutions on. I have made reference to some of the most important of these definitions below. I have also enhanced these definitions with how as linguists and technologists we will utilize them in building out the “Enhanced Machine Translation” (EMT for short) solutions.
  • Classification: Gather a select representative set of the documents from the existing document corpus that represent the majority of subject matters to be analyzed
  • Clustering: Build out documents selected in the classification stage to find similar documents that match the cluster definitions and algorithms of the representative documents
  • Summarization: Select key sections of these documents as keywords, phrases, and summaries
  • N-Grams: N-Grams are the basic co-occurrences of multiple words within any context. We build these N-Grams from the summarization stage and create a spreadsheet depicting each N-Gram and its raw machine-translated counterpart. The spreadsheet is built into a voting worksheet that allows human translators to analyze each line and provide feedback on the correct translations, and even on whether certain captured N-Grams should be part of the final training seed data. This seed data will fine-tune the algorithms built out in the next stage, down to the context level and with human input. A basic depiction of this spreadsheet is shown below.

Voting Mechanism

iQwest Information Technologies Sample Translation Native Reviewer Suggestion Table

Japanese → English (raw MT)

アナログ・デバイス → Analog Devices
デバイスの種類によりスティック品 → Stick with the type of product devices
トレイ品として包装・ → as the product packaging tray
新たに納入仕様書 → the new technical specifications
共通仕様書 → Common Specifications (reviewer suggestion: Common Parameters)
で新梱包方法を提出してもらうことになった → have had to submit to the new packing method

  • Simultaneously human translate the source documents that generated these N-Grams. The human translation stage will build out a number of document pairs with the original content in the original language in one document and the human translated English version in another document. These will be imported into a statistical and analytical model to build the basic algorithms. By incorporating these human-translated documents into the statistical translation engine training, the engine will discover word co-occurrences and their relations to the sentences they appear in as well as discovering variations of terms as they appear in different sentences. They will be further fine-tuned with the results of the N-Gram extraction and translation performed by human translators.
  • Define and/or extract key names and titles of key individuals. This stage is crucial and usually the simplest information to gather since most if not all parties involved already have references in email addresses, company org charts, etc. that can be gathered easily.
  • Start training process of translation engines from the results of the steps above (multilevel and conditioned on volume and type of documents)
  • Once a basic training model has been built, we machine translate the original representative documents and compare them with their human-translated counterparts. This stage can be accomplished with fewer than one hundred documents to prove the efficacy of the process. This is why we refer to this stage as the "Pilot" stage.
  • Repeat the same steps with a larger subset of documents to build a larger training model and to prove the overall process is fruitful and can be utilized to machine translate the entire document corpus. We refer to this stage as the “Proof of Concept” stage and it is the final stage. We would then start staging the entirety of the documents subject to this process in a “Batch Process” stage.
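The N-Gram voting worksheet described in the steps above can be sketched in a few lines. The code below (all names and the workflow details are illustrative, not iQwest's actual tooling) extracts frequent word N-Grams from representative summaries and writes a CSV with columns for the raw MT output plus empty columns for the reviewer's corrected translation and keep/drop vote; the `translate` callable stands in for whatever MT engine is in use:

```python
import csv
from collections import Counter

def word_ngrams(tokens: list[str], n_min: int = 2, n_max: int = 4) -> Counter:
    """Count co-occurring word sequences (N-grams) of length n_min..n_max."""
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams[" ".join(tokens[i:i + n])] += 1
    return grams

def build_voting_worksheet(summaries: list[str], translate, path: str,
                           min_count: int = 2) -> None:
    """Write a CSV pairing frequent N-grams with raw MT output, plus empty
    columns for the reviewer's corrected translation and keep/drop vote."""
    grams = Counter()
    for text in summaries:
        grams.update(word_ngrams(text.split()))
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["n-gram", "raw MT", "reviewer translation", "keep?"])
        for gram, count in grams.most_common():
            if count >= min_count:  # keep only recurring patterns
                writer.writerow([gram, translate(gram), "", ""])
```

The filled-in worksheet then becomes exactly the kind of human-vetted seed data the next stage feeds into engine training.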
In summary, we are building a foundation based on human intellect and analytical abilities to perform the final translations. In using an analogy of a large building, the representative documents and their human translated counterparts (pairs) serve as the concrete foundation and steel beams, the N-Grams serve as the building blocks in between the steel beams and the key names and titles of individuals serve as the fascia of the building.
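The "pairs" that form the concrete foundation in the analogy above amount to a corpus-preparation step: flattening (source document, human translation) pairs into line-aligned segments for engine training. The sketch below assumes pre-segmented documents with a naive one-to-one segment correspondence; a real pipeline would use a proper sentence aligner:

```python
import csv

def build_seed_corpus(doc_pairs, out_path: str) -> int:
    """Flatten (source_doc, human_translation) document pairs into a
    tab-separated file of aligned segments for engine training.
    Returns the number of segment pairs written."""
    rows = 0
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        for src_doc, tgt_doc in doc_pairs:
            src_segs, tgt_segs = src_doc.splitlines(), tgt_doc.splitlines()
            if len(src_segs) != len(tgt_segs):
                continue  # skip pairs the naive 1:1 assumption cannot handle
            for src, tgt in zip(src_segs, tgt_segs):
                if src.strip() and tgt.strip():
                    writer.writerow([src.strip(), tgt.strip()])
                    rows += 1
    return rows
```

The resulting file is in the simple source-tab-target format that most statistical MT training toolkits can ingest directly.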

Naturally, we are not looking to replace human translation completely, and in cases where certified human translations are necessary (regulatory compliance, court-submitted documents, etc.) we will still rely heavily on that aspect of the solution. Even so, the overall time and expense to complete a large-scale translation project are reduced by hundreds of times. The following chart depicts the ROI of a case on a time scale to help understand the impact such a process can have.

This process has additional benefits as well. Imagine for a moment a document production with over 2 million Korean-language documents, produced over a long time scale and from various locations across the world. Your organization has a choice: either review every single document and classify it into various categories utilizing native Korean reviewers, or utilize an Enhanced Machine Translation process that allows a larger contingent of English-speaking reviewers to search out and eliminate non-relevant documents and classify the remainder.

One industry in which this solution offers immense benefits is Electronic Discovery and Litigation Support, where the majority of attorneys who are experts in various fields are English speakers; by utilizing these resources along with elaborate search mechanisms (Boolean, stemming, concept search, etc.) in English, they can quickly reduce the population of documents. If, on the other hand, the law firm relied only on native-speaking human reviewers, a crew of 10 expert attorney reviewers, each reviewing 50 documents per hour (4,000 documents per day for the crew on an 8-hour shift), would take 500 working days to complete the review, with each reviewer charging hourly rates that add up very quickly.

We have constructed a chart from data gathered over the past 15 years of performing this type of work for some of the largest law firms around the world; it shows the impact that a proper document reduction or classification strategy can have at every stage of litigation. Please note that the bars stack from bottom to top, with MT being the brown shaded area.

The difference is stark, and if proper care is not given to implementation, organizations are often prevented from knowing the content of the documents within their control or supervision. This becomes a real issue for compliance officers, who must rely on knowing every communication that occurs or has occurred within their organization at any given time.


Mr. Pete Afrasiabi, the President of iQwest, is a veteran of integrating technology-assisted business processes into organizations for almost three decades, including 18 years in the litigation support industry. He has been involved with projects involving MT (over 100 million documents processed), Managed Services, and eDiscovery since the inception of the company, as well as the deployment of technology solutions (CRM, email, infrastructure, etc.) across large enterprises prior to that. He has deep knowledge of business processes and project management, and extensive experience working with C-level executives.

Pete Afrasiabi
iQwest Information Technologies, Inc.