Thursday, August 16, 2018

Post-Editing and NMT: Embracing a New Age of Translation Technology

This is a guest post by Rodrigo Fuentes Corradi and Andrea Stevens from the SDL MT project management team. 

SDL is unique amongst LSPs in that it has both deep MT development expertise and a large pool of in-house professional translators who can communicate easily with the MT development and management team on engine specifics, and provide the kind of feedback that leads to best practices in MTPE projects and delivers superior overall quality. In many localization contexts, MT only works if it delivers output that is useful to translators rather than frustrating them. This quality is best achieved by an active and constructive dialogue between translators, project managers, and developers, which is the modus operandi at SDL. 

While most MT developers have a strong preference for automated quality metrics like BLEU and LEPOR, the SDL MT project management teams discovered many years ago that competent human assessments are much more reliable in determining whether an MT system is viable for an MTPE project, and they have developed methods to make this determination efficiently and effectively. NMT presents special challenges in MTPE scenarios, as automated metrics are generally even less reliable than they were with SMT. NMT fluency can sometimes veil accuracy problems, so special training and modified strategies are needed, as this post describes. We also often see that the quality of the MT output can be significantly better than a BLEU score might suggest, as this metric is often lower for NMT systems for various reasons.
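To see why a BLEU score is only a rough proxy for quality, it helps to look at how it is computed. The sketch below is a deliberately simplified sentence-level version (real BLEU is corpus-level, uses n-grams up to 4, and applies smoothing); the example sentences are invented:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU: clipped n-gram precision
    (geometric mean over n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((cand_ngrams & ref_ngrams).values())  # clipped counts
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))  # brevity penalty
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# Two perfectly acceptable translations of the same source can score
# very differently against a single reference:
ref = "the committee approved the proposal yesterday"
print(round(bleu("the committee approved the proposal yesterday", ref), 2))       # → 1.0
print(round(bleu("yesterday the committee signed off on the proposal", ref), 2))  # → 0.42
```

The second candidate is fluent and accurate, yet scores far lower simply because it is less *like* the reference, which is exactly the failure mode for fluent NMT output discussed above.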

We are beginning to see other research suggesting that NMT does often provide productivity benefits over SMT, but that it requires updated training and at least some understanding of the new challenges that the compelling but sometimes inaccurate fluency of NMT systems presents to established production processes.


Post-editing means different things to different people. To the corporate world, it means making localization budgets go further by translating more for less. To freelance translators, it may well mean an infringement of their craft and livelihood. We know that freelance careers are the result of several years of study, hard work and perseverance to create a client portfolio based on a reputation for quality, consistency and on-time delivery.

Language Service Providers like SDL are often stuck in the middle, working to satisfy the complex demands of our clients while nurturing the linguistic and creative talent in our supply chain. The truth is that in today’s localization market, Machine Translation Post-Editing (MTPE) is very much a reality and an answer to the unstoppable content explosion that we are experiencing.

How do we find a balance and how can we, as an LSP committed to machine translation (MT) and post-editing, do the impossible and manage both client expectations and supply chain needs?
At the heart of the conundrum lies content. What content needs to be developed and translated, what is its purpose, who does it serve, and how can our clients bring it to their customers in the most appropriate and cost-efficient way possible? Content and how well it communicates to a local audience is what defines a company in today’s fast-moving markets. As a business changes and evolves, so does content, creating the need for almost constant innovation and reinvention. This undoubtedly poses a challenge for translators who need to commit to life-long learning to understand new concepts, trends, and challenges and produce materials in the target language that are fully adapted to local markets and audiences.

Content keeps growing exponentially and our challenge is to understand the vast amounts of diverse content that our customers create and how to best deal with it. All content has value and answers a specific requirement, but for example, there is a difference between content created for a technical knowledge base, content for an advertising feature or even regulatory content. For content that has a short shelf-life, a straight MT solution without human intervention may be perfectly acceptable whereas, at the other end of the spectrum, more creative content may require specialized translators or transcreation. In between, there is a wide range of content that could be best served with a hybrid human and machine approach.

As a translation work-giver and machine translation provider, it’s up to us at SDL to navigate the challenges posed by the ongoing content explosion in partnership with our clients and our supply chain. We need to be realistic about the role that MTPE plays in today’s translation marketplace and acknowledge the advantages for all involved. At the same time, challenges and constraints cannot be swept under the rug, but need to be openly addressed and discussed for full transparency.

“While NMT is inspired by the way the human brain works, it does not learn languages quite like humans do – humans learn to speak to communicate with each other in a wide social context.”

One of the challenges is to make post-editing sustainable. Post-editing is established as a standard solution for a wide variety of content types and language pairs. The application of MT is constantly pushed further by the commercial need to respond to the overall content growth while dealing with limited or unchanged localization budgets. As a result, the MTPE footprint continues to expand beyond initially successful languages and domains into new territories, such as regulatory life sciences content or even marketing.

To do this successfully, we rely on technology advances and improvements, some of which have only become possible over the last one to two years. Customized solutions and real-time adaptive machine translation are among the tools that improve the post-editing experience for translators, but the biggest step forward is surely the arrival of Neural Machine Translation.

Neural Machine Translation

Neural Machine Translation (NMT) has rightly been described as a revolution rather than an evolution. With its core developed entirely from scratch, NMT offers remarkable opportunities for innovation. Its powerful architecture not only captures text and syntax information but actual meaning and semantics, leading to the improvements in translation quality that we are seeing across the board. This is something all MT providers, including SDL, agree on after extensive testing across language pairs and combinations.

How is NMT achieving this? In short, NMT uses artificial neural networks, which mimic the human brain with its interconnected neurons that help us understand the world around us: what we see, touch, smell, taste or hear. An NMT system learns by observing correlations between the source and the target text and modifies itself to increase the likelihood of producing a correct translation. While NMT is inspired by the way the human brain works, it does not learn languages quite like humans do – humans learn to speak to communicate with each other in a wide social context. NMT systems are still trained on bilingual data sets but promise a noticeable uplift in translation quality through a more efficient framework for learning translation rules. NMT systems use an input layer, where the text for translation is fed into the system; one or more hidden layers, where processing takes place; and an output layer, where we obtain the translation.
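The layered structure described above can be sketched in a few lines of code. This is a toy illustration only: the weights below are made up, and a real NMT system learns millions of such weights from bilingual training data inside a far more sophisticated encoder-decoder architecture.

```python
import math

def layer(inputs, weights, biases):
    """One fully connected layer: each neuron takes a weighted sum of
    its inputs plus a bias, passed through a tanh activation."""
    return [math.tanh(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# Toy network: 3-dimensional input -> 2 hidden neurons -> 2 outputs.
x = [0.5, -1.0, 0.25]  # "input layer": a numeric encoding of the source text
hidden = layer(x, [[0.1, 0.4, -0.2], [0.7, -0.3, 0.5]], [0.0, 0.1])
output = layer(hidden, [[0.6, -0.1], [0.2, 0.9]], [0.05, -0.05])
print(output)  # "output layer": scores from which target tokens would be predicted
```

Training amounts to nudging those weights, over many passes through the bilingual data, so that the output layer assigns the highest scores to the correct translation.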

“When meaning and semantics are represented through math, words with similar meanings tend to cluster together.”


The hidden layer contains a vast network of neural nodes where the input is encoded into a vector of numbers to give a predictive output. Essentially, we are applying math to the problem of language and translation. When meaning and semantics are represented through math, words with similar meanings tend to cluster together. This is how we know that the NMT system starts to learn the semantics of words. When words have several meanings, they appear in different clusters; for example, ‘bank’ can appear in the geography or the finance cluster. It is then even possible to apply further math to the vectors, as shown in the graphic below: if you take the vector ‘king,’ subtract the vector ‘man’ and add the vector ‘woman,’ the result is a vector that is exactly or very close to the vector ‘queen.’
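The king/man/woman/queen arithmetic can be demonstrated with a toy example. The 3-dimensional vectors below are invented so that the "royalty" and "gender" directions are easy to see; real systems learn embeddings with hundreds of dimensions from data:

```python
def add(u, v): return [a + b for a, b in zip(u, v)]
def sub(u, v): return [a - b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda w: sum(a * a for a in w) ** 0.5
    return dot / (norm(u) * norm(v))

# Invented embeddings: first dimension ~ "royalty", last ~ "female".
vec = {
    "king":  [0.9, 0.8, 0.1],
    "man":   [0.1, 0.8, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "queen": [0.9, 0.1, 0.9],
}

result = add(sub(vec["king"], vec["man"]), vec["woman"])  # king - man + woman
nearest = max(vec, key=lambda w: cosine(vec[w], result))
print(nearest)  # → queen
```

Subtracting 'man' removes the male direction and adding 'woman' supplies the female one, so the resulting vector lands on (or very close to) 'queen', exactly as described above.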

 For NMT, we use deep neural networks, which are better at handling long-range context and dependencies. This is particularly important for languages where the benefits of statistical machine translation were limited. Good examples are language pairs with long-distance word reordering, such as English and Japanese, where the clause structure is very different, or languages with long-distance dependencies, such as German and Dutch.

One of the main advantages of NMT is its very fluent translation output. However, it is important to understand that this fluency can sometimes mask the fact that the content of the automated translation is not correct.

This is just one of the reasons why post-editing is still so important, even when working with NMT output. It is essential that translators understand the behaviors and patterns of NMT to be able to take full advantage of this promising technology.

To prepare translators for working with NMT, training and active engagement with the supply chain are essential. This is part of a much larger training effort that includes our soon-to-be-updated Post-Editing Certification Program. Training will not only help our vendors prepare for working with new technologies but also ensure that everyone is ready for the technological expansion of MTPE.

Working Together

Being a good work-giver is key to facing new challenges jointly with our supply chain. Increasingly, this means guiding projects with an expert understanding of tools, domains, processes and the intended audience. The combination of these factors will bring sustained success and quality continuity.

New technologies such as NMT can prove disruptive; transparency and training will be key to reassure and prepare our vendors. One key challenge will be to align the improved NMT technology with post-editing experience and a strategy for on-boarding our supply chain.

The intent is to utilize NMT technology across a wide range of domains and content types, and it is important to collect valid data points to proactively assess the chances of success before reaching out to freelancers. Furthermore, when approaching freelancers, it has always been helpful to share these findings and provide guidelines for MT behaviors that can help their post-editing decisions.

In summary, so much of the success is centered on good communication, which is dependent on openness, sharing materials and providing channels to discuss issues and answer questions. In this respect, we see a responsibility to drive these initiatives with a structured communication plan that includes webinars and open days held in our language offices.

 We are witnessing the ongoing growth of Artificial Intelligence and Machine Learning in many aspects of our daily lives, from self-driving cars to medical diagnosis and intervention. Technology is a huge part of how we live and what we do, and this particularly holds true for translators. Post-editing works at the intersection of humans and machines, and machine translation is one of the most advanced tools in the translators’ toolbox to future-proof the profession for a new generation of translators. MTPE is, of course, a choice that everyone needs to make for themselves, but with new technologies such as Adaptive or Neural MT working for translators and the growing reach of MT into new domains, this is not a choice to be taken lightly. Technology developments are not reducing the role of translators, but rather, are changing and enhancing it, opening a host of new opportunities. This is a journey we need to embark on together for continued learning, support, and feedback.

Rodrigo Fuentes Corradi

MT Business Consultant, SDL

Andrea Stevens

 MT Translation Manager, SDL


  The authors have produced a white paper on "Best Practices for Enterprise Scale Post-Editing" that can be accessed at this link.



  1. The Bad And The Good News About The #Translation & #Localization Industry.

    The Bad First (Ohh boy, this will hurt):
    Traditional translation and interpretation services will pretty soon be old news, like the BlackBerry mobile phone and Kodak cameras. And they won't come back.

    So, instead of being all defensive about it and denying the current market trend, learn to ride the waves of change...they are inevitable.

    The Good News:
    Globalization, the tech revolution, and the expansion of business require more localization experts for companies to be able to properly market, communicate, and satisfy their customers. This means more business for you and me...but we must change the way we promote ourselves. Marketing yourself as a mere interpreter or translator won't work anymore. Google, Siri, and Alexa can perform those tasks faster than we do (Boom). Get over it.

    So, what is the new game plan?

  2. As a long-term localization specialist, I usually embrace new technology more than most other people, but 99 out of 100 MTPE projects are still built on an extremely poor foundation, and the MT engines are rarely optimized as fast as they should be. I have yet to see an LSP that I believe uses MT with the goal of enhancing productivity and delivering projects faster, rather than selling projects as cheaply as possible. If the customers are happy with the result then go for it by all means, but at some point I think they will realise that they are not (currently) getting the linguistic quality they would expect. On the bright side, a well-trained MT engine means that even small teams can genuinely enhance their productivity and compete for even large projects.

  3. The technology (the algorithm) is not the most important aspect of machine learning. A well-designed, state-of-the-art neural MT system that learned from a generic corpus, like Google's, does not perform as well as a customized statistical MT system that learned from a well-designed corpus that fits the specific job.

    More than 2 years ago, Kirti advised me to demonstrate real-world results from real customers. It took all that time to coordinate those customers' cooperation in a study and approve a format I could publish. I worked with 74 sets of real customer experiences with customized SMT. In the end, 31 of the 74 approved the inclusion of their results in a published study. I published that study last month. It is available for download here:

    With respect to the translator sitting at the keyboard and doing the work, customized SMT predicted, on average, the translator's actions more than six times as often as Google's generic NMT. So, if Google generated 100 edit-distance-zero segments out of 1,350 total test segments, the customized SMT system generated 650 edit-distance-zero segments. Of the remaining segments with an edit distance greater than zero, the customized SMT segments required significantly less editing work to correct than the Google NMT outputs.
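For readers unfamiliar with the metric behind "edit-distance-zero segments": it is typically a Levenshtein distance, the minimum number of edits needed to turn the MT output into the translator's final text, and zero means the translator changed nothing. A minimal word-level sketch, with invented example segments:

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance: the minimum number of token
    insertions, deletions, and substitutions needed to turn a into b."""
    a, b = a.split(), b.split()
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, 1):
        curr = [i]
        for j, tok_b in enumerate(b, 1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete a token
                            curr[j - 1] + 1,      # insert a token
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

post_edited = "save the file before closing the editor"
print(edit_distance("save the file before closing the editor", post_edited))  # → 0
print(edit_distance("store the file before you close editor", post_edited))   # → 3
```

An MT system whose output more often needs zero or few such edits directly reduces the keystrokes and decisions a translator must make per segment.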

    Neural MT clearly demonstrates potential for future improvements, but SMT with a well-designed corpus is today's most powerful tool to reduce a translator's workload.

    Thank you, Kirti, for your recommendations as we launched Slate Desktop 2 1/2 years ago.

    1. This is encouraging and interesting. This does show that the greatest value-add may indeed come from straightforward personalization and the right steering data. Too many of us in the industry think it is all about the algorithms and presume NMT is universally better. This data should provide grounds to question this presumption. Properly customized SMT can indeed still often outperform generic NMT, especially in this PEMT context.

      However, I also realize that NMT is still in its infancy and improvements are quite likely to come specifically in this area of rapid adaptation. I have seen at SDL that a single engine can support multiple terminology views. I expect that in the future we will evolve more capabilities to easily steer MT engines to do what we want, and that the technology will adapt faster and with more collaborative impact for this L10N use context.

    2. Agreed. NMT, like the F-35, is on a growth path, but the U-2, A-10, B-52 and SMT still have valuable and irreplaceable roles. Your expected "in the future" is here and now. Personalization (customization to a person) steers the MT results towards the one person at the keyboard. That's a much easier task than simultaneously steering a system to satisfy 10,000 people and keyboards (or more?).

      In the survey, I borrowed CSA's age-old term and described our 60-year MT efforts as taking place in the "Promised Land Paradigm." The survey proposes a new "Predictive Paradigm." If we step out of the old, we leave behind the self-imposed constraints and open new possibilities to this paradigm and others. The Predictive Paradigm makes it simple to steer MT with higher degrees of confidence. Still, there's never a silver bullet. The study identified three weaknesses in the proposed Predictive Paradigm. Maybe this forum is a good place to deliberate and expand the list.

      I hope you take the time to read and comment on the study. It includes an outline to conduct independent tests and I hope that others can take the time to do so.

  4. On another note, I think it's sad that BLEU scores continue to be referred to as a quality measurement. Case in point: "quality of the MT output can be significantly better than a BLEU score might suggest..." The reality is that BLEU and other scores (edit distance, TER, METEOR, etc.) do not measure quality at all. They are likeness scores between the MT output and one or more desirable reference translations, much like "fuzzy" match scores between the project's source and the source in the TM. Therefore, a high BLEU score does not mean the MT is high quality any more than a high fuzzy score means the source is high quality. They're simply similar.

    When properly used, these scores tell us something more important than "quality." The acknowledgement that BLEU "is often lower for NMT systems" tells us that NMT output is more often less like the reference translations. Again, "like" does not equate to good quality, and conversely "unlike" does not equate to low quality.

    Therefore, there's an important takeaway from NMT's lower BLEU scores. We know that with NMT, the translator (I never use the term "post-editor") must consider a broader range of possible translations than he or she would otherwise consider if the MT output were more like the translator's natural work. This extra cognitive work takes more time and can result in mental fatigue.