Tuesday, December 22, 2020

Association for Machine Translation in the Americas (AMTA2020) Conference Highlights

This post is, to a great extent, a belated summary of highlights from the AMTA2020 virtual conference, which I felt was one of the best (in terms of signal-to-noise ratio) held in the last ten years. Of course, I can only speak to those sessions I attended, and I am aware that there were many technical sessions I did not attend that were also widely appreciated by others. This post is also a way to summarize many of the key challenges and issues facing MT today and is thus a good way to review the SOTA of the technology as this less-than-wonderful year ends.

The State of Neural MT

I think that 2020 is the year that Neural MT became just MT. Just regular MT. It is superfluous to add "neural" anymore because most of the MT discussions and applications you see today are NMT-based; it would be like saying Car Transportation. It might still be useful to say SMT or RBMT to point out use that is not a current mainstream approach, but it is less necessary to say neural MT anymore. While NMT was indeed a big, even HUGE, leap forward, we have reached a phase where much of the best research and discussion focuses on superior implementation and application of NMT, rather than simply using NMT. There are many open-source NMT toolkits available, and it is clearly the preferred MT methodology in use today, even though neither SMT nor RBMT is completely dead. Some still argue that these older approaches are better for certain specialized kinds of problems.

However, while NMT is a significant step forward in improving generic MT output quality, there are still many challenges and hurdles ahead. Getting back to AMTA2020, one of the sessions (C3) looked specifically at the most problematic NMT errors across many different language combinations and provided a useful snapshot of the situation. The chart below summarizes the most common kinds of translation errors found across many different language combinations. We see that while the overall level of MT output acceptability has increased, many of the same challenges remain. Semantic confusion around word ambiguity, unknown words, and dialectal variants continues to be challenging. NMT has a particular problem with phantom or hallucinated text: it sometimes simply creates content that is not in the source. But we should be clear that the proportion of useful and acceptable translations continues to climb and is strikingly "good" in some cases.

A concern for all the large public MT portals, which translate billions of words an hour, is the continuing possibility of catastrophic errors that are offensive, insensitive, or just plain outlandish. Some of these are shown below, from a presentation by Mona Diab, a GWU/Facebook researcher who gave a very interesting overview of something she calls "faithful" translation.

This is a particularly urgent issue for those platforms like Facebook and Twitter that face the huge volumes of social media commentary on political and social events. Social media, in case you did not know, is increasingly the way that much of the world consumes news.

The following slides show what Mona was pointing to when she talked about "Faithfulness" and I recommend that readers look at her whole presentation which is available here. MT on social media can be quite problematic as shown in the next chart.

She thus urged the community to find better ways to assess and determine acceptable or accurate MT quality, especially in high-volume social media translation settings. Her presentation provided many examples of problems and described the need for a more semantically accurate measure that she calls "Faithful MT". Social media is an increasingly important target of translation focus, and we have seen the huge impact that social media commentary can have on consumer buying behavior, political opinion, brand identity, brand reputation, and even political power outcomes. A modern enterprise, commercial or governmental, that does not monitor relevant social media feedback is walking blind and likely to face unfortunate consequences from this lack of foresight.

Mona Diab's full presentation is available here and is worth a look, as I think it defines several key challenges for the largest users of MT in the world. She mentioned that Facebook processes 20B+ translation transactions per day, which could mean anywhere from 100 billion to 2 trillion words a day. This volume will only increase as more of the world comes online and could double in as little as a year.

Another keynote that was noteworthy (for me) was the presentation by Colin Cherry of Google Research: "Research stories from Google Translate’s Transcribe Mode". He found a way to present his research in a style that was both engaging and compelling. The slides are available here, but without his talk track, they are barely a shadow of the presentation I watched. Hopefully, AMTA will make the video available.

Chris Wendt from Microsoft also provided insight into the enterprise use of MT and showed some interesting data in his keynote.  He also gave some examples of catastrophic errors and had this slide to summarize the issues.

He pointed out that in some language combinations it is possible to use "Raw MT" across many more types of content than in others, because these combinations tend to perform better across much more content variation. I am surprised by how many LSPs still overlook this basic fact, i.e., that not all MT language combinations are equivalent.

He showed a ranking of the "best" language pair combinations (as in, closest to human references) that is probably most meaningful for localization users, but it could also be useful to others who want to understand roughly how MT system quality ratings vary by language.

Normally, vendor presentations at conferences have too much sales emphasis and too little information content to be interesting. I was thus pleasantly surprised by the Intento and Systran presentations, which were both content-rich, educational, and informative. A welcome contrast to the mostly lame product descriptions we normally see.

While MT technology presentations focused on large-scale use cases (i.e., NOT localization) are making great strides, my feeling is that the localization presentations were inching along, with post-editing management, complicated data analysis, and review-tool themes that have not changed very much in ten years. A quick look at the most-read blog posts on eMpTy Pages also confirms this: a post I wrote in 2012 on Post-Editing Compensation made the Top 10 list for 2020. Localization use cases still struggle to eke out value from MT technology because it is simply not yet equivalent to human translation. There are clear use cases for both approaches (MT and HT), and it has always been my feeling that localization is a somewhat iffy use case that can only work for the most skilled practitioners who make long-term investments in building suitable translation production pipelines. If I were able to find a cooperative MT developer team, I think I could architect a better production flow and man-machine engagement model than much of what I have seen over the years. The reality of MT use in localization still puts too much emphasis on the wrong syllable. I hope I get the chance to do this in 2021.

MT Quality Assessment

BLEU is still widely used today to describe and document progress in MT model development, even though it is widely understood to be inadequate for measuring quality changes in NMT models in particular. However, BLEU provides long-term progress milestones for developers, and I think the use of BLEU scores in that context still has some validity, assuming proper experimental rigor is followed. BLEU works for this task because it is relatively easy to set up and implement.
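For readers who have not looked under the hood, the core of BLEU is modified n-gram precision combined with a brevity penalty. The following is a toy, sentence-level, single-reference sketch of that idea only; real evaluations use corpus-level scoring via tools like sacreBLEU, and the smoothing constant here is my own simplification:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Toy sentence-level BLEU: modified n-gram precision x brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c, r = ngrams(cand, n), ngrams(ref, n)
        # clip each candidate n-gram count by its count in the reference
        overlap = sum(min(count, r[g]) for g, count in c.items())
        total = max(sum(c.values()), 1)
        log_prec += math.log(max(overlap, 1e-9) / total)  # crude smoothing for zero matches
    # brevity penalty: punish candidates shorter than the reference
    brevity = min(1.0, math.exp(1 - len(ref) / max(len(cand), 1)))
    return brevity * math.exp(log_prec / max_n)
```

The string-matching nature of this computation is precisely why BLEU struggles with NMT output: a fluent paraphrase that shares few surface n-grams with the reference scores poorly.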

The use of BLEU to compare MT systems from different vendors, using public domain test sets is more problematic - my feeling is that it will lead to erroneous conclusions and sub-optimal system selection. To put it bluntly, it is a bullshit exercise that appears scientific and structured but is laden with deeply flawed assumptions and ignorance. 

None of the allegedly "superior metric" replacements have really taken root because they simply don't add enough additional accuracy or precision to warrant the overhead, extra effort, and experimental befuddlement. Human evaluation feedback is now a core requirement for any serious MT system development because it is still the best way to accurately understand relative MT system quality and to determine development progress in specific use scenarios. The SOTA today is still multiple automated metrics plus human assessments when accuracy is a concern. As of this writing, I think a comparison system that spits out relative rankings of MT systems without meaningful human oversight is suspect and should be quickly dismissed.

However, the need for better metrics to help both developers and users quickly understand the relative strengths and weaknesses of multiple potential MT systems is even more urgent today. If a developer has 2 or 3 close variants of a potential production MT system, how do they tell which is the best one to commit to?  The need to understand how a production system improves or degrades over time is also very valuable. 

Unbabel presented their new metric, COMET, and provided some initial results on its suitability and ability to solve the challenge described above: successfully ranking several high-performing MT systems.

The Unbabel team seems very enthusiastic about COMET's potential to:
  • Determine the ongoing improvement or degradation of a production MT system
  • Differentiate between multiple high-performing systems with better accuracy than has been possible with other metrics
Both of these apparently can be done with less human involvement, and better automated feedback on these two issues is clearly of high value. It is not clear to me how much overhead is involved in using the metric, as we do not have much experience with its use outside of Unbabel. It is being released as open source and will possibly attract a broader user community, at least among sophisticated LSPs, who stand to gain the most from a better understanding of the value of retraining and from better comparisons of multiple MT systems than is possible with BLEU, chrF, and hLEPOR. I hope to dig deeper into understanding COMET in 2021.
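I have not used COMET myself yet, but the system-comparison task itself is easy to state. A hedged sketch of ranking candidate systems by average segment-level score (the function names and the toy unigram-overlap metric are mine, not Unbabel's; a trained metric like COMET would replace the toy `metric` with a model-based score):

```python
def unigram_overlap(hyp, ref):
    """Toy segment-level metric: fraction of reference words found in the hypothesis."""
    hyp_words, ref_words = set(hyp.split()), ref.split()
    return sum(w in hyp_words for w in ref_words) / max(len(ref_words), 1)

def rank_systems(systems, references, metric=unigram_overlap):
    """Rank MT systems (dict: name -> list of hypothesis segments) by mean segment score."""
    scores = {
        name: sum(metric(h, r) for h, r in zip(hyps, references)) / len(references)
        for name, hyps in systems.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The hard part is entirely inside the metric: when two or three systems are close, a weak surface-overlap metric is exactly where small average differences mislead, which is the gap COMET claims to address.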

The Translator-Computer Interface

One of the most interesting presentations I saw at AMTA was by Nico Herbig from DFKI on what he called the multi-modal interface for post-editing. I felt it was striking enough that I asked Nico to contribute a guest post on this blog. This post is now the most popular one over the last two months and can be read at the link below. He has also been covered in more detail and discussion by Jost Zetzsche in his newsletter.

While Nico focused on the value of the multi-modal system to the post-editing task at AMTA, it has great value and applicability for any translation-related task. A few things stand out for me about this initiative:
  • It allows a much richer and more flexible interaction with the computer for any translation-related task.
  • It naturally belongs in the cloud and is likely to offer the most powerful user assistance experience in the cloud setting.
  • It can be connected to many translation assistance capabilities like Linguee, dictionaries, terminology, synonym, and antonym databases, MT, and other translator reference aids to transform the current TM-focused desktop.
  • It creates the possibility of a much more interactive and translator-driven interaction/adaptation model for next-generation MT systems that can learn with each interaction.

I wish you all a Happy Holiday season and a Happy, Healthy, and Prosperous New Year.

(Image credit: SkySafari app and some fun facts)

Tuesday, October 27, 2020

Anonymization Regulations and Data Privacy with MT

 This is a guest post from Pangeanic that focuses on very specific data privacy issues and highlights some of the concerns that any enterprise must address when using MT technology on a large scale across large volumes of customer data.

I recently wrote about the robust cloud data security that Microsoft MT offers in contrast to the other major public MT services. Data privacy and security continue to grow as a touchstone issue for enterprise MT vendors, and legislation like GDPR makes it an increasingly critical issue for any internet service that gathers customer data.

Data anonymization is a type of information sanitization whose intent is privacy protection. It is the process of removing personally identifiable information from data sets so that the people whom the data describe remain anonymous.

Data anonymization has been defined as a "process by which personal data is irreversibly altered in such a way that a data subject can no longer be identified directly or indirectly, either by the data controller alone or in collaboration with any other party." [1] Data anonymization may enable the transfer of information across a boundary, such as between two departments within an agency or between two agencies while reducing the risk of unintended disclosure, and in certain environments in a manner that enables evaluation and analytics post-anonymization. 

This is clumsy to describe, and even harder to do, but is likely to be a key requirement when dealing with customer data that spans the globe. Thus, I thought it was worth a closer look.

*** ===== ***

Anonymization Regulations, Privacy Acts and Confidentiality Agreements 

How do they differ and what do they protect us from?


One of the possible definitions of privacy is the right that all people have to control information about themselves, and particularly who can access personal information, under what conditions and with what guarantees. In many cases, privacy is a concept that is intertwined with security. However, security is a much broader concept that encompasses different mechanisms. 

Security provides us with tools to help protect privacy. One of the most widely used security techniques to protect information is data encryption. Encryption allows us to protect our information from unauthorized access. So, if by encrypting I am protecting my data and access to it, isn't that enough?  

Encryption is not enough for Anonymization because…

in many cases, the information in the metadata is unprotected. For example, the content of an email can be encrypted. This gives us a [false] sense of protection. When we send the message, there is a destination address. If the email is addressed, for example, to a political party, that fact alone reveals sensitive information despite the content of the message being protected.

On the other hand, there are many scenarios in which we cannot encrypt the information. For example, if we want to outsource the processing of a database or release it for third parties to carry out analyses or studies for statistical purposes. In these types of scenarios we often encounter the problem that the database contains a large amount of personal or sensitive information, and even if we remove personal identifiers (e.g., name or passport number), it may not be sufficient to protect the privacy of individuals. 

Anonymization: protecting our privacy

Anonymization (also known as “data masking”) is a set of techniques that allows the user to protect the privacy of documents or information by modifying the data. This can mean anonymization with gaps (deletion), anonymization with placeholders (substitution), or pseudonymization of the data.

In general, anonymization aims to alter the data in such a way that, even if it is subsequently processed by a third party, the identity or sensitive attributes of the persons whose data is being processed cannot be revealed.
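To make the three techniques concrete, here is a minimal sketch of deletion, substitution, and pseudonymization. The regex patterns and labels are illustrative only; production anonymization relies on trained entity-recognition models, not regexes:

```python
import hashlib
import re

# Toy patterns for illustration only; real PII detection needs NER models.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def anonymize(text, mode="placeholder"):
    """mode: 'delete' (gaps), 'placeholder' (substitution), or 'pseudonym' (stable alias)."""
    def repl(label):
        def _sub(match):
            if mode == "delete":
                return ""                       # anonymization with gaps
            if mode == "placeholder":
                return f"[{label}]"             # anonymization with placeholders
            # pseudonymization: a stable, one-way alias for the same value
            alias = hashlib.sha256(match.group().encode()).hexdigest()[:8]
            return f"{label}_{alias}"
        return _sub
    for label, pattern in PATTERNS.items():
        text = pattern.sub(repl(label), text)
    return text
```

Note that the pseudonym variant preserves referential consistency (the same email always maps to the same alias), which matters when the anonymized data must still support analytics downstream.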

Privacy management is regulated similarly across legal jurisdictions in the world. In Europe, it is known as the GDPR (General Data Protection Regulation), which was approved in 2016 and implemented in 2018. In the US, the California Consumer Privacy Act (CCPA) was approved in January 2018 and is applicable to businesses that

  • have annual gross revenues in excess of $25 million;
  • buy, receive, or sell the personal information of 50,000 or more consumers or households; or
  • earn more than half of their annual revenue from selling consumers' personal information.

It is expected that most other states will soon follow the spirit of California’s CCPA. This will affect the way organizations collect, hold, release, buy, and sell personal data.

In Japan, the reformed privacy law, known as the Japanese Act on the Protection of Personal Information (APPI), came into full force on May 30, 2017. The main difference from the European GDPR lies in the clauses defining personally identifiable information: in Europe, “personal data means any information relating to an identified or identifiable natural person,” whereas the APPI itemizes specific categories.

In general, all privacy laws want to provide citizens with the right to:  

  1. Know what personal data is being collected about them.
  2. Know whether their personal data is sold or disclosed and to whom.
  3. Say no to the sale of personal data.
  4. Access their personal data.
  5. Request a business to delete any personal information about a consumer collected from that consumer.[9]
  6. Not be discriminated against for exercising their privacy rights.

The new regulations seek to regulate the processing of our personal data. Each of them establishes that data must be subject to adequate guarantees, minimizing the personal data held.


What is PangeaMT doing about Anonymization?

PangeaMT is Pangeanic’s R&D arm. We lead the MAPA Project, the first multilingual anonymization effort making deep use of bilingual transformer encoders to identify actors and personal identifiers, such as names and surnames, addresses, and job titles and functions, organized in a deep taxonomy.

Together with our partners (Centre National de la Recherche Scientifique in Paris, Vicomtech, etc.) we are developing the first truly multilingual anonymization software. The project will release a fully customizable, open-source solution that can be adopted by public administrations to start their journey in de-identification and anonymization. Corporations will also be able to benefit from MAPA, as the commercial version will be released on 01.01.2021.


Wednesday, October 21, 2020

The Evolving Translator-Computer Interface

This is a guest post by Nico Herbig from the German Research Center for Artificial Intelligence (DFKI).

For as long as I have been involved with the translation industry, I have wondered why the prevailing translator machine interface was so arcane and primitive. It seems that the basic user interface used for managing translation memory was borrowed from DOS spreadsheets and has eventually evolved to become Windows spreadsheets. Apart from problems related to inaccurate matching, the basic interaction model has also been quite limited. Data enters the translation environment through some form of file or text import and is then processed in a columnar word processing style. I think to a great extent these limitations were due to the insistence on maintaining a desktop computing model for the translation task. While this does allow some power users to become productive keystroke experts it also presents a demanding learning curve to new translators.

Cloud-based translation environments can offer much more versatile and powerful interaction modes, and I saw evidence of this at the recent AMTA 2020 conference (a great conference by the way that deserves much better social media coverage than it has received.) Nico Herbig from the German Research Center for Artificial Intelligence (DFKI) presented a multi-modal translator environment that I felt shows great promise in updating the translator-machine interaction experience in the modern era. 
Of course, it includes the ability to interact with the content via speech, handwriting, touch, eye-tracking, and seamless interaction with supportive tools like dictionaries, concordance databases, and MT among other possibilities. Nico's presentation focuses on the interface needs of the PEMT task, but the environment could be reconfigured for scenarios where MT is not involved and only used if it adds value to the translation task. I recommend that interested readers take a quick look through the video presentation to get a better sense of this.

*** ======== ***

MMPE: A Multi-Modal Interface for Post-Editing Machine Translation

As machine translation has been making substantial improvements in recent years, more and more professional translators are integrating this technology into their translation workflows. The process of using a pre-translated text as a basis and improving it to create the final translation is called post-editing (PE). While PE can save time and reduce errors, it also affects the design of translation interfaces: the task changes from mainly generating text to correcting errors within otherwise helpful translation proposals, thereby requiring significantly less keyboard input, which in turn offers potential for interaction modalities other than mouse and keyboard. To explore which PE tasks might be well supported by which interaction modalities, we conducted a so-called elicitation study, where participants can freely propose interactions without focusing on technical limitations. The results showed that professional translators envision PE interfaces relying on touch, pen, and speech input combined with mouse and keyboard as particularly useful. We thus developed and evaluated MMPE, a CAT environment combining these input possibilities. 

Hardware and Software

MMPE was developed using web technologies and works within a browser. For handwriting support, one should ideally use a touch screen with a digital pen, where larger displays and the option to tilt the screen or lay it on the desk facilitate ergonomic handwriting. Nevertheless, any tablet device also works. To improve automatic speech recognition accuracy, we recommend using an external microphone, e.g., a headset. Mouse and keyboard are naturally supported as well. For exploring our newly developed eye-tracking features (see below), an eye tracker needs to be attached. Depending on the features to explore, a subset of this hardware is sufficient; there is no need to have the full setup. Since our focus is on exploring new interaction modalities, MMPE’s contribution lies in the front-end, while the backend is rather minimal, supporting only the storing and loading of files and forwarding the microphone stream to speech recognition services. Naturally, we plan to extend this functionality in the future, i.e., adding project and user management functionality and integrating Machine Translation (instead of loading it from file), Translation Memory, Quality Estimation, and other tools directly into the prototype.

Interface Layout

As a layout, we implemented a horizontal source-target layout and tried to avoid overloading the interface. On the far right, support tools are offered, e.g., a bilingual concordancer (Linguee). The top of the interface shows a toolbar where users can save, load, and navigate between projects, and enable or disable spell checking, whitespace visualization, speech recognition and eye-tracking. The current segment is enlarged, thereby offering space for handwritten input and allowing users to view the context while still seeing the current segment in a comfortable manner. The view for the current segment is further divided into the source segment (left) and tabbed editing planes for the target (right), one for handwriting and drawing gestures, and one for touch deletion & reordering, as well as a standard mouse and keyboard input. By clicking on the tabs at the top, the user can quickly switch between the two modes. As the prototype focuses on PE, the target views initially show the MT proposal to be edited. Undo and redo functionality and segment confirmation are also implemented through hotkeys, buttons, or speech commands. Currently, we are adding further customization possibilities, e.g., to adapt the font size or to switch between displaying source and target side by side or one above the other.


Handwriting

Handwriting in the handwriting tab is recognized using the MyScript Interactive Ink SDK, which worked well in our study. The input field further offers drawing gestures like strike-through or scribble for deletions, breaking a word into two (draw a line from top to bottom), and joining words (draw a line from bottom to top). If there is a lack of space to handwrite the intended text, the user can create such space by breaking the line (draw a long line from top to bottom). The editor further shows the recognized input immediately at the top of the drawing view. Apart from using the pen, the user can use his/her finger or the mouse for handwriting, all of which were used in our study, even though the pen was clearly preferred. Our participants highly valued deletion by strike-through or scribbling through the text, as this nicely resembles standard copy-editing. However, handwriting for replacements and insertions was considered to work well only for short modifications. For more extended changes, participants argued that one should instead fall back to typing or speech commands.

Touch Reorder

Reordering using (pen or finger) touch is supported with a simple drag-and-drop procedure. Users have two options: (1) they can drag and drop single words by starting a drag directly on top of a word, or (2) they can double-tap to start a selection process, define which part of the sentence should be selected (e.g., multiple words or part of a word), and then move it.

We visualize the picked-up word(s) below the touch position and show the calculated current drop position through a small arrow element. Spaces between words and punctuation marks are automatically fixed, i.e., double spaces at the pickup position are removed, and missing spaces at the drop position are inserted. In our study, touch reordering was highlighted as particularly useful or even “perfect” and received the highest subjective scores and lowest time required for reordering. 
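The automatic whitespace cleanup after a drop is a small but important detail; a toy approximation of such a fix-up step (my own sketch, not MMPE's code):

```python
import re

def fix_spacing(text):
    """Tidy whitespace after a drag-and-drop edit: collapse runs of spaces
    and remove stray spaces left before punctuation marks."""
    text = re.sub(r"\s{2,}", " ", text)          # double spaces at the pickup position
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)  # orphaned space before punctuation
    return text.strip()
```

A real implementation would also insert missing spaces at the drop position and handle language-specific punctuation conventions, but the principle is the same: the user should never have to repair whitespace manually after a reorder.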



Speech Commands

To minimize lag during speech recognition, we use a streaming approach, sending the recorded audio to IBM Watson servers to receive a transcription, which is then interpreted in a command-based fashion. The transcription itself is shown at the top of the default editing tab next to a microphone symbol. As commands, post-editors can “insert,” “delete,” “replace,” and “reorder” words or sub-phrases. To specify the position if it is ambiguous, anchors can be specified, e.g., “after”/”before”/”between,” or the occurrence of the token (“first”/”second”/”last”) can be defined. A full example is “replace A between B and C by D,” where A, B, C, and D can be words or sub-phrases. Again, spaces between words and punctuation marks are automatically fixed. In our study, speech [recognition] received good ratings for insertions and replacements but worse ratings for reorderings and deletions. According to the participants, speech would become especially compelling for longer insertions and would be preferable when commands remain simple. For invalid commands, we display why they are invalid below the transcription (e.g., “Cannot delete the comma after nevertheless, as nevertheless does not exist”). Furthermore, the interface temporarily highlights insertions and replacements in green, deletions in red (the space at the position), and combinations of green and red for reorderings. The color fades away after the command.
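As a rough illustration of the command-based interpretation step, here is a toy parser for a few of the simpler commands. The grammar below is my own simplification; the actual MMPE parser also handles positional anchors (“after”/“before”/“between”) and occurrence markers:

```python
import re

# Toy command grammar: a few of the patterns described above.
COMMANDS = [
    ("replace", re.compile(r"^replace (.+) by (.+)$")),
    ("insert", re.compile(r"^insert (.+) after (.+)$")),
    ("delete", re.compile(r"^delete (.+)$")),
]

def parse_command(transcript):
    """Map a speech transcription to (operation, *arguments), or None if invalid."""
    text = transcript.strip().lower()
    for name, pattern in COMMANDS:
        match = pattern.match(text)
        if match:
            return (name, *match.groups())
    return None  # the UI would explain why the command is invalid
```

Returning `None` for unrecognized input corresponds to the invalid-command feedback described above, where the interface explains why a command could not be applied.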

Multi-Modal Combinations of Pen/Touch/Mouse&Keyboard with Speech

Multi-modal combinations are also supported: Target word(s)/position(s) must first be specified by performing a text selection using the pen, finger touch, or the mouse/keyboard. 

Afterwards, the user can use a voice command like “delete” (see the figure below), “insert A,” “move after/before A/between A and B,” or “replace with A” without needing to specify the position/word, thereby making the commands less complex. In our study, multi-modal interaction received good ratings for insertions and replacements, but worse ratings for reorderings and deletions. 

Eye Tracking

While not tested in a study yet, we are currently exploring other approaches to enhance PE through multi-modal interaction, e.g., through the integration of an eye tracker. The idea is to simply fixate on the word to be replaced/deleted/reordered, or on the gap used for insertion, and state the simplified speech command (e.g., “replace with A”/”delete”), instead of having to manually place the cursor through touch/pen/mouse/keyboard. To provide feedback to the user, we show his/her fixations in the interface and highlight text changes, as discussed above. Apart from possibly speeding up multi-modal interaction, this approach would also solve the issue reported by several participants in our study that one would have to “do two things at once,” while keeping the advantage of simple commands in comparison to the speech-only approach.


Logging

MMPE supports extensive logging functionality, where we log all text manipulations at a higher level to simplify text-editing analysis. Specifically, we log whether the manipulation was an insertion, deletion, replacement, or reordering, together with the manipulated tokens, their positions, and the whole segment text. Furthermore, all log entries contain the modality of the interaction, e.g., speech or pen, thereby allowing analysis of which modality was used for which editing operation.
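Such a log entry might look like the following sketch; the field names are my guess at a plausible schema, not MMPE's actual format:

```python
import json
import time

def log_edit(operation, tokens, positions, segment, modality):
    """Build one structured PE log entry (hypothetical schema) as a JSON string."""
    entry = {
        "timestamp": time.time(),
        "operation": operation,   # "insertion" | "deletion" | "replacement" | "reordering"
        "tokens": tokens,         # the manipulated tokens
        "positions": positions,   # their positions in the segment
        "segment": segment,       # the whole segment text after the edit
        "modality": modality,     # e.g. "speech", "pen", "touch", "keyboard"
    }
    return json.dumps(entry)
```

Logging at this level of abstraction, rather than raw keystrokes, is what makes per-modality analysis straightforward: each record already says what was edited and how.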


Our study with professional translators showed a high level of interest and enthusiasm about using these new modalities. For deletions and reorderings, pen and touch both received high subjective ratings, with the pen being even better than the mouse & keyboard. Participants especially highlighted that pen and touch deletion or reordering “nicely resemble a standard correction task.” For insertions and replacements, speech and multi-modal interaction of select & speech were seen as suitable interaction modes; however, mouse & keyboard were still favored and faster. Here, participants preferred the speech-only approach when commands are simple but stated that the multi-modal approach becomes relevant when the sentences' ambiguities make speech-only commands too complex. However, since the study participants stated that mouse and keyboard only work well due to years of experience and muscle memory, we are optimistic that these new modalities can yield real benefit within future CAT tools.


Due to continuously improving MT systems, PE is becoming more and more relevant in modern-day translation. The interfaces used by translators still heavily focus on translation from scratch, and in particular on mouse and keyboard input modalities. Since PE requires less production of text but instead requires more error corrections, we implemented and evaluated the MMPE CAT environment that explores the use of speech commands, handwriting input, touch reordering, and multi-modal combinations for PE of MT. 

In the next steps, we want to run a study that specifically explores the newly developed combination of eye and speech input for PE. Beyond that, longer-term studies are planned to explore how modality usage changes over time, and whether translators continuously switch modalities or stick to specific ones for specific tasks.

Instead of replacing the human translator with artificial intelligence (AI), MMPE investigates approaches to better support human-AI collaboration in the translation domain by providing a multi-modal interface for correcting machine-translation output. We are currently working on proper code documentation and plan to release the prototype as open source within the next few months. MMPE was developed in tight collaboration between the German Research Center for Artificial Intelligence (DFKI) and Saarland University and is funded in part by the German Research Foundation (DFG).


Nico Herbig -

German Research Center for Artificial Intelligence (DFKI)

Paper and additional information:

Multi-Modal Approaches for Post-Editing Machine Translation
Nico Herbig, Santanu Pal, Josef van Genabith, Antonio Krüger. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM 2019.
ACM Digital Library - Paper access

(Presenting an elicitation study that guided the design of MMPE)

MMPE: A Multi-Modal Interface using Handwriting, Touch Reordering, and Speech Commands for Post-Editing Machine Translation
Nico Herbig, Santanu Pal, Tim Düwel, Kalliopi Meladaki, Mahsa Monshizadeh, Vladislav Hnatovskiy, Antonio Krüger, Josef van Genabith. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. ACL 2020.
ACL Anthology - Paper access

(Demo paper presenting the original prototype in detail)

MMPE: A Multi-Modal Interface for Post-Editing Machine Translation
Nico Herbig, Tim Düwel, Santanu Pal, Kalliopi Meladaki, Mahsa Monshizadeh, Antonio Krüger, Josef van Genabith. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. ACL 2020.
ACL Anthology - Paper access - Video

(Briefly presenting MMPE prototype and focusing on its evaluation)

Improving the Multi-Modal Post-Editing (MMPE) CAT Environment based on Professional Translators’ Feedback
Nico Herbig, Santanu Pal, Tim Düwel, Raksha Shenoy, Antonio Krüger, Josef van Genabith. Proceedings of the 1st Workshop on Post-Editing in Modern-Day Translation at AMTA 2020. 2020.
Paper access - Video of presentation

(Recent improvements and extensions to the prototype)

Thursday, September 24, 2020

NiuTrans: An Emerging Enterprise MT Provider from China

 This post highlights a Chinese MT vendor who I suspect is not well known in the US or Europe currently, but who I expect will become better known over the coming years. While the US giants (FAAMG) still dominate the MT landscape around the world today, I think it is increasingly possible that other players from around the world, especially from China may become much more recognized in the future. 

One indicator that has historically been reliable in forecasting emerging economic power is the volume of patent filings in a country. This was true for Japan and Germany, where voluminous patent activity preceded the economic rise of these countries, and more recently this predictor has also aligned with the rise of S. Korea and China as economic powerhouses. However, the sheer volume of filings is not necessarily a lead indicator of true innovation, and some experts say that the volume of patents filed and granted abroad is a better indicator of innovation and patent quality. But today we see emerging giants from Asia in consumer electronics, automobiles, eCommerce, and internet services, and nobody questions the innovation momentum building in Asia today.

Artificial Intelligence (AI) is heralded by many as a key driver of wealth creation for the next 50 years. Building momentum with AI requires a combination of access to large volumes of "good" data, computing resources, and deep expertise in machine learning, NLP, and other closely related technologies. Today, the US and China look poised to be the dominant players in the wider application of AI and machine learning-based technologies, with a few others close behind. Here too, deep knowledge and clout are indicated by the volume of influential papers published and referenced by the global community. A recent analysis by the Allen Institute for Artificial Intelligence in Seattle, Washington found that China has steadily increased its share of authorship of the top 10% most-cited papers. The researchers found that America’s share of the most-cited 10 percent of papers declined from a high of 47 percent in 1982 to a low of 29 percent in 2018, while China’s share has been “rising steeply,” reaching a high of 26.5 percent last year. Though the US still has significant advantages in the relative supply of expert manpower and dominance in the manufacture of AI semiconductor chip technology, this too is slowly changing, even though most experts expect the US to maintain leadership for other reasons.

Credit: Allen Institute for Artificial Intelligence

These trends also impact the translation industry, and they change the relative benefit and economic value of different languages. The global market is slowly shifting from a FIGS-centric view of the world to one where both the most important source languages (ZH, KO, HI) and target languages are changing. The fastest-growing economies today are in Africa and Asia and are not likely to be well served by a FIGS-centric view, though it appears that English will remain a critical world language for knowledge sharing for at least another 25 years. These changes create an opportunity for agile and skillful Asian technology entrepreneurs like NiuTrans who are much more tuned in to this rapidly evolving world. I have noted that some of the most capable new MT initiatives I have seen in the last few years were based in China. India has lagged far behind with MT, even though the need there is much greater, because of the myth that English matters more, and possibly because of the lack of governmental support and sponsorship of NLP research.

The Chinese MT Market: A Quick Overview

I recently sat down with Chungliang Zhang from NiuTrans, an emerging enterprise MT vendor in China, to discuss the Chinese MT market and his company’s own MT offerings. He pointed out that China is the second-largest global economy today, and it is now increasingly commonplace for both Chinese individuals and enterprises to have active global interactions. The economic momentum naturally drives the demand for automated translation services.

Some examples, he pointed out:

In 2019, China’s outbound tourist traffic totaled 155M people, up 3.3% from the previous year. This massive volume of traveler traffic results in a concomitant demand for language translation. Chungliang pointed out that this travel momentum significantly drives the need for voice translation devices in the consumer market, like those produced by Sogou, iFlyTek, and others, which have been very much in demand in the last few years.

There is also a growing interest by Chinese enterprises, both state-owned and privately owned, in building and expanding their business presence in global markets. For example, Alibaba, China’s largest eCommerce company, is listed on the NYSE and has established an international B2B portal where 20 million enterprises gather and work to “Buy Global, Sell Global.” Currently, the Alibaba MT team builds the largest eCommerce MT systems globally, often reaching volumes of 1.79 billion translation calls per day, a larger transaction volume than either Google or Amazon.

“All in all, as we can see it, there is a clear trend that MT is increasingly being used in more and more industries, such as language service industries, intellectual property services, pharmaceutical industries, and information analysis services.”

While it is clear that consumers and individuals worldwide are regularly using MT, the primary enterprise users of MT in China are government agencies and internet-based businesses like eCommerce. This need for translation is now expanding to more enterprises who seek to increase their international business presence and realize that MT can enable and accelerate these initiatives.

The Chinese MT technology leaders in terms of volume and regular user base are the internet services giants (such as Baidu, Tencent, Alibaba, Sogou, Netease) or the AI tech giants (such as iFlyTek). Google Translate and Microsoft Bing Translator are also popular in China since they are free, but they don’t have a large share of the total use if the focus is strictly on MT technology.

When asked to comment on the characteristics and changes in the Chinese MT market, Chungliang said:

“In our understanding, Sogou and iFlytek's primary business focus is the B2C market, and thus both of them develop consumer hardware like personal voice translators. Sogou was recently (July 29, 2020) purchased by Tencent (a major social media player), so we don’t know what will happen next. iFlytek is famous for its Speech-To-Speech technology capabilities. Thus it is natural for them to develop MT, to get the two technologies integrated and grab a larger share of the market.

As for the other important MT players in China, Alibaba MT mainly serves its own globally focused eCommerce business, and Tencent Translate focuses on meeting the translation needs of its users in social networking scenarios. Like Google Translate, Baidu Translate is a portal to attract individual users who might need translation during a search; it also serves to expand Baidu’s influence as a whole. Netease Youdao focuses on the education industry, and the Youdao team integrates the Youdao online dictionary, direct MT, and human translation.

What are the main languages that people/customers translate? As far as we know, the most translated language is English, Japanese is second, followed by Arabic, Korean, Thai, Russian, German, and Spanish.” Of course, this is all direct to and from Chinese.

NiuTrans Focus: The Enterprise

The NiuTrans team learned very early in their operational history, during their startup phase, that their business survival was linked to providing MT services for the enterprise rather than for individual users and consumers. The market for individuals is dominated by offerings like Google Translate and Baidu Translate that are virtually free. In contrast, NiuTrans is focused on meeting enterprise demands for MT, which often means deploying on-premise MT engines and developing custom engines. These enterprises tend to be concentrated around Intellectual Property and Patent services, Pharmaceuticals, Vehicle Manufacturing, IT, Education, and AI companies. For example, NiuTrans builds customized patent-domain MT engines for the China Patent Information Center (CNPAT), a branch of the China National Intellectual Property Administration and a large-scale patent information service based in Beijing.

CNPAT has the largest collection of multilingual parallel patent data and serves ongoing and substantial demands for patent-related MT in various use scenarios, such as patent application filing and examination, patent-related transactions, and patent-based lawsuits. Given the scale of the client’s needs, NiuTrans sends an R&D team on-site to work with CNPAT’s technical team on data processing and cleaning. This data is then used in the NiuTrans.NMT training module to develop patent-domain NMT engines on CNPAT’s on-premise servers. The on-site team also develops custom MT APIs on demand to fit into CNPAT’s current workflow and customer servicing needs.

Besides powering and enabling the specialized translation needs of services like CNPAT, NiuTrans also provides back-end MT services for industrial leaders, including iFlyTek (also an early investor in NiuTrans), (the No. 2 eCommerce business in China), Tencent (the largest social networking company in China), Xiaomi (a leader of smart devices OEMs in China), and Kingsoft (a leader of office software in China).

NiuTrans has an online cloud API that also attracts 100,000+ small and medium enterprises interested in expanding their international operations and business presence. The pricing for these smaller users is based on the volume of characters they translate and is much lower than Google Translate and Baidu Translate prices.

NiuTrans’ Online Cloud User Locations

You can visit the NiuTrans Translate portal online.

NiuTrans writes and maintains its own NMT code base for NiuTrans.NMT rather than using open-source options, and claims to achieve quality comparable to, if not better than, its competitors. Their comparative performance at the WMT19 evaluations suggests that they actually do better than most of their competitors. They are not dependent on TensorFlow, PyTorch, or OpenNMT to build their systems. Today, NiuTrans is a key MT technology provider, especially for enterprises in China.

NiuTrans.NMT is a lightweight and efficient Transformer-based neural machine translation system. Its main features are:

  • Few dependencies. It is implemented with pure C++, and all dependencies are optional.
  • Fast decoding. It supports various decoding acceleration strategies, such as batch pruning and dynamic batch size.
  • Advanced NMT models, such as Deep Transformer.
  • Flexible running modes. The system can be run on various systems and devices (Linux vs. Windows, CPUs vs. GPUs, FP32 vs. FP16, etc.).
  • Framework agnostic. It supports various models trained with other tools, e.g., Fairseq models.
  • The code is simple and friendly to beginners.

When I probed into why NiuTrans had chosen to develop their own NMT technology rather than use the widely accepted open-source solutions, I was provided with a history of the company and its evolution through various approaches to developing MT technology.

The NiuTrans team originated in the NLP Lab at Northeastern University, China (NEUNLP Lab), a machine translation research leader in the Chinese academic world going as far back as 1980. Like many elsewhere in the world, the team initially studied rule-based MT, from 1980 to 2005. In 2006, Professor Jingbo Zhu (the current Chairman of NiuTrans) returned from a year-long visit to ISI-USC and decided to switch to statistical MT research, working together with Tong Xiao, who was a fresh graduate student at the time and is now the CEO of NiuTrans. They made rapid strides in SMT research, releasing the first version of NiuTrans.SMT open source in 2011. At that time, Chinese academia primarily used Moses to conduct MT-related research and develop MT engines. The development of the NiuTrans.SMT open source proved that Chinese engineers could do as well as, or even better than, Moses, and also helped to showcase the strength and competence of the NiuTrans team. Thus, in 2012, confident in their MT technology and armed with a dream to connect the world with MT, the NiuTrans team decided to form an MT company, converting 30+ years of MT research into MT software for industrial use.

Given their origins in academia, they kept a close watch on MT research and breakthroughs worldwide and noticed in 2014 that there was a growing base of research being done with neural network-based deep learning models. Therefore, the NiuTrans team started studying deep learning technologies in 2015 and released its first version of NiuTrans.NMT in December 2016, just three months after Google announced the release of its first NMT engines.

NiuTrans prefers to avoid open-source MT platforms like TensorFlow, PyTorch, or OpenNMT, as the team has developed deep competence in MT technology over 40 years of engagement. The leadership believes there are specific advantages to building the whole MT technology stack and intends to continue with this basic development strategy. As an example, Chungliang pointed me to NiuTensor, their own deep learning tool, and the NiuTrans.NMT open-source release. They are confident that they can keep pace with continuous improvements in open source with support from the NEUNLP Lab, which has eight permanent staff and 40+ Ph.D./MS students focusing on MT issues of relevance and interest for their overall mission. This group also allows NiuTrans to stay abreast of research being done worldwide.

NiuTrans understands that a critical requirement for an enterprise user is to adapt and customize the MT system to enterprise-specific terminology or use. Thus, it provides both a user terminology module to introduce user terminology into the MT system and a user translation memory module to introduce the users’ sentence pairs to tune the MT system. Another more sophisticated solution is incremental training. They incorporate user data to modify the NiuTrans model parameters to get the MT model better adjusted to user data features.
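NiuTrans has not published how its terminology module works internally, but a common way such modules are implemented in NMT systems is placeholder substitution: protect known source terms before decoding, then re-insert the required target terms afterward. The sketch below illustrates that general technique only; `term_map` and all names are invented, and the MT engine itself is stubbed out:

```python
# Illustrative placeholder-substitution approach to terminology enforcement.
# This is NOT NiuTrans' published mechanism; term_map and every name here
# are invented for illustration.
term_map = {"neural network": "神经网络"}  # source term -> required target term

def protect_terms(source: str, terms: dict) -> tuple:
    """Replace known source terms with placeholder tokens before decoding."""
    slots = {}
    for i, (src, tgt) in enumerate(terms.items()):
        placeholder = f"__TERM{i}__"
        if src in source:
            source = source.replace(src, placeholder)
            slots[placeholder] = tgt
    return source, slots

def restore_terms(translation: str, slots: dict) -> str:
    """Re-insert the required target terms after decoding."""
    for placeholder, tgt in slots.items():
        translation = translation.replace(placeholder, tgt)
    return translation

masked, slots = protect_terms("train the neural network model", term_map)
# A real MT engine would translate `masked`, copying the placeholder through;
# here we restore directly on the masked string to show the round trip.
print(restore_terms(masked, slots))
```

The incremental-training approach mentioned above is a heavier-weight alternative: instead of masking terms at decode time, the user's data adjusts the model parameters themselves.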

NiuTrans also gathers post-editing feedback on critical language pairs like ZH <> EN and ZH <> JP on an ongoing basis, then analyzes error patterns to drive continuing engine performance improvements.

Quality Improvement, Data Security, and Deployment

NiuTrans evaluates MT system performance using BLEU together with a human evaluation technique that ranks systems relative to one another. They prefer not to use the widely used 5-point scale that assigns an absolute value to a translation. Thus, if they were comparing NiuTrans, Google, and DeepL, they would combine BLEU scores with humans ranking the three systems on the same blind test set.
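The combined approach described above can be sketched in a few lines: a stripped-down corpus BLEU (production evaluations use a tool such as sacreBLEU, not this toy) plus mean-rank aggregation of per-segment human rankings. The data and system names are invented:

```python
import math
from collections import Counter

def _ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hyps, refs, max_n=4):
    """Minimal corpus BLEU: clipped n-gram precision plus brevity penalty.
    Illustrative only; real evaluations use sacreBLEU or similar."""
    num, den = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len, ref_len = hyp_len + len(h), ref_len + len(r)
        for n in range(1, max_n + 1):
            hc, rc = _ngrams(h, n), _ngrams(r, n)
            num[n - 1] += sum(min(c, rc[g]) for g, c in hc.items())  # clipped matches
            den[n - 1] += sum(hc.values())
    if min(num) == 0 or min(den) == 0:
        return 0.0
    log_p = sum(math.log(n / d) for n, d in zip(num, den)) / max_n
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_p)

def mean_ranks(judgments):
    """judgments: one {system: rank} dict per segment, rank 1 = best."""
    totals = Counter()
    for seg in judgments:
        totals.update(seg)
    return {name: totals[name] / len(judgments) for name in totals}

ref = ["the cat sat on the mat"]
print(corpus_bleu(["the cat sat on the mat"], ref))  # perfect match -> 1.0
ranks = mean_ranks([{"A": 1, "B": 2}, {"A": 2, "B": 1}, {"A": 1, "B": 2}])
print(sorted(ranks, key=ranks.get))  # systems ordered best-first
```

The appeal of relative ranking over an absolute 5-point scale is that annotators agree far more readily on "which of these is better" than on "how good is this, exactly."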

NiuTrans also has an ongoing program to improve its MT engines continually. They do this in three different ways:

  1. Firstly, as the company has a strong research team that is continually experimenting and evaluating new research, the impact of this research is continuously tested to determine if it can be incorporated into the existing model framework. This kind of significant technical innovation is added into the model two or three times a year.
  2. Secondly, customer feedback, ongoing error analysis, or specialized human evaluation feedback also trigger regular updates to the most important MT systems (e.g. ZH<>EN) at least once a month.
  3. Thirdly, engines will be updated as new data is discovered, gathered, or provided by new clients. High-quality training data is always sought after and considered valuable to drive ongoing MT system improvements.

NiuTrans has performed well in comparative evaluations of their MT systems against other academic and large online MT solutions. Here is a summary of the results from WMT19. They report that their performance in WMT20 is also excellent, but final results have not yet been published.

NiuTrans training data comes mainly from two sources: data crawling and data purchase from reliable vendors.

NiuTrans uses crawlers to collect the parallel texts from the websites that do not prohibit or prevent this, e.g., some Chinese government agencies’ websites that often provide data in several languages. They also buy parallel sentences (TM) and dictionaries from specific data provider companies, who might require signing an agreement, specifying that the data provider retains the intellectual property rights of the data.

NiuTrans gets the bulk of its revenue from security-conscious customers who deploy its MT systems on-premise. However, NiuTrans is also working on an Open Cloud offering, allowing customers to access an online API and avoid installing the infrastructure needed for on-premise systems. The Open Cloud is a more cost-effective option for smaller SME companies, and NiuTrans has seen rapid adoption of this new deployment in specific market segments.

International customers, especially the larger ones, much prefer to deploy their NiuTrans MT systems on-premise. For those international customers who cannot afford on-premise systems, the NiuTrans Open Cloud solution is an option. This system is deployed on the Alibaba Cloud, which is governed by Chinese internet security laws that require user data to be kept for six months before deletion. The company plans to build another cloud service on the Amazon Cloud for international customers who have data security concerns. This new capability will allow users to encrypt their data locally and transfer it securely to the Amazon Cloud. NiuTrans will then decrypt the source data on its servers, translate it, and finally delete all the user data and the corresponding translation results once the source data has been translated.
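The planned flow described above can be sketched end to end. This is a toy protocol illustration only: a random one-time XOR key stands in for real encryption (a production deployment would use authenticated encryption such as AES-GCM, with proper key exchange), the MT step is stubbed, and every name is invented:

```python
import secrets

# Toy sketch of the described encrypt -> upload -> decrypt -> translate -> delete
# flow. XOR with a random one-time key is a stand-in for real encryption; a real
# system would use AES-GCM or similar, with a proper key-exchange mechanism.
def xor_bytes(data: bytes, key: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, key))

# 1. The client encrypts locally before upload.
source_text = "Translate this confidential document.".encode("utf-8")
key = secrets.token_bytes(len(source_text))  # shared with the server out-of-band
ciphertext = xor_bytes(source_text, key)

# 2. Only ciphertext crosses the network; cloud storage never sees plaintext.

# 3. The server decrypts, translates (stubbed here), then deletes everything.
recovered = xor_bytes(ciphertext, key)
translation = f"[MT of {len(recovered)} bytes]"  # stand-in for the MT engine
del recovered, ciphertext, key                   # "delete after translation"
print(translation)
```

The point of the design is that plaintext exists only momentarily on the translation server and never persists in cloud storage, which is what addresses the data-retention concern raised above.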

NiuTrans currently has 100+ employees, directed by Dr. Jingbo Zhu and Dr. Tong Xiao, two leading MT scientists in China. Shenyang is home to the company’s headquarters and R&D team. Technical support and services are currently available in Beijing, Shanghai, Hangzhou, Chengdu, and Shenzhen, and the company is now exploring entering the Japanese market with the assistance of partners in Tokyo and Osaka. While NiuTrans is not a well-known name in the US/EU translation industry today, I suspect they will become an increasingly well-known provider of enterprise MT technology in the future.