Monday, July 27, 2020

Observations on the Translation Industry

This is a guest post by a frequent contributor on this blog: Luigi Muzii. Here he shares observations on some key trends in the professional translation industry. His observations are presented as pieces of a jigsaw puzzle and readers can connect them or not as they wish. His opinions are his own, but I like to include them on this platform as they often ring true and show a keener sense of observation than we typically find in the localization media.

He and I have both been saying for many years that disintermediation and disruption are coming to the industry, but we have yet to see a real, fundamental change in the way things are done. This may be because the industry is highly fragmented, and its inertia requires much more force to enable the needed structural change. There has been some change, but it has been slow and incremental. Or, quite possibly, we are both simply wrong in this prediction of inevitable disruption.

After considering his observations here again, I think that, perhaps, the timing is simply hard to predict. MT has taken over a decade to even moderately penetrate the industry, and it is my opinion that it is still most often sub-optimally or wrongly used in the localization world. For real disintermediation to take place, tools, processes, and solutions all have to evolve and align together in a meaningful way.

Luigi often points to the practice of emphasizing the wrong aspects of the business challenges in the industry in many of his observations. This little clip makes this clear for those who still find his observations somewhat opaque.

“The reason why it is so difficult for existing firms to capitalize on disruptive innovations is that their processes and their business model that make them good at the existing business actually make them bad at competing for the disruption.”

“'Disruption' is, at its core, a really powerful idea. Everyone hijacks the idea to do whatever they want now. It's the same way people hijacked the word 'paradigm' to justify lame things they're trying to sell to mankind.”
Clay Christensen

“Life’s too short to build something nobody wants.”
Ash Maurya

“If you always do what you always did, you will always get what you always got.”
Albert Einstein

In the last week or so, there has been much clamor about the "magical" and "astounding" GPT-3 capabilities that can "create" and generate text by drawing from a HUGE language model. More data equals better AI, right? They say that GPT-3 is different because it creates. GPT-3 is a text-generation API. You give it a topic, and it spits back a (hopefully) coherent passage. It learns over time, tracking not just what it thinks your topic is about, but how you talk about that topic. 
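Mechanically, the "give it a topic, get back a passage" workflow described above is just an API call. The sketch below assembles a completion-style request payload; the field names (prompt, max_tokens, temperature) follow the completion APIs of the time, but treat them as illustrative assumptions rather than an official client.

```python
import json

def build_completion_request(prompt, max_tokens=100, temperature=0.7):
    """Assemble a JSON payload for a GPT-3-style text-completion call.

    Field names follow 2020-era completion APIs; they are illustrative
    assumptions, not an official client library.
    """
    return json.dumps({
        "prompt": prompt,            # the topic or opening text
        "max_tokens": max_tokens,    # upper bound on the generated passage
        "temperature": temperature,  # higher values yield more varied text
    })

payload = build_completion_request("Observations on the translation industry:")
```

An actual call would POST this payload, with an API key, to the provider's completions endpoint and read the generated passage from the response.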

Some of the examples of GPT-3 intelligence being shared in the Twitterverse are truly remarkable, but while I am indeed impressed, I think we should also maintain some skepticism about this "breakthrough" until we better understand the limitations. I will not be surprised to see overenthusiastic feedback from the LSP industry just as we saw with NMT. This thread has some interesting discussion and varied viewpoints on GPT-3.   

My initial impression is that it is indeed a great leap forward, but it has two very serious flaws that come immediately to mind:
  1. It lacks common sense as does all deep learning based AI that I have seen,
  2. It is unable to admit that it does not know.
However, GPT-3 already appears to have the potential to displace mediocre marketing content producers, just as MT displaced some mediocre or bad translators. As more competent people test it and play with it, we will uncover the problems it is best suited to address. I look forward to hearing more about the production use of the technology and real use cases.

The difference between stupidity and genius is that genius has its limits. 


A Jigsaw Puzzle

Over the last week or two, several topics have jumbled together in my mind. While they may seem unconnected, I do see a thread that binds them together. Commenting on each of these subjects separately would have meant breaking that thread, so they are presented together here as jigsaw tiles, that the reader may wish to combine to build an overall picture.


Ethnocentrism is the original sin of globalization and one of the capital sins of internationalization. Most often, incorrect localization is like the fruit of the poisonous tree.

Writing full strings with as few variables as possible should be the most basic lesson in a Software Internationalization 101 course.
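A minimal sketch of that lesson in code, with a toy locale catalog standing in for a real i18n framework: concatenating fragments hard-codes English word order, while one full externalized string with named placeholders lets the translator reorder freely.

```python
# Anti-pattern: building a sentence from fragments forces every target
# language into English word order and leaves translators with scraps.
def status_bad(n, doc):
    return "You have " + str(n) + " new comments on " + doc + "."

# Better: one full string per locale with named placeholders, which a
# translator can reorder as the target grammar requires.
MESSAGES = {
    "en": "You have {count} new comments on {document}.",
    "de": "Sie haben {count} neue Kommentare zu {document}.",
}

def status_good(locale, n, doc):
    return MESSAGES[locale].format(count=n, document=doc)

print(status_good("de", 3, "Bericht.docx"))
# → Sie haben 3 neue Kommentare zu Bericht.docx.
```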

Context helps, syntactic gimmicks don’t.

Using an active voice is always better than using a passive one.

Gender issues should be left to localizers. Paying too much attention to using gender-neutral forms and words in strings (and content in general) won’t help translators do their job. On the contrary, it will make the job harder, forcing translators to develop solutions that hardly sound as natural as the neutral English source material does. These translator modifications are not necessarily as neutral in another language, especially when an ending vowel can make a difference.

Beyond being a silly stereotype, “thinking outside the binary box” to prevent using gendered language does not necessarily lead to effective communication.

Removing pork or beef from menus will not, per se, help increase restaurant sales in Muslim or Hindu countries. However, redesigning the menu probably will. And this is a fundamental lesson in Globalization 101.

All this reminds me of the launch of Windows 95, when you consider the initial localization attempts for the “Start” button and the sudden abandonment of the “Start Me Up” guitar chord as the accompanying jingle, which, of course, makes much less sense in non-Anglo cultures.

Ethnocentrism can appear even in a theoretically unbiased approach to writing. Being a linguist does not necessarily mean that one is also open, inclusive, and global. The editor of a historically popular trade magazine, himself a translator, was also a prominent figure in the formation of the not-so-inclusive UKIP.

Inclusive language is something localizers and translators need no specific guide for. Sexist, racist, or otherwise biased, prejudiced language and ideas cannot be prevented from spreading, and translators have to deal with this daily. And they know how to cope with this phenomenon. Most importantly, they know how not to be influenced by this in doing their job. It’s called ethics.

It is wise, though, to request that vendors notify customers whenever they find language that isn’t inclusive, at least when inclusiveness is a prerequisite. A customer’s task requirements guidelines should clarify whether a translator should keep the non-inclusive language intact — requirement specifications: such strange stuff.
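As a hedged sketch of what such vendor-side notification could look like mechanically, assuming a customer-supplied term list (the terms and suggested alternatives below are hypothetical placeholders, not any real guideline):

```python
import re

# Hypothetical customer-supplied list mapping flagged terms to suggested
# inclusive alternatives. Real lists come from the buyer's requirement
# specifications, not from the vendor.
FLAGGED_TERMS = {
    "chairman": "chairperson",
    "manpower": "workforce",
    "whitelist": "allowlist",
}

def flag_non_inclusive(text):
    """Return (term, suggestion, position) tuples for customer review.

    This only notifies; whether to keep the source wording intact
    remains the customer's call, per their requirement specifications.
    """
    hits = []
    for term, suggestion in FLAGGED_TERMS.items():
        for m in re.finditer(r"\b" + term + r"\b", text, re.IGNORECASE):
            hits.append((m.group(0), suggestion, m.start()))
    return sorted(hits, key=lambda h: h[2])

print(flag_non_inclusive("The chairman approved more manpower."))
```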

Guidelines on using inclusive language may be useful for authors when machine translation is going to be involved. Much too often, people prefer to ignore that bias in AI and MT doesn’t come from algorithms but from the people who developed the technology, and it reflects their values. Biased preference comes from training data even more than from input data. Training data are the examples from which computers learn patterns and build predictive models, and this historical data usually comes from real examples of past human and social attitudes.

Pandemic Crisis ‘Secondary’ Effects

The effects of the ongoing pandemic allow for different readings; some of these readings concern the broadening of the gig economy.

According to recent reports, the gig economy is taking over the enterprise. That more employees opt for a more flexible work structure may be one reading. Another is that the gig economy is invaluable for organizations seeking to streamline and reduce costs.

Gig jobs are no longer limited to lower-paying work performed on demand, and it seems that organizations have started taking advantage of more valuable employees. Gig jobs in the white-collar world have significantly increased in the past few years: 72 percent of all gig jobs worldwide between 2018 and 2019 were in large enterprise and professional services firms, and, according to Deloitte, gig workers in the US are set to triple to 42 million in 2020.

Quoting Gigster’s CEO Chris Keene, “Companies have always valued the ability to increase capacity without increasing costs.”

The impact of the gig economy on professionals that very few seem to see is that it exploits the demand for jobs to push remuneration lower and lower. No one pays attention to building a meritocracy: performance ratings and rankings are just truncheons.

What remains of the gig economy is a blessing for post-pandemic corporate recovery: companies can avoid hiring back thousands of full-time employees who were laid off or furloughed. Quoting Chris Keene again, “Coming out of this pandemic, there are a lot of jobs that people are not going to be able to come back to.” The pandemic crisis has pushed the gig economy a decade forward and driven capitalism and its mission to a peak, i.e., increase profits and reduce costs to the maximum possible level.

This cost-reduction focus is an unrelenting mission, as the recent coronavirus outbreaks in German slaughterhouses showed. The specific impact of the cost-reduction focus, in this case, was to force close contact among workers in the feverish working conditions needed to produce cheap meat. Of course, reports showed that otherwise despised migrants provided almost all the cheap labor. The German NGG union spoke of “shameful and inhumane conditions.”

The usual justification is that better working conditions involve higher prices. But are low prices really low? Higher prices always hide behind low prices.


Now that machine translation is finally mainstream [in the translation services industry], nobody questions its use anymore. Still, the debate around MT use has bogged down in the same quagmire as the debate around localization and translation in general. This means that, as Kirti Vashee wittily notes, “the quality discussion remains muddy.”

Translation industry attention focuses mostly on edit distance, post-editing effort assessment, post-editing practices, and overall effectiveness measurement. Not surprisingly, discussions focus primarily, if not exclusively, on assessing the quality of machine translation output rather than on how to improve overall MT system capabilities, and shoddy tools like DQF receive all too much consideration.
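For reference, the edit-distance metrics mentioned above typically reduce to a Levenshtein distance between the raw MT output and the post-edited result, normalized by length. A minimal word-level sketch (a generic illustration, not any particular tool's implementation):

```python
def levenshtein(a, b):
    """Word-level Levenshtein distance via dynamic programming."""
    a, b = a.split(), b.split()
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        curr = [i]
        for j, wb in enumerate(b, 1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution
        prev = curr
    return prev[-1]

def post_edit_distance(mt_output, post_edited):
    """Normalized edit distance, a rough proxy for post-editing effort."""
    edits = levenshtein(mt_output, post_edited)
    return edits / max(len(post_edited.split()), 1)

# One inserted word out of six: a low-effort edit.
print(post_edit_distance("the cat sat on mat", "the cat sat on the mat"))
```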

Indeed, data and its understanding draw little or no interest, despite the clear enterprise market interest in an MT offering. This lack of focus is due in part to the fact that the LSP MT offering is not transparent, is unconvincing, and is often poorly focused. Despite the interest of enterprise customers in MT, the relatively good performance of (almost) free online MT engines creates a disincentive for LSPs to invest. LSPs are reluctant to explore a territory that seems outside their traditional scope of business and expertise.

Helping machine translation systems handle inclusive language is not just a matter of focus on training data, just as producing good content downstream is not just a matter of effective post-editing practices.

Preemptive quality assessment (or a priori risk assessment, as some call it) is only as effective as the training data is useful. Error detection and correction capabilities are also crucial, at least as long as quality assessment still depends heavily on inspections.

Information asymmetry also applies to machine translation. Estimating risk only for the output, without taking into account the source data, the process conditions (especially buyer requirements), and the expected results, does not raise high hopes per se. If you are unable to measure these three parameters according to consistent and parallel metrics and produce a weighted mean, you will face misleading estimates. Last but not least, insistence on segment-based rather than document-based analysis will not get you out of the narrow enclave in which the translation community has been basking for centuries.
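A hedged sketch of what such a weighted mean over the three parameters could look like, assuming all three are expressed on the same 0-to-1 risk scale (the weights are invented for illustration, not an industry standard):

```python
def translation_risk(source_quality, process_fit, output_confidence,
                     weights=(0.4, 0.3, 0.3)):
    """Weighted mean of three risk parameters, each on a 0-1 scale
    where 1 means highest risk.

    The parameters mirror the three factors above (source data, process
    conditions including buyer requirements, expected results); the
    weights are illustrative placeholders.
    """
    scores = (source_quality, process_fit, output_confidence)
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(w * s for w, s in zip(weights, scores))

# Noisy source, well-specified process, mediocre output estimate:
print(round(translation_risk(0.7, 0.2, 0.5), 2))  # → 0.49
```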

Disintermediation Is Not A Vending Machine

And no ATM either.

At the WWDC 2020, Apple revealed that version 14 of iOS would come with a translation app specifically designed to translate conversations in 11 languages. An on-device mode will also be available to allow offline translations.

Should this be interpreted as another sign of the imminent end of the translation industry? The industry is most probably doomed, but its end is not set to come tomorrow.

The end of the industry will come from disintermediation. Some, including “yours truly,” have been writing (and talking) about this happening for a decade. Others are speaking more quietly about this more recently. More precisely, the usual suspects made some enthusiastic, although scanty, comments when Lionbridge launched its BPaaS platform, onDemand, five years ago or so.

Recently, though somewhat belatedly, SDL has struck back with its self-service, on-demand platform, SLATE.

Disintermediation is almost inexorable in the evolution of the global (digital) village, where intermediaries are generally seen as the villain. However, they are everywhere online, despite the common belief that they are not (e.g., Airbnb, Amazon, eBay, Expedia, Instacart, Uber, the food delivery companies, the app stores, just to name a few). Following the typical marketing model of rechristening old things by giving them glamorous or more palatable names, they are simply renamed as two-sided markets.

Incidentally, Lionbridge’s onDemand was quickly, abruptly, and mysteriously discontinued despite its boasted growth of 68 percent in one year, with reportedly impressive scores: a 99.8 percent on-time delivery rate, a 99.4 percent revision-free project rate, and 99 percent of users satisfied or very satisfied (85 percent very satisfied) with customer care.

Lionbridge’s onDemand was meant to turn language services into items that could be bought through an e-catalog via a procure-to-pay system.

This “productization” approach involved standardizing options and making pricing instant. The idea behind it was to entice business stakeholders with 24/7 access, faster turnaround times and lower prices, while providing higher visibility into total-cost-per-output and rate-card negotiations, thus curbing the vendors’ role and their ability to add fees and lengthen lead times.

The pricing model was the traditional word-rate model, while for its self-service platform, SDL offers a subscription model (SLA, anyone?).
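A back-of-envelope sketch of how the two pricing models compare at different volumes; every figure below is a hypothetical placeholder, not an actual Lionbridge or SDL rate:

```python
def word_rate_cost(words, rate_per_word=0.10):
    """Traditional per-word pricing; the rate is a hypothetical placeholder."""
    return words * rate_per_word

def subscription_cost(words, monthly_fee=500.0, included_words=10_000,
                      overage_rate=0.08):
    """Flat monthly fee with an included volume; all figures hypothetical."""
    overage = max(words - included_words, 0)
    return monthly_fee + overage * overage_rate

# At these toy numbers the models cross over at 5,000 words per month:
# below it the word rate is cheaper, well above it the subscription wins.
for volume in (2_000, 5_000, 20_000):
    print(volume, word_rate_cost(volume), subscription_cost(volume))
```

The design difference matters: a word rate scales linearly with each job, while a subscription decouples revenue from volume, which is one way a platform can sidestep the vending-machine treadmill discussed below.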

Today’s fundamental question is the same as then: Who and what are these platforms for?

As Semir Mehadžić brilliantly noted, beyond the aim of ‘cutting out the middleman,’ which rings childish coming from typical middlemen, a BPaaS should come up with a better value proposition than the one currently used, i.e., “fewer clicks” and “avoiding the use of Google Translate.”

From this perspective, self-service translation platforms may entice consumers, but hardly any businesses.

The businesses such platforms can entice are typically new to translation and the translation industry: usually, companies entering international markets for the first time. Such companies generally go down a long and painful road of word of mouth and web searches to find a vendor that suits their needs. Then inquiries and quotes follow, leaving the business managers puzzled and hesitant, with their heads spinning and aching. The many quotes collected differ substantially from one another, and all look invariably too costly, mainly because the service offered is essentially the same.

Therefore, if the ideal recipient for a self-service platform is the consumer (e.g., One-Hour Translations, Lingo24, Gengo, tolingo, etc.), SDL’s offering, with its SLA-like model, is aimed at SMEs while saving on sales and account management costs, probably because SMEs would typically not approach a large LSP, presuming that they would not find the same responsiveness, flexibility, and speed.

And this only happens if everything goes well, because those SME managers described above might easily bump into an LSP salesperson who tries to educate the prospective customer about the intrinsic value of translation and the wonders of CAT tools and TMSs. Unfortunately, there is no inherent value in service offerings, only a perceived one, and while the prospective customer knows this, maybe the salesperson does not. And the selling effort is thus burnt.

Therefore, self-service translation platforms might target the consumer market, where SMEs with occasional translation jobs can also be found. However, to reach the consumer market, substantial investments are required to stay on top of SERPs and get the necessary conversions. The future effect of these platforms may thus be to further accelerate the commoditization of translation services.

Self-service translation platforms in particular should fear this unintended impact, as it means they will need to sell more and more translations just to stay even in revenue terms. The situation is similar to the dilemma of vending machine suppliers, who need to continuously sell more machines and find cheaper and cheaper products to stock them with.

Anyway, all of these DIY instant translation platforms look as if their designers know little about how the need for translation arises in business, or how translations are performed and delivered. More importantly, they look as if they don't know - or even care - about customer satisfaction and how it is expressed and assessed.

It is my observation that these allegedly “new offerings” are usually just a response to the same offering from competitors. They should not be equated with disintermediation, and they often backfire, both in terms of business impact and brand-image deterioration. They all look like dubious, unsound initiatives instigated by Dilbert’s pointy-haired boss. The Peter principle rules again here, and should be considered together with Cipolla’s laws of stupidity, which state that a stupid person is more dangerous than a pillager and often does more damage to the general welfare of others.


Luigi Muzii has been in the "translation business" since 1982 and has been a business consultant in the translation and localization industry since 2002, through his firm. He focuses on helping customers choose and implement the best-suited technologies and redesign their business processes for the greatest effectiveness of translation and localization work.

This link provides access to his other blog posts.

Monday, June 29, 2020

Understanding Data Security with Microsoft Translator

In this time of the pandemic, many experts have pointed out that enterprises that have a broad and comprehensive digital presence are more likely to survive and thrive in these challenging times. The pandemic has showcased the value of digital operating models and is likely to force many companies to speed up their digital innovation and transformation initiatives. The digital transformation challenge for a global enterprise is even greater, as the need to share content and expand the enterprise's digital presence is massively multilingual, thus putting it beyond the reach of most localization departments who have a much narrower and much more limited focus. 

Thus, today we are seeing that truly global enterprises and agencies have a growing need to make large volumes of flowing content multilingual, to make communication, problem resolution, collaboration, and knowledge sharing possible, within and without the organization. Most often this needs to be as close to real-time as possible. The greater the enterprise commitment to digital transformation, the greater the need and urgency. Sophisticated, state-of-the-art machine translation enables multilingual communication and content sharing to happen at scale across many languages in real-time, and is thus becoming an increasingly important core component of enterprise information technology infrastructure. Enterprise-tailored MT is now increasingly a must-have for the digitally agile global enterprise.

However, MT is extremely complex and is best handled by focused, well-funded, and committed experts who build unique competence over many years of experience. Many in the business translation world dabble with open-source tools and build mostly sub-optimal systems that do not reach the capabilities of generic public systems, thus creating friction and resistance from translators who are well aware of this shortcoming. MT system development still remains a challenge for even the biggest and brightest, and thus, in my opinion, is best left to committed experts.

Given the confidential, privileged and mission-critical nature of the content that is increasingly passing through MT systems today, the issue of data security and privacy is becoming a major factor in the selection of MT systems by enterprises concerned with being digitally agile, but who also wish to ensure that their confidential data is not used by MT technology providers to refine, train, and further improve their MT technology. 

While some believe that the only way to accomplish true security is by building your own on-premise MT systems, this task as I have often said, is best left to large companies with well-funded and long-term committed experts. Do-it-yourself (DIY) technology with open source options makes little sense if you don't really know, understand, and follow what you are doing with technology this complex.

It is my feeling that MT is a technology that truly belongs in the cloud for enterprise use, and usually makes more sense on mobile devices for consumer use. While in some rare cases on-premise MT systems do make sense for truly massive-scale users like national security agencies (CIA, NSA), which can appropriate the resources to do it competently, for most commercial enterprises MT provides the greatest ROI when it is delivered and implemented in the cloud by an expert and focused team that does not have to re-invent the wheel. Customization on a robust and reliable expert MT foundation appears to be the optimal approach. MT is also a technology that is constantly evolving as new tools, algorithms, data, and processes come to light to enable ongoing incremental improvements, and this too suggests that MT is better suited to cloud deployment. Neural MT requires relatively large computing resources, deep expertise, and significant data resources and management capabilities to be viable. All these factors point to cloud deployment being the best fit for MT, as it essentially remains a work-in-progress, but I am aware that many still disagree, and that cloud versus on-premise is an issue where it is best to agree to disagree.

I recently sat down with Chris Wendt, Group Program Manager and others in his team responsible for Microsoft Translator services, including Bing Translator and Skype Translator. They also connect Microsoft’s research activities with its practical use in services and applications. My intent in our conversation was to specifically investigate, better understand, and clarify the MT data security issues and the many adaptation capabilities that they offer to enterprise customers, as I am aware that the actual facts are often misrepresented, misunderstood, or unclear to many potential users and customers.

Microsoft is a pioneer in the use of MT to serve the technical support information needs of a global customer base, and was the first to make massive support knowledge bases available in machine-translated local languages for their largest international markets. They were also very early users of Statistical MT (SMT) at scale (tens of millions of words translated for millions of users) and were building actively used systems around the same time that Language Weaver was commercializing SMT. Many of us are aware that the Microsoft Translator services are used both by large enterprises and many LSP agencies in the language services industry because of the relative ease of use, straightforward adaptation capabilities, and relatively low cost. Among the public MT portals, Microsoft is second only to Google in terms of MT traffic, and their consumer solutions on web and mobile platforms are probably used by millions of users across the world on a daily basis.

It is important to differentiate between Microsoft’s consumer products and their commercial products when considering the data security policies that are in place when using their machine translation capabilities, as they are quite different.

Consumer Products: 

The consumer products are Bing, the Edge browser, and the Microsoft Translator app for the phone. These products run under the consumer terms of use, which make it possible for Microsoft to use the processed data for quality-improvement purposes. Microsoft keeps a very small portion of the data, non-consecutive sentences, stored without any information about the customer who submitted the translation. There is really nothing to learn from performing a translation. The value only comes when the data is annotated and then used as test or training data. The annotation is expensive, so only a few thousand sentences per language are used every year, at most.

Some people read the consumer terms of use and assume the same applies to commercial enterprise products.

That is not the case.

Enterprise Products:

The Translator API is provided via an Azure subscription, which runs under the Azure terms of use. The Azure terms of use do not allow Microsoft to see any of the data being processed. Azure services generally run as a GDPR processor, and Translator ensures compliance by not ever writing translated content to persistent storage.

The typical process flow for a submitted translation is as follows:

Decrypt > translate > encrypt > send back > forget.

The Translator API only allows encrypted access, to ensure data is safe in transit. When using the global endpoint, the request will be processed in the nearest available data center. The customer can also control the specific processing location by choosing a geography-specific endpoint from ten locations, which are described here.
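That flow can be made concrete with a sketch of request construction against the Translator v3 REST API. The hostnames and field names below follow Microsoft's documented v3 interface, but the geography map is partial and should be checked against the current documentation:

```python
import json

# Partial map of geography-specific Translator endpoints; the global
# endpoint routes to the nearest available data center. Verify the full
# list of ten locations against Microsoft's documentation.
ENDPOINTS = {
    "global": "https://api.cognitive.microsofttranslator.com",
    "europe": "https://api-eur.cognitive.microsofttranslator.com",
    "americas": "https://api-nam.cognitive.microsofttranslator.com",
}

def build_translate_request(text, to_lang, geography="global"):
    """Build the URL and JSON body for a Translator v3 /translate call.

    The call itself travels over HTTPS only, so content is encrypted in
    transit; the subscription key and region would go in the
    Ocp-Apim-Subscription-* headers, omitted here.
    """
    url = f"{ENDPOINTS[geography]}/translate?api-version=3.0&to={to_lang}"
    body = json.dumps([{"Text": text}])  # v3 expects a JSON array of objects
    return url, body

url, body = build_translate_request("Hello, world", "de", geography="europe")
```

Choosing a geography-specific endpoint is how a customer pins processing to a region, for example to keep data inside the EU for data-protection reasons.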

Microsoft Translator is certified for compliance with the GDPR processor and confidentiality rules. It is also compliant with all of the following:

CSA STAR: The Cloud Security Alliance (CSA) defines best practices to help ensure a more secure cloud computing environment, and to help potential cloud customers make informed decisions when transitioning their IT operations to the cloud. The CSA published a suite of tools to assess cloud IT operations: the CSA Governance, Risk Management, and Compliance (GRC) Stack. It was designed to help cloud customers assess how cloud service providers follow industry best practices and standards and comply with regulations. Translator has received CSA STAR Attestation.

FedRAMP: The US Federal Risk and Authorization Management Program (FedRAMP) attests that Microsoft Translator adheres to the security requirements needed for use by US government agencies. The US Office of Management and Budget requires all executive federal agencies to use FedRAMP to validate the security of cloud services. Translator is rated as FedRAMP High in both the Azure public cloud and the dedicated Azure Government cloud. 

GDPR: The General Data Protection Regulation (GDPR) is a European Union regulation regarding data protection and privacy for individuals within the European Union and the European Economic Area. Translator is GDPR compliant as a data processor.

HIPAA: The Translator service complies with the US Health Insurance Portability and Accountability Act (HIPAA) and the Health Information Technology for Economic and Clinical Health (HITECH) Act, which govern how cloud services can handle personal health information. This ensures that health services can provide translations to clients knowing that personal data is kept private. Translator is included in Microsoft’s HIPAA Business Associate Agreement (BAA). Health care organizations can enter into the BAA with Microsoft to detail each party’s role in regard to security and privacy provisions under HIPAA and HITECH.

HITRUST: The Health Information Trust Alliance (HITRUST) created and maintains the Common Security Framework (CSF), a certifiable framework to help healthcare organizations and their providers demonstrate their security and compliance in a consistent and streamlined manner. Translator is HITRUST CSF certified.

PCI: Payment Card Industry (PCI) is the global certification standard for organizations that store, process, or transmit credit card data. Translator is certified as compliant under PCI DSS version 3.2 at Service Provider Level 1.

SOC: The American Institute of Certified Public Accountants (AICPA) developed the Service Organization Controls (SOC) framework, a standard for controls that safeguard the confidentiality and privacy of information stored and processed in the cloud, primarily in regard to financial statements. Translator is SOC type 1, 2, and 3 compliant. 

US Department of Defense (DoD) Provisional Authorization: US DoD Provisional Authorization enables US federal government customers to deploy highly sensitive data on in-scope Microsoft government cloud services. Translator is rated at Impact Level 4 (IL4) in the government cloud. Impact Level 4 covers Controlled Unclassified Information and other mission-critical data. It may include data designated as For Official Use Only, Law Enforcement Sensitive, or Sensitive Security Information.

ISO: Translator is ISO certified with five certifications applicable to the service. The International Organization for Standardization (ISO) is an independent nongovernmental organization and the world’s largest developer of voluntary international standards. Translator’s ISO certifications demonstrate its commitment to providing a consistent and secure service. Translator’s ISO certifications are:

    • ISO 27001: Information Security Management Standards
    • ISO 9001:2015: Quality Management Systems Standards
    • ISO 27018:2014: Code of Practice for Protecting Personal Data in the Cloud
    • ISO 20000-1:2011: Information Technology Service Management
    • ISO 27017:2015: Code of Practice for Information Security Controls

The Translator service is subject to annual audits on all of its certifications to ensure the service continues to be compliant.

These standards force Microsoft to review every change to the live site with two employees, and to enforce minimal access to the runtime environment, as well as having processes in place to protect against external attacks on the data center hardware and software. The standards that Microsoft Translator is certified for, or compliant with, include specific ones for the financial industry and health care providers.

Different from the content submitted for translation, the documents the customer uses to train a custom system are stored on a Microsoft server. Microsoft doesn’t see the data and can’t use it for any purpose other than building the custom system. The customer can delete the custom system as well as the training data at any time, and there won’t be any residue of the training data on any Microsoft system after deletion, or after account expiration.

Translation in Microsoft's other commercial products, such as Office, Dynamics, Teams, Yammer, and SharePoint, follows the same data security rules described above.

Chris also mentioned that, "German customers have been very hesitant to recognize that trustworthy translation in the cloud is possible for most of the time I have been working on Translator, and I am glad to see that even the Germans are now warming up to the concept of using translation in the cloud." He pointed me to a VW case study that contains the following quote, along with a rationale for the benefits of a cloud-centric translation service to a global enterprise seeking to enable and enhance multilingual communication, collaboration, and knowledge sharing. A deciding factor for the team responsible at VW [in selecting Microsoft] was that none of the data – translation memories, documents to be translated, and trained models – was to leave the European Union (EU) for data protection reasons.

“Ultimately, we expect the Azure environment to provide the same data security as our internal translation portal has offered thus far,”

Tibor Farkas, Head of IT Cloud at Volkswagen

Chris closed with a compelling statement, pointing to the biggest data security problem that exists in business translation: incompetent implementation. Cloud services, properly implemented, can be as secure as any connected on-premise solution, and in my opinion the greatest risk is often introduced by untrustworthy or careless translators who interact with MT systems, or by incompetent IT staff who maintain an MT portal, as past fiascos have shown.

"Your readers may want to consider whether their own computing facilities are equally well secured against data grabbing and whether their language service provider is equally well audited and secured. It matters which cloud service you are using, and how the cloud service protects your data."

While I have not focused much on the speech-to-text issue in this post, we should understand that Microsoft also offers SOTA (state-of-the-art) speech-to-text capabilities and that the Skype and Phone app experience also gives them a leg up on speech-related applications that go across languages.

I also gathered some interesting information on the Microsoft Translator customization and adaptation capabilities and experience. I will write a separate post on that subject once I gather a little more information on the matter.

Tuesday, June 16, 2020

Understanding Machine Translation Quality & Risk Prediction

Much of the commentary available on the use of machine translation (MT) in the translation industry today focuses heavily on assessing MT output quality, comparing TM vs. MT, and overall MTPE process management issues. However, the quality discussion remains muddy, and there are very few clear guidelines for making the use of MT consistently effective. There is much discussion of Edit Distance, post-editing productivity, and measurement processes like DQF, but much less discussion of understanding the training and source corpora and developing strategies to make an MT engine produce better output. While the use of MT in localization use cases continues to expand as generic MT output quality improves, it is worth noting that MT is much more likely to deliver greater value in use cases other than localization.

It is estimated that trillions of words a day are translated by "free" MT portals across the world. This activity suggests a huge need for language translation that goes far beyond the normal focus of the translation and localization industry. While a large portion of this use is by consumers, a growing portion involves enterprise users.

ROI on MT use cases outside of localization tends to be much higher 

The largest users of enterprise MT today tend to be focused on eCommerce and eDiscovery use cases. Alibaba, Amazon, and eBay translate billions of words a month. eDiscovery tends to focus on the following use cases where many thousands of documents and varied data sources have to be quickly reviewed and processed:
  • Cross-border litigation usually related to intellectual property violations, product liability, or contract disputes.
  • Pharmacovigilance (PV or PhV), also known as drug safety: the pharmacological science relating to the collection, detection, assessment, monitoring, and prevention of adverse effects of pharmaceutical products (and of pandemic-like diseases, e.g., Covid-19 incident reports).
  • National Security and Law Enforcement Surveillance of social media and targeted internet activity to identify and expose bad actors involved with drugs, terrorism, and other criminal activities.
  • Corporate information governance and compliance.
  • Customer Experience (CX) related communications and content.

Thus, as the use of MT becomes more strategic and pervasive, we also see a need for new kinds of tools and capabilities that can assist in the optimization process. This is a guest post by Adam Bittlingmayer, a co-founder of ModelFront, which develops a new breed of machine learning-driven tools that assist and enhance MT initiatives across all the use cases described above. ModelFront describes what they do as: in research terms, we've built "massively multilingual black-box deep learning models for quality estimation, quality evaluation, and filtering", and productized it.

I met with Adam Bittlingmayer, co-founder of ModelFront, to talk about predicting translation risk and to share our experience automating translation at scale, at giants like Facebook and Google and at startups like ModelFront, which is developing specialized capabilities to make MT translation risk prediction more efficient and effective.

The tools they provide are valuable in the development of better MT engines by doing the following:
  • Filtering parallel corpora used in training MT engines
  • Comparison of MT engines with detailed error profiles
  • Rapid and more comprehensive translation error detection & correction capabilities
  • Enhanced man-machine collaboration infrastructure that is particularly useful in high volume MT use cases

In his post, Adam provides an explanation of some important terms that are often conflated, and also gives you a feel for MT development from the perspective of someone who has done this at scale for Google. Capabilities like these can help developers add value to any MT initiative, and they go far beyond the simple data cleaning routines that many LSPs use. From my understanding, these tools can help good engines get better, but they are not magic that can suddenly improve the shoddy engines that many LSPs continue to build. Adam provides some comparisons with leading localization industry tools so that a reader can better understand the focus and scope of ModelFront's capabilities.
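For a concrete sense of the baseline, the "simple data cleaning routines" mentioned above usually boil down to a handful of heuristics. A minimal sketch in Python (function names and thresholds are illustrative, not any vendor's actual rules):

```python
def keep_pair(source: str, target: str, max_ratio: float = 2.0) -> bool:
    """Basic sanity checks of the kind common in MT training-data cleaning."""
    src, tgt = source.strip(), target.strip()
    if not src or not tgt:
        return False  # drop empty segments
    if src == tgt:
        return False  # drop untranslated copies
    s_len, t_len = len(src.split()), len(tgt.split())
    if max(s_len, t_len) / min(s_len, t_len) > max_ratio:
        return False  # drop likely misalignments by length ratio
    return True

def clean_corpus(pairs):
    """Deduplicate and filter a list of (source, target) pairs."""
    seen, kept = set(), []
    for src, tgt in pairs:
        if keep_pair(src, tgt) and (src, tgt) not in seen:
            seen.add((src, tgt))
            kept.append((src, tgt))
    return kept

pairs = [
    ("Hello world", "Hallo Welt"),
    ("Hello world", "Hallo Welt"),      # exact duplicate
    ("Click here", "Click here"),       # untranslated copy
    ("Yes", "Ja ja ja ja ja ja ja"),    # length-ratio outlier
]
print(clean_corpus(pairs))  # → [('Hello world', 'Hallo Welt')]
```

Heuristics like these catch only the crudest noise; the machine-learned filtering discussed in this post is aimed at the misaligned-but-plausible pairs that slip through.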

A very key characteristic of the ModelFront platform, beyond the scalability, is the control given to implement high levels of production automation. These processes can be embedded and integrated into MT development and production pipelines to enable better MT outcomes in a much larger range of use cases. Although ModelFront is already being used in Localization MTPE use cases, in a highly automated and integrated translation management workflow, I believe the real potential for added value is beyond the typical localization purview. 

An example of how this kind of technology can add value even in early experimentation can be seen in recent research by Amazon on using quality estimation for subtitling. The researchers categorized subtitle translations into three categories: good translations, which are fine as is and need no further improvement; loose translations, which may require human post-edits; and bad translations, which need a "complete rewrite."

The researchers worked with 30,000 video subtitle files in English and their corresponding translations in French, German, Italian, Portuguese and Spanish for their experiments. They found that their DeepSubQE model was accurate in its estimations more than 91% of the time for all five languages. 
This would mean that human efforts can be focused on a much smaller set of data and thus yield a much better overall quality in less time.


Confidence scoring, quality estimation, and risk prediction

What is the difference between a quality estimation, a confidence score, and translation risk prediction?

Microsoft, Unbabel, and Memsource use the research-standard term quality estimation, while Smartling calls its feature a quality confidence score; there are even research papers talking about confidence estimation or error prediction, and tools that use a fuzzy match percentage. Parallel data filtering approaches like Zipporah or LASER use a quality score or similarity score.

ModelFront uses risk prediction.

They're overlapping concepts and often used interchangeably - the distinctions are as much about tradition, convention, and use case as about inputs and outputs. They are all basically a score from 0.0 to 1.0 or 0% to 100%, at sequence-level precision or greater. Unlike BLEU, HTER, METEOR, or WER, they do not require a golden human reference translation.

We're interested in language, so we know the nuances in naming are important.

Confidence scoring

A machine translation confidence score is typically used for a machine translation's own bet about its own quality on the input sequence. A higher score correlates with higher quality.

It is typically based on internal variables of the translation system - a so-called glassbox approach. So it can't be used to compare systems or to assess human translation or translation memory matches.
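As an illustration, a glassbox confidence score might be derived from the decoder's own per-token probabilities. A toy sketch, assuming we can read those probabilities (real systems compute this in many different ways):

```python
import math

def confidence(token_probs):
    """Glassbox-style confidence: the geometric mean of the decoder's
    per-token probabilities for the output sequence (length-normalized
    so short and long outputs are comparable)."""
    if not token_probs:
        return 0.0
    avg_log = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_log)

# An output where the model was sure of every token...
print(round(confidence([0.9, 0.95, 0.9]), 3))   # → 0.916
# ...versus one where a single token was a low-probability guess.
print(round(confidence([0.9, 0.2, 0.9]), 3))    # → 0.545
```

Because the score depends on the model's internal probabilities, two different systems can assign very different confidences to the same output, which is why glassbox scores cannot compare systems.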

Quality estimation

A machine translation quality estimate is based on a sequence pair - the source text and the translation text. Like a confidence score, a higher score correlates with higher quality.

It implies a pure supervised black-box approach, where the system learns from labeled data at training time but knows nothing about how the translation was produced at run time. It also implies the scoring of machine translation only.

This term is used in research literature and conferences, like the WMT shared task, and is also the most common term in the context of the approach pioneered at Unbabel and Microsoft - safely auto-approving raw machine translation for as many segments as possible.

It's often contrasted with quality evaluation - a corpus-level score.

In practice, usage varies - researchers do talk about unsupervised and glassbox approaches to quality estimation, and about word-level quality estimation, and there's no reason that quality estimation could not be used for more tasks, like quality evaluation or parallel data filtering.
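The auto-approval workflow pioneered at Unbabel and Microsoft ultimately reduces to a thresholding decision per segment. A minimal sketch, with made-up scores and a made-up 0.2 risk threshold:

```python
def route_segments(scored_segments, max_risk=0.2):
    """Split machine-translated segments into auto-approved ones and
    ones routed to human review, based on a predicted risk score
    (higher score = more likely to be a bad translation)."""
    approved = [seg for seg, risk in scored_segments if risk <= max_risk]
    review = [seg for seg, risk in scored_segments if risk > max_risk]
    return approved, review

scored = [
    ("La configuración se guardó.", 0.05),  # low risk: auto-approve
    ("Haga clic en el botón azul.", 0.12),  # low risk: auto-approve
    ("El banco está cerrado.", 0.61),       # ambiguous source: human review
]
approved, review = route_segments(scored)
print(len(approved), len(review))  # → 2 1
```

The threshold is a business decision: lowering it sends more segments to humans; raising it saves cost but admits more risky output.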

Risk prediction

A translation risk prediction is also based on a sequence pair - the source text and the translation text. A higher score correlates with a higher risk.

Like quality estimation, it implies a pure black-box approach. Unlike quality estimation, it can also be used for everything from parallel data filtering to quality assurance of human translation to corpus- or system-level quality evaluation.

Why did we introduce yet another name? Risk prediction is the term used at ModelFront because it's the most correct and it's what clients actually want, across all use cases.

Often it's impossible to say if a translation is of high quality or low quality because the input sequence is ambiguous or noisy. When English Apple is translated to Spanish as Manzana or to Apple, it makes no sense to say that both are low quality or medium quality - one of them is probably perfect. But it does make sense to say that, without more context, both are risky.

We also wanted our approach to explicitly break away from quality estimation's focus on post-editing distance or effort and CAT tools' focus on rules-based translation memory matching, and to be future-proof as use cases and technology evolve.

ModelFront's risk prediction system will grow to include risk types and rich phrase- and word-level information.

Options for translation quality and risk

How to build or buy services, tools or technology for measuring translation quality and risk

Measuring quality and risk are fundamental to successful translation at scale. Both human and machine translation benefit from sentence-level and corpus-level metrics.

Metrics like BLEU are based on string distance to the human reference translations and cannot be used for new incoming translations, nor for the human reference translations themselves.
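To make that limitation concrete, here is a bare-bones clipped unigram precision in the spirit of BLEU (a simplification; real BLEU combines clipped n-gram precisions up to 4-grams with a brevity penalty). It can only score a candidate when a human reference already exists:

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision: the fraction of candidate tokens that
    also appear in the reference, each counted at most as many times
    as it occurs there."""
    cand = candidate.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    matched = sum(min(n, ref_counts[w]) for w, n in Counter(cand).items())
    return matched / len(cand)

reference = "the cat is on the mat"
print(round(unigram_precision("the cat sat on the mat", reference), 3))  # → 0.833
# Without a human reference there is nothing to compare against -
# the gap that reference-free scoring is meant to fill.
```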

What are the options if you want to build or buy services, tools or technology for measuring the quality and risk of new translations?


Human evaluation

Whether just an internal human evaluation in a spreadsheet, user-reported quality ratings, an analysis of translator post-editing productivity and effort, or full post-editing, professional human linguists and translators are the gold standard.

There is significant research on human evaluation methods, and quality frameworks like MQM-DQF and even quality management platforms like TAUS DQF and ContentQuo for standardizing and managing human evaluations, as well as translators and language service providers offering quality reviews or continuous human labeling.


Translation tools

Translation tools like Memsource, Smartling, and GlobalLink have features for automatically measuring quality bundled into their platforms. Memsource's feature is based on machine learning.


Rules-based tools

Xbench, Verifika, and LexiQA directly apply exhaustive, hand-crafted linguistic rules, configurations, and translation memories to catch common translation errors, especially human translation errors.

They are integrated into existing tools, and their outputs are predictable and interpretable. LexiQA is unique in its partnerships with web-based translation tools and its API.

Open-source libraries

If you have the data and the machine learning team and want to build your own system based on machine learning, there is a growing set of open-source options.

The most notable quality estimation frameworks are OpenKiwi from Unbabel and DeepQuest from the research group led by Lucía Specia. Zipporah from Hainan Xu and Philipp Koehn is the best-known library for parallel data filtering.

The owners of those repositories are also key contributors to and co-organizers of the WMT shared tasks on Quality Estimation and Parallel Corpus Filtering.

Massively multilingual libraries and pre-trained models like LASER are a surprisingly effective unsupervised approach to parallel data filtering when combined with other techniques like language identification, regexes, and round-trip translation.
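A sketch of how those signals can be combined. The `embed` function below is a toy stand-in for a real multilingual encoder like LASER, which maps sentences from different languages into one shared vector space; the digit-matching regex is a common cheap misalignment check, and the 0.8 threshold is an illustrative assumption:

```python
import math
import re

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy stand-in for a multilingual encoder such as LASER, which embeds
# sentences from different languages into one shared vector space.
TOY_SPACE = {"hello": (1, 0), "hallo": (1, 0), "world": (0, 1), "welt": (0, 1)}

def embed(sentence):
    vec = [0.0, 0.0]
    for word in sentence.lower().split():
        x, y = TOY_SPACE.get(word, (0.0, 0.0))
        vec[0] += x
        vec[1] += y
    return vec

def keep(src, tgt, min_sim=0.8):
    """Keep a candidate sentence pair only if cheap regex checks and
    cross-lingual embedding similarity both pass."""
    # Mismatched digit sequences are a cheap, reliable misalignment signal.
    if re.findall(r"\d+", src) != re.findall(r"\d+", tgt):
        return False
    return cosine(embed(src), embed(tgt)) >= min_sim

print(keep("Hello world", "Hallo Welt"))   # similar in the toy space → True
print(keep("Hello world", "Hallo"))        # half the content missing → False
print(keep("Order 66", "Bestellung 67"))   # digit mismatch → False
```

In practice each signal is weak on its own; stacking them is what makes the unsupervised approach surprisingly effective.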

Internal systems

Unbabel, eBay, Microsoft, Amazon, Facebook, and others invest in in-house quality estimation research and development for their own use, mainly for the content that flows through their platforms at scale.

The main goal is to use raw machine translation for as much as possible, whether in an efficient hybrid translation workflow for localization or customer service, or just to limit catastrophes in user- and business-generated content that is machine translated by default.

Their approaches are based on machine learning.

Systems accessible as APIs, consoles or on-prem

ModelFront is the first and only API for translation risk prediction based on machine learning. With a few clicks or a few lines of code, you can access a production-strength system.

Our approach is developed fully in-house, extending ideas from the leading researchers in quality estimation and parallel data filtering, and from our own experience inside the leading machine translation provider.

We've productized it and made it accessible and useful to more players - enterprise localization teams, language service providers, platform and tool developers and machine translation researchers.

We have built security, scalability, and support for 100+ languages and 10K+ language pairs, locales, encodings, formatting, tags and file formats, integrations with the top machine translation API providers and automated customization into our APIs.

We provide our technology as an API and console for convenience, as well as on-prem deployments.

We continuously invest in curated parallel datasets and manually-labeled datasets and track emerging risk types as translation technology, use cases, and languages evolve.