Monday, June 29, 2020

Understanding Data Security with Microsoft Translator

In this time of the pandemic, many experts have pointed out that enterprises with a broad and comprehensive digital presence are more likely to survive and thrive in these challenging times. The pandemic has showcased the value of digital operating models and is likely to force many companies to speed up their digital innovation and transformation initiatives. The digital transformation challenge for a global enterprise is even greater, as the need to share content and expand the enterprise's digital presence is massively multilingual, putting it beyond the reach of most localization departments, which have a much narrower and more limited focus.

Thus, today we are seeing that truly global enterprises and agencies have a growing need to make large volumes of flowing content multilingual, enabling communication, problem resolution, collaboration, and knowledge sharing both within and beyond the organization. Most often this needs to happen as close to real-time as possible. The greater the enterprise's commitment to digital transformation, the greater the need and urgency. Sophisticated, state-of-the-art machine translation enables multilingual communication and content sharing to happen at scale across many languages in real time, and is thus becoming an increasingly important core component of enterprise information technology infrastructure. Enterprise-tailored MT is now increasingly a must-have for the digitally agile global enterprise.

However, MT is extremely complex and is best handled by focused, well-funded, and committed experts who build unique competence over many years of experience. Many in the business translation world dabble with open-source tools and build mostly sub-optimal systems that do not reach the capabilities of generic public systems, creating friction and resistance from translators who are well aware of this shortcoming. MT system development remains a challenge for even the biggest and brightest, and thus, in my opinion, is best left to committed experts.

Given the confidential, privileged and mission-critical nature of the content that is increasingly passing through MT systems today, the issue of data security and privacy is becoming a major factor in the selection of MT systems by enterprises concerned with being digitally agile, but who also wish to ensure that their confidential data is not used by MT technology providers to refine, train, and further improve their MT technology. 

While some believe that the only way to accomplish true security is by building your own on-premise MT systems, this task, as I have often said, is best left to large companies with well-funded and long-term committed experts. Do-it-yourself (DIY) technology with open-source options makes little sense if you don't really know, understand, and follow what you are doing with technology this complex.

It is my feeling that MT is a technology that truly belongs in the cloud for enterprise use, and usually makes more sense on mobile devices for consumer use. While in some rare cases on-premise MT systems do make sense for truly massive-scale users like national security government agencies (CIA, NSA) who can appropriate the resources to do it competently, for most commercial enterprises MT provides the greatest ROI when it is delivered and implemented in the cloud by an expert and focused team that does not have to re-invent the wheel. Customization on a robust and reliable expert MT foundation appears to be the optimal approach. MT is also a technology that is constantly evolving, as new tools, algorithms, data, and processes come to light to enable ongoing incremental improvements, and this too suggests that MT is better suited to cloud deployment. Neural MT requires relatively large computing resources, deep expertise, and significant data resources and management capabilities to be viable. All these factors point to MT being best suited to cloud-based deployment, as it essentially remains a work-in-progress, but I am aware that there are still many who disagree, and the cloud versus on-premise issue is one where it is best to agree to disagree.

I recently sat down with Chris Wendt, Group Program Manager and others in his team responsible for Microsoft Translator services, including Bing Translator and Skype Translator. They also connect Microsoft’s research activities with their practical use in services and applications. My intent in our conversation was to specifically investigate, better understand, and clarify the MT data security issues and the many adaptation capabilities that they offer to enterprise customers, as I am aware that the actual facts are often misrepresented, misunderstood, or unclear to many potential users and customers.

Microsoft is a pioneer in the use of MT to serve the technical support information needs of a global customer base, and was the first to make massive support knowledge bases available in machine-translated local languages for their largest international markets. They were also very early users of Statistical MT (SMT) at scale (tens of millions of words translated for millions of users) and were building actively used systems around the same time that Language Weaver was commercializing SMT. Many of us are aware that the Microsoft Translator services are used both by large enterprises and by many LSP agencies in the language services industry because of their relative ease of use, straightforward adaptation capabilities, and relatively low cost. Among the public MT portals, Microsoft is second only to Google in terms of MT traffic, and their consumer solutions on web and mobile platforms are probably used by millions of users across the world on a daily basis.

It is important to differentiate between Microsoft’s consumer products and their commercial products when considering the data security policies that are in place when using their machine translation capabilities, as they are quite different.

Consumer Products: 

The consumer products are Bing, the Edge browser, and the Microsoft Translator app for the phone. These products run under the consumer terms of use, which make it possible for Microsoft to use the processed data for quality improvement purposes. Microsoft keeps a very small portion of the data, non-consecutive sentences, and without any information about the customer who submitted the translation. There is really nothing to learn from performing a translation. The value only comes when the data is annotated and then used as Test or Training data. The annotation is expensive, so there are only a few thousand sentences used per language every year, at most.

Some people read the consumer terms of use and assume the same applies to commercial enterprise products.

That is not the case.

Enterprise Products:

The Translator API is provided via an Azure subscription, which runs under the Azure terms of use. The Azure terms of use do not allow Microsoft to see any of the data being processed. Azure services generally run as a GDPR processor, and Translator ensures compliance by not ever writing translated content to persistent storage.

The typical process flow for a submitted translation is as follows:

Decrypt > translate > encrypt > send back > forget.

The Translator API only allows encrypted access, to ensure data is safe in transit. When using the global endpoint, the request is processed in the nearest available data center. The customer can also control the specific processing location by choosing one of ten geography-specific endpoints documented by Microsoft.
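To make the encrypted-in-transit flow concrete, here is a minimal sketch of constructing a Translator API v3 request against a geography-specific endpoint. The endpoint host, subscription key, and region below are placeholder assumptions for illustration; verify them against Microsoft's current documentation before use.

```python
import json
import urllib.request

# Assumed Europe geography endpoint; the global endpoint would be
# api.cognitive.microsofttranslator.com (placeholder -- verify before use).
ENDPOINT = "https://api-eur.cognitive.microsofttranslator.com"

def build_translate_request(text, to_lang, key, region):
    """Build an HTTPS (TLS-encrypted) request for Translator API v3 /translate."""
    url = f"{ENDPOINT}/translate?api-version=3.0&to={to_lang}"
    body = json.dumps([{"Text": text}]).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Ocp-Apim-Subscription-Key", key)        # placeholder key
    req.add_header("Ocp-Apim-Subscription-Region", region)  # e.g. "westeurope"
    req.add_header("Content-Type", "application/json")
    return req

req = build_translate_request("Hello, world", "de", "YOUR_KEY", "westeurope")
# urllib.request.urlopen(req) would send it over TLS; per the process flow
# above, the service translates and responds without persisting the content.
```

Because the endpoint host is fixed in the request, the customer, not the service, decides which geography processes the data.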

Microsoft Translator is certified for compliance with the GDPR processor and confidentiality rules. It is also compliant with all of the following:

CSA STAR: The Cloud Security Alliance (CSA) defines best practices to help ensure a more secure cloud computing environment, and to help potential cloud customers make informed decisions when transitioning their IT operations to the cloud. The CSA published a suite of tools to assess cloud IT operations: the CSA Governance, Risk Management, and Compliance (GRC) Stack. It was designed to help cloud customers assess how cloud service providers follow industry best practices and standards and comply with regulations. Translator has received CSA STAR Attestation.

FedRAMP: The US Federal Risk and Authorization Management Program (FedRAMP) attests that Microsoft Translator adheres to the security requirements needed for use by US government agencies. The US Office of Management and Budget requires all executive federal agencies to use FedRAMP to validate the security of cloud services. Translator is rated as FedRAMP High in both the Azure public cloud and the dedicated Azure Government cloud. 

GDPR: The General Data Protection Regulation (GDPR) is a European Union regulation regarding data protection and privacy for individuals within the European Union and the European Economic Area. Translator is GDPR compliant as a data processor.

HIPAA: The Translator service complies with the US Health Insurance Portability and Accountability Act (HIPAA) and the Health Information Technology for Economic and Clinical Health (HITECH) Act, which govern how cloud services can handle personal health information. This ensures that health services can provide translations to clients knowing that personal data is kept private. Translator is included in Microsoft’s HIPAA Business Associate Agreement (BAA). Health care organizations can enter into the BAA with Microsoft to detail each party’s role in regard to security and privacy provisions under HIPAA and HITECH.

HITRUST: The Health Information Trust Alliance (HITRUST) created and maintains the Common Security Framework (CSF), a certifiable framework to help healthcare organizations and their providers demonstrate their security and compliance in a consistent and streamlined manner. Translator is HITRUST CSF certified.

PCI: Payment Card Industry (PCI) is the global certification standard for organizations that store, process or transmit credit card data. Translator is certified as compliant under PCI DSS version 3.2 at Service Provider Level 1.

SOC: The American Institute of Certified Public Accountants (AICPA) developed the Service Organization Controls (SOC) framework, a standard for controls that safeguard the confidentiality and privacy of information stored and processed in the cloud, primarily in regard to financial statements. Translator is SOC type 1, 2, and 3 compliant. 

US Department of Defense (DoD) Provisional Authorization: US DoD Provisional Authorization enables US federal government customers to deploy highly sensitive data on in-scope Microsoft government cloud services. Translator is rated at Impact Level 4 (IL4) in the government cloud. Impact Level 4 covers Controlled Unclassified Information and other mission-critical data. It may include data designated as For Official Use Only, Law Enforcement Sensitive, or Sensitive Security Information.

ISO: Translator is ISO certified with five certifications applicable to the service. The International Organization for Standardization (ISO) is an independent nongovernmental organization and the world’s largest developer of voluntary international standards. Translator’s ISO certifications demonstrate its commitment to providing a consistent and secure service. Translator’s ISO certifications are:

    • ISO 27001: Information Security Management Standards
    • ISO 9001:2015: Quality Management Systems Standards
    • ISO 27018:2014: Code of Practice for Protecting Personal Data in the Cloud
    • ISO 20000-1:2011: Information Technology Service Management
    • ISO 27017:2015: Code of Practice for Information Security Controls

The Translator service is subject to annual audits on all of its certifications to ensure the service continues to be compliant.

These standards require Microsoft to have every change to the live site reviewed by two employees, to enforce minimal access to the runtime environment, and to maintain processes that protect against external attacks on data center hardware and software. The standards that Microsoft Translator is certified for, or compliant with, include specific ones for the financial industry and for health care providers.

Different from the content submitted for translation, the documents the customer uses to train a custom system are stored on a Microsoft server. Microsoft doesn’t see the data and can’t use it for any purpose other than building the custom system. The customer can delete the custom system as well as the training data at any time, and there won’t be any residue of the training data on any Microsoft system after deletion, or after account expiration.

Translation in Microsoft’s other commercial products, such as Office, Dynamics, Teams, Yammer, and SharePoint, follows the same data security rules described above.

Chris also mentioned that, "German customers have been very hesitant to recognize that trustworthy translation in the cloud is possible, for most of the time I have been working on Translator, and I am glad to see now that even the Germans are now warming up to the concept of using translation in the cloud." He pointed me to a VW case study where I found the following quote, and also rationale on the benefits of a cloud-centric translation service to a global enterprise that seeks to enable and enhance multilingual communication, collaboration and knowledge sharing. A deciding factor for the team responsible at VW [in selecting Microsoft] was that none of the data – translation memories, documents to be translated, and trained models – was to leave the European Union (EU) for data protection reasons.

“Ultimately, we expect the Azure environment to provide the same data security as our internal translation portal has offered thus far,”

Tibor Farkas, Head of IT Cloud at Volkswagen

Chris closed with a compelling statement, pointing to the biggest data security problem that exists in business translation: incompetent implementation. Cloud services, properly implemented, can be as secure as any connected on-premise solution, and in my opinion the greatest risk is often introduced by untrustworthy or careless translators who interact with MT systems, or by incompetent IT staff maintaining an MT portal, as more than one fiasco has shown.

"Your readers may want to consider whether their own computing facilities are equally well secured against data grabbing and whether their language service provider is equally well audited and secured. It matters which cloud service you are using, and how the cloud service protects your data."

While I have not focused much on the speech-to-text issue in this post, we should understand that Microsoft also offers SOTA (state-of-the-art) speech-to-text capabilities and that the Skype and Phone app experience also gives them a leg up on speech-related applications that go across languages.

I also gathered some interesting information on the Microsoft Translator customization and adaptation capabilities and experience. I will write a separate post on that subject once I gather a little more information on the matter.

Tuesday, June 16, 2020

Understanding Machine Translation Quality & Risk Prediction

Much of the commentary available on the use of machine translation (MT) in the translation industry today focuses heavily on assessing MT output quality, comparing TM vs. MT, and overall MTPE process management issues. However, the quality discussion remains muddy, and there are very few clear guidelines for making the use of MT consistently effective. There is much discussion about edit distance, post-editing productivity, and measurement processes like DQF, but much less discussion about understanding the training and source corpora and developing strategies to make an MT engine produce better output. While the use of MT in localization use cases continues to expand as generic MT output quality improves, it is worth noting that MT use is much more likely to deliver greater value in use cases other than localization.

It is estimated that trillions of words a day are translated by "free" MT portals across the world. This activity suggests the huge need for language translation that goes far beyond the normal focus of the translation and localization industry. While a large portion of this use is by consumers, there is a growing portion that involves enterprise users.

ROI on MT use cases outside of localization tends to be much higher 

The largest users of enterprise MT today tend to be focused on eCommerce and eDiscovery use cases. Alibaba, Amazon, and eBay translate billions of words a month. eDiscovery tends to focus on the following use cases where many thousands of documents and varied data sources have to be quickly reviewed and processed:
  • Cross-border litigation usually related to intellectual property violations, product liability, or contract disputes.
  • Pharmacovigilance (PV or PhV), also known as drug safety, is the pharmacological science relating to the collection, detection, assessment, monitoring, and prevention of adverse effects of pharmaceutical products (and of pandemic-like diseases, e.g. Covid-19 incident reports).
  • National Security and Law Enforcement Surveillance of social media and targeted internet activity to identify and expose bad actors involved with drugs, terrorism, and other criminal activities.
  • Corporate information governance and compliance.
  • Customer Experience (CX) related communications and content.

Thus, as the use of MT becomes more strategic and pervasive, we also see a need for new kinds of tools and capabilities that can assist in the optimization process. This is a guest post by Adam Bittlingmayer, a co-founder of ModelFront, a developer of a new breed of machine learning-driven tools that assist and enhance MT initiatives across all the use cases described above. ModelFront describes what they do as: "In research terms, we've built 'massively multilingual black-box deep learning models for quality estimation, quality evaluation, and filtering', and productized it."

I met with Adam Bittlingmayer, co-founder of ModelFront, to talk about predicting translation risk and to share our experience automating translation at scale at giants like Facebook and Google and at startups like ModelFront, which is developing specialized capabilities to make MT translation risk prediction more efficient and effective.

The tools they provide are valuable in the development of better MT engines by doing the following:
  • Filtering parallel corpora used in training MT engines
  • Comparison of MT engines with detailed error profiles
  • Rapid and more comprehensive translation error detection & correction capabilities
  • Enhanced man-machine collaboration infrastructure that is particularly useful in high volume MT use cases

In his post, Adam provides an explanation of some important terms that are often conflated and also gives you a feel for MT development from the perspective of one who has done this at scale for Google. Capabilities like these can help developers add value to any MT initiative and these go way beyond the simple data cleaning routines that many LSPs use. From my understanding, these tools can help good engines get better, but they are not magic that can suddenly improve shoddy engines that many LSPs continue to build. Adam provides some comparisons with leading localization industry tools so that a reader can better understand the capabilities and focus of the ModelFront capabilities.

A key characteristic of the ModelFront platform, beyond its scalability, is the control it gives customers to implement high levels of production automation. These processes can be embedded and integrated into MT development and production pipelines to enable better MT outcomes in a much larger range of use cases. Although ModelFront is already being used in localization MTPE use cases, in a highly automated and integrated translation management workflow, I believe the real potential for added value is beyond the typical localization purview.

An example of how this kind of technology can add value even in early experimentation can be seen in recent research by Amazon on using quality estimation for subtitling. The researchers categorized subtitle translations into three categories: good translations, which are fine as is and need no further improvement; loose translations, which may require human post-edits; and bad translations, which need a “complete rewrite.”

The researchers worked with 30,000 video subtitle files in English and their corresponding translations in French, German, Italian, Portuguese and Spanish for their experiments. They found that their DeepSubQE model was accurate in its estimations more than 91% of the time for all five languages. 
This would mean that human efforts can be focused on a much smaller set of data and thus yield a much better overall quality in less time.


Confidence scoring, quality estimation, and risk prediction

What is the difference between a quality estimation, a confidence score, and translation risk prediction?

Microsoft, Unbabel, and Memsource use the research-standard term quality estimation, while Smartling calls its feature a quality confidence score; there are even research papers talking about confidence estimation or error prediction, and tools that use a fuzzy match percentage. Parallel data filtering approaches like Zipporah or LASER use a quality score or similarity score.

ModelFront uses risk prediction.

They're overlapping concepts and often used interchangeably - the distinctions are as much about tradition, convention, and use case as about inputs and outputs. They are all basically a score from 0.0 to 1.0 or 0% to 100%, at sequence-level precision or greater. Unlike BLEU, HTER, METEOR, or WER, they do not require a golden human reference translation.

We're interested in language, so we know the nuances in naming are important.

Confidence scoring

A machine translation confidence score is typically used for a machine translation's own bet about its own quality on the input sequence. A higher score correlates with higher quality.

It is typically based on internal variables of the translation system - a so-called glassbox approach. So it can't be used to compare systems or to assess human translation or translation memory matches.
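To make the glassbox idea concrete, here is a minimal sketch, my own illustration rather than any vendor's implementation, of deriving a sequence-level confidence score from the decoder's own token log-probabilities:

```python
import math

def sequence_confidence(token_logprobs):
    """Glassbox confidence: length-normalized probability of the output
    sequence, computed from the translation system's internal token
    log-probabilities. Returns a value in (0, 1]; higher = more confident."""
    if not token_logprobs:
        return 0.0
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)

# A fluent output: every token was likely under the model.
confident = sequence_confidence([-0.1, -0.05, -0.2])
# A shakier output: one token the model found very unlikely.
shaky = sequence_confidence([-0.1, -3.5, -0.2])
```

Because the score depends on the model's own internals, two different systems' scores are not comparable, which is exactly the limitation noted above.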

Quality estimation

A machine translation quality estimate is based on a sequence pair - the source text and the translation text. Like a confidence score, a higher score correlates with higher quality.

It implies a pure supervised black-box approach, where the system learns from labeled data at training time but knows nothing about how the translation was produced at run time. It also implies the scoring of machine translation only.

This term is used in research literature and conferences, like the WMT shared task, and is also the most common term in the context of the approach pioneered at Unbabel and Microsoft - safely auto-approving raw machine translation for as many segments as possible.

It's often contrasted with quality evaluation - a corpus-level score.

In practice, usage varies - researchers do talk about unsupervised and glassbox approaches to quality estimation, and about word-level quality estimation, and there's no reason that quality estimation could not be used for more tasks, like quality evaluation or parallel data filtering.

Risk prediction

A translation risk prediction is also based on a sequence pair - the source text and the translation text. A higher score correlates with a higher risk.

Like quality estimation, it implies a pure black-box approach. Unlike quality estimation, it can also be used for everything from parallel data filtering to quality assurance of human translation to corpus- or system-level quality evaluation.

Why did we introduce yet another name? Risk prediction is the term used at ModelFront because it's the most correct and it's what clients actually want, across all use cases.

Often it's impossible to say whether a translation is of high or low quality because the input sequence is ambiguous or noisy. When the English Apple is translated to Spanish as Manzana or left as Apple, it makes no sense to say that both are low quality or medium quality - one of them is probably perfect. But it does make sense to say that, without more context, both are risky.
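In practice a risk score is useful precisely because it supports thresholding: auto-approve what is safe and route the rest to humans. A minimal sketch of that routing logic follows; the predict_risk callable, the toy model, and the 0.2 threshold are hypothetical illustrations, not ModelFront's actual API.

```python
def route_by_risk(pairs, predict_risk, threshold=0.2):
    """Auto-approve low-risk (source, translation) pairs and send the
    rest to human review. Scores run 0.0 (safe) to 1.0 (risky)."""
    approved, review = [], []
    for source, translation in pairs:
        risk = predict_risk(source, translation)
        bucket = approved if risk < threshold else review
        bucket.append((source, translation, risk))
    return approved, review

# Toy stand-in for a real risk model: it flags the ambiguous
# untranslated case from the Apple/Manzana example above.
def toy_risk(source, translation):
    return 0.9 if source.strip().lower() == translation.strip().lower() else 0.1

approved, review = route_by_risk(
    [("Apple", "Manzana"), ("Apple", "Apple")], toy_risk)
```

The same black-box interface, a pair in and a score out, is what lets risk prediction serve data filtering, human-translation QA, and system evaluation alike.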

We also wanted our approach to explicitly break away from quality estimation's focus on post-editing distance or effort and CAT tools' focus on rules-based translation memory matching, and to be future-proof as use cases and technology evolve.

ModelFront's risk prediction system will grow to include risk types and rich phrase- and word-level information.

Options for translation quality and risk

How to build or buy services, tools or technology for measuring translation quality and risk

Measuring quality and risk are fundamental to successful translation at scale. Both human and machine translation benefit from sentence-level and corpus-level metrics.

Metrics like BLEU are based on string distance to the human reference translations and cannot be used for new incoming translations, nor for the human reference translations themselves.
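To see why reference-based metrics cannot score new translations, consider a toy BLEU-1-style score; this is a deliberate simplification for illustration, as real BLEU combines clipped n-gram precisions with a brevity penalty:

```python
from collections import Counter

def unigram_precision(hypothesis, reference):
    """Toy reference-based metric: the clipped fraction of hypothesis
    tokens that also appear in the golden human reference. Without a
    reference string, no score can be computed at all."""
    hyp_tokens = hypothesis.split()
    if not hyp_tokens:
        return 0.0
    ref_counts = Counter(reference.split())
    matches = sum(min(count, ref_counts[tok])
                  for tok, count in Counter(hyp_tokens).items())
    return matches / len(hyp_tokens)

score = unigram_precision("the cat sat on the mat",
                          "the cat is on the mat")  # 5 of 6 tokens match
```

A quality estimate or risk prediction, by contrast, takes only the source and the translation, so it can be applied to incoming translations and even to the references themselves.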

What are the options if you want to build or buy services, tools or technology for measuring the quality and risk of new translations?


Human evaluation

Whether just an internal human evaluation in a spreadsheet, user-reported quality ratings, an analysis of translator post-editing productivity and effort, or full post-editing, professional human linguists and translators are the gold standard.

There is significant research on human evaluation methods, and quality frameworks like MQM-DQF and even quality management platforms like TAUS DQF and ContentQuo for standardizing and managing human evaluations, as well as translators and language service providers offering quality reviews or continuous human labeling.


Translation tools

Translation tools like Memsource, Smartling, and GlobalLink have features for automatically measuring quality bundled in their platforms. Memsource's feature is based on machine learning.


Rules-based QA tools

Xbench, Verifika, and LexiQA directly apply exhaustive, hand-crafted linguistic rules, configurations and translation memories to catch common translation errors, especially human translation errors.

They are integrated into existing tools, and their outputs are predictable and interpretable. LexiQA is unique in its partnerships with web-based translation tools and its API.

Open-source libraries

If you have the data and the machine learning team and want to build your own system based on machine learning, there is a growing set of open-source options.

The most notable quality estimation frameworks are OpenKiwi from Unbabel and DeepQuest from the research group led by Lucía Specia. Zipporah from Hainan Xu and Philipp Koehn is the best-known library for parallel data filtering.

The owners of those repositories are also key contributors to and co-organizers of the WMT shared tasks on Quality Estimation and Parallel Corpus Filtering.

Massively multilingual libraries and pre-trained models like LASER are a surprisingly effective unsupervised approach to parallel data filtering when combined with other techniques like language identification, regexes, and round-trip translation.
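A toy version of such a filtering pipeline is sketched below; the heuristics and the 0.8 similarity threshold are illustrative assumptions, and a real pipeline would plug in LASER-style embeddings and a language identifier where indicated:

```python
def keep_pair(src, tgt, sim=None, min_sim=0.8):
    """Decide whether a (source, target) sentence pair is clean enough
    to keep for MT training. `sim` is an optional cross-lingual
    similarity, e.g. cosine similarity of LASER sentence embeddings."""
    if not src.strip() or not tgt.strip():
        return False                               # empty side
    if not 0.5 <= len(src) / len(tgt) <= 2.0:
        return False                               # wildly mismatched lengths
    if src.strip().lower() == tgt.strip().lower():
        return False                               # untranslated copy
    if sim is not None and sim < min_sim:
        return False                               # semantically divergent pair
    return True

pairs = [("Hello world.", "Hallo Welt."),
         ("Hello world.", "Hello world."),             # copy, dropped
         ("Hi", "Ein sehr langer Satz ohne Bezug.")]   # length mismatch, dropped
clean = [p for p in pairs if keep_pair(*p)]
```

Each heuristic is cheap on its own; the unsupervised embedding check is what catches pairs that look plausible on the surface but are not actually translations of each other.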

Internal systems

Unbabel, eBay, Microsoft, Amazon, Facebook, and others invest in in-house quality estimation research and development for their own use, mainly for the content that flows through their platforms at scale.

The main goal is to use raw machine translation for as much as possible, whether in an efficient hybrid translation workflow for localization or customer service, or just to limit catastrophes in user- and business-generated content that is machine translated by default.

Their approaches are based on machine learning.

Systems accessible as APIs, consoles or on-prem

ModelFront is the first and only API for translation risk prediction based on machine learning. With a few clicks or a few lines of code, you can access a production-strength system.

Our approach is developed fully in-house, extending ideas from the leading researchers in quality estimation and parallel data filtering, and from our own experience inside the leading machine translation provider.

We've productized it and made it accessible and useful to more players - enterprise localization teams, language service providers, platform and tool developers and machine translation researchers.

We have built security, scalability, and support for 100+ languages and 10K+ language pairs, locales, encodings, formatting, tags and file formats, integrations with the top machine translation API providers and automated customization into our APIs.

We provide our technology as an API and console for convenience, as well as on-prem deployments.

We continuously invest in curated parallel datasets and manually-labeled datasets and track emerging risk types as translation technology, use cases, and languages evolve.