Monday, June 29, 2020

Understanding Data Security with Microsoft Translator

In this time of the pandemic, many experts have pointed out that enterprises that have a broad and comprehensive digital presence are more likely to survive and thrive in these challenging times. The pandemic has showcased the value of digital operating models and is likely to force many companies to speed up their digital innovation and transformation initiatives. The digital transformation challenge for a global enterprise is even greater, as the need to share content and expand the enterprise's digital presence is massively multilingual, thus putting it beyond the reach of most localization departments who have a much narrower and much more limited focus. 

Thus, today we are seeing that truly global enterprises and agencies have a growing need to make large volumes of flowing content multilingual, to enable communication, problem resolution, collaboration, and knowledge sharing possible, within and without the organization. Most often this needs to be as close to real-time as possible. The greater the enterprise commitment to digital transformation, the greater the need, and urgency. Sophisticated, state-of-the-art machine translation enables multilingual communication and content sharing to happen at scale across many languages in real-time, and is thus becoming an increasingly important core component of enterprise information technology infrastructure. Enterprise tailored MT is now increasingly a must-have for the digitally agile global enterprise.

However, MT is extremely complex and is best handled by focused, well funded, and committed experts who build unique competence over many years of experience. Many in the business translation world dabble with open source tools, and build mostly sub-optimal systems that do not reach the capabilities of generic public systems, and thus create friction and resistance from translators who are well aware of this shortcoming. MT system development still remains a challenge for even the biggest and brightest, and thus, in my opinion, is best left to committed experts.

Given the confidential, privileged and mission-critical nature of the content that is increasingly passing through MT systems today, the issue of data security and privacy is becoming a major factor in the selection of MT systems by enterprises concerned with being digitally agile, but who also wish to ensure that their confidential data is not used by MT technology providers to refine, train, and further improve their MT technology. 

While some believe that the only way to accomplish true security is by building your own on-premise MT systems, this task as I have often said, is best left to large companies with well-funded and long-term committed experts. Do-it-yourself (DIY) technology with open source options makes little sense if you don't really know, understand, and follow what you are doing with technology this complex.

It is my feeling that MT is also a technology that truly belongs in the cloud for enterprise use, and also usually makes more sense on mobile devices for consumer use. While in some rare cases, on-premise MT systems do make sense for truly massive scale users like national security government agencies (CIA, NSA) who can appropriate the resources to do it competently, for most commercial enterprise MT provides the greatest ROI when it is delivered and implemented in the cloud by an expert and focused team who do not have to re-invent the wheel. Customization on a robust and reliable expert MT foundation appears to be the optimal approach. MT is also a technology that is constantly evolving as new tools, algorithms, new data, and processes come to light to enable ongoing incremental improvements, and this too suggests that MT is better suited to cloud deployment. Neural MT requires relatively large computing resources, deep expertise, and significant data resources and management capabilities to be viable. All these factors point to MT best being a cloud-based deployment, as it essentially remains a work-in-progress, but I am aware that there are still many who disagree on this, and that the cloud versus on-premise issue is one where it is best to agree to disagree.

I recently sat down with Chris Wendt, Group Program Manager and others in his team responsible for Microsoft Translator services, including Bing Translator and Skype Translator. They also connect Microsoft’s research activities with its practical use in services and applications. My intent in our conversation was to specifically investigate, better understand, and clarify the MT data security issues and the many adaptation capabilities that they offer to enterprise customers, as I am aware that the actual facts are often misrepresented, misunderstood, or unclear to many potential users and customers.

Microsoft is a pioneer in the use of MT to serve the technical support information needs of a global customer base, and was the first to make massive support knowledge bases available in MT'd local language for their largest international markets. They were also very early users of Statistical MT (SMT) at scale (tens of millions of words translated for millions of users) and were building actively used systems around the same time that Language Weaver was commercializing SMT. Many of us are aware that the Microsoft Translator services are used both by large enterprises and many LSP agencies in the language services industry because of the relative ease of use, straightforward adaptation capabilities, and relatively low cost.  Among the public MT portals, Microsoft is second only to Google in terms of MT traffic, and their consumer platforms solutions on the web and mobile platforms are probably used by millions of users across the world on a daily basis.

It is important to differentiate between Microsoft’s consumer products and their commercial products when considering the data security policies that are in place when using their machine translation capabilities, as they are quite different.

Consumer Products: 

The consumer products are Bing, the Edge browser, and the Microsoft Translator app for the phone. These products run under the consumer terms of use, which make it possible for Microsoft to use the processed data for quality improvement purposes. Microsoft keeps a very small portion of the data, non-consecutive sentences, and without any information about the customer who submitted the translation. There is really nothing to learn from performing a translation. The value only comes when the data is annotated and then used as Test or Training data. The annotation is expensive, so there are only a few thousand sentences used per language every year, at most.

Some people read the consumer terms of use and assume the same applies to commercial enterprise products.

That is not the case.

Enterprise Products:

The Translator API is provided via an Azure subscription, which runs under the Azure terms of use. The Azure terms of use do not allow Microsoft to see any of the data being processed. Azure services generally run as a GDPR processor, and Translator ensures compliance by not ever writing translated content to persistent storage.

The typical process flow for a submitted translation is as follows:

Decrypt > translate > encrypt > send back > and > forget.

The Translator API only allows encrypted access, to ensure data is safe in transit. When using the global endpoint, the request will be processed in the nearest available data-center. The customer can also control the specific processing location by choosing a geography-specific endpoint from ten locations which are described here

Microsoft Translator is certified for compliance with the GDPR processor and confidentiality rules. It is also compliant with all of the following:

CSA STAR: The Cloud Security Alliance (CSA) defines best practices to help ensure a more secure cloud computing environment, and to helping potential cloud customers make informed decisions when transitioning their IT operations to the cloud. The CSA published a suite of tools to assess cloud IT operations: the CSA Governance, Risk Management, and Compliance (GRC) Stack. It was designed to help cloud customers assess how cloud service providers follow industry best practices and standards and comply with regulations. Translator has received CSA STAR Attestation. 

FedRAMP: The US Federal Risk and Authorization Management Program (FedRAMP) attests that Microsoft Translator adheres to the security requirements needed for use by US government agencies. The US Office of Management and Budget requires all executive federal agencies to use FedRAMP to validate the security of cloud services. Translator is rated as FedRAMP High in both the Azure public cloud and the dedicated Azure Government cloud. 

GDPR: The General Data Protection Regulation (GDPR) is a European Union regulation regarding data protection and privacy for individuals within the European Union and the European Economic Area. Translator is GDPR compliant as a data processor.

HIPAA: The Translator service complies with the US Health Insurance Portability and Accountability Act (HIPAA) Health Information Technology for Economic and the Clinical Health (HITECH) Act, which governs how cloud services can handle personal health information. This ensures that health services can provide translations to clients knowing that personal data is kept private. Translator is included in Microsoft’s HIPAA Business Associate Agreement (BAA). Health care organizations can enter into the BAA with Microsoft to detail each party’s role in regard to security and privacy provisions under HIPAA and HITECH.

HITRUST: The Health Information Trust Alliance (HITRUST) created and maintains the Common Security Framework (CSF), a certifiable framework to help healthcare organizations and their providers demonstrate their security and compliance in a consistent and streamlined manner. Translator is HITRUST CSF certified.

PCI: Payment Credit Industry (PCI) is the global certification standard for organizations that store, process or transmit credit card data. Translator is certified as compliant under PCI DSS version 3.2 at Service Provider Level 1. 

SOC: The American Institute of Certified Public Accountants (AICPA) developed the Service Organization Controls (SOC) framework, a standard for controls that safeguard the confidentiality and privacy of information stored and processed in the cloud, primarily in regard to financial statements. Translator is SOC type 1, 2, and 3 compliant. 

US Department of Defense (DoD) Provisional Authorization: US DoD Provisional Authorization enables US federal government customers to deploy highly sensitive data on in-scope Microsoft government cloud services. Translator is rated at Impact Level 4 (IL4) in the government cloud. Impact Level 4 covers Controlled Unclassified Information and other mission-critical data. It may include data designated as For Official Use Only, Law Enforcement Sensitive, or Sensitive Security Information.

ISO: Translator is ISO certified with five certifications applicable to the service. The International Organization for Standardization (ISO) is an independent nongovernmental organization and the world’s largest developer of voluntary international standards. Translator’s ISO certifications demonstrate its commitment to providing a consistent and secure service. Translator’s ISO certifications are:

    • ISO 27001 Information Security Management Standards
    • ISO 9001:2015 Quality Management Systems Standards
    • 27018:2014 Code of Practice for Protecting Personal Data in the Cloud
    • 20000-1:2011: Information Technology Service Management
    • ISO 27017:2015: Code of Practice for Information Security Controls

The Translator service is subject to annual audits on all of its certifications to ensure the service continues to be compliant.

These standards force Microsoft to review every change to the live site with two employees, and to enforce minimal access to the runtime environment, as well as having processes in place to protect against external attacks on the data center hardware and software. The standards that Microsoft Translator is certified for, or compliant with, include specific ones for the financial industry and health care providers.

Different from the content submitted for translation, the documents the customer uses to train a custom system are stored on a Microsoft server. Microsoft doesn’t see the data and can’t use it for any purpose other than building the custom system. The customer can delete the custom system as well as the training data at any time, and there won’t be any residue of the training data on any Microsoft system after deletion, or after account expiration.

Translation in Microsoft’s other commercial products like Office, Dynamics, Teams, Yammer, SharePoint, and others follow the same data security rules described above.

Chris also mentioned that, "German customers have been very hesitant to recognize that trustworthy translation in the cloud is possible, for most of the time I have been working on Translator, and I am glad to see now that even the Germans are now warming up to the concept of using translation in the cloud." He pointed me to a VW case study where I found the following quote, and also rationale on the benefits of a cloud-centric translation service to a global enterprise that seeks to enable and enhance multilingual communication, collaboration and knowledge sharing. A deciding factor for the team responsible at VW [in selecting Microsoft] was that none of the data – translation memories, documents to be translated, and trained models – was to leave the European Union (EU) for data protection reasons.

“Ultimately, we expect the Azure environment to provide the same data security as our internal translation portal has offered thus far,”

Tibor Farkas, Head of IT Cloud at Volkswagen

Chris closed with a compelling statement, pointing to the biggest data security problem that exists in business translation: incompetent implementation. Cloud services properly implemented can be as secure as any connected on-premise solution, and in my opinion the greatest risk is often introduced by untrustworthy or careless translators who interact with MT systems, or incompetent IT staff that maintain an MT portal as the fiasco showed. . 

"Your readers may want to consider whether their own computing facilities are equally well secured against data grabbing and whether their language service provider is equally well audited and secured. It matters which cloud service you are using, and how the cloud service protects your data."

While I have not focused much on the speech-to-text issue in this post, we should understand that Microsoft also offers SOTA (state-of-the-art) speech-to-text capabilities and that the Skype and Phone app experience also gives them a leg up on speech-related applications that go across languages.

I also gathered some interesting information on the Microsoft Translator customization and adaptation capabilities and experience. I will write a separate post on that subject once I gather a little more information on the matter.


  1. In qualifying translated document security, many people distinguish between free and paid services. But if I understand this post correctly, you seem to be stating that by accessing the Translate option in Office applications, our data is as secure as it is in using the paid Azure service. Is this true?

    1. I believe if the Translate Option is accessing Translate through a paid Azure service this is true - I don't think this is true if you just run translation on a PC that has not been setup to use the Private option.