Friday, December 27, 2019

The Issue of Data Security and Machine Translation

As the world becomes more digital and the volume of mission-critical data flows continue to expand, it is becoming increasingly important for global enterprises to adapt to the rapid globalization, and the increasingly digital-first world we live in. As organizations change the way they operate, generate revenue and create value for their customers, new compliance risks are emerging — presenting a challenge to compliance, which must proactively monitor, identify, assess and mitigate risks like those tied to fundamentally new technologies and processes. Digital transformation is driven and enabled by data, and thus the value of data security and governance also rise in importance and organizational impact. At the WEF forum in Davos, CEOs have identified cybersecurity and data privacy as two of the most pressing issues of the day, and even regard breakdown with these issues as a general threat to enterprise, society, and government in general.
While C-level executives understand the need for cybersecurity as their organizations undergo digital transformation, they aren’t prioritizing it enough, according to a recent Deloitte report based on a survey of 500 executives. The report, “The Future of Cyber Survey 2019,” reveals that there is a disconnect between organizational aspirations for a “digital everywhere” future, and their actual cyber posture. Those surveyed view digital transformation as one of the most challenging aspects of cyber risk management, and yet indicated that less than 10% of cyber budgets are allocated to these digital transformation efforts. The report goes on to say that this larger cyber awareness is at the center of digital transformation. Understanding that is as transformative as cyber itself—and to be successful in this new era, organizations should embrace a “cyber everywhere” reality.

Cybersecurity breakdowns and data breach statistics

Are these growing concerns about cybersecurity justified? It certainly seems so when we consider these facts:
  • A global survey in 2018 by CyberEdge across 17 countries and 20 industries found that 78% of respondents had experienced a network breach.
  • The ISACA survey  of cybersecurity professionals points out that it is increasingly difficult to recruit and retain technically adept cybersecurity professionals. They also found that 50% of cyber pros believe that most organizations underreport cybercrime even if they are required to report it, and 60% said they expected at least one attack within the next year.
  • Radware estimates that an average cyber-attack in 2018 costs an enterprise around $1.67M. The costs can be significantly higher, e.g. a breach at Maersk is estimated to have cost around $250 - $300 million, because of the brand damage, loss of productivity, loss of profitability, falling stock prices, and other negative business impacts in the wake of the breach.
  • Risk-Based Security reports that there were over 6500 data breaches and that more than 5 billion records were exposed in 2018. The situation is not better in 2019, and over 4 billion records were exposed in the first six months of 2019.
  • An IBM Security study revealed that the financial impact of data breaches on organizations. According to this study, the cost of a data breach has risen 12% over the past 5 years and now costs $3.92 million on average. The average cost of a data breach in the U.S. is $8.19 million, more than double the worldwide average.
As would be expected, with Hacking as the top breach type, attacks originating outside of the organization were also the most common threat source. However misconfigured services, data handling mistakes and other inadvertent exposure by authorized persons, exposed far more records than malicious actors were able to steal.

 Data security and cybersecurity in the legal profession

Third-party professional services firms are often a target for malicious attacks because of the possibility of acquiring high-value information is high. Records show that law firms relationships with third-party vendors are a frequent point of exposure to cyber breaches and accidental leaks. obtained a list of more than 100 law firms that had reported data breaches and estimate that even more are falling victim to this problem, but simply don’t report it to avoid scaring clients and minimize potential reputational damage.

Austin Berglas, former head of the FBI’s cyber branch in New York and now global head of professional services at cybersecurity company BlueVoyant, said law firms are a top target among hackers because of the extensive high-value client information they possess. Hackers understand that law firms are a “one-stop-shop” for sensitive and proprietary corporate information, merger & acquisitions related data, and emerging intellectual property information.

As custodians of highly sensitive information, law firms are inviting targets for hackers.

The American Bar Association reported in 2018 that 23% of firms had reported a breach at some point, up from 14% in 2016. Six percent of those breaches resulted in the exposure of sensitive client data. Legal documents have to pass through many hands as a matter of course, reams of sensitive information pass through the hands of lawyers and paralegals, and then they go through the process of being reviewed and signed by clients, clerks, opposing counsels, and judges. When they finally get to the location where records are stored, they are often inadvertently exposed to others—even firm outsiders—who shouldn’t have access to them at all.

A Logicforce legal industry score for cybersecurity health among law firms have increased from 54% in 2018 to 60% in 2019, but this is still lower than many other sectors. Increasingly clients are also asking for audits to ensure that security practices are current and robust. A recent ABA Formal Opinion states: “Indeed, the data security threat is so high that law enforcement officials regularly divide business entities into two categories: those that have been hacked and those that will be.

Lawyers are failing on cybersecurity, according to the American Bar Association Legal Technology Resource Center’s ABA TechReport 2019. “The lack of effort on security has become a major cause for concern in the profession.”

“A lot of firms have been hacked, and like most entities that are hacked, they don’t know that for some period of time. Sometimes, it may not be discovered for a minute or months and even years.” Vincent I. Polley, a lawyer, and co-author of a recent book on cybersecurity for the ABA.

As the volume of multilingual content explodes, a new risk emerges: public, “free” machine translation provided by large internet services firms who systematically harvest and store the data that passes through these “free” services.  With the significantly higher volumes of cross-border partnerships, globalization in general, and growth in international business, employee use of public MT has become a new source of confidential data leakage.

Public machine translation use and data security

In the modern era, it is estimated that on any given day, several trillion words are run through the many public machine translation options available across the internet today. This huge volume of translation is done largely by the average web consumer, but there is increasing evidence that a growing portion of this usage is emanating from the enterprise when urgent global customer, collaboration, and communication needs are involved. This happens because publicly available tools are essentially frictionless and require little “buy-in” from a user who doesn’t understand the data leakage implications.  The rapid rate of increase in globalization has resulted in a substantial and ever-growing volume of multilingual information that needs to be translated instantly as a matter of ongoing business practice. This is a significant risk for the global enterprise or law firm as this short video points out. Content transmitted for translation by users is clearly subject to terms of use agreements that entitle the MT provider to store, modify, reproduce, distribute, and create derivative works. At the very least this content is fodder for machine learning algorithms that could also potentially be hacked or expose data inadvertently.

Consider the following:
  • At the SDL Connect 2019 conference recently, a speaker from a major US semiconductor company described the use of public MT at his company. When this activity was carefully monitored by IT management, they found that as much as 3 to 5 GB of enterprise content was being cut and pasted into public MT portals for translation on a daily basis. Further analysis of the content revealed that the material submitted for translation included future product plans, customer problem-related communications, sensitive HR issues, and other confidential business process content.
  • In September 2017, the Norwegian news agency NRK reported data that they found that had been free translated on a site called Translate.Com that included “notices of dismissal, plans of workforce reductions and outsourcing, passwords, code information, and contracts”. This was yet another site that offered free translation, but reserved the right to examine the data submitted “to improve the service.” Subsequently, searches by Slator uncovered other highly sensitive data of both personal and corporate content.
  • A recent report from the Australian Strategic Policy Institute (ASPI) makes some claims about how China uses state-owned companies, which provide machine translation services, to collect data on users outside China. The author, Samantha Hoffman, argues that the most valuable tools in China’s data-collection campaign are technologies that users engage with for their own benefit; machine translation services being a prime example. This is done through a company called GTCOM, which Hoffman said describes itself as a “cross-language big data” business, offers hardware and software translation tools that collect data — lots of data. She estimated that GTCOM, which works with both corporate and government clients, handles the equivalent of up to five trillion words of plain text per day, across 65 languages and in over 200 countries. GTCOM is a subsidiary of a Chinese state-owned enterprise that the Central Propaganda Department directly supervises, and thus data collection is presumed to be an active and ongoing process.
After taking a close look at the enterprise market needs and the current realities of machine translation use we can summarize the situation as follows:
  • There is a growing need for always-available, and secure enterprise MT solutions to support the digitally-driven globalization that we see happening in so many industries today. In the absence of having such a secure solution available, we can expect that there will be substantial amounts of “rogue use” of public MT portals with resultant confidential data leakage risks.
  • The risks of using public MT portals are now beginning to be understood. The risk is not just related to inadvertent data leakage but is also closely tied to the various data security and privacy risks presented by submitting confidential content into the data-grabbing, machine learning infrastructure, that underlie these “free” MT portals. There is a growing list of US companies already subjected to GDPR-related EU regulatory actions, including, Amazon, Apple, Facebook, Google, Netflix, Spotify and Twitter. Experts have stated that Chinese companies are likely to be the next wave of regulatory enforcement, and the violators' list is expected to grow. 
  • The executive focus on digital transformation is likely to drive more attention to the concurrent cybersecurity implications of hyper-digitalization. Information Governance is likely to become much more of a mission-critical function as the digital footprint of the modern enterprise grows and becomes much more strategic.

 The legal market requirement: an end to end solution

Thus, we see today, having language-translation-at-scale capabilities have become imperative for the modern global enterprise.  The needs for translation can range from rapid translation of millions of documents in an eDiscovery or compliance scenario, to the very careful and specialized translation of critical contract and court-ready documentation on to an associate collaborating with colleagues from a foreign outpost. Daily communications in global matters are increasingly multilingual. Given the volume, variety, and velocity of the information that needs translation, legal professionals must consider translation solutions that involve both technology and human services. The requirements can vary greatly and can require different combinations of man-machine collaboration, that includes some or all of these different translation production models:
  • MT-Only for very high volumes like in eDiscovery, and daily communications
  • MT + Human Terminology Optimization
  • MT + Post-Editing
  • Specialized Expert Human Translation

Machine Translation: designed for the Enterprise

MT for the enterprise will need all of the following (and solutions are available from several MT vendors in the market). The author provides consulting services to select and develop optimal solutions :
  • Guaranteed data security & privacy
  • Flexible deployment options that include on-premise, cloud or a combination of both as dictated by usage needs
  • Broad range of adaptation and customization capabilities so that MT systems can be optimized for each individual client
  • Integration with primary enterprise IT infrastructure and software e.g. Office, Translation Management Systems, Relativity, and other eDiscovery platforms
  • Rest API that allows connectivity to any proprietary systems that you may employ. 
  • Broad range of expert consulting services both on the MT technology aspects and the linguistic issues
  • Tightly integrated with professional human translation services to handle end-to-end translation requirements.

This is a post that was originally published on SDL.COM in a modified form with more detail on SDL MT technology. 


  1. The United Kingdom announced July 8 its intention to levy a £183.39 million ($230 million) fine on British Airways in the aftermath of the airliner’s 2018 data breach that compromised 500,000 customers’ personal data.

    Although prospective British Airways clients were diverted to a fraudulent British Airways site where hackers siphoned personal data entered into the fake web page, the U.K.’s Information Commissioner’s Office found information was ultimately compromised by “poor security arrangements at the company.”

    The ICO’s press release noted unauthorized access or damage to personal data is more than an inconvenience.

    “When you are entrusted with personal data you must look after it,” said ICO Commissioner Elizabeth Denham in the press release. “Those that don’t will face scrutiny from my office to check they have taken appropriate steps to protect fundamental privacy rights.”

  2. A year prior Marriott announced it was the victim of a cyber incident when the information of up to 500 million guests was compromised. Less than a year after the breach, the ICO proposed a £99 million ($124 million) fine for alleged GDPR violations. The ICO specifically said Marriott failed to “undertake sufficient due diligence when it bought Starwood [in 2016] and should also have done more to secure its systems,” according to the regulator’s press statement.

  3. On Oct. 30, real estate company Deutsche Wohnen SE was issued a €14.5 million ($16 million) fine for GDPR violations.

    The Berlin commissioner for data protection and freedom of information alleged the company’s archive system for personal data of tenants held onto data longer than needed, according to the regulator’s press release.

    The data included tenants’ personal and financial information, such as salary, tax, Social Security and health insurance data and bank statements, according to the commissioner.

  4. This is an awesome blog , Thanks for sharing