Tuesday, February 11, 2020

3 Ways You Can Become an ‘Augmented Translator’

This is a guest post by Jonathan Grisot.
The 2019 United Nations Conference on Trade and Development Digital Economy Report shows that global internet protocol (IP) traffic, a proxy for data flows, grew from about 100 gigabytes (GB) per day in 1992 to more than 45,000 GB per second in 2017. By 2022, the figure is expected to stand at 150,700 GB per second.


The future of professional translation is here. Are you ready? Translation is driving the globalization of communication, but the profession encompasses more than just translation: linguistic advising, review, proofreading, transcreation, subtitling, language consultancy, linguistic content management... the list goes on. No doubt 2020 and beyond are set to increase opportunities for translators to add value for their clients. But, given the rapidly changing world we now live in, how can translators evolve their own services, becoming 'Augmented Translators'?

Engage with technology

The purpose of technology in translation has always been to help translators deliver and finalize content faster. The days when translators were locked up in a library with a pile of dictionaries and a pencil to produce a translation are long gone. Today, content is processed online, from brochures and web pages to user manuals and market outlooks. Even traditional white papers are no longer exclusively published in paper format. And the list of tools, plug-ins, and technologies available to help translators finalize content and reach audiences continues to grow: translation memories, terminology databases, fragment matches, upLIFT, neural machine translation, Autosuggest dictionaries, and more.

Even our corporate language is changing with technology: instead of “engaging” with customers, companies “connect” with customers. Now, for those who are familiar with the technologies offered by our flagship solution, SDL Trados Studio, check out the many assumptions raised about our future by language specialists here.

Even more tech-savvy? Check the other side of the fence and see how content will impact the augmented translators’ environment. Discover SDL Content Assistant, a technology that was considered science-fiction several years ago – but is now very real.

Also, with today’s technology, the help provided to translators does not only come from the tools: now, even content creates itself. It is up to us, translators, to transform it for our local audience.

Specialize in quality levels, not only in specific industries

Fact: the amount of content to translate has reached incredible levels. While SDL translates hundreds of billions of words every year, this figure remains a drop in an ocean of all translated words. What matters is not the amount to translate. What matters is that the result displayed to your audience meets the quality level expected for such content.

However, billions of words also mean billions of possibilities, and augmented translators are aware of one truth: there is no “standard translation”.

All translations are unique, because clients have unique needs, like their customers. And they also have unique constraints, terminologies, processes, and practices.

With the client’s needs in mind, the augmented translator will adjust their effort and the amount of time required to complete their tasks. And the productivity tools available nowadays are here to help them alleviate the burden: the augmented translator never translates from scratch.

The key factor here is to find the perfect dosage in productivity, the right balance between effort and result. It is important to have a strong understanding of the translation workflow, the tools and assets at your disposal, and your own strengths and skills. This will help assess the quality and thus reduce risks.

In fact, “quality” can only be assessed by a human mind, and this is where the augmented translator and the client can collaborate to set expectations on quality. Because both clients and translators know that a “lack of quality” also means “rework”. And while a “high-quality translation” may be expensive, a “low-quality translation” may cost even more.

Inject culture, and acquire knowledge

Augmented translators will speed up the process of integrating their clients’ requirements to get the quality needed, and that is a truth for all industries. But only if they have adequate assets to help them get started in an augmented world.

An augmented translator will take advantage of the following resources:
  • Content reuse from translation memories
  • Glossaries to apply preferred terminology
  • Style guides to comply with formatting, grammar, and stylistic rules 
  • The tone of voice or brand guides to convey the brand’s message 
  • Project-specific instructions, like character limitations
  • Machine and AI-enabled translation engines to accelerate productivity
All these automated tools and assets are literally “knowledge providers” to the translator, and help non-specialized translators to meet client requests even without knowing the client. These knowledge providers are useful since all clients have preferred terms, favorite wordings, and different rules.

Of course, this automation can also be error-prone and full of traps: terms in glossaries that do not take the context into account, incorrect source texts written by non-native speakers, corporate jargon not understandable outside of your client’s professional sphere, and more.

This is where the augmented translator has two strong cards to play: culture and understanding.

Augmented translators will be able to spot errors in the source text, avoid using offensive or restrictive content, use the appropriate language for the target audience, rewrite puns, detect dual meanings, adapt to stylistic rules, and correct erroneous terminology used by translation engines, etc.

The augmented translator walks in the footsteps of the ancient copyists and scribes and embraces the same mission and ambition: connect cultures and content to share a message.


Jonathan Grisot started as a translator in 2007 and currently holds a position of Senior Language Specialist for SDL in Paris. He is responsible for driving Machine Translation initiatives, managing internal training and quality best practices and is still involved in various translation and transcreation projects. He is also managing the Junior Academy, a local SDL onboarding structure for newly hired SDL translators. Born in Burgundy and raised on the French Riviera, Jonathan considers his detective novels, sci-fi and fantasy books as his numerous children.

Tuesday, December 31, 2019

Most Popular Blog Posts of 2019

I did not write as much as I had hoped to in 2019 but I hopefully can correct this in the coming year. I notice that the top two posts of the past year were written by guest writers, and I invite others who may be so moved, to also come forward and add to the content being produced on this blog.

These rankings are based on the statistics given to me by the hosting platform, and in general, they look reasonable and likely. In these days of fake news and fake images, one does need to be wary. I have produced other reports that have produced drastically different rankings which seemed somewhat suspect to me so I am going with the listing presented in this post.

The most popular post of 2019 was from a frequent guest writer on eMpTy Pages: Luigi Muzii, who has also written extensively about post-editing best practices elsewhere.

1. Understanding the Realities of Language Data

Despite the hype, we should understand that deep learning algorithms are increasingly going to be viewed as commodities.

The data is your teacher. It's the data where the real value is. I predict that this will become increasingly clear over the coming year.

Data is valuable when it is properly collected, understood, organized and categorized. Having rich metadata and taxonomy is especially valuable with linguistic data. Luigi has already written about metadata previously, and you can find the older articles here and here. I think that we should also understand that often translation memory does not have the quality and attributes that make it useful for training NMT systems. This is especially true when large volumes of disparate TM are aggregated together and this is contrary to what many in the industry believe. It is often more beneficial to create new, more relevant TM, based on real and current business needs that better fit the source that needs to be translated.

A series of posts that focused on BLEU scores and MT output quality assessment were the next most popular. Hopefully, my efforts to steer the serious user/buyer to look at business impact beyond these kinds of scores have succeeded, and informed buyers now understand that it is possible to have significant score differences that may have a minimal business impact, and thus these scores should not be overemphasized when selecting a suitable or optimal MT solution.

2.  Understanding MT Quality: BLEU Scores

As there are many MT technology options available today, BLEU and its derivatives are sometimes used to select what MT vendor and system to use. The use of BLEU in this context is much more problematic and prone to drawing erroneous conclusions as often comparisons are being made between apples and oranges. The most common error in interpreting BLEU is the lack of awareness and understanding that there is a positive bias towards one MT system because it has already seen and trained on the test data or has been used to develop the test data set.
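The bias described above can sometimes be checked directly when the training data is available for inspection, which is often not the case with external vendors. The following is a minimal sketch, assuming both the test set and the training data are plain lists of source segments; the function names `normalize` and `overlap_rate` are hypothetical, and exact matching after light normalization is a deliberate simplification that will miss near-duplicates.

```python
def normalize(segment):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(segment.lower().split())


def overlap_rate(test_segments, training_segments):
    """Fraction of test segments that also occur in the training data.

    Matching is exact after normalization (an assumption of this sketch).
    """
    if not test_segments:
        return 0.0
    seen = {normalize(s) for s in training_segments}
    hits = sum(1 for s in test_segments if normalize(s) in seen)
    return hits / len(test_segments)


# Hypothetical data: one of the two test segments was already seen in training.
test_set = ["Hello  world", "Good morning"]
training_data = ["hello world", "Bye"]
rate = overlap_rate(test_set, training_data)
```

A high overlap rate for one system and not another would suggest that a BLEU comparison between them is not being made on equal terms.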

What is BLEU useful for?

Modern MT systems are built by “training” a computer with examples of human translations. As more human translation data is added, systems should generally get better in quality. Often, new data can be added with beneficial results, but sometimes new data can cause a negative effect especially if it is noisy or otherwise “dirty”. Thus, to measure if progress is being made in the development process, the system developers need to be able to measure the quality impact rapidly and frequently to make sure they are improving the system and are in fact making progress.

BLEU allows developers a means “to monitor the effect of daily changes to their systems in order to weed out bad ideas from good ideas.” When used to evaluate the relative merit of different system building strategies, BLEU can be quite effective as it provides very quick feedback and this enables MT developers to quickly refine and improve translation systems they are building and continue to improve quality on a long term basis.
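To make the idea of "quick feedback" concrete, here is a simplified, sentence-level sketch of the BLEU calculation: clipped n-gram precision combined with a brevity penalty. It assumes a single reference translation, whitespace tokenization, and no smoothing, so it is an illustration of the principle rather than a replacement for an established BLEU implementation. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against one reference (no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipping: an n-gram is credited at most as often as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat"
score_a = simple_bleu("the cat sat on the mat", reference)        # output of system build A
score_b = simple_bleu("the cat sat on the mat today", reference)  # output of system build B
```

In this toy comparison, build A reproduces the reference exactly and scores higher than build B. Real evaluations compute the score over a whole test set, aggregating n-gram counts across sentences, and the result is only meaningful when every system is scored on the same unseen test data.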

The enterprise value-equation is much more complex and goes far beyond linguistic quality and Natural Language Processing (NLP) scores. To truly reflect the business value and impact, evaluation of MT technology must factor in non-linguistic attributes including:
  • Adaptability to business use cases
  • Manageability
  • Integration into enterprise infrastructure
  • Deployment flexibility   
To effectively link MT output to business value implications, we need to understand that although linguistic precision is an important factor, it often has a lower priority in high-value business use cases. This view will hopefully take hold as the purpose and use of MT is better understood in the context of a larger business impact scenario, beyond localization.

Ultimately, the most meaningful measures of MT success are directly linked to business outcomes and use cases. The definition of success varies by the use case, but most often, linguistic accuracy as an expression of translation quality is secondary to other measures of success. 

The integrity of the overall solution likely has much more impact than the MT output quality in the traditional sense: not surprisingly, MT output quality could vary by as much as 10-20% on either side of the current BLEU score without impacting the true business outcome. Linguistic quality matters but is not the ultimate driver of successful business outcomes. In fact, there are reports of improvements in output quality in an eCommerce use case that actually reduced the conversion rates on the post-edited sections, as this post-edited content was viewed as being potentially advertising-driven and thus less authentic and trustworthy.

There is also a post by Dr. Pete Smith that is worth a look: In a Funk about BLEU

Your personal data security really does matter
Don't give it away

The fourth most popular post of 2019 was by guest writer Robert Etches with his vision for Blockchain. 

4.  A Vision for Blockchain in the Translation Industry

Cryptocurrency has had a very bad year, but the underlying technology is still regarded as a critical building block for many new initiatives. It is important to be realistic without denying the promise as we have seen the infamous CEOs do. Change can take time and sometimes it needs much more infrastructure than we initially imagine. McKinsey (smart people who also have an Enron and mortgage securitization promoter legacy) have also just published an opinion on this undelivered potential, which can be summarized as:
 "Conceptually, blockchain has the potential to revolutionize business processes in industries from banking and insurance to shipping and healthcare. Still, the technology has not yet seen a significant application at scale, and it faces structural challenges, including resolving the innovator’s dilemma. Some industries are already downgrading their expectations (vendors have a role to play there), and we expect further “doses of realism” as experimentation continues." 
While I do indeed have serious doubts about the deployment of blockchain in the translation industry anytime soon, I do feel that if it happens it will be driven by dreamers, rather than by process crippled NIH pragmatists like Lou Gerstner and Rory. These men missed the obvious because they were so sure they knew all there was to know and because they were stuck in the old way of doing things.  While there is much about blockchain that is messy and convoluted, these are early days yet and the best is yet to come.

Finally, much to my amazement, a post that I wrote in March 2012 was the fifth most-read post of 2019, even though seven years have passed. This proves Luigi's point (I paraphrase here) that the more things change in the world at large, the more they stay the same in the translation industry.

The issue of equitable compensation for the post-editors is an important one, and it is important to understand the issues related to post-editing, that many translators find to be a source of great pain and inequity.  MT can often fail or backfire if the human factors underlying work are not properly considered and addressed. 

From my vantage point, it is clear that those who understand these various issues and take steps to address them are most likely to find the greatest success with MT deployments. These practitioners will perhaps pave the way for others in the industry and “show you how to do it right” as Frank Zappa says. Many of the problems with PEMT are related to ignorance about critical elements, “lazy” strategies and lack of clarity on what really matters, or just simply using MT where it does not make sense. These factors result in the many examples of poor PEMT implementations that antagonize translators. 

My role at SDL was also somewhat inevitable since as long as 7 years ago I was saying:
I suspect that the most compelling evidence of the value and possibilities of PEMT will come from LSPs who have teams of in-house editors/translators who are on fixed salaries and are thus less concerned about the word vs. hourly compensation issues. For these companies, it will only be necessary to prove that first of all MT is producing high enough quality to raise productivity and then ensuring that everybody is working as efficiently as possible. (i.e not "over-correcting"). I would bet that these initiatives will outperform any in-house corporate MT initiative in quality and efficiency.
It is also clear that as more big-data becomes translation worthy, the need for technologically informed linguistic steering will become more imperative and valuable. SDL is uniquely positioned to do this better than almost anybody else that I can think of. I look forward to helping make this a reality at SDL in 2020.

The SDL blog also had a strong preference for MT-related themes and if you are curious you can check this out: REVEALED: The Most Popular SDL Blogs of 2019

Wishing you all a Happy, Prosperous, 
and Healthy, New Year and Decade

Friday, December 27, 2019

The Issue of Data Security and Machine Translation

This is a post that was originally published on SDL.COM

As the world becomes more digital and the volume of mission-critical data flows continues to expand, it is becoming increasingly important for global enterprises to adapt to rapid globalization and the increasingly digital-first world we live in. As organizations change the way they operate, generate revenue and create value for their customers, new compliance risks are emerging, presenting a challenge to compliance functions, which must proactively monitor, identify, assess and mitigate risks like those tied to fundamentally new technologies and processes. Digital transformation is driven and enabled by data, and thus the value of data security and governance also rises in importance and organizational impact. At the World Economic Forum in Davos, CEOs have identified cybersecurity and data privacy as two of the most pressing issues of the day, and even regard breakdowns in these areas as a threat to enterprise, society, and government in general.
While C-level executives understand the need for cybersecurity as their organizations undergo digital transformation, they aren’t prioritizing it enough, according to a recent Deloitte report based on a survey of 500 executives. The report, “The Future of Cyber Survey 2019,” reveals that there is a disconnect between organizational aspirations for a “digital everywhere” future, and their actual cyber posture. Those surveyed view digital transformation as one of the most challenging aspects of cyber risk management, and yet indicated that less than 10% of cyber budgets are allocated to these digital transformation efforts. The report goes on to say that this larger cyber awareness is at the center of digital transformation. Understanding that is as transformative as cyber itself—and to be successful in this new era, organizations should embrace a “cyber everywhere” reality.

Cybersecurity breakdowns and data breach statistics

Are these growing concerns about cybersecurity justified? It certainly seems so when we consider these facts:
  • A global survey in 2018 by CyberEdge across 17 countries and 20 industries found that 78% of respondents had experienced a network breach.
  • The ISACA survey  of cybersecurity professionals points out that it is increasingly difficult to recruit and retain technically adept cybersecurity professionals. They also found that 50% of cyber pros believe that most organizations underreport cybercrime even if they are required to report it, and 60% said they expected at least one attack within the next year.
  • Radware estimates that an average cyber-attack in 2018 cost an enterprise around $1.67M. The costs can be significantly higher, e.g. a breach at Maersk is estimated to have cost around $250 to $300 million, because of the brand damage, loss of productivity, loss of profitability, falling stock prices, and other negative business impacts in the wake of the breach.
  • Risk-Based Security reports that there were over 6500 data breaches and that more than 5 billion records were exposed in 2018. The situation is not better in 2019, and over 4 billion records were exposed in the first six months of 2019.
  • An IBM Security study revealed the financial impact of data breaches on organizations. According to this study, the cost of a data breach has risen 12% over the past 5 years and now costs $3.92 million on average. The average cost of a data breach in the U.S. is $8.19 million, more than double the worldwide average.
As would be expected, with hacking as the top breach type, attacks originating outside of the organization were also the most common threat source. However, misconfigured services, data handling mistakes, and other inadvertent exposure by authorized persons exposed far more records than malicious actors were able to steal.

Data security and cybersecurity in the legal profession

Third-party professional services firms are often a target for malicious attacks because the possibility of acquiring high-value information is high. Records show that law firms' relationships with third-party vendors are a frequent point of exposure to cyber breaches and accidental leaks. obtained a list of more than 100 law firms that had reported data breaches and estimate that even more are falling victim to this problem, but simply don’t report it to avoid scaring clients and minimize potential reputational damage.

Austin Berglas, former head of the FBI’s cyber branch in New York and now global head of professional services at cybersecurity company BlueVoyant, said law firms are a top target among hackers because of the extensive high-value client information they possess. Hackers understand that law firms are a “one-stop-shop” for sensitive and proprietary corporate information, merger & acquisitions related data, and emerging intellectual property information.

As custodians of highly sensitive information, law firms are inviting targets for hackers.

The American Bar Association reported in 2018 that 23% of firms had reported a breach at some point, up from 14% in 2016. Six percent of those breaches resulted in the exposure of sensitive client data. Legal documents have to pass through many hands as a matter of course: reams of sensitive information pass through the hands of lawyers and paralegals, and then go through the process of being reviewed and signed by clients, clerks, opposing counsels, and judges. When they finally get to the location where records are stored, they are often inadvertently exposed to others, even firm outsiders, who shouldn’t have access to them at all.

A Logicforce legal industry score for cybersecurity health among law firms has increased from 54% in 2018 to 60% in 2019, but this is still lower than many other sectors. Increasingly, clients are also asking for audits to ensure that security practices are current and robust. A recent ABA Formal Opinion states: “Indeed, the data security threat is so high that law enforcement officials regularly divide business entities into two categories: those that have been hacked and those that will be.”

Lawyers are failing on cybersecurity, according to the American Bar Association Legal Technology Resource Center’s ABA TechReport 2019. “The lack of effort on security has become a major cause for concern in the profession.”

“A lot of firms have been hacked, and like most entities that are hacked, they don’t know that for some period of time. Sometimes, it may not be discovered for a minute or months and even years,” said Vincent I. Polley, a lawyer and co-author of a recent book on cybersecurity for the ABA.

As the volume of multilingual content explodes, a new risk emerges: public, “free” machine translation provided by large internet services firms who systematically harvest and store the data that passes through these “free” services.  With the significantly higher volumes of cross-border partnerships, globalization in general, and growth in international business, employee use of public MT has become a new source of confidential data leakage.

Public machine translation use and data security

In the modern era, it is estimated that on any given day, several trillion words are run through the many public machine translation options available across the internet today. This huge volume of translation is done largely by the average web consumer, but there is increasing evidence that a growing portion of this usage is emanating from the enterprise when urgent global customer, collaboration, and communication needs are involved. This happens because publicly available tools are essentially frictionless and require little “buy-in” from a user who doesn’t understand the data leakage implications.  The rapid rate of increase in globalization has resulted in a substantial and ever-growing volume of multilingual information that needs to be translated instantly as a matter of ongoing business practice. This is a significant risk for the global enterprise or law firm as this short video points out. Content transmitted for translation by users is clearly subject to terms of use agreements that entitle the MT provider to store, modify, reproduce, distribute, and create derivative works. At the very least this content is fodder for machine learning algorithms that could also potentially be hacked or expose data inadvertently.

Consider the following:
  • At the SDL Connect 2019 conference recently, a speaker from a major US semiconductor company described the use of public MT at his company. When this activity was carefully monitored by IT management, they found that as much as 3 to 5 GB of enterprise content was being cut and pasted into public MT portals for translation on a daily basis. Further analysis of the content revealed that the material submitted for translation included future product plans, customer problem-related communications, sensitive HR issues, and other confidential business process content.
  • In September 2017, the Norwegian news agency NRK reported that it had found data that had been translated for free on a site called Translate.Com, including “notices of dismissal, plans of workforce reductions and outsourcing, passwords, code information, and contracts”. This was yet another site that offered free translation, but reserved the right to examine the data submitted “to improve the service.” Subsequently, searches by Slator uncovered other highly sensitive personal and corporate data.
  • A recent report from the Australian Strategic Policy Institute (ASPI) makes some claims about how China uses state-owned companies, which provide machine translation services, to collect data on users outside China. The author, Samantha Hoffman, argues that the most valuable tools in China’s data-collection campaign are technologies that users engage with for their own benefit, machine translation services being a prime example. This is done through a company called GTCOM, which Hoffman said describes itself as a “cross-language big data” business and offers hardware and software translation tools that collect data, lots of data. She estimated that GTCOM, which works with both corporate and government clients, handles the equivalent of up to five trillion words of plain text per day, across 65 languages and in over 200 countries. GTCOM is a subsidiary of a Chinese state-owned enterprise that the Central Propaganda Department directly supervises, and thus data collection is presumed to be an active and ongoing process.
After taking a close look at the enterprise market needs and the current realities of machine translation use we can summarize the situation as follows:
  • There is a growing need for always-available, and secure enterprise MT solutions to support the digitally-driven globalization that we see happening in so many industries today. In the absence of having such a secure solution available, we can expect that there will be substantial amounts of “rogue use” of public MT portals with resultant confidential data leakage risks.
  • The risks of using public MT portals are now beginning to be understood. The risk is not just related to inadvertent data leakage but is also closely tied to the various data security and privacy risks presented by submitting confidential content into the data-grabbing machine learning infrastructure that underlies these “free” MT portals. There is a growing list of US companies already subjected to GDPR-related EU regulatory actions, including Amazon, Apple, Facebook, Google, Netflix, Spotify and Twitter. Experts have stated that Chinese companies are likely to be the next wave of regulatory enforcement, and the violators' list is expected to grow.
  • The executive focus on digital transformation is likely to drive more attention to the concurrent cybersecurity implications of hyper-digitalization. Information Governance is likely to become much more of a mission-critical function as the digital footprint of the modern enterprise grows and becomes much more strategic.

The legal market requirement: an end-to-end solution

Thus, we see today that language-translation-at-scale capabilities have become imperative for the modern global enterprise. The needs for translation can range from rapid translation of millions of documents in an eDiscovery or compliance scenario, to the very careful and specialized translation of critical contract and court-ready documentation, to an associate collaborating with colleagues from a foreign outpost. Daily communications in global matters are increasingly multilingual. Given the volume, variety, and velocity of the information that needs translation, legal professionals must consider translation solutions that involve both technology and human services. The requirements can vary greatly and can require different combinations of man-machine collaboration that include some or all of these different translation production models:
  • MT-Only for very high volumes like in eDiscovery, and daily communications
  • MT + Human Terminology Optimization
  • MT + Post-Editing
  • Specialized Expert Human Translation

SDL Machine Translation: designed for the Enterprise

SDL is a leader in developing secure, private, scalable enterprise-ready MT technology that can be deployed on-premise, or in a private cloud, and also provides related expert services to ensure optimally tailored deployment. SDL’s NLP technology team bench is deeper than any other in the translation industry and the company’s MT technology is used by the largest global enterprises in the world, as well as many governmental agencies focused on national security and intelligence gathering activities. From the outset, SDL has focused on developing enterprise-friendly capabilities that include the following:
  • Guaranteed data security & privacy
  • Flexible deployment options that include on-premise, cloud or a combination of both as dictated by usage needs
  • Broad range of adaptation and customization capabilities so that MT systems can be optimized for each individual client
  • Integration with primary enterprise IT infrastructure and software e.g. Office, Translation Management Systems, Relativity, and other eDiscovery platforms
  • REST API that allows connectivity to any proprietary systems that you may employ
  • Broad range of expert consulting services both on the MT technology aspects and the linguistic issues
  • Tight integration with professional human translation services to handle end-to-end translation requirements

SDL’s translation capabilities range from handling large eDiscovery litigation-related projects using MT enhanced with expert-developed client-specific glossaries and search terms to improve the ability to identify relevant documents, to specialized and expert human translation services for critical content. SDL’s secure translation supply chain solution provides an enterprise-class, vendor-agnostic, secure translation platform that allows you to combine regulatory compliance and translation best practice. SDL has the most sophisticated and comprehensive end-to-end translation solution capabilities in the industry today, powered by over 1,400 in-house translators working closely with linguistic AI-enabled tools and technology.