Monday, January 14, 2019

A Vision for Blockchain in the Translation Industry

Happy New Year

This is yet another post on blockchain, a guest post by Robert Etches, who presents his vision of what blockchain could become in the translation industry. A vision, by definition, implies a series of possibilities, in this case quite revolutionary ones, but does not necessarily provide all the details of how and what. Of course, the way ahead is full of challenges and obstacles, which are much more visible to us than the promised land, but I think it is wise to keep an open mind and watch the evolution even if we are not fully engaged, committed, or in agreement. Sometimes it is simply better to wait and see than to rush to final conclusions.

It is much easier to be dismissive and skeptical of upstart claims of fundamental change than to allow for a slim but real possibility that some new phenomenon could indeed be revolutionary. I wrote previously about CEO shortsightedness and what I called Roryisms. Here is a classic one from IBM that shows how they completely missed the boat because of their hubris and old style thinking.
Gerstner is, however, credited with turning around a flailing mainframe business. The cost of missing the boat can be significant, and we need only look at relative stock price and market value improvements over time (as this is how CEO performance is generally measured) to understand how truly clueless our boy Lou and his lieutenants at IBM were when they said this. The culture created by such a mindset can last decades, as the evidence shows.

IBM’s chairman minimized how Amazon might transform retail and internet sales all the way back in 1999.
“[Amazon.com] is a very interesting retail concept, but wait till you see what Wal-Mart is gearing up to do,” said IBM Chairman Louis V. Gerstner Jr. in 1999. Mr. Gerstner noted that last year IBM’s Internet sales were five times greater than Amazon’s. Mr. Gerstner boasted that IBM “is already generating more revenue, and certainly more profit, than all of the top Internet companies combined.”

AMZN Stock Price Appreciation of 36,921% versus IBM's 211% over 20 years

 IBM is the flat red line in the chart above. IBM looks just as bad against Microsoft, Google, Apple, Oracle, and many others who had actual innovation.

January 11, 2019
  • Amazon market value: $802 billion (7.3X higher)
  • IBM market value: $110 billion

I bring attention to this because I also saw this week that IBM filed more patents than any other company in the US in 2018; Samsung was second. In fact, IBM has been the top patent filer in the US every year from 1996 to 2018. BTW, they are leaders in blockchain patents as well. However, when was the last time ANYBODY associated IBM with innovation or technology leadership? 1980? Maybe they just have some great patent-filing lawyers who understand the PTO bureaucracy and know how to get their filings pushed through. In fact, some in the AI community have felt that IBM Watson was a joke and that the effort did not warrant serious credibility and respect. Oren Etzioni said this: “IBM Watson is the Donald Trump of the AI industry—outlandish claims that aren’t backed by credible data.” Trump is now a synonym for undeserved self-congratulation, fraud, and buffoonery, a symbol for marketing with false facts. IBM is also credited with refining and using something called FUD (fear, uncertainty, and doubt) as a deliberate sales and marketing misinformation tactic to keep customers from using better, more innovative but lesser-known products. We should not expect IBM to produce any breakthrough innovation in the emerging AI-first, machine-learning-everywhere world we see today, and most expect the company will be further marginalized in spite of all the patent filings.

Some of you may know that IBM filed the original patents for Statistical Machine Translation, but it took Language Weaver (SDL), Google and Microsoft to really make it come to life in a useful way. IBM researchers were also largely responsible for conceiving of the BLEU score to measure MT output quality that was quite useful for SMT. However, the world has changed and BLEU is not useful with NMT. I plan to write more this year on how BLEU and all its offshoots are inadequate and often misleading in providing an accurate sense of the quality of any Neural MT system.

It is important to be realistic without denying the promise as we have seen the infamous CEOs do. Change can take time and sometimes it needs much more infrastructure than we initially imagine. McKinsey (smart people who also have an Enron and mortgage securitization promoter legacy) have also just published an opinion on this undelivered potential, which can be summarized as:
 "Conceptually, blockchain has the potential to revolutionize business processes in industries from banking and insurance to shipping and healthcare. Still, the technology has not yet seen a significant application at scale, and it faces structural challenges, including resolving the innovator’s dilemma. Some industries are already downgrading their expectations (vendors have a role to play there), and we expect further “doses of realism” as experimentation continues." 
While I do indeed have serious doubts about the deployment of blockchain in the translation industry anytime soon, I do feel that if it happens it will be driven by dreamers rather than by process-crippled, NIH (not-invented-here) pragmatists like Lou Gerstner and Rory. These men missed the obvious because they were so sure they knew all there was to know, and because they were stuck in the old way of doing things. While there is much about blockchain that is messy and convoluted, these are early days yet, and the best is yet to come.

Another dreamer, Chris Dixon, articulated an even greater vision for blockchain when he recently said:
The idea that an internet service could have an associated coin or token may be a novel concept, but the blockchain and cryptocurrencies can do for cloud-based services what open source did for software. It took twenty years for open source software to supplant proprietary software, and it could take just as long for open services to supplant proprietary services. But the benefits of such a shift will be immense. Instead of placing our trust in corporations, we can place our trust in community-owned and -operated software, transforming the internet’s governing principle from “don’t be evil” back to “can’t be evil.”


2018 was a kick-off year for language blockchain enthusiasts. At least five projects were launched[1], there was genuine interest expressed by the industry media, and two webinars and one conference provided a stage for discussion on the subject[2]. Then it all went very quiet. So, what’s happened since? And where are we today?

Subscribers to Slator’s latest megatrends[3] can read that it’s same same in the language game for 2019: NMT, M&A, CAT, TMS, unit rates … how we love those acronyms!

On the world stage, people could only shake their heads in disbelief at the meteoric rise in the value of cryptocurrencies in 2017. However, in 2018 those same people relished a healthy dish of schadenfreude as exchange rates plummeted and the old order was restored, with dollars (Trump), roubles (Putin), and pound sterling (Brexit) back in vogue.

In other words, for the language industry and indeed for the world at large, “better the devil(s) we know” appears to be the order of the day.

There is nothing surprising in this. Despite all the “out of your comfort zone” pep talks by those Cassandras of Change[4], the language industry continues to respect the status quo, grow and make money[5]. Why alter a winning formula? And certainly, why even consider introducing a business model that expects translators to work for tokens?! Hello, crazy people!!!

But maybe, just maybe, there was method in Hamlet’s madness[6] and Apple was right when they praised the crazy ones[7]?

Let’s take a closer look at the wonderful world of blockchain and token economics, and how they are going to change the world … and, yes, also the language industry.


Pinning down the goal posts

Because they keep moving! Don’t take my word for it. Here’s what those respected people at Gartner wrote in their blockchain-based transformation report[8] in March 2018:


While blockchain holds long-term promise in transforming business and society, there is little evidence in short-term reality.


Opportunities and Challenges

  • Blockchain technologies offer new ways to exchange value, represent digital assets and implement trust mechanisms, but successful enterprise production examples remain rare.
  • Technology leaders are intrigued by the capabilities of blockchain, but they are unclear exactly where business value can be achieved in the enterprise context.
  • Most enterprise blockchain experiments are an attempt to improve today's business process, but in most of those cases, blockchain is no better than proven enterprise technologies. These centralized renovations distract enterprises from other innovative possibilities offered by blockchain.
And now here’s a second overview, also from Gartner, this time their blockchain spectrum report[9] from October 2018:


Opportunities and Challenges

  • Blockchain technologies offer capabilities that range from incremental improvements to operational models to radical alterations to business models.
  • The impact of blockchain’s trust mechanisms and interaction paradigms extends beyond today’s business and will affect the economy, society and governance.
  • Many interpretations of blockchain today suffer from an incomplete understanding of its capabilities or assume a narrow scope.
The seven-month leap from “little evidence in short-term reality” to “will affect the economy, society and governance” is akin to a rocket-propelled trip across the Grand Canyon! Little wonder that traditional businesses don’t know where to even start looking into this phenomenon, never mind taking on a new business model that basically requires emptying the building of 90% of its hardware, software and, more importantly, people.


Why does Deloitte have 250 people working in their distributed ledger laboratories? Because when immutable distributed ledgers become a reality they will put 300,000 people out of work at the big four accountancy companies[10].

Why are at least 26 central banks looking into blockchain? Because there’s a good chance that private banks[11] will be superfluous in 10-15 years’ time and we will all have accounts with central banks.

Or there will be no banks at all …

Let’s just take a second look at that Gartner statement:

The impact of blockchain’s trust mechanisms and interaction paradigms extends beyond today’s business and will affect the economy, society and governance.

Other than basically saying blockchain will change “everything”, the sentence mentions two factors that are core to blockchain: trust and interaction.

Trust. What inspires me about blockchain is its transparency. A central tenet of blockchain is its truth gene. In a world in which even the most reliable sources of information are labeled as fake, blockchain’s traceability – its judgment in stone as to who did what, when and for whom – makes it a beacon of light.

Just think: what if we could utilize this capability to solve the endless quality issue? What if the client always knew who had translated what – and could even set up selection criteria based upon irrefutable proof of quality from previous assignments? It is no surprise to learn that many blockchain projects are focusing on supply chain management.
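The traceability idea – a tamper-evident record of who did what, when, and for whom – can be illustrated with a toy example. The sketch below is plain Python, not any particular blockchain platform, and all field names are invented for illustration: each translation record is chained to the previous one by a hash, so altering any past record invalidates everything after it.

```python
import hashlib
import json

def make_block(record, prev_hash):
    """Bundle a translation record with the hash of the previous block."""
    block = {"record": record, "prev_hash": prev_hash}
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block

def verify(chain):
    """Recompute every hash; any tampering breaks the chain."""
    prev_hash = "0" * 64
    for block in chain:
        payload = json.dumps(
            {"record": block["record"], "prev_hash": block["prev_hash"]},
            sort_keys=True).encode("utf-8")
        if block["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(payload).hexdigest() != block["hash"]:
            return False
        prev_hash = block["hash"]
    return True

# An immutable trail of who translated what, when, and for whom
chain = []
prev = "0" * 64
for rec in [
    {"translator": "anna", "client": "acme", "segment_id": 17, "pair": "en-de"},
    {"translator": "lars", "client": "acme", "segment_id": 18, "pair": "en-da"},
]:
    block = make_block(rec, prev)
    chain.append(block)
    prev = block["hash"]

assert verify(chain)
chain[0]["record"]["translator"] = "someone_else"  # try to rewrite history...
assert not verify(chain)                           # ...and the ledger notices
```

A real blockchain adds distributed consensus on top of this hashing scheme, which is what removes the need to trust any single party holding the ledger.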

Interaction is all about peer-to-peer transactions through Ethereum smart contracts. It’s not just the central banks that will be removing the middlemen. Unequivocal trust opens the door to interact with anyone, anywhere. To a global community. These people of course speak and write in one or more of approximately 6,900 languages, so there’s a market for providing the ability for these “anyones” to speak to each other in any language. What a business opportunity! And what a wonderful world it would be!

Cryptocurrencies and blockchain: peas and carrots

You’ve gotta love Forrest Gump – especially now we know Jenny grew up to become Claire Underwood[12] 😊

Just as Jenny and Forrest went together like peas and carrots, so do tokens and blockchain.

Unfortunately, this is where many jump off the train. It is one thing to accept the relevance of some weird ledger technology that is tipped to become the new infrastructure of the Internet; it is quite another to trade hard-won Venezuelan dollars for some sort of virtual mumbo jumbo!
  1. All fiat currencies are a matter of trust. None is backed by anything more than our trust in a national government behaving responsibly. In 2019 that is quite a scary thought – choose your own example of lemming-like politicians.
  2. All currencies (fiat or crypto) are worth what the market believes them to be worth. In ancient times a fancy shell from some far-off palmy beach was highly valued in the cold Viking north. Today not so. At its inception, bitcoin was worth nothing. Zero. Zip. Today[13] 1 BTC = €3,508.74. Because people say so.
Today, there is absolutely no reason why a currency cannot be minted by, well, anyone. There is indeed a school of thought that believes there will be thousands of cryptocurrencies in the not-too-distant future. If we look at our own industry, we have long claimed that translation memories and termbases have a value. Why can that value not be measured in a currency unique to us, with an intrinsic value that we all respect and which is not subject to the whims of short-term political aspirations? Why can’t linguistic assets be priced in a Language Coin?

Much has already been written about the concept of a token economy, though little better than the following:

An effective token strategy is one where the exchange of a token within a particular economy impacts human economic behavior by aligning user incentives with those of the wider Community.[14]

Think about this in the context of the language industry. What if the creation and QA of linguistic assets were tied to their own token? What if you – a private company, a linguist, an LSP, an NGO, an international organization – were paid in this token for your data, and the value of this data grew year on year as it was shared and leveraged as part of a larger whole – the Language Community[15]? What if linguists were judged by their peers and their reputations were set in stone? What if everyone were free to charge whatever hourly fees they chose, and word rates and CAT discounts were a relic of the past?
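To make the token-economy idea concrete, here is a deliberately simple model of the mechanics described above: contributors earn a hypothetical "Language Coin" for linguistic assets, and peer reviews accumulate into a reputation. Every name, rate, and rule in this sketch is invented for illustration – it is a thought experiment in code, not a description of any actual platform.

```python
from dataclasses import dataclass, field

@dataclass
class Community:
    """A toy Language Community ledger: token balances plus peer reputation."""
    balances: dict = field(default_factory=dict)
    reputation: dict = field(default_factory=dict)

    def reward_asset(self, contributor, segments, rate=2):
        """Pay tokens for contributed TM/termbase segments (illustrative rate)."""
        self.balances[contributor] = self.balances.get(contributor, 0) + segments * rate

    def peer_review(self, reviewer, linguist, score):
        """Record a peer judgment (0-5); kept forever, like a ledger entry."""
        self.reputation.setdefault(linguist, []).append(score)

    def rep(self, linguist):
        """Reputation as the running average of all peer reviews."""
        history = self.reputation.get(linguist, [])
        return sum(history) / len(history) if history else None

c = Community()
c.reward_asset("anna", segments=500)   # 500 TM segments contributed
c.peer_review("lars", "anna", 5)
c.peer_review("mia", "anna", 4)
print(c.balances["anna"], c.rep("anna"))  # 1000 tokens, reputation 4.5
```

The point of the toy model is the alignment of incentives: contributing quality data increases both your balance and your verifiable reputation, which is exactly the behavior the token strategy quoted above is meant to encourage.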

This is why blockchain feeds the token economy and why the token needs blockchain. Peas and carrots!

To end with the words of another Cassandra – and a trendy one at that – Max Tegmark:
If you hear a scenario about the world in 2050 and it sounds like science fiction, it is probably wrong; but if you hear a scenario about the world in 2050 and it does not sound like science fiction, it is certainly wrong.[16]

The pace of change will continue to accelerate exponentially, and I believe blockchain will be one of the main drivers.

Within 10-15 years, there will be household (corporate) names and technologies that either do not exist today or have only just started. And by 2050 the entire finance, food, and transport sectors (to name the obvious) will be ‘blockchained’ beyond recognition.

At Exfluency, we see multilingual communication as an obvious area where a token economy and blockchain will also come out on top; I’m sure that other entrepreneurs in a myriad of other sectors are coming to similar conclusions. It’s going to be exciting!

Robert Etches
CEO, Exfluency
January 2019

[1] Exfluency, LIC, OHT, TAIA, TranslateMe
[2] TAUS Vancouver, and GALA and TAUS webinars
[4] Britta Aagaard & Robert Etches, This changes everything, GALA Sevilla 2015; Katerina Pastra, Krzystof Zdanowski, Yannis Evangelou & Robert Etches, Innovation workshop, NTIF 2017; Peggy Peng & Robert Etches, The Blockchain Conversation, TAUS 2018; Jochen Hummel, Sunsetting CAT, NTIF 2018.
[5] I am painfully aware that not everyone in the food chain is making money …
[6] Hamlet, II.ii.202-203

[10] P.221 The Truth Machine, by Paul Vigna and Michael J. Casey
[11] Ibid. pp163-167

[13] 9 January 2019
[14] P.69 The Truth Machine
[15] See Aagaard & Etches This changes everything for the sociological and economic importance of communities, the circular society, and the sharing society.
[16] Life 3.0 by Max Tegmark


CEO, Exfluency

A dynamic actor in the language industry for 30 years, Robert Etches has worked with every aspect of it, achieving notable success as CIO at TextMinded from 2012 to 2017. Always active in the community, Robert was a co-founder of the Word Management Group, the TextMinded Group, and the Nordic CAT Group. He served four years on the board of GALA (two as chairman) and four on the board of LT-Innovate. In a fast-changing world, Robert believes there has never been a greater need to implement innovative initiatives. This has naturally led to his involvement in his most innovative challenge to date: As CEO of Exfluency, he is responsible for combining blockchain and language technology to create a linguistic ledger capable of generating new opportunities for freelancers, LSPs, corporations, NGOs and intergovernmental institutions alike.

Tuesday, December 25, 2018

The Global eCommerce Opportunity Enabled by MT

This is a slightly updated version of a post already published on the SDL site.

The holiday season around the world today is often characterized by special holiday shopping events like Black Friday and Cyber Monday. These special promotional events generate peak shopping activity and are now increasingly becoming global events. They are also increasingly becoming a digital and online commerce phenomenon. This is especially true in the B2C markets but is also now often true in the B2B markets.

The numbers are in, and the most recent data is quite telling. The biggest U.S. retail shopping holiday of the year – from Thanksgiving Day to Black Friday to Cyber Monday, plus Small Business Saturday and Super Sunday – generated $24.2 billion in online revenues. And that figure is far below Alibaba’s 11.11 Global Shopping Festival, which in 2018 reached $30.8 billion – in just 24 hours.

When we look at the penetration of eCommerce in the U.S. retail market, we see that, as disruptive as it has been, it is still only around 10% of the total American retail market. According to Andreessen Horowitz, this digital transformation has just begun, and it will continue to gather momentum and spread to other sectors over the coming years.

The Buyer’s Experience Affects eCommerce Success

Success in online business is increasingly driven by careful and continued attention to providing a good overall customer experience throughout the buyer journey. Customers want relevant information to guide their purchase decisions and allow them to be as independent as possible after they buy a product. This means sellers now need to provide much more content than they traditionally have.

Much of the customer journey today involves a buyer interacting independently with content related to the product of interest, and digital leaders understand that knowing the customer and providing content that really matters to them is a prerequisite for digital transformation success.

B2B Buying Today is Omnichannel

In a study of B2B digital buying behavior presented at a recent Gartner conference, Brent Adamson pointed out that “Customers spend much more time doing research online – 27% of the overall purchase evaluation and research [time]. Independent online learning represents the single largest category of time-spend across the entire purchase journey.”

The proportion of time that the 750 surveyed customers making a large B2B purchase spent working directly with salespeople – both in person and online – was just 17% of their total purchase research and evaluation time. This fractional time is further diluted when spread across the three or more vendors typically involved in a B2B purchase evaluation.

The research made evident that a huge portion of a seller’s contact with customers happens through digital content rather than in person. This means that any B2B supplier without a coherent digital marketing strategy specifically designed to help buyers through the buyer journey will fall rapidly behind those who have one.

The study also found that the start of in-person contact does not mean the end of online contact. Even after engaging suppliers’ sales reps in direct in-person conversations, customers continue their digital buying journey, making use of both human and digital buying channels simultaneously.

Relevant Local Content Drives Online Engagement

Today it is very clear to digitally-savvy executives that providing content relevant to the buyer journey really matters and is a key factor in enabling digital online success. A separate research study by Forrester uncovered the following key findings:
  • Product information is more important to the customer experience than any other type of information, including sales and marketing content.
  • 82% of companies agree that content plays a critical role in achieving top-level business objectives.
  • Companies lack the global tools and processes critical to delivering a continuous customer journey but are increasingly beginning to realize the importance of this.
  • Many companies today struggle to handle the growing scale and pace of content demands.
A digital online platform does enable an enterprise to establish a global presence very quickly. However, research suggests that local-language content is critically needed to drive successful international business outcomes. The global customer requires all the same content that a US customer would in their own buying journey.

Machine Translation Facilitates Multilingual Content Creation

This requirement for providing so much multilingual content presents a significant translation challenge for any enterprise that seeks to build momentum in new international markets. To address this challenge, eCommerce giants like eBay, Amazon, and Alibaba are among the largest users of machine translation in the world today. There is simply too much content that needs to be multilingual to handle with traditional localization methods.

However, even with MT, the translation challenge is significant and requires deep expertise and competence to address. The skills needed to do this in an efficient and cost-effective manner are not easily acquired, and many B2B sellers are beginning to realize that they do not have these skills in-house and could not effectively develop them in a timely manner.

Expansion Opportunities in Foreign Markets

The projected future growth of eCommerce activity across the world suggests that the opportunity in non-English speaking markets is substantial, and any enterprise with aspirations to lead – or even participate – in the global market will need to make huge volumes of relevant content available to support their customers in these markets.

When we look at eCommerce penetration across the globe, we see that the U.S. is in the middle of the pack in terms of broad implementation. The leaders are the APAC countries, with China and South Korea having particularly strong momentum as shown below. You can see more details about the global eCommerce landscape in the SDL MT in eCommerce eBook.

The chart below, also from Andreessen Horowitz, shows the shift in global spending power and suggests the need for an increasing focus on APAC and other regions outside of the US and Europe. The recent evidence of the power of eCommerce in China shows that these trends are already real today and are gathering momentum.

The Shifting Global Market Opportunity

To participate successfully in this new global opportunity, digital leaders must expand their online digital footprint and offer substantial amounts of relevant content in the target market language in order to provide an optimal local B2C and B2B buyer journey. 

As Andreessen points out, the digital disruption caused by eCommerce has only just begun, and the data suggests that the market opportunity is substantially greater for those who have a global perspective. SDL's MT in eCommerce eBook provides further details on how a digitally-savvy enterprise can handle the new global eCommerce content requirements in order to partake in the $40 trillion global eCommerce opportunity.

Happy Holidays to all.  

May your holiday season be blessed and peaceful.  

Click here to find the SDL eBook on MT and eCommerce.

Thursday, November 15, 2018

The Growing Momentum of Machine Translation in Life Sciences

This is a post that was originally published on the SDL website in two parts which are combined here in a single long post. This post also reflects the expertise of my colleague Matthias Heyn, VP of Life Sciences Solutions at SDL.


This first post in an ongoing series takes a closer look at the emerging use and acceptance of machine translation (MT) in the Life Sciences industry. We take a look at the expanding role MT is likely to have in the industry over the coming years and explore some key use cases and applications.

The Life Sciences industry, like every other industry today, feels the impact of the explosion of content and of the driving forces that compel it to use MT and machine learning (ML). The growth is driven by:
  • The volume of multilingual research impacting drug development
  • The increasing volume of multilingual external consumer data now available (or needed), which influence drug discovery, disease identification, global clinical research, and global disease outbreak monitoring
Consumers share information in many ways, across a variety of digital platforms, and it has become increasingly necessary for companies to monitor these platforms to stay abreast of trends, impressions, and problems related to their products.

It is useful to consider some of the salient points behind this growing momentum.

MT use has exploded

The content that needs translation today is varied, continuous, real-time and always flowing in ever greater volumes. We can only expect this will continue and increase.

Global public MT portals translate an estimated 800 billion words a day. This is astounding to some in the localization industry, who account for less than 1% of this volume, and it suggests that MT is now a regular part of digital life.

Everyone – both consumers and employees in global enterprises – uses it all the time. This use of public MT portals also involves many global enterprise workers, who may compromise data security and productivity by using these portals. However, the need for instant, always-available translation services is so urgent that some employees will take the risk.

Some large global enterprises recognize both the data security risks entailed by this uncontrolled use and the widespread need for controlled and integrated MT in their digital infrastructure. In response, they have deployed internal solutions to meet this need in a more controlled manner.

Why Life Sciences has not used MT historically

There are several reasons why Life Sciences has not used MT, including quality requirements, lags in technology adoption, global need, and non-optimized MT capabilities.

The Life Sciences industry needs high-quality, accurate translations, given that human lives could be at stake if a translation is inaccurate; this has created a subject-matter-expert-dependent, verification-heavy quality mindset. The industry saw little benefit from using MT since it was so hard to control and optimize. Depending on the kind of errors, there can be catastrophic consequences from failures, and thus a general “not good enough for us” attitude within the industry. Occasional breaking news about MT mishaps did not help.

The Life Sciences industry is not typically an early adopter of new technologies. Historically, Life Sciences organizations have focused on technology and innovation in targeted areas, but that is changing as the need to innovate in multiple areas keeps increasing in order to stay competitive. It is no longer a nice-to-have; it is a must-have. At the same time, technologies like machine translation have evolved and improved significantly over the last few years, which has changed how MT is viewed. Machine translation is now seen as a viable and effective solution to address certain global content challenges.

There is also a concern about risk management and mitigation. Life Sciences organizations have been concerned about the risk involved in leveraging machine translation, due both to data security and to the ability to handle their industry-specific terminology requirements. Generic MT solutions like Google do not provide adequate data security and tailoring controls for the specific needs of an enterprise. Once something is translated using Google Translate, it is potentially available in the public domain. Data privacy and security are a top priority for Life Sciences companies, so an Enterprise MT solution that provides the benefits of MT technology with the necessary security elements is essential. Additionally, there are many use cases where the enterprise needs to have MT capabilities deployed in private IT environments, carefully integrated with key business applications and workflows.

But compelling events are forcing change …

The massive increase in the volume of content in general and high volumes of multilingual content from worldwide digitally connected and active patients and consumers are key drivers for the enterprise adoption of MT across the industry.

In the Life Sciences industry, an exponential increase in internal scientific data (particularly in genomics and proteomics data) has triggered global research. This research has led to new ways to develop drugs, knowledge about disease pathways and manifestation, and to the development of tailored treatments for individual patients. Keeping abreast of potentially breakthrough research, much of which may be in local languages has become a competitive imperative.

Source: Arcondis
 The huge increase in patient-related data such as the data from central laboratories, prescriptions, claims, EHRs and Health Information Exchanges (HIEs) provides an immense opportunity to analyze and gain insights across the entire value chain, such as:
  • Drug Discovery: Analyzing and spotting additional indications for a drug, disease pathways, and biomarkers
  • Clinical Trials: Optimizing clinical trials through better selection of investigators and sites, and defining better inclusion and exclusion criteria
  • Wearables: Wearable technologies generate a significant amount of data to monitor patients, such as tracking key parameters and therapy compliance
  • Aggregated data: The ability to aggregate data from multiple reporting sources has also increased the volume and flow of such data.

The Impact of Social Media

Signals related to problems and adverse effects may appear in any language, anywhere in the world. The need to monitor and understand this varied data grows in importance as information today can spread globally in hours. Safety concerns can have serious implications for patient health and on a company’s financial health and reputation. These concerns need to be monitored to avoid derailing a drug that may be on track to become an international success.

Additionally, another important use for machine translation is in the social media and post-marketing area. Life Sciences organizations can compile large amounts of data from multiple languages leveraging MT technology. Monitoring sentiment across all language groups allows Life Sciences organizations to track market-specific issues, sentiment and explain trends. It also helps develop marketing and communication strategies to handle dissatisfaction and avoid crises or to build further momentum to ride positive sentiment.

Applying MT to Epidemic Outbreak Predictions

ML and AI technologies are also applied to monitor and predict epidemic outbreaks around the world, based on satellite data, historical information on the web, real-time social media updates, and other sources. For example, malaria outbreak predictions take into account temperature, average monthly rainfall, the total number of positive cases, and other data points.

Increasingly, the aggregated data that makes this possible is multilingual and voluminous, and requires MT to enable more rapid responses. Indeed, such monitoring would be impossible without machine translation.
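To make the malaria example concrete, here is a deliberately simplified rule-based risk score over the data points named above (temperature, monthly rainfall, confirmed cases). The thresholds are illustrative assumptions; real outbreak-prediction systems use trained ML models over far richer, multilingual data.

```python
def malaria_risk(temp_c: float, rainfall_mm: float, positive_cases: int) -> str:
    """Toy risk score: one point per favorable transmission condition."""
    score = 0
    score += 1 if 20 <= temp_c <= 30 else 0    # mosquito-friendly temperatures
    score += 1 if rainfall_mm > 80 else 0      # standing water after heavy rain
    score += 1 if positive_cases > 100 else 0  # existing local transmission
    return {0: "low", 1: "low", 2: "moderate", 3: "high"}[score]

print(malaria_risk(26.0, 120.0, 250))  # → high
print(malaria_risk(10.0, 10.0, 0))     # → low
```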

MT Quality Advancements and Neural MT

MT quality has improved dramatically in recent years, driven by the recent wave of research advances in machine learning, increasing volumes of relevant data to train these systems, and improvements in computing power needed to do this.

This combination of resources and events is a key driver of the progress we see today. The increasing success of deep learning and neural nets, in particular, has created great excitement as successful use cases emerge in many industries, and it also benefits a whole class of Natural Language Processing (NLP) applications, including MT.

SDL pioneered data-driven machine translation and the commercial deployment of Statistical Machine Translation (SMT) in the early 2000s. The research team at SDL has published hundreds of peer-reviewed research papers and holds over 45 MT-related patents. While SMT was an improvement over previous rules-based MT systems, its early promise plateaued, and improvements in SMT were slow and small after the initial breakthroughs.

Neural MT changed this, providing a sudden and substantial boost to MT capabilities. Most in the industry consider NMT a revolution in machine learning rather than evolutionary progress in MT. These significant gains represent only the first wave of improvement, as NMT is still in its infancy.

At SDL, our first-generation NMT systems improved 27% on average over previous SMT systems, based on the automatic metrics used to measure improvement. In some languages, the improvement was as much as 100%. The second generation of our NMT systems shows an additional 25% improvement over the first generation. This is remarkable in a scientific endeavor that typically sees at most 5% improvement a year. It is reasonable to expect continued gains as research intensity in the NMT field continues and as we at SDL refine and hone our NMT strategy.
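Because the second-generation gain is measured relative to the first generation, the two percentages compound rather than add. Assuming both figures are relative improvements on the same automatic metric (an assumption of this arithmetic sketch), the cumulative gain over the SMT baseline works out as:

```python
smt_score = 1.00         # SMT baseline metric score, normalized to 1.0
gen1 = smt_score * 1.27  # first-generation NMT: +27% over SMT
gen2 = gen1 * 1.25       # second-generation NMT: +25% over gen1

cumulative = (gen2 / smt_score - 1) * 100
print(f"Cumulative gain over SMT: {cumulative:.2f}%")  # → 58.75%
```

That is, the two generations together represent roughly a 59% improvement over the SMT baseline, not 52%.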

Much of the enthusiasm for Neural MT is driven by the fluency and naturalness of its output: it produces a large number of sentences that read as if written by a human. Human evaluators often judge Neural MT output to be clearly better, even though established MT evaluation metrics such as the BLEU score may show only nominal or no improvement.
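For readers unfamiliar with BLEU, the following is a deliberately simplified sketch of its core idea: clipped n-gram precision combined into a geometric mean, with a brevity penalty for short output. Production implementations add smoothing, multiple references, and standardized tokenization, so this toy version is for intuition only.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: candidate n-grams are only credited up to
    the number of times they appear in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    return clipped / max(sum(cand_counts.values()), 1)

def simple_bleu(candidate, reference, max_n=2):
    """Toy BLEU: geometric mean of 1..max_n precisions times a brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    bp = 1.0 if len(candidate) > len(reference) \
        else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the drug was well tolerated by patients".split()
cand = "the drug was well tolerated by the patients".split()
print(round(simple_bleu(cand, ref), 3))  # → 0.791
```

A near-paraphrase like the one above scores well below 1.0 even though a human would call it an adequate translation — which is exactly why fluent NMT output can impress human evaluators while barely moving the BLEU needle.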

The Neural MT revolution has re-energized the MT industry with a big leap forward in output quality, and has astonished naysayers with its fluency and quality improvements in “tough” languages like Japanese, German and Russian.

A Breakthrough in Russian MT

An example of SDL’s MT competence came recently when the SDL research team announced a breakthrough with Russian MT. Its new Neural MT system outperformed all industry standards, setting a benchmark for Russian-to-English machine translation: professional Russian-English translators labeled 95% of the system’s output as equivalent to human translation quality.

Additionally, SDL’s broad experience in language translation services and enterprise globalization best practices enables it to provide effective MT solutions for many enterprise use cases, ranging from eDiscovery, localization productivity, and global customer service and support to broad communication and collaboration scenarios that make global enterprises more agile and more responsive in improving CX across the globe.

Availability of Enterprise MT Solutions

While the use of MT across public portals is huge, there are several reasons why these generic public systems are not suitable for the enterprise. These include a lack of control over critical terminology, a lack of data security, a lack of integration with enterprise IT infrastructure, and a lack of deployment flexibility. To make sense for an enterprise, MT needs the following core capabilities:
  • The ability to be tuned and optimized for enterprise content and subject domain.
  • The ability to provide assured data security and privacy.
  • The ability to integrate into the enterprise infrastructure that creates, ingests, processes, reviews, analyzes, and generates multilingual data.
  • The ability to deploy MT in a variety of required settings including on-premises, private cloud or a shared tenant cloud.
  • The availability of expert services to facilitate tailoring requirements and use case optimization.

Life Sciences Perspective

What is clear today is that the Life Sciences industry can gain business advantage and leverage from the expeditious and informed use of MT. It is worth reviewing this technology to understand its impact.

MT can transform unstructured data, such as free-text clinical notes or transcribed voice-of-the-customer calls, into structured data to provide insights that can improve the health and well-being of patient populations.
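As a minimal illustration of turning free text into structured data, the sketch below extracts a few fields from an (already machine-translated) clinical note. The field patterns and field names are illustrative assumptions for this sketch, not a real clinical NLP model or any particular product's output format.

```python
import re

# Hypothetical machine-translated clinical note (illustrative only).
NOTE = "Patient reports mild headache after 50 mg dose; no fever observed."

def structure(note: str) -> dict:
    """Extract a toy structured record from a free-text note."""
    text = note.lower()
    dose = re.search(r"(\d+)\s*mg", text)
    return {
        "dose_mg": int(dose.group(1)) if dose else None,
        "adverse_event": "headache" in text,
        # "no fever" negates the mention of fever
        "fever": "fever" in text and "no fever" not in text,
    }

print(structure(NOTE))  # → {'dose_mg': 50, 'adverse_event': True, 'fever': False}
```

Records in this shape can then feed the population-level screening and trend analysis described above, whereas the original free text could not.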

As self-service penetrates the Life Sciences industry, the growing volume of new data from around the world can:
  • Drive better health outcomes and advance the discovery and commercialization of new drugs
  • Improve large-scale population screening to identify trends and at-risk patients.

MT and text mining together will enable the enterprise to process multilingual Real World Evidence (RWE) and generate Real World Data (RWD) to inform all phases of pharmaceutical drug development, commercialization, and drug use in healthcare settings.

Regulatory bodies like the FDA could also utilize more holistic data during the product approval process: for example, multilingual internal data from international reports and multilingual external data from social media, both of which MT can make available for analysis. This could enable much faster drug approvals, as more data would be available to support and provide background on new drug approval requests.

As the Royal Society states:
“The benefits of machine learning [and MT] in the pharmaceutical sector are potentially significant, from day-to-day operational efficiencies to significant improvements in human health and welfare arising from improving drug discovery or personalising medicine.”