Wednesday, June 20, 2018

Why Russian to English is difficult for Machine Translation

When we consider the history of machine translation (MT), the science by which computers automatically translate from one human language to another, we see that much of it starts with Russian. One of the earliest mentions of automated translation involves the Russian Peter Troyanskii, who, even before computers were available, submitted a proposal that included both a bilingual dictionary and a method for handling grammatical roles between languages, based on the grammatical system of Esperanto.

The first set of proposals for computer-based machine translation was presented in 1949 by Warren Weaver, a researcher at the Rockefeller Foundation, in his now-famous “Translation” memorandum. In it, he said: “it is very tempting to say that a book written in Russian is simply a book written in English which was coded into the Russian code.” These proposals were based on information theory, successes in code-breaking during the Second World War, and theories about the universal principles underlying natural language. But Weaver’s memo was not the only driver for this emerging field. What really kick-started research was Cold War fear and US analysts’ desire to easily read and translate Russian technical papers. Warren Weaver also inspired the founders of Language Weaver to name their company after him in the early 2000s. Language Weaver was the first company to commercialize and productize Statistical Machine Translation (SMT), and it was the source of much of the subsequent innovation in SMT. Its alumni went on to start Google Translate and Moses and to influence Amazon’s MT/AI initiatives, and the company and its intellectual property are now owned by SDL Plc.

The original Georgetown experiment, which in 1954 involved the successful fully automatic translation of more than sixty Russian sentences into English, was one of the earliest recorded MT projects. Its researchers asserted their belief that machine translation would be a solved problem within three to five years. The claim that MT can be solved in five years has been a frequent refrain of the MT community ever since, and more than sixty years later we see that MT remains a challenging problem. Recent advances with Neural MT (NMT) are welcome and indeed significant, but MT remains one of the most challenging research areas in AI.

Why is MT such a difficult NLP problem?

As 70 years of ongoing MT research efforts show, machine translation is indeed one of the most difficult problems to solve in the Natural Language Processing (NLP) field. It is worth considering why this is so, as it explains why it has taken 70 years to get here, and why it may take much more time still to get to “always perfect” MT, even in these heady days of NMT breakthroughs.

It is perhaps useful to contrast MT with the automated speech recognition (ASR) challenge to illustrate the difficulty. Take a simple sentence like: “Today, we are pleased to announce a significant breakthrough with our ongoing MT research, especially as it pertains to Russian to English translations.” In the case of ASR, there is really only one correct answer: the computer either identified the correct word or it did not, and even when it misidentifies a word, one can often understand the meaning from the context and the other correctly predicted words.

Computers perform well on problems with binary outcomes, where things are either right or wrong, and they tend to solve these kinds of problems much more effectively than problems where the “answers” are much less clear. If we consider the same sentence as a translation task, it is a very different computing challenge. Language is complex and varied, and the exact same thing can be said and translated in many different ways, all of which can be considered correct. Add the possibilities of slightly wrong or grossly wrong translations, and you can see there is a very large range of permutational possibilities. The sentence in question has many possible correct translations, and herein lies the problem. Computers do not really have a way to assess these variations other than through probability calculations and measuring statistical data density, which is almost always completely defined by the data you train on. If you train on a data set that does not contain every possible translation, then you will have missed some possibilities. The truth is that we NEVER train an engine on every possible acceptable translation.
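To make the “many correct answers” point concrete, here is a minimal sketch in Python of the clipped n-gram precision that BLEU-style metrics compute. The two reference translations, loosely based on the example sentence above, are made up for illustration: a perfectly valid paraphrase scores poorly when only one reference is available, and perfectly once its own wording is among the references.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all n-grams (as tuples) in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, references, n=1):
    """Share of candidate n-grams found in the references, where each n-gram
    is credited at most as often as it occurs in any single reference."""
    cand_counts = Counter(ngrams(candidate.split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.split(), n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    matched = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total

# Two equally acceptable (hypothetical) translations of the same source sentence.
ref_a = "today we are pleased to announce a breakthrough"
ref_b = "we are happy to report a breakthrough today"
candidate = ref_b  # the MT output happens to match the second wording exactly

print(clipped_precision(candidate, [ref_a]))          # unigrams vs. one reference
print(clipped_precision(candidate, [ref_a], n=2))     # bigrams vs. one reference
print(clipped_precision(candidate, [ref_a, ref_b]))   # unigrams vs. both references
```

The same meaning, two correct renderings, and very different scores: against the first reference alone, only 6 of 8 words and 2 of 7 word pairs match, while adding the second reference raises both to a perfect match.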

Michael Housman is chief data science officer at RapportBoost.AI and a faculty member of Singularity University. He explained that the ideal scenario for machine learning and artificial intelligence is something with fixed rules and a clear-cut measure of success or failure. He named chess as an obvious example and noted that machines were able to beat the best human Go player. This happened faster than anyone anticipated because of the game’s very clear rules and limited or definable set of moves.

Housman elaborated, “Language is almost the opposite of that. There aren’t as clearly-cut and defined rules. The conversation can go in an infinite number of different directions. And then, of course, you need labeled data. You need to tell the machine to do it right or wrong.”

Housman noted that it’s inherently difficult to assign these informative labels. “Two translators won’t even agree on whether it was translated properly or not,” he said. “Language is kind of the wild west, in terms of data.”

Erik Cambria, an academic AI researcher and assistant professor at Nanyang Technological University in Singapore, said, “The biggest issue with machine translation today is that we tend to go from the syntactic form of a sentence in the input language to the syntactic form of that sentence in the target language. That’s not what we humans do. We first decode the meaning of the sentence in the input language and then we encode that meaning into the target language.”

All these hindering factors will remain in effect for the foreseeable future, so we should not expect another big leap forward until we find huge masses of new, high-quality data, or develop a new breakthrough in pattern detection methodology.

Why are some language combinations more difficult in MT?

In essence (grossly oversimplified), MT is a pattern detection and pattern matching technique where a computer is shown large volumes of clean equivalent sentences in two languages and it “learns” how to “translate” from analyzing these examples. NMT does this differently than SMT, but essentially they are both detecting patterns in the data they are shown, with NMT having a much deeper sense of what a pattern might be. This is why the quality and volume of the “training data” matters, as it defines the patterns that can be learned.
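As a toy illustration of “learning” correspondences from equivalent sentences, the sketch below scores English/Spanish word pairs by how consistently they co-occur across a tiny, made-up parallel corpus, using the Dice coefficient. This is a deliberate simplification and an assumption on our part: real SMT used far richer alignment models, and NMT learns continuous representations rather than word tables. But the underlying idea of extracting patterns from the training data is the same, and so is the dependence on what that data contains.

```python
from collections import Counter
from itertools import product

# A hypothetical four-sentence English-Spanish "corpus" of equivalent sentences.
PAIRS = [
    ("the house", "la casa"),
    ("the green house", "la casa verde"),
    ("the green book", "el libro verde"),
    ("a house", "una casa"),
]

def dice_table(pairs):
    """Dice score 2*C(s,t) / (C(s) + C(t)) for every co-occurring word pair,
    counting each word at most once per sentence pair."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint.update(product(s_words, t_words))
    return {(s, t): 2 * c / (src_count[s] + tgt_count[t]) for (s, t), c in joint.items()}

def best_match(word, table):
    """Target word with the highest Dice score for a given source word."""
    candidates = {t: score for (s, t), score in table.items() if s == word}
    return max(candidates, key=candidates.get)

table = dice_table(PAIRS)
for word in ("house", "green"):
    print(word, "->", best_match(word, table))
```

Note that “book” cannot be resolved from this data: it occurs once, alongside both “el” and “libro”, so the two tie. Only more (and more varied) data breaks the tie, which is exactly why the quality and volume of training data matter.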

What we have seen over the last 70 years is that languages that are more similar tend to be easier to model (MT is a translation model). Thus, it is much easier to build an MT system for Spanish <> Portuguese, because both languages are very similar and have many equivalent linguistic structures. In contrast, English <> Japanese is much more challenging because there are big differences in linguistic characteristics, orthography (Japanese is written with a mix of three scripts), morphology, grammar, word order, honorific structure and so on. And while English <> Japanese is difficult, it is much easier to build a model for Japanese <> Korean, since those two languages have much more structural and linguistic similarity and equivalency.

Thus, the basic cause of difficulty is the fundamental linguistic differences between the two languages. Linguistic concepts that exist in one language do NOT exist in the other, so equivalencies are very difficult to formulate and model. A recent research paper describes language modeling difficulties using the Europarl corpus, whose existence has made many comparative research experiments possible. A key finding of this study was that inflectional morphology is a big factor in difficulty. In the chart below, we see that German (DE), Hungarian (HU), and Finnish (FI) are more difficult for this reason, while Swedish (SV) and Danish (DA) are easier, also because they are more similar.

An earlier study based on the Europarl data has been a key reference for which language combinations are easy or difficult to model, since roughly equivalent datasets were used to build the comparative models.

What this chart shows is that the easiest translation direction is Spanish to French (BLEU score of 40.2), and the hardest is Dutch to Finnish (10.3).
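For readers unfamiliar with the scores quoted here, BLEU combines clipped n-gram precisions p_n (typically n = 1 to 4, weighted equally) with a brevity penalty BP that punishes output shorter than the reference; c is the length of the MT output and r the effective reference length. Scores are usually reported multiplied by 100, so 40.2 corresponds to 0.402. The standard definition is:

```latex
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left( \sum_{n=1}^{N} w_n \log p_n \right),
\qquad
\mathrm{BP} =
\begin{cases}
  1 & \text{if } c > r, \\
  e^{\,1 - r/c} & \text{if } c \le r,
\end{cases}
\qquad N = 4,\; w_n = \tfrac{1}{4}.
```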

This shows that having much more Finnish data does not help raise output quality because the linguistic differences are much more significant. The Romance languages outperform many other language combinations with significantly less data.

Why Russian is especially difficult

Russian has always been considered one of the most difficult languages in MT, mostly because it is linguistically very different from English. Early NMT attempts were unable to outperform old rule-based MT (RBMT) models, and SMT models were, in general, rarely able to consistently beat the best RBMT models.

Russian differs from English significantly in inflection, morphology, word order and gender associations with nouns.


Unlike English, Russian is a highly inflected language. Suffixes on nouns mark 6 distinct cases, which determine the role of the noun in the sentence (whether it's the subject, the direct object, the indirect object, something being possessed, something used as an instrument, or the object of a preposition). For example, all of these are different forms of the word "book":

Case            Singular                            Plural
nominative      книга (kniga)                       книги (knigi)
genitive        книги (knigi)                       книг (knig)
dative          книге (knige)                       книгам (knigam)
accusative      книгу (knigu)                       книги (knigi)
instrumental    книгой, книгою (knigoj, knigoju)    книгами (knigami)
prepositional   книге (knige)                       книгах (knigax)

(from Wiktionary)

That's 12 forms of the same word, used depending on the role the word plays in the sentence. But they're not all distinct; the same form can fill different roles, like the singular genitive and the plural nominative.
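The ambiguity can be made explicit with a small sketch that rebuilds the table from the stem книг- plus case endings and then groups the slots by surface form. The ending list is hand-written for this one noun (it keeps only книгой for the instrumental singular and omits the variant книгою); a real system would rely on a morphological analyzer or learn these patterns from data.

```python
from collections import defaultdict

# Hand-written endings for книга (kniga, "book"), following the table above.
# Only книгой is used for the instrumental singular; the variant книгою is omitted.
STEM = "книг"
ENDINGS = {
    ("nominative", "singular"): "а",
    ("genitive", "singular"): "и",
    ("dative", "singular"): "е",
    ("accusative", "singular"): "у",
    ("instrumental", "singular"): "ой",
    ("prepositional", "singular"): "е",
    ("nominative", "plural"): "и",
    ("genitive", "plural"): "",  # zero ending: книг
    ("dative", "plural"): "ам",
    ("accusative", "plural"): "и",
    ("instrumental", "plural"): "ами",
    ("prepositional", "plural"): "ах",
}

def paradigm(stem, endings):
    """Build every (case, number) form by attaching the ending to the stem."""
    return {slot: stem + ending for slot, ending in endings.items()}

def ambiguous_forms(forms):
    """Map each surface form that fills more than one slot to all of its slots."""
    readings = defaultdict(list)
    for slot, form in forms.items():
        readings[form].append(slot)
    return {form: slots for form, slots in readings.items() if len(slots) > 1}

forms = paradigm(STEM, ENDINGS)
print(len(forms), "slots,", len(set(forms.values())), "distinct forms")
for form, slots in ambiguous_forms(forms).items():
    print(form, "could be:", slots)
```

Running it reports 12 slots but only 9 distinct strings: книги alone could be genitive singular, nominative plural or accusative plural, and книге could be dative or prepositional singular, so an MT system must resolve the role from context.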

Additionally, as in Spanish or French, every noun has a gender. The word for "book" is feminine, but this is an arbitrary categorization; there's no reason why a book (книга kníga) is feminine and a table (стол stól) is masculine. But it matters, because the case suffixes are different for each gender (masculine, feminine, or neuter). So while there are 12 different forms of the word "book" and 12 different forms of the word "table", they don't share the same set of suffixes. When adjectives modify nouns, they need to agree with the noun, taking the same (or similar) suffix.
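A similar hedged sketch for gender: hand-written singular and plural endings for книга and стол, plus the singular forms of the adjective новый ("new"), which has to copy the noun's gender and case. The tables cover only these two inanimate nouns and this one adjective, and are illustrative rather than a general rule set.

```python
CASES = ["nominative", "genitive", "dative", "accusative", "instrumental", "prepositional"]

# Hand-written endings (in CASES order) for two inanimate nouns of different gender.
NOUNS = {
    "книга": {"gender": "feminine", "stem": "книг",
              "sg": ["а", "и", "е", "у", "ой", "е"],
              "pl": ["и", "", "ам", "и", "ами", "ах"]},
    "стол": {"gender": "masculine", "stem": "стол",
             "sg": ["", "а", "у", "", "ом", "е"],
             "pl": ["ы", "ов", "ам", "ы", "ами", "ах"]},
}

# Singular forms of the adjective новый ("new"), in CASES order, by gender.
NOVYJ_SG = {
    "masculine": ["новый", "нового", "новому", "новый", "новым", "новом"],
    "feminine": ["новая", "новой", "новой", "новую", "новой", "новой"],
}

def noun_form(lemma, case, number="sg"):
    """Attach the ending for this case and number to the noun's stem."""
    entry = NOUNS[lemma]
    return entry["stem"] + entry[number][CASES.index(case)]

def new_noun_phrase(lemma, case):
    """Singular 'new <noun>': the adjective copies the noun's gender and case."""
    gender = NOUNS[lemma]["gender"]
    return f"{NOVYJ_SG[gender][CASES.index(case)]} {noun_form(lemma, case)}"

# Slots where the two nouns happen to share an ending.
shared = [
    (case, number)
    for number in ("sg", "pl")
    for i, case in enumerate(CASES)
    if NOUNS["книга"][number][i] == NOUNS["стол"][number][i]
]
print("shared endings:", shared)
for case in ("nominative", "accusative", "instrumental"):
    print(f"{case:13} {new_noun_phrase('книга', case):15} {new_noun_phrase('стол', case)}")
```

Only 4 of the 12 slots share an ending, and English "the new book" / "the new table" never change, while the Russian adjective and noun both shift with gender and case.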

Also, as in Spanish or French, verbs conjugate depending on tense (past vs. non-past), person (I vs. you vs. he/she/it), number (singular vs. plural), etc. So one verb may have several different forms as well.
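To illustrate with the verb behind пошёл (poshol, "went") in the examples below, here are the forms of the perfective verb пойти (pojti, "to go"). One nuance worth adding: its non-past forms express the future ("I will go") and vary by person and number, while Russian past-tense forms vary by gender and number rather than person. The table is hand-written for this one verb and the function name is hypothetical.

```python
# Hand-written forms of пойти ("to go"), the verb behind пошёл.
# Non-past (future in meaning for this perfective verb): person x number.
NON_PAST = {
    (1, "sg"): "пойду", (2, "sg"): "пойдёшь", (3, "sg"): "пойдёт",
    (1, "pl"): "пойдём", (2, "pl"): "пойдёте", (3, "pl"): "пойдут",
}
# Past: gender in the singular, a single form in the plural; no person marking.
PAST = {"masculine": "пошёл", "feminine": "пошла", "neuter": "пошло"}
PAST_PLURAL = "пошли"

def pojti(tense, person=1, number="sg", gender="masculine"):
    """Return the form of пойти for the requested grammatical features."""
    if tense == "past":
        return PAST_PLURAL if number == "pl" else PAST[gender]
    return NON_PAST[(person, number)]

all_forms = set(NON_PAST.values()) | set(PAST.values()) | {PAST_PLURAL}
print(len(all_forms), "distinct forms")
# A man and a woman each saying "I went":
print("Я", pojti("past", gender="masculine"), "/ Я", pojti("past", gender="feminine"))
```

English "I went" has one form regardless of speaker; the Russian translation depends on the speaker's gender, information the English source may not even contain.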

Word order

In English, we use word order to accomplish what the suffixes on nouns accomplish in Russian. Because Russian has these case markings, its word order is much freer. For example, these are all acceptable ways of saying "I went to the shop":

Я пошёл в магазин. (ya poshol v magazin)
Я в магазин пошёл. (ya v magazin poshol)
Пошёл я в магазин. (poshol ya v magazin)
Пошёл в магазин я. (poshol v magazin ya)
В магазин я пошёл. (v magazin ya poshol)
В магазин пошёл я. (v magazin poshol ya)

я ya = I
пошёл poshol = went
в v = to
магазин magazin = shop

Essentially, all orderings are possible, except that the preposition "to" (в v) must precede the word for "shop" (магазин magazin). You can imagine that as sentences get longer, the number of possible word orders increases. There are some limits on this: some orders in this example are dispreferred and sound strange or archaic, and others are only used to emphasize where you're going or who is going. But there are certainly more ways of saying the same thing than in English, which is stricter in its word order.
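The six sentences above are exactly the permutations of three movable units, with "в магазин" kept together. A minimal sketch that regenerates them in the same order, and shows how the upper bound grows factorially with the number of movable units (before discounting the dispreferred orders just mentioned):

```python
from itertools import permutations
from math import factorial

# The three movable units of "I went to the shop"; the preposition в stays
# attached to магазин, so "в магазин" is treated as one unit.
UNITS = ["я", "пошёл", "в магазин"]

def orderings(units):
    """Every order of the units, written as a sentence with an initial capital."""
    sentences = []
    for order in permutations(units):
        text = " ".join(order)
        sentences.append(text[0].upper() + text[1:] + ".")
    return sentences

for sentence in orderings(UNITS):
    print(sentence)

# Upper bound on word orders for n freely movable units: n!
for n in range(3, 8):
    print(n, "units:", factorial(n), "orders")
```

Seven freely movable units already allow up to 5,040 orders, which is why an MT system cannot rely on having seen each order in training.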

Difficult languages, in general, are more demanding of the skill required of the MT system developer. They are not advised for the Moses and OpenNMT hacker who wants to see how his data might perform with open source magic, and generally, most of these naive practitioners will stay away from these languages. 

There are special challenges for an MT system developer who builds Russian <> English MT systems, for example:
  • MT needs to pay more attention to Russian word inflections than to the order of the words, to know where to put each word in the English translation
  • MT needs to be flexible enough to translate a familiar Russian source sentence that appears in an unfamiliar word order
Thus, Russian to English is amongst the most difficult MT combinations one could attempt and only the most competent and skilled MT system developers would be able to build systems that produce output quality that is judged by professional human translators as “human equivalent”.
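The first of those challenges can be illustrated with a deliberately tiny, hypothetical rule set: nouns of the мама/папа type end in -а when they are the subject (nominative) and in -у when they are the direct object (accusative), so the roles can be read off the endings wherever the words appear, and then placed in fixed English subject-verb-object order. Everything here (the lexicon, the two endings, the missing English articles) is an illustrative assumption; production systems learn such patterns from data rather than from hand rules.

```python
# Hypothetical toy lexicon: noun stems of the мама/папа type and two verbs.
NOUN_STEMS = {"мам": "mom", "пап": "dad"}
VERBS = {"любит": "loves", "видит": "sees"}
# For these nouns, the ending alone tells us the grammatical role.
ROLE_BY_ENDING = {"а": "subject", "у": "object"}

def to_english(sentence):
    """Assign roles from case endings, then emit English subject-verb-object order."""
    roles, verb = {}, None
    for word in sentence.lower().rstrip(".").split():
        if word in VERBS:
            verb = VERBS[word]
        else:
            stem, ending = word[:-1], word[-1]
            roles[ROLE_BY_ENDING[ending]] = NOUN_STEMS[stem]
    return f"{roles['subject']} {verb} {roles['object']}"

for russian in ("Мама любит папу.", "Папу любит мама.", "Папа видит маму."):
    print(russian, "->", to_english(russian))
```

The first two sentences differ only in word order and receive the same English translation, while the third swaps the endings and therefore the roles: position alone would have got at least one of them wrong.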

Friday, June 8, 2018

Why MT Matters and its Role in Digital Transformation

This is a modified and updated post that was originally published in CMS Wire on June 1st.

We live in an era where more information is available to a digitally savvy human than has ever been possible in the history of mankind. The implications of this volume growth are so substantial that it is worth considering some contextual facts to understand them properly.

The Encyclopedia Britannica announced in 2012 that after 244 years, dozens of editions, and more than 7M sets sold, no new editions would be printed. The 32 volumes of the 2010 installment, it turns out, were the last printed edition of this great publication. The primary cause was the increasing use and relevance of Wikipedia (and other digital alternatives), which would fill 2,670+ Encyclopedia Britannica-sized volumes if it were actually printed. While some may argue that Wikipedia is less reliable, the contrast in sheer volume is astonishing: there is clearly far more information.

This explosion of content is even more astounding if we survey the larger information landscape, as described in the graphic below.

This explosion, however, presents special challenges to the modern enterprise, which now needs to assimilate and digest it and determine what is relevant and what is not. In this age of digital disruption, those who fail to do this become increasingly irrelevant, as we have seen in the retail industry in particular. Once-iconic brands, e.g., Sears, Toys R Us, and many more, are falling by the wayside and will quietly disappear. Studies by experts suggest that many more companies, across many industries, will disappear because they fail to understand the changes in values and priorities inherent in this content explosion and the related digital disruption it causes.

The behavior of the modern customer has changed and is now much more affected by freely flowing content. In fact, in many B2C and even B2B scenarios, the modern customer may conduct the whole customer journey without ever talking to a salesperson. Studies show that as much as 67% of the buyer’s journey is conducted digitally (though some put a different spin on that statistic), and customer behavior is driven by the content customers discover. It can be said that in the modern era, companies that provide relevant, high-quality content succeed, and those that don’t become irrelevant. The following graphic shows the many stages at which relevant customer content is required to persuade and develop a deeper engagement with a potential customer, and then to maintain an ongoing relationship with an enterprise or a brand after a customer relationship has been established.

Now consider doing this across the many languages needed to engage and connect with the global customer, and we see the crying need not just for translation of mandated packaging materials, but for translation capabilities that can support the constantly updated content that may influence a buyer in the evaluation and decision-making process, and that can also provide adequate information support in the post-purchase phase of the relationship.

While corporate marketing historically had a great degree of control, today most consumers distrust this kind of product messaging, or at least prefer additional sources, and many would rather trust the shared customer experience of fellow consumers. The value of business content increasingly has a very short shelf life, and thus traditional (slow and expensive) translation approaches are increasingly questioned for information that may have little or no value after six months. In fact, the fastest-growing type of content is user-generated content (UGC), found in blogs, Facebook, YouTube, Twitter and community forums. IDC estimates that 70% of the content on the web is UGC, and much of it is very pertinent and useful to enterprises seeking to understand trends and customers better. This content now influences consumer behavior all over the world and is often referred to as word-of-mouth marketing (WOMM). Consumer reviews are often more trusted than “corporate marketing-speak” and even “expert” reviews, which are often funded by the same corporations. We have all used Amazon, travel sites, CNET and other user rating sites. Making this content multilingual is useful for both global consumers and global enterprises.

It is estimated that as many as 600 billion words a day are translated by computers today across the various MT (machine translation) portals. This dwarfs what the localization and professional business translation industry does by a factor of more than 99! Recent reports suggest that a new MT developer, Alibaba, alone translates as many as 200 billion words a day on its various eCommerce platforms. This brings the total MT word tally up to almost 800 billion words a day. Clearly, global customers need specific information that may not be available in their native tongues, and they will use MT to get at least the gist of what they need to understand. While many continue to moan about the imperfection of MT quality as judged by human linguistic quality assessment, we have already reached a point in human history where the substantial bulk of the language translation consumed on the planet today is done by computers.
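Spelled out, the figures quoted above imply the following (reading “more than 99X” as a lower bound on the ratio):

```latex
\text{professional translation} < \frac{600 \times 10^{9}}{99} \approx 6.1 \times 10^{9}\ \text{words/day},
\qquad
600 \times 10^{9} + 200 \times 10^{9} = 800 \times 10^{9}\ \text{words/day}.
```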

Global customers who research products and services can and will draw on many disparate sources of information, not controlled by the enterprise, to decide whether or not to buy a product. MT allows them to access non-native-language content and instantly obtain a translation that is “good enough” to support a personal evaluation process.

In our modern times, we’re experiencing a state of unprecedented connectivity thanks to technology. However, we’re still living under the shadow of the Tower of Babel in terms of global human communication ease. Language remains a barrier to business and marketing. Even though technological devices can quickly and easily connect, humans from different parts of the world often can’t. And traditional translation service offerings simply cannot scale to the real translation needs of the modern enterprise without leveraging technology in a substantial and competent way.

There are many kinds of business translation applications where MT just makes sense, and it would be foolish to even attempt these kinds of projects without competent MT technology as a foundation. Usually, this is because these applications have some combination of the following factors:
  • Very large volume of source content that simply could NOT be translated without MT in any useful time frame
  • Rapid turnaround requirement (days, hours or minutes) for the content to have any value to the content consumers
  • A user tolerance for lower quality translations, at least in early stages of information review
  • The need for information and document triage when dealing with large document collections, to identify the highest-priority content within a large mass of undifferentiated content. This process also helps to identify the most important and relevant documents to send to higher-quality human translation processes.
  • Prohibitive translation costs (usually related to volume)
One can find this combination of requirements in several customer-communication functions, like technical support knowledge bases, eCommerce product listings, customer service/support, and customer experience reviews for all kinds of products and services. Moreover, in an increasingly digital world, the need to process large volumes of business content will only grow, and the ability to identify what is most relevant and valuable for ongoing international business needs is becoming a critical, success-enabling technology requirement.
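The triage idea in the list above can be sketched as a simple routing step: rank MT “gist” translations by relevance and send only the top few to human translation. The Document type, the keyword-count relevance score and the human_budget parameter are all hypothetical stand-ins; in practice, relevance might come from a classifier or a search engine.

```python
from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    mt_gist: str  # raw machine translation, good enough for triage

def relevance(doc, keywords):
    """Toy relevance score: how many times the keywords occur in the MT gist."""
    words = doc.mt_gist.lower().split()
    return sum(words.count(keyword) for keyword in keywords)

def triage(docs, keywords, human_budget):
    """Split documents into (send to human translation, keep MT only)."""
    ranked = sorted(docs, key=lambda d: relevance(d, keywords), reverse=True)
    return ranked[:human_budget], ranked[human_budget:]

docs = [
    Document("d1", "battery overheating after the latest update"),
    Document("d2", "great price and fast delivery"),
    Document("d3", "screen is fine but overheating at night"),
]
to_human, mt_only = triage(docs, ["battery", "overheating"], human_budget=1)
print("human translation:", [d.doc_id for d in to_human])
print("MT only:", [d.doc_id for d in mt_only])
```

MT does the cheap first pass over everything; the expensive human step is spent only where the first pass says it matters.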
Competently deployed machine translation technology that is properly integrated with relevant content flows to enhance customer experience solves high-value business problems and furthers any and all global business initiatives.
So if we are to summarize these key trends we see the following:
  • A content explosion that makes huge amounts of information available to global customers to help them understand products and services on a scale that has never been seen before. Much of this content is out of the control of the modern enterprise, yet it can deeply influence the behavior of the enterprise’s potential and existing customers. Understanding what is most relevant and important is also becoming an increasingly valuable skill.
  • An era where content is increasingly your best salesperson and customers everywhere will use digital content to make purchase decisions. For an enterprise to be relevant in the modern era they will need to be present at every stage of the buyer and customer journey with relevant and high-value content. Those that provide the best digital experience (DX) with relevant content will thrive and prosper, and those that do not will struggle and fail.
  • The modern global customer expects to get as much content and information in his language as his counterparts across the world. MT technology use will continue to accelerate to support these needs to make relevant content visible in a timely and efficient manner. MT technology will continue to improve as the smartest researchers in the world continue to focus on it.

"Mass machine translation is not a translation of a work, per se, but it is rather, a liberation of the constraints of language in the discovery of knowledge."  
                                                                                      Peter Brantley

Thus, in this era of digital disruption, content-driven customer engagement, and B2C relationship building, where global customers want access to the same content that their English speaking counterparts have, what is the leadership at a modern enterprise to do?

A review of today’s needs and the skills that will really matter in the future increasingly points to three things:
  1. An understanding of what is relevant content within the deluge that every enterprise today faces. The need is to increase relevant communication not just make any random content more available.
  2. An alignment of the content development and management strategies, with efficient and optimized translation processes that enable the enterprise to quickly reach a global audience. Content creation needs to be aligned with content transformation (translation) and delivery strategies.
  3. An understanding of the new AI-based emerging technologies that will help the enterprise to rapidly evolve to producing relevant content and establish a global digital presence so that the relevant content is delivered to the right customers at the right time.
 In a respected paper published in the Harvard Business Review, the authors point out three building blocks that can help drive a modern enterprise into building a market leadership position in this age of digital disruption. The building blocks are summarized in the graphic below.

Those who look carefully can see that MT has now become a critical and strategic technology to assist in this digital transformation. MT enables the communication that underlies product innovation by global teams, helps an enterprise understand and communicate more effectively with customers across the world, and enables efficient delivery of highly relevant multilingual content across the world to build market leadership.

Tuesday, May 15, 2018

The Weakness In Data

Another guest post by Luigi, who covers a variety of subjects here: AI, Big Data, NMT hype, and more. Luigi regularly attempts to clarify the conflation that seems rampant in the translation industry, and he makes my life easier by producing what I perceive as interesting content that keeps this blog relevant and current.

 AI is a much-misunderstood term and thus I think it is worth a closer look to further reduce the conflation that surrounds it.  The graphic below from a presentation I made on "Linguistic AI" on behalf of SDL, describes what I think a real AI should do. However, the reality is still quite far from the broad promise made by the use of the word intelligence, and most of what we see today is narrowly focused ML deployments that indeed do seem to perform some kind of cognitive function around carefully selected data.

There is also a lot of confusion about what machine learning (ML) is and how it relates to AI, so I think the graphic below is also useful for keeping the ongoing discussions clear, especially since we hear some people talking about "deep NMT" versus basic NMT. Seriously, how deep are we talking? To the best of my knowledge, most NMT today is based on deep learning, as shown below.

Luigi also touches upon the hype around NMT, specifically the Microsoft claim of reaching human parity with its Chinese NMT engine. While not untrue under a very narrow and specific definition of parity, the claim overstates the actual achievement in the broader sense that we regular humans might understand. However, seeing this overstatement requires actual intelligence; artificial intelligence is not enough.

It is hyperbole that you can quickly disprove by taking any random Chinese news web page and running it through the engine. You will indeed be disappointed by the complete lack of alleged human parity in this exercise, and will probably begin to ask pesky questions about which humans we are talking about. It is also similar to equating a card trick with a miracle. Anyway, this kind of claim is a common marker in the MT world, which is often filled with empty promises. To be fair, it is a much less deceptive and blatant overstatement than the Google announcements a year or so ago.

It has been my observation that most if not all the do-it-yourself experimentation with SMT produced sub-optimal results. To be explicit, this means that you would have been better off using a public MT portal or working with an expert. NMT has 10+ open source toolkits, so my question(s) to the DIYers is: Which one are you going to use? Why? How do you know the others are not better? The cost and complexity to engage with NMT go way beyond loading low-quality data into an open source or any toolkit. The rate of change in the science and algorithmic evolution is unprecedented. It is my opinion that NMT is not a game for the underfunded and the naive, but I am sure many in the translation industry will expend time and resources to find this out.  

The notion of data in this era of ML and neural nets is interesting, and I recommend that you go down the thread and often silly comment trail that was triggered by this tweet from a partner at the VC firm Andreessen Horowitz, who, it seemed, wanted to make the point that ML apps need very different and specific data to produce useful outcomes, not just generic "data":

 Some of my favorite responses include the examples below, which sound surprisingly like some discussions on translation quality that I have witnessed.

I heard they both go through pipelines
@BatMongoose : Maybe data is more like sand - annoyingly ubiquitous but useless until you figure out how to turn it into something (silicon wafers) 
@EVplusEV : I prefer: Data is the new bacon
: Big Data is the new snake oil.
@DanielMiessler : Data would be more like the dinosaurs, plants, and sunshine. The oil would be the insights and predictions. 

@asemota: Data is the new "Oxygen" 🤣🤣


In recent years, the blogosphere has lost much of its original appeal, mainly because its connected community has largely moved to social media, which today conveys most content. Indeed, social media help much content emerge that would otherwise remain buried. Social media, as we all know, also convey content that would better be ignored, but even crap has its raison d’être: that’s content marketing, baby, content marketing, and there’s nothing you can do about it, nothing.

Content, skills, and knowledge

Indeed, this content is an invitation to run some basic psychometrics on the small groups of people one follows on social media. Don’t be fooled by the Facebook/Cambridge Analytica scandal; it’s not rocket science: even likes can tell you a lot and help you understand what your contacts are paying attention to and why, especially if they are not just virtual acquaintances.

The social media activity of your contacts can even provide many more confirmations than expected. The fundamentals of content marketing say that the content produced should be of absolute value, but this is hardly ever true, because marketing is supposed to exert its effects anyway and one does not always have something definitive to say.

What would you think, for example, of an acquaintance of yours recommending a post by someone who admits s/he is an absolute beginner with machine translation, has no technical knowledge of it, and yet thinks s/he can provide his/her customers with solid advice anyway? And what would you think of the same acquaintance of yours who defines him/herself as an industry professional while admitting his/her revulsion for MT and declaring his/her cast-iron belief that any professional is capable of sparing his/her customers a “poor figure”? Well, these people really tell a lot about themselves with a post and a like.


The power of data


Seth Stephens-Davidowitz’s Everybody Lies is a terrific book for how simply it shows the power of data. Just like Stephens-Davidowitz in his book, Google’s Mackenzie Nicholson wrong-footed many attendees at the recent Smartling Global Ready EMEA by asking a few classic questions with a seemingly obvious and yet invariably incorrect answer. For example, when it comes to clichés, no one would have bet that Italians pay far more attention to price than Germans, Scots, or Israelis, as Google’s data unequivocally shows.

It came as no surprise, then, that analytics generally indicate that in-house reviews mostly prove overly expensive and largely pointless, as Kevin Cohn later showed at the same event. Simply put, despite great expectations, almost no actual improvement is recorded. Indeed, most edits are usually irrelevant and simply a matter of personal taste. Incidentally, Kevin Cohn is a data scientist who only speaks English and admittedly knows almost nothing about translation. Anyway, as the wise man says, data ipsa loquuntur.
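We are not told exactly which measure Kevin Cohn’s analytics used, but one common way to quantify how much a reviewer changes a translation is a word-level edit distance normalized by length, a simplified, shift-free cousin of metrics such as TER. A minimal sketch, with made-up sentences and hypothetical function names:

```python
def word_edit_distance(a, b):
    """Minimum number of word insertions, deletions and substitutions turning a into b."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))  # distance from an empty prefix of x
    for i, xw in enumerate(x, start=1):
        curr = [i]
        for j, yw in enumerate(y, start=1):
            curr.append(min(prev[j] + 1,               # delete xw
                            curr[j - 1] + 1,           # insert yw
                            prev[j - 1] + (xw != yw))) # substitute (free if equal)
        prev = curr
    return prev[-1]

def edit_rate(before, after):
    """Edits per word of the reviewed text."""
    return word_edit_distance(before, after) / max(len(after.split()), 1)

before = "the order will be shipped within two days"  # translation as delivered
after = "the order will ship within two days"        # same text after review
print(word_edit_distance(before, after), "edits")
print(round(edit_rate(before, after), 3), "edit rate")
```

Here the reviewer deleted one word and replaced another: two edits that change the style but not the meaning. A rate like this shows how much was changed, not whether the change was needed, which is exactly the question the analytics above are trying to answer.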


Hypes you (don’t) expect


Of the many expectations that have been generating hype over the last few years, the ones about data are not inflated, and people are, maybe slowly but steadily, getting accustomed to reckoning with data-driven predictions. As algorithms grow in number and potential, confidence in their applications will also grow.

In fact, hype is aimed at people outside a given vertical, so Microsoft’s recent claim that its NMT had achieved human parity, for example, was not meant for the translation industry.

So why all the fuss?

As a matter of fact, the gap between human and machine translation is getting narrower and narrower, at least judging by quality scores and statistical incidence. Also, the concept of parity may be quite hard for a layperson to grasp. If anything, this makes the sorry state of posts like the one mentioned above even more evident. Indeed, the general media are unlikely to report news like the Microsoft case correctly: however complete and clear the article might have been, even its title was misleading, and the title is usually the only catchphrase the media pick up.

In Microsoft’s much-criticized, yet (let us not forget) scientific article, parity is defined mostly as a functional feature, i.e. as a measure of the ability to communicate across language barriers. The output is compared with professional human translations, while clearly keeping in mind that “computers achieving human quality level is generally considered unattainable and triggers negative reactions from the research community and end users alike” and that “this is understandable, as previous similar announcements have turned out to be overly optimistic.”

It is made equally clear that the quality of NMT output in the case examined exceeds that of crowd-sourced non-professional translations, which should come as no surprise to those translation pundits who have actually read the article.

On the other hand, a recent study from the University of Maryland found that “users reacted more strongly to fluency errors than adequacy errors.” Since the main criterion in recruiting participants was their English language ability, the study indirectly confirms that judging “adequacy” requires domain-specific knowledge, the very kind of knowledge that could keep hype from arising and spreading.

The unpleasant side of this story is that, once again, many so-called translation professionals still fail to see that MT is just a stress-relieving technology, conceived and developed to enhance translation and make it easier, faster, and possibly better.

That’s why (N)MT is not inflated hype: it has actually been on the plateau of productivity for years now.

Overcoming language barriers is an ageless aspiration of humankind that, unlike the much-fabled singularity, does not generate any fear. Except, possibly, among language professionals, despite their continuous, recurrent self-reassurance (wishful thinking?) that machines will never replace humans, at least in this creative and thus undeniably human task.

In the end, the NMT hype belongs to mainstream tech news, which is sprayed like toxic gas in a market war fought on fronts far more profitable than NLP, namely corporate business platforms. Indeed, the NMT arena is dominated by one leading actor, with a supporting actor and many smaller side actors struggling for an appearance on the proscenium. Predictably, a translation industry “star”, which is just a “dwarf” in the global business universe, recently chose to buy rather than build its own NMT engine, citing the scarcity of data scientists (and money, of course) as the main reason for the decision.

Actually, not only has NMT emerged as the most promising approach, it has also shown superior performance on public benchmarks, rapid adoption in deployments, and steady improvement. Undeniably, there have also been reports of poor performance, for instance for systems built under low-resource conditions, and NMT quality is known to drop on out-of-domain data. This implies that the learning curve may be quite steep with respect to the amount and, most importantly, the quality of training data. Also, NMT systems are still poorly interpretable, so any improvement is extremely complex and somewhat random, when not arbitrary.

Anyway, to be unmistakably clear, MT is definitely “at parity” with human translation, especially when the latter falls below expectations, i.e. is of the sadly common low grade. And Arle Lommel is right in writing that an article titled New Study Shows That MT Isn’t Terrible would not attract much attention. At the same time, though, when he writes that “the only translators who need to worry about machine translation are those who translate like machines”, he can hardly imagine that this is exactly what most human translators have been doing, perhaps out of necessity, for decades.

Therefore, the NMT hype is hype only for people in the translation industry, who, on the other hand, are far more receptive to stuff that insiders in other industries would label as crap.

After all, NMT is just another algorithm and, with the world going increasingly digital and (inter)connected, and so information-intensive, resorting to algorithms is inevitable, because it is necessary.

Data as fuel


The fuel of algorithms is data. Unfortunately, despite a long history of producing language and translation data, translation professionals and businesses seem to have learned very little about data and still lag far behind in adopting data-driven applications. Yet data can be an asset if you know what to do with it, how to take advantage of it, and how to profit from it.

In this respect, besides betraying total ignorance of what “big data” is, the careless use of the nonsensical phrase “translation big data” has seriously damaged any chance of effectively trading language and translation data. This is just one of the effects of fads and hype, especially when ignorantly borrowed from, and spread through, equally ignorant (social) media.

As Andrew Joscelyne finally wrote in his latest post for the TAUS blog, «Language data […] has never been “big” in the Big Data Sense.»

By the way, what happened with “translation big data” is about to happen with AI, too, because ML (or even DL) is not AI, but too many people don’t bother to dig deeper and see the difference.

In fact, with the translation industry processing less than 1% of translation requests, language data can’t exactly be big, while translation businesses lack the knowledge, tools, and capability to effectively exploit and benefit from translation (project) data. There are exceptions, of course, but they can be counted on the fingers of one hand, and they are all technology providers.


Data and quality


Unfortunately, the translation industry suffers from a syndrome: blaming technology for replacing services, products, and habits with impoverished and/or simplified ones of lower quality. Luddites, anyone?

Indeed, only human laziness should be blamed for unsatisfactory quality. And this is consistent with the perennial, grueling and inconclusive debate on quality, the magical mystery word that instantly explains everything and forbids further questioning.

A good example is the anxiety about confidentiality with online MT, which is hardly an issue. Confidentiality is definitely a minor concern for an industry whose players still extensively use email, if not unsecured FTP connections and servers, to exchange files. Confidentiality is definitely not a major concern when it is mostly delegated to NDAs with no enforcement mechanism, especially when non-disclosure agreements are perceived as offensive because they reveal a lack of trust and question professionalism. Confidentiality is not an issue when, despite bombastic certifications, a breach of confidentiality obligations is always around the corner because customer data is kept unsecured, no contingency or security plan is in place, or the same data is reused for other projects, knowingly or not. Also, in most cases, intellectual property rights (IPR), rather than confidentiality, are the real issue.

Anyway, when such issues arise, technology is never to blame; human laziness, sloppiness, helplessness, and ineptness are.

Do all these traits affect data too? Of course they do. It is no coincidence that translation businesses believe they are so different from other service businesses, to the point that no real innovation has ever come from them. Even when they choose to build their own platforms, these are so idiosyncratic that they could never be made available to the whole community, even if their makers wanted to, and they don’t. After all, this is also one reason for the proliferation of unnecessary standards. Narcissism is the boulder blocking the road to change and innovation.

The same dysfunctional approach affects data. For example, if one believed the meager results of the perennial, grueling, and inconclusive debate on quality, one could only measure quality downstream, and only by counting and weighing errors, in a typical case of red-pen syndrome. On the contrary, a predictive quality score can be computed from past project data, which is extremely interesting for buyers.


More ML applications


Now, imagine a predictive quality score combined with an after-the-fact score derived from content profiling and initial requirements (checklists), classic QA, and linguistic evaluation based on correlation and dependence, precision and recall, and edit distance.
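As a rough illustration of two of the measures just mentioned, here is a minimal Python sketch comparing a raw MT output with its post-edited version at word level. It assumes simple whitespace tokenization; the function names and the bag-of-words definition of precision and recall are illustrative choices, not a standard metric implementation such as TER or BLEU.

```python
from collections import Counter


def word_edit_distance(hyp: str, ref: str) -> int:
    """Levenshtein distance over whitespace-separated words
    (insertions, deletions and substitutions all cost 1)."""
    h, r = hyp.split(), ref.split()
    # prev[j] = distance between the first i-1 words of h and the first j words of r
    prev = list(range(len(r) + 1))
    for i in range(1, len(h) + 1):
        curr = [i] + [0] * len(r)
        for j in range(1, len(r) + 1):
            cost = 0 if h[i - 1] == r[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete a word from hyp
                          curr[j - 1] + 1,     # insert a word into hyp
                          prev[j - 1] + cost)  # substitute (or match)
        prev = curr
    return prev[len(r)]


def token_precision_recall(hyp: str, ref: str) -> tuple[float, float]:
    """Bag-of-words precision and recall of hyp against ref (clipped counts)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    precision = overlap / max(sum(h.values()), 1)
    recall = overlap / max(sum(r.values()), 1)
    return precision, recall


mt = "the cat sat in the mat"
post_edited = "the cat sat on the mat"
print(word_edit_distance(mt, post_edited))      # 1
print(token_precision_recall(mt, post_edited))  # both values equal 5/6
```

Scores like these, collected over many projects, are the kind of project data a predictive quality score could then be built on.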

Only one weak point would be left: how to recruit, vet, compensate, and retain vendors so as to always have the best fit.

During his presentation on KPIs at the recent interpretation and translation congress in Breda, XTRF’s Andrzej Nedoma recalled that project managers tend to always use the same resources, who are not necessarily the most suitable.

With vendor managers continuously vetting and monitoring vendors and constantly updating the vendor database, project managers would have a reliable repository to pick from. And if project managers, in turn, updated the vendor database with performance data, this could be combined with assessments and ratings from customers and peers to feed an algorithm that would suggest the best fit for every new project and, in short, start a virtuous circle that maximizes customer satisfaction.
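A minimal sketch of the kind of algorithm described above: each vendor gets a blended score from project-manager performance data and customer and peer ratings, and qualified vendors are ranked for a given language pair and domain. The `Vendor` fields, the 0-to-1 scale, the weights, and the neutral prior are all hypothetical assumptions for illustration; a real system would have to calibrate them against actual project data.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Vendor:
    name: str
    language_pairs: set[str]
    domains: set[str]
    # Hypothetical feedback scores, each on a 0-1 scale
    pm_scores: list[float] = field(default_factory=list)
    customer_ratings: list[float] = field(default_factory=list)
    peer_ratings: list[float] = field(default_factory=list)


# Hypothetical weights for each feedback source (they sum to 1)
WEIGHTS = {"pm": 0.5, "customer": 0.3, "peer": 0.2}


def blended_score(v: Vendor, prior: float = 0.5) -> float:
    """Weighted average of the mean score from each source;
    a source with no data falls back to a neutral prior."""
    parts = {
        "pm": mean(v.pm_scores) if v.pm_scores else prior,
        "customer": mean(v.customer_ratings) if v.customer_ratings else prior,
        "peer": mean(v.peer_ratings) if v.peer_ratings else prior,
    }
    return sum(WEIGHTS[k] * parts[k] for k in WEIGHTS)


def best_fit(vendors: list[Vendor], pair: str, domain: str) -> list[Vendor]:
    """Vendors qualified for the pair and domain, best blended score first."""
    eligible = [v for v in vendors if pair in v.language_pairs and domain in v.domains]
    return sorted(eligible, key=blended_score, reverse=True)


pool = [
    Vendor("A", {"en-it"}, {"legal"}, [0.9, 0.8], [0.7], [0.8]),
    Vendor("B", {"en-it"}, {"legal", "medical"}, [0.6], [0.9, 0.9], []),
    Vendor("C", {"en-de"}, {"legal"}, [1.0], [1.0], [1.0]),
]
print([v.name for v in best_fit(pool, "en-it", "legal")])  # ['A', 'B']
```

Appending new scores after each project is what would close the loop; the neutral prior keeps new vendors from being ranked last merely for lacking a history.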

To be unambiguously clear once again, this is by no means an endorsement of translation marketplaces. On the contrary, the inherent vice of translation marketplaces is the ultra-exploitation of information asymmetry, as they provide no mechanism to support factual vetting and evaluation and, ultimately, disintermediation. However, any platform that all parties could join, where they could be vetted and evaluated and their performance fairly measured, will eventually prevail.

If the idea of translation marketplaces has not worked out so far, it is not because of some supposedly unique nature of translation; on the contrary, this is one of the conditions that make the translation industry an ideal candidate for disruption. In fact, with suitable data and the right algorithms, machine learning, including deep learning, can provide many high-value solutions.

Where’s the weakness in data, then? In humans [who misunderstand and misuse it].


Luigi Muzii has been in the "translation business" since 1982 and has been a business consultant in the translation and localization industry, through his firm, since 2002. He focuses on helping customers choose and implement best-suited technologies and redesign their business processes for the greatest effectiveness of translation and localization-related work.
