Friday, May 1, 2020

Evaluating Machine Translation Systems

This post is the first in a series of upcoming posts focusing on the issue of quality evaluation of multiple MT systems. MT system selection has become a more important issue in recent times as users and buyers realize that potentially multiple MT systems can be viable for their needs, but would like to develop better, more informed selection procedures.

I have also just ended my tenure at SDL, and this departure will also allow my commentary and opinion in this blog to be more independent and objective, from this point onwards. I look forward to looking more closely at all the most innovative MT solutions in the market today and providing more coverage on them.  

As NMT technology matures it has become increasingly apparent to many buyers that traditional metrics like BLEU that are used to compare/rank different MT systems and vendors are now often inadequate for this purpose, even though these metrics are still useful to engineers who are focused on building a single MT system.  It is now much more widely understood that best practice involves human evaluations used together with automated metrics. This combined scoring approach is a more useful input in conducting comparative evaluations of MT systems.  To the best of my knowledge, there are very few in the professional translation world who do this well, and it is very much an evolving practice and learning that is happening now. Thus, I invite any readers who might be willing to share their insights into conducting consistent and accurate human evaluations to contact me about doing this here.

Most of the focus in the localization world's use of MT remains on MTPE efficiencies (edit distance, translator productivity), often without consideration of how the volume and useable quality might change and impact the overall process and strategy. While this focus has value, it misses the broader potential of MT and "leaves money on the table" as they say.

The questions we are asking most frequently are:
  • What MT system would work best for our business purposes?
  • Is there really enough of a difference between systems to use anything but the lowest cost vendor?
  • Is there a better way to select MT systems than just looking at generic BLEU scores?
I have covered these questions to some extent in prior posts and I would recommend this post and this post to get some background on the challenges in understanding the MT quality big picture.

The COVID-19 pandemic is encouraging MT-use in a positive way. Many more brands now realize that speed, digital agility, and a greater digital presence matter in keeping customers and brands engaged. As NMT continues to improve, much of the "bulk translation market" will move to a production model where most of the work will be done by MT.  Translators who are specialists and true subject matter experts are unlikely to be affected by the technology in a negative way, but NMT is poised to penetrate standard/bulk localization work much more deeply, driving costs down as it does so.

This is a guest post and an unedited independent opinion from an LSP (Language Service Provider) and it is useful in providing us an example of the most common translation industry perspective on the subject of multiple MT system evaluations. It is interesting to note that the NMT advances over SMT are still not quite understood by some, even though the bulk of the research efforts and most new deployments have shifted to NMT. 

Most LSPs continue to stress that human translation is "better" than MT which most of us on the technology side would not argue against, but this view loses something when we see that the real need today is to "translate" millions of words a day. This view also glosses over the fact that all translation tasks are not the same. Even in 2020 most LSPs continue to overlook that MT solves new kinds of translation problems that involve speed and volume and that new skills are needed to really leverage MT in these new directions. There is also a tendency to position the choice as binary MT vs Human Translation, even though much of the evidence is pointing to new man + machine models that provide an improved production approach. The translation needs of the future are quite different from the past and I hope that more service providers in the industry start to recognize this. 

I also think it is unwise for LSPs to start building their own MT systems, especially with NMT. The complexity, cost and expertise required are prohibitive for most. MT systems development should be left to real experts who do this on a regular and continuing basis. The potential for LSPs adding value is in other areas, and I hope to cover this in the coming posts.

Source: MasterWord

* =======*

It’s no secret that machine translation (MT) has taken the world by storm. Almost everyone has now had some experience with MT, mostly through a popular translation app such as Google Translate. But MT comes in a variety of formats and is heavily utilized by businesses and institutions all over the world.

With that in mind, which MT system is best? Since MT comes in many colors, figuratively speaking, which one should you rely on if you decide to build your own MT system? We’ll also talk more about translation quality and whether or not MT is suitable for specialized translations such as medical translation, a critical field now for any active translation company in light of the current coronavirus pandemic that has the whole world at a standstill.

What is Machine Translation?

Machine Translation, or MT, is software that translates text from a source language into a target language. Over the years, there have been multiple variations of MT, but there are three definitive types: Rules-Based Machine Translation (RBMT), Statistical Machine Translation (SMT), and Neural Machine Translation (NMT). Here’s a quick rundown of their characteristics, including how their pros and cons compare:

  1. RBMT

Rules-Based Machine Translation is one of the earliest forms of MT. Its algorithm is language-based, meaning that to translate from one language to another, it must rely on input data in the form of a lexicon, grammar rules, and other linguistic fundamentals. The problem with RBMT systems is scaling them efficiently, as they become more complicated as more language rules are added. Also, RBMT is never ideal for obscure languages with minuscule data. With the development of more advanced MT systems over the years, RBMT has largely been superseded by its successor, covered next.

  2. SMT

Statistical Machine Translation, unlike RBMT, is designed to translate languages using statistical algorithms. SMT is fed data in the form of bilingual text corpora and is programmed to identify patterns in the data and form its translations from them. Patterns in this context mean how many times a certain word or phrase appears consistently in a certain context. This probability learning model allows SMT systems to render relatively appropriate translations. It’s pretty much like ‘If this is how it was done, then this is how it should be done’.

Just like RBMT, SMT must be fed plenty of data, but MT developers, including translation app developers, prefer SMT for several reasons: it is easy to set up thanks to the numerous open-source SMT systems available, it is cost-effective thanks to the free quality parallel text corpora available online, it offers higher translation accuracy than RBMT, and it scales easily as the system grows bigger.

But just like RBMT, SMT can’t function well if it’s fed with insufficient or poorly structured parallel text corpora. For that reason, it’s not ideal for translating obscure languages.

  3. NMT

Neural Machine Translation is the latest development in MT. Think of it as an upgraded version of SMT whose abilities are supplemented with artificial intelligence (AI), specifically deep learning. Not only is it capable of combing through data faster, but it can also produce better outputs through constant trial and error. SMT works in much the same way, but the difference, albeit a definitive one, is that NMT is able to do it much faster and more accurately. Google Translate made the switch to NMT from its old SMT system in 2016.

Its deep learning capability is such a game-changer that it’s able to accomplish what RBMT and SMT could not: translating obscure regional languages. That’s why Google Translate can cover over 100 languages, including Somali and Gaelic. But its outputs for such languages are questionable, to say the least, as it needs some time to learn a language that has little reliable data lying around for it to use. However, the development of NMT just goes to show how far MT overall has evolved over the years.
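To make the SMT idea of "patterns" described above a little more concrete, here is a minimal Python sketch. It only counts how often source and target words appear in the same sentence pair and turns those counts into relative frequencies. Real SMT systems use proper alignment models and phrase tables, so treat this as a deliberately simplified illustration. The toy corpus and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy parallel corpus (English, Spanish); not taken from any real dataset.
PARALLEL_CORPUS = [
    ("the house", "la casa"),
    ("the book", "el libro"),
    ("a house", "una casa"),
    ("a book", "un libro"),
]


def cooccurrence_probs(corpus):
    """Estimate P(target word | source word) from sentence-level co-occurrence counts."""
    counts = defaultdict(Counter)
    for src_sentence, tgt_sentence in corpus:
        for src_word in src_sentence.split():
            for tgt_word in tgt_sentence.split():
                counts[src_word][tgt_word] += 1

    probs = {}
    for src_word, tgt_counts in counts.items():
        total = sum(tgt_counts.values())
        probs[src_word] = {tgt: c / total for tgt, c in tgt_counts.items()}
    return probs


def most_likely(probs, src_word):
    """Return the target word that co-occurred most often with src_word."""
    candidates = probs[src_word]
    return max(candidates, key=candidates.get)


probs = cooccurrence_probs(PARALLEL_CORPUS)
print(most_likely(probs, "house"), probs["house"])
```

Even with four sentence pairs, "house" ends up most strongly associated with "casa" because that pairing recurs across contexts. With too little data the counts become unreliable, which is the weakness with low-resource languages noted above.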

What Makes A Good Machine Translation (MT) System?

There have been many MT systems over the years and many still in development. The ones that happened to survive the test of time are select variants of RBMT and most variants of SMT. NMT has quickly gained popularity and will slowly replace SMT as the years go by. What’s generally expected out of a good custom-built MT system is reliability and quality of outputs, pretty much like any other product or service out there.

If you’re looking for a reliable metric, then BLEU (Bilingual Evaluation Understudy) is one of the most widely used MT evaluation metrics. BLEU scores MT output between 0, the worst, and 1, the best. It rates how close the translated text is to one or more human reference translations by measuring how many words and word sequences they share. The more closely the output matches the human reference, the better the score.

That being said, every MT developer creates their system according to not only the developer’s but also a client’s specifications and linguistic needs. So no two of them are alike. But there are MT platforms that are widely used by multiple clients because they are easy to use and flexible enough to be adapted to the client’s needs. But even with a variety of MT systems being developed over the years, one thing remains the same: MT systems have to learn from a lot of quality data and must be given the time to learn.

They say that machines are inherently dumb and that they’re only as good as the job or data given to them. For MT, that notion still rings true to this day and will most likely keep ringing true for decades to come. However, quality data isn’t the only thing that makes a good MT system.
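For readers who want to see what a BLEU-style score actually measures, here is a minimal Python sketch. It is a simplified, sentence-level variant and not the reference implementation: it assumes a single reference translation, whitespace tokenization, n-grams up to length 2 (standard BLEU goes up to 4), and no smoothing. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified BLEU: clipped n-gram precision (geometric mean) times a brevity penalty."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the cat is on the mat"
print(simple_bleu("the cat is on the mat", reference))
print(simple_bleu("the cat sat on the mat", reference))
print(simple_bleu("a dog", reference))
```

Note that the score rewards overlap with the reference wording only. A perfectly good translation that uses different words can score poorly, which is one reason BLEU is usually reported over a whole test set and alongside human judgment.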

There are platforms in which MT is integrated with other processes so that it can render quality or, at the very least, passable translations. Indeed, MT is a process unto itself, but its outputs, even with deep learning capabilities, are still not on par with those of a professional translator. MT has to be integrated with other processes, namely computer-assisted translation (CAT) tools.

There are many CAT tools, but two of the most essential are a glossary tool and a translation memory. A glossary is simply a database of terminology and approved translations. It’s a very simple feature but a very important one, as it saves translators a lot of time: they don’t need to keep checking which translation is the approved choice for the source text at hand.

A translation memory is like a glossary, but it stores phrases and sentences. It also saves the translator valuable time, as many texts, such as user manuals and marketing collateral, recycle the same language. A translation memory also helps by keeping language consistent within a given domain and language pair.
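As a rough illustration of how these two tools behave, here is a minimal Python sketch. It assumes a glossary is a simple term lookup and a translation memory is a store of past sentence pairs searched by string similarity. The entries, the 0.75 threshold, and the function names are all hypothetical; commercial CAT tools use their own matching algorithms.

```python
from difflib import SequenceMatcher

# Hypothetical example entries (English to Spanish).
GLOSSARY = {"power button": "botón de encendido"}
TRANSLATION_MEMORY = {
    "Press the power button to turn on the device.":
        "Pulse el botón de encendido para encender el dispositivo.",
    "Do not expose the device to water.":
        "No exponga el dispositivo al agua.",
}


def glossary_hits(source, glossary):
    """Return approved term translations for glossary terms found in the source text."""
    lowered = source.lower()
    return {term: tr for term, tr in glossary.items() if term in lowered}


def best_tm_match(source, memory, threshold=0.75):
    """Return (similarity, stored translation) for the closest stored sentence, or None."""
    best_score, best_target = 0.0, None
    for stored_source, stored_target in memory.items():
        score = SequenceMatcher(None, source.lower(), stored_source.lower()).ratio()
        if score > best_score:
            best_score, best_target = score, stored_target
    if best_target is not None and best_score >= threshold:
        return best_score, best_target
    return None


print(glossary_hits("Hold the Power Button for five seconds.", GLOSSARY))
print(best_tm_match("Press the power button to turn off the device.", TRANSLATION_MEMORY))
```

The second call is a "fuzzy match": the stored translation is close but not correct for the new sentence, so the translator still has to edit it. That is why the time saved depends on how repetitive the content is.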

I Now Pronounce You Man and Machine

However, even with all the bells and whistles developers can equip an MT system with, is MT alone enough? Can MT alone produce the accurate, quality translations that clients of language services demand today? MT is part of the solution but doesn’t make up the complete picture. It sounds counterintuitive, but MT is best paired with a professional translator as a means of optimizing the translation process.

This unlikely union defied the predictions of many who saw MT giving professional translators a run for their money and driving translation companies out of business. Professional translators work with CAT tools because these tools help them churn out more words than ever before and be more consistent. Why the need for speed? Domo’s latest report states that “2.5 quintillion bytes of data are created every single day”. That’s a lot of data, and most of it is not in English, which creates the rising demand for translation services.

Also, by having a translator work together with an MT system, the translator is doing the MT system a favor as well by constantly feeding back revisions for the MT to learn from and render better outputs and suggestions. All in all, it’s a highly productive and beneficial two-way street between a translator and an MT system.

Of course, this ‘relationship’ will be moot if the MT system wasn’t developed to a satisfactory standard. That being said, developers have to take into account both translation clients and translators themselves.

They have to ensure that the MT system will not only produce quality translations for clients but can also adapt to the needs of the translators using it. Being convenient to use and having a friendly UX design is one thing, but being able to incorporate a translator’s inputs and accurately replicate them in similar contexts is another.

What Do Professional Translation Services Have Over MT?

Specifically, what can a translation company that hires professional translators do better than artificial intelligence (AI)? Apart from translation quality and consistency, a professional translator has one advantage: they’re human. It may sound cliche, but a human can understand nuances that MT and AI are light years away from replicating.

Unable to Understand Emotional, Cultural, and Social Nuances

As of now, there is no MT yet that is capable of accurately understanding jokes, slang, creative expressions, and so on. The abilities of MT shine brightly with formulaic sentences and predictable language conventions. But if confronted with linguistic habits that are natural in everyday conversations, MT falls apart. This problem is made more pronounced at a global scale since every culture and society has its own way of speaking all the way down to highly distinct street lingo.

Unable to Process Linguistic Nuances

Parent languages are divided by their regional vernaculars and dialects. When someone translates English to Spanish with MT, the result is just generic Spanish with no local ‘flavoring’. But if you’re aiming for translations that ring true to how people in Spain or Mexico actually speak, then a professional translator with native-speaking ability is who you need. No MT system is currently able to comprehend, let alone translate, linguistic nuances reliably.

Unable to Keep Up With Linguistic Trends

Languages change every day, with new words constantly being added to and removed from the lexicon of world languages. Humor, slang, and creative expressions are a testament to that notion. Even social media has given rise to new creative expressions in ways human society has never experienced before, with meme culture as one of the most notable examples. Even if NMT were somehow capable of keeping up, it would still need time for the data to accumulate before it could start translating. By that time, new slang would have already popped up.

Unable to Render Specialized and Highly Contextual Translations

What we mean by specialized here is text with highly nuanced language, as in the literary field, and also texts belonging to critical fields such as the legal, scientific, and medical sectors. Authors inherently embed their works with highly nuanced expressions and linguistic ‘anomalies’, so much so that there is no identifiable pattern for any MT to work with, since each author has their own voice.

The legal and medical sectors have their own language conventions that seem formulaic on the surface, but the inherently specialized terminology and the risk involved in these fields mean that no margin of error can be given to MT. There are MT systems used in these sectors, but they are always paired with a professional legal translator or professional medical translator.

Developing Your Own MT System

Even with the quality issues and other imperfections associated with MT, the demand for machine translation services keeps growing. According to a report published in Market Watch, “The Global Machine Translation Market was valued at USD 550.46 million in 2019 and is expected to reach USD 1042.46 million by 2025, at a CAGR of 11.23% over the forecast period 2020 - 2025.”
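As a quick sanity check on the quoted figures, the compound annual growth rate (CAGR) can be recomputed with a few lines of Python. This is only a sketch: it assumes the 2019 valuation is the base and that growth compounds over the six years to 2025, which the report excerpt does not state explicitly. The function name is hypothetical.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: the constant yearly rate linking start to end."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the quoted report, in USD millions; six compounding years is an assumption.
growth = cagr(550.46, 1042.46, 2025 - 2019)
print(f"{growth:.2%}")
```

Under that assumption the result comes out at roughly 11.2%, in line with the rate quoted in the report.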

However, many are looking to develop their own company MT instead of ‘borrowing’ one from an external provider and for good reason. If a translation company is rendering plenty of niche translations in a given year, then configuring their own MT system is the most cost-effective investment as there will be no need to pay for licensing fees to external MT providers.

Many industries have their own language conventions and jargon, mostly for internal communication. For example, legalese is perfectly comprehensible to lawyers but downright alien-sounding to those with little legal knowledge. Beyond that, individual businesses and organizations have their own language conventions that veer off from the industry norm. In that case, they would have to build their very own MT systems, especially if they’re focusing on specific target foreign markets and audiences.

So out of the 3 listed earlier, which one should you choose? It’s most likely SMT due to its popularity and how much support it gets. There are those who have gone for a Hybrid MT by combining SMT and RBMT, but that’s probably too intimidating for first-timers. If you want to make the big leap right from the start, then, by all means, go NMT if it meets your company’s objectives.

Mind you, investing in and training any MT system does come at a price and will take time. It’ll take time for glossaries and translation memories to develop, provided that the data used to feed the system is up to standard. For a translation company, that usually isn’t a problem, as it can use its own document archives in tandem with open-source parallel text corpora.

Can You Choose MT Over a Translation Company?

Back then, instant language translation belonged to the category of futuristic science fiction gadgets. In fact, it still is today albeit we’ve heightened our standards. What we dream of now is instant voice interpretation. Specifically, being able to conduct a seamless multilingual conversation with anyone without the awkward pauses. But let’s get back to reality now. It’s hard not to be impressed with the abilities of MT today since we can easily witness it from our smartphones.

Even so, there are plenty of flaws associated with MT, as discussed earlier, that are hindering it from gaining serious widespread adoption. Be that as it may, MT as it is now nevertheless has its own perks. Although one shouldn’t rely too much on MT beyond certain thresholds, that doesn’t mean you shouldn’t use it at all in specific situations. Here are some reasons why.


There are plenty of translation apps out there such as Google Translate as you might know already. All of them are free with the exception of premium access subscription payments to unlock more features. There are plenty of free translation plugins as well for website developers. Keep in mind that we’re talking about generic translators here and not the specialized MT systems from external providers that have licensing fees.

Speed and Convenience

In specific situations, people just want a translation at the very moment they need it. Whether you’re a language student or a traveling businessperson, MT is your answer. It’s free, and you get results the moment you click the translate button. Even if it’s not 100% accurate, it at least gives you the gist of the source text.

For Generic, Repetitive, and Well-Resourced Languages

*Consider this pointer at your own risk*. One can certainly find MT useful for non-contextual and predictable text such as simple and formulaic phrases. What you decide to do with the output is up to you, whether you use it only as a reference or actually employ it in a professional setting. That being said, the highest-quality translations you can get are for well-resourced languages such as Spanish, German, and French. If you tried translating even a simple phrase from English to Chinese, you’d be unlikely to get a similarly accurate translation, since English and Chinese have vastly different language rules and an unrelated linguistic history.

A Note on Translation Quality in the Context of the Coronavirus Pandemic

Despite the vast improvements to MT, quality is still a significant issue and, as you’re aware, human translators are there to guarantee it. Nowhere is quality more necessary than in global communication during a crisis, as made evident by the current coronavirus pandemic, specifically in the form of medical translation. Medical translation is a highly specialized niche in translation, and a critical one too, wherein the slightest mistranslation could lead to unfortunate and even fatal consequences.

Medical translation must be provided by specialized medical translators who have complete mastery over their language pair (Ex. English to Spanish, Spanish to English) and extensive familiarity with medical terminology, medical practices, and code of ethics. They must undergo additional lengthy training before they can be classified as certified medical translators. That being said, are MT systems out of the picture?

There are MT systems that translate medical documents and medical research, but they must be under constant supervision from a certified medical translator. Connecting this to today’s crisis, there hasn’t been a time in recent history when the speedy translation of medical research has been more important. Medical scientists all over the world are working together to understand the COVID-19 virus so they can come up with viable treatments and, eventually, a vaccine. With that in mind, medical translation is the bridge that makes this level of coordination between medical scientists around the world possible.

Final Takeaway

Will there be a future where MT would be so advanced and almost human-like that professional translators would be an endangered species? If you were to judge by the pace of development of MT in such a short period, it would not be that unreasonable to believe in a future like that. However, let’s not put too much thought into it, as that view doesn’t pay enough attention to what is demanded from translation in the first place.

It’s apparent now that MT is good at serving translation speed and optimization needs, but as for quality, much of it rests in the hands, or should I say the mind, of a professional translator. That union will likely last for the next few decades. But let’s not hold ourselves to that prediction. Perhaps a game-changing MT feature is just a few years away or, if our prediction holds true, decades away. But still, that’s assuming our standards for translation, particularly on quality and human-ness, haven’t changed.

Author Bio:

Laurence Ian Sumando is a freelance writer penning pieces on business, marketing, languages, and culture.


  1. This could have been interesting if it had been written a bit better. Just as an example:
    "SMT also must be fed with plenty of data just like RBMT, but MT developers of which includes translation app developers prefer SMT due to its ease of setting up due to numerous open-source SMT systems available, cost-effectiveness due to free quality parallel text corpora that are available online, higher translation accuracy than RMBT, and its ease of scalability as the system grows bigger."
    The above is ungrammatical, repetitive, and hard to understand.
    It could (and should) have been improved with a bit more time spent on writing and editing the article.
    For example
    "Just like RBMT, SMT also requires plenty of data, but MT developers (including translation app developers) prefer SMT for several reasons: the numerous open-source SMT engines available mean that SMT is easier to set up, the free parallel text corpora of good quality available allow costs to be reduced, and, crucially, SMT is both more accurate than RMBT and more scalable."

    1. Riccardo

      Thank you for your comment.

      This is a free service that attempts to share knowledge that may benefit some readers with very specialized interests.

      I have no doubt that it could be improved with some editing to improve readability and overall clarity. However, unless you or others are volunteering to do this as a free service, it will remain what it is: a best attempt even though it is stream of consciousness flow.

  2. The question is, if we created a dataset of written and edited blog posts, could we teach a machine to edit blog posts? Just joking, guys! I've been learning a lot from you both. Regards.

  3. I think that, in the future, machine translation will be so good that it will be impossible to tell the difference from human translators.