Monday, June 27, 2016

MT Options for the Individual Translator

This is yet another post triggered by conversations in Rio at the ABRATES conference in early June. As I mentioned in my initial conference post, the level of interest in MT was unusually high, and there seemed to be real and serious interest in finding ways to engage with MT beyond the brute-force, numbing corrective work that is typical of most PEMT projects with LSPs.

MT has been around in some form for decades, and I am aware that there have always been a few translators who found some use for the technology. But since I was asked by so many translators about the options available today I thought it would be useful to write about it. 

The situation today for many translators is to work with low-quality MT output produced by LSP/enterprise MT practitioners with very limited MT engine development experience. Typically, the translators have no say in how the MT engines evolve, since they are so far down the production line. Sometimes they may work with expert-developed MT systems where some limited feedback and steering is possible, but generally the PEMT experience involves: 

1.    Static MT engines that are developed offline somewhere, with infrequent updates, if any, that improve the engine only marginally.
2.    Post-editors work on batches of MT output and provide periodic feedback to MT developers.

This is beginning to change with innovative new MT technology described as Adaptive Interactive Dynamic Learning MT (quite a mouthful). The most visible and elegant implementation of this approach is from a startup called Lilt. It allows the translator-editor to tune and adjust the engine dynamically in real time, and thus make all subsequent MT predictions more intelligent, informed and accurate. This kind of MT implementation has to be cloud-based to allow the core engine to be updated in real time. Additionally, when used in workgroups, this technology can leverage the individual efforts of translators by spreading the benefit of a more intelligent MT engine across the whole team. Each user benefits from the previous edits and corrective actions of every other translator-editor and user, as this case study shows. This allows a team to build a kind of communal edit synergy in real time, and theoretically allows 1+1+1 to equal 5 or 7 or even higher. The user interface is also much more translator-friendly and is INTERACTIVE, changing moment to moment as the editor makes changes. Thus you have a real-time virtuous cycle: in essence, an intelligent learning TM engine that learns with every single corrective interaction. 

CSA tells us that the SDL Language Cloud also has similar abilities, but my initial perusal suggests it is definitely less real-time and less dynamic than Lilt, i.e. it is not updating phrase tables in real time. There are several short videos that explain it, and superficially it looks like an equivalent, but I am willing to bet it is not, at this point in time anyway.

So for a translator who wants to get hands-on experience with MT and understand the technology better, what are the options? The following chart provides a very rough overview of the options available, ranked by my estimation of the best options for learning valuable new skills. The simplest MT option for an individual translator has always been a desktop RbMT system like Systran or ProMT, and it remains a viable option for many, especially with Romance languages. But there has never been much that could be done to tune these older systems beyond building dictionaries, a skill that in my opinion will have low value in the future.

Good expert developed MT systems will involve frequent interaction with translators in the early engine development phases to ensure that engines get pattern based corrective feedback to rapidly improve output quality. The more organized and structured this feedback and improvement process, the better the engine and the more interesting the work for the linguist.

Of the “free” generic MT engines, Microsoft offers much more customization capability and is thus a good platform for learning how different sets of data can affect an MT engine and its output. This of course means the user needs to have organized data available and an understanding of how the technology learns. MT can be trained effectively only if you have some understanding of how it learns. This, I think, is why most Moses experiments fail: too much effort focused on the low-value mechanics, and too little on what you are doing and why. I remain skeptical about uninformed Moses experimentation, because getting good SMT engines requires good data plus an understanding of how your training corpus is similar to or different from the new source that you want to translate, and a variety of tools to help keep things synced and aligned when you see differences. I am also skeptical that these DIY efforts will produce engines as good as the free generic engines, and I wonder why one would bother with the whole Moses effort if you could get higher quality for free from Microsoft or Google. That said, some translators do claim benefits from working with Moses and its desktop implementations like Slate. 
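One way to make the training-corpus question concrete is to check how much of a new source text's vocabulary is already covered by your training data. The snippet below is a minimal, illustrative sketch (the corpora and function name are my own, not from any MT toolkit): a low coverage figure is an early warning that an engine trained on that corpus will struggle with the new material.

```python
def vocab_coverage(train_sentences, new_sentences):
    """Share of tokens in the new source already seen in the training data."""
    train_vocab = {tok for s in train_sentences for tok in s.lower().split()}
    new_tokens = [tok for s in new_sentences for tok in s.lower().split()]
    seen = sum(1 for tok in new_tokens if tok in train_vocab)
    return seen / len(new_tokens)

# Toy example: an engine trained on technical manuals, fed legal text
train = ["the engine translates technical manuals",
         "the manuals describe the engine"]
new = ["the engine translates legal contracts"]
print(f"{vocab_coverage(train, new):.0%}")  # 3 of 5 tokens seen -> 60%
```

Real corpus analysis would of course look at phrases and domain terminology rather than single tokens, but even this crude check makes differences between training data and new source material visible before any engine is built.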

All these options will provide some experience and insight into MT technology, but I think it is useful to have some sense for how they might impact you from two key perspectives shown below:

1.    What options are likely to give you the fastest way to get to improved personal productivity?
·         I would expect that an Adaptive MT solution is most likely to do this the fastest, but you need to have good clean training data – TMs and glossaries (the more the better). You should also see your editing experience improve rapidly as your corrective feedback modifies the engine in real time.
·         Next come the Microsoft Translator Hub (if privacy is not a concern for you), SDL Language Cloud, and some expert systems that are more proactive in engaging translators, though the latter will typically involve an LSP middleman.
·         Generic Google & Microsoft and desktop RbMT (Romance languages, and translation into English, tend to produce better results in general).
·         DIY Moses is the hardest way to get to productivity IMO, but there is some evidence of success with Slate.

2.    What options are most likely to help develop new skills that could have long-term value?
·         My bet is that the SMT options are all going to help develop skills related to corpus analysis, working with n-grams, large-scale corpus editing and data normalization. Even Neural MT will train on existing data, so all those skills remain valuable.
·         Source analysis before you do anything is always wise and yields better results as your strategy can be micro-tuned for the specific scenario.
·         Both SMT and the emerging Neural MT models are based on machine learning – the process by which the computer “learns” from data. At least a basic understanding of machine learning is useful; it is growing in importance and is worth long-term attention.
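To make the n-gram skill mentioned above a little more tangible, here is a minimal sketch (the toy corpus is mine, purely for illustration) of the kind of frequency analysis that underlies SMT corpus work: counting which word pairs recur most often, since high-frequency patterns are where corrective effort pays off fastest.

```python
from collections import Counter
from itertools import islice

def ngrams(tokens, n):
    """Yield successive n-grams (as tuples) from a token list."""
    return zip(*(islice(tokens, i, None) for i in range(n)))

# Toy corpus standing in for a real TM export
corpus = [
    "the machine translation engine learns from data",
    "the translation engine improves with corrective feedback",
]

bigram_counts = Counter()
for sentence in corpus:
    bigram_counts.update(ngrams(sentence.split(), 2))

# The most frequent n-grams point to patterns worth fixing at corpus level
for bigram, count in bigram_counts.most_common(3):
    print(" ".join(bigram), count)
```

On a real corpus the same few lines, scaled up, reveal the high-frequency patterns that a single correction can fix across thousands of segments.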

There are tools like Matecat that show promise, but given that edits don’t directly change the underlying MT engine, I would opt for a true Adaptive MT option like Lilt instead.  

What do we mean by PEMT in the larger context?

The traditional understanding of PEMT can be summarized in the graphic below and this is the most common kind of interaction that most translators have with MT if they have any at all.

However, the problem definition and the skills needed to solve it are quite different when you consider an MT engine from a larger overall process perspective. It generally makes sense to address corpus-level issues before going to the segment level, so that many error patterns can be eliminated and resolved at the high-frequency pattern level. It may also be useful to use web crawlers to gather patterns that guide the language model in SMT and produce more fluent target-language translations. 

The most interesting MT problems, which almost always arise outside the “language services industry”, require a view of the whole user experience with translated content from beginning to end. This paper describes this holistic user-experience view and the resulting corpus analysis perspective for an eBay use case. Solving problems often involves acquiring the right kind of new data, normalizing disparate data, and focusing on the handling of high-frequency word patterns in a corpus to drive MT engine output quality improvements. This type of deeper analysis may happen at an MT-savvy LSP like SDL, but it is almost never done by LSP MT practitioners in the core CSA-defined translation industry. This kind of deep analysis is also often limited at MT vendors, because customers are in a hurry and unwilling to invest the time and money to do it right. Only the most committed will venture into this level of detail, but this kind of work is necessary to get an MT system working at optimal levels. Enterprises like Facebook, Microsoft and eBay understand the importance of doing all the pre- and post-development data and system analysis, and thus develop systems that are much more closely tuned to their very specific needs.

MT use makes sense for a translator only if there is a productivity benefit. Sometimes this is possible right out of the gate with generic systems, but most often it takes some effort and skill to get an MT system to this point. It is important that translators have a basic understanding of three key elements before MT makes sense:
1.    Their individual translation work throughput without MT in words per hour.
2.    The quality of the MT system output and the ability of the translator to improve this with corrective feedback.
3.    The individual work throughput with MT after some effort has been made to tune it for specific use.

Obviously, 3 has to be greater than 1 for MT use to make sense. I have heard that many translators use MT as a way to speed up typing or to look up individual words. I think we are at a point where the number of MT options will increase and more translators will find value. I would love to hear real feedback from anybody who reads this blog, as actual shared experience is still the best way to understand the possibilities.
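The three elements above boil down to simple break-even arithmetic. The sketch below uses entirely hypothetical numbers (all the figures are mine, for illustration only) to show how a translator might compare the two scenarios, including the one-off tuning effort:

```python
# Hypothetical figures for illustration only
baseline_wph = 500      # words/hour without MT (element 1)
mt_wph = 650            # words/hour post-editing a tuned engine (element 3)
tuning_hours = 10       # one-off effort spent tuning the engine
project_words = 100_000

hours_without_mt = project_words / baseline_wph
hours_with_mt = project_words / mt_wph + tuning_hours

print(f"Without MT: {hours_without_mt:.0f} hours")
print(f"With MT:    {hours_with_mt:.0f} hours")
# MT makes sense only when hours_with_mt < hours_without_mt
```

With these numbers the tuning effort pays for itself well within the project; with a smaller project or a weaker engine it would not, which is exactly why measuring your own throughput first matters.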



  1. Kirti, thank you for mentioning Slate Desktop. Like you said, there is some evidence of success. In that success (and the success of other use cases below), there's growing evidence that your pyramid's tiers and descriptive categories are showing some age, especially relative to the individual translator. Let's review some use cases.

    Memsource recently published this table showing 5-20 percent zero edit distance (ED0) for "generic" engines (e.g. GT, Bing). Customers have turned to Slate Desktop after using Memsource for gist-only translations that they accepted without edits but that were nowhere close to final publication quality. This practice drives the ED0 percentage artificially higher, but I can't speculate to what extent.
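For readers unfamiliar with the metric: ED0 is simply the share of segments where the raw MT output needed no edits at all. A minimal sketch of how it can be computed (the function names and toy segments here are my own, not from Memsource or any CAT tool):

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (single-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def ed0_percent(mt_segments, final_segments):
    """Percentage of segments whose MT output needed no edits."""
    untouched = sum(1 for mt, final in zip(mt_segments, final_segments)
                    if edit_distance(mt, final) == 0)
    return 100.0 * untouched / len(mt_segments)

mt = ["the cat sat", "a dog ran", "hello world"]
final = ["the cat sat", "the dog ran", "hello world"]
print(f"{ed0_percent(mt, final):.1f}%")  # 2 of 3 segments untouched
```

As the paragraph above notes, the metric is only meaningful if segments accepted unedited were genuinely publication quality rather than gist-only approvals.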

    Circa 2014, Asia Online's Mr Dion Wiggins (acknowledging your now-former affiliation) stated in a LinkedIn Machine Translation group discussion, "But don't take my word for it. I'm happy to provide real world reference customers where as much as 86% of raw MT output requires no editing at all..." Again, this is a reference to the ED0 percentage. I'm sorry I lost the link.

    With only these two use cases as data points, it would be easy to conclude that (a) generic engines christen the low end at 5-20% because no experts customized the engine, and (b) expertly customized engines set the high end at as much as 86%. However, we can't stop there or we'd be oversimplifying what has become a complex and interesting world.

    Let's look at a use case that falls outside the pyramid. Silvio Picinini is one of several MT specialists at eBay who maintain and improve eBay's highly customized Moses system. In a LinkedIn Pulse comment, he shared an anecdotal ED0 of 5%. This highly qualified expert team doesn't aim for a high percentage of exact matches. They work to ensure the system serves its purpose.

    For a decade, MT systems have been built with specific design goals as one shared resource to serve huge user groups (final publication, gist, user-generated-content, etc). Individual engines serving individual translators (i.e. the subject of this blog post) mark the introduction of a new use case that's only months old.

  2. Three new entrants in this use case have been specifically designed and built from the ground up to serve individual translators: Lilt, Slate Desktop and SDL's Language Cloud Custom MT Engine. There are pros and cons for each, but all are alike in that none have SMT experts actively customizing each individual translator's engine. In this regard, none qualify for your "Expert systems."

    In another departure from the pyramid's tiers, all three are simple for individual translators to set up and use. It takes less than half an hour for Slate customers to download and install the app, import their TMs and start generating their first engine. Language Cloud users bypass the download/install step, but they still upload their TMs and wait for the engine to finish. Lilt takes a very different approach. With regard to complexity and end-user understanding, none qualify for your DIY option.

    Quality results are also neck and neck. One translator's blog reported comparable experiences between the two. She perceived SDL's Language Cloud as slightly better on quality, while Slate Desktop was ahead on confidentiality. Emma didn't share her percentages.

    So, I'll share that customers who have reported (unlike Memsource's, our reporting is voluntary) experience ED0 in the 30% to 50% range for their first engine. Some are investing time to hone their expertise and are improving their results. Like Mr Wiggins' offer in 2014, I'll make personal introductions for anyone who wishes to verify these numbers. To demonstrate our confidence in these numbers, we plan to launch (soon) a "more than your money back" promotion if the customer doesn't experience a minimum threshold ED0 percentage on their first engine.

    I can't propose good alternatives for your options, but I think there's an argument for looking for new ones.

    1. Tom

      Congratulations on the success of some of your customers. It is encouraging, and shows that the technology has fundamental merit and possibility.

      I am not responsible for Dion's hyperbole and I will leave it at that. For some reason the history of MT is populated with people who make claims that many consider outrageous and exaggerated and may not always be warranted. As someone said once: "The history of MT is filled with empty promises." Most real experts will admit that MT is THE most difficult problem in NLP or AI. I once met someone at DARPA who said to me, in reference to the Star Trek technology, that all the other technology shown on that series was easier to make happen in reality than the Universal Translator.

      I think there is a significant amount of conflation in your comments so I will attempt to clarify.

      Experts are those people who have a deep understanding of the tools they are using and generally have considerable experience with failure as well as success with these tools beyond just having educational credentials. Expertise can take hundreds if not thousands of hours to acquire. Just to be clear let me state this more explicitly.

      Google is a generic system developed by experts.
      Bing Translate is a generic system developed by experts that allows users to customize to some extent.
      Lilt is an Adaptive MT system built by experts to allow translators to easily tune an SMT engine in real time.
      SDL provides base systems built by experts (from Language Weaver) designed to allow some further customization.

      In every case I would expect that the baseline engines produced by these experts will outperform a Moses attempt (by LSP or Translator) in probably 90% or more of the cases. TAUS has documented this repeatedly. Several enterprise TAUS members tried to use Moses to build their own engines and realized that they could do much better by simply using the MSFT Hub customization capabilities and get better quality at a fraction of the cost and effort. Even the domain customization experts often find that their systems are barely better than expert generic systems.

      Silvio is not an "MT specialist". He is actually an MTLS = MT Language Specialist. He addresses linguistic problems that undermine MT engines and he is working with a team of expert MT developers who actually handle the SMT engine development part. Also he is working with the most challenging type of content possible (UGC) so 5% is actually pretty good there. Their effort is about as far as you can get from your typical Moses experience.

      An engine produced when customers "download, install the app, import their TMs and start generating their first engine" in less than half an hour is, in my experience, unlikely to outperform a system built by experts. However, I may be wrong, and I offer you the chance to provide contrary evidence in a guest post on this blog.

      If your users can indeed produce better systems than the experts listed above in 1/2 hour, I think you may have a truly valuable asset on your hands.

      Thank you for your comments.

  3. I didn't mean to imply you were responsible for any of Dion's comments. I merely used his comment as a possible high end of a range.

    Sorry about the "significant amount of conflation" in my comments. I think technologies mature faster than business models. SMT as a viable commercial technology is over 10 years old, but it's easy to get stuck thinking about it in its original context.

    Use cases are expanding. It takes thought to sort through them. So, thanks for supporting most points and your less conflated summary.

  4. Very interesting. Awaiting more from Mr. Kirti Vashee & Mr. Thomas Hoar.