Thursday, April 23, 2015

How Translators Can Assess Post-Editing MT Opportunities

With the continued growth in the use of MT, it has become increasingly important for translators to understand when it is worth getting involved in the post-editing opportunities that come their way, and when it is wise to stay away.

This is still a very fuzzy issue for most translators, and I think it might be useful to share some information that highlights the key variables they can use to determine the most rational course of action given the facts at hand. For some, post-editing will never be palatable work. But those who look more closely will see that PEMT is now just another variant of professional translation work, one that can be economically advantageous when one is working with the right partners and the right technology.

In the early days of MT use, there was much cause for dissatisfaction all around, especially for translators who were asked to post-edit substandard MT output at very low rates. Translators do need to be wary, since many LSPs deploy MT technology without really understanding it, with the sole purpose of reducing costs, with no understanding of how to build systems that actually enable this lower-cost scenario, and with no interest in engaging translators in the process. It is therefore worth learning some basic discrimination skills and establishing some general guidelines for judging the relative merit of any PEMT opportunity you are presented with.

The following checklist is a useful start (IMO) that every translator should consider when deciding what kinds of PEMT opportunities are worth working on.
  • Understand the very specific MT output that you will be working with. Every MT engine is unique, and assessments need to be made against the actual output you will be editing.
  • Determine if the LSP understands what they are doing with the MT technology and can respond to feedback on error patterns. There are many “upload and pray” efforts nowadays that create very low-quality systems that are very hard to control and challenging for translators to work with.
  • Understand the MT technology being used, as not all MT is equal. There are many variants, and you should know the key differences between them. Systems that accept ongoing corrective feedback and offer more controls for correcting errors after the engine has been built will generally be better to work with.
  • Have a basic understanding of MT methodology, which means at least an overview of the rule-based and statistical approaches. This can give you a sense of what kind of feedback you can provide and also help you understand error patterns.
  • Understand that MT engine development is an evolutionary process, rather than the instant solution that Google has led some of us to believe it is. Professional MT deployment is a molding process in which quality evolves through expert iteration. It is typically done to tune an engine for a specific business purpose, such as supporting an ongoing, high-volume translation production need. MT makes much less sense for random one-time use.
  • Understand the basic quality assessment metrics used with MT. BLEU scores are often bandied about with MT systems and are often interpreted incorrectly; incompetent practitioners misuse and misread them all the time, so understanding them will always give you a better sense of the reality of a situation. BLEU scores are only as good as the Test Sets used, so try to understand what makes a good Test Set, as described in the link.
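
To make the point about Test Sets concrete, here is a minimal, simplified sketch of how a BLEU-style score is computed: clipped n-gram precision for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It assumes whitespace tokenization, a single reference per segment and no smoothing, so its numbers will not match standard toolkits such as sacreBLEU; the function names are illustrative only. Notice that the score depends entirely on the reference translations it is given.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU (0-100), one reference per segment.

    Assumes whitespace tokenization and no smoothing: returns 0.0
    if any n-gram order has zero matches across the whole corpus.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            # Clipped counts: an n-gram is credited at most as often as it appears in the reference
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: output shorter than the reference is penalized
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)

# Toy usage with a single made-up reference
ref = ["the cat is on the mat"]
print(round(corpus_bleu(["the cat is on the mat"], ref), 2))  # 100.0
print(round(corpus_bleu(["the cat is on the"], ref), 2))      # 81.87
```

The second output contains no wrong words at all, yet it loses almost 20 points purely through the brevity penalty; swap in a different (equally valid) reference and both scores would change again.
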
It is wise to use technology only when there is a clear benefit, and this is especially true of MT. An LSP should have a clear sense that using the technology will improve the productivity of the translation project; otherwise it is detrimental in many ways. This means there needs to be a clear idea of typical translation project throughput before and after the use of MT, and a trusted way to measure how MT affects that productivity.
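
A minimal sketch of such a before/after measurement follows, assuming you log source words completed and hours spent per work session on comparable content, once without MT and once post-editing. The `Session` record format and all figures are hypothetical. It pools total words and total hours rather than averaging per-session rates, so longer sessions count proportionally.

```python
from dataclasses import dataclass

@dataclass
class Session:
    """One logged work session (hypothetical record format)."""
    words: int    # source words completed
    hours: float  # time spent

def throughput(sessions):
    """Pooled words per hour across a list of sessions."""
    total_words = sum(s.words for s in sessions)
    total_hours = sum(s.hours for s in sessions)
    return total_words / total_hours

def productivity_gain(baseline, with_mt):
    """Relative throughput change from human-only to post-editing (0.25 means +25%)."""
    return throughput(with_mt) / throughput(baseline) - 1

# Made-up sessions for illustration only
baseline = [Session(2400, 5.0), Session(1500, 3.0)]   # 3900 words in 8 h
post_edit = [Session(3000, 4.0), Session(2200, 3.0)]  # 5200 words in 7 h
print(f"{productivity_gain(baseline, post_edit):+.1%}")  # +52.4%
```
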

  • Thus, MT only makes sense when it boosts productivity, or when it makes it possible to provide some kind of translation for material that would otherwise not get translated at all.
  • Translators should also understand that lower rates are not necessarily bad if their throughput is appropriately higher.
  • Finally, MT error patterns tend to be consistent, so it makes sense to approach corrections at a chunk level rather than segment by segment.
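
One way to spot such consistent error patterns is to compare MT output with its post-edited version and tally the word-level substitutions that recur. The sketch below uses Python's standard `difflib.SequenceMatcher`; the `recurring_edits` helper and the terminology example are hypothetical. It only counts replacements, so insertions, deletions and reorderings are ignored. Substitutions that show up many times are candidates for a single batch fix, or for feedback to the LSP about a systematic engine error.

```python
from collections import Counter
from difflib import SequenceMatcher

def recurring_edits(pairs, min_count=2):
    """Count word-level substitutions made during post-editing.

    pairs: (mt_output, post_edited) string tuples, whitespace-tokenized.
    Returns (mt_phrase, fixed_phrase, count) tuples seen at least
    min_count times, most frequent first.
    """
    counts = Counter()
    for mt, edited in pairs:
        a, b = mt.split(), edited.split()
        for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
            if tag == "replace":  # ignore pure insertions and deletions
                counts[(" ".join(a[i1:i2]), " ".join(b[j1:j2]))] += 1
    return [(src, fix, n) for (src, fix), n in counts.most_common() if n >= min_count]

# Hypothetical segments where the engine keeps mistranslating one term
pairs = [
    ("The pump must be purged before use", "The pump must be bled before use"),
    ("Check that the pump is purged", "Check that the pump is bled"),
    ("The pump is purged weekly", "The pump is bled weekly"),
    ("Close the valve after use", "Close the valve after use"),
]
print(recurring_edits(pairs))  # [('purged', 'bled', 3)]
```
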

Much of the dissatisfaction with PEMT work is related to compensation. My post on PEMT compensation remains the most-read post on this blog, even though it is now 3 years old. But I think that if you understand the specific MT output you are dealing with and its impact on your throughput, you can make an informed decision.

It is wise to remember that a lower rate does not necessarily mean lower overall compensation, as the following (totally hypothetical) chart explains. (In practice, the productivity benefits are likely to be shared less generously.) The best LSPs will have an open and transparent process for setting this rate, and will involve translators to ensure that the rate is fair, reasonable, and based on actual MT output quality rather than on some arbitrarily lower rate “since we are using MT”. Also, if editing effort is used as a criterion for setting the rate, expect rates for Romance languages to be lower than those for languages that are tough for MT, such as Japanese and Korean.
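
The arithmetic behind that claim is simple: hourly earnings are the per-word rate times words per hour. The sketch below compares full translation with PEMT and computes the break-even PEMT rate, that is, the per-word rate at which post-editing earns exactly as much per hour as translating from scratch. All rates and throughput figures are made up for illustration and are not market data; the function names are hypothetical.

```python
def hourly_earnings(rate_per_word, words_per_hour):
    """Earnings per hour: per-word rate times throughput."""
    return rate_per_word * words_per_hour

def breakeven_pemt_rate(full_rate, baseline_wph, pemt_wph):
    """PEMT per-word rate at which hourly earnings equal those of full translation."""
    return full_rate * baseline_wph / pemt_wph

# Illustrative figures only, in an unspecified currency unit
full_rate, baseline_wph = 0.12, 500  # full translation: rate per word, words/hour
pemt_rate, pemt_wph = 0.08, 800      # offered PEMT rate and measured PEMT throughput

print(f"Full translation: {hourly_earnings(full_rate, baseline_wph):.2f} per hour")
print(f"PEMT:             {hourly_earnings(pemt_rate, pemt_wph):.2f} per hour")
print(f"Break-even PEMT rate: {breakeven_pemt_rate(full_rate, baseline_wph, pemt_wph):.3f} per word")
```

With these made-up numbers, the lower PEMT rate still earns more per hour (64.00 vs 60.00), and any PEMT rate above 0.075 per word beats full translation. The comparison only holds if the throughput figure reflects the actual MT output you will be editing.
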

Much of what I have covered here was presented in a Proz presentation that is still available as a video (slides with voice), for those who want to see and hear more detail on the points summarized in this post.

As a complete aside, this is for those who think that genetically modified foods are harmless. Here is a quote from a biotech company leader that you might want to consider the next time you eat corn from a US supermarket:
“We have a greenhouse full of corn plants that produce anti-sperm antibodies.” ~ Mitch Hein, president of Epicyte, a California-based biotechnology company.

And to end on a cheery note, I was very impressed by the musicality of this song and thought others might want to hear it too.