Thursday, March 22, 2012

Exploring Issues Related to Post-Editing MT Compensation

As the practice of post-editing MT continues to gain momentum, and perhaps even some acceptance as a legitimate practice, questions continue to be raised about how to do this in a way that is equitable and beneficial to all the stakeholders. There was an interesting discussion on LinkedIn on this subject where it is possible to see the perspectives of tools developers, LSPs, clients and even some translators in their own words. Some of the things that stand out from this discussion are the general lack of trust between constituents in the translation production chain, the inability of stakeholders to share and take operational risk, and the difficulty in defining critical elements of the process, e.g. MT and final translation quality, and accurate productivity implications.
The issue of equitable compensation for post-editors is an important one, and it is important to understand the issues related to post-editing that many translators find to be a source of great pain and inequity. MT can often fail or backfire if the human factors underlying the work are not properly considered and addressed.

From my vantage point it is clear that those who understand these various issues and take steps to address them are most likely to find the greatest success with MT deployments. These practitioners will perhaps pave the way for others in the industry and “show you how to do it right” as Frank Zappa says. Many of the problems with PEMT are related to ignorance about critical elements, “lazy” strategies, lack of clarity on what really matters, or simply using MT where it does not make sense. These factors result in the many examples of poor PEMT implementations that antagonize translators.

Some of the key elements that need to be understood or implemented to maximize the probability of successful PEMT outcomes include:
  • Customize your MT engine for your domain requirements. Generally, MT engines make the most sense if you do repeat or ongoing work in the same domain and language. Be wary of any MT vendor or LSP who assures you that “for a nominal service charge you could reach nirvana tonight”. If you do this properly there are no instant solutions and few shortcuts.
  • Use objective, largely transparent and repeatable measurements of efficiency and quality that are trusted by key stakeholders, e.g. SAE J2450.
  • A good understanding of the cost structure and efficiency of the pre-MT translation production process (Human TEP = Translate, Edit, Proof). If you don’t understand where you are, how will you know what direction is forward? It makes little sense to deploy MT if you cannot improve upon the old process in some meaningful way, i.e. timeliness, cost or quality. Tradeoffs will need to be made as it is not possible to improve all three of these elements.
  • An understanding of the “average translation quality” of the MT engine output. This can be determined at the outset by sample-based tests, which are useful input for establishing fair rates for the full project. It should be understood that MT engines that produce higher quality will require less effort to get to a level where the final delivered translation is equivalent to that produced from a standard TEP process. Really good engines can produce an average output segment that looks like an 85% fuzzy match or better from translation memory. This kind of system will also produce a large number of 100% matches for new segments, which still need to be verified, and editors need to be compensated appropriately for this validation. Learn how to interpret and link measures like J2450, BLEU, TER and Edit Distance to create your own unique measurements so that you can quickly understand what you are dealing with. Badly done, these metrics are a black hole of misunderstanding and wrong conclusions. Remember that automated metrics are only as good as the user's understanding of them. Human assessments are ALWAYS needed and always used in successful PEMT case studies. If a survey of 90 interested users in March 2012 is to be believed, “over 80% of MT users have no reliable way of measuring MT quality”. If this is indeed true, it surely explains why translators are so outraged and why so much of PEMT yields less than satisfactory results.
  • An understanding of the target translation quality level. Interestingly, the easiest case to define is one where a client requires the same quality as they have from a standard TEP process. MT in this case is a draft version of the translation step of the TEP process and will still require the EP, or edit and proofing, steps. Expect your EP costs to rise as your T costs fall. It is much harder to define the quality level when MT output will only be “slightly” edited for “understandability” in very high-volume, knowledge-based projects. “Slightly” or “lightly” is very hard to define and even harder for a translator to understand. Studies (see the Sharon O’Brien links below) have shown that translator productivity is lower with this type of task than with one where the target is TEP quality. It is important to provide many examples of what is required. In these high-volume cases it may be more useful to follow the 80/20 rule and focus 80% of the post-edit effort on the 20% of content that is most important. Often this is best done by using corpus analysis to define the human focus, and then compensating editors either for the corrections they make or at a fair hourly rate, i.e. at a rate they would make on average for TEP work.
  • An understanding of the effort required to raise the MT to the target level. Once you understand your average MT output quality and have a clear target, it is possible to make an estimate of the post-editing effort. This should be the key determinant of post-editor compensation. If you wish to build a long-term relationship with post-editors it would be wise to compensate them fairly. Thus if a system raises translation production efficiency consistently, I would recommend that you compensate editors at a rate that ensures their net income is higher than it would be in a TEP process. (So easy for me to say.) The proper rate can only be learned through experience so there are few useful generalizations to be made. The quality of your measurement systems really matters here and can help you get to the “right” (win-win scenario) rate faster. Also, it would probably be better to err on the side of overpaying rather than underpaying, as shown in these completely hypothetical examples:
    • Standard TEP: Average Daily Translation Output 2,500 words, Average Rate 15 cents = $375 per day
    • MT Engine 1: Average Post Edit Translation Output 7,000 words, Average Rate 7.5 cents = $525 per day
    • MT Engine 2: Average Post Edit Translation Output 5,000 words, Average Rate 10 cents = $500 per day
    • MT Engine 3: Average Post Edit Translation Output 4,000 words, Average Rate 12 cents = $480 per day
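The arithmetic behind these hypothetical examples can be sketched in a few lines of Python. This is only an illustration using the made-up figures above; the function names daily_income and break_even_rate are my own and are not taken from any tool or standard. The break-even rate is the per-word post-editing rate at which an editor earns exactly what they would earn at the TEP rate, so a fair rate under the reasoning above would sit somewhere above it.

```python
def daily_income(words_per_day, rate_cents_per_word):
    """Daily income in dollars for a given daily output and per-word rate."""
    return words_per_day * rate_cents_per_word / 100


def break_even_rate(tep_words, tep_rate_cents, pe_words):
    """Per-word post-editing rate (in cents) that matches TEP daily income."""
    return tep_words * tep_rate_cents / pe_words


# Hypothetical figures from the examples above: (words per day, cents per word)
TEP = (2500, 15.0)
ENGINES = {
    "MT Engine 1": (7000, 7.5),
    "MT Engine 2": (5000, 10.0),
    "MT Engine 3": (4000, 12.0),
}

baseline = daily_income(*TEP)
print(f"Standard TEP: ${baseline:.2f} per day")
for name, (words, rate) in ENGINES.items():
    income = daily_income(words, rate)
    floor = break_even_rate(TEP[0], TEP[1], words)
    print(
        f"{name}: ${income:.2f} per day "
        f"({income - baseline:+.2f} vs. TEP, break-even rate {floor:.2f} cents)"
    )
```

In all three hypothetical cases the post-editing rate is above the break-even rate, which is why the editor's daily income ends up higher than under TEP even though the per-word rate is lower.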
Omnilingua is an example of a company that has long-term experience (5+ years) with PEMT and has developed sophisticated processes and methodology to understand this gap and the human effort involved with rare precision. They have been committed users of SAE J2450 for many years now, and thus understand quality and productivity with distinctive precision. You can see a video presentation of the Omnilingua approach in their own words starting at 31:00. It is my opinion that very few LSPs can make this PEMT effort assessment with any precision. This is where superior LSPs will excel and this competence should become a clear differentiator in future. (Ask your LSP the question: “How much effort is needed to make the MT output indistinguishable from standard TEP?” and watch them fidget around a bit as FZ says.)
Remember also that, most often, starting with a good professionally developed engine will produce better ROI than starting with quick and dirty DIY options that require much more post-MT labor to raise the output to target levels.
  • Expect and plan for a learning curve and develop post-editor training materials. MT requires an investment in engine development and also in process learning while measurement systems fall into place. However, once this new process is understood it is possible to have success, even with tough languages like Hungarian, as Hunnect has shown with their training program. Not all translators are interested in post-editing and it is important to determine this early and then provide guidance to those who are interested and best suited to this kind of translation work.
  • While accurate quality measurements are important, it is also critical to understand productivity impacts in as much detail as possible over time. Best practice suggests that it is important to monitor the use of MT through the various learning stages to best understand the financial and productivity impact. This may not be the same for every language as MT does not work equally well on all languages. Some MT systems will continuously improve and some will not. LSPs will need to decide where they should invest: MT technology, measurement systems and processes, PEMT training and new workflow, and/or solving new translation problems like customer support chat. It is unlikely that many will be able to do it all, and the overall complexity and time taken to achieve mastery of all these new initiatives should not be underestimated.
  • Involve some translators in the MT engine steering process to identify major error patterns. This action has been shown to produce much more useful systems and higher productivity when you go into production. They can also help to establish meaningful and trusted links between raw MT quality and reasonable translator productivity expectations.
Having said all this, it will still require collaboration and trust (that rarely exists) between corporate customers, tool vendors, LSPs and translators. The stakeholders will also all need to understand that the nature of MT requires a higher tolerance for “outcome uncertainty” than most are accustomed to. Though it is increasingly clear that domain-focused systems in Romance languages are more likely to succeed with MT, it is often not clear a priori how good an MT engine will be, and investments need to be made to get to a point where this can be understood. The stakeholders all need to understand this, work together, and each make concessions and contributions to make this happen in a mutually beneficial way. This is of course easier said than done as somebody usually has to put some money down to begin this process. The reward is long-term production efficiency so hopefully buyers are willing to fund this, rather than go the fast and dirty MT route as some have been doing.

I hope we have all reached a point where we understand that arbitrarily setting lower pay rates for MT-related cleanup work is unwise, and that the lowest initial cost for building MT engines rarely gives the best TCO (total cost of ownership) with MT technology. MT in 2012 is still very complex and requires real financial and intellectual investment to build real competence.

I suspect that the most compelling evidence of the value and possibilities of PEMT will come from LSPs who have teams of in-house editors/translators who are on fixed salaries and are thus less concerned about the word vs. hourly compensation issues. For these companies it will only be necessary to prove first of all that MT is producing high enough quality to raise productivity, and then to ensure that everybody is working as efficiently as possible (i.e. not "over-correcting"). I would bet that these initiatives will outperform any in-house corporate MT initiative in quality and efficiency.

I have seen that there are LSPs out there that know how to build the eco-system to make this a win-win scenario for all stakeholders so I know it is possible, even though it is not very common in 2012. In these win-win examples, the customer and the LSP understand the risks, and post-editors are paid more when the engine is not great and less when it is. Quality and productivity related information flows freely in the production chain and is trusted, and often translators are compensated for higher productivity. Thus, I think there are three basic principles to keep in mind in developing fair and equitable compensation practices:
  1. Measure productivity and quality accurately, frequently and objectively and share critical findings. Ensure that the links between MT quality and productivity expectations are realistic.
  2. Train and select the right people for post-editing work and monitor progress carefully, especially in the early stages.
  3. Link the compensation to the effort required to complete the job, which means you need to have some understanding of this effort. Not all PEMT work is equal. When uncertain about the correct rates, initially err on the side of overpaying rather than underpaying to build a loyal work force.
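As a rough illustration of how post-editing effort might be measured objectively, here is a minimal Python sketch of a word-level edit rate between raw MT output and its post-edited version. It is an assumption-laden simplification: it counts insertions, deletions and substitutions only and ignores the phrase shifts that TER proper accounts for, the whitespace tokenization is naive, and the function names and the sample sentences are purely illustrative. A number like this is only one input and does not replace human assessment or a metric like J2450.

```python
def word_edit_distance(mt_words, pe_words):
    """Levenshtein distance over words (insert, delete, substitute)."""
    prev = list(range(len(pe_words) + 1))
    for i, mt_word in enumerate(mt_words, start=1):
        curr = [i]
        for j, pe_word in enumerate(pe_words, start=1):
            cost = 0 if mt_word == pe_word else 1
            curr.append(min(
                prev[j] + 1,         # delete a word from the MT output
                curr[j - 1] + 1,     # insert a word the editor added
                prev[j - 1] + cost,  # keep or substitute a word
            ))
        prev = curr
    return prev[-1]


def edit_rate(mt_segment, post_edited_segment):
    """Word edits per word of the post-edited segment (lower = less effort)."""
    mt_words = mt_segment.split()
    pe_words = post_edited_segment.split()
    if not pe_words:
        # Convention chosen for this sketch: empty vs. empty is no effort.
        return 0.0 if not mt_words else 1.0
    return word_edit_distance(mt_words, pe_words) / len(pe_words)


# Made-up example segment: the editor inserted a single word.
raw_mt = "the cat sat on mat"
post_edited = "the cat sat on the mat"
print(f"Edit rate: {edit_rate(raw_mt, post_edited):.2f}")
```

Averaged over a representative sample of segments, a figure like this could be tracked per engine and per language and shared with post-editors, which is the kind of transparent measurement the first principle calls for.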
The LinkedIn discussion goes into many more details and is worth a look to get a broader and varied perspective on the post-editor compensation issue. It would be wonderful to hear other perspectives on this in the comments. Practitioners, especially LSPs, should understand that the real benefit of making these investments is long-term cost and productivity advantages that are sustainable and defensible. This however requires “hard work” as George Bush said, apart from the time and money investment and has a learning curve. Finally, I would warn you that we live in a time of Moses Madness and many yearn for quick fixes that cost nothing. These quick fixes can often backfire and we should heed the wise words of Frank Zappa in the song Cosmik Debris:
The Mystery Man came over
And he said: "I'm outa-site!"
He said, for a nominal service charge,
I could reach nirvana tonight

If I was ready, willing 'n able
To pay him his regular fee
He would drop all the rest of his pressing affairs
And devote his attention to me
But I said . . .
Look here brother,
Who you jivin' with that Cosmik Debris?

For those interested, here are some other references that may be useful in understanding PEMT issues from other perspectives: