Thursday, March 22, 2012

Exploring Issues Related to Post-Editing MT Compensation

As the practice of post-editing MT (PEMT) continues to gain momentum and perhaps even some acceptance as a legitimate practice, questions continue to be raised about how to do this in a way that is equitable and beneficial to all the stakeholders. There was an interesting discussion on LinkedIn on this subject, where it is possible to see the perspectives of tools developers, LSPs, clients, and even some translators in their own words. Some of the things that stand out from this discussion are the general lack of trust between constituents in the translation production chain, the inability of stakeholders to share and take on operational risk, and the difficulty of defining critical elements in the process, e.g. MT/final translation quality and accurate productivity implications.

The issue of equitable compensation for post-editors is an important one, and it is essential to understand the issues related to post-editing that many translators find to be a source of great pain and inequity. MT can often fail or backfire if the human factors underlying the work are not properly considered and addressed.

From my vantage point, it is clear that those who understand these various issues and take steps to address them are most likely to find the greatest success with MT deployments. These practitioners will perhaps pave the way for others in the industry and “show you how to do it right”, as Frank Zappa says. Many of the problems with PEMT are related to ignorance about critical elements, “lazy” strategies, lack of clarity on what really matters, or simply using MT where it does not make sense. These factors result in the many examples of poor PEMT implementations that antagonize translators.

Some of the key elements that need to be understood or implemented to maximize the probability of successful PEMT outcomes include:
  • Customize your MT engine for your domain requirements; MT engines generally make the most sense if you do repeat or ongoing work in the same domain and language. Be wary of any MT vendor or LSP who assures you that “for a nominal service charge you could reach nirvana tonight.” There are no instant solutions and few shortcuts to doing this properly.
  • Use objective, repeatable, and (mostly) transparent measurements of efficiency and quality that are trusted by key stakeholders, e.g. SAE J2450.
  • A good understanding of the cost structure and efficiency of the pre-MT translation production process (Human TEP = Translate, Edit, Proof). If you don’t understand where you are, how will you know which direction is forward? It makes little sense to deploy MT if you cannot improve upon the old process in some meaningful way, i.e. in timeliness, cost, or quality. Tradeoffs will need to be made, as it is not possible to improve all three of these elements.
  • An understanding of the “average translation quality” of the MT engine output. This can be determined at the outset with sample-based tests, and the results are useful input for establishing fair rates for the full project. MT engines that produce higher quality will require less effort to reach a level where the final delivered translation is equivalent to one produced by a standard TEP process. Really good engines can produce an average output segment that looks like an 85% (or better) fuzzy match from translation memory. Such systems will also produce a large number of 100% matches for new segments; these still need to be verified, and editors need to be compensated appropriately for this validation. Learn how to interpret and link measures like J2450, BLEU, TER, and Edit Distance to create your own unique measurements, so that you can quickly understand what you are dealing with. Badly done, these metrics are a black hole of misunderstanding and wrong conclusions; remember that automated metrics are only as good as their users' understanding of them. Human assessments are ALWAYS needed and are always used in successful PEMT case studies. If a survey of 90 interested users in March 2012 is to be believed, “over 80% of MT users have no reliable way of measuring MT quality”. If this is indeed true, it surely explains why translators are so outraged and why so much PEMT yields less than satisfactory results.
  • An understanding of the target translation quality level. Interestingly, the easiest case to define is one where a client requires the same quality as they get from a standard TEP process. MT in this case is a draft version of the translation step of the TEP process and will still require the EP (edit and proof) steps. Expect your EP costs to rise as your T costs fall. It is much harder to define the quality level when MT output will only be “slightly” edited for “understandability” in very high-volume knowledge-based projects. “Slightly” or “lightly” is very hard to define and even harder for a translator to understand. Studies (see the Sharon O’Brien links below) have shown that translator productivity is lower with this type of task than with one where the target is TEP quality. It is important to provide many examples of what is required. In these high-volume cases, it may be more useful to follow the 80/20 rule and focus 80% of the post-editing effort on the 20% of content that is most important. This is often best done through corpus analysis to define where human attention should go, and then compensating editors either for the corrections they make or at a fair hourly rate, i.e. the rate they would earn on average for TEP work.
  • An understanding of the effort required to raise the MT output to the target level. Once you understand your average MT output quality and have a clear target, it is possible to estimate the post-editing effort. This should be the key determinant of post-editor compensation. If you wish to build a long-term relationship with post-editors, it would be wise to compensate them fairly. Thus, if a system consistently raises translation production efficiency, I would recommend compensating editors at a rate that ensures their net income is higher than it would be in a TEP process. (So easy for me to say.) The proper rate can only be learned through experience, so there are few useful generalizations to be made. The quality of your measurement systems really matters here and can help you get to the “right” (win-win) rate faster. Also, it is probably better to err on the side of over-paying rather than under-paying, as shown in these completely hypothetical examples:
    • Average TEP rate 15 cents, Average Daily Translation Output 2,500 words = $375 per day
    • MT Engine 1: Average Post Edit Translation Output 7,000 words, Average Rate 7.5 cents = $525 per day
    • MT Engine 2: Average Post Edit Translation Output 5,000 words, Average Rate 10 cents = $500 per day
    • MT Engine 3: Average Post Edit Translation Output 4,000 words, Average Rate 12 cents = $480 per day
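The arithmetic behind these hypothetical examples is simple enough to sketch in a few lines of Python, which also makes it easy to plug in your own measured figures. The `Scenario` class and `break_even_rate_cents` function are illustrative names, not part of any tool. The break-even rate is simply the TEP daily income divided by the post-editing throughput, i.e. the lowest per-word rate at which post-editing pays no less per day than TEP work.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One production scenario: average words per day and per-word rate in cents."""
    name: str
    words_per_day: int
    rate_cents: float

    def daily_income(self) -> float:
        # words/day x cents/word, converted from cents to dollars
        return self.words_per_day * self.rate_cents / 100


def break_even_rate_cents(baseline: Scenario, words_per_day: int) -> float:
    """Per-word rate (cents) at which post-editing income equals the baseline TEP income."""
    return baseline.daily_income() * 100 / words_per_day


# Hypothetical figures from the examples above
tep = Scenario("TEP", 2500, 15)
engines = [
    Scenario("MT Engine 1", 7000, 7.5),
    Scenario("MT Engine 2", 5000, 10),
    Scenario("MT Engine 3", 4000, 12),
]

if __name__ == "__main__":
    print(f"{tep.name}: ${tep.daily_income():.2f} per day")
    for e in engines:
        income = e.daily_income()
        floor = break_even_rate_cents(tep, e.words_per_day)
        status = "above" if income > tep.daily_income() else "not above"
        print(f"{e.name}: ${income:.2f} per day "
              f"(break-even rate {floor:.2f} cents, {status} TEP income)")
```

With the figures above, the three engines yield $525, $500, and $480 per day against $375 for TEP, and the break-even rates work out to roughly 5.36, 7.5, and 9.38 cents per word, so each hypothetical rate sits comfortably above its floor.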
Omnilingua is an example of a company that has long-term experience (5+ years) with PEMT and has developed sophisticated processes and methodology to understand this gap and the human effort involved with rare precision. They have been committed users of SAE J2450 for many years now, and thus understand quality and productivity in unusual detail. You can see a video presentation of the Omnilingua approach in their own words starting at 31:00. It is my opinion that very few LSPs can make this PEMT effort assessment with any precision. This is where superior LSPs will excel, and this competence should become a clear differentiator in the future. (Ask your LSP the question: “How much effort is needed to make the MT output indistinguishable from standard TEP?” and watch them fidget around a bit, as FZ says.)
Remember also that, most often, starting with a good professionally developed engine will produce better ROI than starting with quick and dirty DIY options that require much more post-MT labor to raise the output to target levels.
  • Expect and plan for a learning curve, and develop post-editor training materials. MT requires investment in engine development, and in process learning while measurement systems fall into place. However, once this new process is understood, success is possible, even with tough languages like Hungarian, as Hunnect has shown with its training program. Not all translators are interested in post-editing; it is important to determine this early and then provide guidance to those who are interested and best suited to this kind of translation work.
  • While accurate quality measurements are important, it is also critical to understand productivity impacts in as much detail as possible over time. Best practice suggests monitoring the use of MT through the various learning stages to best understand the financial and productivity impact. This may not be the same for every language, as MT does not work equally well in all languages. Some MT systems will continuously improve and some will not. LSPs will need to decide where they should invest: MT technology, measurement systems and processes, PEMT training and new workflow, and/or solving new translation problems like customer support chat. It is unlikely that many will be able to do it all, and the overall complexity and time taken to achieve mastery of all these new initiatives should not be underestimated.
  • Involve some translators in the MT engine steering process to identify major error patterns. This has been shown to produce much more useful systems and higher productivity once you go into production. These translators can also help establish meaningful, trusted links between raw MT quality and reasonable translator productivity expectations.
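To make the Edit Distance and TER measures mentioned in the list above more concrete, here is a minimal Python sketch that compares a raw MT segment with its post-edited version. It computes a word-level Levenshtein distance and normalizes it in two ways: per word of the post-edited text (a TER-like edit rate, but without the block shifts that real TER counts, so it overstates effort when a phrase was merely moved), and as a rough fuzzy-match-style percentage. The function names, the naive whitespace tokenization, and the similarity formula are all assumptions for illustration, not the scoring used by any particular CAT tool or by J2450.

```python
from __future__ import annotations


def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Word-level Levenshtein distance: each insertion, deletion, or substitution costs 1."""
    prev = list(range(len(ref) + 1))  # distance from an empty hyp prefix to each ref prefix
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete word h from the MT output
                curr[j - 1] + 1,         # insert word r
                prev[j - 1] + (h != r),  # substitute h with r (free when identical)
            )
        prev = curr
    return prev[-1]


def edit_rate(mt_output: str, post_edited: str) -> float:
    """Edits per post-edited word: TER-like, but WITHOUT TER's block-shift operation."""
    hyp, ref = mt_output.split(), post_edited.split()  # naive whitespace tokenization
    return word_edit_distance(hyp, ref) / max(len(ref), 1)


def similarity_percent(mt_output: str, post_edited: str) -> float:
    """Rough fuzzy-match-style score (assumed formula, not any CAT tool's exact one)."""
    hyp, ref = mt_output.split(), post_edited.split()
    longest = max(len(hyp), len(ref))
    if longest == 0:
        return 100.0
    return 100 * (1 - word_edit_distance(hyp, ref) / longest)


if __name__ == "__main__":
    mt = "the cat sat in the mat"
    pe = "the cat sat on the mat"
    print(word_edit_distance(mt.split(), pe.split()))
    print(f"{edit_rate(mt, pe):.2f}")
    print(f"{similarity_percent(mt, pe):.1f}%")
```

Changing "in" to "on" in this six-word segment is one edit, an edit rate of about 0.17 and a similarity of about 83%. An unchanged segment scores zero edits, yet, as the list above points out, someone still had to read and verify it, which is why such scores should feed into rates alongside human assessment rather than replace it.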
Success will still require collaboration and trust (which rarely exist) between corporate customers, tool vendors, LSPs, and translators. The stakeholders will also all need to understand that the nature of MT requires a higher tolerance for “outcome uncertainty” than most are accustomed to. Though it is increasingly clear that domain-focused systems in Romance languages are more likely to succeed with MT, it is often unclear a priori how good an MT engine will be, and investments need to be made to reach the point where this can be understood. The stakeholders all need to understand this, work together, and make concessions and contributions to make this happen in a mutually beneficial way. This is, of course, easier said than done, as somebody usually has to put some money down to begin this process. The reward is long-term production efficiency, so hopefully buyers are willing to fund this rather than go the fast and dirty MT route, as some have been doing.

I hope we have all reached a point where we understand that arbitrarily setting lower pay rates for MT-related cleanup work is unwise and that the lowest initial cost of building MT engines is rarely the best TCO (total cost of ownership) with MT technology. MT in 2012 is still very complex and requires real financial and intellectual investment to build real competency. 

I suspect that the most compelling evidence of the value and possibilities of PEMT will come from LSPs who have teams of in-house editors/translators on fixed salaries, who are thus less concerned about word vs. hourly compensation issues. For these companies, it will only be necessary to first prove that MT is producing high enough quality to raise productivity, and then ensure that everybody is working as efficiently as possible (i.e., not “over-correcting”). I would bet that these initiatives will outperform any in-house corporate MT initiative in quality and efficiency.

I have seen that there are LSPs out there that know how to build the eco-system to make this a win-win scenario for all stakeholders so I know it is possible, even though it is not very common in 2012. In these win-win examples, the customer and the LSP understand the risks, and post-editors are paid more when the engine is not great and less when it is. Quality and productivity-related information flows freely in the production chain and is trusted, and often translators are compensated for higher productivity. Thus, I think there are three basic principles to keep in mind in developing fair and equitable compensation practices:
  1. Measure productivity and quality accurately, frequently, and objectively and share critical findings. Ensure that the links between MT quality and productivity expectations are realistic.
  2. Train and select the right people for post-editing work and monitor progress carefully, especially in the early stages.
  3. Link compensation to the effort required to complete the job, which means you need to have some understanding of this effort. Not all PEMT work is equal; when uncertain about the correct rates, initially err on the side of overpaying rather than underpaying to build a loyal workforce.
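Principle 3 can be sketched as a simple formula: pick a target daily income (the TEP baseline plus an uplift, erring toward overpaying), then divide it by the post-editing throughput measured in a pilot. The function name and the 10% default uplift below are hypothetical choices for illustration, not figures recommended in this post. The point of the sketch is that a weaker engine (lower measured throughput) automatically earns a higher per-word rate, so post-editors are paid more when the engine is not great and less when it is.

```python
def post_edit_rate_cents(
    tep_rate_cents: float,
    tep_words_per_day: float,
    pe_words_per_day: float,
    uplift: float = 0.10,  # hypothetical margin above TEP income; err on the high side
) -> float:
    """Per-word post-editing rate (in cents) that keeps daily income at TEP level plus `uplift`.

    The rate is tied to measured effort: lower post-editing throughput (a weaker
    engine) yields a higher per-word rate, and vice versa.
    """
    if pe_words_per_day <= 0:
        raise ValueError("post-editing throughput must be positive")
    target_daily_cents = tep_rate_cents * tep_words_per_day * (1 + uplift)
    return target_daily_cents / pe_words_per_day


if __name__ == "__main__":
    # Hypothetical pilot measurements (post-edited words/day) for three engines
    for pe_words in (7000, 5000, 4000):
        rate = post_edit_rate_cents(15, 2500, pe_words)
        print(f"{pe_words} words/day -> {rate:.2f} cents/word, "
              f"${rate * pe_words / 100:.2f} per day")
```

With a 15-cent TEP rate and 2,500 words per day, a pilot showing 5,000 post-edited words per day gives 8.25 cents per word (about $412.50 per day). If the computed rate approaches or exceeds the full TEP rate, the measurements are telling you that MT is not raising productivity on this job.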
The LinkedIn discussion goes into many more details and is worth a look for a broader and more varied perspective on the post-editor compensation issue. It would be wonderful to hear other perspectives on this in the comments. Practitioners, especially LSPs, should understand that the real benefit of making these investments is long-term cost and productivity advantages that are sustainable and defensible. Apart from the investment of time and money, however, this requires “hard work”, as George Bush said, and has a learning curve. Finally, I would warn you that we live in a time of Moses Madness, and many yearn for quick fixes that cost nothing. These quick fixes can often backfire, and we should heed the wise words of Frank Zappa in the song Cosmik Debris:
The Mystery Man came over
And he said: "I'm outa-site!"
He said, for a nominal service charge,
I could reach nirvana tonight

If I was ready, willing 'n able
To pay him his regular fee
He would drop all the rest of his pressing affairs
And devote his attention to me
But I said . . .
Look here brother,
Who you jivin' with that Cosmik Debris?

For those interested, here are some other references that may be useful for understanding PEMT issues from other perspectives:


  1. As usual, words of wisdom, Kirti.

    Personally, I think that the way out of these discussions is to base payments on the edit distance between the MT output and the final translated product. It is the perfect measurement of the effort required to "upgrade" a translation. But there are two challenges with this approach:

    • You have to assume good faith in all the actors, but that should be a given in any long-standing business relationship.
    • There is no certainty about the price of a job until it is completed. This lack of initial precision may make some uncomfortable, but it can be kept at bay with regular communication on big engagements.

    You could say that it is a roll of the dice, but in my mind it is more just than making assumptions about productivity and hoping that they pan out.
    Posted by Pedro Gomez

  2. Pedro

    I agree.

    While it may be possible to make reasonably accurate good-faith estimates/guesses of the level of effort required to complete a project, it is also possible that these estimates are not correct. Thus it is sometimes necessary to have a reconciliation after a project is completed and make compensation and cost adjustments as necessary.

    This requires cooperation and trust, and wherever these exist, much is possible.

  3. Good article. MT and post-editing are here to stay.
    I think it is like the steam engine during the first industrial revolution. We translators either need to learn to live with it, find good companies to work with and adapt, or we will be run over (smile).
    But I also believe that the effectiveness of post-editing and the cost connected with it will vary substantially from language to language, depending on grammatical complexity, size of the memory, etc.
    Slavic languages will be especially challenging, but I am interested in talking to a good company working with MT and post-editing in my languages.

    Posted by Radovan Pletka

  4. Udi Hershkovich • I subscribe to your views, Kirti. The point is simple: if the whole point of highly effective machine translation is to dramatically increase translators' productivity, the price should be a factor of the productivity gain. For example, if MT allows a translator to take on twice the work in a given period of time, the price per word should be somewhere between 50% and 100% of the normal fee. This means that, using MT, translators should always see the opportunity to INCREASE their income or reduce the amount of work needed to maintain a similar level of income. In the 60%-of-normal-price example, productivity has to be close to double to make it worth anyone's time.

    It boils down to the output quality of highly effective MT solutions. Now it is only a matter of certifying MT solutions with a score that would indicate to translators the productivity factor they should expect, so they know "how low can they go..." Of course, there would be variance across projects depending on all the familiar factors I will not repeat here, but on average, MT MUST produce a substantial productivity gain that can be measured in SIMPLE TERMS (such as average words post-edited per day/hour) as a baseline for determining the right cost for the service.

    Bottom line, if a translator cannot do more in less time using MT, no price is good enough!

  5. @Udi, I agree with you. If you cannot achieve productivity gains, there is little point in using MT at all.

    However, there may be cases where some translator feedback is required in the early stages of developing the MT system, and in those cases translators should be compensated at a fair rate, no less than what they would earn for the same time doing regular TEP work.

    Ideally, translators would be involved in making the productivity assessments so that the rates are trusted and well understood. Translators should run tests and establish a productivity/quality score of some kind that could be linked to a PEMT rate.

  6. Martin Wunderlich • How about simply charging an hourly or daily rate? A rate that is independent of the task, be it translation, post-editing, file preparation, project management, etc. That way one could get rid of the percentage and fuzzy rating system, one wouldn't have to differentiate between good or bad MT output, and it would bring the profession more in line with professional services standards in other areas, such as consulting, software development, law, etc., raising the profile of the industry as a side effect.

    (Of course, there is a lot to be said against such an approach: clients would not agree to it, because they wouldn't know upfront how much a job will cost; the organisational structures on the supplier side are too weak to introduce such a change; some translators benefit from MT/TM systems without having to pass on the price benefit, so they would see reduced income with an hourly rate, etc.)

    I apologize to Kirti in advance, if my post is perhaps not a direct answer to his query, but I feel this aspect is quite relevant in the context of MT post-editing pricing. After more than 10 years in this industry, I am still sometimes surprised that translation services are priced like a mass-produced product, such as screws, and not like the professional service that it is.

  7. No reason to apologize. Agreed-upon hourly rates may in fact be a way to solve this, but they also introduce upfront uncertainty about how much a project will cost, which again leads back to needing good metrics to make these assessments or estimates before a project begins, so that cost/time estimates are roughly agreed upon or understood.

  8. Donna Furlani • I agree with you, Martin. I price editing jobs (whether post-editing, editing/proofreading/verifying human translations, editing English written by non-native speakers, or really anything other than straightforward translation) at an hourly rate. So far, though, I've only done this for clients I've worked with in the past, so they knew my work and trusted that I wasn't going to inflate the bill or take advantage in any other way. I did recently hear of an agency that's only using ATA-certified translators both to translate and review each translation, and they're asking translators to commit to a per-word rate not only for translation jobs, but also (at a lower rate, obviously) for proofreading. Given that you've got a rough guarantee of high quality due to their selectivity in translators, I think this could work, but I have no firsthand experience. I think it's a comparable situation except for one key point: MT output quality varies greatly depending on the language combination. If I were to offer a much lower per-word rate for post-editing French-to-English output than for German-to-English output from the same system, would clients then wonder what was going on? Charging an hourly rate does away with this issue.

  9. I agree with Kirti that post-editing payment should always be linked to the quality of the MT output, and that this should be assessed first (I think even an automatic score, properly done, should give a clear indication); of course, the expected quality is another key factor. Since translators show large differences in productivity gain from any tool (as most research shows), be it MT or TM, it is difficult to put a price tag on it that will please everyone, and hourly compensation (without any associated productivity metric) is just too open-ended for the person buying the service, as translators will give very different prices, not because they are trying to "take advantage" but because their performance differs greatly. Confidence scores for MT should partially solve the issue if these scores are done properly, but I think this is not fully developed yet.
    I also agree that with high-quality MT output the productivity could be equal to that of a high fuzzy match, because with MT there is a possibility of perfect matches that require no change at all, so the mixture of proposals that need editing with those that do not require any change can be beneficial. On the other hand, if the quality of the MT output is not high, the time required to edit will probably be even longer than translating from scratch (unless the terminology is very up to date). Of course, if a translator is paid 30% of the rate for reviewing, TM (except 100%) and MT matches should always be paid at least more than this. Even with very high-quality output, a translator should probably be paid between 60 and 70 percent of the rate to compensate for the time spent reading, comparing to the source, making the necessary adjustments, and revising before finalizing the segment, and this should also apply to TMs. There are many factors, however, such as terminology and the general quality of the TM and the output. The ideal, right now, is to set prices according to the original quality of the material, after doing an initial test of the quality that will be given to the translators, and the expected quality that is wanted from them. A flat rate across all domains, languages, and engines is just not reasonable (possibly not even for TMs).

    Posted by Ana Guerberof

  10. I found the article very interesting. I've had a little experience with post-editing MT output, and the results vary. I agree that sometimes MT is used where it isn't appropriate or, at least, where it should be understood as the first stage of a phased approach. I think that some unrealistically view MT as a holy grail of sorts. It will get better but will always require some human intervention to ensure quality.

    Posted by Victor Foster

  11. it's all very interesting .... very technical ...
    but a factor is tragically missed IMHO: the Human Factor

    we translators are humans, and as long as we remain humans (maybe one day we will all become cyborgs, making language providers very very happy, having to bill oil for us instead of dollars) you and every PEMT (or whatever acronym) provider should understand that working (translating or building cars) is not only a matter of the number of pieces built and dollars

    when allowed, I use MT in my job to save typing time, but I mainly still use my hardware (brain, experience) so I'm not bored doing so

    on the other hand, testing PEMT (I'm an MT tester for LW, and luckily I haven't received a job in months) or doing PEMT (I passed the test for a PEMT provider, and luckily again I haven't received any job so far), I find the job so boring that I cannot believe a real "human" can do that job every day, 8 hrs/day, without falling to the ground like someone bitten by the tsetse fly

    and there is no "high rate" (if any, and I know that rates are low) that can raise you from the dead, not to mention that working in a trance-like state makes you more prone to mistakes

    so I think this industry should focus on adding MIRTH to the process (even if I think it will be easier to build transcyborgs ;-)