Friday, November 9, 2012

Understanding Post-Editing

MT continues to build momentum because the need for large global enterprises to make more information available faster continues relentlessly. Some still question the “content tsunami”, but we are now getting data points that define it in very specific terms for industry players. For example, last week at AMTA 2012, a senior Dell localization professional gave us a specific data point: Dell has increased its volume of business and product-related translation from 30 million words to 60 million words in two years. This was done without any increase in the translation budget. This situation is mirrored across the information technology industry, and many companies with information-intensive products now apparently realize that translating large amounts of product/service-related information enhances global business success. Given the speed and volume of information creation, it is often necessary and perhaps even imperative to use technology like MT.

While much of the discussion about MT tends to get bogged down in linguistic quality issues, we should all remember that, ultimately, the whole point of business translation and the whole localization industry is to facilitate cross-language trade and commerce. We now have many examples where the final customers of Dell, Microsoft, and Apple say that machine-translated content is more than acceptable, even though this same content would fail a linguistic review in a typical Translate-Edit-Proof (TEP) process. We see terms like linguistic usability and readability being applied to translated content that often falls short of the TEP quality that many of us have grown accustomed to or expect. Customer expectations change, and free online MT has made MT more acceptable. We also understand that much of the content being translated is created by writers who are not really writers, for readers who do not have scholarly expectations of this content. There is content that requires TEP rigor, there is some that can be raw MT, and there is much in between with various shades of grey. This is not acquiescence to crappy quality across the board; rather, it is an understanding that for a lot of business translation, MT or PEMT does produce quality that helps to accomplish the business goal of getting information to customers in a cost-effective and timely manner.

Thus, we see the growing use of MT in business translation contexts, but there is still a lot of misinformation. It is useful to share more information about successful practices so that the use and adoption of this technology is more informed and the discussions can become more dispassionate and pragmatic.

I participated in some recent sessions at the ProZ Virtual Conference that explored what post-editing MT is about, and I thought they might be interesting to highlight in this post.

The integrated audio/video session is available on the ProZ site by clicking on the link to the left and playing the presentation back on the “low-bandwidth” image towards the bottom of the page. (Unfortunately, the live session had many video resolution issues but the recording is fine.) I have also included the Slideshare below for those who just want to see the basic content of the presentation. Hopefully, this presentation does provide a more realistic perspective on what is and is not possible with MT.

A second session included a panel discussion on post-editing with speakers from various translation agencies talking about their direct experiences with post-editing MT (PEMT).
PEMT is an important issue to understand as there are very strongly felt opinions on this (many based on actual bad experiences) but the signal-to-noise ratio is still very poor. Many translators feel that the work is demeaning and are not interested in doing it, and practitioners should understand this. However, much of the negative feedback is based on early practices where the MT quality was very bad and translator/editors were paid unfairly for the effort involved. Some recent feedback from TAUS even suggests that many translators are considering leaving the profession because they do not enjoy this type of work. Better MT and fair compensation practices can address some of this dissatisfaction. While early experiences often focus only on the most mechanical aspects of PEMT, I think there is an opportunity for the professional translation industry to get more engaged in solving different kinds of multilingual problems, e.g. support chat and customer forum discussions, where translation could greatly enhance global customer satisfaction and increase dialogue and engagement.
I think that we as an industry could further our prospects and greatly reduce the emotional content in the debate and discussion by getting better definitions of quality across the spectrum of content that is worth translating to facilitate commerce. Competent TEP and raw online free MT are two opposite ends of the quality spectrum, and it would be useful to get better definitions of the useful quality levels for the variety of grey shades in between, preferably in terms that are meaningful to the consumers of that content rather than in terms of linguistic errors.
In the PEMT context, it would be useful for both translation agencies and translator/editors to better understand the specific MT output quality involved, so that compensation structures can be set more rationally and equitably. This quality assessment, I believe, is an opportunity for translators to develop measures that link the quality of specific MT output to their compensation on a project-by-project basis. My previous post suggests one such approach, but I am sure there are many other ways that translators can rapidly assess the scope and difficulty of a PEMT task and help the agencies and the buyers understand equitable compensation structures based on trusted measurements of the scope of work.
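As one illustration of what such a measure might look like, here is a minimal sketch that scores a sample of MT output by how many word-level edits were needed to turn it into the post-edited version. This is not the approach from the previous post; the function names and the choice of a plain word-level edit distance are assumptions made purely for illustration, and a real assessment would need a representative sample and agreed thresholds.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two word lists (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, 1):
        curr = [i]
        for j, word_b in enumerate(b, 1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete a word from the MT output
                            curr[j - 1] + 1,      # insert a word into the MT output
                            prev[j - 1] + cost))  # keep or substitute a word
        prev = curr
    return prev[-1]


def post_edit_rate(mt_output, post_edited):
    """Word edits per word of the post-edited text (hypothetical effort measure)."""
    mt_words = mt_output.split()
    pe_words = post_edited.split()
    if not pe_words:
        return 0.0 if not mt_words else 1.0
    return word_edit_distance(mt_words, pe_words) / len(pe_words)


# One missing word in a six-word sentence gives a rate of 1/6.
rate = post_edit_rate("the cat sat on mat", "the cat sat on the mat")
print(round(rate, 3))
```

A low rate on a trial sample suggests light editing and a high rate suggests something closer to retranslation, which is the kind of evidence a per-project compensation discussion could start from.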
There is also a growing discussion on what an ideal PEMT environment looks like, and Jost Zetzsche provided some clues in the 213th edition of his newsletter. But basically, we need tools that provide some different context, since MT errors are not quite the same as TM fuzzy match errors. Perhaps some of the frustration that translators have stems from expecting to see the same type of errors as they see in low fuzzy matches. I would suggest the following for an ideal PEMT environment:
  • Rapid Error Detection (Grammar and Spelling Checkers)
  • Rapid Error Correction (e.g. Move word order, global correct and replace)
  • Dictionary and Terminology DB links
  • Error Pattern Identification so that hundreds of strings can be corrected by correcting a pattern
  • Quality measurement utilities to assess specific and unique MT output
  • Productivity measurement tools
  • Context as well as individual segment handling
  • Tight integrations with TM
  • Linguistic data manufacturing capabilities to create corrective data
  • Regex and Word Macro-like capabilities
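To make the error-pattern and regex items above more concrete, here is a small sketch of how a recurring MT error, once identified, could be corrected across many segments in one pass. The rules and sample segments are hypothetical, and real rules would have to be derived from the actual output of a specific MT system.

```python
import re

# Hypothetical correction rules: each pairs a pattern for a recurring
# MT error with its replacement.
RULES = [
    (re.compile(r"\b(\w+) \1\b"), r"\1"),  # accidentally doubled word
    (re.compile(r"\s+([,.;:])"), r"\1"),   # stray space before punctuation
]


def apply_rules(segments, rules=RULES):
    """Apply every rule to every segment.

    Returns the corrected segments and, for each rule, the number of
    segments it changed.
    """
    counts = [0] * len(rules)
    corrected = []
    for segment in segments:
        for i, (pattern, replacement) in enumerate(rules):
            segment, hits = pattern.subn(replacement, segment)
            if hits:
                counts[i] += 1
        corrected.append(segment)
    return corrected, counts


segments = [
    "Open the the settings menu .",
    "Select the file , then press Enter .",
]
fixed, counts = apply_rules(segments)
print(fixed)   # ['Open the settings menu.', 'Select the file, then press Enter.']
print(counts)  # [1, 2]
```

The per-rule counts give a rough view of which error patterns dominate a batch, which could also feed the quality and productivity measurements listed above.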