Friday, October 28, 2011

The Building Momentum for Post-Edited Machine Translation (PEMT)

This is an (opinionated) summary of interesting findings from a flurry of conferences that I attended earlier this month. The conferences were the TAUS User Conference, Localization World and tekom. Even though it is tiring to have so many so close together, it is interesting to see what sticks out a few weeks later. For me TAUS and tekom were clearly worthwhile, and Localization World was not, and I believe that #LWSV is an event that is losing it’s mojo in spite of big attendance numbers.

Some of the big themes that stand out (mostly from TAUS) were:
  • Detailed case studies that provide clear and specific evidence that customized MT enhances and improves the productivity of traditional (TEP) translation processes
  • The Instant on-demand Moses MT engine parade
  • Initial attempts at defining post-editing effort and difficulty from MemoQ and Memosource
  • A future session on the multilingual web from speakers who actually are involved with big perspective, global web-wide changes and requirements
  • More MT hyperbole
  • The bigger context and content production chain for translation that is visible at tekom
  • Post-editor feedback at tekom
  • The lack of innovation in most of the content presented at Localization World
 The archived twitter stream from TAUS (#tausuc11) is available here, the tekom tag is #tcworld11 and Localization World is #lwsv. Many of the TAUS presentations will be available as web video shortly and I recommend that you check some of them out.

PEMT Case Studies
In the last month I have seen several case studies that document the time and cost savings and overall consistency benefits of good customized MT systems. At TAUS, Caterpillar indicated that their demand for translation was rising rapidly and thus they instituted their famed controlled language (Caterpillar English) based translation production process using MT. The MT process was initially more expensive since 100% of the segments needed to be reviewed but they are now seeing better results on their quality measurements from MT than from human translators on Brazilian Portuguese and Russian according to Don Johnson, Caterpillar. They expect to expand to new kinds of content as these engines mature.

Catherine Dove of PayPal described how the human translation process got bogged down on review and rework cycles (to ensure PayPal brand’s tone and style was intact) and was unable to meet production requirements of 15K words per week with a 3 day turnaround in 25 languages. They found that “machine-aided human translation” delivers better, more consistent terminology in the first pass and thus they were able to focus more on style and fluency. Deadlines are easier to meet and she also commented that MT can handle tags better than humans. They also focus on source cleanup and improvement to leverage the MT efforts and interestingly the MT is also useful in catching errors in the authoring phase. PayPal uses an “edit distance” measurement to determine the amount of rework and have found that the MT process reduces this effort by 20% on 8 of 10 languages they are using MT on. An additional benefit is that there is a new quality improvement process in place that should continue to yield increasing benefits.

A PEMT user case study was also presented by Asia Online and Sajan at the Localization Research Conference in September 2011. The global enterprise customer is a major information technology software developer, hardware/IT OEM manufacturer, and comprehensive IT services provider for mission critical enterprise systems in 100+ countries. This company had a legacy MT system developed internally that had been used in the past by the key customer stakeholders. Sajan and Asia Online customized English to Chinese and English to Spanish engines for this customer. These MT systems have been delivering translated output that even beats the first pass output from their human translators due to the highly technical terminology, especially in Chinese.  A summary of the use case is provided below:
  • 27 million words have been processed by this client using MT
  • Large amounts of quality TM (many millions of words) and glossaries were provided and these engines are expected to continue to improve with additional feedback.
  • The customized engine was focused on the broad IT domain and was intended to translate new documentation and support content from English into Chinese and Spanish.
  • A key objective of the project was to eliminate the need for full translation and limit it to MT + Post-editing as a new modified production process.
  • The custom engine output delivered higher quality than their first pass human translators especially in Chinese
  • All output was proof read to deliver publication quality.
  • Using Asia Online Language Studio the customer saved 60% in costs and 77% in time over previous production processes based on their own structured time and cost measurements.
  • The client also produces an MT product, but the business units prefer to use Asia Online because of considerable quality and cost differences.
  • Client extremely impressed with result especially when compared to the output of their own engine.
  • The new pricing model enabled by MT creates a situation where the higher the volume the more beneficial the outcome.
The video presentation below by Sajan begins at 27 minutes (in case you want to skip over the Asia Online part) and even if you only watch the Sajan presentation for 5 minutes you will get a clear sense for the benefit delivered by the PEMT process.

A session on the multilingual web at TAUS by the trio Bruno Fernandez Ruiz, Yahoo! Fellow and Vice President, Bill Dolan, Head of NLP Research, Microsoft, Addison Phillips, Chair, W3C Internationalization Group / Amazon also produced many interesting observations such as:
  • The impact of “Big Data” and the cloud will affect language perspectives of the future and the tools and processes of the future need to change to handle the new floating content.
  • Future applications will be built once and go to multiple platforms (PC, Web, Mobile, Tablets)
  • The number of small nuggets of information that need to be translated instantly will increase dramatically
  • HTML5 will enable publishers to be much freer in information creation and transformation processes and together with CSS3 and Javascript can handle translation of flowing data across multiple platforms
  • Semantics have not proven to be necessary to solve a lot of MT problems contrary to what many believed even 5 years ago. Big Data will help us to solve many linguistic problems that involve semantics
  • Linking text to location and topic to find cultural meaning will become more important to developing a larger translation perspective
  • Engagement around content happens in communities where there is a definable culture, language and values dimension
  • While data availability continues to explode for the major languages we are seeing a digital divide for the smaller languages and users will need to engage in translation to make more content in these languages happen
  • Even small GUI projects of 2,000 words are found to have better results with MT + crowdsourcing than with professional translation
  • More translation will be of words and small phrases where MT + crowdsourcing can outperform HT
  • User s need to be involved in improving MT and several choices can be presented to users to determine the “best” ones
  • The community that cares about solving language translation problems will grow beyond the professional translation industry.

At TAUS, there were several presentations on Moses tools and instant Moses MT engines via a one or two step push button approach. While these tools facilitate the creation of “quick and dirty data” MT engines, I am skeptical of the value of this approach for real production quality engines where the objective is provide long-term translation production productivity. As Austin Powers once said, “This is emPHASIS on the wrong syllABLE" My professional experience is that the key to long-term success (i.e. really good MT systems) is to really clean the data and this means more than removing formatting tags and removing the most obvious crap. This is harder than most think. Real cleaning also involves linguistic and bilingual human supervised alignment analysis. Also, I have seen that it takes perhaps thousands of attempts across many different language pairs to understand what is happening when you throw data into the hopper, and that this learning is critical to fundamental success with MT and developing continuous improvement architectures. I expect that some Moses initiatives will produce decent gist engines, but are unlikely to do much better than Google/Bing for the most part. I disagree with Jaap’s call to the community to produce thousands of MT systems, what we really need to see are a few hundred really good, kick-ass systems, rather than thousands that do not even measure up to the free online engines. And so far, getting a really good MT engine is not possible without real engagement from linguists and translators and more effort than pushing a button. We all need to be wary of instant solutions, with thousands of MT engines produced rapidly but all lacking in quality and "new" super semantic approaches that promise to solve the automated translation problem without human assistance. I predict that the best systems will still come from close collaboration with linguists and translators and insight borne from experience.

I was also excited to see the initiative from MemoQ to establish a measure of translator productivity or post-editing effort expended, by creating an open source measurement of post-edited output, where the assumption is that an untouched segment is a good one. MemoQ will use an open and published edit distance algorithm that could be helpful in establishing better pricing for MT post-editing and they also stressed the high value of terminology in building productivity. While there is already much criticism of the approach, I think this is a great first step to formulating a useful measurement. At tekom I also got a chance to see the scheme that MemSource has developed where post-edited output is mapped back to a fuzzy matching scheme to establish a more equitable post-editing pricing scheme than advocated by some LSPs. I look forward to seeing this idea spread and hope to cover it in more detail in the coming months.

Localization World was a disappointing affair and I was struck by how mundane, unimaginative and irrelevant much of the content of the conference was. While the focus of the keynotes was apparently innovation, I found the @sarahcuda presentation interesting, but not very compelling or convincing at all in terms of insight into innovation. The second day keynote was just plain bad, filled with clich├ęs and obvious truisms e.g. “You have to have a localization plan” or “I like to sort ideas in a funnel”. (Somebody needs to tell Tapling that he is not the CEO anymore even though it might say so on his card). I heard several others complain about the quality of many sessions, and apparently in some sessions audience members were openly upset. The MT sessions were really weak in comparison to TAUS and rather than broadening the discussion they succeeded in mostly making them vague and insubstantial. The most interesting (and innovative) sessions that I was witness to were the Smartling use case studies and a pre-conference session on Social Translation. Both of these sessions focused on how the production model is changing and both were not particularly well attended. I am sure that there were others that were worthwhile (or maybe not), but it appears that this conference will matter less and less in terms of producing compelling and relevant content that provides value in the Web 2.0 world. This event is useful to meet with people but I truly wonder how many will attend for the quality of the content.

The tekom event is a good event to get a sense for how technical business translation fits into the overall content creation chain and also see how synergies could be created within this chain. There were many excellent sessions and it is the kind of event that helps you to broaden your perspective and understand how you fit into a bigger picture and ecosystem. The event has 3300 visitors so it is also a much larger perspective in terms of many different views points. I had a detailed conversation with some translators about post-editing. They were most concerned about the compensation structure and post-editor recruitment practices. They specifically pointed out how unfair the SDL practice of paying post-editors 60% of standard rates was, and asked that more equitable and fair systems be put into place. LSPs and buyers would be wise to heed this feedback if they want to be able to recruit quality people in future. I got a close look at the MemSource approach to making this more fair, and I think that this approach which measures the actual work done at a segment level should be acceptable to many. This approach measures the effort after the fact. However, we still need to do more on making the difficulty of the task before the translators begin more transparent. This begins with an understanding of how good the individual MT system is and how much effort is needed to get to production quality levels. This is an area that I hope to explore further in the coming weeks.

I continue to see more progress on the PEMT front and I now have good data of measurable productivity even on a language pair as tough as English to Hungarian. I expect that a partnership of language and MT experts will be more likely to produce compelling results than many DIY initiatives, but hopefully we learn from all the efforts being made.