Wednesday, April 3, 2013

PEMT Case Study - Advanced Language Translation

The most active advocates of machine translation today are Fortune 100 companies, especially in the IT industry, and the translation agencies that serve them. The large IT companies have used MT more widely than any other group. However, MT can also be used by smaller LSPs outside this sphere, especially when they collaborate with experts. This case study is one such example, and it provides many specifics that may be instructive for others.

Advanced Language Translation (www.advancedlanguage.com) is a Rochester, NY-based Language Service Provider (LSP) that has skillfully incorporated MT (machine translation) into its production process after years of resisting the technology. CEO Scott Bass admits that this anti-MT stance caused the company to miss out on some larger projects, as customers increasingly looked for service providers with a coherent automation strategy. Customers wanted a partner who understood how to deploy machine translation to deliver cost-effective, high-volume translation projects. After much debate, the company finally decided to jump on the MT train.

ALT began the process by identifying certain customers who were open to a collaborative PEMT (post-edited machine translation) production model. It then began working with Asia Online in the summer of 2012 to develop MT engines for the selected clients. For ALT’s first MT project, engines were developed simultaneously for French, Spanish, Russian and Japanese; however, some issues had to be addressed to ensure successful completion of the project. The greatest initial challenge was the scarcity of data available to build and train the MT systems; in fact, data volume was so limited that the likelihood of producing usable systems with raw SMT (statistical machine translation) approaches such as Moses was nil. The other challenge was building an engine for Japanese, which is considered an especially difficult language for MT.

To remedy these issues, ALT collaborated with Asia Online to develop a terminology-driven data manufacturing strategy. Together they built up the critical data resources needed to develop productivity-enhancing systems, and they leveraged readily available, relevant monolingual data to boost the engines’ capabilities in the domain of interest. ALT relied on the broad and deep experience of the Asia Online team to make the most of its limited data assets and resources.

Additionally, ALT focused on using translators who had previous PEMT experience, rather than those who either had no PEMT experience or were not interested in working with MT output. Before establishing production deadlines and appropriate compensation rates, ALT sent several samples of MT output to the post-editors to ensure that the scope and difficulty of the work were well understood. Bass notes, “Many companies rush into ‘instant MT’ solutions, overlooking the fact that MT takes time to develop, and coordination among all parties. While it is possible to leverage MT systems once they have been built, practitioners must understand that there is a direct relationship between this initial effort and ongoing success with the engine.” He adds, “This outlook is critical to successfully leveraging MT in the long-run, and lack of it is one of the main reasons why MT initiatives fail.”

ALT also allowed post-editors to set their own throughput rates based on their experience with the MT output samples produced by the customized systems. On average, this process resulted in throughput rates of 750 words per hour (6,000 words per day). For Japanese, the rate was lowered to 500 words per hour, as the MT systems produced lower quality output when translating between English and Japanese. Once the throughput and MT quality issues were resolved, compensation was addressed by giving the editors a 25% premium over standard human editing rates. These parameters were established to the satisfaction of all parties for this initial “test” project, which turned out to be successful on all counts thanks to cooperation and skilled implementation.
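The throughput and premium figures above lend themselves to a quick planning calculation. The following is only a rough sketch, not ALT's actual pricing model: the 8-hour working day is inferred from the 750 words per hour and 6,000 words per day figures, while the per-word payment basis, the standard editing rate and the project size are hypothetical values chosen purely for illustration.

```python
# Rough planning sketch based on the figures quoted in this post.
# Assumption: an 8-hour day, implied by 750 words/hour -> 6,000 words/day.
HOURS_PER_DAY = 8

# Throughput rates reported in the case study (words per hour).
WORDS_PER_HOUR = {"default": 750, "japanese": 500}

# Premium over standard human editing rates reported in the case study.
PEMT_PREMIUM = 0.25


def daily_throughput(words_per_hour, hours_per_day=HOURS_PER_DAY):
    """Words one post-editor can handle in a working day."""
    return words_per_hour * hours_per_day


def pemt_rate(standard_editing_rate, premium=PEMT_PREMIUM):
    """Post-editing rate after applying the premium to the standard rate."""
    return standard_editing_rate * (1 + premium)


def project_estimate(word_count, words_per_hour, standard_editing_rate):
    """Return (editor_days, post_editing_cost) for one language.

    Assumes editors are paid per word, which the post does not state.
    """
    days = word_count / daily_throughput(words_per_hour)
    cost = word_count * pemt_rate(standard_editing_rate)
    return days, cost


if __name__ == "__main__":
    # Hypothetical inputs: a 12,000-word job at 0.04 per word for editing.
    for language, rate in WORDS_PER_HOUR.items():
        days, cost = project_estimate(12_000, rate, 0.04)
        print(f"{language}: {days:.1f} editor-days, cost {cost:.2f}")
```

Under these assumptions, the same job takes one and a half times as many editor-days in Japanese as in the other languages, which is why it helps to agree on throughput per language before setting deadlines.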

PEMT Best Practices

Scott Bass summarizes lessons learned and gives advice for others undertaking MT initiatives:

  • Do not rush MT engine development. A higher quality engine takes longer to develop and may need multiple iterations before it becomes usable.
  • Proactively manage the expectations of all the people involved, including clients, project managers, post-editors and LSP sales and marketing personnel.
  • Ensure that post-editors understand the very specific nature of the work.
  • Ensure that MT output reaches a quality level where post-editing is similar to a light to moderate cleanup of a human-translated segment.
  • Collect as much data as possible including TMs, in-domain monolingual data in the target language and core terminology. (ALT used MemoQ LiveDocs to quickly build corpora.)
  • Test the MT engines and benchmark them prior to starting actual production work.
  • Give the post-editors insight into the kinds of edits they will have to make by producing examples with smaller representative test data sets.
  • Focus on minimizing the most frequent errors first and understand that dumb repetition can kill enthusiasm faster than anything else.
  • Ensure that the MT engine is improving through feedback from post-editors. Ask for their feedback often and give them plenty of time and attention.
  • Retune and retrain the MT engine quickly and as frequently as possible.
  • Make sure that the strengths of MT are clearly understood, and manage any weaknesses throughout the process.
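One simple way to act on the benchmarking and feedback advice above is to measure how much post-editors change the raw MT output. The sketch below is an illustration only and not the method ALT or Asia Online used: it computes a word-level edit distance between an MT segment and its post-edited version, normalized by the length of the post-edited segment. This is a simplified relative of metrics such as TER (it does not model word shifts), and the function names and sample sentences are hypothetical.

```python
def word_edit_distance(mt_tokens, pe_tokens):
    """Levenshtein distance over words (insert, delete, substitute)."""
    # prev[j] holds the distance between the MT words seen so far
    # and the first j words of the post-edited segment.
    prev = list(range(len(pe_tokens) + 1))
    for i, mt_tok in enumerate(mt_tokens, start=1):
        curr = [i]
        for j, pe_tok in enumerate(pe_tokens, start=1):
            substitution = 0 if mt_tok == pe_tok else 1
            curr.append(min(
                prev[j] + 1,                 # delete an MT word
                curr[j - 1] + 1,             # insert a post-edited word
                prev[j - 1] + substitution,  # keep or substitute a word
            ))
        prev = curr
    return prev[-1]


def edit_rate(mt_output, post_edited):
    """Word edits per post-edited word; 0.0 means the MT was left untouched."""
    mt_tokens = mt_output.split()
    pe_tokens = post_edited.split()
    if not pe_tokens:
        raise ValueError("post-edited segment must not be empty")
    return word_edit_distance(mt_tokens, pe_tokens) / len(pe_tokens)


if __name__ == "__main__":
    # Hypothetical segment pairs: (raw MT output, post-edited version).
    samples = [
        ("press the start button", "press the start button"),
        ("open file", "open the file"),
        ("press button the", "press the button"),
    ]
    for mt, pe in samples:
        print(f"{edit_rate(mt, pe):.2f}  {mt!r} -> {pe!r}")
```

Tracking a number like this on a small representative test set, before production and again after each retraining, gives post-editors and project managers a shared and repeatable view of whether the engine is improving.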

Overall Benefits

ALT is a fantastic example of a company that has leveraged MT properly. The company has demonstrated that when MT is used with skill and when human factors are carefully managed, the benefits go beyond mere increases in productivity. ALT has found that overall business with accounts that ventured into MT has increased by over 75%. Bass notes, “In many cases, we gained preferred vendor status because we added MT to our service mix.”

Bass also emphasizes that sitting on the fence with regard to machine translation allowed ALT to deny the possible benefits of an MT-HT production model for far too long. Tackling the business and human challenges first was actually the most difficult part of shifting ALT’s production model. In fact, Bass comments, “The process of customizing an MT engine is not that much different than undertaking formal terminology development or managing high-quality translation memories. Extending our toolset to include MT has been a natural extension of skills we already had in place as an LSP.”

An online presentation of this case study is also available on the Asia Online website.

 

6 comments:

  1. Very interesting, this is an important topic. As a translator I definitely have mixed feelings about post-editing. I can see, though, how a thoughtful approach like this might be able to achieve meaningful gains in productivity without 'belittling' the role of the translator.

    By Andrew Lynch

  2. Natalia Konstantinova, April 8, 2013 at 9:48 AM

    It looks like a very promising example! Hopefully, more companies will understand that MT can help a lot and it is not a danger to real translators.

    By Natalia Konstantinova

  3. Thank you for this example.
    If possible, please clarify the terminology-driven concept. I would also like to ask several questions:
    1. Does ALT use an SMT solution only?
    2. How many entries, on average, do the MT dictionaries contain per language pair?

    Thank you in advance.

  4. This comment has been removed by a blog administrator.

  5. ALT is a fantastic example of a company that has leveraged MT properly. As a translator I definitely have mixed feelings about post-editing.

  6. Really nice post on Advanced Language Translation. I like to read your blog.

    Technical Language Translation
