
Wednesday, May 9, 2012

Omnilingua: A Profile of Effective Translation Quality & Productivity Measurement

One of the major challenges enterprises face when introducing more automation into business translation is understanding the productivity and quality impact of any new automation strategy. Because discussion of quality, and even of productivity, in the industry is often vague and ill-defined, it is useful to show an example where a company understands with great precision what the impact is before and after the use of new translation production technology.

The key questions that one needs to understand are:
· What is my current productivity (time taken, words produced) to achieve a defined quality level?
· What impact does new automation, e.g. an MT system, have on my existing productivity and final delivered quality?
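
The first question reduces to a simple calculation: throughput measured only over work that already meets the defined quality bar. Here is a minimal sketch of that "before vs. after" comparison; all numbers and names in it are invented for illustration, not Omnilingua's actual figures:

```python
# A toy "before vs. after" productivity comparison at a fixed quality level.
# All figures here are hypothetical assumptions, purely for illustration.

def words_per_hour(words_translated: int, hours_worked: float) -> float:
    """Throughput for work that already meets the defined quality bar."""
    return words_translated / hours_worked

# Hypothetical baseline: human-only translation meeting the quality target.
baseline = words_per_hour(words_translated=12_000, hours_worked=40)  # 300 w/h

# Hypothetical post-editing workflow with a new MT engine, same quality bar.
with_mt = words_per_hour(words_translated=16_000, hours_worked=40)   # 400 w/h

gain = (with_mt - baseline) / baseline
print(f"Baseline: {baseline:.0f} words/hour, with MT: {with_mt:.0f} words/hour")
print(f"Productivity change at the same quality level: {gain:+.0%}")
```

The point of holding the quality level constant is that a raw speed gain is meaningless if the delivered quality drops; both questions above must be answered together.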


Kevin Nelson, a Managing Director at Omnilingua Worldwide, recently presented part of the Asia Online webinar "Meaningful Metrics for Machine Translation". Omnilingua prides itself on building teams that deliver definable quality, and is recognized throughout the translation industry as a company that pays particular attention to process efficiency and accurate measurement.

That discipline has shaped their adoption of MT from the start: no new technology goes into production until careful measurement has established that it is a clear improvement over current practice.
During the webinar, Kevin discussed how and why Omnilingua performs detailed metrics. To demonstrate their value, he presented a case study that measures an Asia Online Language Studio™ custom MT engine against a competitor's MT engine and also examines the impact of each on human translators.

Omnilingua first embarked on the use of MT 5 years ago with Language Weaver, and took great care to measure, understand and establish that MT was in fact enhancing production efficiency before using it in production. Recently, Omnilingua reached the point of deciding whether to retrain and upgrade their aging legacy MT engine or to invest in a new MT engine with Asia Online.
 
Omnilingua engaged Asia Online at the end of 2011 to build a custom MT engine in the technical automotive domain, translating from English into Spanish using data similar to that used by the competitor's legacy MT system. As this was Omnilingua's first Language Studio™ custom MT engine, Omnilingua wanted to make sure that the new engine was a clear and measurable improvement over the competitor's legacy MT technology before making any changes in their production environment.

Omnilingua has long-term experience in conducting valid "double-blind" studies that produce statistically relevant results measuring machine quality, human quality and effort. The same careful measurement process was applied to determine whether the new MT initiative with Asia Online was an improvement.
The understanding of positive change is only possible when you understand the current system in terms of efficiency.
...
Any conclusion about consistent, meaningful, positive change in a process must be based on objective measurements otherwise conjecture and subjectivity can steer efforts in the wrong direction. 

– Kevin Nelson, Omnilingua Worldwide
At the heart of Omnilingua's process and quality control procedures is the long-term, continuous use of the SAE J2450 quality assessment and measurement process. Long-term use of a metric like this provides trusted and well-understood quality benchmarks for projects, for individual customers and for MT quality; these benchmarks are more trusted than automated metrics such as BLEU, TER and METEOR, which are available in the free Language Studio™ Pro measurement tools from Asia Online.

While there is effort and expense involved in implementing SAE J2450 as actively as Omnilingua does, the measurements it provides allow for a deep understanding of translation quality and the associated effort. Long-term use of such a metric also dramatically improves the conversation about translation quality between all the participants in a translation project, because it is specific, impersonal and clear about what quality means.

Kevin listed the following benefits of the SAE J2450 measurement standard:
· Built as a Human Assessment System:
    Provides 7 defined and actionable error classifications.
    Uses 2 severity levels to distinguish severe and minor errors.
· Provides a Measurement Score Between 0 and 1:
    A lower score indicates fewer errors.
    The objective is to achieve a score as close to 0 (no errors/issues) as possible.
· Provides Scores at Multiple Levels:
    Composite scores across an entire set of data.
    Scores for logical units such as sentences and paragraphs.
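
For readers unfamiliar with the mechanics, a J2450-style score is essentially weighted error points normalized by the number of words reviewed, so 0 means no errors. The sketch below illustrates that idea; the category weights are the commonly published J2450 values (verify against the standard itself before relying on them), and the sample data is invented:

```python
# Minimal sketch of a J2450-style score: weighted error points divided by
# the number of words reviewed. Weights are (serious, minor) per category,
# following commonly published J2450 values -- check the standard before
# relying on them. The sample errors below are invented.

WEIGHTS = {
    "wrong_term":      (5, 2),
    "syntactic_error": (4, 2),
    "omission":        (4, 2),
    "word_structure":  (4, 2),   # word structure or agreement error
    "misspelling":     (3, 1),
    "punctuation":     (2, 1),
    "miscellaneous":   (3, 1),
}

def j2450_score(errors, word_count):
    """errors: list of (category, severity), severity 'serious' or 'minor'."""
    points = 0
    for category, severity in errors:
        serious, minor = WEIGHTS[category]
        points += serious if severity == "serious" else minor
    return points / word_count

# Invented example: a 250-word sample with three marked errors.
sample_errors = [
    ("wrong_term", "serious"),
    ("misspelling", "minor"),
    ("punctuation", "minor"),
]
print(f"J2450-style score: {j2450_score(sample_errors, 250):.4f}")  # 0.0280
```

Because the score is normalized by word count, it can be computed consistently at the segment, paragraph or whole-project level, which is exactly the multi-level scoring benefit listed above.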

In order to determine whether MT has been successful, production efficiencies and improvements must be measurable. This not only shows improvement in MT over time, but confirms that the MT-based process is more efficient than the previous human-only process while delivering comparable translation quality. A recent survey by MemSource indicated that over 80% of MT users have no reliable way of measuring MT quality. Omnilingua, by contrast, uses multiple metrics to precisely define the degree of effort required to post-edit MT to client-deliverable quality. This quantification of the post-edited MT (PEMT) effort includes raw SAE J2450 scores for MT versus the equivalent historical human-quality SAE J2450 scores, in addition to time study measurements and Omnilingua's own proprietary effort metric, OmniMT EffortScore™, which is based on 5 years of measuring PEMT effort at a segment level. These different metrics are combined and triangulated to deliver very reliable and trusted measurements of the effort needed for each PEMT project.
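
Omnilingua's actual triangulation method and OmniMT EffortScore™ are proprietary and were not detailed in the webinar, but the general idea of triangulating independent effort signals can be sketched: scale each per-segment signal to a common range and look for agreement between them. Everything below, including the field names, sample values and threshold, is an invented illustration:

```python
# Invented illustration of "triangulating" independent PEMT effort signals.
# Omnilingua's actual method and OmniMT EffortScore(TM) are proprietary;
# nothing here reflects their real formulas. The idea: scale each signal
# to [0, 1] and flag segments where the signals agree that effort was high.

def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

# Three hypothetical per-segment signals for the same five segments:
j2450 = [0.00, 0.02, 0.08, 0.01, 0.12]     # J2450-style quality scores
secs_per_word = [1.1, 1.4, 3.8, 1.2, 4.5]  # time-study measurements
effort = [0.1, 0.2, 0.7, 0.1, 0.9]         # a per-segment effort metric

signals = [min_max_scale(s) for s in (j2450, secs_per_word, effort)]
combined = [sum(seg) / len(seg) for seg in zip(*signals)]

for i, score in enumerate(combined):
    flag = "high effort" if score > 0.5 else "ok"
    print(f"segment {i}: combined={score:.2f} ({flag})")
```

The value of combining several signals is that no single one is trusted alone: a quality score, a time measurement and an effort estimate that all point the same way make a far more reliable measurement than any of them individually.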

Through the three metrics above, Omnilingua is able to verify that the gains in their production process are measurably greater than the cost of deploying MT. Omnilingua also makes an effort to "share cost savings and benefits across the value chain with clients and translators". Through this approach, Omnilingua has been able to keep the same team of post-editors working with them continuously for 5 years. This is possibly the greatest benefit of understanding what you are doing and what impact it has.

Omnilingua used the SAE J2450 standard to measure the improvement of the new Language Studio™ custom engine over the competitor's legacy MT engine. SAE J2450 measurements were made on both the raw MT and the final post-edited output from each engine.
SAE J2450 Error Count Comparison: Asia Online Language Studio™ vs. Competitor
[Chart: SAE J2450 error counts by category for both engines]
After reviewing the detailed measurement data Omnilingua made the following conclusions:
  • There were far fewer errors produced by the Language Studio™ custom MT engine than by the competitor's legacy MT engine.
    • Notably, there were fewer wrong meanings, structural errors and wrong terms from the Language Studio™ custom MT engine; these were "typical SMT problems" in the competitor's legacy MT engine.
  • 52% of the raw MT output from the Language Studio™ custom MT engine had no errors at all, compared with 26.8% for the competitor's legacy MT engine.
    • The Language Studio™ custom MT engine measured was the very first iteration of the engine, with no improvements or corrective feedback applied.
    • Many of the errors from the Language Studio™ custom MT engine were minor spelling errors relating to capitalization. A majority of these "spelling errors" were traced back to a legacy portion of the client-supplied translation memory historically used for case-insensitive leverage.
    • Omnilingua found the errors easy to correct with tools provided by Asia Online.
  • The final translation quality after post-editing was better with the new Language Studio™ custom MT engine than with the competitor's legacy MT engine, and also better than a human-only translation approach.
    • Terminology was more consistent with the combined approach of the Language Studio™ custom MT engine plus human post-editing.
  • When surveyed, post-editors perceived the two MT engines as roughly equal in quality and editing effort. However, human perception can often overlook what objective measurement captures.
    • The measured results show that the Language Studio™ custom MT engine was considerably better in terms of translator productivity and produced a final product with fewer errors, because of the higher-quality raw MT output provided to the post-editors.
    • The following table summarizes the key results for both the raw MT and the final post-edited MT:
   Asia Online Language Studio™ vs. Competitor

   Metric                                   Factor
   Total Raw MT SAE J2450 Errors            2x fewer
   Raw MT SAE J2450 Score                   2x better
   Total Post-Edited MT SAE J2450 Errors    5.3x fewer
   Post-Edited MT SAE J2450 Score           4.8x better
   Post-Editing Rate                        32% faster
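
The "x fewer / x better" factors are presumably simple ratios between the competitor's measurement and Language Studio's; a trivial illustration with invented raw values (only the resulting factors mirror the table) makes the convention concrete:

```python
# How "x fewer / x better" factors are typically derived: the ratio of the
# competitor's value to Language Studio's, since for both error counts and
# J2450 scores lower is better. Raw values below are invented; only the
# resulting factors mirror the table above.

def factor(competitor_value: float, language_studio_value: float) -> float:
    return competitor_value / language_studio_value

print(f"{factor(200, 100):.1f}x fewer raw MT errors")          # 2.0x
print(f"{factor(0.048, 0.010):.1f}x better post-edited score") # 4.8x
```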

Omnilingua has already seen translation quality from the first version of their Language Studio™ custom MT engine improve beyond the levels above by providing basic feedback using the tools supplied by Asia Online. As Omnilingua continues to measure quality periodically, the metrics above are expected to show further improvement.

We found that 52% of the raw original output from Asia Online had no errors at all – which is great for an initial engine

– Kevin Nelson, Omnilingua Worldwide
The entire presentation video and slides can be viewed in the Asia Online webinar library; this segment starts at about 31 minutes.