
Wednesday, September 18, 2013

Understanding MT Customization

While we have reached a point where many more people realize that machine translation (MT) produces the best results when it is properly customized, what customization actually means is still not well understood.

There is a significant difference between shallow customization and deep customization in terms of the impact on the MT system’s output quality. The quality of output in turn has a direct impact on the potential business leverage and return on investment. There are a growing number of MT vendors, but very few real MT developers in the market today, and deep expertise is the key differentiator that leads directly to better output and better productivity. It is important for anyone considering purchasing an MT solution to understand the difference between the two types of vendors.

Generally, MT developers have created either Rules-Based Machine Translation (RBMT) or Statistical Machine Translation (SMT) systems with hands-on coding at the deepest levels of the core MT engine and its surrounding technologies. Thus they are likely to have insight into how and why an MT engine works the way it does. They are also more likely to be able to coax an engine into producing better quality output by applying the optimal corrective actions to improve on initial results.

In contrast, most Do-It-Yourself (DIY) MT vendors provide little, if any, real innovation and focus on simplifying and packaging a collection of open source tools into a web-based offering. Their primary emphasis is on simplifying the interface to these open source tools and enabling a user to build a basic MT system with user data instantly. I would characterize this approach as shallow customization. When real understanding of the engine technology and data is required, few have the skills needed to improve this initial MT engine quality on an ongoing basis, and even fewer have the ability to make it reach levels of quality that provide real competitive advantage.

When evaluating MT vendors, there are a few simple things that anyone considering purchasing an MT offering should understand:

Is your MT vendor a serious developer of MT technology or do they simply provide/package other third-party or open source MT technology?
There are only a very small number of companies developing commercial enterprise-class MT. Most MT vendors are users or packagers of third-party technology. Many do not have the depth of understanding to do anything but the simplest and shallowest MT customization tasks. These vendors will often present themselves as experts and sometimes claim to be technology agnostic. Some Language Service Providers (LSPs) that have a few years' experience using open source or third-party RBMT systems are presenting themselves as MT experts. Be wary of any vendor that claims deep experience in multiple MT technologies. Advanced skills in any MT technology require long-term investment and experience to reach any kind of distinctive expertise. Getting good results from any of these approaches requires very different skill sets, and independent and unique expertise must be developed for each approach. The notion of a standard set of MT development skills that works anywhere and everywhere is a myth.

Any MT vendor that does not have a strong and experienced human steering component in the customization process will always deliver lower quality results.

Does your MT vendor use a Clean Data SMT or Dirty Data SMT strategy?
The Clean Data SMT approach was pioneered by Asia Online in 2008. Most MT vendors do not have the technology or the rigorous data analysis and data cleaning processes to deliver a Clean Data SMT approach, and so take the easier Dirty Data SMT approach. Clean Data SMT has many benefits, such as more rapid improvement from post-editing feedback and the ability to manage and control terminology so that it remains consistent. Dirty Data SMT is by its very nature unpredictable and inconsistent, and is therefore difficult to manage and much slower to improve with corrective feedback.

Does your MT vendor claim that MT is easy?


[Image: 3 Monkeys]
Some MT vendors claim that MT is not complex. One DIY MT vendor even likens those who say MT is complex to monkeys. The reality is that running an open source MT solution or using an "upload and pray" solution like that of many DIY MT vendors has become very easy.

Building an instant MT engine is not the same as delivering a production-quality MT system that provides production efficiency. Indeed, a significant number of DIY custom MT engines deliver translation quality well below that of Google.
 
Delivering high quality MT requires skill, a deep understanding of the different approaches to MT and the inner workings of the technology, a deep understanding of the data used to engineer the customization process, and a range of tools, skills and knowledge that permit optimization to deliver the highest possible quality. There will be unique requirements for each and every engine; after all, the point of customizing is to match the translation output to a particular customer's writing style and audience. This can only be achieved with human cognition and guidance and cannot be fully automated.

The impact of real expertise is clear. Asia Online customers can speak on the record of achieving productivity gains greater than 300%, while DIY MT vendors typically claim productivity gains of 20-40%, if any at all.

Does your MT vendor give you control of the data and the process?
Many MT vendors today provide very limited control over core data elements and typically rely on a simple "upload and pray" web interface that promises instant results. They generally lack the ability to manage, control and normalize the data used to customize an MT engine, and generally do not have any data analysis and data manufacturing capabilities. A developer like Asia Online provides multiple levels of control, during both the development and translation processes, that enable much better output quality and thus higher productivity.

What is expected of you as a user in order to customize an MT engine?
If the answer is nothing more than uploading your translation memories (TMs) then a red flag should already be raised. Machine translation can be very high quality when managed with expertise, but expecting good results without any knowledge investment and real expertise is not realistic. 

Just as in any quality-focused human translation project, special tools, processes and expertise are required to get better results.

Any custom MT technology that does not require your involvement in steering the customization process will deliver considerably lower quality output, often worse than anybody could do with Google or Bing. MT systems that produce good quality output require human steering, guidance and control. This is possible with today's technology, but does require more effort than just uploading some translation memories.
 
How much effort does it take and how quickly can the customized engine improve after the first version?
Dirty Data SMT systems offered by DIY MT vendors require significant amounts of new data to improve the system after an initial system is in place, usually around 25% of the total training data that the custom MT engine is built on. So if your engine is built on 3 million segments provided by your MT vendor and 200,000 segments provided by you, you will need at least 800,000 new segments to see a noticeable improvement in quality. Getting this much additional data is usually beyond the reach of nearly all users of MT systems. Because the customization approach is Dirty Data SMT, errors are very difficult to trace and correct. The standard means of correcting issues is to add more data and hope that the problem is resolved.
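
To make the arithmetic above concrete, here is a minimal sketch in Python; the 25% threshold is the rule of thumb stated above, not a property of any particular MT toolkit:

    # Rule of thumb from above: a Dirty Data SMT engine needs new data
    # equal to roughly 25% of its total training corpus before output
    # quality noticeably improves.
    vendor_segments = 3_000_000   # baseline data supplied by the MT vendor
    client_segments = 200_000     # the client's own translation memories

    total = vendor_segments + client_segments
    new_data_needed = int(total * 0.25)

    print(f"Total training data: {total:,} segments")
    print(f"New segments needed: {new_data_needed:,}")   # -> 800,000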

Clean Data SMT systems such as Asia Online's Language Studio™ can learn and improve with just a few thousand edits, and every edit counts. Terminology is consistent, and there are tools to identify common problems ahead of time and means to automatically resolve them. Data manufacturing is also applied to amplify edits and corrective feedback and ensure they are applied to the engine in a broader set of contexts. The cause of errors can quickly be traced and the errors rectified using a number of problem analysis tools. The resulting improvement is rapid and noticeable even with a very small effort by a single person.

Bottom Line: Creating a high quality custom MT engine requires deep expertise, control and broad experience, elements that are usually not present in the "upload and pray" approach of the DIY MT model. Developing high quality MT is complex and, in 2013, still an expertise-based affair.

To simply upload a translation memory and expect high MT quality to come out is wishful thinking. A computer cannot automatically know your preferred terminology, vocabulary choices, writing style, target audience and purpose. Just like a human translation project, achieving quality requires effort, time, management and skill. 

Customizing an MT engine to produce "near-human" output quality levels is possible, and there are many proof points where 50% or more of the raw MT segments required no editing at all (i.e. they were "perfect"), with many of the remaining segments having minor issues that could be quickly edited. A fully customized MT engine built on the Clean Data SMT approach consistently delivers 150%-300% (sometimes even greater) productivity gains. The long-term ROI impact is clear relative to the meager productivity that instant MT approaches sometimes produce.

MT in 2013 is still a complex affair that requires deep expertise and collaboration with experts if your intention is to build long-term business leverage through more efficient translation production processes. There is no advantage in a system that any of your competitors could create instantly, and there is no value or business advantage in just dabbling with MT.

"When conceiving the idea of Moses, the primary goal was to foster research and advance the state of MT in academia by providing a de facto base from which to innovate.

Currently the vast majority of interesting MT research and advancements still takes place in academia. Without open source toolkits such as Moses, all the exciting work would be done by the Googles and Microsofts of the world, as is the case in related fields such as information retrieval or automatic speech recognition.

As a platform for academic research, Moses provides a strong foundation. However, Moses was not intended to be a commercial MT offering. There is a considerable amount of additional functionality, beyond providing a web-based user interface for Moses, that is not included in Moses but is essential in order to offer a strong and innovative commercial MT platform."

Professor Philipp Koehn, University of Edinburgh; Chief Scientist, Asia Online

Addendum: This post triggered a strong reaction from Manuel Herranz at Pangeanic, and I am including a response I made on his blog in case it does not make it through the approval process there.


My primary point in my blog posting is that expertise, long-term experience and a real understanding of how the technology works are necessary and critical to get the best results. Most DIY users do not have these characteristics and thus are very likely to get much lower quality results. Pointing this out is not, to my mind, equivalent to "bad-mouthing the competition"; I am simply comparing approaches and pointing out the value implications.

Also, while I claim that expertise does matter, I do not suggest that Asia Online is the only company with this expertise. There are several other MT experts, including RBMT developers like Systran and a specialist like Tayou in Spain.

I do believe that MT technology is complex enough that it requires specialization, and that developing real competence with MT is difficult enough that it is unlikely to be done successfully by a company whose primary business is being a translation agency. It is clear that you disagree. I am also pointing out that the value received by a customer is very likely to be lower for a DIY user. I can understand that you may have a different opinion to mine, and assure you that my observations are not borne of virulence.

Historically we saw many LSPs develop their own TMS systems too, but most people in the industry would concede that the best TMS systems have come from companies that focus on and specialize in the development of these tools, e.g. MemoQ, Memsource, Across, XTM. We have also seen the SDL acquisitions of software companies like Idiom, LW and Trados result in what most perceive as reduced customer responsiveness, quality and commitment to these products. Buying critical production infrastructure from a competitor generally does not make sense in any industry, and thus we have seen the momentum slow on all the SDL software acquisitions. IMO, specialization matters, and with technology this complex one will get the best results using technology developed and managed by specialists for the foreseeable future.

Anyway, I wish you peace and health.



Kirti

Comments

  1. There are some contradictions in this article.
    It is extremely biased and lacks a clear grasp and understanding of what other technologies are (DIY MT) and how they work.
    Clearly, the author is good at writing but has not been a translation practitioner.
Bad-mouthing the competition or simply undermining other solutions with "be careful, they don't work" is a very low commercial/marketing strategy. Simply not worth commenting on. Just like some translation agencies used to say "our translations are better".

    The author tends to forget in this blog that he is addressing an audience who have been in this industry for decades.

    1. Manuel,

      Thank you for your comment.

      My primary point is that expertise, long-term experience and a real understanding of how the technology works is necessary and critical to get the best results. Most DIY users do not have these characteristics and thus are very likely to get much lower quality results. Pointing this out, to my mind is not equivalent to "bad mouthing competition".

      This fact can be easily demonstrated by a test where the same data is used for customization by experts and a DIY user -- I would bet that experts will produce much better systems every time.

      I welcome more specific criticism as it is quite possible that I am wrong on some of my assumptions and observations.

      Thanks

      Kirti

  2. You can replace the words 'MT vendor' with 'Human translator' and the above would be just as correct.

    The bottom line is that MT is a tool that helps humans do their jobs arguably better, and not a human replacement.
    The number of intermediaries/brokers that jump on the 'linguistic services wagon' is enormous. Most of them are plain frauds who just try to make a quick buck out of the evasiveness (or greed) of some people.

    If one wants the type of expertise that will deliver best results and reduce costs in the long-term, one needs to hire a professional.
    Also, one should beware of any promises and claims for any MT solution that makes the human element redundant.

    I'm not representing a corporation, but from my, albeit limited, experience with MT, it is best used as part of the human translator's workflow, as a tool, and not as a mechanism for spitting out volumes of text to be 'post-edited' later (a mind-numbing task and process, in the experience of quite a few) by some vague "human".

  3. Interesting comments. I thought "shallow" vs "deep" customization referred to how far the vendor reaches into the customers' pockets. Thank you for the clarification.

    By Tom Hoar

    1. Tom

      There is a huge difference between price and value. IMO with MT technology low price often equates to low value.

      I feel there is very little value in producing instant MT systems that produce translations that are lower in quality than Google or Bing Translate.

      MT ONLY provides value if it can't easily be duplicated by "free" MT and if it provides lasting production efficiency by producing much higher quality translation output that competitors cannot easily match.

    2. Well, I like how you quoted my "20-40%" productivity gains from my GALA Miami and LocWorld Singapore presentations. Keep up the good work, Kirti.

    3. That productivity range is quoted by several others, so it was actually not based on your presentations at all. TAUS case studies are a more reliable source for this kind of information, as they showcase many Moses deployments and often provide this data. In fact, they also honestly point out that many times these engines do not reach the output quality levels of Google and Bing Translate. This verified data helps everybody understand the possibilities of the technology better.

      In contrast many Asia Online systems usually deliver productivity improvements in the 75% to 150% range and we even have some like Sajan who have gone on record stating that they had productivity improvements as high as 328% with one of their Asia Online MT systems.

      You can also see the details of a very well-established, thorough and careful productivity measurement process by Omnilingua at:
      http://kv-emptypages.blogspot.com/2012/05/omnilingua-profile-of-effective.html

      where they also show:

      "52% of the raw MT output from the Language Studio™ custom MT engine had no errors at all compared to the competitor’s legacy MT engine which had 26.8%"

      MT that produces a large percentage of very good quality segments will deliver value and real ROI in both the short and long-term.

      It is very easy for technology vendors to make claims but I think much more compelling when real customers describe their own experiences in terms of attributes that matter to them.

  4. @Kirti - the 328% productivity improvement from Sajan says little about Asia Online's system and more about how suitable Sajan's translation material was for SMT. The question is, given the same training corpus, could this productivity improvement just as easily have been achieved using a system like DoMT from Tom's company, baseline Moses with some tag cleaning, or any of several Moses-based cloud offerings? A short while ago KantanMT announced Sajan as a client; we can only speculate as to whether this particular engine has been moved from Asia Online to Kantan. This is private to both companies so let's not get into it.

    What is important is that large LSPs are tending towards multiple suppliers of MT. This makes sense, as cloud-based providers bring both training data and technology to the table. Where MT buyers bring only their own training data to the table and the data is not sensitive, the picture is fairly simple. They can shop around, try a few cloud-based systems, try out DoMT or similar, and choose the solution with the best support, customisation options or whatever is important to them as a buyer.

    Where an MT provider brings domain-specific training data to the table that other MT suppliers cannot, this is indeed value. HOWEVER, it is unwise of any buyer to pay a large sum of money in advance of proper productivity testing. I can cite an example where precisely this scenario occurred, costing the mid-size (23-person) LSP in question €25,000.

    I am not suggesting that MT systems cannot be improved via manual labour once this value has been demonstrated, but it is very hard to improve a bad SMT system if the training data is not at the races. You have to be in or close to positive territory on throughput to measure productivity without it costing a fortune, and once there you very quickly reach a plateau where your manual effort has little effect on that throughput.

    No mortal can predict in advance which company happens to have the most relevant training data for a particular client's translation material.

    The point is that as the MT provisioning market matures, I see large MT customers moving to a suck-it-and-see approach (with room for improvement later) rather than pay and pray. This is particularly true of LSPs, where new clients bring new requirements, but also of any language department that handles many types of translation material on a scale that might justify an outlay for MT, at least in the initial phase of implementation.

    By John Moran

    1. @John The Sajan experience is more complicated than it appears on the surface, and it is explained in more detail by Dion, whose comments are visible at my blog (or will be visible here once the comments are approved).

      The Omnilingua case study is a better example of how to properly compare two MT systems that are attempting to do the same thing. It is a very thorough and accurate measurement and evaluation process.

      http://kv-emptypages.blogspot.com/2012/05/omnilingua-profile-of-effective.html

      In a perfect world where everybody has wonderful, plentiful, relevant and clean data, it is possible that an instant DIY approach could produce systems close or equivalent to expert systems. However, in most cases the data that customers bring to the table has none of these attributes; hence the requirement for expertise and an understanding of how and why the engine does what it does, and the need for experience, insight and more tools to identify and correct data problems.

      Practitioners with long-term and deep expertise can also tell you upfront, from initial reviews of the training data and communication about the objectives and expectations for the MT engine, whether your goals are in the realm of possibility. If there is some uncertainty, it is understood upfront, and generally we would only undertake projects that have a real chance of producing superior MT output. After developing many thousands of engines one begins to develop a sense for what is and is not possible, and I maintain that 'a priori' expert judgment would be more reliable than a DIY attempt. Few of us are interested in undertaking projects where the odds of success (i.e. beating Google) look very slim. There is very little point in developing a custom MT engine if you cannot produce output that is better than what is possible with Google/Bing, and we would seek to avoid that scenario.

      Much of the skill is related to taking data that is not optimal and making it more useful, creating new data strategically, and then building an engine that is competitive in quality and does indeed offer a real competitive production advantage to the customer. Asia Online brings additional training data across multiple domains, comprehensive data analysis and cleaning tools, data manufacturing tools, and long-term expertise gained across many years and thousands of engines to bear, in addition to the customer data. The cost to build an engine with Asia Online is significantly lower, a small fraction of the example you have given. In many cases we know, based on previous experience with similar data sets, whether the odds are good or not, so while there is no perfect predictability, an informed 'a priori' assessment is possible. Also, an expert system will evolve in quality faster, as there are many more controls and tools to drive and enable this evolution.

      It has become easier to experiment with making an MT engine, and so many are playing, but any serious comparison should involve the exact same data used to train the two (or more) competing alternatives. I expect that expert-based systems like Asia Online's will do very well in this kind of comparative evaluation, and will also offer much more business leverage and ROI than a DIY option. MT is only useful if you actually get a system that is better than what any competitor could get by placing some TM into a DIY system, and for the foreseeable future I expect that expertise will matter. I am suggesting that getting a clearly superior MT system is more important than just quickly getting a system together, if creating business value is the core objective.

  5. Hi John, Thanks for your thoughts. Without knowing the project, I am not sure how you are arriving at your conclusions, but I am happy to provide more information so that you can understand the project better. While I am unable to disclose the end client or too much detail about the project specifics, I can tell you that the domain was hotel reviews. On the surface, this domain is a relatively simple one: typically shorter sentences of 5-15 words. However, there are many things that make this domain challenging:

    1. User generated content: This type of content is prone to spelling, grammar and many other kinds of errors. Language Studio Pre-Translation Correction (PTC) technology and custom Pre-Translation JavaScript rules were leveraged to adjust and improve the quality of the source.

    2. Large number of named entities: There was a huge number of named entities such as hotel names, restaurant and bar names, location names and people names from all over the world. As the domain is global in nature, many of these named entities were not in the source language. This was addressed with Language Studio analysis tools and data manufacturing. In some cases class-based translation was also applied.

    3. Scope of domain: The travel domain covers every geography around the globe and had many challenges that were location- or geography-specific. Pre-Translation JavaScript was leveraged to pre-process the content and insert additional guidance into the translation. This helped to manage word order and in some cases pre-translated specific terms based on glossaries and other context-specific information that standard phrase-based MT cannot handle.

    4. Content normalization: As part of pre-processing, the source content was normalized for greater consistency, thus delivering higher quality translations. For example, hotel names were expanded to their full form, "remote control" was normalized to "TV remote", and so on (a simplified sketch of this kind of rewrite follows below).
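
    As a rough illustration of this kind of rule-driven source normalization, consider the minimal sketch below; the mapping table and function name are illustrative assumptions, not Language Studio's actual API:

        import re

        # Illustrative normalization rules; a real deployment would derive
        # these mappings from glossaries and corpus analysis, not hard-code them.
        NORMALIZATIONS = {
            r"\bremote control\b": "TV remote",
            r"\ba/?c\b": "air conditioning",
        }

        def normalize_source(sentence):
            """Apply simple pattern-based rewrites before translation."""
            for pattern, replacement in NORMALIZATIONS.items():
                sentence = re.sub(pattern, replacement, sentence, flags=re.IGNORECASE)
            return sentence

        print(normalize_source("The remote control for the A/C was missing."))
        # -> "The TV remote for the air conditioning was missing."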

    With respect to the client’s data, a relatively small amount of training data was provided, just a few hundred thousand sentences. Putting this into perspective, there are about 200,000 hotels in the world, so this could also equate to 1 sentence per hotel to learn from. So while it sounds like the client provided a reasonable amount of data, for a domain as broad and complex as this, it was actually quite small.

  6. As the client had a huge number of hotel reviews available in the source language, we were able to perform gap analysis and other processes that enabled us to train systems for normalization and correction of many common source content issues. We also used this data and the client's TM, plus other data for this domain that we had in our Foundation Domain libraries, to perform extensive data manufacturing. This was a significant effort and took several weeks of processing in order to manufacture very large volumes of data that were then incorporated into the engine. Asia Online has spent a considerable amount of R&D effort on data manufacturing, data cleaning and data normalization technologies. Many of our customers have little and in some cases no data, so data manufacturing is the only way to deliver quality. When a customer has data already, we use data manufacturing to supplement their data.

    The Sajan case study is an excellent example of what is possible when you apply experience and a deep understanding of the domain requirements and the data. Our approach involves human cognition to define a unique customization plan for each engine that adapts the engine to the domain, writing style and purpose. Each engine is different, and our linguists perform analysis on the data that is available (both client data and Language Studio Foundation Data) and the domain requirements to develop the customization plan.

    Once the customization plan has been developed, the project is set up and a series of automated tools is executed by skilled Language Studio linguists to gather and prepare the necessary data. We take care of this complexity so that our customers do not need to learn these skills and need only provide some basic assistance and guidance. This is one of the key things that differentiates the Language Studio approach from the "Upload and Pray" DIY MT model, where no skill is required other than the ability to upload data on a web page.

    If the client’s TM was uploaded as per the DIY model, the engine would have been very poor indeed. It would have lacked the vocabulary range and model data to deliver even a basic level of quality and would most likely have been worse than Google’s generic translations. It was all the extra work that both Sajan and Asia Online put into this engine that made it very high quality.

    The quality of this custom engine did in fact have a lot to do with Asia Online's system. The skilled Language Studio linguists who designed the unique customization plan for this engine and leveraged data manufacturing and other Language Studio technologies and approaches as described above are directly responsible for the 328% productivity gains that this engine delivered. This is not simple or easy, and is also not possible with a DIY approach, as there is no ability to do anything more than upload data.

  7. @Kirti, Thanks for the pointer. I read Dion's reply on the blog.

    You wrote "we would only undertake projects that have a real chance of producing superior MT output"

    That is nice but what happens when it is not superior to Google? Do you share the risk? If not, where is the incentive to turn down projects? As the "upload and pray" systems demonstrate, it is not hard or expensive to get a quick, cheap first impression to see if the data AO brings to the table matches the material the client brings.

    Kevin wrote, it was "great for an initial engine", so clearly AO was going to win this client. I would say this is a good example that strengthens the case for early stage low-cost or no-cost demo systems that output a few thousand words rather than pay and pray.

    Could the explanation for the initial engine be that at some point previously some client of AO uploaded bilingual data in a domain similar to that of Omnilingua's client, or did Omnilingua bring all the training data to the table? It is unclear from the word "similar".

    If it only came from Omnilingua I would say owning at least some of their own training data helped prevent vendor lock-in. Either way, it shows how important it is to test new suppliers from time to time when dealing with technology in a landscape that is evolving as fast as MT!

    For most common file formats, data cleaning is not a mystery (see the free tool from Logrus or DoMT) so unless there is a complex process to gather new training data, terminology or do-not-translate lists, as with the Sajan example to follow from Dion, I fail to see what is so wrong with a suck-it-and-see demo before shelling out the big bucks for a manually customized solution.

    The €25k price tag was for two citrus shaped engines.
    By John Moran

    1. Asia Online does share the risk with clients, or would have clear communications about how an engine would be expected to evolve.

      I suggest you read the Omnilingua case profile at http://kv-emptypages.blogspot.com/2012/05/omnilingua-profile-of-effective.html so you can see the specific evaluation details they presented in their own words rather than mine.

      AO engines are made with AO foundation data plus customer data. Customer data is never mixed with other customers' data, and foundation data is always pruned for each specific case to maximize the probability of success.

      Data cleaning involves more than the free tools provide.

  8. @John, Thanks for your response, please see my answers and comments below:

    >> You wrote "we would only undertake projects that have a real chance of
    >> producing superior MT output"

    >> That is nice but what happens when it is not superior to Google? Do you
    >>share the risk? If not, where is the incentive to turn down projects?

    The incentive is very easy: we make our revenue from ongoing utilization, not the customization fee. We charge a minimal customization fee that covers our costs. Our incentive is to make sure the engine is good so that it is used on an ongoing basis. We do not want to waste our time and resources customizing an engine that is not going to deliver, any more than the client wants to waste theirs. As a result, our linguists will give an honest assessment of a project based on the information provided by the client. It is in no one's interest to build an engine that will not perform.

    With that said, I should clarify one perspective on this. We often build an engine knowing that the quality may be lower initially; we discuss this with the client and expectations are managed. We do this when we know the client is going to be using the engine long term and is willing to invest in maturing the engine. This was the case with the Advanced Language Translation (ALT) case study published some time back on Kirti's blog. ALT had very little data, but the engines improved quickly over a small number of iterations. ALT worked with us resolving unknown words and normalizing terminology. This was combined with data manufacturing to create a high quality engine.

    Quality engines require understanding and refinement of the data. They also require some effort from the client. In many cases, the best path forward and the fine-tuning required only become clear once the first iteration has been developed. We can reduce the fine-tuning effort through normalization, cleaning and refinement of data up front. The amount of effort put into data manufacturing in advance by both Asia Online linguists and the client is directly related to the quality of the engine.

    >>As the "upload and pray" systems demonstrate, it is not hard or expensive
    >> to get a quick, cheap first impression to see if the data AO brings to the
    >> table matches the material the client brings.

    In the examples provided of ALT and Sajan, the quality of the translation would have been very poor from the outset in an "upload and pray" system. This would have given a misleading result and could even have resulted in the client deciding not to proceed. If the only data you are relying on is the MT vendor's baseline and the TM, then you should expect low quality. Just today a client submitted data for an EN-DE engine. Many segments were misaligned, and some even contained other languages and data from other domains that would lower the quality of the translation output. There were also significant formatting issues; some words were glued together, others were misspelled, and many sentences had poor grammar. Uploading this into an "upload and pray" model would have resulted in these segments being trained into the engine and delivering lower quality. Language Studio tools were able to identify these segments and remove them from the data, and reports were prepared for the client so that they could normalize their data. Language Studio's automated domain detection technology was also able to extract out-of-domain content so that it did not get mixed in with the in-domain content and reduce translation quality.
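
    To illustrate the kind of automated checks described above, here is a heavily simplified sketch; the length-ratio and character-set heuristics are cheap stand-ins for a production cleaning pipeline, not the actual Language Studio tooling:

        def looks_misaligned(src, tgt, max_ratio=3.0):
            """Flag segment pairs whose word counts diverge wildly, a cheap
            proxy for alignment errors in a translation memory."""
            src_len, tgt_len = len(src.split()), len(tgt.split())
            if min(src_len, tgt_len) == 0:
                return True
            return max(src_len, tgt_len) / min(src_len, tgt_len) > max_ratio

        def looks_wrong_language(text, expected="abcdefghijklmnopqrstuvwxyzäöüß"):
            """Crude character-set check for a German target segment; real
            pipelines use a trained language identifier instead."""
            letters = [c for c in text.lower() if c.isalpha()]
            if not letters:
                return True
            return sum(c in expected for c in letters) / len(letters) < 0.9

        tm = [("The quick brown fox", "Der schnelle braune Fuchs"),
              ("Press the power button to start", "OK")]  # second pair is suspect

        clean = [(s, t) for s, t in tm
                 if not looks_misaligned(s, t) and not looks_wrong_language(t)]
        print(f"Kept {len(clean)} of {len(tm)} segment pairs")  # -> Kept 1 of 2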

  9. ...Continued

    >> Kevin wrote, it was "great for an initial engine", so clearly AO was going to
    >> win this client. I would say this is a good example that strengthens the
    >> case for early stage low-cost or no-cost demo systems that output a few
    >> thousand words rather than pay and pray.

    John, without knowing the project, I am unsure how you arrive at such conclusions. The case study was presented by Omnilingua in a webinar which can be viewed on our website. If you would like to know more about the project, this would be a good place to start.

    This was a very accurate comparison performed by Omnilingua that compared SDL BeGlobal engines (formerly Language Weaver) with Language Studio. We did not know whether we would win this client or not. These were the results after doing a full customization; a quick DIY "upload and pray" customization would not have given the results the client was looking for.

    To provide more background, Omnilingua has a strong history of performing both automated and human metrics. In this case, there were detailed SAE J2450 metrics as well as productivity metrics. The result was that when trained on the exact same data (i.e. the client provided the same data), the Language Studio engine delivered 5.3 times fewer errors. Having a DIY demo system was of little use, as there was already an SDL system in place that had been running and improved over 5 years. They knew it was working for them, but wanted to know if Language Studio could deliver better results before investing more with SDL. They ran their tests independently and presented them to Asia Online.

    >> Could the explanation for the initial engine be that at some point previously,
    >> some client of AO uploaded bilingual data in a domain similar to that of
    >> Omnlingua's client, or did Omnilingua bring all the training data to the
    >> table? It is unclear from the word "similar".

    No, this is not the case. We never share client data under any circumstances without the client’s written permission. I am not sure where your ideas and conclusions are coming from, but they are way off base. We have a very detailed contract that protects client data and ensures that it is only used for the purpose of customizing the client’s engine and nothing else.

    I am not sure why you have latched on to the word "similar". I used this word twice in my response. The first use relates to the experience that our linguists have in this domain and states very clearly "previous experience with similar data sets"; it has no relationship to reuse of data from another client, as you suggest. The second relates to normalization of data.

    Omnilingua uploaded the same data that they provided to SDL so that they could have an accurate comparison with the same data and could compare Language Studio on a level playing field. The remainder of the data came from Asia Online foundation data and data manufacturing.


    >> If it only came from Omnilingua I would say owning at least some of their
    >> own training data helped prevent vendor lock-in. Either way, it shows how
    >> important it is to test new suppliers from time to time when dealing with
    >> technology in a landscape that is evolving as fast as MT!

    This I agree with: by having ownership of their data, they could switch MT vendors as they saw fit. In this case, they did a detailed assessment and switched from SDL to Asia Online.

  10. ...Continued

    >> For most common file formats, data cleaning is not a mystery (see the free
    >> tool from Logrus or DoMT) so unless there is a complex process to gather
    >> new training data, terminology or do-not-translate lists, as with the Sajan
    >> example to follow from Dion, I fail to see what is so wrong with a suck-it-
    >> and-see demo before shelling out the big bucks for a manually customized
    >> solution.

    Neither the tools from Logrus nor DoMT do anywhere near the cleaning that we do. I have listed some examples above, such as domain detection, language identification and validation, etc. There is much more to data cleaning than simply extracting data from a common format and removing formatting. There were a lot of additional processes that extracted and normalized terminology, non-translatable terms and more.

    Once the data is cleaned, it is then used as part of our data manufacturing to build additional data. It is not uncommon for manufactured data to equate to around 10-15 times the volume initially provided by the client. This data is what delivers quality translations, and it is refined and cleaned using automated tools. The data manufacturing process is different for each customization and requires a high level of human cognition. However, once the customization plan has been defined, the automated tools are leveraged to execute the plan.
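
    Asia Online's actual data manufacturing process is proprietary, but a heavily simplified sketch of the general idea, slotting known term translations into parallel sentence templates to generate synthetic segment pairs, might look like this (all templates and terms below are invented for illustration):

        # Slot known EN->DE term translations into parallel sentence
        # templates to multiply a small terminology list into many
        # synthetic training pairs.
        templates = [
            ("The {en} was dirty.",       "Das {de} war schmutzig."),
            ("Please replace the {en}.",  "Bitte ersetzen Sie das {de}."),
        ]
        term_pairs = [("towel", "Handtuch"), ("pillow", "Kissen")]

        manufactured = [
            (src.format(en=en), tgt.format(de=de))
            for src, tgt in templates
            for en, de in term_pairs
        ]
        for pair in manufactured:
            print(pair)
        # 2 templates x 2 terms -> 4 pairs; a real pipeline scales this idea
        # to 10-15 times the volume of the client's original data.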

    >> The €25k price tag was for two citrus shaped engines.

    I don't know where this price came from. Our fees are a tiny fraction of this. I can tell you for a fact that this most definitely was not from Asia Online. We have never charged such ludicrously high fees to customize an engine. I am, however, aware of one EU-based LSP that purchased a DIY system with an onsite license and built 2 custom engines for around this price. Perhaps this is the one you are referring to.

    Going back to your original point that "upload and pray" systems can demonstrate the quality that is likely via a cheap first impression: I think I have shown clearly that the two examples from Advanced Language Translation and Sajan would not have produced a result that indicated anything remotely like the quality expected, and would have given a very negative impression, as in both cases there was insufficient data provided by the client to deliver quality. It required a full customization with extensive data manufacturing in both cases to deliver a higher quality custom engine.

    The example of Omnilingua was different, as they already had data and were performing a comparison in order to determine if they should replace SDL. They did their own detailed metrics and moved to Language Studio as a result of significantly better quality. A cheap first impression with DIY would not have helped them in any way. They wanted to see what an alternative custom engine could do when fully customized and how it fared against their customized legacy SDL engine. The results presented by Omnilingua in person on the webinar speak for themselves.

  11. Thanks for your clarification that you're driven by your "opinions" and "feelings" when you conclude that "low price often equates to low value" despite the TAUS case studies and others that contradict these emotion-driven conclusions.

    To put these emotions into perspective, let's review the capabilities of a Moses baseline system with "shallow customization." Since the Moses system is not directly responsible for productivity, I'll use the "zero-edit" percentage you referenced. I'll also cite DoMT experience, but TAUS case studies, and I suspect users of the other DIY vendors, can confirm similar results.

    When new DoMT users build their first engines, they typically achieve SMT output with 25% to 45% of the segments requiring zero editing. These engines are trained from their TMs. Typically, the TMs have 150,000 to 300,000 translation unit segments.

    As users gain experience, they learn to apply some "shallow customization" techniques, such as corpus normalization. Typically within the first 2 or 3 months, more batches are at the higher end of the range; i.e., more often than not they see 40-45% zero-edit output and fewer batches at 25%. One user recently reported 46% zero edits. These are real customers' results, shared without violating their desire to maintain their privacy.
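
    For readers unfamiliar with the metric being traded back and forth here: the zero-edit rate simply counts the segments a post-editor left unchanged. A minimal sketch (the whitespace normalization is an assumption; real evaluations define "unchanged" more strictly):

        def zero_edit_rate(mt_segments, postedited_segments):
            """Fraction of segments the post-editor left unchanged,
            ignoring whitespace differences."""
            assert len(mt_segments) == len(postedited_segments)
            unchanged = sum(
                " ".join(mt.split()) == " ".join(pe.split())
                for mt, pe in zip(mt_segments, postedited_segments)
            )
            return unchanged / len(mt_segments)

        mt = ["Das Hotel war sauber .", "Der Zimmer war klein ."]
        pe = ["Das Hotel war sauber .", "Das Zimmer war klein ."]
        print(f"Zero-edit rate: {zero_edit_rate(mt, pe):.0%}")  # -> 50%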

    Customers who do the work to learn how to use the system develop reusable internal capabilities. Their localization engineers, who do not have PhDs in SMT computational linguistics, develop their own expertise and apply their new skills across multiple projects. Each new project requires less work and generates better results. They are not alone when they encounter challenges they can't solve: we step in with our expertise or refer them to an appropriate resource.

    Their results, however, do not come free. These customers have bought DoMT licenses (some of them have bought 3 or 4 licenses). Many of these customers paid for training and extended support. Their total cost is typically less than the price of outfitting a professional graphic artist workstation and paying for Photoshop training. Note that these are fixed, one-time expenses without a need to pay additional per-word fees or royalties.

    In summary, a Moses baseline system with "shallow customization" is capable of generating consistent results in the 40-45% zero edit range. I say KUDOS to Philipp Koehn and the entire Moses team! Anyone who says this has no value clearly disrespects the Moses team.

    As you note Kirti, Philipp Koehn cautions, "There are considerable amounts of additional functionality, beyond providing a web based user interface for Moses, that are not included in Moses that are essential in order to offer a strong and innovative commercial MT platform." Therefore, I also say KUDOS to the DIY vendors (including ourselves) who are developing those commercial extensions, filling the gaps in the academic code base and expanding the availability of SMT to non-academic users. I agree with you, Kirti, to the extent that knowledgeable users with experience generate better SMT results than novice users. Now, more localization engineers (and translators) are using the systems, gaining experience and developing their own expertise because of the DIY vendors.


  12. So, let's compare "shallow customization" with your "deep customization," but without the emotions and feelings:

    Shallow
    zero edit: 40-45%
    expertise & skills: grows in-house
    cost: less than $5k (unlimited engines, no additional fees)

    Deep
    zero edit: 50% or more
    expertise & skills: outsourced
    cost: €25k for two engines (additional fees?)

    In my opinion, "deep customization" is setting itself up to be a premium value brand. There's nothing wrong with that. Rolls Royce, Pagani, Piaget and Prada are all premium brands, but I don't see them trying to build themselves up by tearing down Ford, Mercedes, Timex and Hush Puppies respectively. There are value points on the spectrum for everyone.
    By Tom Hoar

    1. @Tom

      My blog has always been a vehicle for my opinions and independent assessments of the MT technology industry. I have never claimed it was anything else. I leave the comments open and uncensored so that other opinions can also be voiced.

      The overall productivity numbers that our (Asia Online) customers cite are measured independently of us. They generally measure production throughput before and after an MT engine is put in place, and we only cite those situations that customers are willing to share.

      In the Omnilingua case, which was based on very large samples, they (rather than we) determined that their old system delivered 26.8% of segments needing no correction versus 52% for the Asia Online system. I presented this as an example; I am not suggesting that this is the only acceptable measure. Overall productivity and sustainable competitive advantage are more important, I think.

      Your "facts" in the of the Shallow vs Deep comparison are inaccurate, as Asia Online customization fees are a fraction of what you mention. Since the info you have used (who,what, where) is so vague, using it as an example to represent any expert system is not useful. It is quite possible that the MT solution in your example is in fact a DIY situation, probably a European vendor since we do not price in Euro.

      I would modify the industry analogy you present to say that Asia Online is perhaps the VW, Honda or Toyota equivalent (there are no Rolls Royce or luxury equivalents that I know of). Also, some if not many DIY options are more like Yugo or Nano cars, and many of the people who buy the less expensive cars don't know how to drive. And if you know how to drive, both cars can provide adequate transportation.

      I do not deny that in some cases it is possible for DIY solutions to have successful outcomes, but I repeatedly point out that MT only adds value to ANY customer if you have very good systems that provide long-term advantage. I believe the probability of this happening is higher with an expert. If anybody who throws some data into an instant MT maker can get a system of similar capability, there is little business advantage in doing so. Expert systems also tend to improve faster and more continuously.

      You and I have differences of opinion and potential users can make their own assessments based on the information available and what matters most to them.

  13. Gentlemen,
    this discussion is becoming a battle between people from competing companies with the finest minds dueling for victory.
    This is spoiling the valuable content of this topic.
    May I summarize what I found really valuable in the argument.
    There is no need to be a translation practitioner to be an MT expert.
    Expertise, experience and understanding are necessary and critical to get the best results in the MT arena as well as in the traditional translation arena.
    MT must be intended as a translation tool not as a human replacement.
    Many MT customers (indeed, mostly LSPs) are increasingly using Moses because it's free, although it was not intended to be a commercial MT offering. Many of these customers are actually not prepared for any MT experience, and the outcomes of their "tests" reported at many translation industry events seem to confirm this.
    This is harming the whole MT movement, especially as to its acceptance in a translation industry perspective.
    The suitability of the material for MT is still a key success factor.
    The "low-price-often-equates-to-low-value" approach resembles a widespread philosophy of the whole translation industry. Although it can definitely be considered emotion-driven, TAUS case studies do not actually contrast it effectively, possibly as TAUS people forget that the translation industry is mostly made of micro businesses addressing small- and mid-sized customers rather than the big clients that TAUS targets.
    Clean data and metrics are still obscure: there is still too much confusion about what clean data is and how to achieve it, and what BLEU, F-Measure, TER, TED and the rest are for (a minimal BLEU example follows this list).
    25% to 45% is much too wide a range to be any actual reference, and such results could commonly be achieved with state-of-the-art fuzzy match techniques, especially with 150,000 to 300,000 segment memories.
    The same goes for projects of a 3.32 DQF score and 45% error rate. It is just a rough BLEU score for a basic MT engine.
    Consistency is the real premium in any MT project, but this can also be achieved (possibly better) with an RBMT engine working with accurate terminology and rules, and on a suitable text.
    Finally, all the arguments about cost do not take time into account. It takes a lot of time to implement and configure any MT engine (especially any Moses-based engine) to work properly and effectively, and very few customers can afford the cost of dedicated staff for a mid-term project with uncertain outcomes.
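
    As promised above, a minimal BLEU computation using NLTK (an assumption: NLTK is installed via pip; production evaluations use standardized tooling over whole test sets, not single sentences):

        from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

        # BLEU is essentially n-gram overlap between a hypothesis and a
        # reference translation, with a penalty for overly short output.
        reference = ["the", "hotel", "room", "was", "very", "clean"]
        hypothesis = ["the", "hotel", "room", "was", "clean"]

        score = sentence_bleu([reference], hypothesis,
                              smoothing_function=SmoothingFunction().method1)
        print(f"Sentence BLEU: {score:.2f}")  # higher = closer overlap
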
    I would very much like all of you to write down your views on these topics in an article for The Big Wave, rather than confronting each other from your respective companies' standpoints.
    Thank you.

  14. I don't have time for a long reply to the many points and counterpoints, but I do like John Moran's thought here:

    "I would say this is a good example that strengthens the case for early stage low-cost or no-cost demo systems that output a few thousand words rather than pay and pray."

    I couldn't agree more, John!

    We offer no-risk pilots on the engines we build (e.g. Moses, Systran, Microsoft Hub) because we find companies need a quick evaluation of the gains before committing their time and resources. That makes sense to us so we're happy to oblige them!


    By Lori Thicke

    1. Lori,

      It would be quite easy for most people to run and test a Moses, Systran or Microsoft Hub engine on their own (so many instant MT options exist today), so I am not sure why they would need any assistance to do this. It does not make sense to me why they would ask somebody to do what they could easily do themselves, especially if they are an LSP.

    2. Dear Kirti,

      You say: "It would be quite easy for most people to run and test a Moses, Systran or Microsoft Hub engine on their own, (so many instant MT options exist today) so I am not sure why they would need any assistance to do this. It does not make sense to me why they would ask somebody to do what they could easily do themselves, especially if they are an LSP."

      Really, Kirti?

      I said: "We offer no-risk pilots on the engines we build (e.g. Moses, Systran, Microsoft Hub) because we find companies need a quick evaluation of the gains before committing their time and resources. That makes sense to us so we're happy to oblige them! "

      I did say we build engines. I didn't say we went online and threw enterprise content, which could well be confidential, into someone else's engine.
      By Lori Thicke

  15. @Lori, please read my response to John. None of the three examples that John references from Asia Online would have been suitable for a quick measurement. The Sajan and Advanced Language Translation engines had very little data. Without data manufacturing the results would have been terrible. In one language pair there were fewer than 10,000 segments of TM. After full customization, Sajan achieved a 328% productivity gain and ALT had significant productivity gains.

    Today I loaded the data from ALT into a foundation engine with just their data and no data manufacturing applied. The difference was nearly 26 BLEU points. Data manufacturing and normalization made a huge difference to the quality. Without them, the quality was so bad that it was unusable.

    If the MT vendor is not doing more advanced tasks such as data manufacturing and deep cleaning and normalization, then you are correct: you will get a rough representation of whether the engine will work or not. DIY vendors do not have this capability in their current systems, as data manufacturing requires not only technology but human cognition that provides a deep understanding of the goals and the data. In our case, Language Studio™ linguists analyze the goals and data and determine a custom training plan for each and every engine in order to deliver the best results.

    The third example compared SDL BeGlobal against Asia Online Language Studio. A full customization was needed in order to do this comparison. A quick DIY translation would not have delivered on the goals of the client. The client wanted to determine if they should replace their bespoke SDL system. They spent a lot of time on human and automated metrics. This effort would have been wasted if the system had not been fully customized with a deep customization.

    When we customize engines, only about 20% of our customers bring sufficient data in the form of their own TMs to the table. The other 80% often have 50,000-100,000 segments. This volume is not sufficient to deliver a high quality engine. You could certainly load that data into a DIY solution and get a result. However, the result would most likely be disappointing. When a full customization is performed, the quality is greatly improved.

    We work with our customers on a Quality Improvement Plan. Often there is a significant jump in quality between V1 and V2 of an engine, frequently 5-10 BLEU points. Language Studio linguists analyze the issues and work with the client to further normalize data, manufacture new data and create rules for processing and fine-tuning the input and output. We call V1 of our engines the Diagnostic stage; many issues are not identified until this stage. Our customers then have the ability to manipulate both their data and our foundation data to deliver higher quality output based on an understanding of the issues in the V1 engine. This is why there is frequently such a significant jump in quality between V1 and V2. This kind of task is not possible with a DIY MT approach, as the only data the client can address is their own. Additionally, very few have the skills or tools to do this, even if they were able to access all the data.

  16. Many of our customers have in-domain data (e.g. Automotive), but not data in the domain of the new project they intend to work on. For example, one of our customers recently had 500,000 segments of Automotive User Manual data, but their new project was for Automotive Technical Engineer manuals. If they had uploaded this into a DIY system it would again not have delivered the results they desired. Language Studio™ Data Manufacturing was used to identify over 10,000 technical terms (e.g. Lateral Acceleration Sensor, A/C drain hose and steering intermediate shaft) that were not in the client's TMs or the foundation data. The language pair was German to Slovenian. In this case, uploading into a DIY system would not have delivered anything remotely like the customized system, as nearly every sentence to be translated would have contained unknown words. In an SMT system, when there is an unknown word or term, not only is it left in the source language, but it also breaks the statistical patterns and destroys fluency as a result. Adding that term can change the sentence an SMT engine produces in its entirety.
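
    (As a hedged illustration of how such vocabulary gaps can be spotted before training, here is a minimal sketch; the file names and naive tokenization are simplifying assumptions, not a description of Language Studio's actual tooling.)

        from collections import Counter

        def tokenize(text):
            # Naive lowercase/whitespace tokenization; real MT pipelines do far more.
            return text.lower().split()

        # Hypothetical inputs: existing TM source side vs. new-project source text.
        with open("tm_source_segments.txt", encoding="utf-8") as f:
            vocab = {tok for line in f for tok in tokenize(line)}

        oov_counts = Counter()
        total_segments = affected_segments = 0
        with open("new_project_source.txt", encoding="utf-8") as f:
            for line in f:
                total_segments += 1
                unknown = [tok for tok in tokenize(line) if tok not in vocab]
                if unknown:
                    affected_segments += 1
                    oov_counts.update(unknown)

        print(f"{affected_segments}/{total_segments} segments contain unknown tokens")
        print("Most frequent unknown tokens:", oov_counts.most_common(20))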

    The previous example, where the desired domain differs from the domain of the existing TMs, is similar to the situation where a customer uploads a mixture of TMs from many projects. Without Data Manufacturing tools and processes, the data is inconsistent and has holes in coverage. DIY MT systems don't deal with this; they simply apply the data that was uploaded by the client. Human cognition is required so that the data can be prepared to deliver on the client's quality goals.

    Bottom line: DIY is only useful as a quick indicator of likely quality if the data volume provided by the client is sufficient to deliver quality. The majority of users do not have the skills to manipulate their own data, as they do not understand MT systems, how their internals work, or what impact their data will have. The majority of users also have insufficient data or mixed data from multiple clients/domains. Loading such data into a DIY system will give a totally different result than processing the same data with a full customization that involves human cognition of the issues and the solutions.

    There is one risk with DIY systems that seems to be totally overlooked, and that my response above and my response to John clearly demonstrate: loading data into a DIY system will not produce anything remotely like what a full, deep customization based on human cognition can deliver. As most users do not have data that is clean enough or plentiful enough to give high levels of quality without Data Manufacturing and refinement/normalization, the result will be very misleading and may cause a project not to proceed. Such a project may have been perfectly viable, as was the case with ALT, yet the DIY result from 10,000 segments suggested it was not.

  17. Apologies for the delay in my response. I was attending ELIA in Malta and last week was catch-up.

    @Dion

    I agree that for *some* projects work is needed to get from a point where MT wastes a translator's time to one where it speeds them up. Tools to gather more training data can help. However, on my travels I hear of very good results from DoMT for customers who are self-sufficient or want a test-bed to reduce the risk of vendor lock-in. I am trying it out myself in a small new specialist agency and I plan to publish a comparison with various MT systems. I also see good productivity test data from Microsoft Translator Hub, which many are not aware does domain adaptation on client TMX files. To me, Hub and DoMT are two ends of a spectrum in terms of data privacy, but low cost certainly does not equate to low quality. Here is an industrial research paper that compares Bing just prior to the release of Hub to a few other systems: http://mt-archive.info/MTS-2011-Roturier.pdf
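
    (For readers unfamiliar with the mechanics, domain adaptation of this kind starts from the bilingual segments inside a TMX file. Below is a minimal, hypothetical sketch of extracting those pairs with Python's standard library; Hub's actual ingestion pipeline is of course more involved, and the file name and language codes are invented.)

        import xml.etree.ElementTree as ET

        XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

        def read_tmx(path, src_lang="en", tgt_lang="de"):
            """Yield (source, target) segment pairs from a TMX file."""
            tree = ET.parse(path)
            for tu in tree.iter("tu"):
                segs = {}
                for tuv in tu.iter("tuv"):
                    lang = (tuv.get(XML_LANG) or "").lower()
                    seg = tuv.find("seg")
                    # Note: this ignores inline markup inside <seg>.
                    if seg is not None and seg.text:
                        segs[lang.split("-")[0]] = seg.text.strip()
                if src_lang in segs and tgt_lang in segs:
                    yield segs[src_lang], segs[tgt_lang]

        # Hypothetical usage: collect training pairs for domain adaptation.
        pairs = list(read_tmx("client_memory.tmx"))
        print(f"Extracted {len(pairs)} bilingual segments")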

    You say only 20% of customers bring sufficient data to the table to train an MT engine, but we can't verify this number any more than we can verify that the oft-cited translator who sent you a mail saying she post-edited 24k words in a day actually did this, or that +300% productivity gains are a norm that can be expected. IBM say they manage to achieve +17%, and they pretty much invented SMT and have a high level of control over their supply chain. Though variance is high, +30% is closer to the average that I see and hear about in high-quality translation. I am not suggesting anything you say is untrue, and I accept that +300% may well be possible for light PE (how light?) of UGC, or that a person can read 24k words and change a few sentences in a day. It just seems that these outliers are presented so often they could be misread as norms by the casual reader, and I worry this might be the intent.
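
    (To make the arithmetic behind these percentages concrete, a small hypothetical sketch; the throughput figures are the ones quoted in this thread, and the baseline of 2,500 words/day for unaided human translation is an assumption for illustration only.)

        def productivity_gain(ht_words_per_day, pe_words_per_day):
            """Percentage gain of post-editing throughput over human translation."""
            return 100.0 * (pe_words_per_day - ht_words_per_day) / ht_words_per_day

        baseline = 2500  # assumed words/day for unaided human translation
        for pe_rate in (2925, 3250, 10000, 24000):  # +17%, +30%, +300%, and the 24k/day claim
            print(f"{pe_rate:>6} words/day -> {productivity_gain(baseline, pe_rate):+.0f}%")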

    Promises that an SMT system will improve with use are always true if you retrain on post-edited segments. Unfortunately, it is not likely to be much of a relative improvement: the training data always vastly outweighs the post-edited data (though research work is underway here). The big question is how you get to the tipping point where translators agree to use the MT output so you can gather that PE data, and what project size, throughput and quality expectations you need before you can afford to pay people to manually look at data as in the Sajan example. It all comes down to ROI.
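
    (A back-of-the-envelope illustration of the dilution effect: if post-edited segments are simply appended to the training corpus, their share of the data stays tiny. The corpus sizes below are invented for illustration.)

        training_corpus = 2_000_000  # foundation + client TM segments used to train V1
        post_edited = 15_000         # segments post-edited since launch

        share = post_edited / (training_corpus + post_edited)
        print(f"Post-edited data is {share:.1%} of the retraining corpus")  # ~0.7%
        # Without upweighting, such a small share has marginal influence on the model.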

    I'm optimistic. Even using CSA statistics, which I feel are skewed towards the bulk market that already uses MT, MTPE is at most 5% of the total translation market. However, I can see at first hand that it is growing. Celer Solutions in Spain engaged early with academia on SMT and RBMT and say they use it for 29% of their high-quality translation turnover (not light PE), so I have this in my head as the current upper bound for a mid-sized MLV.

    As the market expands, both through better MT and, more importantly, better means of matching technology to content AND people, there is room for everyone, so I don't see the point in making sweeping statements about the quality of the competition. It is a complex emerging niche within the translation industry...

  18. ...My main concern is for the early adopters I meet at conferences who tell me they are sitting on expensive MT systems from various different MT companies where they have not seen *any* of their time or financial investment recouped (because they have not reached the tipping point of close to 0% HT/MT speed difference).

    At ELIA, one large agency owner told me that he himself post-edited around 4k words in an afternoon using an MT system that cost the guts of $100k but the system itself has achieved no ROI thus far. You will be pleased to hear this was not an AO system.

    What does this tell us? Either the agency owner is a better post-editor than most of his employees or the text at hand was suitable for MT post-editing but other content in the company for that language pair is not. Either way, I would say he should turn his mind to achieving at least some ROI on the existing system by identifying the people that can post-edit fast and well before he spends a lot of money trying to improve the system by small degrees that are hard to measure. He may have to bear with them as they adapt to the new task though.

    I would summarize by saying that, from what I see, when it comes to large financial outlays for MT, particularly for LSPs where margins are tight and MT might look tempting as a temporary respite from the war on rates, it makes more sense to negotiate to pay the ferryman as little as possible before he gets you to the other side.

  19. John, apologies for the delayed response also. I had not checked back here in some time and only just saw your reply. Systems like DoMT and Microsoft's are labeled "Do It Yourself" systems and, as such, directly imply that one knows how to do it oneself. Simply uploading data is not enough, and that is all users of these systems typically do or typically know how to do.

    What the DIY tools now flooding the market have enabled is the ability for users to make low quality MT engines very easily. As an example, we had a recent project with IOLAR translating from German to Slovenian (not an easy language pair) in the technical engineering domain. IOLAR tried for 6 months with Moses and even hired a computational linguist to work on the project, but still could not get output that beat Google's, and what they got was unpredictable. Google's output was also bad and unusable.

    They decided to try Language Studio and, in a matter of weeks and at a fraction of the cost spent over the 6-month Moses effort, not only got a high quality system operational, but achieved quality as good as a human's in many cases. The case study can be seen here: http://www.asiaonline.net/EN/Resources/CaseStudies/IOLAR1.aspx

    The issues the client faced and how we resolved them are outlined in the case study. There was a lot more required than simply uploading the data. Simon Bratina is on record as a result saying "From a business perspective it was clear that outsourcing to an expert was a better strategy than a DIY struggle, and I would say that our investment in Asia Online's Language Studio™ technology was one of the best technology investments that we have made."

    With respect to the 300% productivity gain, it is actually quite common and is most definitely not an edge case. This is possible because of the granular approach to machine translation that we take, as outlined in this article: http://www.asiaonline.net/EN/LanguageStudio/CustomTranslationEngines.aspx#HumanGuidedCustomTranslationEngines

    Additionally, the Clean Data SMT model that Asia Online uses enables much higher quality output and much more control. There are many who claim to do Clean Data SMT but basically clean only the formatting tags. Clean Data SMT is defined more clearly here: http://www.asiaonline.net/EN/Resources/Articles/CleanDataSMT.aspx. There are 4 key tenets required in order to do Clean Data SMT, and none of them have anything to do with removing formatting information; they relate to the quality, appropriateness and consistency of data that is refined for a purpose. With the right tools, this approach delivers excellent results. With the right knowledge provided by Language Studio linguists, the optimal approach to quality is defined for each engine, with a unique customization training plan designed to meet the client's goals, writing style, preferred terminology, target audience and purpose.

    As an example of this, another recent customer ran a blind test set of 1,000 segments. The result: 369 segments were perfect word-for-word matches to the human reference; 310 were perfect (required no editing) but differed from the human reference; 30 were perfect and actually better translations than the human reference; 230 required minor edits; and 61 required a medium level of editing. No segments at all required retranslation. On the same data, Google got 8 perfect matches with the human reference; 270 were perfect, requiring zero edits; none were better than the human reference; 111 required minor edits; 291 required medium edits; and 320 required complete retranslation. This is not an unusual level of difference for a fully customized engine. I can list many others that had similar or even better results; several of them are on our website as case studies, with the client speaking in their own words on a live webinar recording.
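
    ("Perfect match / minor edit / medium edit / retranslation" buckets in tests like this are typically derived from a similarity or edit-distance measure between the MT output and the reference. Below is a minimal hypothetical sketch using Python's standard library; the thresholds are invented for illustration and are not Asia Online's published criteria.)

        from difflib import SequenceMatcher

        def bucket(mt_output, reference):
            """Classify a segment by character-level similarity to the reference."""
            ratio = SequenceMatcher(None, mt_output, reference).ratio()
            if ratio == 1.0:
                return "perfect match"
            if ratio >= 0.90:  # illustrative threshold
                return "minor edits"
            if ratio >= 0.60:  # illustrative threshold
                return "medium edits"
            return "retranslation"

        pairs = [
            ("Das Auto ist rot.", "Das Auto ist rot."),
            ("Das Auto ist rot.", "Der Wagen ist rot."),
        ]
        for mt, ref in pairs:
            print(bucket(mt, ref), "->", mt)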

  20. ...continued from previous message.

    “Promises that an SMT system will improve with use are always true if you retrain on post-edited segments. Unfortunately, it is not likely to be much of a relative improvement.” – I agree with this for the Dirty Data SMT model, but it works very differently with the Clean Data SMT model, as demonstrated here: http://www.asiaonline.net/EN/Resources/Articles/CleanDataSMT.aspx#QualityImprovement

    There is often talk of the large financial outlays required when going with a specialized MT vendor such as ourselves, and thus an attempt is made with DIY tools. At the same time, there is a lack of understanding of the skill set required, as outlined here: http://www.asiaonline.net/EN/Resources/Articles/UnderstandingTheDifferenceBetweenFullCustomizationandDIYCustomization.aspx#SkillsNeeded. Achieving a quality level that is usable for efficient post-editing is clearly not the simple task that TAUS and third-party DIY proponents try to convey. The cost of experimenting with such systems often exceeds that of a professional system, and the risk of failure is notably higher. Anyone can build a low quality MT engine; it takes skill and experience to develop a high quality one.

    While some DIY Moses efforts are successful, few DIY Moses users know how to address, or even identify, the cause of problems when they occur, even if they have some knowledge or training in the core technological concepts. Moving beyond the initial problems in a DIY Moses custom engine is a significant challenge, even with expert NLP specialists or computational linguists on staff. Skills in understanding data, not just algorithms and tools, are required to address the challenges of adapting, refining and creating data, whether preemptively or as a remedy to issues.

    Without a deep understanding of the causes of problematic machine translation output and the corrective strategies to remedy them, the only improvement path available to most DIY Moses users is to upload post-edited machine translations or additional translation memories. As there is little or no understanding of the impact the new data will have, the issues often are not resolved, and in many cases new issues are introduced.

    The case studies in this reply and in my earlier message show clearly that there are many cases where simply uploading data would not have produced a high quality engine. There are also cases like ALT (http://www.asiaonline.net/EN/Resources/CaseStudies/AdvancedLanguageTranslation1.aspx), where the client had almost no data at all, yet we were able to develop a high quality custom engine for them with our data manufacturing.

    It is clear that even those users who have data would get a much better engine if that data were supplemented with additional data that is domain specific, client specific and writing-style specific, and then refined and normalized. This is what the Language Studio Advanced Data Manufacturing and Clean Data SMT approach is all about. Uploading data into a DIY system may produce an engine, but it will never produce an optimal engine without human understanding applied to it and refinement for a purpose.
