Friday, July 25, 2014

Understanding The Drivers of Success with the Business Use of Machine Translation

We have reached a phase where there is a relatively high level of acceptance of the idea that machine translation can deliver value in professional translation settings. But as we all know, the idea and the reality can often be far apart. It would be more accurate to say that this acceptance of the idea that MT can be valuable is limited to a select few among large enterprise users and LSPs (the TAUS community), and has yet to reach the broad translator community, who continue to point out fundamental deficiencies in the technology or share negative experiences with MT. So while we see growth in the number of attempts to use MT, as it has gotten mechanically easier to do, there is also more evidence that many MT initiatives fail to achieve sustainable efficiencies in terms of real translation production value.

In a typical TEP (Translate-Edit-Proof) business translation scenario, a "good" MT system will provide three things to be considered successful:

1) Faster completion of all future translation projects in the same domain
2) Lower cost/word than doing it without the MT system
3) Better consistency on terminology especially for higher volume projects where many translators need to be involved

All of this should happen with a final translation delivered to the customer that is indistinguishable in terms of quality from a traditional approach where MT is not used at all.

It is useful to take a look at what factors underlie success and failure in the business use setting, and thus I present my (somewhat biased) opinions on this as a long-time observer of this technology (largely from a vendor perspective). I think that to a great extent we can already conclude that MT is very useful to the casual internet user, and we see that millions use it on a regular basis to get the gist of multilingual content they run into while traveling across websites and social platforms. (e.g. I use it regularly in Facebook.)

What are the primary causes of failure with MT deployments in business translation settings?

Incompetence with the technology: The most common reason I see for failed deployments is that key users lack an understanding of how the technology works. Do-it-yourself (DIY) tools that promise that all you need to do is upload some data and press play are plentiful, and often promise instant success. But the upload-and-pray approach rarely provides any real satisfaction or business advantage. Unfortunately, the state of the technology is such that some expertise and some knowledge are required. The translators and post-editors who have to work with the output of these lazy Moses efforts are expected to clean up and somehow fix this incompetence, usually at lower wage rates. And thus resentment grows, and many are speaking up frequently in blogs and professional forums about bad MT experiences. Those who have positive MT experiences rarely speak up in these forums, since the work is not so different from regular TM-based translation work and MT is often regarded as just another background tool that helps to get a project done faster and more consistently. MT output that does not provide cost and turnaround advantages for translation work cannot be considered useful for any professional use. Thus, a minimum requirement for using MT in professional settings is that it should enhance the production process.


Lowering cost is the ONLY motivation: The most naïve agencies simply assume that using MT, however incompetently, is a way to reduce the cost of getting a translation project done, or more accurately a way to justify paying translators less. Thus the post-editors are often in a situation where they have to clean up low-quality MT output for very low wages. Given that we live in a world where the customers who pay for professional translation are asking for more efficient translation production, i.e. faster and cheaper, agencies are being forced to explore how to do this, but this exploration needs to happen from a larger vision of the business. As Brian Solis points out, using technology without collaboration and vision is unlikely to succeed (emphasis mine).
"That's the irony about digital transformation, it doesn't work when in of itself technology is the solution. Technology has to be an enabler and that enabler needs to be aligned with a bigger mission. We already found that companies that lead digital transformation from a more human center actually bring people together in the organization faster and with greater results," Solis says. “When technology is heralded above all else, there becomes an even greater disconnect between employees (translators)  and the challenges that their business is trying to solve.”
What many LSPs fail to understand is that their customers are asking for ongoing efficiencies, and new production models to handle the new kinds of translation challenges they face in their businesses. They are not just asking for a lower rate for a single project. Agencies focused on the bigger picture are asking questions like how MT can enable them to achieve new things and what's different about their customers' needs today versus yesterday. With the right MT strategy in place, technology becomes an enabler, not the answer, and enables agencies to build strong long-term relationships with customers who could not get the same price/performance from another agency that does not understand how to leverage technology for these new translation challenges. Agencies must evolve and reimagine their internal process, structure and culture, among their own employees and translators, to match this evolution in customer behavior.

No engagement with key stakeholders: Many if not all of the bad MT experiences I hear about have one thing in common: very poor communication between the MT engine developers (LSP), the customer, and the translators and editors. MT is as much about new collaboration models as it is about effective engine development, and collaboration cannot happen without open and transparent communication, especially during the initial learning phase when there is a great deal of uncertainty for all concerned. If this communication process is in place in the early projects, it enables everybody to rise together in efficiency, and the work gets easier, more streamlined and more accurately predictable with each successive MT project. The communication issue is quite fundamental and I have tried to address and explore this in a previous post.

What are the key drivers of successful deployments of MT?


Expert MT Engine Development: The building of MT engines has gotten progressively easier in terms of raw mechanics, but the development of MT engines that provide long-term competitive advantage remains a matter of deep expertise and experience. If, as an LSP, you instantly create an MT engine that any of your competitors could duplicate with little trouble, you have achieved very little. The odds of a developer who has built thousands of engines producing a competitive engine are much higher than those of someone who uploads some data and hopes for the best. Skillful MT engine development is an iterative process where problems are identified and resolved in very structured development cycles so that the engine can improve continuously with small amounts of corrective feedback. Knowing which levers to pull and adjust to solve different kinds of problems is critical to developing competition-beating systems. Really good systems that are refined over time will continue to provide price/performance advantages over the long term that competitors will find difficult to match.


Engaged Project Managers and Key Translators: The most valuable feedback to enhance MT system output will come from engaged PMs and translators who see broad error patterns and can help develop corrective strategies for these errors. Executives should always strive to ensure that these key people are empowered, and encourage them to provide feedback in the engine development process. For most PMs today, MT is a new, unknown and unpredictable element in the translation production process. Thus in initial projects, executives should allow PMs great leeway to develop the critical skills necessary to understand and steer both the translators and the MT engine developers. These new skills are key to success and can help build formidable barriers to competition. While very large amounts of high quality data can sometimes produce excellent MT systems, a scenario where you have a good project manager steering the MT developers and coordinating with translators, to ensure that key elements of an upcoming project are well understood, will almost always produce favorable results, other things being equal, especially in challenging situations like very sparse data or tough language combinations.

Communication and collaboration are key to both short and long-term success. The worst MT experiences often tend to be with those LSPs (often the largest ones) where communication is stilted, disjointed and focused on CYA scenarios rather than getting the job done right. Successful outcomes are highly likely when you combine informed executive sponsorship and expert MT engine development with empowered PMs who communicate openly and frequently with key translators to ensure that the job characteristics are well understood and that outcomes have high win-win potential. Even really good MT output can fail when the human factors are not in sync. Remember that some translators really don’t want to do this kind of work and forcing them to do it is in nobody’s interest.

Fair & Reasonable Compensation for Post-Editors: I have noted that a blog post I wrote on this issue almost 30 months ago continues to be amongst the most popular posts I have written. This is an important issue that needs to be properly addressed with a basic guiding principle: pay should be related to the specific difficulty of the work and the quality of the output. So low quality output should pay higher per word rates than very high quality output. This means that you have to properly understand how good or bad the output is in as specific and accurate terms as possible, since people’s livelihoods are at stake. This accuracy can be gauged in terms of average expected throughput, i.e. words per hour or words per day. You may have to experiment at first and be prepared to overpay rather than underpay. Make sure that translators are involved in the rate setting process and that the rate setting process is clearly communicated so that it is trusted rather than resisted. Translators should also ask for samples to determine whether a job is worthwhile or not. The worst scenario is where an arbitrary low rate is set without regard for the output quality, and typically in these scenarios incompetent MT practitioners tend to go too low on the rates, resulting in discontent all around.
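The throughput-based principle above can be sketched as simple arithmetic. This is only an illustration, not a rate card: it assumes that a fair per-word rate is one that lets a post-editor reach some agreed target hourly income at the throughput actually measured on a sample, and every number and function name below is hypothetical.

```python
def post_edit_rate_per_word(target_hourly_income: float, words_per_hour: float) -> float:
    """Per-word rate needed to reach a target hourly income at a measured throughput.

    Assumption for illustration: pay is set so that hourly earnings stay the
    same regardless of MT output quality, so slower (harder) work pays more per word.
    """
    if words_per_hour <= 0:
        raise ValueError("words_per_hour must be positive")
    return target_hourly_income / words_per_hour


def effective_hourly_income(rate_per_word: float, words_per_hour: float) -> float:
    """What a translator would actually earn per hour at an offered per-word rate."""
    return rate_per_word * words_per_hour


# Hypothetical figures: the same target income, two different MT output qualities.
TARGET_HOURLY_INCOME = 40.0
SAMPLES = [
    ("low-quality MT output", 500),    # measured words per hour on a sample
    ("high-quality MT output", 1000),
]

for label, words_per_hour in SAMPLES:
    rate = post_edit_rate_per_word(TARGET_HOURLY_INCOME, words_per_hour)
    print(f"{label}: {words_per_hour} words/hour -> {rate:.3f} per word")
```

A translator can run the same arithmetic in reverse on a sample before accepting a job: `effective_hourly_income(offered_rate, measured_words_per_hour)` shows whether an offered rate is worthwhile at the throughput the sample actually allows.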


Real Collaboration & Trust Between Stakeholders: This may be the most critical requirement of all, as I have seen really excellent MT systems fail when this was missing. Translation is a business that requires lots of interaction between humans with different goals, and if these goals are really out of sync with each other it is not possible to achieve success from multiple perspectives. Thus we often see translators who feel they are being exploited, or agencies who feel they are being squeezed to offer lower rates because an enterprise customer has whipped together some second-rate MT system with lots of noisy data for them to “post-edit”. When the technology is used (actually misused) in this way, it can only result in a state of disequilibrium that will try to correct itself or make a lot of noise trying to find balance. This, I think, is the reason why so many translators protest MT and post-editing work. There are simply too many cases of bad MT systems combined with low rates, and thus I have tried to point out how a translator can assess whether a post-editing job is worth doing, from an economic perspective at least.

Perhaps what we are witnessing at this stage of the technology adoption cycle is akin to growing pains, like the clumsy first steps of a baby or the shyster attempts of some agencies to exploit translators as some translators have characterized it. Both cases are true I feel. And so I repeat what I said before about building trusted networks as this seems to be an essential element for success.

The most successful translators and LSPs all seem to be able to build “high trust professional networks”, and I suspect that this will be the way forward i.e. collaboration between Enterprises, MT developers, LSPs and translators who trust each other. Actually quite simple but not so common in the professional translation industry.

There seems no way to discuss the use of MT in professional settings without raising the ire of at least a few translators as you can see from some of the comments below. So I thought it might be worth trying to lighten the general mood of these discussions with music. I chose this song carefully as some might even say the lyrics are quite possibly the result of machine translation or not so different from what MT produces. As far as I know it is just one example of the poetic mind of Bob Dylan. If you can explain the lyrics shown below you are a better interpreter and translator than I am. Musically this is what I would call a great performance and a good vibe. So here you have a rendition of Dylan's My Back Pages on the Empty Pages blog.

Crimson flames tied through my ears
Rollin’ high and mighty traps
Pounced with fire on flaming roads
Using ideas as my maps
“We’ll meet on edges, soon,” said I
Proud ’neath heated brow
Ah, but I was so much older then
I’m younger than that now

Half-wracked prejudice leaped forth
“Rip down all hate,” I screamed
Lies that life is black and white
Spoke from my skull. I dreamed
Romantic facts of musketeers
Foundationed deep, somehow
Ah, but I was so much older then
I’m younger than that now

Girls’ faces formed the forward path
From phony jealousy
To memorizing politics
Of ancient history
Flung down by corpse evangelists
Unthought of, though, somehow
Ah, but I was so much older then
I’m younger than that now

A self-ordained professor’s tongue
Too serious to fool
Spouted out that liberty
Is just equality in school
“Equality,” I spoke the word
As if a wedding vow
Ah, but I was so much older then
I’m younger than that now

In a soldier’s stance, I aimed my hand
At the mongrel dogs who teach
Fearing not that I’d become my enemy
In the instant that I preach
My pathway led by confusion boats
Mutiny from stern to bow
Ah, but I was so much older then
I’m younger than that now

Yes, my guard stood hard when abstract threats
Too noble to neglect
Deceived me into thinking
I had something to protect
Good and bad, I define these terms
Quite clear, no doubt, somehow
Ah, but I was so much older then
I’m younger than that now

Friday, June 20, 2014

The Expanding Translation Market Driven by Expert Based MT

There has been much talk amongst some translators about how MT is a technology that will take away work and ultimately replace them, and thus some translators dig in their heels and resist MT at every step. The antagonistic view is based on a zero-sum game assumption that if a computer can perform a translation that they used to do, it inevitably means less work for them in future. In some cases this may be true, however this presumption is worth a closer look.

While stories of MT mishaps and mistranslations abound (we all know how easy it is to make MT look bad), it is becoming increasingly apparent to many in the professional translation business that it is important to learn how to use and extend the capabilities of this technology successfully, as the technology also enables new kinds of translation and linguistic engineering projects that would simply be impossible without viable and effective implementations of expert MT technology. Generally, MT is not a wholesale replacement for humans and in my opinion never will be. When properly implemented, it is a productivity enhancer and a way to expand the scope of multilingual information access for global populations that can benefit from this access.

MT is in fact as much or more a tool/technology to create new kinds of translation work as it is a tool to get traditional translation work done faster and more cost effectively. While MT is unlikely to replace human beings in any application where translation quality and semantic finesse are really important, there are a growing number of cases that show that MT is suitable for enabling many new kinds of business information translation initiatives that may in fact generate whole new kinds of translation related work for some if not all translators. MT is already creating new kinds of translation work opportunities in all the following scenarios:

  • With high volume content that would just not get translated via traditional human translation modes for economic and timeliness reasons, and thus the use case scenario is either use MT or do nothing. MT lowers total costs enough to make content viable to translate that would otherwise never have been translated. This in turn has created new work for human translation professionals in editing the most critical content and helping to raise the average quality of expert MT output.
  • With content that cannot afford human translation because the value of the information is clearly not worth the typical human translation cost scenario.
  • High value content in social networks that is changing every hour and every day and has great value for a brief moment, but has limited value a few weeks after the fact.
  • Knowledge content that facilitates and enhances the global spread of critical knowledge.
  • Content that is created to enhance and accelerate information access to global customers, who prefer a self-service model as in technical support knowledge base databases which have new content streaming in on a daily basis.
  • Content that does not need to be perfect but just approximately understandable for exploratory or gist purposes.
One point worth clarifying upfront is that much of the interest in MT by global enterprises is driven by their need to face the barrage of product/service related comments, discussions and opinions that flow in social media and influence how customers view their products. This social media banter is very influential in driving purchase decisions, often much more than corporate marketing communications which are seen as self-serving and self-promoting. Also, as products grow in complexity it becomes important to share more information about power features and extended capabilities. The issue of growth in the sheer volume of information is increasingly clear to most but there are actually translators out there who think the content tsunami is a myth. EMC and IDC have well documented studies that show the continuing content explosion. 

Global enterprises who wish to engage in commerce with global populations have discovered that the control of marketing has shifted away from corporate marketing departments to consumers who share intimate details of real customer experiences. User generated content (UGC), such as product experience related comments in social media (e.g. blogs, Facebook, YouTube, Twitter and community forums), has become much more important to final business outcomes. This UGC is now influencing customer behavior all over the world and is often referred to as word-of-mouth marketing (WOMM). Consumer reviews are often more trusted than corporate marketing-speak and even “expert” reviews. We all have experienced Amazon, travel sites, C-Net and other user rating sites which document actual consumer experiences. This is also happening at B2B levels. It is useful to both global consumers and global enterprises to make this content multilingual. Given the speed at which this information is produced, MT has to be part of the solution for digesting this information and converting it to multilingual modes, to influence and assist global customers in a time frame where it is useful. For those of us who understand the translation challenges of this material, it is clear that involving humans in the expert MT development process, to provide linguistic and translation guidance, will produce better MT output quality. The business value is significant, so I expect that linguists who add value to this conversion process will be valued and sought after.

While some translators see MT as a big bad wolf that looms menacingly around, they fail to see that the world has changed for everybody, especially corporate marketers, PR professionals, and any enterprise sales function facing customers who share information freely with details of personal customer experiences. An individual blogger brought Dell to its knees with a blog post titled Dell Hell. Some say it triggered a huge stock price drop. A viral video about careless baggage handling of musical instruments resulted in a PR nightmare for United Airlines and perhaps even a negative impact on their stock price. This user experience content really matters to a global enterprise and they need strategies to deal with this as it spreads across the globe and influences purchase behavior. As the infographic below (bigger version available by clicking on this link) shows, every time a consumer posts an experience on the web it is seen by 150 people, which means small improvements in brand advocacy result in huge revenue increases, and 74% of consumers now rely on social networks to guide their purchasing decisions. This means that non-corporate content becomes much more important to understand and translate since these experiences are being shared in multiple languages.

This graph details how negative experiences multiply in negative impact, as consumers tend to be much more invested in sharing bad experiences than they are about sharing positive experiences. Thus it is very important that global enterprises monitor social media carefully. This is yet another example of what content really matters and how social media drives purchasing behavior. 

So if all this is going on, it also means that the primary focus of the professional translation industry needs to change from the static content of yesteryear to the more dynamic and much higher volume user generated content of today. The discussions in social media are often where product opinions, brand credibility and product reputations are formed, and this is also where customer loyalty or disloyalty can form, as the customer support experience shows. This is what we call high value content. MT is a critical technology that is necessary as a foundational element for the professional translation world to play a useful role in solving these new translation challenges. However, it is important to also understand that this challenge cannot be solved by any old variant of MT, especially the upload-and-pray approaches of most DIY (Do It Yourself) MT. This is challenging even for experts and failure is par for the course.

Where MT creates new translation work opportunities


Some specific examples of the expanding translation pie that MT enables and drives:

The knowledge base use-case scenario has been well established as something that improves customer satisfaction and empowerment for many global enterprises with high demand technical support information. To develop and improve the quality of the MT translations in knowledge bases, very special linguistic work and translations need to be done. And while we see many examples of translators commenting on the poor quality of the translations we also see that millions of real customers provide feedback to the global enterprise suggesting that they find these “really bad” translations quite useful for their purposes, and prefer that to trying to read a tech note in a language that is not as familiar. Thus, while MT is imperfect we have evidence that many (millions) find it useful. Generic users on the internet are information consumers who have to deal with a language barrier. They are often the customers that global enterprises wish to communicate with. Their growing acceptance of MT suggests that MT has utility in general as a way to communicate with global customers, even though it is clear that a machine’s attempt at translation is rarely if ever as good as a human translation.

We are now also seeing that sentiment analysis based on social media content is increasingly being considered a high value exercise by marketing groups in understanding global markets. To translate international social media content it is useful to understand core terminology, get critical language translations in place, and steer expert MT. These are new kinds of linguistic and translation related work, which involve understanding the behavior of language in specific domains and discussion forums and then building predictive translation models for them. This new linguistic engineering work is an opportunity for progressive translators. New skills are needed here: an understanding of a corpus at a linguistic profile level, and the ability to identify MT error patterns and develop corrective strategies by working together with experts. The objective here is to understand the customer voice by language and develop appropriate marketing response strategies.

We also see growth in the sharing of internal product development information across languages within large global enterprises. Rather than use a public MT engine that can compromise and expose secret product plans, it has become important to develop internal corporate engines that help employees to share documents and presentations in a secure environment and at least get a high quality gist. This effort too benefits from skilled linguistic engineering work, corpus analysis, terminology development and strategic glossary and TM data manufacturing.

Every large translation project that is ONLY done because of the cost/time characteristics that expert-managed MT lends to it will generate two kinds of translation opportunities that would not exist were it not for the basic fact that MT made this content viable and visible in a multilingual context:

  1. Post-editing of the highest value material in a multimillion word corpus
  2. Translation of content that simply would NOT have been considered for translation had MT not made it economically viable and feasible.

So the next time you hear somebody bashing on “MT” ask yourself a few questions:

  1. What kind of MT variant are they talking about as there are many shades of grey? Amateur DIY experiences producing shoddy MT output abound, and translators should learn to identify these quickly and avoid them. Dealing with experts provides a very different experience and allows for ongoing feedback and improvement. MT is a tool that is only as good as the skill and competence of the users and is not suitable for many kinds of high value translation work.
  2. Are you dealing with a client/customer who has a larger vision for expanding the scope of translation? There is likely a bright future with anybody who has a focus on these new massive data volume social media projects.
  3. Are you playing a role in getting information that really matters to customers and marketers translated? While user documentation is still important, it is clear the relative value of this kind of content continues to fall as an element of building great customer experiences. The higher the value of the information you translate to your customer, the higher your value to the client.
But I expect that there will still be many translators who see no scenario in which they interact with MT in any way, expert-based or not, and that is OK, as it is a very different work experience that may not suit everybody. The very best translators can still put machines to shame with their speed and accuracy. But I hope that we will see more MT naysayers base their opinions about MT on professionally focused expert MT initiatives, rather than the well-publicized generic MT and lazy DIY MT initiatives that are much easier to find.

"You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete." - Buckminster Fuller

Friday, May 30, 2014

Monolithic MT or 50 Shades of Grey?

In the many discussions by different parties in the professional translation world involving machine translation, we see a great deal of conflation and confusion because most people assume that all MT is equivalent and that any MT under discussion is largely identical in all aspects. Here is a slightly modified description of what conflation is, from Wikipedia.
Conflation occurs when the identities of two or more implementations, concepts, or products, sharing some characteristics of one another, seem to be a single identity; the differences appear to become lost. In logic, it is the practice of treating two distinct MT variants as if they were one, which produces errors or misunderstandings as a fusion of distinct subjects tends to obscure analysis of relationships which are emphasized by contrasts.
However, there are many reasons to question this “all MT is the same” assumption, as there are in fact many variants of MT, and it is useful to have some general understanding of the core characteristics of each of these variants so that a meaningful and more productive dialogue can be had when discussing how the technology can be used. This is particularly true in discussions with translators, where the general understanding is that all the variants are essentially the same. This can be seen clearly in the comments to the last post about improving the dialogue with translators. Misunderstandings are common when people use the same words to mean very different things.

There may be some who view my characterizations as opinionated and biased, and perhaps they are, but I do feel that in general these characterizations are fair and reasonable and most who have been examining the possibilities of this technology for a while, will likely agree with some if not all of my characterizations.

The broadest characterization that can be made about MT is around the methodology used in developing the MT systems i.e. Rule-based MT (RbMT) and Statistical MT (SMT) or some kind of hybrid as today users of both of these methodologies claim to have a hybrid approach. If you know what you are doing both can work for you but for the most part the world has definitely moved away from RbMT, and towards statistically based approaches and the greatest amount of commercial and research activity is around evolving SMT technology. I have written previously about this but we continue to see misleading information about this often, even from alleged experts. For practitioners the technology you use has a definite impact on the kind and degree of control you have over the MT output during the system development process so one should care what technology is used. What are considered valuable skills and expertise in SMT may not be as useful with RbMT and vice versa, and they are both complex enough that real expertise only comes from a continuing focus and deep exposure and long-term experience. 

The next level of MT categorization that I think is useful is the following:
  • Free Online MT (Google, Bing Translate etc.)
  • Open Source MT Toolkits (Moses & Apertium)
  • Expert Proprietary MT Systems

Free Online MT (Google, Bing Translate etc.)

The toughest challenge in machine translation is the one that online MT providers like Google and Bing Translate attempt to address. They want to translate anything that anybody wants to translate, instantly, across thousands of language pairs. Historically, Systran and some other RbMT systems also addressed this challenge on a smaller scale, but the SMT-based solutions have easily surpassed the output quality of these older RbMT systems in a few short years. The quality of these MT systems varies by language, with the best output produced in Romance languages (FR, IT, ES, PT) and the worst in languages like Korean, Turkish and Hungarian and, of course, most African, Indic and less widely supported Asian languages. Thus the Spanish experience with “MT” is significantly different from the Korean one or the Hindi one.

This is the “MT” that is most visible, and it is the most widely used translation technology across the globe. In fact, it is where most of the translation done on the planet today is done, with users numbering in the hundreds of millions per month. For a professional translator, there are very limited customization and tuning capabilities, but even the generic system output can be very useful to translators working with Romance languages and save typing time if nothing else. Microsoft does allow some level of customization depending on user data availability. We should note that in the many discussions about MT in the professional translation world, most people are referring to these generic online MT capabilities when they say “MT”, and this is also what most translators mean when they complain about “poor MT quality”.

Open Source MT Toolkits (Moses & Apertium)

I will confine the bulk of my comments to Moses, mostly because I know next to nothing about Apertium other than it being an open source RbMT tool. Moses is an open source SMT toolkit that allows anybody with a little bit of translation memory data to experiment and develop a personal MT system. This system can only be as good as the data and the expertise of the people using the system and tools, and I think it is quite fair to say that the bulk of Moses systems produce lower output quality than the major online generic MT systems. This does not mean that Moses users/developers cannot develop superior domain-focused systems, but the data, skills and ancillary tools needed to do so are not easily acquired and, I believe, are definitely missing in any instant DIY MT scenario. There is a growing suite of instant Moses-based MT solutions that make it easy to produce an engine of some kind, but do not necessarily make it easy to produce MT systems that meet professional use standards. For successful professional use, the output quality and standards requirements are generally higher than what is acceptable for the average user of Google or Bing Translate.

While many know how to upload data into a web portal to build an MT engine of some sort, very few know what to do if the system underperforms (as many initially do). Getting to the source of the problem requires diagnostic, corpus analysis and error identification skills, followed by knowledge of what to fix and how to fix it, as not everything can be fixed. It is after all machine translation, and more akin to a data transformation than a real human translation process. Unfortunately, many translators have been subjected to “fixing” the output from these low quality MT systems, hence the outcry within the translator community about the horrors of “MT”. Most professional translation agencies that attempt to use these instant MT system toolkits underestimate the complexity and skills needed to produce good quality systems, and thus we have a situation today where much of the “MT” experience is either generic online MT or low quality do-it-yourself (DIY) implementations. DIY only makes sense if you really do know what you are doing and why you are doing it; otherwise it is just a gamble or a rough reference on what is possible with “MT”, with no skill required beyond getting data into an uploadable format.
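
As a rough illustration of what even the most basic corpus analysis involves, here is a minimal Python sketch that flags a few common problems in translation memory data before it is used for training. The function name, the issue labels, the length-ratio threshold and the sample segments are all illustrative assumptions; this is not part of Moses or of any vendor's tooling, and real diagnostics go far beyond these checks.

```python
from collections import Counter

def diagnose_segments(pairs, max_ratio=3.0):
    """Flag common problems in (source, target) translation memory pairs.

    Returns a Counter of issue labels. The checks and the max_ratio
    threshold are illustrative assumptions, not an established standard.
    """
    issues = Counter()
    seen = set()
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            issues["empty_side"] += 1      # one side of the pair is missing
            continue
        if (src, tgt) in seen:
            issues["duplicate"] += 1       # exact repeat of an earlier pair
            continue
        seen.add((src, tgt))
        if src == tgt:
            issues["untranslated"] += 1    # target is a copy of the source
        src_len, tgt_len = len(src.split()), len(tgt.split())
        ratio = max(src_len, tgt_len) / min(src_len, tgt_len)
        if ratio > max_ratio:
            issues["length_mismatch"] += 1  # possible misalignment
    return issues

# Hypothetical sample data, invented for this example.
sample_tm = [
    ("Press the power button.", "Appuyez sur le bouton d'alimentation."),
    ("Press the power button.", "Appuyez sur le bouton d'alimentation."),
    ("Open the file menu.", ""),
    ("Model X200", "Model X200"),
    ("Save.", "Enregistrez le document avant de fermer la fenêtre principale."),
]
report = diagnose_segments(sample_tm)
print(dict(report))
```

Counting the issues is the easy part. Deciding which flagged segments actually hurt the system, and what to do about them, is where the expertise described above comes in.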

Expert Proprietary MT Systems
Given the complexity, the suite of support tools, and the very deep skill requirements of getting MT output to quality levels that provide real business leverage in professional situations, I think it is safe to say that this kind of “MT” is the exception rather than the rule. Here is a link to a detailed overview of how an expert MT development process would differ from a typical DIY scenario. I have seen a few expert MT development scenarios from the inside, and here are some characteristics of the Asia Online MT development environment:
  • The ability to actively steer and enhance the quality of translation output produced by the MT system to critical business requirements and needs.
  • The degree of control over final translation output using the core engine together with linguist-managed pre-processing and post-processing rules in highly efficient translation production pipelines.
  • Improved terminological consistency with many tools and controls and feedback mechanisms to ensure this.
  • Guidance from experts who have built thousands of MT systems and who have learned to recognize and overcome the hundreds of different errors that developers can make that undermine output quality.
  • Improved predictability and consistency in the MT output, thus much more control over the kinds of errors and corrective strategies employed in professional use settings.
  • The ability to continuously improve the output produced by an MT system with small amounts of strategic corrective feedback.
  • Automatic identification and resolution of many fundamental problems that plague any MT development effort.
  • The ability to produce useful MT systems even in scarce data situations by leveraging proprietary data resources and strategically manufacturing the optimal kind of data to improve the post-editing experience.
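
To make the idea of wrapping a core engine in pre-processing and post-processing rules a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the rule lists, the `stub_engine` lookup table standing in for a real MT engine, and the terminology preference are invented for illustration and do not describe how Asia Online or any other vendor implements its pipeline.

```python
import re

# Hypothetical linguist-managed rules, expressed as (pattern, replacement) pairs.
PRE_RULES = [(re.compile(r"\s+"), " ")]                  # normalize whitespace before translation
POST_RULES = [(re.compile(r"\bcourriel\b"), "e-mail")]   # enforce a client's preferred term

def apply_rules(text, rules):
    """Apply each regex rule in order and trim surrounding whitespace."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text.strip()

def stub_engine(text):
    """Stand-in for a real MT engine: a canned lookup used only for illustration."""
    canned = {"Send an email to support.": "Envoyez un courriel au support."}
    return canned.get(text, text)

def translate(text, engine=stub_engine):
    """Run pre-processing, the core engine, then post-processing."""
    prepared = apply_rules(text, PRE_RULES)
    raw = engine(prepared)
    return apply_rules(raw, POST_RULES)

result = translate("Send  an   email to support. ")
print(result)  # Envoyez un e-mail au support.
```

The point of the separation is that linguists can adjust the rule lists without touching the engine itself, which is one way the kind of control over output described in the list above could be exercised.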
So while we observe many discussions about “MT” in the social and professional web, they most often refer to the translator experience with generic MT, as this is the easiest MT to access. In translator forums and blogs the reference can also often be a failed DIY attempt. The best expert MT systems are only used in very specific client-constrained situations and thus rarely get any visibility, except in some kind of raw form like support knowledge base content, where the production goal is always understandability over linguistic excellence. The very best MT systems, which are very domain focused and used by post-editors who are going through projects at 10,000+ words/day, are usually client specific and for private use only, and are rarely seen by anybody not involved in these large production projects.

It is important to understand that if any (LSP) competitor can reproduce your MT capabilities by simply throwing some TM data into an instant MT solution, then the business leverage and value of that MT solution is very limited. Having the best MT system in a domain can mean a long-term production cost and quality advantage, which in turn provides meaningful business leverage and definite barriers to competition.

When "MT" is used in a professional context, the critical element for success is demonstrated and repeatable skill and a real understanding of how the technology works. The technology can only be as good as the skill, competence and expertise of the developers building these systems. In the right hands many of the MT variants can work, but the technology is complex and sophisticated enough that uninformed use and ignorant development strategies (e.g. upload and pray) can only lead to problems and a very negative experience for those who come down the line to clean up the mess. Usually the cleaners are translators or post-editors, and they need to learn to insist, before they engage in PEMT projects, that they are working with competent developers who can assimilate and respond to their feedback. I hope that in future they will exercise this power more frequently.

So the next time you read about “MT”, think about what is actually being referred to. Maybe I should start saying Language Studio MT or Google MT or Bing MT or Expert Moses or Instant Moses or Dumb Moses rather than just "MT".

Addendum: added on June 20

This is a post that I just saw, and I think it provides a similar perspective on the MT variants from a vendor-independent point of view. Perhaps we are now getting to a point where more people realize that competence with MT requires more than dumping data into the DIY hopper and expecting it to produce useful results.

Machine translation: separating fact from fiction