Thursday, December 29, 2011

Review: Most Popular Blog Posts from 2011

Blogs are about sharing with authenticity. A good blog can help you really connect deeply with your audience in a meaningful way because the content is not only relevant but insightful and personal. I think most enterprises miss that point. When you do it right, your customers will walk away not only having learned something new but will also feel much more connected to your brand. -- David Armano, EVP, Global Innovation & Integration at Edelman Digital
Don’t say anything online that you wouldn’t want plastered on a billboard with your face on it. -- Erin Bury

 One of the things I enjoy about blogging is the feedback one gets, and the continuing, evolving discussion that sometimes grows out of these posts. I find it helps to clarify my thinking on what really matters, and the critical feedback one gets on assumptions that might otherwise go unquestioned is very useful in evolving my own thinking on these issues. The feedback and the rankings help me, and I think others too, to understand what strikes a chord in the reader community, and can also sometimes help to guide further thinking on the subjects at hand. This is a ranking of the most popular posts of the year (by unique visitors and page views), based on the data provided by Google Analytics.
  1. Analysis of the Shutdown Announcements of the Google Translate API and the subsequent posts on what this may mean for the translation industry were by far the most popular posts of the year. The original post, authored by Dion Wiggins, was also referenced by The Atlantic and other mainstream media, and it continues to be an influential view of the announcement today, probably much more so than any other publicly offered opinion in the professional translation industry.
  2. The Continuing Saga & Evolution of Machine Translation covered the IMTT 7th Conference in Cordoba and triggered active debates and discussions in several forums about MT, automation and translator compensation, clearly striking a chord for many.
  3. The Future of Translation Memory (TM) is a posting that continues to receive high new visit rates long after it was originally published.
  4. The Building Momentum for Post-Edited Machine Translation (PEMT) presented a number of case studies on the increasing use of post-edited MT to meet business timeliness and production cost requirements.
  5. Has Google Translate Reached the Limits of its Ongoing Improvement? More evidence that more data is not always better, especially for MT but even for search, and yet more reasons to consider data quality.
  6. The Growing Interest & Concern About the Future of Professional Translation covered reactions to the changes underway in professional translation.
  7. Standards: the Importance of Measurement A guest post by Valeria Cannavina on how standards can drive quality improvements
  8. The Moses Madness and Dead Flowers A post that questions some of the assumptions made by “instant Moses” advocates and challenges the long-term value of these experiments. Strong opinions voiced in the comments.
  9. Translation Crowdsourcing An exploration of the driving forces underlying successful translation crowdsourcing efforts.
  10. An Exploration of Post-Editing MT – Part I Discussion on the nature and compensation of post-editing MT work.
Please Repeat: Influence is NOT Popularity --  Brian Solis

While reader traffic is one way to measure the impact of articles, there are also other ways to capture the relative influence of individual posts. PostRank is one such measure: it monitors how others reference the posts, and where and when content generates meaningful interactions across the web. It provides a truer picture of the relative influence and impact of individual blog posts, and thus I include the latest PostRank snapshot here. (You can link to the posts through the table on the right of this blog text.) This table shows that some articles that may not have had high direct readership may actually be much more useful to readers. It is interesting to see how different the two lists are, though it is clear that the analysis of the Google Translate API shutdown/pay-wall was a major hit no matter how you look at it.


It is also interesting to note that some older posts continue to strike a chord with readers and remain visible because their themes are longer lived, and also perhaps because they ring true. The original post on standards and some of the posts discussing disintermediation also generated continuing interest and continue to show up in both the Google Analytics and PostRank ratings.

I have noticed that we are getting more clarity on post-editing MT work in many different ways including new models for more equitable compensation. I am hoping to highlight best practices in this area in the coming year as I believe it will be critical to ongoing adoption and success with MT technology. I also think there will be much more to share on best practices of post-editing MT and I expect that we may find that it is not quite the dreaded beast it has often been portrayed to be.

Social Media is not just a set of new channels for marketing messages. It’s an opportunity for organizations to align with the marketplace and start delivering on behalf of customers  -- Valeria Maltoni,

I would also like to invite some of you to contribute to the discussion in this blog (guest posts) and assure you that I believe in open discourse and think it is useful for many different viewpoints to be aired to get closer to the “truth”. So please don’t hesitate to send me contributions that you think might be interesting to the audience that has been following this blog. I thank you for your support and I hope that the content here will continue to earn your interest and comments to extend the discussion beyond my thoughts on key issues.

For those who are not aware, there are some very interesting videos from presentations at TAUS that I reported on in the 4th ranked posting above on PEMT momentum.   

Videos of presentations and panels at the recent TAUS User Conference in Santa Clara are now available on YouTube for everyone. The links below will take you to playlists on specific themes: 

I wish you all a wonderful holiday season and look forward to sharing observations in the coming year, a year that many say will be a turning point across many dimensions.

Friday, December 2, 2011

The Moses Madness and Dead Flowers

Machine translation technology has an unfortunate history of overpromising and under-delivering. At least 50 years of it, and sometimes it seems that the torture will never stop. MT enthusiasts continue to make promises that often greatly exceed the realistic possibilities. Recently, in various conversations, I have seen the level of unwarranted exuberance around the possibilities of the Moses open source SMT technology rise to peak levels. This is especially true in the LSP community. While most technologies go through a single hype cycle, MT seems destined to go through several of these cycles, one with each new approach, and the latest of these is what I call Moses Madness. It has become fashionable of late to build instant DIY MT engines, using tools that help you with the mechanics of running the software that is “Moses”. While some of these tools greatly simplify the mechanical process of running the Moses software, they give you no insight into what is really going on inside the magic box, or any clues about what you are actually doing. Moses is a wonderful technology that enables all kinds of experimentation to further the art and science of data-driven MT, but it does require some knowledge and understanding for real success. It is possible to throw together a quick and dirty MT engine using some of these tools, but whether that delivers long-term strategic translation production leverage, I am not so sure. Thus it is my sense that we are at the peak of the hype cycle for DIY Moses.

I would like to present a somewhat contrarian viewpoint to much of what you will hear at TAUS (“Let a thousand MT systems bloom”) and in other online forums on getting started with instant MT approaches. IMO, Moses, and especially instant Moses, is clearly not the final answer. While Moses is a starting point for real development, it should not be mistaken for the final destination. I think there are a number of reasons to pause before you jump in, and at least build up some knowledge before taking the dive. I have attempted to enumerate some of these reasons, though I am sure some will disagree. Anyway, I hope an open discussion will be valuable in reaching a more sustainable and accurate view of the reality, and so here goes, even though perhaps I am rushing in where angels fear to tread. And of course my opinion on this matter is not impartial, given my involvement with Asia Online.

The Sheer Complexity
As you can see from the official description, Moses is an open source project that makes its home in the academic research community. This link describes some of the conferences where people with real expertise in what Moses actually does convene and share information. Take a look at the program committee of these conferences to get a sense of what the focus might be. Now take a look at the “step-by-step guide”, which students in NLP are expected to be able to handle. It is what you would have to do to build an MT system if you did not have the DIY kit. Most of the instant/simplified Moses engine services in the market focus on simplifying this, and only this, aspect of developing an MT engine.

Clearly it would be good to have some knowledge of what is going on in the magic box BEFORE you begin, and perhaps it would even be really nice to have some limited team expertise in computational linguistics to make your exploration more useful. Remember that hiding complexity is not quite the same as removing complexity, and it would be smart not to underestimate this complexity BEFORE you begin. Anybody who has ventured into this has probably realized already that while some of the complexity has been hidden, there is still much that is ugly and complicated to deal with in the Moses world, and often it feels like the blind leading the blind.

I have noticed that many in the professional translation industry have trouble even with basics like MT system BLEU scoring, and even some alleged MT experts barely know how to measure BLEU accurately and fairly. Thus I am skeptical that LSPs will be able to jump into this in the short term with any real level of competence, i.e. a level of competence that assures, or at least raises, the probability of business success by enhancing long-term translation productivity. Though it is possible that a hardy few will learn over the next 2-5 years, it is also clear that NLP and computational linguistics are not for everyone. The level and extent of knowledge required is simply too specialized and vast. As Richard Feynman said: “I think it’s much more interesting to live not knowing, than to have answers which might be wrong.” (Though he was talking about beauty, curiosity and mostly about doubt.)
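To make the point about BLEU concrete, here is a minimal, self-contained sketch of what corpus-level BLEU actually computes: clipped n-gram precision combined with a brevity penalty. This is an illustrative toy (one reference per segment, no tokenization or smoothing); for any real evaluation you should use a standard, shared implementation so that scores are comparable across systems, which is exactly the "measuring BLEU accurately and fairly" problem mentioned above.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU with uniform n-gram weights and a brevity penalty.
    references / hypotheses: parallel lists of token lists (one reference each)."""
    precisions = []
    for n in range(1, max_n + 1):
        matched, total = 0, 0
        for ref, hyp in zip(references, hypotheses):
            hyp_counts = Counter(ngrams(hyp, n))
            ref_counts = Counter(ngrams(ref, n))
            # clipped counts: a hypothesis n-gram is credited at most as
            # often as it appears in the reference
            matched += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total += max(len(hyp) - n + 1, 0)
        if matched == 0:
            return 0.0
        precisions.append(matched / total)
    ref_len = sum(len(r) for r in references)
    hyp_len = sum(len(h) for h in hypotheses)
    # the brevity penalty punishes hypotheses shorter than the references
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

refs = [["the", "cat", "sat", "on", "the", "mat"]]
hyps = [["the", "cat", "sat", "on", "the", "mat"]]
print(round(corpus_bleu(refs, hyps), 3))  # a perfect match scores 1.0
```

Even this toy shows why naive comparisons mislead: change the tokenization or the test set and the number changes, which is why BLEU scores from different vendors measured on different data cannot be compared directly.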

Alon Lavie, AMTA President, CMU NLP professor and President of Safaba (which develops hosted MT solutions that are largely built on top of Moses) says:
“ I am of course a strong supporter, and am extremely enthusiastic about Moses and what it has accomplished in both academic research and in the commercial space. I also think there is indeed a lot of value in the various DIY offerings (commercial and Achim's M4L efforts). But these efforts primarily target and solve the *engineering complexity* of deploying Moses. While this undoubtedly is a critical bottleneck, I think there is a potential pitfall here that users that are not MT experts (the vast majority) would come to believe that that's all it takes to build a state-of-the-art MT system. The technology is actually complex and is getting more complex and involved to master. Users may be disappointed with what they get from DIY Moses, and more detrimentally, become convinced that that's the best they can accomplish, when in fact letting expert MT developers do the work can result in far better performance results. I think this is an important message to communicate to potential users, but I'm not sure how best to communicate this message.”

Thus, I will join Alon in trying to convey the message that Moses is a starting point in your exploration of MT, not the final answer, and that experience, expertise and knowledge matter. Perhaps a way to understand the complexity issue better is to use some analogies.

The sewing machine/tailor analogy: Moses can perhaps be viewed as a very basic sewing machine. You still need to understand how to cut cloth, stitching technique, fabric and lining selection, measurement, pocket technique, final fit modifications and so on to make clothes. Tailors do it better, and expert tailors who focus only on men's suits do it even better than you or I would with the same sewing equipment. The closest thing to a ready-made suit would be the free MT engines, except that in this analogy they are only available in one size. Expertise really does matter, folks, if you want to customize-to-fit.

The DIY car analogy: In this analogy, Moses is the car engine and perhaps a very basic chassis, one that would be dangerous on a highway or bumpy roads. The DIY task is to build a car that can actually be used as transportation. This will require some understanding of auto systems design, matching key components to each other, tires, braking systems, body design and so on. Finally you also need to learn to drive and you would want the car to turn right when you want to. Again, expert mechanics are more likely to be successful even though there are some great DIY kits out there for NASCAR enthusiasts.

The Learning Curve
Even if you do have a team with some NLP expertise, remember that working with any complex technology involves a process of learning, and usually an apprenticeship, to get to a point of real skill. The people who build SMT engines at Microsoft, Google, Asia Online and other MT research teams have built thousands of MT engines during their MT careers. The skills developed and lessons learned during this experience are not easily replicated and embedded into open source code. Failure is often the best teacher, and most of these teams have failed often enough to understand the many pitfalls along an SMT engine development path. To expect that any “instant” Moses solution is going to capture and encapsulate all of this is naïve and somewhat arrogant. This is the kind of skill where expertise builds slowly, and comes after much experimentation across many different kinds of data and use case scenarios. Just as professional tailors and expert mechanics are likely to produce better results, MT experts who work across many different use scenarios are likely to produce much better results than a do-it-yourself enthusiast might. These results translate into long-term savings that should far exceed an initially higher price.

The objective of MT deployment for most LSP users is to increase translation productivity. (Very few have reached the next phase, where they are translating new content that would never be translated were it not for MT.) Thus getting the best possible systems, the ones that produce the highest possible MT output quality, really matters to this core objective of measurable translation productivity. To put it in simpler terms, the difference between instant Moses systems and expert MT systems could be as much as 4,000 words/day versus 10,000+ words/day. Expert MT engine developers like Asia Online have multi-dimensional approaches, NLP skills, and many specialized tools in place to extract the maximum amount of information out of the data they have available. The use of these tools is guided by two team members with deep expertise in the inner workings of Moses and SMT in general. The learning process driving the development of these comprehensive tools takes years, and they enable Asia Online custom systems to consistently produce translation output superior to the free online MT engines. One team member has literally written the book on SMT and created Moses, and thus one could presume he is quite likely to have the expertise to develop better MT systems than most.

I have already heard from several translators who, when asked to post-edit “instant Moses” output they know is inferior, simply run the same source material through Google/Bing and edit that instead, to improve their own personal productivity and save themselves some anguish. So if your Moses engine is not as good as these public engines, you will find that translators simply bypass it whenever they can. And they may not actually tell you that they are doing this. Post-editors will generally choose the best MT output they can get access to, so beware if your engine does not compare well. And buyers: insist on seeing how these instant MT engines compare to the public free engines on a meaningful and comprehensive test set, not just 100 or so sentences.

However, I am also aware that some Moses initiatives have produced great results, e.g. Autodesk (for you doubters of the value of PEMT, here is clear evidence from a customer viewpoint), but I would caution against extrapolating these results and expecting to achieve them for any and every Moses attempt. The team that produced these systems was more technically capable and knowledgeable than most, and I am also aware that their training data was better suited for SMT than most of the TM you will find in the TDA or on the web. And even here, I would argue that MT experts would probably produce better results with the same data, especially with the Asian languages, where other support tools and processes become much more imperative.

As others have stated before me, the global population of people who actually understand how these data-driven systems work is really quite tiny, minuscule in fact. If you are building Moses systems, you should be comparing yourself to the public free engines, as you may find that all your effort was much ado about nothing. One would hope that you will produce systems that compare favorably to these “free” options. And if your competition includes the lads and lassies at Microsoft and Google, one would hope that you know more about how to do this than pushing the instant Make-my-engine button. The financial cost of ignorance is substantially higher than most are able to define in terms of lost opportunity, and learning costs (a.k.a. mistakes) should be factored into a real TCO (Total Cost of Ownership).

The bottom line: success with SMT requires very specialized skills, including some NLP background, massive data-handling skills, knowledge of parallel computing, linguistic data management tools, and corpus analysis and linguistic structural analysis capabilities, not to mention a culture that nurtures collaboration with translators.

The Data, the Data, The Data
Moses is a data-driven technology and thus is highly dependent on the data that is used. Significant data volume is required to get good output, so users have to gather data from public sources, and it is important to normalize and prepare that data for optimal performance. Most LSPs will not have the data, or the skills needed to gather it, in an optimal way. I have seen two major SMT engineering initiatives up close: one where training data was scraped off the web by spider programs, and another where data was not allowed into the training set unless it had passed several human linguistic quality assessment checks. The differing impact of these approaches is quite striking. The dirty-data approach requires substantially larger amounts of new data to see any ongoing improvement, while the clean-data approach can produce compelling improvements with much less new data.

This ability to respond to small amounts of corrective feedback is a critical condition for ongoing improvement, and for continued improvements in productivity, e.g. raising PEMT throughput up to 15,000+ words/day in the shortest time possible. I have already stated that I was surprised how little attention is paid to data quality in the instant Moses approaches presented at TAUS. And while data volume matters, for high-quality domain-focused systems the data you exclude may be more important than what you include. We are in a phase of the web's development where “Big Data” is solving many semantic and linguistic problems, but we have also seen that more data is not always the solution to better MT systems.

The upfront data analysis and preparation, and the development of “good” tuning and test sets, are critical to the short- and long-term quality and evolution of an MT engine. This is something that takes experience and experimentation to understand and become skillful at. Experts can add huge value at this formative stage. Remember that this is a technology where “Garbage In, Garbage Out” (GIGO) is particularly true. Many who understand how bad TM can get need no further elaboration on this, even though some people in the SMT community remain unconvinced that clean data does matter.
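As a minimal sketch of one part of that preparation work, here is how one might carve held-out tuning and test sets from a parallel corpus. The sizes and the seed are illustrative assumptions, not recommendations; the one non-negotiable step it demonstrates is deduplication, so that no sentence pair leaks from training into the tuning or test set and inflates your scores.

```python
import random

def split_corpus(segments, tune_size=2000, test_size=2000, seed=13):
    """Carve disjoint tuning and test sets from a parallel corpus.
    segments: list of (source, target) pairs."""
    # deduplicate first, preserving order, so no identical pair can
    # appear in both training and a held-out set
    unique = list(dict.fromkeys(segments))
    rng = random.Random(seed)          # fixed seed makes the split repeatable
    rng.shuffle(unique)
    tune = unique[:tune_size]
    test = unique[tune_size:tune_size + test_size]
    train = unique[tune_size + test_size:]
    return train, tune, test

pairs = [(f"src {i}", f"tgt {i}") for i in range(10000)] + [("src 1", "tgt 1")]
train, tune, test = split_corpus(pairs)
assert not (set(tune) & set(train)) and not (set(test) & set(train))
```

A tuning set scraped from the training data will make any engine look better than it is; a held-out set that is not representative of the production domain will mislead in the other direction, which is why this step rewards experience.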

Many of the people who have jumped into instant Moses do not realize that getting your initial MT engine to improve requires very large amounts of new data with a standard Moses approach. The rule of thumb I have heard used frequently is that you need 20-25% of the initial training data volume to see meaningful improvements. Thus, if you used 10 million words to build your system, you will need 2-3 million new words to see the system noticeably improve. So most of these instant systems are as good as they are ever going to get when the first engine is produced. In contrast, Asia Online systems can improve dramatically with as little as a few thousand sentences (a single project) and are architected and designed from the outset to improve continuously over time with focused and targeted corrective feedback.

Given the difficulty of getting large amounts of new data, users need systems that can respond to small amounts of corrective feedback and yet show noticeable improvements. One of the major deficiencies of historical MT systems has been the lack of user control: the inability of users to make any meaningful impact on the quality of raw output produced on an ongoing basis. This ability to CONTINUALLY steer the MT engine with financially feasible (i.e. relatively small) amounts of corrective feedback is a key to getting the best long-term productivity results and ROI. I think as users get more informed about how to work with this technology, they will zero in on this ability of some expert MT systems. IMO, it is the single most important criterion when evaluating competitive MT systems:
  • What do I have to do to improve the raw system output quality once an initial engine is in place?
  • And, how much effort/data is required to get meaningful and measurable improvements?
  • Measurable = rising average throughput of post-editors (by hundreds or thousands more words a day, and often a multiple of what is possible with instant MT).
The issue of data cleaning is also not well understood. While it is helpful to remove tags and formatting information, it is also important to validate the linguistics and the quality of the translations to avoid GIGO results. Users should take care to keep data in the cleanest possible state (both in format and linguistically), as it can provide real long-term business production leverage on a scale greater than most TM data can. What most successful users will find is that 90%+ of the time spent in developing the highest quality engines goes into corpus and data analysis, data preparation and organization, and error detection and correction. The Moses step is a tiny component of the whole process of developing superior MT engines.
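To make the cleaning point concrete, here is a small sketch of the kind of segment-level filters a training pipeline might apply to TM exports before training. All the thresholds (length ratio, maximum length) are illustrative assumptions, not tuned values, and real pipelines add many more checks (language identification, encoding repair, alignment scoring).

```python
import re

TAG = re.compile(r"<[^>]+>")  # inline markup left over from TM exports

def clean_pair(src, tgt, max_ratio=3.0, max_len=100):
    """Return a cleaned (src, tgt) pair, or None if the pair should be
    excluded from training. Thresholds are illustrative, not tuned."""
    # strip markup and normalize whitespace
    src = " ".join(TAG.sub(" ", src).split())
    tgt = " ".join(TAG.sub(" ", tgt).split())
    s_len, t_len = len(src.split()), len(tgt.split())
    if s_len == 0 or t_len == 0:
        return None              # empty after stripping markup
    if s_len > max_len or t_len > max_len:
        return None              # overlong segments hurt word alignment
    if max(s_len, t_len) / min(s_len, t_len) > max_ratio:
        return None              # wildly mismatched lengths suggest a
                                 # misaligned or partial translation
    if src == tgt:
        return None              # untranslated segment copied through
    return (src, tgt)

assert clean_pair("<b>Hello</b> world", "Hola mundo") == ("Hello world", "Hola mundo")
assert clean_pair("Hello", "") is None          # drop empty targets
assert clean_pair("one two", "one two") is None  # drop copied-through text
```

Note that these checks only address format-level garbage; validating that the target is actually a good translation of the source, the linguistic half of the problem described above, still requires human or much more sophisticated automated judgment.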

Control & Data Security
One of the reasons why it may sometimes make sense to run Moses yourself is to keep your data, training, and translation activity REALLY REALLY private (e.g. translations of interrogation transcripts where persuasion involving water might be used). The need for security and privacy makes sense for national security applications, but I find it hard to understand the resistance some global companies have to working in the cloud when a lot of this MT and PEMT content ends up on the web anyway. For most companies, cloud computing simply makes sense and spares the user the substantial IT burden of maintaining the hardware infrastructure needed to play at the highest professional level. (Asia Online actually makes its full training and translation environment available for on-premise installation for large enterprise customers like LexisNexis, who process hundreds of millions of words a day and have suitable computing and human resource expertise to handle this.)

I have heard of several LSPs who have spent $10K-$20K on servers that will probably only do Moses training once a year. If you do not have the data to drive an improvement in your Moses engine, what is the point of having these kinds of servers? There is no point in re-training an engine when you don't have enough new data to make any noticeable impact. This is a technology that simply makes much more sense in the cloud, for scalability, extensibility, security and effective control. Cloud solutions are often more secure than on-premise installations at LSPs, because cloud service providers can afford IT staff with deep expertise in computer security, data protection and data availability management. (BTW, I have also seen what happens when hacks try to manage 200 servers = not pretty.) Like many other things in today's world, IT (Information Technology) has become so specialized and complex that it makes more sense to outsource much of it and work in the cloud than to try to do it on your own with a meager and barely trained staff. Compare your IT staff capabilities to any cloud service provider's. Even Microsoft Office is finally making the transition to the cloud. Some analysts are even saying that the shift to the cloud will challenge the dominance of older stalwarts like HP, Microsoft, Intel, SAP, RIM, Oracle, Cisco and Dell, and that a third of these companies may not be around in 2020. Remember DEC and Wang? In a world where tablets, smartphones and mobile platforms will increasingly drive global commerce, the desktop/server perspective of traditional IT is already fading, and makes less sense with each passing day. It is ironic to see LSPs jumping on the “On-Premise Server” train just as it is about to reach the end of the line.

Cloud-based MT can also be set up to improve continuously (assuming you have more than basic Moses MT), as new data is added regularly and feedback is gathered from users, as Google and Bing do. Setting up this kind of infrastructure is a significant undertaking and most Moses users will never get to that point, but this is how the best MT systems will continue to evolve. What some may find is that their domain-focused MT system is better than the public engines in January, but by June this may no longer be true. You should realize that you are dealing with a moving target, and most public engines will continue to improve. All the expert MT developers are constantly updating and enhancing their technology; most have already moved beyond the phrase-based SMT that Moses is today and are incorporating linguistics in various forms. This can only be done because they understand what they are doing. Some of these enhancements may make it back to Moses years later, but the productivity edge will remain with experts for the foreseeable future, and I expect that in 2012 we will see several case studies where expert MT systems outperform instant Moses systems by significant margins. So my advice: be wary of any kind of instant MT solution that is not free.

I started the eMpTy Pages blog in early 2010, and one of my earliest posts was on the importance of clean data for SMT.  It was blasphemy at the time to question the value of sheer data volume for SMT, but in the period since then, many have validated that working with consolidated TM from multiple sources, trusted though they may be, is a tricky affair and data quality does matter. Pooling data can work sometimes but will also fail often without cleaning and standardization.

The origin of the phrase “Let a thousand flowers bloom” is attributed to a misquote of Mao Zedong. The results for Chinese intellectuals who took Mao seriously were quite unfortunate. Fortunately we live in better times (I think?) and this phrase is not likely to have such dire consequences today. However, while a thousand MT systems may bloom (or at least be seeded), I predict that many will fade and die quickly. This is not necessarily bad, as hopefully institutional, community and industry learning will take place, and some practitioners may actually discover that they now have a much better appreciation for corpus linguistics and some of the skills that drive the creation of better MT systems. The experimental evidence from many failed experiments with Moses will also provide useful information for MT experts and further advance the state of the art and science of MT. The learning curve for this technology is long and arduous, and it may take a while for the dust to settle from the current hype, but I fully expect that by December 21st, 2012 it will be clear that expertise, experience and knowledge do matter with something as complex as Moses. Dead flowers are also used to fertilize gardens and help other plants thrive, and as long as we take the long view, we will continue to move onward and upward. I will restate my prediction that the best MT systems will still come from close collaboration between MT experts, linguists, translators and LSPs, with insight drawn from experience and failure.

And you can send me dead flowers every morning
Send me dead flowers by the mail
Send me dead flowers to my wedding
And I won't forget to put roses on your grave

A celebration for dead flowers

Tuesday, November 29, 2011

Wanted: A Fair and Simple Compensation Scheme for MT Post-Editing

As the subject of fair and equitable compensation for post-editors of MT is important to the ongoing momentum of MT, I would like to introduce some people who have examined this issue and made an attempt (however imperfect) at developing a solution. The initial response to many such initiatives often seems to be criticism of how the approach fails. I am hoping that the dialogue on these ideas can rise above this, to more constructive and pragmatic advice and feedback that helps this approach evolve toward more widely accepted levels of accuracy. The MemSource approach measures the effort after the work is done. Used together with other initiatives that attempt to measure the post-editing task a priori, I think it could have great value in developing new compensation models that make sense to all the stakeholders in the professional translation world. It is important to develop new ways to measure MT quality and post-editing difficulty, as both will become increasingly common in the professional translation world.

This is a guest post by David Canek, CEO of MemSource Technologies. I have not edited David's article other than highlighting some phrases that I felt were worth emphasizing for a reader who skims the page.

Throughout 2011 MemSource, a provider of a cloud-based translation environment and CAT tool, has run a number of workshops, exploring the impact of machine translation on the traditional translation workflow. We had lots of debates with translation buyers, LSPs, as well as translators on machine translation post-editing and specifically on how it should be compensated. We have shared our findings at the 2011 Localization World in Barcelona and we thought it may be interesting to also share them here, on the eMpTy Pages blog.

Translation Buyers and MT

While the majority of translation buyers have yet to discover machine translation, there are many organizations whose progress with MT goes beyond the pilot phase. The innovators, among them many software companies, have successfully used machine translation to make the traditional translation process more efficient. One headache remains: a simple and fair compensation scheme for machine translation post-editing. Today, a flat reduction of the “normal” translation rate is typically negotiated with the vendor, disregarding the actual effort the translator spends post-editing a specific document, let alone a specific segment. This can be rather imprecise, even unfair, as MT quality can vary significantly from document to document, and of course from segment to segment.

Translators and MT

There is a myth that all translators dislike machine translation post-editing. In fact, many translators adopted MT post-editing as their standard translation workflow long before anyone requested it of them. They chose to use MT because it helped them increase their productivity. Then, some years later, they were approached by their LSP/client about MT. Perhaps it went like this:

Dear translator,
We have introduced this great new technology, it is called machine translation. It will help you speed up your translation and – by the way we will cut your rates by 30%.
All the best...

Of course, no translator could be happy in the face of this news. The innovative translators - already using MT to speed up their translations - would not be happy because nothing would change for them except that their rates would get cut. The less innovative also had no reason to be happy - they had to adapt to a new translation method and their rates got cut, without any guarantee that the new workflow would actually speed up their translation process.

LSPs and MT

Language service providers, generally speaking, have not been quick to adopt machine translation. This may come as a surprise, as LSPs should be the most interested in cutting their costs through intelligent use of MT. However, LSPs face specific obstacles that make MT adoption far from simple. In contrast to translation buyers, LSPs have to cope with limited resources while tackling multiple language pairs and subject domains spanning all of their clients. Training a custom MT engine in this context is challenging. The available online MT services, such as Google Translate or Microsoft Translator, are perceived by many LSPs as inadequate, mainly because of “confidentiality” concerns. The growing minority of LSPs that have started using custom MT engines report mixed results but are generally quite optimistic about the output.

Getting the right MT technology in place is important but not enough. LSPs need to make sure there is ROI on the new technology. That means they need to modify their translation workflow to include machine translation, and above all make sure the new workflow makes translating faster, i.e. cheaper. This means they will have to renegotiate rates with their translators. All of this is far from trivial, and if not done carefully, it can cause more harm than good.

Fair Compensation for MT Post-editing

MT is an innovative technology that will eventually (though not equally across all language pairs and domains) make human translation faster, i.e. cheaper. It is important that all stakeholders benefit from this increased efficiency: Translation buyers, LSPs and translators.

Above all, compensation for MT post-editing should be fair. There are different ways to get there. Some translation buyers run regular productivity tests and, based on the results, apply a flat discount to translations supported by MT (I believe Autodesk has a fairly sophisticated approach to this). At MemSource we have tried to come up with a different, perhaps complementary, approach, based on the editing distance between the MT output and the post-edited translation. Indeed, quite simple. We call this the Post-editing Analysis. In fact, this approach is an extension of the traditional “TRADOS discount scheme”, which long ago became a standard method for analyzing translation memory matches and the related discounts in the translation industry.

Post-editing Analysis: How It Works

When a translation for a segment can be retrieved from translation memory (a 100% match), the translation rate for that segment is reduced - typically to just 10% of the normal rate. A similar approach can be applied to MT post-editing. If the MT output for a segment is approved by the post-editor as correct, we can say we have a 100% match, and the post-editing rate for that segment should be very moderate. If, on the other hand, the post-editing effort is heavy and the machine-translated output needs to be completely rewritten, the full translation rate should be paid. In between, there is of course an entire scale, ranging from 0% to 100%, when calculating the similarity (editing distance) between the MT output and its post-edited version. The rates can be adjusted accordingly.
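The mechanics of such a post-editing analysis can be sketched in a few lines of code. The following is a minimal illustration, not MemSource's actual implementation: it uses a standard Levenshtein edit distance to score the similarity between the raw MT output and the post-edited segment, and the discount bands are hypothetical, modeled loosely on TM fuzzy-match pricing.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

def similarity(mt_output: str, post_edited: str) -> float:
    """Similarity in 0..1: 1.0 means the MT output was left untouched."""
    if not mt_output and not post_edited:
        return 1.0
    dist = levenshtein(mt_output, post_edited)
    return 1.0 - dist / max(len(mt_output), len(post_edited))

def payable_fraction(score: float) -> float:
    """Map a similarity score to the fraction of the full rate payable.
    These bands are hypothetical, echoing typical TM discount schemes."""
    if score >= 0.999:   # untouched segment, like a 100% TM match
        return 0.10
    if score >= 0.85:    # light post-editing
        return 0.40
    if score >= 0.50:    # substantial rework
        return 0.70
    return 1.00          # effectively retranslated: full rate
```

A segment the post-editor approves unchanged scores 1.0 and is paid like a 100% TM match, while a fully rewritten segment falls to the full rate, which is the scale the paragraph above describes.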


The advantages of the post-editing analysis:
· Simple
· Transparent
· Measurable at segment-level
· Extension of the established TM discount scheme

There are also some disadvantages. Namely, the analysis can be run only after the post-editing has been carried out, which means that discounts can be determined only after the translation job is completed. Another objection could be that editing distance is a simplification of the post-editor's actual effort. Indeed, this is valid, and a more complex approach could be applied. However, our goal was to come up with a simple and efficient approach that could be easily implemented in today’s CAT workbenches and translation environments.

Interested to Know More and Experiment?

More details on the MemSource Post-editing Analysis, including a sample analysis, can be found on our wiki. If you would like to share your experiences with MT post-editing initiatives, find out more about our efforts in this space, or sign up for a webinar, write to

David Canek is the founder and CEO of MemSource Technologies, a software company providing cloud translation technology. A graduate in Translation and Comparative Studies, David received his education at Charles University in Prague, Humboldt University in Berlin and the University of Vienna. His professional experience includes business development and product management roles in the software and translation industries. David is keen on pursuing innovative trends in the translation industry, such as machine translation post-editing and cloud-based translation technologies, and has presented on these topics at leading industry conferences such as Localization World, tekom, ATA and others.

Friday, October 28, 2011

The Building Momentum for Post-Edited Machine Translation (PEMT)

This is an (opinionated) summary of interesting findings from a flurry of conferences I attended earlier this month: the TAUS User Conference, Localization World and tekom. Even though it is tiring to have so many so close together, it is interesting to see what sticks out a few weeks later. For me, TAUS and tekom were clearly worthwhile and Localization World was not; I believe #LWSV is an event that is losing its mojo in spite of big attendance numbers.

Some of the big themes that stand out (mostly from TAUS) were:
  • Detailed case studies that provide clear and specific evidence that customized MT enhances and improves the productivity of traditional (TEP) translation processes
  • The Instant on-demand Moses MT engine parade
  • Initial attempts at defining post-editing effort and difficulty from MemoQ and MemSource
  • A future session on the multilingual web from speakers who actually are involved with big perspective, global web-wide changes and requirements
  • More MT hyperbole
  • The bigger context and content production chain for translation that is visible at tekom
  • Post-editor feedback at tekom
  • The lack of innovation in most of the content presented at Localization World
The archived Twitter stream from TAUS (#tausuc11) is available here; the tekom tag is #tcworld11 and Localization World's is #lwsv. Many of the TAUS presentations will be available as web video shortly, and I recommend that you check some of them out.

PEMT Case Studies
In the last month I have seen several case studies that document the time and cost savings and the overall consistency benefits of good customized MT systems. At TAUS, Caterpillar indicated that their demand for translation was rising rapidly, and thus they instituted their famed controlled-language (Caterpillar English) translation production process using MT. The MT process was initially more expensive, since 100% of the segments needed to be reviewed, but according to Don Johnson of Caterpillar they are now seeing better results on their quality measurements from MT than from human translators for Brazilian Portuguese and Russian. They expect to expand to new kinds of content as these engines mature.

Catherine Dove of PayPal described how the human translation process got bogged down in review and rework cycles (to ensure the PayPal brand’s tone and style stayed intact) and was unable to meet production requirements of 15K words per week with a 3-day turnaround in 25 languages. They found that “machine-aided human translation” delivers better, more consistent terminology in the first pass, so they were able to focus more on style and fluency. Deadlines are easier to meet, and she also commented that MT can handle tags better than humans. They also focus on source cleanup and improvement to leverage the MT efforts, and interestingly, the MT is also useful in catching errors in the authoring phase. PayPal uses an “edit distance” measurement to determine the amount of rework and has found that the MT process reduces this effort by 20% in 8 of the 10 languages where they use MT. An additional benefit is that a new quality improvement process is now in place that should continue to yield increasing benefits.

A PEMT user case study was also presented by Asia Online and Sajan at the Localization Research Conference in September 2011. The global enterprise customer is a major information technology software developer, hardware/IT OEM manufacturer, and comprehensive IT services provider for mission critical enterprise systems in 100+ countries. This company had a legacy MT system developed internally that had been used in the past by the key customer stakeholders. Sajan and Asia Online customized English to Chinese and English to Spanish engines for this customer. These MT systems have been delivering translated output that even beats the first pass output from their human translators due to the highly technical terminology, especially in Chinese.  A summary of the use case is provided below:
  • 27 million words have been processed by this client using MT
  • Large amounts of quality TM (many millions of words) and glossaries were provided and these engines are expected to continue to improve with additional feedback.
  • The customized engine was focused on the broad IT domain and was intended to translate new documentation and support content from English into Chinese and Spanish.
  • A key objective of the project was to eliminate the need for full translation and limit it to MT + Post-editing as a new modified production process.
  • The custom engine output delivered higher quality than their first pass human translators especially in Chinese
  • All output was proof read to deliver publication quality.
  • Using Asia Online Language Studio the customer saved 60% in costs and 77% in time over previous production processes based on their own structured time and cost measurements.
  • The client also produces an MT product, but the business units prefer to use Asia Online because of considerable quality and cost differences.
  • Client extremely impressed with result especially when compared to the output of their own engine.
  • The new pricing model enabled by MT creates a situation where the higher the volume the more beneficial the outcome.
The video presentation below by Sajan begins at 27 minutes (in case you want to skip over the Asia Online part), and even if you only watch the Sajan presentation for 5 minutes you will get a clear sense of the benefit delivered by the PEMT process.

A session on the multilingual web at TAUS by the trio of Bruno Fernandez Ruiz (Yahoo! Fellow and Vice President), Bill Dolan (Head of NLP Research, Microsoft) and Addison Phillips (Chair, W3C Internationalization Group / Amazon) also produced many interesting observations, such as:
  • The impact of “Big Data” and the cloud will affect language perspectives of the future and the tools and processes of the future need to change to handle the new floating content.
  • Future applications will be built once and go to multiple platforms (PC, Web, Mobile, Tablets)
  • The number of small nuggets of information that need to be translated instantly will increase dramatically
  • HTML5 will enable publishers to be much freer in information creation and transformation processes and together with CSS3 and Javascript can handle translation of flowing data across multiple platforms
  • Semantics have not proven to be necessary to solve a lot of MT problems contrary to what many believed even 5 years ago. Big Data will help us to solve many linguistic problems that involve semantics
  • Linking text to location and topic to find cultural meaning will become more important to developing a larger translation perspective
  • Engagement around content happens in communities where there is a definable culture, language and values dimension
  • While data availability continues to explode for the major languages we are seeing a digital divide for the smaller languages and users will need to engage in translation to make more content in these languages happen
  • Even small GUI projects of 2,000 words are found to have better results with MT + crowdsourcing than with professional translation
  • More translation will be of words and small phrases where MT + crowdsourcing can outperform HT
  • Users need to be involved in improving MT, and several choices can be presented to users to determine the “best” ones
  • The community that cares about solving language translation problems will grow beyond the professional translation industry.

At TAUS, there were several presentations on Moses tools and instant Moses MT engines built via a one- or two-step push-button approach. While these tools facilitate the creation of “quick and dirty data” MT engines, I am skeptical of the value of this approach for real production-quality engines, where the objective is to provide long-term translation productivity. As Austin Powers once said, “This is emPHASIS on the wrong syllABLE.” My professional experience is that the key to long-term success (i.e. really good MT systems) is to really clean the data, and this means more than removing formatting tags and the most obvious crap. This is harder than most think. Real cleaning also involves linguistic and bilingual human-supervised alignment analysis. Also, I have seen that it takes perhaps thousands of attempts across many different language pairs to understand what is happening when you throw data into the hopper, and that this learning is critical to fundamental success with MT and to developing continuous improvement architectures. I expect that some Moses initiatives will produce decent gist engines, but they are unlikely to do much better than Google/Bing for the most part. I disagree with Jaap’s call to the community to produce thousands of MT systems; what we really need to see is a few hundred really good, kick-ass systems rather than thousands that do not even measure up to the free online engines. And so far, getting a really good MT engine is not possible without real engagement from linguists and translators, and it takes more effort than pushing a button. We all need to be wary of instant solutions, of thousands of MT engines produced rapidly but all lacking in quality, and of “new” super-semantic approaches that promise to solve the automated translation problem without human assistance. I predict that the best systems will still come from close collaboration with linguists and translators, and from insight born of experience.

I was also excited to see the initiative from MemoQ to establish a measure of translator productivity, or post-editing effort expended, by creating an open-source measurement of post-edited output, where the assumption is that an untouched segment is a good one. MemoQ will use an open and published edit-distance algorithm that could be helpful in establishing better pricing for MT post-editing, and they also stressed the high value of terminology in building productivity. While there is already much criticism of the approach, I think this is a great first step toward formulating a useful measurement. At tekom I also got a chance to see the scheme that MemSource has developed, where post-edited output is mapped back to a fuzzy-matching scheme to establish a more equitable post-editing pricing scheme than the one advocated by some LSPs. I look forward to seeing this idea spread and hope to cover it in more detail in the coming months.
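To make the core idea concrete: the sketch below is my own illustration, not MemoQ's published algorithm. It treats untouched segments as evidence of usable MT output and reports job-level statistics that a buyer and a translator could both inspect, using Python's difflib ratio as a stand-in for whatever published edit-distance metric is actually adopted.

```python
from difflib import SequenceMatcher

def job_report(pairs):
    """pairs: list of (mt_output, post_edited) tuples, one per segment.

    Returns job-level statistics: how many segments were left untouched
    (taken as a proxy for 'good' MT output) and the mean similarity of
    the edited ones.
    """
    untouched = sum(1 for mt, pe in pairs if mt == pe)
    edited = [(mt, pe) for mt, pe in pairs if mt != pe]
    mean_sim = (sum(SequenceMatcher(None, mt, pe).ratio() for mt, pe in edited)
                / len(edited)) if edited else 1.0
    return {
        "segments": len(pairs),
        "untouched": untouched,
        "untouched_ratio": untouched / len(pairs) if pairs else 0.0,
        "mean_similarity_of_edited": mean_sim,
    }
```

Because every number here is computed from the delivered segments themselves, the measurement is open and auditable by both sides, which is precisely what makes it a plausible basis for pricing.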

Localization World was a disappointing affair, and I was struck by how mundane, unimaginative and irrelevant much of the content was. While the focus of the keynotes was apparently innovation, I found the @sarahcuda presentation interesting but not very compelling or convincing in terms of insight into innovation. The second-day keynote was just plain bad, filled with clichés and obvious truisms, e.g. “You have to have a localization plan” or “I like to sort ideas in a funnel”. (Somebody needs to tell Tapling that he is not the CEO anymore, even though it might say so on his card.) I heard several others complain about the quality of many sessions, and apparently in some sessions audience members were openly upset. The MT sessions were really weak in comparison to TAUS, and rather than broadening the discussion they succeeded mostly in making it vague and insubstantial. The most interesting (and innovative) sessions I witnessed were the Smartling use case studies and a pre-conference session on Social Translation. Both focused on how the production model is changing, and both were not particularly well attended. I am sure that there were others that were worthwhile (or maybe not), but it appears that this conference will matter less and less in terms of producing compelling and relevant content that provides value in the Web 2.0 world. This event is useful for meeting people, but I truly wonder how many will attend for the quality of the content.

The tekom event is a good one for getting a sense of how technical business translation fits into the overall content creation chain, and for seeing how synergies could be created within this chain. There were many excellent sessions, and it is the kind of event that helps you broaden your perspective and understand how you fit into a bigger picture and ecosystem. With 3,300 visitors, it also offers a much larger perspective, with many different viewpoints. I had a detailed conversation with some translators about post-editing. They were most concerned about the compensation structure and post-editor recruitment practices. They specifically pointed out how unfair the SDL practice of paying post-editors 60% of standard rates was, and asked that more equitable and fair systems be put into place. LSPs and buyers would be wise to heed this feedback if they want to be able to recruit quality people in the future. I got a close look at the MemSource approach to making this fairer, and I think this approach, which measures the actual work done at a segment level, should be acceptable to many. It measures the effort after the fact. However, we still need to do more to make the difficulty of the task transparent before the translators begin. That starts with an understanding of how good the individual MT system is and how much effort is needed to get to production quality levels. This is an area that I hope to explore further in the coming weeks.

I continue to see more progress on the PEMT front and I now have good data of measurable productivity even on a language pair as tough as English to Hungarian. I expect that a partnership of language and MT experts will be more likely to produce compelling results than many DIY initiatives, but hopefully we learn from all the efforts being made.

Tuesday, September 27, 2011

The Growing Interest & Concern About the Future of Professional Translation

I have noticed of late that every conference has a session or two focused on the future, probably because many sense that change is in the air. Some of you may also have noticed that the protest from some quarters has grown more strident, or even outright rude, toward some of the ideas presented at these future-outlook sessions. The most vocal protests seem to be directed at predictions about the increasing use of machine translation, anything about “good enough” quality, and the process and production changes necessary to deal with the increasing translation volume. (There are still some who think that the data deluge is a myth.)

Some feel personally threatened by those who speak on these subjects and rush to kill or at least stab the messenger. I think they miss the point that what is happening in translation, is just part of a larger upheaval in the way global enterprises are interacting with customers. The forces causing change in translation are also creating upheaval in marketing, operations and product development departments as many analysts have remarked for some time now. The discussion in the professional translation blogosphere is polarized enough (translators vs. technology advocates) that dialogue is difficult, but hopefully we all continue to speak with increasing clarity, so that the polemic subsides. The truth is that none of us really knows the definite future, but that should not stop us from making educated (or even wild) guesses at where current trends may lead. (I highly recommend you skim The Cluetrain Manifesto to get a sense for the broader forces at play.)
Brian Solis has a new book coming out that describes the overall change quite succinctly. The End of Business As Usual explores each layer of the complex consumer revolution that is changing the future of business, media, and culture. As consumers further connect with one another, a vast and efficient information network takes shape and begins to steer experiences, decisions, and markets. It is nothing short of disruptive.
I was watching the Twitter feeds from two conferences last week (LRC XVI in Ireland and Translation Forum Russia), and I thought it would be interesting to summarize and highlight some of the tweets as they pertain to this changing world, and perhaps provide more clarity about the trends from a variety of voices and perspectives. The LRC conference had several speakers from large IT companies who talked about their specific experience, as well as technology vendor and LSP presentations. For those who are not aware, CSA research identifies IT as one of the single largest sectors buying professional translation services. The chart below shows the sectors with the largest share of global business; it is also probably a good way to understand where these changes are being felt most strongly.

Here are some Twitter highlights from LRC on the increasing volume of translation, changing content, improving MT and changing translation production needs. I recommend checking out @therosettafound for the most complete Twitter trail. I have made minor edits to provide context and clarify abbreviations, and have attempted to impose some basic organization on the tweet trail to make it more readable.
@CNGL Changing content consumption and creation models require new translation and localisation models – (according to) @RobVandenberg
@TheRosettaFound We are all authors, the enterprise is going social - implications for localisation?
@ArleLommel Quality even worse than Rob Vandenberg says: we have no real idea what it is/how to measure, especially in terms of customer impact
Issue is NOT MT vs. human translation (HT). It's becoming MT AND HT. Creates new opportunities for domain experts.
Dion Wiggins. LSPs not using MT will put themselves out of business? Prediction: yes in four/five years
CNGL says 25% of translators/linguists use MT. I wonder how many use it but say they don't use it due to (negative) perception (with peers)?
Waiting for translation technology equivalent of iPhone: something that transforms what we do in ways we can't yet imagine.

Tweets from Jason Rickard’s presentation on Collaborative Translation (i.e. Crowdsourcing) and IT Goes Social.
@TheRosettaFound Jason of Symantec giving the enterprise perspective, added 15-20 languages to small but popular product, built tech to support this. Not just linguistic but also legal, organizational issues to be resolved in collaborative, paid-for product.
Is collaborative translation bad & not-timely? #lrcconf Not so, a lot of translators = involved users of the content/product they translate.
Review process is different in collaborative translation. Done by voting, not by editors
The smaller the language gets, the more motivated volunteer translators are and the better collaborative translation works.
Is volunteering something for people who don't have to worry that their day-to-day basics are covered?
Does collaborative translation and collaboration mean that content owners "give up the illusion of control" over their content?
Enterprises do collaborative translation for languages they would/could not cover otherwise - true, for both profit and non-profits
Collaborative/Community will not replace existing service providers but open up more content for more languages
Language Service Providers could play an important role in community translation by building, supporting, moderating communities
It's not correct to say Community Translation = bad; Professional Translation = good
Microsoft appoints moderators with a passion for the project/language for community localization
>1,200 users translated Symantec products into 35 languages
If >1,200 were needed to translate 2 small-ish products, how can millions of translators translating 1 ZB be 'managed'?
@ArleLommel Symantec research: Community involvement in support often leads to ~25% reduction in support spend
“Super users” are what make communities scalable. Key is to identify/cultivate them early in the process
Jason Rickard: Dell is a good example of using Facebook for support. One of few companies with real metrics and insight in this area.
Jason Rickard: Symantec has really cool/systemic/well-thought ways to support community

@TheRosettaFound 21st generation localisation is about the user, about user-generated content - Ellen Langer: Give up the Illusion of Control
@ArleLommel Illusion of control? You mean we can have even less control that we have now? That's a scary thought!
@TheRosettaFound The most dramatic shifts driven by the web happened because communities took over - Imagine: 100000s of user translators translating billions of words into 100s of languages - control that!
Seems the deep and complex problems of localisation are a minute drop in the ocean of digital content management
@CNGL Discovery, analysis, transformation - Alex O'Connor tells how CNGL is addressing the grand challenges of digital content management
@TheRosettaFound Is the L10N industry due for a wave of destruction or positive transformation?
@ArleLommel Yes, Most of the mainstream technologies for translators are non-ergonomic and still in 20-year-old paradigms

Tweets from Tony Allen, Product Manager Intel Localisation Solutions presentation
@TheRosettaFound 30+ langs >200k pages >40% localised @ Intel's web presence. Intel: important to have user-driven content, interaction with the customer. Integration important, e.g. multilingual support chat. Integration, Interoperability key issues for Intel L10N. To figure out how content flows, without loss of metadata, interoperates with internal/external range of systems, is crucial.
2.5b netizens, >15b connected devices, >1 zettabyte of traffic by 2015, and companies will interact with their customers using social media-type setups; new challenges for localization.
#intel What does it mean for localization infrastructures if we have >1 zettabyte of content in 2015? Current methods won't keep up
@ArleLommel #intel says that interoperability standards are required for cloud to meet future demands. L10n must evolve to meet this need too.

@ArleLommel Alison Toon (#hp) puts it this way: “localization (people) are the garbage collectors of the documentation world”
@TheRosettaFound 600GB of data in Oracle's Translation Platform - We need concise well-structured content - then we're going to be able to deliver efficient translation services - How to get it right: analyze content, identify problems and throw it back into the face of writers and developers. I18N and l10n have to get into the core curriculum at Universities says Paul Leahy (of Oracle), since we spend too much time teaching it.

Tweets from Sajan / Asia Online MT presentation
@TheRosettaFound MT cannot perform magic on bad source text - user-generated non-native-speaker content is often 'bad'
MT errors make me laugh... but human errors make me cry - an old quote from previously recycled presentations... Asia Online
Dirty-data SMT - what kind of translations would you expect? If there are no humans involved you are training on dirty data, says Asia Online. Sajan achieved a 60% reduction in costs and 77% time savings for a specific project - a miracle? Probably not, let’s see.
Millions of words faster, cheaper, better translated by Sajan using Asia Online - is this phenomenal success transferable? How?
XLIFF contributed to the success of Sajan/Asia Online's MT project. Asia Online's process rejected 26% of TM training data.

Tweets from Martin Orsted, Microsoft presentation
@TheRosettaFound Cloud will lead to improved cycle times and scalability: 100+ languages, millions of words
Extraordinary scale: 106 languages for the next version of Office. Need a process that scales up & down in proportion.
Microsoft: We have fewer people than ever and we are doing more and more languages than ever
Martin: "The Language Game - Make it fun to review Office"... here is a challenge :) Great idea to involve community via game
How can a "Game" approach be used for translation? Levels of experience, quality, domains, complexity; rewards?
No more 'stop & go', just let it flow @robvandenburg >>Continuous publishing requires continuous translation. New workflows

Tweets from Derek Coffey, Welocalize presentation: Are we the FedEx or the Walmart of words?
@TheRosettaFound TMS of SDL = burning stacks of cash - Reality: we support your XLIFF, but not your implementation
Lack of collaboration, workflow integration, content integration = most important bottle necks. Welocalize, MemoQ, Kilgray and Ontram working on reference implementation - Derek: Make it compelling for translators to work for us
It's all about the translators and they will seek to maximise their earning potential according to Derek.

Tweets from Future Panel
@TheRosettaFound Many translators don't know what XML looks like
Rob: more collaborative, community translation - Rob: Users who consume content will have a large input into the translation BINGO
Tony: users will drive localisation decision, translation live
Derek: future is in cooking xxx? Open up a whole new market - user generated, currently untranslatable content. HUGE market
Derek: need to re-invent our industry, with focus on supply chain
The end of the big projects - how are we going to make money (question from audience)
From service/product to community - the radical change for enterprises, according to Fred
No spark, big bang, revolution - but continuous change, Derek
Big Spark (Dion): English will no longer remain the almost exclusive source language

The Translation Forum Russia twitter trail has a much more translator oriented focus and is also bilingual. Here are some highlights below, again with minor edits to improve readability.

@antonkunin Listened to an information-packed keynote by @Doug_Lawrence at #tfru this morning. As rates keep falling, translators' income keeps rising.
@ilbarbaro Talking about "the art of interpreting and translation" in the last quarter of 2011 is definitely outdated
Language and "quality" are important for translators, speed and competence for (final) clients. Really?
Translators are the weakest link in the translation process
Bert: here and now translation more important than perfect translation
Bert on fan subbing as an unstoppable new trend in translation
Is Bert anticipating my conclusions? Noah's ark was made and run by amateurs, RMS Titanic by professionals
Carlos Incharraulde: terminology is pivotal in translator training < primarily as a knowledge transfer tool
To @renatobeninatto who said: translation companies can do without process standards < I don't agree
@renatobeninatto: Start looking at income rather than price/rates
Evaluating translation is like evaluating haircuts - it's better to be good on time than perfect too late
Few translation companies do like airlines: 1st class/ Economy/ Discount rates – Esselink
Traditional translation models only deal w/ tip of iceberg. New models required for other 90%. Esselink
Good enough revolution. Good enough translation for Wikileaks, for example. Bert Esselink
In 2007 English Russian was $0.22 per word, in 2010 it dropped to $0.16 @Doug_Lawrence
There's much talk on innovation but not much action - don't expect SDL and Lionbridge to be innovative
@Doug_Lawrence all languages except German and French decreased in pricing from 2007 to 2010
@AndreyLeites @ilbarbaro problem-solving is the most important feature translator should acquire - Don't teach translators technology, teach them to solve problems - language is a technology, we need to learn how to use it like technology - 85% of translators are still women
@ilbarbaro 3 points on quality: 1. Quality is never absolute, 2. Quality is defined by the customer, 3. Quality can be measured - it is necessary to learn to define quality requirements (precisely)
@Kilgraymemoq announces that they will open Kilgray Russia before the end of the year

This is of course my biased extraction from the stream, but the original Twitter trail will remain available for a few more weeks, and you can check it out yourself. It is clear to me from the comments above that, at the enterprise level, MT and community initiatives will continue to gather momentum. Translation volumes will continue to rise, and production processes will have to change to adapt. I also believe there are translators seeking ways to add value in this changing world, and I hope they will provide the example that leads the way.

And for a completely different view of "the future" check this out.