This is an (opinionated) summary of interesting findings from a flurry of conferences that I attended earlier this month. The conferences were the TAUS User Conference, Localization World and tekom. Even though it is tiring to have so many so close together, it is interesting to see what sticks out a few weeks later. For me, TAUS and tekom were clearly worthwhile and Localization World was not, and I believe that #LWSV is an event that is losing its mojo in spite of big attendance numbers.
Some of the big themes that stand out (mostly from TAUS) were:
- Detailed case studies that provide clear and specific evidence that customized MT improves the productivity of traditional (TEP) translation processes
- The instant, on-demand Moses MT engine parade
- Initial attempts at defining post-editing effort and difficulty from MemoQ and MemSource
- A forward-looking session on the multilingual web from speakers who are actually involved with big-picture, global, web-wide changes and requirements
- More MT hyperbole
- The bigger context and content production chain for translation that is visible at tekom
- Post-editor feedback at tekom
- The lack of innovation in most of the content presented at Localization World
The archived Twitter stream from TAUS (#tausuc11) is available here; the tekom tag is #tcworld11 and Localization World is #lwsv. Many of the TAUS presentations will be available as web video shortly, and I recommend that you check some of them out.
PEMT Case Studies
In the last month I have seen several case studies that document the time and cost savings and overall consistency benefits of good customized MT systems. At TAUS, Caterpillar indicated that their demand for translation was rising rapidly, which is why they instituted their famed controlled-language (Caterpillar English) based translation production process using MT. The MT process was initially more expensive, since 100% of the segments needed to be reviewed, but they are now seeing better results on their quality measurements from MT than from human translators for Brazilian Portuguese and Russian, according to Don Johnson of Caterpillar. They expect to expand to new kinds of content as these engines mature.
Catherine Dove of PayPal described how the human translation process got bogged down in review and rework cycles (to ensure the PayPal brand's tone and style stayed intact) and was unable to meet production requirements of 15K words per week with a 3-day turnaround in 25 languages. They found that "machine-aided human translation" delivers better, more consistent terminology in the first pass, and thus they were able to focus more on style and fluency. Deadlines are easier to meet, and she also commented that MT can handle tags better than humans. They also focus on source cleanup and improvement to leverage the MT efforts, and interestingly the MT is also useful in catching errors in the authoring phase. PayPal uses an "edit distance" measurement to determine the amount of rework, and has found that the MT process reduces this effort by 20% in 8 of the 10 languages they are using MT on. An additional benefit is that there is a new quality improvement process in place that should continue to yield increasing benefits.
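To make that measurement concrete, here is a minimal sketch (in Python) of the kind of normalized edit-distance calculation a rework metric like this could be built on. The function names, the normalization by target length and the example strings are my own assumptions for illustration; this is not PayPal's actual implementation.

```python
# Minimal sketch of an edit-distance based rework metric.
# Assumption for illustration only; not PayPal's actual measurement.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]

def rework_ratio(mt_output: str, final_translation: str) -> float:
    """Rough share of the delivered text that had to be changed (0.0 = no rework)."""
    if not final_translation:
        return 0.0
    return levenshtein(mt_output, final_translation) / len(final_translation)

# Example: compare raw MT output against the delivered, post-edited translation.
mt = "El pago fue enviado a su cuenta"
final = "El pago se envió a su cuenta"
print(f"rework ratio: {rework_ratio(mt, final):.2f}")
```

A lower average ratio across a language pair would then show up as the kind of 20% reduction in rework that PayPal reports.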
A PEMT user case study was also presented by Asia Online and Sajan at the Localization Research Conference in September 2011. The global enterprise customer is a major information technology software developer, hardware/IT OEM manufacturer, and comprehensive IT services provider for mission-critical enterprise systems in 100+ countries. This company had a legacy MT system, developed internally, that had been used in the past by the key customer stakeholders. Sajan and Asia Online customized English-to-Chinese and English-to-Spanish engines for this customer. These MT systems have been delivering translated output that even beats the first-pass output from their human translators, owing to the highly technical terminology, especially in Chinese. A summary of the use case is provided below:
- 27 million words have been processed by this client using MT
- Large amounts of quality TM (many millions of words) and glossaries were provided and these engines are expected to continue to improve with additional feedback.
- The customized engine was focused on the broad IT domain and was intended to translate new documentation and support content from English into Chinese and Spanish.
- A key objective of the project was to eliminate the need for full translation and limit it to MT + Post-editing as a new modified production process.
- The custom engine output delivered higher quality than their first-pass human translation, especially in Chinese
- All output was proofread to deliver publication quality.
- Using Asia Online Language Studio, the customer saved 60% in costs and 77% in time over previous production processes, based on their own structured time and cost measurements.
- The client also produces an MT product, but the business units prefer to use Asia Online because of considerable quality and cost differences.
- The client was extremely impressed with the results, especially when compared to the output of their own engine.
- The new pricing model enabled by MT creates a situation where the higher the volume the more beneficial the outcome.
The video presentation below by Sajan begins at 27 minutes (in case you want to skip over the Asia Online part), and even if you only watch the Sajan presentation for 5 minutes you will get a clear sense of the benefit delivered by the PEMT process.
A session on the multilingual web at TAUS featuring Bruno Fernandez Ruiz (Yahoo! Fellow and Vice President), Bill Dolan (Head of NLP Research, Microsoft) and Addison Phillips (Chair, W3C Internationalization Group / Amazon) also produced many interesting observations, such as:
- The impact of "Big Data" and the cloud will change language perspectives, and the tools and processes of the future need to change to handle the new floating content
- Future applications will be built once and go to multiple platforms (PC, Web, Mobile, Tablets)
- The number of small nuggets of information that need to be translated instantly will increase dramatically
- HTML5 will enable publishers to be much freer in information creation and transformation processes, and together with CSS3 and JavaScript it can handle translation of flowing data across multiple platforms
- Semantics have not proven to be necessary to solve a lot of MT problems, contrary to what many believed even 5 years ago; Big Data will help us solve many linguistic problems that involve semantics
- Linking text to location and topic to find cultural meaning will become more important to developing a larger translation perspective
- Engagement around content happens in communities where there is a definable culture, language and values dimension
- While data availability continues to explode for the major languages we are seeing a digital divide for the smaller languages and users will need to engage in translation to make more content in these languages happen
- Even small GUI projects of 2,000 words are found to have better results with MT + crowdsourcing than with professional translation
- More translation will be of words and small phrases where MT + crowdsourcing can outperform HT
- Users need to be involved in improving MT, and several choices can be presented to users to determine the "best" ones
- The community that cares about solving language translation problems will grow beyond the professional translation industry.
At TAUS, there were several presentations on Moses tools and instant Moses MT engines built via a one- or two-step push-button approach. While these tools facilitate the creation of "quick and dirty data" MT engines, I am skeptical of the value of this approach for real production-quality engines where the objective is to provide long-term translation production productivity. As Austin Powers once said, "This is emPHASIS on the wrong syllABLE." My professional experience is that the key to long-term success (i.e. really good MT systems) is to really clean the data, and this means more than removing formatting tags and the most obvious crap. This is harder than most think. Real cleaning also involves linguistic analysis and human-supervised bilingual alignment analysis. Also, I have seen that it takes perhaps thousands of attempts across many different language pairs to understand what is happening when you throw data into the hopper, and that this learning is critical to fundamental success with MT and to developing continuous improvement architectures. I expect that some Moses initiatives will produce decent gist engines, but they are unlikely to do much better than Google/Bing for the most part. I disagree with Jaap's call to the community to produce thousands of MT systems; what we really need to see are a few hundred really good, kick-ass systems rather than thousands that do not even measure up to the free online engines. And so far, getting a really good MT engine is not possible without real engagement from linguists and translators and more effort than pushing a button. We all need to be wary of instant solutions, with thousands of MT engines produced rapidly but all lacking in quality, and of "new" super-semantic approaches that promise to solve the automated translation problem without human assistance. I predict that the best systems will still come from close collaboration with linguists and translators and from insight born of experience.
Localization World was a disappointing affair, and I was struck by how mundane, unimaginative and irrelevant much of the content of the conference was. While the focus of the keynotes was apparently innovation, I found the @sarahcuda presentation interesting but not very compelling or convincing in terms of insight into innovation. The second-day keynote was just plain bad, filled with clichés and obvious truisms, e.g. "You have to have a localization plan" or "I like to sort ideas in a funnel". (Somebody needs to tell Tapling that he is not the CEO anymore, even though it might say so on his card.) I heard several others complain about the quality of many sessions, and apparently in some sessions audience members were openly upset. The MT sessions were really weak in comparison to TAUS, and rather than broadening the discussion they mostly succeeded in making it vague and insubstantial. The most interesting (and innovative) sessions that I witnessed were the Smartling use case studies and a pre-conference session on Social Translation. Both of these sessions focused on how the production model is changing, and neither was particularly well attended. I am sure that there were others that were worthwhile (or maybe not), but it appears that this conference will matter less and less in terms of producing compelling and relevant content that provides value in the Web 2.0 world. This event is useful for meeting with people, but I truly wonder how many will attend for the quality of the content.
The tekom event is a good one for getting a sense of how technical business translation fits into the overall content creation chain, and for seeing how synergies could be created within this chain. There were many excellent sessions, and it is the kind of event that helps you broaden your perspective and understand how you fit into a bigger picture and ecosystem. The event has 3,300 visitors, so it also offers a much broader perspective in terms of many different viewpoints. I had a detailed conversation with some translators about post-editing. They were most concerned about the compensation structure and post-editor recruitment practices. They specifically pointed out how unfair the SDL practice of paying post-editors 60% of standard rates is, and asked that more equitable and fair systems be put into place. LSPs and buyers would be wise to heed this feedback if they want to be able to recruit quality people in future. I got a close look at the MemSource approach to making this fairer, and I think that this approach, which measures the actual work done at a segment level after the fact, should be acceptable to many. However, we still need to do more to make the difficulty of the task transparent before translators begin. That starts with an understanding of how good the individual MT system is and how much effort is needed to get to production quality levels. This is an area that I hope to explore further in the coming weeks.
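As a rough illustration of how such a segment-level measurement could feed into compensation, here is a small sketch that pays on a sliding scale based on how much of the MT output the post-editor actually kept, instead of a flat 60% rate. The similarity thresholds and rate bands are purely hypothetical assumptions of mine, not MemSource's (or anyone else's) actual algorithm or pricing.

```python
# Hypothetical sketch: segment-level post-editing compensation.
# Thresholds and rate bands are assumptions for illustration only.
from difflib import SequenceMatcher

def segment_rate(mt_segment: str, edited_segment: str, full_rate: float) -> float:
    """Per-word rate for a segment, based on how much of the MT output was kept."""
    similarity = SequenceMatcher(None, mt_segment, edited_segment).ratio()
    if similarity >= 0.95:       # MT was essentially usable as delivered
        return full_rate * 0.30
    if similarity >= 0.75:       # light post-editing
        return full_rate * 0.60
    return full_rate             # heavy rework: pay the full translation rate

segments = [
    ("Click Save to apply the settings.", "Click Save to apply the settings."),
    ("Click Save for apply the settings.", "Click Save to apply the settings."),
    ("Press the saving to settings apply.", "Click Save to apply the settings."),
]
for mt, edited in segments:
    print(f"rate: {segment_rate(mt, edited, full_rate=0.12):.3f} per word")
```

The point is not these particular numbers but that payment tracks the effort actually measured on each segment, which is what makes the approach feel fairer than an arbitrary across-the-board discount.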
I continue to see more progress on the PEMT front and I now have good data of measurable productivity even on a language pair as tough as English to Hungarian. I expect that a partnership of language and MT experts will be more likely to produce compelling results than many DIY initiatives, but hopefully we learn from all the efforts being made.
Hello Kirti
I seem to remember that only a few weeks ago you said that MT is not about to kill human translators...
now I read this couple of ominous claims:
# Even small GUI projects of 2,000 words are found to have better results with MT + crowdsourcing than with professional translation
# More translation will be of words and small phrases where MT + crowdsourcing can outperform HT
unless it is another bombastic MT claim, it is clear to me that MT (the companies that sell it and the ones that use it) is about to kill our profession, so I think it's high time that you experts find/show a *real* way for us to keep working (and living)
Let me explain it better: personally I have nothing against this progress, but I want to work and live like my peers
Unluckily, the one and only job in this field that I have been offered so far is as a Language Weaver tester, which was quite interesting and fun, but nothing else has appeared before my eyes
I admit that my sight is possibly poor, but can you (or other experts) show me (us) *real* jobs that I can do in this field and companies that *really* and currently pay for that?
Note that I highlighted *real* because so far I have heard a lot of speeches about supposed new and exciting possibilities for us translators, but in practice the only possibility that I see is indeed crowdsourcing, i.e. working for free or quasi-free, which is useless if you have to pay for housing, food, clothes, gasoline and bills
Otherwise, it will be war to the death, I think
@Claudio
Please note that I am merely reporting what some of the panelists on The Multilingual Web session were saying. If you go to the link of the tweet archive you will see who actually said what. (Just because I report on it does not mean I agree with all of it.) The 2,000-word GUI project was related to the Kindle, and honestly I was as surprised as you are by the claim that it outperformed HT. Perhaps there are some particular linguistic characteristics that make this actually possible. When the videos come out it may be worth a closer examination.
MT enthusiasts seem to have a proclivity for hyperbole and for making sweeping statements of coming revolution. Your experience with Language Weaver is telling and reveals much about the state of MT today, where the focus is on producing lots of dirty-data systems - LW apparently produced 10,000 of these systems this year. BFD!
I believe that it is better to create fewer high quality systems that translators actually look forward to working with rather than dreading. This will require close collaboration with linguists and can only work for certain kinds of content.
My opinion on the matter is better summarized by the clearly stated position in the quote:
"I think that what we really need to see are a few hundred really good, kick-ass systems, rather than thousands that do not even measure up to the free online engines. And so far, getting a really good MT engine is not possible without real engagement from linguists and translators. I think we need to be wary of instant solutions, thousands of MT engines produced rapidly but all lacking in quality and "new" super semantic approaches that promise to solve the automated translation problem without human assistance. I predict that the best systems will still come from close collaboration with linguists and translators."
@Luigi While I share highlights from the conference I do not agree with everything said there.
My vision is that good MT systems will produce more high fuzzy matches.
Linguists will help steer these systems to produce better quality than what most of us are seeing today, and linguists can influence the number of high fuzzy matches produced through linguistic feedback to the MT engines.
And much more content will get translated because of good MT systems which should in turn result in more work for editors and translators.
But change comes at its own pace and is often uneven; thus I emphasize the actual case studies where MT does seem to be working.
I suspect that the reason we do not see more *real* work is that corporate localization professionals and LSPs are generally not associated with translating much of the new content. There is very little innovation in this community. There is probably much to be gained by trying to make contact with the real drivers of these new translation opportunities: the managers who run and manage international business initiatives, not the people who do the SDL (software and documentation localization) work.
We all need to make contact with "the real buyers" as I stated in earlier posts to uncover these new opportunities.
"Just because I report on, it does not mean I agree with all of it"
OK, I see!
"I predict that the best systems will still come from close collaboration with linguists and translators."
then I hope that some linguist will inform me, otherwise I might misfire ...
;-)))
"There is probably much to be gained by trying to make contact with the real drivers of these new translation opportunities: The managers who run and manage international business initiatives"
bother, it cuts me off as I don't have the needed skills!
thanks anyway for the information, Kirti
Kirti,
I think you would find Indra Samite's presentation at TM Europe in Warsaw quite interesting. I haven't got around to writing it up yet, but she discussed some rather interesting research on PE in the context of Latvian translation, and the published analysis of the error comparison between MT and HT was fascinating. The data clearly showed that MT enabled experienced translators to work faster; however, the documented error rates were very much higher than with straight HT. Apparently (as revealed in "exit surveys") there was too high a level of trust in the MT output even among a selected group of highly skilled translators. This is a similar phenomenon to studies I have read where editors who are aware that a text is the result of MT were less inclined to make changes and more prone to overlook errors. In any case, the study to which Indra referred in her talk is worth a look. I do not share her optimism that the quality issues can be overcome in this case by training. And my deep concern over the influence of excessive exposure to defective text (from *either* HT or MT) remains unabated. Thus I cannot support suggestions that good translators should poison their minds by wallowing in linguistic sludge from any source. Clear expression is too rare and valuable to undermine deliberately in this way.
Perhaps there are some who are immune to such effects. Or perhaps the effect is hardly noticeable among those who can't write well in the first place. And certainly, there are many who don't care. "Good enough" for many is a bottle of Ripple; no need for a high-quality wine with a snooty label if the objective is just to get blasted, right?
Thank you for your comments on edit distance. I'm glad you agree that this appears to be an honest attempt at developing a fairer model for compensation, which is worth a close look. When I mentioned it recently, there were some expected objections, but I see no more problems with this approach than with many others, and indeed fewer than with most. As long as people are going to engage in PE in some way, we do have to try to find the most reasonable way for people to be compensated fairly for their work. I do not share the fears of some that The Machine will make real translators redundant, simply because I really do believe that this search for the linguistic Philosopher's Stone, while it may lead to a few interesting and possibly useful discoveries, will ultimately prove incapable of transmuting lead to gold. So let the IT salesmen and consultants have their fun and earn what commissions they can. Apply MT when and as it can be of real use, but ultimately this is a journey none of us are truly obliged to undertake, despite whatever attempts at intimidation TAUS and others may feel inspired to make.
Excellent post, Kirti. I followed the tweet stream from #LWSV and was (apparently) misled by the excitement around the Smartling presentation.
I especially appreciate your emphasis on involving linguists in the MT process ... not new, but newly expounded upon here. ("... getting a really good MT engine is not possible without real engagement from linguists and translators.") I think your specific reporting on the way content for translation may change in the (near) future is also key (larger volume of smaller chunks, etc.).
I also agree that work needs to happen on the compensation front in order to recruit the very people who can bring your vision to pass. An attempt to enforce an arbitrary discount factor is at best pushing the risk off on the translators, and at worst an unethical extortion scheme. On the other hand, I think the kinds of "editing distance" approaches proposed by MemoQ and MemSource are (at least potentially) subject to gaming. They are moving in the right direction, but they rely heavily on trust in an industry that seems to thrive on mutual distrust. It seems to me that some form of "salary plus bonus" or freelancer profit-sharing is needed to line up everyone's incentives.
And on a related note, I suppose it is to be expected that you would argue for a smaller number of "kick-ass systems, rather than thousands that do not even measure up to the free online engines", but I think that the same compensation (or perhaps here "business model") problem exists between LSP's and MT vendors. For LSP's, the idea of investing in internal resources for a DIY Moses system holds out the promise of very large upside for limited down-side risk. My sense of the typical MT vendor approach is to add a per-volume cost without respect to what the ultimate productivity volume turns out to be. (NOTE: I freely admit I have not checked on Asia Online's current pricing scheme, and would love to hear that it has changed.) From an LSP's perspective, an MT vendor who wants to take 5-8% of the retail price for human-quality translation without assuming any of the risk associated with expected productivity improvements is very similar to SDL's "forced discount" approach that you criticize above.
Any thoughts on how we can all work together to create a win-win-win-win among end buyers, LSP's, MT vendors and translators?
The more things change, the more they stay the same. The arguments about the effectiveness of MT + Post Editing have been going on since at least the early 1980s. And the 'conclusions' have always been the same: it's effective for some language pairs, with some content, in some environments, some of the time.
What was also known that far back is that PRE-editing the source content to eliminate known translation issues could reduce MT post-editing effort by 50% or more, and could even improve HT productivity by 15% or more.
Pre-editing for 15-20 specific issues in the source language is a lot easier than sorting out the roughly 50% of the sentences that the machine garbles with unedited source content. And pre-editing the source text ONCE yields roughly the same improvement in MT performance in all supported language pairs. The math is pretty simple: if you use MT for more than 2 target languages, pre-editing generally provides significant overall savings, as the rough sketch below illustrates.
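To put rough numbers on that math, here is a back-of-the-envelope sketch; every figure (project size, per-word costs, the 50% reduction) is an assumed round number purely for illustration, not data from any particular program.

```python
# Back-of-the-envelope sketch of the pre-editing break-even point.
# Every figure here is an assumed round number for illustration only.

WORDS          = 100_000  # source words in the project
PRE_EDIT_COST  = 0.05     # assumed cost per source word to pre-edit, paid once
POST_EDIT_COST = 0.06     # assumed post-editing cost per word, per target language
PE_REDUCTION   = 0.50     # assumed post-editing effort saved by pre-editing the source

def total_cost(num_targets: int, pre_edited: bool) -> float:
    """Total project cost for a given number of target languages."""
    per_word = POST_EDIT_COST * (1 - PE_REDUCTION) if pre_edited else POST_EDIT_COST
    fixed = PRE_EDIT_COST * WORDS if pre_edited else 0.0
    return fixed + per_word * WORDS * num_targets

for n in range(1, 6):
    saving = total_cost(n, pre_edited=False) - total_cost(n, pre_edited=True)
    print(f"{n} target language(s): net saving from pre-editing = ${saving:,.0f}")
```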
The above math provides increasingly impressive overall savings as additional target languages are added, even when human editors do the pre-editing after the writers finish their job. But the forward-looking companies (including several mentioned in Kirti's post) have realized REALLY impressive results by equipping their writers with real-time information quality assurance tools that identify translation (and other content quality) issues DURING the writing process, and prompt the writers to eliminate them as far up the food chain as possible.
I suspect that most translators recognize this reality, and I understand why it isn't a popular subject in localization-oriented forums, but there is a growing awareness of this approach among the folks who pay the localization bills. I suggest that the localization community start considering the ramifications as computer-aided editing tools continue to proliferate in the (source language) content creation community.
Another opinionated prediction: in the not too distant future, innovative combinations of SEO, computer-aided pre-editing of source content, reader-initiated MT (Google Translate et al.), and probably a few additional pre-emergent linguistic technologies will completely eliminate the need for multiple language translations of significant chunks of corporate content.
Old paradigms are shifting, ushering in times of danger and opportunity.
Time to start thinking about how to apply the hard earned linguistic skills and talents of the localization community in new ways?
Radovan Pletka • Unless there is momentum among professional translators who will correct the MT mess into usable PEMT, it is DOA. If this were really viable, many translators would be using it to make more money. I see it only as another way to push translators to get paid less for their hard work. But if somebody offers me as much money for post-editing TM as I am making translating without MT, I may change my opinion. But I am afraid I will be waiting for a long time (smile)
@Bob Interestingly, Istvan received immediate criticism on how his attempt to measure editing difficulty was flawed, and this seems to be the way ideas flow in the industry, i.e. if it is not "perfect" it is not useful. But I still think it is a useful first step and provides a straw man for objective measurement. I also expect that it will evolve, and for those who understand its shortcomings it will still be useful. Just as BLEU has many problems, but for those who take the time to learn it, it can still be useful.
With regard to "a DIY Moses system holds out the promise of very large upside for limited down-side risk" – I think many in the industry seriously underestimate the effort it takes to build these systems at a meaningful level of competence. The downside risk is time and resources wasted and systems that don't perform as well as the free systems. I have observed many hundreds of MT systems being built and have seen literally hundreds of ways that SMT can be screwed up. Moses was created in an academic research environment to further data experimentation in that environment, and has very few of the fail-safe measures built in that are necessary in a professional production environment. Thus, while some basic results are possible very quickly, high-quality MT systems come from tools beyond those provided in Moses and from insights learned in experimentation with hundreds of different data combinations across many language combinations. This is an area where expertise is built up slowly and with much experimentation (i.e. after hundreds and thousands of attempts). Moses is no more likely to solve professional MT challenges than open source TMS will solve all TMS needs; to assume it will do so is equivalent to saying that any LSP that has a TMS system is guaranteed to be more efficient. Sometimes true, but most often not.
There is ALWAYS a certain amount of unpredictability in developing MT systems. This is a basic reality with this technology. It requires $ investment without guarantees of productivity benefits. While it is possible to make educated guesses that a project has a "high probability" of success, there is always a chance that it may not succeed. This is related to the complexity more than anything else. This risk can be shared, which is perhaps one more reason to work with experts. My advice to those who want a guaranteed outcome BEFORE they begin: move on, because you might fail. Like many things in life, there are few 100% guarantees with MT; though our confidence levels continue to rise, we still sometimes see failures to reach meaningful productivity levels, especially where data is sparse.
For now, this is still a technology for the daring and those who realize they may fail in their first few attempts. I predict the leaders in the future will be amongst the daring few.
@Kent I agree that any kind of source cleanup is hugely valuable but given the flowing streams of content we need much more agile tools than are available today to handle things like user generated content.
Also the scope of successful MT has expanded beyond the examples of the 80’s though they may sound similar on the surface. Today we have real predictable success in languages like Chinese, Hungarian, Arabic, Thai and Japanese. This would have been considered challenging even 5 years ago.
I expect the content flows in the future will go through multiple transformations before translation and after translation as we learn to make the entire content creation and consumption chain more efficient and rapidly multilingual.
Thank you both for your detailed comments – I am sure they are valuable to people who read through the original post and bring much greater clarity and value for all.
Walter Keutgen • I especially like that the post-editor compensation problem is now on the table for a solution.
ReplyDelete@Kirti --
Thanks for the lengthy response. There is a lot of wisdom there and I am certainly aware of how easy it is to underestimate complexity in an area of unfamiliarity. You may be right that LSP's would be better off outsourcing that complexity.
However, the heart of my comment was not so much the technical advisability as the business model considerations. I and most LSP's I work with certainly understand that there is risk (or uncertainty) involved in deploying MT of any kind. The difficulty from this perspective, though, is that there seems to be zero uncertainty when it comes to the business model. Put another way, the MT vendors do not seem willing to assume any of the risk associated with the ultimate value of the system. The MT usage fees are fixed whether or not the productivity gains justify the cost. I suppose that could be seen as a "risk" since there will be no ongoing revenue if the system is abandoned, but from the LSP perspective it seems to set an unnecessarily high bar for "success".
Have you at Asia Online made any progress in addressing this level of concern? Or perhaps explaining the various aspects of financial risk in a way that makes it more obvious that all stakeholders are invested? That would be helpful to share.
And thanks again for starting this interesting conversation here.
Kent Taylor (@acrokent) wrote about the benefits of pre-editing. I was wondering whether translation vendors should offer pre-editing services or whether the authors should do the pre-editing?
Interesting discussion, though nothing unique to translation (as indeed the various analogies allude to). It reminds me of the challenge we in enterprise applications UX face all the time. Nothing is wrong with the technology parts in the box, but post-implementation we hear constant complaints from users, usually starting with the words "But all I wanted was...". This happens when there is a mismatch of implementation expertise with business requirements; critically, it's the user of the system who is left out. The implementation team isn't that user. Solution: match implementation expertise with user education and have a change management strategy. No two implementations are the same.
Cloud deployment won't change the need for implementation and customization (or tailoring as we like to say these days) of technology. The option of single tenancy versus multi-tenancy sees to that.
Also agree with Kirti's observations on clean data. That it even had to be proven amazes me. Then again maybe not. The dangers of qualitative observation, discounted guidelines, rules of thumb, and "industry standard" practice are artfully brought to life in every implementation. (Read Jeff Sauro's (2004) article in Interactions on that one: http://dl.acm.org/citation.cfm?id=1005276).
Eventually user experience will come to the fore even in the localization industry. Until then Kirti, keep up the good fight and these provocative articles. You're the industry's Edward De Bono and we need more of that.