Tuesday, November 9, 2010

The Machine Translation Community Building Bridges with Translators

The Association for Machine Translation in the Americas (AMTA) recently held its annual conference in close proximity to the ATA conference in an attempt to build bridges and foster a growing dialogue between these two communities. When I entered the world of MT (I have always preferred the term automated translation) I had the good fortune to work with Laurie Gerber at Language Weaver, who encouraged engagement with translators. While her voice was not heard there, she has always stayed true to this vision and she was instrumental in influencing me to also reach out to the world of professional translators as a core business strategy. She has long been a clear voice encouraging the broad MT community to reach out to translators and she was visible in Denver last week making sure that ATA guests were engaged and making all the right connections or just having a good time.

It is clear to me that the path to better quality MT, MT that really does fulfill the promise of sharing information and knowledge more freely in the world, can only come from a close, cooperative and collaborative relationship with professional translators.

The conference began with a keynote from Nicholas Hartmann, who is the current ATA President and also a past technical marketing translator. He gave what I thought was an articulate, considered and clear perspective of the translator vis-à-vis translation technology and MT while pointing to some directions for real collaboration in future. I thought it would be valuable to restate what I heard, as there were several key messages for the MT community. A published paper version of his speech is also available on the AMTA website (but it is really hard to get to these resources as the unique URL is not easily displayed).
The ATA has 11,000 members, of whom 70% are freelancers, and Nick had carefully prepared to be their voice, expressing their concerns and needs in this forum. (Here is the Twitter stream). He stated that the bad blood with translators was originally created by the historical overstatement of MT capabilities in the 60’s, when MT was expected to replace translators: FAHQT (Have you noticed that this sounds a lot like f**&ked?). He noted that many translators do in fact use some form of translation technology today even though they find the future vision of being post-editors at “burger flipping wages” abhorrent. He gave some examples of human translations that went beyond the literal, to show how only a human could make the non-literal interpretations needed to correctly translate some example phrases. The examples proved that even a “perfect” literal translation can be nonsense at times, and he asked if the future of MT is as T.S. Eliot says:
“That is not what I meant at all. That is not it, at all”
Some additional points he made included:
  • Translators have a very different view of quality which is linked to their code of ethics to render the source material accurately
  • MT makes sense where something is better than nothing
  • MT is really only “a probability distribution over strings of letters and sounds” (especially SMT) (part of a quote from Martin Kay in which he specifically cautioned the MT community NOT to consider language so simplistically)
  • Translators want it to be acknowledged that their work is critical to feeding and improving SMT systems with HT corpus
  • MT should be matched to task and purpose and could be unfortunate in the hands of the wrong people e.g. the infamous Welsh street sign
  • SMT is often built using flawed TM, so one could hardly be surprised at some of the results; in this case past performance will be a predictor of future performance
  • MT must be edited and checked to avoid serious errors
  • Post-editing has come to mean cleaning up really bad quality MT output at very low wages even when everybody doing it understands that they could have done it faster without the MT
  • The data pollution of some SMT systems perpetuates and is difficult if not impossible to remove
He then went on to answer the following question. So what do translators want and need?
“We want to work together constructively. We want technology that we can use. Machine assisted translation does make sense to us but we do not want tools that make our jobs harder.” Translators want to have a hand in making the tools but want the dialogue to be realistic. They also do not want a role of PEMT drudgery, and he asked for technology that helps translators be more productive. He ended his talk saying that he had enjoyed meeting many in the MT community and he hoped that the dialogue would continue as, “We are all in the same business.”

I did not stay long enough to really get the reaction of the MT crowd as I rushed off to tekom in Germany that same evening. Of course there was one somewhat hostile question immediately, but I think many MT users want to know how to work with translators and, as you will see from my previous blog entries, I am a big believer in this rapprochement.

Nick was followed by Jost Zetzsche, who continued on the theme of building bridges and improved communication. He pointed out how translators have a self-perception of being bridge builders, language lovers, artists and cultural intermediaries, in contrast to the techie, computer science self-image that many MT practitioners have. Clearly cultural and communication problems can arise from this. He was self-critical and admitted that translators need to learn more about MT technology and not resist it like they did with TM, but also pointed out some foolish statements made by MT proponents that any average translator would see as unfortunate or stupid. One example was Jaap van der Meer’s statement about letting a thousand MT systems bloom. Some of you may realize that this is really close to something that Mao Zedong said to flush out dissidents and eventually execute them. The other screamer he listed (without reference to the source, other than it being a major MT vendor executive) was, “It’s quite a magical technology when you see it (MT) work” by Mark Tapling of SDL (He says this with a smile at the end of the video clip). (Dude, it’s a data transformation!!! Really?!? I wonder if Tapling thinks that spreadsheets adding numbers up and PowerPoint slide transitions are also magical? Perhaps from a 19th century frame of mind this is all pretty magical). Jost contrasted this to a Twitter conversation he had with @kvashee ;-) about features that would make MT more useful to translators. (I assure you I did not instigate this comparison.)

Some things that he asked for: Give us (translators) challenging tasks, we want to participate in “making it better”. He stated that he wanted to see that his corrections had immediate and direct impact on the system. (One of the biggest complaints that translators have about MT is that the systems make the same error over and over again.) He asked that MT vendors talk to translators in “real” language and “admit what your tools can and cannot do”. He ended on a positive note by saying that we as humans tend to demonize the unknown (HIC SVNT DRACONES!) and invited the audience to enter into each other's terra incognita, and put our myths behind us.

While this event was a good and constructive start, I hope that the dialogue between translators and MT developers continues beyond this conference and produces real innovation and collaboration. One of the first subjects that needs elucidation and better definition is “post-editing”. It was clear in several very instructive presentations in the “Commercial User Track” that the concept of post-editing needs development and clarification. There were several very good presentations and we see many successful MT implementations being discussed on a regular basis now. Check out and type #AMTA2010 to see a cool Twitter summary of the conference. Chris Wendt of Microsoft and I were also voted to the AMTA board, representing commercial users and I thank all those who voted for me and hope to help drive our common interests and agenda.

There was an interesting demo of several post-editing tools that are currently available in the market. Lingotek showed their translator workbench, and PAHO showed their Word macro-based post-editor, which is perhaps the longest running and most widely used direct post-editing tool around. GTS showed a promising looking community management and basic editing environment for WordPress blogs. These examples all suggest that the tools to make post-editing more interesting are going to continue to evolve, and that while these are wonderful examples the best is yet to come. We also see that increasingly community and collaboration are intertwined with post-editing and this connection to MT is likely to develop further, as new kinds of people are drawn into the translation process. AMTA will make much of the content from this conference available on their website and I am sure many will find it useful if they can actually find it. (They need a serious update to their website).

Unfortunately, I had to rush off to the Tekom conference in Germany at the end of the first day and missed the rest of the conference, but I kept in touch via the GTS blog since the Twitter stream died pretty quickly after I left. I noticed Jost mentioned in his blog that Laurie had accomplished what she had set out to do years ago, i.e. bring MT developers and translators closer together. However, while Laurie may be happy at this initial accomplishment, I would suggest that she stay around a while longer and make sure that we all set sail together with the wind at our backs. The journey together has just begun and we have many miles to go before we sleep.

And this was tekom – a blur of meetings and some wonderful dinner conversations. 



  1. I see the pillars of this bridge already up and hope you share my optimism. The number of sessions about MT, or that addressed the topic of MT, at the ATA conference in Denver was impressive. All the ones I attended were packed, with people standing and sitting on the floor. This interest is a good sign. It is exciting to see the translators warming up to a technology that can benefit them greatly, and trying to learn about it, instead of siding with the misinformation that vilifies MT. The next step is to define "quality", which is challenging but crucial since it is one of the main arguments against MT. Thanks for sharing your observations.

  2. Hi Kirti, many thanks for your mention of the GTS Translation plugin. Great post, I see it was picked up by PROZ as well. Best, Dave

  3. Thanks Kirti for the good words about AMTA-2010! Based on the feedback that I have heard from both AMTA and ATA leadership and rank-and-file members, our collocation and collaboration were indeed very successful. We were very happy with the level of cross-participation between the two conferences and with the dynamic that was created in both planned content and informal interaction. There is also significant enthusiasm from the leaderships of both ATA and AMTA to continue building on this successful collaboration and not lose the momentum that has been created. The challenge, of course, will be to find the most effective ways to jointly engage translators, LSPs of different shapes and sizes, enterprise buyers of language and translation services, and the MT technology communities (commercial developers, researchers and users) in productive collaborative forums and projects that move our industry and field forward as a whole. There is no single answer here, and there is plenty of room for a variety of initiatives and activities at the micro and macro levels. I would like to encourage everyone reading to actively contribute to this dialog. We are very much looking for concrete ideas and suggestions. Feel free to post here, or to email me directly your thoughts, ideas or comments.

    Alon Lavie
    AMTA President

  4. Ceterum censeo

    Ferenc Kovacs: Meaning: The Translators’ Role in Clarifying Some Misconceptions

  5. If I have accomplished anything really important, it has been recruiting the right people into leadership roles in AMTA! Alon Lavie and Mike Dillinger (Current Pres and VP/Past Pres of AMTA) are energetic and capable advocates of the dialog between AMTA/ATA and the MT community and translators in general - we were the right team at the right time. Having three determined people working creatively on this over three years turned out to be much more effective than one person (me) whining at various developers ;-)

  6. Let me be a fly in the ointment. For the time being, my impressions of the ATA/AMTA bridge-building exercise are a mixed bag.

    For example, I (personally) completely misjudged and miscategorized Mike Dillinger, based on his participation in the Man vs. Machine panel at the ATA conference, as a fierce proponent of "MT uber alles", and did not care to talk to him at the AMTA conference until it was too late.

    This perception (in a very general sense) was further reinforced by all that focus on post-editing (which sounded quite strange to me, since there are much more efficient ways to inject human intelligence into the MT workflow, imho).

    This makes me wonder whether I was the only one guilty of similar judgment errors.

    OTOH, I've heard a few positive comments from ATA members re. useful insights into MT and its relevance (or irrelevance) for their livelihood.

    On a more constructive note, maybe the next ATA/AMTA discussion should focus more on the theme of "Overcoming the Curse of Babel Together".

  7. @konstantin

    I agree that it is difficult for a skilled translator to get very excited about being a post-editor in the way most people mean and understand that word.

    I think that skilled bilinguals have a role in shaping and providing linguistic steering to some kinds of MT engines. This is more challenging and intellectually stimulating work that is akin to teaching linguistic knowledge.

    This is exactly what I meant when I said "One of the first subjects that needs elucidation and better definition is “post-editing”."

    But I think you are completely right to say that there are better ways to inject human intelligence into the MT workflow.

  8. There are some concerns about whether this dialogue is going to run that smoothly.

    I read Kirti's interesting report about the conference, and even on his blog post there are voices that comment with doubts about the success of the bridge-builders.

    I myself recently got a comment from a VERY experienced professional who had been asked to comment on Post-Editing guidelines. He wrote:

    "For those intoxicated by the false optimism on MT achievements important psychophysiological factors are easy to miss. It is one thing to spend a day to review machine translation output and completely another – read and edit this gobbledygook every working day for several years. I can assure you that heavy consequences for the conscience, unconscious and the ability to command normal human language are inevitable to come. I point your attention to the fact that psychiatrists recommend to read good literature to support conscience and preserve sanity. Reading of illiterate text poisons the brain.

    It should be only a matter of time when freelance translators, and then (in civilized world) labor unions and legal authorities would pay close attention to the issues of language ecology in general and hazards of MT post-editing in particular. Many professionals already refuse to post-edit MT, and this is why some very large companies recommend hiring freshmen as posteditors and training them from scratch. Legal ban on “good enough” quality would be very useful for decent professional service firms, since it would be warding off hedge-born discounter outfits that made enough damage already."

    Extreme viewpoint as it seems, but does it have a point?
    What do you think?

  9. Serge - I don't think any of us should expect the various parties that we hope to engage in this dialog to agree on many of the issues upfront. The parties involved come into this with very different perspectives and experiences, and let's be frank - also different basic interests. So there will be sharp disagreements. What is important I think is to clarify the shared interest that these various communities all have in carrying out this dialog and understanding the issues and each other better. Language translation and localization is becoming increasingly complex with many different scenarios. There will be different solutions that emerge for these different scenarios, involving different mixes of technology and professional human involvement. I think much of the strong rhetoric (from all sides) is driven by lack of basic knowledge and information. It is this lack of knowledge that we hope to address, at least in the short term.

    - Alon

  10. Serge

    I think that we all admit that this dialogue is just beginning and that it may be difficult at times.

    I/we also agree that it is difficult for a "skilled translator" to get very excited about being a "post-editor" in the way most people mean and understand that word. I think this is as much a problem of definition of what "post-editing" is, as it is an assumption of who will do this work.

    I have seen first hand that skilled bilinguals can have a role in shaping and providing linguistic steering to some kinds of MT engines. In making them work better. This kind of work is more challenging and intellectually stimulating work that is akin to teaching linguistic knowledge and this too can be considered a kind of post-editing.

    The economic benefits of productivity have to be shared equitably. Much of the initial resistance has as much to do with one sided PEMT arrangements (where sponsors arbitrarily drop rates without understanding the work load or financial impact on the editors) as the kind of work that is done.

    This is exactly what I meant when I said "One of the first subjects that needs elucidation and better definition is “post-editing”" in the blog. This is a good place to start. There is much more to "post-editing" than many are assuming, though it is true that early post-editing experiences are often the equivalent of janitorial work.

    But what we should also keep in mind is that the demand for translation is increasing by a factor of 10X to perhaps even as much as 1000X. The velocity of information creation is not likely to slow down soon and hundreds of millions are interested in getting access to content that is currently locked within some key languages. Thus, it is a problem worth our attention and it is possible that some in the professional industry will have little or no interest in being part of the solution. Change often starts with only a few who see the value of engagement. There are now a few on both sides who want to work together and I think there is a good possibility that this dialogue will lead to useful initiatives and outcomes that others could in time model.

  11. >Extreme viewpoint as it seems, but does it have a point?

    I think he does have a point, in the sense that the "language ecology" does exist and does affect all of us. Over the years, I have accumulated some anecdotal evidence to relate to what he is talking about.

    However, let me emphasize that this may be a much greater concern for the SMT community at large than it is for human translators.

    In a nutshell (and in a very broad sense), humans do have an immune system that may urge them to either filter out more or get out sooner. Machines have to breathe whatever "informational air" is available to them.

    Can they last long enough in the "good enough" atmosphere?

  12. to Kirti

    >I agree that it is difficult for a skilled translator to get very excited
    >about being a post-editor in the way most people mean and understand that word.

    That is not what I meant at all. That is not it, at all. :=)

    Big picture...

    Both MT and HT have evolved considerably since the mid-50s.

    The volume of translation work has expanded, as the basic business model has shifted from in-house to worldwide outsourcing. Hence, generally speaking, LSPs nowadays have to deal with a much less controlled stream of information on the input side and a much greater variety of end-users on the output side.

    The MT, afaik, went through somewhat similar trials and tribulations: from the first purpose-built systems with an explicitly defined narrow domain of application to the current off-the-shelf, open-access and/or open-source systems.

    It seems that the MT technology may have matured enough to be of interest for a "skilled translator". And, conversely, it has matured enough to benefit from interaction with a "skilled translator".

    On the other hand, at least some "skilled translators" have had enough with scalability issues in high-quality specialized translation to be willing to "outsource" suitable chunks of their work to a machine that has no human dimension, and does not require empowerment and encouragement.

    In this scenario "post-editing" (cure) takes a back seat to the early input (prevention) provided by skilled bilingual domain specialists (and linguistic steering is a relatively small component of such input).

    It seems to me that this could be a good starting point for a more meaningful dialog between the two communities. However, it may require a bit of a paradigm shift within the MT community, and a willingness to beat some enterprise-scale solutions into specialist tool kits.

  13. Here is another translator-blog perspective on these events. As you can see, we have some distance to go before we celebrate a real rapprochement.

  14. Hi all

    How much time is needed to have MT tools that are really suitable for mere translators as CATs are?

    So far they seem more suitable for semantics engineers or semantics professors, and even MMM is not "mere" enough for me.

    I mean tools that:
    1 - have a demonstrable performance better than GT
    2 - integrate smoothly with MS Office and most CATs
    3 - have an average learning curve comparable to most CATs
    4 - have a reasonable cost for the base tool, and no useless feature (as text-to-speech), i.e. one (1) language couple only and no specialistic dictionaries, but with unlimited entries for unlimited custom dictionaries
    5 - have unlimited entries for unlimited custom dictionaries
    6 - have good specialistic dictionaries in all languages couples (very few are EN to IT, for example)
    7 - don't need taking on Tianhe-1 to train them
    8 - as an alternative, the seller trains the MT with our own memories/corpora
    9 - can be tried as fully operational to judge if the game (quite expensive) is worth the candle (only Prompt follows this policy)
    10 - as an alternative, the seller gives free access to an on-line tool, but whose performance must be better than GT

    NOTE: I consider GT the gold standard as it's free and it is "good enough" in skilled hands (tested in my work for 1 month)

  15. the moral of the story to me is:

    if you want to allure translators, give them usable tools ...

  16. "I am a professional and experienced translator of EN<>PL and ES <>PL, but also a curious creature, and a sort of innovator, so I decided to take a peek at PE first hand, and took a PE assignment in hotel booking websites. Kirti said that "early post-editing experiences are often the equivalent of janitorial work." I'd rather say mine was like cleaning Augean stables, and for less than peanuts. I am not a hero like Heracles though, and have no gods to support me by which I mean I have to make a living.
    There are many professionals worldwide that would like to cooperate and can see good applications for MT, if it is done the right way. We really have to work together before it is too late.
    "Language ecology" mentioned by Serge is an important issue because - you probably also observe this trend in your respective national linguistic communities - overall the quality of speech and writing skills has been deteriorating. If we keep feeding the public with some sick products of MT that are not detoxified, as Nick put it, we may get to a point where some linguistic audiences become so degenerated that they may lose their potential, because language is not only a tool of communication but a tool of invention and creativity too. Every human activity needs to be verbalized, put in words, and linguistically degenerated audiences may sooner or later lack the right words to express their ideas.
    Posted by Iwona Szymaniak

  17. as a peer says, "And it is this quality of MT translation that post-editors should be willing to work on, not some absurd, free and often ridiculous rendition of source text."

    this is perhaps the reason why Iwona spent so much for gaining less than peanuts ...