
Tuesday, November 9, 2010

The Machine Translation Community Building Bridges with Translators

The Association for Machine Translation in the Americas (AMTA) recently held its annual conference in close proximity to the ATA conference in an attempt to build bridges and foster a growing dialogue between these two communities. When I entered the world of MT (I have always preferred the term automated translation) I had the good fortune to work with Laurie Gerber at Language Weaver, who encouraged engagement with translators. While her voice was not heard there, she has always stayed true to this vision and she was instrumental in influencing me to also reach out to the world of professional translators as a core business strategy. She has long been a clear voice encouraging the broad MT community to reach out to translators, and she was visible in Denver last week making sure that ATA guests were engaged and making all the right connections or just having a good time.

It is clear to me that the path to better quality MT, MT that really does fulfill the promise of sharing information and knowledge more freely in the world, can only come from a close, cooperative and collaborative relationship with professional translators.

The conference began with a keynote from Nicholas Hartmann, who is the current ATA President and also a past technical marketing translator. He gave what I thought was an articulate, considered and clear perspective of the translator vis-à-vis translation technology and MT, while pointing to some directions for real collaboration in the future. I thought it would be valuable to restate what I heard, as there were several key messages for the MT community. A published paper version of his speech is also available on the AMTA website (but it is really hard to get to these resources as the unique URL is not easily displayed).
 
The ATA has 11,000 members, of whom 70% are freelancers, and Nick had carefully prepared to be their voice, expressing their concerns and needs in this forum. (Here is the Twitter stream.) He stated that the bad blood with translators was originally created by the historical overstatement of MT capabilities in the 60’s, when MT was expected to replace translators: FAHQT (Have you noticed that this sounds a lot like f**&ked?). He noted that many translators do in fact use some form of translation technology today, even though they find the future vision of being post-editors at “burger flipping wages” abhorrent. He gave some examples of human translations that went beyond the literal, to show how only a human could make the non-literal interpretations needed to translate some example phrases correctly. The examples showed that even a “perfect” literal translation can be nonsense at times, and he asked if the future of MT is as T.S. Eliot says:
“That is not what I meant at all. That is not it, at all”
Some additional points he made included:
  • Translators have a very different view of quality which is linked to their code of ethics to render the source material accurately
  • MT makes sense where something is better than nothing
  • MT is really only “a probability distribution over strings of letters and sounds” (especially SMT) (part of a quote from Martin Kay in which he specifically cautioned the MT community NOT to consider language so simplistically)
  • Translators want it to be acknowledged that their work is critical to feeding and improving SMT systems with HT corpus
  • MT should be matched to task and purpose and could be unfortunate in the hands of the wrong people e.g. the infamous Welsh street sign
  • SMT is often built using flawed TM, so one can hardly be surprised at some of the results; in this case past performance will be a predictor of future performance
  • MT must be edited and checked to avoid serious errors
  • Post-editing has come to mean cleaning up really bad quality MT output at very low wages even when everybody doing it understands that they could have done it faster without the MT
  • The data pollution of some SMT systems perpetuates and is difficult if not impossible to remove
He then went on to answer the following question. So what do translators want and need?
 
“We want to work together constructively. We want technology that we can use. Machine assisted translation does make sense to us but we do not want tools that make our jobs harder.” Translators want to have a hand in making the tools but want the dialogue to be realistic. They also do not want a role of PEMT drudgery and asked for technology that assists translators to be more productive.  He ended his talk saying that he had enjoyed meeting many in the MT community and he hoped that the dialogue would continue as, “We are all in the same business.”

I did not stay long enough to really get the reaction of the MT crowd as I rushed off to tekom in Germany that same evening. Of course there was one somewhat hostile question immediately, but I think many MT users want to know how to work with translators, and as you will see from my previous blog entries, I am a big believer in this rapprochement.

Nick was followed by Jost Zetzsche, who continued on the theme of building bridges and improved communication. He pointed out that translators have a self-perception of being bridge builders, language lovers, artists and cultural intermediaries, in contrast to the techie, computer science self-image that many MT practitioners have. Clearly cultural and communication problems can arise from this. He was self-critical and admitted that translators need to learn more about MT technology and not resist it like they did with TM, but he also pointed out some foolish statements made by MT proponents that any average translator would see as unfortunate or stupid. One example was Jaap van der Meer’s statement about letting a thousand MT systems bloom. Some of you may realize that this is really close to something that Mao Zedong said to flush out dissidents and eventually execute them. The other screamer he listed (without reference to the source, other than it being a major MT vendor executive) was, “It’s quite a magical technology when you see it (MT) work” by Mark Tapling of SDL (he says this with a smile at the end of the video clip). (Dude, it’s a data transformation!!! Really?!? I wonder if Tapling thinks that spreadsheets adding numbers up and PowerPoint slide transitions are also magical? Perhaps from a 19th century frame of mind this is all pretty magical.) Jost contrasted this with a Twitter conversation he had with @kvashee ;-) about features that would make MT more useful to translators. (I assure you I did not instigate this comparison.)

Some things that he asked for: Give us (translators) challenging tasks, we want to participate in “making it better”. He stated that he wanted to see that his corrections had immediate and direct impact on the system. (One of the biggest complaints that translators have about MT is that the systems make the same error over and over again.) He asked that MT vendors talk to translators in “real” language and “admit what your tools can and cannot do”. He ended on a positive note by saying that we as humans tend to demonize the unknown (HIC SVNT DRACONES!) and invited the audience to enter into each other’s terra incognita, and put our myths behind us.

While this event was a good and constructive start, I hope that the dialogue between translators and MT developers continues beyond this conference and produces real innovation and collaboration. One of the first subjects that needs elucidation and better definition is “post-editing”. It was clear in several very instructive presentations in the “Commercial User Track” that the concept of post-editing needs development and clarification. There were several very good presentations and we see many successful MT implementations being discussed on a regular basis now. Check out http://www.wallofsilver.net/ and type #AMTA2010 to see a cool Twitter summary of the conference. Chris Wendt of Microsoft and I were also voted onto the AMTA board, representing commercial users. I thank all those who voted for me and hope to help drive our common interests and agenda.

There was an interesting demo of several post-editing tools that are available in the market currently. Lingotek showed their translator workbench, PAHO showed their Word Macro based post-editor which is perhaps the longest running and most widely used direct post-editing tool around. GTS showed a promising looking community management and basic editing environment for Wordpress blogs. These examples all suggest that the tools to make post-editing more interesting are going to continue to evolve, and that while these are wonderful examples the best is yet to come. We also see that increasingly community and collaboration are intertwined with post-editing and this connection to MT is likely to develop further, as new kinds of people are drawn into the translation process. AMTA will make much of the content from this conference available on their website and I am sure many will find it useful if they can actually find it. (They need a serious update to their website).

Unfortunately, I had to rush off to the Tekom conference in Germany at the end of the first day and missed the rest of the conference, but I kept in touch via the GTS blog since the Twitter stream died pretty quickly after I left. I noticed Jost mentioned in his blog that Laurie had accomplished what she had set out to do years ago, i.e. bring MT developers and translators closer together. However, while Laurie may be happy at this initial accomplishment, I would suggest that she stay around a while longer and make sure that we all set sail together with the wind at our backs. The journey together has just begun and we have many miles to go before we sleep.

And this was tekom – a blur of meetings and some wonderful dinner conversations. 





Tuesday, October 26, 2010

Megatrends and Their Impact on Professional Translation

I have recently made public presentations to audiences of LSPs and localization professionals about broad trends affecting the world of professional translation. My perspective seemed to resonate and I thought it might be useful to put the core message in a blog entry and share it, to possibly get critical feedback or extend the discussion. In some ways I have touched upon these “megatrends” in earlier blog entries (Why MT Matters, The Data Deluge, Translation Technology, Innovation in Localization)  but it is useful to bring it all together in a single place.

I am aware that others are making similar points but I think this summary incorporates conversations I have heard around the web and is closer to the collective intelligence. The trends in brief are as follows:
  • There is an explosion in relevant content affecting global enterprises
  • Social Media and Social Networks are now increasingly in control of branding impressions
  • There is an increasing use of open innovation and community collaboration models in many businesses
  • Translation technology and automation are becoming increasingly important for speed and cost reasons
  • A rising Asia will change the priority of strategic languages away from the current FIGS dominance

The Content Explosion

Data Growth Trends
We are seeing exponential growth in the digital universe and much of it is very relevant to global enterprise concerns. It is important to understand that this is happening on a scale never before seen in the history of man. A lot of this content is related to facilitating global commerce so understanding this becomes highly relevant for the global enterprise.
Enterprise Content

The Impact of Social Networks and Customer Conversations

The carefully calculated marketing and corporate image control that global enterprises have been used to is also coming undone. Brand impressions are increasingly being formed by real customer conversations in social networks. We see that the world of marketing is undergoing a transformation and what used to be considered critical corporate messaging is increasingly viewed as “corporate-speak” and is often not trusted by the end-customers who matter the most. Jeremiah Owyang wrote a prescient blog entry three years ago where he predicted this shift and increasingly the corporate website is seen as a place where pro-corporate bullshit resides. More and more, the high value content is being created by customers and business partners and corporations have little editorial control of this content creation process.

UGC Impact

The Growing Importance of Open Innovation & Collaboration Models

We are also seeing that co-creation of products and services with customers can be a huge momentum builder. Facebook has shown that engaging users in translating interfaces can also help accelerate building the customer base in those languages. Their international growth has been closely linked to their crowdsourcing translation efforts, even though crowdsourcing can be less predictable. Dell actively solicits and nurtures customer feedback in IdeaStorm to develop new products. Collaborating with customers and partners is emerging as a way to engage customers, build loyalty and accelerate penetration in new markets. There are very few large companies that have figured out how to do this, because open collaboration is a huge cultural change from the command and control mentality that most executives work with. In the B2B world this may also mean that customers prefer best-of-breed solutions to one-stop solutions. Diversification is generally considered a wise investment strategy and is increasingly understood to be a good way to build business infrastructure and avoid vendor lock-in, even though it may also mean more integration work.

The Rise of Asia

Western Europe and FIGS have dominated the professional translation world historically. While FIGS will remain important, in the future it looks like Brazilian Portuguese and Spanish will be the most important European languages, and Chinese, Hindi, Indonesian and other Asian languages will become increasingly important and strategic as global revenue generating translation investments. Many companies now are expanding their base of languages and the FIGS-CJK view is slowly receding. The graphic below is clear in its implications: Asia offers substantial commercial opportunity for those who make the localization investments. I have written previously about how Asia is a long-term opportunity for truly global companies. (I will update it shortly as the momentum keeps building.) There are many market potential studies that suggest Asia offers significant opportunity for IT infrastructure, mobile devices and luxury goods, as well as new bottom of the pyramid (BOP) product opportunities.
Strategic LPs

What Does All This Mean?

So if we add all this up, it shows that global enterprises are facing a content deluge, with dynamic content coming from both internal and external sources, and high volumes of this content are expected to be translated increasingly faster to have any value in competitive situations. Global enterprises that quickly identify high value content and make it multilingual will find that this can drive international revenues and that translation can be a strategic tool for building long-term competitive advantage.

Now, more than at any other time in history, speed and agility are decisive competitive advantages...David Meerman Scott

However this is a time of revolution, and the TEP (Translate-Edit-Proof) and SDL (Software and Documentation Localization) mindsets are not likely to be adequate to meet these new translation challenges. The old approach worked for static, low volume content but new thinking and new approaches are required to deal with the data deluge today. Automated translation is an absolute necessity in the new world but this is not the MT of yesteryear that many are still implementing and describing at localization conferences today.

Old Approach Cant Work
In the new world, data has to flow from content creation to consumption as seamlessly as possible, delivered to where it is needed at desired quality levels. This means that humans need to be part of the production process and are the key to producing the best quality. The future is about much better man-machine collaboration. The automated translation tools need to be learning and getting better all the time. They need to be responsive to skilled human linguistic steering and corrective feedback. They need to be focused on dynamic streams of content, not just static, packaged translation projects like user documentation. Those who use them need to understand that with flowing content, upstream cleanup efforts will flow through the production line and make every downstream process easier and more efficient. They need to see the information cycle as an organic system and thus build the collaboration infrastructure to address the whole problem. They need to see MT systems improve with human steering in weeks, not months or years. These need to be MT systems that can be trained and managed by skilled professionals to get you to “good enough” production quality “fast enough” to have a positive impact on your business, not the black box MT of yesteryear. If you look closely there are very few choices in reality. (Think Asia Online!)
Localization Trends

Have you noticed that Google and Bing Translate have improved regularly since they shifted to statistical MT (SMT)? Does that tell you something? Why did we not see these improvements when these same companies were RbMT based? On domain focused systems the progress and quality improvements are even more compelling. I have recently seen Asia Online systems compared to heavily customized RbMT “hybrid” systems, and the yield to effort ratios with hybrid SMT approaches are clearly superior. (I am of course biased, but the evidence continues to mount.) (BTW Moses is perhaps 5% of what is needed to accomplish this.) In the future professionals will undertake to translate content streams that contain tens of millions of words that encapsulate important customer conversations. If translation is strategic, make sure that you align with technology that can go the distance and that has demonstrated the ability to evolve.
New Requirements for MT
As more senior corporate executives realize that translation is strategic, and that translation technology properly used can generate substantial revenue in global markets, they will start to look at solving these new kinds of translation problems. Executive managers are unlikely to be excited by the possibility of getting user documentation done faster and cheaper. However, as they start to get their hands around the fact that customer conversations are important (many are struggling with this), and that they need to respond with speed and agility to build global customer relationships, we can expect to see a new kind of executive who will seek to make flowing content multilingual. They will care that real standards exist (not SDL versions) and will likely remove products that do not comply. Handling the flow efficiently will become the focus, and the most visionary localization managers may even have senior roles in making this happen. Speed and agility are key and customer engagement across the globe will require a real understanding of how to make these dynamic content focused translation systems work. There is a video of an extended presentation version of this blog available from the Localization Technology Roundtable seminar (starts at about 4’30” after introduction). (Yes, I need to lose weight or at least wear shirts that are not so tight).

We are living in a time of great change. In times of change there is often an opportunity for new paradigms and new leaders to launch: remember MSFT grabbing the desktop market away from IBM and Google grabbing the web search market away from MSFT? New leaders with new visions are the change agents that make this happen, and they are often dismissed by the established status quo and “leaders” of the time. I do not see that new MT initiatives by Lionbridge and SDL really address these new needs. I think there is too much of the old view in their approach, but I may be proven wrong. I am skeptical that the current “market leaders” have the vision and/or culture to drive this change, and I think we will see somebody from outside the industry or smaller, more agile LSPs be the driving force of this coming change. For current leaders to be change agents often means cannibalizing current revenue streams, and very few have succeeded in crossing that chasm. We shall see as this unfolds.

In revolution, the best of the new is incompatible with the best of the old. It’s about doing things a whole new way...Clay Shirky

Anyway these are truly interesting times, and it looks like things will get even more interesting. 

Monday, October 18, 2010

Networking and Learning at ELIA and the CNGL in Sunny Dublin

I was in Dublin last week at the ELIA Networking Days conference. In many ways it was less a conference and much more a peer community coming together and sharing ideas and views of the world, much more so than most other professional translation industry “conferences”. I think it was one of the best LSP-issue focused events I have seen. While some might say that I say this because I did have a prominent role there (I was both a keynote speaker and ran a workshop on how LSPs could get started with MT), I assure you this is not why I say this. (Hear me now and believe me later, in a German accent.)

We live in an age where marketing and corporate-speak are increasingly challenged, undermined and sometimes even seen as disingenuous and false. (Raise your hand if you trust and respect corporate press releases.) Today we see customer voices rise above the din of corporate messaging and take control of branding and corporate reputations with their own “authentic” discussions of actual customer experiences, while marketing departments look on haplessly. I think this phenomenon is happening on many fronts, including conferences in the localization industry. There are too many events in the L10N industry that seem formulaic, routine, repetitive and engineered based on the same old viewpoints. This, I think, affects the ability of these events to really spark dialogue and excitement and to generate the vital learning experiences that make conferences must-attend events. While these events remain useful for “face-time”, they often have little value for really engaging attendees at a professional level.

What makes for a great conference or professional industry event? To my mind: high quality content, interactive and engaged audiences in sessions that broaden one’s horizons, interesting people who continue the professional dialogue outside of the sessions and share learning experiences, and of course a good location. And if you can offer all of this at a reasonable cost, even better. A great professional event is characterized by learning: the more intensive the learning experience, the better. The best ones leave you thinking for a while after the event. Intense learning rarely happens at really big events because it is hard to scale this. The ELIA Networking Days meeting (rather than conference) had many of the elements of good professional group meetings. The attendees had a lot of say in determining what was presented, the sessions were highly interactive and engaged, and attendees all shared without concern for exposing personal weaknesses. It was clear that the broad feedback was genuinely positive, and the biggest single complaint seemed to be that people wanted to be in two sessions that were running concurrently. Ultan (pictured below in traditional Irish garb) aka @localization has also written about his own early impressions of the event, with more to follow soon in Multilingual magazine.


This event was also very organized in terms of Twitter coverage and attendees saw continuous live coverage through the event where several screens showed the Twitter feedback up to that point. Type “#ELIA” at the wall of silver to see one such live view. Opinions from tweeps outside the conference were shared with the session group if they were deemed pertinent or relevant. This virtual expansion is a key characteristic of the best events like TED and SXSW where one can see audience reactions in real time and the external virtual audience may actually be several times larger than the physical one.  This event also has some great candid photo shots taken by @AgaGonczarek like the one below. She also created the mood video that captures much of the feel of the conference.

Some of the sessions that I found most interesting: Sharon O’Brien’s overview of post-editing, @paraics review of the work CNGL is doing, and @marylaplante’s perspective of changing enterprise needs and innovation. Ultan also did a great impromptu presentation on the huge value of Information Quality at Oracle (he really does not like descriptions of IQ with the word controlled or simplified in them). I also really enjoyed the Bulls Eye Sales Pitch session, which compared and contrasted the actual sales pitches used by several brave and courageous LSPs who voluntarily faced criticism from both the panel and the room on their presentations. (We all need to do more of this to learn faster.) Differentiation is a fundamental business challenge for language service providers, and this session did a great job of raising awareness and providing real insights into different differentiation strategies. There is a very nice downloadable Twitter archive here where you can see my and others’ coverage of these sessions, but unfortunately all the hardcore tweeters tended to attend the same sessions, so some sessions did not have very much Twitter coverage.

I also spent a day with several people at CNGL at Trinity College Dublin to better understand what they are working on. I also had a session with graduate students that was a lot of fun, where I got to tell them my opinion on what broad language technology problems needed to be solved and what research they should do to help us (the world) make more rapid advances. The students were respectful (about my ideas) and at one point informed me that while my suggestions made sense and could even possibly have an impact on important translation problems, they were unlikely to have merit as PhD thesis ideas. Most people don’t realize that CNGL has major SMT research initiatives underway that would rank it amongst the largest MT research programs in the world. They are also doing some very cool and leading edge work on global customer support. In fact, I think one of their research initiatives has produced customer support related technology that could be the basis of a formidable offering in the market. If I had the money I would be very interested in trying to commercialize it. This technology solves a very important problem: getting the customer to the information they want, by quickly matching queries with carefully filtered and highly relevant response information.

I also got to be on a panel of judges that ranked CNGL PhD student thesis presentations. The winners were focused on improving Patent Search, Developing Better Localization Data Standards and Managing Quality in Crowdsourcing. There were also some cool SMT ideas and ontology development ideas that looked promising. The highlight of the day had to be the very brief visit to see the Book of Kells and the Long Room Library. It is heartening to realize that men sat together a few hundred years ago and said let’s build a library, and let’s make it so amazing that men (and women) in future will realize that knowledge, words and books can and should be approached with awe and reverence. This place truly is imbued by the hand and touch of civilized beings. A marvelous and magical place whose scale cannot really be appreciated in pictures, but here’s one anyway (please sir, pardon my use of it without permission).

Dublin-The Long Room Library Trinity College

All conferences have some wonderful conversations and some of these are really heartfelt and should be celebrated. Even though some of these conversations may be the one and only interaction I have with these people in my life, I agree with Carlos Castaneda who recommends (in The Active Side of Infinity), that these conversations should be chronicled as memorable moments in one’s life. I can recall a few of these conversations from this trip: several conversations with Ultan O’Broin (sometimes with Renato there as well), a conversation about differentiation, vision and distinction with the lovely Polish duo of @AgaGonczarek and Marta (who tried repeatedly to get me to say Wroclaw correctly), a conversation with the lovely and talented Sara Nicolini about purpose and passion and finally a dinner conversation with Reinhardt Schaler (and Paraic) about India and Hard Times in Ireland and elsewhere.

This was also a particularly intense event for many and varied professional conversations about how to get started with MT. I look forward to continuing these conversations.

ELIA Networking Days Dublin 2010 from Agnieszka Gonczarek on Vimeo.


And indeed it was actually sunny for those three networking days.