Using Translation as a Force to Address Information Poverty: AGIS 2010

I have the good fortune to reflect and report on the AGIS 2010 conference as we approach “the holiday season,” which is a time of reflection for many in the world. A time of goodwill and at least temporary good deeds for some. The conference was held in India, which can be a challenge because the basic infrastructure is still primitive, but the event went off well with very few glitches and I think AGIS is slowly building momentum.

AGIS stands for Action for Global Information Sharing, and has been conducting a resolute crusade against Information Poverty since its inception. The overall tone and tenor of this conference is very different from the typical conference in the world of enterprise localization (LocWorld, GALA, LISA). The focus is on making all kinds of knowledge and information accessible in places where it has never been available before, not just on selling products. Many speakers shared clear evidence that access to information creates the conditions for economic prosperity or perhaps even actually drives it. In some parts of the world localization is all about reducing information poverty and improving the human condition. Reinhard has provided a summary of the highlights of the event in his blog. There was also coverage here. And for those who ask why we need yet another conference in the industry, Reinhard explains below:
You might be asking yourself, “Why AGIS, why YALC (yet another localisation conference)? What makes AGIS so different?” Well, first of all, it is not owned by any particular organisation, it is not run for profit, and it is (almost) free to attend. Then, it takes place where people need localisation, not where people are rich enough to pay for it. Nothing is sold, nothing is bought at AGIS. And last but not least, AGIS attendees have a social agenda, not (just) a commercial one.
The highlights can also be found in the Twitter stream on my Friendfeed (Scroll back to newer items to see the chronological sequence. I don’t know why Twitter has already made much of the data unavailable).

In the keynote, Dr. Vijay Bhatkar (a digital visionary of India) pointed out how globalization and localization are tightly linked and how NLP, MT and language technologies are only just beginning their evolutionary march. He pointed out how Japan’s hopes for world dominance were stymied by linguistic issues and that access to knowledge and information in the 21st century will be key to building prosperity, as an increasing part of the GDP of many nations will come from knowledge services. This is already true for India. He told the Indians in the room that India cannot consider itself an IT power when 350 million people are illiterate, and urged the community to preserve the Indian languages while continuing the push forward with English education. He also pointed out that both telephone and TV are mostly language neutral but information cannot be, and localization is critical to broad access.

Reinhard Schaler and the Rosetta Foundation are leading an initiative to build a platform to facilitate self configurable, distributed and shared data based global localization initiatives. CNGL and University of Limerick students provided overviews and demonstrations of these tools. Reinhard highlighted that each day 24,000 children die because of lack of access to basic healthcare information. That is roughly 1 every 3 to 4 seconds! These deaths could be avoided if information was available more easily. This appears to be a primary motivator and raison d'etre for the Rosetta Foundation.

Ms. Swaran Lata painted a clear picture of the amazing complexity of the Indian linguistic landscape: 20+ major languages, with some states having 3-4 languages and multiple scripts. The CDAC organization is attempting to solve the linguistic computing issues to ensure that Indian languages gain a stronger digital presence and are preserved. As Prof. Bhatkar asked: "Can you really say you know Hindi if you cannot use it on a computer? This is key in the information age." Ms. Lata described initiatives that focused on the digital education of youth, which interestingly also resulted in the knowledge being passed on to illiterate parents and grandparents. She also talked about initiatives to reach out to the “other side of India” to ensure that illiterate people are not left behind. As Indian consumers become more powerful, Indian languages are critical to reaching their purchasing power.

I spoke about how the Asia Online vision is finally coming to fruition, as we start rolling out a Thai Wikipedia comprising translations of 3.5 million articles starting in January 2011. When all these articles are up and ready, Thailand will have the second largest Wikipedia in the world after the English one. This is a huge boost from the current 60,000 article Thai Wikipedia, many of which are barely more than stubs. In contrast, the index alone for just the article titles in the English Wikipedia is in excess of 600,000 pages! The Asia Online project is an initiative that directly addresses information poverty. Shockingly it was also uncovered that the Hindi Wikipedia only has about 50,000 articles for a population of almost 400 million people! This means that a child who does not speak English is deprived of basic educational information access and has a fraction of the content available to an English speaking child.
Some other interesting information from the conference:
  • Ravi Gupta pointed out there are 62,000 newspapers in India and 92% of these are not in English
  • Subtitles do not work well in India because of literacy issues but they can also be a means of building literacy
  • There are no English TV channels in the Top 100 TV channels in India but English speaking consumers are the wealthiest consumers
  • Ravi Kumar, the President of the Indian Translators Association, made an impassioned plea asking that buyers and the community at large respect translators as professionals
  • The CNGL team showed various elements of the open SOLAS platform they are making available to anybody who needs it
  • Mahesh Kulkarni gave a wonderful presentation on standards, which he called traffic rules that ease both user and creator experience. He has a much more holistic and systematic view of standards than we see from the feeble standards initiatives in the traditional localization industry, but he too expressed the difficulties of getting good standards in place.
  • He also pointed out that there are 670 million mobile phones in India and asked: is this the end of the internet as we know it?
Mahesh Kulkarni and Raimond Doctor are a joy to behold: passionate, knowledgeable and driven in spite of having to deal with Indian governmental bureaucracy as part of their daily lives. Raimond is perhaps the most erudite and knowledgeable person I have met on comparative linguistics. He shares his deep knowledge and insight with a verve that draws you right into his delight for language. I hope that CDAC realizes what treasures these men are, and gives them room and resources to execute on their vision and passion.

Reinhard also pointed out that the non-profit world is substantially larger in terms of market potential and actual localization activities than the Fortune 500 market. Non-profit does not mean no payment, no recognition and no jobs; it is in fact bigger than the energy sector in the US.

Take a look at this TED video to see how information access can change lives and empower people to learn, take control and transform their own lives.
If there is a revolution coming in translation, I think one is much more likely to see the first signs of it at a conference like AGIS than at more mainstream localization conferences. We see all the key elements lining up here: people focused on large scale collaboration infrastructure, community and crowdsourcing management, massive translation automation and standards, and the most important ingredient of all: PASSION. We see people at this conference who are driven by a passion to change the world. We see people who are not making a lot of money but are still working long and diligent hours. We see people undertaking translation projects that will involve hundreds of millions of words on a routine basis. We see technology, collaboration tools & infrastructure and community coming together in ways that just do not happen at traditional professional translation events. We see people who want to make an impact on the human condition. We hear and see people talking about nation building and the human right to knowledge. This kind of talk gets me all warm inside and I think this is what we all had in common: a “higher” sense of purpose and mission, which does not equate to a stance of moral superiority as some might think. Many of the people here have a soul satisfying answer to the question: Why do you do what you do? Isn't that enough?

My interest in automated translation has always been related to the potential impact this technology has on improving information access and thus improving human lives across the world, and also potentially improving the quality and depth of communication between linguistic groups, cultures and nations. One step on the way to Pacem in Terris? Foolish and idealistic perhaps, but we need to dare to dream first before we can actually make it happen. When I was a teenager, a wise man once told me, “You are the world and the world is you,” and I have explored that statement ever since, holding it close to heart as a seminal influence in my life.

Join and support the Rosetta Foundation and help make this into a movement that cannot be stopped. If you have the ability to influence a major corporate entity to get involved and support this please do so now, and join Reinhard as he forges and builds this new path to change the world for the better.
And as if that were not enough, I even had a brief meeting with former Indian Prime Minister I.K. Gujral, who even in his nineties has a stature, grace and humility that is disarming. He was impressed by the Thai Wikipedia project I am involved with, and said it would be wonderful for India to do the same in Hindi. And thanks to my friend Vishal I also got to meet several industry leaders of emerging India who seek to build transparency and a relatively corruption free government.

I was also greatly heartened to see the corruption establishment take a serious blow when Minister Raja was exposed for taking obscenely huge bribes, in excess of $40 billion I believe. What makes corruption in India especially horrific is the complete lack of remorse and shame that these public officials have. India is on the move but still has far to go as the culture of corruption is everywhere you turn, and will not die easily. One of the other benefits of free flowing information is that it also makes this kind of self dealing and abuse of trust harder to maintain. Information poverty is also an enabler and friend of corrupt officials and thus this is yet another reason to address this issue. 

Happy Holidays to you all and I hope that you explore and find "goodness" in your life. And here is the link to holiday greetings in many languages.

Tuesday, November 9, 2010

The Machine Translation Community Building Bridges with Translators

The Association for Machine Translation in the Americas (AMTA) recently held its annual conference in close proximity to the ATA conference in an attempt to build bridges and foster a growing dialogue between these two communities. When I entered the world of MT (I have always preferred the term automated translation) I had the good fortune to work with Laurie Gerber at Language Weaver, who encouraged engagement with translators. While her voice was not heard there, she has always stayed true to this vision and she was instrumental in influencing me to also reach out to the world of professional translators as a core business strategy. She has long been a clear voice encouraging the broad MT community to reach out to translators, and she was visible in Denver last week making sure that ATA guests were engaged and making all the right connections or just having a good time.

It is clear to me that the path to better quality MT that really does fulfill the promise of sharing information and knowledge more freely in the world can only come from a close, cooperative and collaborative relationship with professional translators.

The conference began with a keynote from Nicholas Hartmann, who is the current ATA President and also a past technical marketing translator. He gave what I thought was an articulate, considered and clear perspective of the translator vis-à-vis translation technology and MT, while pointing to some directions for real collaboration in future. I thought it would be valuable to restate what I heard, as there were several key messages for the MT community. A published paper version of his speech is also available on the AMTA website (but it is really hard to get to these resources as the unique URL is not easily displayed.)
The ATA has 11,000 members, of whom 70% are freelancers, and Nick had carefully prepared to be their voice, expressing their concerns and needs in this forum. (Here is the twitter stream). He stated that the bad blood with translators was originally created by the historical overstatement of MT capabilities in the 1960s, when MT was expected to replace translators: FAHQT (Have you noticed that this sounds a lot like f**&ked?). He noted that many translators do in fact use some form of translation technology today even though they find the future vision of being post-editors at “burger flipping wages” abhorrent. He gave some examples of human translations that went beyond the literal, to show how only a human could make the non-literal interpretations needed to correctly translate some example phrases. The examples showed that even a “perfect” literal translation can be nonsense at times, and he asked if the future of MT is as T.S. Eliot says:
“That is not what I meant at all. That is not it, at all”
Some additional points he made included:
  • Translators have a very different view of quality which is linked to their code of ethics to render the source material accurately
  • MT makes sense where something is better than nothing
  • MT is really only  “a probability distribution over strings of letters and sounds” (especially SMT) (part of a quote from Martin Kay in which he specifically cautioned the MT community NOT to consider language so simplistically)
  • Translators want it to be acknowledged that their work is critical to feeding and improving SMT systems with HT corpus
  • MT should be matched to task and purpose and could be unfortunate in the hands of the wrong people e.g. the infamous Welsh street sign
  • SMT is often built using flawed TM, so one can hardly be surprised at some of the results; in this case past performance will be a predictor of future performance
  • MT must be edited and checked to avoid serious errors
  • Post-editing has come to mean cleaning up really bad quality MT output at very low wages even when everybody doing it understands that they could have done it faster without the MT
  • The data pollution in some SMT systems perpetuates itself and is difficult if not impossible to remove
He then went on to answer the following question. So what do translators want and need?
“We want to work together constructively. We want technology that we can use. Machine assisted translation does make sense to us but we do not want tools that make our jobs harder.” Translators want to have a hand in making the tools but want the dialogue to be realistic. They also do not want a role of PEMT drudgery and asked for technology that assists translators to be more productive.  He ended his talk saying that he had enjoyed meeting many in the MT community and he hoped that the dialogue would continue as, “We are all in the same business.”

I did not stay long enough to really get the reaction of the MT crowd as I rushed off to tekom in Germany that same evening. Of course there was one somewhat hostile question immediately, but I think many MT users want to know how to work with translators, and as you will see from my previous blog entries, I am a big believer in this rapprochement.

Nick was followed by Jost Zetzsche, who continued on the theme of building bridges and improved communication. He pointed out how translators have a self-perception of being bridge builders, language lovers, artists and cultural intermediaries, in contrast to the techie, computer science self image that many MT practitioners have. Clearly cultural and communication problems can arise from this. He was self critical and admitted that translators need to learn more about MT technology and not resist it like they did with TM, but also pointed out some foolish statements made by MT proponents that any average translator would see as unfortunate or stupid. One example was Jaap van der Meer’s statement about letting a thousand MT systems bloom. Some of you may realize that this is really close to something that Mao Zedong said to flush out dissidents and eventually execute them. The other screamer he listed (without reference to the source, other than it being a major MT vendor executive) was, “It’s quite a magical technology when you see it (MT) work” by Mark Tapling of SDL (He says this with a smile at the end of the video clip). (Dude, it’s a data transformation!!! Really ?!? I wonder if Tapling thinks that spreadsheets adding numbers up and Powerpoint slide transitions are also magical? Perhaps from a 19th century frame of mind this is all pretty magical). Jost contrasted this to a Twitter conversation he had with @kvashee ;-) about features that would make MT more useful to translators. (I assure you I did not instigate this comparison.)

Some things that he asked for: Give us (translators) challenging tasks, we want to participate in “making it better”. He stated that he wanted to see that his corrections had immediate and direct impact on the system. (One of the biggest complaints that translators have about MT is that the systems make the same error over and over again.) He asked that MT vendors talk to translators in “real” language and “admit what your tools can and cannot do”. He ended on a positive note by saying that we as humans tend to demonize the unknown (HIC SVNT DRACONES!) and invited the audience to enter into each other's terra incognita, and put our myths behind us.

While this event was a good and constructive start, I hope that the dialogue between translators and MT developers continues beyond this conference and produces real innovation and collaboration. One of the first subjects that needs elucidation and better definition is “post-editing”. It was clear in several very instructive presentations in the “Commercial User Track” that the concept of post-editing needs development and clarification. There were several very good presentations and we see many successful MT implementations being discussed on a regular basis now. Check out and type #AMTA2010 to see a cool Twitter summary of the conference. Chris Wendt of Microsoft and I were also voted to the AMTA board, representing commercial users and I thank all those who voted for me and hope to help drive our common interests and agenda.

There was an interesting demo of several post-editing tools that are available in the market currently. Lingotek showed their translator workbench, PAHO showed their Word Macro based post-editor which is perhaps the longest running and most widely used direct post-editing tool around. GTS showed a promising looking community management and basic editing environment for Wordpress blogs. These examples all suggest that the tools to make post-editing more interesting are going to continue to evolve, and that while these are wonderful examples the best is yet to come. We also see that increasingly community and collaboration are intertwined with post-editing and this connection to MT is likely to develop further, as new kinds of people are drawn into the translation process. AMTA will make much of the content from this conference available on their website and I am sure many will find it useful if they can actually find it. (They need a serious update to their website).

Unfortunately, I had to rush off to the Tekom conference in Germany  at the end of the first day and missed the rest of the conference, but I kept in touch via the GTS blog since the twitter stream died pretty quickly after I left. I noticed Jost mentioned in his blog that Laurie had accomplished what she had set out to do, years ago i.e. bring MT developers and translators closer together. However, while Laurie may be happy at this initial accomplishment, I would suggest that she stay around a while longer and make sure that we all set sail together with the wind at our backs. The journey together has just begun and we have many miles to go before we sleep.  

And this was tekom – a blur of meetings and some wonderful dinner conversations. 


Tuesday, October 26, 2010

Megatrends and Their Impact on Professional Translation

I have recently made public presentations to audiences of LSPs and localization professionals about broad trends affecting the world of professional translation. My perspective seemed to resonate and I thought it might be useful to put the core message in a blog entry and share it, to possibly get critical feedback or extend the discussion. In some ways I have touched upon these “megatrends” in earlier blog entries (Why MT Matters, The Data Deluge, Translation Technology, Innovation in Localization)  but it is useful to bring it all together in a single place.

I am aware that others are making similar points but I think this summary incorporates conversations I have heard around the web and is closer to the collective intelligence. The trends in brief are as follows:
  • There is an explosion in relevant content affecting global enterprises
  • Social Media and Social Networks are now increasingly in control of branding impressions
  • There is an increasing use of open innovation and community collaboration models in many businesses
  • Translation technology and automation are becoming increasingly important for speed and cost reasons
  • A rising Asia will change the priority of strategic languages away from the current FIGS dominance

The Content Explosion

[Image: Data Growth Trends]
We are seeing exponential growth in the digital universe and much of it is very relevant to global enterprise concerns. It is important to understand that this is happening on a scale never before seen in the history of man. A lot of this content is related to facilitating global commerce so understanding this becomes highly relevant for the global enterprise.
[Image: Enterprise Content]

The Impact of Social Networks and Customer Conversations

The carefully calculated marketing and corporate image control that global enterprises have been used to is also coming undone. Brand impressions are increasingly being formed by real customer conversations in social networks. We see that the world of marketing is undergoing a transformation and what used to be considered critical corporate messaging is increasingly viewed as “corporate-speak” and is often not trusted by the end-customers who matter the most. Jeremiah Owyang wrote a prescient blog entry three years ago where he predicted this shift and increasingly the corporate website is seen as a place where pro-corporate bullshit resides. More and more, the high value content is being created by customers and business partners and corporations have little editorial control of this content creation process.

[Image: UGC Impact]

The Growing Importance of Open Innovation & Collaboration Models

We are also seeing that co-creation of products and services with customers can be a huge momentum builder. Facebook has shown that engaging users in translating interfaces can also help accelerate building the customer base in those languages. Their international growth has been closely linked to their crowdsourcing translation efforts even though crowdsourcing can be less predictable. Dell actively solicits and nurtures customer feedback in IdeaStorm to develop new products. Collaborating with customers and partners is emerging as a way to engage customers, build loyalty and accelerate penetration in new markets. There are very few large companies that have figured out how to do this because open collaboration is a huge cultural change from the command and control mentality that most executives work with. In the B2B world this may also mean that customers prefer best-of-breed solutions to one-stop solutions. Diversification is generally considered a wise investment strategy and is increasingly understood to be a good way to build business infrastructure and avoid vendor lock-in even though it may also mean more integration work.

The Rise of Asia

Western Europe and FIGS have dominated the professional translation world historically. While FIGS will remain important, in the future it looks like Brazilian Portuguese and Spanish will be the most important European languages, and Chinese, Hindi, Indonesian and other Asian languages will become increasingly important and strategic as global revenue generating translation investments. Many companies now are expanding their base of languages and the FIGS-CJK view is slowly receding. The graphic below is clear in its implications: Asia offers substantial commercial opportunity for those who make the localization investments. I have written previously about how Asia is a long-term opportunity for truly global companies. (I will update it shortly as the momentum keeps building.) There are many market potential studies that suggest Asia offers significant opportunity for IT infrastructure, mobile devices and luxury goods, as well as new bottom of the pyramid (BOP) product opportunities.
[Graphic: Strategic LPs]

What Does All This Mean?

So if we add all this up, it shows that global enterprises are facing a content deluge with dynamic content coming from both internal and external sources, and high volumes of this content are expected to be translated increasingly faster to have any value in competitive situations. Global enterprises that quickly identify high value content and make it multilingual will find that this can drive international revenues and that translation can be a strategic tool for building long-term competitive advantage.

Now, more than at any other time in history, speed and agility are decisive competitive advantages...David Meerman Scott

However this is a time of revolution, and the TEP (Translate-Edit-Proof) and SDL (Software and Documentation Localization) mindsets are not likely to be adequate to meet these new translation challenges. The old approach worked for static, low volume content but new thinking and new approaches are required to deal with the data deluge today. Automated translation is an absolute necessity in the new world but this is not the MT of yesteryear that many are still implementing and describing at localization conferences today.

[Image: Old Approach Can't Work]
In the new world, data has to flow from content creation to consumption as seamlessly as possible, delivered to where it is needed at desired quality levels. This means that humans need to be part of the production process and are the key to producing the best quality. The future is about much better man-machine collaboration. The automated translation tools need to be learning and getting better all the time. They need to be responsive to skilled human linguistic steering and corrective feedback. They need to be focused on dynamic streams of content, not just static, packaged translation projects like user documentation. They need to reflect an understanding that with flowing content, upstream cleanup efforts will flow through the production line and make every downstream process easier and more efficient. They need to treat the information cycle as a system, as organic, and thus build the collaboration infrastructure to address the whole problem. They need to be systems that improve with human steering, in weeks, not months or years. They need to be MT systems that can be trained and managed by skilled professionals to get you to “good enough” production quality “fast enough” to have a positive impact on your business, not the black box MT of yesteryear. If you look closely there are very few choices in reality. (Think Asia Online!)
[Image: Localization Trends]

Have you noticed that Google and Bing Translate have improved regularly since they shifted to statistical MT (SMT)? Does that tell you something? Why did we not see these improvements when these same companies were RbMT based? On domain focused systems the progress and quality improvements are even more compelling. I have recently seen Asia Online systems compared to heavily customized RbMT “hybrid” systems, and it is clear that the yield to effort ratios with hybrid SMT approaches are superior (I am of course biased, but the evidence continues to mount). (BTW Moses is perhaps 5% of what is needed to accomplish this). In the future professionals will undertake to translate content streams that contain tens of millions of words that encapsulate important customer conversations. If translation is strategic, make sure that you align with technology that can go the distance and that has demonstrated the ability to evolve.
[Image: New Requirements for MT]
As more senior corporate executives realize that translation is strategic, and that translation technology properly used can generate substantial revenue in global markets, they will start to look at solving these new kinds of translation problems. Executive managers are unlikely to be excited by the possibility of getting user documentation done faster and cheaper. However, as they start to get their hands around the fact that customer conversations are important (many are struggling with this), and that they need to respond with speed and agility to build global customer relationships, we can expect to see a new kind of executive who will seek to make flowing content multilingual. They will care that real standards exist (not SDL versions) and will likely remove products that do not comply. Handling the flow efficiently will become the focus, and the most visionary localization managers may even have senior roles in making this happen. Speed and agility are key and customer engagement across the globe will require a real understanding of how to make these dynamic content focused translation systems work. There is a video of an extended presentation version of this blog available from the Localization Technology Roundtable seminar (starts at about 4’30” after introduction). (Yes, I need to lose weight or at least wear shirts that are not so tight).

We are living in a time of great change. In times of change there is often an opportunity for new paradigms and new leaders to emerge. Remember MSFT grabbing the desktop market away from IBM, and Google grabbing the web search market away from MSFT? New leaders with new visions are the change agents that make this happen, and they are often dismissed by the established status quo and “leaders” of the time. I do not see that new MT initiatives by Lionbridge and SDL really address these new needs. I think there is too much of the old view in their approach, but I may be proven wrong. I am skeptical that the current “market leaders” have the vision and/or culture to drive this change, and I think we will see somebody from outside the industry or smaller, more agile LSPs be the driving force of this coming change. For current leaders, being change agents often means cannibalizing current revenue streams, and very few have succeeded in crossing that chasm. We shall see as this unfolds.

In revolution, the best of the new is incompatible with the best of the old. It’s about doing things a whole new way...Clay Shirky

Anyway these are truly interesting times, and it looks like things will get even more interesting. 

Monday, October 18, 2010

Networking and Learning at ELIA and the CNGL in Sunny Dublin

I was in Dublin last week at the ELIA Networking Days conference. In many ways it was less a conference and much more a peer community coming together and sharing ideas and views of the world, much more so than most other professional translation industry “conferences”. I think it was one of the best LSP-issue focused events I have seen. While some might say that I say this because I had a prominent role there (I was both a keynote speaker and ran a workshop on how LSPs could get started with MT), I assure you this is not why I say this. (Hear me now and believe me later – in German accent)

We live in an age where marketing and corporate-speak are increasingly challenged, undermined and sometimes even seen as disingenuous and false. (Raise your hand if you trust and respect corporate press releases.) Today we see customer voices rise above the din of corporate messaging and take control of branding and corporate reputations with their own “authentic” discussions of actual customer experiences, while marketing departments look on haplessly. I think this phenomenon is happening on many fronts, including conferences in the localization industry. There are too many events in the L10N industry that seem formulaic, routine, repetitive and engineered based on the same old viewpoints. This, I think, affects the ability of these events to really spark dialogue and excitement and to generate the vital learning experiences that make conferences must-attend events. While these events remain useful for “face-time”, they often have little value for really engaging attendees at a professional level.

What makes for a great conference or professional industry event? To my mind: high quality content, interactive and engaged audiences in sessions that broaden one’s horizons, interesting people who continue the professional dialogue outside of the sessions and share learning experiences, and of course a good location. And if you can offer all of this at a reasonable cost, even better. A great professional event is characterized by learning: the more intensive the learning experience, the better. The best ones leave you thinking for a while after the event. Intense learning rarely happens at really big events because it is hard to scale this. The ELIA Networking Days meeting (rather than conference) had many of the elements of good professional group meetings. The attendees had a lot of say in determining what was presented, the sessions were highly interactive and engaged, and attendees all shared without concern for exposing personal weaknesses. It was clear that the broad feedback was genuinely positive, and the biggest single complaint seemed to be that people wanted to be in two sessions that were running concurrently. Ultan (pictured below in traditional Irish garb), aka @localization, has also written about his own early impressions of the event, with more to follow soon in Multilingual magazine.

This event was also very organized in terms of Twitter coverage and attendees saw continuous live coverage through the event where several screens showed the Twitter feedback up to that point. Type “#ELIA” at the wall of silver to see one such live view. Opinions from tweeps outside the conference were shared with the session group if they were deemed pertinent or relevant. This virtual expansion is a key characteristic of the best events like TED and SXSW where one can see audience reactions in real time and the external virtual audience may actually be several times larger than the physical one.  This event also has some great candid photo shots taken by @AgaGonczarek like the one below. She also created the mood video that captures much of the feel of the conference.

Some of the sessions that I found most interesting – Sharon O’Brien’s overview of post-editing, @paraics review of the work CNGL is doing, @marylaplante’s perspective on changing enterprise needs and innovation. Ultan also did a great impromptu presentation on the huge value of Information Quality at Oracle (he really does not like descriptions of IQ with the word controlled or simplified in them). I also really enjoyed the Bulls Eye Sales Pitch session, which compared and contrasted the actual sales pitches used by several brave and courageous LSPs who voluntarily faced criticism from both the panel and the room on their presentations. (We all need to do more of this to learn faster.) Differentiation is a fundamental business challenge for language service providers, and this session did a great job of raising awareness and providing real insights into different differentiation strategies. There is a very nice downloadable twitter archive here where you can see my coverage and others’ coverage of these sessions, but unfortunately all the hardcore tweeters tended to attend the same sessions, so some sessions did not have very much twitter coverage.

I also spent a day with several people at CNGL at Trinity College Dublin to better understand what they are working on. I also had a session with graduate students that was a lot of fun, where I got to tell them my opinion on what broad language technology problems needed to be solved and what research they should do to help us (the world) make more rapid advances. The students were respectful (about my ideas) and at one point informed me that while my suggestions made sense and could even possibly have an impact on important translation problems, they were unlikely to have merit as PhD thesis ideas. Most people don’t realize that CNGL has major SMT research initiatives underway that would rank it amongst the largest MT research programs in the world. They are also doing some very cool and leading edge work on global customer support. In fact, I think one of their research initiatives has produced customer support related technology that could be the basis of a formidable offering in the market. If I had the money I would be very interested in trying to commercialize it. This technology solves a very important problem: getting customers to the information they want, by quickly matching queries with carefully filtered and highly relevant response information.

I also got to be on a panel of judges that ranked CNGL PhD student thesis presentations. The winners were focused on improving Patent Search, Developing Better Localization Data Standards and Managing Quality in Crowdsourcing. There were also some cool SMT ideas and ontology development ideas that looked promising. The highlight of the day had to be the very brief visit to see the Book of Kells and the Long Room Library. It is heartening to realize that men sat together a few hundred years ago and said, let’s build a library, and let’s make it so amazing that men (and women) in future will realize that knowledge, words and books can and should be approached with awe and reverence. This place truly is imbued by the hand and touch of civilized beings. A marvelous and magical place whose scale cannot really be appreciated in pictures, but here’s one anyway (please sir, pardon my use of it without permission).

Dublin-The Long Room Library Trinity College

All conferences have some wonderful conversations and some of these are really heartfelt and should be celebrated. Even though some of these conversations may be the one and only interaction I have with these people in my life, I agree with Carlos Castaneda, who recommends (in The Active Side of Infinity) that these conversations should be chronicled as memorable moments in one’s life. I can recall a few of these conversations from this trip: several conversations with Ultan O’Broin (sometimes with Renato there as well), a conversation about differentiation, vision and distinction with the lovely Polish duo of @AgaGonczarek and Marta (who tried repeatedly to get me to say Wroclaw correctly), a conversation with the lovely and talented Sara Nicolini about purpose and passion, and finally a dinner conversation with Reinhard Schaler (and Paraic) about India and Hard Times in Ireland and elsewhere.

This was also a particularly intense event for many and varied professional conversations about how to get started with MT. I look forward to continuing these conversations.

ELIA Networking Days Dublin 2010 from Agnieszka Gonczarek on Vimeo.

And indeed it was actually sunny for those three networking days.

Friday, October 8, 2010

Highlights from the TAUS User Conference

Earlier this week, more than 100 people gathered in Portland, Oregon at the TAUS annual user conference. This included a large contingent of translation buyers, mostly from the IT space (Intel, Oracle, EMC, Sun, Adobe, Cisco, Symantec, Sony), a few LSPs with real MT experience, and people from several MT and translation automation technology providers, all gathered to share information about what was working and what was not. Like any conference, some presentations were much more compelling and interesting than others, and I wanted to share some of the things that stood out for me. It was not always easy to tweet, since the connectivity faded in and out, but a few of us did try to get some coverage out. CSA also has a nice blog entry on the event that focuses on the high level themes. I think that some of this will be made available as streaming video eventually.

If you are interested in the business use of machine translation this was a useful event: it showed many examples of successful use-cases, and you could also get technology presentations from MT vendors in some depth, perhaps too much depth for some. There was a full day that was filled with MT related presentations from users, MT tool developers and LSPs using MT. Some of the highlights:

Jaap stated that translation has become much more strategic and that global enterprises will need language strategies. He stated that he felt that there would not be any substantial breakthroughs in MT technology research in the foreseeable future. I actually agree with him on this, in the sense that the rate of improvement from pure MT technology research alone will decline, but I believe that we are only at the beginning of the improvements possible around better man-machine collaboration. It is my opinion that many of the systems presented at the event are far from the best systems possible with the technology and people available today. Another way to say this is that I think the improvements from free online MT will slow down, but the systems coming from professional collaborations like the ones Asia Online has with LSP partners will rise rapidly in quality and show clearly that just data, computers and algorithms are not enough. I predict that collaboration with skilled, focused and informed professional linguistic feedback (LSPs) will drive improvements in MT quality faster than anything done in MT research labs. The two most interesting pure MT technology presentations were an overview of morpho-syntactic SMT (a hybrid with 500M rules! from Kevin Knight at USC/ISI) and an overview of a “deep” hybrid RbMT approach from ProMT, where statistical selection is introduced all through the various stages of an enhanced RbMT transfer process. This is in contrast to the “shallow” hybrid model used by Systran, which uses SMT concepts as a post-process to improve output fluency. All the RbMT vendors/users stressed the value of upfront terminology work to get better quality. Those of you who heard the presentation that Rustin Gibbs and I did (mostly Rustin, who was also a star performer at the conference) will know that I am much more bullish on what a "vendor-LSP collaboration" could accomplish than on any technology-only approach.


It was good to see (finally!) that many, including Jaap, admit that data quality and cleanliness do matter in addition to data volume.


Several people came up to me and mentioned that they had experienced that sometimes less is more when dealing with the TDA data. TM cleaning, corpus linguistics analysis to better understand the data, and assessing TM quality at a supplier level are, at least, now recognized as real issues that are getting more attention within TAUS. There were several presentations that mentioned data quality, “harmonizing terminology”, TM scrubbing and strategies to reduce the risk of data pollution when large amounts of TM are gathered together. The TDA database now has 2.7 billion words, and TAUS admitted that it has become more difficult to search and use; thus they are integrating the data into a Globalsight based repository to make the data more usable. They are hoping to add more open source components to further add value to the data and its ongoing use. Their ability to further refine this data and really standardize and normalize it in useful ways will define the TDA’s success as a data resource in future, in my opinion.
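For readers who want a concrete picture of what “TM scrubbing” can mean, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not what TAUS or any vendor actually does: the Segment class, the clean_tm function and the length-ratio threshold are all hypothetical, and a real cleaning pipeline would add language checks, tag handling and terminology checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One translation memory unit (hypothetical structure for illustration)."""
    source: str
    target: str


def normalize(text: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return " ".join(text.split())


def clean_tm(segments, max_length_ratio=3.0):
    """Return a scrubbed copy of a TM.

    Three simple, assumed rules are applied:
      1. drop segments whose source or target is empty,
      2. drop segments whose lengths differ by more than max_length_ratio
         (a crude signal of misalignment),
      3. drop exact duplicates, ignoring case and whitespace differences.
    """
    seen = set()
    kept = []
    for seg in segments:
        source = normalize(seg.source)
        target = normalize(seg.target)
        if not source or not target:
            continue  # rule 1: empty side
        longer = max(len(source), len(target))
        shorter = min(len(source), len(target))
        if longer / shorter > max_length_ratio:
            continue  # rule 2: suspicious length mismatch
        key = (source.casefold(), target.casefold())
        if key in seen:
            continue  # rule 3: duplicate pair
        seen.add(key)
        kept.append(Segment(source, target))
    return kept


if __name__ == "__main__":
    raw = [
        Segment("Click  Save.", "Cliquez sur Enregistrer."),
        Segment("Click Save.", "Cliquez sur Enregistrer."),  # duplicate
        Segment("", "Vide"),                                 # empty source
        Segment("Hello", "x" * 100),                         # misaligned
    ]
    for seg in clean_tm(raw):
        print(seg.source, "=>", seg.target)
```

Even rules this crude shrink a pooled TM noticeably, which is one way of reading the “less is more” comments above. The length-ratio rule will also throw away some legitimate very short segments, so the threshold is something to tune per language pair rather than trust.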

There was an effective and very interesting point-counterpoint session (unlike the ones I have seen at GALA) between Keith Mills of SDL and Smith Yewell of Welocalize that positioned the “practical” legacy system view versus the open systems view of the world. It really did justice to both viewpoints in a common framework and thus provided insight to the careful observer. It was interesting to see that Keith used the word “practical” as many as 30 times while presenting the SDL view of the world. In brief, he claimed that SDL will spend $15M in 2010 on R&D to create a platform to link content creation more closely to language. He said that SDL does not make money on TMS because there is too little leverage due to too many one-off translation processes. He also said that SDL will not go open source but was “really into standards” and will create APIs to “let” customers integrate with other software infrastructure.

Smith presented his contrasting view of a world that, unlike the “walled garden” of SDL, is interoperable, “open” and collaborative and involves multiple vendors. Keith responded that they will “eventually” connect to other products and “are working on it,” which drew counters about how closed black boxes make it difficult to scale up and meet new customer requirements of translating fast flowing content from disparate sources. It was interesting to see Jaap characterize the debate by saying that perhaps Smith was “too visionary” (a nice way of saying impractical?) and that the SDL perspective is “realistic and practical”. I actually have captured the flow of the debate in my twitter stream. Keith also made good points about how MT must learn to deal with formatted data flows to be really usable, but seemed to completely miss the growing urgency for language data to move in and out of SDL software systems. Smith also pointed out that MT is not revolutionary; rather, it is just another tool that needs to be integrated into the right business processes to add value. I liked the debate because it accurately presented the two viewpoints in an authentic way and let you see the strengths and weaknesses of both perspectives. I, of course, have an open systems and collaboration bias, but this session gave me some perspective on the value of the walled garden perspective as well.

I had some back channel Twitter chatter with @paulfilkin and @ajt_sdl about what meaningful openness means. In my view SDL does NOT have it. My advice to SDL on this is to share API information the way TAUS does: the TAUS API is published for self-service use, so any customer/member can connect to get data in and out of the TDA repository efficiently. Facilitate the work to let the data flow, and give customers enough information (API, SDK) so that they can do this themselves without SDL permission and move language data easily to wherever it is needed, e.g. TMS to TMS, TMS to TM, TMS to CMS, TMS to MT, TMS to Web. They should provide basic information access for free, and customers should only be required to pay if they need engineering support. SDL needs to understand that language data can be useful outside of SDL tools and that they can make this easier by delivering real openness to both their customers and the overall market. I have written about this previously. My advice and warning to customers – stay away from SDL until they do this or you will find yourself constantly wounded and tending to “integration friction”.

One thing that I found really painful about the conference was a horrendously detailed continuing presentation (it refused to end) on the history of MT. I am not sure that there is anything to be learnt from such minutiae. For me this is clearly a case where history was horrifically boring  and did not teach any real lessons. It made me want to poke my eyes and see how much pain I could tolerate before crying out. I hope they never do it again and that the presentation and recordings are destroyed.

I also feel that the conference was way too focused on “internal” content. That is, content created by and within corporations. It is no surprise that SDL (which originally stood for Software and Documentation Localization) is so committed to the walled garden, since in the legacy world, command and control has always been the culture, even within most global companies. In an age where social networks and customers sharing information with each other increasingly drive purchase behavior, this is a path to irrelevance or obscurity. I am not a big believer in any Global Content Value Chain (GCVC) that does not have a strong link to high value community, partner and customer created content. The future is about BOTH internal and external content. I think the TAUS community would be wise to wake up to this, to stay relevant and attract new and higher levels of corporate sponsorship. We should not forget that the larger goal of localization and translation efforts is to build strong relationships with global customers. Conversations in social networks are how these relationships with brands, and customer loyalty, are being formed today. Localization will need to learn to connect into these customer conversations and add value on behalf of the company in these living, ever present conversations happening mostly outside the corporation’s content control initiatives. Richard Margetic from Dell said it quite clearly at Localization World Seattle: “Corporations will have to take their heads out of the sand and listen to their customers, we believe that the engagement of customers on Twitter is critical to success (and sales).” He also said, “We had to teach our corporate culture that negative comments have value. It teaches you to improve your products,” and I hope that the TAUS board will take my comments in that spirit.

There was surprisingly little discussion about data sharing. The focus had moved to much more pragmatic issues like data quality, standards, categorization, and meaningful identification and extraction from the big mother database, but there were few details on this other than some basic demos of how Globalsight and the search engine worked. If you have ever seen a database search and lookup demo you will know that it is seriously underwhelming. The TDA is great if you are looking for Europarl or IT data but is pretty thin if you want data in other domains. A lot of this data is available elsewhere with no strings attached. The TDA needs to give people a reason to come to their site for this data. The value of TDA in future, IMO, is going to depend on how they add value to the data, normalize it, clean it and categorize it. This to my mind is the real value creation opportunity. They also need to find ways to attract more people who are not from the localization world but have an interest in large scale translation. Additionally, I believe they should expand the supervisory board beyond IT company people to increase the possibility of being a real force of change. When everybody has the same background, groupthink is inevitable. (Aren't they forced to watch those workforce diversity presentations that I was forced to watch when I was at EMC?) I suspect that open innovation / collaboration models will be more likely to come up with ways to add value to data resources and new ways to share data, and hopefully TDA finds a way to engage with people like Meedan, TRF, Yeeyan and other information poverty focused initiatives. While there was a focus on innovation and collaboration, I got the feeling that it was too focused on getting more open source tools and nothing else. I think open innovation needs more diversity in opinion and ideas than we had in the room. What is open innovation?
Henry Chesbrough: “Open innovation is a paradigm that assumes that firms can and should use external ideas as well as internal ideas, and internal and external paths to market, as the firms look to advance their technology.”

Here is a presentation on Data Is the New Oil. I think the point they make is very useful to TDA: Refine, Refine, Refine to create value. Value is getting the right data to the right people at the right time in the right format. I think it is worth finding out what that actually means in terms of deliverables. Make it easy to slice and dice and package and wrap. 

Those of you who know me know that I am a Frank Zappa fan, and Frank (well ahead of his time as usual) said it well in 1979 on the Joe’s Garage album. (I would add “Data is not information” for this blog and recommend you look at more Zappa quotes.)

"Information is not knowledge. Knowledge is not wisdom.

Wisdom is not truth. Truth is not beauty.

Beauty is not love. Love is not music.

Music is THE BEST"

Finally, one thing about conferences is that there are always a few conversations that stand out. While there were many professional conversations of substance, for me, the ones that stand out the most are conversations that come from the heart. I was fortunate to have four such conversations at this event. One with Elia Yuste of Pangeanic about building trust, another with Alon Lavie of AMTA about where innovation and the real exciting MT opportunities will come from, a third with Smith of Welocalize about building openness with substance, integrity & honor and finally a conversation with Jessica Roland about life, finding purpose and family. I thank them all for bringing out the best in me.

As I prepare for my ELIA keynote, I am compelled to share one more quote that I think is worth pondering:

In revolution, the best of the new is incompatible with the best of the old. It’s about doing things a whole new way… Clay Shirky

Friday, October 1, 2010

Inspiration from TED – Part II

I recently wrote about and listed some of my favorite TED presentations. I mentioned that my initial list was just the tip of the iceberg and there were many more presentations that are worth a look. As TED grows as a brand, a place where influential voices come forth and share observations and their vision, it is something worth monitoring on an ongoing basis. This is where the most exciting new ideas break out. As TED expands across the globe we are seeing more international perspectives. So, now interesting viewpoints from Asia and Europe are becoming more common in the TED portfolio. Here are a few more selections that I found interesting and that I think also have some relevance to the world of translation.

Clay Shirky is a prominent thinker on the social and economic effects of Internet technologies and he often speaks in favor of crowdsourcing and open collaboration. In fact he wrote a book called Here Comes Everybody. In this talk he describes the basis for his bullishness about something he calls “cognitive surplus”. I think he is one of the most insightful voices on the subject of crowdsourcing and the new collaborative initiatives that are forming around internet access and technology. I recommend this and many other presentations he has scattered across the web.

This Simon Sinek talk on how great leaders inspire action is a recommendation from @fabcid, who summarized it as: “Brilliant and inspiring. A must-see.” Simon Sinek has a simple but powerful model for inspirational leadership, all starting with a golden circle and the question "Why?"

This is a talk by Tom Wujec who studies how we share and absorb information. He's an innovative practitioner of business visualization -- using design and technology to help groups solve problems and understand ideas. This shows how childlike attitudes are so critical to successful collaboration and how so much of our formal business training actually undermines our ability to collaborate.

I first heard about “soft power” from Joseph Nye. It is something good leaders naturally have: the ability to shape the preferences of others. Simply put, in behavioral terms, soft power is attractive power, or the power to attract rather than command a following. This is a great talk by Shashi Tharoor on India’s rise as a power. He argues that “soft” power is what makes India formidable in the long run. This is its ability to share its culture with the world through food, music, technology and Bollywood. He argues that in the long run it's not the size of the army that matters as much as a country's ability to influence the world's hearts and minds. One striking example he gives is that today, Indian restaurants in Britain employ more people than the coal mining, ship building and iron and steel industries combined.

One of the passions in my life is music, and I am confident that at some point in my life I will be engaged with sound and music in a substantial way, so I have also been keeping an eye on the TED stuff related to music and sound. In this talk Robert Gupta talks about his relationship with a schizophrenic man who was also the subject of a movie with Jamie Foxx called The Soloist. Music is medicine, music changes us, and for some, music is sanity.

Finally this is another TED-like group that illustrates speeches in a very useful and interesting way that I think facilitates understanding and engagement. The talk by Daniel Pink is based on his research on motivation. This shows that human beings have an innate inner drive to be autonomous, self-determined, and connected to one another. I think it also shows why standards like EN15038 are flawed – they kill any creativity.