
Sunday, December 19, 2010

Using Translation as a Force to Address Information Poverty: AGIS 2010

I have the good fortune to reflect and report on the AGIS 2010 conference as we approach “the holiday season,” which is a time of reflection for many in the world. A time of goodwill and at least temporary good deeds for some. The conference was held in India, which can be a challenge because the basic infrastructure is still primitive, but the event went off well with very few glitches and I think AGIS is slowly building momentum.

AGIS stands for Action for Global Information Sharing, and it has been conducting a resolute crusade against Information Poverty since its inception. The overall tone and tenor of this conference is very different from the typical conference in the world of enterprise localization (LocWorld, GALA, LISA). The focus is on making all kinds of knowledge and information accessible in places where it has never been available before, not just on selling products. Many speakers shared clear evidence that access to information creates the conditions for economic prosperity, or perhaps even actually drives it. In some parts of the world localization is all about reducing information poverty and improving the human condition. Reinhard has provided a summary of the highlights of the event in his blog. There was also coverage here. And for those who ask why we need yet another conference in the industry, Reinhard explains below:
You might be asking yourself, “Why AGIS, why YALC (yet another localisation conference)? What makes AGIS so different?” Well, first of all, it is not owned by any particular organisation, it is not run for profit, and it is (almost) free to attend. Then, it takes place where people need localisation, not where people are rich enough to pay for it. Nothing is sold, nothing is bought at AGIS. And last but not least, AGIS attendees have a social agenda, not (just) a commercial one.
The highlights can also be found in the Twitter stream on my Friendfeed. (Scroll back to newer items to see the chronological sequence. I don’t know why Twitter has already made much of the data unavailable.)

In the keynote, Dr. Vijay Bhatkar (a digital visionary of India) pointed out how globalization and localization are tightly linked and how NLP, MT and language technologies are only just beginning their evolutionary march. He pointed out how Japan’s hopes for world dominance were stymied by linguistic issues and that access to knowledge and information in the 21st century will be key to building prosperity, as an increasing part of the GDP of many nations will come from knowledge services. This is already true for India. He informed the Indians in the room that India cannot consider itself an IT power when 350 million people are illiterate, and urged the community to preserve the Indian languages while continuing the push forward with English education. He also pointed out that both telephone and TV are mostly language neutral but information cannot be, and localization is critical to broad access.

Reinhard Schaler and the Rosetta Foundation are leading an initiative to build a platform to facilitate self configurable, distributed and shared data based global localization initiatives. CNGL and University of Limerick students provided overviews and demonstrations of these tools. Reinhard highlighted that 24,000 children die each day because of lack of access to basic healthcare information. That is roughly one every 3.6 seconds! These deaths could be avoided if information were more easily available. This appears to be a primary motivator and raison d'etre for the Rosetta Foundation.

Ms. Swaran Lata painted a clear picture of the amazing complexity of the Indian linguistic landscape. There are 20+ major languages, with some states having 3-4 languages and multiple scripts. The CDAC organization is attempting to solve the linguistic computing issues to ensure that Indian languages gain a stronger digital presence and are preserved. As Prof. Bhatkar asked: "Can you really say you know Hindi if you cannot use it on a computer? This is key in the information age." Ms. Lata described initiatives that focused on the digital education of youth, which interestingly also resulted in the knowledge being passed on to illiterate parents and grandparents. She also talked about initiatives to reach out to the “other side of India” to ensure that illiterate people are not left behind. As Indian consumers become more powerful, Indian languages are critical to tapping their purchasing power.

I spoke about how the Asia Online vision is finally coming to fruition, as we start rolling out a Thai Wikipedia comprising translations of 3.5 million articles starting in January 2011. When all these articles are up and ready, Thailand will have the second largest Wikipedia in the world after the English one. This is a huge boost from the current 60,000 article Thai Wikipedia, many of which are barely more than stubs. In contrast, the index alone for just the article titles in the English Wikipedia is in excess of 600,000 pages! The Asia Online project is an initiative that directly addresses information poverty. Shockingly, it also emerged that the Hindi Wikipedia only has about 50,000 articles for a population of almost 400 million people! This means that a child who does not speak English is deprived of basic educational information access and has a fraction of the content available to an English speaking child.
Some other interesting information from the conference:
  • Ravi Gupta pointed out there are 62,000 newspapers in India and 92% of these are not in English
  • Subtitles do not work well in India because of literacy issues but they can also be a means of building literacy
  • There are no English TV channels in the Top 100 TV channels in India but English speaking consumers are the wealthiest consumers
  • Ravi Kumar, the President of the Indian Translators Association, made an impassioned plea asking that buyers and the community at large respect translators as professionals
  • The CNGL team showed various elements of the open SOLAS platform they are making available to anybody who needs it
  • Mahesh Kulkarni gave a wonderful presentation on standards, which he called traffic rules that ease both user and creator experience. He has a much more holistic and systematic view of standards than we see from the feeble standards initiatives in the traditional localization industry, but he too expressed the difficulties of getting good standards in place.
  • He also pointed out that there are 670 million mobile phones in India and asked: is this the end of the internet as we know it?
Mahesh Kulkarni and Raimond Doctor are a joy to behold; passionate, knowledgeable and driven in spite of having to deal with Indian governmental bureaucracy as part of their daily lives. Raimond is perhaps the most erudite and knowledgeable person I have met on comparative linguistics. He shares his deep knowledge and insight with a verve that draws you right into his delight for language. I hope that CDAC realizes what treasures these men are, and gives them room and resources to execute on their vision and passion.

Reinhard also pointed out that the non-profit world is substantially larger in terms of market potential and actual localization activities than the Fortune 500 market. Non-profit does not mean no payment, no recognition and no jobs; it is in fact bigger than the energy sector in the US.

Take a look at this TED video to see how information access can change lives and empower people to learn, take control and transform their own lives.
If there is a revolution coming in translation, I think one is much more likely to see the first signs of it at a conference like AGIS than at more mainstream localization conferences. We see all the key elements lining up here: people focused on large scale collaboration infrastructure, community and crowdsourcing management, massive translation automation and standards, and the most important ingredient of all: PASSION. We see people at this conference who are driven by a passion to change the world. We see people who are not making a lot of money but are still working long and diligent hours. We see people undertaking translation projects that will involve hundreds of millions of words on a routine basis. We see technology, collaboration tools & infrastructure and community coming together in ways that just do not happen at traditional professional translation events. We see people who want to make an impact on the human condition. We hear and see people talking about nation building and the human right to knowledge. This kind of talk gets me all warm inside and I think this is what we all had in common: a “higher” sense of purpose and mission, which does not equate to a stance of moral superiority as some might think. Many of the people here have a soul satisfying answer to the question: Why do you do what you do? Isn't that enough?

My interest in automated translation has always been related to the potential impact this technology has on improving information access and thus improving human lives across the world, and also potentially improving the quality and depth of communication between linguistic groups, cultures and nations. One step on the way to Pacem in Terris? Foolish and idealistic perhaps, but we need to dare to dream first before we can actually make it happen. When I was a teenager, a wise man once told me, “You are the world and the world is you” and I have explored that statement ever since, holding it close to heart as a seminal influence in my life.

Join and support the Rosetta Foundation and help make this into a movement that cannot be stopped. If you have the ability to influence a major corporate entity to get involved and support this please do so now, and join Reinhard as he forges and builds this new path to change the world for the better.
And as if that were not enough, I even had a brief meeting with former Indian Prime Minister I.K. Gujral, who even in his nineties has a stature, grace and humility that is disarming. He was impressed by the Thai Wikipedia project I am involved with, and said it would be wonderful for India to do the same in Hindi. And thanks to my friend Vishal I also got to meet several industry leaders of emerging India who seek to build transparency and a relatively corruption free government.


I was also greatly heartened to see the corruption establishment take a serious blow when Minister Raja was exposed for taking obscenely huge bribes, in excess of $40 billion I believe. What makes corruption in India especially horrific is the complete lack of remorse and shame that these public officials have. India is on the move but still has far to go as the culture of corruption is everywhere you turn, and will not die easily. One of the other benefits of free flowing information is that it also makes this kind of self dealing and abuse of trust harder to maintain. Information poverty is also an enabler and friend of corrupt officials and thus this is yet another reason to address this issue. 

Happy Holidays to you all and I hope that you explore and find "goodness" in your life. And here is the link to holiday greetings in many languages.


Tuesday, November 9, 2010

The Machine Translation Community Building Bridges with Translators

The Association for Machine Translation in the Americas (AMTA) recently held its annual conference in close proximity with the ATA in an attempt to build bridges and foster a growing dialogue between these two communities. When I entered the world of MT (I have always preferred the term automated translation) I had the good fortune to work with Laurie Gerber at Language Weaver who encouraged engagement with translators. While her voice was not heard there, she has always stayed true to this vision and she was instrumental in influencing me to also reach out to the world of professional translators as a core business strategy. She has long been a clear voice encouraging the broad MT community to reach out to translators and she was visible in Denver last week making sure that ATA guests were engaged and making all the right connections or just having a good time.

It is clear to me that the path to better quality MT, one that really does fulfill the promise of sharing information and knowledge more freely in the world, can only come from a close, cooperative and collaborative relationship with professional translators.

The conference began with a keynote from Nicholas Hartmann, who is the current ATA President and also a past technical marketing translator. He gave what I thought was an articulate, considered and clear perspective of the translator vis-à-vis translation technology and MT, while pointing to some directions for real collaboration in future. I thought it would be valuable to restate what I heard, as there were several key messages for the MT community. A published paper version of his speech is also available on the AMTA website (but it is really hard to get to these resources as the unique URL is not easily displayed).
 
The ATA has 11,000 members, of whom 70% are freelancers, and Nick had carefully prepared to be their voice, expressing their concerns and needs in this forum. (Here is the twitter stream). He stated that the bad blood with translators was originally created by the historical overstatement of MT capabilities in the 1960s, when MT was expected to replace translators: FAHQT, or Fully Automatic High Quality Translation. (Have you noticed that this sounds a lot like f**&ked?) He noted that many translators do in fact use some form of translation technology today even though they find the future vision of being post-editors at “burger flipping wages” abhorrent. He gave some examples of human translations that went beyond the literal, to show how only a human could make the non-literal interpretations needed to correctly translate some example phrases. The examples proved that even a “perfect” literal translation can be nonsense at times, and he asked if the future of MT is as T.S. Eliot says:
“That is not what I meant at all. That is not it, at all”
Some additional points he made included:
  • Translators have a very different view of quality which is linked to their code of ethics to render the source material accurately
  • MT makes sense where something is better than nothing
  • MT is really only “a probability distribution over strings of letters and sounds” (especially SMT). This is part of a quote from Martin Kay in which he specifically cautioned the MT community NOT to consider language so simplistically
  • Translators want it to be acknowledged that their work is critical to feeding and improving SMT systems with HT corpus
  • MT should be matched to task and purpose and could be unfortunate in the hands of the wrong people e.g. the infamous Welsh street sign
  • SMT is often built using flawed TM, so one can hardly be surprised at some of the results; in this case past performance will be a predictor of future performance
  • MT must be edited and checked to avoid serious errors
  • Post-editing has come to mean cleaning up really bad quality MT output at very low wages even when everybody doing it understands that they could have done it faster without the MT
  • The data pollution of some SMT systems perpetuates and is difficult if not impossible to remove
He then went on to answer the following question. So what do translators want and need?
 
“We want to work together constructively. We want technology that we can use. Machine assisted translation does make sense to us but we do not want tools that make our jobs harder.” Translators want to have a hand in making the tools but want the dialogue to be realistic. They also do not want a role of PEMT drudgery and asked for technology that assists translators to be more productive.  He ended his talk saying that he had enjoyed meeting many in the MT community and he hoped that the dialogue would continue as, “We are all in the same business.”

I did not stay long enough to really get the reaction of the MT crowd as I rushed off to tekom in Germany that same evening. Of course there was one somewhat hostile question immediately, but I think many MT users want to know how to work with translators and, as you will see from my previous blog entries, I am a big believer in this rapprochement.

Nick was followed by Jost Zetzsche, who continued on the theme of building bridges and improved communication. He pointed out how translators have a self-perception of being bridge builders, language lovers, artists and cultural intermediaries, in contrast to the techie, computer science self-image that many MT practitioners have. Clearly cultural and communication problems can arise from this. He was self-critical and admitted that translators need to learn more about MT technology and not resist it like they did with TM, but also pointed out some foolish statements made by MT proponents that any average translator would see as unfortunate or stupid. One example was Jaap van der Meer’s statement about letting a thousand MT systems bloom. Some of you may realize that this is really close to something that Mao Zedong said to flush out dissidents and eventually execute them. The other screamer he listed (without reference to the source, other than it being a major MT vendor executive) was, “It’s quite a magical technology when you see it (MT) work” by Mark Tapling of SDL. (He says this with a smile at the end of the video clip.) (Dude, it’s a data transformation!!! Really ?!? I wonder if Tapling thinks that spreadsheets adding numbers up and Powerpoint slide transitions are also magical? Perhaps from a 19th century frame of mind this is all pretty magical.) Jost contrasted this to a Twitter conversation he had with @kvashee ;-) about features that would make MT more useful to translators. (I assure you I did not instigate this comparison.)

Some things that he asked for: Give us (translators) challenging tasks, we want to participate in “making it better”. He stated that he wanted to see that his corrections had immediate and direct impact on the system. (One of the biggest complaints that translators have about MT is that the systems make the same error over and over again.) He asked that MT vendors talk to translators in “real” language and “admit what your tools can and cannot do”. He ended on a positive note by saying that we as humans tend to demonize the unknown (HIC SVNT DRACONES!) and invited the audience to enter into each other’s terra incognita, and put our myths behind us.

While this event was a good and constructive start, I hope that the dialogue between translators and MT developers continues beyond this conference and produces real innovation and collaboration. One of the first subjects that needs elucidation and better definition is “post-editing”. It was clear in several very instructive presentations in the “Commercial User Track” that the concept of post-editing needs development and clarification. There were several very good presentations and we see many successful MT implementations being discussed on a regular basis now. Check out http://www.wallofsilver.net/ and type #AMTA2010 to see a cool Twitter summary of the conference. Chris Wendt of Microsoft and I were also voted to the AMTA board, representing commercial users and I thank all those who voted for me and hope to help drive our common interests and agenda.

There was an interesting demo of several post-editing tools that are available in the market currently. Lingotek showed their translator workbench, PAHO showed their Word Macro based post-editor which is perhaps the longest running and most widely used direct post-editing tool around. GTS showed a promising looking community management and basic editing environment for Wordpress blogs. These examples all suggest that the tools to make post-editing more interesting are going to continue to evolve, and that while these are wonderful examples the best is yet to come. We also see that increasingly community and collaboration are intertwined with post-editing and this connection to MT is likely to develop further, as new kinds of people are drawn into the translation process. AMTA will make much of the content from this conference available on their website and I am sure many will find it useful if they can actually find it. (They need a serious update to their website).

Unfortunately, I had to rush off to the Tekom conference in Germany  at the end of the first day and missed the rest of the conference, but I kept in touch via the GTS blog since the twitter stream died pretty quickly after I left. I noticed Jost mentioned in his blog that Laurie had accomplished what she had set out to do, years ago i.e. bring MT developers and translators closer together. However, while Laurie may be happy at this initial accomplishment, I would suggest that she stay around a while longer and make sure that we all set sail together with the wind at our backs. The journey together has just begun and we have many miles to go before we sleep.  

And this was tekom – a blur of meetings and some wonderful dinner conversations. 





Tuesday, October 26, 2010

Megatrends and Their Impact on Professional Translation

I have recently made public presentations to audiences of LSPs and localization professionals about broad trends affecting the world of professional translation. My perspective seemed to resonate and I thought it might be useful to put the core message in a blog entry and share it, to possibly get critical feedback or extend the discussion. In some ways I have touched upon these “megatrends” in earlier blog entries (Why MT Matters, The Data Deluge, Translation Technology, Innovation in Localization)  but it is useful to bring it all together in a single place.

I am aware that others are making similar points but I think this summary incorporates conversations I have heard around the web and is closer to the collective intelligence. The trends in brief are as follows:
  • There is an explosion in relevant content affecting global enterprises
  • Social Media and Social Networks are now increasingly in control of branding impressions
  • There is an increasing use of open innovation and community collaboration models in many businesses
  • Translation technology and automation are becoming increasingly important for speed and cost reasons
  • A rising Asia will change the priority of strategic languages away from the current FIGS dominance

The Content Explosion

[Figure: Data Growth Trends]
We are seeing exponential growth in the digital universe and much of it is very relevant to global enterprise concerns. It is important to understand that this is happening on a scale never before seen in the history of man. A lot of this content is related to facilitating global commerce so understanding this becomes highly relevant for the global enterprise.
[Figure: Enterprise Content]

The Impact of Social Networks and Customer Conversations

The carefully calculated marketing and corporate image control that global enterprises have been used to is also coming undone. Brand impressions are increasingly being formed by real customer conversations in social networks. We see that the world of marketing is undergoing a transformation and what used to be considered critical corporate messaging is increasingly viewed as “corporate-speak” and is often not trusted by the end-customers who matter the most. Jeremiah Owyang wrote a prescient blog entry three years ago where he predicted this shift and increasingly the corporate website is seen as a place where pro-corporate bullshit resides. More and more, the high value content is being created by customers and business partners and corporations have little editorial control of this content creation process.

[Figure: UGC Impact]

The Growing Importance of Open Innovation & Collaboration Models

We are also seeing that co-creation of products and services with customers can be a huge momentum builder. Facebook has shown that engaging users in translating interfaces can also help accelerate building the customer base in those languages. Their international growth has been closely linked to their crowdsourcing translation efforts even though crowdsourcing can be less predictable. Dell actively solicits and nurtures active customer feedback in IdeaStorm to develop new products. Collaborating with customers and partners is emerging as a way to engage customers, build loyalty and accelerate penetration in new markets. There are very few large companies that have figured out how to do this because open collaboration is a huge cultural change from the command and control mentality that most executives work with. In the B2B world this may also mean that customers prefer best-of-breed solutions to one-stop solutions. Diversification is generally considered a wise investment strategy and is increasingly understood to be a good way to build business infrastructure and avoid vendor lock-in even though it may also mean more integration work.

The Rise of Asia

Western Europe and FIGS have dominated the professional translation world historically. While FIGS will remain important, in the future it looks like Brazilian Portuguese and Spanish will be the most important European languages, and Chinese, Hindi, Indonesian and other Asian languages will become increasingly important and strategic as global revenue generating translation investments. Many companies now are expanding their base of languages and the FIGS-CJK view is slowly receding. The graphic below is clear in its implications: Asia offers substantial commercial opportunity for those who make the localization investments. I have written previously about how Asia is a long-term opportunity for truly global companies. (I will update it shortly as the momentum keeps building.) There are many market potential studies that suggest Asia offers significant opportunity for IT infrastructure, mobile devices, luxury goods as well as new bottom of the pyramid (BOP) product opportunities.
[Figure: Strategic LPs]

What Does All This Mean?

So if we add all this up, it shows that global enterprises are facing a content deluge, with dynamic content coming from both internal and external sources, and high volumes of this content are expected to be translated increasingly fast to have any value in competitive situations. Global enterprises that quickly identify high value content and make it multilingual will find that this can drive international revenues and that translation can be a strategic tool for building long-term competitive advantage.

Now, more than at any other time in history, speed and agility are decisive competitive advantages...David Meerman Scott

However this is a time of revolution, and the TEP (Translate-Edit-Proof) and SDL (Software and Documentation Localization) mindsets are not likely to be adequate to meet these new translation challenges. The old approach worked for static, low volume content but new thinking and new approaches are required to deal with the data deluge today. Automated translation is an absolute necessity in the new world but this is not the MT of yesteryear that many are still implementing and describing at localization conferences today.

[Figure: Old Approach Can’t Work]
In the new world, data has to flow from content creation to consumption as seamlessly as possible, delivered to where it is needed at desired quality levels. This means that humans need to be part of the production process and are the key to producing the best quality. The future is about much better man-machine collaboration. The automated translation tools need to be learning and getting better all the time. They need to be responsive to skilled human linguistic steering and corrective feedback. They need to be focused on dynamic streams of content, not just static, packaged translation projects like user documentation. Those who implement them need to understand that with flowing content, upstream cleanup efforts will flow through the production line and make every downstream process easier and more efficient. They need to see the information cycle as a system, as organic, and thus build the collaboration infrastructure to address the whole problem. They need to see MT systems improve with human steering in weeks, not months or years. These need to be MT systems that can be trained and managed by skilled professionals to get you to “good enough” production quality “fast enough” to have a positive impact on your business, not the black box MT of yesteryear. If you look closely there are very few choices in reality. (Think Asia Online!)
[Figure: Localization Trends]

Have you noticed that Google and Bing Translate have improved regularly since they shifted to statistical MT (SMT)? Does that tell you something? Why did we not see these improvements when these same companies were RbMT based? On domain focused systems the progress and quality improvements are even more compelling. I have recently seen Asia Online systems compared to heavily customized RbMT “hybrid” systems, and the yield to effort ratios with hybrid SMT approaches are clearly superior. (I am of course biased, but the evidence continues to mount. BTW Moses is perhaps 5% of what is needed to accomplish this.) In the future professionals will undertake to translate content streams that contain tens of millions of words that encapsulate important customer conversations. If translation is strategic, make sure that you align with technology that can go the distance and that has demonstrated the ability to evolve.
[Figure: New Requirements for MT]
As more senior corporate executives realize that translation is strategic, and that translation technology properly used can generate substantial revenue in global markets, they will start to look at solving these new kinds of translation problems. Executive managers are unlikely to be excited by the possibility of getting user documentation done faster and cheaper. However, as they start to get their hands around the fact that customer conversations are important (many are struggling with this), and that they need to respond with speed and agility to build global customer relationships, we can expect to see a new kind of executive who will seek to make flowing content multilingual. They will care that real standards exist (not SDL versions) and will likely remove products that do not comply. Handling the flow efficiently will become the focus, and the most visionary localization managers may even have senior roles in making this happen. Speed and agility are key and customer engagement across the globe will require a real understanding of how to make these dynamic content focused translation systems work. There is a video of an extended presentation version of this blog available from the Localization Technology Roundtable seminar (starts at about 4’30” after introduction). (Yes, I need to lose weight or at least wear shirts that are not so tight).

We are living in a time of great change. In times of change there is often an opportunity for new paradigms and new leaders to emerge. Remember MSFT grabbing the desktop market away from IBM, and Google grabbing the web search market away from MSFT? New leaders with new visions are the change agents that make this happen, and they are often dismissed by the established status quo and “leaders” of the time. I do not see that new MT initiatives by Lionbridge and SDL really address these new needs. I think there is too much of the old view in their approach, but I may be proven wrong. I am skeptical that the current “market leaders” have the vision and/or culture to drive this change and I think we will see somebody from outside the industry or smaller, more agile LSPs be the driving force of this coming change. For current leaders to be change agents often means cannibalizing current revenue streams and very few have succeeded in crossing that chasm. We shall see as this unfolds.

In revolution, the best of the new is incompatible with the best of the old. It’s about doing things a whole new way...Clay Shirky

Anyway these are truly interesting times, and it looks like things will get even more interesting.