
Showing posts with label crowdsourcing. Show all posts

Tuesday, September 27, 2011

The Growing Interest & Concern About the Future of Professional Translation

I have noticed of late that every conference has a session or two that focuses on the future, probably because many sense that change is in the air. Some of you may also have noticed that the protest from some quarters against some of the ideas presented at these future outlook sessions has grown more strident, or even outright rude. The most vocal protests seem to be directed at predictions about the increasing use of machine translation, anything about “good enough” quality, and the production process changes necessary to deal with the increasing translation volume. (There are still some who think that the data deluge is a myth.)

Some feel personally threatened by those who speak on these subjects and rush to kill or at least stab the messenger. I think they miss the point that what is happening in translation is just part of a larger upheaval in the way global enterprises are interacting with customers. The forces causing change in translation are also creating upheaval in marketing, operations and product development departments, as many analysts have remarked for some time now. The discussion in the professional translation blogosphere is polarized enough (translators vs. technology advocates) that dialogue is difficult, but hopefully we will all continue to speak with increasing clarity, so that the polemic subsides. The truth is that none of us really knows the future with certainty, but that should not stop us from making educated (or even wild) guesses at where current trends may lead. (I highly recommend you skim The Cluetrain Manifesto to get a sense for the broader forces at play.)
Brian Solis has a new book coming out that describes the overall change quite succinctly. The End of Business As Usual explores each layer of the complex consumer revolution that is changing the future of business, media, and culture. As consumers further connect with one another, a vast and efficient information network takes shape and begins to steer experiences, decisions, and markets. It is nothing short of disruptive.
I was watching the Twitter feed from two conferences last week (LRC XVI in Ireland and Translation Forum in Russia) and I thought it would be interesting to summarize and highlight some of the tweets as they pertain to this changing world, and perhaps provide more clarity about the trends from a variety of voices and perspectives. The LRC conference had several speakers from large IT companies who talked about their specific experience, as well as technology vendor and LSP presentations. For those who are not aware, CSA research identifies IT as one of the largest sectors buying professional translation services. The chart below shows the sectors with the largest share of global business. This chart is also probably a good way to understand where these changes are being felt most strongly.
[Image: chart of the sectors with the largest share of global translation business]

Here are some Twitter highlights from LRC on the increasing volume of translation, changing content, improving MT and changing translation production needs. I would recommend that you check out @therosettafound for the most complete Twitter trail. I have made minor edits to provide context and clarify abbreviations and have attempted to provide some basic organization to the tweet trail to make it somewhat readable.
@CNGL Changing content consumption and creation models require new translation and localisation models – (according to) @RobVandenberg
@TheRosettaFound We are all authors, the enterprise is going social - implications for localisation?
@ArleLommel Quality even worse than Rob Vandenberg says: we have no real idea what it is/how to measure, especially in terms of customer impact
Issue is NOT MT vs. human translation (HT). It's becoming MT AND HT. Creates new opportunities for domain experts.
Dion Wiggins. LSPs not using MT will put themselves out of business? Prediction: yes in four/five years
CNGL says 25% of translators/linguists use MT. I wonder how many use it but say they don't use it due to (negative) perception (with peers)?
Waiting for translation technology equivalent of iPhone: something that transforms what we do in ways we can't yet imagine.

Tweets from Jason Rickard’s presentation on Collaborative Translation (i.e. Crowdsourcing) and IT go Social.
@TheRosettaFound Jason of Symantec giving the enterprise perspective, added 15-20 languages to small but popular product, built tech to support this. Not just linguistic but also legal, organizational issues to be resolved in collaborative, paid-for product.
Is collaborative translation bad & not-timely? #lrcconf Not so, a lot of translators = involved users of the content/product they translate.
Review process is different in collaborative translation. Done by voting, not by editors
The smaller the language gets, the more motivated volunteer translators are and the better collaborative translation works.
Is volunteering something for people who don't have to worry that their day-to-day basics are covered?
Does collaborative translation and collaboration mean that content owners "give up the illusion of control" over their content?
Enterprises do collaborative translation for languages they would/could not cover otherwise - true, for both profit and non-profits
Collaborative/Community will not replace existing service providers but open up more content for more languages
Language Service Providers could play an important role in community translation by building, supporting, moderating communities
It's not correct to say Community Translation = bad; Professional Translation = good
Microsoft appoints moderators with a passion for the project/language for community localization
>1,200 users translated Symantec products into 35 languages
If >1,200 were needed to translate 2 small-ish products, how can millions of translators translating 1 ZB be 'managed'?
@ArleLommel Symantec research: Community involvement in support often leads to ~25% reduction in support spend
“Super users” are what make communities scalable. Key is to identify/cultivate them early in the process
Jason Rickard: Dell is a good example of using Facebook for support. One of few companies with real metrics and insight in this area.
Jason Rickard: Symantec has really cool/systemic/well-thought ways to support community


@TheRosettaFound 21st generation localisation is about the user, about user-generated content - Ellen Langer: Give up the Illusion of Control
@ArleLommel Illusion of control? You mean we can have even less control than we have now? That's a scary thought!
@TheRosettaFound The most dramatic shifts driven by the web happened because communities took over - Imagine: 100000s of user translators translating billions of words into 100s of languages - control that!
Seems the deep and complex problems of localisation are a minute drop in the ocean of digital content management
@CNGL Discovery, analysis, transformation - Alex O'Connor tells how CNGL is addressing the grand challenges of digital content management
@TheRosettaFound Is the L10N industry due for a wave of destruction or positive transformation?
@ArleLommel Yes, Most of the mainstream technologies for translators are non-ergonomic and still in 20-year-old paradigms

Tweets from Tony Allen, Product Manager Intel Localisation Solutions presentation
@TheRosettaFound 30+ langs >200k pages >40% localised @ Intel's web presence. Intel: important to have user-driven content, interaction with the customer. Integration important, e.g. multilingual support chat. Integration, Interoperability key issues for Intel L10N. To figure out how content flows, without loss of metadata, interoperates with internal/external range of systems, is crucial.
2.5b netizens, >15b connected devices, >1 zettabyte of traffic by 2015, and companies will interact with their customers using social media-type setups; new challenges for localization.
#intel What does it mean for localization infrastructures if we have >1 zettabyte of content in 2015? Current methods won't keep up
@ArleLommel #intel says that interoperability standards are required for cloud to meet future demands. L10n must evolve to meet this need too.

@ArleLommel Alison Toon (#hp) puts it this way: “localization (people) are the garbage collectors of the documentation world”
@TheRosettaFound 600GB of data in Oracle's Translation Platform - We need concise well-structured content - then we're going to be able to deliver efficient translation services - How to get it right: analyze content, identify problems and throw it back into the face of writers and developers. I18N and l10n have to get into the core curriculum at Universities says Paul Leahy (of Oracle), since we spend too much time teaching it.

Tweets from Sajan / Asia Online MT presentation
@TheRosettaFound MT cannot perform magic on bad source text - user-generated non-native-speaker content is often 'bad'
MT errors make me laugh... but human errors make me cry - an old quote from previously recycled presentations... Asia Online
Dirty Data SMT - what kind of translations would you expect? If there are no humans involved you are training on dirty data, says Asia Online. Sajan achieved 60% reduction in costs and 77% time savings for a specific project - a miracle? Probably not, let’s see.
Millions of words faster, cheaper, better translated by Sajan using Asia Online - is this phenomenal success transferable? How?
XLIFF contributed to the success of Sajan/Asia Online's MT project. Asia Online's process rejected 26% of TM training data.

Tweets from Martin Orsted, Microsoft presentation
@TheRosettaFound Cloud will lead to improved cycle times and scalability: 100+ languages, millions of words
Extraordinary scale: 106 languages for the next version of Office. Need a process that scales up & down in proportion.
Microsoft: We have fewer people than ever and we are doing more and more languages than ever
Martin: "The Language Game - Make it fun to review Office"... here is a challenge :) Great idea to involve community via game
How can a "Game" approach be used for translation? Levels of experience, quality, domains, complexity; rewards?
No more 'stop & go', just let it flow @RobVandenberg >>Continuous publishing requires continuous translation. New workflows

Tweets from Derek Coffey, Welocalize presentation: Are we the FedEx or the Walmart of words?
@TheRosettaFound TMS of SDL = burning stacks of cash - Reality: we support your XLIFF, but not your implementation
Lack of collaboration, workflow integration, content integration = most important bottlenecks. Welocalize, MemoQ, Kilgray and Ontram working on reference implementation - Derek: Make it compelling for translators to work for us
It's all about the translators and they will seek to maximise their earning potential according to Derek.

Tweets from Future Panel
@TheRosettaFound Many translators don't know what XML looks like
Rob: more collaborative, community translation - Rob: Users who consume content will have a large input into the translation BINGO
Tony: users will drive localisation decision, translation live
Derek: future is in cooking xxx? Open up a whole new market - user generated, currently untranslatable content. HUGE market
Derek: need to re-invent our industry, with focus on supply chain
The end of the big projects - how are we going to make money (question from audience)
From service/product to community - the radical change for enterprises, according to Fred
No spark, big bang, revolution - but continuous change, Derek
Big Spark (Dion): English will no longer remain the almost exclusive source language

The Translation Forum Russia Twitter trail has a much more translator-oriented focus and is also bilingual. Here are some highlights, again with minor edits to improve readability.

@antonkunin Listened to an information-packed keynote by @Doug_Lawrence at #tfru this morning. As rates keep falling, translators' income keeps rising.
@ilbarbaro Talking about "the art of interpreting and translation" in the last quarter of 2011 is definitely outdated
Language and "quality" are important for translators, speed and competence for (final) clients. Really?
Translators are the weakest link in the translation process
Bert: here and now translation more important than perfect translation
Bert on fan subbing as an unstoppable new trend in translation
Is Bert anticipating my conclusions? Noah's ark was made and run by amateurs, RMS Titanic by professionals
Carlos Incharraulde: terminology is pivotal in translators training < Primarily as a knowledge transfer tool
To @renatobeninatto, who said: Translation companies can do without process standards < I don't agree
@renatobeninatto: Start looking at income rather than price/rates
Evaluating translation is like evaluating haircuts - it's better to be good on time than perfect too late
Few translation companies do like airlines: 1st class/ Economy/ Discount rates – Esselink
Traditional translation models only deal w/ tip of iceberg. New models required for other 90%. Esselink
Good enough revolution. Good enough translation for Wikileaks, for example. Bert Esselink
In 2007 English Russian was $0.22 per word, in 2010 it dropped to $0.16 @Doug_Lawrence
There's much talk on innovation but not much action - don't expect SDL and Lionbridge to be innovative
@Doug_Lawrence all languages except German and French decreased in pricing from 2007 to 2010
@AndreyLeites @ilbarbaro problem-solving is the most important feature translator should acquire - Don't teach translators technology, teach them to solve problems - language is a technology, we need to learn how to use it like technology - 85% of translators are still women
@ilbarbaro 3 points on quality: 1. Quality is never absolute, 2. Quality is defined by the customer, 3. Quality can be measured - it is necessary to learn to define quality requirements (precisely)
@Kilgraymemoq announces that they will open Kilgray Russia before the end of the year

This is of course my biased extraction from the stream, but the original Twitter trail will be there for a few more weeks and you can check it out yourself. It is clear to me from the comments above that, at the enterprise level, MT and community initiatives will continue to gather momentum. Translation volumes will continue to rise and production processes will have to change to adapt to this. I also believe there are translators who are seeking ways to add value in this changing world, and I hope that they will provide the example that leads the way through this flux.

And for a completely different view of "the future" check this out.


Monday, August 1, 2011

Translation Crowdsourcing


The phenomenon of a crowd (or community) stepping forward and doing real translation work, often for no direct financial compensation, is something that troubles many in the professional translation world, mostly because they see this activity as work being taken away from legitimate professionals, or as a ploy to reduce prices. While in some cases their fears may actually be justified, in the most successful uses of this approach I think it is clear that this is not true.

As I have said before, the growing momentum in the volume of content demands new production models. This momentum, which exists both in the corporate world and in the world at large, simply cannot be addressed by ONLY using traditional professional translation production models. New needs require new approaches. For those who insist that the data deluge is a fiction, the rest of this post is probably irrelevant.


Another key driving force behind new crowdsourcing initiatives is the need to engage and interact with users in new markets. In new markets, having active conversations with locals is key to building brand awareness and really learning about local needs and behavior. This is frequently a more important driver than cost containment, contrary to what many in the industry think.


In some cases, were it not for crowd-based efforts, localization would simply not be done, as it is not economically feasible to make the same efforts (and incur the same expenses) for “lesser” languages as are made for FIGS/CJK. Thus crowdsourcing is sometimes emerging as a means to get “lesser” languages done.


If we look at some of the most successful examples of crowd-sourced translation in practice, we can see that they have many if not all of the following elements in common.


A Crowd/Community That Is Invested:


·         TED Open Translation Project – Volunteer translators are often inspired by the content and wish to share it with their friends and countrymen. June Cohen has said that the volunteer translators, because of their passion for the subject and often their subject matter expertise, in general do better quality work than many of the paid professionals who initially did a few translations to seed the project. This effort has now enabled over 20,000 translations into 80+ languages of really challenging material. Many professionals also volunteer because they believe in the high general value of the content.


·         Facebook – Users who wish to build and expand the friend community in their particular language group. This effort has enabled Facebook to grow rapidly in international markets and accomplish very rapid coverage across 60+ languages. Had they used traditional means to do this, it might have taken them years to get to the same point. Critics also often miss the point that engaging real users in the translation task encourages rapid growth of the user base, as “user translators” bring friends into their network.

·         Microsoft – MVPs (top accredited reseller partners) who wish to make technical support knowledge about Microsoft products more easily and widely available in their markets. Their efforts are rewarded by lower support costs and also an increase in product sales as more and more users look for self-service knowledge base information. Microsoft has been a trailblazer in making large amounts of knowledge base content available via MT; they are now adding crowd-based editing to raise the quality of the translated information. Thus the most used and vital information tends to get the most attention, which benefits all users.

·         Asia Online – Student users provide corrective feedback to continue to improve the translation quality of the Wikipedia and other knowledge content that is initially done by highly customized MT engines and paid translators. The students themselves will be the primary beneficiaries of this content, and their efforts will enable them to access high quality educational information. The volume of this information will likely increase a thousand fold.

·         Yeeyan – Has 150,000 registered users, who collectively translate 50 to 100 news articles every day from English to Chinese. Since its inception in 2006, the site has grown into a key gateway for Chinese speakers who want to follow international news. It has been so successful that it has attracted the attention of major news sources like The Guardian and ReadWriteWeb. Yeeyan is focused on addressing the problem of ghettoization of information by language through community collaboration, where members both identify interesting content and help to translate it.


·         Adobe – This is a much more carefully managed effort designed to engage influential users, partners and customers to help provide relevant information for the broad Adobe user community in China.

·         Twitter – The translation center asks Twitter users -- all volunteers -- to help translate Twitter's interface into various languages. Once the basic support pages are translated, a select group of the "most active" translators are invited to work with Twitter to "maintain localized versions of the service." Twitter boasts that its translation center has 200,000 translators, and that the localization process for Dutch and Indonesian took just one month from the first call for involvement to its announcement. The availability of its interface in multiple foreign languages is bound to increase its popularity and effectiveness not only for online marketing but also for social and political activism.

Software Infrastructure That Facilitates Contribution & Participation:

In all of the cases above, the companies involved in crowdsourced translation initiatives needed to invest in software that enables tasks to be parceled out, evolves as tasks change, enables efficient administration, maintains quality, gathers feedback, and builds self-sustaining ecosystems. The tools developed by dotSUB, Lingotek, Yeeyan and Asia Online are all unique collaboration and translation workflow management tools that enable these kinds of initiatives. They make little or no use of industry standard tools like Trados and TMS because of the highly proprietary, rigid and archaic nature of these tools. These new-generation tools are much more open and are designed to evolve with technical and process advances on the internet today. It is quite possible that these community efforts could produce tools that supersede many of the tools in use today, as these new tools focus on collaboration and sharing assets to enhance the efficiency of the collaborative translation process.
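
To make the requirements above a little more concrete (parceling out tasks, gathering feedback, maintaining quality), here is a toy sketch in Python. It is purely hypothetical and does not describe how dotSUB, Lingotek, Yeeyan or Asia Online actually work. The names `Segment`, `parcel_out` and `min_votes` are invented for illustration, and the sketch assumes round-robin assignment plus acceptance of a translation by simple vote count, in the spirit of community review done by voting rather than by editors.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Segment:
    """One unit of source text plus the community's candidate translations."""
    source: str
    # Maps each proposed translation to the number of votes it has received.
    candidates: Dict[str, int] = field(default_factory=dict)

    def propose(self, translation: str) -> None:
        """Register a candidate translation (starts with zero votes)."""
        self.candidates.setdefault(translation, 0)

    def vote(self, translation: str) -> None:
        """Record one community vote for an existing candidate."""
        if translation not in self.candidates:
            raise KeyError(f"unknown candidate: {translation!r}")
        self.candidates[translation] += 1

    def accepted(self, min_votes: int = 2) -> Optional[str]:
        """Return the top-voted candidate once it reaches min_votes, else None."""
        if not self.candidates:
            return None
        best, votes = max(self.candidates.items(), key=lambda item: item[1])
        return best if votes >= min_votes else None


def parcel_out(segments: List[Segment], volunteers: List[str]) -> Dict[str, List[Segment]]:
    """Assign segments to volunteers in round-robin order (an assumed policy)."""
    if not volunteers:
        raise ValueError("at least one volunteer is required")
    plan: Dict[str, List[Segment]] = {name: [] for name in volunteers}
    for index, segment in enumerate(segments):
        plan[volunteers[index % len(volunteers)]].append(segment)
    return plan
```

A real system would also need identity, moderation and "super user" roles, but even this small model shows why such tools look more like community platforms than like traditional translation memory software.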

The Importance of Engagement and Higher Purpose:

It is interesting to note that translation is not the primary business of any of the companies listed in the examples above. In every case the goal and intent is to make more information available faster. Even for many of the corporations that are exploring crowdsourcing, the rationale is often more about customer engagement than cost savings. It is also important to note that none of these initiatives could even be attempted without the use of automation and large-scale community support, and they are enabling initiatives that would not be possible otherwise. This is also true for Facebook, which still had to use professionals to translate legalese that their community was not interested in translating. The role of communities is likely to increase in the future as more of the world comes online.


As we move forward we will see much more video and other rich content come online and already it is clear that the old approaches will not enable us to make this new content multilingual in an effective time frame. Crowdsourcing and automated translation will be necessary tools for an organization that seeks to communicate across the globe. As Clay Shirky has pointed out, the ‘cognitive surplus’ of the online population is a force that can be harnessed under the right circumstances and for the right purposes. It is likely that the professional translation world is going to see significant disruption in the coming years, as innovators figure out how to build sustainable models around community engagement, technology and organizational mission. However, as we have already seen, there is much that the crowd has no interest in doing and we should expect that this is not likely to change.


Crowdsourcing is here to stay. It is a new mode of production that enables organizations to undertake high-volume projects, engage with users and partners more deeply, and participate in multilingual social networks where so many branding impressions are being formed. Managing crowdsourcing is also a major opportunity for savvy LSPs who have processes in place to recruit and manage the collaboration of dispersed volunteers and contributors.

Sunday, December 19, 2010

Using Translation as a Force to Address Information Poverty: AGIS 2010

I have the good fortune to reflect and report on the AGIS 2010 conference as we approach “the holiday season,” which is a time of reflection for many in the world. A time of goodwill and at least temporary good deeds for some. The conference was held in India, which can be a challenge because the basic infrastructure is still primitive, but the event went off well with very few glitches and I think AGIS is slowly building momentum.

AGIS stands for Action for Global Information Sharing, and has been focused on conducting a resolute crusade against Information Poverty since its inception. The overall tone and tenor of this conference is very different from the typical conference in the world of enterprise localization (LocWorld, GALA, LISA). The focus is on making all kinds of knowledge and information accessible in places where it has never been available before, not just on selling products. Many speakers shared clear evidence that access to information creates the conditions for economic prosperity, or perhaps even actually drives it. In some parts of the world localization is all about reducing information poverty and improving the human condition. Reinhard has provided a summary of the highlights of the event in his blog. There was also coverage here. And for those who ask why we need yet another conference in the industry, Reinhard explains below:
You might be asking yourself, “Why AGIS, why YALC (yet another localisation conference)? What makes AGIS so different?” Well, first of all, it is not owned by any particular organisation, it is not run for profit, and it is (almost) free to attend. Then, it takes place where people need localisation, not where people are rich enough to pay for it. Nothing is sold, nothing is bought at AGIS. And last but not least, AGIS attendees have a social agenda, not (just) a commercial one.
The highlights can also be found in the Twitter stream on my Friendfeed. (Scroll back to newer items to see the chronological sequence. I don’t know why Twitter has already made much of the data unavailable.)

In the keynote, Dr. Vijay Bhatkar (a digital visionary of India) pointed out how globalization and localization are tightly linked and how NLP, MT and language technologies are only just beginning their evolutionary march. He pointed out how Japan’s hopes for world dominance were stymied by linguistic issues, and that access to knowledge and information in the 21st century will be key to building prosperity, as an increasing part of the GDP of many nations will come from knowledge services. This is already true for India. He informed the Indians in the room that India cannot consider itself an IT power when 350 million people are illiterate, and urged the community to preserve the Indian languages while continuing to push forward with English education. He also pointed out that both telephone and TV are mostly language neutral but information cannot be, and that localization is critical to broad access.

Reinhard Schaler and the Rosetta Foundation are leading an initiative to build a platform to facilitate self-configurable, distributed and shared-data-based global localization initiatives. CNGL and University of Limerick students provided overviews and demonstrations of these tools. Reinhard highlighted that each day 24,000 children die because of lack of access to basic healthcare information. That is roughly one every 3 to 4 seconds! These deaths could be avoided if information was available more easily. This appears to be a primary motivator and raison d'etre for the Rosetta Foundation.
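
As a quick check of the arithmetic behind that figure, the short calculation below converts a daily count into an average interval. The function name `seconds_between_events` is just an illustrative label; the only input is the 24,000-per-day number quoted above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def seconds_between_events(events_per_day: float) -> float:
    """Average number of seconds between events, given a daily count."""
    if events_per_day <= 0:
        raise ValueError("events_per_day must be positive")
    return SECONDS_PER_DAY / events_per_day


interval = seconds_between_events(24_000)
print(f"One every {interval:.1f} seconds")  # prints "One every 3.6 seconds"
```

So 24,000 per day works out to about one every 3.6 seconds, in line with the rounded figure quoted in the talk.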

Ms. Swaran Lata painted a clear picture of the amazing complexity of the Indian linguistic landscape. 20+ major languages with some states having 3-4 languages and multiple scripts. The CDAC organization is attempting to solve the linguistic computing issues to ensure that Indian languages gain a stronger digital presence and are preserved. As Prof. Bhatkar asked: "Can you really say you know Hindi if you cannot  use it on a computer? This is key in the information age."  Ms. Lata described initiatives that focused on the digital education of youth, which interestingly also resulted in the knowledge being passed on to illiterate parents and grandparents. She talked also about initiatives to reach out to the “other side of India” to ensure that illiterate people are not left behind. As Indian consumers become more powerful, Indian languages are critical to reaching their purchasing power.

I spoke about how the Asia Online vision is finally coming to fruition, as we start rolling out a Thai Wikipedia comprising translations of 3.5 million articles, starting in January 2011. When all these articles are up and ready, Thailand will have the second largest Wikipedia in the world after the English one. This is a huge boost from the current 60,000-article Thai Wikipedia, many of whose articles are barely more than stubs. In contrast, the index alone for just the article titles in the English Wikipedia is in excess of 600,000 pages! The Asia Online project is an initiative that directly addresses information poverty. Shockingly, it was also uncovered that the Hindi Wikipedia only has about 50,000 articles for a population of almost 400 million people! This means that a child who does not speak English is deprived of basic educational information access and has a fraction of the content available to an English-speaking child.
Some other interesting information from the conference:
  • Ravi Gupta pointed out there are 62,000 newspapers in India and 92% of these are not in English
  • Subtitles do not work well in India because of literacy issues but they can also be a means of building literacy
  • There are no English TV channels in the Top 100 TV channels in India but English speaking consumers are the wealthiest consumers
  • Ravi Kumar, the President of the Indian Translators Association, made an impassioned plea asking that buyers and the community at large respect translators as professionals
  • The CNGL team showed various elements of the open SOLAS platform they are making available to anybody who needs it
  • Mahesh Kulkarni gave a wonderful presentation on standards, which he called traffic rules that ease both user and creator experience. He has a much more holistic and systematic view of standards than we see from the feeble standards initiatives in the traditional localization industry, but he too expressed the difficulties of getting good standards in place.
  • He also pointed out that there are 670 million mobile phones in India and asked: is this the end of the internet as we know it?
Mahesh Kulkarni and Raimond Doctor are a joy to behold; passionate, knowledgeable and driven in spite of having to deal with Indian governmental bureaucracy as part of their daily lives. Raimond is perhaps the most erudite and knowledgeable person I have met on comparative linguistics. He shares his deep knowledge and insight with a verve that draws you right into his delight for language. I hope that CDAC realizes what treasures these men are, and gives them room and resources to execute on their vision and passion.

Reinhard also pointed out that the non-profit world is substantially larger in terms of market potential and actual localization activities than the Fortune 500 market. Non-profit does not mean no payment, no recognition and no jobs; it is in fact bigger than the energy sector in the US.

Take a look at this TED video to see how information access can change lives and empower people to learn, take control and transform their own lives.
If there is a revolution coming in translation, I think one is much more likely to see the first signs of it at a conference like AGIS than at more mainstream localization conferences. We see all the key elements lining up here: people focused on large-scale collaboration infrastructure, community and crowdsourcing management, massive translation automation and standards, and the most important ingredient of all: PASSION. We see people at this conference who are driven by a passion to change the world. We see people who are not making a lot of money but are still working long and diligent hours. We see people undertaking translation projects that will involve hundreds of millions of words on a routine basis. We see technology, collaboration tools & infrastructure and community coming together in ways that just do not happen at traditional professional translation events. We see people who want to make an impact on the human condition. We hear and see people talking about nation building and the human right to knowledge. This kind of talk gets me all warm inside, and I think this is what we all had in common: a “higher” sense of purpose and mission, which does not equate to a stance of moral superiority as some might think. Many of the people here have a soul-satisfying answer to the question: Why do you do what you do? Isn't that enough?

My interest in automated translation has always been related to the potential impact this technology has on improving information access and thus improving human lives across the world and also potentially improving the quality and depth of communication between linguistic groups, cultures and nations. One step on the way to Pacem in Terris? Foolish and idealistic perhaps, but we need to dare to dream first before we can actually make it happen. When I was a teenager, a wise man once told me, “You are the world and the world is you” and I have explored that statement ever since, holding it close to my heart as a seminal influence in my life.

Join and support the Rosetta Foundation and help make this into a movement that cannot be stopped. If you have the ability to influence a major corporate entity to get involved and support this please do so now, and join Reinhard as he forges and builds this new path to change the world for the better.
And as if that were not enough, I even had a brief meeting with former Indian Prime Minister I.K. Gujral, who even in his nineties has a stature, grace and humility that is disarming. He was impressed by the Thai Wikipedia project I am involved with, and said it would be wonderful for India to do the same in Hindi. And thanks to my friend Vishal I also got to meet several industry leaders of emerging India who seek to build transparency and a relatively corruption free government.


I was also greatly heartened to see the corruption establishment take a serious blow when Minister Raja was exposed for taking obscenely huge bribes, in excess of $40 billion I believe. What makes corruption in India especially horrific is the complete lack of remorse and shame that these public officials have. India is on the move but still has far to go as the culture of corruption is everywhere you turn, and will not die easily. One of the other benefits of free flowing information is that it also makes this kind of self dealing and abuse of trust harder to maintain. Information poverty is also an enabler and friend of corrupt officials and thus this is yet another reason to address this issue. 

Happy Holidays to you all and I hope that you explore and find "goodness" in your life. And here is the link to holiday greetings in many languages.


Tuesday, November 9, 2010

The Machine Translation Community Building Bridges with Translators

The Association for Machine Translation in the Americas (AMTA) recently held its annual conference in close proximity with the ATA in an attempt to build bridges and foster a growing dialogue between these two communities. When I entered the world of MT (I have always preferred the term automated translation) I had the good fortune to work with Laurie Gerber at Language Weaver who encouraged engagement with translators. While her voice was not heard there, she has always stayed true to this vision and she was instrumental in influencing me to also reach out to the world of professional translators as a core business strategy. She has long been a clear voice encouraging the broad MT community to reach out to translators and she was visible in Denver last week making sure that ATA guests were engaged and making all the right connections or just having a good time.

It is clear to me that the path to better quality MT, one that really does fulfill the promise of sharing information and knowledge more freely in the world, can only come from a close, cooperative and collaborative relationship with professional translators.

The conference began with a keynote from Nicholas Hartmann who is the current ATA President and also a past technical marketing translator. He gave what I thought was an articulate, considered and clear perspective of the translator vis-à-vis translation technology and MT while pointing to some directions for real collaboration in future. I thought it would be valuable to restate what I heard, as there were several key messages for the MT community. A published paper version of his speech is also available on the AMTA website (but it is really hard to get to these resources as the unique URL is not easily displayed).
 
The ATA has 11,000 members, of whom 70% are freelancers and Nick had carefully prepared to be their voice, expressing their concerns and needs in this forum. (Here is the twitter stream). He stated that the bad blood with translators was originally created with the historical overstatement of MT capabilities in the 60’s where MT was expected to replace translators: FAHQT (Have you noticed that this sounds a lot like f**&ked?). He noted that many translators do in fact use some form of translation technology today even though they find the future vision of being post-editors at “burger flipping wages” abhorrent. He gave some examples of human translations that went beyond the literal, to show how only a human could make the non-literal interpretations to correctly translate some example phrases. The examples showed that even a “perfect” literal translation can be nonsense at times, and he asked if the future of MT is as T.S. Eliot says:
“That is not what I meant at all. That is not it, at all”
Some additional points he made included:
  • Translators have a very different view of quality which is linked to their code of ethics to render the source material accurately
  • MT makes sense where something is better than nothing
  • MT is really only  “a probability distribution over strings of letters and sounds” (especially SMT) (part of a quote from Martin Kay in which he specifically cautioned the MT community NOT to consider language so simplistically)
  • Translators want it to be acknowledged that their work is critical to feeding and improving SMT systems with HT corpus
  • MT should be matched to task and purpose and could be unfortunate in the hands of the wrong people e.g. the infamous Welsh street sign
  • SMT is often built using flawed TM, so one can hardly be surprised at some of the results; in this case past performance will be a predictor of future performance
  • MT must be edited and checked to avoid serious errors
  • Post-editing has come to mean cleaning up really bad quality MT output at very low wages even when everybody doing it understands that they could have done it faster without the MT
  • The data pollution of some SMT systems perpetuates and is difficult if not impossible to remove
He then went on to answer the following question. So what do translators want and need?
 
“We want to work together constructively. We want technology that we can use. Machine assisted translation does make sense to us but we do not want tools that make our jobs harder.” Translators want to have a hand in making the tools but want the dialogue to be realistic. They also do not want a role of PEMT drudgery and asked for technology that assists translators to be more productive.  He ended his talk saying that he had enjoyed meeting many in the MT community and he hoped that the dialogue would continue as, “We are all in the same business.”

I did not stay long enough to really get the reaction of the MT crowd as I rushed off to tekom in Germany that same evening. Of course there was one somewhat hostile question immediately, but I think many MT users want to know how to work with translators and, as you will see from my previous blog entries, I am a big believer in this rapprochement.

Nick was followed by Jost Zetzsche who continued on the theme of building bridges and improved communication and he pointed out how translators have a self-perception of being bridge builders, language lovers, artists, cultural intermediaries in contrast to the techie, computer science self image that many MT practitioners have. Clearly cultural and communication problems can arise from this. He was self-critical and admitted that translators need to learn more about MT technology and not resist it like they did with TM, but also pointed out some foolish statements made by MT proponents that any average translator would see as unfortunate or stupid. One example was Jaap van der Meer’s statement about letting a thousand MT systems bloom. Some of you may realize that this is really close to something that Mao Zedong said to flush out dissidents and eventually execute them. The other screamer he listed (without reference to the source, other than it being a major MT vendor executive) was, “It’s quite a magical technology when you see it (MT) work” by Mark Tapling of SDL (He says this with a smile at the end of the video clip). (Dude, it’s a data transformation!!! Really ?!? I wonder if Tapling thinks that spreadsheets adding numbers up and Powerpoint slide transitions are also magical? Perhaps from a 19th century frame of mind this is all pretty magical). Jost contrasted this to a Twitter conversation he had with @kvashee ;-) about features that would make MT more useful to translators. (I assure you I did not instigate this comparison.)

Some things that he asked for: Give us (translators) challenging tasks, we want to participate in “making it better”. He stated that he wanted to see that his corrections had immediate and direct impact on the system. (One of the biggest complaints that translators have about MT is that the systems make the same error over and over again.) He asked that MT vendors talk to translators in “real” language and “admit what your tools can and cannot do”. He ended on a positive note by saying that we as humans tend to demonize the unknown (HIC SVNT DRACONES!) and invited the audience to enter into each other's terra incognita, and put our myths behind us.

While this event was a good and constructive start, I hope that the dialogue between translators and MT developers continues beyond this conference and produces real innovation and collaboration. One of the first subjects that needs elucidation and better definition is “post-editing”. It was clear in several very instructive presentations in the “Commercial User Track” that the concept of post-editing needs development and clarification. There were several very good presentations and we see many successful MT implementations being discussed on a regular basis now. Check out http://www.wallofsilver.net/ and type #AMTA2010 to see a cool Twitter summary of the conference. Chris Wendt of Microsoft and I were also voted onto the AMTA board, representing commercial users, and I thank all those who voted for me and hope to help drive our common interests and agenda.

There was an interesting demo of several post-editing tools that are available in the market currently. Lingotek showed their translator workbench, PAHO showed their Word Macro based post-editor which is perhaps the longest running and most widely used direct post-editing tool around. GTS showed a promising looking community management and basic editing environment for WordPress blogs. These examples all suggest that the tools to make post-editing more interesting are going to continue to evolve, and that while these are wonderful examples the best is yet to come. We also see that increasingly community and collaboration are intertwined with post-editing and this connection to MT is likely to develop further, as new kinds of people are drawn into the translation process. AMTA will make much of the content from this conference available on their website and I am sure many will find it useful if they can actually find it. (They need a serious update to their website).

Unfortunately, I had to rush off to the Tekom conference in Germany at the end of the first day and missed the rest of the conference, but I kept in touch via the GTS blog since the twitter stream died pretty quickly after I left. I noticed Jost mentioned in his blog that Laurie had accomplished what she had set out to do years ago, i.e. bring MT developers and translators closer together. However, while Laurie may be happy at this initial accomplishment, I would suggest that she stay around a while longer and make sure that we all set sail together with the wind at our backs. The journey together has just begun and we have many miles to go before we sleep.

And this was tekom – a blur of meetings and some wonderful dinner conversations. 





Tuesday, October 26, 2010

Megatrends and Their Impact on Professional Translation

I have recently made public presentations to audiences of LSPs and localization professionals about broad trends affecting the world of professional translation. My perspective seemed to resonate and I thought it might be useful to put the core message in a blog entry and share it, to possibly get critical feedback or extend the discussion. In some ways I have touched upon these “megatrends” in earlier blog entries (Why MT Matters, The Data Deluge, Translation Technology, Innovation in Localization)  but it is useful to bring it all together in a single place.

I am aware that others are making similar points but I think this summary incorporates conversations I have heard around the web and is closer to the collective intelligence. The trends in brief are as follows:
  • There is an explosion in relevant content affecting global enterprises
  • Social Media and Social Networks are now increasingly in control of branding impressions
  • There is an increasing use of open innovation and community collaboration models in many businesses
  • Translation technology and automation are becoming increasingly important for speed and cost reasons
  • A rising Asia will change the priority of strategic languages away from the current FIGS dominance

The Content Explosion

Data Growth Trends
We are seeing exponential growth in the digital universe and much of it is very relevant to global enterprise concerns. It is important to understand that this is happening on a scale never before seen in the history of man. A lot of this content is related to facilitating global commerce so understanding this becomes highly relevant for the global enterprise.
Enterprise Content

The Impact of Social Networks and Customer Conversations

The carefully calculated marketing and corporate image control that global enterprises have been used to is also coming undone. Brand impressions are increasingly being formed by real customer conversations in social networks. We see that the world of marketing is undergoing a transformation and what used to be considered critical corporate messaging is increasingly viewed as “corporate-speak” and is often not trusted by the end-customers who matter the most. Jeremiah Owyang wrote a prescient blog entry three years ago where he predicted this shift and increasingly the corporate website is seen as a place where pro-corporate bullshit resides. More and more, the high value content is being created by customers and business partners and corporations have little editorial control of this content creation process.

UGC Impact

The Growing Importance of Open Innovation & Collaboration Models

We are also seeing that co-creation of products and services with customers can be a huge momentum builder. Facebook has shown that engaging users in translating interfaces can also help accelerate building the customer base in those languages. Their international growth has been closely linked to their crowdsourcing translation efforts even though crowdsourcing can be less predictable. Dell actively solicits and nurtures active customer feedback in IdeaStorm to develop new products. Collaborating with customers and partners is emerging as a way to engage customers, build loyalty and accelerate penetration in new markets. There are very few large companies that have figured out how to do this because open collaboration is a huge cultural change from the command and control mentality that most executives work with. In the B2B world this may also mean that customers prefer best-of-breed solutions to one-stop solutions. Diversification is generally considered a wise investment strategy and is increasingly understood to be a good way to build business infrastructure and avoid vendor lock-in even though it may also mean more integration work.

The Rise of Asia

Western Europe and FIGS have dominated the professional translation world historically. While FIGS will remain important, in the future it looks like Brazilian Portuguese and Spanish will be the most important European languages, and Chinese, Hindi, Indonesian and other Asian languages will become increasingly important and strategic as global revenue generating translation investments. Many companies now are expanding their base of languages and the FIGS-CJK view is slowly receding. The graphic below is clear in its implications: Asia offers substantial commercial opportunity for those who make the localization investments. I have written previously about how Asia is a long-term opportunity for truly global companies. (I will update it shortly as the momentum keeps building.) There are many market potential studies that suggest Asia offers significant opportunity for IT infrastructure, mobile devices, luxury goods as well as new bottom of the pyramid (BOP) product opportunities.
Strategic LPs

What Does All This Mean?

So if we add all this up it shows that global enterprises are facing a content deluge with dynamic content coming from both internal and external sources, and high volumes of this content are expected to be translated increasingly faster to have any value in competitive situations. Global enterprises that quickly identify high value content and make it multilingual will find that this can drive international revenues and that translation can be a strategic tool for building long-term competitive advantage.

Now, more than at any other time in history, speed and agility are decisive competitive advantages...David Meerman Scott

However this is a time of revolution, and the TEP (Translate-Edit-Proof) and SDL (Software and Documentation Localization) mindsets are not likely to be adequate to meet these new translation challenges. The old approach worked for static, low volume content but new thinking and new approaches are required to deal with the data deluge today. Automated translation is an absolute necessity in the new world but this is not the MT of yesteryear that many are still implementing and describing at localization conferences today.

Old Approach Cant Work
In the new world, data has to flow from content creation to consumption as seamlessly as possible, delivered to where it is needed at desired quality levels. This means that humans need to be part of the production process and are the key to producing the best quality. The future is about much better man-machine collaboration. The automated translation tools need to be learning and getting better all the time. They need to be responsive to skilled human linguistic steering and corrective feedback. They need to be focused on dynamic streams of content not just static, packaged translation projects like user documentation. They need to understand that with flowing content, upstream cleanup efforts will flow through the production line and make every downstream process easier and more efficient. They need to see the information cycle as a system, as organic, and thus build the collaboration infrastructure to address the whole problem. They need to see MT systems improve with human steering, in weeks, not months or years. They need to be MT systems that can be trained and managed by skilled professionals to get you to “good enough” production quality “fast enough” to have a positive impact on your business, not the black box MT of yesteryear. If you look closely there are very few choices in reality. (Think Asia Online!)
Localization Trends

Have you noticed that Google and Bing Translate have improved regularly since they shifted to statistical MT (SMT)? Does that tell you something? Why did we not see these improvements when these same companies were RbMT based? On domain focused systems the progress and quality improvements are even more compelling. I have recently seen Asia Online systems compared to heavily customized RbMT “hybrid” systems, and the yield to effort ratios with hybrid SMT approaches are clearly superior. (I am of course biased, but the evidence continues to mount.) (BTW Moses is perhaps 5% of what is needed to accomplish this). In the future professionals will undertake to translate content streams that contain tens of millions of words that encapsulate important customer conversations. If translation is strategic make sure that you align with technology that can go the distance and that has demonstrated the ability to evolve.
New Requirements for MT
As more senior corporate executives realize that translation is strategic, and that translation technology properly used can generate substantial revenue in global markets, they will start to look at solving these new kinds of translation problems. Executive managers are unlikely to be excited by the possibility of getting user documentation done faster and cheaper. However, as they start to get their hands around the fact that customer conversations are important (many are struggling with this), and that they need to respond with speed and agility to build global customer relationships, we can expect to see a new kind of executive who will seek to make flowing content multilingual. They will care that real standards exist (not SDL versions) and will likely remove products that do not comply. Handling the flow efficiently will become the focus, and the most visionary localization managers may even have senior roles in making this happen. Speed and agility are key and customer engagement across the globe will require a real understanding of how to make these dynamic content focused translation systems work. There is a video of an extended presentation version of this blog available from the Localization Technology Roundtable seminar (starts at about 4’30” after introduction). (Yes, I need to lose weight or at least wear shirts that are not so tight).

We are living in a time of great change. In times of change there is often an opportunity for new paradigms and new leaders to emerge. Remember MSFT grabbing the desktop market away from IBM and Google grabbing away the web search market from MSFT? New leaders with new visions are the change agents that make this happen, and they are often dismissed by the established status quo and “leaders” of the time. I do not see that new MT initiatives by Lionbridge and SDL really address these new needs. I think there is too much of the old view in their approach, but I may be proven wrong. I am skeptical that the current “market leaders” have the vision and/or culture to drive this change and I think we will see somebody from outside the industry or smaller, more agile LSPs be the driving force of this coming change. For current leaders to be change agents often means cannibalizing current revenue streams and very few have succeeded in crossing that chasm. We shall see as this unfolds.

In revolution, the best of the new is incompatible with the best of the old. It’s about doing things a whole new way...Clay Shirky

Anyway these are truly interesting times, and it looks like things will get even more interesting.