Tuesday, January 26, 2010

Why Machine Translation Matters for LSPs

In my last entry I pointed out some of the reasons why the information deluge on the web and the worldwide thirst for knowledge are driving the development of MT technology, and I only touched upon why this might matter to Language Service Providers (LSPs) and the professional translation industry. I thought it might be useful to expand on this.

It is worth summarizing some of the key trends in the professional industry, as they provide a useful backdrop for why the need for more automation will only continue to build momentum. I am told that SDL originally stood for Software and Documentation Localization. The world has changed, and the increasing source data volume, falling prices and the increasing value of dynamic community-created content suggest that changes are needed. The list below explains this further.

-- Corporations are facing an increasing demand for translated content without significantly higher budgets. It is possible that the increase in source content volume could be anywhere from 5X to 100X historical volumes. The internet has been instrumental in increasing the demand for large amounts of multilingual content.

-- The sustainable per-word rate has steadily declined over the last 10 years, and buyers are still demanding greater productivity.

-- Much of the industry is currently optimized around the TEP (translate, edit, proofread) process and project management skills.

-- Translation Memory technology is now widely used by agencies and translators who work with moderately to highly repetitive source content, and there is growing momentum to centralize this.

-- Translation Management Systems (TMS) are increasingly being used to manage the growing workloads and improve project management efficiency.

-- Communities and collaborative models are gaining importance by the day. Crowdsourcing is seen as a legitimate and efficient way to do certain kinds of projects.

-- The value of speed and agility is increasing all the time as we enter the real-time internet world: we need it translated NOW!

-- The value of the static content that has been the focus of professional translation continues to decline, and many global corporations see that dynamic knowledge content and community content have much higher value to the global customer (e.g., the iPhone ships worldwide without a manual).

-- Long-standing industry experts have pointed out that current approaches and processes are outmoded and that translation automation technology needs to evolve to keep up with the rapidly growing information needs of global customers.

The Consortium for Service Innovation, whose members include Dell, Symantec, Cisco, Microsoft and others, provides some very compelling data on how the customer support reality is changing for a typical IT company that produces a software or hardware product for the global market. The graphic above shows that customers in the IT industry get 40X more support from self-service knowledge bases and communities than they do from the documentation and the direct customer support organization.

The picture above shows how major corporations focus on relatively low-value, static content even though there is increasing evidence that customer loyalty is built elsewhere. This is slowly changing, as can already be seen in the increasing use of MT by market leaders like Microsoft, Intel and Cisco to widely distribute support information, even though it is sometimes raw MT. The ROI in creating more self-service and community content is huge, as these diagrams show.

As these forces gather, it becomes increasingly important for an LSP to be able to help its customers respond to a much more dynamic world. How many LSPs have been involved with large knowledge base translation projects? Why not? Clearly automation is necessary, and a different, more collaborative process is required. Global enterprises are being driven to understand and use community content, and to respond to constantly changing support requirements, in order to engage global customers. LSPs that have the skills and knowledge to help with this will be seen as valued partners.

Many say that SMT is, in some sense, a much more flexible and robust TM technology, and that it will now be much more easily used and deployed with tools like Language Studio Pro. These domain-focused MT systems improve over time and can thus become a strategic asset that enables an LSP to respond faster and at lower cost than would otherwise be possible. LSPs with good-quality MT systems become valued partners for every global company interested in that domain, because they enable a much broader dialogue with global customers. However, getting to this point requires an investment.

MT will also open up completely new markets for the professional translation industry as major publishers begin to realize that they can make high-value content usefully multilingual at much lower cost and in a much faster timeframe. At every step, the leaders will be those companies that learn to use these tools well and also engage translators in the process, as translators are necessary and critical to producing high quality. There will be a need for new kinds of people, some with deep linguistic skills who identify translation error patterns and develop corrective strategies, and there will also be a role for less skilled monolingual post-editors.

MT can also enable improved productivity and standardization on large localization projects. An MT engine built for one project can be leveraged on all future projects in the same domain or for the same customer. In some cases, the availability of imperfect MT output can also trigger and create human translation demand (e.g., patents). Most of all, I think that LSPs that learn to use MT effectively become true globalization partners, engaged in international initiatives wherever new languages and new multilingual content can drive new market growth.

I predict that 2010 will be a year in which we see some LSPs develop a definite competitive advantage, using their knowledge and expertise with MT as a key differentiator.

"Inquiry and change are not separate; they are simultaneous. The moment we ask a question, we begin to create change." – David Cooperrider, Appreciative Inquiry

There is an excellent audio-visual presentation (requires registration) that describes the technology and provides some specific use scenarios for LSPs interested in exploring this further.

Friday, January 22, 2010

Why Machine Translation Matters

"I have not failed. I've just found 10,000 ways that won't work." - Thomas Edison

After more than fifty years of eMpTy promises and repeated failures, interest in machine translation, amazingly, continues to grow. It is still something that almost everybody hopes will work someday. We just won’t give up. Why? How can an industry that has failed to deliver for 50 years still be around? Clearly, MT is a difficult problem, but I think the main reason we persist is that there is a huge thirst for information, data and knowledge that exists across language barriers. The growing volume of valuable information on the internet only makes this thirst more urgent.

Is automated translation finally ready to deliver on its promise? What are the issues with this technology and what will it take to make it work? I would like to provide my perspective on why it matters and why it is important that we continue in our quest to make it work better.

In the professional translation world there is much skepticism about MT, and we regularly see MT being trashed in translator and LSP blogs, forums and conversations at conferences. Many dismiss it entirely as a foolish and pointless quest, based on what they see on Google and other free online translation portals. Very few understand, or have ever seen, the potential that carefully tuned and customized MT systems suggest. A few have begun to understand that MT is an imperative that will not go away, and are stepping tentatively forward. I am happy to see some wholeheartedly embrace it and try to learn how to use it skillfully to develop long-term competitive advantage.

For some professionals there is a debate about whether Rule-based MT (RbMT) or Statistical Machine Translation (SMT) is better, and of late it has become very fashionable to claim that the "right" approach is hybrid. Industry giants (Google, Microsoft, IBM) are all very focused on SMT with increasingly greater linguistic variations, and there is also a healthy open source movement underlying SMT technology that is spawning innovative new companies. My company, Asia Online, is, I think, one of the bright lights on the horizon.

My personal interest in MT is driven by a conviction that it can truly be an instrument of positive change in the world. It is possible that using MT to ease access to critical knowledge could revolutionize and rapidly accelerate the development of many of the world’s poorest communities. I don’t think it is an exaggeration to say that “good” MT could help to improve the lives of millions in the coming years. Thus I feel that improving the quality of MT is a problem worthy of the attention of the best minds on the planet. I also think that getting the professional industry engaged with the technology is key to rapidly driving the quality of MT systems higher, and perhaps to reaching a tipping point where all kinds of valuable information can rapidly become multilingual. My sense is that MT needs to earn the respect of professionals to really build quality momentum and make the breakthroughs that so many of us yearn for.

The Increasing Velocity of Information Creation
We live in a world where knowledge is power and information access, many say, has become a human right. In 2006, the amount of digital information created, captured, and replicated was 1,288 × 10^18 bits. In computer parlance, that's 161 exabytes or 161 billion gigabytes …

This is about 3 million times the information in all the books ever written!
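The unit conversion behind the 2006 figure, and the ratio behind the "more than sixfold" growth projected for 2006 to 2010, can be checked with a short worked calculation. This is a sketch that assumes decimal (SI) prefixes, i.e. 1 exabyte (EB) = 10^18 bytes and 1 gigabyte (GB) = 10^9 bytes, plus 8 bits per byte; the unit convention is an assumption about how the underlying study counts, not something stated here.

$$
\begin{aligned}
\frac{1{,}288 \times 10^{18}\ \text{bits}}{8\ \text{bits/byte}} &= 161 \times 10^{18}\ \text{bytes} = 161\ \text{EB} = 161 \times 10^{9}\ \text{GB} \\
\frac{988\ \text{EB}}{161\ \text{EB}} &\approx 6.1
\end{aligned}
$$

Under binary prefixes (1 EiB = 2^60 bytes) the same byte count would come out slightly smaller, which is why the convention matters when comparing figures across studies.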

Between 2006 and 2010, the information added annually to the digital universe will increase more than sixfold, from 161 exabytes to 988 exabytes. In 2007 it was already 281 exabytes. It is likely that the bulk of this new information will originate in just a few key languages of the digitally privileged, knowledge-driven economies. So are we heading into a global digital divide in the not-so-distant future? The famous Berkeley study on How Much Information testifies to this huge momentum. A recent update to the study suggests that US households consumed approximately 3.6 zettabytes of information in 2008. Access to information is closely linked to prosperity and economic well-being, as shown below.

Peter Brantley at Berkeley in a personal blog quotes Zuckerman's wonderful essay:
“For the Internet to fulfill its most ambitious promises, we need to recognize translation as one of the core challenges to an open, shared and collectively governed internet. Many of us share a vision of the Internet as a place where the good ideas of any person, in any country, can influence thought and opinion around the world. This vision can only be realized if we accept the challenge of a polyglot internet and build tools and systems to bridge and translate between the hundreds of languages represented online.”
Brantley goes on to say:
"Mass machine translation is not a translation of a work, per se, but it is rather, a liberation of the constraints of language in the discovery of knowledge."
Today, the world faces a new kind of poverty. While we in the West face a glut of information, much of the world faces information poverty. The cost of this can be high: “80% of the premature deaths in the developing world are due to lack of information,” according to University of Limerick President Prof. Don Barry. Much of the world’s knowledge is created and remains in a handful of languages, inaccessible to most who don’t speak these languages. Asia Online conducted a survey of local content available in SE Asian languages and found that China and Japan each had 120X more content than the billion people in the SEA region, and that English speakers have perhaps 600X more content available to them. Access to knowledge is one of the keys to economic prosperity. Automated translation is one of those technologies that offers a way to reduce the digital divide and raise living standards across the world. As imperfect as it is, this technology may even be the key to real people-to-people contact across the globe.

The seminal essay The Polyglot Internet by Ethan Zuckerman has got to be the most eloquent justification for why translation technology and collaborative processes must and will improve. It has become the inspiration and manifesto for the Open Translation Tools Summit.
While there is profound need to continue improving machine translation, we also need to focus on enabling and empowering human translators. Professional translation continues to be the gold standard for the translation of critical documents. But these methods are too expensive to be used by web surfers simply interested in understanding what peers in China or Colombia are discussing and participating in these discussions.
The polyglot internet demands that we explore the possibility and power of distributed human translation.
We are at the very early stages of the emergence of a new model for translation of online content – “peer production” models of translation.

Visionaries like Vint Cerf also point this out in a recent interview. Ray Kurzweil has spoken on the transformational potential that this technology could have on the world. Bill Gates has commented many times on the potential of MT to help unlock knowledge, both for emerging countries and for those who do not speak English. The Asia Online project is focused on breaking the language barriers for knowledge content using a combination of automated translation and crowdsourcing. The project intends to translate much of the English Wikipedia into several content-starved Asian languages using hybrid SMT and crowdsourcing. Meedan is yet another example of how SMT and a community can work together to translate interesting content quickly, at high quality levels, to share information. There are many more.

While stories of MT mishaps and mistranslations abound (we all know how easy it is to make MT look bad), it is becoming increasingly apparent to many that it is important to learn how to use and extend the capabilities of this technology successfully. While MT is unlikely to replace human beings in any application where quality is really important, a growing number of cases show that MT is suitable for:

· Highly repetitive content, where productivity gains with MT can dramatically exceed what is possible using TM alone
· Content that would just not get translated otherwise
· Content that cannot afford human translation
· High value content that is changing every hour and every day
· Knowledge content that facilitates and enhances the global spread of critical knowledge
· Content that is created to enhance and accelerate communication with global customers who prefer a self-service model
· Content that does not need to be perfect but just approximately understandable

The forces that drive the interest in this technology continue to build momentum. Disruption is coming and much of the momentum is from outside the professional industry. I believe there is an opportunity for the professional translation industry to lead, and to develop and demonstrate best practice models that others will follow and emulate. Some may even learn to build competitive advantage from their use and superior understanding of how to leverage MT in professional projects.

I invite those interested in a productive professional dialogue to join the Automated Language Translation group on LinkedIn and explore how to use this technology to professional advantage. I think we will continue to see more companies learn to use MT technology, and I look forward to changing the archaic translation model that rules today for large and massive-scale translation projects.

So here, finally, is a real MT-focused blog entry.

Sunday, January 17, 2010

Rising Asia and its Implications

Somebody whose opinion I really value just told me, after seeing my first three posts, that my blog is about "men fighting." While I really do care about responsible free speech and will always aggressively defend it, enough about censorship and moderator abuses; let's move on.

I find the notion of "A Rising Asia" interesting, and I think it presents a major opportunity for the localization and translation industry in the years to come. I was first introduced to this concept many years ago by Nick Kristof and Sheryl WuDunn in their excellent book, Thunder from the East. (They were also among the first to point out the promise of China/Rising Asia, several years before the "experts" did.) I really liked the book because it also provides a quick economic history of the world and is a very easy read.

I thought it would be good to update and expand upon an article I wrote for GALA a little while ago. I have been surprised by how little real awareness there is, even in the localization industry, where leaders often equate Asia with China, Japan and Korea (CJK). This was really brought home at the #LTBKK conference, when I saw what a revelation Biraj Rath's excellent presentation on the Indian localization market opportunity was to many industry experts. LISA, to its credit, announced an India Forum shortly after the conference.

So I thought it might be useful to provide a basic primer on the broader Asian market opportunity. For some in the L10N industry this might all be obvious but here goes anyway. I am interested because:

- There are a lot of people living in Asia (maybe 95% of the next billion Internet users)

- Internet penetration is still very low. Multilingual content will very likely play a key role in driving increasing penetration and commercial opportunity.

- They will need a lot of information quickly (huge opportunity for automated translation technology)

- Largest concentration of young people in the world (also the Middle East and Brazil) 


Asia is extremely diverse, economically and culturally, yet there are some strong common elements. It is also much less connected than Europe. Today, we are aware of the economic momentum that India and China have; historically, both also had a deep and lasting cultural influence on much of Asia. An awareness of this history is very useful in developing effective business strategies for different countries. The internet is only just beginning to take root in much of Asia (18% penetration vs. 73% for North America); however, it is expected that almost half of all Internet users will be Asian by 2013. China already has more people online than the US. Asia could be a major opportunity for companies that learn to tap into this emerging online population, but this will require an understanding of the diversity and characteristics of the various segments, as well as new approaches to communication and marketing. Asian economies continue to rise in importance and growth, as both suppliers and consumers. Today China and India are the largest mobile phone markets in the world.

Some interesting and perhaps less known facts about Asia that provide a useful contrast to Europe are shown below. They also give one a sense for the different type of opportunities available and the differing reality of Asia.

-GDP per Capita in Asia (~$15,000) is less than half of the EU average, incomes are much more widely dispersed, and a large population lives in poverty throughout the continent.
-While India and China are among the fastest growing economies in the world, the GDP per Capita is $2,800 for India and $6,000 for China and they should still be considered developing economies.
-The top GDP/Capita countries (2008) in Asia are: Singapore ($52K), HK, Japan, Taiwan, South Korea ($23K), Malaysia, Kazakhstan and Thailand ($8.5K).
-India has 22 official languages that are as distinct and different as the 23 EU languages, and also include at least 6 different scripts. English is only spoken by about 7% of the people in India. However, it is possible to get deep penetration into the Indian market with 5 key languages.
-There is very little local language content for Asian languages on the web in general. Based on a survey done by Asia Online in 2007, less than 15% of the total content on the web is in Asian languages. Almost 90% of the Asian language content is in Chinese and Japanese. There is a huge need for more local language content all over SE Asia.
-Mandarin is beginning to edge out English as the preferred 2nd language in Asia
-China is now the fastest growing patent office in the world. The WIPO and others state that China is clearly an emerging scientific and technological power.
-The share of patent filings from Asian countries now exceeds 50% of all patents filed across the world.
-India has more gifted and talented students in high school than the total school student population in the US.
-China has more students in Science and Technology college degree programs than India and the US combined.
-McKinsey has identified a “Rising Asia” as a stable long term trend that will fundamentally change consumption patterns. Gartner suggests using IT to reach the market. They suggest that global companies use IT to ‘lighten’ their Asian business model to address the specific cultural, geographic reach, and supply chain considerations.
-Wealthy Asians are concentrated in major cities like Shanghai, Beijing, Hong Kong, Singapore, Kuala Lumpur, Mumbai, Delhi, Seoul, Manila and Bangkok.
-China is now the fastest growing market for Bentley and BMW.
-More cars are now sold in China than in America.
-Even countries like Laos, Nepal, Pakistan, Sri Lanka, Myanmar, and Cambodia which have very low GDP/Capita are interesting markets for cell phones and basic commodities.
-An understanding of Buddhism, Hinduism and Confucianism cultural perspectives can dramatically enhance your communications strategy into most parts of Asia.
-The fastest-growing Facebook markets in 2H2009 were Taiwan, Indonesia, the Philippines and Thailand.
-Google is not dominant in key Asian markets: in Korea it has less than 2% search market share, and it is a distant second in China and Japan. It may even be completely out of China soon. Local companies dominate because of their better understanding of local content, language and customer preferences. This suggests that standard US approaches may not work as well in many Asian markets.
-Chinese social networking startups have produced many innovations that have led to them becoming profitable much faster than US equivalents like MySpace and Facebook. We are now seeing Asian innovation gradually making its way to the West.
-Most of Asia has been relatively unscathed by the global financial and real estate market collapse.
-India is increasingly considered a "soft power," culturally influential far beyond its direct sphere of influence.
-The venture capital markets in India and China are rapidly developing with help from "returning" entrepreneurs and hostile US immigration policies.

But simple strategies, like just making your web content available in the local language, may not work. Asian cultures may look similar, and even Western, on the surface but can have deep cultural differences. The localization market is estimated to be $1.5B in 2010 and could grow dramatically. My sense is that this estimate misses much of the impact of recent growth, as the Facebook trends show, as well as of mobile computing and successful bottom-of-the-pyramid marketing strategies.

All of these factors point to fundamental shifts in the global economy and indicate that many of these trends will accelerate further. Asia is a significant opportunity for informed globalization managers, and probably key to long-term leadership for many global enterprises.

Global companies need to develop broad and unique country-specific strategies to be able to prosper and thrive in this rapidly changing world. Localization and translation will be key elements of any successful globalization plan and should present significant opportunities to vendors that prepare for this change.

It's wise to remember that the Chinese ideogram for "change" can also mean "opportunity."

Friday, January 15, 2010

Back to Fundamental Questions

This is an attempt to return to the core questions on the role of associations raised by Renato in his blog. This, again, is content from the GALA group discussion on LinkedIn.

Before we got sidetracked by the arbitrary and childishly unnecessary censorship, I think that Renato did raise some questions that many of us in this industry are asking. There is a good reason that his presentation on the Future of the Localization Industry at the Thailand conference has been viewed by almost 800 people in less than a month (people who, according to the web logs, sat through a 35-minute talk), significantly more than the other speakers, who were also excellent. His audience already exceeds the number of people attending the GALA and Localization World conferences combined! Here are the videos from the #LTBKK conference.

Clearly people want to hear what he is saying, as he is asking questions that many of us are ALSO asking. Below are a few that Renato raised, as well as some of my own:

• Why are there so many localization industry events? Are they all useful?
• Would it not be more useful to the community and industry professionals at large to have industry associations work together and produce fewer but higher-profile events that get more noticed by the real world? Would fewer, more intense and better-attended events not be preferred by most of us?
• What would inter-association collaboration models look like?
• How would collaborative conference content be determined? How would revenues be shared? How could three or four associations come together to make this happen?
• What do we as an industry have to do to get noticed by the WSJ, Time, BBC and other high-profile media? Are we content to remain within the confines of our industry journals, as wonderful as they are?

• Localization managers in most Fortune 1000 companies have significantly lower status and influence than Sales, Marketing, Engineering, Product Management and Customer Support managers. Could the associations not help change this low profile if it really became a focus?
• How do we as professionals connect profitably to the huge momentum behind the open translation and crowdsourcing trends? How will social networking and open source affect the localization and translation industry?
• Where is the leadership in this industry? I cannot point to any leaders who are not often divisive and who consistently follow a clearly articulated vision and public interest (beyond their own group), but perhaps I am too harsh and perhaps I don't really know. (Actually I don't, but I can't "see" them.) It is often said that we get the leaders we deserve.
• Are there other ways for associations to raise money so that we do not have this continuing event deluge?

While I personally found the GALA conference content better than the Localization World content, I am sure there are many who found the opposite true. (One of the sessions I attended at GALA had two people present. WTF?) And what I see is that both of these events attract very few new people. So if the same people are talking to each other all the time, how does this help the attendees professionally? And we all know that having these two events so close together will mean that many people have to choose one or the other. (Prague in May and Berlin in June: approved?) Who really benefits from this? Does it not hurt both events?

In terms of content, networking and real learning, the best events I attended last year were IMTT (it was wonderful to finally hear translators and LSPs talking to each other, and to hear new voices), LRC (leading-edge localization research, clear buyer perspectives and new voices) and Localization & Translation Thailand (great technology coverage and really diverse views from several association leaders, translators, buyers and vendors). The Thailand event itself was an example of collaboration between LISA, ProZ, Asian Language/Translation Associations, and Asia Online, and it certainly had an impact in terms of high-quality content. While much could clearly be improved, it is also a model of how multiple groups can work together and share responsibilities and revenue from an event. If ProZ represents 300,000+ translators, why are they not speaking at every industry conference, and should they not be, to provide at least some version of the freelancer's perspective?

There is clearly a case for some smaller, focused and specialized events, but I for one am having difficulty telling the difference in focus and agenda between many of the bigger and more frequent events I have attended. I have also had this conversation with several other people, so I think this is a growing sentiment.

Serge has posted answers to my questions above in the GALA group. Unlike him, I don't have ready-made answers. I think we have to hold and consider these questions for a while and listen to many voices (no censoring allowed) so that the action that follows is actually progress.

I hope that our focus here and elsewhere moves more to these questions, as we all stand to gain. It would be a pity to let our attention be derailed by reckless and irresponsible censorship. We need to keep our attention on the real fundamental questions, don't we?

This blog is supposed to be about MT, right? Yes, someday soon.

Renato's presentation: The On-going Evolution of the Localization Business

Thursday, January 14, 2010

Censorship in the News

It is interesting that my first blog entry, which talked about censorship, coincided with the Google news storm in China.

In a blog entry that rocked the world, Google said: "We have decided we are no longer willing to continue censoring our results on" and "we have evidence to suggest that a primary goal of the attackers was accessing the Gmail accounts of Chinese human rights activists." Apparently they are willing to pull out of China if necessary.

I was heartened to see this, as I have often felt that Google (and others) really had a policy more accurately described as "Don't be evil (except if it's inconvenient)."

The best coverage of this issue that I have seen comes from Rebecca MacKinnon, who is sympathetic; Imagethief, who provides some analysis; TechCrunch, which is skeptical about Google's real motivation; and James Fallows, who looks at the big-picture political implications. The WSJ suggests that the China issue was a major moral dilemma for Russian-born co-founder Sergey Brin, who felt strongly about not supporting censorship (unlike somebody else we know). I also found a Chinese perspective at China Youren interesting. Ars Technica (love that name!) suggests that Chinese hackers infiltrated automated systems set up to provide information to law enforcement in the US government. Ahh, the plot thickens... we are being watched too, huh?

I REALLY like that I can gather different perspectives to get a "real" sense for what this means. The adult world (where people are allowed to speak) is so cool sometimes, but so messy and muddy, isn't it?

Thus, I have decided to take @localization's advice and air some of my comments from the LinkedIn censorship firestorm "out there," making them visible in my blog, outside the Papal Conclave, so to speak. There is too much to put everything here, so these are just my preferred highlights. So, back to our little world.

    Selected comments (mine in small font) from the original 80-comment LinkedIn discussion:

    I happened by chance to read the original post that Renato made, and while he clearly had a different opinion and viewpoint, I saw nothing offensive in the language, style or intent of his deleted posting. I wish I had copied it so that others could see how sober and tempered it actually was.

    It is unfortunate that you (Serge) chose to delete it based on your sole judgment. In my view, this is an abuse of privilege and "power". There was nothing in his post that was not already stated in his blog, and the original discussion was launched and framed in a very positive way.

    I guess we should all be wary of making statements that express opinions different from yours, lest we be judged and deleted.

    Serge's response: Kirti, thank you for joining in. There's a fine line between strongly opposing views and strongly aggressive offending views. ...........

    Serge's response after several opposing views: What keeps me going here are the private comments (one shown below) that I am getting (of course people understandbly (sic) are not too willing to get Jussara-messages [Jussara being another villain with an opposing opinion from the discussion ...KV] addressed to themselves):

    I really do not have time to write a formal comment on this thread to support you. But I think you are doing the right thing. I have been to a few forums where people keep bashing each other, leading to the demise of the group. Just stand firm.


    My response: It would be wonderful to actually hear from one of your "supporters", as I am not sure how anybody could support your action without actually seeing the content that was deleted, unless they have already decided that anything Renato has to say is irrelevant.

    I saw and read the actual posting that was deleted, and it was clear to me that a lot of thought and care had gone into writing it.

    Also, you seem to forget that your opening comments were less than respectful toward a very benevolent initial statement encouraging comments and discussion on how associations can raise money through means other than events. I think the tenor and tone of your first comments pretty much speak for themselves.

    I don't necessarily have to agree with everything he said, but I certainly respect a member's right to state an opinion, especially one that seemed to me to have been expressed with great care, even if it differs from mine.

    Also, perhaps you overestimate the "support" you have - it certainly does not seem to be strong enough to get somebody to actually step up and say it out loud in public.

    Censorship always works best when it is dark, veiled and hidden. You don't have to justify it then.

    I, for one, will always be suspicious of deletions in this group from this point on.

    And the basic point he (Renato) made at the outset is still valid: how can we get fewer, higher-quality events and more collaboration between the associations to build a better future for every Localization Professional?

    --- and me again later, after reading the recreated version of the deleted post:

    Having read the original post, I think the recreation on Renato's blog is a very close, if not exact, replica of the posting that was deleted, especially in terms of tone, tenor and substance.

    And again, from any reasonable moderation stance I can think of, I cannot see a good reason for deleting it simply because he may or may not have said that "XXX has lost its direction or leadership". Since it was deleted, we will never even know if this is true.

    Surely, we as professionals can discern and handle this level of comment without getting defensive and resorting to suppression and arbitrary censorship.

    I fundamentally question the judgment made by the moderator that characterized this as negative enough to be suppressed and deleted. In my opinion this was clearly a use of "excessive force".

    The news is constantly filled with anti-Obama (or anti-whatever-the-current-administration) comments, and this is (unfortunately) very much a part, if not the very essence, of democracy; the haters and ugly voices are quickly identified and dismissed by most reasonable people.

    I am glad that there are other forums within LinkedIn and the web where openness and free-speech are seen as less threatening.

    Frank Wang said (now the conversation moves to Renato's blog):
    Kirti, we are adults. We all know the politically correct statements kids pick up in middle school. Rubbing them into readers' faces time and again does not make your argument any stronger. Each organization/group has its own rules and policies, to keep it running the way the organizer sees fit. When you join a group/organization, you accept the terms. Or you can choose not to join or to leave. It's that simple. Don't complicate the issue with politically charged terms which we all know by heart. There is enough negativity/bashing both in the LinkedIn thread and here, from those defending Renato. "USSR", "dictatorship", "dark, veiled and hidden" (from yourself). If those are not bad enough, how about this one: "Following the thread of the dispute there, I wondered at times whether his command of English was really up to the task of moderation and actually understanding a point being made in all but the simplest language." May I paraphrase it as "You moron, you don't know what you are reading or doing"? Has Renato or any of his defenders said anything about this? I would definitely quit a group that allows this attack on a colleague.

    I guess I continue to speak because I did NOT accept or expect that a comment like the one that was deleted could or would be removed without some kind of due process.

    And precisely because we are adults, we do not need to hold up personally directed, disrespectful remarks as examples. They speak for themselves, don't they?

    You should watch C-SPAN or British parliamentary proceedings sometime if you think this discussion was not civil.

    Also, you misrepresent my comment, which was: "Censorship always works best when it is dark, veiled and hidden. You don't have to justify it then."

    When censorship functions well, nobody protests because nobody knows.

    In this case, the censorship was visible and questionable, and that is what made it disturbing and worth a little bit of furor.

    Maybe we do need to pick up our middle school textbooks and refresh our minds on why it is important to speak up when you see something that you believe is just plain wrong.

    I hope that the furor will cause some change and raise the level of transparency and accountability in all the groups on LinkedIn. As a community member, I reserve the right to be heard, especially when I speak with civility and respect.

    The group does not belong to Serge; it is what it is, and has value, only because the community has decided to trust its integrity and moderation.

    Perhaps I am overly sensitive, as I grew up in South Africa under apartheid (institutionalized racism). One thing I learnt very clearly in middle school there: nothing will change if you do not speak up.

    I am glad that we are able to have this discussion, and I do appreciate the point you make about how some people make personal, ethnically based insulting remarks. I agree it is not appropriate.

    But as C-SPAN shows, democracy is messy but still worthwhile.

    And now back to the GALA group forum on LinkedIn:

    Serge to me: Kirti, exactly what change do you want to bring? I am not sure that this is clearly verbalized.

    My response: The change I would like to see is that anybody who writes the kind of posting that Renato wrote BE ALLOWED TO DO SO. There was nothing in his post that warranted a unilateral deletion. If you felt that something was offensive, you needed to at least talk to him BEFORE you deleted it. I read it initially and will continue to defend his right to say what he did, even though I actually don't agree with him.

    As a moderator, I delete blatant ads posted as discussions all the time. Once I deleted an ad for used cars and also blocked the person who posted it from the group. But I always leave any serious MT-focused discussion alone, even if I think it is wrong. The community decides what is interesting and what is not.

    These forums are not the personal playgrounds of the moderators, and the community members can and should hold them accountable in exchange for their trust, involvement and engagement.

    Leadership in the forums is most clearly demonstrated by giving voice to differing opinions in a fair and equitable way.

    I think an apology would be in order.

    Having said that, I actually feel that GALA produced a better conference than Localization World in terms of content, though it was miserable in terms of location and attendance. It would make sense to me for them to take a greater leadership role in defining the agenda of major conferences (Buyers, Vendors AND Freelancers) and perhaps make money from it too, but I would love to see more accommodation for freelance translators, such as lower rates for them to attend.

    And I agree that it would be great if there were fewer, more collaborative events. I am still surprised that the associations don't work together more often to get collaboration happening, even at the individual member level. Isn't feudalism over?

    I hope we see more collaboration happening in 2010.

    Messy, isn't it? And I said that this blog was going to be about MT, didn't I?

    Tuesday, January 12, 2010

    Introduction and Soft Censorship

    This is my first attempt at blogging, though I have been quite active on LinkedIn and Twitter.
    I am calling this blog "eMpTy Pages" because it will often be about MT (Machine Translation), or Automated Language Translation as I prefer to call it. Also, there is a song by Traffic with the same name that I like, and one of my favorite quotes related to machine translation is: "The history of MT is filled with eMpTy promises."

    As we all know, MT is not the most scintillating topic, so this blog will often, or at least sometimes, be empty, i.e., I won't have anything to say. I am not sure how often I will post, so I make no commitments at this time. I find Twitter easy to do while I do my real work; I am not sure about blogging. I promise to keep doing this if it is easy, fun and does not become a technical challenge.

    I have decided to do this independently, as I may not always represent the views of my employer, and they may at times prefer to keep their distance. To a great extent, it will just be my musings and thoughts about the topics listed below.

    We have Dave Grunwald (@davegrun) to thank (or blame) for pushing me into this and shaking me out of my inertia by naming me a top blogger in the industry, even though I technically did not have a blog.

    I reserve the right to write about anything I find interesting, though I will try to keep the bulk of my posts focused on Translation Technology, Localization, Globalization, Internet Trends, Social Networking, Crowdsourcing, Collaboration, Global Business and things like that.
    So, now that the introduction is done, on to my first entry.

    I was recently moved to comment online about what I felt was clearly unwarranted and unjustifiable censorship, and yesterday I realized that I had been subjected to a similar exclusion. I would characterize my experience as "soft censorship", in contrast to the arbitrary deletion that @renatobeninatto faced. So, here is the excerpt I wrote yesterday for the discussion on TM sharing in the Automated Language Translation group on LinkedIn. (Repurposing already.)

    As the New Year begins, I notice that the TDA publicized several data consolidation experiments last year, which show the unambiguous and definite benefit of sharing TM. The results are all positive and, wonderfully, there are no problems or failures. It ALWAYS works. Using TDA data is ALWAYS beneficial.

    Unfortunately, not all of us believe this, especially someone like me, who has seen that this does not always happen.

    I have also noticed that TAUS has decided to keep the results of the study I was involved with on the "down low", i.e., not mention it at all. The results of the Asia Online data consolidation experiment showed that all data is not equal and that some work was needed to make TM consolidation work well. This is well documented in this LinkedIn thread and in the detailed 50-page Asia Online report on their specific experiment with TM data consolidation, which can be downloaded from here.

    Like everything else in data processing, SMT does indeed follow the "Garbage In, Garbage Out" rule. This should lead to questions that improve your chances of success, such as: What is clean data for SMT? How does one keep data in a format useful for both TM and SMT? However, this did not happen.

    I bring this up now because there has been a little bit of a storm in the Localization Professional group, where I was disturbed by the arbitrary censorship of a carefully stated differing view. The source of the controversy is in this LinkedIn group.

    If you have an hour or two to kill, go and take a look; it is both a waste of time and quite interesting. In my opinion, it was also quite wrong and an abuse of moderator power.

    It struck me that the unwillingness to share and make the Asia Online report visible is also a kind of soft censorship, and I began to wonder why. This perhaps explains why I felt such a strong urge to make some statements in the @renatobeninatto discussion (apart from it simply being the right thing to do), as I really felt it was wrong.

    Could the test that Asia Online conducted have been flawed?
    The report shows the process and methodology in detail and deliberately provides gory detail in order to invite scrutiny and criticism of the process. I have never heard any direct criticism, so I am not sure.

    Could it be that, since we were not (and still are not) TAUS or TDA members, we were being kept out of view, as it is reasonable that only TDA members should get the limelight?

    Could it be that somebody did not care for the suggestion that all data is not equal and that cleaning and normalizing TM was necessary?

    As there was no feedback, I am not really sure and can only ask these questions and hope that somebody will step up and provide clarification.

    My intent is not merely to poke fun or provoke; I truly do believe that the TDA will be taken more seriously if it provides some information about when TM sharing does NOT work, for surely there are some cases where this is true. I know of at least two such instances, even within a common domain.

    We all need to learn what must be done to make an initiative like TDA work, and I think it would be helpful for everybody to know, as specifically as possible, why it sometimes might not work.

    It might also be useful to get a better understanding not only of what clean data is for SMT, but also of what makes one TM policy/process work better than another when used in SMT. There might be some value in this for all those interested in using shared TM to build SMT engines, even beyond TDA.

    I am a strong believer in openness, transparency and the promise of Web 2.0 and "real" collaboration to bring about change. Together, I feel, these characteristics can produce real meritocracies and functioning, effective organizations and governance, so I continue to reach out. And when I sense a lack of openness, I am filled with curiosity about what they are hiding, and why.

    Anyway, here's hoping that I find answers to this, and to many other questions I have, in 2010.

    I wish all those who make it to this blog entry a wonderful, prosperous, discovery- and collaboration-filled New Year.

    P.S. My guide to the mechanics and spirit of starting this blog was Penelope Trunk. "Just do it," she says, and I like her because she is authentic and real.

    P.P.S. Staring at empty pages
    Centered 'round the same old plot
    Staring at empty pages
    Flowing along the ages

    Listen to the song "Empty Pages"