Monday, March 22, 2010

Why Machine Translation Matters -- Part II

I just spent the last few days at the ATA-TCD Conference in Scottsdale, AZ. You can read highlights from the Twitter stream by searching on #TCD11. While it is always nice to be in the sun for a few days, it is also encouraging to see people in the industry focused on change along key dimensions like standards, technology and automation, as well as the impact of social media on business strategy. I enjoyed several thought-provoking presentations, as well as the discussions I had with many attendees during the conference.

One of the sessions I did was with Alon Lavie and Mike Dillinger (who both represent AMTA leadership). They gave a very useful overview to help LSPs get a better, more realistic sense of MT and provided a basic primer on the subject. Since my original blog post on this subject is my most popular post, I thought it might be useful to develop this theme further.

My original post focused on how MT could help address information poverty. Here are some of my new comments on why MT matters, from the presentation at TCD11. The growth in the sheer volume of information is increasingly clear to most, but it is worth restating with some specific projections from IDC and EMC, who monitor this very closely. The following chart shows projections for enterprise content volume alone.
[Chart: Enterprise Data Growth]

In fact, the fastest growth is in user-generated content (UGC), e.g. blogs, Facebook, YouTube, Flickr and community forums. It is estimated that 70% of the content on the web is UGC, and much of it is very pertinent and useful to enterprises. This content is now influencing consumer behavior all over the world and is often referred to as word-of-mouth marketing (WOMM). Consumer reviews are often more trusted than corporate marketing-speak and even “expert” reviews. We have all experienced this on Amazon, travel sites, CNET and other user rating sites. It is useful for both global consumers and global enterprises to make this content multilingual. Given the speed at which this information emerges, MT has to be part of the translation solution, though involving humans in the process will produce better quality.
[Chart: UGC Importance]

So if this is going on, it also means that the primary focus of the professional translation industry needs to shift from the static content of yesteryear to the more dynamic and much higher-volume user-generated content of today. This is often where product opinions are formed, and it is also where customer loyalty or disloyalty can form, as the customer support experience shows. This is what I call high-value content. The following chart shows that MT will play a critical role in making this content more visible, both because it is high value and because of its sheer volume.
[Chart: Shift to Dynamic]

I also found another powerful argument, relevant to any multicultural society like the US and UK, in this paper by Julia Alanen. She points out that language barriers keep 25 million non-English speakers in the US deprived of critical government services, and that this also affects the rest of the population. While she focuses on the need for translators and interpreters, the content explosion is hitting this sector too. She points out:
Deprivation of plenary language access undermines human dignity, exacerbates many immigrants’ innate vulnerabilities, and harms society at large by impeding the efficacy of the healthcare and justice systems.
Getting back to the TCD conference, I was glad to see that several people (LSP leaders) asked me how they could learn more about MT and get more engaged with the technology. AMTA is proactively reaching out to the ATA, timing its conference to expand collaboration between the two organizations. This is heartening to see, and quite a contrast to the negativity and the dueling conferences we see in other parts of the localization industry.

I also saw a quote from June Cohen, Executive Producer of TED Media, that I think is pretty wonderful (even though it may be naive and idealistic). At SXSW, she was asked, "What technology would you like invented? Or uninvented?" Her answer:
"Instantaneous, accurate translation online. Nothing would do more to promote peace on this planet." 

Change is coming, and what are initially seen as threats can often be opportunities when one changes one’s own viewpoint. So here’s to change that creates more opportunity. Cheers.


  1. That quote from June Cohen about instantaneous translation and global peace is a wonderful example of what I would call "romantic rationalism".

    In my view, language simply does not work like that, nor does human nature. There are many observable cases of deliberate linguistic differentiation and exclusive peer group language, even within the same nominal language - things like Cockney rhyming slang, ghetto language and the "in" language of youth cultures (in many cases deliberately designed to be incomprehensible to outsiders).

    In my view, this is not just a fringe phenomenon for extreme cases of group dynamics; it is a variant of an underlying human tendency. Many ambitious works of literature have a similar tendency to create their own linguistic context (although usually with less anger and peer exclusivity). In my view, the fragmentation of language in the first "Babel" also stemmed from social differentiation in a flurry of technological innovation and authoritarian division of labour.

    So although MT can and will make progress in dealing with the mechanics of language, it will always lag at least three steps behind the cutting-edge development of human language (step 1: language inventiveness, step 2: consolidation of the invented forms, step 3: analysis and programming). And as for the ideal that is sometimes touted of authors using controlled language as a way of improving MT results, for me this is rather like trying to stamp out violence and strife by appealing to people to be nice to each other.

    Oh well, back to work (humanly translating a very complex text on new developments in German law - way out of reach for any MT system).

  2. Hi Kirti,

    Great post. Ms. Cohen's quote may be somewhat pie-in-the-sky, though I certainly hope she's right. However, we must also be aware of the alternative--in the words of the master himself:

    "Meanwhile, the poor Babel fish, by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation."
    - Douglas Adams
    The Hitchhiker's Guide to the Galaxy

    [Any old excuse to quote HHGTTG really. Several great language bits in there that I have yet to compile for some reason...]


    John Weisgerber