Tuesday, March 15, 2011

The Future of Translation Memory (TM)

Several voices have been talking about the demise of TM recently, most notably Renato Beninatto, who has made it a theme of several of his talks at industry conferences in true agent provocateur spirit. More recently, Jaap van der Meer apparently said the same thing (dead in 5 years, no less) at the final LISA standards summit event. (My attempt to link to the Twitter trail failed since @LISA_org is no more.) This resulted in comments by Peter Reynolds and some commentary by Jost Zetzsche (published in the Translation Journal) questioning these death announcements and providing a different perspective.

Since there have been several references to the value of TM to statistical MT systems (which, by the way, are all pretty much hybrid nowadays, as they try to incorporate linguistic ideas in addition to just data), I thought that I would jump in with my two cents as well and share my opinion.
So what is translation memory technology? At its most basic level it is a text matching technology whose primary objective is to save the professional translator from having to re-translate the same material over and over again. The basic technology has evolved from segment matching to sub-segment matching or something called corpus-based TM. (Is there a difference?) In its current form it is still a pretty basic database technology applied to looking up strings of words. Many of the products in the market focus a lot on format preservation and on this horrible concept called fuzzy matching (a somewhat arbitrary quantification, I think), which unfortunately has become the basis for translation payment determination. This matching-rate-based payment scheme is, I think, at the heart of marginalizing professional translation work, but I digress.
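To make the "text matching" idea concrete, here is a minimal sketch of segment-level fuzzy matching. It is only an illustration: the word-level similarity from Python's `difflib` stands in for whatever scoring a real TM product uses, and the names `fuzzy_score`, `best_match`, the sample memory and the 70% threshold are all assumptions made up for this example.

```python
from difflib import SequenceMatcher


def fuzzy_score(a: str, b: str) -> float:
    """Word-level similarity between two segments, scaled to 0-100.

    Assumption: real TM tools use their own scoring; this is a stand-in.
    """
    matcher = SequenceMatcher(None, a.lower().split(), b.lower().split())
    return 100.0 * matcher.ratio()


def best_match(segment, memory, threshold=70.0):
    """Return (score, source, target) for the best entry at or above threshold, else None."""
    best = None
    for source, target in memory:
        score = fuzzy_score(segment, source)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best


# Hypothetical two-entry English-French memory.
memory = [
    ("Click the Save button to store your changes.",
     "Cliquez sur le bouton Enregistrer pour sauvegarder vos modifications."),
    ("Restart the application.", "Relancez l'application."),
]

# One word differs from the first stored segment, so it comes back as a fuzzy match.
print(best_match("Click the Save button to store your settings.", memory))
```

Note how much the score depends on choices like tokenization, case folding and the threshold. Change any of them and the same pair of segments lands in a different "match band", which is part of why the number feels arbitrary as a basis for payment.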

It makes great sense to me that any translator working on a translation project should be able to easily refer to their own previous work, and possibly even to all other translation work in the domain of interest, to expedite their work. There are some, if not many, translators who are also ambivalent about TM technology (e.g. this link and this one). My sense is that the quality of the “text matching technology” is still very primitive in the current products, but the basic technology concept could be poised for a significant leap forward, becoming more flexible, accurate and linguistically informed thanks to work in other parts of the text-oriented tools world, e.g. search, natural language processing (NLP) and text analytics, where the stakes are higher than just making translation work easier (or finding a rationale to pay translators less). Thus, I would agree that the days are numbered for the old “klunker-type TM” technology, but I also think that new replacements will probably solve this problem in much more elegant and useful ways.

The old klunker-type TM technology has an unhealthy obsession with project and format related meta-data, and I think that as this technology evolves, linguistics will become more important. We are already seeing early examples of this next generation at Linguee. In a sense I think we may see the kind of evolution that we saw in word-processing technology, from something used only by geeks and secretaries to something any office worker or executive could use and operate with ease. The ability to quickly access the reference use of phrases, related terms and context as needed is valuable, and I expect we will move forward in delivering useful, use-in-context material to a translator who uses such productivity tools.

It is clear that SMT-based approaches do get better with more TM data, and to some extent (up to 8 words) they will even reproduce what they have seen in the same manner that TM does. But we have also seen that there are limits to the benefit of ever-growing volumes of data, and that it actually matters more to have the “right” data in the cleanest possible form to get the best results. For many short sentences, SMT already performs as a TM retrieval technology, and we can expect that this capability will become more visible and that more controls may become available to improve concordance and look-ups. We should also expect that the growing use of data-driven MT approaches will create more translation memory after post-editing, so TM is hardly going to disappear, but hopefully it will get less messy. In SMT we are already developing tools to transform existing TM for normalization and standardization purposes, so that it works better for specific uses, especially when using pooled TM data. I think it is also likely that many translation projects will start with pre-translation from a (TM+MT) process and, hopefully, with a better, more equitable payment methodology.
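As a rough illustration of what "normalization" of pooled TM data can mean, here is a small sketch. It is not how any particular SMT toolkit does it: the function names `normalize` and `clean_memory` and the specific rules (Unicode NFC, whitespace collapsing, dropping empty and duplicate pairs) are assumptions chosen to keep the example short.

```python
import unicodedata


def normalize(text: str) -> str:
    """Normalize Unicode to NFC and collapse all runs of whitespace to one space."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())


def clean_memory(pairs):
    """Return normalized (source, target) pairs without empties or duplicates.

    Duplicates are detected case-insensitively; the first spelling seen is kept.
    """
    seen = set()
    cleaned = []
    for source, target in pairs:
        source, target = normalize(source), normalize(target)
        if not source or not target:
            continue  # a pair with an empty side is useless for training or lookup
        key = (source.casefold(), target.casefold())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append((source, target))
    return cleaned


# Hypothetical pooled data from two contributors with slightly different habits.
pooled = [
    ("Hello  world", "Bonjour le monde"),
    ("hello world", "Bonjour le monde "),
    ("", "Texte sans source"),
]
print(clean_memory(pooled))
```

Even rules this simple involve judgment calls (should case differences really be merged?), which is why cleaning choices tend to depend on the specific purpose the data is being prepared for.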

The value of TM created from the historical development of user documentation is likely to change. This user documentation TM, which makes up much of what is in the TDA repository, is seen as valuable by many today, but as we move increasingly to making dynamic content more multilingual I think its relative value will decline. I also expect that the most valuable TM will be that which is related to customer conversations. Community-based collaboration will also play an ever-increasing role in building leverageable linguistic assets, and we are already seeing evidence of MT and collaboration software infrastructure working together on very large translation initiatives. It is reasonable to expect that the tools will get better and make collaboration smoother and more efficient.

I found the following video fascinating, as it shows just how complex language acquisition is, has delightful baby sounds in it, and also shows what state-of-the-art database technology looks like in the context of the massive data we are able to collect and analyze today, all in a single package. If you imagine what could be possible when this kind of database technology is applied to our growing language assets and TM, I think you will agree that we are going to see some major advances in the not so distant future, since this technology is already touching how we analyze large clusters of words and language today. It is just a matter of time before it starts impacting all kinds of text repositories in the translation industry, enabling us to extract new value from them. What is emerging from these amazing new data analysis tools is the ability to see social structures and dynamics that were previously unseen, and to make our work more and more relevant and valuable. This may even solve the fundamental problem of helping us all understand what matters most for the customers who consume the output of professional translation.

Tuesday, March 8, 2011

The Changing Face of Localization (Professional Translation)

I was at a party recently where somebody asked a Language Service Provider what they did in their professional work. It was amazing to witness how completely mystified the questioner was by the response, which included the word “localization” several times. It took several minutes of conversation before the person (who admittedly was a little slow) gathered that it involved translation for business purposes in some way. In my view “localization” is not a great word for getting the general world out there engaged and interested in what you do, or even just for helping it understand what you do. The localization entry in Wikipedia explains the confusion felt by the average guy on the street: the word has different meanings in translation, psychology, medicine, physics, mathematics and, more recently, even in location-based services like FourSquare and Facebook Places.

Does it really matter? (I think it does, especially in interactions with people outside the translation industry, i.e. the real world.) I have always felt that it is important to be able to communicate what you do quickly and easily in casual social settings to enhance your professional life. Good casual social interaction can often lead to useful professional references and interactions, but only if people actually understand what you do. It matters even more when you as an industry are trying to increase your visibility to the world out there. I think the word localization may have made sense when the focus was only “software and documentation localization” (SDL), but this view of what we do is increasingly being questioned in terms of its overall value in generating and facilitating international business. (BTW, the word “transcreation” is to my mind even worse in terms of obfuscation and classic HUYA-ness.)

Ironically, I had a brief Twitter exchange with Ultan O’Broin (aka @localization) discussing this shift.

(Unfortunately the service I used to show the conversation is now defunct, and since Twitter makes it so hard to get at old conversations, those snippets are pretty hard to retrieve.)

We were basically discussing data interchange standards in the translation industry (TMX, XLIFF) and Ultan said something that I thought was very insightful about the old SDL view of the business: “people don't get "structure". Obsessed with formatting, still”. This helps to explain the relatively low status of localization professionals in most global enterprises. The view is that the localizers handle the translation production of carefully formatted material that goes into product packaging, and some pro-corporate, self-congratulating, mostly irrelevant content on the corporate web site. Thus, it is not surprising that localization professionals have a kind of secretarial status in most internationally focused business groups. They provide basic services and assistance to international business initiatives. As Ultan said, they have an administrative assistant view rather than a system administrator view of the information flows related to international business initiatives.

As I have stated in previous posts the world is changing, and to stay relevant we need to also change what we do, how we do it and why we do it. At the executive level of global enterprises, it is increasingly becoming clear that customer decision making processes have changed, largely due to open and free access to more information. This information is increasingly created outside the global enterprise and is not easily controlled by stakeholders within the global enterprise. In many industries global customer conversations are MORE influential in driving customer behavior (and corporate sales) than corporate content. To be relevant, we need a new mindset that looks at the flow from information creation (internal and external) to information consumption and has an honest and real focus on the final customer. Real conversations with real users matter more than corporate content and some are beginning to realize this. Value needs to be defined by how useful a customer finds the information, not by how many translation and formatting errors there are in a user manual that few are likely to read. Ultan is at the leading edge of this new focus in an area called User Experience (UX) and thus we should all be listening to what he and others like him have to say.

Here is a more detailed overview on these broad changes from my viewpoint at a recent Localization ;-) Technology Roundtable seminar in Palo Alto:

An interesting aside: I was informed by an SDL Plc marketing representative that I would not be welcome at their recent SDL Innovate event in Palo Alto because of “my position at Asia Online”; however, they did admit that “we will look forward to seeing you at future industry events.” To be honest, I did apply as Kirti Vashee, CEO of Maya Acoustics (which I truly am involved with). But unfortunately the expert marketing department sleuths there tracked me down as the author of this blog post. (Hmm, could it have been my name?) Or perhaps it was because I think that associating SDL with Innovation is oxymoronic, or because I represent competition that is feared and formidable. Apparently Renato and people from TermWiki were also denied admission into the compound.

Interestingly the keynotes brought forth new supporting data for many of the themes I have been writing about in the last year (I think so anyway): Openness, Customer Focus & Collaboration, Standards and Information Flow. Maybe I am biased, but do you really see these as themes that resonate and receive meaningful commitment at SDL?

Some highlights (gathered from Twitter, thank you @scottabel) from Toby Bell of the Gartner Group:
  • People are becoming brands
  • More stuff is uploaded to YouTube in 60 days than all television networks combined have created in 60 years (Yes indeed, UGC is for real)
  • Everything is interactive. If you're not polling or offering live interactive contact with customers, you're missing out on the engagement opportunity
  • Manufacturers, retailers now allow customers to create documentation and interestingly this content is often better than what their own employees create
  • You must tune the experience (with the "right" content and information about your products and services) to the user's goals
  • Corporate leadership still views web content management as a publishing function. It's not. It's really about customer experience.

Some highlights from Marcia Metz, EMC on Information Liquidity:
  • Information is an asset that can be turned into revenue
  • We can't keep pace with the growth of the volume of information and speed and efficiency are becoming more critical to business success
  • We are working to provide information as a service. Content creators and consumers should be able to collaborate for best results
  • Information liquidity requires a comprehensive platform that is standards-based, organized and manages the full information life cycle from co-creation to consumption

There was also another interesting Twitter-based discussion that focused on the dubious value of TMS systems. While I do understand that translation projects have been messy historically, and that some level of automation is required to make things more efficient, I have many doubts about the "solution" that many have chosen. Most of my doubts are about relative value, not absolute value. Why are there so many TMS systems? Why do they all have such small installed bases? Why does every LSP and Corporate Localization department think that their translation project management process is so unique that it can only be properly automated by creating a new TMS system? Could this not be better accomplished by using more mainstream (= installed base of hundreds or thousands) collaboration and database tools? Jaap van der Meer of TAUS stated at the now defunct LISA's final standards summit event: GMS/TMS will disappear over time, in favor of plug-ins to other systems. Adam Blau surprised me at our technology roundtable meeting in Palo Alto when he said that milengo does not use or believe in TMS systems. The reason: too much investment for too little return, and a reduction in overall flexibility. You can see him say it in his own words here. In his talk he also provides some interesting observations on the best-of-breed tools that they use, and on the issues related to developing a strategy that is agnostic about specific technologies.

Also, I think the future of translation will see more deployments of collaborative communities or crowdsourcing. The Monterey Institute of International Studies (MIIS) announced recently that it is deploying Lingotek’s hosted Collaborative Translation Platform. As we head into more translation initiatives of 10 million and 100 million+ words, this kind of collaboration-facilitating infrastructure becomes more and more necessary. It is interesting to see that few if any universities ever adopted translation memory and TMS tools into their curriculum in the past. Tools that enable and facilitate open collaboration and integrate easily into mainstream content management software infrastructure become ever more important. The people who lead the charge in these new initiatives are often not from the localization community, and they seem to understand that data must flow freely for the tools to be useful.

We are at a point in time where it can be recognized that professional translation efforts focused on customer conversations are actually impacting overall international business success. Quite possibly we are at a point where what we do (enable and facilitate global customer conversations), is seen as driving customer satisfaction and customer loyalty and thus international revenues across the globe.

And if there is somebody out there who could educate me about personalization, I would love to learn more, as I think that personalization and mobile will also grow in importance as drivers for building international business. The coming shift to mobile is hopefully obvious to everybody: there are already 4X as many mobile (cell phone) users in the world as there are online users, and we should expect that a lot of information consumption will shift to these devices as they get more powerful. This does not mean that PCs go away, but rather that the conversation becomes more mobile and free-form.

So if the future is about more free-flowing conversations with customers and much more dynamic internal and external content, we should be thinking about new ways to describe what we do. I think our new description will likely include terms like professional translation, collaboration, global customer engagement and effective use of translation technology. While traditional localization work is unlikely to disappear, I think the best is yet to come, and some of us will be lucky enough to be involved with world-changing initiatives.

For a cool response on "what do you do?" see what that other localization expert had to say: "We are building technology that facilitates serendipity." --Dennis Crowley, Foursquare co-founder, as quoted by the Los Angeles Times. For me personally, I like the sound of: "We develop technology that enables global enterprises to talk to their customers across the world and we also help to address and alleviate information poverty in South East Asia".