Wednesday, February 17, 2010

The Global Customer Support Translation Opportunity

Recently I have written about why MT is important for LSPs. MT is a key enabling technology for making large volumes of dynamic, high-value business content multilingual. I have also pointed out the significant business value of making customer support content in particular more multilingual. I would like to go into more detail on the specific challenges one is likely to face in translating knowledge base and community content, and how these could be addressed.

In most cases support content is likely to be 20X to 1000X the volume of even a large documentation project, so using MT technology will be a core requirement. It is also important that stakeholders understand that human quality is not achievable across the whole body of content, and that a “good enough” quality level needs to be defined early in the process.

Understanding the Corpus

The first step in developing a translation strategy for “massive” content is to profile the source corpus to understand its volatility, language style, terminology, high frequency linguistic patterns and content creation process, and to assess the existing linguistic resources available to build an MT engine. It is usually wise to do the following:
-- Gather existing translation memory (TM) and glossaries for training corpus
-- Identify sections that must be human translated (e.g. security, payment processing terms and conditions, legal content)
-- Analyze the source corpus, identify high frequency phrase patterns, and ensure that they are translated and validated by human translators
-- Identify the most frequently used knowledge base and community content and ensure that these are translated by humans and used as training corpus.
Once this is done, an MT engine can be built and evaluated. While it is important to do linguistic evaluation, it is perhaps even more important to show samples of MT output to real customers and determine whether the output is useful.
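
The phrase-pattern profiling step in the list above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name top_phrases, the sample articles, and the choice of three-word phrases are all hypothetical, and a real knowledge base would need proper tokenization and far more text.

```python
from collections import Counter
import re


def top_phrases(documents, n=3, k=5):
    """Return the k most frequent n-word phrases across the documents."""
    counts = Counter()
    for text in documents:
        # Lowercase and keep only word-like tokens (a deliberately crude tokenizer).
        words = re.findall(r"[a-z0-9']+", text.lower())
        counts.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts.most_common(k)


# Hypothetical support articles standing in for a real knowledge base.
sample_articles = [
    "Click the Start button and then click Control Panel.",
    "Click the Start button, then open Settings.",
    "To fix this, click the Start button and restart the service.",
]

for phrase, count in top_phrases(sample_articles, n=3, k=3):
    print(count, phrase)
```

Phrases that surface at the top of such a list are the natural candidates to send to human translators first, since a validated translation of each one is reused many times across the corpus.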

KB Development Process

It is generally recommended that new knowledge base content is run through the initial engine and that the MT output is analyzed and corrected by human post-editors and linguists until a target quality level is achieved. This process may involve several iterations to continually improve the quality. The whole knowledge base can be retranslated periodically, whenever the MT engine quality improves substantially. It is important to understand that this is an ongoing and continuously evolving process, and that overall quality will be strongly related to the amount of human corrective feedback that is provided.
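
One way to see whether each iteration is moving toward the target quality level is to measure how much the post-editors had to change. The sketch below is an assumption on my part rather than a prescribed method: it computes a word-level edit distance between the raw MT output and the post-edited text, divided by the length of the post-edited text. The function names and sample sentences are hypothetical, and established metrics are more refined than this.

```python
def word_edit_distance(mt_words, edited_words):
    """Levenshtein distance counted in whole words rather than characters."""
    prev = list(range(len(edited_words) + 1))
    for i, mt_word in enumerate(mt_words, start=1):
        curr = [i]
        for j, ed_word in enumerate(edited_words, start=1):
            cost = 0 if mt_word == ed_word else 1
            curr.append(min(prev[j] + 1,          # drop a word from the MT output
                            curr[j - 1] + 1,      # add a word the editor supplied
                            prev[j - 1] + cost))  # keep or substitute a word
        prev = curr
    return prev[-1]


def post_edit_rate(mt_output, post_edited):
    """Word edits per post-edited word; 0.0 means the editor changed nothing."""
    mt_words = mt_output.split()
    edited_words = post_edited.split()
    if not edited_words:
        return 0.0 if not mt_words else 1.0
    return word_edit_distance(mt_words, edited_words) / len(edited_words)


# Hypothetical MT output and its human-corrected version.
print(post_edit_rate("restart computer the now", "restart the computer now"))
```

Tracked over successive engine versions, a falling rate on the same sample of articles suggests that the corrective feedback is paying off.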

Self-service KB

It is worth restating that there are significant benefits to doing this as the customer support environment evolves with the general momentum behind collaboration and community networks. The ROI in terms of call deflection savings and improved customer satisfaction is well documented and is significant. But perhaps the greatest benefit is the expanded visibility for the global customer who cannot really use the English content in its original form.

Microsoft has clearly demonstrated the value of making their huge knowledge base multilingual. At a recent TAUS conference they reported that hundreds of millions of support queries are handled by raw MT and, interestingly, surveys indicate that successful resolution and customer satisfaction in many of these languages are actually higher than they are for English! Others are starting to follow suit, and Intel and Cisco have done similar things on a smaller scale. The CSI presentation by Greg Oxton at a recent TAUS meeting states it very simply:

Content is King -- Language is Critical

I saw recently that analysts in the content management community have identified the growing demand for multilingual content as one of the strongest trends of 2009 and see it growing further in 2010. The Gilbane Group has a big emphasis on content globalization in their upcoming conference this summer. I was involved with a webinar yesterday with Moravia that focused on the customer support content globalization issue. A replay of the webinar is available here.

The time is now to focus on and learn how to undertake content globalization projects that start at ten million words and can run into hundreds of millions of words. This is the future of professional translation, and I think that effective man-machine collaborations will be a key to success.


  1. I would like to posit a different approach--while I completely agree that the above MT method is highly effective and procedural in its decomposition of elements and their reconstitution in other formats according to set rules... Have you heard of structured ontologies? OWL?
    OWL Web Ontology Language

    This markup language is meant for knowledge transfer, not merely a concordance. To couch this in deliberately abstract terms: if we had a method to describe interactions and relationships that was independent of grammar, case, tense, and just about all linguistic mechanics, what would we have? Ontologies! They are methods of constructing a logic-based rules system for determining a set of activities that are valid for a set result. Particularly in Help Desk or other such structured environments, you have an opportunity to describe the nature/role of actors and their agency within a defined [by engineering, policy and other set factors] environment or universe.

    This would be a two-fold engagement: first defining the "world" of support and its myriad evolutionary actions, methods, limitations and resolutions, and then making a "translator" apparatus that constructs terms meaningful to a set audience. It bears mentioning that certain types of language are susceptible to the Goldilocks problem: Too Technical, Not Technical Enough, and finally, Just Right. Beyond the application of linguistic and internationalization concerns, this "translation apparatus" could do more than just machine translation: it could do cultural, technical, and whatever-domain-you-can-link-to adaptation.

    Is this for real? Maybe. It is out there in the OWL standard, and nobody is even thinking of how this would apply to the distribution of knowledge.

  2. This is certainly interesting but I do not know enough about it to comment meaningfully. I would encourage you to post some links that I and others interested in this subject could explore to learn more.

    Thank you for your suggestion.