Recently I have written about why MT is important for LSPs. MT is a key enabling technology for making large volumes of dynamic, high-value business content multilingual. I have also pointed out the significant business value of making customer support content in particular more multilingual. I would like to go into more detail on the specific challenges one is likely to face in translating knowledge base and community content, and how they could be addressed.
In most cases support content is likely to be 20X to 1000X the volume of even a large documentation project, so using MT technology will be a core requirement. It is also important that stakeholders understand that human quality is not achievable across the whole body of content, and that a “good enough” quality level needs to be defined early in the process.
Understanding the Corpus
The first step in developing a translation strategy for “massive” content is to profile the source corpus: understand its volatility, language style, terminology, high-frequency linguistic patterns, and content creation process, and assess the existing linguistic resources available to build an MT engine. It is usually wise to do the following:
-- Gather existing translation memory (TM) and glossaries for the training corpus
-- Identify sections that must be human translated (e.g. security, payment processing terms and conditions, legal content)
-- Analyze the source corpus, identify high-frequency phrase patterns, and ensure that they are translated and validated by human translators
-- Identify the most frequently used knowledge base and community content and ensure that it is translated by humans and used as training corpus
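As a rough illustration of the phrase-pattern step above, the following Python sketch counts the most frequent word n-grams across a set of support articles. The function names, the simple regex tokenizer, and the sample sentences are all assumptions made for this example; a real profiling effort would likely use proper linguistic tooling and the actual knowledge base export.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (a naive assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def frequent_phrases(documents, n=3, top=5):
    """Return the `top` most common n-word phrases across all documents."""
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        # Slide a window of n tokens over the document and count each phrase.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts.most_common(top)


# Hypothetical sample content standing in for real knowledge base articles.
sample_articles = [
    "Click the Start button, then click Control Panel.",
    "To open settings, click the Start button and choose Settings.",
    "Click the Start button to see the list of programs.",
]

for phrase, count in frequent_phrases(sample_articles, n=3, top=3):
    print(f"{count:3d}  {phrase}")
```

Phrases that surface at the top of such a list are natural candidates to send to human translators first, since a validated translation of each one is reused many times across the corpus.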
Once this is done, an MT engine can be built and evaluated. While it is important to do linguistic evaluation, it is perhaps even more important to show samples of MT output to real customers and determine whether the output is useful.
It is generally recommended that new knowledge base content is run through the initial engine and the MT translation is analyzed and corrected by human post-editors and linguists until a target quality level is achieved. This process may involve several iterations to continually improve the quality. The whole knowledge base can be periodically retranslated as big improvements in MT engine quality are accomplished. It is important to understand that this is an ongoing and continuously evolving process and that overall quality will be strongly related to the amount of human corrective feedback that is provided.
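One possible way to track whether an iteration is approaching the target quality level is to measure how much the post-editors had to change the raw MT output. The sketch below uses Python's standard difflib module to compute a word-level similarity between each MT segment and its post-edited version. The function names and the 0.85 threshold are placeholders chosen for illustration, and this similarity score is only a crude proxy for quality, not a substitute for linguistic review or customer feedback.

```python
from difflib import SequenceMatcher


def post_edit_similarity(mt_output, post_edited):
    """Word-level similarity (0.0 to 1.0) between raw MT and its post-edited version."""
    return SequenceMatcher(None, mt_output.split(), post_edited.split()).ratio()


def batch_meets_target(pairs, target=0.85):
    """Average the similarity over (mt, post_edited) pairs and compare to a target.

    The default target is an arbitrary placeholder, not a recommended value.
    """
    if not pairs:
        raise ValueError("at least one segment pair is required")
    scores = [post_edit_similarity(mt, pe) for mt, pe in pairs]
    average = sum(scores) / len(scores)
    return average, average >= target


# Hypothetical segments: raw MT output paired with the post-editor's correction.
segments = [
    ("restart the computer now", "restart the computer now"),
    ("press the button of start", "press the Start button"),
]
average, ok = batch_meets_target(segments)
print(f"average similarity: {average:.2f}, target met: {ok}")
```

If the average rises from one engine iteration to the next, the post-editors are changing less of the output, which is one signal that the corrective feedback is paying off.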
It is worth restating that there are significant benefits to doing this as the customer support environment evolves with the general momentum behind collaboration and community networks. The ROI in terms of call deflection savings and improved customer satisfaction is well documented and is significant. But perhaps the greatest benefit is the expanded visibility for the global customer who cannot really use the English content in its original form.
Microsoft has clearly demonstrated the value of making their huge knowledge base multilingual. At a recent TAUS conference they reported that hundreds of millions of support queries are handled by raw MT and, interestingly, surveys indicate that successful resolution and customer satisfaction in many of these languages are actually higher than they are for English! Others are starting to follow suit, and Intel and Cisco have done similar things on a smaller scale. The CSI presentation by Greg Oxton at a recent TAUS meeting states it very simply:
Content is King -- Language is Critical
I saw recently that analysts in the content management community have identified the growing demand for multilingual content as one of the strongest trends of 2009 and see it growing further in 2010. The Gilbane Group has a big emphasis on content globalization in their upcoming conference this summer. I was involved with a webinar yesterday with Moravia that focused on the customer support content globalization issue. A replay of the webinar is available here.
The time is now to focus on and learn how to undertake content globalization projects that start at ten million words and can run into hundreds of millions of words. This is the future of professional translation, and I think that effective man-machine collaborations will be a key to success.