Tuesday, April 27, 2010

Falling Translation Prices and Implications for Translation Professionals

I recently saw an active discussion in the Localization Professionals group on LinkedIn. The key point raised initially was that lower prices cause lower quality, and thus began a discussion on who is to blame. (There has to be someone or something to blame, right?) Since people’s livelihoods and bonuses are at stake, the discussion can quickly become emotional. The culprits, according to the discussion, are bad localization managers, freelancers who accept lower rates, lack of standards, lack of differentiation, lack of process, the internet, marketplaces becoming more efficient and so on. In some way all these reasons are valid, but is there something else going on?

I would like to present another perspective on this issue, one that points to larger forces that are driving structural changes, pushing prices down and putting pressure on the old order. These forces exist independently of how we may feel about them; they are simply an observable fact of the translation landscape today.


The Increasing Volume of Content

Most global enterprises today are facing huge increases in content volumes that increasingly demand to be translated. However, they are usually not given increased translation budgets to match this growth in content volume, so they have a few options to deal with the situation:
1) Refuse to translate the content that is out of budget range
2) Reduce prices to current translation suppliers (but increase volume)
3) Look for ways to increase translator productivity (MT, CAT tools, crowdsourcing) that enable them to get more done with the same money, i.e. raise individual translator productivity from 2,500 words/day to something much higher (10,000+?).

Thus, one of the driving forces behind lower prices is the need to make more and more content available to the global end customer. The web and the growth in demand for dynamic content affect all kinds of global businesses. We can expect this content growth to only increase in future, and the demands for productivity will get louder. This does not mean that rates will automatically go lower, but we are seeing that it does have an impact, and that this content growth is also possibly driving interest in MT. How else do you cope with 10X to 100X the content volume?
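The budget arithmetic behind option 3 is worth making explicit. A minimal sketch, where every number is an assumption chosen purely for illustration, not data from any real enterprise:

```python
# Hypothetical illustration: a flat budget meeting 10X content volume.
# All figures below are invented for the sake of the arithmetic.

budget = 100_000              # annual translation budget in dollars (flat)
rate_per_word = 0.20          # current effective price per word
volume_today = budget / rate_per_word   # words the budget covers today

growth = 10                   # content volume grows 10X
volume_needed = volume_today * growth

# With a flat budget, either the effective price per word must fall by
# the same factor, or per-translator throughput must rise to match.
required_rate = budget / volume_needed

words_per_day = 2_500         # traditional translator throughput
required_throughput = words_per_day * growth

print(f"words covered today: {volume_today:,.0f}")
print(f"rate needed at 10X:  ${required_rate:.3f}/word")
print(f"throughput needed:   {required_throughput:,} words/day")
```

The same arithmetic at 100X growth would demand 250,000 words/day per translator, which is why the conversation turns to automation rather than rate negotiation alone.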


The Value of Localization Content

Another dimension that should also be examined is the value of what we translate. Most of the professional translation industry is focused on SDL = Software & Documentation Localization and a relatively small, static part of corporate websites. However, it is increasingly understood that while this “traditional localization content” is necessary, and will continue to be so, it is not where the greatest value lies for a global enterprise trying to build global customer loyalty. Very few people read manuals; in fact, some say that translators are often the only people who actually read them, especially in the IT and consumer electronics industries. If users do not value this documentation, we will continue to see price and efficiency pressures for this kind of translation.

Customer- and community-generated content is increasingly more important, especially for complex IT and industrial engineering products. Major IT companies like Microsoft, Dell and Symantec find that the localization content they translate and produce serves only 2% of their real customer inquiries. The customer need and expectation for a large amount of searchable content continues to grow, and given the velocity of information creation, translation automation and MT become necessary. This is another source of downward pressure on prices, though we also see higher volumes and new attitudes toward, and definitions of, translation quality. Global enterprises realize that new quality standards are required for this massive content, and new production models and approaches are being sought, as standard TEP costs and processes are not viable to meet these needs.


The Global Demand for Translation & Knowledge

The third dimension is the raw demand for knowledge, information and collaboration from the world beyond professional translation. The thirst for knowledge across the globe is driving new collaboration models like crowdsourcing and the development of open-source software and tools, while Open Data initiatives like Meedan, the EC, Opus and the World Bank continue to build momentum. It is not clear that costly, membership-only initiatives like TAUS will prosper unless they do in fact provide higher-quality data and add some real value. We should expect to see the growing use of amateurs, as we see at Yeeyan and TED. We also see the growth of fan translations, which prove that fans can create viable long-term initiatives; again, this shows that it is possible to reduce translation costs when motivated communities can be built. Adobe recently decided to use the crowd, and a technology platform to manage the crowd, to further its core business mission of developing international market revenue. And guess what one of the outcomes is? Lower translation costs.

So the poor translator and most LSPs are caught in these major shifts, which are clearly out of the control of any single player in the localization world. We all (buyers, LSPs and translators) have to learn how to deal with these forces, as they affect us all. In future, we should expect that ongoing productivity improvements will matter, and that skill and experience with translation automation will be increasingly valued.

I think skill with SMT (I am biased) will especially matter, as it is a technology that brings together TM, crowdsourcing, project management expertise, linguistic competence and collaboration infrastructure in a way that can demonstrate and deliver higher productivity and lower cost per word while maintaining quality. Traditional localization work will likely come under increasing price pressure because of its low relative value, and we will begin to see global enterprises that seek to translate 10X to 100X the current volume. The price drops perhaps also signal the collapse of the old business model of static, repetitive content done in a TEP cycle. While this is painful for all concerned, it is also a time when some will learn how to bring the technology, people and processes together to deliver on new market demands. Recent announcements from Lionbridge and others show that change is necessary, but making it mandatory and punitive is, to my mind, clumsy and ill-advised. (This old command-and-control mentality is really hard to subdue.) I suspect that change initiatives that do not create clear win-win scenarios for all the parties concerned will struggle and face growing resistance.

I think that initiatives that marginalize, exploit or strong-arm translators are doomed to fail. Willing, motivated and competent translators have been the key to delivering quality in the past and will be even more so in future. Disruptive change is not all it is cracked up to be. Even Google and Microsoft found that revolution consisted of a continuing sequence of small changes as they built their empires, and I suspect that that approach will also work best here. It would not surprise me to see new leadership emerge from outside the industry, for these are surely interesting times, and openness and real collaboration are hard to find in the professional translation world.


Innovation & Collaboration

So how do we respond to these forces? What is an intelligent response to these kinds of macro changes?

I don't have many answers, but I list some possibilities below, understanding that all of this is very easily said and not so easily done:
-- Find out how to increase productivity while maintaining or even raising quality (Better, more efficient automation processes and effective use of MT especially data-driven SMT that learns and improves.)
-- Work together with global enterprises to find out what the highest-value content is. Then learn how to make it multilingual efficiently and effectively as a long-term partnership mission. (I doubt that this content is software and documentation localization or a few select pages on the corporate website.)
-- Develop better, collaborative relationships with customers (the translation buyers) and with the translators who are the keys to quality, to build long-term leverage and benefit for all the parties in the game.
-- Develop higher-value-add services like transcreation (is there really no better word?) in addition to basic translation services.
-- Develop new business models and new ways to get high-value content done where there is a much tighter and longer-term commitment from buyers, vendors and a team of translators, i.e. a common and shared mission orientation rather than a single-project orientation.

This is a time of big structural change, and it will require innovation and collaboration on a new scale. The question that I have been asking, and that I think the market is also asking, is:
How do we reduce costs, maintain quality, increase productivity and speed up the translation process?

I know this is somewhat vague, but I think it should start with a new way of looking at the problem. If we look at these as opportunities or challenges rather than as threats to bash and attack, we might find some constructive and useful answers. Today, I see overwhelmingly negative feedback on MT, crowdsourcing, open source and even collaboration initiatives from many in the professional translation industry in social forums, and very few trying to understand these forces better. People often go through a sequential emotional cycle of attack, fear and despair before they learn to cope, and eventually even thrive, when facing disruptive change. Those who get stuck at fear and despair often end up as victims.

It is said that those who ignore the lessons of history are doomed to repeat them. Maybe we should be talking much more about the Luddites and the “Luddite fallacy” at all the conferences we hold. If the Luddite fallacy were true, we would all be out of work, because productivity has been increasing for two centuries. Perhaps the biggest change we need in order to deal with these forces is to adopt a new, dispassionate and open view. A new view is Step 1 in developing new strategies and new approaches to managing these forces, and technology is only part of the solution. I predict we will only see real success where technology, process and collaboration come together.

Friday, April 23, 2010

Collaboration and Localization

This is a copy of an article that was published in the April/May 2010 issue of Multilingual magazine. It was co-written with Michael Cox, an L10N project management professional, formerly at Intel and now at the Worldify Consulting group. He is also responsible for the L10NCafe site, a perfect example of a growing collaborative knowledge-sharing community.

In The World Is Flat, Thomas Friedman divides modern history into three periods. Globalization 1.0 occupied the years between Columbus’ 1492 voyage to the Americas and the 1800s. It is characterized by collaboration at a country level. That is, things were generally accomplished within a national scope, and people were concerned with how their country fit into the world. 

Between 1800 and 2000, we graduated to Globalization 2.0, where companies became the primary vehicle by which we experienced the world. More recently, we progressed to Globalization 3.0 and now have the ability to experience, work and collaborate with the world as individuals apart from the companies or countries to which we belong. This is a big change and is greatly influencing both our personal and professional lives. 

The growth of open source illustrates this change. Open source solutions are becoming increasingly popular, even more so than their traditional counterparts. According to the book Wikinomics, 2006 was the year when Flickr beat Webshots, Wikipedia beat Britannica, Blogger beat CNN, Epinions beat Consumer Reports, Google Maps beat MapQuest, and Craigslist beat Monster. About half of all web servers are now powered by Apache, and 1.2 million people download OpenOffice every week. The number of open-source projects doubles every 14 months. By 2012, according to Gartner, 90% of companies with IT services will use open-source products of some type. Clearly, a lot of energy is associated with collaborating to provide products that are more interactive, higher quality and less expensive if not free. 

In our own industry there are collaboration groups and associations such as TAUS, LISA and even LinkedIn groups. While these organizations serve important functions, we will demonstrate that there is also a need for a peer-to-peer network where experts are voted into place by the community and authority is earned based on contribution, sharing and the quality of interactions.

Collaboration Terminology

Social networking and collaboration in particular are often confused with each other. Social networking is concerned with creating and maintaining relationships, getting people together and coordinating their activities. Collaboration is concerned with accomplishing a task, project or goal. Of course, accomplishing anything usually requires people, and any time people are involved there is a social aspect to their interactions. Therefore, collaboration solutions usually include social networking features to help the work along. 

Collaboration can happen either synchronously or asynchronously. Phone calls, meetings and webinars are examples of synchronous collaboration where everyone is working together at the same time. E-mail, blogs, wikis, lists, documents and workflows are examples of asynchronous collaboration where people participate at different times depending on their location, availability and priorities. 

Internal collaboration refers to intra-company interactions or working with other employees of the same organization or company. Organizing the knowledge of people “owned” by the company is a natural step towards more open collaboration. Some of the fears and obstacles associated with open collaboration are avoided by working exclusively with others who are bound by the same rules and organizational culture. However, many are realizing that no matter how smart or numerous their employees, ideas and opportunities exist outside their companies. To avail themselves of this external knowledge and creativity, companies have begun collaborating on specific initiatives with select partners — a few companies that share similar goals, rules and culture. Many choose to continue collaborations after the initiatives that instigated them have ended and are turning increasingly towards more open or external collaboration. 

Collaboration Features

In the past, if you asked people what collaboration tools they used, most would mention e-mail and the telephone. Though these tools still reign supreme, blogs, microblogs and the like are becoming more common. It seems almost everyone has a blog now. Many of us are tweeting, and wikis are prevalent as well. 

Today, robust collaboration platforms contain a growing list of features including blogs, microblogs, wikis, versioning, alerting, rating, RSS, metrics, user management, security, podcasting, search, mobile support, chat, polling, user profiles, dashboards, workflows and calendars. Integration with e-mail, online services and the ability to add external widgets from sites such as Flickr is important. Project management features such as task tracking and sharing are becoming more important. For example, Microsoft is strengthening the tie between Microsoft Project and SharePoint for its 2010 release. 

The better platforms can also manage web pages, documents, graphics and videos along with their source files. They also integrate with content creation tools. Unfortunately, the ability to schedule, host and document web meetings with desktop sharing and telephony usually requires using multiple services. While this is an obvious gap for many platform providers, plenty of vendors specialize in these services. Some even offer free internet meetings for small numbers of people. Knowledge management is a newer area to emerge and includes expert location, knowledge networks, idea management, talent development and information organization. 

With Facebook, MySpace and Twitter, we have become familiar with social networking and its associated features of finding people and establishing networks. Some platforms emphasize these features while others emphasize collaboration features. The large platform vendors have seen the need for both and are moving to shore up weaknesses in both areas. 

Who are the major collaboration platform vendors? According to research conducted by Gartner, of the hundreds of providers, only three are considered to be leaders in the platform space for the workplace: Jive Software, IBM Lotus Connections and Microsoft SharePoint. Most of the smaller vendors focus on niche areas, such as WordPress does with blogs or Zoho does with online meetings. Though many of these services provide plug-ins or APIs to offset their weaknesses, putting them together in a coherent way that provides security and interoperability is nearly impossible. That’s why most companies opt for a collaboration platform that provides the most desirable features along with a framework that ties them all together for security, search, maintenance and so on. While small vendors work to make their products integrate with the platforms, the platform providers are working to include the goodness of the smaller vendors into their own products. 

While social networking is often free, collaboration tools usually are not. There are many ways vendors can charge for and provide these features, including software as a service, hosted service, on-premise installations as well as pay-per-use or pay-per-user subscriptions. The communities using these platform tools are often internal to a specific company. This is because the collaboration platforms have been cost prohibitive for external, grass roots communities. However, free, open and less expensive options are available and improving all the time. 

Why do we collaborate?

First, there are internal, self-motivating reasons for collaboration. For example, we enjoy the recognition and validation we receive from peers for our ideas and our work. After all, our peers are often more capable of appreciating and valuing our accomplishments than our employers. So, collaboration fills an emotional need. It also provides an outlet for our causes or at least a vehicle through which to accomplish them. Most of us have our causes — special topics or issues we are passionate about and want the world to better understand so that others will value them more as well. Collaboration provides a way of accomplishing that — a way to get the word out, educate people and gather like-minded people. Once gathered, these people can be mobilized to aid the cause. These causes might be anything from helping people with rare forms of cancer to organizing an industry to better tackle its challenges and fulfill its purpose. 

Current trends toward increased collaboration are at least partially due to generational differences. Generation X, those people born between 1961 and 1981, tends to value loyalty, seniority, security and authority. However, Generation Y — also known as the “Next” or “N Generation” — values creativity, social connectivity, fun, freedom, speed and diversity. These values make Generation Y more collaborative by nature. 

There are also external reasons for increased collaboration, as noted earlier. Consumers benefit by getting software for free or at a much-reduced cost. A motivator for companies is profit, especially profit via cost savings. Companies that have switched to open-source software such as Apache, Linux and OpenOffice claim to have eliminated most of their costs in these areas. Some companies with competing products have switched from duplicating competitors’ efforts to joining open-source communities and contributing to foundation code that is shared by all. They then create software to enhance the foundation code and sell that as a new way to generate revenue.

Collaboration Opponents

Here are some common issues and fears relating to collaboration in the Globalization 3.0 world along with their common counterpoints. 

First, it is increasingly difficult to monitor all the communications, tweets, conversations, Facebook statuses and text messages of employees. Companies fear employees may compromise their intellectual property, disclose release dates or disclose other confidential information. However, this risk can be mitigated by educating participants and other means. Also, some of the less confidential information companies protect might be exchanged for knowledge that is more valuable to them, such as something they don’t already know. 

Second, increased collaboration, especially in regard to open-source initiatives, will result in more things becoming free. This is great for consumers but not for developers who are unable to secure rewards for their creativity and hard work. On the other hand, these efforts often serve to standardize platforms that minimize the duplication of platform work and integration issues. Developers can make money building and supporting add-ons to the standard platform. 

Third, collaboration fosters a sort of collectivism that places the needs of the group over the needs of the individual. While this is how some cultures operate, some Western cultures fear anything that hints remotely of socialism or communism. This may be answered by taking it slow. Though it is difficult to predict the future or to assuage political fears, we can take one step at a time. 

Fourth, you can’t be sure of what you will get from strangers or crowds as they contain experts as well as opinionated non-experts. It’s a mixed bag. The counterpoint to this is that with the right process, technology and oversight, you can corral the efforts and knowledge of the crowd to produce a quality product, in many ways better than any subset of people could create. Wikipedia, Apache, OpenOffice and Linux have proven this.
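One common mechanism for this kind of corralling (an illustrative assumption on our part; the article names no specific method) is redundancy plus voting: collect several independent contributions for the same item and keep the one most contributors agree on. A minimal sketch:

```python
from collections import Counter

def majority_pick(candidates: list[str]) -> str:
    """Return the submission seen most often; ties go to the earliest seen."""
    return Counter(candidates).most_common(1)[0][0]

# Hypothetical crowd submissions for one translated segment.
submissions = [
    "Hello, world!",
    "Hello, world!",
    "Hello world",
    "Hi, world!",
]
print(majority_pick(submissions))  # prints "Hello, world!"
```

Real systems layer reviewer oversight and contributor reputation on top of this, but the principle is the same: independent redundant effort plus an aggregation rule can yield quality that no single contributor guarantees.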

Current Industry Collaboration

We have seen the first few examples of crowdsourcing translation in the last two years. Common Sense Advisory defines collaborative translation as an emerging approach to translation in which companies use the elements of crowdsourcing in a controlled environment for working on large corporate projects in short periods of time. Common Sense also talks about an experience that mixes community, crowdsourced and collaborative translation to offer a translation that is quick, of good quality and in tune with users’ experience. It can involve professional translators or not. We see that the lines between internal and external get much less clear, and collaboration and cooperation become an imperative. Recently, we have seen success at Facebook and many others, and some uproar at LinkedIn. But this formula of community, open collaboration platform and a common purpose appears to be gaining momentum, and it behooves us to try to learn how to best leverage these emerging models to further our professional objectives. 

Once a functioning and active community is assembled and organized, there are significant benefits for all. We are beginning to see several examples from outside the professional translation industry that are now driving the development of collaborative communities and platforms. It is not improbable that these outside initiatives could eventually bring about fundamental and enduring changes to the professional translation industry as well. 
Some examples clarify this and show that this combination of community (paid and/or volunteer), collaboration platform, increased automation and common purpose can exist both inside and outside the enterprise. 

-- The TED Open Translation Project has translated over 4,200 talks into 63 languages in just six months after building a platform and issuing an open call for volunteers. Another 4,000 are in progress, and new languages are added constantly.
-- IBM is using bilingual employees to develop and improve the quality of statistical machine translation (SMT) systems designed to make cross-lingual chat communication easier and more effective.
-- Yeeyan in China has steadily translated whole issues of the Economist and the Guardian on a regular basis into Chinese, using a network of 8,000 volunteer translators and a larger community of reviewers around a collaboration platform. In spite of occasional skirmishes with censors, it continues today. They do not use translation memory (TM).
-- Adobe, Cisco, Symantec and others are similarly exploring translating high value customer support content.
-- EMC, Symantec, Facebook and others are expanding their localized language coverage using the community in carefully managed and administered collaboration platforms.
-- Meedan regularly translates news content between Arabic and English to promote understanding and is even giving away a one-million-word TM to anybody who wants it.
-- Microsoft is having highly motivated Most Valuable Professional partners edit and improve its machine translated knowledge base content to continue to improve the customer support experience.


We can expect that the processes and tools will get better. The forces that are driving these initiatives are not based on cost savings, as many believe. LISA recently conducted a survey showing that increased language coverage and deeper engagement with customers motivated many global enterprises into exploring community collaboration projects. 

The global enterprise faces an explosion of content. It is increasingly recognized that product-focused community content often has higher value in terms of building customer loyalty and is useful to translate. Some say the increase in the volume of translation-worthy content is at least tenfold and possibly as much as a hundredfold. Clearly, the old model cannot be the foundation for all of this new content in future, either in terms of process or cost. 

Last summer, the Open Translation Tools Summit held in Amsterdam was attended by tool builders, translators and publishers motivated to make more information available across languages. Open-source initiatives and tools now exist for many linguistic data management tasks. Few from the professional translation industry were present at this conference, and it is yet to be seen whether the initiatives from it will gain real momentum. 

Some in the professional translation industry are waking up to this changing model. Lionbridge, GlobalSight and Lingotek come to mind, and hopefully we will see others embrace openness and standards. Already, the open-source Moses-based statistical engines are outperforming machine translation available from Google and other vendors. 

These emerging trends present an opportunity for the most agile translation industry companies, who will lead the change rather than resist it. It is important for us as a community to begin to collaborate and explore how to leverage these trends. We need to understand how to evaluate and use collaboration platforms, when and how to expand our use of automated translation technology and when, where and how to engage invested and willing communities whenever possible. These are the needs that new platforms such as the L10NCafé are hoping to fill. 

Clearly, our industry is in the early stages of organizing itself for productive collaboration. The suggested model to develop this is to implement a collaboration and communication platform to improve and multiply our interactions. The community using such a platform could be much larger than any single organization, company or group. This seems unlikely today with the large number of associations, conferences and fractured interests that we see, but our industry perhaps more than many others is actually well-suited to develop effective collaboration models in an online community. In an open and trusted community, experts would emerge over time for all to see and consult with. Many of these experts would be people you will never meet at a conference. 

Business author Peter Drucker has his own way of reconciling the past and future. He believes the acquisition of knowledge will supplant the acquisition and distribution of property and income that dominated recent centuries. That pursuit is all about who can assimilate and make use of the most relevant information the fastest. Collaboration, especially the open/external type, provides an efficient way to assemble, filter, validate and disseminate knowledge. We can therefore expect to see much more of it in our future. 

I have written previously about collaboration, actually inspired by this article, in an entry titled Learning to Share in Professional Forums: Collaboration.

Michael W. Cox, a project management professional, has worked as a Chinese translator, localization engineer, resource manager, localization strategist and GMS/TMS integrator. He is now at the Worldify Consulting group. He is also responsible for the L10NCafe site, a perfect example of a growing collaborative knowledge-sharing community.

Monday, April 19, 2010

The Data Deluge and the Growing Need for Innovation

The Economist recently published excellent, detailed coverage of the huge growth in data on the internet. The amount of digital information increases tenfold every five years! They suggest that this data deluge is equivalent to the industrial revolution in overall impact, and that we are moving to what Craig Mundie, head of research and strategy at Microsoft, calls a “data-centered economy”. He suggests we are at the nascent stage, that the infrastructure and business models for this new world are not well understood and are just being formed, and that big societal and macroeconomic changes are coming with the data economy. Data are becoming the new raw material of business: an economic input almost on a par with capital and labor. If you are skeptical, look at Wal-Mart, Amazon, Google, Netflix, EBay and Farecast, and see how data collection, organization, leverage and analysis underlie their core business mission and value.

I have written previously about how this data deluge is affecting the professional translation industry and why change is necessary not only in processes and tool technology but also in the whole view of the professional translation business. I have pointed out:
-- How enterprises are also facing a huge growth in data volume from both customers and internal processes
-- The changing nature of the information required to build loyalty in the global customer base. The continuing evolution from static documentation to dynamic user and community created content that is considered the most valuable content in customer support is one example of how dramatic this shift is. The word of mouth impact on products and companies in social networks is already powerful and will become even more so in future.
-- There is also now evidence that crowdsourcing is and will continue to be a force in getting things done in the translation world, not so much because it is cheaper, but more often because it increases customer engagement and allows global companies to address long-tail issues in a cost-effective way. It does not work everywhere, but it is a model worth understanding and using. Also, we are likely to see more groups of highly motivated amateurs focus on large translation projects, as the Yeeyan, Global Voices and Meedan initiatives already show.

Web 1.0 World

However, in 2010 in the professional translation world (and elsewhere), people have gotten used to the tools and processes that got us here. If you look back ten years, I would say we're not doing things too differently from the way we did in 2000. We use essentially the same software and processes we did back then, though things have sped up a bit, and maybe TMS systems are taken more seriously of late. It is very much a TEP world, optimized for the global business and localization reality of 2000. Professional translation services firms try to build value around project management capabilities and undefinable notions of “quality”. It is ironic that an industry “leader” is named SDL, which originally stood for Software and Documentation Localization. Do they really do much more than that today? Part of the website?

So how would one contrast what happens in the data economy with the old way? As the Economist points out:
Google applies this principle of recursively learning from the data to many of its services, including the humble spell-check, for which it used a pioneering method that produced perhaps the world’s best spell-checker in almost every language. Microsoft says it spent several million dollars over 20 years to develop a robust spell-checker for its word-processing program. But Google got its raw material free: its program is based on all the misspellings that users type into a search window and then “correct” by clicking on the right result. With almost 3 billion queries a day, those results soon mount up.
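The mechanism the Economist describes can be sketched in a few lines: for each raw query, count which “corrected” result users most often accept, and return the most frequent choice. The data below is invented purely for illustration; Google's actual system is obviously vastly larger and more sophisticated than this sketch.

```python
from collections import Counter, defaultdict

# Hypothetical (raw_query, accepted_correction) pairs, standing in for
# the click-through logs described in the Economist excerpt above.
observed = [
    ("recieve", "receive"), ("recieve", "receive"),
    ("recieve", "recieve"),  # some users keep the misspelling
    ("speling", "spelling"), ("speling", "spelling"),
]

# For each raw query, count which "corrected" result users chose.
corrections = defaultdict(Counter)
for raw, chosen in observed:
    corrections[raw][chosen] += 1

def correct(word: str) -> str:
    """Return the correction users most often accepted, else the word itself."""
    if word in corrections:
        return corrections[word].most_common(1)[0][0]
    return word

print(correct("recieve"))   # "receive" (2 of 3 users chose it)
print(correct("speling"))   # "spelling"
```

The point of the example is that no dictionary or linguistic rules are involved: the quality of the corrector comes entirely from the volume of user behavior data.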
So as we head into this data economy what will professional translation companies need to do to thrive? It is clear that this is a time for innovation and real fundamental change, in both process and focus. A recent LISA report on Crowdsourcing has this statement from a senior Dell executive in the executive summary that summarizes what global enterprise buyers are looking for.
Dell: “What we want eventually from our services provider is a combination of localization, machine translation and crowdsourcing services.”
What Clients want
So what will a next generation LSP that thrives in the data-centered economy look like? Here are my thoughts on this:
-- Competence in managing crowds, ensuring translation quality, and overseeing the overall contribution and validation process
-- Competence with machine translation, especially SMT which is a data-driven approach that will soon overshadow the RbMT approaches
-- Competence with recruiting, assessing and managing both professional and amateur crowds to engage in translation projects when and where needed
-- Competence with building and managing motivated and engaged online communities and social networks
-- Competence in building and managing large linguistic data repositories that can be brought to bear for different client needs
-- Large linguistic corpus analysis, cleaning and preparation skills
-- Skills in the development of systems infrastructure to enable large groups of professional and amateur translators to collaborate on large, data-rich and very high volume (10M words+)  translation projects
-- Systems process development skills with social networks and crowd initiatives and large data set management skills including handling audio and video translation cost-effectively and efficiently
-- The ability to make rapid and accurate quality assessments on work product in a regular, consistent and definable way
-- The ability to provide a satisfying and mutually beneficial experience for freelancers and amateurs that engage with the firm including useful and free translation productivity tools

This little promotional video for the book Different, by Youngme Moon of the Harvard Business School, has a very compelling message that I thought especially pertinent to the professional translation industry today.

Innovation is something we should all be thinking about and I would bet that collaboration is something that will help and further this exploration. I look forward to sharing this journey with you.

Friday, April 9, 2010

Linking Translation Quality to Business Purpose

The question of translation quality has been a hot mess for the professional translation industry for some time now. It is regularly a subject of discussion and a core theme at every industry conference, and yet it is also continuously the greatest source of confusion and disagreement amongst translation professionals. This is further complicated when considering what an acceptable quality measurement is for many machine translation projects. One major source of confusion is the conflation of process quality standards with linguistic quality standards. They are different and should not be mixed and matched. The SMT developer community has added to the confusion by introducing automated translation quality measurements like BLEU, METEOR and TERp. I have attempted in this blog to clarify some of the SMT material at least. But to be able to develop useful MT systems it is necessary to get some rough working definitions in place so that work can be planned, done and fairly compensated.

Quality Definitions that do NOT seem to work
For the most part, certification in ISO 9001, EN 15038, Microsoft QA and LISA QA 3.1 does not provide a buyer with a clear sense of the actual translation quality that will be delivered, even though it does suggest that the certified LSP has process discipline and a quality sensibility. Automated measurements like BLEU also have very little value to translation professionals as general linguistic quality measurements and should be used with care. This is why buyers always ask for test or sample translations: they need some understanding of what they can expect before they undertake a larger project.
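For readers unfamiliar with how a metric like BLEU actually works, here is a simplified, self-contained sketch: it scores a candidate translation by its n-gram overlap with a reference translation, discounted by a brevity penalty. Real BLEU is computed at corpus level with smoothing; this toy version only illustrates the mechanics, and why such scores say little about linguistic quality on their own (a perfectly acceptable paraphrase that shares few n-grams with the reference scores very poorly).

```python
import math
from collections import Counter

def simple_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified sentence-level BLEU: geometric mean of modified n-gram
    precisions times a brevity penalty. For illustration only."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i+n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i+n]) for i in range(len(ref) - n + 1))
        overlap = sum((cand_ngrams & ref_ngrams).values())  # clipped counts
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # avoid log(0)
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean

ref = "the cat sat on the mat"
print(simple_bleu("the cat sat on the mat", ref))  # 1.0, an exact match
print(simple_bleu("a cat is on a mat", ref))       # near zero, despite similar meaning
```

The second score collapses because almost no bigrams or trigrams match, which is exactly why a low BLEU score cannot by itself tell a buyer whether a translation is unusable or merely differently worded.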

The most widely accepted quality definition that I am aware of in the TEP world is that quality is achieved when the buyer or the customer is satisfied.  Making content useful for a target customer is perhaps a way to approach MT projects as well. Since MT is unlikely to produce the quality produced by a typical human translation process it does not make sense to always use the same quality assessment procedures unless the goal is in fact to get to a human quality level. Wherever high quality is required more human post-editing support will be required, but there are many business use scenarios where lower linguistic quality can adequately accomplish business objectives.  However, in every case we discuss below, the objective is to produce better quality than the free translation services like Google, MSN and Babelfish as I assume that competent professional involvement will produce systems that outperform these free alternatives.
Quality and Purpose

I thought that it would be useful to present three different scenarios where MT could be used, and then see how the business objectives impact the translation quality requirements.


eDiscovery & Litigation Support

Business Objective: Identify Key Documents and ensure Searchability of Multilingual Content. Speed & Fast Turnaround Are Critical.
Translation Quality Required: Raw MT produced by a lightly customized engine that has paid particular attention to accurate translation of key terminology and search terms is adequate. Once key documents are identified they go through a more standard human translation process.
Typical Business Use Scenarios: Very large content, very short time frame for turnaround and rapid identification of key documents e.g. Litigation Support and other eDiscovery applications. Also valuable in Military and National Security applications that involve information triage of foreign language blogosphere or websites to “find a smoking gun”.
LSP Role: Translation professionals can play a significant role in developing critical terminology and working on the MT engine to get it to acceptable quality levels for the business application. In patent litigation some TM preparation and training may also be required.
Info Discovery

Knowledge Base Support Content

Business Objective: Provide Current and Comprehensive Technical Support and Knowledge Base content to global customers in target languages. Rapid Availability and Breadth of coverage are critical and should be aligned and equivalent to English content. The success and ability of the customer to perform self-service and solve technical problems is critical.
Translation Quality Required: The most active and most critical content is translated by humans. For the bulk of the KB content, raw MT produced by a carefully customized engine that uses localization content, active KB article translations and key terminology as training material is adequate. The content can be linguistically flawed and imperfect but still useful, as long as the basic meaning is accurate. Initial content is put into production and made available quickly, and can be further refined and improved gradually over time. Users are warned when content is raw MT. In 2010, possibly 1 billion+ people will use this type of translated content.
Typical Business Use Scenarios: Very large content, very short time frame for turnaround and continuing updates on a very regular basis. Major IT firms like Microsoft, Intel, HP have a huge need for this to reduce support costs, improve support experience of global customers and facilitate support  processes for key resellers in global markets. Any industrial engineering / manufacturing company that has a global customer base and wants to improve the technical support experience could be a viable user and many will do so as the success of initial projects is publicized.
LSP Role: Translation professionals can play a significant role in translating the most active KB articles and critical content (financial transactions and security related), developing critical terminology and tuning the MT engine to get it to acceptable quality levels for customer acceptance testing. They may also have an ongoing role in improving the MT engine so that the whole corpus can periodically be re-translated at a higher quality level.
Multilingual KB Content

High Volume Localization Content

Business Objective: Improve production and efficiency of critical dealer channel service and maintenance documentation for global customers in several target languages. Cost and Quality are as important as Speed to ensure brand identity. OEMs who rely on globally distributed service and maintenance network need to provide regular updates on maintenance documentation to their dealer channel with new models of major products.
Translation Quality Required: Brand-consistent human quality is required. Large high-quality TMs and glossaries are available to train MT systems. The MT engine is carefully developed and tuned to maximize raw quality and ensure a highly productive post-editing effort. An MT + post-editing + proofing process is used to quickly bring MT output to human quality levels. The same quality checks are applied as would be to a fully human process, and buyers will have the same expectations.
Typical Business Use Scenarios: Large content (500 to 3000 page manuals), rapid turnaround expected but not real time, and periodic updates as models change. Major Automotive manufacturers, Electronic Equipment OEMs have an ongoing need for this to ensure that their service and dealer channels can properly service new models.
LSP Role: Translation professionals need to play a significant role in building the MT engine and ensuring that it produces high quality output that requires a minimal amount of clean-up work (post-editing). LSPs need to monitor and validate that they are achieving ever higher levels of productivity. This should make each successive version or model easier and more cost-effective to produce, and it is reasonable to expect new manuals to be done faster and more efficiently.

Thus we have a relatively new concept of and approach to quality as the use of MT expands the scope of translation projects. Some refer to this less-than-human quality as “usable” and some call it “good enough”. There are many who find this expanding use of MT very threatening, but the greatest use of MT so far is still in areas that just could not be done any other way. As skill levels amongst translation professionals increase and MT quality improves, it is likely that we will see MT used in many high volume localization projects, driven primarily by cost and production efficiencies. The industry has yet to develop clear quality definitions so that buyers and service providers can build effective business models around these new possibilities. However, given the content deluge we increasingly face, this is coming, and it is a significant opportunity for those service providers who take the time to develop the skills to undertake these and other high-volume, high-value kinds of projects.

Saturday, April 3, 2010

How Translation Professionals Can Get Started With SMT

At the recent ATA-TCD conference in Scottsdale I noticed that there was a strong focus on translation automation, nicely summarized by Ben Sargent of CSA. At the end of my presentation I noticed there were many LSPs who were interested in better understanding machine translation and getting more involved with MT, but were not sure how to proceed. Thus, I thought it might be useful to present a list of resources that might be helpful. I will also raise key questions in affiliated areas that I think we are all just beginning to explore.
It is clear that I have a bias in favor of statistical MT. This is because I think SMT provides much more user control, more rapid and continuing MT quality improvements, and a greater scope and role for translation professionals to add value and build long-term leverage. My view on the RbMT vs SMT debate is clear, but I would still recommend that potential users explore this on their own and come to their own conclusions. I believe that the best MT systems we will see in the coming years will have significant LSP/professional translator involvement, and that MT will remain a marginal technology for professional use until it actually makes sense to professional translators. I think that MT systems developed in close collaboration with translation professionals will be the best systems around, and that they will be domain focused. Quality is key.

So on to some high quality resources: The Association for Machine Translation in the Americas (AMTA) is reaching out and trying to better connect to the ATA by timing their annual conference to coincide and expand collaboration with professional translators. AMTA plans to hold several educational sessions tailored to the professional translation audience. I think the quality of the information on business use of MT is likely to be better than typically found at major localization conferences. Even though this conference has historically been very technical, Alon Lavie assures me this is changing.

This is my short list of the best links for web based MT resources:
Automated Language Translation group in LinkedIn where 1600+ professionals discuss many MT related subjects on an ongoing basis. Read through the most active discussions to get a sense for the issues from many different viewpoints. (Requires membership)

Common Sense Advisory provides research on MT in business use for the professional translation industry. LISA also has MT focused content, gathered over the years on how localization professionals interface with MT. The TAUS site also has useful content on enterprise use of MT, though I think that their technology advice is somewhat naive and I do question their editorial policy. (Some content may require membership)

The eMpTy Pages Blog has a lot of SMT related material in bite sized chunks that I will continue to update, and keep focused on why it matters for the professional translation industry. (Blatant self promotion!)

There are also a growing number of video lectures that can be an educational shortcut and way to get informed. Here are a few to start.

For more technical material:
John Hutchins has created a comprehensive archive of MT articles over the years. The Statistical Machine Translation site is also a great source on all things SMT, including links to parallel corpus, open source software tools and research and overviews of the technology.  The Euromatrix project is also a source for useful research, data and findings.

There are also of course some company websites that provide useful information on MT from the perspective of the localization industry. The best content I have seen is at Lionbridge and Asia Online and the Language Studio site which I know will continue to be enhanced with new content on an ongoing basis.

So how do LSPs and professional translators add value when working with SMT? As I mentioned before, with RbMT there is little beyond building dictionaries and human post-editing that a professional can do to improve the linguistic quality of the MT output or the overall engine. Most of the RbMT systems out there today are a result of decades of refinement and work.
New Skills Required
In contrast, there are many places where professionals can add value to the "Hybrid SMT" system development process. Some of this is new linguistic work and these new skills will need to be learned and developed. All of the following activities can have a direct and measurable impact on the quality of a hybrid SMT engine:
- Data Cleaning, Preparation and Analysis of Training Corpus
- Development of Test and Tuning Data Sets
- MT Translation Quality Assessment & Evaluation
- Linguistic Analysis focusing on Error Analysis & Correction 
- Dictionary & Glossary Development
- Amplifying Post-Editor Corrections to improve SMT engines
- Ongoing Management of Data Resources &  Linguistic Assets
- Managing Optimization of Domain SMT Engines for a Specific Customer
- Identification and Preparation of High Quality Target Language (Monolingual) data
- Development of Linguistic Rules and Structures to improve Quality with Languages like Chinese, Japanese and Arabic, e.g. reordering SVO to SOV word order
- Creation of Specialized Linguistic Data to Correct Specific Linguistic Error Patterns
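To make the first item above concrete: corpus cleaning often starts with simple mechanical filters, dropping empty segments, implausible source-to-target length ratios (a common sign of misalignment), and duplicate pairs before any training begins. The sketch below is illustrative only; the thresholds and example pairs are assumptions, not a prescribed recipe.

```python
def clean_parallel_corpus(pairs, max_ratio=3.0, min_len=1, max_len=100):
    """Filter (source, target) segment pairs before SMT training.
    Drops empty or overly long segments, extreme length ratios
    (likely misalignments), and exact duplicate pairs."""
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        s_len, t_len = len(src.split()), len(tgt.split())
        if not (min_len <= s_len <= max_len and min_len <= t_len <= max_len):
            continue  # empty or overly long segment
        if max(s_len, t_len) / min(s_len, t_len) > max_ratio:
            continue  # suspicious length ratio suggests misalignment
        if (src, tgt) in seen:
            continue  # duplicate pair adds no new information
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned

raw = [
    ("Hello world", "Bonjour le monde"),
    ("Hello world", "Bonjour le monde"),  # duplicate
    ("OK", "D'accord, je vais m'en occuper tout de suite merci beaucoup"),  # bad ratio
    ("", "vide"),  # empty source
]
print(clean_parallel_corpus(raw))  # only the first pair survives
```

Production pipelines layer much more on top of this (tokenization, encoding repair, sentence re-alignment), but even filters this simple can measurably improve a trained engine, which is why this mundane-looking work is a genuine value-add.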


Remember that the better the MT system is, the higher the raw MT quality will be, and thus the easier and more efficient the post-editing experience. A good MT engine provides both a cost and speed advantage to its users. Most SMT engines can be improved on a regular and continuing basis via the linguistic work and input described above, so it is reasonable to expect ongoing improvements in post-editing efficiency.

These are new skills, and translation professionals should understand that there is long-term leverage in building these engines skillfully. A good MT engine becomes a strategic advantage: it gives you a continuing cost and delivery-time advantage in specific domains. And thus we see that Language Service Providers can become Language Solutions Providers (as my friend Bob Donaldson says) who assist their customers in making different types of high-value content multilingual. The future of professional translation has to be where the highest-value content is, and increasingly this content is voluminous and volatile. Being able to respond to high volume content dissemination with speed and relatively good quality becomes increasingly valuable to the global enterprise. Partners who can facilitate this will be valued.
Shift to Dynamic
The world is changing, and user documentation and critical localization work is not going away, but it will increasingly be under cost pressure. The dialogue with the global customer is becoming increasingly important, and in the Web 2.0 world, customers expect and even demand that they get the same information that English-speaking customers do. This will not be possible without ever improving and evolving translation automation. This technology provides a major opportunity for translation professionals to build sustainable competitive advantage and differentiation.

The keys to success do not lie with SMT expertise alone. As Jost and I pointed out at the ATA-TCD conference, in addition to translation technology, there will also need to be effective data sharing schemes, improved engagement and management of communities and crowds, and the development of collaborative technology platforms. While many translation professionals view these trends as threats, I am sure a few will realize there is real opportunity here and may even yell out: “There is gold in them thar hills.”
But as we all know gold is hard to find without digging. So start digging.
Threat and Opportunity Landscape