Saturday, April 3, 2010

How Translation Professionals Can Get Started With SMT

At the recent ATA-TCD conference in Scottsdale, I noticed a strong focus on translation automation, which Ben Sargent of CSA nicely summarized. At the end of my presentation, it was clear that many LSPs are interested in better understanding machine translation and getting more involved with MT, but are not sure how to proceed. So I thought it would be useful to present a list of helpful resources. I will also raise some key questions in related areas that I think we are all just beginning to explore.
It is clear that I have a bias in favor of statistical MT. This is because I think that SMT provides much more user control, more rapid and continuing MT quality improvements, and a greater scope and role for translation professionals to add value and build long-term leverage. My view on the RbMT vs. SMT debate is clear, but I would still recommend that potential users explore this on their own and come to their own conclusions. I believe that the best MT systems we will see in the coming years will have significant LSP/professional translator involvement, and that MT will remain a marginal technology for professional use until it actually makes sense to professional translators. I think that MT systems developed in close collaboration with translation professionals will be the best systems around, and that they will be domain focused. Quality is key.

So on to some high-quality resources: the Association for Machine Translation in the Americas (AMTA) is reaching out and trying to connect better with the ATA by timing its annual conference to coincide with the ATA conference and by expanding collaboration with professional translators. AMTA plans to hold several educational sessions tailored to a professional translation audience. I think the information on business use of MT there is likely to be of higher quality than what is typically found at major localization conferences. Even though this conference has historically been very technical, Alon Lavie assures me this is changing.

This is my short list of the best links to web-based MT resources:
The Automated Language Translation group on LinkedIn, where 1,600+ professionals discuss many MT-related subjects on an ongoing basis. Read through the most active discussions to get a sense of the issues from many different viewpoints. (Requires membership)

Common Sense Advisory provides research on business use of MT for the professional translation industry. LISA also has MT-focused content, gathered over the years, on how localization professionals interface with MT. The TAUS site also has useful content on enterprise use of MT, though I think its technology advice is somewhat naive, and I do question its editorial policy. (Some content may require membership)

The eMpTy Pages blog has a lot of SMT-related material in bite-sized chunks, which I will continue to update and keep focused on why it matters for the professional translation industry. (Blatant self-promotion!)

There is also a growing number of video lectures that can serve as an educational shortcut and a quick way to get informed. Here are a few to start.

For more technical material:
John Hutchins has built a comprehensive archive of MT articles over the years. The Statistical Machine Translation site is also a great source on all things SMT, including links to parallel corpora, open-source software tools, research, and overviews of the technology. The EuroMatrix project is also a source of useful research, data and findings.

There are, of course, also some company websites that provide useful information on MT from the perspective of the localization industry. The best content I have seen is at Lionbridge, Asia Online and the Language Studio site, which I know will continue to be enhanced with new content on an ongoing basis.

So how do LSPs and professional translators add value when working with SMT? As I mentioned before, with RbMT there is little beyond building dictionaries and human post-editing that a professional can do to improve the linguistic quality of the MT output or the overall engine. Most of the RbMT systems out there today are a result of decades of refinement and work.
[Figure: New Skills Required]
In contrast, there are many places where professionals can add value to the "Hybrid SMT" system development process. Some of this is new linguistic work, and these new skills will need to be learned and developed. All of the following activities can have a direct and measurable impact on the quality of a hybrid SMT engine:
- Data Cleaning, Preparation and Analysis of Training Corpus
- Development of Test and Tuning Data Sets
- MT Translation Quality Assessment & Evaluation
- Linguistic Analysis focusing on Error Analysis & Correction 
- Dictionary & Glossary Development
- Amplifying Post-Editor Corrections to improve SMT engines
- Ongoing Management of Data Resources & Linguistic Assets
- Managing Optimization of Domain SMT Engines for a Specific Customer
- Identification and Preparation of High Quality Target Language (Monolingual) data
- Development of Linguistic Rule Structures to Improve Quality with Languages like Chinese, Japanese and Arabic (e.g., reordering between SVO and SOV word order)
- Creation of Specialized Linguistic Data to Correct Specific Linguistic Error Patterns
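To make the first two activities on this list a little more concrete, here is a rough Python sketch of what basic training-corpus cleaning and test/tuning set preparation can look like. It is only an illustration under stated assumptions: the function names and thresholds (80 words per segment, a 3:1 length ratio) are hypothetical choices, not rules from any particular SMT toolkit, and counting words by splitting on whitespace only makes sense for languages like English or Spanish, not for Chinese or Japanese, which need proper word segmentation first.

```python
import random

# Hypothetical thresholds for illustration only; tune them per language pair.
MAX_WORDS = 80
MAX_RATIO = 3.0


def clean_parallel_corpus(pairs, max_words=MAX_WORDS, max_ratio=MAX_RATIO):
    """Drop empty, duplicate, overlong, or badly length-mismatched segment pairs."""
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        # Normalize runs of whitespace to single spaces.
        src, tgt = " ".join(src.split()), " ".join(tgt.split())
        if not src or not tgt:
            continue  # one side is empty: probably a misalignment
        if (src, tgt) in seen:
            continue  # exact duplicate pair adds nothing new
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src > max_words or n_tgt > max_words:
            continue  # very long segments are hard to align reliably
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # large length mismatch suggests a bad alignment
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned


def split_train_tune_test(pairs, n_tune, n_test, seed=0):
    """Shuffle, then hold out tuning and test sets that never appear in training."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    tune = shuffled[:n_tune]
    test = shuffled[n_tune:n_tune + n_test]
    train = shuffled[n_tune + n_test:]
    return train, tune, test


pairs = [
    ("Hello world", "Hola mundo"),
    ("Hello world", "Hola mundo"),      # duplicate
    ("", "Vacío"),                      # empty source side
    ("Yes", "Sí, por supuesto, claro que sí, sin ninguna duda"),  # 1 vs 9 words
    ("Good   morning", "Buenos días"),  # extra spaces get normalized
]
print(clean_parallel_corpus(pairs))
# [('Hello world', 'Hola mundo'), ('Good morning', 'Buenos días')]
```

The important design point is the hold-out: tuning and test segments are kept out of the training data, so quality measurements are not inflated by the engine having already seen the very sentences it is evaluated on. Real tuning and test sets are, of course, much larger than a toy example like this.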


Remember that the better the MT system is, the higher the raw MT quality will be, and thus the easier and more efficient the post-editing experience. A good MT engine provides both a cost and a speed advantage to its users. Most SMT engines can be improved on a regular and continuing basis through the linguistic work and input described above, so it is reasonable to expect ongoing improvements in post-editing efficiency.
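One simple way to track post-editing efficiency over time is to measure how much post-editors actually change the raw MT output. The Python sketch below computes a word-level edit distance between raw MT and its post-edited version, divided by the length of the post-edited text. It is a simplified stand-in for metrics like HTER and rests on assumptions worth stating: it counts only word insertions, deletions and substitutions (no block shifts), it splits on whitespace, and the function names are hypothetical. A falling average rate across successive engine versions on comparable content would be one rough sign that the engine is improving.

```python
def word_edit_distance(hyp: str, ref: str) -> int:
    """Levenshtein distance over words: insertions, deletions, substitutions."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))  # distance from an empty hypothesis
    for i, hw in enumerate(h, start=1):
        curr = [i] + [0] * len(r)
        for j, rw in enumerate(r, start=1):
            cost = 0 if hw == rw else 1
            curr[j] = min(prev[j] + 1,         # delete hw
                          curr[j - 1] + 1,     # insert rw
                          prev[j - 1] + cost)  # substitute, or keep a match
        prev = curr
    return prev[len(r)]


def post_edit_rate(raw_mt: str, post_edited: str) -> float:
    """Word edits needed to turn raw MT into the post-edit, per post-edited word."""
    n_ref = len(post_edited.split())
    return word_edit_distance(raw_mt, post_edited) / max(n_ref, 1)


def average_post_edit_rate(segments) -> float:
    """Mean per-segment rate over (raw_mt, post_edited) pairs."""
    rates = [post_edit_rate(mt, pe) for mt, pe in segments]
    return sum(rates) / len(rates)


# Two words had to be changed to fix the word order: 2 edits / 5 words.
print(post_edit_rate("the house blue is big", "the blue house is big"))  # 0.4
```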

These are new skills, and translation professionals should understand that there is long-term leverage in building these engines skillfully. A good MT engine becomes a strategic advantage: it gives you a continuing cost and delivery-time advantage in specific domains. And thus we see that Language Service Providers can become Language Solutions Providers (as my friend Bob Donaldson says) who help their customers make different types of high-value content multilingual. The future of professional translation has to be where the highest-value content is, and increasingly this content is voluminous and volatile. Being able to respond to high-volume content dissemination with speed and relatively good quality becomes increasingly valuable to the global enterprise. Partners who can facilitate this will be valued.

[Figure: Shift to Dynamic]

The world is changing. User documentation and critical localization work are not going away, but they will increasingly be under cost pressure. The dialogue with the global customer is becoming increasingly important, and in the Web 2.0 world, customers expect, and even demand, the same information that English-speaking customers get. This will not be possible without ever-improving and evolving translation automation. This technology provides a major opportunity for translation professionals to build sustainable competitive advantage and differentiation.

The keys to success do not lie with SMT expertise alone. As Jost and I pointed out at the ATA-TCD conference, in addition to translation technology, there will also need to be effective data sharing schemes, improved engagement and management of communities and crowds, and the development of collaborative technology platforms. While many translation professionals view these trends as threats, I am sure a few will realize there is real opportunity here and may even yell out: “There is gold in them thar hills.”
But as we all know, gold is hard to find without digging. So start digging.
[Figure: Threat/Opportunity Landscape]
