Friday, July 1, 2011

The Google Translate API Furor: Analysis of the Impact on the Professional Translation Industry – Part I

This is a post further exploring the Google API announcements by guest writer Dion Wiggins, CEO of Asia Online (dion.wiggins@asiaonline.net) and former Gartner Vice President and Research Director. The opinions and analysis are those of the author alone.

Overview

This is Part II of the post published on June 1, 2011. Part I detailed the reasons behind the Google announcement that it will shut down access to the Google Translate API completely on December 1, 2011, and reduce capacity prior to the shutdown. Part II, which will be released as two posts, analyzes the impact that the announcement will have on the professional language services industry and also explores the implications of Google charging for its MT services.

Summary

Humans will be involved in delivering quality language translation for the foreseeable future. The ability to understand context, language and nuance is beyond the capabilities of any machine today. If machine translation ever becomes perfect, then it truly will be artificially intelligent. But there are many roles for machine translation in the professional language services industry today, despite the limitations of the technology in comparison to human capability.

Combining machine translation technology with human editors can deliver translation output of the same quality as a human-only approach in a fraction of the time and at a fraction of the cost. The perception that machine translation is not good enough and that it is easier to translate by human from the outset is outdated. It is time to put that idea to rest, since there are now many examples that clearly prove the validity of using machine translation with human editing to deliver high quality results.

The old adage of “there is no such thing as a free lunch” can be adapted to “there is no such thing as free translation” – you get what you pay for. The professional language service industry needs more than a generalized translation tool – control, protection, quality, security, proprietary rights and management are necessities.

· Google’s decision to move to a payment model for its Translate API is not a trivial one. It is part of a long term strategic initiative that is the right thing to do for Google’s business. Google’s primary rationale is to address issues relating to control of how and when translation is used and by whom, which in turn addresses the problem of “polluted drinking water” and will help clean up some of the lower quality content that Google has been criticized for ranking highly in its search results. This is a key strategic decision that will be part of their core business for the next decade and beyond.

· The professional language services industry (or Language Service Providers – LSPs) will not be negatively or positively impacted by Google moving to a paid translation model. True professionals do not use free or out-of-the-box translation solutions. Google’s business model does not fit well with LSPs and does not deliver the services which make LSPs professional. LSPs are not a market or customer demographic of significance to Google. Google’s customers are primarily advertisers, not content providers or other peripheral industries. While Google’s translation tools may give an initial impression that Google is serious about the language industry, the tools are in reality a thinly veiled cover over a professional crowd-sourcing initiative that delivers data and knowledge to Google under license that can then be used by Google to achieve greater advertising revenue and market share.

· Google Translate is a one-size-fits-all approach designed to give a basic understanding (or ‘gist’) of a document. This is insufficient in meeting the needs of the professional language services industry. What the industry needs are customized translation engines based around clean data, focused on a client’s specific audience, vocabulary, terminology, writing style and domain knowledge because this results in a document that is translated with the goal of publication and with reader satisfaction in mind.

· Google is not in the business of constructing data sets based on individual customer requirements or fine tuning to meet a customer’s specific need. The model of individual domain customizations is not economical for Google and, due to the human element required to deliver high-quality translation engines, this model does not scale even remotely close to other Google service offerings or revenue opportunities.

· Where enterprises have a real need for translation and a desire to use technology to help, expect some to try experimenting with open source. A small number of enterprises will succeed if they have sufficient linguistic skills, technical capability and data resources. Most will not. Others will try commercial machine translation technology. Out-of-the-box solutions will be insufficient, but those who invest the time and energy with commercial translation technology providers and LSPs to deliver higher quality output that is targeted for specific audiences and domain will be more likely to be successful in their machine translation efforts.

· Enterprises considering machine translation should ensure that machine translation providers and/or LSPs that they are working with will protect their data. Contracts should allow for the use of a customer’s proprietary data with third parties in order to deliver lower cost and faster services, but should ensure that the data is protected and not used for any purpose other than service delivery. It may be wise to ensure a legal sign off process from within your own organization before any third party service is used.

· Like Google, enterprises should protect how, when and by whom their data and knowledge are used. Translations, knowledge, content and ideas are all data that Google gains advantage from and leverages from third party and user efforts. Google does this legally by having users grant them an almost totally non-restrictive license. As Google states in its own highlights of its Terms of Service “Don’t say you weren’t warned.”

Detailed Analysis

And Then Came The Worst Kept Secret in the Translation Industry… Google Wants To Charge!

Somewhat predictably, Google has changed its public position and is now going to charge for access to the Translate API. Announcing the shutdown may have been nothing more than a marketing ploy as there are clear indications that Google was intending to charge all along.

On June 3, Google’s APIs Product Manager, Adam Feldman announced the following with a small edit to the top of their original blog post:

“In the days since we announced the deprecation of the Translate API, we’ve seen the passion and interest expressed by so many of you, through comments here (believe me, we read every one of them) and elsewhere. I’m happy to share that we’re working hard to address your concerns, and will be releasing an updated plan to offer a paid version of the Translate API. Please stay tuned; we’ll post a full update as soon as possible.”

The reason the announcement came as no surprise is very simple – Google already has paid models for the Custom Search API, Google Storage API and the Prediction API via the API Console (https://code.google.com/apis/console/).

Google Translate API V2 has been listed in the API Console for a number of months already and offers a limit of 100,000 characters (approximately 15,000 words) per day. There is also a limit of 100.0 characters per second. While there is a link for requesting a higher quota, clicking on the link currently presents the following information:

Google Translation API Quota Request

The Google Translate API has been officially deprecated as of May 26, 2011. We are not currently able to offer additional quota. If you would like to tell us about your proposed usage of the API, we may be able to take it into account in future development (though we cannot respond to each request individually). In the mean time, for website translations, we encourage you to use the Google Translate Element.
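To put the quota figures quoted above into perspective, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the constants come from the API Console listing described in this post, while the function names are my own and are not part of any Google API.

```python
# Quota figures as quoted from the API Console listing in this post.
DAILY_CHAR_LIMIT = 100_000     # characters per day
RATE_LIMIT_CPS = 100.0         # characters per second
APPROX_WORDS_PER_DAY = 15_000  # the post's own word estimate


def seconds_to_exhaust_quota(daily_limit: int, chars_per_second: float) -> float:
    """Time needed to use up the daily quota when sending at the maximum rate."""
    return daily_limit / chars_per_second


def implied_chars_per_word(daily_limit: int, words: int) -> float:
    """Average characters per word implied by the two daily figures."""
    return daily_limit / words


def days_needed(total_chars: int, daily_limit: int) -> int:
    """Whole days required to push a job through the daily quota (rounded up)."""
    return -(-total_chars // daily_limit)


if __name__ == "__main__":
    secs = seconds_to_exhaust_quota(DAILY_CHAR_LIMIT, RATE_LIMIT_CPS)
    print(f"Daily quota used up in {secs:.0f} s (about {secs / 60:.1f} minutes) at full rate")
    ratio = implied_chars_per_word(DAILY_CHAR_LIMIT, APPROX_WORDS_PER_DAY)
    print(f"Implied average of {ratio:.1f} characters per word")
    print(f"A 1,000,000 character job needs {days_needed(1_000_000, DAILY_CHAR_LIMIT)} days")
```

In other words, at the stated rate limit the daily allowance could be consumed in well under an hour, which is why the free quota is of little use for professional volumes.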

For those who choose to respond, be prepared to reveal potentially sensitive information to Google. The form presented asks for a company profile, number of employees, expected translation volume per week and a field to tell Google how you intend to use the Translate API. I do not believe that Google offers any real privacy guarantees on much of the data it collects, and is in essence crowd-sourcing for interesting innovations and use of its own API. The Terms of Service at the bottom of the page include a very interesting clause:

By submitting, posting or displaying the content you give Google a perpetual, irrevocable, worldwide, royalty-free, and non-exclusive license to reproduce, adapt, modify, translate, publish, publicly perform, publicly display and distribute any Content which you submit, post or display on or through, the Services.

So quite simply – be careful. You are giving Google your ideas and at the same time granting them the right to do pretty much anything they wish with it. One could argue, as others have done in the past, that this type of broad legal permission is required by an operator such as Google in order to operate its network in a reasonable manner. Even if that is so, it does not give LSPs any comfort with respect to their confidential client data.

Payment for services managed by the API console is via Google Checkout and all Google needs to do now is publicly set a price for a specified number of characters and turn the billing function on in the API console. Meanwhile they have had a number of “developers” testing the API (for free) and ironing out any issues since the launch of the Translate API V2.

Given that the Translate API V2 is already tested and in use, billing and quota management features are already available in the API console, and the API console allows for business registration and authentication, it would appear that Google’s initial announcement that it was shutting down the Translate API was little more than a marketing stunt designed to bring attention to the Translate API ahead of the change to a fee based model.

What Does Google Achieve By Charging and Managing the Google Translate API by the API Console

· Control: Google can now control who can and cannot use the API in addition to how much the API is used and at what speed. This solves nearly all of the abuse problems that were discussed in the prior blog post on this topic. Control will most likely be at a level of an individual or at the level of a company, but not at the level of software products. Products will adapt to allow the user to enter their own key and be billed by Google directly.

When you sign up for the paid Translate API or purchase translation capacity via Google Checkout, you are acknowledging the Google Terms of Service. Google has a much more explicit commitment from you and knows who you are. If the Terms of Service are abused in any way, Google has the means to track the use and take the appropriate legal action. This will most certainly have a significant impact on the “polluted drinking water” problem.

· Revenue: It does give Google some revenue. However, in comparison to other revenue streams such as advertising, this is likely to be insignificant. It is unlikely that Google will offer post-pay options, so users should expect to pay in advance using Google Checkout.

· Blocking Free Services: Developers that would have (or have already) built applications that offer free translation will cease to use the Google Translate API. In reality, these applications offer little value-add to users and this feature is offered by Google in other tools. Free applications that integrate Google Translate within competitors’ products such as Facebook and Apple will cease to exist, giving Google products such as Android a competitive advantage by being the exclusive developer of products that leverage its translation service without a charge to the user. If so, is this potentially anti-competitive?

Google may wish to keep some free third-party applications around in order to give the perception that it is encouraging innovation and to gather ideas for its own use of the Translate API. It would therefore come as no surprise if Google offered a smaller number of words for free and possibly even required individual users to log in using a Google User ID, in order to control not just the application’s use of the Translate API, but also the individuals who use the application. By requiring individual users of an application to log in, tracking is extended to an individual level and blocking one errant user will not block an entire application.

· Blocking of Abusive Applications: As Google has control over who accesses the API and Google is also charging, it will no longer be economical to mass translate content in an attempt to build up content for Search Engine Optimization (SEO).

· Encourage Value Add Applications: Developers that have created a true value-added product (i.e. a translation management platform) where the Google Translate API is a component of the overall offering, but not a main feature, will gain from there being less competitive noise in the market place. Google wants to be seen as empowering innovation. User perception of innovation can be further expanded when Google’s technology is embedded in other innovative products. Commercial products of this nature are often expensive and often used by larger corporations. Customers who use their products may be required to get their own access key unless they have a billing agreement with the service provider. This provides yet another mechanism for Google to create a commercial relationship with enterprises.

Impact on the Professional Language Industry

There are many different segments within the professional language industry that are impacted by Google’s decisions about translation technology.

Impact on Machine Translation Providers

Those who offer machine translation free of charge using Google as the back-end will cease to exist unless they are able to generate an alternative revenue stream or other value-added features that users are willing to pay for. Those who offer free translation using other non-Google translation technology will likely see an increase in traffic to their sites as the Google-based providers start to vanish. Google will experience an increase in users going directly to their translation tools instead of via third-party websites.

Anecdotally, Asia Online has seen a considerable increase in inquiries from companies that have a commercial use for machine translation since the Google Translate Shutdown announcement. It is expected that other machine translation providers have seen a similar rise in interest.

There has been some speculation that machine translation providers may increase their prices as a result of the Google announcement. However, this is unlikely. Most offerings are relatively low cost, especially in comparison to large scale human translation costs. Asia Online views the change in Google’s translation strategy as an opportunity to stand above the crowd and demonstrate how customized translation systems can significantly outperform Google in terms of quality.

Impact on Open Source Machine Translation

Open Source machine translation projects will see some additional interest, but implementing these technologies is not at all simple and well beyond the technology maturity of many language industry developers and organizations. There are many open source translation platforms, and they vary in their underlying technique. These include rules-based, example-based and statistical-based machine translation systems. Most of these systems are not intended for real world commercial use, and many open source initiatives are part of ongoing research and development at universities. These are mostly academic development systems and have not been designed nor were they ever intended for commercial projects.

One of the most popular open source machine translation projects is the Moses Decoder, which is a statistical machine translation (SMT) platform that was originated by Asia Online’s Chief Scientist, Philipp Koehn, with continued development from a large number of developers researching natural language processing (NLP), including Hieu Hoang, who also recently joined Asia Online.

In addition to the complexity of building an SMT translation platform such as Moses, expertise in linguistics is required to build pre and post-processing modules that are specific to each language pair. But the biggest barrier to building out an SMT platform such as Moses is simply the lack of data. While there are publicly available sets of parallel corpora (collections of bilingual sentences translated by humans), such parallel corpora are usually not in the right domain (subject or topic area) and are usually insufficient in both quantity and quality to produce high quality translation engines.

Many companies will try open source machine translation projects, but few will succeed. The effort, linguistic knowledge and data required to build a quality machine translation platform are often underestimated. As an example, many of Asia Online’s translation engines now have tens of millions of bilingual sentences as data to learn from. For more complex languages, statistics alone are not sufficient. Technologies that perform additional syntactic analysis and data restructuring are required. Every language pair combination has unique differences and machine translation systems such as Moses accommodate very few of the nuances of each language pair.

Even Google does not handle some of the most basic nuances for some languages. As an example, if you translate a Thai date that represents the current year of 2011, it will be translated from Thai into English in its original Thai Buddhist calendar form of 2554. (e.g. “Part 2 of Harry Potter and the Deathly Hallows film will be released in July 2554”). For languages like Chinese, Japanese, Korean and Thai, additional technologies are required in order to separate words as there are no spaces between words as there are in romanized languages. In Thai, there are not even markers that indicate the end of a sentence. Most commercial machine translation vendors have not yet invested in the necessary expertise required to process more complex languages. As an example, SDL Language Weaver does not even try to determine the end of a sentence in Thai and simply translates the entire paragraph as if it is one long sentence. If commercial machine translation vendors are so far unable to conquer some of these complex technical tasks, it is not realistic to expect that the experimental ambitions of even sophisticated enterprises will be enough to be successful.
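The Thai date example above comes down to a fixed offset: the Thai Buddhist Era year is the Gregorian year plus 543, so 2554 corresponds to 2011. A minimal Python sketch of the kind of post-processing step that would handle this is shown below. It is an illustration only; the function names and the year range used to decide what counts as a Buddhist Era year are my own assumptions, and a production system would need context to avoid rewriting numbers that are not years.

```python
import re

# Thai Buddhist Era (BE) year = Gregorian (CE) year + 543.
BE_OFFSET = 543


def buddhist_to_gregorian(year_be: int) -> int:
    """Convert a Thai Buddhist Era year to the Gregorian year."""
    return year_be - BE_OFFSET


def normalize_years(text: str, low: int = 2400, high: int = 2700) -> str:
    """Rewrite four-digit numbers in [low, high] as Gregorian years.

    The range is a hypothetical heuristic for spotting BE years in
    translated output; any other number is left untouched.
    """
    def replace(match: re.Match) -> str:
        year = int(match.group())
        if low <= year <= high:
            return str(buddhist_to_gregorian(year))
        return match.group()

    return re.sub(r"\b\d{4}\b", replace, text)


if __name__ == "__main__":
    raw = "Part 2 of Harry Potter and the Deathly Hallows film will be released in July 2554"
    print(normalize_years(raw))
```

Even a simple rule like this has to be written, tested and maintained per language pair, which is exactly the type of effort that generic systems tend to skip.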

The language industry has been active in building open source technologies that convert various document formats into industry standard formats such as TMX or XLIFF. However, these tools, while improving, still leave much to be desired, and using them often results in format loss. With the demise of LISA, the XLIFF standard is gaining traction faster than ever before, but there remain many disagreements and incompatibilities within the XLIFF development community that are unlikely to be resolved in the near term. Companies like SDL continue to “extend” the XLIFF standard, as they did with the TMX standard, by modifying the standard into a proprietary format that is not supported by other tools.

Claims will be made that the standard does not support their tools’ requirements. But the reality is that these requirements can be supported within the extensions to XLIFF without breaking the actual standard format, and the real reason for modifying or “enhancing” the standard is vendor lock-in, a familiar occurrence in the history of software.

There will be an increase in development activity of “Moses for Dummies” or “Do it yourself MT” type projects. These kits will try to dumb down the installation and will mostly be offered as open source. While this will allow for the installation of Moses to be streamlined, it will not resolve many critical technical or linguistic issues and most importantly will not resolve issues relative to data volume or quality. Without a robust linguistic skill-set, knowledge of Natural Language Processing (NLP) techniques and vast volumes of data, this is still a daunting challenge. Unfortunately most users will not have such skills, and through attempting this approach will learn a time consuming and often costly lesson. If high quality machine translation were as easy as getting the install process for open source solutions right, these tools would have been built long ago and many companies would already be using them and offering show cases of their high quality output.

These attempts may result in organizations turning to commercial machine translation providers. If a company is willing to invest in trying to build their own machine translation platform, they most likely have a real business need. If these companies fail in using open source machine translation software, the need may be filled by commercial machine translation providers once the experimentation phase with open source machine translation ends, with a portion of the work of data gathering already complete and a customer who understands technical aspects to some degree.

Impact on Language Tools

Expect to see tool vendors like SDL and Kilgray updating their commercial products to support Google’s Translate API V2 and adding features around purchasing, cost management and integration of Google Checkout functionality.

Users of these products will most likely have to get their own access key from Google and will need to set themselves up for Google Checkout. It would be reasonable to assume that Google will update the Translate API V2 to include a purchase feature so that applications that embed Google Translate can integrate the purchase process directly into their workflow and processes.

But updating language tools to support billing is not the end of the story. Current processes, such as the two examples below, will need to be updated:

· Pre-translating the entire document: A translation memory should always be used to match against previously translated material.

· Mixed source reviews: Some systems provide the ability to show the translation memory output and the machine translation output beside each other.

Both these processes, while useful, can become very expensive if system processes are not updated and such translations occur automatically without authorization of the user. Software updates to manage these processes more effectively will be important.
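As a rough illustration of the kind of safeguard this implies, the Python sketch below checks the translation memory first, estimates the cost of the remaining segments, and refuses to call the paid service unless the estimate fits within a budget the user has approved. Everything here is hypothetical: Google has not yet published a price, so `price_per_char` is a placeholder, `mt_translate` stands in for whatever paid service a tool integrates, and the translation memory is reduced to exact matching for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class BudgetExceeded(Exception):
    """Raised when the estimated MT cost is above the approved budget."""


@dataclass
class PretranslationPlan:
    from_memory: Dict[str, str]  # segments answered by the translation memory
    needs_mt: List[str]          # unique segments that would be sent to paid MT
    billable_chars: int
    estimated_cost: float


def plan_pretranslation(segments: List[str],
                        translation_memory: Dict[str, str],
                        price_per_char: float) -> PretranslationPlan:
    """Match against the translation memory first; only the rest is billable."""
    from_memory: Dict[str, str] = {}
    needs_mt: List[str] = []
    for segment in segments:
        if segment in translation_memory:
            from_memory[segment] = translation_memory[segment]
        elif segment not in needs_mt:  # do not pay twice for a repeated segment
            needs_mt.append(segment)
    billable = sum(len(segment) for segment in needs_mt)
    return PretranslationPlan(from_memory, needs_mt, billable, billable * price_per_char)


def pretranslate(segments: List[str],
                 translation_memory: Dict[str, str],
                 price_per_char: float,
                 budget: float,
                 mt_translate: Callable[[str], str]) -> Dict[str, str]:
    """Translate segments, calling paid MT only if the estimate is within budget."""
    plan = plan_pretranslation(segments, translation_memory, price_per_char)
    if plan.estimated_cost > budget:
        raise BudgetExceeded(
            f"{plan.billable_chars} billable characters would cost "
            f"{plan.estimated_cost:.2f}, above the approved {budget:.2f}"
        )
    result = dict(plan.from_memory)
    for segment in plan.needs_mt:
        result[segment] = mt_translate(segment)
    return result
```

A tool built along these lines would show the user the plan before anything is sent, so that no characters are billed without explicit authorization.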

As more companies try to leverage open source machine translation technologies, vendors such as NoBabel and Terminotix that provide tools that extract, format, clean and convert data into translation memories may see an increase in their business.

Impact on Language Service Providers (LSPs)

LSPs who use software from SDL, Kilgray, Across and other tools that integrate with the Google API should be prepared to update their software and learn new processes to ensure that they do not get billed inadvertently for machine translation that was sent to Google without consideration for costs. If the software is not updated, the Google Translate function will simply stop working on December 1, 2011 when Google terminates the Google Translate V1.0 API.

LSPs may still use the Google Translator Toolkit or copy and paste content into the Google Translate web page, but should be very aware of the relevant privacy and data security issues.

Overall, there should be little impact to LSPs, with the exception of those who were using Google Translate behind the scenes or offering machine translation to customers using Google as the back-end. While this was a breach of Google’s Terms of Service, I believe a number of LSPs were doing this.

Due to insufficient and unclear warnings, or in some cases no warnings at all, it is understandable that some may not have been aware of or have not fully understood Google’s and other machine translation service providers’ Terms of Service. However, with the Google Translate V2 API, users of the service must expressly sign up and agree to the terms. It will no longer be reasonably possible to claim ignorance of the terms or of the associated risks for customer data when it is submitted to the API.

LSPs are focused on delivering quality services with the higher-end skills required for translating and localizing content for a target market and then ultimately publishing to a market. Without a doubt, machine translation has a significant role in the future of translation, in particular for accelerating production and giving LSPs access to new markets of mass translation. But high quality translation systems that meet the needs of LSPs and ultimately their customers will require customized translation engines that are focused on a much narrower domain of knowledge than Google’s engines and are ultimately combined with both a human post editing effort and a human feedback cycle that continually improves the engine by giving high-quality human driven input back to the engine. It is a built for purpose machine and human collaboration that will ultimately deliver to the end customer’s needs, not just machine translation alone.

Impact on Corpus Providers

Industry organizations such as TAUS may gain some traction from the short term increase in demand for data while companies experiment with open source machine translation. However, as research has clearly shown, data of uneven quality gathered from a variety of sources can actually reduce machine translation quality.

In 2008, Asia Online participated in a study with TAUS, during which Asia Online built 29 translation engines using its own data in various combinations with the data of 3 TAUS members. In the resulting report the impact of each data set can be seen clearly. But another factor was the cleanliness of the data. The study clearly showed that having more data can provide some improvements in quality, but if the source data is not cleaned and processed correctly, the quality of the data can cause considerably lower translation output quality. It was also shown that smaller amounts of clean source data can produce better quality output results than using source data sets even two times larger. TAUS has recently worked to improve the quality of their data and the data continues to slowly improve. But in reality, the human effort required for such a task is considerable and not practical on a large scale without the right linguistic expertise and tools in each language.
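To give a sense of what even the most basic data cleaning involves, here is a minimal Python sketch that applies a few common heuristics to a list of bilingual sentence pairs. It is not the process used in the study above; the function name, the chosen filters and the length ratio threshold are all assumptions made for illustration, and real cleaning requires far more language-specific work.

```python
from typing import Iterable, List, Tuple

SentencePair = Tuple[str, str]


def clean_parallel_corpus(pairs: Iterable[SentencePair],
                          max_length_ratio: float = 3.0) -> List[SentencePair]:
    """Apply simple, language-independent filters to bilingual sentence pairs.

    Drops pairs with an empty side, pairs where source and target are
    identical (likely untranslated), pairs whose character lengths differ
    by more than max_length_ratio (likely misaligned), and exact duplicates.
    """
    seen = set()
    cleaned: List[SentencePair] = []
    for source, target in pairs:
        # Normalize whitespace so trivially different copies compare equal.
        source = " ".join(source.split())
        target = " ".join(target.split())
        if not source or not target:
            continue
        if source == target:
            continue
        ratio = max(len(source), len(target)) / min(len(source), len(target))
        if ratio > max_length_ratio:
            continue
        if (source, target) in seen:
            continue
        seen.add((source, target))
        cleaned.append((source, target))
    return cleaned
```

Filters like these only remove the most obvious noise. Judging whether a translation is actually correct and in the right domain still needs the linguistic expertise described above.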

This analysis is continued in Part II

12 comments:

Anonymous, July 1, 2011 at 11:54 AM
A pretty clear analysis there. On the matter of LSP's & end users doing their own thing I expect there will be more packages like Do Moses Yourself (DoMY) from Precision Translation Tools, which I recently downloaded and got up & running in a couple of hours. A good way to build customised engines.
Terence Lewis

Jean-Marie Le Ray, July 1, 2011 at 11:25 PM
Well, we've been waiting for the follow-up for one month, but it's definitely worth the wait!

Dion Wiggins, July 3, 2011 at 9:22 PM
@Terence - It seems that you may have missed a key point that I made in the post: If creating a high quality customized translation system was as simple as just getting the installer working, this would be standard and everyone would do it. Those that I know creating "MT for dummies" type packages all create them in the hope of getting consulting work off the back of it to work on data and other parts of the customization process. Just having it installed and running is not a solution. The customization process to deliver high quality output, even when you have all the other technical and linguistic challenges resolved, takes weeks, not a few hours. Preparing the data, normalizing terms, setting non-translatable terms, defining glossary, removing poor quality data and many more tasks are needed. Even the training process takes several days to run through the statistical analysis of the data.

Anyone can be up and running with an out-of-the-box solution in a couple of hours. But as my blog post discusses in detail, this is not a high quality customized system - it is an out-of-the-box system that is not designed to meet any specific purpose. I suggest you read the post again to further explore the differences between a system that installs in a couple of hours and a custom system designed for a specific purpose.

Dion Wiggins, July 3, 2011 at 9:25 PM
@Jean-Marie - Thank you. This post took a lot more work than I had expected. I am happy with the final product. It still surprises me how many companies are willing to use third-party services without understanding the legal ramifications of doing so and the exposure that they risk for themselves and their companies. This applies to not just Google, but any third-party service. Hopefully this will make more LSPs aware and they can approach third-party services with better preparation and information.

Anonymous, July 5, 2011 at 4:41 AM
I think for a lot of translators and even many small and mid sized SLV’s and MLV's the whole topic is a bewildering example of technology being used by giant organizations to essentially wipe out the oldest industry in the world, for greed and profit.

On one hand for many in “translation” all this technology is hard to grasp and even more difficult to deploy and master without extensively employing IT professionals. So no matter how “FREE” it was beforehand, it was still a challenge, unless you were able to set it up economically.

On the other hand this technology exists and is here to stay, and while for the foreseeable (near) future true linguists will do a “better job”, how will linguists leverage this in their favour without significant capital input in either investment or resources, and compete and survive, in this cut-throat, competitive, cost-driven industry?

Will we all need to re-train our linguistically orientated minds towards IT and systems technology?, or, Will we be leveraged as copyeditors to the machine ! ?
Like not so long ago into the “DTP” market, where translators became editors to publishing and graphic artists as a skill set in place of translation, pushing the language into second place in favour of the new great hope?

This is a perfect example of the technologists wielding their trade (WAR) to their benefit and to the detriment of the artistically focused. It is so frustrating and bewildering to imagine further erosion of one of the oldest industries in the world and being washed into the gutter. It’s tantamount to economic genocide.

Kirti Vashee, July 6, 2011 at 11:37 AM
@anonymous

Thank you for your comment. While your perspective is one view of what is going on, there are also forces other than cost-cutting and greed driving the improvements in MT technology. From where I sit, I think the development motive has never been to hurt translators, though I admit some have used it in this way sometimes.

Machines have largely replaced humans for highly repetitive and mechanical work, and I think this may happen with translation of very low value (and repetitive) content related to selling products and other commercial purposes.

Most frequently MT is used for business content that has a limited shelf life (and value), and speed is more frequently the most important factor driving its use. Also, MT is often the only option for very high volumes of short lived content. MT has not been useful to translate literature and poetry and anything that has deep artistic value. This is unlikely to change in the future.

For another perspective on why MT matters and what other factors might be motivating the development of this technology, see: http://goo.gl/OupXi

I believe there is a real role for translators in the continuing development of this technology that goes far beyond being copy editors.

Given the nature of human language, it is unlikely that machines will ever truly replace humans in truly human endeavors and pursuits.
Alicia González, July 7, 2011 at 9:46 AM
A very exhaustive analysis. I hope this will help LSPs to see that Machine Translation (with or without Google's participation) will not take anybody out of the business. Thanks for sharing this article, Kirti!

Posted by Alicia González
Mark @ Odista Serbian-English, July 15, 2011 at 12:25 AM
A really excellent (and epic) analysis - it's going to take a while to digest all of it!

I think you very rightly recognise that Google was never really interested in the "translation industry", but rather GT has always been yet another tool for Google to leverage in increasing its user base and general domination of the Internet. I don't know if Google are still soliciting contributions of translation memories in order to help expand the base - in any case, the tool has hit a wall of diminishing returns for the main languages and no new data will significantly impact the quality of translation. I declined to yield up my hundreds of thousands of words of translated (often confidential) material, needless to say!

Anyway, the impact of these changes will be interesting to see - most of us will not miss the novelty tools that were set up around the Translate API largely for entertainment purposes. Those of us who were using the GT API directly or indirectly for commercial purposes will no doubt find a way to make this commercially viable on an ongoing basis.

And nobody will miss the avalanche of auto-translated junk web content that was created by the black-hat SEO fraternity for quick gain...
Dion Wiggins, July 17, 2011 at 8:10 PM
@Mark, thank you for your comments. With respect to Google soliciting contributions of translation memories, I believe this takes two forms. First, Google takes non-exclusive rights in any TMs uploaded or proofread via the Translator Toolkit. Second, Google has been paying LSPs to translate using Google Translate and then proofread. The second method is all well and good, but the first is where LSPs should take care, for the reasons that I explained in the blog post. Few LSPs realize that they are giving Google the rights to use the output of their post editing and also the input of what is to be translated (this is used for language modeling in SMT).
I agree that in many languages GT is hitting a wall. The reason is that Google has taken the dirty data approach. This approach takes all the data they can get and hopes that statistically the good data rises to the top. The problem with this approach is that sometimes bad data can be more statistically relevant. Displacing this bad data is very difficult and requires huge volumes of data. Also, as discussed in the blog post, there will always be bias towards a given domain such as finance, because there is more data available in some domains and less in others.
The clean data approach requires less data, but the data must be clean and in domain. This delivers much higher quality output focused on the target audience and vocabulary. The challenge in this case is finding the right data. Many LSPs have such data and are in an ideal position to use it. When an error occurs with clean data, it is most commonly for one of two reasons – unknown words (solution – add the word) and wrong word order (solution – add a small amount of monolingual data to support the grammar). Both can very quickly be addressed via post editing, and rapid improvement can be seen as a result.
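As a rough illustration of the first case (unknown words), the sketch below lists the words in a source segment that are missing from an engine's vocabulary, so they can be reviewed and added. It is only a sketch under stated assumptions: the function name `find_unknown_words`, the sample vocabulary and the sample segment are all hypothetical, and a real engine would use its own tokenization rather than this simple one.

```python
import re


def find_unknown_words(text, vocabulary):
    """Return the words in `text` that are not in `vocabulary`.

    Words are lowercased, each unknown word is reported once,
    and the order of first appearance is preserved.
    """
    seen = set()
    unknown = []
    # Hypothetical tokenization: runs of letters only (no digits or underscores).
    for token in re.findall(r"[^\W\d_]+", text.lower()):
        if token not in vocabulary and token not in seen:
            seen.add(token)
            unknown.append(token)
    return unknown


# Hypothetical in-domain vocabulary and source segment.
engine_vocabulary = {"the", "pump", "must", "be", "before", "use"}
segment = "The pump must be primed before use"

print(find_unknown_words(segment, engine_vocabulary))  # prints ['primed']
```

A post editor could review such a list and add the missing terms, with approved translations, to the glossary before the next round of engine improvement.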
The one-size-fits-all approach is not practical for LSPs. If it were, any human translator could translate anything. It puzzles me that people, even professionals in the industry, judge the quality of machine translation by what they can get out of the box or what free MT like Google offers, without doing any work to make it meet their needs. An MT system should be treated like a human translation project: it needs a glossary, a style guide, a list of non-translatable terms, etc.
Dion Wiggins, July 17, 2011 at 8:20 PM
@Alicia – I think you are correct – MT will not take away work from translators and will not take anybody out of business. In fact, in many cases where we work with LSPs, it is creating work for translators. Many projects were not viable using human translation only. But with a combination of MT and HT, the business case for many projects is now realistic. When it comes to such projects, there are three types of translation that occur:

• HT Only – where a machine is not suitable. Examples are the menu and core content that define the structure of a website, or cases where there is only a small amount of content in a particular style or domain, which does not justify the cost of customizing an MT engine for that domain.
• MT + HT – where there is substantial content, but the content must still be human quality.
• MT Only – where there is substantial content, and the quality can be slightly lower than human quality.

We are finding that a healthy mixture of the above is quite common and that, as a result, many enterprises are now looking to translate content that they would not previously have considered translating. There are new areas of work for MT post editors and also for business analysts who need to understand what kind of content fits each category. We have a number of customers with projects that would not have been cost justifiable and would not have been realistic in terms of time frame if they had only used humans.

In this way, the higher-end skills become even more valuable, while the lower-end skills become more frequently required. This is a win-win for the domain specialists and for those in the early stages of a translation career.
Multilizer / Niko, July 17, 2011 at 11:52 PM
I also agree with Alicia and Dion. In this age of globalization, more effective translation means that more and more translation will be bought. I think this situation is fully comparable to other industries that have experienced fast technological progress, e.g. the car industry about 100 years ago:
http://translation-blog.multilizer.com/machine-translation-friend-or-foe-for-human-translators/
lingobloke, November 8, 2011 at 9:22 AM
@Dion,
Sorry, I didn't spot your reply to me back in July. I certainly wasn't suggesting you could use a package like DoMY out of the box for professional purposes. Once you've installed it with the small ready-made translation and language models, you'll only be able to translate Eurospeak and will soon be asking "what do I do next?" Much of what you do next will involve everything that you described in your post. I certainly couldn't subscribe to the view that such packages "dumb down" SMT; they just take some of the pain out of the set-up stage. I have now built engines with both DoMY and the "traditional Moses" method and find the former approach just as intellectually demanding as the latter.
Terence