Wednesday, June 1, 2011

Analysis of the Shutdown Announcements of the Google Translate API


There has been some buzz about what this means to the translation industry, so I thought it would be good to have a detailed and in-depth analysis of this announcement. This is an insightful post by guest writer Dion Wiggins, CEO of Asia Online (dion.wiggins@asiaonline.net) and a former senior Gartner analyst. The opinions and analysis are those of the author alone.

Reviewing the Facts of the Announcement

A simple read of the announcement would lead one to believe that Google has merely shut the door for developers who wish to integrate automated Google translations into their code and products, while still allowing users to translate web content on-the-fly with either the Google Translate web page or the Google Translate Widget/Web Element. However there is more to this announcement than one might realize at first.

Google has recently made a number of formal announcements, in addition to a few quiet actions, affecting users of its various language translation tools. On Thursday, May 26, 2011 Adam Feldman, Google’s APIs Product Manager, announced in a blog post (http://googlecode.blogspot.com/2011/05/spring-cleaning-for-some-of-our-apis.html) that Google was adding 7 new Application Programming Interfaces (APIs), but that 18 Google APIs covering a variety of areas would first be deprecated and many then shut down. One of those APIs destined to be terminated is the Translate API.

Google updated the Google Translate API, Transliteration API and Translator Toolkit API webpages with the following messages:
Important: The Google Translate API has been officially deprecated as of May 26, 2011. Due to the substantial economic burden caused by extensive abuse, the number of requests you may make per day will be limited and the API will be shut off completely on December 1, 2011. For website translations, we encourage you to use the Google Translate Element.
Important: The Google Transliteration API has been officially deprecated as of May 26, 2011. It will continue to work as per our deprecation policy.
Google frequently offers alternatives to deprecated APIs. An alternative to the Google Translate API would have been the Google Translator Toolkit API. However, without making any announcement, Google has also quietly modified access to the Translator Toolkit API, removing all documentation and restricting access.
Important: The Google Translator Toolkit API is now a restricted API. However, we have no current plans to remove the functionality for current users. If you are a current user of the API or are interested in access to the documentation, please let us know.
These changes do not mean that Google is going to do any less with its own machine translation efforts. It simply means that all forms of translation API are now being progressively deprecated or restricted for use by developers. 

The impact on Google’s translation services as a whole can be summarized as follows:
  • Google Translate web page (http://translate.google.com/) will still translate text that is typed into the text box and will also translate an HTML web page when a URL is submitted.
  • Google Translate Widget/Web Element (http://www.google.com/webelements/#!/translate) will continue to function and will still translate content on-demand when a viewer of a web page requests a translation.
  • Google Transliteration API (http://code.google.com/apis/language/transliterate) will continue to function up until May 26, 2014 as per the deprecation policy.
  • Google Translator Toolkit (http://translate.google.com/toolkit) will still function as before and users can submit TMX and other document formats for translation.
  • Google Translator Toolkit API (http://code.google.com/apis/gtt/) will continue to function as before and documents can be submitted and retrieved as previously. However new development using this API has been restricted.

Abuse, Economic Burden and Google’s Right to Shut down the Translate API

Google has been offering the Translate API free of charge. The Terms of Use (http://code.google.com/apis/language/translate/terms.html) discuss deprecation of the service. In the terms Google states:
For a period of 3 years after an announcement (the "Deprecation Period"), Google will use commercially reasonable efforts to continue to operate the Deprecated Version of the Service

Google has however noted that it will continue providing the service only until December 1, 2011, as it considers that there is indeed a substantial economic burden. There is a conflict between the blog entry, which says that “Following the standard deprecation period – often, as long as three years – some of the deprecated APIs will be shut down.” and the December 2011 date. The two statements are contradictory and bring into question the extent to which users can rely on Google’s Terms of Use.
Google has worded the announcement carefully so as to allow the use of the following clause:
Google reserves the right in its discretion to cease providing all or any part of a Deprecated Version of the Service immediately without any notice if:
d. providing the Deprecated Version of the Service could create a substantial economic burden on Google as determined by Google in its reasonable good faith judgment; 

While Google may be within its rights to shut down the service if there is indeed a substantial economic burden, lack of clarity means that Google risks further upsetting, confusing and frustrating its users and developer community by not being fully transparent on the reasons for the decision.

The vagueness of the announcement and the lack of information on the rationale behind it are already giving rise to speculation as to what the actual reasons could be:

Substantial Economic Burden: The amount of text translated via Google’s Translate API is believed to be only a fraction of the volume handled by other means of translation provided by Google, such as the web interface or the Google Translate Widget/Web Element. Costs incurred by Google would include bandwidth and processing capacity, but the Translate API expenses overall would be minuscule when compared to those of Search, YouTube, Gmail and Google Apps. Therefore we can deduce that the substantial economic burden is not related to the operational costs of the API itself.

Extensive Abuse: This could be interpreted as users of the API not following the Terms of Use (http://code.google.com/apis/language/translate/terms.html) of the Google Translate API. Abuses would include activities such as using the Translate API for commercial purposes in a manner that Google deems in violation of its Terms of Use or incorporating output from the Translate API in website content.

Deeper Analysis

Google’s stated mission is to “organize the world's information and make it universally accessible and useful.” Language translation is one method for the creation of content for this purpose that helps achieve this mission. However, with this announcement, Google is making it very clear that it reserves the right to exclusively use this method within its own applications such as the Translate Widget/Web Element. Google wants to control how and when content is translated into another language and by whom.


In order to better understand the announcement and analyze possible reasons for Google shutting down the API, it is important to understand Google’s products and customers. Google’s primary revenue source is advertising driven by content and the contextual analysis of said content.

In this model, Google’s customers are advertisers who purchase ads, not users of Google services. Google’s products are not really search, translation or Gmail – these are tools that Google offers to users. In fact, Google’s product that it is selling to its advertising customers is the large number of users of various Google tools. 

When Google was a young company, users adopted the new search tool rapidly because it provided superior quality search results over alternatives. Increasingly Google is now being challenged by other search tools, especially in non-English markets. In addition Google is frequently criticized for delivering lower quality content in its results than alternative offerings in the Search arena. 

On February 24, 2011, Google’s official blog discussed many of the issues that it faced in delivering high-quality results in a blog entry entitled “Finding more high-quality sites in search” (http://googleblog.blogspot.com/2011/02/finding-more-high-quality-sites-in.html). The post states:
Our goal is simple: to give people the most relevant answers to their queries as quickly as possible. This requires constant tuning of our algorithms, as new content—both good and bad—comes online all the time.
But in the last day or so we launched a pretty big algorithmic improvement to our ranking—a change that noticeably impacts 11.8% of our queries

This was followed by a major algorithmic update announced on April 11, 2011 in a blog post entitled “High-quality sites algorithm goes global, incorporates user feedback”, which was more commonly known as the “Farmer Update”. The aim of this update of the Google search algorithm was to downplay the influence of mass-produced, low-quality content specifically created for Search Engine Optimization (SEO) purposes and also to reduce the rankings of search results associated with content farms such as Demand Media. The update impacted a further 12% of Google US search queries and was initially only applicable to English language content, but was followed shortly afterwards by the “Panda Update” for non-English content.

It is clear by these actions that Google has realized that while it has significant market dominance in search, emerging competitors are starting to gain ground by delivering higher quality results, just as Google did when it first launched its search tool many years ago.

Google remains the undisputed leader in most major markets. However there are some notable exceptions, which include Russia, China, Japan and South Korea. In these markets, local search operators such as Yandex, Baidu, Yahoo! Japan and NHN respectively have dominant market share. It is easy to blame local factors (such as governmental policy or influence) or other restrictions for lower market share. But Google is in reality playing catch-up in many of these non-English markets. Local search operators have both deep market insights and an even deeper understanding of their own language and culture. These advantages have allowed these companies to deliver higher quality search results in their local language targeted at specific local audiences.
In order for Google to deliver high quality search results, Google relies on high quality web content. But the forces of globalization may actually be leading to lower quality web content through automated translation without the human post-editing process that would ensure quality. With the rapid expansion of the Internet and globalization for many companies, enterprises and website owners have been increasingly translating their content from English and other languages into the languages of new markets across the world.

Translating content professionally is both slow and expensive. Depending on the domain of translation and the language pairs being translated, professional translation can cost as much as US$0.50 per word for a language such as Japanese. For European languages, costs typically range from US$0.08 to US$0.20 per word. For many publishers, this translation expense is too high and cannot be justified. Human translators typically translate at around 2,000-3,000 words per day. Rapid translation of time-sensitive content using machine translation is an alternative. Being first to market with new information can be a significant competitive advantage and bring significantly greater advertising revenue to online publishers.
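To put those figures in perspective, here is a rough back-of-the-envelope sketch. The per-word rates and daily throughput come from the paragraph above; the 100,000-word site size is purely a hypothetical chosen for illustration, and the calculation assumes a single translator working alone.

```python
from typing import Tuple


def translation_estimate(words: int, rate_per_word: float,
                         words_per_day: int) -> Tuple[float, float]:
    """Return (cost in US$, working days) for one human translator."""
    return words * rate_per_word, words / words_per_day


# Hypothetical site size, not a figure from the article.
SITE_WORDS = 100_000

# Cheapest and fastest case versus most expensive and slowest case,
# using the European-language rates and throughput quoted above.
low_cost, fast_days = translation_estimate(SITE_WORDS, 0.08, 3000)
high_cost, slow_days = translation_estimate(SITE_WORDS, 0.20, 2000)

print(f"Cost: US${low_cost:,.0f} to US${high_cost:,.0f}")
print(f"Time: {fast_days:.0f} to {slow_days:.0f} working days")
```

Even under these simplified assumptions, a mid-sized site runs to thousands of dollars and more than a month of work per target language, which helps explain the appeal of a free machine translation API.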

It would therefore be no surprise that some online publishers are abusing the free Google Translate API to translate content and then publish local language content to complement existing content that has been created by human translators or authored in the local language. This is a common technique that SEO companies have applied to bring more users to a website and then in turn link through to premium content.
Google’s Terms of Use state clearly that this use of the Translate API is not permitted and there are very good reasons for doing so beyond the financial cost of bandwidth and processing power.

Internet Stats (https://supplygem.com/internet-usage-statistics/) estimates that more than 50% of Internet users, more than 2 billion people, come from Asia, with continued rapid growth expected. Other non-English speaking markets in Europe, Africa and the Middle East are also growing quickly. Like the websites that are abusing the Google Translate API to deliver local language content and gain users in global markets, Google also aspires to compete more effectively in many non-English markets and grow its worldwide user base.

Google’s ultimate product for its advertising customers is the expansion of its ability to secure large volumes of users in non-English language markets to complement its English centric origins where it has already established a dominant market share.

In order to achieve this aim Google must control the means of access and the quality of the content. Google’s Translate Widget/Web Element offers a real-time translation alternative to the Translate API, with many additional benefits for Google: it gives Google not only this control, but also control of who accesses translation functionality. It is therefore no surprise that Google is deprecating the Google Translate API in favor of translation methods that it controls directly.

Advantages and Disadvantages of Google’s Strategy
  • Content that is translated and published via the Google Translate API and then stored by publishers reflects the quality of Google’s translation technologies at the time the translation was performed. In many cases, this is sometime in the past (most users of the API create static content that is not updated often), and does not reflect the improvements in translation quality as Google updates its translation technologies on a continuous basis. When this old content then ranks highly on a Google search result, users may be frustrated at the lower quality result. 
  • User frustration at lower quality local language content is only a part of the issue. Advertisers want to target their advertisements onto high-quality local language content that will in turn drive users to click on their advertisements. Lower quality sites have a negative impact on the click-through rate that disappoints Google’s customers – the advertisers.
  • By shutting down the Translate API, Google forces web publishers to find an alternative translation tool or use the Google Translate Widget/Web Element. Using the Google Translate Widget/Web Element has the advantage that the user can still see the content translated into their local language, but Google’s crawlers do not see this content as it is only translated for users on demand on a one-time basis. On demand translation also means the user is seeing the most updated output from Google’s latest translation technology that will usually be better than an older translation.
The last benefit above is huge, and it is the most likely, but not the only, reason for shutting down the Google Translate API.

Polluting Its Own Drinking Water
Google crawls and gathers data from many sources. In turn this data is used for a variety of purposes. In order to deliver high-quality search and local language content results, Google needs high-quality data. In recent times, it can be assumed that an increasing amount of the website data that Google has been gathering has been translated from one language to another using Google’s own Translate API. Often, this data has been published online with no human editing or quality checking, and is then represented as high-quality local language content. Google represents that data in its search results and also integrates this mix of local language content into tools such as Google Translate. 

It is not easy to determine if local language content has been translated by machine or by human or perhaps whether it is in its original authored language. By crawling and processing local language web content that has been published without any human proof reading after being translated using the Google Translate API, Google is in reality “polluting its own drinking water.” By indexing local-language content translated in this manner, Google delivers a mix of very different quality local language search results, which are often frustrating for users in many parts of the world.

This problem only gets worse when you consider that the same data that Google crawls for indexing websites is also used to improve its language translation technologies. Using the technique of statistical machine translation (SMT), Google relies on huge quantities of local language content sourced from its crawlers of the web to continually improve and enhance its language translation software.

The higher the quality of input to this training process, the higher the quality of the translations the resulting engine can produce. So the increasing amount of “polluted drinking water” is becoming more statistically relevant. Over time, instead of improving each time more machine learning data is added, the opposite can occur. Errors in the original translation of web content can result in good statistical patterns becoming less relevant, and bad patterns becoming more statistically relevant. Poor translations are feeding back into the learning system, creating software that repeats previous mistakes and can even exaggerate them. This results in potentially lower quality translations over time, rather than improvements.

One of Google’s key differentiators has been its ability to efficiently and effectively process extremely large volumes of data. While Google has not publicized how much data it has gathered to train its SMT engines, various articles indicate that Google has scanned about 11% of all printed content ever published. Google has access to widely varying volumes of data depending on the language pair involved. While there are massive amounts of online language content for some languages (such as Tier 1 languages which include English, Spanish, Chinese, French, Japanese, etc.), this is not true for the vast majority of languages in the world. In general, the glass ceiling of data limitations is relatively low for many languages. While more data is becoming available online every day, the challenge for Google is getting sufficient data volumes for Tier 2 and Tier 3 languages in order to reach a quality level that is acceptable for users.

But even for Tier 1 languages, Google is facing a significant data glass ceiling. In a Guardian newspaper article entitled “Can Google break the computer language barrier?”, Google’s Andreas Zollmann discussed the data glass ceiling issue and stated that “Each doubling of the amount of translated data input led to about a 0.5% improvement in the quality of the output.” He makes a very important point about the limits of Google’s approach: “We are now at this limit where there isn't that much more data in the world that we can use, so now it is much more important again to add on different approaches and rules-based models.”

Putting this in context, each time Google doubles the data, it gets diminishing returns. If Google doubles the data 3 times (11% x 2 = 22%, 22% x 2 = 44%, 44% x 2 = 88%) it quickly reaches the limit of data that it can collect. But despite this vast volume of data, the quality improvement is just 1.5% (3 doublings at about 0.5% each). Kirti Vashee blogged on this topic back in January this year when Google first publicly discussed the data glass ceiling issue.
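The arithmetic above can be checked with a few lines of code. This is only a sketch of the article's own simplification: it takes the 11% starting share and the 0.5% gain per doubling at face value and simply adds the gains together, which is an assumption rather than a model of how translation quality actually scales.

```python
# Figures quoted in the text; treating them as exact is an assumption.
START_SHARE = 11.0        # percent of all printed content already scanned
GAIN_PER_DOUBLING = 0.5   # percent quality improvement per doubling of data


def doublings_until_ceiling(start_share: float, ceiling: float = 100.0) -> int:
    """Count how many times the share can double without exceeding the ceiling."""
    count = 0
    share = start_share
    while share * 2 <= ceiling:
        share *= 2
        count += 1
    return count


n = doublings_until_ceiling(START_SHARE)
total_gain = n * GAIN_PER_DOUBLING
print(f"{n} doublings, total quality gain of about {total_gain}%")
# prints "3 doublings, total quality gain of about 1.5%"
```

A fourth doubling would require 176% of all printed content, which is why the ceiling is reached after only three.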

So Why is Google Shutting Down the Translate API?
What Google did not anticipate was the extent of abuse of the Google Translate API in a manner prohibited by its Terms of Use. This has resulted in such a significant mass of poorly translated content that the impact on Google’s core search business is notable and poses a significant threat to the quality of Google’s search results and the quality of its future translation initiatives. Given how important search and translation are to Google’s current and future business, this is most likely the “substantial economic burden” and “abuse” that Google refers to in its shutdown announcement. With this realization, it makes sense that Google is taking action to rectify the problem.

Possible Additional Reasons for Shutting Down the Translate API

Google’s market-beating revenue growth in 2010 can be attributed to three key business pillars:
  • Search – Google’s core revenue stream
  • Video – Short-term revenue
  • Mobile and Google Apps – Long-term revenue
Each pillar in turn has already integrated machine translation and Google is expected to add further functionality in future as demand expands further beyond English only content. 
The one thing that Google has most notably not yet been successful in, despite several attempts, is social networking. Facebook’s rapid rise and entrance into the online advertising space means that there is now a real competitor for online advertising dollars beyond Google’s AdSense. 
Many of Google’s users are already Facebook users. Facebook has already mastered the use of crowdsourcing. With the Google Translate API arguably offering the best translation quality of any of the free translation tools, developers had already created products that integrate Google’s Translate API with the Facebook API to deliver a bridge between languages for Facebook users.

With Microsoft as an investor in Facebook and Facebook being a significant threat to Google’s market share for both users (product) and advertising (customers), helping Facebook become even more popular via the free use of Google Translate API is certainly something that Google would not find desirable.

What About Google’s Software Developer Community?
Google has come to understand the strategic benefits of limiting the reach of machine translation, but cannot limit it to some (i.e. Facebook) and keep it open for others. By shutting down the Translate API to developers, Google is now the only software developer that can develop applications using Google Translate. 

By allowing use of the Google Translate API, Google has successfully seeded the market with applications that leverage machine translation as a core function within a product. There will be a veritable smorgasbord of great applications that no longer function as a result of the Translate API shutdown, from which Google can take the best elements and launch its own products without competition.

This is going to be particularly important in the browser and mobile application space.
  • Web Browsers: Google Chrome already has embedded foreign language auto-detection and translation of content. Similar third-party plugins for Microsoft Internet Explorer and Firefox built using the Translate API will cease to function at the beginning of December.
  • Mobile: Developers that have built products for Android using the Translate API will have the same problem, but expect Google to deliver increasing functionality that incorporates translation in the Android OS. Developers who built applications for Android competitors such as Apple iPhone and iPad will be harder hit when mobile translation applications cease functioning on their platforms at the beginning of December. It is unlikely that Google will build similar applications and functionality when it can maintain significant advantage for Android by withholding such functionality.

By deprecating and then shutting down the API, Google reduces the capability of abusers to freely use automated translation to produce content and slows the rate of low quality content appearing on the Internet. However in doing so, Google is taking a risk.

Developers have invested money and time into their software products, many of which will cease to function (or need to be updated) with the shutdown of the Google Translate API. Just as Google underestimated the abuse of the Translate API, Google may also have underestimated the backlash from the developer community for what is seen by many as one of the most valuable Google APIs. Hundreds of postings have been made already discussing the shutdown of the APIs. Most of the posters are upset about the Translate API, with very few comments on the shutdown of the other 17 APIs. Emotions range from surprise, anger and distrust of Google to bewilderment. 

Google shutting down key tools such as the Translate API without offering an alternative to developers decreases confidence in the use of any Google API. Due to the dominance of Google and its tools, development and innovation in Internet applications may slow or be stifled as a result of trust issues now brought to the forefront by the Google Translate API shutdown.

The impact will likely reach beyond just the Google APIs and have knock-on effects on the use of all free APIs irrespective of who is providing them. There is a risk of a perception being created with developers that if a company as large as Google can pull the rug out from under developers, then any company could do so, with hundreds of hours of software development and marketing costs being wiped out with a simple shutdown notice.

When the Google Translate API was released in March 2008, Google released a train from the station that went hurtling down the tracks at a pace that Google had not anticipated. Developers quickly integrated the technology into their applications. With little management and oversight on the Google Translate API, Google quickly lost control of how, when and by whom it was used. 

The developer outcry in response to the shutdown of the Google Translate API is a clear indicator that Google’s attempt to recall the train back to the station is not going to be taken lightly. Many developers will not allow their hard work to be wasted. Developers will simply switch to other competing technologies. 

Smart developers have already built support for multiple free translation technologies into their products. The loss of functionality provided by Google Translate may mean fewer language pairs or lower quality translation in some cases, but once the train has left the station, there is no turning back. Microsoft Bing will be the most likely beneficiary and it would be no surprise to see Microsoft investing even more into its translation technologies as a result. 
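The multi-provider approach described above might look something like the following minimal sketch. Everything here is hypothetical: the provider functions are stand-ins that make no network calls and do not represent any real vendor's API, and a production version would wrap each real service behind the same call signature.

```python
from typing import Callable, List, Sequence, Tuple

# A provider takes (text, source_lang, target_lang) and returns a translation.
Translator = Callable[[str, str, str], str]


class AllProvidersFailed(Exception):
    """Raised when every configured translation provider has failed."""


def translate_with_fallback(text: str, source: str, target: str,
                            providers: Sequence[Tuple[str, Translator]]
                            ) -> Tuple[str, str]:
    """Try each provider in order; return (provider name, translation)."""
    errors: List[str] = []
    for name, provider in providers:
        try:
            return name, provider(text, source, target)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins used only to demonstrate the fallback behaviour.
def retired_provider(text: str, source: str, target: str) -> str:
    raise RuntimeError("service has been shut down")


def backup_provider(text: str, source: str, target: str) -> str:
    return f"[{source}->{target}] {text}"


used, result = translate_with_fallback(
    "hello", "en", "fr",
    [("retired", retired_provider), ("backup", backup_provider)],
)
print(used, result)  # prints "backup [en->fr] hello"
```

Keeping the provider list as plain configuration means that when one service is withdrawn, the product degrades to the next option rather than ceasing to function.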

Google’s attempt to control access to machine translation may be too late. In opening up the Google Translate API, controls should have been in place from the outset. It would not be surprising to see either a commercial or an open source API for translation appear in the near future that encapsulates all remaining free translation technologies in addition to many commercial translation technologies in a single consolidated API. The demand is clearly there and such an API would make it even easier for developers to integrate translation into their products. Indeed, the result may be the exact opposite of what Google intended, with the further proliferation of machine translation products and content at an even more rapid pace.

Conclusions

  • Google is shutting down the Translate API, but Google Translate will continue to exist and improve. Google will be able to leverage Google Translate in its own applications, while third-party developers will not be allowed to leverage the technology. Although late in applying controls to translation and with some risk, this is probably the best strategy for Google as a business.
  • Eliminating language as a barrier to knowledge and communication is one of the last great challenges of the Internet. Google most certainly understands the benefit and potential of its automated translation technology and is now trying to regain a level of control over it.
  • By shutting down the Translate API, Google is able to control when users access their translate functionality and when they can deliver advertisements to these users. Google also benefits by reducing the quantity of machine translated content that is misrepresented by websites as quality local language content.
  • The “substantial economic burden” that Google refers to is not related to the operational costs of the API itself, but the burden and risk to Google’s business as a whole that uncontrolled access to Google Translate functionality represents.
  • It is clear that Google understands the potential for translation. But it is also clear that Google understands the potential for abuse of translation and the knock-on impact that it is facing or may face. Not having control of what is translated and how the translations are used creates a threat to Google’s core revenue streams and potentially helps competitors such as Facebook to increase their value at Google’s expense. This is most likely the substantial economic burden that Google refers to in its announcement.
  • The shutdown of the Translate API is truly a shame for the many software developers that did not violate the Terms of Use and used the Translate API in a manner permitted. Developers should be aware of the limitations of free APIs where they have no control or say in the future of the service. Business models built around a free API with little other value-add are doomed to failure from the outset. If a free API must be used, then developers should try to look for multiple providers of similar functionality and build in support for as many APIs as possible in order to reduce risk. Developers should anticipate the possibility of competition from Google in applications that leverage automated translation and move to protect themselves via patents and by offering features that go beyond those of interest to Google.

The analysis in this post has been focused on gaining a clearer understanding of Google’s announcement as well as the probable reasons for Google shutting down the Google Translate API. A follow-up post that analyses the impact on the language services and translation industry will be posted shortly.