Monday, June 20, 2016

The Larger Context Translation Market

This is another post inspired by my recent visit to Rio and dialogue at the ABRATES conference, where translators were eager to engage with MT in a meaningful way, and asked many questions about where the most interesting translation challenges were. In several conversations about “the translation market” I had a very clear sense of how there really needs to be a larger perspective on what this means, as the most interesting opportunities with MT tend to lie outside what is generally understood as the translation industry. 

For most of the people who attend “translation industry” and localization conferences, the most trusted description of the industry is the market that Common Sense Advisory (CSA) describes as the Language Services Market: a market where translation agencies provide translation, localization and interpreting services to buyers for a fee. CSA estimates this market at $38.16 billion in 2015, with Lionbridge proudly claiming to be a “perennial list-topper” and the largest language service provider (LSP). Their PR piece provides a clear description of what the CSA market definition covers. This link lists the Top 20 LSPs globally, measured by total revenue. Here is another view of the CSA market summary that shows geographical concentration of the industry.
*The Language Services Market 2015, Common Sense Advisory Research, June 2015
  This CSA sourced graphic from the Lionbridge blog describes the fastest growth segments in the language services market. Clearly, translation, software and website globalization are at the top of the growth list for this type of paid translation service.

The following graphic looks more closely at what the focus of the LSP Translation industry is, and we see the kind of content they focus on, and the tools and skills most relevant to addressing translation of this type of content. Thus, project management is the core business function and the most important tools are TMS systems and TM.

While some LSPs do use MT, it is generally not a mission-critical tool. MT is used if the LSP is able to pull together an MT system that improves productivity and reduces costs, and its use is largely reactive, i.e. often because the client insisted. But it is important to understand that the focus is still on the same kind of content shown above. The translation industry has had mixed success with MT, and maybe a few systems do become integral to the overall translation production process. But most do-it-yourself LSP MT initiatives fail or wallow in a kind of confused and isolated geekdom with mediocre results. MT systems that consistently produce excellent output are the hardest to develop, so it is somewhat ironic that those who understand the least about how the technology works try to build the most capable and efficient MT systems. The MT experience that Jost and other translators describe in blogs, variously labeled PEMT, MpT, MT+PE etc., is presented as the great evil by IAPTI, or as horrid commoditization of the work of translation by many others. Most often they are working with MT systems at arm’s length, and have no ability to steer or guide the MT system development to make it more useful. Hopefully, the notion of Moses-as-instant-magic is now widely understood as a limited success strategy, and the more savvy enterprises and LSPs leave it to experts, who also struggle to meet these consistent high quality output goals. Good MT systems will always take time, expertise and articulate linguistic feedback to develop.

However, I think the new Adaptive Dynamically Learning MT that Lilt is producing has a very bright future with smart LSPs, and provides a platform to transform the MT experience into a much more predictable and worthwhile endeavor and will also allow translators to be much more engaged and involved in steering the MT system.

The Larger Context for Value Added Translation


Though the “translation industry” MT experience is mixed, I would argue that MT has been responsible for driving revenue or definable economic value in a variety of non-traditional scenarios, on a scale that dwarfs “the translation industry” as defined above. Interestingly, the most successful implementations of MT are done by global enterprises who often still work with LSPs for the static structured content, but for the higher value unstructured dynamic content are largely choosing to do it themselves (sometimes with some expert help) or build internal teams to address what they see as a long-term and strategic need to enhance international business initiatives.  It is useful to consider some case studies of these strategic MT initiatives by enterprises, to  understand this better.
The graphic above shows the volume of words that Google translates every single day with their MT systems as reported at Google I/O in April 2016. To put this in context, I saw a Lionbridge presentation a few years ago, where the CFO said they translated just over a billion words that year (2009). SDL, probably the most MT-savvy LSP and the most active with MT, recently claimed they are doing 20B words per month through MT. So, it is quite possible that Google alone translates more words in a week than the whole “translation industry” does in a year. When one considers that perhaps over 90% of Google’s revenue (~$78B/year) is generated from advertising linked to key words, it is quite possible that Google derives tens of billions of dollars from their MT technology initiative! It also gives them very specific intelligence on what matters to people across the world and what cross-language content is the most sought after. The economic value of this knowledge is significant, and hidden in the advertising revenue they report. This knowledge of what matters across the globe is something the “translation industry” and SEO experts would love to have.

Microsoft is another example of high value derived from MT. Their initial entry into MT was, like Google’s, related to search, but additionally they had a massive knowledge base in English for their software products that was difficult for their substantial global customer base to efficiently access. Thus, while Microsoft probably spent hundreds of millions on “translation industry” services for static content, this only covered a tiny fraction of what they needed to translate. Given that Microsoft gets as much as 70% of their revenue from non-English speaking countries, translation of all kinds of product related content is important. Making more technical support and customer care content rapidly multilingual was an imperative for executives who cared about the customer experience, and also generated huge savings in support costs and dramatically improved the user experience for the non-English speaking customer. The software industry measures the value of self-service content by something called deflection cost. So, if they can deflect a call to the support center by making more knowledge base content available in more languages using MT, they can save possibly as much as $100M+ per day given the size of their user base and actual volumes of knowledge base access. Add maybe another 50B words/day that their Bing MT does for the random internet user, and we have another stream of economic value coming from search words that generate advertising revenue across the globe. Their recent Skype STS initiative will also likely yield great benefit and new ways to monetize their translation technology expertise.

When you consider that both Intel and Adobe also use the Microsoft Hub MT to translate knowledge base support content, the deflected cost savings impact is easily worth hundreds of millions of dollars a day. This is not even considering the many other IT companies doing this on their own using other MT technology, e.g. Symantec. The “translation industry” has a very small footprint in this kind of translation activity, which is now often considered mission-critical and probably involves several billion words per month.

The online eCommerce market is another example of economic value generated by competent MT efforts that is off-the-books of the “translation industry”. EBay decided some years ago that emerging economies were a huge opportunity worth strategic attention. So they acquired MT technology and built a competent MT team that had a strong collaborative linguistic component. Based on presentations they made at the AMTA 2014 conference, it was clear that there was a huge growth impact in the Russian market from their initial efforts to make more Russian content available. It would be safe to say that the value of the impact is probably in the hundreds of millions of dollars of new revenue, from all the new markets that they have been addressing using MT. It is also really worth taking a look at what is involved in doing this. It takes focus on solving new kinds of translation problems and making sure the translation problems you solve do enhance the value of your MT efforts. This last link shows the special issues related just to Brazilian Portuguese. We should note that most competent MT efforts of any scale move carefully, one language at a time, rather than trying to do 20 or 30 in a single go. We also see that Amazon acquired Safaba in 2015 and possibly has similar plans to make catalogue content multilingual to drive bigger volumes of international business. Alibaba and Baidu also have eCommerce focused MT efforts well underway but fewer details are available. The net economic value of all these types of MT initiatives: probably in excess of $20B per year by my very rough estimates.

Recently Facebook surprised the world by announcing that they have a substantial MT effort underway after using the Microsoft Bing MT technology for several years. When asked why they did this, Alan Packer gave two reasons. Scale is one reason Facebook has invested in its own MT technology. The other is adaptability: they wanted technology that was optimized for their very specific needs. Facebook is now serving 2 billion text translations per day. The problem with Bing, they claimed, was that it was built to translate properly written website text and did not do well with the slang, metaphor and idiom typical in Facebook comments. Packer described Facebook language as “extremely informal. It’s full of slang, it’s very regional.” He said it is also laden with metaphors and idiomatic expressions, and is riddled with misspellings (most of them intentional). Additionally, as in the rest of the world, there is a marked difference in the way different age groups communicate on Facebook. They know that already 50% of Facebook users regularly use auto translation. This user group will only grow as more people come online. Packer says that access to the translation product leads users to “have more friends, more friends of friends, and get exposed to more concepts and cultures.” The more people across the world that Facebook users can connect with, the longer they’ll spend on the social network, and the more revenue-earning ads they’ll see. The economic value of this is probably several billion dollars a year. Emerging social networks that are global will need to address the same problem.

So what we see is that a select few companies are generating more economic value from solving very specialized and much more challenging translation problems than the whole gross revenue output of the “translation industry”.   Solving large-scale translation problems using MT is apparently a very high value proposition, and none of the global enterprises mentioned above considered going to the “translation industry” to help solve these really challenging and complex translation problems. Probably because it is very clear to any strategic observer at these companies, that most LSPs lack the vision, skill, interest and competence to solve these types of translation problems.  We can perhaps even generalize the core requirements for a larger set of global enterprises as shown below. There is a real mismatch in terms of skills and focus between the broader translation needs of global enterprises and the service focus of the “translation industry”. While the static content will likely remain important as a mandated requirement, it is not where long-term corporate value is built either for the enterprise buyer or the LSP in my opinion.
To illustrate this further let us consider the investor sentiment on value that can be gleaned from stock market data. While this might be a stretch of logic to some, I think we can fairly assume that investors value solutions to certain kinds of translation problems more than others. Facebook has seen a huge growth in mobile ad revenues and it seems that they are taking ad share away from Google recently, and so these Market Value/Sales numbers reflect very active market trends. The investor sentiment is that Facebook is probably better poised to gain $$$s from the next wave of internet adoption than any of the companies listed in the chart as they climb beyond 2B users, very few of whom speak English or French or German. As one analyst says: "Advertising budgets are moving towards Facebook, and it seems to be a winner in the online advertising world with measurable results." Contrast this with the investor sentiment for large LSPs; surely the difference has something to do with long-term promise and potential. To me this suggests that investors in general view the LSP focus as lower in value but understand that translation can produce huge leverage in the right hands.

So to those wonderful translators at ABRATES who asked me what kinds of MT projects to get involved with, I would say the following:
  • Focus on companies who are solving interesting translation problems. They will have the most rewarding work and it might involve stepping out of the translation industry.
  • Stay away from LSPs who sell the Moses Mirage; this is likely to be the worst PEMT experience. MT systems that don't want translator feedback at a pattern level are not likely to be a professionally satisfying experience.
  • Work with people (LSP/Enterprise) who allow and want you (translator/editor) to provide feedback and interact with the MT development process.
  • Learn about Machine Learning and AI in other domains, and develop skills with Regex, Corpus Analysis & Corpus Editing and Pattern Identification to be considered valuable.
  • Explore Adaptive Dynamic Learning MT like Lilt (maybe others will appear soon)  to understand how MT can work with you and for you while you wait for the right opportunity. This is truly a paradigm shift that is worth at least some experimentation to see how the translator desktop could evolve.
  • Ease up on the need to have everything on the desktop. The future of Machine Intelligence solutions will require big data and big computing, so the best and most sophisticated tools will by definition only be available in the cloud. Lilt is a first generation example of this, others are coming. The cloud makes sense and allows new, more effective ways to solve old problems and is not a bad thing.
And for those who think the four companies above are the exception rather than the rule, it is worth noting that we are just beginning with what is possible to do with Machine Learning and Artificial Intelligence. Neural MT is just beginning and could drive a whole new wave of higher quality and more adaptive MT. The Machine Intelligence market is still nascent and we will very likely see big data + big computing + smart algorithms come together, to solve problems that we thought were beyond the scope of computers just last year.

Tuesday, June 7, 2016

The ABRATES Conference in Rio: Translators focusing on MT

I had the honor of participating in the 7th ABRATES International Translation and Interpreting Conference in Rio de Janeiro last week, an event that had over 500 attendees, based on my casual observation. A large portion of the attendees were translators, but there were also some LSPs and Enterprise representatives. As much of the information was presented in Portuguese, I had direct experience with simultaneous translation via a headset, which was also kind of cool, and it was fun to switch around when I was less interested in the actual subject matter.

The formidable, emotion packed sign language interpretation by Paloma Bueno, intensely focused simultaneous interpreter volunteers in the booth, and the abounding loveliness of Rio.

I found the conference surprisingly refreshing for several reasons including:
  • The high level of understanding that many translators had about MT, Post-Editing practice and their general attitude that it is better to understand and use translation technology than fight it or fear it.
  • The beautiful location, as Rio is a naturally scenic and inviting spot.
  • An emotionally powerful sign language interpretation of the keynote session by Paloma Bueno who I cannot believe was doing this in real time.
  • The eagerness and openness of many translators present, to try and understand how they as translators could engage and work with MT and develop meaningful expertise in MT related skills.
  • The willingness to explore and understand how translation technology will continue to evolve and possibly impact their professional work.
  • Several conversations with translators who had long term experience with MT and thus had direct knowledge of MT systems that improved over time and had also seen both good and bad MT engines over the years, so were much more coherent in their criticism.
  • The shared experience of many different kinds of MT encounters from a variety of translators, ranging from DIY horror, to expert systems that evolved gradually in quality over years, to some proprietary efforts that produce astonishing quality.
  • The presence of several very competent presentation sessions on developing MT related skills including:
    • Corpus Preparation for MT training
    • Working with the varying quality of PEMT output that translators get from LSPs
    • Using REGEX (Regular Expressions) to develop more powerful text based editing skills when dealing with corpora
    • PEMT best practices and tools and shared experiences
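To give a flavor of the regex and corpus preparation skills mentioned in the list above, here is a minimal sketch in Python. The cleanup rules, the function names (clean_segment, clean_pairs) and the sample segments are my own assumptions for illustration, not material from any of the conference sessions; a real corpus would need rules tuned to its own noise.

```python
import re

# Hypothetical cleanup rules for one side of a parallel corpus.
TAG_RE = re.compile(r"<[^>]+>")                       # leftover markup such as <b> or <br/>
SPACE_RE = re.compile(r"\s+")                         # runs of whitespace, including tabs
SPACE_BEFORE_PUNCT_RE = re.compile(r"\s+([,.;:!?])")  # "word ." -> "word."


def clean_segment(text: str) -> str:
    """Strip markup and normalize spacing in a single segment."""
    text = TAG_RE.sub(" ", text)
    text = SPACE_RE.sub(" ", text)
    text = SPACE_BEFORE_PUNCT_RE.sub(r"\1", text)
    return text.strip()


def clean_pairs(pairs):
    """Clean (source, target) pairs and drop those left empty on either side."""
    cleaned = []
    for source, target in pairs:
        source, target = clean_segment(source), clean_segment(target)
        if source and target:
            cleaned.append((source, target))
    return cleaned
```

For example, clean_segment("Click <b>Save</b>  to   continue .") gives "Click Save to continue.", and a pair whose source is only a stray tag is dropped entirely.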
I also found this conference special, because I personally had no corporate allegiance at the event and was truly just an independent spokesperson with some knowledge of MT technology and its potential and place within the context of many of the attendees’ professional lives. As I am no longer employed or affiliated with Asia Online I felt very comfortable sharing my opinions, with no concern about persuading anybody to go one way or another. My opinions were all truly independent and the truest expression of what I try to do in this blog, i.e. provide useful and relevant information to inquiring minds. So while I am indeed looking for professional work, I am really enjoying this independence and focus on what really matters.

It was interesting to find that when one has this kind of openness and lack of bias as a presenter, there is an opening of the perception, and I was able to see much of what I was saying with a new and fresh eye. It was like playing improvised music to a keen and attentive audience, the shared attention of the musician and the audience creates a new, more evolved, version of an existing musical idea. I will share some of those insights in upcoming posts.

I also understood much more clearly that most often, translators have very little control of the content they are given to translate, because of the current structure of the professional translation business which is usually: Enterprise > MLV (Big Agency) > SLV (Small Agency) > Translator. Thus translators are often left to deal with poor quality source which cannot by contract be corrected or changed, work with crappy MT output produced by DIY practitioners who do not know how to actually do it themselves, or have no say in how the MT engines evolve since they are so far down the production line. Thus we have the current situation of unnecessarily mind-numbing PEMT work, rather than rapidly evolving MT technology and more efficient production processes. And very often the extremely valuable linguistic feedback that translators provide is lost or ignored. An MT paradigm that organizes and collects valuable translator feedback will surely be more competitive and produce higher quality and benefit to all concerned. Not to mention that it will be personally rewarding for the many translators who will need to be involved, as the nature of the problems they solve will evolve in value and impact from the typical LSP project.

Plenary session on MT
I had an interesting experience during a plenary session panel on MT where all the other speakers were speaking in Portuguese, so I had to have a headset to understand what they were saying. When I started speaking, the interpreters of course started speaking in Portuguese, and I found it very strange and unsettling to hear a voice saying everything I was saying in English in Portuguese in real time. Somebody once said that MT is magic which I felt deserved some scorn, but to me this act of listening and translating into another language in the instant, not knowing what I was going to say, was surely closer to magic.

If this conference is an indicator of what is happening in the professional translation world, it is very promising for several kinds of translation technology initiatives. I have always felt, much to the chagrin of my former employers, that the real promise of MT will be seen when translators seek it out and learn to steer, drive and enhance the ongoing evolution of this technology. If this conference is really only representative of the Brazilian reality with translation technology, then I predict that the most exciting advances in MT will come from those working with Portuguese. This community is primed for the most interesting new Adaptive MT initiatives like Lilt which can empower motivated and technically savvy translators.

You can find some Twitter coverage of the event by searching on the hashtag #abrates16.


Ipanema Street Market

Thursday, April 21, 2016

The Concept of MT Maturity

I would like to introduce a series of posts that looks at and discusses the concept of MT Maturity. I hope to illustrate that getting real business advantage from MT requires alignment with a variety of other related business processes. I also plan to look at how the MT technology continues to evolve, especially as related to use in professional translation and corporate use to make large amounts of content multilingual. 

To facilitate the discussion I have created my own very rough evaluation and analysis model, which is an adaptation of the CSA Localization Maturity Model which is an adaptation of the software industry’s capability maturity model (CMM). Basically, it is a way of assessing if the technology user understands the technology, and is also using it in an efficient and effective manner by properly linking it to other organizational processes. 

Thus, companies that are at a higher stage of the CSA LMM model referenced above will tend to have much more responsive and adaptive localization practices that enable them to be much more nimble, efficient and effective in international market initiatives. These maturity models define very specific “stages” to characterize the efficiency and effectiveness of the business process as shown below. Thus an LSP at a higher stage is probably much better to work with, and an Enterprise at a higher level is also much more nimble and international market savvy, with superior localization practices.

While this is interesting, it is somewhat theoretical, and I am going to attempt to make it less so in my own analysis of MT technology. This will make my comments less academically robust, but since all I do here is just speak my opinions on things it does not really matter. I do NOT represent any corporate opinion here, and these comments are purely personal observations though hopefully still valid professional assessments. My intention is to highlight best practices and point out what I think are interesting trends.

MT (machine translation) or Automated Language Translation (ALT) or Machine Pseudo Translation (MpT) use can also be described to some extent using these maturity model stages, however, I am going to try and simplify this further, as I only use the maturity model perspective to better structure my comments and analysis, of how this might apply to the business use of MT technology to further international initiatives.

So while there are still many naysayers who will never see any value to professional business use of MT, there is a growing community of users that are working through the vagaries and complexity of MT with varying degrees of success. The Gartner Hype Cycle curve below provides a useful graphic to describe the stages from a very common expectations cycle, that so many, if not most users of this technology seem to go through. I will describe the various maturity stages in more detail in upcoming posts though my analysis may not be quite as tightly defined as the CSA analysis.

It may be useful at the outset to list some of the most common pitfalls, which are the opposite of best practices, as the continual recurrence of these factors seems to plague many business use cases of MT technology. They are:
  • Looking for cheap, fast and “easy” approaches like Moses and Instant Do-it-yourself solutions.
  • Expecting to do better than the really decent generic engines provided by Microsoft and Google without investments in time and core skill building.
  • Looking to use MT as a wholesale replacement for human translation.
  • Using MT for one-off projects and/or for small volume requirements (less than 500,000 words). MT makes most sense for very large projects that involve millions of words, especially those that would not make sense to do using only human translators for time and cost reasons.
It is worth re-stating that MT technology in the right hands and right use cases can generate long-term sustainable competitive efficiencies, but this generally needs a strategic focus, and a long term commitment to the technology deployment to achieve this.

Emerging MT Trends

While the “translation industry” only focuses on the older SMT approaches most often based around Moses, we are seeing increasing momentum around newer approaches to building MT engines. These involve techniques like deep learning, neural nets and artificial intelligence to improve on the results of the current approaches. These new approaches are apparently yielding much better results than ideas like morpho-syntactic approaches to SMT, which have also produced small improvements.

The presentation here presents some of the new perspectives in AI and Neural MT versus the traditional SMT NLP views in a relatively understandable way.

I have also noted that Microsoft appears to have seriously stepped up their MT technology of late. They are attempting to solve the most difficult automated translation challenges in the world today, specifically:
  • Facebook comments (so I now understand what Clio Schils, Renato and my Russian Facebook friends are talking about)
  • Skype based multilingual voice conferences
  • Customization with limited sets of training data using AI and allowing four levels of customization.
Also of interest is a new kind of interactive MT solution that I think holds great promise, especially for individual translators: adaptive MT solutions like Lilt that take real time corrective feedback and improve dynamically while also leveraging TM in multiple ways. They are also built on post-Moses technology, so provide much better foundation engine quality which means less post-editing. While only available for a limited set of languages, I think it is one of the few MT technology initiatives that actually generate enthusiasm from translators. For an individual translator, I think that using something like Lilt is a much better approach than trying to build your own Moses engine since you have real expertise building your foundations. This article provides a good overview of what adaptive MT looks like and tries to do.
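To make the idea of real time corrective feedback a little more concrete, here is a toy sketch in Python. It is emphatically not how Lilt or any real adaptive MT system works (those adapt the translation model itself); it only illustrates the feedback loop, using a fuzzy translation memory lookup where every confirmed correction becomes available for the very next segment. The class name AdaptiveMemory, its methods and the 0.75 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


class AdaptiveMemory:
    """Toy translation memory that learns from each confirmed correction."""

    def __init__(self, threshold: float = 0.75):
        self.threshold = threshold  # minimum similarity for a fuzzy match (assumed value)
        self.entries = {}           # source segment -> confirmed translation

    def suggest(self, source: str):
        """Return (translation, score) for the closest stored source, or None."""
        best, best_score = None, 0.0
        for known_source, translation in self.entries.items():
            score = SequenceMatcher(None, source, known_source).ratio()
            if score > best_score:
                best, best_score = translation, score
        if best is not None and best_score >= self.threshold:
            return best, best_score
        return None

    def confirm(self, source: str, translation: str) -> None:
        """Store the translator's final version so it is available immediately."""
        self.entries[source] = translation
```

A translator would call suggest() on each new segment, edit whatever comes back (or translate from scratch), and then call confirm() with the final text, so that a near-identical segment later in the same job already benefits from the correction.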

I found out that Prince died today and while I was never a real fan, I think his guitar solo here is quite exceptional, and should leave no doubt to his amazing musical ability. Theatrics aside, it is the notes and varied textures that he plays that make it so special, and wins the respect and admiration of the other band members.  It is worth a listen. May he rest in peace.