
Wednesday, March 22, 2017

LSP Perspective: A View on Translation Technology

This is an unsolicited guest post that provides a view of translation technology that is typical of what is believed by many in the translation industry.  

These initial preamble comments in italics are mine.  

It provides an interesting contrast to the previous post (Ending the Globalization Smoke Screen) on the need for LSPs to ask more fundamental questions and climb up higher in the value chain and contribute higher value advice on globalization initiatives. This is a view that sees the primary business of LSPs, and thus the role of technology, as being the management and performance of human translation work as efficiently as possible. 

I think we have already begun to see that the most sophisticated LSPs now solve more complex and comprehensive translation problems for their largest customers, which often extend well beyond human translation work. In December 2016, the new SDL management reported that they translate 100 million words a month using traditional TEP human translation strategies, but they also translate 20 billion words a month with MT. The VW use case also shows that for large enterprises, MT will be the primary means to translate the bulk of the customer-facing content, in addition to being the dominant way to handle translations related to internal communications. Clearly, much of the translation budget is still spent on human translation, but it is also much clearer that MT needs to be part of the overall solution. MT competence is valuable and considered strategic when choosing an agency, and by this, I don't mean running sub-standard Moses engines. Rather, it is about working with agencies who understand multiple MT options, understand corpus data preparation and analysis, and can steer multiple types of MT systems competently.

Aaron raised what I think are many very interesting questions for the "localization" industry. How do we as an industry add more value in the process of globalization? He suggests, quite effectively I think, that it has more to do with things other than using basic automation tools to do low-value things more efficiently. The globalization budget is likely to be much higher than the translation budget and involve answering many questions before you get to translation.

It is also my sense that there is a bright future for translation companies that solve comprehensive translation problems (i.e. MT, HT, and combinations), help address globalization strategies, or perform very specialized, high-value, finesse-driven human translation work (sometimes called transcreation, an unfortunate word that nobody in the real world understands). The future for those that do not do any of these things, I think, will be less bright, as the freely available and pervasive automation technology for business translation tasks will get easier to use and more efficient. The days when building a TMS to get competitive advantage made sense are long gone. Many excellent tools are already available for a minimal cost and it is foolish to think that your processes and procedures are so unique as to warrant your own custom tools. The value is not in the tools you use but how, when, and how skillfully you use them. Commoditization happens when the industry players are unable to clearly demonstrate their value add to a customer. This is when price becomes the prime determinant of who gets the business, as you are easily replaceable. This also means that you are likely to find that the wind is no longer in your sails and it is much harder to keep forward momentum. In the post below, the emphasis is not mine.




======================================= 
↓↓↓↓↓


What’s out there, and what’s to come?


Technology is improving all the time. Technological advances like Artificial Intelligence (AI), Virtual Reality (VR) and the advance of smartphones are rousing the public’s interest.

It’s the same for translation technology. The way we translate and interpret content is changing all the time. Reliable translation technology is making it easier, faster and more productive to do our jobs.
Take for instance machine translation (MT). We see this type of technology as more of an additional language service to enable more content to be translated – rather than as a substitute language service to replace human translation.

MT is often considered in circumstances where the volume of content requiring translation cannot realistically be approached as a human translation task, for reasons of cost or speed. In this setting automatic translation can be deployed as part of a wider workflow.

Technology like this, for example, may enable a company to translate millions of words of user-generated content which would otherwise be completely out of reach. MT would not, however, be advisable for public-facing content, such as press releases.

Machines that translate


The benefits of machine translation largely come down to two factors: it’s quicker and less expensive. The downside is that the standard of translation can be anywhere from inaccurate to perplexing – machines can’t translate context, you see.

The disadvantages as noted above can be avoided if the machine translation is checked by a professional. The last thing you want is a call from a lawyer telling you you’ve mistranslated one of their clients’ quotes.




Rules-based machine translation systems generate a translation by combining a vocabulary of words with syntactical rules. With statistical MT, by contrast, the engine is fed with large volumes of translations that are analyzed using pattern-matching and word-substitution to predict the translation which is statistically the most likely to be correct.

It can be argued that machine translations are more suited to internal use: if your documents are only being used within your company, complete accuracy may not be vital. Another example would be very basic documents – the simpler your original documents are, the easier they will be for a machine to interpret.

You need to be certain there is ample precision in your machine translations to speed up the process. Otherwise, it will only slow progress down and you’ll attain very little by using it. Machine translation is a time-saving tool – if it doesn’t do that, then it’s not worth using, or at least not solely relying on. That’s not to say that machine translation isn’t vital in some cases. It certainly is; more on that later.

Human translation basically flips the pros and cons. A higher standard of accuracy comes at the price of longer turnaround times and higher costs. What you have to decide is whether that initial investment outweighs the potential cost of errors.

More creative or intricate content such as poems, slogans or taglines asks far too much of machine translation tools, so it’ll always make sense to opt for a human translator. When accuracy is paramount, as in legal translation, safety instructions, and healthcare, machines leave far too much room for error.

Another case where human translation is more appropriate than machine translation is when the source text is not clean enough for machines to work with. If the content is too chaotic then it may be easier for a human translator to work on and edit the original text – machine-free.
Don’t get me wrong, translating content with machine translation is attainable, for example when translating high-volume content that changes every hour of every day – humans just can’t keep up and it would cost far too much. But if you require full control of your communication then a human translator is the better option – granted that the task isn’t too large or unattainable without using MT.

In the machine versus human translation debate, the latter has the edge – for now anyway – because the translator can provide a more accurate translation of your message.

This aside, companies like TripAdvisor and Amazon rely on machine translation because their online content and the daily visitors to their websites are so vast. Machine translation gives them the chance to stay up-to-date and offer users multilingual content rapidly. Companies like these would find solely relying on humans to translate their message a demanding, if not impossible, task.

Machine translations have their place in the world – it’s an important place for sure – and can deliver the basic meaning of a text when your company is in a bind. However, they cannot live up to the quality of a human-powered translation, which is the service you should choose when you want an official communication of your company to be fully understood by its readers.

I want translation now


Moving swiftly on: translation management systems (TMS) are advanced translation software that allows users to centralize all their translation requirements and makes it simple to tailor translation workflows. Though nothing new anymore, software like this not only saves people time, but its automated processes save money too.


Larger companies are opting for one easy-to-use TMS platform in order to have complete control over their translation workflows. The software gives users a 360-degree overview of every current and completed translation job submitted. A TMS platform also gives users real-time project status information.

Moreover, creative translation tools allow teams of graphic designers and creative agencies to use web browsers rather than costly Adobe packages to review INDD or IDML content. Translation technology like this means no extra license fees to pay when localising and reviewing content.
Reviewers who do not have InDesign installed on their systems can see a live preview, edit text so it is exactly how they want it, and save their changes so that the InDesign file is updated. Tools like this give users peace of mind in the knowledge that the InDesign document cannot be “broken”, and that no time is spent copying and pasting text, trying to decipher reviewers’ comments, or repeatedly transferring files back and forth.

As technology improves, so do the expectations of its consumers. People want their information fast. When it comes to translation, specifically in an online sense, the content needs to remain up-to-date and be easy to find. TMS platforms can be integrated with websites, CMS, DMS and database applications – this makes them an essential part of translation services.

Big technology brands like Apple and Google offer translation services of their own. iTunes and Google Play allow you to download transcription apps, giving you your own personal translator in the palm of your hand.

They’re handy devices but useless if you need accurate voice recognition and more than a few sentences transcribed. Your main concern with any transcription app will always be accuracy. You want the device to understand every word you say and accurately type it out in text form. Well, unfortunately, this is where the technology continues to fall way short of the demand.

Technology has evolved. But we’ve been evolving too and we still have a few tricks up our sleeves. A mother-tongue translator is still the only sure-fire way to ensure the most natural reading target text. It’s arguably the best way to get the most relevant translation possible. Language services companies still receive far more human translation inquiries. More than 90% of job requests are for human translation.

What’s next?


In the realm of AI, for instance, a lot. Take AI web-design software and an increasing list of automated marketing tools hitting the scene, all hoping to make translation services a more streamlined process – one that’s faster, cheaper and demands less manpower to make things happen.
Personal assistant apps such as Apple’s Siri and Amazon’s Alexa are driving online business in a way never seen before. These AI-powered apps are also changing web localisation in a huge way. More than ever businesses need to be aware of third party sites like Google Maps, Wikipedia, and Yelp because apps such as Microsoft’s Cortana are pulling snippets of content from all around the web.

According to Google Translate’s FAQ section, “Even today’s most efficient software cannot master a language as well as a native speaker and have by no means the skill of a professional translator.”

If the technology available helps, if it saves us time, money and precious resources, then surely it’s vital and something we should be taking advantage of. But at the same time, most of the translation technology available to us should be used wisely. More often than not it should be used as a tool to aid us, not necessarily something to be relied upon.

Don’t get me wrong, I’m not for a second lessening the importance of machines when it comes to translation services. All I will say is that human translators are more familiar with expressions, slang, and grammar of a modern language. Often human translators are native speakers of the target language which gives greater depth and more of an understanding of the tone of the required translation.

Human translators also boast translation degrees; some specialise in a specific area of expertise, and their understanding of the field of the project expedites the translation. That said, it’s safe to say that one type of language translation that still baffles the most educated of linguists is emojis.
The multifaceted landscape of interpreting symbols makes translating these icons tough for both machines and humans – fact.

These ideograms are actually making it to court cases where text messages are regularly being surrendered as evidence. So it’s paramount that the context and the interpretation of each emoji are understood.

The meaning of each smartphone smiley is often unclear and sometimes puzzling. This leaves far too much room for misunderstanding. In fact, in 2016 professional translators from around the world attempted to decipher emojis, and the results were miserable.

Technological advances in the translation industry are going to change the way businesses operate for the better. Even though some of it threatens to compete against us, we fully expect this to continue and start happening in a much wider range of industries.

Machine translation services will play a vital part in producing multilingual content on a large scale. Big brands that need content fast, in large quantities will opt for MT. In fact, more and more professional translators will need to adapt to working more closely with this technology as it advances over time.

For things like AI, VR and apps the future is prosperous. We’re entering a world in which we take technology for granted. What’s possible nowadays and what will be in the future will aid us, maybe even guide us one day. But for the time being – though essential – the translation technology at our disposal still fails to deliver what humans are fully capable of.




                                                          = = = = = = = = 



 





Tom Robinson, Digital Marketing and Communications Executive at translate plus, a Global Top 50 language services provider by revenue, offering a full range of services, including translation, website localisation, multilingual SEO, interpreting, desktop publishing, transcription and voiceover, in over 200 languages. All this is complemented by our cutting-edge language technology, such as i plus®, our secure cloud-based TMS (translation management system).

Monday, March 6, 2017

Ending the Globalization Smoke Screen: A New Direction for the Localization Industry

We often hear translation industry players, both on the vendor and buyer side, complain about inadequate budgets, increasing work volume, and commoditization in general. Technology, that damned MT, and the content explosion are often blamed, and many in the industry resign themselves to this inevitability of automation and generic product delivery. Commoditization happens when you deliver very low value, especially with a service offering. However, could it be that most of us in the industry approach the core industry mission of raising organizational globalization readiness with a tunnel vision that can only lead to commoditization?

This guest post by Aaron Schliem raises fundamental questions about our larger mission and while it does not offer complete answers, I hope that it will trigger new kinds of dialogue, focusing on how we as an industry can increase our value-add and up our game, so that we go beyond being high-volume translation word-mongers and become part of enabling true globalization services.

Interestingly, I just saw what I felt was a related post in terms of its core theme, that I thought was worth linking to. I include the following excerpt:

When we find a new way of solving a problem, we make a conceptual shift to clarity. In To Sell is Human Dan Pink says clarity is an important quality to help move others. He defines it as:
“The capacity to help others see their situations in fresh and more revealing ways and to identify problems they didn't know they had.”
Solving problems is still an important ability, with the added twist that the value is in identifying the true problem, asking better questions. Studies conducted by social scientists Jacob Getzels and Mihaly Csikszentmihalyi in the 1960s found that people who achieve breakthroughs in any field tend to be good at finding problems:
“It is in fact the discovery or creation of problems rather than any superior knowledge, technical skill, or craftsmanship that often sets the creative person apart from others in his field.”
Excerpt From:  The Value of Finding the Right Problem to Solve


Please join the discussion and contribute via the comments, or if you are so moved I would welcome additional posts that provide differing or extended discussion on the provocative points that Aaron makes. The emphasis below is all mine.


 ==========================

I admit it – I was wrong. More than 15 years in the localization industry only to find that I was fooling myself. Like many insiders, I was convinced that if we applied the right technology to the problem of translation, we would advance the cause of globalization, garner respect from senior execs in client organizations and win vital budget allocations along the way. After all, in business you succeed by giving the customer what they want, right? The customer has seen exponentially increasing volumes of content that need to be delivered in all the hot business languages of the world. So, logically, it was our duty to offer solutions that did just that – get words out to market faster and cheaper and in so doing demonstrate how we fastidiously stand by our customers.

And thus the investment in automation was born. We began with translation tools, eventually building out translation management systems to house suites of tools. Unicode, multi-script, RTL, and diacritic capabilities were built into virtually every application businesses use, from desktop publishing software to media engineering suites. We localizers are a clever bunch who had learned by observing other industries. To meet market demand while honoring expense constraints, we competed to see who could build the smartest assembly line. With integrated TMS-CMS conveyor belts fully functional we went even further, feeding the system with auto-generated linguistic output via MT and its various flavors. To staff our assembly lines we largely rid ourselves of the high-priced experienced translators, opting instead for “sufficiently qualified” on-demand labor pools, even resorting to crowd-sourced volunteers.

All of this was accomplished to great fanfare, I might add. How many presentations have you seen where, with flashing lights and hyperbole, a vendor impresses upon the bulk translation buyer how a particular suite of technology is the one that will solve the buyer’s globalization woes? How often do we pat ourselves on the back for the automation revolution at industry events, toasting the next round of venture capital funding won by the latest technology company masquerading as a globalization agency?

Now, don’t get me wrong. The development of keen business and technology models targeting the needs of the big-budget corporate buyers is to be lauded. There is a need that is being serviced effectively and a great deal of innovation has sprung from these investments. Indeed, I am as guilty as anyone. I too believed in this vision of the future. I built proprietary TMS technology, I integrated systems, I trained MT systems. Lacking the investment capital of the giants, my firm, like virtually every other small to mid-sized company in the industry, had to do its best to keep up by implementing an inadequate off-the-shelf technology. But we pressed ahead nonetheless and finally, armed with a decent conveyor belt, we too parroted the industry promise – that technology would solve globalization.

But as I said before, I was wrong. I was willing to believe that by making translation faster and cheaper I was meeting my clients’ needs. I, like most of us, was willing to ignore the bigger picture, to convince myself that this approach was what my clients needed to “go global.”

However, I believe the time of reckoning has come. The truth has caught up with us. We have come to the point where I believe we are misrepresenting the idea of “globalization.” Despite the bells and whistles, at the end of the day, the industry largely sells a commodity called “technologically-enabled translation”. We actively try to rebrand our work as “localization.” However, we mislead our clients by pretending that we are applying finesse to their global ventures, when really we are simply translating words for digital interfaces and then calling it “localization” because content flows from system to system, the characters are rendered on the screen, the address format is right, and we substituted “Mary” with “María” in the sample dataset.

We have reduced the idea of a “locale” down to a four-letter ISO code, unwilling to face the complexities of local culture and market conditions. To truly attempt to build globalization strategy is messy. It most certainly is not conducive to an investment focused too heavily on a language assembly line with replaceable parts. To add insult to injury, the replaceable parts are the cross-cultural communicators from whose ranks nearly all of us in the industry have spawned. While disheartening, it is understandable that we have chosen this route. Look at the world around us – the proliferation of content, the A-B-ization of human choices, the way we have turned people into algorithms.

Let’s be honest. Long gone are the days when rendering non-Latin scripts on a screen was indeed a massive barrier to conducting international business. The global integrated market is already here and we are still conducting business as if we were unaware of the broader complexities involved in supporting our clients.

All of this said I would propose we take a closer look at ourselves and attempt to redefine our role in the world in a way that honors our industry and the lives of the people who continue to build it. When you talk to people in localization what you find are individuals who are overwhelmingly open – open to learning, to listening, to experiencing, to connecting. It is we, and not our technologies, who have built the real bridges that connect people around the world. For many of us, this begins with our personal journeys. We travel and live abroad. We fall in love with people from different cultures. We bear children who embody our global citizenship. We nourish our curiosity and need to connect by delving into the idiosyncrasies, histories, and ways of living that constitute cultures. It is in these ways that the world becomes more connected and more understanding. And it will be through connection and understanding that the global marketplace will thrive, not through the bombardment of people with multilingual content. I find it ironic that we who are capable of seeing the beautiful complexity of culture are precisely the ones who are seeking to iron out the unique contours of the world to make it more amenable to commoditization. I’m sure that many will say, “Business is business! What do you want from us? It’s not our job to make the world a better place.” I for one cannot look at the current global landscape and accept this sort of minimalism.

We collectively undervalue our contribution to the world by simply competing to see who can develop the best assembly line. And we do our clients a disservice by focusing on a single globalization tactic rather than enabling a holistic approach to global business planning and execution. Furthermore, we dishonor the people of the world by treating them like a language with a particular amount of web traffic or an attractive level of disposable income. The conversation around global business success too rarely rises beyond the mechanics of automation. We pay lip service to broader globalization ideals but at the end of the day, most companies simply try to feed their machine to make the biggest margins they can.

I am calling for a revolution, a shift in thinking toward a focus on real people and on the real business concerns of our clients. To truly support globalization with integrity, we cannot simply focus our energy on driving customers toward translation spending. And let me be very clear in stating that this is not simply an industry vendor problem. Some corporate buyers are complicit in this fantasy, seeking often to simply make their supply chains less expensive while retaining sufficient translation quality. Corporate buyers often shy away from advocating for a robust and multi-faceted approach to globalization. People are afraid to ruffle feathers, to lose budget, to misuse political capital. All of this comes at the expense of their own company’s interests. It is incumbent on those of us who know the truth to speak up and ensure that everyone, from senior executives to translation project managers, begins shifting their understanding of globalization away from tactics and toward global readiness and global engagement. This will mean that money will move away from translation and toward other more valuable endeavors, but providers with integrity will not shy away from this evolution and growth. And smart buyers will welcome truth and real global strategies.

But how do we do this? Where do we go from here? First off, we take translation off the table during an initial conversation. We ought to be working with our clients to identify and engage with the key stakeholders across the various disciplines that drive corporate development. This includes, critically, the thought leadership and executive strategists at the highest levels. In speaking with these stakeholders we should perform a holistic analysis of the company’s global readiness, including but not limited to the following:

Brand: It is foolhardy to imagine that one can build a global brand simply by translating copy and working out distribution relationships. And, please, let’s not make the naïve assumption that a local reseller or distributor is somehow going to deliver brand street cred. That’s not their job. It is the responsibility of headquarters to create a thoughtful plan that will ensure the integrity of the core brand while allowing local partners sufficient flexibility to adapt it to their needs. Successful cross-market branding takes a nuanced understanding of the many dimensions of culture that affect perception. By setting aside our native cultural bias, we open ourselves to seeing the brand with different eyes, reimagining it in a new context. This ability to shift perspective positions us to ensure that we are both maximizing opportunity while also mitigating the risk of alienating or offending people in a new market. Below is a tool that I’ve developed to help guide global brand development conversations.

 
Figure 1: Multi-dimensional cultural analysis is key to developing a powerful global brand


Identity: To truly find long-term success in the global market, an organization itself must be ready to walk the global walk. This means identifying, celebrating, and leveraging the skills of those in your organization who understand language and culture. It means encouraging all divisions and teams to explore the ways they have a stake in globalization. Every global organization should have a clear directive and buy-in from senior executives regarding the need for global readiness to imbue all work performed in the organization. Missions and organizational pillars should reflect a global focus and all employees should be trained on an ongoing basis on these ideals.

Human Resources: A great global organization needs to understand that the way employees interact with their employer is not the same in each culture. Expectations relating to management, accountability, work environment and communication are not universal. How people learn varies based on the educational philosophies that happen to be predominant in a place and time. Yes, people may speak some English, but we have begun to make blanket assumptions about language skills based on level of seniority or industry sector (e.g. everyone who works in tech, anywhere in the world, speaks English; all senior executives speak English – we require it at OUR company).

  
 Figure 2: ESL skills are not ubiquitous, even among senior executives.
EF English Proficiency Index. Source: EF


Customer Support: It is not enough to localize your software or create multilingual packaging. Real people expect real support that is accessible based on their local language, cultural norms, and technology infrastructure. Often organizations will exert great effort and apply meticulous controls to the localization of a product, but then assign customer support to a green (inexperienced) community manager who has not been trained on the organization's global corporate culture and has not been provided with basic tools to ensure a consistent voice in each market. Something as simple as a bilingual glossary is not regularly shared across global organizations.

Quality: In the world of language and cultural adaptation, everyone is a critic. Industry professionals and corporate stakeholders need to do more to build an awareness that language and communication are not an objective science. Too often any dialogue or disagreement around the right word or the right tone devolves into defensive posturing and questioning the integrity of those producing translations. Such foolhardy and combative stances must stop. If we cannot engage in a productive conversation about language and culture among those of us who apply such knowledge to the business of globalization, how likely is it that the end result of our work will resonate with the recipients around the world?

Marketing: More often than not, businesses focus on globalizing their product but ignore the need for a nuanced approach to marketing and advertising. There is a prevailing notion that “if you translate it, they will come.” This is especially true when the budget is scarce and when just getting the budget to localize the product itself is an uphill battle. But this is often where the rubber meets the road. Localization vendors are too often willing to simply remain silent, taking the client’s money to localize the product, knowing all along that ultimately the client may not achieve its goals because it has ignored culturally-infused marketing.

It is not my objective here to provide a prescriptive set of guidelines for holistic globalization analysis. Others will know more than I. However, I do seek to open a more honest dialogue regarding the role our industry should and could be playing, not only in enhancing corporate global success but also in promoting a more deliberate sort of integrity in the way we value our skills and the people who have built and continue to drive this industry.

I have great respect and admiration for our industry, including the numerous brilliant minds who have built shiny, smart conveyor belts and for the sharp business buyers who manage complex content and localization workflows. But I would posit that there is more at stake here, namely the shared responsibility to call things what they are and to recognize the greater complexity that is required for successful globalization. We might be tempted to point at client success stories to justify our current silence, but who is to say that, despite global results being acceptable, they could not be extraordinary if a broader and more culturally oriented approach were employed?


-----------------------------------



Aaron Schliem: CEO, Glyph Language Services

As founder and CEO, Aaron has guided the strategic development of Glyph Language Services for nearly 15 years, positioning the firm as a visionary leader in global communications, cross-cultural learning strategy, and adaptation of creative media. A regular speaker at international conferences and workshops, Aaron has delivered innovative training in fields that range from executive compensation to mobile apps and games. Aaron has also published articles and interviews for Multilingual Magazine, The Content Wrangler and The Savvy Client's Guide to Translation Agencies. Senior executives in a wide range of industries rely on Aaron’s creative consulting to develop holistic approaches to globalization, focusing on global HR management, product design & adaptation, brand globalization, content pipelines and culturally-infused marketing. Aaron also has more than a decade of experience helping Fortune 50 companies build smart geopolitical and cultural sensitivity policies and programs. Outside of Glyph, Aaron is active in his local Madison community, serving on the board of a local queer theater company and on the school district’s special education parent advisory committee.

Thursday, March 2, 2017

Lilt Labs Response to my Critique of their MT Evaluation Study

I had a chat with Spence Green earlier this week to discuss the critique I wrote of their comparative MT evaluation, where I might have been a tad harsh, but I think we were both able to see each other's viewpoints a little bit better, and I summarize the conversation below. This is followed by an MT Evaluation Addendum that Lilt has added to the published study to provide further detail on the specific procedures they followed in their comparative evaluation tests. These details should be helpful to those who want to replicate, or modify and replicate, the test for themselves.

While I do largely stand by what I said, I think it is fair to allow Lilt to respond to the criticism to the degree that they wish to.  I think some of my characterization may have been overly harsh (like the sheep-wolf image for example). My preference would have been that Lilt wrote this response directly rather than having me summarize the conversation, but I hope that I have captured the gist of our chat (which was mostly amicable) accurately and fairly.

The driving force behind the study was ongoing requests from Lilt customers wanting to know how the various MT options compared. Spence (Lilt) said that he attempted to model their evaluation along the lines of the NIST "unrestricted track" evaluations and he stated repeatedly that they tried to be as transparent and open as possible so that others could replicate the tests for themselves. I did point out that one big difference here is that, unlike NIST, the company comparing itself to its competitors also happens to be managing the evaluation. Clearly a conflict of interest, but mild compared to what is going on in Washington DC now. Thus, however well-intentioned and transparent the effort may be, the chances of protest are always going to be high with such an initiative.

Spence did express his frustration with how little understanding there is of (big) data in the localization industry, which makes these kinds of assessments, and any discussion of core data issues, problematic.

Some of the specific clarifications he provided are listed below:
  • SwissAdmin was chosen as it was the "least bad" data that we could have used to enable us to conduct a test with some level of adaptation that everybody could replicate. Private datasets were not viable because the owners did not want to share the data with the larger community to enable test replication. We did argue over whether this data was really representative of localization content, but given the volumes of data needed and the need to have it easily available to all, there was not a better data resource available. To repeat, SwissAdmin was the least bad data available. Spence suggested the following to those who want to check this for themselves:
  1. Observe that the LREC14 paper has zero citations according to Google Scholar
  2. Compare adaptation gains to three different genres in their EMNLP15 paper.
  • It is clear that Google NMT is a "really good" system and sets a new bar for all the MT vendors to measure against, but Spence felt that it was not accurate to say that Google is news-focused, as it has a much broader data foundation from the extensive web crawling data acquisition that supports the engine. He also challenged my conclusion that since GNMT was so good it was not worth the effort with other systems. It is clear that an adaptation/customization effort with only 18,000 segments is unlikely to outperform Google, and we both agreed that most production MT systems in use will have much more data to support adaptation. (I will also mention that the odds of do-it-yourself Moses systems being able to compete on quality are now even lower, and that Moses practitioners should assess whether DIY is worth the time and resources at all, if they have not already realized this. Useful MT systems will almost by definition need an expert foundation and expert steering.)
  • He also pointed out that there are well-documented machine learning algorithms that can assess if the MT systems have certain data in their training sets and that these were used to determine that the SwissAdmin data was suitable for the test.
  • While they were aware of the potential for bias in the evaluation study, they made every effort to be as open as possible about the evaluation protocol and process. Others can replicate the test if they choose to.
  • Lilt provides an explanation of the difficulties associated with running the SDL Interactive system in a footnote in the Evaluation Addendum attached below.
  • We also agreed that 18,000 segments (used for adaptation/customization here) may not be quite enough to properly customize an MT engine and that in most successful MT engines a much larger volume of data is used to produce clear superiority over GNMT and other continuously evolving public MT engines. This again points to the difficulty of doing such a test with "public" data: the odds of finding the right data in sufficient volume, that everyone can use, are generally not very good.
  • Microsoft NMT was not initially included in the test because it was not clear how to gain access to it. He pointed me to the developer documentation to show that the Google docs made it much clearer and easier to determine how to access the NMT systems. This lack of documentation may have been addressed since the test was run.
  • One of my independent observations on who really "invented" Adaptive MT also seemed troublesome to Spence. I chose to focus on Microsoft and SDL patents as proof that others were thinking about this and had very developed ideas on how to implement this long before Lilt came into existence. However, he pointed out, quite correctly and much more accurately, that there were others who were discussing this approach from as early as the 1950s and that Martin Kay and Alan Melby, in particular, were discussing this in the 1970s. He pointed out a paper that details this and provides historical context on the foundational thinking behind Adaptive and Interactive MT. This to me does suggest that any patent in this area is largely built on the shoulders of these pioneers. Another paper by Martin Kay from 1980 provides the basic Adaptive MT concept on page 18. Also, he made me aware of Transtype (the first statistical interactive/adaptive system, a project that began in 1997 in Canada). For those interested you can get details on the project here:

Finally, all other vendors are welcome to reproduce, submit, and post results. We even welcome an independent third-party taking over this evaluation. 
 Spence Green


It may be that some type of comparative evaluation will become more important for the business translation industry as users weigh different MT technology options, and possibly could provide some insight on relative strengths and weaknesses. However, the NIST evaluation model is very difficult to implement in the business translation (localization) use case, and I am not sure if it even makes sense here. There may be an opportunity for a more independent body that has some MT expertise to provide a view on comparative options, but we should understand that MT systems can also be tweaked and adjusted to meet specific production goals and that the entire MT system development process is dynamic and evolving in best practice situations. Source data can and should be modified and analyzed to get better results, systems should be boosted in weak areas after initial tests and continuously improved with active post-editor involvement to build long-term production advantage, rather than just doing this type of instant snapshot comparison. What might matter much more in a localization setting is how quickly and easily a basic MT system can be updated and enhanced to be useful in business translation use-case production scenarios. This kind of a quick snapshot view has a very low value in that kind of a user scenario where it is understood that any MT system needs more work than just throwing some TM at it BEFORE putting it into production.



-------------------------



Experimental Design
We evaluate all machine translation systems for English-French and English-German. We report case-insensitive BLEU-4 [2], which is computed by the mteval scoring script from the Stanford University open source toolkit Phrasal (https://github.com/stanfordnlp/phrasal). NIST tokenization was applied to both the system outputs and the reference translations.
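As a rough illustration of the metric reported here, the following Python sketch computes corpus-level, case-insensitive BLEU-4 with a single reference per segment. It is not the Phrasal mteval script: plain whitespace splitting stands in for NIST tokenization, and the function names are assumptions made for this example, so its scores should not be expected to match the published figures exactly.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(hypotheses, references):
    """Corpus-level, case-insensitive BLEU-4 (one reference per segment).

    Assumption: whitespace tokenization replaces NIST tokenization.
    Returns a score on the 0-100 scale.
    """
    matches = [0] * 4
    totals = [0] * 4
    hyp_len = 0
    ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens = hyp.lower().split()  # lowercasing makes it case-insensitive
        ref_tokens = ref.lower().split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, 5):
            hyp_counts = ngram_counts(hyp_tokens, n)
            ref_counts = ngram_counts(ref_tokens, n)
            # Clipped counts: an n-gram is credited at most as often
            # as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # no smoothing in this sketch
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    # Brevity penalty: penalize output shorter than the reference.
    penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * penalty * math.exp(log_precision)
```

The counts are accumulated over the whole test set before the precisions are combined, which is what distinguishes a corpus-level score from an average of per-sentence scores.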

We simulate the scenario where the translator translates the evaluation data sequentially from the beginning to the end. We assume that she makes full use of the resources the corresponding solutions have to offer by leveraging the translation memory as adaptation data and by incremental adaptation, where the translation system learns from every confirmed segment.

System outputs and scripts to automatically download and split the test data are available at: https://github.com/lilt/labs.

System Training
Production API keys and systems are used in all experiments. Since commercial systems are improved from time to time, we record the date on which the system outputs were generated.

Lilt
The Lilt baseline system available through the REST API with a production API key. The system can be reproduced with the following series of API calls:
  • POST /mem/create   (create new empty Memory)
  • For each source segment in the test set:
    • GET /tr  (translate test segment)
Date: 2016-12-28

Lilt adapted
The Lilt adaptive system available through the REST API with a production API key. The system simulates a scenario in which an extant corpus of source/target data is added for training prior to translating the test set. The system can be reproduced with the following series of API calls:
  • POST /mem/create   (create new empty Memory)
  • For each source/target pair in the TM data:
    • POST /mem  (update Memory with source/target pair)
  • For each source segment in the test set:
    • GET /tr  (translate test segment)
Date: 2017-01-06

Lilt Interactive
The Lilt interactive, adaptive system available through the REST API with a production API key. The system simulates a scenario in which an extant corpus of source/target data is added for training prior to translating the test set. To simulate feedback from a human translator, each reference translation for each source sentence in the test set is added to the Memory after decoding. The system can be reproduced with the following series of API calls:
  • POST /mem/create   (create new empty Memory)
  • For each source/target pair in the TM data:
    • POST /mem  (update Memory with source/target pair)
  • For each source segment in the test set:
    • GET /tr  (translate test segment)
    • POST /mem (update Memory with source/target pair)
Date: 2017-01-04
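The interactive protocol above can be sketched as a simple loop. This is only an outline under stated assumptions: `translate` and `update_memory` are hypothetical callables standing in for the `GET /tr` and `POST /mem` requests, and no real client library or request format is implied.

```python
def simulate_interactive(tm_pairs, test_pairs, translate, update_memory):
    """Simulate a translator working through the test set in order.

    tm_pairs:      (source, target) pairs from the translation memory
    test_pairs:    (source, reference) pairs from the test set
    translate:     hypothetical callable, source -> MT output (stands in for GET /tr)
    update_memory: hypothetical callable, (source, target) -> None (stands in for POST /mem)
    """
    # Batch adaptation: load the existing TM before translating anything.
    for source, target in tm_pairs:
        update_memory(source, target)

    outputs = []
    for source, reference in test_pairs:
        # Translate first, so the system never sees the reference
        # for the segment it is currently being scored on.
        outputs.append(translate(source))
        # Then confirm the segment, as a human translator would.
        update_memory(source, reference)
    return outputs
```

Dropping the final `update_memory` call inside the second loop gives the "Lilt adapted" condition, and dropping the first loop as well gives the baseline.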

Google
Google’s statistical phrase-based machine translation system. The system can be reproduced by querying the Translate API:
  • For each source segment in the test set:
    • GET https://translation.googleapis.com/language/translate/v2?model=base
Date: 2016-12-28

Google neural
Google’s neural machine translation system (GNMT). The system can be reproduced by querying the Premium API:
  • For each source segment in the test set:
    • GET https://translation.googleapis.com/language/translate/v2?model=nmt
Date: 2016-12-28
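For readers who want to reproduce the two Google conditions, the request URL might be assembled along the following lines. This is a sketch, not part of the original study: the helper name is hypothetical, the query parameters (`q`, `source`, `target`, `model`, `key`) are assumed from the public Translate v2 REST interface, and the API key shown in the usage note is a placeholder.

```python
from urllib.parse import urlencode

ENDPOINT = "https://translation.googleapis.com/language/translate/v2"


def build_translate_url(text, source, target, model, api_key):
    """Build a GET URL for one segment (hypothetical helper).

    model: "base" for the phrase-based system, "nmt" for the neural system,
           matching the two conditions listed in the addendum.
    """
    params = {
        "q": text,
        "source": source,
        "target": target,
        "model": model,
        "key": api_key,  # placeholder; a real key is required to send the request
    }
    return ENDPOINT + "?" + urlencode(params)
```

For example, `build_translate_url("Hello world", "en", "de", "nmt", "YOUR_KEY")` produces a URL whose only difference from the phrase-based request is the `model` value, which keeps the two conditions otherwise identical.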

Microsoft
Microsoft’s baseline statistical machine translation system. The system can be reproduced by querying the Text Translation API:
  • For each source segment in the test set:
    • GET /Translate
Date: 2016-12-28

Microsoft adapted
Microsoft’s statistical machine translation system. The system simulates a scenario in which an extant corpus of source/target data is added for training prior to translating the test set.  We first create a new general category project on Microsoft Translator Hub, then a new system within that project and upload the translation memory as training data. We do not provide any tuning or test data so that they are selected automatically. We let the training process complete and then deploy the system (e.g., with category id CATEGORY_ID). We then decode the test set by querying the Text Translation API, passing the specifier of the deployed system as category id:
  • For each source segment in the test set:
    • GET /Translate?category=CATEGORY_ID
Date: 2016-12-30 (after the migration of Microsoft Translator to the Azure portal)

Microsoft neural
  • For each source segment in the test set:
    • GET /Translate?category=generalnn
Date: 2017-02-20

Systran neural
Systran’s “Pure Neural” neural machine translation system. The system can be reproduced through the demo website. We manually copied and pasted the source into the website in batches of no more than 2000 characters. We verified that line breaks were respected and that batching had no impact on the translation result. This required considerable manual effort and was performed over the course of several days.
Date(s): en-de: 2016-12-29 - 2016-12-30; en-fr: 2016-12-30 - 2017-01-02

SDL
SDL’s Language Cloud machine translation system. The system can be reproduced through a pre-translation batch task in Trados Studio 2017.
Date: 2017-01-03

SDL adapted
SDL’s “AdaptiveMT” machine translation system, which is accessed through Trados Studio 2017. The system can be reproduced by first creating a new AdaptiveMT engine specific to a new project and then pre-translating the test set. The new project is initialized with the TM data. We assume that the local TM data is propagated to the AdaptiveMT engine for online retraining. The pre-translation batch task is used to generate translations for all non-exact matches. Adaptation is performed on the TM content. In the adaptation-based experiments, we did not confirm each segment with a reference translation due to the amount of manual work that would have been needed in Trados Studio 2017. (1)
(1) We were unable to produce an SDL interactive system comparable to Lilt interactive. We first tried confirming reference translations in Trados Studio. However, we found that model updates often require a minute or more of processing. Suppose that pasting the reference into the UI requires 15 seconds, and the model update requires 60 seconds. For en-de, 1299 * 75 / 3600 = 27.1 hours would have been required to translate the test set. We then attempted to write interface macros to automate the translation and confirmation of segments in the UI, but the variability of the model updates and other UI factors such as scrolling prevented successful automation of the process. The absence of a translation API prevented crowd completion of the task with Amazon Mechanical Turk.
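The estimate in the footnote is easy to re-run with other timings. The sketch below simply restates that arithmetic; the function name is made up for this example, and the 15-second and 60-second figures are the footnote's own suppositions rather than measurements.

```python
def manual_hours(segments, paste_seconds, update_seconds):
    """Hours needed to confirm every segment by hand in the UI."""
    seconds_per_segment = paste_seconds + update_seconds
    return segments * seconds_per_segment / 3600


# en-de test set: 1299 segments at 15 s paste + 60 s model update each
print(round(manual_hours(1299, 15, 60), 1))  # 27.1
```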

The Lilt adapted, Microsoft adapted and SDL adapted systems are most comparable as they were adapted in batch mode, namely by uploading all TM data, allowing training to complete, and then decoding the test set. Of course, other essential yet non-user-modifiable factors such as the baseline corpora, optimization procedures, and optimization criteria can and probably do differ.

Test Corpora
We defined four requirements for the test corpus:
  1. It is representative of typical paid translation work
  2. It is not used in the training data for any of the competing translation systems
  3. The reference translations were not produced by post-editing from one of the competing machine translation solutions
  4. It is large enough to permit model adaptation

Since all systems in the evaluation are commercial production systems, we could neither enforce a common data condition nor ensure the exclusion of test data from the baseline corpora as in requirement (2). Nevertheless, in practice it is relatively easy to detect the inclusion of test data in a system’s training corpus via the following procedure:
  1. Select a candidate test dataset
  2. Decode test set with all unadapted systems and score with BLEU
  3. Identify systems that deviate significantly from the mean (in our case, by two standard deviations)
  4. If a system exists in (3):
    1. Sample a subset of sentences and compare the MT output to the references.
    2. If reference translations are present,
      1. Eliminate the candidate test dataset and go to (1)
  5. Accept the candidate test dataset
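Steps 2 and 3 of this procedure can be sketched in a few lines of Python. The addendum does not say whether the sample or the population standard deviation was used, so the choice of `statistics.stdev` here is an assumption, as are the function name and the dictionary input format. The manual inspection in step 4 is not automated.

```python
from statistics import mean, stdev


def flag_outlier_systems(bleu_by_system, threshold=2.0):
    """Return the systems whose BLEU deviates from the mean by more
    than `threshold` standard deviations (step 3 of the procedure).

    bleu_by_system: dict mapping system name -> BLEU on the candidate
                    test set, for unadapted systems only (needs >= 2 entries).
    Assumption: sample standard deviation; the source does not specify.
    """
    scores = list(bleu_by_system.values())
    mu = mean(scores)
    sigma = stdev(scores)
    if sigma == 0:
        return []  # all systems scored identically, nothing to flag
    return [
        name
        for name, score in bleu_by_system.items()
        if abs(score - mu) > threshold * sigma
    ]
```

A flagged system is only a suspect: the procedure still calls for sampling sentences and comparing the MT output with the references before the dataset is rejected. With only a handful of systems a single outlier also inflates the standard deviation itself, so a two-sigma threshold is a fairly conservative trigger.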

Starting in November 2016, we applied this procedure to the eight public datasets described in Appendix B. The ninth corpus that we evaluated was SwissAdmin, which both satisfied our requirements and passed our data selection procedure.  

SwissAdmin [http://www.latl.unige.ch/swissadmin/] is a multilingual collection of press releases from the Swiss government from 1997-2013. We used the most recent press releases. We split the data chronologically, reserving the last 1300 segments of the 2013 articles as English-German test data, and the last 1320 segments as English-French test data. Chronological splits are standard in MT research to account for changes in language use over time. The test sets were additionally filtered to remove a single segment that contained more than 200 tokens. The remainder of articles from 2011 to 2013 were reserved as in-domain data for system adaptation.
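A chronological split of this kind might look like the sketch below. It is a simplification rather than the script in the Lilt repository: the `(date, text)` tuple format, the function name, and the whitespace token count are all assumptions made for illustration.

```python
def chronological_split(segments, test_size, max_tokens=200):
    """Split dated segments into adaptation (TM) data and test data.

    segments:  list of (iso_date, text) tuples, e.g. ("2013-11-05", "...")
    test_size: number of most recent segments to reserve for testing (> 0)
    Returns (tm_segments, test_segments).
    """
    # ISO dates sort correctly as strings; sorted() is stable, so
    # segments sharing a date keep their original order.
    ordered = sorted(segments, key=lambda item: item[0])
    tm_segments = ordered[:-test_size]
    test_segments = ordered[-test_size:]
    # Drop overly long test segments (more than max_tokens tokens).
    test_segments = [
        item for item in test_segments if len(item[1].split()) <= max_tokens
    ]
    return tm_segments, test_segments
```

Reserving the newest segments for testing means the adaptation data never contains text written after the test material, which is the point of splitting by date rather than at random.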

SwissAdmin | en-de TM          | en-de test      | en-fr TM          | en-fr test
#segments  | 18,621            | 1,299           | 18,163            | 1,319
#words     | 548,435 / 482,692 | 39,196 / 34,797 | 543,815 / 600,585 | 40,139 / 44,874


Results

(Updated with Microsoft neural MT)

SwissAdmin        | English->German | English->French
Lilt              | 23.2            | 30.4
Lilt adapted      | 27.7            | 33.0
Lilt interactive  | 28.2            | 33.1
Google            | 23.7            | 31.9
Google neural     | 28.6            | 33.2
Microsoft         | 24.8            | 29.0
Microsoft adapted | 27.6            | 29.8
Microsoft neural  | 23.8            | 30.7
Systran neural    | 24.2            | 31.0
SDL               | 22.6            | 30.4
SDL adapted       | 23.8            | 30.4

Appendix B: Candidate Datasets
The following datasets were evaluated and rejected according to the procedure specified in the Test Corpora section:
  • JRC-Acquis
  • PANACEA English-French
  • IULA Spanish-English Technical Corpus
  • MuchMore Springer Bilingual Corpus
  • WMT Biomedical task
  • Autodesk Post-editing Corpus
  • PatTR
  • Travel domain data (from booking.com and elsewhere) crawled by Lilt