Thursday, September 22, 2016

Comparing Neural MT, SMT and RBMT – The SYSTRAN Perspective

This is the second part of an interview with Jean Senellart (JAS), Global CTO and SYSTRAN SAS Director General. The first part can be found here: A Deep Dive into SYSTRAN’s Neural Machine Translation (NMT) Technology.

Those translators who accuse MT vendors of stealing or undermining their jobs should take note that SYSTRAN is the largest independent MT vendor, a position it has held for most of its existence, yet it has never generated more than €20 million in annual revenue. To me, this suggests that MT is mostly being used for different kinds of translation tasks and is hardly taking any jobs away. The problem has been much more related to unscrupulous or incompetent LSPs who used MT improperly in rate negotiations. MT is hugely successful for those companies who find other ways to monetize the technology, as I pointed out in The Larger Context Translation Market. This does suggest that MT has huge value in enabling global communication and commerce, and that even in its less-than-perfect state it is considered valuable by many who might otherwise need to acquire human translation services. If anything, MT vendors are the ones trying hardest to develop technology that is actually useful to professional translators, as the Lilt offering shows and as this new SYSTRAN offering also promises to be. The new MT technology is reaching a point where it is becoming rapidly responsive to corrective feedback and thus much more useful in professional translation use cases. The forces affecting translator jobs and work quality are much more complex and harder to pin down, as I have also mentioned in the past.

In my conversations with Jean, I realized that he is one of the few people in the "MT industry" who has deep knowledge of, and production experience with, all three MT paradigms. Thus, I tried to get him to share both his practical, experience-based perspective and his philosophical perspective on the three approaches. I found his comments fascinating and thought that it would be worth highlighting them separately in this post. Jean (through SYSTRAN) is one of the rare practitioners who has produced commercial release versions of, say, French <> English MT systems using all three MT paradigms, and more if you count the SPE and NPE “hybrid” variants, where more than one approach is used in a single production process.

Again in this post, I have tried as much as possible to keep Jean Senellart’s direct quotes in here to avoid any misinterpretation. My comments are in italics when they do occur within his quotes.


Comparing The Three Approaches; The Practical View

Some interesting comparative comments made by Jean about his actual implementation experience with the three methodologies (RBMT, SMT, NMT):

“The NMT approach is extremely smart to learn the language structure but is not as good at memorizing long lists of terminology as RBMT or SMT was. With RBMT, the terminology coverage was right, but the structure was clumsy – with SMT or SPE, we had an “in-between” situation where we got the illusion of fluency, but sometimes at the price of a complete mistranslation, and a strange ability to memorize huge lists but without any consideration of their linguistic nature. With NMT, the language structure is excellent, as if the neural network really deeply understands the grammar of the language – and introducing [greater] support of terminology was the missing link to the previous technologies.”

(This inability of NMT to handle large vocabulary lists is considered one of the main weaknesses of the technology currently. Here is another reference discussing this issue. However, it appears that SYSTRAN has developed some kind of a solution to address this issue.)

“What is interesting with NMT is that it seems far more tolerant than PBSMT (Phrase-Based SMT) to noisy data. However, it does need far less data than PBSMT to learn – so we can afford to provide only data for which we know we have a good alignment quality. Regarding the domain or the quality [of MT output of] the languages, we are for the moment trying to be as broad as possible [rather than focusing on specialized domains].”

In terms of training data volume, JAS said: “This is still very empirical, but we can outperform Google or Microsoft MT on their best language pairs using only 5M translation units – and we have a very good quality (BLEU score about 45** ) for languages like EN>FR with only 1M TU. I would say we need 1/5 of the data necessary to train SMT. Also, generic translation engines like Google or Bing Translate are using billions of words for their language models, here we need probably less than 1/100th.”

(**I think it bears saying that I fully expect that the BLEU here is measured with great care and competence, unlike what we see so often with Moses practitioners and LSPs in general who assume scores of 75+ are needed for the technology to be usable.

The ability of the new MT technology to improve rapidly with small amounts of good quality training data and small amounts of corrective feedback suggests that we may be approaching new thresholds in the use of MT for professional use.)
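For readers who have not worked with BLEU directly, here is a minimal Python sketch of the kind of calculation behind scores like the one quoted above: clipped n-gram precision up to 4-grams, combined with a brevity penalty. It assumes whitespace tokenization and a single reference per sentence, and the function names are mine, chosen for illustration. It is not SYSTRAN's evaluation setup, and serious evaluations normally rely on established tools with carefully controlled tokenization.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram of length n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumption: sentences are plain strings tokenized on whitespace.
    """
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = 0
    ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clipping: an n-gram is credited at most as often as it
            # appears in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Penalize output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


print(bleu(["the cat sat on the mat today"], ["the cat sat on the mat"]))
```

Because the score is a geometric mean over 1- to 4-gram matches against a single reference, even very good output rarely gets close to 100, which is why expectations of 75+ are usually a sign that something is off in the measurement.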

Comparing RBMT, SMT, and NMT: The Philosophical View

When I probed more deeply into the differences between these MT approaches, (since really SYSTRAN is the only company who has real experience in all 3), JAS said: “I would more compare them in terms of what they are trying to do, and on their ability to learn.” His explanation reflects his long-term experience and expertise and is worth careful reading. I have left the response in his own words as much as possible.

“RBMT, fundamentally (unlike the other two), has an ulterior motive: it attempts to describe the actual translation process. And by doing that, it has been trying to solve a far more complicated challenge than just machine translation; it tries to decompose [and deconstruct and analyze] and explain how a translation is produced. I still believe that this goal is the ultimate goal of any [automated translation] system. For many applications, in particular, language learning, but also for post-editing, the [automated] system would be far more valuable if it could produce not only the translation but also explain the translation.”

“In addition, RBMT systems are facing three main limitations in language [modeling] which are:

1) the intrinsic ambiguity of language for a machine, which is not the same for a human who has access to meaning [and a sense for the underlying semantics],

2) the exception-based grammar system, and

3) the huge, contextual and always expanding volume of terminology units.”

“Technically, RBMT systems might have different levels of complexity depending on the linguistic formalism being used, making it hard to compare with the others (SMT, NMT), so I would rather say that one of the main reasons for the limitations of a pure RBMT system lies in its [higher reaching] goal. The fact is that fully describing a language is a very complicated matter, and there is no language today for which we can claim a full linguistic description.”

“SMT came with an extraordinary ability to memorize from its exposure to existing translations – and with this ability, it brought a partial solution to the first challenge mentioned above, and a very good solution to the third one – the handling of terminology – however, it mostly ignores the modeling of the grammar. Technically, I think SMT is the most difficult of the 3 approaches: it combines many algorithms to optimize the MT process, and it is hard work to deal with the huge database of [relevant training] corpus.”

“NMT has access to the meaning and is dealing well with modeling of human language grammar. In terms of difficulty, NMT engines are probably the simplest to implement: a full training of an NMT engine involves only several thousands of lines of code. The simplicity of implementation is, however, hiding the fact that we/nobody knows why it is so effective.”

“I would use an analogy of a human learning to drive a car to explain this more fully:

- The Rule-based approach will attempt to provide a full modeling of the car dynamic, on how the engine is connected to the wheel, on the effect of acceleration in the trajectory, etc. This is very complicated. (And possibly impossible to model in totality.)

- The Statistical approach, will use data from past experience and will try to compare a new situation with a past situation and will decide on the action based on this large database [of known experience]. This is a huge task and very difficult to implement. (And can only be as good as the database it learns from.)

- The Neural approach, with a limited access to the phenomenon involved, or with limited ability to remember, will build its own “thinking” system to optimize the driving experience, it will actually learn to drive the car, build reflexes – but will not be able to explain why and how such decisions are being made, and will not be able to leverage local knowledge – for instance that at a specific bend on the road in very specific weather condition, it had to anticipate [braking] because it is particularly dangerous, etc... This approach is surprisingly very simple and thanks to computation power evolution has become more accessible.”

“Today, this last approach (NMT) is clearly the most promising but will need to integrate the second (SMT) to be more robust, and eventually to deal with the first one (RBMT) to be able to not only make choices but also explain them.”


Comparing NMT to Adaptive MT

When probed about this, JAS said: “Adaptive MT is an innovative concept based on the current SMT paradigm – it is, however, a concept that is quite naturally embedded in the NMT paradigm; of course, there will be work needed to be done to make it work as nicely as the Lilt product does it. But, my point is that NMT (and not just from Systran) will bring a far more intuitive solution to this issue of continuous adaptive learning, because it is built for that: on a trained model, we can tune the model without any tricks with feedback of one single sentence – and produce a translation which immediately adapts to user input.”

The latest generation of MT technology, especially NMT and Adaptive MT, looks like a major step forward in enabling the expanding use of MT in professional translation settings. With continuing exploration and discovery in the fields of NLP, artificial intelligence and machine intelligence, I think we may be in for some exciting times ahead as these discoveries benefit MT research. Hopefully, the focus will shift to making new content multilingual and solving new kinds of translation challenges, especially in speech and video. I believe that we will see more of the kinds of linguistic steering activities we are seeing in motion at eBay and that there will always be a role for competent linguists and translators.

Jean Senellart, CEO, SYSTRAN SA

The first part of the SYSTRAN interview can be found at A Deep Dive into SYSTRAN’s Neural Machine Translation (NMT) Technology

Wednesday, September 21, 2016

A Deep Dive into SYSTRAN’s Neural Machine Translation (NMT) Technology

One of the wonderful things about my current independent status is the ability to engage deeply with other MT experts who were previously off limits, because competing MT vendors don't usually chat with open hearts and open cloaks. MT is tough to do well and I think the heavy lifting should be left to people who are committed for the long run, and who are willing to play, invest and experiment in spite of regular failure. This is how humans who endure and persist learn to solve complex problems.

This is Part 1 of a two part post on the SYSTRAN NMT product announcement. The second part will focus on comparing NMT with RBMT and SMT and also with the latest Adaptive MT initiatives. It can be found here: Comparing Neural MT, SMT and RBMT – The SYSTRAN Perspective

Press releases are so filled with marketing-speak as to be completely useless to most of us. They have a lot of words but after you read them you realize you really don't know much more than you got from the headline. So, I recently had a conversation with Jean Senellart, Global CTO and SYSTRAN SAS Director General, to find out more about their new NMT technology. He was very forthcoming, and responded to all my questions with useful details, anecdotes and enthusiasm. The conversation only reinforced in my mind that "real MT system development" is something best left to experts, and not something that even large LSPs should dabble with. The reality and complexity of NMT development pushes the limits of MT even further away from the DIY mirage.

In the text below, I have put quotes around everything that I have gotten directly from SYSTRAN material or from Jean Senellart (JAS) to make it clear that I am not interpreting. I have done some minor editing to facilitate readability and "English flow" and added comments in italics within his quotes where this is done.


The New Product Line

JAS clarified several points about the overall evolution of the SYSTRAN product line.
SYSTRAN intends to keep all the existing MT system configurations they have in addition to the new NMT options. So they will have all of the following options:
  • RBMT :- the rule-based legacy technology
  • SMT :- Moses-based generation of engines that they have released for some language pairs over the last few years
  • SPE :- Statistical Post-Editing translation engines - that were introduced in 2007 as the first implementation combining Rule-Based plus Phrase-Based Statistical systems.
  • NMT :- the purely neural machine translation engines that they just announced.
  • NPE :- stands for « Neural Post-Editing » and it is the replication of what they did in SPE using Phrase-Based machine translation, but now using Neural Machine Translation instead of SMT for the second step in the process. They are now using a neural network to correct and improve output of a rule-based engine.
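
The two-step structure shared by SPE and NPE can be pictured with a short Python sketch. Everything here is a hypothetical stand-in of my own (the function names, the tiny lexicon, the single correction rule); it only illustrates the idea of a second engine correcting the output of a rule-based first pass, and says nothing about how SYSTRAN actually implements either step.

```python
from typing import Callable

Translator = Callable[[str], str]


def make_post_editing_pipeline(first_pass: Translator, post_editor: Translator) -> Translator:
    """Chain a first-pass engine with a second engine that corrects its draft."""
    def translate(source: str) -> str:
        draft = first_pass(source)
        return post_editor(draft)
    return translate


def toy_rule_based(source: str) -> str:
    # Hypothetical stand-in for an RBMT engine: word-for-word lexicon lookup.
    lexicon = {"le": "the", "chat": "cat", "noir": "black"}
    return " ".join(lexicon.get(word, word) for word in source.lower().split())


def toy_post_editor(draft: str) -> str:
    # Hypothetical stand-in for the statistical (SPE) or neural (NPE) model
    # that would be trained to map rule-based drafts to better translations.
    return draft.replace("cat black", "black cat")


npe = make_post_editing_pipeline(toy_rule_based, toy_post_editor)
print(npe("Le chat noir"))  # the black cat
```

Swapping the second function is the only change needed to move from an SPE-style to an NPE-style setup in this picture, which is the sense in which NPE is described above as a replication of SPE.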

They will preserve exactly the same set of APIs and features (like support of a user dictionary) around these new NMT modules so that these historical linguistic investments are fully interchangeable across the product line.

JAS said: "From my intuition, there will still be situations where we will prefer to continue to offer the older solutions: for instance, when we will need high-throughput on a standard CPU server, or for low-resource languages for which we already have some RBMT solution, or for customers currently using heavily customized engines." However, they expect that NMT will proliferate even in the small memory footprint environment, and even though they expect that NMT will eventually prevail, they will keep the other options available for their existing customer base.

The NMT initiative focused on languages that were most important to their customers, or were known to be difficult historically, or were currently presenting special challenges not easily solved with the legacy solutions. So as expected the initial focus was on EN<>FR, EN<>AR, EN<>ZH, EN<>KO, FR<>KO. All of these already show promise, especially the KO <> EN, FR combinations which showed the most dramatic improvements, and can be expected to improve further as the technology matures.

However, DE<>EN is one of the most challenging language pairs, as Jean said: "we have found the way to deal with the morphology, but the compounding is still problematic. Results are not bad though, but we don't have the same jump of quality yet for this language pair."


The Best Results

So where have they seen the most promising results? As Jean said: "The most impressive results I have seen are in complicated language pairs like English-Korean, however, even for Arabic-English, or French-English the difference of quality between our legacy engines, online engines and this new generation is impressive.

What I found the most spectacular is that the translation is naturally fluent at the full sentence level - while we have been (historically) used to some feeling of local fluency but not sounding fully right at the sentence level. Also, there are some cases where the translation is going quite away from the source structure - and we can see some real "rewriting" going on."

Here are some examples comparing KO>EN sentences with NMT, SYSTRAN V8 (the current generation) and Google:

And here are some examples of where the NMT seems to make linguistically informed decisions and changes the sentence structure away from the source to produce a better translation.


The Initial Release

When the NMT technology is released in October, SYSTRAN expects to release about 40 language pairs (mostly European and Major Asian languages related to English and French) with an additional 10 still in development to be released shortly after.

As JAS stated: "We will be delivering high quality generic NMT engines that will be instantly ready for "specialization" (I am making a difference with customization (which implies training), because the nature of the adaptation to the customer domain is very different with NMT)."

Also very important for the existing customer base is that all the old dictionaries developed over many years for RBMT / SMT systems will be useful for NMT systems. As Jean confirmed: "Yes - all of our existing resources are being used in the training of the NMT engines. It is worth noting that dictionaries are not the only components from our legacy modules we are re-using; the morphological analysis and named entity recognition are also key parts of our models."

With regard to the User Interface for the new NMT products, JAS confirmed: "the first generation will fully integrate in the current translation infrastructure we have - we had to replace of course the back-end engines, but also some intermediate middle components. However the GUI is preserved. We have started thinking about the next generation of UI which will fully leverage the new features of this technology, and will be targeting a release next year."

The official SYSTRAN marketing blurb states the following:
"SYSTRAN exploits the capacity NMT engines have to learn from qualitative data by allowing translation models to be enriched each time the user submits a correction. SYSTRAN has always sought to provide solutions adjusted to the terminology and business of its customers by training its engines on customer data. Today SYSTRAN offers a self-specialized engine, which is continuously learning on the data provided."


Driving MT Engine Improvements

Jean also informed me that NMT has a simple architecture but the number of options available to tune the engines are huge and they have not found one single approach that is suitable for all languages. Options that can make a significant difference include, "type of tokenization, introduction of additional features for instance for guiding the alignment, etc...

So far we have not found one single paradigm that works for all languages, and each language pair seems to have its own preference. What we can observe is that unlike SMT where the nature of the parameters were numerical and not really intuitive, here it seems that we can get major improvements by really considering the nature of the language pair we are dealing with."

So do these corrective changes require re-training or is there an instant dictionary-like capability that works right away? "Yes - this is a cool new feature. We can introduce feedback to the engine, sentence by sentence. It does not need retraining, we are just feeding the extra sentence and the model instantly adapts. Of course the user dictionary is also a quick and easy option. The ability of an NMT engine to "specialize" very easily and even to adapt from one single example is very impressive."
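
To give a feel for what adapting from a single example can mean for a trained model, here is a deliberately tiny Python sketch. It assumes one plausible reading of the description above, namely a few gradient steps on one corrected example applied to an already trained model, and it uses a toy softmax over candidate target words rather than a real NMT network. The class, its methods and the vocabulary are all hypothetical illustrations of mine, not SYSTRAN's mechanism.

```python
import math


class ToyLexicalModel:
    """Toy model: a softmax over candidate target words for each source word."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.weights = {}  # (source_word, target_word) -> score, default 0.0

    def probabilities(self, source_word):
        scores = [self.weights.get((source_word, t), 0.0) for t in self.targets]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        norm = sum(exps)
        return {t: e / norm for t, e in zip(self.targets, exps)}

    def translate(self, source_word):
        probs = self.probabilities(source_word)
        return max(probs, key=probs.get)

    def adapt(self, source_word, corrected_target, learning_rate=1.0, steps=5):
        """Take a few gradient steps on one corrected example (cross-entropy loss)."""
        for _ in range(steps):
            probs = self.probabilities(source_word)
            for t in self.targets:
                # Gradient of cross-entropy w.r.t. each score: p(t) - 1[t is correct]
                gradient = probs[t] - (1.0 if t == corrected_target else 0.0)
                key = (source_word, t)
                self.weights[key] = self.weights.get(key, 0.0) - learning_rate * gradient


model = ToyLexicalModel(["bank", "shore"])
model.weights[("rive", "bank")] = 2.0   # pretend this came from generic training
print(model.translate("rive"))          # bank
model.adapt("rive", "shore")            # one user correction, no full retraining
print(model.translate("rive"))          # shore
```

The point of the toy is only that the existing parameters are nudged by the new example rather than rebuilt from the full training set, which is what makes this kind of feedback loop fast.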


Detailed MT Quality Metrics

"What is interesting is that we get major score improvement for systems that have not been tuned for the metrics they are evaluated against - for instance, here are some results on English-Korean using the RIBES metric."

"In general, we see improvements of more than 5 BLEU points over current baselines."

"The most satisfying result however is that the human evaluation is always confirming the results - for instance for the same language pair shown below - when doing pair-wise human ranking we obtained the following results. (RE is human reference translation, NM is NMT, BI is Bing, GO is Google, NA is Naver, and V8 our current generation). It reads "when a system A was in a ranking comparison with a system B (or the reference), how many times was it preferred by the human?"
"What is interesting in the cross comparison is that we rank engines by pair: when we blindly show a Google and a V8 translation, we see which one the user prefers. The most interesting row however is the second one:

      RE      BI      GO      NA      V8
NM    46.4    74.5    73.9    72      63.1

When comparing NMT output with the human reference translation, 46% of the time NMT is preferred (which is not bad, that means about one sentence out of two, the human does not prefer the Reference HT over NMT!), when comparing NMT and Google - 74% of the time, the preference goes to NMT, etc..."
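
To make the arithmetic behind this kind of ranking table concrete, here is a small Python sketch that turns a list of blind pairwise judgments into preference percentages. The function name and the sample judgments are invented for illustration and are not SYSTRAN's data or tooling; the sketch also assumes every judgment names a winner, since the quote above does not say how ties were handled.

```python
from collections import defaultdict


def preference_rates(judgments):
    """Compute how often each system was preferred in each pairing.

    judgments: iterable of (system_a, system_b, preferred) tuples.
    Returns {(x, y): percentage of x-vs-y comparisons in which x was preferred}.
    """
    wins = defaultdict(int)
    comparisons = defaultdict(int)
    for a, b, preferred in judgments:
        if preferred not in (a, b):
            raise ValueError(f"preferred system {preferred!r} is not in the pair")
        other = b if preferred == a else a
        comparisons[(a, b)] += 1
        comparisons[(b, a)] += 1
        wins[(preferred, other)] += 1
    return {pair: 100.0 * wins[pair] / count for pair, count in comparisons.items()}


# Invented sample: four blind comparisons between NM (NMT) and GO (Google).
sample = [
    ("NM", "GO", "NM"),
    ("NM", "GO", "NM"),
    ("NM", "GO", "GO"),
    ("GO", "NM", "NM"),
]
rates = preference_rates(sample)
print(rates[("NM", "GO")])  # 75.0
print(rates[("GO", "NM")])  # 25.0
```

Each cell in a table like the one above would correspond to one such (row system, column system) percentage.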


The Challenges

The computing requirements have been described by many as a particular challenge. Even with GPUs, training an NMT engine is a long task. As Jean says: "and when we have to wait 3 weeks for a full training, we do need to be careful with the training workflow and explore as many options as possible in parallel."

"Artificial neural networks have a terrific potential but they also have limitations, particularly to understand rare words. SYSTRAN mitigates this weakness by combining artificial neural network and its current terminology technology that will feed the machine and improve its ability to translate."

"It is important to point out that graphic processing units (GPUs) are required to operate the new engine. Also, to quickly make this technology available, SYSTRAN will provide the market with a ready-to-use solution using an appliance (that is to say hardware and software integrated into a single offering). In addition, the overall trend is that desktops will integrate GPUs in the near future as some smartphones already do (the latest iPhone can manage neural models). As [server] size is becoming less and less of an issue, NMT engines will easily be able to run locally on an enterprise server."

As mentioned earlier, there are still some languages where the optimal NMT formula is still being unraveled, e.g. DE <> EN, but these are still early days and I think we can expect that the research community will zero in on these tough problems, and that at some point at least partial solutions will be available even if complete solutions are not.


Production User Case Studies

When asked about real life production use of any of the NMT systems Jean provided two key examples.

"We have several beta-users - but two of them are most significant. For the first one, our goal is to translate a huge tourism related database from French to English, Chinese, Korean, and Spanish. We intend to use and publish the translation without post-editing. The challenge was to introduce support of named entity recognition in the model - since geographical entities were quite frequent [in the content] and a bit challenging for NMT. The best model was a generic model, meaning that we did not even have to adapt to a tourism model - and this seems to be a general rule, while in previous generation MT, the customization was doing 80% of the job, for NMT, the customization is only interesting and useful for slight final adaptation.

The second [use case] is about technical documentation in English>Korean for an LSP. The challenge was that the available "in-domain" data was only 170K segments, which is not enough to train a full engine, but seems to be good enough to specialize a generic engine."

From everything I understand from my conversations, SYSTRAN is far along the NMT path, and miles ahead in terms of actually having something to show and sell, relative to any other MT vendor. They are not just writing puff pieces about how cool NMT is, to suggest some awareness of the technology. They have tested scores of systems and have identified many things that work and many that don't. Like many innovative things in MT, it takes at least a thousand or more attempts before you start developing real competence. They have been carefully measuring the relative quality improvements with competitive alternatives, which is always a sign that things are getting real. The product is not out yet, but based on my discussions so far, I can tell they have been playing for a while. They have reason to be excited, but all of us in MT have been down this path before and as many of us know, the history of MT is filled with empty promises. As the Wolf character warns us (NSFW link, do NOT click on it if you are easily offended) in the movie Pulp Fiction after fixing a somewhat impossible problem, let's not get carried away just yet. Let's wait to hear from actual users and let's wait to see how it works in more production use scenarios before we celebrate.

The goal of the MT developer community has always been to get really useful automated translation, in a professional setting, since perfection it seems is a myth. SYSTRAN has seriously upped their ability in being able to do this. They are getting continuously better translation output from the machine.  If I were working with an enterprise with a significant interest in CJK <> E content, I would definitely take a closer look, as I have also gotten validation from Chris Wendt at Microsoft on their own success with NMT on J <>E content. I look forward to hearing more feedback about the NMT initiative at SYSTRAN, and if they keep me in the loop I will share it on this blog in future. I encourage you to come forward with your questions as it is a great way to learn and get to the truth, and Jean Senellart seems willing and able to share his valuable insights and experience.


Jean Senellart, CEO, SYSTRAN SA

Thursday, September 15, 2016

LSP Perspective - MT and Translators

This is a guest post by Deepan Patel at Milengo Ltd. This post is part of a series of upcoming posts that will provide varying LSP perspectives on MT technology. I would invite other LSPs who may have an interest in sharing a view to come forward and contact me to discuss different potential subjects. I may not always share the opinions of the guest writers, but I am a proponent of sharing differing views, and letting readers decide for themselves what makes most sense to them. I do not edit these opinions except for minor typos and basic reading flow related edits. I may sometimes highlight some statements in bold to highlight something I think is central to the view. I have added a presentation made by Deepan at TAUS, which is in the public domain, to provide more detail on his user case context. 




One of the key factors in Milengo’s success in establishing a robust and sustainable framework for machine translation (MT) has been to make translators an integral part of all our MT-related activities. Machine translation continues to be a divisive topic for translators for various reasons, and it is paramount for organizations that offer MT-based localization solutions to engage in respectful dialogue with translators. It is a prerequisite to ensuring successful implementation of any MT strategy. From my point of view, respectful dialogue entails addressing the very valid concerns that translators may have, especially on topics such as post-editing, in a balanced manner. 


There are many translators who for whatever reason do not want to post-edit machine-translated content at all and it is important to respect their reasons for this. I have spoken to many translators who simply do not like the discipline of post-editing because they feel that it introduces a sort of negative mindset into their work. Translation becomes less of a creative activity for them because the overt focus of post-editing is on error analysis and correction. 

A corollary to this is that translators can feel that post-editing requires them to lower their own expectations in terms of producing a highly polished and stylish translation. Whilst post-editing they find themselves fighting the urge to completely rewrite sentences so that the style of language corresponds to their own preferences, even in cases where only minor adjustments would be needed to produce a perfectly adequate sentence. To me these are perfectly reasonable viewpoints and we never seek to coerce translators into performing post-editing work, if they do not want to. 

A lot of translators have also expressed frustrations with the attitude of some of the language service providers (LSPs) with whom they work; having post-editing work thrust at them without any discussion on whether the original content selected for post-editing is actually suitable for MT, not being provided with clear enough directives on the key aspects to focus on during post-editing, and most importantly, no spirit of negotiation when it comes to establishing fair remuneration for the task at hand. 

I would be surprised if LSPs really did choose to engage with translators on post-editing assignments in the manner just described. In my opinion, such an approach can only serve to thoroughly alienate translators and is ultimately detrimental to the objective of successful MT implementation.

When we started our own process of introducing MT into our service spectrum several years ago, we were (and still are) heavily dependent on the guidance of our translators in establishing the parameters under which MT could help to increase translator throughput. And perhaps even more importantly, our translators help us to recognize the scenarios where MT does not really add much, if any, value to the translation process. As a result, our own approach is much more focused on those scenarios where we are confident that MT makes sense, and we are consequently handling ever-increasing volumes of localization work from our clients with post-editing workflows. For all of this we have our translators to thank – not only for helping us to shape a consultative approach regarding MT with our clients, but of course without them we would never get the post-editing work completed!

The key has been to involve translators at every stage of a given testing scenario. Much of the work that we undertake during testing and evaluation phases relates to the careful selection of bilingual content to be used during MT engine creation. Although we use and appreciate automated mechanisms for extraction, consolidation and ‘cleansing’ of engine training data, it should never be forgotten that we are still dealing with language after all, and that highly proficient linguists should also play a very valuable role in the data selection process. 

For example, we ask our translators to help us design bilingual test, tuning, and terminology sets to be used for engine creation. These are constructed based on analysis of the actual source content that is eventually intended for machine translation, and are really vital for us in being able to effectively benchmark the performance of any engine that we train. Once an initial working engine is in place, our translators help us to verify the automated evaluation metric scores generated during the training process, and to identify patterns of errors in the output which we seek to remedy as much as possible in subsequent engine re-trainings. Eventually, the trial post-editing runs with our translators help us to agree on reasonable throughput expectations and consequently a consensus on fair compensation for post-editing. 

Ultimately we are strong advocates of highly collaborative working models with translators when it comes to testing and eventually implementing MT for a given scenario. Having translators participate at every stage of a lengthy testing process means that they are in full possession of all relevant facts to make informed decisions about whether a given MT engine adds value to their translation work or not. Similarly, we (Milengo) are able to shape our own approach towards evaluating whether MT could work effectively or not for a given localization scenario based on the expert guidance of our translators. I really cannot overstate the value of translators for us in all our MT-related activities. 

About the writer 

Deepan Patel is Milengo Ltd’s MT Specialist and has been working in the localization industry for seven years. He is a Modern Languages graduate from the University of Oxford and a certified Memsource trainer.