eMpTy Pages: Systran


Saturday, December 30, 2017

Artificial Intelligence: And You, How Will You Raise Your AI?

This is the final post for the 2017 year, a guest post by Jean Senellart who has been a serious MT practitioner for around 40 years, with deep expertise in all the technology paradigms that have been used to do machine translation. SYSTRAN has recently been running tests building MT systems with different datasets and parameters to evaluate how data and parameter variation affect MT output quality. As Jean said:

" We are continuously feeding data to a collection of models with different parameters – and at each iteration, we change the parameters. We have systems that are being evaluated in this setup for about 2 months and we see that they continue to learn."

This is more of a vision statement about the future evolution of this (MT) technology, where systems continue to learn and improve, rather than a direct reporting of experimental results, and I think it is a fitting way to end the year in this blog.

It is very clear to most of us that deep learning based approaches are the way forward for continued MT technology evolution. However, skill with this technology will come with experimentation and understanding of data quality and control parameters. Babies learn by exploration and experimentation, and maybe we need to approach our continued learning, in the same way, learning from purposeful play. Is this not the way that intelligence evolves? Many experts say that AI is going to be driving learning and evolution in business practices in almost every sphere of business.

===================

Artificial Intelligence is the subject of all conversations nowadays. But do we really know what we are talking about? Instead of looking at AI as a kind of software that is ready to use and potentially threatening to our jobs, what if we thought of it as an evolutionary digital entity with an exceptional faculty for learning? This would break with the current industrial scheme of traditional software, which requires code to be frozen until the next update of the system. AI could then disrupt not only technology applications but also economic models.

Artificial Intelligence has no difficulty quickly handling exponentially growing volumes of data, with exceptional precision and quality. It thus frees valuable time for employees to communicate internally and with customers, and to invest in innovative projects. By allowing an analysis of all the information available for rapid decision-making, AI is truly the corollary of the Internet era, which is also the result of all threats, whether virtual or physical.

Deep Learning and Artificial Neural Networks: an AI that is constantly evolving

Deep Learning and artificial neural networks offer infinite potential and a unique ability to continually evolve in learning. By breaking with previous approaches such as the statistical data analysis approach, which demonstrates a formidable, but trivial, memorization and calculation capacity, like databases and computers, the neural approach gives a new dimension to artificial intelligence. For example, in the field of automatic translation, artificial neural networks allow the "machine" to learn languages as we do when we are in an immersion program abroad. Thus, these neural networks never finish learning: after their initial training, they can continue to evolve independently.

A Quasi "Genetic Selection"

Training a neural model is, therefore, more akin to a mechanism of genetic selection, such as those practiced in the agro-food industry, than to a deterministic programming process: in all the sectors where neural networks are used, the AI is selected to keep only the models that progress best and fastest, or that are most adaptable to a given task. These AI techniques that are used for automatic translation are even customized according to customer needs - business, industry and specific vocabulary. Over time, some AI techniques will grow in use, and others will disappear because they will not demonstrate enough learning and will not be sufficiently efficient. DeepMind illustrates this ability to the extreme. It was at the origin of AlphaGo, the first algorithm to beat a top human professional at Go. AlphaGo had learned from thousands of games played by human experts. The company then announced the birth of an even more powerful generation of AI. It managed to learn the game of Go without studying games played by humans, but "simply" by discovering, all alone and in practice, the strategies and subtleties of this game in a fraction of the time of the original process. Surprising, isn’t it?

Machines and Self-Learning Software

The next generation of neural translation engines will exploit this intrinsic ability of neural models to learn continuously. It will also build on the ability of two different networks to have unique pathways of progress. Specifically, from the same data, like two students, these models can improve by working together or competing against each other. This second generation of AI is very different because not only does it have models taught from existing repositories (existing translations), but, just like newborns, they also learn to… learn over time, placing them in a long-term perspective. This is lifelong learning: once installed in production, for example in the customer information system, the AI continues to learn and improve.

To each his own AI tomorrow?

Potentially, tomorrow's computer systems may be built, like a seed planted on your computer, or in everyday objects. They will evolve to better meet your needs. Even if current technologies, especially software, are customized, these technologies remain 90% similar from one user to another because they are built into bundled and standard products. They are very costly because their development cycles are long. The new AI, which tends toward tailored solutions, is the opposite of this. In the end, it is the technology that will adapt to man and not the opposite, as it is at present. Each company will have its own specific, "hand-sewn" technical means, and you may have an AI at home that you can raise yourself!

Towards a new industrial revolution?

What will be the impact of this evolution on software vendors or on IT services companies? And beyond, over the entire industry? Will businesses need to reinvent themselves to bring value to one of the stages of the AI process, whether it be adaptation or quality of service? Will we see the emergence of a new profession, the "AI breeders"?

In any case, until AI is seen as a total paradigm shift, it will continue to be seen as software 2.0 or 3.0. This is a vision that hinders innovation and could make us miss all of its promises, especially the promise of freeing us from repetitive tasks to restore meaning and pleasure to work.

Jean Senellart, CTO, SYSTRAN SA

Thursday, December 22, 2016

SYSTRAN’s Continuing Neural MT Evolution

Recently, I had the opportunity and kind invitation to attend the SYSTRAN community day event where many members of their product development, marketing, and management team gathered with major customers and partners.

The objective was to share information about the continuing evolution of their new Pure Neural MT (PNMT) technology, share detailed PNMT output quality evaluation results, and provide initial customer user experience data with the new technology. Also, naturally, such an event creates a more active and intense dialogue between company employees and customers and partners. This, I think, has substantial value for a company that seeks to align product offerings with its customers' actual needs.

Ongoing Enhancements of the PNMT Product Offering

The event made it clear that SYSTRAN is well down the NMT path, possibly years ahead of other MT vendors, and provided a review of the current status of their rapidly evolving PNMT technology.

Some highlights from my perspective:

Alex Rush from the Harvard NLP Department made an enthusiastic and valiant attempt to explain how NMT works, but I think lost most of the audience by the time he got to how LM/Softmax works and how dense features enable discrete predictions. Ironically, the crowd was pretty glassy-eyed and stupefied by the time he got to attention mechanisms, LSTMs and the thousands of hidden dimensions underlying each machine translation. However, for those who seek these kinds of details, Harvard and SYSTRAN are making much of this information available as open source via the OpenNMT project at Harvard University.

Jean Senellart, SYSTRAN CEO and CTO, explained that they had already moved to the second-generation production versions of the PNMT engines that had been released in October, and that they continued to see meaningful improvements in output quality, even though they are not using Internet-scale training data volumes.

He also announced 15 new language pairs that have been completed to make for a total of more than 45 language pairs. These new languages include French <> Chinese, English <> Russian, English <> Hungarian, English <> Hebrew, among others.

He pointed out how they had overcome one of the major hurdles of deploying NMT technology: the real-time translation throughput issue. SYSTRAN has found a way to deliver a production PNMT engine that runs just under 20% slower than their current hybrid (PB-SMT) systems.

Expert Human Evaluation Of PNMT Output

There was a very interesting presentation by Heidi Depraetere of CrossLang who is running a comprehensive human and automatic evaluation of ALL the PNMT systems output results. She is one of the few in my opinion, who can do a meaningful and accurate evaluation that will stand serious scrutiny and provide true insight into the MT output quality from a professional translator perspective. Her personal enthusiasm about the PNMT technology was quite revealing for me, as she has been around, and has an “MT Reality Meter” that I trust. Among other things, Heidi reported that:

There is a global improvement on all standard evaluation metrics like BLEU, TER, RIBES for a variety of languages with some examples shown below.

For English to French, evaluators clearly preferred SYSTRAN NMT when compared with the output from online web-scale SMT engines (Google, Bing).
She presented evidence of a strong preference by evaluators for PNMT output from an IT domain-adapted EN > KO engine. When comparing PNMT and professional translation:
- For 38% of the sentences, PNMT translation is preferred
- For 54% of the sentences, Human Translation is preferred
- For 18% of the sentences, PNMT translation is judged equivalent to Human Translation

This means that 1 in 2 sentences produced by the PNMT engine was either preferred or seen to be as good as professional human translation.

She also provided some very interesting data on the types of errors by language, which is too detailed to go into here, but was quite revealing of data bias and other data related problems. Given the scale of the evaluation, more tests are still underway, and SYSTRAN will share this information as it becomes available.

Customer User Experience Findings

Several customers also presented their user experience, with several LSPs vouching for the PNMT improvements over previous systems, e.g. see Lionbridge comments below, or here is a gushing report from Lexcelera. The most interesting use case study (for me) was presented by the travel guide publisher, Petit Futé, who can now custom publish-on-demand, heavily personalized and unique one-of-a-kind travel guides that may draw data from several source languages, into a selected target language, at a customer-accepted quality level, driven by PNMT. This is something they call Augmented Tourist 2.0, which allows a customer to create a unique travel plan, and then print a custom travel guide book to provide specific information just for that unique trip, aggregating both user review and generic tourist information. This is an approach that could also be used by other kinds of popular specialized domain publishers like Romance Novellas, Hatha Yoga Manuals, Multi User Online Gaming Guides etc.

Near Term Improvements Coming in 1H2017

Jean later showed a brief demo of how PNMT also has some of the capabilities of Interactive MT/AutoSuggest that competitive products have, that he called Predictive Translation, and described additional capabilities to handle unknown words which is a major concern for many in their initial explorations of Neural MT.

In the first quarter of 2017, the SYSTRAN PNMT engines will incorporate the broad infrastructural complementary functionality that is available to all SYSTRAN MT engines. Customers will benefit from the full power of this new engine in their current solution platform with all the same functionality, such as processing many file formats, customization with user dictionaries and translation memories, real time translation, named-entity recognition and integration of the engine into the Microsoft Office tools.

SYSTRAN will also work to transfer the compute intensive cloud PNMT system onto a somewhat scaled down translation server, to enable on-premise server installation and even mobile phone implementations, hopefully without compromising translation quality too much.

Domain Adaptation and Specialization

One of the major criticisms of NMT is that it is not ready for professional business use because it cannot be customized, or domain-adapted, for each enterprise customer like the most successful PB-SMT systems are today. Critics say that NMT is only a technology for generic system use. The training process is so expensive and slow that it is not feasible to use NMT for enterprise systems today, say the critics. However, SYSTRAN plans to do exactly that over the next few months, and has already begun beta testing as the EN>KO IT system tests described above suggest. NMT offers new approaches to customization that can be quick and not require the slow and expensive initial training period. This is not PB-SMT and new things are possible.

With its new PNMT engine, SYSTRAN can optimize neural networks in a new post-training process called “specialization”. Think of this as fast engine customization for unique customer or project needs, almost like Adaptive MT. This method significantly improves the quality of translation in record time.

Jean Senellart, SYSTRAN CEO, explains: “Adaptation of translation to a specific domain such as legal, marketing, technical, pharmaceutical, is an absolute necessity for global companies and organizations. Offering professionals specialized translation solutions in their trade terminology has been part of SYSTRAN's DNA for many years. This new generation of neural engines offers new capabilities in domain adaptation. PNMT is able to adapt a generic model to new data and even to each translator. Generic NMT is undoubtedly a great improvement in translation technology, but “Specialized NMT” is the technology that will truly help companies meet their global challenges.”

Impact on MT Market Dynamics

The growing feedback on the significantly better NMT Google engines also suggests that a change is coming. While SYSTRAN is easily 12 to 18 months ahead of most other MT vendors that have any possibility of delivering a market-ready Neural MT product, it is now becoming increasingly clear that the whole Expert MT Vendor market will start to move towards Neural MT. But the shift to NMT is expensive and complex and most other MT vendors do not have the funding, manpower and expertise to jump into NMT implementation easily. Having an alliance with an academic partner and funding an occasional experiment is not enough.

Open source will ease the hurdles and interestingly SYSTRAN is aiding this through the Harvard OpenNMT project, but the funding needs to properly undertake NMT development will keep this a game only for the big boys and the really smart small ones. Hopefully we do not see the same DIY foolishness we saw with SMT and Moses, as this is not a proper realm for LSPs to play in, even for very large ones. In my opinion the only one who could do this competently is SDL. Ask any one of the real NMT experts to explain how NMT works, if you have any doubt about my skepticism.

While I still do believe that Adaptive MT will produce the highest quality MT output in the near-term (2017), I think they too are heading towards Neural MT. Unless SYSTRAN can surprise us by providing the market with a rapid, robust, high-quality specialization capability, I would still expect that a properly engaged Adaptive MT system will produce the best professional MT engines in terms of output quality and translation productivity in the near term. 2017 will probably be the last year that phrase-based SMT systems will dominate in the professional and enterprise use arena. The Google, Microsoft, FB, Naver, Baidu evidence is clear: NMT is the way forward for generic systems, with improvements that are good enough to justify huge increases in deployment hardware costs. However, if SYSTRAN solves the domain adaptation and specialization challenge, and the large vocabulary issues, at enterprise scale, sooner rather than later, I think we will see a more rapid transition even for the smaller but usually higher quality professional translation MT world.

These are exciting times again for the MT world, as now, once again we are seeing people take big strides forward. SYSTRAN also briefly presented some new product concepts that look interesting and I hope to cover that in more detail in future. BTW, I am also talking to SDL about their Adaptive MT technology and will report on that shortly.

I wish you all a happy, healthy and joyous holiday season. And a prosperous and happy new year.

Wednesday, October 19, 2016

SYSTRAN Releases Their Pure Neural MT Technology

SYSTRAN announced earlier this week that they are doing a “first release” of their Pure Neural™ MT technology for 30 language pairs. Given how good the Korean samples that I saw were, I am curious why Korean is not one of the languages that they chose to release.

"Let’s be clear, this innovative technology will not replace human translators. Nor does it produce translation which is almost indistinguishable from human translation" ... SYSTRAN BLOG

The language pairs being initially released are 18 in and out of English, specifically EN<>AR, PT-BR, NL, DE, FR, IT, RU, ZH, ES and 12 in and out of French, FR<>AR, PT-BR, DE, IT, ES, NL. They claim these systems are the culmination of over 50,000 hours of GPU training but are very careful to say that they are still experimenting and tuning these systems and that they will adjust them as they find ways to make them better.
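For anyone who wants to check how the release adds up to 30 pairs, here is a small sketch of the count. It assumes that "<>" means one engine in each direction; the function and variable names are mine, purely for illustration.

```python
# Languages paired with English and with French in the first release,
# as listed in the announcement.
english_partners = ["AR", "PT-BR", "NL", "DE", "FR", "IT", "RU", "ZH", "ES"]
french_partners = ["AR", "PT-BR", "DE", "IT", "ES", "NL"]


def directed_pairs(pivot, partners):
    """Assumption: each partner language yields two engines, pivot>partner and partner>pivot."""
    return [(pivot, p) for p in partners] + [(p, pivot) for p in partners]


english_pairs = directed_pairs("EN", english_partners)
french_pairs = directed_pairs("FR", french_partners)

# EN<>FR is only listed on the English side, so the two sets do not overlap.
all_pairs = set(english_pairs) | set(french_pairs)

print(len(english_pairs), len(french_pairs), len(all_pairs))
```

Nine partner languages for English and six for French, each counted in both directions, give the 18 and 12 quoted in the release.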

They have also enrolled ten major customers in a beta program to validate the technology at the customer level, and I think this is where the rubber will meet the road and we will find how it really works in practice.

The boys at Google (who should still be repeatedly watching that Pulp Fiction clip), should take note of their very pointed statement about this advance in the technology:

Let’s be clear, this innovative technology will not replace human translators. Nor does it produce translation which is almost indistinguishable from human translation – but we are convinced that the results we have seen so far mark the start of a new era in translation technologies, and that it will definitely contribute to facilitating communication between people.

Seriously, Mike (Schuster), that’s all that people expect: a statement that is somewhat close to the reality of what is actually true.

They have made a good effort at explaining how NMT works, and why they are excited, which they say repeatedly through their marketing materials. (I have noticed that many who work with Neural net based algorithms are still somewhat mystified by how it works.) They plan to try and explain NMT concepts in a series of forthcoming articles which some of us will find quite useful, and they also provide some output examples which are interesting to understand how the different MT methodologies approach language translation.

CSA Briefing Overview

In a recent briefing with Common Sense Advisory they shared some interesting information about the company in general:

- The Korean CSLi Co. (www.csli.co.kr) acquisition has invigorated the technology development initiatives.
- They have several large account wins including Continental, HP Europe, PwC and Xerox Litigation Services. These kinds of accounts are quite capable of translating millions of words a day as a normal part of their international operational needs.
- Revenues are up more than 20% over 2015, and they have established a significant presence in the eDiscovery area, which now accounts for 25% of overall revenue.
- NMT technology improvements will be assessed by an independent third party (CrossLang) with long-term experience in MT evaluation, and who are not likely to say misleading things like "55% to 85% improvements in quality" like the boys at Google.
- SYSTRAN is contributing to an open-source project on NMT with Harvard University and will share detailed information about their technology there.

Detailed Technical Overview

They have also supplied a more detailed technical paper which I have yet to review carefully, but what struck me immediately on initial perusal was that the data volumes they are building their systems with are minuscule compared to what Google and Microsoft have available. However, the ZH > EN results did not seem substantially different from the amazing-NOT GNMT system. Some initially interesting observations are highlighted below, but you should go to the paper to see the details:

Domain adaptation is a key feature for our customers — it generally encompasses terminology, domain and style adaptation, but can also be seen as an extension of translation memory for human post-editing workflows. SYSTRAN engines integrate multiple techniques for domain adaptation, training full new in-domain engines, automatically post-editing an existing translation model using translation memories, extracting and re-using terminology. With Neural Machine Translation, a new notion of “specialization” comes close to the concept of incremental translation as developed for statistical machine translation like (Ortiz-Martínez et al., 2010)

What is encouraging is that adaptation or “specialization” is possible with very small volumes of data, and this can be run in a few seconds which suggests this has possibilities to be an Adaptive MT model equivalent.

Our preliminary results show that incremental adaptation is effective for even limited amounts of in-domain data (nearly 50k additional words). Constrained to use the original “generic” vocabulary, adaptation of the models can be run in a few seconds, showing clear quality improvements on in-domain test sets.
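To make the idea of incremental adaptation a little more concrete, here is a toy sketch of continued training: a model is first trained on "generic" data and then trained a little further on a small in-domain set, starting from the generic weights instead of from scratch. This is not SYSTRAN's method or code. The one-parameter model, the data, the learning rate and the epoch counts are all assumptions of mine, chosen only to show the shape of the process.

```python
# Toy illustration only: a one-parameter model y = weight * x stands in
# for a neural MT engine. All data and hyperparameters are made up.

def sgd(weight, data, lr, epochs):
    """Plain stochastic gradient descent on squared error for y = weight * x."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (weight * x - y) * x
            weight -= lr * grad
    return weight


def mse(weight, data):
    """Mean squared error of the model on a data set."""
    return sum((weight * x - y) ** 2 for x, y in data) / len(data)


# "Generic" corpus follows y = 2x; the small in-domain set follows y = 2.5x.
generic_data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
in_domain_data = [(x, 2.5 * x) for x in (1.0, 2.0)]

# Long initial training on generic data, starting from scratch.
generic = sgd(0.0, generic_data, lr=0.01, epochs=200)

# Short "specialization": continue from the generic weight on in-domain data.
specialized = sgd(generic, in_domain_data, lr=0.01, epochs=20)

print("in-domain error before:", mse(generic, in_domain_data))
print("in-domain error after: ", mse(specialized, in_domain_data))
```

The point of the sketch is only that the second training run is short and starts from what the generic model already knows, which is why this kind of adaptation can be so much cheaper than a full retraining.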

Of course, the huge processing requirements of NMT remain a significant challenge, and perhaps they are going to have to follow Google and Microsoft, who both have new hardware approaches to address this issue: Google with the TPU (Tensor Processing Unit), and Microsoft with the programmable FPGAs that it recently announced to deal with this new class of AI-based machine learning applications.

For those who are interested, I ran a paragraph from my favorite Chinese News site and compared the Google “nearly indistinguishable from human translation” GNMT output with the SYSTRAN PNMT output and I really see no big differences in quality from my rigorous test, and clearly we can safely conclude that humanity is quite far from human range MT quality at this point in time.

The Google GNMT Sample

The SYSTRAN Pure NMT Sample

Where do we go from here?

I think the actual customer experience is what will determine the rate of adoption and uptake. Microsoft and a few others are well along the way with NMT too. I think SYSTRAN will provide valuable insights in December from the first beta users who actually try to use it in a commercial application. There is enough evidence now to suggest that if you want to be a long-term player in MT you had better have actual real experience with NMT and not just post how cool NMT is and use SEO words like machine learning and AI on your website.

The competent third-party evaluation SYSTRAN has planned is a critical proof statement that hopefully provides valuable insight on what works and what needs to be improved at the MT output level. It will also give us more meaningful comparative data than the garbage that Google has been feeding us. We should note that while BLEU score jumps are not huge, the human evaluations show that NMT output is often preferred by many who look at the output.

The ability of serious users to adapt and specialize the NMT engines for their specific in-domain needs I think is really a big deal – if this works as well as I am being told, I think it will quickly push PBSMT-based Adaptive MT (my current favorite) to the sidelines, but it is still too early to really say this with anything but Google MT Boys certainty.

But after a five-year lull in the MT development world and seemingly little to no progress, we finally have some excitement in the world of machine translation and NMT is still quite nascent. It will only get better and smarter.

Thursday, September 22, 2016

Comparing Neural MT, SMT and RBMT – The SYSTRAN Perspective

This is the second part of an interview with Jean Senellart (JAS), Global CTO and SYSTRAN SAS Director General. The first part can be found here: A Deep Dive into SYSTRAN’s Neural Machine Translation (NMT) Technology.

Those translators who accuse MT vendors of stealing or undermining their jobs should take note that SYSTRAN is the largest independent MT vendor, a position it has held for most of its existence, yet it has never generated more than €20 million in annual revenue. To me, this suggests that MT is mostly being used for different kinds of translation tasks and is hardly taking any jobs away. The problem has been much more related to unscrupulous or incompetent LSPs who used MT improperly in rate negotiations. MT is hugely successful for those companies who find other ways to monetize the technology, as I pointed out in The Larger Context Translation Market. This does suggest that MT has huge value in enabling global communication and commerce, and even in its less than perfect state is considered valuable by many who might otherwise need to acquire human translation services. If anything, MT vendors are the ones that are trying hardest to develop technology that is actually useful to professional translators, as the Lilt offering shows and as this new SYSTRAN offering also promises to be. The new MT technology is reaching a point where it is becoming rapidly responsive to corrective feedback and thus much more useful in professional translation use case scenarios. The forces affecting translator jobs and work quality are much more complex and harder to pin down, as I have also mentioned in the past.

In my conversations with Jean, I realized that he is one of the few people around in the "MT industry", who has deep knowledge and production use experience with all three MT paradigms. Thus, I tried to get him to share both his practical experience-based and philosophical perspectives about the three approaches. I found his comments fascinating and thought that it would be worth highlighting them separately in this post. Jean (through SYSTRAN) is unique in being one of the rare practitioners around, who has produced commercial release versions, of say French <> English MT systems, using all three MT paradigms. More if you count the SPE and NPE “hybrid” variants where more than one approach is used in a single production process.

Again in this post, I have tried as much as possible to keep Jean Senellart’s direct quotes in here to avoid any misinterpretation. My comments are in italics when they do occur within his quotes.

Comparing The Three Approaches; The Practical View

Some interesting comparative comments made by Jean about his actual implementation experience with the three methodologies (RBMT, SMT, NMT):

“The NMT approach is extremely smart to learn the language structure but is not as good at memorizing long lists of terminology as RBMT or SMT was. With RBMT, the terminology coverage was right, but the structure was clumsy – with SMT or SPE, we had an “in-between” situation where we got the illusion of fluency, but sometimes at the price of a complete mistranslation, and a strange ability to memorize huge lists but without any consideration of their linguistic nature. With NMT, the language structure is excellent, as if the neural network really deeply understands the grammar of the language – and introducing [greater] support of terminology was the missing link to the previous technologies.”

(This inability of NMT to handle large vocabulary lists is considered one of the main weaknesses of the technology currently. Here is another reference discussing this issue. However, it appears that SYSTRAN has developed some kind of a solution to address this issue.)

“What is interesting with NMT is it seems far more tolerant than PBSMT (Phrase-Based SMT) to noisy data. However, it does need far less data than PBSMT to learn – so we can afford to provide only data for which we know we have a good alignment quality. Regarding the domain or the quality [of MT output of] the languages, we are for the moment trying to be as broad as possible [rather than focusing on specialized domains].”

In terms of training data volume, JAS said: “This is still very empirical, but we can outperform Google or Microsoft MT on their best language pairs using only 5M translation units – and we have a very good quality (BLEU score about 45**) for languages like EN>FR with only 1M TU. I would say we need 1/5 of the data necessary to train SMT. Also, generic translation engines like Google or Bing Translate are using billions of words for their language models, here we need probably less than 1/100th.”

(**I think it bears saying that I fully expect that the BLEU here is measured with great care and competence, unlike what we see so often with Moses practitioners and LSPs in general who assume scores of 75+ are needed for the technology to be usable.

The ability of the new MT technology to improve rapidly with small amounts of good quality training data and small amounts of corrective feedback suggests that we may be approaching new thresholds in the use of MT for professional use.)
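For readers who have not looked at how a BLEU score like the one quoted above is computed, here is a minimal sketch of the idea: clipped n-gram precision for n = 1 to 4, combined as a geometric mean and multiplied by a brevity penalty. It is a simplified, sentence-level version with a single reference, whitespace tokenization and no smoothing, all of which are my assumptions for illustration. Real evaluations are normally run at corpus level with a standard tool, and this is not how SYSTRAN or anyone else measured the scores discussed here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU (single reference, no smoothing), scaled to 0..100."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped matches: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives a zero score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return 100 * brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat"
print(bleu("the cat sat on the mat", reference))
print(bleu("the cat sat on the mat today", reference))
```

Even this toy version shows why the absolute numbers need careful handling: one extra word in an otherwise perfect seven-word output already costs a sizeable share of the score, and choices like tokenization and the number of references change the result.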

Comparing RbMT, SMT, and NMT: The Philosophical View

When I probed more deeply into the differences between these MT approaches, (since really SYSTRAN is the only company who has real experience in all 3), JAS said: “I would more compare them in terms of what they are trying to do, and on their ability to learn.” His explanation reflects his long-term experience and expertise and is worth careful reading. I have left the response in his own words as much as possible.

“RBMT, fundamentally (unlike the other two), has an ulterior motive: it attempts to describe the actual translation process. And by doing that, it has been trying to solve a far more complicated challenge than just machine translation; it tries to decompose [and deconstruct and analyze] and explain how a translation is produced. I still believe that this goal is the ultimate goal of any [automated translation] system. For many applications, in particular, language learning, but also for post-editing, the [automated] system would be far more valuable if it could produce not only the translation but also explain the translation.”

“In addition, RBMT systems are facing three main limitations in language [modeling] which are:

1) the intrinsic ambiguity of language for a machine, which is not the same for a human who has access to meaning [and a sense for the underlying semantics],

2) the exception-based grammar system, and

3) the huge, contextual and always expanding volume of terminology units.”

“Technically, RBMT systems might have different levels of complexity depending on the linguistic formalism being used, making it hard to compare with the others (SMT, NMT), so I would rather say that one of the main reasons for the limitations of a pure RBMT system lies in its [higher reaching] goal. The fact is that fully describing a language is a very complicated matter, and there is no language today for which we can claim a full linguistic description.”

“SMT came with an extraordinary ability to memorize from its exposure to existing translations – and with this ability, it brought a partial solution to the first challenge mentioned above, and a very good solution to the third one – the handling of terminology – however, it mostly ignores the modeling of the grammar. Technically, I think SMT is the most difficult of the 3 approaches; it combines many algorithms to optimize the MT process, and it is hard work to deal with the huge database of [relevant training] corpus.”

“NMT has access to the meaning and is dealing well with modeling of human language grammar. In terms of difficulty, NMT engines are probably the simplest to implement; a full training of an NMT engine involves only several thousand lines of code. The simplicity of implementation is, however, hiding the fact that nobody, ourselves included, knows why it is so effective.”

“I would use an analogy of a human learning to drive a car to explain this more fully:

- The Rule-based approach will attempt to provide a full modeling of the car dynamic, on how the engine is connected to the wheel, on the effect of acceleration in the trajectory, etc. This is very complicated. (And possibly impossible to model in totality.)

- The Statistical approach will use data from past experience, will try to compare a new situation with a past situation, and will decide on the action based on this large database [of known experience]. This is a huge task and very difficult to implement. (And can only be as good as the database it learns from.)

- The Neural approach, with limited access to the phenomena involved, or with limited ability to remember, will build its own “thinking” system to optimize the driving experience; it will actually learn to drive the car and build reflexes – but will not be able to explain why and how such decisions are being made, and will not be able to leverage local knowledge – for instance that at a specific bend on the road in very specific weather conditions, it had to anticipate [braking] because it is particularly dangerous, etc... This approach is surprisingly simple and, thanks to the evolution of computing power, has become more accessible.”

“Today, this last approach (NMT) is clearly the most promising but will need to integrate the second (SMT) to be more robust, and eventually to deal with the first one (RBMT) to be able to not only make choices but also explain them.”

Comparing NMT to Adaptive MT

When probed about this, JAS said: “Adaptive MT is an innovative concept based on the current SMT paradigm – it is, however, a concept that is quite naturally embedded in the NMT paradigm; of course, there will be work needed to be done to make it work as nicely as the Lilt product does it. But, my point is that NMT (and not just from Systran) will bring a far more intuitive solution to this issue of continuous adaptive learning, because it is built for that: on a trained model, we can tune the model without any tricks with feedback of one single sentence – and produce a translation which immediately adapts to user input.”

The latest generation MT technology, especially NMT and Adaptive MT, looks like a major step forward to enabling the expanding use of MT in professional translation settings. With continuing exploration and discovery in the fields of NLP, artificial intelligence and machine intelligence, I think we may be in for some exciting times ahead as these discoveries benefit MT research. Hopefully, the focus will shift to making new content multilingual and solving new kinds of translation challenges, especially in speech and video. I believe that we will see more of the kinds of linguistic steering activities we are seeing in motion at eBay and that there will always be a role for competent linguists and translators.

Jean Senellart, CEO, SYSTRAN SA

The first part of the SYSTRAN interview can be found at A Deep Dive into SYSTRAN’s Neural Machine Translation (NMT) Technology

Post Script: There is a detailed article that describes the differences between these approaches on the SYSTRAN website released a few weeks after this post was published.

Wednesday, September 21, 2016

A Deep Dive into SYSTRAN’s Neural Machine Translation (NMT) Technology

One of the wonderful things about my current independent status is the ability to engage deeply with other MT experts who were previously off limits because competing MT vendors don't usually chat with open hearts and open cloaks. MT is tough to do well and I think the heavy lifting should be left to people who are committed for the long run, and who are willing to play, invest and experiment in spite of regular failure. This is how humans who endure and persist learn and solve complex problems.

This is Part 1 of a two-part post on the SYSTRAN NMT product announcement. The second part will focus on comparing NMT with RBMT and SMT and also with the latest Adaptive MT initiatives. It can be found here: Comparing Neural MT, SMT and RBMT – The SYSTRAN Perspective

Press releases are so filled with marketing-speak as to be completely useless to most of us. They have a lot of words but after you read them you realize you really don't know much more than you got from the headline. So, I recently had a conversation with Jean Senellart, Global CTO and SYSTRAN SAS Director General, to find out more about their new NMT technology. He was very forthcoming and responded to all my questions with useful details, anecdotes, and enthusiasm. The conversation only reinforced in my mind that "real MT system development" is something best left to experts, and not something that even large LSPs should dabble with. The reality and complexity of NMT development push the limits of MT even further away from the DIY mirage.

In the text below, I have put quotes around everything that I have gotten directly from SYSTRAN material or from Jean Senellart (JAS) to make it clear that I am not interpreting. I have done some minor editing to facilitate readability and "English flow" and added comments in italics within his quotes where this is done.

The New Product Line

JAS clarified several points about the overall evolution of the SYSTRAN product line.
SYSTRAN intends to keep all the existing MT system configurations they have in addition to the new NMT options. So they will have all of the following options:

RBMT :- the rule-based legacy technology
SMT :- the Moses-based generation of engines that they have released for some language pairs over the last few years
SPE :- Statistical Post-Editing translation engines, introduced in 2007 as the first implementation combining Rule-Based plus Phrase-Based Statistical systems
NMT :- the purely neural machine translation engines that they just announced
NPE :- « Neural Post-Editing », the replication of what they did in SPE using Phrase-Based machine translation, but now using Neural Machine Translation instead of SMT for the second step in the process. They are now using a neural network to correct and improve the output of a rule-based engine.
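To make the two-step idea behind SPE and NPE concrete, here is a minimal sketch of a first-pass engine whose draft is corrected by a second pass. Everything in it is an assumption made for illustration: the function names, the tiny lexicon, and the correction table are hypothetical, and a real post-editor is a trained statistical or neural model rather than a lookup table. It is not SYSTRAN's implementation.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]


def make_rule_based_engine(lexicon: Dict[str, str]) -> Translator:
    """Toy stand-in for an RBMT engine: word-by-word dictionary lookup."""
    def translate(source: str) -> str:
        return " ".join(lexicon.get(word, word) for word in source.lower().split())
    return translate


def make_post_editor(corrections: Dict[str, str]) -> Translator:
    """Toy stand-in for the second step (statistical in SPE, neural in NPE).

    A real post-editor is learned from data; this hypothetical version
    simply rewrites known bad phrases in the draft.
    """
    def post_edit(draft: str) -> str:
        for wrong, right in corrections.items():
            draft = draft.replace(wrong, right)
        return draft
    return post_edit


def compose(first_pass: Translator, second_pass: Translator) -> Translator:
    """Chain the two steps: the second pass corrects the first pass's output."""
    return lambda source: second_pass(first_pass(source))


# Hypothetical FR>EN toy data, invented for this sketch.
rbmt = make_rule_based_engine({"le": "the", "chat": "cat", "noir": "black"})
post_editor = make_post_editor({"cat black": "black cat"})
pipeline = compose(rbmt, post_editor)

draft = rbmt("le chat noir")
final = pipeline("le chat noir")
print(draft)
print(final)
```

The point of the composition is that the first engine stays untouched: swapping the statistical second step for a neural one changes only the function passed as `second_pass`.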

They will preserve exactly the same set of APIs and features (like the support of a user dictionary) around these new NMT modules so that these historical linguistic investments are fully interchangeable across the product line.

JAS said: "From my intuition, there will still be situations where we will prefer to continue to offer the older solutions: for instance, when we will need high-throughput on a standard CPU server, or for low-resource languages for which we already have some RBMT solution, or for customers currently using heavily customized engines." However, they expect that NMT will proliferate even in small-memory-footprint environments, and although they expect it to prevail eventually, they will keep the other options available for their existing customer base.

The NMT initiative focused on languages that were most important to their customers, were known to be difficult historically, or were currently presenting special challenges not easily solved with the legacy solutions. So as expected the initial focus was on EN<>FR, EN<>AR, EN<>ZH, EN<>KO, FR<>KO. All of these already show promise, especially the KO<>EN and KO<>FR combinations, which showed the most dramatic improvements and can be expected to improve further as the technology matures.

However, DE<>EN is one of the most challenging language pairs, as Jean said: "we have found the way to deal with the morphology, but the compounding is still problematic. Results are not bad, though, but we don't have the same jump in quality yet for this language pair."

The Best Results

So where have they seen the most promising results? As Jean said: "The most impressive results I have seen are in complicated language pairs like English-Korean, however, even for Arabic-English, or French-English the difference of quality between our legacy engines, online engines, and this new generation is impressive.

What I found the most spectacular is that the translation is naturally fluent at the full sentence level - while we have been (historically) used to some feeling of local fluency but not sounding fully right at the sentence level. Also, there are some cases where the translation is going quite far away from the source structure - and we can see some real "rewriting" going on."

Here are some examples comparing KO>EN sentences with NMT, SYSTRAN V8 (the current generation) and Google:

And here are some examples of where the NMT seems to make linguistically informed decisions and changes the sentence structure away from the source to produce a better translation.

The Initial Release

When the NMT technology is released in October, SYSTRAN expects to release about 40 language pairs (mostly European and major Asian languages paired with English and French) with an additional 10 still in development to be released shortly after.

As JAS stated: "We will be delivering high-quality generic NMT engines that will be instantly ready for "specialization" (I am making a difference with customization (which implies training) because the nature of the adaptation to the customer domain is very different with NMT)."

Also very important for the existing customer base is that all the old dictionaries developed over many years for RBMT / SMT systems will be useful for NMT systems. As Jean confirmed: "Yes - all of our existing resources are being used in the training of the NMT engines. It is worth noting that dictionaries are not the only components from our legacy modules we are re-using; the morphological analysis and named entity recognition are also key parts of our models."

With regard to the User Interface for the new NMT products, JAS confirmed: "the first generation will fully integrate into the current translation infrastructure we have - we had to replace of course the back-end engines, but also some intermediate middle components. However, the GUI is preserved. We have started thinking about the next generation of UI which will fully leverage the new features of this technology, and will be targeting a release next year."

The official SYSTRAN marketing blurb states the following:

"SYSTRAN exploits the capacity NMT engines have to learn from qualitative data by allowing translation models to be enriched each time the user submits a correction. SYSTRAN has always sought to provide solutions adjusted to the terminology and business of its customers by training its engines on customer data. Today SYSTRAN offers a self-specialized engine, which is continuously learning on the data provided."

Driving MT Engine Improvements

Jean also informed me that NMT has a simple architecture but the number of options available to tune the engines is huge and he has not found one single approach that is suitable for all languages. Options that can make a significant difference include, "type of tokenization, the introduction of additional features for instance for guiding the alignment, etc...

So far we have not found one single paradigm that works for all languages, and each language pair seems to have its own preference. What we can observe is that unlike SMT where the nature of the parameters was numerical and not really intuitive, here it seems that we can get major improvements by really considering the nature of the language pair we are dealing with."

So do these corrective changes require re-training or is there an instant dictionary-like capability that works right away? "Yes - this is a cool new feature. We can introduce feedback to the engine, sentence by sentence. It does not need retraining; we are just feeding the extra sentence and the model instantly adapts. Of course, the user dictionary is also a quick and easy option. The ability of an NMT engine to "specialize" very easily and even to adapt from one single example is very impressive."
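One plausible way to picture adaptation from a single example is an online gradient step on an already trained model. The sketch below is only an illustration under loud assumptions: the "model" is a hypothetical table of scores for two candidate translations of one word, the update is a single cross-entropy gradient step on a softmax, and the learning rate is deliberately large so that one correction flips the choice. Nothing here is taken from SYSTRAN's engines.

```python
import math
from typing import Dict


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw candidate scores into probabilities."""
    peak = max(scores.values())
    exps = {cand: math.exp(value - peak) for cand, value in scores.items()}
    total = sum(exps.values())
    return {cand: value / total for cand, value in exps.items()}


def best(scores: Dict[str, float]) -> str:
    """Return the candidate the model currently prefers."""
    return max(scores, key=scores.get)


def adapt(scores: Dict[str, float], corrected: str, learning_rate: float = 2.0) -> None:
    """One online update from one user correction.

    This is the gradient step for cross-entropy loss on a softmax:
    the corrected candidate is pushed up, the others are pushed down.
    The learning rate is an assumption chosen so one step is enough here.
    """
    probs = softmax(scores)
    for candidate in scores:
        target = 1.0 if candidate == corrected else 0.0
        scores[candidate] += learning_rate * (target - probs[candidate])


# Hypothetical "trained" scores for translating EN "bank" into French.
bank_scores = {"banque": 1.0, "rive": 0.0}

before = best(bank_scores)
adapt(bank_scores, corrected="rive")  # the user fixes one sentence about a river bank
after = best(bank_scores)
print(before, "->", after)
```

In a real neural engine the update would touch the network's weights rather than a two-entry table, but the shape of the idea is the same: no full retraining, just a small step driven by the corrected sentence.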

Detailed MT Quality Metrics

"What is interesting is that we get major score improvement for systems that have not been tuned for the metrics they are evaluated against - for instance, here are some results on English-Korean using the RIBES metric."

"In general, we see BLEU improvements of more than 5 points over current baselines."

"The most satisfying result, however, is that the human evaluation is always confirming the results - for instance for the same language pair shown below - when doing pair-wise human ranking we obtained the following results. (RE is human reference translation, NM is NMT, BI is Bing, GO is Google, NA is Naver, and V8 our current generation). It reads: 'when a system A was in a ranking comparison with a system B (or the reference), how many times was it preferred by the human?'"

"What is interesting in the cross comparison is that we rank engines pair by pair: when we blindly show a Google and a V8 translation, we see which one the user prefers. The most interesting row, however, is the second one:

        RE      BI      GO      NA      V8
NM      46.4    74.5    73.9    72      63.1

When comparing NMT output with the human reference translation, 46% of the time NMT is preferred (which is not bad, that means about one sentence out of two, the human does not prefer the Reference HT over NMT!), when comparing NMT and Google - 74% of the time, the preference goes to NMT, etc..."
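For readers who want to see how such a pair-wise preference percentage can be computed, here is a small sketch. The judgment records are toy data invented for this example (they are not SYSTRAN's evaluation data), the function name is hypothetical, and ties are assumed not to occur.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One blind comparison: (system A, system B, the one the evaluator preferred).
Judgment = Tuple[str, str, str]


def preference_table(judgments: List[Judgment]) -> Dict[Tuple[str, str], float]:
    """Percentage of comparisons in which the first system of each pair was preferred."""
    wins: Dict[Tuple[str, str], int] = defaultdict(int)
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for system_a, system_b, preferred in judgments:
        if preferred not in (system_a, system_b):
            raise ValueError("preferred system must be one of the two compared")
        loser = system_b if preferred == system_a else system_a
        wins[(preferred, loser)] += 1
        # Count the comparison in both directions so each cell has a denominator.
        totals[(system_a, system_b)] += 1
        totals[(system_b, system_a)] += 1
    return {pair: 100.0 * wins[pair] / count for pair, count in totals.items()}


# Invented toy judgments, using the same system labels as the post.
toy_judgments = [
    ("NM", "GO", "NM"),
    ("NM", "GO", "NM"),
    ("GO", "NM", "NM"),
    ("NM", "GO", "GO"),
    ("NM", "RE", "NM"),
    ("NM", "RE", "RE"),
]

table = preference_table(toy_judgments)
print(table[("NM", "GO")], table[("GO", "NM")], table[("NM", "RE")])
```

Because ties are excluded, the two cells for any pair add up to 100, which is why a single row is enough to read off both directions of a comparison.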

The Challenges

The computing requirements have been described by many as a particular challenge. Even with GPUs, training an NMT engine is a long task. As Jean says: "and when we have to wait 3 weeks for a full training, we do need to be careful with the training workflow and explore as many options as possible in parallel."

"Artificial neural networks have a terrific potential but they also have limitations, particularly to understand rare words. SYSTRAN mitigates this weakness by combining artificial neural network and its current terminology technology that will feed the machine and improve its ability to translate."

"It is important to point out that graphics processing units (GPUs) are required to operate the new engine. Also, to quickly make this technology available, SYSTRAN will provide the market with a ready-to-use solution using an appliance (that is to say hardware and software integrated into a single offering). In addition, the overall trend is that desktops will integrate GPUs in the near future as some smartphones already do (the latest iPhone can manage neural models). As [server] size is becoming less and less of an issue, NMT engines will easily be able to run locally on an enterprise server."

As mentioned earlier, there are some languages where the optimal NMT formula is still being unraveled (e.g. DE<>EN), but these are early days. I think we can expect that the research community will zero in on these tough problems, and that at some point at least partial solutions will be available even if complete solutions are not.

Production User Case Studies

When asked about real-life production use of any of the NMT systems, Jean provided two key examples.

"We have several beta-users - but two of them are most significant. For the first one, our goal is to translate a huge tourism related database from French to English, Chinese, Korean, and Spanish. We intend to use and publish the translation without post-editing. The challenge was to introduce support for named entity recognition in the model - since geographical entities were quite frequent [in the content] and a bit challenging for NMT. The best model was a generic model, meaning that we did not even have to adapt to a tourism model - and this seems to be a general rule, while in previous generation MT, the customization was doing 80% of the job, for NMT, the customization is only interesting and useful for slight final adaptation.

The second [use case]- is about technical documentation in English>Korean for an LSP. The challenge was that the available "in-domain" data was only 170K segments, which is not enough to train a full engine, but seems to be good enough to specialize a generic engine."

From everything I understand from my conversations, SYSTRAN is far along the NMT path, and miles ahead in terms of actually having something to show and sell, relative to any other MT vendor. They are not just writing puff pieces about how cool NMT is, to suggest some awareness of the technology. They have tested scores of systems and have identified many things that work and many that don't. Like many innovative things in MT, it takes a thousand or more attempts before you start developing real competence. They have been carefully measuring the relative quality improvements against competitive alternatives, which is always a sign that things are getting real. The product is not out yet, but based on my discussions so far, I can tell they have been playing for a while. They have reason to be excited, but all of us in MT have been down this path before and as many of us know, the history of MT is filled with empty promises. As the Wolf character warns us (NSFW link, do NOT click on it if you are easily offended) in the movie Pulp Fiction after fixing a somewhat impossible problem, let's not get carried away just yet. Let's wait to hear from actual users and let us wait to see how it works in more production use scenarios before we celebrate.

The goal of the MT developer community has always been to get really useful automated translation in a professional setting, since perfection, it seems, is a myth. SYSTRAN has seriously upped their ability to do this. They are getting continuously better translation output from the machine. If I were working with an enterprise with a significant interest in CJK <> E content, I would definitely take a closer look, as I have also gotten validation from Chris Wendt at Microsoft on their own success with NMT on J <> E content. I look forward to hearing more feedback about the NMT initiative at SYSTRAN, and if they keep me in the loop I will share it on this blog in the future. I encourage you to come forward with your questions as it is a great way to learn and get to the truth, and Jean Senellart seems willing and able to share his valuable insights and experience.