
Tuesday, January 3, 2017

A Closer Look at SDL's Adaptive MT Technology

I recently sat down with Samad Echihabi, VP of Research & Product Development, at SDL's MT research facility in LA to be briefed on their Adaptive MT technology. The LA team originates from the core Language Weaver group, but the full team also includes members in Cambridge, UK, and Cluj, Romania. The research team that Samad leads has always been at the forefront of MT research, in spite of multiple rounds of upheaval and management changes around it. In this post I have combined product-specific information from Samad and his team with strategy and vision information from public presentations by the new SDL senior management. Together they give a more complete view of what this new technology initiative might mean.


Senior Management Commitment & Support


In Fall 2016, I watched the "100 days" presentation given by Adolfo Hernandez and the "new SDL" management team, and it piqued my interest in SDL MT technology again. It is always refreshing to see somebody look at an old challenge with new eyes. It was clear that the new management saw leading-edge MT technology as a core business driver. They seem serious about putting SDL's MT back on track and reversing the marginalization of MT under previous management. It was interesting to hear the new team respond to investment analysts who questioned this new emphasis on MT when direct MT revenue had shrunk so much. They repeatedly explained that they had "a different view of MT than the old regime". As the following slides from that "interim results" presentation show, the new team sees a bigger, foundational role for MT, as an enabling technology for forward business momentum.
 


Current SDL MT Technology


The SDL MT team has a whole suite of NLP tools ready for action in addition to their MT technology. These tools can be used to prepare and transform linguistic data for a variety of language-related project needs, whether or not they involve MT. Samad gave an overview of their existing MT technology, which includes both on-premise (SDL ETS) and cloud-based (SDL BeGlobal) solutions. Both are robust and field-tested by a largely enterprise and government user base. He demonstrated how a remote console could set up a single server or 100 servers within minutes. This was easily the most straightforward MT server installation process I have seen. Some installations take days and need the vendor's technical experts even to attempt them.

The cloud architecture is also very flexible. It is built to be scalable and elastic so that it can adapt to widely varying workloads, e.g., when large knowledge bases are re-translated after a major engine update. The architecture is also containerized. (Containers solve the problem of getting software to run reliably when it moves from one computing environment to another. That might be from a developer's laptop to a test environment, from staging into production, or from a physical machine in a data center to a virtual machine in a private or public cloud. I am told this matters a great deal.)
 



Adaptive Machine Translation

Ever since I first saw a translator working with Lilt, early in 2016, I have been a fan of Adaptive MT. It makes a lot of sense, since so many MT projects stumble at the customization phase. In Adaptive MT, the engines generally learn as you translate. Most other MT systems instead rely on a batch training process managed by an MT vendor. I think it is fair to call this a next-generation approach to post-editing as well, because corrective edits teach the system and the editing task is not mindlessly repetitive. It is also a much better solution for agency projects that are data-starved or simply need a quick turnaround. Those projects cannot wait for a long, drawn-out customization process that can take months and still fail.
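The contrast with batch training can be made concrete with a minimal Python sketch of the online-learning loop. It rests on heavy simplifying assumptions. The class `ToyAdaptiveMT`, its methods, and the word-for-word "engine" are hypothetical. They say nothing about how SDL, Lilt or ModernMT actually implement Adaptive MT. The sketch only shows that each confirmed post-edit updates the engine immediately, so the next segment benefits.

```python
from collections import defaultdict


class ToyAdaptiveMT:
    """Hypothetical, deliberately simplified engine that adapts online.

    Real adaptive MT systems update statistical or neural models; this toy
    just keeps a word lookup table plus counts of translator choices.
    """

    def __init__(self, baseline):
        # baseline: dict mapping a source word to its default translation
        self.baseline = dict(baseline)
        # counts[src][tgt] = how often a translator confirmed tgt for src
        self.counts = defaultdict(lambda: defaultdict(int))

    def translate(self, sentence):
        out = []
        for word in sentence.split():
            learned = self.counts.get(word)  # .get() does not create entries
            if learned:
                # Prefer the translator's most frequent choice so far.
                out.append(max(learned, key=learned.get))
            else:
                # Fall back to the baseline engine, or pass the word through.
                out.append(self.baseline.get(word, word))
        return " ".join(out)

    def learn(self, source, post_edit):
        # Assumes a 1:1 word alignment, which a real system must estimate.
        for src, tgt in zip(source.split(), post_edit.split()):
            self.counts[src][tgt] += 1


engine = ToyAdaptiveMT({"bonjour": "hello", "monde": "world"})
print(engine.translate("bonjour monde"))   # hello world
engine.learn("bonjour monde", "hi world")  # translator's corrective edit
print(engine.translate("bonjour monde"))   # hi world
```

A real system has to learn alignments, phrases and context rather than assume one-to-one word correspondences. The loop is the same, though: translate, correct, learn, and translate again.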

I am confident that Adaptive MT is a better approach for an agency or freelancer than any do-it-yourself Moses solution. I think it could become the most pervasive MT technology in the professional translation world, because it learns dynamically, is linguistically steered, and is improved by ongoing human corrective feedback. Also, unlike NMT, it is not a resource hog. A highly responsive neural MT technology could always emerge and prevail, but I doubt this will happen in the near term (2018 at the earliest). The Adaptive MT user experience is more like having a virtual assistant who gets better as you work together. SDL Adaptive MT works with generic (baseline), vertical (industry-specific, like IT or Life Sciences) and unique domain-adapted custom engines. Being able to shape and control the baseline that Adaptive MT starts from is a powerful feature. In theory it accelerates the system's learning curve, so users reach higher productivity thresholds faster. If properly implemented and used, adaptive MT engines will keep improving with ongoing use.
 


The SDL Eco-System Advantage?

Adaptive MT is available today from Lilt and SDL, and soon from ModernMT, an EU-funded open source initiative with heavy involvement from Translated.net. The MT market spent years dormant, with only small incremental progress and many failures from Moses madness. Now it is humming again, with momentum led by Neural MT and Adaptive MT. Interestingly, SDL was awarded patents in 2015 for online domain adaptation and for personalized MT via online adaptation. So they may well have been thought leaders on the basic idea of Adaptive MT who simply did not get around to building a working solution and being first to market. The reason may have been the differing priorities of the senior management team at the time.

SDL's Adaptive MT is tightly integrated with the new Trados Studio 2017, which provides a richer interface into the MT engine than Lilt does. This includes upLIFT Fragment Recall and upLIFT Fuzzy Repair, plus all the legacy Trados functionality that supports translators, who can now also draw on a learning MT engine. (upLIFT is a powerful concordance search that unlocks productivity gains from no-match and fuzzy-match segments in new ways.) Massimo Ghislandi gave me a demo of powerful Trados features integrating seamlessly with Adaptive MT. One of them, AutoSuggest, leverages available data to offer sub-segment pre-translation options. In some ways, this is equivalent to the interactive MT component of Lilt. SDL will have both an Enterprise and an Independent Freelancer offering for Adaptive MT, which is currently limited to 5 languages. More languages are coming soon; Adaptive MT needs different supporting data and software infrastructure than standard SMT. You get an Adaptive MT engine for free with a new Trados license. For now you need Trados to interact with the Adaptive MT engine, but SDL is looking at other options, including integration with third-party interfaces.
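Fuzzy matches come up repeatedly in this discussion, so a rough sketch of a fuzzy-match score may help. This illustration rests on assumptions. It uses Python's standard `difflib` to compare segments word by word and is not the scoring formula that Trados or upLIFT actually uses, which this post does not describe. The function names `fuzzy_score` and `best_match` and the 70% threshold are hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_score(new_segment, tm_source):
    """Hypothetical 0-100 word-level similarity between two segments."""
    a = new_segment.lower().split()
    b = tm_source.lower().split()
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_match(new_segment, memory, threshold=70):
    """Return (score, source, target) for the best TM entry, or None.

    memory: list of (source, target) pairs standing in for a translation memory.
    Segments scoring below the (assumed) threshold count as no-matches.
    """
    scored = [(fuzzy_score(new_segment, src), src, tgt) for src, tgt in memory]
    if not scored:
        return None
    best = max(scored, key=lambda item: item[0])
    return best if best[0] >= threshold else None


memory = [("Click the Save button", "Cliquez sur le bouton Enregistrer")]
print(best_match("Click the Open button", memory))  # a 75% fuzzy match
print(best_match("Shut down the server", memory))   # None: a no-match
```

In this toy, a 75% match still leaves the translator to fix the differing word, and a no-match gets no TM help at all. Those are the two cases the text says upLIFT targets.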

Massimo suggested that the feature-rich Trados would be a much-preferred interface for the kinds of power translators most likely to engage with Adaptive MT. The Lilt interface is elegant and light, but it requires translators to adapt to a new kind of translation interface and approach. A new approach makes widespread adoption really hard (as Lilt knows), because old translator dogs don't learn new tricks easily. Translation Memory is really the only CAT technology that has taken root widely with translators so far. Nobody has yet succeeded in selling MT in volume to freelancers. I am not sure SDL's initial approach will succeed either, even though the technology makes sense for a power translator, or for any translator who learns to build and use the Adaptive MT productivity boost. The project management infrastructure around Trados also allows it to be used for large-scale enterprise projects. Even so, I believe there is a need for a cloud-based post-editing interface. It should make it easy to consolidate distributed editing work and to supervise and monitor groups working on multi-million-word projects. Yesterday's tools don't automatically work for today's problems.

The Adaptive MT Benefit

Samad described some evaluation tests he ran to assess how much benefit Adaptive MT provides over and above the leverage from standard MT. His test process was careful, and his estimates were conservative and designed to isolate the Adaptive MT benefit. For the very specific tests he described, they put the gain at 5% to 25%. That 5% to 25% is improvement on top of standard MT, not counting any productivity gain from other Trados features. Most in the MT industry tend to overstate the benefits of the technology. Here is a case where the benefit is understated and the claims carefully avoid over-promising, in contrast to the recent Google hyperbole. I will also bet that SDL's human evaluations by in-house translators are much more meaningful and accurate than Google's Mechanical Turk-based human evaluations. Given how conservative and strict his test procedure was, I would expect the real benefit to a customer to be much higher. The latest Trados version produces a table (shown below) that quantifies the benefit of TM and Adaptive MT. However, the calculation is something of a mystery. SDL will need to explain clearly how they arrive at this number if they want it to be trusted and taken seriously.
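"On top of Standard MT" means the comparison baseline is post-editing standard MT output, not translating from scratch. A minimal sketch makes the arithmetic explicit. The words-per-hour figures below are hypothetical, chosen only to reproduce the two ends of the reported range. They are not Samad's data, and the function name `relative_gain` is an illustration.

```python
def relative_gain(standard_mt_wph: float, adaptive_mt_wph: float) -> float:
    """Percent throughput gain of Adaptive MT over standard MT post-editing.

    Both arguments are words per hour (hypothetical units for this sketch).
    """
    return 100.0 * (adaptive_mt_wph - standard_mt_wph) / standard_mt_wph


# Hypothetical throughputs, for illustration only.
print(relative_gain(800, 840))   # 5.0
print(relative_gain(800, 1000))  # 25.0
```

Because the baseline already includes the standard MT gain, a 5% to 25% figure measured this way is an increment over MT post-editing. It is not the total gain over unaided human translation.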


SDL has the potential to be a dominant player in this segment of the market, i.e., the enterprise and professional translation segment, with their MT product suite and especially Adaptive MT. Some reasons why I say this include:
  • SDL is today translating 20B words a month for customers with MT and 100M words a month via traditional TEP. (This means that more than 99% of what SDL translates each month is done by computers.) This suggests that they know how to do both. They have also already established that in-house MT use improves margins (by 2%, as they reported on 12/6/2016, though I suspect it is higher).
  • They have a deeper reach into the enterprise than any other MT vendor and could possibly identify NEW high-value MT opportunities more easily.
  • They have a market leading TM product for freelancers, with a large installed base that could be engaged to actively use Adaptive MT with the right inducements e.g. free.
  • They have the most complete translation production ecosystem, one that could potentially be integrated into a formidable whole, even though historically they have been unable to do this. (I imagine Lionbridge and TransPerfect will disagree. I am quite certain, though, that the SDL MT systems are of a higher caliber, and LIOX and TransPerfect have other concerns to attend to at this time.) This ecosystem includes content management software, translation management systems, MT, NLP and translation productivity tools, in addition to competence in managing human translation projects at scale.
  • SDL cannot claim that all their software components are best-of-breed. Still, one would expect SDL to have a deep understanding of how key components need to tie together to build the kind of market-leading translation process automation that customers value. The new management team also seems much more open to creating "connectors". These would let content flow from content creation into translation processes, back again, and on to dissemination. Robust support for data ingestion from other (non-SDL, e.g., Adobe) software environments is a critical requirement for many global enterprises. It can often be a deal breaker when they select core business systems. Hopefully this openness to connectors will also extend to memoQ, Memsource and Across. That level of openness would benefit and encourage many potential customers, and would leverage the SDL MT offerings in particular. "Openness" that requires you to use only SDL components is not really open, and as Microsoft learned, it does not work. The translation industry sorely needs real openness so that enterprises can easily link translatable content to a variety of translation-related processes.
However, historically SDL has not had a good track record of tying it all together into an elegant, properly integrated, market-dominating solution. From what I could gather, their software integration has been messy and problematic. The management team has changed, so the possibility of real synergy exists again, but it is too early to tell. Software acquisitions are messy for any company. Each software product often comes with a confining worldview and architectural barriers that make it hard or even impossible to build bridges and create a more complex, combined and powerful structure. The new team has been in place for barely 9 months, so again it is too early to tell, but the story they are telling is more believable (to me at least).
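The MT share of SDL's monthly volume can be checked directly from the two figures in the first bullet of the list above. This is simple arithmetic on the reported numbers, not additional SDL data. The variable names are illustrative.

```python
# Monthly volumes as quoted in the first bullet of the list above.
mt_words = 20_000_000_000   # words/month translated with MT (20B)
tep_words = 100_000_000     # words/month via translate-edit-proof (100M)

# Share of the combined monthly volume that is machine translated.
mt_share = mt_words / (mt_words + tep_words)

print(f"MT share of monthly volume: {mt_share:.1%}")  # MT share of monthly volume: 99.5%
print(f"MT-to-TEP ratio: {mt_words // tep_words}:1")  # MT-to-TEP ratio: 200:1
```

Put differently, for every word that goes through a traditional TEP workflow, roughly 200 go through MT.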

It is said that companies able to build a brand image that is "liked" tend to rise and win market share simply because they are authentic, transparent, reliable, cool and likable, e.g., Facebook, Apple, Harley-Davidson, Google, Sony, Amazon, BMW, Tom's. Here is a ranking that perhaps explains why this matters. Compare it to the Forbes list, which ranks by value and has companies like IBM (the creators of the FUD approach) in its top 10. (When was the last time you heard somebody say, "Boy, I sure loooves me some IBM."?) Historically, I have noticed that SDL has the opposite problem: they are often actively disliked and distrusted by customers, partners, competitors and others in the industry. The new management has to fix this problem as well, because in my opinion it is an important determinant of future success. To me, this means communicating in an authentic voice, not in corporate marketing-speak. Marketing-speak has some information value, but it does not generally build enthusiastic customers and partners who also sell effectively on your behalf.

SDL at one time had a strong MT technology operation that briefly had the largest enterprise MT revenue base in the market (i.e., larger than SYSTRAN's). Unfortunately, that operation was nearly decimated by mismanagement and neglect under previous senior management. Some might even say that the MT industry has suffered more from mismanagement and poor commercialization strategy than from technology challenges.

SDL looks well positioned with their Adaptive MT technology, and I think both agency and freelance clients could benefit from aggressive use (not casual, occasional use) of this technology. Both Adaptive MT and NMT have the potential to help build enduring production efficiencies that older technology will find hard to match. I expect the LSPs and freelancers who learn to use these technologies properly will emerge victorious in some way. In my opinion, SDL is the ONLY LSP with the expertise and a deep enough bench to develop NMT solutions. (I am sure many other LSPs will still waste time and resources trying, and produce crappy, sub-optimal systems.) However, SDL also has a history of losing or lagging in games where they seemed to be clear favorites. The "new SDL" management is much easier to listen to, and hopefully they execute as well as they talk. I will continue to share information about the evolution of this technology as I gain it from my vantage point, and hopefully SDL will remain as open to sharing as they have been thus far. People buy technology they understand. So far, the MT vendor industry has not done a great job of explaining the what, why and where, and has focused mostly on overly technical descriptions of the "how" of this technology.

Anyway, 2017 looks like it will be an exciting and memorable year for MT. Wishing you all great success in your endeavors.

-----------------------------





Samad Echihabi is Vice President of Research and Product Development at SDL. He heads the SDL Machine Translation Research and Development group and manages the end-to-end development life cycle of SDL's MT products, with a focus on customer-driven technology and science advancement. Samad has extensive knowledge of MT and over 15 years of experience in software development at the intersection of machine learning and data science. Samad is a Fulbright scholar and a graduate of the University of Southern California.

2 comments:

  1. Nice and informative blog! Your blog provides information about many new and innovative technology. Thanks for sharing the blog and providing a information about something new.

  2. Interesting, informative post about the nuts and bolts of this new technology. I just can't wait until it comes out in my language combination (French>English)! Any idea when that might be?
