2017 was an especially big year for Neural MT momentum on multiple fronts. We saw DeepL and Amazon introduce new generic NMT product offerings, each with a unique twist of its own. Both are impressive introductions but are limited to a small set of language pairs, and both companies have made a big deal about superior MT quality, using some limited tests to document this and attract attention. For those who define better quality by BLEU scores, these new offerings do indeed have slightly higher scores than other generic MT solutions. But for those who wish to deploy an industrial-scale MT solution, other things matter more: the extent of customization possibilities, the range of steering options available to update and tune the base engine, the overall build platform capabilities, and the ability to secure and maintain data privacy are particularly important. The Asian market also saw several NMT initiatives build momentum, especially from Baidu and Naver.
All of the "private" MT vendors have also expanded their NMT offerings, focusing much more on customization, and two vendors (Lilt & SDL) have also introduced Adaptive NMT for a few languages. These Adaptive NMT offerings may finally get many more translators using MT on a regular basis, as more people realize that this is an improvement over existing TM technology. We should expect these offerings to grow in sophistication and capability over the coming year.

While industry surveys suggest that open source Moses is the most widely used MT solution in professional settings, I still maintain that many, if not most, of these systems will be sub-optimal, as the model building and data preparation processes are much more complicated than most LSP practitioners expect or understand. NMT currently has four open source toolkit alternatives, and some private ones, so the complexity for do-it-yourself practitioners escalates. However, in some ways NMT is simpler to build if you have the computing power, but it is much harder to steer, as was described in this post. The alternatives available thus far include:
- OpenNMT - Systran & Harvard
- TensorFlow - Google
- Nematus - University of Edinburgh
- Fairseq - Facebook
- and now Sockeye from Amazon, which allows you to evaluate multiple options (for a price, I am sure).

2018 looks like a year when MT deployment will continue to climb and build real momentum. Strategic business advantage from MT only comes if you build very high-quality systems; remember that we now have great generic systems from DeepL, Google, Microsoft, and others. Translators and users can easily compare LSP systems with these public engines.

While the technology continues to advance relentlessly, the greatest successes still come from those who have the "best" data and understand how to implement optimal data preparation and review processes for their deployments.

The most popular blog posts for the year (2017) are as follows:

10. Creative Destruction Engulfs the Translation Industry: Move Upmarket Now or Risk Becoming Obsolete :-- This is a guest post by Kevin Hendzel, whose previous post on The Translation Market was second only to the Post-editing Compensation post in terms of long-term popularity and wide readership on this blog. This new post is reprinted with permission and is also available on Kevin's blog with more photos. I am always interested to hear different perspectives on issues that I look at regularly, as I believe that is how learning happens.
1. A Closer Look at SDL's recent MT announcements :-- A detailed look at the SDL ETS system and their new NMT product announcements. I initially set out with questions that were very focused on NMT, but the more I learned about SDL ETS, the more I felt it was worth more attention. The SDL team were very forthcoming and shared interesting material with me, allowing me to provide a clearer picture in this post of the substance behind this release, which I think Enterprise Buyers, in particular, should take note of.
2. Post Editing - What does it REALLY mean? :-- This is a guest post by Mats Dannewitz Linder that carefully examines three
very specific PEMT scenarios that a translator might face and view quite
differently. There is an active discussion in the comments as well.
3. The Machine Translation Year in Review & Outlook for 2017 :-- A review of the previous year in MT (2016) and a prediction of the major trends in the coming year, which turned out to be mostly true, except that Adaptive MT never really gained the momentum I had imagined it would with translators.
4. Data Security Risks with Generic and Free Machine Translation :-- An examination of the specific and tacit legal agreements made for data re-use when using "free" public MT, and possibilities for how this can be avoided or determined to be a non-issue.
5. Private Equity, The Translation Industry & Lionbridge :-- A closer examination of the Private Equity rationale which seems to be grossly misunderstood by many in the industry. PE firms typically buy controlling shares of private or
public firms, often funded by debt, with the hope of later taking them
public or selling them to another company in order to turn a profit.
Private equity is generally considered to be buyout or growth equity
investments in mature companies.
6. The Problem with BLEU and Neural Machine Translation :-- The reasons for the sometimes excessive exuberance around NMT are largely based on BLEU (not BLUE) score improvements on test systems, which are sometimes validated by human quality assessments. However, it has been understood by some that BLEU, which is still the most widely used measure of quality improvement, can be misleading when used to compare some kinds of MT systems. This post describes how BLEU can sometimes under-represent the performance of NMT systems.
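To make that limitation concrete, here is a minimal, smoothed sentence-level BLEU sketch in Python. This is an illustrative simplification, not the exact implementation used by standard evaluation tools, and the example sentences are invented: a fluent paraphrase that a human reviewer might rate as perfectly adequate scores far below a stilted near-copy, because BLEU only counts n-gram overlap with the reference.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Smoothed sentence-level BLEU: geometric mean of clipped n-gram
    precisions (add-one smoothing) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # clipped counts: each candidate n-gram is credited at most as
        # often as it appears in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        precisions.append((overlap + 1) / (total + 1))  # add-one smoothing
    # brevity penalty discourages very short candidates
    bp = min(1.0, math.exp(1 - len(ref) / max(len(cand), 1)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

reference = "the patient should take the medication twice a day"
near_copy = "the patient should take the medication two times a day"  # high n-gram overlap
paraphrase = "patients need to take this medicine twice daily"        # same meaning, low overlap

print(bleu(near_copy, reference))   # scores well above the paraphrase
print(bleu(paraphrase, reference))
```

Because NMT output tends to be more fluent and less literal than phrase-based output, this pure n-gram-counting behavior is one reason a BLEU comparison can under-report the quality of an NMT system.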
7. From Reasoning to Storytelling - The Future of the Translation Industry :-- This is a guest post by Luigi
Muzii, on the future of translation, written in what some may say is an
irreverent tone. For those participants who add very little value in any business
production chain, the future is always foreboding and threatening,
because there is often a sense that a reckoning is at hand. It is
easier to automate low-value work than it is to automate high-value
work. Technology is often seen as a demon, but for those who learn to
use it and leverage it, it is also a means to increase their own value
addition possibilities in a known process, and increase one's standing
in a professional setting. While translating literature may be an art, most business translation work, I think, is business production chain work. These are words translated to help you sell stuff, or to help customers better use stuff that has been sold to them, or, increasingly now, to understand what the customers who bought your stuff think about the user and customer experience.
8. Never Stop Bullshitting: Or Being Popular in the Translation Industry :-- This is a guest post by Luigi Muzii (and his unedited post title).
Luigi likes to knock down false idols and speak plainly, sometimes with
obscure (to most Americans anyway) Italian literary references. I would
characterize this post as an opinion on the lack of honest
self-assessment and self-review that pervades the industry (at all
levels) and thus slows evolution and progress. Thus, we see that
industry groups struggle to promote "the industry", but in fact, the
industry is still one that "gets no respect" or is "misunderstood" as we
hear at many industry forums. While change can be uncomfortable, real
evolution also results in a higher and better position for some in the
internationalization and globalization arena. Efficiency is always
valuable, even in the arts.
9. The Ongoing Neural Machine Translation Momentum :-- This is largely a guest post by Manuel Herranz of Pangeanic, slightly abbreviated and edited from the original to make it more informational and less promotional. Last year we saw Facebook announce that they were going to shift all their MT infrastructure to a Neural MT foundation as rapidly as possible; this was later followed by NMT announcements from SYSTRAN, Google, and Microsoft.
Wishing You all a Happy, Healthy and Prosperous New Year