This is yet another post triggered by conversations in Rio at the ABRATES conference in early June. As I mentioned in my initial conference post, the level of interest in MT was unusually high, and there seemed to be real and serious interest in finding ways to engage with MT beyond the brute-force, numbing corrective work that is typical of most PEMT projects with LSPs.
MT has been around in some form for decades, and I am aware that there have always been a few translators who found some use for the technology. But since so many translators asked me about the options available today, I thought it would be useful to write about them.
The situation today for many translators is to work with low-quality MT output produced by LSP/enterprise MT practitioners with very limited MT engine development experience, and typically translators have no say in how the MT engines evolve, since they are so far down the production line. Sometimes they may work with expert-developed MT systems where some limited feedback and steering is possible, but generally the PEMT experience involves:
1. Static MT engines that are developed offline somewhere, with infrequent updates (if any) that improve the engine only marginally.
2. Post-editors working on batches of MT output and providing periodic feedback to MT developers.
This has begun to change very recently with innovative new MT technology described as Adaptive Interactive Dynamic Learning MT (quite a mouthful). The most visible and elegant implementation of this approach is from a startup called Lilt. It allows the translator-editor to tune and adjust the engine dynamically in real time, and thus make all subsequent MT predictions more intelligent, informed and accurate. This kind of MT implementation has to be cloud-based to allow the core engine to be updated in real time. Additionally, when used in workgroups, this technology can leverage the individual efforts of translators by spreading the benefit of a more intelligent MT engine across the whole team. Each user benefits from the previous edits and corrective actions of every other translator-editor and user, as this case study shows. This allows a team to build a kind of communal edit synergy in real time, and theoretically allows 1+1+1 to equal 5 or 7 or even higher. The user interface is also much more translator-friendly and is INTERACTIVE, changing moment to moment as the editor makes changes. Thus you have a real-time virtuous cycle: in essence, an intelligent learning TM engine that learns with every single corrective interaction.
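To make the adaptive-learning idea concrete, here is a minimal Python sketch of an engine whose suggestions improve with every confirmed edit. This is a toy illustration of the feedback loop only, not Lilt's actual implementation (real adaptive SMT incrementally updates phrase tables and model weights, and shares updates across a workgroup via the cloud):

```python
from collections import defaultdict

class AdaptivePhraseMemory:
    """Toy sketch of the adaptive-MT feedback loop: every confirmed
    edit immediately updates the model that produces the next
    suggestion. Hypothetical example, not any vendor's algorithm."""

    def __init__(self):
        # source phrase -> {target phrase: confirmation count}
        self.table = defaultdict(lambda: defaultdict(int))

    def confirm(self, source, target):
        """Learn from the editor's confirmed translation in real time."""
        self.table[source][target] += 1

    def suggest(self, source):
        """Return the most frequently confirmed translation, if any."""
        candidates = self.table.get(source)
        if not candidates:
            return None
        return max(candidates, key=candidates.get)

mem = AdaptivePhraseMemory()
mem.confirm("ordenador", "computer")
print(mem.suggest("ordenador"))  # prints: computer
```

In a workgroup setting, the point is that `confirm` calls from one translator would immediately benefit every other translator querying the same shared memory.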
CSA tells us that the SDL Language Cloud also has similar abilities, but my initial perusal suggests it is definitely less real-time and less dynamic than Lilt, i.e. it is not updating phrase tables in real time. There are several short videos that explain it, and superficially it looks like an equivalent, but I would bet it is not, at this point in time anyway.
So for a translator who wants to get hands-on experience with MT and understand the technology better, what are the options? The following chart provides a very rough overview of the options available, ranked by my estimation of which are the best for learning valuable new skills. The simplest MT option for an individual translator has always been a desktop RbMT system like Systran or ProMT, and it is still a viable option for many, especially with Romance languages. But there has never been much that could be done to tune these older systems beyond building dictionaries, a skill that in my opinion will have low value in the future.
Good expert-developed MT systems will involve frequent interaction with translators in the early engine-development phases to ensure that engines get pattern-based corrective feedback to rapidly improve output quality. The more organized and structured this feedback and improvement process, the better the engine and the more interesting the work for the linguist.
Of the “free” generic MT engines, Microsoft offers much more customization capability and is thus a good platform for learning how different sets of data affect an MT engine and its output. This of course means the user needs to have organized data available and an understanding of how the technology learns. MT can be trained effectively only if you have some understanding of how it learns. This, I think, is why most Moses experiments fail: too much effort focused on the low-value mechanics, and too little on what you are doing and why. I remain skeptical about ignorant Moses experimentation, because getting good SMT engines requires good data plus an understanding of how your training corpus is similar to or different from the new source you want to translate, and a variety of tools to help keep things synced and aligned when you see differences. I am also skeptical that these DIY efforts are likely to produce engines as good as the free generic engines, and I wonder why one would bother with the whole Moses effort if you could get higher quality for free from Microsoft or Google. There are some translators who claim some benefit from working with Moses and its desktop implementations like Slate.
All these
options will provide some experience and insight into MT technology, but I
think it is useful to have some sense for how they might impact you from two
key perspectives shown below:
1. What options are likely to give you the fastest way to get to improved personal productivity?
· I would expect that an Adaptive MT solution is most likely to do this fastest, but you need to have good, clean training data – TMs and glossaries (the more the better). You should also see your editing experience improve rapidly as your corrective feedback modifies the engine in real time.
· Followed by the Microsoft Translator Hub (if you have no concerns about privacy), SDL Language Cloud and some Expert systems which are more proactive in engaging translators, though these typically also involve an LSP middleman.
· Generic Google & Microsoft, and desktop RbMT (Romance languages, and translating into English, tend to have better results in general).
·
DIY Moses is the hardest way to
get to productivity IMO but there is some evidence of success with Slate.
2.
What options are most likely to
help develop new skills that could have long-term value?
· My bet is that the SMT options will all help build skills related to corpus analysis, working with n-grams, large-scale corpus editing and data normalization. Even Neural MT will train on existing data, so all those skills remain valuable.
·
Source analysis before you do
anything is always wise and yields better results as your strategy can be
micro-tuned for the specific scenario.
· Both SMT and the future Neural MT models are based on something called machine learning. It is useful to have at least a basic understanding of this, since it is how the computer “learns”. It is growing in importance and worth long-term attention.
There are tools like Matecat that show promise, but given that
edits don’t directly change the underlying MT engine, I would opt for a true
Adaptive MT option like Lilt instead.
The traditional understanding of PEMT can be summarized in the graphic below; this is the most common kind of interaction most translators have with MT, if they have any at all.
However, the problem definition and the skills needed to solve it are quite different when you consider an MT engine from a larger, overall-process perspective. It generally makes sense to address corpus-level issues before going to the segment level, so that many error patterns can be eliminated and resolved at the high-frequency pattern level. It may also be useful to use web crawlers to gather patterns to guide the language model in SMT and get more fluent target-language translations.
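As a rough illustration of working at the high-frequency pattern level, the sketch below (assuming a simple lowercase, whitespace tokenizer, which real corpus tools would refine) counts the most frequent word n-grams in a corpus. The point is prioritization: fixing an error in one of these top patterns pays off across many segments at once.

```python
from collections import Counter

def top_ngrams(corpus_lines, n=3, k=10):
    """Return the k most frequent word n-grams in a corpus.
    Tokenization here is naive (lowercase + whitespace split)."""
    counts = Counter()
    for line in corpus_lines:
        tokens = line.lower().split()
        counts.update(tuple(tokens[i:i + n])
                      for i in range(len(tokens) - n + 1))
    return counts.most_common(k)

# A tiny hypothetical corpus of UI strings:
corpus = [
    "Click the Save button to save the file",
    "Click the Open button to open the file",
]
print(top_ngrams(corpus, n=2, k=3))
```

The same counting logic, run over a real training corpus, is what lets you see which patterns dominate before deciding where corrective effort goes.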
The most interesting MT problems, which almost always arise outside the “language services industry”, require a view of the whole user experience with translated content from beginning to end. This paper describes this holistic user-experience view and the resultant corpus-analysis perspective for an eBay use case. Solving these problems often involves acquiring the right kind of new data, normalizing disparate data, and focusing on high-frequency word patterns in a corpus to drive MT engine output quality improvements. This type of deeper analysis may happen at an MT-savvy LSP like SDL, but otherwise it is almost never done by LSP MT practitioners in the core CSA-defined translation industry. This kind of deep analysis is also often limited at MT vendors, because customers are in a hurry and unwilling to invest the time and money to do it right. Only the most committed will venture into this kind of detail, but this kind of work is necessary to get an MT system to work at optimal levels. Enterprises like Facebook, Microsoft and eBay understand the importance of doing all the data and system analysis before and after deployment, and thus develop systems that are much more closely tuned to their very specific needs.
MT use makes sense for a translator only if there is a productivity benefit. Sometimes this is possible right out of the gate with generic systems, but most often it takes some effort and skill to get an MT system to this point. Before MT makes sense, it is important that translators have a basic understanding of three key elements:
1. Their
individual translation work throughput without MT in words per hour.
2. The quality of
the MT system output and the ability of the translator to improve this with
corrective feedback.
3. The individual
work throughput with MT after some effort has been made to tune it for specific
use.
Obviously, 3 has to be greater than 1 for MT use to make sense. I have heard that many translators use MT as a way to speed up typing or to look up individual words. I think we are at a point where the number of MT options will increase and more translators will find value in them. I would love to hear real feedback from anybody who reads this blog, as actual shared experience is still the best way to understand the possibilities.
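The break-even arithmetic behind the three elements above can be sketched in a few lines. The throughput numbers here are purely hypothetical, just to show the comparison:

```python
def mt_productivity_gain(wph_without_mt, wph_with_mt):
    """Percent change in throughput from adopting MT.
    MT only makes sense (element 3 vs element 1) if this is positive."""
    gain = wph_with_mt / wph_without_mt - 1
    return round(gain * 100, 1)

# Hypothetical words-per-hour figures:
print(mt_productivity_gain(500, 750))  # prints: 50.0  (clear win)
print(mt_productivity_gain(500, 450))  # prints: -10.0 (MT is slowing you down)
```

Any serious measurement would of course also track quality and the effort invested in tuning (element 2), not just raw words per hour.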
Kirti, thank you for mentioning Slate Desktop. Like you said, there is some evidence of success. In that success (and the success of other use cases below), there's growing evidence that your pyramid's tiers and descriptive categories are showing some age, especially relative to the individual translator. Let's review some use cases.
Memsource recently published this table showing 5-20 percent zero edit distance (ED0) for "generic" engines (e.g. GT, Bing). http://blog.memsource.com/machine-vs-human-translation/. Customers have turned to Slate Desktop after using Memsource for gist-only translations that they accepted without edits but that were nowhere close to final publication quality. This practice drives the ED0 percentage artificially higher, but I can't speculate to what extent.
Circa 2014, Asia Online's Mr Dion Wiggins (acknowledging your now-former affiliation) stated in a LinkedIn Machine Translation group discussion, "But don't take my word for it. I'm happy to provide real world reference customers where as much as 86% of raw MT output requires no editing at all..." Again, this is a reference to the ED0 percentage. I'm sorry I lost the link.
With only these two use cases as data points, it would be easy to conclude that (a) generic engines christen the low end at 5-20% because no experts customized the engine, and (b) expertly customized engines set the high end at as much as 86%. However, we can't stop there or we'd be oversimplifying what has become a complex and interesting world.
Let's look at a use case that falls outside the pyramid. Silvio Picinini is one of several MT specialists at eBay who maintain and improve eBay's highly customized Moses system. In a LinkedIn Pulse comment, he shared an anecdotal ED0 of 5%. https://www.linkedin.com/pulse/ebay-mt-language-specialists-series-edit-distance-silvio-picinini/. This highly qualified expert team doesn't ensure a high percentage of exact matches. They work to ensure the system serves its purpose.
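For readers unfamiliar with the metric: ED0 is essentially an exact-match rate over aligned segments. A minimal sketch (real tools typically normalize whitespace, case and punctuation before comparing, which this deliberately does not):

```python
def ed0_percent(mt_outputs, final_translations):
    """Percentage of segments where the MT output was accepted
    unchanged, i.e. the edit distance between MT output and final
    translation is zero. Strict string equality; a sketch only."""
    if len(mt_outputs) != len(final_translations):
        raise ValueError("segment lists must be aligned")
    exact = sum(mt == final
                for mt, final in zip(mt_outputs, final_translations))
    return round(100.0 * exact / len(mt_outputs), 1)

# Hypothetical three-segment job: two segments accepted as-is.
mt = ["Hello world", "Open the file", "Save as PDF"]
final = ["Hello world", "Open that file", "Save as PDF"]
print(ed0_percent(mt, final))  # prints: 66.7
```

Note that, as the Memsource caveat above suggests, this number is only meaningful when the "final" translations were genuinely edited to publication quality.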
For a decade, MT systems have been built with specific design goals as one shared resource to serve huge user groups (final publication, gist, user-generated-content, etc). Individual engines serving individual translators (i.e. the subject of this blog post) mark the introduction of a new use case that's only months old.
Three new entrants in this use case have been specifically designed and built from the ground up to serve individual translators: Lilt, Slate Desktop and SDL's Language Cloud Custom MT Engine. There are pros and cons for each, but all are alike in that none has SMT experts actively customizing each individual translator's engine. In this regard, none qualifies for your "Expert systems."
In another diversion from the pyramid's tiers, all three are simple for individual translators to set up and use. It takes less than 1/2 hour for Slate customers to download and install the app, import their TMs and start generating their first engine. Language Cloud users bypass the download/install step, but they still upload their TMs and wait for the engine to finish building. Lilt takes a very different approach. With regard to complexity and end-user understanding, none qualifies for your DIY option.
Quality results are also neck and neck. One translator's blog, https://signsandsymptomsoftranslation.com/2016/05/31/slate_languagecloud/, reported comparable experiences between the two: she perceived SDL's Language Cloud as slightly better on quality, while Slate Desktop was ahead on confidentiality. Emma didn't share her percentages.
So, I'll share that customers who have reported (unlike with Memsource, reporting here is voluntary) experience ED0 in the 30% to 50% range for their first engine. Some are investing time to hone their expertise and are improving their results. Like Mr Wiggins' offer in 2014, I'll make personal introductions to anyone who wishes to verify these numbers. To demonstrate our confidence in these numbers, we plan to initiate (soon) a "more than your money back" promotion if the customer doesn't experience a minimum threshold ED0 percentage on their first engine.
I can't propose good alternatives for your options, but I think there's an argument for looking for new ones.
Tom
Congratulations on the success of some of your customers. It is encouraging, and shows that the technology has fundamental merit and possibility.
I am not responsible for Dion's hyperbole and I will leave it at that. For some reason the history of MT is populated with people who make claims that many consider outrageous and exaggerated and may not always be warranted. As someone said once: "The history of MT is filled with empty promises." Most real experts will admit that MT is THE most difficult problem in NLP or AI. I once met someone at DARPA who said to me, in reference to the Star Trek technology, that all the other technology shown on that series was easier to make happen in reality than the Universal Translator.
I think there is a significant amount of conflation in your comments so I will attempt to clarify.
Experts are those people who have a deep understanding of the tools they are using and generally have considerable experience with failure as well as success with these tools beyond just having educational credentials. Expertise can take hundreds if not thousands of hours to acquire. Just to be clear let me state this more explicitly.
Google is a generic system developed by experts.
Bing Translate is a generic system developed by experts that allows users to customize to some extent.
Lilt is an Adaptive MT system built by experts to allow translators to easily tune an SMT engine in real time.
SDL provides base systems built by experts (from Language Weaver) designed to allow some further customization.
In every case I would expect that the baseline engines produced by these experts will outperform a Moses attempt (by LSP or Translator) in probably 90% or more of the cases. TAUS has documented this repeatedly. Several enterprise TAUS members tried to use Moses to build their own engines and realized that they could do much better by simply using the MSFT Hub customization capabilities and get better quality at a fraction of the cost and effort. Even the domain customization experts often find that their systems are barely better than expert generic systems.
Silvio is not an "MT specialist". He is actually an MTLS = MT Language Specialist. He addresses linguistic problems that undermine MT engines and he is working with a team of expert MT developers who actually handle the SMT engine development part. Also he is working with the most challenging type of content possible (UGC) so 5% is actually pretty good there. Their effort is about as far as you can get from your typical Moses experience.
A first engine built in "less than 1/2 hour" from a TM import is, in my experience, unlikely to outperform a system built by experts. However, I may be wrong, and I offer you the chance to provide contrary evidence in a guest post on this blog.
If your users can indeed produce better systems than the experts listed above in 1/2 hour, I think you may have a truly valuable asset on your hands.
Thank you for your comments.
I didn't mean to imply you were responsible for any of Dion's comments. I merely used his comment as a possible high end of a range.
Sorry about the "significant amount of conflation" in my comments. I think technologies mature faster than business models. SMT as a viable commercial technology is over 10 years old, but it's easy to get stuck thinking about it in its original context.
Use cases are expanding. It takes thought to sort through them. So, thanks for supporting most points and your less conflated summary.
Very interesting. Awaiting more from Mr. Kirti Vashee & Mr. Thomas Hoar.