
Tuesday, February 26, 2013

Dispelling MT Misconceptions

MT in 2013 is still a complex affair requiring many skills, expertise and understanding that are not commonplace, to enable successful deployment as a productivity enhancing technology for business translation needs. While it has become much easier to build basic custom engines using a variety of Instant Moses solutions or by creating a dictionary for an RbMT system, there are still very few who know how to coax MT system output to consistently productivity-enhancing levels. Getting some kind of basic engine up and running is NOT the same thing as having a production-ready, post-editor-friendly system. There are even fewer who know what to do if the first MT attempt does not work, or is lackluster. Most of these basic/instant MT systems are inferior to the essentially free online MT from Microsoft and Google. Building long-term productivity and strategic production advantage requires much more skill, expertise and experimentation than most LSPs or users have access to, or care to invest in. While it is sometimes possible for a user to get usable MT output after throwing some data into an instant MT/Moses engine, it is not common, even for “easy” languages like Spanish, as several TAUS case studies reveal.

It is my sense that MT is still complex enough that meaningful expertise can only be built around one methodology i.e. RbMT or SMT and that anybody who tells you that they can do both should be viewed with some skepticism. It is almost certain that they cannot do both well, and also quite likely they cannot do either well if they claim expertise in both, since very different kinds of skills are required. Specialization and long-term experience are necessary to build real competence with either approach.

We have reached a point today where many more MT systems are successful, but we also have many mediocre systems that do not provide any long-term production/productivity leverage and can easily be duplicated by any competitor with minimal investment. Today it is quite easy to find many (usually bad) examples of free/instant MT, but the best custom systems are still not widely known or commonplace. Good MT system development takes work and ongoing investment, and requires overall process modifications, communication and expectation management, not only technology investments.

Recently we have seen some articles in the blogosphere, and even in the mainstream professional translation press, that continue to provide what I believe is a lopsided and even somewhat disingenuous view of the verifiable use and known best practices of various MT technologies. (This link gets you to the full article.) In this particular case it is fairly clear that the author has a preference for, and a bias favoring, an RbMT approach, where the value-add is generally limited to building dictionaries.

The misinformation is typically around the following concepts:
  • Rules-Based vs. Statistical MT Comparisons
  • The scope and extent of possibilities with instant MT customization
  • The degree of expertise and experience required to develop skills in any of these approaches

Firstly let me state my own biases:
  1. I think the Rules-based MT vs. Statistical MT arguments are largely irrelevant, even though I think it is increasingly evident that SMT is becoming the preferred approach, especially as more linguistics are added to the data-driven approach. To a great extent most systems out there except for raw Moses systems are all hybrids of some sort. Recently MT technology has evolved to a point where SMT and RBMT concepts are being merged into a single “hybrid” approach. While there is some overlap in these approaches, there are two primary hybrid models in use today:
    a) RbMT with SMT smoothing tacked on after the RbMT translation is completed, as with Systran, to help improve the fluency and quality of the often clumsy raw RbMT output; and
    b) Linguistically informed rules that modify source text before SMT processes it and guide the SMT processing, together with additional rules applied after SMT processing to normalize and adjust the translation output where required; or the newer syntax-based and morpho-syntactic SMT approaches, which have shown limited success and are still emerging. (A minimal sketch of this pattern appears after this list.)

    Finally, what really matters is how much productivity an MT system offers; the RbMT vs. SMT issue is largely irrelevant. The objective is to get translation work done faster and more cost-effectively.

  2. In the right hands, both approaches (RbMT or SMT) can work for projects where MT is suitable. However, there are many more user controls and much simpler tuning options available in the SMT world.
  3. In general I would say that it makes sense to specialize in one MT approach (SMT or RbMT) and go deep to understand what you can control and how it works, rather than take shallow and instant approaches. It takes work and extensive experimentation to develop real expertise in either approach, and there is nobody I know in the industry who can do both well. So choose RbMT or SMT and figure out what it takes to make it REALLY work, rather than run the kind of shallow tests that Lexcelera does and draw definitive conclusions from those results, as described in the article. Many of the conclusions drawn in the article are more a reflection of the quality of their effort than of the actual possibilities of the technology in more skillful hands.
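To make hybrid model (b) above concrete, here is a minimal, hypothetical sketch in Python of rules applied around a black-box SMT decoder. The specific rules, the smt_decode stand-in and the example sentence are invented for illustration; this is not any vendor's actual pipeline.

```python
import re

# Source-side rules: normalize input so the decoder sees text that looks
# like its training data (units, spacing, etc.). Illustrative only.
SOURCE_RULES = [
    (re.compile(r"(\d+)\s*kms?\b"), r"\1 km"),   # unify unit spelling
    (re.compile(r"\s{2,}"), " "),                # collapse extra whitespace
]

# Target-side rules: clean up systematic artifacts in raw MT output.
TARGET_RULES = [
    (re.compile(r"\s+([,.;:!?])"), r"\1"),       # no space before punctuation
]

def apply_rules(text, rules):
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text

def smt_decode(source):
    """Stand-in for a real SMT decoder call (e.g. a Moses server request)."""
    return source  # a real system would return the translation here

def hybrid_translate(source):
    normalized = apply_rules(source, SOURCE_RULES)   # rules before SMT
    raw_target = smt_decode(normalized)              # SMT decoding
    return apply_rules(raw_target, TARGET_RULES)     # rules after SMT

print(hybrid_translate("The route  is 12kms long ."))
```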
Some of the specific claims and misinformation in the Multilingual article referenced above that I would challenge and dispute are as follows:

“In our experience, languages such as Japanese and German perform best with an RBMT approach.” This was actually true in the early SMT days (~2005-2007) but is simply no longer accurate. I have seen custom SMT (done right) outperform customized RbMT systems in both these languages, even when large amounts of data are not available.

“If you do not have enough data — we're talking millions of segments of in-domain bilingual and monolingual segments — you may not have enough corpora to train an SMT engine.” This seems to me to be a statement often made by people who have little or very shallow experience with SMT. In the large majority of SMT systems I have been involved with, this volume of training data was simply not available. However, it is possible to get productivity-enhancing SMT engines with even just 50,000 segments if you know what you are doing. This is possible even for languages like Japanese and Russian, as Scott Bass of Advanced Language Translation points out in this webinar, where this was done with a fraction of the data mentioned in this misleading statement. A large majority of Moses MT engines, especially those of the instant kind, are inferior to the free MT provided by Google and Microsoft. This is more likely to be related to a lack of understanding of the technology than to any fundamental deficiency in the basic technology or the data, as the Multilingual article suggests. If data privacy or copyright is not an issue, most LSPs would probably be better off using the Microsoft Hub option than some generic instant MT option or an LSP-managed Moses effort.
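As a hedged illustration of why a modest but clean corpus can still be productive, here is a minimal sketch of the kind of corpus hygiene practitioners typically apply before training: dropping empty pairs, duplicates, and pairs with implausible length ratios. The thresholds, function name and sample pairs are assumptions for illustration only.

```python
def clean_bitext(pairs, max_ratio=3.0, max_len=80):
    """Yield plausible (source, target) training pairs from raw TM exports."""
    seen = set()
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # drop pairs with an empty side
        s_len, t_len = len(src.split()), len(tgt.split())
        if s_len > max_len or t_len > max_len:
            continue  # overlong segments tend to hurt alignment
        if max(s_len, t_len) / max(1, min(s_len, t_len)) > max_ratio:
            continue  # wildly different lengths suggest a misaligned pair
        key = (src.lower(), tgt.lower())
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        yield src, tgt

sample = [
    ("Press the red button.", "Pulse el botón rojo."),
    ("Press the red button.", "Pulse el botón rojo."),   # duplicate
    ("OK", "Consulte el manual del propietario para obtener más detalles."),
]
print(list(clean_bitext(sample)))  # only the first pair survives
```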

“If the terminology is fixed in a narrow domain such as automotive or software documentation, RBMT or a hybrid is generally the best choice. This is because the rules component protects terminology better.” While this may be true for systems developed by naïve Moses users, many SMT experts like Asia Online have figured out that terminology really matters and know how to use it. Most of the corporate SMT systems out there focus exactly on automotive and IT product user documentation of various kinds, in addition to unstructured content. It is in fact possible to build a single Automotive engine (at Asia Online) and then tune it for different clients (Toyota, Honda, etc.) and have the preferred terminology dominate IF you know what you are doing. See the diagram below for example.
[Diagram: tuning a single domain engine with client-specific preferred terminology]
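For illustration only, here is a hypothetical sketch of one common way to make a client's preferred terminology dominate: protect glossary terms with placeholder tokens before decoding, then restore the client-specific renderings afterwards. The glossaries, the placeholder scheme and the smt_decode stand-in are invented; this is not a description of Asia Online's actual mechanism.

```python
import re

TOYOTA_GLOSSARY = {"tailgate": "portón trasero"}   # hypothetical client A preference
HONDA_GLOSSARY = {"tailgate": "puerta trasera"}    # hypothetical client B preference

def translate_with_glossary(source, glossary, smt_decode):
    slots = {}
    for i, (term, client_target) in enumerate(glossary.items()):
        token = f"TERM{i}"
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        if pattern.search(source):
            source = pattern.sub(token, source)   # protect the term
            slots[token] = client_target
    raw = smt_decode(source)                      # tokens pass through untouched
    for token, client_target in slots.items():
        raw = raw.replace(token, client_target)   # restore preferred rendering
    return raw

fake_decode = lambda s: s   # stand-in for the shared Automotive engine
print(translate_with_glossary("Open the tailgate.", TOYOTA_GLOSSARY, fake_decode))
print(translate_with_glossary("Open the tailgate.", HONDA_GLOSSARY, fake_decode))
```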

“Wild West content where the terminology runs all over the map and would be impossible to train for, such as patents, works better with SMT.” This again suggests the author's lack of experience with the patent domain and basic unfamiliarity with SMT technology. The largest terminology effort I have seen was with a patent engine, where tens of thousands of scientific and technical terms were carefully translated to ensure accurate and useful translation of patent material. SMT benefits greatly from good, consistent terminology work, and we have several customers (e.g. Sajan) who have gone on record to say that terminology consistency was one of the major benefits of an Asia Online engine. In fact, the strategy deployed by Asia Online in data-scarce situations usually begins with a tightly focused terminological foundation.

“However, if there are metadata tags, you should be aware that SMT doesn't preserve tags well, so RBMT or hybrid technology will save you some headaches.” While this may be true for many Moses efforts by technically naïve and unskilled users, any SMT developer worth his or her salt knows how to resolve this problem easily. Asia Online handles all the formatting tags in XLIFF and TMX automatically, and also provides a variety of tools that allow power users to do sophisticated handling of different kinds of formatting.
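As a hedged sketch of how the tag problem is typically solved, here is the standard placeholder trick for inline markup: mask XLIFF/TMX-style tags before MT and restore them afterwards. The tag pattern and placeholder shape are assumptions for illustration, not Asia Online's actual implementation.

```python
import re

TAG_RE = re.compile(r"</?(?:g|x|bpt|ept|ph)\b[^>]*>")  # common XLIFF/TMX inline tags

def protect_tags(segment):
    """Replace inline tags with opaque placeholders the MT engine won't touch."""
    tags = []
    def stash(match):
        tags.append(match.group(0))
        return f"__TAG{len(tags) - 1}__"
    return TAG_RE.sub(stash, segment), tags

def restore_tags(translated, tags):
    for i, tag in enumerate(tags):
        translated = translated.replace(f"__TAG{i}__", tag)
    return translated

segment = 'Click <g id="1">Save</g> to continue.'
masked, stash = protect_tags(segment)
translated = masked                        # stand-in for the real MT call
print(restore_tags(translated, stash))     # inline tags come back intact
```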

“Today's SMT systems are still hampered by a lack of predictability, which means that translators waste a lot of time verifying terminology that already ought to be automatically verified.” Asia Online ran an experiment a few years ago using TDA data from multiple sources. It was discovered that combining data or using noisy data of any kind produces much lower quality MT systems. Understanding how to get the data clean and building a quality foundation makes ongoing maintenance and updating of the engine much easier and largely eliminates this unpredictability. We also discovered that consistent terminology in the TM ensures much higher quality results, so at Asia Online we now have tools to ensure this. Again, if you know what you are doing, this is a manageable issue; after you have built a few thousand engines, you realize that unpredictability can be managed by data cleaning and ensuring terminological consistency. Kevin Nelson, Managing Director of Omnilingua, stated in a webinar that the terminology and writing style produced by his Asia Online MT system was even more consistent than a human-only approach. This was specifically noticed by his end client, who contacted Omnilingua directly, without prompting, to discuss how they had accomplished recent improvements in quality.
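To illustrate the kind of terminology-consistency check this implies, here is a hypothetical sketch that counts how a source term is rendered across a TM before that TM is used as training data. The sample TM, the term lists and the function are invented; real tooling would work on tokenized, lemmatized text.

```python
import re
from collections import Counter

def term_renderings(tm_pairs, source_term, target_candidates):
    """Count how often each candidate target rendering co-occurs with the term."""
    src_re = re.compile(rf"\b{re.escape(source_term)}\b", re.IGNORECASE)
    counts = Counter()
    for src, tgt in tm_pairs:
        if not src_re.search(src):
            continue
        for cand in target_candidates:
            if re.search(rf"\b{re.escape(cand)}\b", tgt, re.IGNORECASE):
                counts[cand] += 1
    return counts

tm = [
    ("Check the brake fluid.", "Compruebe el líquido de frenos."),
    ("Bleed the brake line.", "Purgue el conducto del freno."),
    ("Check the brake hose.", "Compruebe la manguera del freno."),
]
# More than one rendering in the result flags a term to harmonize before training.
print(term_renderings(tm, "brake", ["freno", "frenos"]))
```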
“When post-editing SMT, that next training cycle may be six months or a year away because you usually want a fair bit of new data accumulated before you begin the process of retraining. In this case, the post-editors are not empowered to make lasting changes and it typically takes until the next training cycle to see any progress at all.” This may actually be true for many Moses systems and for most naïve users of instant MT solutions, but for higher value-add systems like the ones produced by Asia Online it is not. There are two ways that SMT-based systems can incorporate corrective feedback (a sketch of the first follows this list):
  1. Real-time corrections that are used on each job and can easily be done by translators every single time they run a translation. Since there is no additional cost for retranslating the same content at Asia Online, users are encouraged to resubmit the translation until it is in better shape to hand over to a post-editor. Many dumb and high-frequency error patterns can be corrected instantly by some simple analysis and corrections based on small test translation runs.
  2. Periodic retraining, which is done when sufficient corrective feedback is available. Incremental retrainings with Asia Online can be performed in just a few days, and with just a few thousand segments can show meaningful improvements, especially with terminology and high-frequency phrases.
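A hedged sketch of the first mechanism: a small "correction memory" of high-frequency error patterns applied to raw MT output on every job. The patterns and example are invented for illustration; a production system would manage such rules per engine and per client.

```python
import re

# Hypothetical corrections harvested from earlier post-editing passes.
CORRECTION_MEMORY = [
    (re.compile(r"\bde el\b"), "del"),                  # systematic contraction error
    (re.compile(r"\binformaciones\b"), "información"),  # recurring wrong plural
]

def apply_corrections(raw_mt_output):
    """Apply stored high-frequency fixes before handing output to a post-editor."""
    for pattern, fix in CORRECTION_MEMORY:
        raw_mt_output = pattern.sub(fix, raw_mt_output)
    return raw_mt_output

print(apply_corrections("La rueda de el coche necesita informaciones."))
# -> "La rueda del coche necesita información."
```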

Perhaps the biggest misconception of all is that More Data is Always Better. We now have much more evidence that this is frequently not true. Even Google, the high priest of big data, admitted this some time ago: "We are now at this limit where there isn't that much more data in the world that we can use."

So be careful not to believe everything you read (including on this blog). If you take more than a glancing look at MT technology today, you will probably understand that while it is becoming much simpler to play and experiment with MT, it is still a long way from easy to produce production-quality systems that provide long-term business leverage. Do not underestimate the expertise required to be successful with MT, and realize that even after jumping in with Asia Online or others, it will take ongoing changes in process and human-factor management to really achieve long-term cost advantages and build sustainable business leverage. The reward for those who figure this out will be clear differentiation and long-term production cost advantages that others with instant MT or home-brewed Moses systems will never be able to match.

MT is messy and not yet as predictable as most want it to be. You need a stomach for uncertainty, and you are probably better off with "real experts" than with people who say they can do it all and are "technology agnostic". And the next time you see an article claiming to have all the answers, promising that for a nominal service charge you could reach nirvana tonight, just tell them: "Don't you jive me with that cosmic debris!"

Watch this video and feel your face melt at 4:50 when the guitar solo happens.




22 comments:

  1. I think the opening sentence of Kirti's blog says it all and encapsulates a very accurate picture: "MT in 2013 is still a complex affair requiring many skills, expertise and understanding that are not commonplace, to enable successful deployment as a productivity enhancing technology for business translation needs."

    Two important questions come to mind:

    Is it desirable to make MT a less complex affair?

    Is it desirable to make the associated skills, expertise and understanding more commonplace?

    If the answers to these two are yes, then what are we willing to do to forward the action?

    By Tom Hoar

    ReplyDelete
  2. I think there is a lot happening in terms of making the mechanics of building an SMT engine easier, but addressing the other question is much more difficult.

    At this point in the technology's evolution I do not think one can avoid the learning process which is to build many engines across many language combinations with many different kinds of data sets to start to get an understanding of HOW it works and what can be done on a systematic basis to ensure optimal results.

    ReplyDelete
  3. Of course, it requires more than an overview and humor to achieve results that contribute to the user's bottom line. I agree that one can't "avoid the learning process which is to build many engines" and "with many different kinds of data sets." The number of language combinations, however, depends solely on what languages they choose to support.

From our experience, our customers focus on 3 or 4 languages in various combinations where they have internal expertise. They evolve their own definitions of "results that contribute to the user's bottom line" to meet their needs. They acknowledge that their results do not typically reach the maximum optimization. Their costs stay within their budgets. Their increased productivity yields savings that still drive profits up.

    Since they are still new in the field, they reinvest their savings into new user education and training. With each iteration of building models, they acquire the internal expertise by building many SMT models in the few language combinations that are important to them.

This investment in human capital shows up as an ROI multiplier in their out years. Localization engineers at our earliest customers (2-3 years) now build new SMT models with only a few minutes of their attention before spawning the training/tuning process, which typically runs overnight or for a couple of days. Why? Like you said, they have worked with the system, they've grown to understand the impacts of different data and they have organized their data to suit their needs.

    By Tom Hoar

    ReplyDelete
  4. Tom

    I maintain that it is useful and even necessary to understand HOW it works and what might cause problems before you build your first engines for any long-term and sustainable value to accrue even for 1 language.

    As many of the TAUS case studies show - it is difficult to produce MT engines that even reach the output quality levels of Google and MSFT -- remember that any translator can get this level of output by themselves, and will have little incentive to use a low quality LSP engine if the PE work is in fact harder.

I think that many underestimate the skills needed to steer these MT engines into something that delivers measurable productivity. The deep MT knowledge and skills are for the most part not available at most LSPs -- mostly because they would be hard to cost-justify.

    ReplyDelete
  5. Oh my goodness, I was just reading your post Kirti and I see you are quoting me. Is that from Multilingual Computing? Am I allowed to comment?

    We're not a technology vendor, so we have no axe to grind. We came from a rules-based environment, absolutely true. But for years we have been working in SMT as well, and in fact are part of a group working on SMT daily under a large EU grant.

    I guess the most important thing to say is that before any large project we test extensively, comparing performance among the three approaches, RBMT, SMT and Hybrid. (What RBMT or SMT engine we choose to train and to test with is based on various factors: for SMT it's not always Moses - if RTT, for example, we may test with Microsoft Translator Hub.)

    I cannot speak for every SMT engine under the sun, but in our experience Japanese does not react well at all to the statistical approach. When testing we even find we have to throw out what the Hybrid does with that language as well. Our tests always point to higher quality in Japanese with a straight rules approach. And our post-editors confirm this.

    ReplyDelete
    Replies
    1. Hi Lori

      Of course you are allowed to correct, clarify or comment on anything I have said.

      Delete
    2. Lori

Have you considered the possibility that others with much deeper expertise than you (Lexcelera) have could get very different, i.e. much better, results with Japanese, or really with any project where you may run something through a basic SMT setup and conclude that it will not work for you?

      Delete
  6. I agree (S)MT is complex. It takes time to learn that (a) it does work and (b) how it works.

Where else would you have a beginner start, other than with a first engine with one language pair? Edison didn't invent the lightbulb on his first try. Problem solving is part of the learning process. Without problems, there's no learning. The SMT experts refer to building SMT models as "experiments" for a reason. So, I go back to my original questions that remain unanswered.

If it's not good enough to have 2+ years of learning across hundreds of users culminate in the users' declarations that they are satisfied with and using their own results, then what will it take to move the action forward? If the users' satisfaction isn't good enough, is it desirable to make (S)MT less complex in the first place?
    By Tom Hoar

    ReplyDelete
  7. Re: "it is difficult to produce MT engines that even reach the output quality levels of Google and MSFT." Let's not forget that SMT has many variables. Google and MSFT often achieve excellent quality in some language pairs for some subject domains. So, it's not surprising that those quality levels are difficult to reach.

    * I know as of mid-2012, Agoda.com in Bangkok was paying for translations via Google's API because they found Google's results in their travel and tourism domains were quite adequate for their post-editing teams, the price was right, and confidentiality wasn't a requirement.

    * Likewise, a few years ago a travel/tourism project with one LSP found their customized engines struggled to reach, much less exceed, Google's quality results which were the defined benchmarks for the project. For this project, the LSP had professional help.

    * As its first project, one of our customers converted parallel PDF travel descriptions to text, aligned them and trained an engine. The training data from the PDFs consisted of artistic, flowery, creative writing styles. This was on their own and without asking our advice. The resulting SMT translations were horrible. They learned from the experience, bounced back, and today their results are astounding.

So, not exceeding Google and MSFT is not necessarily a bad thing. On the other side, getting it right the first time could be "beginner's luck", and that is not necessarily a good thing. If you're in this for the long haul, it pays off to learn, bounce back, and create lasting assets that compound returns.
    By Tom Hoar

    ReplyDelete
  8. Tom,

    My original blog post focused on “Dispelling MT Misconceptions” and the basic thrust of the post challenged and questioned several assertions made in this article in Multilingual: http://lexworks.com/cms/wp-content/uploads/2013/03/PostEdShrtgMT-Thicke.pdf

I felt that they were especially misleading and incorrect with regard to what is possible with SMT, and thus I provided contrary evidence to some specific points made in the Multilingual article. I suspect that many of the conclusions were reached because the author had the limited and shallow experience with SMT that many DIYers typically have. The issues she highlights have been solved by others who have deeper expertise and a broader skill base with SMT-related technology. The author is an LSP who claims to have a few “years” of experience with SMT (Moses and MSFT Translator Hub), and her conclusions about SMT were based on this experience. As she says in a comment to the blog post: “But for years we have been working in SMT as well, and in fact are part of a group working on SMT daily under a large EU grant.” I think the comments made in the article suggest that many LSPs will get stuck at basic hurdles that can be easily managed by specialists who can afford to invest more into the process and into building the critical skills to solve the highlighted problems.

    My point is that few LSPs have the skills/knowledge upfront or the urge to invest in developing the expertise necessary to resolve many of the problems and technical challenges involved in taking Moses or other technologies up to viable professional-use quality levels. Several TAUS Moses case studies show this as well, as many have pointed out that they spent much more money and time than anticipated in developing engines on their own, and had yet to reach quality levels that matched free MT.

    I think it does matter that LSP MT engines produce better output and be better than free MT. Every freelancer/post-editor ALWAYS has the option to either use free MT or use poorer quality MT output from an LSP. If a post-editor determines that free MT is going to be easier to work with and edit, then I expect many will use it and prefer it over low quality output from a DIY Moses effort. So why bother building engines that editors know produce inferior output to what they can do for free elsewhere?

    I agree with you, that for many, using free MT will be a better choice than undertaking a low-quality DIY effort where one may or may not reach competitive quality levels in 2+ years.

Developing good MT systems in 2013 is to some extent like building houses: one can try to do it oneself and learn along the way, and hopefully one may learn in 2+ years (or not, as the Multilingual article shows), or one could hire contractors who have built hundreds of houses and focus instead on the finishing work and on developing skills that are closer to one's core competence. While some LSPs may indeed learn the various skills needed to build their own systems, I think most will find that doing so is often more expensive and results in lower quality systems once all the learning and time-opportunity costs are factored in.

    I am biased as you know, as I work for a company that provides expert services (contractors) and is required to prove that systems we develop are better than anything the customer could do with free MT or often with DIY solutions too.

    Finally I think RbMT or SMT based approaches can both work but both require long-term investment to really get to any kind of distinctive expertise and I remain skeptical of those who think that they can do it all-- especially at a level of competence that adds business value.

    ReplyDelete
  9. Tom Hoar • I think we've reached a point of circular logic. We agreed that one can't "avoid the learning process which is to build many engines... with many different kinds of data sets." Isn't that the inherent answer to the question, "So why bother building engines that editors know produce inferior output to what they can do for free elsewhere?"

Learning happens when users build, fail, analyze, adjust and try again. "Expertise" is the skill of an expert gained through experience. We "bother building engines" because we cannot know what produces inferior or superior output until after the system is built. The experience in this corrective feedback cycle develops expertise.

Outsourcing is a valid option for those who do not wish to develop their own expertise. SYSTRAN and AO hire Dr. Koehn for his expertise. In so doing, both have extended their own areas of expertise. Lexworks has worked for years developing their own expertise. The list of experts grows longer each day since the TAUS reports you cited. When were those studies done? 4 or 5 years ago?

    ReplyDelete
  10. Tom,

Not sure that we have reached circular logic. Perhaps we continue to maintain a predictable and steadfast perspective that is a product of our respective business missions.

I think, though, that you seem to be missing the implications of the conclusions drawn by Lori (and thus Lexcelera) in her referenced article. IMO she is drawing many erroneous conclusions about what SMT can do based on her “years” of experience, which rather invalidates your point that people will somehow learn by playing with Moses in “2+” years. The Multilingual article is a very clear example of how you might NOT learn. Also, to clarify, the reports I cite from TAUS are all Moses case studies presented in 2012.

    Additionally:
    - Pretty much every SMT deficiency pointed out by Lori in her article, based on her Moses experience, is easily addressed by Asia Online and others who have deeper knowledge and expertise,
    - Most LSPs don’t want or need to learn the intricate details of SMT. Just like a house, you could build it yourself, but most hire an architect, builder, plumber, electrician etc. because even handymen realize some things are better left to people who do it all the time.
    - Most LSPs don’t want to wait 2+ years (as you described) to get returns, they need them now if MT is going to be worth the investment at all. Hiring an expert means that the engine is available sooner and the risk is lowered substantially.
    - The corpora that have been developed over many years by Asia Online and other experts cannot be matched by an LSP or even a research institution. So an LSP's starting point is far lower.
    - The cloud being dangerous and a security risk is just not true in 2013

    However, I do admit that there will be a few who might learn through experience e.g. Autodesk

    ReplyDelete
  11. Kirti,

    Controversial statements are not a bad thing because it gives us all a chance to share our experience. In that spirit I welcome the chance to respond to some of your statements with which I do not agree.

    Your comment:

    “It is my sense that MT is still complex enough that meaningful expertise can only be built around one methodology i.e. RbMT or SMT and that anybody who tells you that they can do both should be viewed with some skepticism. It is almost certain that they cannot do both well, and also quite likely they cannot do either well if they claim expertise in both, since very different kinds of skills are required.”

    Our answer (thanks to Laurence Roguet from our Paris office)

    The only thing true here is that the skills are very different – but that does not prevent an LSP from integrating both profiles and skills internally. In fact, this is necessary to achieve even better results via a best of breed approach. LSPs are not attached to one tool (or at least they shouldn't be): they are attached to their customers, and giving them the best results.

    Your comment:

“I think the Rules-based MT vs. Statistical MT arguments are largely irrelevant, even though I think it is increasingly evident that SMT is becoming the preferred approach, especially as more linguistics are added to the data-driven approach. To a great extent most systems out there except for raw Moses systems are all hybrids of some sort. Recently MT technology has evolved to a point where SMT and RBMT concepts are being merged into a single “hybrid” approach.”

    Our answer:

    It’s false to say that both technologies are now achieving the same results through hybridization. You need only to assess the results - including a sentiment analysis of final users and post-editors - to see that from one language pair to the other and from one content type to another, the results are very different, depending on the engine.

Furthermore, the hybrid approach is not always the best one. For example, when we work in Japanese we get the best results from a pure rules-based approach. But we're not relying on hearsay: before starting any major new project we train three engines for testing – RBMT, SMT and Hybrid – and use only the engine that delivers the best results. (Quality MT output is something we feel we owe our post-editors.)

    Your comment:

“Have you considered the possibility that others with much deeper expertise than you (Lexcelera) have could get very different, i.e. much better, results with Japanese, or really with any project where you may run something through a basic SMT setup and conclude that it will not work for you?”

    Our answer:

(Laurence clearly felt passionate about her response because she wrote it in French; her reply is given here in English.)

Those who believe in MT, whatever the technology, invest in it. It is not a matter of amusing oneself with a few one-shot experiments that can only end in the loss of a client – something no LSP can afford. Lexcelera has been investing since 2007 to better serve its clients. These are primarily human investments, since the acquisition of skills and knowledge cannot happen without the specific competencies of the various players.

As its expertise has grown, Lexcelera has built up the teams needed to integrate new tools, and the processes tied to them, properly and intelligently. The R&D hours needed to determine "which process, with which tool, for which language, best answers the challenge my client entrusts to me" are indeed not negligible. It is on the basis of this work, and of the many production deployments already carried out, that certain conclusions could be drawn. That these assertions do not serve or please everyone is, in the end, only a problem for those whom they displeased!
PS: And the confrontation of the different profiles and skills integrated within our team makes us even more effective, and certainly more "brainstormed" and "challenged", than any company attached to one single technology…

    ReplyDelete
  12. PS Kirti, I haven't read all the subsequent comments but I will when I get a chance. I just see that you have put Lexcelera's years of experience as "years" of experience. I don't understand why you would do that.

    You have personally met with a longtime MT customer, Bentley Systems, who we started working with in 2007.

    I guess this is as good a time as any to announce that that customer has just joined us as our new CEO. John Papaioannou was Director of Release Services at Bentley Systems, for whom we have done many millions of words in MT in French, English, Spanish, German, Dutch, Italian and Japanese. I guess he liked the results because he is now Lexcelera's new CEO.

    Could we stick to the facts, folks?

    ReplyDelete
  13. Lori

The primary focus of my comments was directed at your conclusions about what is possible with SMT. I dispute your conclusions about SMT, not your experience with Bentley. The “years” is in quotes because you state in a comment that you have been working with SMT for that long. There is no disrespect intended. I am suggesting (respectfully, I think) that your conclusions about what is possible with SMT are based on a relatively shallow knowledge base, and that those with deeper knowledge of the inner workings of SMT can handle pretty much all the SMT shortcomings that you point out in your article. Asia Online is not the only company/person who disputed your claims. Others with deep SMT expertise did too.

You also have a very Systran-centric viewpoint when you use the word hybrid, and perhaps I should point out that “hybrid” has a broader meaning than the one used by Systran. Many more advanced (non-Moses) SMT initiatives use a combination of linguistics and data – not just the data-only approach of most basic Moses efforts. Hybrid with SMT can be any or all of the following: linguistic rules development, POS parsing and adjustments, and syntax-based and morpho-syntactic approaches, which Moses users are unlikely to try. Today, practically everybody who has been working with SMT for a few years is doing hybrids in the broader sense of the word.

I am also suggesting that no two SMT attempts are the same, unless two users throw exactly the same data into the exact same training system. As the NIST tests used to prove in the past, depending on the expertise and skill of the practitioners, SMT can yield very different results, especially in a language like Japanese, where some linguistic work has to be done to get better results.

I agree that, in the end, the only thing that really counts is the output quality and the quality of the post-editing experience, and it does not matter how you get there – the productivity can only come from good systems that editors feel are responsive to feedback, and where compensation is related to the difficulty of the effort.

    ReplyDelete
  14. @Lori - I am sitting in your session at Localization World now, listening to you say the same things that you said in your Multilingual magazine article – the same things that Kirti addressed as incorrect in his blog post on the misconceptions disseminated there.

I am responding in a direct manner this time, as it seems the points that Kirti made in his previous post have fallen on deaf ears, and this is confusing potential users of MT. It is time to update your knowledge of the advances in SMT in recent years. Many things you say in your presentations and published articles were correct several years ago, but are no longer correct, as the issues have been addressed and resolved. It is a little perplexing that you continue to disseminate incorrect information that can be verified as incorrect with many proof points. This confuses a market that has already accumulated enough misconceptions over the last 50+ years.

Simple claims like "SMT does not handle formatting tags, while RBMT does" are complete nonsense. Perhaps if you use raw SMT from Moses, but not if you use a commercial SMT product such as Language Studio.

To say universally that RBMT is better for post-editing is again nonsense. Asia Online has systems with published case studies, such as that from Omnilingua, where 52% of the output required zero edits and the client came back to Omnilingua stating that the final quality with MT + PE was higher than with HT only.

Productivity is key, and this is reflected in the post-editing experience: the Sajan case study achieved 328% productivity gains, with 62% of the raw MT requiring zero edits on PE review.

Likewise, large volumes of in-domain data are no longer a requirement. We have case studies where no client data was provided, because data manufacturing technologies were deployed in the process. See Kirti's recent blog post on Advanced Language Translation as an example.

You mention the lack of control and unpredictability of terminology in SMT, yet with a clean data approach such as that used in Language Studio there is complete control – all the way down to runtime normalization of the source as it passes through, normalization after translation in the target language, and even control over the writing style and target audience.

While I have given our own examples using Language Studio and our customers above, there are many MT providers with both RBMT and SMT technologies. Each is different, with different features. To summarize them as you have gives the perception that all MT, or all SMT, is the same. Systran is different to PROMT; Asia Online Language Studio is different to SDL BeGlobal and Moses. Wide sweeping statements that reduce each to the lowest common denominator, where features are ignored (e.g. "SMT does not support tags"), are not helpful to anyone.

    ReplyDelete
  15. In response to the slide deck on SlideShare (http://www.slideshare.net/TAUS/12-june-2013-taus-mt-showcase-moses-in-the-mix-a-technology-agnostic-approach-to-a-winning-mt-strategy-lori-thicke-lex-works)

    This information is reposted from SlideShare as the formatting was lost in the SlideShare comments display.

    -----

    The reason for this detailed post is to correct misinformation so that potential MT users can make informed decisions about all MT based products that are based on fact. This is not an attack on the author; rather it is a series of proof points where we and others have disagreed with the author’s perspective with links and references to third party information in support of the counter positions presented.

At the end of this presentation 4 different organizations raised issues with the content presented. SMT and RBMT are approaches to machine translation, not products. The presentation author refers to SMT and RBMT as if they were single products rather than technology approaches. There are many products based on either or both approaches that have different features.

The author seems to have only considered the features available in Moses, Microsoft Translator and Systran, while ignoring the features of many other commercial MT products that have already resolved many of the issues raised. The author appears not to have performed data management and optimization of the training data when creating statistical models. As a result, based on the author's experience with a limited number of products, and with only a subset of the features that commercial SMT-based products can offer, sweeping statements covering a wide scope are made as if there were just one product. Each vendor's products, whether SMT- or RBMT-based, have a range of different features that are not recognized by the author.

Many of the assumptions that are presented may have been true several years ago. However, many of the issues raised with SMT in particular were recognized and addressed some time ago by commercial MT vendors, and as such are no longer true. If Microsoft Translator and Moses do not support a feature, it does not mean that all SMT-based products do not support that feature.

    The author makes many sweeping statements that are factually incorrect in the presentation and can readily be verified as such. Multiple individuals have pointed out these discrepancies, but the author has chosen to ignore these proof points and continue to disseminate misleading information. As noted above, several of these issues were raised directly at the end of the presentation where these slides were delivered.

    Examples include:

    1. SMT cannot handle software tags properly. This is incorrect. Moses cannot handle software tags, but many commercial MT platforms based on SMT such as Asia Online’s Language Studio handle tags very well.

    2. SMT does not retain corrections to terminology. This is incorrect. If the data is managed properly, then management of terminology becomes very easy. Moses and Microsoft Translator do not provide terminology management tools and processes, but products such as Language Studio provide tools to manage and normalize terminology, both when preparing data for training and at translation runtime.

    3. SMT does not have a rapid customization cycle. This is incorrect. In Andrew Rufener's presentation (link below), he notes clearly that the Asia Online system improved dramatically over 3 days, and that as they added further data, they had control and the system improved quickly.

    4. SMT output is not predictable. This is incorrect. If the data is managed properly and supported with data manufacturing such as within Language Studio, then the output can be very predictable.


    ReplyDelete
  16. 5. RBMT is better suited to documentation and software. This is incorrect. There are many published case studies to the contrary. As an example, the case study of Omnilingua on the Asia Online website shows 52% of raw MT required zero edits for their technical automotive documentation. There are many other examples from Asia Online and other vendors.

    6. RBMT is better suited to post editing. This is incorrect. As with the above-mentioned case study from Omnilingua, engines based on SMT can deliver near-perfect quality. The quality of an engine comes down largely to pre- and post-processing technologies and the amount of suitable data/corpus available for the SMT customization process. With less data, or low quality data, translation quality will be poor and the editing will be difficult. With more high quality, in-domain data, and because the engine has learned from the client's own translation memories, editing will be significantly reduced.

    7. SMT is not effective with a limited training corpus. This is incorrect. Advances in data manufacturing technologies such as those available in Language Studio mean that even when no data at all is available an engine can still be customized to a high level of quality. The case study on Kirti Vashee’s blog (http://kv-emptypages.blogspot.co.uk/2013/04/pemt-case-study-advanced-language.html) shows how Advanced Language Translation was able to customize engines for their clients with no data at all and only using data manufacturing technologies from within Language Studio.

    8. SMT is not as good at languages like Russian, Japanese and German. This is incorrect. The quality of an SMT engine greatly depends on the quality of the data that is used for training. If the author is getting poor results, this may be due to insufficient data/corpus, low quality data, or insufficient skills to prepare and process the data in a manner that delivers high quality output (see the skills comment below). There are many high quality engines based on SMT that outperform RBMT. Andrew Rufener presented "Implementing large scale Machine Translation in Patent Information" (http://dotsub.com/view/159ce97c-dbd4-4d6a-90c2-427a3a3e755f), where he shows metrics from many RBMT and SMT systems. He took a technology agnostic approach and performed detailed metrics before selecting Language Studio.

It appears from the article that the author has not managed data well when creating SMT systems, and has not used any data manufacturing and optimization technologies, as they are never mentioned. This is evidenced by the author's incorrect assumptions that systems based on SMT cannot have managed terminology and are unpredictable.

    The author only considers the hybrid approach of RBMT + SMT based smoothing that is available in Systran and does not consider other hybrid approaches of other vendors such as the hybrid approach of SMT guided by rules and syntax that is offered in Language Studio.

    We strongly recommend that the author expand beyond the 3 MT products listed and undertake to learn about data management and data manufacturing for SMT approaches. In Language Studio, Asia Online undertakes these complex tasks for our customers so that the customer can focus on providing the right data for SMT to learn from without the need for skills and the understanding of the complexity of data optimization. Once the initial optimization and data manufacturing is complete, control is handed to the end customer to add and further refine terminology and other linguistic features.

    ReplyDelete
  17. What the author has omitted in their presentation is information about the corpus that was used to train the SMT engines and the actual product used to support each specific assumption. The author also omits information on how metrics were performed, how many segments were compared, how productivity was measured in post editing and over what time period productivity was measured. Too few segments and too short a time period can dramatically impact and incorrectly skew results. Additionally while languages were referred to, domains were excluded. The complexity of a domain is an important factor that impacts metrics and quality. Comparatively, LexisNexis case studies listed 9 MT systems and the metrics performed, the data volumes used and the results.

    We agree that a technology agnostic approach to MT is very viable, but as Andrew Rufener points out in his presentation the integration costs and skill levels required to run multiple MT platforms were significant and can often outweigh the benefits of selecting multiple MT technology solutions. Adobe, PayPal and others have successfully deployed multiple MT technologies and some such as Autodesk have been very open with their metrics. However they have made significant investment in skills, time, data acquisition and data optimization, as well as software development. They also are focused on their own narrow domains, not a broad range of domains in multiple languages like an LSP. Thus they have their own existing language assets and do not need to perform as much management of data as an LSP would when receiving TMs from multiple sources such as TAUS and other LSP partners. Trying to make such data that is not in the domain fit a new purpose is very difficult and unlikely to deliver the optimal quality. This mixture of data approach is commonly referred to as “dirty data SMT”, which is very different from the focused domain “clean data SMT” approach that Asia Online takes in Language Studio.

    We encourage measurement and publication of metrics with all the relevant information about how the metrics were performed and with what data. However only including a small number of products in the group that are evaluated, without considering many of the leading commercial products from multiple vendors and not performing data optimization and data manufacturing means that the results irrespective of product will be biased, skewed and limited to only what the raw corpus can provide. Modern commercial SMT systems go well beyond the capabilities of open source Moses and Microsoft Translator.

What many LSPs seem to underestimate is the complexity of delivering high quality MT. Being able to install Moses and train your own engine does not mean that you will get high quality, any more than owning a sewing machine and cloth makes one an expert tailor. As per the LexisNexis presentation, a significant investment in skills is needed. Andrew Rufener notes in his presentation the significant effort that they put into learning SMT approaches and optimizing data. It is our position that no one individual or organization has the necessary skills to be an expert in every technology and approach. Much like healthcare, the field is too complex for one individual to be an expert in everything. For this reason, specialist medical professionals are needed for cancer, brain and other treatments. While a general medical practitioner can deal with common low-level issues, more complex issues are referred to a specialist. Machine translation is complex. There are few true specialists globally, and even fewer who have solid experience in multiple technologies and approaches. Finding an expert in optimizing any one of these technologies is difficult. Finding an expert who can deliver the optimal approach and quality from all, or even multiple, SMT and RBMT vendors' products is not realistic.

    ReplyDelete
  18. For this reason, when Asia Online works with LSPs and other customers, we hide the complexity and engage our linguists to execute data manufacturing and optimization processes on the clients behalf. This means that the required level of expertise for our customers and the effort is greatly reduced. The Asia Online team are experts in Language Studio, our SMT based hybrid platform, and know how to optimize it to deliver the best results. The Asia Online team claims no expertise in other SMT based products or RBMT based products as each have their own merits, approaches and optimal configurations.

    We strongly recommend that anyone looking at any MT technology or approach whether RBMT or SMT look for skills that are vendor and product specific. These are not easy to come by, but are the only way to deliver high quality and reduce risk of deployment. Any vendor stating that they are experts in SMT and RBMT without explicitly listing the specific vendors and products that they work with is not going to give the optimal results. Generic skills in SMT and RBMT approaches most certainly cannot deliver the optimal result. Specialists in individual products, not just an understanding of an approach, are required.

    Additional counter positions to more misconceptions raised by the author in this presentation and other previous publications can be found in Kirti Vashee’s blog (http://kv-emptypages.blogspot.co.uk/2013/02/dispelling-mt-misconceptions.html)

    ReplyDelete
  19. Good to know that MT is now a complex affair requiring many skills, expertise and understanding that are not commonplace, to enable successful deployment as a productivity enhancing technology for business translation needs.

    ReplyDelete
  20. @"German NAATI translator" - like high quality human translation requires skill and training in a specific language pair and domain, so does high quality machine translation. The skills leveraged, tasks performed and technology use differ between language pairs and domains. For example to create a high quality engine in a Slavic language, there is a large amount of data manufacturing required to handle all the inflected forms well. To translate in Japanese, syntax tools are used in a hybrid approach with SMT to deliver higher quality. Even a more simple language like Spanish has specific tasks performed. But the skills also extend to domains - travel reviews have a simple grammar structure, but when you consider the volume and scope of named entities possible in the travel domain, these too have to be processed via data generation and syntax supported rules in order to deliver high quality. Many LSPs make the mistake of simply loading their translation memories and running a SMT solution such as Moses without any additional effort. It is these kind of efforts that I have not heard any comment on in the slides or articles that Lori Thicke has published in recent months. These skills are complex and language/domain specific. For this reason, we have taken exception to broad and sweeping statements such as "RBMT is better than SMT for Japanese" or similar.

    ReplyDelete