As machine translation continues to gain momentum, more LSPs and some enterprise users are exploring the potential use of the technology in core production work. MT today is still, unfortunately, quite complex: there are few universally accurate truisms or rules of thumb that replace the need for at least some minimal expertise and understanding, and that expertise remains a key requirement for anyone who wishes to use MT successfully in a translation production context. However, many misconceptions persist about the effective use of the technology.
Some of the most common misconceptions include:
All MT systems are about the same. Not really: MT systems that have undergone expert-managed customization and domain-focused training can produce dramatically better results than generic systems. This also means that you are not likely to get a good understanding of the capabilities of an MT technology without doing a real pilot project that involves customization. Yet I often see people trying to judge which MT system to use by running a few paragraphs through a generic engine.
All MT applications are the same. Applications focused on localization (documentation, core website content) need much higher quality to be useful than applications like making customer support forum content multilingual, where good gisting quality is adequate. Translator productivity applications are the most difficult to do successfully, and the area where naïve users (e.g. your average LSP with Moses) are most likely to fail.
Post-editors should be paid the same lower rate for all MT post-editing work. CSA pegged this magic rate at 61% of the full rate in 2010. However, setting a fixed rate without understanding the reality of the MT output quality can often be unfair to editors and cause resentment that undermines any attempt to build production leverage. Compensation needs to be linked to productivity and to the effort expended to “fix MT,” and the most successful users are respectful and careful to do this well to ensure a stable and motivated workforce.
MT is responsible for falling translation rates.
This is a digression, but I want to highlight some interesting analysis and opinion by Luigi Muzii on why this is NOT true, which he provides in this article and also in a post called “Changes Ahead” that Rob Vandenberg has also commented on.
I will address the first three issues in this post and provide some more context to clarify these misconceptions.
MT systems vary and can produce very different types and quality of output depending on all of the following factors:
- Methodology used (RbMT, SMT, or Hybrid, which itself can mean many different things)
- The skill and knowledge of the practitioners working with the technology and building the systems. MT is still quite complex and needs skills that take time to develop and refine in order to produce output quality that surpasses the public MT engines from Google and Bing.
- The quality and volume of the “training data,” which are increasingly important determinants of system quality as SMT approaches lead the way.
- The language pair: it is much easier to get “good” systems with FIGS (French, Italian, German, Spanish) than with CJK (Chinese, Japanese, Korean), relative to English. Languages like Hungarian, Finnish, and Turkish are just tough in general (relative to English).
- The ability of the system to respond to small amounts of strategic corrective feedback. This is critical to building real business leverage. While some systems may improve slightly when many millions of new words are added to their training, very few can respond favorably to small volumes of additional data. MT system development is evolutionary, and one should enter into development with this mindset.
MT can be useful in many different scenarios, but it should be understood that the usable quality expected for different uses varies greatly. We live in a world today where MT translates billions of words each day for internet users trying to understand content of interest on the web or to communicate with others across the world. There are also many corporate and business applications where the sheer volume and volatility of the information could not justify anything but MT, e.g. technical knowledge base content, customer forum discussions, or hotel reviews, where “good enough” is good enough. Much of this information has little or no value over time (e.g. configuration guidance for DOS 5.0/Windows XP, or a 3-year-old hotel review) but could have great value and enhance global customer satisfaction for a brief window in time, even in an imperfect linguistic-quality form.

MT use for traditional LSP applications is the most demanding of all MT applications and requires the deepest knowledge, expertise, and skill. MT in this context can only add value if the output produced is of sufficient quality that it actually enhances the productivity of translators and makes the business translation process more cost-efficient. It is not a replacement for human translation, and thus needs to reach a quality level at which humans acknowledge its utility and actually want to use it.
Much of the early dissatisfaction with MT in the professional translation world is the result of asking translators to edit poor-quality output at much lower rates, set in a relatively arbitrary fashion that did not accurately reflect the level of effort involved. The task of post-editing MT to publication-quality levels requires an understanding of the average level of effort needed, and very few in the professional translation world have figured this out.
Omnilingua is an example of how to do it right, with a very clear and trusted quality measurement profile of the MT output that then helps to define productivity and fair compensation for editors. This task of accurately measuring MT output quality and then determining the correct compensation structure is key to successful MT deployment; it is quite possible in high-trust scenarios but much harder to implement where trust is less prevalent.
In the following largely hypothetical example (based on a generalization of actual experiences) I have summarized the possibilities to show how MT system output quality and productivity are related. I have also taken the additional step of showing how lower word rates can often make sense with “good” MT systems, and hopefully demonstrate that it is in the interests of both LSPs and translator/post-editors to figure out the key quality/productivity metrics accurately; a rough arithmetic sketch of this trade-off follows the three-tier comparison below. Once the productivity is clearly established, lower rates make sense because the throughput is trusted. Both parties need to be willing to make adjustments when the numbers don’t properly balance out.
In this hypothetical comparison, we will assume that there are three MT systems, all focused on the same production task. These systems are of differing quality, and their related productivity impact is characterized below. The objective in every case is to produce final output that cannot be distinguished from a pure human TEP (translation, editing, proofreading) production effort:
- Good Instant MT/Moses System – The large majority of these systems do not produce output better than the free generic engines on the internet; perhaps 5% to 10% of them can reach a state where they outperform Google. TAUS has highlighted several case studies documenting this and making clear how difficult it is. Productivity for a very successful effort will typically be around 3,000 words per day or slightly higher.
- Average Expert System – The product of a reasonable amount of data, expertise, and experience, enabling productivity of 5,000 to as much as 7,000 words per day for editors who correct the MT output.
- Excellent Expert System – This is possible with data-rich systems, developed by experts, that have gone through several iterations of improvement and corrective feedback. I have seen systems that enable throughput of 9,000 to as much as 12,000 words per day. Some exceptional systems go even higher!
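To make the rate/productivity trade-off concrete, here is a minimal arithmetic sketch. The full human rate, the baseline human throughput, and the per-tier figures are illustrative assumptions, not numbers from any actual engagement:

```python
# Hypothetical numbers for illustration only: the full rate, the human
# baseline throughput, and the per-tier productivities are assumptions.
FULL_RATE = 0.12   # assumed full human translation rate, $/word
HUMAN_WPD = 2500   # assumed pure human TEP throughput, words/day
CSA_RATE = 0.61 * FULL_RATE  # the fixed "61% of full rate" figure

tiers = {
    "Good Instant MT/Moses System": 3000,
    "Average Expert System": 6000,     # midpoint of 5,000-7,000 words/day
    "Excellent Expert System": 10500,  # midpoint of 9,000-12,000 words/day
}

human_daily = FULL_RATE * HUMAN_WPD
print(f"Pure human baseline income: ${human_daily:.2f}/day\n")

for name, wpd in tiers.items():
    # Break-even post-editing rate: the rate at which the editor's daily
    # income matches the human baseline, given the higher MT throughput.
    breakeven_rate = FULL_RATE * HUMAN_WPD / wpd
    daily_at_fixed = CSA_RATE * wpd
    print(f"{name}: {wpd} words/day")
    print(f"  break-even PE rate:           ${breakeven_rate:.3f}/word")
    print(f"  daily income at the 61% rate: ${daily_at_fixed:.2f}")
```

Run with these assumptions, the sketch shows why a single fixed rate is unfair: at the 61% rate an editor on the weakest system earns well below the human baseline, while an editor on an excellent system earns far above it. Trusted throughput numbers, not a blanket discount, are what make lower rates defensible.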
I'll have to take your word on the points in Luigi's article, Kirti; despite several readings, neither I nor another colleague was able to make much sense of many parts of it. Perhaps he and I make the same or similar points, but with that approximation of English, God only knows.
Interesting stats you quote on MT / post-edit productivity. No better, nor significantly worse, than what I have observed routinely for a subject matter expert or translating interpreter working on texts with limited formatting and a well-tuned dictation/transcription system. And that's with some very complex subject matter.
Older translators will probably recall the output of those using dictation systems in the 1980s and before as being some 10,000 to 15,000 words per day of reasonable quality text. I have heard this figure from a number of sources who witnessed that routinely then. The bottleneck then was transcription and subsequent proofreading/editing. This has improved considerably with the evolution of voice recognition systems.
So it would seem that the real returns can be equal or greater if the investment is made in methods that allow good translators to relax, focus on the subject matter at hand, and do efficient quality checks of the work afterward. In my tests, for example, I have found that alignment of dictated texts with sources for final QA is not a great challenge. Tagged formats do, of course, add their own special challenges, but this would be no less the case with other modes of working, and dictating translation in a modern working environment offers translators many ways of dealing with tags efficiently.
As you rightly point out, MT is a very complex challenge, one which is typically tackled with more naive enthusiasm than knowledge and too often results only in wasted effort, burned cash and frustration. Far better to focus on proven methods with competent professionals best able to produce viable results.
Great post, Kirti. You effectively take on a number of common perceptions/misperceptions. I particularly like your comments on the "one price fits all" approach. In my experience, this is the single largest barrier to translator acceptance ... they get all the risk with very little upside. Your suggestion that all sides need to be flexible on price is a good one, too. I think that is why MT adoption is best seen as part of a business partnership of some kind where all parties (technology vendor, end buyer, LSP, freelance linguists) share a commitment to making the project a success and share financially in that success to whatever degree true savings are generated. In fact, I think a general discussion about how to make this happen would be useful.
Also, one comment on the "Good Instant MT/Moses" solutions. They do not really have to be better than Google/Bing to generate value ... many MT users are concerned about data privacy such that a private system that is only "almost as good as Google" would still be worth some investment. And as you point out in your three-tiered model, such an "instant" solution may still be the basis for the creation of an "average" (or maybe even above-average) expert system with additional investments of time and expertise.
@Kevin
I think Luigi's post is interesting because he points out that some CSA research and sample-based conclusions might not actually be representative. He also makes several "common-sense" observations that I think are useful and worth attention because they raise questions about fundamental assumptions in the industry, i.e. tacit assumptions that may not actually be fact. I find the questions he raises very interesting. He also has a view of efficiency and automation that I think is larger and more encompassing than most. Luigi does prefer to write in Italian, and perhaps some may find the flow challenging.
Another way to look at the different MT options presented here is to consider doing the same project with 1) a TM that on average gives you an 85%-90% fuzzy match, 2) an average 70%-80% match, or 3) a 60%-70% match for EVERY new segment. So this could even be useful for highly formatted content, and productivity will be linked to the average match presented to the translator.
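To make this analogy concrete, here is a minimal sketch mapping an average fuzzy-match band to the rough throughput ranges of the three tiers described in the post. The band boundaries and figures are illustrative assumptions, not measurements:

```python
def expected_throughput(avg_match: float) -> str:
    """Map an average fuzzy-match percentage to a rough post-editing
    throughput estimate, mirroring the three MT tiers discussed above.
    Band boundaries and figures are illustrative assumptions."""
    if avg_match >= 85:    # comparable to an excellent expert system
        return "9,000-12,000 words/day"
    if avg_match >= 70:    # comparable to an average expert system
        return "5,000-7,000 words/day"
    if avg_match >= 60:    # comparable to a good instant MT/Moses system
        return "~3,000 words/day"
    return "likely slower to post-edit than to translate from scratch"

print(expected_throughput(88))  # -> 9,000-12,000 words/day
```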
Translators are often forced to deal with data/file transformations (as you have often pointed out) and other "non-linguistic" work. MT often creates a new workflow in which much less non-linguistic work is involved. The case you cite is perhaps an instance of this kind of focus, where linguistically oriented work results in relatively high productivity.
Thanks for the comments.
@Bob
The problem with MT systems that do not produce better results than Google/Bing is that some translators will prefer to use the public engines, since that may actually make their work faster and easier. If they are being asked to work with MT, most editors will opt for the solution that provides the best output, and thus the best throughput, for them. There is no way that enterprises/LSPs can police or prevent a post-editor from running source material through Google if that is what they choose to do, and a contract is not likely to be much of an impediment in today's global online workplace.
However, if a translator/editor realizes that the MT output they are editing is significantly, or even slightly, superior to Google's, they are not likely to bother with the free privacy-compromising engines, and privacy will be easier to maintain.
I have spoken to a few translators who have described exactly this scenario to me and apparently it is common practice and common sense in much of the world.
Thank you for the nice post. I agree that the translation quality of systems such as Moses is not perfect, but we can work on it by providing specialized corpora and/or combining statistical MT with rule-based MT; in other words, we can use hybrid MT to enhance the quality of machine translation systems.
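One very simple form of the hybrid idea mentioned above is a rule-based terminology pass applied before the statistical engine runs. The sketch below is a hypothetical illustration only: the glossary is invented, and `smt_translate` is a stub standing in for whatever trained engine (e.g. Moses) would actually be called:

```python
# Hypothetical sketch of a naive hybrid pipeline: enforce domain
# terminology with rules, then hand the sentence to an SMT engine.

TERM_GLOSSARY = {  # invented example glossary, English -> Spanish
    "torque wrench": "llave dinamométrica",
    "camshaft": "árbol de levas",
}

def apply_term_rules(source: str) -> str:
    """Pre-translate glossary terms so the statistical engine receives
    them already fixed (a very naive form of terminology injection)."""
    for src_term, tgt_term in TERM_GLOSSARY.items():
        source = source.replace(src_term, tgt_term)
    return source

def smt_translate(text: str) -> str:
    # Placeholder: a real system would call the trained SMT engine here;
    # this stub just returns its input unchanged.
    return text

def hybrid_translate(source: str) -> str:
    return smt_translate(apply_term_rules(source))

print(hybrid_translate("Tighten the camshaft with a torque wrench."))
```

Real hybrid systems are far more sophisticated (e.g. rule-based analysis feeding statistical reordering), but the principle of constraining a statistical engine with deterministic rules is the same.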
You should understand that the very moment you use MT, you start decreasing the mind's capacity to translate. Before jumping to MT, you should understand that the quality of a translation depends on the mood of the translator, because this is not mechanical but cerebral work in which the mind is fully occupied. I don't see a great future for translators, because most people don't understand the process of translation. It is equally as important as writing skill, if not more so. When did you last come across a good piece of writing?