Thursday, April 23, 2015

How Translators Can Assess Post-Editing MT Opportunities

With the continued growth in the use of MT, it has become increasingly important for translators to understand better when it is worth getting involved, and when it is wise to stay away from post-editing opportunities that come their way. 

This is still a very fuzzy issue for most translators, so I think it is useful to share some information highlighting the key variables they can use to determine the most rational course of action given the facts at hand. For some, post-editing will never be palatable work, but those who look more closely will see that PEMT is now just another variant of professional translation work, one that can be economically advantageous when one is working with the right partners and the right technology.


We have seen that in the early days of MT use there was much cause for dissatisfaction all around, especially for translators who were asked to post-edit sub-standard MT output at very low rates. Translators do need to be wary, since many LSPs deploy MT technology without really understanding it, with the sole purpose of reducing costs, with no understanding of how to produce systems that actually enable this lower-cost scenario, and with no interest in engaging translators in the process. Thus it is worth translators learning some basic discrimination skills and establishing some general guidelines to assess the relative standing of any PEMT opportunity they are presented with.


The following checklist is a useful start (IMO) that every translator should consider when deciding what kinds of PEMT opportunities are worth working on.
  • Understand the very specific MT output that you will be working with as every MT engine is unique and assessments need to be made in reference to the actual output you will be working with.
  • Determine if the LSP understands what they are doing with the MT technology and can respond to feedback on error patterns. There are many “upload and pray” efforts nowadays that create very low quality systems that are very hard to control and challenging for translators to work with.
  • Understand the MT technology that is being used as not all MT is equal. There are many variants and you should know what the key differences are. Systems that allow feedback and have more controls to correct errors after the MT engine has been built and accept ongoing corrective feedback will generally be better to work with.
  • Have a basic understanding of the MT methodology which means at least an overview of the rules-based and statistical approaches. This can give you a sense for what kind of feedback you can provide and also help you understand error patterns.
  • Understand that MT engine development is an evolutionary process rather than the instant solution that Google has led some of us to believe it is. Professional MT deployment is a molding process that evolves in quality through expert iteration, and is typically done to tune an engine for a specific business purpose in support of an ongoing high-volume translation production need. MT makes much less sense for random one-time use.
  • Understand the basic quality assessment metrics used with MT. BLEU scores are often bandied about with MT systems and often interpreted incorrectly. If you understand them you will always have a better sense of the reality of a situation, as incompetent practitioners use and interpret these scores incorrectly all the time. BLEU scores are only as good as the Test Sets used, so try to understand what makes a good Test Set, as described in the link.
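To make the BLEU discussion concrete, here is a minimal sketch of how a sentence-level BLEU-style score is computed: clipped n-gram precision combined with a brevity penalty. This is a toy illustration only (real evaluations use corpus-level implementations such as sacreBLEU over a well-constructed Test Set), and the example sentences are invented:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU-style score: clipped n-gram precision
    combined via a geometric mean, times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if overlap == 0:
            return 0.0  # any zero precision collapses the geometric mean
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages very short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

print(round(bleu("the cat sat on the mat", "the cat sat on the mat"), 2))  # identical -> 1.0
print(round(bleu("a cat sat on a mat", "the cat sat on the mat", max_n=2), 2))  # bigram-only score -> 0.52
```

Notice that even a close paraphrase scores well below 1.0, which is one reason a score only means something relative to the specific Test Set and engine it was measured on.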
It is wise to use technology when, and only when, there is a clear benefit, and this is especially true with MT. An LSP should have a clear sense that the productivity of the translation project will be improved by using the technology; otherwise it is detrimental in many ways. This means that there needs to be a clear idea of typical translation project throughput before and after the use of MT, and a trusted way to measure how MT might impact this productivity.

  • Thus MT only makes sense when it boosts productivity or when it makes it possible to provide some kind of translation for material that would just not get translated otherwise.
  • Translators should also understand that lower rates are not necessarily bad if their throughput is appropriately higher.
  • Finally, MT error patterns tend to be consistent so it makes sense to approach corrections at a chunk level rather than an individual segment level. 


Much of the dissatisfaction with PEMT work is related to compensation. My post on PEMT compensation remains the most read post on this blog even though it is now 3 years old. But I think if you understand the specific MT output you are dealing with and its impact on your throughput, you can make an informed decision.


It is wise to remember that a lower rate does not necessarily mean less overall compensation, as the following totally hypothetical chart explains. (The productivity benefits are more likely to be shared less generously.) The best LSPs will have an open and transparent process for setting this rate, and translators will be involved to ensure that the rate is fair and reasonable and based on actual MT output quality rather than some arbitrarily lower rate “since we are using MT”. Also expect Romance language rates to be lower than tough-for-MT languages like Japanese and Korean if editing effort is used as a criterion for setting the rate.
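The arithmetic behind “a lower rate is not necessarily less compensation” is simple rate-times-throughput multiplication. Here is a small sketch in the same spirit as the hypothetical chart; every rate and throughput figure below is invented purely for illustration:

```python
# Hypothetical illustration: a lower per-word rate can still mean higher
# daily earnings if MT raises throughput enough. All numbers are invented.

scenarios = [
    # (description,              rate $/word, throughput words/day)
    ("Translation from scratch", 0.12,        2500),
    ("Post-editing good MT",     0.08,        5000),
    ("Post-editing poor MT",     0.08,        2800),
]

for name, rate, words_per_day in scenarios:
    daily = rate * words_per_day  # earnings = rate x throughput
    print(f"{name:<26} {rate:.2f} $/word x {words_per_day:>5} words/day = ${daily:,.2f}/day")
```

With these made-up numbers, good MT at a lower per-word rate out-earns from-scratch translation, while poor MT at the same lower rate earns less, which is exactly why the quality of the specific engine output matters so much.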


Much of what I have covered here was presented in a Proz presentation that is still available as video (slides with voice) for those who want to see and hear more details of the summary presented in this post.


As a complete aside this is for those who think that Genetically Modified foods are harmless. Here is a quote from a biotech company leader that you might want to consider the next time you eat corn from a US supermarket:
“We have a greenhouse full of corn plants that produce anti-sperm antibodies.” ~ Mitch Hein, president of Epicyte, a California-based biotechnology company.

And to end on a cheery note, I was very impressed by the musicality of this song and thought others might want to hear it too.

Monday, December 1, 2014

Machine Translation Humor Update

It has been some time since I first wrote a blog post about MT humor, primarily because I have not been able to find anything worth mentioning until now; apart from some really lame examples of how MT mistranslates (sic), I have not seen much to laugh heartily at. It seems a group of people on the web have discovered the humorous possibilities of MT in translating song lyrics, something that might be difficult even for good human translators. (It really does seem strange to say “human translator”.)

I should point out that in all these recent cases one does have to work at degrading the translation quality by running the same text through a whole sequence of preferably not closely related languages.

It has often surprised me that there are some in the MT industry who use “back translation” as a way to check MT quality, as from my vantage point it is an exercise that can only result in proving the obvious. MT back translation by definition should result in deterioration, since to a very great extent MT will almost always be something less than a perfect translation. This point seems to elude many who advocate this method of evaluation, so let me clarify with some mathematics, as math is one of the few conceptual frameworks available to man where proof is absolute, or pretty damned certain at least.

If one has a perfect MT system then the Source and Target segments should be very close if not exactly the same. So mathematically we could state this as:

Source (1) x Target (1) = 1

since in this case we know our MT system is perfect ;-)

But in real life, where humans play on the internet and DIY MT systems are used to determine what MT can produce, the results are unlikely to equal 1, the perfect score shown in the example above.

So let’s say you and I do a somewhat serious evaluation of the output of various MT systems (each language direction should be considered a separate system), running 5,000 sentences through various MT conversions and scoring each MT translation (conversion) as a percentage “correct” in terms of linguistic accuracy and precision, and find that the following table holds for our samples.

Language Combination      Percentage Correct
English to Spanish        0.80 (80%)
Spanish to English        0.85 (85%)
English to German         0.70 (70%)
German to English         0.75 (75%)

So if we take 1,000 new sentences and translate them with MT, we should expect the percentages shown above to be “correct” (whatever that means). But if we now chain the results, making the output of one system the input of another, we will find that the results get continually smaller, e.g.

EN > ES > EN = .8 x .85 = 0.68 or 68% correct

EN > DE > EN = .7 x .75 =  0.525 or 52.5% correct

So with MT we should expect that every back-translation test will produce a lower, degraded result, as we are multiplying the effects of two different systems. Since computers don’t really speak the language, one cannot assume they have equal knowledge going each way, and if you feed a bad source from system A into system B you should expect a bad target, as computers, like some people, are very literal.

So now if we take our example and run it through multiple iterations we should see a very definite degradation of the output as we can see below.

EN > ES > EN(from MT) > DE > EN = .8 x .85 x .7 x .75 = 0.357 or 35.7%
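The chained arithmetic above can be sketched as a simple multiplication of per-direction accuracies, using the hypothetical scores from the table:

```python
from functools import reduce

# Hypothetical per-direction accuracy scores from the table above.
accuracy = {
    ("EN", "ES"): 0.80, ("ES", "EN"): 0.85,
    ("EN", "DE"): 0.70, ("DE", "EN"): 0.75,
}

def chain_accuracy(path):
    """Multiply per-hop accuracies along a pivot chain, e.g. EN > ES > EN."""
    hops = zip(path, path[1:])  # consecutive language pairs
    return reduce(lambda acc, hop: acc * accuracy[hop], hops, 1.0)

# Each extra hop multiplies in another score below 1, so quality only degrades:
# EN > ES > EN = 0.68, EN > DE > EN = 0.525, the full chain = 0.357
for path in (["EN", "ES", "EN"], ["EN", "DE", "EN"], ["EN", "ES", "EN", "DE", "EN"]):
    print(" > ".join(path), "=", round(chain_accuracy(path), 3))
```

Since every factor is below 1, the product can only shrink as the chain grows, which is the whole reason multi-hop “Google Translate lyrics” get funnier with each added language.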

So if you are trying to make MT look silly you have to run it through multiple iterations to get silly results. It would help further if you chose language combinations like EN to Japanese to Hindi to Arabic as this would cause more rapid degradation to the original English source. Try it and share your results in the comments. 

So here we have a very nicely done example. You should realize it takes great skill for the lead vocalist to mouth the MT words as if they were real lyrics while still maintaining melodic and rhythmic integrity, so be generous in your appreciation of their efforts.

This video shows very effectively how quickly using multiple languages can degrade the original source, as you can see when they go to 64 languages. Somehow words get lost and turn really strange.

And here is one from a vlogger who really enjoys the effect of multiple rounds of MT on a song’s lyrics. She is a good singer and is able to maintain the basic melody without breaking into a smile, so I found it quite enjoyable, and I would not be surprised if some believed that these were indeed the lyrics of the song. She has a whole collection of recordings and what I consider high production values for this kind of stuff.

And she produces wonderful results on this Disney classic, “When you paint the colors of your air can”, which used to be a favorite of my daughter’s. I actually think the song from The Little Mermaid is much funnier, and it was done by running it through just four iterations in Google Translate, but since I could not embed it here directly I have given the link.

Here is another person who has decided that 14 iterations is enough to get generally funny results with this or any pop song. I’m not sure how funny this really is, since I don’t know the original song.

So it appears that we are going to see a whole class of songs re-interpreted by Google Translate, and it is possible to get millions of views, as MKR has, and probably even make a living doing this. So here you see one more job created by MT.

So anyway if somebody suggests doing a back test with MT you should know the cards are clearly stacked against the MT monster and the results are pretty close to meaningless. A human assessment of a targeted sample set of sentences is a much better way to understand your MT engine.

Hope you all had a good Thanksgiving vacation and are not feeling compelled to shop too fervently now. 

In this time of strife and distrust in Ferguson it is good to see spontaneous goodwill and instant musical camaraderie between these amateur musicians. 


My previous posts on MT humor for those who care are:
Machine and Human Translation Based Humor

Translation Humor & Mocking Machine Translation

Thursday, September 11, 2014

The Translation Market – Is it Really Understood?

I saw some interesting comments on a blog post by Kevin Lossner that I thought would be good to share with the community that reads this blog, as they raise some cogent points. The comments describe a larger, more complex translation market than many of us might believe exists based on the market research available. I do not claim to have real insight into this larger translation market, but I am definitely aware that the largest translation initiatives in the world are generally overlooked by traditional market research, e.g. the many branches of the US government (DoD, NSA, CIA, FBI, DIA, State and even Commerce), the EU, and I expect many of the clandestine “intelligence” operations around the world, especially among the G20 governments.

I would also bet that the really big, almost nation-like, Fortune 100 corporates also have captive and hidden translation operations, buried and invisible within PR, Marketing and Investor Relations somewhere, to translate the stuff that really matters or is really secret. (I would not be surprised if the people in these departments did not even know that a localization team exists elsewhere in the corporation.) If it really matters, why would you ask Lionbridge or SDL (or any other large LSP) to translate it? It is definitely something to ponder. Surely it would be more likely to go to internal subject matter experts, or to trusted and elite boutique services that actually understand the subject matter of the material and can protect the information with the same zeal and protective assurances as those who create it. Imagine you are an oil company called ABCP and want to make sure that you look less culpable for a major accident caused by management insistence on moving ahead with a risky drilling project. I think the odds are high that the translators chosen to translate critical memos and communications and “put the right spin on it” before it is shown to regulators are going to be different from the ones that work for Lionbridge, since it might save a few billion in damages that would otherwise have to be paid.

I also generally expect that specialists, i.e. translators with demonstrated subject domain expertise, will have a much brighter future than those who will translate anything within arm’s reach. Specialization means building subject matter expertise, which I think will matter more and more, and I for one would stay away from LSPs who do not specialize or have long-term demonstrated competence in a few select domains.

I find this discussion interesting also because I think that repetitive, low-value, short shelf-life, bulk (high volume) content is eventually going the way of PEMT or even raw MT, but there is a huge world of high value content that is unlikely ever to head that way until we reach the Star Trek Universal Translator levels of quality, which are not expected to be available till the 24th century. I actually think that IPO and many SEC filing documents (10K, Registration documents) and user manuals of any kind including nuclear machinery and medical equipment are fair game for competent and very specialized PEMT initiatives, but I would not use MT for anything that requires linguistic finesse or reading between the lines e.g. wedding vows, great literature, letters to the board/stockholders or poetry. Even in those areas where you have high volume and lots of repetitive and highly similar content, MT can work well only when real expertise is applied, and there is a real and active collaboration with translators and linguists who all want to produce an engine that will reduce future efforts.
These are some excerpted and unedited (by me) comments made by Kevin Hendzel on the blog post referenced above, written in a more visceral style than the more careful and detailed elaboration on his own blog. I don’t agree with everything Kevin says about MT, but I think his views are generally based on deeper observations than “MT is crap,” and I can appreciate that we have different views on this issue. (Excerpts printed here with his and Kevin Lossner’s permission.)
From my own viewpoint, it does seem that the localization industry/bulk translation market has long suffered from a “we’re the only game in town” problem. There’s an amusing story about SeaWorld (an aquatic theme park in the US) that goes a long way toward illustrating this exact echo-chamber problem that the localization industry and pure bulk-market providers seem to be perpetually trapped in. Occasionally you’ll see protesters outside SeaWorld holding up signs that declare: “It’s not SeaWorld, it’s PoolWorld.” The corporate entity SeaWorld telling tourists that these tiny, familiar pools constitute “the sea” does not make them the sea. The sea is immensely, incalculably larger and more complex.
The same is true of the translation market. Referring to the tiny pool you are familiar with (low-end bulk localization and translation) as “the sea” (the whole rest of the market) tends to distort one’s sense of the enormity of the sea, the complexity of sea life, not to mention how damaging it can be to trap sea life in unfamiliar and hostile surroundings. There may also be value in dispensing with a couple of misconceptions.
Myth #1: There are two market segments (premium and bulk) that are easily delineated and the premium market is dramatically smaller than the bulk market.
Reality: There’s a very long continuum that encompasses all market segments, with raw bulk free MT at one end and $25,000 tag line translations of 3 words at the other.
It’s far more accurate to characterize the continuum in terms of gradual and consistent gradations of shade rather than in terms of clear differentiating boundary lines. The “premium vs. bulk” dichotomy is a form of shorthand only. That also applies to price and quality, since the correlation between the two is not always linear. The premium sector includes commercial segments that are fiercely guarded and (often) shrouded in secrecy to prevent additional competition. Many of these are boutique translator-owned companies that deliberately fly under the radar of “research” companies like Nonsense Advisory (itself shamelessly in bed with the large companies it purports to “cover,” and stubbornly resistant to acknowledging its own 50-kilometer-wide blind spots) to avoid alerting other companies to their profitable businesses. There is an astonishing amount of money in these premium sectors. Pure translation alone in the high-end expert pharmaceutical, medical device and IP litigation as well as the premium legal, financial and marketing sectors across all languages and in all countries dwarfs the entire global IT localization industry by about two to three orders of magnitude. There are some years where one single IP pharmaceutical litigation case in Japanese-English alone will run into the $10 - $20 million range – about 10 times the “savings” that TAUS preaches are available to localization companies and their end clients that embrace their “translation as a utility” model in localization. That’s one single translation project in one single language pair. And the net profit margins are considerably higher.
Myth 2: Price is the key differentiator between the premium and bulk market.
Reality: While it’s true that the premium market tends to operate at higher prices, the market really operates on a completely different value proposition than does the bulk market. That proposition is that the cost of failure is dramatically higher than the cost of performance.
So in the premium market, the cost of translation errors – liability, regulatory failure, loss of life, damaging publicity or significant loss of prestige – far outweighs the cost of “getting it right.” Paying whatever cost premium for translation that is necessary to PREVENT the cost of failure is viewed as a wise investment.
In the bulk market, those two are reversed. The cost of failure is low, so there is no corresponding push to invest in getting it right. This can be tested by comparison to the dynamics of other industries, too. The cost of failure for a Walmart product is very low – the consumer almost expects the damn thing to break. It’s the same with cheap online localization and “just good enough to understand it” bulk translation. But a fractured fuel pump on a Boeing aircraft in flight has an enormous cost of failure, so several layers of review, ongoing maintenance and testing as well as regulatory enforcement are built around it in an effort to ensure that does not happen, a process which drives up fuel pump manufacturing costs dramatically.  When the failure of an IPO or the collapse of a deal due to a translation-related regulatory failure or when nuclear weapons are improperly dismantled or lost to unknown people – yeah, that’s a very, very high cost of failure. Wallets open up to pay a premium for translation in these cases. Of course, translators who want to play in this market must be Boeing quality, though, not Walmart. (If any serious person considers this view “elitist,” I will contemplate the validity of that charge when that person agrees to fly on Walmart-manufactured jet aircraft that fly without regulatory approval or oversight.) :)
Myth 3: The largest translation company in the world is Lionbridge, crowned once again by Nonsense Advisory.
Reality: It isn’t. It may be the largest localization company that openly shares public financial data in an easy-to-read format and hence is trivially “researched,” but it omits huge operations that just don’t advertise their existence in quite the same way. For example, there are Global Linguist Solutions and L-3 Inc. just in the US alone. Never heard of either, right? GLS won the original US Army contract to support Iraq ops worth about $4.64 billion over five years after L-3 had the original one pre-Iraq. Perhaps more to the point in terms of current size, the U.S. Army recently awarded a huge US Army contract referred to as DLITE valued at $9.7 billion to 5 companies including those two. Those are JUST the U.S. Army contracts. The open, unclassified ones. This does not include all the other U.S. federal open spending on language services for all the other agencies that these same companies along with DynCorp and McNeil and Booz Allen and a dozen others that have never been to an ATA or any other translation conference compete for and win. It also omits all U.S. classified and confidential contracts. It omits all other governments’ outsourced classified and unclassified language spending. It’s like omitting the Indian Ocean and half the Pacific from your "research."
It’s a vast, complex, cloudy and immensely varied translation sea out there.
I know that those who have dealings with the US government around translation technology at least have an inkling that this is true. It is sort of like the discussions on the Deep Web which contains much of the highest value information available in the world that is not indexed or accessible by the search engines that we all use. This is the part that is private, gated and contains the really important high value content that can only be seen by people who are properly authenticated and authorized. I can’t say for certain that the proportions in the graphic below are true for the translation market but based on what I directly know about the data volumes processed in the clandestine communities it certainly would not be impossible.
[Image: Deep Web iceberg]
Anyway, I thought this subject was interesting and worth more exposure. Also, it was easy to do as Kevin Hendzel wrote the bulk of this post. :-)

P.S.  I thought it was worth adding this post-script here since Luigi Muzii has also made extended comments on his blog on this subject and so I add his Twitter comment to the main body of this post.

From @ilbarbaro
My comments to @kvashee latest debated post can be found in, and