
Monday, January 11, 2021

Most Popular Blog Posts of 2020

This is a summary ranking of the most popular blog posts of 2020, based on readership traffic and presence. These rankings are based on the statistics given to me by the hosting platform, which sometimes fluctuate much more than one would expect.

I am pleased to see that there is an increasing awareness of the importance of data analysis in multiple phases of the MT development and deployment process. Data analysis matters for training data selection, improved linguistic pattern handling, effective testing, and quality estimation, amongst other things. The tools to do this well still lack robustness or need major improvements to make them more usable. As the world shifts from handling translation projects for localization (relatively low volume) to other digital presence and large-scale content assimilation and dissemination use cases, I think there will be a need for better tools. I am surprised by the continuing stream of new TMS products. Most of these new products have a huge amount of overlap with existing products, and none of the new tools really change the playing field in a meaningful way.

The single most popular post of the year was this interview with Adam Bittlingmayer on the risk prediction capabilities of ModelFront:

1. Understanding Machine Translation Quality & Risk Prediction

Getting a better understanding of data and identifying the most critical data issues is key to success with MT. Better data analysis would mean that human efforts can be focused on a much smaller set of data and thus yield better overall quality in less time. The risk prediction and quality estimation data provided by ModelFront makes MT use much more efficient. It allows rapid error detection and can help isolate translation projects by high-touch and low-touch elements. I suspect much of the readership of this post came from outside the translation industry as I continue to see little focus on this in the localization world. This post is worth a closer look for those LSPs who are investigating a more active use of MT. This link will lead to a case study to show how this can help in localization projects. 

Despite the hype around the magic of NMT and deep learning in general, we should understand that deep learning NMT toolkits are going to be viewed as commodities. Data analysis is where value creation will happen. 

The data is your teacher and is where the real value contribution possibilities are. I predict that this will continue to be clearer in the coming year.




If we consider all three posts related to "Premium Translation" together, this would easily be the top blog theme for the year. These posts attracted the most active readership, and also the most articulate and comprehensive comments. MT technologists tend to lump all translators together when making comments about "human" translators, but we should understand that there is a broad spectrum of capabilities when we talk about "human" translators. Those who collaborate, consult, and advise their customers around critical content communication are unlikely to be replaced by ever-improving MT. Real domain expertise, insight, and the ability to focus on the larger global communication mission of the content are things I do not see MT approaching successfully in my lifetime.

I am extremely skeptical about the notion of "singularity" as some technologists have described it. It is a modern myth that will not happen as described, IMO. Most AI today, and machine learning in particular, is no more than sophisticated pattern matching in big data, and while it can be quite compelling at times, it is NOT intelligence. Skillful use and integration of multiple deep learning tasks can create the illusion of intelligence, but I feel that we still have much to learn about human capabilities before we can make "human equivalent" technology.

Here is a smart discussion on AI that provides informed context and a reality check on the never-ending AI hype that we continue to hear. Translators, in particular, will enjoy this discussion as it reveals how central language understanding is to the evolution of AI possibilities.
"There is not the slightest reason to believe in a coming singularity. Sheer processing power [and big data] is not a pixie dust that magically solves all your problems."

Steven Pinker




The pandemic has forced many more B2C and B2B interactions to be digital. In an increasingly global world where multilingual communication, collaboration, and information sharing are commonplace, MT becomes an enabler of global corporate presence. I estimate that the huge bulk of the need for translation is beyond the scope of the primary focus of localization efforts. It is really much more about digital interactions across all kinds of content and multiple platforms that require instant multilingual capabilities.

However, data security and privacy really matter in these many interactions, and MT technology that does not make data security a primary focus in the deployment should be treated with suspicion and care.

Microsoft offers quite possibly the most robust and secure cloud MT platform for companies that wish to integrate instant translation into all kinds of enterprise content flows. The voice of a Microsoft customer states the needs quite simply and eloquently.

“Ultimately, we expect the Azure environment to provide the same data security as our internal translation portal has offered thus far,” 

Tibor Farkas, Head of IT Cloud at Volkswagen 





This was a guest post by Raymond Doctor that illustrates the significant added value that linguists can bring to the MT development process. Linguistically informed data can make some MT systems considerably better than just adding more data would. Many believe that the need is simply for more data, but this post clarifies that a smaller amount of the right kind of data can have a much more favorable impact than sheer random data volume.

The success of these MT experiments is yet more proof that the best MT systems come from those who have a deep understanding of both the underlying linguistics and the MT system development methodology.

Here is a great primer on the need for data cleaning in general. This post takes the next step and provides specific examples of how this can be extended to MT.

"True inaccuracy and errors in data are at least relatively straightforward to address because they are generally all logical in nature. Bias, on the other hand, involves changing how humans look at data, and we all know how hard it is to change human behavior."

Michiko Wolcott




I have been surprised at the continuing popularity of this post, which was actually written and published in March 2012, almost 9 years ago. Interestingly, Sharon O'Brien raised this at the recent AMTA2020 conference: she tried to get a discussion going on why the issues being discussed around post-editing have not changed in ten years.

The popularity of this post points to how badly PEMT compensation is being handled even in 2020. Or perhaps it suggests that people are doing research to try and do it better. 

Jay Marciano gave a presentation at ATA recently, where he argued that since there is no discernible and reliable differentiator between fuzzy translation memory matches and machine translation suggestions (assuming that you are using a domain-trained machine translation engine), we should stop differentiating them in their pricing. Instead, he suggested that they should all be paid by edit distance. ("Edit distance" is the now widely used approach to measuring the number of changes the editor or translator had to make to an MT suggestion before delivering it.)

Doing this, according to Jay, protects the translator from poor-quality machine translation (because the edit distance, or a rewrite from scratch, will in that case be large enough for 100% payment) as well as from bad translation memories (for the same reason). He also suggests paying for MT suggestions with no edit distance, i.e., suggestions where no edits were deemed necessary, at 20% of the word price. That is twice the rate of a 100% TM match (10%), to compensate for the effort needed to evaluate their accuracy. Finally, he suggests a 110% rate for an edit distance of 91-100%, taking into account the larger effort needed to "correct" something that was rather useless in the first place.
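To make the arithmetic concrete, here is a minimal Python sketch of such a scheme. Only three figures come from the proposal as summarized above: 20% for an unedited MT suggestion, 10% for a 100% TM match, and 110% for an edit distance of 91-100%. Everything else is my assumption for illustration, not Jay's method: the word-level Levenshtein distance, the normalization by the longer segment, the rule that the rate in between simply tracks the edit-distance percentage, and the function names themselves.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two token lists (each edit costs 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def edit_distance_pct(suggestion, final):
    """Edit distance as a percentage of the longer segment (assumed normalization)."""
    s, f = suggestion.split(), final.split()
    if not s and not f:
        return 0.0
    return 100.0 * word_edit_distance(s, f) / max(len(s), len(f))


def pay_rate(pct, source="mt"):
    """Fraction of the full word price to pay for one segment.

    From the post: 0.20 for unedited MT, 0.10 for a 100% TM match,
    1.10 for an edit distance of 91-100%.
    ASSUMPTION: between those anchors the rate follows the edit-distance
    percentage, never dropping below the unedited rate.
    """
    floor = 0.20 if source == "mt" else 0.10
    if pct == 0:
        return floor
    if pct >= 91:
        return 1.10
    return max(floor, pct / 100.0)


# Hypothetical segment: one word changed out of four.
pct = edit_distance_pct("a b c d", "a x c d")
print(pct, pay_rate(pct, source="mt"))
```

Even this toy version shows the practical issue: the rate is only known after the editing is done, which is why the scheme is hard to quote in advance.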

This is an attempt to be fair but, practically, it is a hard-to-predict compensation scheme, and most customers like to know costs BEFORE they buy. There are many others who think we should still be looking at an hourly-based compensation scheme. We do not encounter discussions on how a mechanic, electrician, accountant, or lawyer takes too long to do a job as a reason not to hire them, and perhaps translation work could evolve to this kind of a model. It is not clear how this could work when very large volumes (millions of words/day) are involved, as the edit-distance approach is really only viable in post-editing MT use scenarios.

Nonetheless, much of the current thinking on the proper PEMT compensation model is to use Edit Distance-based methodologies. While this makes sense for localization MT use cases, this approach is almost useless for the other higher volume MT use cases. The quality and error assessment schemes proposed in localization are much too slow and onerous to use in scenarios where millions or hundreds of millions of words are being translated every day.

It is my sense that 95% of MT use is going to be outside of localization use cases (PEMT), and I think the more forward-looking LSPs will learn to find approaches that work better when the typical translation job handles millions of words a week. Thus, I am much more bullish on quality estimation and risk prediction approaches, which are going to be a better way to do rapid error detection and rapid error correction for these higher business value, higher volume MT use cases.

The issue of equitable compensation for post-editors is an important one, and it is important to understand the issues related to post-editing that many translators find to be a source of great pain and inequity. MT can often fail or backfire if the human factors underlying the work are not properly considered and addressed.

From my vantage point, it is clear that those who understand these various issues and take steps to address them are most likely to find the greatest success with MT deployments. These practitioners will perhaps pave the way for others in the industry and “show you how to do it right” as Frank Zappa says. Many of the problems with PEMT are related to ignorance about critical elements, “lazy” strategies and lack of clarity on what really matters, or just simply using MT where it does not make sense. These factors result in the many examples of poor PEMT implementations that antagonize translators. 

 




This is a guest post by Luigi Muzii with his observations on several hot issues in the localization business. He comments often on the misplaced emphasis and attention on the wrong problem. For some reason, misplaced emphasis on the wrong issues has been a long-term problem in the business translation industry.

Almost more interesting than disintermediation (removing the middleman) is intermediation, which adds the middleman back into the mix. Intermediation occurs when digital platforms inject themselves between the customers and a company; in this case, between the global enterprise and the translators who do the translation work. These platforms are so large that businesses can’t afford not to reach customers through them. Intermediation creates a dependency and disintermediation removes the dependency. There is no such intermediary for translation, though some might argue that the big public MT portals have already done this and the localization industry only services the niche needs.

He also focuses on the emergence of generic MT portals with a low value proposition and cheap human review capabilities attached, as examples of likely-to-fail attempts at disintermediation. It is worth a read. An excerpt:
"It is my observation, that these allegedly “new offerings” are usually just a response to the same offering from competitors. They should not be equated to disintermediation and they often backfire, both in terms of business impact and brand image deterioration. They all seem to look like dubious, unsound initiatives instigated by Dilbert’s pointy-haired boss. And the Peter principle rules again here and should be considered together with Cipolla’s laws of stupidity, which state that a stupid person is more dangerous than a pillager and often does more damage to the general welfare of others."

 

By Vincedevries - Own work, CC BY-SA 4.0
 

The danger of the impact of the stupid person is proven by what we have seen from the damage caused by the orange buffoon to the US. This man manages to comfortably straddle both the stupid and bandit quadrants with equal ease even though he started as a bandit. Fortunately for the US, the stupid element was much stronger than the bandit element in this particular case. Unfortunately for the US, stupid bandits can inflict long-term damage on the prospects of a nation and it may take a decade or so to recover from the damage done. 


“The reason why it is so difficult for existing firms to capitalize on disruptive innovations is that their processes and their business model that make them good at the existing business actually make them bad at competing for the disruption.”

“'Disruption' is, at its core, a really powerful idea. Everyone hijacks the idea to do whatever they want now. It's the same way people hijacked the word 'paradigm' to justify lame things they're trying to sell to mankind.”
Clay Christensen


“Life’s too short to build something nobody wants.”

Ash Maurya



Luigi also wrote a critique of my post on the Premium Market and challenged many of the assumptions and conclusions I had drawn. I thought it would only be fair to include it in this list so that readers could get both sides of the subject on the premium market discussion.




I also noted that the following two posts got an unusual amount of attention in 2020. The BLEU score post has been very popular in two other forums where it has been published. There are now many other quality measurements for adequacy and fluency being used but I still see a large number of new research findings reporting with BLEU, mostly because it is widely understood in all its imperfection.

The latest WMT results use Direct Assessment (DA) extensively in their results summaries. 

Direct assessment (DA) (Graham et al., 2013, 2014, 2016) is a relatively new human evaluation approach that overcomes previous challenges with respect to lack of reliability of human judges. DA collects assessments of translations separately in the form of both fluency and adequacy on a 0–100 rating scale, and, by combination of repeat judgments for translations, produces scores that have been shown to be highly reliable in self-replication experiments. The main component of DA used to provide a primary ranking of systems is adequacy, where the MT output is assessed via a monolingual similarity of meaning assessment. In Direct Assessment humans assess the quality of a given MT output translation by comparison with a reference translation (as opposed to the source and reference). DA is the new standard used in WMT News Translation Task evaluation, requiring only monolingual evaluators. For system-level evaluation, they use the Pearson correlation r of automatic metrics with DA scores. I have not seen enough comparison data on this to have an opinion on its efficacy yet.
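For readers who have not worked with it, the system-level Pearson correlation mentioned above is simple to compute. The sketch below is plain Python with no external libraries. The function name and the scores are hypothetical placeholders I made up to show the mechanics; they are not WMT data, and the real evaluation involves additional steps (such as score standardization) that are not shown here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# HYPOTHETICAL numbers: one entry per MT system, in the same order.
da_scores = [71.2, 68.5, 64.0, 59.3]      # average human DA adequacy, 0-100
metric_scores = [34.1, 33.0, 29.8, 27.5]  # an automatic metric, e.g. BLEU

print(round(pearson_r(metric_scores, da_scores), 3))
```

A value of r close to 1 would mean the automatic metric ranks and spaces the systems much as the human DA scores do; a value near 0 would mean it tells you little about human judgments.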


Most Popular Blog Posts of 2019 had an unusually high traffic flow and would rank in the Top 5.




I wish you all a Happy, Prosperous and Healthy New Year
