I did not write as much as I had hoped to in 2019 but I hopefully can correct this in the coming year. I notice that the top two posts of the past year were written by guest writers, and I invite others who may be so moved, to also come forward and add to the content being produced on this blog.
These rankings are based on the statistics given to me by the hosting platform, and in general, they look reasonable and likely. In these days of fake news and fake images, one does need to be wary. I have produced other reports that have produced drastically different rankings which seemed somewhat suspect to me so I am going with the listing presented in this post.
The most popular post of the 2019 year was from a frequent guest writer on eMpTy Pages: Luigi Muzii, who has also written extensively about post-editing best practices elsewhere.
Data is valuable when it is properly collected, understood, organized and categorized. Having rich metadata and taxonomy is especially valuable with linguistic data. Luigi has already written about metadata previously, and you can find the older articles here and here. I think that we should also understand that often translation memory does not have the quality and attributes that make it useful for training NMT systems. This is especially true when large volumes of disparate TM are aggregated together and this is contrary to what many in the industry believe. It is often more beneficial to create new, more relevant TM, based on real and current business needs that better fit the source that needs to be translated.
A series of posts that focused on BLEU scores and MT output quality assessment were the next most popular. Hopefully, my efforts to steer the serious user/buyer to look at business impact beyond these kinds of scores has succeeded, and informed buyers now understand that it is possible to have significant score differences that may have a minimal business impact, and thus these scores should not be overemphasized when selecting a suitable or optimal MT solution.
2. Understanding MT Quality: BLEU Scores
As there are many MT technology options available today, BLEU and its derivatives are sometimes used to select what MT vendor and system to use. The use of BLEU in this context is much more problematic and prone to drawing erroneous conclusions as often comparisons are being made between apples and oranges. The most common error in interpreting BLEU is the lack of awareness and understanding that there is a positive bias towards one MT system because it has already seen and trained on the test data or has been used to develop the test data set.
These rankings are based on the statistics given to me by the hosting platform, and in general, they look reasonable and likely. In these days of fake news and fake images, one does need to be wary. I have produced other reports that have produced drastically different rankings which seemed somewhat suspect to me so I am going with the listing presented in this post.
The most popular post of the 2019 year was from a frequent guest writer on eMpTy Pages: Luigi Muzii, who has also written extensively about post-editing best practices elsewhere.
1. Understanding the Realities of Language Data
Despite the hype, we should understand that deep learning algorithms are increasingly going to be viewed as commodities.
The data is your teacher. It's the data where the real value is. I predict that this will become increasingly clear over the coming year.
Data is valuable when it is properly collected, understood, organized and categorized. Having rich metadata and taxonomy is especially valuable with linguistic data. Luigi has already written about metadata previously, and you can find the older articles here and here. I think that we should also understand that often translation memory does not have the quality and attributes that make it useful for training NMT systems. This is especially true when large volumes of disparate TM are aggregated together and this is contrary to what many in the industry believe. It is often more beneficial to create new, more relevant TM, based on real and current business needs that better fit the source that needs to be translated.
A series of posts that focused on BLEU scores and MT output quality assessment were the next most popular. Hopefully, my efforts to steer the serious user/buyer to look at business impact beyond these kinds of scores has succeeded, and informed buyers now understand that it is possible to have significant score differences that may have a minimal business impact, and thus these scores should not be overemphasized when selecting a suitable or optimal MT solution.
2. Understanding MT Quality: BLEU Scores
As there are many MT technology options available today, BLEU and its derivatives are sometimes used to select what MT vendor and system to use. The use of BLEU in this context is much more problematic and prone to drawing erroneous conclusions as often comparisons are being made between apples and oranges. The most common error in interpreting BLEU is the lack of awareness and understanding that there is a positive bias towards one MT system because it has already seen and trained on the test data or has been used to develop the test data set.
What is BLEU useful for?
Modern MT systems are built by “training” a computer with examples of human translations. As more human translation data is added, systems
should generally get better in quality. Often, new data can be added
with beneficial results, but sometimes new data can cause a negative
effect especially if it is noisy or otherwise “dirty”. Thus, to measure
if progress is being made in the development process, the system
developers need to be able to measure the quality impact rapidly and
frequently to make sure they are improving the system and are in fact
making progress.
BLEU allows developers a means “to monitor the effect of daily changes to
their systems in order to weed out bad ideas from good ideas.” When used
to evaluate the relative merit of different system building strategies,
BLEU can be quite effective as it provides very quick feedback and this
enables MT developers to quickly refine and improve translation systems
they are building and continue to improve quality on a long term basis.
The enterprise value-equation is much more complex and goes far beyond linguistic quality and Natural Language Processing (NLP) scores. To truly reflect the business value and impact, evaluation of MT technology must factor in non-linguistic attributes including:
The SDL blog also had a strong preference for MT-related themes and if you are curious you can check this out: REVEALED: The Most Popular SDL Blogs of 2019
- Adaptability to business use cases
- Manageability
- Integration into enterprise infrastructure
- Deployment flexibility
To effectively link MT output to business value implications, we need to understand that although linguistic precision is an important factor, it often has a lower priority in high-value business use cases. This view will hopefully take hold as the purpose and use of MT is better understood in the context of a larger business impact scenario, beyond localization.
Ultimately, the most meaningful measures of MT success are directly linked to business outcomes and use cases. The definition of success varies by the use case, but most often, linguistic accuracy as an expression of translation quality is secondary to other measures of success.
The integrity of the overall solution likely has much more impact than the MT output quality in the traditional sense: not surprisingly, MT output quality could vary by as much as 10-20% on either side of the current BLEU score without impacting the true business outcome. Linguistic quality matters but is not the ultimate driver of successful business outcomes. In fact, there are reports of improvements in output quality in an eCommerce use case that actually reduced the conversion rates on the post-edited sections, as this post-edited content was viewed as being potentially advertising-driven and thus less authentic and trustworthy.
There is also a post by Dr. Pete Smith that is worth a look: In a Funk about BLEU
Your personal data security really does matter Don't give it away |
The fourth most popular post of 2019 was by guest writer Robert Etches with his vision for Blockchain.
4. A Vision for Blockchain in the Translation Industry
Cryptocurrency has had a very bad year, but the underlying technology is still regarded as a critical building block for many new initiatives. It is important to be realistic without denying the promise as we have seen the infamous CEOs do. Change can take time and sometimes it needs much more infrastructure than we initially imagine. McKinsey (smart people who also have an Enron and mortgage securitization promoter legacy) have also just published an opinion on this undelivered potential, which can be summarized as:
Finally, much to my amazement, a post that I wrote in March 2012 was the fifth most-read post of 2019 even though seven years have passed. This proves Luigi's point, (I paraphrase here) that the more things change in the world at large, the more they stay the same in the translation industry.
4. A Vision for Blockchain in the Translation Industry
Cryptocurrency has had a very bad year, but the underlying technology is still regarded as a critical building block for many new initiatives. It is important to be realistic without denying the promise as we have seen the infamous CEOs do. Change can take time and sometimes it needs much more infrastructure than we initially imagine. McKinsey (smart people who also have an Enron and mortgage securitization promoter legacy) have also just published an opinion on this undelivered potential, which can be summarized as:
"Conceptually, blockchain has the potential to revolutionize business processes in industries from banking and insurance to shipping and healthcare. Still, the technology has not yet seen a significant application at scale, and it faces structural challenges, including resolving the innovator’s dilemma. Some industries are already downgrading their expectations (vendors have a role to play there), and we expect further “doses of realism” as experimentation continues."While I do indeed have serious doubts about the deployment of blockchain in the translation industry anytime soon, I do feel that if it happens it will be driven by dreamers, rather than by process crippled NIH pragmatists like Lou Gerstner and Rory. These men missed the obvious because they were so sure they knew all there was to know and because they were stuck in the old way of doing things. While there is much about blockchain that is messy and convoluted, these are early days yet and the best is yet to come.
Finally, much to my amazement, a post that I wrote in March 2012 was the fifth most-read post of 2019 even though seven years have passed. This proves Luigi's point, (I paraphrase here) that the more things change in the world at large, the more they stay the same in the translation industry.
The issue of equitable compensation for the post-editors is an important one, and it is important to understand the issues related to post-editing, that many translators find to be a source of great pain and inequity. MT can often fail or backfire if the human factors underlying work are not properly considered and addressed.
From my vantage point, it is clear that those who understand these various issues and take steps to address them are most likely to find the greatest success with MT deployments. These practitioners will perhaps pave the way for others in the industry and “show you how to do it right” as Frank Zappa says. Many of the problems with PEMT are related to ignorance about critical elements, “lazy” strategies and lack of clarity on what really matters, or just simply using MT where it does not make sense. These factors result in the many examples of poor PEMT implementations that antagonize translators.
From my vantage point, it is clear that those who understand these various issues and take steps to address them are most likely to find the greatest success with MT deployments. These practitioners will perhaps pave the way for others in the industry and “show you how to do it right” as Frank Zappa says. Many of the problems with PEMT are related to ignorance about critical elements, “lazy” strategies and lack of clarity on what really matters, or just simply using MT where it does not make sense. These factors result in the many examples of poor PEMT implementations that antagonize translators.
My role at SDL was also somewhat inevitable since as long as 7 years ago I was saying:
I suspect that the most compelling evidence of the value and possibilities of PEMT will come from LSPs who have teams of in-house editors/translators who are on fixed salaries and are thus less concerned about the word vs. hourly compensation issues. For these companies, it will only be necessary to prove that first of all MT is producing high enough quality to raise productivity and then ensuring that everybody is working as efficiently as possible. (i.e not "over-correcting"). I would bet that these initiatives will outperform any in-house corporate MT initiative in quality and efficiency.It is also clear that as more big-data becomes translation worthy, the need for the technologically informed linguistic steering will become more imperative and valuable.SDL is uniquely positioned to do this better than almost anybody else that I can think of. I look forward to helping make this a reality at SDL in 2020.
The SDL blog also had a strong preference for MT-related themes and if you are curious you can check this out: REVEALED: The Most Popular SDL Blogs of 2019
Wishing you all a Happy, Prosperous,
and Healthy, New Year and Decade