This is a guest post by Luigi Muzii, a frequent contributor to this blog. I wanted to make sure I had a chance to re-publish his thoughts on the MT human parity issue before he withdraws from blogging, and hopefully, this is not his last contribution. He has been a steady and unrelenting critic of many translation industry practices, mostly, I think, with the sincere hope of driving evolution and improvement in business practices. To my mind, his criticism always carried the underlying hope that business processes and strategies in the translation industry would evolve to look more like those of other industries, where service work is more respected and acknowledged, or would align more closely with the business mission needs of clients. His acerbic tone and dense writing style have been criticized, but I have always appreciated his keen observation and unabashed willingness to expose bullshit, overused clichés, and platitudes in the industry. There is just too much Barney-love in the translation industry. Even though I don't always agree with him, it is refreshing to hear a contrary opinion that challenges the frequent self-congratulation that we also see in this industry.
When I first came to the translation industry from the mainstream IT industry, I noticed that people in the industry were more worldly-wise, cultured, and even gentler than most I had encountered in IT. However, the feel-good vibe engendered by multicultural sensitivity also sustains a cottage-industry character in the processes, technology, and communication style of this industry. People are much more tolerant of inefficiency and sub-optimal technology use. I noticed this especially from the technology viewpoint, as I entered the industry as a spokesperson for Language Weaver, an MT pioneer with data-driven MT technology, the first wave of "machine learning". I was amazed by the proliferation of shoddy in-house TMS systems and the insistence on keeping these mostly second-rate systems running. When a group of more professionally developed TMS systems emerged, their vendors struggled to convince key players to adopt the improved technology. It is amazing that even companies that reach hundreds of millions of dollars in annual revenue still have the process and technology profiles of late-stage cottage industry players. Even Jochen Hummel, the inventor of Trados (TM), has expressed surprise that a technology he developed in the 1980s is still around, and has stated openly that it should properly be replaced by some form of NMT!
The resistance to MT is a perfect example of a missed opportunity. Instead of being put to better use, in a more integrated, knowledgeable, and value-adding way for clients, MT has become another badly used tool whose adoption struggles along, and its use is most frequently associated with inflicting pain and low compensation on the translators forced to work with these sub-optimal systems.
In an era where trillions of words are translated by MT daily in public MT portals, the chart above should properly be titled "Clueless with MT". I would also change it to N=170 LSPs that don't know how to use MT. Most LSPs who claim to "do MT", even the really large ones, in fact do it really badly. The Translated - ModernMT deployment is, in my opinion, one of the very few examples of how to do MT right for the challenging localization use case. It is also the ONLY LSP user scenario I know of where MT is used in 90% or more of all translation work done by the LSP. Why? Because it CONSISTENTLY makes work easier and more efficient, and, most importantly, translators consistently ask for access to the rapidly learning ModernMT systems. Rather than BLEU scores, a production scenario where translators regularly and fervently ask for MT access is the measure of success. This can only happen with superior engineering that understands and enhances the process. It also means that this LSP can process thousand-word projects with the same ease as billions of words a month, and scale easily to trillions of words if needed. In my view, this is a big deal, and it is what happens when you use technology properly. It is no surprise that most of the largest MT deployments in the world outside of the major public MT portals (eCommerce, OSI, eDiscovery) have little to no LSP involvement. Why would any sophisticated global enterprise be motivated to bring in an LSP that offers nothing but undifferentiated project management, dead-end discussions on quality measurement, and a decade-long track record of incompetent technology use?
This, I felt, was a fitting introduction to Luigi's post. I hope he shows up once in a while in the future, as I don't know many others who are as willing as he is to point out "areas of improvement" for the community.
The Productivity Paradox
Economists have argued for decades that massive investment in office technologies would enormously boost productivity. However, as early as 1994, authoritative studies had cast doubt on the reliability of certain projections. Recent studies reported that a 12 percent annual increase in the data processing budgets of U.S. corporations has yielded annual productivity gains of less than 2 percent.
The gains may have fallen so far short of expectations because long-established business practices held them back, restraining knowledge workers from taking full advantage of ever better tools and, thus, from boosting productivity. This is a case in point for the law of the instrument.
Therefore, to achieve the expected increases in productivity, most business practices would have to change.
Word Rates v. Hourly Rates
Translation pay has been based on per-word rates for over thirty years. The reasons are basically twofold. On the one hand, computer-aided translation tools have finally enabled buyers to understand (more or less) precisely what they are paying for. On the other hand, the same tools have made it possible to measure throughput and productivity (almost) objectively, thus supporting statistics and projections.
Add to that the ability of buyers to request discounts based on the percentage of matches between a text and a translation memory, and it instantly becomes obvious that it is not the translator’s time, expertise, or skills that they are buying and paying for.
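As a rough illustration of how match-based discounts work, the sketch below turns a CAT-tool analysis (words per match band) into a billable word count and a quote. The band names and the share of the full rate paid for each band are hypothetical placeholders, not figures from this post or from any particular rate card; real grids vary by buyer and LSP.

```python
from __future__ import annotations

# Hypothetical match bands and the share of the full per-word rate paid for each.
# These values are illustrative assumptions, not an industry standard.
RATE_FACTORS = {
    "100%": 0.25,
    "95-99%": 0.40,
    "85-94%": 0.60,
    "75-84%": 0.80,
    "no_match": 1.00,
}


def weighted_word_count(analysis: dict[str, int]) -> float:
    """Convert words per match band into billable (weighted) words."""
    return sum(words * RATE_FACTORS[band] for band, words in analysis.items())


def quote(analysis: dict[str, int], rate_per_word: float) -> float:
    """Price = weighted words x per-word rate; time and skill never enter the formula."""
    return round(weighted_word_count(analysis) * rate_per_word, 2)


if __name__ == "__main__":
    analysis = {"100%": 4000, "95-99%": 1000, "85-94%": 1000, "no_match": 4000}
    print(weighted_word_count(analysis), quote(analysis, 0.10))
```

Under these assumed factors, a 10,000-word job with 40 percent exact matches is billed as 6,000 words, whatever time the translator actually spends checking those matches.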
Nevertheless, a translation assignment/project inevitably ends up involving a series of collateral tasks whose fee cannot be computed on a per-word basis.
The price LSPs charge buyers, then, includes the price of services for which they pay vendors on a different basis. Similarly, in setting their own fees, these vendors include compensation for non-productive or non-remunerative tasks. The word-rate fee, then, is also based on the time required to complete a certain task. In short, even the conundrum of measured fees (word rate or hourly rate) v. fixed fees is pointless: once the parties agree on how to compute the fee, only the measuring is left open. And statistics and projections are of more interest to the supplier, specifically the middleman, than to the buyer.
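The point that a word rate still embeds a time rate can be made concrete with a little arithmetic. The sketch below assumes a translator's throughput in words per hour and the share of working time spent on paid translation, the rest going to unpaid collateral tasks; the function names and all figures are hypothetical.

```python
def effective_hourly_rate(word_rate: float, words_per_hour: float,
                          productive_share: float) -> float:
    """Hourly earnings implied by a per-word rate.

    productive_share (hypothetical parameter) is the fraction of working time
    spent on paid translation; the rest goes to unpaid tasks such as file
    preparation, queries, and reading job instructions.
    """
    if not 0 < productive_share <= 1:
        raise ValueError("productive_share must be in (0, 1]")
    return word_rate * words_per_hour * productive_share


def required_word_rate(target_hourly: float, words_per_hour: float,
                       productive_share: float) -> float:
    """Per-word rate a vendor must charge to reach a target hourly income."""
    if not 0 < productive_share <= 1:
        raise ValueError("productive_share must be in (0, 1]")
    return target_hourly / (words_per_hour * productive_share)


if __name__ == "__main__":
    print(effective_hourly_rate(0.10, 500, 0.75))
    print(required_word_rate(50, 500, 0.75))
```

With a hypothetical rate of 0.10 per word, 500 words per hour, and a quarter of the time spent on unpaid tasks, the effective hourly rate is 37.5; earning 50 per hour under the same conditions would take roughly 0.133 per word. Shrinking the unpaid share raises the hourly figure without touching the word rate.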
Reducing non-productive tasks would not only allow LSPs to regain margins and cut selling prices, but also to regain productivity and free up resources for increasing efficiency through automation, and thus, ultimately, productivity itself.
If anything, now more than ever, it is necessary to foster standardization and reach an agreement on reference models, metrics, and methods of measurement. The resulting standardization of exchange formats, data models, and metrics would help productivity and interoperability.
In fact, some tasks, like file preparation or, more precisely, the assembly of localization packages and kits, cannot be fully automated or stripped out of the translation/localization workflow, although they are indeed separate jobs. In this respect, standardization might also help automate such tasks. Nevertheless, when extensive and time-consuming, these tasks should be the buyer’s responsibility. Incidentally, given buyers’ traditionally poor regard for the translation industry and their insufficient understanding of translation and localization and the related workflow, most of the problems associated with project setup and file preparation are attributable to sloppiness and immaturity. This includes job instructions that project teams must spend time reading through.
On the other hand, some of these tasks, like quotation, are commonly treated as part of project tasks when they should not be. For example, if quotations are formulated at the selling stage, any subsequent task relating to them can be (at least partially) automated. The same goes for instructions, which might become mandatory workflow steps (when platforms allow for custom workflows) and checklists to run.
Skill, Labor Shortages, and Education
Here are a few questions for those who design or have designed, and teach or have taught, translation and localization courses:
- Have your lectures ever dealt with style guides and job instructions, so that students learn how to follow them?
- Have your exams ever assessed the degree of compliance with style guides and instructions?
Customers and LSPs alike have always complained about the lack of qualified language professionals.
At the TAUS Industry Summit 2017, Bodo Vahldieck, Sr. Localization Manager at VMware, expressed his frustration at not being able to find young talent willing and able to go and work with the “fantastic localization technology suites” at his company.
Some time earlier, CommonSense Advisory had also raised the alarm about the talent shortage in the language service industry.
Even earlier, Inger Larsen, Founder & MD at Larsen Globalization Recruitment, a recruitment company for the translation industry, wrote an article titled "Why we still need more good translators", reporting the outcome of a small informal poll: the failure rate of translators taking professional test translations was about 70 percent, although they were all qualified translators, many of them with quite a lot of experience.
The talent shortage is no news, then, and lately many companies in other industries have been reporting hiring troubles. Apparently, Gresham’s Law (an economic principle commonly stated as "bad money drives out good") is ruling everywhere, not just in the translation space.
Actually, the labor shortage is a myth. The complaints of Domino’s Pizza’s CEO, Uber, and other companies are insubstantial because the simplest way to find enough labor is to offer higher wages. In doing so, new workers will enter the market, and any labor shortage will quickly end. A rare case of true labor shortage in a free economy is when wages are so high that businesses cannot afford to pay them without going broke. But this would be like the dot-com bubble that led an entire economy to collapse.
Therefore, such complaints are most likely a sign that corporate executives have grown so accustomed to a low-wage economy that they believe anything else is abnormal.
But when bad resources have driven out good ones altogether, offering higher wages might not be enough and risks overpaying, even more so if the jobs available are very low-profile and can hardly be automated.
- Complete flexibility in hours and location
This means that, in response to skill shortages and to position themselves to win in the future, companies will have to leverage flexible work models and meet employees where they are. And yet, many still seem to be on a different path.
- Different productivity metrics
Productivity metrics will have to shift from the volume produced to the value delivered; that is, companies will have to prioritize outcomes over output. Surprisingly, many companies claim this is already how they operate.
A diverse workforce will become even more important as roles, skills, and company requirements change over time, although this will challenge current productivity metrics even further.
Machines Do Not Raise Wage Issues
If the linear decrease in pay in the face of the exponential growth of translation demand is puzzling, it is because we are accustomed to the fundamental market law: when demand increases, prices rise. But educational institutions and industry players generally lag behind other industries and, most importantly, clients in technology, which means that even the best resources cannot keep up with productivity expectations, whether these are reasonable or not. Also, the common failure of LSPs to differentiate, maximize efficiency, and reduce costs leads them to compete on price alone, which only exacerbates the situation, making translation and localization a commodity. Finally, the all too often unreasonable demands of LSPs, even more unreasonable than those of their customers, have been driving the best resources out of the industry. It is a vicious circle that makes productivity a myth and an illusion.
Productivity is a widely discussed subject that has received even more attention during the pandemic. As David J. Lynch recently put it in The Washington Post, “Greater productivity is the rare silver lining to emerge from the crucible of covid-19”. This has kick-started a turn to automation, which is gradually spreading through structural shifts that will spur it further.
Lynch also pointed out that, after helping businesses survive, automation will help them attract labor to meet surging demand (assuming, and not conceding, that labor shortages actually exist and are a problem).
There is a general understanding that, during the pandemic, firms became more productive and learned to do more with less, even though, in this respect, the effect of technology has been fairly marginal, and smaller than that of purely organizational measures.
Anyway, according to a McKinsey study, investments in new technologies are going to accelerate through 2024, with an expectation of significant productivity growth. This may be because automation is generally understood as different from office technologies or, more likely, because the organizational measures above are more challenging, cost more, and are less tax-efficient. Or maybe it is because more and more businesses complaining of labor shortages are convinced that automation will allow them to fill orders they would otherwise have to turn down.
After all, this is exactly the approach of LSPs towards machine translation and, even more so, post-editing. But automation understood this way is limited and distorted and exacerbates the effects of Gresham’s Law. On the other hand, many translators are still quite unconvinced of machine translation and see it as only slightly useful. This is due mostly to the negative policies of most LSPs and their widespread attitude towards automation, machine translation, and technology at large, which have repeatedly exposed LSPs and their vendors to the deadly effects of incompetently implemented and deployed machine translation systems whose only objective is to reduce translator compensation and safeguard margins.
Playing with Grown-ups
Experienced customers know that machine translation is no panacea [for translation challenges] and does not come cheap. True, online machine translation engines are free, but they are not suitable for business use as they are; exploiting them professionally requires experienced linguists. A corporate machine translation platform requires a substantial initial investment, plus specific know-how and resources, including a proper (substantial) amount of quality data to train models. Most importantly, it requires time and patience, which are traditionally rare commodities in today’s business world.
The most coveted achievement of any LSP is to play in the same league as the grown-ups, but grown-ups do not want to play with LSPs once they get to know them and learn that LSPs cannot help them find, implement, train, and tune the best-suited machine translation system, because LSPs lack the necessary know-how, ability, and resources. For the same reasons, they know they cannot outsource their machine translation projects to the LSPs themselves, no matter how hard LSPs push their services in this field too.
Disenchantment, if not skepticism or outright distrust, is the consequence of LSPs not being attuned to the needs of clients, especially the bigwigs (the grown-ups), and the resulting lack of integration with their processes. Then again, clients have always been asking for understanding and integration, and what have they got in response? A pointless post-editing standard.
LSPs are losing the continuous localization battle too. Rather than adjusting their processes to the customer’s modus operandi, LSPs, and the consultants they rely on, blame customers for demanding that localization teams keep up with code and content as these are developed, before deployment. And rather than streamlining their processes, LSPs cling hopelessly to their traditional, clumsy ones. No wonder customers have trouble trusting LSPs.
In fact, many LSPs are apparently concerned about the effects of continuous localization on linguistic quality, when the kind of quality LSPs are accustomed to is exactly what they should forget. Not for nothing, a basic rule of the Agile model is to use every new iteration to correct the errors made in the previous one.
If anything, it is odd that machine translation has not become predominant already and that clients and, more importantly, LSPs insist on maintaining working and payment models that are, to say the least, obsolete.
What if, for example, the idea around quality rapidly changes, and customer experience becomes the new paradigm?
This would strengthen the basis for wide-ranging service level agreements covering a stable buyer-vendor relationship, first on the client-LSP side and then on the LSP-vendor side, with international payments going through a platform that enables the buyer to pay vendors in their preferred local currency. A clause in the agreement may require payees to sign up with the platform and enter their banking details and preferred currency.
Payment platforms already exist that allow clients to qualify for custom (flat) rates by submitting a pricing assessment form, and that connect with other systems through a web API translator via no-code applets based on an IFTTT (If This Then That) mechanism.
Payments are not easy, but they are worth getting right because they are the sore point paving the road for Gresham’s Law.
If the debate around rates and payments has never gone past the stage of rants and complaints, the one around quality has been poisoning the translation space for years without leading to any significant outcome. Yet it still produces tons of academic publications around the same insubstantial fluff, and thousands of lines of code that just keep repeating the same mistakes.
As long as machine translation was a subject confined to specialists, relatively objective metrics and models ruled the quality assessment process with the goal of improving the technology and the assessment metrics and models themselves.
After entering the mainstream, a few years ago, machine translation became marketing prey. Marketing people at machine translation companies started targeting translation industry players with improvements in automated evaluation metrics, typically BLEU, and the public with claims of "human parity". [And also the increasing use of bogus MT quality rankings done by third parties.]
Both are smoke and mirrors, though. On the one hand, automated metrics are nothing more than the scores they deliver, and their implications are hard to grasp; moreover, they have been showing all their limitations with neural MT models. On the other hand, no one has bothered to offer a consistent, unambiguous, and indisputable definition of ‘human parity’ other than the companies bragging that they have achieved it.
Saying that machine translation output is “nearly indistinguishable from” or “equivalent to” a human translation is misleading and means almost nothing. Saying that a machine has achieved human parity if “There is no statistically significant difference between human quality scores for a test set of candidate translations from a machine translation system and the scores for the corresponding human translations” may sound more exhaustive and accurate, but any comparison still depends on the characteristics of the input and output and on the conditions of comparison and evaluation.
In other words, the questions to answer are, “Is every human capable of translating in any language pair? Can any human produce a translation of equivalent quality in any language pair? Can any human translate better than machines in any language pair?” And vice versa.
All too often, people, even in the professional translation space, tend to forget that machine translation is a narrow-AI application, i.e., it focuses on one narrow task, with each language pair being a separate task. In other words, the singularity that would justify claiming "human parity" is still far off, and not just in time. So much for Ray Kurzweil’s predictions or Elon Musk’s confidence in Neuralink’s development of a universal language and brain chip.
Using automatic MT quality scores as a marketing lever is therefore misleading because there are too many variables at play. Talking about "human parity" is misleading too because one should consider the conditions under which the assessment leading to certain statements has been conducted.
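The statistical definition of human parity quoted earlier can be read as a two-sample test on human quality scores. The sketch below uses a permutation test on the difference in mean scores as a stand-in; it is not the method behind any particular published parity claim, and the scores are made up. It illustrates how much the verdict depends on the conditions of the evaluation: with very few segments, even a sizeable gap is not "statistically significant".

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def permutation_p_value(mt_scores, human_scores, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in mean quality scores.

    Null hypothesis: MT and human scores are exchangeable (same distribution).
    Returns an estimated p-value.
    """
    rng = random.Random(seed)
    observed = abs(_mean(mt_scores) - _mean(human_scores))
    pooled = list(mt_scores) + list(human_scores)
    n_mt = len(mt_scores)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly relabel scores as "MT" or "human"
        if abs(_mean(pooled[:n_mt]) - _mean(pooled[n_mt:])) >= observed:
            extreme += 1
    # Counting the observed labeling itself keeps the estimate above zero.
    return (extreme + 1) / (n_resamples + 1)


def claims_parity(mt_scores, human_scores, alpha=0.05, **kwargs):
    """'Parity' in the quoted sense: the test fails to reject the null at level alpha."""
    return permutation_p_value(mt_scores, human_scores, **kwargs) >= alpha


if __name__ == "__main__":
    # Hypothetical scores: MT trails human by 7 points on average, 3 segments each.
    print(claims_parity([70, 75, 80], [78, 82, 86]))
    # Hypothetical scores: a 20-point gap over 20 segments each.
    print(claims_parity(list(range(60, 80)), list(range(80, 100))))
```

With three segments per side, the estimated p-value is around 0.2, so a 7-point average gap would pass as "parity"; with twenty segments per side and a 20-point gap, it would not. Failing to detect a difference is not evidence that no difference exists.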
Now, it is quite reasonable for a client to ask a partner (as LSPs like to think of themselves) to help them correctly and fully interpret machine translation scores and certain catchphrases that may sound puzzling because of their vagueness or ambiguity.
Most clients, the largest ones anyway, are in a different league than their language service providers in terms of organizational maturity, and cannot understand the reason for the sloppiness and inefficiency they see in these would-be partners. And yet it is quite simple: the traditional, still common translation process model these providers follow is not sustainable, even for mission-critical content. Incidentally, this brings us back to productivity, payments, Gresham’s Law, and skill and labor shortages, all interrelated.
Leaner, faster, and more efficient processes are more necessary than ever, but mutual understanding is just as crucial. To help customers understand translation products and services, and value them accordingly, people in this industry should drop the often obfuscating jargon that no client is interested in or willing to learn and decipher. Is this jargon part of the notorious information asymmetry?
Greater and more honest self-assessment is necessary, and the industry is dramatically lacking it at all levels. This possibly explains why there is greater interest in the machine translation market and industry than in the translation industry.
Luigi Muzii has been in the "translation business" since 1982 and has been a business consultant in the translation and localization industry, through his firm, since 2002. He focuses on helping customers choose and implement best-suited technologies and redesign their business processes for the greatest effectiveness of translation and localization-related work.