Pages

Friday, February 17, 2023

The Problem With LangOps

This is a letter I wrote to the editor of Multilingual after reading several articles focused on LangOps in the December 2022 issue.  This discussion started on LinkedIn and Cameron invited the active contributors to formalize our comments and write a letter to the editor with alternate viewpoints.

TLDR: LangOps is a term that refers to the vague use of "A.I."  in/around localization or is nothing more than a way to describe the centralization of enterprise translation production processes.



I carefully read all of the following before writing my letter to ensure that I had not somehow missed the boat.  The basic question I am still left with after looking carefully through the LangOps material is "Where's the real substance of this concept/idea/word?"


Jump to 2' 50" to get to the relevant part

Here is a slightly ornamented version of the text of my letter to the editor which was published in the Multilingual February 2023 issue. I include a version with emphasis (mine) so that others may also comment on this, and perhaps correct my misperception. 

Special Thanks to Marjolein Groot Nibbelink for taking the trouble to convert the letter to a really well-read audio track that can be played back faster.


Dear Multilingual Editor (Cameron),

After reading the various articles on LangOps in the Multilingual December 2022 issue, I had hoped that I would get a better sense of what LangOps is, and why it matters. But I cannot say that this happened for me, and I am not sure if I (or any other reader) have any more clarity on what LangOps is, beyond it being a vendor buzzword, that remains fuzzy and amorphous because there is not enough supporting evidence to document it properly. While there was much discussion about why a new definition that went further than localization is needed, there was not much that defined LangOps in more concrete terms. I suspect the fuzziness and lack of clarity that I felt are true for many other readers as well.

One is left asking. “Where’s the beef?” on this thing they call LangOps.

I reviewed the articles in the magazine on the LangOps subject again before writing this letter, to better identify the defining elements, and to make sure I was fair and had not missed some obvious facts. My intention with my comments here is to hopefully provide a coherent critique of the subject matter, which started in discussion with comments made by several readers about LangOps on LinkedIn. 

From my reading, the articles in Multilingual were clearer on Why new definitions are needed, but less clear on the What [it is] or explaining the How.

It appears to me that the LangOps concept is another attempt by some stakeholders in the industry to raise the profile of the translation business, to make it more visible at the executive level, or to increase the perceived value of the translation production process by imbuing it with more complexity and mysterious undefined AI elements. However, in the absence of specifics, it becomes just another empty buzzword that creates more confusion than clarity for most of us, especially so for new buyers. 

It is difficult to see how any sponsor could take the descriptions provided in this issue of Multilingual to a senior executive to ask for funding, or even to explain what it is.

It is clear that as the translation of some product and marketing content became recognized as a valuable international business-driving activity, the need to scale, organize and systematize it became more urgent and led to what most call localization today. 

Thus, localization I think refers to the many processes, activities, and tools, used in making language translation processes more automated, structured, and systematic. Most often this work is related to relatively static content that is mandatory in international markets, but recently it has expanded to include more customer service and support content. It also sometimes includes cultural adaptations that are made in addition to the basic translation. 

TMS systems have been central to the localization worldview over the past decade, as these TMS systems facilitate the development and management of different workflows, monitor translation work, and ease project management of distributed translation-related tasks (TEP). It is also true that MT has been minimally used in hard-core localization settings as MT systems were not deemed to be accurate, flexible, and simple enough to configure to be used in this work.

By carefully reviewing the published Multilingual articles again, I gathered that the following elements that are being used to define what LangOps is:

  • There are AI-driven capabilities applied to certain localization processes which are not defined,
  • Centralization of all translation production activities across the enterprise,
  • Introduction of “more” technology into existing localization workflows, but what this is specifically, is unclear,
  • LangOps is said to be made up of cross-functional and inter-disciplinary teams, but who and why is not clear,
  • Possibly adding other value-added language tasks (sentiment analysis, summarization, chatbots) in addition to the translation. [This at least is clear].

To my view, the only element here that is clear in the many descriptions [of LangOps] is that of the centralization of translation production. 

The other elements used to describe what it is are kind of fuzzy and hard to pin down. They can mean anything or could mean nothing since vagueness is not easily pinned down. LangOps is another term, that is possibly even worse than localization (which confuses many regular people and many new customers) because it creates a communication problem. 

How do you answer the question, “What do you do?” in an elevator, a cab, at a party, on an airplane, with family and friends? As you can see both Localization and LangOps present opaque, obfuscating images to the regular human mind.

Would it not be so much easier to just say “Language Translation to Drive International Business”? And then maybe add, “We use technology, tools, and people to do it at a large scale efficiently.”

I would like to suggest a different way to view the continuing evolution of business translation. It is my feeling that the LangOps movement is linking the growing number of MT use cases, which have more dynamic IT connectivity, and cross-organization collaboration implications, with a need for a new definition.

We have now reached that perfect storm moment where most B2C and B2B businesses recognize that they need a substantial digital presence, that it is important to provide large volumes of relevant content to serve and please their customers, and that they need to listen to customers in social media, understand trends faster, and communicate across the globe much faster. 

This means that successful businesses have to share, communicate, listen, and produce translations at a much larger scale than they have had to in the past. The core competency from traditional localization work is less likely to be useful with these new challenges. These new market requirements need a shift away from TM and TMS-managed work to a more MT-centric view of the world. The volume of translation increases from thousands of translated words, a month, to millions or even billions of words a month to drive successful international business outcomes in the modern era. 

As Generative AI improves and begins to be deployed in production customer settings, we will only see the translation volumes grow another 10X or 100X. Thus, deep MT competence increasingly becomes a core requirement to be in the enterprise translation business.

MT has been improving dramatically over the last five years in particular, and it is not ridiculous to say that it is getting close to human output in some special cases when systems are properly designed and deployed by competent experts. 

Competence means that experts can quickly adapt and modify MT systems to produce useful output in the 20-30 different use cases where an enterprise faces an avalanche of text and/or audiovisual content. The new use cases go beyond the traditional focus of localization in terms of content and process. We now need to translate much more dynamic content related to customer services and support, translate more active communications (chat, email, forums), share more structured and unstructured content, pay more attention to social media feedback, and are just more real-time and dynamic in general.

The successful modern global enterprise listens, understands, communicates, and actively shares content across the globe to improve customer experience. Thus, I think it is fair to say that we (the translation business) are moving to a more MT-centric world from a previously TMS-centric world, and a critical skill needed today is deep competence with MT. 

Useful MT output means it helps grow and drive international business, even though it may not be linguistically “perfect”. The requirement for MT competence requires moving far beyond choosing an MT system with the best BLEU or COMET score. 

MT Competence means you can find egregious errors (MT & AI make these errors all the time) and instantly correct these problems to minimize damage. 

MT Competence means the skill and agility to respond to changing business needs and new content types and the ability to rapidly modify MT systems as needed. 

Competence in managing rapid, responsive, deep adaptation of MT systems will be a key requirement to actively participate as an enterprise partner (not vendor) on a global stage very shortly. 

When language translation is mission-critical and pervasive, the service provider will likely evolve from being a vendor to being a partner. It can also often mean that the scope of localization teams is greatly expanded and become more mission-critical.

While I can see a business reality where there is Machine-First & Human Optimized translation approach to content across the global enterprise, which requires responsive, continuously improving MT, it also means moving beyond traditional MTPE where clean-up crews come to reluctantly fix badly formed MT output produced by inexperienced and incompetent MT practitioners. 

However, the lights start to dim for me when I think of "LangOps" being part of this reality in any form whatsoever.

This continuing evolution of business translation also probably means that there is a much more limited role for the TMS or using it only for some localization (software and key documentation) workflows. The more common case as translation volumes grows is to connect all (Customer Experience) CX-related text directly into highly tuned, carefully adapted NMT systems in high-performance low-latency IT infrastructure that is directly customer-facing, or customer accessible. 

Recent data I have seen on MT use across a broad swathe of enterprise users shows that as much as 95% of MT use completely bypasses the TMS. Properly tuned expert-built MT engines do not need the unnecessary overhead of a TMS system. The enterprise objective is to enable translation at scale for everything that might require instant, and mostly but not necessarily a perfectly accurate translation, as long as it furthers and enhances any and every global business initiative and communication. 

Speed and scale are more important and have a more positive impact on international business success in many CX-related use cases than perfect linguistic quality does. The enterprise executives understand this even though we as an industry might not.

I am not aware of a single LangOps configuration or group on this earth or know any enterprise that claims to have such an initiative, but I can point to several massive-scale MT-driven translation engines around the world e.g. Airbnb, Amazon, Alibaba, and eBay where billions of words are translated regularly to drive international business and customer delight and serve a growing international customer base. I am confident we will see this pool of enterprise users grow beyond the eCommerce markets.

Thus, I see little value in promoting the concept of LangOps as what actually seems to be happening is that more expert-tuned enterprise MT is being used and we see the share of MT used to total translation volumes continue to grow. 

As this kind of responsive, highly adaptive MT capability becomes more pervasive across an enterprise, it also becomes a critical requirement for international business success. The activities related to organizing and managing significantly more dynamic content and translation volumes should not be mistaken to be something as vague as LangOps, as no organization I am aware of has the building blocks or template to create such a vaguely defined function. I think that it is more likely that Localization teams will evolve and the scope of their activities will increase, perhaps as dramatically as we have seen at Airbnb.

Airbnb just booked its first annual profit in its near-15-year history, a whopping $1.9bn in 2022. It now appears to be in rarefied air, with its place as the de facto online marketplace for homestays and experiences, giving it a network effect that’s hard to compete with.

Airbnb just booked its first annual profit in its near-15-year history, a whopping $1.9bn in 2022. It now appears to be in rarefied air, with its place as the de facto online marketplace for homestays and experiences, giving it a network effect that’s hard to compete with.

 

I did find all the articles on LangOps useful in furthering my understanding, especially the ones by Riteba McCallum, and Miguel Cerna, and my comments should not be mistaken as a wholesale dismissal of the viewpoints presented. On the contrary, I think we have much more agreement on many of the core issues discussed. Though I do admit that I find the general concept of LangOps as it has been painted, to be a likely hindrance to our mutual future rather than a beneficial concept to drive our success with globalization and international business initiatives with our common customers.


Respectfully Yours,

Kirti Vashee



Here is the LinkedIn article where the discussion began:


P.S.  Maybe all I am saying is that LangOps just needs more cowbell 😄😄😄 to get the sound and the concept right?



Wednesday, February 1, 2023

The March Towards AI Singularity and Why It Matters

 

Why progress in MT is a good proxy of progress with the technological singularity


For as long as machine technology has been around (now over 70 years) there have been regular claims made by developers of the technology reaching “human equivalence”. However, until today we have not had a claim that has satisfied practitioners in the professional translation industry, who are arguably the most knowledgeable critics around.  For these users, the actual experience with MT has not been matched by the many extravagant claims made by MT developers over the years.

This changes with the long-term study and translation production data presented by Translated SRL at the AMTA conference which provides the missing elements: a huge industrial-scale evidentiary sample validated by a large group of professional translators across multiple languages based on professional translation work done in real-world production scenarios.

The historical difficulty in providing acceptable proof does not mean that progress is not being made, but it is helpful to place these claims in proper context and perspective to better understand what the implications are for the professional and enterprise use of MT technology. 

The history of MT (machine translation) is unfortunately filled with empty promises

MT (human language translation) is considered among the most difficult theoretical problems in AI, and thus we should not be surprised that it is a challenge that has not yielded completely to the continuing research efforts of MT technology experts over the decades. Also, many experts have said that MT is a difficult enough challenge (AI-complete: because it requires a deep contextual understanding of the data, and the ability to make accurate predictions based on that data) that it is a good proxy for AGI (Artificial general intelligence is the ability of a machine process/agent to understand or learn any intellectual task that a human being can) and thus progress with MT can also mean that we are that much closer to reaching AGI.

The Historical Lack of Compelling Evidence

MT researchers are forced to draw conclusions on research progress being made based on relatively small samples of non-representative data (from the professional translation industry perspective) that are evaluated by low-cost human "translators".  The Google Translate claims in 2016 are an example of a major technology developer making "human-equivalence" claims based on limited data that was possible within the scope of the technology development process typical at the time.

Namely, here are 200 sentences that amateur translators say are as good as human translation, thus we claim we have reached human equivalence with our MT.

Thus, while Google did indeed make substantial progress with its MT technology, the evidence it provided to make the extravagant claim lacked professional validation, was limited only to a small set of news domain sentence samples, and was not representative of the diverse and broad scope of typical professional translation work which tends to be much more demanding and varied.

The problem from the perspective of the professional industry with these historical as-good-as-humans claims can be summarized as follows:

  1. Very small samples of non-representative data: Human equivalence is claimed on the basis of evaluations of a few news domain segments where non-professional translators were unable to discern meaningful differences between MT and human translations. The samples used to draw these conclusions were typically based on no more than a few hundred sentences.
  2. Automated quality metrics like BLEU were used to make performance claims: The small samples of human evaluation were generally supported by larger (a thousand or so) sentences where the quality was assessed by an automatic reference-based score. There are many problems with these automated quality scores as described here, and we now know that they miss much of the nuance and variation that is typical in human language, resulting in erroneous conclusions, and at best they are very rough approximations of competent human assessments. COMET and other metrics are slightly better quality approximation scores but still fall short of competent human assessments which are still the "gold standard" in assessing translation output quality. The assessments of barely bilingual translators found in Mechanical Turk settings and often used by MT researchers are likely to be quite different from expert professional translators whose reputations are defined by their work product. Competent human assessments ("gold standard") are often at odds and different from the segments suggested as the best-scoring ones based on metrics like COMET or hLepor.
  3. Overreaching extrapolations: The limited evidence from these experiments was marketed as “human-equivalence” by Google and others, and invariably resulted in disappointing professional translators and enterprise users who quickly witnessed the poor performance of these systems when they strayed away from news domain content.  Though these claims were not deliberately deceptive, they were made to document progress from a perspective that was much narrower than the scope and coverage typical of professional translation work.  There has never been a claim of improved MT quality performance based on the huge scale (across 2 billion segments) presented by Translated SRL.

Translated SRL Finally Provides Compelling Evidence


The measurement used to describe ongoing progress with MT is Time To Edit (TTE). This is a measurement made during routine production translation work and represents the time required by the world’s highest-performing professional translators to check and correct MT-suggested translations.

Translated makes extensive use of MT in their production translation work and has found that TTE is a much better proxy for MT quality than measures like Edit Distance, COMET, or BLEU. They have found that rather than using these automated score-based metrics, it is more accurate and reliable to use a measurement of the actual cognitive effort extended by professional translators during the performance of production work.

Consistent scoring and quality measurement are challenging in the production setting because this is greatly influenced by varying content types, translator competence, and changing turnaround time expectations. A decade of careful monitoring of the production use of MT has yielded the data shown above. Translators were not coerced to use MT and it was only used when it was useful.

The data are compelling because of the following reasons:

  • The sheer scale of the measurements across actual production work is described in the link above. The chart focuses on measurements across 2 billion edits where long-term performance data was available.  
  • The chart represents what has been observed over seven years, across multiple languages, measuring the experience of professional translators making about 2 billion segment edits under real-life production deadlines and delivery expectations.
  • Over 130,000 carefully selected professional translators contributed to the summary measurements shown on the chart.
  • The segments used in the measurements are all no TM match segments as this represents the primary challenge in the professional use of MT.
  • The broader ModernMT experience also shows that highly optimized MT systems for large enterprise clients are already outperforming the sample shown above which represents the most difficult use case of no TM match.
  • A very definite linear trend shows that if the rate of progress continues as shown, it MAY be possible to produce MT segments that are as good as those produced by professional translators within this decade. This is the point of singularity at which the time top professionals spend checking a translation produced by the MT is not different from the time spent checking a translation produced by their professional colleagues which may or may not require editing.

It is important to understand that the productivity progress shown here is highly dependent on the superior architecture of the underlying ModernMT technology which learns dynamically, and continuously, and improves on a daily basis based on ongoing corrective feedback from expert translators. ModernMT output has thus continued to steadily improve over time. It is also highly dependent on the operational efficiency of the overall translation production infrastructure at Translated SRL.

The virtuous data improvement cycle that is created by engaged expert translators providing regular corrective feedback provides the right kind of data to drive ongoing improvements in MT output quality. This improvement rate is not easily replicated by public MT engines and periodic bulk customization processes that are typical in the industry.

The corrective input is professional peer revision during the translation process - and this expert human input "has control," and guides the ongoing improvement of the MT, not vice versa. While overall data, computing, and algorithms are critical technological foundations to ongoing success, expert feedback has a substantial impact on the performance improvements seen in MT output quality.

The final quality of translations delivered to customers is measured by a metric called EPT (Errors per thousand words) which in most cases is 5 or even as low as 2 when two rounds of human review are used. The EPT rating provides a customer-validated objective measure of quality that is respected in the industry, even for purely human translation product when no MT is used.    

There is a strong, symbiotic, and mutually beneficial relationship between the MT and the engaged expert translators who work with the technology. The process is quite different from typical clean-up-the-mess PEMT projects with customized static models where the feedback loop is virtually non-existent, and where the MT systems barely improve even with large volumes of post-edited data.

Responsive, Continuously Improving MT Drives Engagement from Expert Translators 
Who See Immediate Benefit During the Work Process

The Problem with Industry Standard Automated Metrics for MT Quality Assessment

It has become fashionable in the last few years to use automated MT quality measurement scores like BLEU, Edit Distance, hLepor, and COMET as a basis to select the “best” MT systems for production work. And some companies use different MT systems for different languages in an attempt to maximize MT contributions to production translation needs. These scores are all useful for MT system developers to tune and improve MT systems, however, globalization managers who use this approach may overlook some rather obvious shortcomings of this approach for MT selection purposes.

Here is a summary listing of the shortcomings of this best-MT-based-on-scores approach:

  1. These scores are typically based on measurements of static systems. The score is ONLY meaningful on a certain day with a certain test set and actual MT performance may be quite different from what the static score might suggest. The score is a measurement of a historical point and is generally not a reliable predictor of future performance.
  2. Most enterprises need to adapt the system to their specific content/domain and thus the ability of a system to rapidly, easily, and efficiently adapt to enterprise content is usually much more important than any score on a given day.
  3. These scores do not and can not factor in the daily performance improvements that would be typical of an adaptive, dynamically, and continuously improving system like ModernMT, which would most likely score higher every day it was actively used and provided with corrective feedback. Thus, they are of very limited value with such a system.
  4. These scores can vary significantly with the test set that is used to generate the score and scores can vary significantly as test sets are changed. The cost of generating robust and relevant test sets often compromises the testing process as the test process can be gamed.
  5. Most of these scores are only based on small test sets with only 500 or so sentences and the actual experience in production use on customer data could vary dramatically from what a score based on a tiny sample might suggest.
  6. Averaged over many millions of segments, TTE gives an accurate quality estimate with low variance and is a more reliable indicator of quality issues in production MT use. Machine translation researchers have had to rely on automated score-based quality estimates such as the edit distance, or reference-based quality scores like COMET and BLEU to get quick and dirty MT quality estimates because they have not yet had the opportunity to work with such large (millions of sentences) quantities of data collected and monitored in production settings.
  7. As enterprise use of MT evolves the needs and the expected capabilities of the system will also change and thus static scores become less and less relevant to the demands of changing needs.
  8. Also, such a score does not incorporate the importance of overall business requirements in an enterprise use scenario where other workflow-related, integration, and process-related factors may actually be much more important than small differences in scores.
  9. Leading-edge research presented at EMNLP 2022 and similar conferences provide evidence that COMET-optimized system rankings frequently do not match what “gold-standard” human assessments would suggest as optimal. Properly done human assessments are always more reliable in almost every area of NLP. The TTE measurements described above inherently allow us to capture human cognition impact and quality assessment at a massive scale in a way that no score or QE metric can today.
  10. Different MT systems respond to adaptation and customization efforts in different ways. The benefit or lack thereof from these efforts can vary greatly from system to system especially when a system is designed to primarily be a generic system. Adaptive MT systems like ModernMT are designed from the outset to be tuned easily and quickly with small amounts of data to fit a wide range of unique enterprise use cases. ModernMT is almost never used without some adaptation effort, unlike generic public MT systems like Google MT which are primarily used in a default generic mode. 


A “single point quality score” based on publicly sourced sentences is simply not representative of the dynamically changing, customized, and modified potential of an active and evolving enterprise adaptive MT system that is designed to be continuously adapted to unique customer use case requirements.


When it is necessary to compare two MT systems in a buyer selection & evaluation process, double-blind A/B human evaluations on actual client content would probably produce the most accurate and useful results that are also better understood by the executive and purchasing management.

Additionally, MT systems are not static: the models are constantly being improved and evolving, and what was true yesterday in quality comparisons may not be true tomorrow. For these reasons, understanding how the data, algorithms, and human processes around the technology interact is usually more important than any static score-based comparison snapshot.  A more detailed discussion of the overall MT system comparison issues is provided here.

Conducting accurate and consistent comparative testing of MT systems is difficult with either automated metrics or human assessments. We are aware that the industry struggles in its communications about translation quality with buyers.  Both are easy to do badly and difficult to do well. However, in most cases, properly done human A/B tests will yield much more accurate results than automated metrics.

 Questions to ask when looking at automated metrics: 

  • What specific data was used to calculate the score? 
  • How similar or different is it from my data? 
  • Can I see the data that was used? 
  • How easy or difficult is it to adapt this MT system to my specific linguistic style and preferences? 
  •  How much effort is needed to teach this MT system to use my preferred style and language? 
  • Will I need ML experts to do this or can my translators drive this? 
  • Do small score differences really mean anything? 
  •  What happens to these scores if I make changes to the test set? 
  • How quickly will this MT system improve as my translators provide daily corrections? 
  • Do my translators accept these score-based rankings if I show them the output from 3 different systems? 
  • Do my translators like working with this MT system? 
  • Will I be forced to use less qualified translators if I use this MT system as the best translators will prefer to decline?


The Implications of Continuously Improving MT

Modern commerce is increasingly done with the support of online marketplaces and the importance of providing increasingly larger volumes of relevant content digitally to customers has become an important requirement for success.

As the volumes of content grow, the need for more translation also grows substantially. Gone are the days when it was enough for a global enterprise to provide limited, relatively static localization content.

Delivering superior customer experience (CX) requires much more content to be made available to global customers who have the same informational requirements as customers in the HQ country do. A deep and comprehensive digital presence that provides a broad range of relevant content to a buyer and global customer may be, even more, important to be successful in international markets.

The modern era requires huge volumes of content to support the increasingly digital buyer and customer journey. Thus, the need for high-quality, easily adapted machine translation grows in importance for any enterprise with global ambitions.

The success and relentless progress of the ModernMT technology described here make it an ideal foundation for building a rapidly growing base of multilingual content without compromising too much on the quality of translations delivered to delight global customers. This is critical technology needed to allow an enterprise to go multilingual at scale.  This means that it is possible to translate billions of words a month at relatively high quality.



The availability of adaptive, highly responsive MT also enables new kinds of knowledge sharing to take place.

A case in point:  Unicamullus Medical University in Rome experimented with using ModernMT to translate their medical journal into several new languages and test acceptance and usability. They were surprised to find that the MT quality was much better than expected. The success of the initial tests was promising enough to encourage it to expand the experiment and make valuable medical journal content available in 28 languages.

The project also allows human corrective feedback to be added to the publishing cycle when needed or requested. This machine-first and human-optimized approach is likely to become an increasingly important approach to large-scale translation needs when intelligent adaptive MT is the foundation.  

It is quite likely that we will see possibly 1000X or more growth, in the content volume that is translated in the years to come, but also that we see a growing use of adaptive and responsive MT systems like ModernMT which are deeply integrated with active system-improving human feedback loops that can enable and drive this massive multilingual expansion.

There is increasing evidence that the best-performing AI systems across many areas in NLP have a well-engineered and tightly integrated human-in-the-loop to ensure optimal results in production use scenarios. The Translated SRL experience with ModernMT is proof of what can happen when this is done well.

We should expect to see many more global companies translating hundreds of millions of words a month in the near future to serve their global customers.  A future that will increasingly be machine-first and human-optimized.

The following interview with Translated CEO, Marco Trombetti, provides additional insight into the progress that we have witnessed with MT over a decade of careful observation and measurement. The interview highlights the many steps taken to ensure that all the measurements are useful KPIs in a professional translation services setting which has been and will continue to be the most demanding arena of performance for MT technology. Marco also points out that ModernMT and new Generative AI like ChatGPT are made of the same DNA, and that MT research has provided the critical technological building blocks used to make these LLMs (Large Language Models like ChatGPT) possible.

 




Tuesday, September 20, 2022

The Localization Tech Stack Evolution

 As the world moves increasingly to a “digital-first” approach across the business and government spectrum, it has become increasingly clear to any enterprise interested in reaching a larger digital population, that providing more multilingual content matters, and that the demand for more translated content will only grow, which also means that enterprise translation capabilities will need to be pervasive and scalable.

The challenge for globalization managers is further complicated by the increasing focus on customer experience (CX) which means that the content can vary greatly, in volume, velocity, and value to customers and internal stakeholders. All content does not need to go through traditional localization production and quality validation processes.

Content that is focused on understanding, communication, and listening does not require the same linguistic quality assurance, and in the digital space, user-generated and other external content is now often the most impactful content to consider.

Modern-era globalization managers need to understand what matters most to customers and balance their focus on the “mandatory” legally required content that localization has typically focused on, against the non-corporate content customers find most useful.

Though the value and business benefit of large-scale translation and localization are now well understood, globalization and localization managers tasked with making the global customer outreach happen, struggle with this objective.

They are faced with a fragmented, inconsistent, and fractured technology landscape and many sub-optimal tools currently exist in the language technology marketplace shown in the graphic below.

These tools are needed both to perform the many specific tasks involved in any globalization effort and to help establish structured processes that enable ongoing and emerging global customer-focused needs to be efficiently serviced.

Given the wide variety of tools and the diversity of the people needed to effectively execute the multiple globalization processes involved, straightforward and efficient data flow from sub-system to sub-system is desirable.

Successful globalization outcomes are often directly linked to enabling fast-flowing, unhindered data flows through a variety of translation-related processes. This is a necessary condition for success in a digital-first world.

However, what many Loc Buyers find is that some of the systems and tools they use impede and obstruct this smooth data flow, and thus the digital globalization initiative is often undermined and overly focused on repairing broken and problematic data flows.

If we look at the TMS part of the tech stack more closely, we can understand the challenge that globalization managers have when making long-term decisions on what their tech stack should look like. There are many choices, and identifying the specific characteristics of a superior system is not so clear.

We have learned that the best AI outcomes are driven by high-quality data above all else, and thus selecting technology that facilitates ongoing data access as technology changes, should be a prime concern and strategy for any forward-thinking globalization manager.

The critical technology components for most localization managers include the following categories:

1.      CAT Tools used by translators

2.      Translation Management Systems

3.      Language Quality Assurance (LQA) Tools

4.      Enterprise-capable MT

5.      Audiovisual Translation tools are growing in importance

In general, it can be said that the better the integration between these key components is, the more successful the globalization outcomes and the more efficient the enterprise will be in providing a high-quality global customer experience.

Unfortunately, the reality for many localization teams is focused on repairing broken or non-existent connections between sub-systems, and building better data sharing between the various components in their back-end localization tech stack to power the customer-pleasing expanded multilingual CX.


Avoiding Lock-In with Proprietary Systems

As the enterprise matures in localization and globalization sophistication, it will likely develop and build valuable linguistic assets over time. These linguistic assets need to be easily accessed and available to be shared with emerging new tools and platforms that provide business leverage, powered by new AI capabilities, as customer needs and CX imperatives dictate.

The tech stack complexity challenge for globalization managers is further exacerbated by the fact that much of the technology is still evolving.

Thus, any technology component that creates lock-in and prevents the straightforward transfer of linguistic assets to new superior language technology tools or platforms as they become available IS TO BE AVOIDED.

These siloed systems create what is called Tech Debt. This refers to the off-balance-sheet accumulation of all the technology work a company needs to do in the future. Tech debt results from software entropy and a lack of integration between different systems and data.

And it’s not just a minor inconvenience. A majority of businesses say that tech debt is slowing their pace of development, and resulting in real-world losses in sales and productivity.

Tech debt can produce several negative consequences for businesses:

The benefits of reducing tech debt are also significant and include:

Buyers should demand that enabling straightforward API access to client linguistic data without restriction or restraint should be a basic and critical requirement for any modern enterprise software solution or TMS.

Sophisticated new capabilities emerging from NLP research in Large Language Models, Responsive MT, and other emerging Language AI research will be almost useless to those companies that cannot quickly move relevant enterprise linguistic data to these new applications. They will be unable to properly explore possibilities of providing better CX with emerging linguistic AI capabilities.

This data lock-in is especially true for some of the current TMS systems that create multiple layers of technical, and even legal obstacles to straightforward data sharing. These obstacles invariably trap some Loc Buyers into sub-optimal workflows and solutions.

It is surprising that more Loc Buyers do not understand the importance of free and easy access to all linguistic data over the long-term, and suggests that Loc Buyers are extremely naïve in terms of making technology evaluations and selections that stand the test of time.

Sub-optimal initial choices will require regular overhauls in the technology stack to overcome obstacles created by proprietary lock-in technology.

Data is the lifeblood of any organization and the backbone that supports the creation of market-leading CX. The AI-driven world of tomorrow will be increasingly data-driven.

However, it is nearly impossible to make this data actionable in marketing activations and other business processes without data centrality and shareability. To avoid this, globalization managers should look for partners who can help them democratize their data sets so that they’re integrated and accessible by all.


The Growing Importance of Integration with Enterprise IT

Translation technology has reached an inflection point, as translation connects to the major trends affecting every industry: big data, cloud computing, and artificial intelligence (AI). Language platforms that can scale from millions to billions of words of translated content per month are being created as enterprise buyers and innovative language service providers seek to align their language systems with the technology stacks of globally focused enterprises.

Implementing many different systems and sources that don’t speak to each other will make it harder for the business to enact data-backed decisions and integrate with core IT functionality. Straightforward integration with core enterprise IT is a key requirement to enable successful global CX outcomes.

The ability to quickly import, clean, and use data from countless sources is critical to marketers and globalization managers, but rarely easy.

One of the barriers to realizing this data actionability stems from rigid data structures that can’t onboard, transport, and unify both structured and unstructured data from different sources. Flexible data architecture and a scalable hygiene framework can speed up the timeline for data activation and value creation.

The chart above shows the relationship between translation quality and content volume. It also shows that the highest returns on investments in translation technology will come from those areas focused on global CX and eCommerce.

The collaboration between localization teams and enterprise IT teams are growing in sophistication and now increasingly both internal corporate data and external data from social media and customer reviews are being mingled and merged to provide better CX.

This often requires handling large volumes of user-generated content (UGC) and monitoring social media brand impressions which are so voluminous that traditional localization workflows are not valid.

UGC is a dominant element of the eCommerce content landscape and even presents special challenges for MT technology. UGC content is often written by non-native speakers and, most likely, by non-professional content writers and thus needs specialized treatment and a different approach from typical localization content. But we see today that global market leaders learn to do this at scale, with new techniques that assume and drive evolutionary quality improvements.

Tech-savvy localization managers who understand this “start now and improve gradually approach“ on massive content volumes are now being seen as vital partners in global growth strategies. Best practices suggest that the most effective strategy is to have MT and Human translators working together to build a continuous improvement cycle.

The strategy to translate a billion new words across multiple use cases every month has to be different than a typical localization translate-edit-proof (TEP) process. This is made difficult or even impossible with TMS systems that do not allow easy access to ALL linguistic assets.

Airbnb is an example of emerging localization leadership where the localization team is seen as a vital partner in enabling global growth and works closely with IT, Legal, and Product teams to deliver better global customer experiences.

The Airbnb localization team oversees both typical localization content and user-generated content (UGC), across the organization, which means they oversee billions of words a month being translated across 60+ languages using a combined human plus continuously improving MT translation model. The localization team enables Airbnb to translate customer-related content across the organization at scale. High-value external content is often given the same attention as internally produced marketing content.

When dealing with CX-focused translation scenarios, the business requirements direct globalization managers to focus on optimizing the translation production mode to the volume, speed, quality requirements, and the value of the content to customers.

This is a clear shift away from the traditional LQA-focused localization workflows where TMS systems have traditionally been useful.

TMS systems have been most useful in relatively low-volume, complex workflows that involve multiple levels of human touch on the translated content. This is the top left-hand corner of the chart above. TMS systems add little to no value in scenarios with high volume fast flowing CX data where data flow straight from MT to dissemination.

Dated monolithic translation management systems (TMS) are giving way to micro-service and cloud-based architectures, with machine learning driving systems toward enterprise-scale automation where speed, scale, and the value of the content to the global customer matter more than achieving perfect linguistic quality.

Thus, increasingly we see that TMS systems are completely bypassed or irrelevant, and there is greater use of “raw MT” or carefully pre-tuned MT rather than fully post-edited MT.



The Emerging Requirements for a Language Platform

As more senior executives in the global enterprise ask questions like:

  • How do we integrate our international strategy with our overall corporate strategy?
  • What will this take in terms of people, process, and technology?

We should expect a shift to language as a feature at the platform level wherein language is designed, delivered, and optimized as a feature of a product and/or service from the beginning.

Language accessibility is integrated into content and procedural workflows that affect almost everyone within the organization at some point. Something that analysts call a "language platform."

Rebecca Ray of CSA describes the impact of producing relevant content for modern eCommerce marketplaces at scale and touches upon the key requirements of a Language Platform.

The success of globalization leaders like Airbnb demonstrates the value of developing comprehensive and collaborative capabilities with a more globally embedded and pervasive translation-focused ecosystem. The Airbnb deployment is a pioneering example in global CX best practice that shows how extensive and deep-reaching translation workflows can be integrated into corporate IT when the value is understood at executive levels.

A CX-focused and capable Language Platform that is ready for digital-first globalization and localization challenges would need all of the following key components working together in a highly integrated and seamless manner.

  • Essential TMS capabilities to monitor translation projects, and linguistic quality, and generate and manage critical translation workflows for different content types as needed.
  • An adaptive and continuously improving MT system that automates personalization and performance optimization for each enterprise customer and manages the collection of corrective feedback across dozens of enterprise use cases. This element is increasingly becoming the most important element of the back-end tech stack and the heart of the Translation Engine to provide superior global CX.
  • Computer-assisted translation (CAT) tools to enhance translator productivity, simplify project management, and share corporate linguistic assets like translation memories (TMs), glossaries, and terminology. In the modern era, CAT tools would also need to handle video, audio, and other social media-focused multimedia data.
  • Open and service-based architecture to allow continuing evolution of translation processes and addition of new core functions powered by machine learning with speed and agility. Linguistic assets are maintained in a continuously leverageable state so that these assets can be connected to emerging linguistic AI technology without hindrance or restraint.
  • Connectors: As the need for an Enterprise Translation Engine becomes more apparent the Language Platform will need to connect to CMS, Marketing Automation, Customer Data Platforms (CDP), CRM, ERP, Messaging, and Customer Support & CX platforms, in addition to leading social media to listen to and monitor customer conversations.

Look for a partner rather than a vendor, that can help you simplify and rationalize your tech stack and the increasing amounts of data you’re ingesting. Instead of logging in to different systems repeatedly according to content type and purpose, look for a vendor who can consolidate these into one simplified view. Additionally, ensure that data can be shared across vendors in this centralized hub so that you can leverage the power of these insights across the scope of your audience.


The Translated Tech Stack

TranslationOS is a hyper-scalable translation platform, that directly connects clients with translators that also provides management access to translation-related KPIs. It also provides the technical foundations to build a next-generation Language Platform.  It provides customizable dashboards that give globalization managers access to KPIs, project status, quality performance, and linguist profiles.

TranslationOS is a technology platform that allows straightforward access to client data whenever it is required for other downstream applications, or just for internal archival purposes. Client linguistic assets always remain within easy reach of the client's developers to support other valued added processes that can arise over time.

TranslationOS is also the overarching technology that tightly ties together enterprise translation memory, adaptive MT and corrective feedback management, CAT, and multimedia data management tools.

TranslationOS has a growing set of content connectors enabling external data ingestion and export to enterprise IT infrastructure.

TranslationOS includes an AI-driven translator matching tool (T-Rank) to ensure optimal selection from a qualified, and continuously verified pool of 400,000 translators for different projects using 30+ factors (e.g., availability, historical performance, subject matter experience, qualifications) to drive objective rankings to ensure the identification of the best-suited resources.

ModernMT is an adaptive MT system that is highly flexible, responsive, easy to manage and maintain, continuously learning, and able to incorporate ongoing human corrective feedback to ensure better MT output.

It consistently shows up as a top-performing MT system in independent third-party MT quality evaluations even before it has been adapted and tuned to specific enterprise content. It seamlessly integrates into TranslationOS and leading CAT tools like MateCat, Trados, and MemoQ.

In 2022 it has also been integrated with MateSub and MateDub to enable the automated translation of corporate multimedia content.

MateCat is a free, open-source, performance-oriented online CAT tool that allows translators to easily share TMs, and glossaries and interact dynamically with ModernMT to ensure continuously improving MT suggestions. It is integrated with MyMemory, a massive, yet clean TM gathered over 20 years, to augment and increase TM matching possibilities.

MateSub is a CAT tool optimized for subtitling tasks. It combines state-of-the-art AI (auto-spotting, auto-transcription, auto-translation) with a powerful and easy-to-use editor to let you create higher quality subtitles, dramatically faster.

MateDub is an AI-powered tool to assist in voice-over dubbing projects which can add digital voices synthesized from human voice-actor sampling. It allows users to dub videos by simply editing text.



As we move more deeply into the "digital-first" age, we will also move beyond the reach of traditional language technology like TM and TMS systems for more of our translation needs.

We are going to see much more focus and discussion on Language Platforms, Translation Layers, and Translation Operating Systems for fast-flowing CX-related data that are also built on much more open, new integrations-friendly, and transparent technology stacks.