Wednesday, December 29, 2021

The Human Space Beyond Language

Much of what I write about in this blog concerns language technology and machine translation. The primary focus is on technology and AI initiatives related to human language translation. That focus will remain, but I recently came upon something I felt was worth mentioning, especially in this holiday season, when many of us review, consider, and express gratitude for the plenitude in our lives.

Language is a quintessentially human experience: we share, discover, learn, and express the many different facets of our lives through this medium. This is probably why computers are unlikely ever to unravel it fully: there is too much amorphous but critical context about life, living, learning, and the world around most words to capture easily in training data and hand to a computer to learn.

While many of us surmise that language is only about words and how they can be strung together to share, express, and understand the world around us, in most cases there is much that is unspoken or not directly referenced that must also be considered to understand any set of words accurately and faithfully. Sometimes the feeling and emotion are enough, and words are not needed.

In 2021, Large Language Models (LLMs) were a big deal, and GPT-3 in particular was all over the news as a symbol of breakthrough AI, suggesting to some that a sentient machine is close at hand. That impression fades when you look more closely and see that much of what LLMs produce consists of crude pattern reflections, completely devoid of understanding, comprehension, or cognition in any meaningful sense. The initial enthusiasm for GPT-3 has given way to increasing concern as people realize how prone these systems are to producing unpredictable obscenity, prejudiced remarks, misinformation, and so forth. The toxicity and bias inherent in these systems will not be easily overcome without strategies that involve more than more data and more compute.

It is very likely that we will see these ever-larger LLMs go through the same cycles of over-promising and under-delivering that machine translation has gone through for over 70 years now.

The problem is the same: the words used to train AI do not, by themselves, contain everything needed to establish understanding, comprehension, and cognition. And IMO, simply training a deep learning algorithm on many more trillions of words will not somehow create understanding and cognition, or even common sense.

The inability of AI "to understand" was clearly demonstrated recently by Amazon Alexa when it told a child to, essentially, electrocute herself. "No current AI is remotely close to understanding the everyday physical or psychological world, what we have now is an approximation to intelligence, not the real thing, and as such it will never really be trustworthy," said Gary Marcus in response to the incident. In experiments conducted elsewhere, GPT-3 has also advised suicidal humans to kill themselves.

The machine is not malicious, it simply has no real understanding of the world and life, and lacks common sense. 

The truth is that we are forced to learn to query and instruct Alexa, Siri, and Google Voice so that they can do simple but useful tasks for us. This is "AI" where the human in the loop keeps it basically functional and useful. Getting any real understanding and comprehension from these systems without many explicit and repeated clarifications is simply not possible in 2021.

But I digress. I wanted to talk about the areas where humans move beyond language (as in, words) yet still communicate, share, and express quintessential humanness in the process.

It is my feeling that entering this space happens most often with music, especially improvised music, where there is some uncertainty or unpredictability about the outcome: where what happens, happens, often without a plan, yet still within a clear artistic framework and structural outline. I happen to play the sitar, focusing on the classical music of North India, where the "Raga" is the basic blueprint that provides the foundation for highly disciplined improvisatory exploration.

To a great extent what these musicians do is "shape the air" and create something equivalent to sonic sculptures. These sculptures can be pleasing or relaxing in many ways that only humans can understand, and sometimes can be very moving, which means they can trigger emotional release (tears) or establish a deeply emotional presence (left speechless). Often it is not necessary to understand the actual language used in musical performance since there is still a common layer of feeling, emotion, and yearning that all humans can connect and tap into. 

The key difference between this improvisation-heavy approach and a performance of score-based music is that neither the musician nor the audience really knows at the outset how things will turn out. With a score, there is a known and well-defined musical product that both the audience and the musician are aware of and expect; there is more of an elemental structure. However, here too it is possible for an attendee to listen to an unfamiliar language, e.g. an operatic aria in Italian, and be deeply moved, even though the listener speaks no Italian and may have no knowledge of the operatic drama. The connection is made at a feeling and emotional level, not at the level of word, language, or idea cognition.

I came upon this musical performance of a Sufi (a mystical Muslim tradition) song sung by two musical legends on a commercial platform called Coke Studio Pakistan. Musically, this might be considered "fusion," but it is heavily influenced by Indian classical music and is sung in Urdu (Braj), which is so close to Hindi (Hindustani) that they are virtually the same language, except that Urdu uses much more Persian vocabulary. The original poem was written in Braj Bhasha, an antecedent of both Urdu and modern-day Hindi.

This particular performance was a rehearsal and was the first time all the musicians were in the same room, but the producers decided it was not possible to improve on this and published it, as is, since it was quite magical and probably impossible to reproduce. 

There are almost 20,000 comments on the video shown below, and this comment by Matt Dinopoulos typifies much of the feedback: "It hit my soul on so many levels and just brought me to tears and I don’t even know what they’re saying." The figures of speech “pluck at one’s heartstrings” and “strikes a chord in me” have found a home in our language for just this reason.

The song/poem Chaap Tilak was written by Amir Khusrau, the famous poet laureate of the Indian subcontinent, considered one of the most versatile poets and prolific prose writers of the 13th and 14th centuries. He is also considered by some to be a seminal force in the creation of the sitar and in the development of the Khyal music form most prevalent in North Indian classical music today.

This song essentially expresses Khusrau's gratitude, devotion, love, and longing for communion with his Pir (Guru/Spiritual teacher) whose name is Nizam (Nizamuddin Auliya). Sung from the perspective of a young girl awaiting or yearning for her beloved, it is replete with modest yet enchanting symbols, as it celebrates the splendor of losing oneself in love. Both the use of motifs as well as the language itself were deliberate creative choices by Amir Khusrau, to communicate with common people using familiar ideas and aesthetics.

A closer examination of Khusrau's works will reveal that the Beloved in his songs/poems is always the Divine or the Pir. Many poets in India use the perspective of the romantic yearnings of a young maiden for the beloved as an analogy, as the relationship with the Divine is seen as the most intense kind of love. The longing and union they speak of are always about direct contact with the Sacred and so this song should be considered a spiritual lament whose essential intention is to express spiritual love and gratitude. The translations shown in the video are sporadic but still useful.

Chaap Tilak Performance 

At the time of this publishing, the video above had already garnered 40 million views. Many thanks to the eminent Raymond Doctor for providing this link, which offers a full translation and useful background to better understand the thematic influences and artistic inspiration for this song.

“As long as a spiritual artist respects his craft, peace will prevail. It is wonderful when a singer has a noble cause and spreads the message of love, peace, and brotherhood as presented by our saints, without greed of money or the world. This is the real purpose of qawwali.”             
                                                                                             Rahat Fateh Ali Khan

“Music doesn’t have a language, it’s about the feeling. You have to put a lot of soul into whatever you are making. Music doesn’t work if you’re only doing it for money or professionally. It works only if it’s from the soul. There’s no price to it.”
                                                                                             Aima Baig

"Information is not knowledge. Knowledge is not wisdom. Wisdom is not truth. Truth is not beauty. Beauty is not love. Love is not music. Music is THE BEST.” 

— Frank Zappa 

I played this song for several friends (mostly musicians) who had no familiarity with Indian music and found that several of them were deeply touched, so much so that some had tears streaming and were unable to speak when the song ended. In fact, they were mystified by how strong an emotional reaction they had to this unfamiliar and alien artistic expression. 

This unexpected, often surprising, emotion-heavy reaction is entirely and uniquely human. This kind of listener impact cannot come from musical virtuosity alone, though virtuosity is abundantly present here; the musicians are also tapping into a deeper sub-strata of feeling and emotion that exists only in, and is shared only by, humans.

This is the human space beyond language where understanding happens in spite of initial unfamiliarity. There is something in the human psyche that understands and connects to this even if by accidental discovery, and this initial response often leads to a more substantial connection. We could call this learning perhaps, and this is probably how children also gather knowledge about the world. Intensity and connection probably have a more profound impact on students than pedagogy and quite possibly drive intense learning activity in any sphere.

It is interesting that there are many reaction videos on YouTube where music teachers and YT celebrities from around the world share their first reactions to this particular song and other culturally unfamiliar music. Judging by the number of these reaction videos, more and more people are exploring and want to share in the larger human musical experience. Some examples:

  • Latina Ceci Dover left speechless (around 4' 50")
  • British rapper reacts in shock and awe (around 6' 05")
  • A deep and informed analysis of the singing technique and mechanics. "If her voice was an animal it would be an eagle."
  • Seda Nur Turkish German was surprised by the emotional connection. (around 3' 15") It also led her to actually visit Pakistan last week, a trip which she is also sharing in her Vlogs.
  • John Cameron left speechless and in tears (around 11' 30")
  • Waleska & Efra discover a new musical paradigm (around 10' 40")
  • Asian dude is blown away (~2' 09"): "Oh My Goodness," he says at 4' 00", and dances to the chorus like a bird (4' 25"). Hilarious responses throughout the song.
It is not always necessary to have this level of virtuosity to find this sub-strata space beyond language. Artists often communicate without words, with just a look, or with presence and full attention. I too have participated in impromptu musical conversations, where friends simply gather to converse musically around a very simple outline, depending primarily on improvisation and listening to one another. This is an example that is difficult to reproduce exactly, because it captured a unique instant in time.

All this to point out that the data we use to build artificial intelligence completely miss these deeper layers of humanness. It is not just the larger subject matter, common sense, and physical-world context that are missing, but especially the non-verbal emotional and feeling layers that also make us human.

Intelligence is barely understood by humans even after examining the question for eons, so how is it even possible to put it into a computer algorithm? What kind of data would we use? How do you model emotion and feeling with knowledge, data, and information?

Machine learning and computers are likely to transform our lives radically in the coming decade, but there are some things, like the wordless, feeling-filled states of the human space beyond language and the spiritual sub-strata that underlies consciousness, that I think are simply not within the province of Man to model, or perhaps even to understand. They can only be experienced.


The poem below was copied from Maria Popova's excellent blog about the awe and wonder of being human. Her backgrounder on Rebecca Elson is timely and worth reading, and she has also written about the connection between music and the neurophysiological mechanism of emotion that I recommend. As she points out: "Emotions possess the evanescence of a musical note."


by Rebecca Elson

Returning, like the Earth

To the same point in space,

We go softly to the comfort of destruction,

And consume in flames

A school of fish,

A pair of hens,

A mountain poplar with its moss.

A shiver of sparks sweeps round

The dark shoulder of the Earth,

Frisson of recognition,

Preparation for another voyage,

And our own gentle bubbles

Float curious and mute

Towards the black lake

Boiling with light,

Towards the sharp night

Whistling with sound.

I wish you all a Happy, Healthy, Peaceful, and Prosperous New Year

Thursday, December 16, 2021

The Evolution of Machine Translation Use in the Enterprise

The modern enterprise with global ambitions is experiencing an evolving view of the value, scope, and need for language translation to enhance and build global business momentum.

The old view has been to translate what is mandatory to participate in a global market, but the new view is increasingly about listening, understanding, sharing, communicating, and engaging the global customer. The new view requires a modern global enterprise to translate millions of words a day.

Recent events, the pandemic, in particular, have forced many private and governmental institutions to become much more focused on expanding their digital presence and profile.

It is now widely understood that providing increasing volumes of content to assist, inform, and help a potential buyer understand the products and services of an enterprise, and enhance the customer’s journey after the purchase are important determinants of sustained market success.

The cost of failure to do so is high. In the last 20 years, over half (52%) of Fortune 500 companies have either gone bankrupt, been acquired, or ceased to exist as a result of the digital and business model disruption. Studies suggest many more companies across various industries will disappear because they fail to understand the strategic value of providing relevant content and establishing a robust digital presence to improve CX.

Innosight Research predicts that as much as 75% of today’s S&P 500 will be replaced by 2027.

Rebecca Ray eloquently describes the impact of producing relevant content for the modern enterprise. She describes how Airbnb and Expedia recognize that they are not just lodging companies, but rather high-tech (multilingual) content companies solving lodging problems.

The [most] significant implications revolve around the recognition given to the business value of multilingual content by a high-tech company such as Airbnb through its financial investment in a cross-functional collaboration initiative to greatly expand language accessibility. They must recognize that, in many cases, their products and services do not function independently of information about them and that the most valuable content and code are often generated by third parties.

As more of the world gets more accustomed to a digital-first buyer journey, companies must adapt to stay relevant. As brands face unmatched logistical and communication challenges in the new millennium, they have focused on more engagement with their customers via digital channels.

Customer experience is no longer just a concept. It’s a business imperative that requires cross-functional collaboration, data, and analytics focused on delivering customer success.

The enterprises with the most successful digital transformation initiatives are seeing growth from improved and innovative digital interactions. This usually means cross-organizational collaboration with a digital strategy-focused team leader invested in optimizing digital channels and improving both customer and employee digital experience. Success is often built on data and analytics and the increasing use of AI to generate predictive analytics to power better, more personalized interactions.

The benefits of providing superior CX are increasingly clear:

  • #1 – By 2021, customer experience will overtake price and product as the key brand differentiator – Walker
  • 86%– of those who received a great customer experience were likely to repurchase from the same company; compared to just 13% of those who received a poor CX – Temkin Group
  • 6x– Between 2010-2015, CX leaders grew 6x faster than CX laggards – Forrester
  • “Customer Experience leaders grow revenue faster than CX laggards, drive higher brand preference, and can charge more for their products.” – Forrester’s Rick Parish.
  • 3x greater return CX leaders outperform CX laggards in terms of stock performance. -- Watermark Consulting
  • Consumers will pay a 16% price premium for a great customer experience. - PwC

Leaders develop "digital agility" that enables cross-functional collaboration focused on mapping and optimizing customer journeys. According to Altimeter surveys, the key difference between top-performing companies and average performers is the ability to use data in prescriptive ways. This means harnessing analytics to make or automate decisions that improve processes like delivering a great customer experience, creating a new product, or defining a new strategy.

There is increasing convergence between marketing, sales, and service goals and operations. This convergence is the natural outcome of the increasing digital sophistication of all customer-facing functions.

The Localization Implications

There are significant implications from these larger digital transformation imperatives for traditional localization departments. The needs and scope of language translation for the modern digitally-agile enterprise are significantly greater than ever seen historically.

This is resulting in a changing view of the localization function within the enterprise: views range from vital partner in global growth strategies to low-value contributor (for less engaged localization groups) whose only focus is measuring translation quality (badly) and juggling LSP vendors. These latter groups will see customer-facing departments take over large-scale CX-focused translation initiatives, and will slip into further obscurity.

Airbnb is an example where the localization team is seen as a vital partner in enabling global growth. The Airbnb localization team oversees both typical localization content and user-generated content (UGC), across the organization, which means they oversee billions of words a month being translated across 60+ languages using a combined human plus continuously improving MT translation model. The localization team enables Airbnb to translate customer-related content across the organization at scale. High-value external content is often accorded the same attention as internally produced marketing content.

Digital agility requires that a modern era localization team be the hub of cross-functional collaboration to facilitate and enable all kinds of content to be translated to expedite understanding, communication, and experience of customer-related issues. The modern enterprise will likely need millions of words translated every month to understand, enhance, and improve the global customer experience (GCX). In the B2C scenario, the volumes could even be billions of words per month.

Continuously improving, responsive MT is a critical foundation for building better GCX. Increasingly, we will see more and more content go directly from carefully optimized MT to the customer without additional human intervention. The linguistic human oversight process and approach will likely change. Upfront human feedback investments will be needed, in addition to selective post-editing, to make these enterprise MT engines perform optimally on a range of unique and specialized enterprise content. It is possible to post-edit 100K words a month, but it is not possible to do this for a million or a billion source words a month.
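A back-of-the-envelope sketch makes the scaling argument concrete. Assuming, as the figure above suggests, that one post-editor can handle roughly 100K words a month (an illustrative capacity, not a measured benchmark), the headcount required to fully post-edit enterprise volumes quickly becomes implausible:

```python
# Rough estimate: how many full-time post-editors would reviewing
# every MT word require at various monthly volumes?
# Assumption (from the text above): ~100,000 post-edited words
# per linguist per month.

WORDS_PER_EDITOR_PER_MONTH = 100_000

def editors_needed(monthly_words: int) -> int:
    """Post-editors required to review every word, rounded up."""
    # Ceiling division: a partial workload still needs a whole person.
    return -(-monthly_words // WORDS_PER_EDITOR_PER_MONTH)

for volume in (100_000, 1_000_000, 100_000_000, 1_000_000_000):
    print(f"{volume:>13,} words/month -> {editors_needed(volume):>6,} post-editors")
```

At a billion words a month, full human review would need on the order of 10,000 full-time post-editors, which is why selective feedback rather than exhaustive post-editing is the only workable model at this scale.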

We are seeing an emerging translation model (e.g. at Airbnb) that enables an enterprise to build virtuous cycles of continuously improving machine translation working closely together with regular ongoing human feedback.

All focused on cross-functional alignment with the larger enterprise goal of reducing customer friction and increasing customer delight.

Superior CX is built on active, ongoing conversations with the customer to ensure the best possible experience and outcomes. It involves active and continuous listening to the voice of the customer on social platforms to identify problems and success early. It involves multiple levels of communication. Large volumes of multilingual data flows have created a huge and growing need for rapid translation.

Thus, the emerging CX-focused and digitally agile modern enterprise is likely to be one that regularly does the following:

  • Helps to provide a uniform customer data profile across the world to achieve a common understanding of the customer.
  • Translates over 100 million or even billions of words a month to support GCX in various ways. This means that 95%+ has to be done by optimized adaptive MT engines.
  • Has informed human feedback driving continuously learning MT systems. This feedback may only amount to 1% or less of the raw MT that is freely flowing to support understanding, communication, and research of customers across the globe.
  • Helps provide a common view on the increasing convergence between marketing, sales, and service content and operations without any language barriers. This has great value for the internal understanding of international customer needs and again enables speedier, better-coordinated responses to differing international customer needs.
  • Improves global understanding and communication within and without the enterprise.

The State of Machine Translation in the Enterprise

Machine translation output quality has improved dramatically over the past decade, but all stakeholders should understand that MT is not a perfect replacement for competent human translation. Yet, if properly used and optimized, it can rapidly enhance the transparency of all multilingual data flows in the modern enterprise and expedite international business initiatives.

The use of MT in the enterprise is clearly on the rise. The overall content deluge, the need to monitor brand feedback on social media, and a much more rapid digitally-driven global presence are some of the factors that drive this use.

Global customer service and support and eCommerce have been the most active use cases historically. But increasingly we see that the global enterprise recognizes the need for a personalized and optimized MT-based translation utility that is secure, private, and enterprise-content-tuned to make ALL information multilingual and easier to share, access, and understand.

Recent Changes Driving Faster MT Adoption

  • Human evaluations show that neural MT is often indiscernible from human translation for much of the high-value customer service content that is needed for CX improvement.
  • Direct customer feedback on the usefulness of MT content suggests that many customers are willing to accept “imperfect” MT if it means that they get broader content access and faster response during many stages of the customer journey.
  • The pandemic has made the need for an expanded digital presence much more urgent. MT enables the acceleration of a global “digital-first” strategy.
  • The increasing awareness at executive levels of the need for global inclusion and recognition that the fastest-growing markets in the world today are in Africa and SE Asia.
  • The increasing awareness in the enterprise of the need for, and impact of, making more content available in more languages.
  • The increasing recognition that community content and customer-created content are business-enhancing.
  • An understanding that the next billion potential customers coming to the internet are unlikely to speak English, FIGS, or CJK.

One important point of understanding within a global enterprise is to recognize that different kinds of content need different translation processes. MT is very useful in making high-volume, rapidly flowing, short-shelf-life content multilingual, but it is less suited to high-impact executive communications or high-liability content where nuance and semantic accuracy are central.

The mix of human-machine contributions tends to be closely linked to the volume and nuance contained in the source data. The following chart roughly shows the relationship between the content type and the translation approach used. The translation production mode has to be tuned to the needs of the target consumers of the information and can range from assimilation and dissemination to publishable quality. For example, internal cross-lingual emails can tolerate a lower average translation quality than customer support content, even though both can be voluminous.
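The routing described above can be sketched in code. This is a hypothetical illustration of the volume/nuance trade-off; the content categories and the two boolean criteria are invented for the example, not taken from any specific enterprise workflow:

```python
# Illustrative sketch: route content types to a translation mode
# based on volume and liability/nuance. The categories and rules
# here are hypothetical examples of the trade-off described above.

from dataclasses import dataclass

@dataclass
class Content:
    kind: str
    high_volume: bool      # rapidly flowing, short shelf-life?
    high_liability: bool   # nuance and semantic accuracy critical?

def translation_mode(c: Content) -> str:
    # Liability trumps volume: nuanced content gets human attention.
    if c.high_liability:
        return "human translation (possibly MT-assisted)"
    # High-volume, low-risk content flows straight through adaptive MT.
    if c.high_volume:
        return "raw adaptive MT"
    # Everything else gets MT with selective post-editing.
    return "MT + selective post-editing"

samples = [
    Content("user reviews / UGC", high_volume=True, high_liability=False),
    Content("customer support chat", high_volume=True, high_liability=False),
    Content("legal contract", high_volume=False, high_liability=True),
    Content("marketing landing page", high_volume=False, high_liability=False),
]
for c in samples:
    print(f"{c.kind:24s} -> {translation_mode(c)}")
```

In practice an enterprise would use finer-grained criteria (shelf life, audience, brand exposure), but the basic decision structure is the same.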

What Does an Enterprise Need from an MT Solution?

In considering MT technology options there are several attributes that an enterprise needs to understand in an evaluation. While there is often an over-emphasis and focus on generic, mostly irrelevant MT score-based comparisons, in reality, other factors often matter much more in producing higher ROI and successful outcomes. For example:

Data Security & Privacy: As more CX and confidential product information starts flowing through MT, data security and privacy become a primary concern. CX data may often be considered even more valuable than product information. Many generic public portals present challenges in ensuring data privacy. Flexible caching options for different content types are an increasingly important consideration.

Rapid Adaptability to Enterprise Content and Use Cases: Enterprise use of MT makes the most sense when the MT performs well on unique enterprise content and terminology. The ability to do this quickly and accurately with minimal startup effort is perhaps the single most valuable aspect of MT technology to an enterprise. There are likely to be many use cases, and MT systems like ModernMT that can leverage existing linguistic assets across multiple use cases with minimal overhead and system redundancy are much easier to manage and update than many traditional MT systems. Continuous improvement capabilities allow MT systems to evolve over time, and thus responsive, highly adaptive MT systems like ModernMT that improve daily are much more likely to produce successful outcomes.

Superior MT Output Quality: While generic MT comparisons by third parties can be useful to understand the potential experience with an MT solution, it is much more important to understand how an MT solution performs on your specific enterprise content. Perhaps even more important is how rapidly and easily an MT system can improve and learn your specific enterprise terminology and linguistic style.

Ease of Integrating Human Corrective Feedback: While some MT systems can produce excellent output occasionally, none can do this all the time.

Human corrective feedback is key to improving MT system performance on an ongoing basis. The ease, speed, and impact with which human feedback is incorporated into driving ongoing MT system performance improvements is a valuable characteristic for an MT system.

ModernMT has been architected to learn continuously and rapidly from human corrective feedback. The low overhead of adapted models also makes it possible to maintain hundreds of different models that are easily updated, as improvements can flow from model to model when they share similar content.

Expert Consulting Services: As MT-based translation services become more commonplace in the global enterprise, the reach of rapid translation capabilities will expand and extend through an organization.

It may be necessary to connect different content-containing systems to the MT engine or special linguistic analysis may be needed to speed up the development of systems optimized for new use cases.

MT providers who have this expertise to build a Translation Operating System that can handle a variety of different kinds of data and service types with high efficiency will make the deployment easier, and also increase the probability of success.

ModernMT is a context-aware, incremental, and responsive general-purpose MT technology that is price-competitive with the big MT portals (Google, Microsoft, Amazon), is uniquely optimized for LSPs and global enterprises, and addresses all the criteria specified above.

ModernMT can be kept completely secure and private for those willing to make the hardware investments for an on-premise installation. It is also possible to develop a secure and private cloud instance for those who wish to avoid making hardware investments.

ModernMT overcomes technology barriers that hinder the wider adoption of currently available MT software by enterprise users and language service providers:

  • ModernMT is a ready-to-run application that does not require any initial training phase. It incorporates user-supplied resources immediately without needing laborious, technically overly complex, and tedious upfront model training.
  • ModernMT learns continuously and instantly from user feedback and corrections made to MT output as production work is being done. It produces output that improves by the day and even the hour in active-use scenarios.
  • ModernMT manages context automatically and does not require building multiple different domain-specific and use-case-specific systems.
  • ModernMT has a data collection infrastructure that accelerates the process of filling the data gap between baseline systems and enterprise-specific models with unique terminology and linguistic style.
  • ModernMT is responsive to small volumes of corrective feedback and thus allows straightforward deployment in multiple organizational scenarios.
  • ModernMT’s goal is to deliver the quality of multiple custom engines by adapting to the provided context on the fly. This fluidity makes it much easier to manage on an ongoing basis.
  • ModernMT provides extensive control over the MT cache allowing immediate no-trace deletion to a permanent (annual) cache for highly static public content.
  • ModernMT can scale from millions to billions of words a day with responsive and modifiable performance through the range.
  • ModernMT systems can easily outperform competitive systems once adaptation begins, and active corrective feedback immediately generates quality-improving momentum.

Thus, as we head into the post-pandemic era, the need for a broadly capable, private, secure, corporate-domain-tuned "translation engine" will only grow in importance for the digitally agile, CX-savvy global enterprise.

To enable this, localization teams will need to lead multi-year investment initiatives that executives recognize as essential drivers of their organizations’ global growth and revenue.

The shift to language as a feature at the platform level, wherein language is designed, delivered, and optimized as a feature of a product or service from the beginning, is now underway.

As Airbnb CEO Brian Chesky stated in his recent launch video, “Technology made it possible to work from home, but Airbnb now allows you to work from any home.” Obviously, language independence is an integral part of the platform that now makes this possible. His success will demand that others follow.

This post was originally published here

Friday, November 26, 2021

The Carbon Footprint of Machine Learning

AI and machine learning (ML) news is everywhere we turn today, impacting virtually every industry: healthcare, finance, retail, agriculture, defense, automotive, and even social media. These solutions are developed using a form of ML called neural networks, which are described as “deep” when multiple layers of abstraction are involved. Deep learning (DL) could be the most important software breakthrough of our time. Until recently, humans programmed all software. Deep learning is a form of artificial intelligence (AI) that uses data to write software, typically by “learning” from large volumes of reference training data.

Andrew Ng, Baidu’s Chief Scientist and Co-founder of Coursera, has called AI the new electricity. Much like the internet, deep learning will have broad and deep ramifications. Like the internet, deep learning is relevant for every industry, not just for the computing industry.

The internet made it possible to search for information, communicate via social media, and shop online. Deep learning enables computers to understand photos, translate language, diagnose diseases, forecast crops, and drive cars. The internet has been disruptive to media, advertising, retail, and enterprise software. Deep learning could change the manufacturing, automotive, health care, and finance industries dramatically.

By “automating” the creation of software, deep learning could turbocharge every industry, and today we see it is transforming our lives in so many ways. Deep learning is creating the next generation of computing platforms, e.g.

  • Conversational Computers: Powered by AI, smart speakers answered 100 billion voice commands in 2020, 75% more than in 2019.
  • Self-Driving Cars: Waymo's autonomous vehicles have collected more than 20 million real-world driving miles across 25 cities, including San Francisco, Detroit, and Phoenix.
  • Consumer Apps: We are familiar with recommendation engines that learn from all our digitally recorded behavior and drive our product, service, and entertainment choices. They control the personalized ads we are exposed to and are the primary sources of revenue for Google, Facebook, and others, often using data collected without our knowledge or consent. This capability can also build market advantage: TikTok, which uses deep learning for video recommendations, has outgrown Snapchat and Pinterest combined.

According to ARK Investment research, deep learning will add $30 trillion to global equity market capitalization over the next 15-20 years. They estimate that the ML/DL-driven revolution is as substantial a transformation of the world economy as the shift from IT computing to internet platforms in the late ’90s, and predict that deep learning will be the dominant source of market capital creation over the coming decades.

Three factors drive the advance of AI: algorithmic innovation, data, and the amount of computing capacity available for training. Though we are seeing substantial improvements in computing and algorithmic efficiency, data volumes are increasing dramatically, and recent Large Language Model (LLM) innovations from OpenAI (GPT-3), Google (BERT), and others show that this approach carries a significant resource-usage impact.

"If we were able to give the best 20,000 AI researchers in the world the power to build the kind of models you see Google and OpenAI build in language; that would be an absolute crisis, there would be a need for many large power stations." 
Andrew Moore, GM of Cloud AI, Google Cloud

This energy-intensive workload has seen immense growth in recent years. Machine learning (ML) may become a significant contributor to climate change if this exponential trend continues. Thus, while there are many reasons to be optimistic about the technological progress we are making, it is also wise to both consider what can be done to reduce the carbon footprint, and take meaningful action to address this risk.

The Problem: Exploding Energy Use & The Growing Carbon Footprint

Lasse Wolff Anthony, one of the creators of Carbontracker and co-author of a study on the subject of AI power usage, believes this drain on resources is something the community should start thinking about now, as the energy costs of AI have risen 300,000-fold between 2012 and 2018.

They estimated that training OpenAI’s giant GPT-3 text-generating model is akin to driving a car to the Moon and back, which is about 700,000 km or 435,000 miles. They estimate it required roughly 190,000 kWh which, at the average carbon intensity of the US grid, would have produced about 85,000 kg of CO2 equivalents. Other estimates are even higher.
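The arithmetic behind those estimates is easy to verify. A quick back-of-the-envelope check follows; note that the grid carbon intensity (~0.45 kg CO2e/kWh) and the per-kilometer car emission figure are assumptions chosen here to match the article's numbers, not official constants:

```python
# Back-of-the-envelope check of the GPT-3 training-emissions estimate.
# The grid carbon intensity used here (~0.45 kg CO2e per kWh) is an assumed
# US-average figure implied by the numbers above, not an official constant.
energy_kwh = 190_000        # estimated energy to train GPT-3
carbon_intensity = 0.45     # kg CO2e per kWh, assumed US grid average
co2_kg = energy_kwh * carbon_intensity
print(f"{co2_kg:,.0f} kg CO2e")  # ~85,500 kg, close to the 85,000 kg estimate

# Compare with the "car to the Moon and back" analogy, assuming a typical
# petrol car emits ~0.12 kg CO2 per km (also an illustrative assumption).
distance_km = 700_000
car_co2_kg = distance_km * 0.12
print(f"{car_co2_kg:,.0f} kg CO2")  # ~84,000 kg: the same ballpark
```

The two figures land within a few percent of each other, which is why the driving analogy holds up.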

“As datasets grow larger by the day, the problems that algorithms need to solve become more and more complex," Benjamin Kanding, co-author of the study, added. “Within a few years, there will probably be several models that are many times larger than GPT-3.”

Training GPT-3 reportedly cost $12 million for a single training run. However, a single run is only possible after reaching the right configuration for GPT-3. Training the final deep learning model is just one of several steps in its development. Before that, the AI researchers had to gradually increase layers and parameters and fiddle with the many hyperparameters of the language model until they reached the right configuration. That trial and error gets more and more expensive as the neural network grows. We can’t know the exact cost of the research without more information from OpenAI, but one expert estimated it to be somewhere between 1.5X and 5X the cost of training the final model.

This would put the cost of research and development between $11.5 million and $27.6 million, plus the overhead of parallel GPUs. This does not even include the cost of human expertise which is also substantial.
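That dollar range reconciles with the other widely cited training-cost estimate of roughly $4.6 million per final run (rather than the $12 million figure above). Treating that base figure as an assumption, the arithmetic checks out:

```python
# Sanity check of the quoted R&D cost range. The figures reconcile only if
# the base training-run cost is the widely cited ~$4.6M estimate (treated
# here as an assumption), with research adding 1.5x to 5x on top of it.
base_training_cost = 4.6                 # $ millions, assumed final-run cost
low = base_training_cost * (1 + 1.5)     # final run + 1.5x research effort
high = base_training_cost * (1 + 5.0)    # final run + 5x research effort
print(f"${low:.1f}M to ${high:.1f}M")    # $11.5M to $27.6M, as quoted
```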

OpenAI has stated that while the training cost is high, the running costs would be much lower; even so, access will only be possible through an API, as few could invest in the hardware needed to run the model regularly. Efforts to develop a potentially improved GPT-4, said to be 500+ times larger than GPT-3, are estimated to cost more than $100 million in training costs alone!

These costs mean that this kind of initiative can only be attempted by a handful of companies with huge market valuations. It also suggests that today’s AI research field is inherently non-collaborative. The research approach of “obtain the dataset, create the model, beat the present state of the art, rinse, repeat” creates a high barrier to entry for new researchers and for researchers with limited computational resources.

Ironically, a company with the word “open” in its name has now chosen not to release the architecture and the pre-trained model, opting to commercialize the deep learning model instead of making it freely available to the public.

So how is this trend toward large models likely to progress? While advances in hardware and software have been driving down AI training costs by 37% per year, the size of AI models is growing much faster, at 10x per year. As a result, total AI training costs continue to climb. Researchers believe that state-of-the-art AI training costs are likely to increase 100-fold, from roughly $1 million today to more than $100 million by 2025. The training cost outlook from ARK Investment is shown below on a log scale, where you can also see how the original NMT efforts compare to GPT-3.
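Combining the two trends shows why efficiency gains alone cannot hold costs down. A rough sketch, under the naive simplifying assumption that training cost scales linearly with model size:

```python
# Naive combination of the two trends above: efficiency gains cut the cost
# per unit of training work ~37% per year, while model sizes grow ~10x per
# year. Assuming cost scales linearly with model size (a simplification),
# the net cost multiplier per year remains far above 1:
efficiency_factor = 1 - 0.37            # cost per unit of work after a year
model_growth = 10                       # yearly growth factor in model size
net_yearly_multiplier = model_growth * efficiency_factor
print(round(net_yearly_multiplier, 2))  # 6.3: costs climb despite efficiency
```

In other words, a 37% annual cost reduction is swamped by a 10x annual growth in scale, so total spend keeps rising.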

Training a powerful machine-learning algorithm often means running huge banks of computers for days, if not weeks. The fine-tuning required to perfect an algorithm, for example by searching through different neural network architectures to find the best one, can be especially computationally intensive. For all the hand-wringing, though, it remains difficult to measure how much energy AI consumes and even harder to predict how much of a problem it could become.

There have been several efforts in 2021 to build even bigger models than GPT-3, probably all with a huge carbon footprint. But there is some good news: Chinese tech giant Alibaba announced M6, a massive model with 10 trillion parameters (over 50x the size of GPT-3), and managed to train it at 1% of the energy consumption needed to train GPT-3!

Another graphic below illustrates the carbon impact generated as deep learning models are enhanced. All deep learning models undergo ongoing work to reduce their error rates as a matter of course. An example with neural MT is when a generic model is adapted or customized with new data.

Subsequent efforts to improve accuracy and reduce error rates in models often need additional data, re-training, and processing. The chart below shows how much energy is needed to reduce the model error rates for image recognition on ImageNet. As we can see, the improvement process for large models has a substantial environmental impact that needs to be considered.

There is ongoing research and new startups focused on more efficient training and improvement techniques, lighter footprint models that are just slightly less accurate, and more efficient hardware. All of these are needed and will hopefully help reverse or reduce the 300,000x increase in deep learning-driven energy use of the last 5 years.

Here is another site that lets AI researchers roughly calculate the carbon footprint of their algorithms.

And as the damage caused by climate change becomes more apparent, AI experts are increasingly troubled by those energy demands. Many of the deep learning initiatives shown in the first chart above are being conducted in the Silicon Valley area in Northern California. This is an area that has witnessed several catastrophic climate events in the recent past:

  • Seven of the ten largest recorded forest wildfires in California have happened in the last three years!
  • In October 2021, the Bay Area also witnessed a “bomb cyclone” rain event after a prolonged drought, producing the largest 24-hour rainfall in San Francisco since the Gold Rush!

San Francisco skyline turns orange during wildfires in September 2020

The growing awareness of impending problems is pushing big tech companies to implement carbon-neutral strategies. Many consumers are now demanding that their preferred brands take action to demonstrate awareness and move toward carbon neutrality.

Climate Neutral Certification gives businesses and consumers a path to a net-zero future and also builds brand loyalty and advocacy. A look at the list of committed public companies shows that this is now recognized as a brand-enhancing, market-momentum move.

Uber and Hertz recently announced a dramatic expansion of their electric vehicle fleet and received much positive feedback from customers and the market.

Carbon Neutral Is The New Black

What Sustainability Means to Translated Srl

In the past few years, Translated’s energy consumption linked to AI tasks has increased exponentially, and it now accounts for two-thirds of the company’s total energy consumption. Training a translation model for a single language can produce as much CO2 as driving a car for thousands of kilometers.

A large model produces as much CO2 as hundreds of airline flights would. This is why Translated is pledging to become a completely carbon-neutral company.

"How? Water is among the cleanest energy sources out there, so we have decided to acquire one of the first hydroelectric power plants in Italy. This plant was designed by Albert Einstein’s father in 1895. We are adapting and renovating this historic landmark, and eventually, it will produce over 1 million kWh of energy a year, which will be sufficient to cover the needs of our data center, campus, and beyond."

Translated's electric plant located in Sannazzaro de’ Burgondi

Additionally, the overall architecture of ModernMT minimizes the need for large, energy-intensive re-training and for maintaining the multiple client-specific models that are typical of most MT deployments today.

Global enterprises may have multiple subject domains and varied types of content, so multiple optimizations are needed to ensure that MT performs well. Typical MT solutions require different MT engines for web content, technical product information, customer service & support content, and user-generated and community content, for each language.

ModernMT can handle all these adaptation variants with a single engine that can be differently optimized.  

  • ModernMT is a ready-to-run application that does not require any initial training phase. It incorporates user-supplied resources immediately without needing model retraining.
  • ModernMT learns continuously and instantly from user feedback and corrections made to MT output as production work is done. It produces output that improves by the day and even the hour in active-use scenarios.
  • The ModernMT system manages context automatically and does not require building domain-specific systems.

ModernMT is perhaps the most flexible and easy-to-manage enterprise-scale MT system in the industry.

ModernMT’s goal is to deliver the quality of multiple custom engines by adapting to the provided context on the fly. This makes it much easier to manage on an ongoing basis, as only a single engine is needed. That reduces the training requirement, and hence the carbon footprint, and makes the system easier to manage and update over time.

As described before, a primary driver of improvement in ModernMT is the tightly integrated human-in-the-loop feedback process, which provides continuous improvements in model performance while greatly reducing the need for large-scale retraining.

ModernMT is a relatively low footprint approach to continuously learning NMT that we hope to make even more energy efficient in the future.

Thursday, November 11, 2021

The Challenge of Using MT in Localization

We live in an era where, on any given day, more than 99% of all the translation done on the planet is produced by MT.

However, enterprise adoption of MT is still nascent and building momentum. Business enterprises have been slower to adopt MT even though national security and global surveillance-focused government agencies have used it heavily. This delay is mostly because MT has to be adapted and tuned to perform well on the very specific language used in specialized enterprise content.

Early enterprise adoption of MT focused on eCommerce and customer support use cases (IT, automotive, aerospace), where huge volumes of technical support content made MT a necessity: it was the only way to translate such voluminous content in a timely and cost-effective manner and improve the global customer experience.

Microsoft was a pioneer who translated its widely used technical knowledge base to support an increasingly global customer base. The positive customer feedback for doing this has led to many other large IT and consumer electronics firms doing the same.

The adaptation of the MT system to perform better on enterprise content is a critical requirement for producing successful outcomes. In most of these early use cases, MT was used to manage translation challenges where content volumes were huge, i.e., millions of words a day or week. These were “either use MT or provide nothing” knowledge-sharing scenarios.

These enterprise-optimized MT systems have to adapt to the special terminology and linguistic style of the content they translate, and this customization has been a key element of success with any enterprise use of MT.

eBay was an early MT adopter in eCommerce and has stated often that MT is key in promoting cross-border trade. It was understood that “Machine translation can connect global customers, enabling on-demand translation of messages and other communications between sellers and buyers, and helps them solve problems and have the best possible experiences on eBay.”

A study by an MIT economist showed that after eBay improved its automatic translation program in 2014, commerce shot up by 10.9 percent among pairs of countries where people could use the new system.

Today we see that MT is a critical element of the global strategy for Alibaba, Amazon, eBay, and many other eCommerce giants.

Even in the COVID-ravaged travel market segment, MT is critical as we see with Airbnb, which now translates billions of words a month to enhance the international customer experience on their platform. In November 2021 Airbnb announced a major update to the translation capabilities of their platform in response to rapidly growing cross-border bookings and increasingly varied WFH activity.

“The real challenge of global strategy isn’t how big you can get, but how small you can get.”
Dennis Goedegebuure, former head of Global SEO at Airbnb.

However, MT use for localization use cases has trailed far behind these leading-edge examples, and even in 2021, we find that the adoption and active use of MT by Language Service Providers (LSPs) is still low. Much of the reason lies in the fact that LSPs work on hundreds or thousands of small projects rather than a few very large ones.

Early MT adopters tend to focus on large-volume projects to justify the investments needed to build adapted systems capable of handling the high-volume translation challenge.

What options may be available to increase adoption in the localization and professional business translation sectors?

At the MT Summit conference in August 2021, CSA's Arle Lommel shared survey data on MT use in the localization sector in his keynote presentation. He noted that while there has been an ongoing increase in adoption by LSPs there is considerable room to grow.

Arle specifically pointed out that a large number of LSPs who currently have MT capacity only use it for less than 15% of their customer workload and, “our survey reveals that LSPs, in general, process less than one-quarter of their [total] volume with MT.”

The CSA survey polled a cross-section of 170 LSPs (from their "Ranked 191" set of largest global LSPs) on their MT use and MT-related challenges. The quality of the sample is high and thus these findings are compelling.

The graphic below highlights the survey findings.

CSA Survey of MT Use at LSPs

When they probed further into the reasons behind the relatively low use of MT in the LSP sector they discovered the following:

  • 72% of LSPs report difficulty in meeting quality expectations with MT
  • 62% of LSPs struggle with estimating effort and cost with MT

Both of these causes point to the difficulty that most LSPs face with the predictability of outcomes with an MT project.

Arle reported that in addition to LSPs, many enterprises also struggle with meeting quality expectations and are often under pressure to use MT in inappropriate situations or face unrealistic ROI expectations from management. Thus, CSA concluded that while current-generation MT does well relative to historical practice, it does not (yet) consistently meet stakeholder requirements.

This market reality, validated by a representative sample, is in stark contrast to what happens at Translated Srl, where 95% of all projects and client work use MT (ModernMT), since it is a proven way to accelerate translation productivity.

Adaptive, continuously learning ModernMT has been proven to work effectively over thousands of projects with tens of thousands of translators.

This ability to properly use MT in an effective and efficient assistive role in production translation work has resulted in Translated being one of the most efficient LSPs in the industry, with the highest revenue per employee and high margins.

Another example of the typical LSP experience: a recent study by Charles University, done with only 30 translators using 13 engines (EN>CS), concludes that "the previously assumed link between MT quality and post-editing time is weak and not straightforward." It also found that these translators had “a clear preference for using even imprecise TM matches (85–94%) over MT output."

This is hardly surprising, as getting MT to work effectively in production scenarios requires more than choosing the system with the best BLEU score.
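BLEU, the metric mentioned above, rewards n-gram overlap with a reference translation. A minimal illustrative implementation (real evaluations use tools such as sacreBLEU; this toy version exists only to show the mechanics) makes it clear how a single overlap score says little about post-editing effort:

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    # Multiset of all n-grams in a token list
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Toy sentence-level BLEU: modified n-gram precision with a brevity
    penalty. Illustrative only; production scoring uses sacreBLEU or similar."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        overlap = sum((ngram_counts(hyp, n) & ngram_counts(ref, n)).values())
        total = max(len(hyp) - n + 1, 1)
        log_precisions.append(math.log(max(overlap, 1e-9) / total))  # smoothed
    geo_mean = math.exp(sum(log_precisions) / max_n)
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100 * brevity * geo_mean

ref = "the committee approved the proposal without changes"
print(bleu("the committee approved the proposal without changes", ref))  # 100.0
print(bleu("committee approved proposal with no change", ref))  # far lower
```

Two systems with similar corpus-level scores can still differ sharply in how much corrective editing their output needs, which is exactly the gap the CSA survey respondents keep running into.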

Understanding The Localization Use Case For MT

Why is MT so difficult for LSPs to deploy in a consistently effective and efficient manner?

There are at least four primary reasons:

  1. The localization use case requires the highest-quality MT output to drive productivity, which is only possible with specialized expertise and effort.
  2. Most LSPs work on hundreds or thousands of smallish projects (relative to MT scale) that can vary greatly in scope and focus.
  3. Effective MT adaptation is complex.
  4. MT system development skills are not typically found on an LSP team.

MT Output Expectations

As the CSA survey showed, getting MT to consistently produce output quality that enables its use in production work is difficult. While using generic MT is quite straightforward, most LSPs have discovered that rapidly adapting and optimizing MT for production use is extremely difficult.

It is a matter of both MT system development competence and workflow/process efficiency. 

Many LSPs feel that success requires the development of multiple engines for multiple domains for each client, which is challenging since they don't have a clear sense of the effort and cost needed to achieve positive ROI.

If you don’t know how good your MT output will be, how do you plan for staffing PEMT work and calculate PEMT costs?
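The planning problem can be made concrete with a toy calculation; the word count, per-word rate, and edit-effort fractions below are hypothetical figures, purely for illustration:

```python
# Hypothetical illustration of the PEMT quoting problem: if you cannot
# predict MT quality, the post-editing effort (here, the fraction of
# from-scratch translation effort still required) is a wide unknown.
def pemt_cost(words, rate_per_word, edit_effort):
    """edit_effort: fraction of from-scratch effort left after MT,
    e.g. 0.3 means post-editing takes ~30% of translating from scratch."""
    return words * rate_per_word * edit_effort

words, rate = 100_000, 0.12                 # hypothetical project size / rate
best_case = pemt_cost(words, rate, 0.3)     # good, well-adapted MT output
worst_case = pemt_cost(words, rate, 0.8)    # poor generic MT output
print(f"${best_case:,.0f} vs ${worst_case:,.0f}")  # $3,600 vs $9,600
```

A roughly 3x spread in the possible quote is exactly why LSPs who cannot predict MT quality struggle to price PEMT work.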

Thus, we see MT used only when very large volumes of content are focused on a single subject domain or when a client demands it.

A corollary is that consistently raising MT output to useful, high-quality levels requires deep expertise in and understanding of NMT models, along with the right skills and data.

Project Variety & Focus

Most LSPs handle a large and varied range of projects that cover many subject domains, content types, and user groups on an ongoing basis. The translation industry has evolved around a Translate>Edit>Proof (TEP) model that has multiple tiers of human interaction and evaluation in a workflow.

Most LSPs struggle to adapt this historical people-intensive approach to an effective PEMT model which requires a deeper understanding of the interactions between data, process, and technology.

The biggest roadblock I have seen is that many LSPs get entangled in opaque linguistic quality assessment and estimation exercises, and completely miss the business value implications created by making more content multilingual. Localization is only one of several use-cases where translation can add value to the global enterprise's mission.

Typically, there is not enough revenue concentrated around individual client subject domains, so it is difficult for LSPs to justify investing in MT systems that would quickly add productivity to client projects.

MT development is considered a long-term investment that can take years to yield consistently positive returns.

This perceived requirement to develop multiple engines across many domains for each client demands an investment that cannot be justified by short-term revenue potential. MT projects, in general, need a higher level of comfort with outcome uncertainty, and handling hundreds of MT projects concurrently to service the business is too demanding a requirement for most LSPs.

MT is Complex

Many LSPs have dabbled with open-source MT (Moses, OpenNMT), AutoML, or the Microsoft Translator Hub, only to find that everything from data preparation to model tuning and quality measurement is complicated and requires deep expertise that is uncommon in the language industry.

While it is not difficult to get a rudimentary MT model built, it is a very different matter to produce an MT engine that consistently works in production use. For most LSPs, open-source and DIY MT is the path to a failed project graveyard.

Neural MT technology is evolving at a significantly faster pace than statistical MT did. Staying abreast of the state of the art (SOTA) requires a serious commitment in both manpower and computing resources.

LSPs are familiar with translation memory technology, which has barely changed in 25 years, but MT has changed dramatically over the same period. In recent years, the neural network revolution has driven multiple open-source platforms to the forefront, and keeping abreast of the change is difficult.

NMT requires expertise not only around "big data", NMT algorithms, and open-source platform alternatives but also around understanding parallel processing hardware.

Today, AI and machine learning (ML) are used almost synonymously, and engineers with ML expertise are in high demand.

MT requires long-term commitment and investment before consistent positive ROI is available and few LSPs have an appetite for such investments.

Some say that an MT development team might be ready for prime-time production work only after they have built a thousand engines and have this experience to draw from. This competence-building experience seems to be a requirement for sustainable success.

Talent Shortage

Even if LSP executives are willing to make these strategic long-term investments, finding the right people has become increasingly hard. According to a recent Gartner survey, executives see the talent shortage not just as a major hurdle to organizational goals and business objectives; it is also preventing many companies from adopting emerging technologies.

The Gartner research, built on a peer-based view of the adoption plans for 111 emerging technologies across 437 global IT organizations over a 12- to 24-month period, shows that talent shortage is the most significant adoption barrier for 64% of emerging technologies, compared with just 4% in 2020.

IT executives cited talent availability as the main adoption risk factor for the majority of IT automation technologies (75%) and for a large share of digital workplace technologies (41%).

But using technology early and effectively creates a competitive advantage. Bain estimates that “born-tech” companies have captured 54% of the total market growth since 2015. “Born-tech” companies are those with a tech-led strategy. Think Tesla in automobiles, Netflix in media, and Amazon in retail.

Technology has emerged as the primary disruptor and value creator across all sectors. The demand for data scientists and machine learning engineers is at an all-time high.

LSPs need to compete with the global 2000 enterprises who offer more money and resources to the same scarce talent. Thus, we even see technical talent migrating out of translation services to the “mainstream” industry.

There is a gold rush happening around well-funded ML-driven startups and enterprise AI initiatives. ML skills are being seen as critical to the next major evolution in value creation in the overall economy as the chart below shows.

This perception is driving huge demand for data scientists, ML engineers, and computational linguists who are all necessary to build momentum and produce successful AI project outcomes. The talent shortage will only get worse as more people realize that deep learning technology is fueling most of the value growth across the global economy.

Thus, it appears that MT is likely to remain an insurmountable challenge for most LSPs. The prospect of an LSP starting to build robust, state-of-the-art MT capabilities in 2021 is increasingly unlikely.

Even the largest LSPs today have to use “best-of-breed” public systems rather than build internal MT competence. The strategies employed typically depend on selecting MT systems based on BLEU, hLEPOR, TER, edit distance, or some other score-of-the-day, which again explains why less than 15% of production work uses MT.

As CSA has discovered, LSP MT use has been largely unsuccessful because a good edit distance/hLEPOR/COMET score does not necessarily translate into responsiveness, ease of use, or adaptability of the MT system to production localization use-case needs.

For MT to be useable on 95%+ of the production translation work done by an LSP, it needs to be reliable, flexible, manageable, rapidly adaptive, and continuously learning. MT needs to produce predictably useful output and be truly assistive technology for it to work in localization production work.

The contrast of the MT experience at Translated Srl is striking. ModernMT was designed from the outset to be useful to translators and created to collect the right kind of data needed to rapidly improve and assist in localization project-focused systems.

ModernMT is a blend of the right data, deep expertise in both localization processes and machine learning, and a respectful and collaborative relationship between translators and MT technologists. It is more than just an adaptive MT engine.

Translated has been able to overcome all of the challenges listed above using ModernMT, which today is possibly the only viable MT technology solution that is optimized for the core localization-focused business of LSPs.

ModernMT was created as an MT system optimized for LSP use. It can be adopted quickly and successfully by any LSP, as there is no setup or upfront training needed; it is a simple "load TM and immediately use" model.
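Conceptually, the "load TM and immediately use" flow looks like the sketch below. All class and method names are invented for illustration; this is not the actual ModernMT API, just the shape of the no-upfront-training model:

```python
# Conceptual sketch of a "load TM and immediately use" workflow.
# All names here are invented for illustration; this is NOT the real
# ModernMT API, only the shape of the no-upfront-training model.
class AdaptiveMTClient:
    def __init__(self):
        self.memories = {}  # memory name -> list of (source, target) pairs

    def create_memory(self, name):
        self.memories[name] = []
        return name

    def load_tm(self, memory, pairs):
        # No training phase: segments are usable the moment they are stored.
        self.memories[memory].extend(pairs)

    def translate(self, memory, text):
        # Exact-match TM lookup stands in for context-aware adaptation.
        for source, target in self.memories[memory]:
            if source == text:
                return target
        return f"<generic MT output for: {text}>"

client = AdaptiveMTClient()
client.create_memory("acme-legal")
client.load_tm("acme-legal", [("Force majeure", "Forza maggiore")])
print(client.translate("acme-legal", "Force majeure"))  # prints "Forza maggiore"
```

The point of the sketch is the absence of a training step between loading the TM and requesting a translation.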

ModernMT Overview & Suitability for Localization

ModernMT is an MT system that is responsive, adaptable, and manageable in the typical localization production work scenario. It is an MT system architecture that is optimized for the most demanding MT use-case: localization. And it is thus able to handle many other use-cases which may have more volume but are less demanding on the output quality requirements.

ModernMT is a context-aware, incremental, and responsive general-purpose MT technology that is price-competitive with the big MT portals (Google, Microsoft, Amazon) and is uniquely optimized for LSPs and any translation service provider, including individual translators.

It can be kept completely secure and private for those willing to make the hardware investments for an on-premise installation. It is also possible to develop a secure and private cloud instance for those who wish to avoid making hardware investments.

ModernMT overcomes technology barriers that hinder the wider adoption of currently available MT software by enterprise users and language service providers:

  • ModernMT is a ready-to-run application that does not require any initial training phase. It incorporates user-supplied resources immediately without needing upfront model training.
  • ModernMT learns continuously and instantly from user feedback and corrections made to MT output as production work is being done. It produces output that improves by the day and even the hour in active-use scenarios.
  • ModernMT is context-sensitive: it manages context automatically and does not require building separate domain-specific systems.
  • ModernMT is easy to use and rapidly scales across varying domains, data, and user scenarios.
  • ModernMT has a data collection infrastructure that accelerates the process of filling the data gap between large web companies and the machine translation industry.
  • ModernMT is driven simply by the source sentence to be translated, optionally supplemented by small amounts of contextual text or translation memory.
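The "no initial training phase" and "instant learning" points above can be illustrated with a toy sketch. This is hypothetical code, not ModernMT's actual implementation: a translation memory loaded into an in-memory index is queryable immediately, and each post-edited correction is added to the same index so the very next lookup can already benefit from it.

```python
from difflib import SequenceMatcher

class InstantTM:
    """Toy in-memory translation memory: no training step is needed;
    loaded segments and new corrections are usable immediately."""

    def __init__(self):
        self.segments = []  # (source, target) pairs

    def load_tm(self, pairs):
        # "Load TM and immediately use": no upfront model training.
        self.segments.extend(pairs)

    def add_correction(self, source, post_edited_target):
        # Incremental learning: a translator's fix is indexed instantly.
        self.segments.append((source, post_edited_target))

    def best_match(self, query):
        # Fuzzy lookup: return the stored pair most similar to the query.
        if not self.segments:
            return None
        return max(
            self.segments,
            key=lambda seg: SequenceMatcher(None, query, seg[0]).ratio(),
        )

tm = InstantTM()
tm.load_tm([("Press the start button.", "Drücken Sie die Starttaste.")])
# A correction made during production work is available on the next lookup:
tm.add_correction("Press the stop button.", "Drücken Sie die Stopptaste.")
print(tm.best_match("Press the stop button."))
```

The point of the sketch is the workflow, not the matching algorithm: nothing is retrained between the load, the correction, and the next query.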

ModernMT’s goal is to deliver the quality of multiple custom engines by adapting to the provided context on the fly. This fluidity makes it much easier to manage on an ongoing basis as only a single engine is needed.

The translation process in ModernMT is quite different from that of common, non-adaptive MT technologies. The models created with this tool do not merge all the parallel data into a single indistinguishable heap; instead, a separate container is created for each data source, which is how ModernMT maintains the ability to adapt instantly to hundreds of different contextual use scenarios.
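The per-source container idea can be sketched in a few lines. Again, this is a hypothetical illustration rather than ModernMT's real code: each data source remains a distinct container, and a simple relevance score against the provided context decides which container should drive adaptation for a given job.

```python
from collections import Counter

# Toy per-source "containers": each uploaded TM stays distinct
# instead of being merged into one undifferentiated data heap.
containers = {
    "legal_tm":   ["the party shall indemnify", "this agreement is governed by"],
    "medical_tm": ["administer the dose orally", "adverse reactions were reported"],
}

def score(container_segments, context_words):
    # Word-overlap score between the context and a container's content.
    vocab = Counter(w for seg in container_segments for w in seg.split())
    return sum(vocab[w] for w in context_words)

def select_container(context):
    # Pick the data source most relevant to this translation context.
    words = context.lower().split()
    return max(containers, key=lambda name: score(containers[name], words))

print(select_container("The agreement requires each party to indemnify the other."))
```

A real system would use far more sophisticated retrieval, but the structural point stands: because sources are kept separate, switching context means switching containers, not retraining an engine.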

ModernMT consistently outperforms the big portals in MT quality comparisons conducted by independent third-party researchers, even with the static baseline version of its system.

ModernMT systems can easily outperform competitive systems once adaptation begins, and active corrective feedback immediately generates quality-improving momentum.

The following charts show that ModernMT is a consistently superior performer even as the quality measurement metrics change across multiple independent third-party evaluations conducted over the last three years.

None of these metrics capture the ongoing, continuous improvements in output quality that are the daily experience of translators who work with the dynamically improving ModernMT at Translated Srl.

Independent evaluations confirm that ModernMT quality improves faster than competitors': the chart below shows results on the English > German COVID data set.

ModernMT was also the "top performer" on several other languages tested with COVID data.

In Q4 2021, the COMET metric is widely considered a "better" score because it is more closely aligned with human assessments and also incorporates semantic similarity; here again, ModernMT shines.
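Why a semantically aware metric like COMET can be "better" than surface-overlap metrics is easy to show with a toy example. The sketch below is not COMET (which uses learned neural embeddings); it implements a simple BLEU-style clipped unigram precision to demonstrate how a surface metric can rank a translation with a dropped negation above a faithful paraphrase.

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Toy surface-overlap score (BLEU-style clipped unigram precision)."""
    cand = candidate.lower().split()
    ref = Counter(reference.lower().split())
    matched = sum(min(c, ref[w]) for w, c in Counter(cand).items())
    return matched / len(cand)

reference = "the patient did not respond to the treatment"
# Semantically wrong (negation dropped) but with high surface overlap:
hyp_a = "the patient did respond to the treatment"
# Semantically faithful paraphrase with much less word overlap:
hyp_b = "the treatment produced no response in the patient"

print(unigram_precision(hyp_a, reference))  # scores higher despite wrong meaning
print(unigram_precision(hyp_b, reference))  # scores lower despite correct meaning
```

A metric that models semantic similarity, as COMET does, would penalize the first hypothesis and credit the second, which is why such metrics correlate better with human judgment.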

If the predictions about the transformative impact of the deep learning revolution are true, DL will likely disrupt many industries, including translation. MT is a prime example of an opportunity missed by almost all of the Top 20 LSPs.

While it is challenging to get MT working consistently in localization scenarios, ModernMT and Translated show that it is possible and that there are significant benefits when you do.

This success also shows that getting MT properly working in professional translation creates competitive advantages that provide long-term business leverage. The future of business translation increasingly demands collaborative working models in which human services are integrated with responsive, adaptive MT. The future for LSPs that do not learn to use MT effectively will not be rosy.

A detailed overview of ModernMT is provided here. It is easy to test against other competitive MT alternatives: the rapid adaptation capabilities can be seen by working with MateCat or Trados, or with a supported TMS product such as MemoQ if that is preferred.

ModernMT is an example of an MT system that works for both the LSP and the translator. The ease of the "instant start" experience with MateCat + ModernMT is striking compared to the typical plodding, laborious MT customization process seen elsewhere today. Try it and see.