
Friday, February 1, 2019

Understanding the Realities of Language Data

This is a guest post by Luigi Muzii that focuses mostly on the various questions that surround Language Data, which by most “big data” definitions and volumes is really not what most in the community would consider big data. As the world hurtles into the brave new world being created by a growing volume of machine learning and AI applications, the question of getting the data right is often brushed aside. Most think the data is a solved problem or presume that data is easily available. However, those of us who have been working seriously on MT over the last decade understand this is far from a solved problem. Machines learn from data, and smart engineers can find ways to leverage the patterns in data in innumerable ways. Properly used, it can make knowledge work easier or more efficient, e.g., machine translation, recommendation, and personalization.

The value and quality of this pattern learning can only be as good as the data used, and however exciting all this technology seems, we need to understand that our comprehension of how the brain (much less the mind) works is still really only in its infancy. “Machine learning” is a geeky way of saying “finding complicated patterns in data”. The comparative learning capacity and the neuroplasticity of any 2-year-old child will pretty much put most of these "amazing and astounding" new AI technologies to shame. Computers can process huge amounts of data in seconds, and sometimes they can do this in VERY useful ways, and most of what we call AI today is rarely, if ever, much more than this. If the data is not good, the patterns will be suspect and often wrong. And given what we know about how easy it is to screw data up, this will continue to be a challenge. In fact, DARPA and others are now discussing strategies for detecting and removing “poisoned data” on an increasingly regular basis. Much of the fear about rogue AI is based on this kind of adversarial machine learning, which can lead trusted production systems astray to make biased and even dangerous errors.
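
To make the point concrete, here is a minimal, illustrative sketch (mine, not from the original post) of how label-flipping "data poisoning" degrades a simple classifier; the synthetic dataset and the 30% poisoning rate are arbitrary choices for demonstration:

```python
# Illustrative sketch: flipping 30% of training labels ("poisoning")
# typically degrades a classifier trained on the corrupted data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clean_model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Poison 30% of the training labels by flipping them.
rng = np.random.default_rng(0)
idx = rng.choice(len(y_tr), size=int(0.3 * len(y_tr)), replace=False)
y_poisoned = y_tr.copy()
y_poisoned[idx] = 1 - y_poisoned[idx]

poisoned_model = LogisticRegression(max_iter=1000).fit(X_tr, y_poisoned)

print("accuracy on clean test data")
print("  trained on clean labels:   ", clean_model.score(X_te, y_te))
print("  trained on poisoned labels:", poisoned_model.score(X_te, y_te))
```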


Despite the hype, we should understand that deep learning algorithms are increasingly going to be viewed as commodities.

The data is your teacher. It's the data where the real value is.


Data is valuable when it is properly collected, understood, organized, and categorized. Having rich metadata and taxonomy is especially valuable with linguistic data. The SMT experience has shown that much of the language industry's TM was not very useful in building SMT engines without significant data cleaning effort. This is even more true with Neural MT. As Luigi points out, the MT engines that process 99% of the language translation being done on the planet today were trained on data that the language industry has had very little to do with. In fact, it is also my experience that many large-scale MT projects for enterprise use cases involve a data acquisition and data creation phase that produces the right kind of data to drive successful outcomes. While data from localization projects can be very useful at times, it is most often better to create and develop training data that is optimized for the business purpose. Thus, building the right training data is very much a man-machine collaboration.
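
As a rough illustration of what this cleaning involves, here is a minimal sketch (my own, with a hypothetical file name) that pulls segment pairs and basic metadata out of a TMX translation memory and applies a crude length-ratio filter; real pipelines layer many more checks on top of this:

```python
# Minimal TMX reader with a crude cleaning filter (hypothetical file name).
# TMX is XML: <tu> units hold <tuv xml:lang="..."> variants with <seg> text.
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def read_tmx(path, src="en", tgt="de", max_ratio=3.0):
    pairs = []
    for tu in ET.parse(path).getroot().iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            lang = (tuv.get(XML_LANG) or "").lower()[:2]
            seg = tuv.find("seg")
            if seg is not None and seg.text:
                segs[lang] = seg.text.strip()
        if src in segs and tgt in segs:
            a, b = len(segs[src]), len(segs[tgt])
            if max(a, b) / max(1, min(a, b)) <= max_ratio:  # drop wildly mismatched pairs
                pairs.append({
                    "src": segs[src],
                    "tgt": segs[tgt],
                    # metadata like this is what makes the data auditable
                    "creationdate": tu.get("creationdate"),
                })
    return pairs

# pairs = read_tmx("memory.tmx")  # hypothetical file
```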

Luigi has already written about metadata and you can find the older articles here and here.

This is also true for the content that drives digital experience in the modern digital marketplace. We are now beginning to understand that content is often the best salesperson in a digital marketplace, and good content drives and enhances the digital buyer and customer journey. And here too, data quality and organization matter; in fact, they are a key to success in the dreams of digital transformation. Content needs to be intelligent.

As Ann Rockley said years ago:

Intelligent content is content that’s structurally rich and semantically categorized and therefore automatically discoverable, reusable, reconfigurable, and adaptable.

Here is a simple and succinct description of what intelligent content is. For a more detailed look at what this is and what is needed to make it happen through the translation and globalization process take a look at the SDL ebook on the Global Content Operating Model.



===============================



Let’s face it: Translation is prevalently considered a trivial job. Investigating the rationale of this view is pointless, so let’s take it as a matter of fact and focus on the ensuing behaviors, starting with the constantly increasing unwillingness to pay a fee—however honest and adequate—for a professional performance. Also, the attitude of many industry players towards their customers does not help correct this cheapish view either.

Unfortunately, with the rise of the Internet, the idea has progressively taken hold that goods—especially virtual ones—and indeed services should be ever cheaper and better. The (in)famous zero marginal cost theory has contributed heavily to the vision of an upcoming era of nearly free goods and services, “precipitating the meteoric rise of a global Collaborative Commons and the eclipse of capitalism.” But who will pay for this? Are governments supposed to subsidize all infrastructural costs to let marginal cost pricing prevail? Well, no. But keep reading anyway.

So where is the Language Data?


What does any of this have to do with data? Data has entered the discussion because of big data. The huge amounts of data manipulated by the large tech corporations have led to the assumption that translation buyers, and possibly industry players too, could do the same with language data. This has also led to expectations about the power of data that may be exaggerated well beyond any principle of reality.


Indeed, the pulverization of the industry has produced a series of problems that have not yet been resolved. The main one is that no single player is big and vertical enough to hold an amount of data—especially language data—so large and up-to-date that it could even remotely be considered big data or be used in any comparable way.

Also, more than 99.99 percent of translation today is performed through machine translation, and the vast majority of the training data of the major online engines comes from sources other than the traditional industry ones. Accordingly, the data that industry players can offer, make available, and even use for their own business purposes is comparatively scant and poor. In fact, the verticality of this data and the breadth of its scope are totally insufficient to enable any player, including or maybe especially the largest ones, to impact the industry. A certain ‘data effect’ exists only because online machine translation engines are trained on a huge amount of textual data available on the Internet, regardless of the translation industry.

For these reasons, a marketplace for language data might be completely useless, if not pointless. It might be viable, but the data available could hardly be the data needed.

For example, TAUS Matching Data is an elegant exercise, but its practicality and usefulness are yet to be proved. It is based on DatAptor, a research project pursued by the Institute for Logic, Language and Computation at the University of Amsterdam under the supervision of Professor Khalil Sima’an. DatAptor “aims at providing automatic methods for inducing dedicated translation aids from large translation data” by selecting datasets from existing repositories. Beyond the usual integrity, cleanliness, reliability, relevance, and prevalence issues, the traditional and unsolved issue of information asymmetry persists: deep linguistic competence and subject-field expertise, as well as fair proficiency in data management, are needed to be sure that a dataset is relevant, reliable, and up-to-date. And while the first two might possibly be found in a user querying the database, they are harder to find in the organization collecting and supposedly validating the original data.
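
For flavor only, and emphatically not DatAptor's actual method, a dataset-selection step can be approximated as ranking candidate segments by lexical similarity to a small in-domain sample and keeping the top slice:

```python
# Toy in-domain data selection: rank candidates by cosine similarity of
# bag-of-words vectors to an in-domain sample, keep the top fraction.
from collections import Counter
import math

def bow(text):
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def select_in_domain(candidates, in_domain_sample, keep=0.2):
    profile = bow(" ".join(in_domain_sample))
    ranked = sorted(candidates, key=lambda s: cosine(bow(s), profile),
                    reverse=True)
    return ranked[: max(1, int(len(ranked) * keep))]

# Invented example sentences for illustration.
medical = ["the patient was administered 20 mg of the drug",
           "adverse reactions were reported in the trial"]
pool = ["dosage was reduced after adverse reactions",
        "the striker scored twice in the second half",
        "patients in the placebo arm reported no symptoms"]
print(select_in_domain(pool, medical, keep=0.5))
```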

Also, several translation data repository platforms are available today, generally built by harvesting data through web crawling. The data used by the highest-resourced online machine translation engines comes from millions of websites or from the digitization of book libraries.

Interestingly, open-source or publicly-funded projects like bicleaner, TMop, Zipporah, ParaCrawl, or Okapi CheckMate are emerging to harvest, classify, systematize, and clean language data.
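
These tools differ in approach, but the spirit of the rule-based layer can be sketched as follows (my simplification, not the actual bicleaner or Zipporah logic): drop pairs that are empty, untranslated copies, implausibly long, or have mismatched numbers:

```python
# Toy bitext cleaning rules: drop empty sides, untranslated copies,
# over-long segments, and pairs whose numbers do not match.
import re

def keep_pair(src, tgt, max_words=200):
    if not src.strip() or not tgt.strip():
        return False                      # empty side
    if src.strip() == tgt.strip():
        return False                      # untranslated copy
    if len(src.split()) > max_words or len(tgt.split()) > max_words:
        return False                      # suspiciously long
    if sorted(re.findall(r"\d+", src)) != sorted(re.findall(r"\d+", tgt)):
        return False                      # digits should survive translation
    return True

corpus = [
    ("The invoice totals 1,250 euros.",
     "Die Rechnung beläuft sich auf 1.250 Euro."),
    ("Click here to subscribe", "Click here to subscribe"),  # crawler junk
]
clean = [pair for pair in corpus if keep_pair(*pair)]
```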

The initiative of a self-organized group of ‘seasoned globalization professionals’ from some major translation buyers may be seen as part of this trend. This group has produced a list of best practices for translation memory management. Indeed, this effort proves that models and protocols are necessary for standardization, not applications.

TMs are not dead and are not going to die as long as CAT tools and TMSs remain the primary means in the hands of translation professionals and businesses to produce language data.

At this point, two questions arise: What about the chance of having different datasets from the same original repository available on the same marketplace? And what about synthetic data? So, the challenge of selecting and using the right data sources remains unsolved.

Finally, the coopetition paradox also applies to a hypothetical language data marketplace. Although many translation industry players may interact and even cooperate on a regular basis, most of them are unwilling to develop anything that would benefit the entire industry, and they keep struggling to achieve a competitive advantage.

Is Blockchain really the solution?



For all these reasons, blockchain is not the solution for a weak-willed, overambitious data marketplace.

As McKinsey’s partners Matt Higginson, Marie-Claude Nadeau, and Kausik Rajgopal wrote in a recent article, “Blockchain has yet to become the game-changer some expected. A key to finding the value is to apply the technology only when it is the simplest solution available.” In fact, despite the amount of money and time spent, little of substance has been achieved.

Leaving aside the far-from-trivial problems of the immaturity, instability, expense, and complexity—if not obscurity—of the technology and the ensuing uncertainty, blockchain can perhaps be successfully used in the future to secure agreements and their execution (possibly through smart contracts), though hardly for anything else in the translation business. Competing technologies are also emerging as less clunky alternatives. Therefore, it does not seem advisable to put your money into a belated and misfocused project based on a bulky, underachieving technology as a platform for exchanging data that will still be exposed to ownership, integrity, and reliability issues.


The importance of Metadata


Metadata is totally different: it can be extremely interesting even for a translation data marketplace.

The fact that big data is essentially metadata has possibly not been discussed enough. The information of interest to data-manipulating companies does not come from the actual content posted, but from the associated data vaguely describing user behaviors, preferences, reactions, trends, etc. Only in a few cases are text strings, voice data, and images mined, analyzed, and re-processed, and even then the outcome of the analysis is stored as descriptive data, i.e. metadata. The same applies to IoT data. Also, data is only as good as the use one is capable of making of it. In Barcelona, for example, within the scope of the Decode project, Mayor Ada Colau is trying to use data on the movements of citizens generated by apps like Citymapper to inform the design of a better public transport system.

In translation, metadata might prove useful for quality assessment, process analysis and re-engineering, and market research, but it receives even less consideration than language data and is even more neglected here than elsewhere.

Language data is obsessively reclaimed but ill-curated. As usual, the reason is money: language data is supposed to be immediately profitable, either by leveraging it through discount policies or by using it to train machine translation engines. In both cases, it is seen as a ready means to sustain the pressure on prices and reduce compensation to linguists. Unfortunately, the quality of language data is generally very poor, because curating it is costly.

Italians use the expression “fare le nozze coi fichi secchi” (make a wedding with dry figs) for an attempt to accomplish something without spending what is necessary, while Spanish say “bueno y barato no caben en un zapato” (good and cheap don’t fit in a shoe). Both expressions recall the popular adage “There ain’t no such thing as a free lunch.”

This idea is common to virtually every culture, and yet translation industry players still have to learn it, and possibly not forget it.

We often read and hear that there’s a shortage of good talent in the market. On the other hand, many insist that there is plenty, and that the only problem of this industry is its ‘bulk market’—whatever this means, and however reliable those who claim this are or boast to be. Of course, if you target Translators Café, ProZ, Facebook, or even LinkedIn to find matching teams, you most probably have a problem knowing what talent is and which talents are needed today.

Let’s face it: the main reason high-profile professionals (including linguists) are unwilling to work in the translation industry is remuneration. And this is also the main reason the translation industry and the translation profession are considered, respectively, a lesser industry and a trivial job. An endless downward spiral.

Bad resources have been driving out the good ones for a long time now. And if this applies to the linguists who should be the ribs, nerves, and muscles of the industry, imagine what may happen with sci-tech specialists.

In 1933, in an interview for the June 18 issue of The Los Angeles Times, Henry Ford offered this advice to fellow business people, “Make the best quality of goods possible at the lowest cost possible, paying the highest wages possible.” Similarly, to summarize the difference between Apple’s long-prided quest for premium prices and Amazon’s low-price-low-margin strategy, on the assumption it would make money elsewhere, Jeff Bezos declared in 2012, “Your margin is my opportunity.”

Can you tell the difference between Ford, Amazon, and any ‘big’ translation industry player? Yes, you can.




Luigi Muzii has been in the "translation business" since 1982 and a business consultant since 2002, serving the translation and localization industry through his firm. He focuses on helping customers choose and implement the best-suited technologies and redesign their business processes for the greatest effectiveness of translation and localization related work.

This link provides access to his other blog posts.

Monday, January 14, 2019

A Vision for Blockchain in the Translation Industry

Happy New Year

This is yet another post on Blockchain, a guest post by Robert Etches, who presents his vision of what Blockchain might become in the translation industry. A vision, by definition, implies a series of possibilities, in this case quite revolutionary ones, but does not necessarily provide all the details of the how and the what. Of course, the way ahead is full of challenges and obstacles that are much more visible to us than the promised land, but I think it is wise to keep an open mind and watch the evolution even if we are not fully engaged, committed, or in agreement. Sometimes it is simply better to wait and see than to come to any final conclusions.

It is much easier to be dismissive and skeptical of upstart claims of fundamental change than to allow for a slim but real possibility that some new phenomenon could indeed be revolutionary. I wrote previously about CEO shortsightedness and what I called Roryisms. Here is a classic one from IBM that shows how they completely missed the boat because of their hubris and old-style thinking.
Gerstner is, however, credited with turning around a flailing mainframe business. The cost of missing the boat can be significant for some, and we need only look at relative stock price and market value improvements over time (as this is how CEO performance is generally measured) to understand how truly clueless our boy Lou and his lieutenants at IBM were when they said this. The culture created by such a mindset can last decades, as the evidence shows. Culture is one of a company’s most powerful assets right until it isn’t: the same underlying assumptions that permit an organization to scale massively constrain the ability of that same organization to change direction. More distressingly, culture prevents organizations from even knowing they need to do so.

IBM’s chairman minimized how Amazon might transform retail and internet sales all the way back in 1999.
“Amazon.com is a very interesting retail concept, but wait till you see what Wal-Mart is gearing up to do,” said IBM Chairman Louis V. Gerstner Jr. in 1999. Mr. Gerstner noted that the previous year IBM’s Internet sales were five times greater than Amazon’s, and boasted that IBM “is already generating more revenue, and certainly more profit, than all of the top Internet companies combined.”

AMZN Stock Price Appreciation of 36,921% versus IBM’s 211% over 20 years


 IBM is the flat red line in the chart above. IBM looks just as bad against Microsoft, Google, Apple, Oracle, and many others who had actual innovation.


January 11, 2019:
Amazon Market Value: $802 billion (7.3X higher)
IBM Market Value: $110 billion

I bring attention to this because I also saw this week that IBM filed more patents than any other company in the US in 2018, with Samsung second. In fact, IBM has been the top patent filer in the US every year from 1996 to 2018. BTW, they are leaders in blockchain patents as well. However, when was the last time ANYBODY associated IBM with innovation or technology leadership? 1980? Maybe they just have some great patent filing lawyers who understand the PTO bureaucracy and know how to get their filings pushed through. In fact, there have been some in the AI community who felt that IBM Watson was a joke and that the effort did not warrant serious credibility and respect. Oren Etzioni said this: “IBM Watson is the Donald Trump of the AI industry—outlandish claims that aren’t backed by credible data.” Trump is now a synonym for undeserved self-congratulation, fraud, and buffoonery, a symbol for marketing with false facts. IBM is also credited with refining and using something called FUD (fear, uncertainty, and doubt) as a deliberate sales and marketing misinformation tactic to keep customers from using better, more innovative but lesser-known products. We should not expect IBM to produce any breakthrough innovation in the emerging AI-first, machine-learning-everywhere world we see today, and most expect the company will be further marginalized in spite of all the patent filings.

Some of you may know that IBM filed the original patents for Statistical Machine Translation, but it took Language Weaver (SDL), Google, and Microsoft to really make it come to life in a useful way. IBM researchers were also largely responsible for conceiving the BLEU score to measure MT output quality, a metric that was quite useful for SMT. However, the world has changed, and BLEU is not useful with NMT. I plan to write more this year on how BLEU and all its offshoots are inadequate and often misleading in providing an accurate sense of the quality of any Neural MT system.
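
For readers who have never computed it, here is a minimal example (with my own invented sentences) using the sacreBLEU library; note how the second hypothesis, a perfectly reasonable paraphrase, drags the score down because BLEU only rewards n-gram overlap with the reference:

```python
# Corpus-level BLEU with sacreBLEU (pip install sacrebleu).
import sacrebleu

hypotheses = ["The cat sat on the mat.",
              "Economic growth slowed in the fourth quarter."]
# One reference stream, aligned item-by-item with the hypotheses.
references = [["The cat sat on the mat.",
               "Economic growth decelerated during Q4."]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")  # the fluent paraphrase is penalized heavily
```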

It is important to be realistic without denying the promise as we have seen the infamous CEOs do. Change can take time and sometimes it needs much more infrastructure than we initially imagine. McKinsey (smart people who also have an Enron and mortgage securitization promoter legacy) have also just published an opinion on this undelivered potential, which can be summarized as:
 "Conceptually, blockchain has the potential to revolutionize business processes in industries from banking and insurance to shipping and healthcare. Still, the technology has not yet seen a significant application at scale, and it faces structural challenges, including resolving the innovator’s dilemma. Some industries are already downgrading their expectations (vendors have a role to play there), and we expect further “doses of realism” as experimentation continues." 
While I do indeed have serious doubts about the deployment of blockchain in the translation industry anytime soon, I do feel that if it happens, it will be driven by dreamers rather than by process-crippled NIH pragmatists like Lou Gerstner and Rory. These men missed the obvious because they were so sure they knew all there was to know, and because they were stuck in the old way of doing things. While there is much about blockchain that is messy and convoluted, these are early days yet, and the best is yet to come.

Another dreamer, Chris Dixon, offered an even greater vision for blockchain when he recently said:
The idea that an internet service could have an associated coin or token may be a novel concept, but the blockchain and cryptocurrencies can do for cloud-based services what open source did for software. It took twenty years for open source software to supplant proprietary software, and it could take just as long for open services to supplant proprietary services. But the benefits of such a shift will be immense. Instead of placing our trust in corporations, we can place our trust in community-owned and -operated software, transforming the internet’s governing principle from “don’t be evil” back to “can’t be evil.”

========

2018 was a kick-off year for language blockchain enthusiasts. At least five projects were launched[1], there was genuine interest expressed by the industry media, and two webinars and one conference provided a stage for discussion on the subject[2]. Then it all went very quiet. So, what’s happened since? And where are we today?

Subscribers to Slator’s latest megatrends[3] can read that it’s same same in the language game for 2019: NMT, M&A, CAT, TMS, unit rates … how we love those acronyms!

On the world stage, people could only shake their heads in disbelief at the meteoric rise in the value of cryptocurrencies in 2017. In 2018, however, those same people relished a healthy dish of schadenfreude as exchange rates plummeted and the old order was restored, with dollars (Trump), roubles (Putin), and pound sterling (Brexit) back in vogue.

In other words, for the language industry and indeed for the world at large, “better the devil(s) we know” appears to be the order of the day.

There is nothing surprising in this. Despite all the “out of your comfort zone” pep talks by those Cassandras of Change[4], the language industry continues to respect the status quo, grow and make money[5]. Why alter a winning formula? And certainly, why even consider introducing a business model that expects translators to work for tokens?! Hello, crazy people!!!

But maybe, just maybe, there was method in Hamlet’s madness[6] and Apple was right when they praised the crazy ones[7]?

Let’s take a closer look at the wonderful world of blockchain and token economics, and how they are going to change everything … including the language industry.

 

Pinning down the goal posts

Because they keep moving! Don’t take my word for it. Here’s what those respected people at Gartner wrote in their blockchain-based transformation report[8] in March 2018:

Summary


While blockchain holds long-term promise in transforming business and society, there is little evidence in short-term reality.

 

Opportunities and Challenges

  • Blockchain technologies offer new ways to exchange value, represent digital assets and implement trust mechanisms, but successful enterprise production examples remain rare.
  • Technology leaders are intrigued by the capabilities of blockchain, but they are unclear exactly where business value can be achieved in the enterprise context.
  • Most enterprise blockchain experiments are an attempt to improve today's business process, but in most of those cases, blockchain is no better than proven enterprise technologies. These centralized renovations distract enterprises from other innovative possibilities offered by blockchain.
And now here’s a second overview, also from Gartner, this time their blockchain spectrum report[9] from October 2018:

 

Opportunities and Challenges

  • Blockchain technologies offer capabilities that range from incremental improvements to operational models to radical alterations to business models.
  • The impact of blockchain’s trust mechanisms and interaction paradigms extends beyond today’s business and will affect the economy, society and governance.
  • Many interpretations of blockchain today suffer from an incomplete understanding of its capabilities or assume a narrow scope.
The seven-month leap from “little evidence in short-term reality” to “will affect the economy, society and governance” is akin to a rocket-propelled trip across the Grand Canyon! Little wonder that traditional businesses don’t know where to even start looking into this phenomenon, never mind taking on a new business model that basically requires emptying the building of 90% of its hardware, software and, more importantly, people.

But!

Why does Deloitte have 250 people working in their distributed ledger laboratories? Because when immutable distributed ledgers become a reality they will put 300,000 people out of work at the big four accountancy companies[10].

Why are at least 26 central banks looking into blockchain? Because there’s a good chance that private banks[11] will be superfluous in 10-15 years’ time and we will all have accounts with central banks.

Or there will be no banks at all …

Let’s just take a second look at that Gartner statement:

The impact of blockchain’s trust mechanisms and interaction paradigms extends beyond today’s business and will affect the economy, society and governance.

Other than basically saying blockchain will change “everything”, the sentence mentions two factors that are core to blockchain: trust and interaction.

Trust. What inspires me about blockchain is its transparency. A central tenet of blockchain is its truth gene. In a world in which even the most reliable sources of information are labeled as fake, blockchain’s traceability – its judgment in stone as to who did what, when and for whom – makes it a beacon of light.

Just think if we could utilize this capability to solve the endless quality issue. What if the client always knew who had translated what, and could even set up selection criteria based upon irrefutable proof of quality from previous assignments? It is no surprise to learn that many blockchain projects are focusing on supply chain management.
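
A minimal sketch of that traceability idea (my illustration, not any actual product) is an append-only, hash-chained log of who translated what, and when; altering any earlier record invalidates every later hash:

```python
# Append-only, hash-chained provenance log (illustration only).
import hashlib
import json
import time

def add_record(chain, translator, segment_id, target_text):
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    record = {
        "translator": translator,
        "segment_id": segment_id,
        # store a fingerprint of the translation, not the text itself
        "target_sha256": hashlib.sha256(target_text.encode()).hexdigest(),
        "timestamp": time.time(),
        "prev_hash": prev_hash,
    }
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    chain.append(record)
    return record

def verify(chain):
    prev = "0" * 64
    for rec in chain:
        body = {k: v for k, v in rec.items() if k != "hash"}
        if rec["prev_hash"] != prev or rec["hash"] != hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest():
            return False
        prev = rec["hash"]
    return True

ledger = []
add_record(ledger, "translator_42", "seg-001", "Guten Tag, Welt.")
add_record(ledger, "translator_42", "seg-002", "Auf Wiedersehen.")
print(verify(ledger))  # True; tampering with any record makes this False
```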

Interaction is all about peer-to-peer transactions through Ethereum smart contracts. It’s not just the central banks that will be removing the middlemen. Unequivocal trust opens the door to interact with anyone, anywhere. To a global community. These people of course speak and write in one or more of approximately 6,900 languages, so there’s a market for providing the ability for these “anyones” to speak to each other in any language. What a business opportunity! And what a wonderful world it would be!

Cryptocurrencies and blockchain: peas and carrots


You’ve gotta love Forrest Gump – especially now we know Jenny grew up to become Claire Underwood[12] 😊


Just as Jenny and Forrest went together like peas and carrots, so do tokens and blockchain.


Unfortunately, this is where many jump off the train. Accepting the relevance of some weird ledger technology that is tipped to become the new infrastructure for the Internet is one thing; trading in hard-won Venezuelan dollars for some sort of virtual mumbo jumbo is quite another!
  1. All fiat currencies are a matter of trust. None is backed by anything more than our trust in a national government behaving responsibly. In 2019 that is quite a scary thought – choose your own example of lemming-like politicians.
  2. All currencies (fiat or crypto) are worth what the market believes them to be worth. In ancient times a fancy shell from some far-off palmy beach was highly valued in the cold Viking north. Today not so. At its inception, bitcoin was worth nothing. Zero. Zip. Today[13] 1 BTC = €3,508.74. Because people say so.
Today, there is absolutely no reason why a currency cannot be minted by, well, anyone. There is indeed a school of thought that believes there will be thousands of cryptocurrencies in the not-too-distant future. If we look at our own industry, we have long claimed that translation memories and termbases have a value. Why can that value not be measured in a currency unique to us, with an intrinsic value that we all respect and which is not subject to the whims of short-term political aspirations? Why can’t linguistic assets be priced in a Language Coin?

Much has already been written about the concept of a token economy, though little better than the following:

An effective token strategy is one where the exchange of a token within a particular economy impacts human economic behavior by aligning user incentives with those of the wider Community.[14]

Think about this in the context of the language industry. What if the creation and QA of linguistic assets were tied to their own token? What if you – a private company, a linguist, an LSP, an NGO, an international organization – were paid in this token for your data, and the value of that data grew and grew year on year as it was shared and leveraged as part of a larger whole – the Language Community[15]? What if linguists were judged by their peers and their reputations were set in stone? What if everyone was free to charge whatever hourly fees they chose, and word rates and CAT discounts were a relic of the past?

This is why blockchain feeds the token economy and why the token needs blockchain. Peas and carrots!

To end with the words of another Cassandra, and a trendy one at that, Max Tegmark:
If you hear a scenario about the world in 2050 and it sounds like science fiction, it is probably wrong; but if you hear a scenario about the world in 2050 and it does not sound like science fiction, it is certainly wrong.[16]

The pace of change will continue to accelerate exponentially, and I believe blockchain will be one of the main drivers.

Within 10-15 years there will be household (corporate) names and technologies that do not exist or have only just started today. And by 2050 the entire finance, food, and transport sectors (to name the obvious) will be ‘blockchained’ beyond recognition.

At Exfluency, we see multilingual communication as an obvious area where a token economy and blockchain will also come out on top; I’m sure that other entrepreneurs in a myriad of other sectors are coming to similar conclusions. It’s going to be exciting!

Robert Etches
CEO, Exfluency
January 2019


[1] Exfluency, LIC, OHT, TAIA, TranslateMe
[2] TAUS Vancouver, and GALA and TAUS webinars
[3] https://slator.com/features/reader-polls-pay-by-hour-deepl-conferences-and-2019-megatrends/
[4] Britta Aagaard & Robert Etches, This changes everything, GALA Sevilla 2015; Katerina Pastra, Krzysztof Zdanowski, Yannis Evangelou & Robert Etches, Innovation workshop, NTIF 2017; Peggy Peng & Robert Etches, The Blockchain Conversation, TAUS 2018; Jochen Hummel, Sunsetting CAT, NTIF 2018.
[5] I am painfully aware that not everyone in the food chain is making money …
[6] Hamlet, II.ii.202-203
[7] https://www.youtube.com/watch?v=8rwsuXHA7RA
[8] https://www.gartner.com/doc/3869696/blockchainbased-transformation-gartner-trend-insight

[9] https://www.gartner.com/doc/3891399/blockchain-technology-spectrum-gartner-theme
[10] P.221 The Truth Machine, by Paul Vigna and Michael J. Casey
[11] Ibid. pp163-167

[12] https://en.wikipedia.org/wiki/Robin_Wright
[13] 9 January 2019
[14] P.69 The Truth Machine
[15] See Aagaard & Etches This changes everything for the sociological and economic importance of communities, the circular society, and the sharing society.
[16] Life 3.0 by Max Tegmark


=================================================================



CEO, Exfluency

A dynamic actor in the language industry for 30 years, Robert Etches has worked with every aspect of it, achieving notable success as CIO at TextMinded from 2012 to 2017. Always active in the community, Robert was a co-founder of the Word Management Group, the TextMinded Group, and the Nordic CAT Group. He served four years on the board of GALA (two as chairman) and four on the board of LT-Innovate. In a fast-changing world, Robert believes there has never been a greater need to implement innovative initiatives. This has naturally led to his involvement in his most innovative challenge to date: As CEO of Exfluency, he is responsible for combining blockchain and language technology to create a linguistic ledger capable of generating new opportunities for freelancers, LSPs, corporations, NGOs and intergovernmental institutions alike.

Tuesday, December 25, 2018

The Global eCommerce Opportunity Enabled by MT

This is a slightly updated version of a post previously published on the SDL site.


The holiday season around the world today is often characterized by special holiday shopping events like Black Friday and Cyber Monday. These special promotional events generate peak shopping activity and are now increasingly becoming global events. They are also increasingly becoming a digital and online commerce phenomenon. This is especially true in the B2C markets but is also now often true in the B2B markets.

The numbers are in, and the most recent data is quite telling. The biggest U.S. retail shopping holiday of the year – from Thanksgiving Day to Black Friday to Cyber Monday, plus Small Business Saturday and Super Sunday – generated $24.2 billion in online revenues. And that figure is far below Alibaba’s 11.11 Global Shopping Festival, which in 2018 reached $30.8 billion – in just 24 hours.


When we look at the penetration of eCommerce in the U.S. retail market, we see that, as disruptive as it has been, it is still only around 10% of the total American retail market. According to Andreessen Horowitz, this digital transformation has just begun, and it will continue to gather momentum and spread to other sectors over the coming years.

The Buyer’s Experience Affects eCommerce Success


Success in online business is increasingly driven by careful and continued attention to providing a good overall customer experience throughout the buyer journey. Customers want relevant information to guide their purchase decisions and allow them to be as independent as possible after they buy a product. This means sellers now need to provide much more content than they traditionally have.

Much of the customer journey today involves a buyer interacting independently with content related to the product of interest, and digital leaders recognize that understanding the customer and providing content that really matters to them is a prerequisite for digital transformation success.

B2B Buying Today is Omnichannel


In a study of B2B digital buying behavior presented at a recent Gartner conference, Brent Adamson pointed out that “Customers spend much more time doing research online – 27% of the overall purchase evaluation and research [time]. Independent online learning represents the single largest category of time-spend across the entire purchase journey.”

The proportion of time that the 750 surveyed customers making a large B2B purchase spent working directly with salespeople – both in person and online – was just 17% of their total purchase research and evaluation process time. This fractional time is diluted further when the total salesperson contact time is spread across the three or more vendors typically involved in a B2B purchase evaluation.

The research made it evident that a huge portion of a seller’s contact with customers happens through digital content rather than in person. This means that any B2B supplier without a coherent digital marketing strategy specifically designed to help buyers through the buyer journey will fall rapidly behind those who have one.

The study also found that the start of in-person contact does not mean the end of online contact. Even after engaging suppliers’ sales reps in direct in-person conversations, customers continue their digital buying journey, making use of both human and digital buying channels simultaneously.

Relevant Local Content Drives Online Engagement


Today it is very clear to digitally-savvy executives that providing content relevant to the buyer journey really matters and is a key factor in enabling digital online success. A separate research study by Forrester uncovered the following key findings:
  • Product information is more important to the customer experience than any other type of information, including sales and marketing content.
  • 82% of companies agree that content plays a critical role in achieving top-level business objectives.
  • Companies lack the global tools and processes critical to delivering a continuous customer journey but are increasingly beginning to realize the importance of this.
  • Many companies today struggle to handle the growing scale and pace of content demands.
A digital online platform does enable an enterprise to establish a global presence very quickly. However, research suggests that local-language content is critically needed to drive successful international business outcomes. The global customer requires all the same content that a US customer would in their own buying journey.

Machine Translation Facilitates Multilingual Content Creation


This requirement for providing so much multilingual content presents a significant translation challenge for any enterprise that seeks to build momentum in new international markets. To address this challenge, eCommerce giants like eBay, Amazon, and Alibaba are among the largest users of machine translation in the world today. There is simply too much content that needs to be multilingual to handle it with traditional localization methods.

However, even with MT, the translation challenge is significant and requires deep expertise and competence to address. The skills needed to do this in an efficient and cost-effective manner are not easily acquired, and many B2B sellers are beginning to realize that they do not have these skills in-house and could not effectively develop them in a timely manner.

Expansion Opportunities in Foreign Markets


The projected future growth of eCommerce activity across the world suggests that the opportunity in non-English speaking markets is substantial, and any enterprise with aspirations to lead – or even participate – in the global market will need to make huge volumes of relevant content available to support their customers in these markets.

When we look at eCommerce penetration across the globe, we see that the U.S. is in the middle of the pack in terms of broad implementation. The leaders are the APAC countries, with China and South Korea having particularly strong momentum as shown below. You can see more details about the global eCommerce landscape in the SDL MT in eCommerce eBook.



The chart below, also from Andreessen Horowitz, shows the shift in global spending power and suggests the need for an increasing focus on APAC and other regions outside of the US and Europe. The recent evidence of the power of eCommerce in China shows that these trends are already real today and are gathering momentum.

The Shifting Global Market Opportunity




To participate successfully in this new global opportunity, digital leaders must expand their online digital footprint and offer substantial amounts of relevant content in the target market language in order to provide an optimal local B2C and B2B buyer journey. 

As Andreessen points out, the digital disruption caused by eCommerce has only just begun, and the data suggests that the market opportunity is substantially greater for those who have a global perspective. SDL's MT in eCommerce eBook provides further details on how a digitally-savvy enterprise can handle the new global eCommerce content requirements in order to partake in the $40 trillion global eCommerce opportunity.



Happy Holidays to all.  

May your holiday season be blessed and peaceful.  


Click here to find the SDL eBook on MT and eCommerce.