
Sunday, October 14, 2018

Looking at Blockchain in the Translation Industry

I recently attended the TAUS Annual Conference, where "current language technology and collaboration" is the focus. And indeed it has historically been the best place to talk about technology in the "language industry". It was very clear that in addition to MT, edit distance and error classification on MT system output are also REALLY important to this community. Blockchain alternatives were presented as the most revolutionary new technology at this event, since the buzz on NMT has subsided a bit, but I have always wondered if it is possible to be truly revolutionary if your primary focal point is "localization".

I say all this not out of any disregard, but somewhat triggered by Chris Wendt (Microsoft) and I sharing thoughts on our motivations on staying with MT after all these years, and our shared angst about when we as a community would (if ever) start talking about "real" MT applications. I am quite sure that we were both glad to hear LinkedIn and Dell very clearly state that the value of the MT content to the customer, and better engagement of global populations (enabled by MT) were much more important than any kind of quality score, and as a rule, more content faster would always be better than better translations delivered way too late in these days of digital transformation.

As long as I have been in "the industry" there has always been a discussion about taking localization to the next level. To make it more respectable. To be considered with more regard by outsiders and be seen more often in the "mainstream press". I am not sure that this is really possible if the industry focus does not change, and move beyond cost efficiency and translation quality concerns. The fundamental challenge the industry has is that localization mostly happens after the fact, i.e. after the marketing and product development people have decided everything that is really worth deciding to drive a market revolution and/or make a digital transformation happen.

My sense is that it is increasingly all about content and content is more than words that we translate. It is about relevance and value and transformation. Content is where localization, marketing, and product development can meet. Content is where customers meet the enterprise. Content is the magic key that possibly lets localization people into the C-suite. And digital experience is where finance and customer service and support also join the party. From my vantage point the only company that fundamentally understands this in "the industry" is the "new SDL". I am quite possibly biased since they deposit money into my bank account on a regular basis, but I like to think that I can make this statement on purely objective facts as well. It is much more important to understand what you should translate and why you are doing it, than simply to translate efficiently with fewer errors with MT systems that produce very low edit distances. Indeed it is probably most important to understand what is needed to get the original content right in the first place as that is the fundamental driver of digital transformation. Understand relevance and value. Revolutions tend to be more about what matters, and why it matters, than about how we should do what we must do. Being content focused enables you to get much closer to the source, to the what and the why.

However, in this age of fake news even content is under fire. We are surrounded by "fake news" and fake videos and fake pictures. How do we tell what is true and what is not? What about blockchain? The idea of an immutable ledger stored in the cloud, tracing the origin of all content to its source, definitely sounds appealing. Users could compare versions of videos or images to check for modifications, and watermarks would serve as a badge of quality.  But here, too, the question is whether this can be applied to text-based content, where the intent to deceive leaves fewer technical traces.

There are now some who wish to bring specialized blockchain implementations to localization (translation) with verified translators, translation memory and payment mechanisms and raise  the level of trust and fairness.  I am hoping to publish a series of posts on this subject that show various perspectives on this issue and technology. I cannot say I really understand the blockchain potential here at this point, and this post and others that follow is part of my effort to learn and share. 

Gabor Ugray has written a post on blockchain as having far-reaching consequences for compensation, compliance, workflows, and tools in the industry, and has a much more optimistic viewpoint than presented by Luigi in the bulk of this post, that I also recommend to readers.

Luigi Muzii is a commentator who is often considered acerbic and "negative" by many in the industry. But I like to listen to his words generally, since he also tends to cut through the bullshit and get to core issues.  He is not enthusiastic about the impact of blockchain on the translation business. This guest post describes why. His summary conclusion:

Blockchain is no change, it may possibly be an improvement, but it will keep us doing things the way they have been done so far, in a [slightly] different shape. 

If we consider the history of MT in the localization industry, his current conclusions do indeed make sense and seem very reasonable. In this industry, MT is about error classification, edit distances, quality measurement, comparing MT system scores, and custom engines. It is almost never about understanding global customers, listening more closely around the globe, better global communication and collaboration on a daily basis, or rapidly scaling and ramping up international business. Outsiders have for the most part led those kinds of truly transformational MT-driven initiatives. We are often defined by the kinds of measures we use to define our success. Consider what you have accomplished by getting a low edit distance score across 30 MT systems vs. say increasing Russian traffic by 800% and Russian online transactions by 25% by translating 50 million product listings into Russian. Let's also say that this increases sales by $150 million. We can also safely bet that the edit distance on these billions of words is quite terrible and very high. (Yes. I understand that this is a very lop-sided contrast.)

So here is a toast to lower edit distance scores on all your MT systems, and to error classification systems with at least 56 dimensions. 😏

And thank you to Eric Vogt for educating the TAUS  community on what a taus actually looks like, as shown below. As somebody who seriously plays a closely related instrument, I appreciate people knowing this. Robert Etches also won some points in my eyes for stating the seminal and enduring influence that a book about the massacre at Wounded Knee had on his sense of injustice as a young man.





-----------------------------------------------------------



During a panel discussion at the first Hackers Conference in 1984, the influential publisher, editor, and writer Stewart Brand was recorded telling Steve Wozniak, “On the one hand information wants to be expensive, because it’s so valuable. The right information in the right place just changes your life. On the other hand, information wants to be free, because the cost of getting it out is getting lower and lower all the time.”

Over the years, Brand’s nuanced statement of a modern conundrum has been truncated to “Information wants to be free,” thus distorting the original concept. Indeed, in this way, Brand’s originally value-neutral statement can be used ambiguously to argue the benefits of propertied information, of free and open information, or of both.

Truth is that people should be able to access information freely and that information should be independent, transparent, and honest. Unfortunately, the pages of the mainstream press, especially the economic ones, are the least “free” of all, filled as they are with the successes of companies possibly filling expensive advertising spaces in the same media, with hardly a critical or at least skeptical comment. Maybe, later, it turns out that those very same companies have been in crisis for some time. This same shameful habit can be found in trade media, especially the translation industry media, which most often docilely host promotional articles. Nothing next to these articles, however, flags their promotional nature. This might not amount to a problem of “freedom of information” but it is certainly one of “quality of information”.

Recently, Joseph E. Stiglitz, a recipient of the Nobel Memorial Prize in Economic Sciences in 2001, has warned against the risk of a short-sighted outlook. True, John Maynard Keynes said, “long run is a misleading guide to current affairs. In the long run we are all dead,” but any “sugar high” is going to vanish when the same unsolved problems fire back. And a major one is exactly the lack of transparency.

In God Bless You, Mr. Rosewater, Kurt Vonnegut recalls the free enterprise system as having “the sink-or-swim justice of Caesar Augustus built into it,” and hence the need for “a nation of swimmers, with the sinkers quietly disposing of themselves.”

The “translation industry” is aggressively competing on prices and volumes, so boasting a growth in revenues and volumes, but not in profits and compensation, along with even greater expectations for the next year, is not really a smart prospect. In fact, the larger translation companies may be collecting revenues in the short term in many ways, and a short-sighted outlook is the fastest lane to the grave of the whole industry. On the other hand, despite the paeans sung to the alleged smartness of the so-called champions of this industry, greed seems to be what has been driving them more than an entrepreneurial and innovative spirit. Simply put, can anyone name the Elon Musk of the translation industry? And, to be totally clear, leveraging public funding to build a platform and exploit cheap labor may be cunning, but it is no entrepreneurship.

Unfortunately, these “champions” have managed to make a business trend prevail to make the industry and its products and services irrelevant, so not even learning how to swim with sharks could be enough. The whole industry keeps chanting the same old litany, with the same people telling the same old stories in the same old places to the same people who are still surprisingly eager to listen.
On the other hand, there is also the same old vox clamantis in deserto, while the landscape keeps changing.

Blockchain again


Take blockchain for example. Blockchain is still largely considered an immature technology. The market is less than embryonic, with no clear recipe for implementation and very few, unstructured experimental solutions. Despite many publications, no clear and undisputed strategic evaluation of blockchain has emerged yet and many companies are reconsidering their investments. However, the hype has infected the translation industry through a breath of wind that has traveled the seas and industry events and media.

As many experts, analysts and observers have noticed, blockchain is not as efficient as traditional databases: it is much hungrier for energy, processing power, and even storage. Also, a blockchain is only as good as the information that is put in it. In other words, nothing in a blockchain checks whether the data written to it is accurate.
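To make the last point concrete, here is a minimal sketch of a hash-linked ledger in Python. It is a toy illustration, not a real blockchain: there is no network, consensus, or signing, and the function names (`block_hash`, `append_block`, `is_untampered`) and the translation-unit records are hypothetical, invented only for this example. It shows that such a ledger happily stores a wrong translation, and that the only thing it can later verify is that stored entries have not been altered.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append any data at all; nothing here checks whether it is accurate."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    index = len(chain)
    chain.append(
        {
            "index": index,
            "data": data,
            "prev": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        }
    )


def is_untampered(chain):
    """Check only that blocks still match their stored hashes and links."""
    prev_hash = GENESIS
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["data"], block["prev"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


ledger = []
append_block(ledger, {"source": "Hello", "target": "Bonjour"})
# A wrong translation is accepted just as readily as a right one.
append_block(ledger, {"source": "Goodbye", "target": "Bonjour"})

print(is_untampered(ledger))  # True: the chain is internally consistent

ledger[0]["data"]["target"] = "Salut"  # edit a stored record after the fact
print(is_untampered(ledger))  # False: the later edit is detected
```

The check passes for the ledger that contains the bad entry and fails only after a stored record is modified, which is the sense in which the ledger audits tampering rather than quality.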

The translation industry and the translation profession positively need to be modernized and to solve perennial problems, but blockchain can hardly be seen as the ultimate technology solution and used as a banner in this respect. People who do so are just cynically using blockchain purely for its potential reputational value, to draw some attention, prove how innovative they are, and eventually attract investments. They may even start some kind of an implementation, though it is likely to be little more than a proof of concept that they probably know will not benefit them in any way.

As a matter of fact, if and when practical applications prove possible, the only benefit those very people are likely to draw from their significant investments will be a highly volatile lead, a competitive advantage that does not pay.

At the moment, there seem to be six different categories of business applications addressing two major needs. It remains to be seen where industry players might find their applications: Storing reference data, with a view on ownership? Smart contracts? Maybe this might be a major application, but it is not among those that are being figured out at the moment. What about payment? It is the most interesting application, but it is not being figured out either.

Blockchain might be used to verify the identity of the person(s) with writing privilege, but as long as no control exists over the information being written, and the information itself remains unchecked, this feature will prove useless as it would always be possible to write fraudulent data to the blockchain. Therefore, a mechanism should be devised to stamp the digital signature of the legitimate owner of each digital item as a unique code that stays with it all the way through the supply chain.

Intellectual property (IP) is another issue here. As long as no mechanism is available to unmistakably identify the owner of the content, investments in blockchain for content transactions would be highly capricious. And an open blockchain to store this information and help to integrate different organizations and systems still seems a long way off given the absence of standards and of any ongoing attempts to identify one.

Blockchain could support a validated register of qualified practitioners with proven experience in a specific field, whose credentials would be validated to allow customers to quickly find a qualified workforce. However, one of the many things that can never be provided is “trust” in transactions.

Will there ever be a sticker on an item that is valid and complete enough? What about the corporate and personal integrity of the people behind the processes in place? After all, Bitcoin’s success is mostly due to anonymity and its use by criminals.

Blockchain is not going to be disruptive to the translation industry. Once mature it will allow certain things to be done better and more efficiently, but it will not do to the translation industry what digital photography did to Kodak.

More with less

 

What does transparent, honest and independent information have to do with blockchain? And with the translation industry? As for the blockchain, the trade press, even more than the business or the mainstream press, should help debunk hype rather than fuel the frenzy on which they thrive by helping, more or less explicitly, the club of the usual suspects. Also, with their production, those industry sources that should be taken as authoritative end up looking generally unreliable and this creates a general climate of distrust. As a result, the mainstream media do not turn to these sources to gather the information to process for their audiences. Eventually, the translation industry is being devalued even more than it already is.

It comes as no surprise that the “do-more-with-less” mantra has been unquestioningly borrowed by the translation industry in the past few years, together with many other baloneys such as agile, lights-out, augmented, etc. when the many marketing geniuses crowding the industry could have invented something better than a flabby ‘transcreation.’ Of course, they should have done their homework first, and this would have spared them the poor figure of using wrong quotes and attributing them to the wrong people.

But no. Most industry players still believe the same old little fairy tale they have been told for years that the more companies expand globally, the more they need to pay attention to local language expectations in the new markets they are trying to enter, the more they need to pay close attention to the linguistic, cultural, and even socio-economic nuances of these markets, and that this makes translation a major part of a company’s global growth strategy.

 

Beyond futility

 

The greatest damage to the industry as a whole has been the explosion of true ‘religion wars’ pervading the industry with the abandonment of any objective, accurate and unemotional approach to problems.

This prevents the understanding of some otherwise simple and obvious phenomena. Globalization, for example, has been underway for almost three decades now while the growth of international trade started soon after the end of WWII. So, why is the industry still waiting for global companies “to pay close attention to the linguistic, cultural, and even socio-economic nuances” of international markets?

Also, content growth is exponential, while revenue growth in the translation industry is linear, with a slope below 10°. Why, then, are revenues still that important to measure? How does translation demand correlate with content growth? What is the correlation between 99.99% of content being machine translated and the supposedly growing revenues of the industry’s major players?

Wait! Making effective translation a major part of a company’s global growth strategy is “a daunting task that is near impossible without technological leverage and momentum.” Now everything’s clear: this is what happens when you try to bullshit the bullshitter. And they buy it.

This is a fairy tale, and it is where the gig economy comes into play. In fact, it is being peddled in industry events and media by the very same CEOs who are not used to doing their homework and would misquote Henry Ford. Maybe they don’t even know that he did not invent either the assembly line, or interchangeable parts, or the automobile. Anyhow…

These very same people have no credible answer when they are asked why the translation industry has been familiar with the gig economy from inception, and yet still talks about innovation. And all laugh? Not really, indeed. As Marion McGovern, author of the otherwise forgettable Thriving in the Gig Economy, points out, “The advent of the digital platform world has altered the talent supply chain.” And rather than pushing out traditional staffing agencies, “digital platforms are becoming sourcing engines for them. Big companies may use both the staffing firm and then for urgent or unforeseen projects, turn to the platforms for options.”

Beyond blockchain and baloneys, the translation industry is stuck with obsolete models. Companies in other industries have been measuring performance for years as a way of adding value to their organizations; LSPs, no matter the size, are still at error-catching quality management.

 

Content flood

 

Given the inability of industry players to keep up with customer expectations for technological and process innovation, bringing translation into the content pipeline has had the effect of further commoditizing it. And commoditization will continue, with expectations going up and costs going further down.

The case of Amazon’s Chinese-branch employees leaking data for bribes is emblematic of the risks underpaid workforces might be emboldened to take. And with production business translation being more and more automated by ever better neural machine translation engines things might get a little dicey even in the translation industry.

Many of the larger translation buyers have been developing, managing and delivering multilingual content in different formats and on different devices for years now. The next step will be fitting everything together into a single workflow. Emerging technology will finally fully enable Sir Tim Berners-Lee’s dream of a Semantic Web. In fact, for more than a decade, a significant fraction of Web domains has been generated from structured databases. Today, many Web domains contain Semantic Web (e.g. schema.org) markup.

New models are being devised for content enrichment or augmentation (i.e. integrating different content aspects and simple resources with semantics and knowledge). This mainly consists of metadata processing to exploit information and allow end users to navigate by semantic annotations. Content enrichment has been a very expensive task, typically performed only for valuable content. Today, it can be done automatically by combining the analytics of multiple data sources. In document enrichment pipelines, each document is enriched, analyzed and/or linked with additional data to improve navigation and filtering for further analyses.

Also, to date, content generators are already available that take existing content and rewrite and shuffle it around to create new content, while many companies are working on Natural Language Generation, an AI sub-discipline to convert data into text used in customer service to generate reports and market summaries. It is being investigated for creating content for websites or blogs from a variety of sources including answers from questions on social media and forums.

With text analytics to understand the structure of sentences, as well as their meaning and intention and NLP to process unstructured data, full-blown computer-only-generated content will soon be a reality.

When these technologies are fully implemented, the Semantic Web will lead to a further upsurge in content production.

 

Translation redefined

 

This will necessarily lead to redefining the nature of translation and the role of linguists to leverage the value in enriched (intelligent) content. It’s time for applied linguists (e.g. translators) to re-think their role in the language industry. Tim Berners-Lee’s idea was a bit futuristic (if not actually visionary) when he launched it two decades ago.

The coming future will see ‘applied linguists’ mostly employed as post-processors. Machine translation will do all the jobs, even in creative tasks and those who are still called translators today will have to confirm or, at worst, polish automatic output for cultural appropriateness. Some will re-engineer and re-organize content, more or less as digital indexers and curators, and others will clean and polish data to feed machines. Creativity will no longer exist by definition, it will depend on each one’s ability to exploit his/her knowledge and skills.

So, it is really a good time for a change. Blockchain is no change, it may possibly be an improvement, but it will keep us doing things the way they have been done so far, in a different shape. You might wish to reclaim the things you believe have been taken away from you, but this will never happen. You can stick to obsolete models and expect to keep the same old stances, advance the same old claims, and work in the same old way you have been used to, but this won’t take you anywhere. You can only try to get new ones, and now is the time to find a way to get them.

So, is the translation industry attracting more investments than in its burgeoning heyday? And where is the money being made? In the meantime, you had better row and learn to swim.

=======================


Luigi Muzii has been in the "translation business"  or "the industry" since 1982 and has been a business consultant since 2002, in the translation and localization industry through his firm. He focuses on helping customers choose and implement best-suited technologies and redesign their business processes for the greatest effectiveness of translation and localization-related work.

This link provides access to his other blog posts.

1 comment:

  1. Good points, Luigi. The rumor mill says blockchain localization apps are in development with expected first deliveries next year. Let's see what they offer.

    Blockchain systems have, in fact, been hacked. My user-level experience with cryptography in the intelligence community leads me to believe that inevitably, hackers and experts will continue to find holes in the fundamental technology, and they’ll continue to be plugged. That inevitability doesn't scare me. My concerns focus on the more fundamental issue of how data gets in and out of the secure environment.

    Kai Stinchcombe, who describes himself as "the opposite of a futurist," is better-versed at blockchain than I ever will be. He said, “People treat blockchain as a ‘futuristic integrity wand’ – wave a blockchain at the problem, and suddenly your data will be valid.”

    I’m confident blockchain technology at its best will protect data (TUs, metadata, etc.) from hackers attacking data in transit through a pipeline. Is this really an issue? How many hackers are attacking our TMs? The real question is, how far can we extend the ends of the protected pipeline? Ultimately, data has to go in and out.

    Back to Stinchcombe, “It’s true that tampering with data stored on a blockchain is hard, but it’s false that blockchain is a good way to create data that has integrity... Blockchain systems do not magically make the data in them accurate or the people entering the data trustworthy, they merely enable you to audit whether it has been tampered with.”

    With cryptocurrencies, governments, credit card companies and the like protect blockchain’s endpoints. It’s pretty simple for them to validate that your real-world bank account has a valid real-world balance before transferring it to the blockchain. In our context, language data entered into a blockchain system does not magically become good data simply because it’s wrapped in blockchain. Conversely, blockchain contributes nothing to the holy-grail challenge of identifying good translations.

    Stinchcombe’s articles take a much harsher stance against blockchain than I do, and they spark many questions that I can’t touch on in this comment:

    https://medium.com/@kaistinchcombe/decentralized-and-trustless-crypto-paradise-is-actually-a-medieval-hellhole-c1ca122efdec
