
Monday, January 14, 2019

A Vision for Blockchain in the Translation Industry

Happy New Year

This is yet another post on Blockchain, a guest post by Robert Etches, who presents his vision of what Blockchain might become in the translation industry. A vision, by definition, implies a series of possibilities, and in this case, quite revolutionary possibilities, but does not necessarily provide all the details of how and what. Of course, the way ahead is full of challenges and obstacles which are much more visible to us than the promised land, but I think it is wise to keep an open mind and watch the evolution even if we are not fully engaged or committed or in agreement. Sometimes it is simply better to wait and see than to come to any final conclusions.

It is much easier to be dismissive and skeptical of upstart claims of fundamental change than to allow for a slim but real possibility that some new phenomenon could indeed be revolutionary. I wrote previously about CEO shortsightedness and what I called Roryisms. Here is a classic one from IBM that shows how they completely missed the boat because of their hubris and old-style thinking. Gerstner is, however, credited with turning around a flailing mainframe business. The cost of missing the boat can be significant for some, and we need only look at relative stock price and market value improvements over time (as this is how CEO performance is generally measured) to understand how truly clueless our boy Lou and his lieutenants at IBM, in general, were when they said this. The culture created by such a mindset can last decades as we see by the evidence. Culture is one of a company’s most powerful assets right until it isn’t: the same underlying assumptions that permit an organization to scale massively constrain the ability of that same organization to change direction. More distressingly, culture prevents organizations from even knowing they need to do so.

IBM’s chairman minimized how Amazon might transform retail and internet sales all the way back in 1999.

“Amazon.com is a very interesting retail concept, but wait till you see what Wal-Mart is gearing up to do,” said [IBM Chairman Louis V. Gerstner Jr. in 1999]. Mr. Gerstner noted that last year IBM’s Internet sales were five times greater than Amazon’s. Mr. Gerstner boasted that IBM “is already generating more revenue, and certainly more profit, than all of the top Internet companies combined.”


AMZN Stock Price Appreciation of 36,921% versus IBM's 211% over 20 years

IBM is the flat red line in the chart above. IBM looks just as bad against Microsoft, Google, Apple, Oracle, and many others who had actual innovation.


Market value as of January 11, 2019:
Amazon: $802 billion (7.3X higher)
IBM: $110 billion

I bring attention to this because I also saw this week that IBM filed more patents than any other company in the US in 2018; Samsung was second. In fact, IBM has been the top patent filer in the US for every year from 1996 to 2018. BTW they are leaders in blockchain patents as well. However, when was the last time that ANYBODY associated IBM with innovation or technology leadership? 1980? Maybe they just have some great patent filing lawyers who understand the PTO bureaucracy and know how to get their filings pushed through. In fact, there have been some in the AI community who felt that IBM Watson was a joke and that the effort did not warrant serious credibility and respect. Oren Etzioni said this: “IBM Watson is the Donald Trump of the AI industry – outlandish claims that aren’t backed by credible data.” Trump is now a synonym for undeserved self-congratulation, fraud, and buffoonery, a symbol for marketing with false facts. IBM is also credited with refining and using something called FUD (fear, uncertainty, and doubt) as a deliberate sales and marketing misinformation tactic to keep customers from using better, more innovative but lesser-known products. We should not expect IBM to produce any breakthrough innovation in the emerging AI-first, machine-learning-everywhere world we see today, and most expect the company will be further marginalized in spite of all the patent filings.

Some of you may know that IBM filed the original patents for Statistical Machine Translation, but it took Language Weaver (SDL), Google and Microsoft to really make it come to life in a useful way. IBM researchers were also largely responsible for conceiving of the BLEU score to measure MT output quality that was quite useful for SMT. However, the world has changed and BLEU is not useful with NMT. I plan to write more this year on how BLEU and all its offshoots are inadequate and often misleading in providing an accurate sense of the quality of any Neural MT system.

It is important to be realistic without denying the promise as we have seen the infamous CEOs do. Change can take time and sometimes it needs much more infrastructure than we initially imagine. McKinsey (smart people who also have an Enron and mortgage securitization promoter legacy) have also just published an opinion on this undelivered potential, which can be summarized as:

"Conceptually, blockchain has the potential to revolutionize business processes in industries from banking and insurance to shipping and healthcare. Still, the technology has not yet seen a significant application at scale, and it faces structural challenges, including resolving the innovator’s dilemma. Some industries are already downgrading their expectations (vendors have a role to play there), and we expect further “doses of realism” as experimentation continues."

While I do indeed have serious doubts about the deployment of blockchain in the translation industry anytime soon, I do feel that if it happens it will be driven by dreamers, rather than by process crippled NIH pragmatists like Lou Gerstner and Rory. These men missed the obvious because they were so sure they knew all there was to know and because they were stuck in the old way of doing things. While there is much about blockchain that is messy and convoluted, these are early days yet and the best is yet to come.

Another dreamer, Chris Dixon, laid out an even greater vision for Blockchain when he recently said:

The idea that an internet service could have an associated coin or token may be a novel concept, but the blockchain and cryptocurrencies can do for cloud-based services what open source did for software. It took twenty years for open source software to supplant proprietary software, and it could take just as long for open services to supplant proprietary services. But the benefits of such a shift will be immense. Instead of placing our trust in corporations, we can place our trust in community-owned and -operated software, transforming the internet’s governing principle from “don’t be evil” back to “can’t be evil.”

========

2018 was a kick-off year for language blockchain enthusiasts. At least five projects were launched[1], there was genuine interest expressed by the industry media, and two webinars and one conference provided a stage for discussion on the subject[2]. Then it all went very quiet. So, what’s happened since? And where are we today?

Subscribers to Slator’s latest megatrends[3] can read that it’s same same in the language game for 2019: NMT, M&A, CAT, TMS, unit rates … how we love those acronyms!

On the world stage, people could only shake their heads in disbelief regarding the meteoric rise of the value of cryptocurrencies in 2017. However, in 2018 those same people relished a healthy dish of schadenfreude as exchange rates plummeted and the old order was restored with dollars (Trump), roubles (Putin), and pound sterling (Brexit) back in vogue.

In other words, for the language industry and indeed for the world at large, “better the devil(s) we know” appears to be the order of the day.

There is nothing surprising in this. Despite all the “out of your comfort zone” pep talks by those Cassandras of Change[4], the language industry continues to respect the status quo, grow and make money[5]. Why alter a winning formula? And certainly, why even consider introducing a business model that expects translators to work for tokens?! Hello, crazy people!!!

But maybe, just maybe, there was method in Hamlet’s madness[6] and Apple was right when they praised the crazy ones[7]?

Let’s take a closer look at the wonderful world of blockchain and token economics, and how they are going to change the language industry … also the language industry.

Pinning down the goal posts

Because they keep moving! Don’t take my word for it. Here’s what those respected people at Gartner wrote in their blockchain-based transformation report[8] in March 2018:

Summary

While blockchain holds long-term promise in transforming business and society, there is little evidence in short-term reality.

Opportunities and Challenges

Blockchain technologies offer new ways to exchange value, represent digital assets and implement trust mechanisms, but successful enterprise production examples remain rare.
Technology leaders are intrigued by the capabilities of blockchain, but they are unclear exactly where business value can be achieved in the enterprise context.
Most enterprise blockchain experiments are an attempt to improve today's business process, but in most of those cases, blockchain is no better than proven enterprise technologies. These centralized renovations distract enterprises from other innovative possibilities offered by blockchain.

And now here’s a second overview, also from Gartner, this time their blockchain spectrum report[9] from October 2018:

Opportunities and Challenges

Blockchain technologies offer capabilities that range from incremental improvements to operational models to radical alterations to business models.
The impact of blockchain’s trust mechanisms and interaction paradigms extends beyond today’s business and will affect the economy, society and governance.
Many interpretations of blockchain today suffer from an incomplete understanding of its capabilities or assume a narrow scope.

The seven-month leap from “little evidence in short-term reality” to “will affect the economy, society and governance” is akin to a rocket-propelled trip across the Grand Canyon! Little wonder that traditional businesses don’t know where to start even looking into this phenomenon, never mind taking on a new business model that basically requires emptying the building of 90% of hardware, software and, more important, people.

But!

Why does Deloitte have 250 people working in their distributed ledger laboratories? Because when immutable distributed ledgers become a reality they will put 300,000 people out of work at the big four accountancy companies[10].

Why are at least 26 central banks looking into blockchain? Because there’s a good chance that private banks[11] will be superfluous in 10-15 years’ time and we will all have accounts with central banks.

Or there will be no banks at all …

Let’s just take a second look at that Gartner statement:

The impact of blockchain’s trust mechanisms and interaction paradigms extends beyond today’s business and will affect the economy, society and governance.

Other than basically saying blockchain will change “everything”, the sentence mentions two factors that are core to blockchain: trust and interaction.

Trust. What inspires me about blockchain is its transparency. A central tenet of blockchain is its truth gene. In a world in which even the most reliable sources of information are labeled as fake, blockchain’s traceability – its judgment in stone as to who did what, when and for whom – makes it a beacon of light.

Just think: what if we could use this capability to resolve the endless quality issue? What if the client always knew who had translated what – and could even set up selection criteria based upon irrefutable proof of quality from previous assignments? It is no surprise to learn that many blockchain projects are focusing on supply chain management.
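To make the traceability idea a little more concrete, here is a minimal sketch of a hash-chained log of who translated what, when and for whom. It is an illustration only: the field names and the helper functions (`record_hash`, `append_entry`, `verify`) are hypothetical, and a single in-memory list is not a blockchain, since there is no distribution or consensus. It only shows why tampering with an earlier record becomes detectable once each entry commits to the hash of the one before it.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Return a SHA-256 hex digest of a record, serialized deterministically."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(ledger: list, translator: str, segment_id: str,
                 client: str, when: str) -> dict:
    """Append a who/what/when/for-whom entry chained to the previous entry."""
    prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
    body = {
        "translator": translator,   # hypothetical field names throughout
        "segment_id": segment_id,
        "client": client,
        "when": when,
        "prev_hash": prev_hash,
    }
    entry = dict(body, hash=record_hash(body))
    ledger.append(entry)
    return entry


def verify(ledger: list) -> bool:
    """Recompute every hash and check every link back to the start."""
    prev_hash = "0" * 64
    for entry in ledger:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or record_hash(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


ledger = []
append_entry(ledger, "translator_a", "seg-001", "client_x", "2019-01-09")
append_entry(ledger, "translator_b", "seg-002", "client_x", "2019-01-10")
print(verify(ledger))                      # True

ledger[0]["translator"] = "someone_else"   # rewrite history
print(verify(ledger))                      # False
```

In a real distributed ledger the interesting part is that many independent parties hold and agree on the chain; the hashing above is only the piece that makes after-the-fact edits visible.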

Interaction is all about peer-to-peer transactions through Ethereum smart contracts. It’s not just the central banks that will be removing the middlemen. Unequivocal trust opens the door to interact with anyone, anywhere. To a global community. These people of course speak and write in one or more of approximately 6,900 languages, so there’s a market for providing the ability for these “anyones” to speak to each other in any language. What a business opportunity! And what a wonderful world it would be!

Cryptocurrencies and blockchain: peas and carrots

You’ve gotta love Forrest Gump – especially now we know Jenny grew up to become Claire Underwood[12] 😊

Just as Jenny and Forrest went together like peas and carrots, so do tokens and blockchain.

Unfortunately, this is where many jump off the train. One thing is accepting the relevance of some weird ledger technology that is tipped to become the new infrastructure for the Internet, another is trading in hard-won Venezuelan dollars for some sort of virtual mumbo jumbo!

All fiat currencies are a matter of trust. None is backed by anything more than our trust in a national government behaving responsibly. In 2019 that is quite a scary thought – choose your own example of lemming-like politicians.
All currencies (fiat or crypto) are worth what the market believes them to be worth. In ancient times a fancy shell from some far-off palmy beach was highly valued in the cold Viking north. Today not so. At its inception, bitcoin was worth nothing. Zero. Zip. Today[13] 1 BTC = €3,508.74. Because people say so.

Today, there is absolutely no reason why a currency cannot be minted by, well, anyone. There is indeed a school of thought that believes there will be thousands of cryptocurrencies in the not-too-distant future. If we look at our own industry, we have long claimed that translation memories and termbases have a value. Why can that value not be measured in a currency unique to us and with an intrinsic value that we all respect and which is not subject to the whims of short-term political aspirations? Why can’t linguistic assets be priced in a Language Coin?

Much has already been written about the concept of a token economy, though little better than the following:

An effective token strategy is one where the exchange of a token within a particular economy impacts human economic behavior by aligning user incentives with those of the wider Community.[14]

Think about this in the context of the language industry. What if the creation and QA of linguistic assets were tied to their own token? What if you – a private company, a linguist, an LSP, an NGO, an intranational organization – were paid in this token for your data, and the value of this data grew and grew year on year as it was shared and leveraged as part of a larger whole – the Language Community[15]? What if linguists were judged by their peers and their reputations were set in stone? What if everyone were free to charge whatever hourly fees they chose, and word rates and CAT discounts were a relic of the past?

This is why blockchain feeds the token economy and why the token needs blockchain. Peas and carrots!

To end with the words of another Cassandra – a trendy one at that: Max Tegmark:
If you hear a scenario about the world in 2050 and it sounds like science fiction, it is probably wrong; but if you hear a scenario about the world in 2050 and it does not sound like science fiction, it is certainly wrong.[16]

The pace of change will continue to accelerate exponentially, and I believe blockchain will be one of the main drivers.

Already in 10-15 years, there will be some household (corporate) names and technologies that do not exist or have only just started today. And by 2050 the entire finance, food, and transport sectors (to name the obvious) will be ‘blockchained’ beyond recognition.

At Exfluency, we see multilingual communication as an obvious area where a token economy and blockchain will also come out on top; I’m sure that other entrepreneurs in a myriad of other sectors are coming to similar conclusions. It’s going to be exciting!

Robert Etches
CEO, Exfluency
January 2019

[1] Exfluency, LIC, OHT, TAIA, TranslateMe
[2] TAUS Vancouver, and GALA and TAUS webinars
[3] https://slator.com/features/reader-polls-pay-by-hour-deepl-conferences-and-2019-megatrends/
[4] Britta Aagaard & Robert Etches, This changes everything, GALA Sevilla 2015; Katerina Pastra, Krzystof Zdanowski, Yannis Evangelou & Robert Etches, Innovation workshop, NTIF 2017; Peggy Peng & Robert Etches, The Blockchain Conversation, TAUS 2018; Jochen Hummel, Sunsetting CAT, NTIF 2018.
[5] I am painfully aware that not everyone in the food chain is making money …
[6] Hamlet, II.ii.202-203
[7] https://www.youtube.com/watch?v=8rwsuXHA7RA
[8] https://www.gartner.com/doc/3869696/blockchainbased-transformation-gartner-trend-insight

[9] https://www.gartner.com/doc/3891399/blockchain-technology-spectrum-gartner-theme
[10] P.221 The Truth Machine, by Paul Vigna and Michael J. Casey
[11] Ibid. pp163-167

[12] https://en.wikipedia.org/wiki/Robin_Wright
[13] 9 January 2019
[14] P.69 The Truth Machine
[15] See Aagaard & Etches This changes everything for the sociological and economic importance of communities, the circular society, and the sharing society.
[16] Life 3.0 by Max Tegmark

=================================================================

Robert Etches
CEO, Exfluency

A dynamic actor in the language industry for 30 years, Robert Etches has worked with every aspect of it, achieving notable success as CIO at TextMinded from 2012 to 2017. Always active in the community, Robert was a co-founder of the Word Management Group, the TextMinded Group, and the Nordic CAT Group. He served four years on the board of GALA (two as chairman) and four on the board of LT-Innovate. In a fast-changing world, Robert believes there has never been a greater need to implement innovative initiatives. This has naturally led to his involvement in his most innovative challenge to date: As CEO of Exfluency, he is responsible for combining blockchain and language technology to create a linguistic ledger capable of generating new opportunities for freelancers, LSPs, corporations, NGOs and intergovernmental institutions alike.

Monday, May 8, 2017

Artificial Intelligence in the Language Industry: We’re Asking the Wrong Questions

This is an interesting guest post by Gábor Ugray on the potential of AI in the translation business. We hear something about artificial intelligence almost every day now and are continually told that it will change our lives. AI is indeed helping to solve complex problems that even a year ago were virtually unthinkable. Mostly, these are problems where big data and massive computing can come together to produce new kinds of efficiencies and even production solutions. However, there are dangers and risks too, and it is wise to be aware of some of the basic driving forces that underlie these problems. As we have seen with self-driving cars, sometimes things don't quite work as you would expect. These mishaps and unintended results can happen when we barely understand what and how the computer "understands". Machine learning is not perfect learning and much of what is learned through deep neural nets, in particular, is kind of mysterious, to put it nicely.

We have seen that many in the translation industry have more often misused MT, or used it abusively to bully translators into accepting lower rates and demeaning work, than used it where it actually makes sense. We are just beginning to emerge into a stage where we see more informed and appropriate use of MT; in the very recent past, however, many translators have already been bloodied. Is AI the new monster we will use to terrorize the translator, or is it a potential work assistant that actually enhances and improves the translation work process? This will depend on us and what we do, and it is good to see Gábor's perspective on this as he is one of the architects of how this might unfold.

Gábor warns us about some key issues related to AI and points us towards asking the right questions to guide enduring and positive change and deployment. We should understand the following:

AI is almost completely dependent on training data and we know data is often suspect.
Improperly used, there is a risk of inadvertent or deliberate dehumanization of work as in early PEMT use.

Neural networks are closed systems. The computer is learning something out of a data set in an intelligent but incomprehensible and obscure way to a human eye and human mind. But Google claims they are able to visualize the produced data as described in the zero-shot translation post where they say:

Within a single group, we see a sentence with the same meaning but from three different languages. This means the network must be encoding something about the semantics of the sentence rather than simply memorizing phrase-to-phrase translations. We interpret this as a sign of existence of an interlingua in the network.

Is this artificial intelligence or is this just another over-the-top claim of "magical" scientific success? If we cannot yet define intelligence for humans, how can we even begin to do so for machines? More often than not, AI is not much more than optimized data-driven task systems, which can be very impressive, but can we really say this is intelligence? A few are quite wary about this whole AI trend. Here is some discussion on shit really going down, driven by AI which has gone awry.

So hopefully here is a question that makes sense to Gábor: "What needs to happen to make AI-based technology trustworthy and useful in the language industry?"

I do basically believe that technology, wisely used, can indeed improve the human condition, but we are surrounded by examples of how things can go wrong without some forethought, and the questions that Gábor points to are indeed worth asking. For those who want to dig deep into the big picture on AI, I recommend this article, though I have some reservations about the second part.

As the BBC said recently: Machines still have a long way to go before they learn like humans do – and that’s a potential danger to privacy, safety, and more.

============

I was honored when Kirti asked me if I would contribute to eMpTy Pages about TMS and intelligent data technologies. I’ve been thinking about this for nearly two months now until I finally realized what’s been holding me back. I find it difficult to attach to most of the ongoing discourse about AI, and that’s because I believe the wrong questions are being asked.

Those questions usually revolve around: What part of life can I disrupt through AI? How can my business benefit from AI? Or, if you prefer the fear angle: Will my company be disrupted out of existence if I don’t jump on the AI train in time? Will my job be made obsolete by thinking machines?

My concern is different. But I won’t tell you until the end of this post.

It’s only as good as your data

I found Kirti’s remark in his recent intro very insightful: “Machine learning” is a fancy way of saying “finding patterns in data.”

That resonates with the way I think about MT, whether it’s the statistical or neural flavor. In simple terms, MT is a tool to extrapolate from an existing corpus to get leverage for new content. If you think about it, that’s what translation memory does, too, but it stops at fuzzy matches, concordance searches, and some amount of sub-segment leverage.
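As a rough illustration of the fuzzy-match end of that spectrum, here is a minimal sketch that looks up the closest stored segment for a new sentence. It assumes a toy in-memory translation memory and uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure; real CAT tools use their own scoring, and the function name `best_fuzzy_match` and the 0.75 threshold are made up for this example.

```python
from difflib import SequenceMatcher


def best_fuzzy_match(source: str, memory: dict, threshold: float = 0.75):
    """Return (stored_source, stored_target, score) for the closest TM entry,
    or None if nothing reaches the threshold."""
    best = None
    for stored_source, stored_target in memory.items():
        # Character-level similarity in [0, 1]; a stand-in for a real TM score.
        score = SequenceMatcher(None, source.lower(), stored_source.lower()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (stored_source, stored_target, score)
    return best


# Toy translation memory: source segment -> stored translation.
tm = {
    "Press the red button to stop the machine.":
        "Drücken Sie den roten Knopf, um die Maschine anzuhalten.",
    "Close the cover before starting.":
        "Schließen Sie die Abdeckung vor dem Start.",
}

match = best_fuzzy_match("Press the green button to stop the machine.", tm)
if match:
    stored_source, stored_target, score = match
    print(f"{score:.0%} match: {stored_target}")
```

The lookup only ever returns something a human already translated; the translator still has to change "roten" to "grünen". That is the sense in which a TM stops at leverage, while MT tries to extrapolate beyond it.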

Statistical MT goes far beyond that, but at a higher cost: it needs more data and more computation. Neural MT ups the ante yet again: it needs another order of magnitude more computational power and data. The concept has been around for decades; the “deep learning” aka “neural network” explosion of the past few years has one simple reason. It took until now for both the data and the computational capacity to become available and affordable.

The key point is, AI is machine-supported pattern extraction from large bodies of data, and that data has to come from somewhere. Language data comes in the form of human-authored and human-translated content. No MT system learns a language. They process text to extract patterns that were put in there by humans.

And data, when you meet it out in the wild, is always dirty. It’s inconsistent, in the wrong format, polluted with stuff you don’t want in there. Just think of text from a pair of aligned PDFs, with the page numbers interrupting in all the wrong places, OCR errors, extra line breaks, bona fide typos and the rest.

So, even on this elementary level, your system is only as good as your data, not to mention the quality of the translation itself. And this is not specific to the translation industry: the job reality of every data scientist is 95% gathering, cleaning, pruning, formatting and otherwise massaging data before the work can even begin.
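To give a feel for the most basic layer of that cleanup, here is a small sketch that removes stray page-number lines and rejoins hard-wrapped lines from text extracted out of a PDF. The pattern and the function name `clean_extracted_text` are assumptions made for this example; it does nothing about OCR errors or typos, it flattens paragraph breaks, and it would also drop a legitimate line that consists only of a number.

```python
import re

# A line that is nothing but a page number, e.g. "12" or "Page 12".
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?\d{1,4}\s*$", re.IGNORECASE)


def clean_extracted_text(raw: str) -> str:
    """Drop page-number lines and blank lines, then rejoin wrapped lines."""
    kept = []
    for line in raw.splitlines():
        line = line.strip()
        if not line or PAGE_NUMBER.match(line):
            continue
        kept.append(line)
    # Collapse any leftover runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", " ".join(kept))


raw = "The valve must be\n12\nclosed before the  pump\nis started.\n"
print(clean_extracted_text(raw))
# The valve must be closed before the pump is started.
```

Every rule like this is a judgment call about one particular source, which is a large part of why the cleaning work takes so long.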

Do we have the scale?

AI, MT and machine learning are often used synonymously with automation, but in reality, they are far from that. As Kirti explained in another intro, in order to get results with MT you need technical expertise, structure, and processes beyond technology per se. All of these involve human effort and skills, and pretty expensive skills too.

So the question is: at what point does an LSP or an enterprise get a positive return on such an investment? How much content must first be produced by humans; what is the cost of training the MT system; what is the benefit per million words (financial, time or otherwise)? How many million words must be processed before you’re in the black?
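One simple way to frame the last of those questions is a break-even calculation: a one-off setup cost divided by the saving per word. The sketch below assumes that simplified model; the function name and all the figures are invented for illustration and are not real industry numbers.

```python
import math


def break_even_words(setup_cost: float, human_cost_per_word: float,
                     mt_cost_per_word: float) -> float:
    """Words that must be processed before per-word savings repay the setup cost.

    mt_cost_per_word is the all-in running cost of the MT workflow
    (post-editing, hosting, maintenance), in the same currency as the others.
    """
    saving = human_cost_per_word - mt_cost_per_word
    if saving <= 0:
        return math.inf  # the MT workflow never pays for itself
    return setup_cost / saving


# Made-up figures: 50,000 setup, 0.12 per word human, 0.08 per word with MT.
words = break_even_words(setup_cost=50_000,
                         human_cost_per_word=0.12,
                         mt_cost_per_word=0.08)
print(f"Break-even at roughly {words / 1e6:.2f} million words")
```

Even this toy model makes the dependence obvious: halve the per-word saving and the volume needed to get into the black doubles.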

No matter how I look at it, this is an expensive, high-value service. It doesn’t scale in a human-less way, the way software does.

Does the translation industry have the same economy of scale that a global retailer or a global ad broker disguised as a search engine does? Clearly, a number of technology providers, Kilgray among them, are thriving in this market. But I also think it’s delusional to expect the kind of hockey-stick disruption that is the stuff Silicon Valley startup dreams are made of.

Let’s talk about the weather

I have been focusing mostly on MT, but that’s misleading. I do think there are many other ways machine learning will contribute to how we work in the translation industry. Most of these ways are as-yet uncharted, which I think is a consequence of the industry’s market constraints.

I’ll zoom out from our industry now. I checked how many results Google finds if I search for a few similar phrases.

-- AI in weather forecasting: 1.29M
-- AI in language processing: 14.2M
-- AI in police: 69.7M

Of the three, I’d say without hesitation that weather forecasting lends itself best to advanced AI. Huge amounts of data: check. Clear feedback on success: check. Much room for improvement: check. And yet, going by what’s written on the Internets, that’s not what society thinks.

There is a near-universal view that technology is somehow neutral and objective, which I think is blatantly false. Technology is the product of a social and economic context, and it is hugely influenced by society’s shared beliefs, mythologies, fears, and desires.

Choose your mythology wisely

I am an odd one: in addition to AdBlock and Privacy Badger, my browser deletes all history when I close it, which is multiple times a day. At first, I just noticed the cookie messages that kept returning. Then I started getting warning emails every time I logged in to Twitter or Google. Finally, my password manager screwed me completely, requesting renewed email verification every time I launched the browser.

These are all well-meaning security measures, with sophisticated fraud detection algorithms in the back. But they work on the assumption that you leave a rich data trail. It is by cutting that trail that you realize how pervasive the big data net already is around you in your digital life. For a different angle on the same issue, read Quinn Norton’s poignant Love in the Time of Cryptography.
Others have written about the way machine learning perpetuates biases that are encoded, often in subtle ways, in their training datasets. In a world where AI in police outscores the weather and language, that’s a scary prospect.

With all of this I mean to say one thing. Machine learning, data mining, AI – whatever you want to call it, in conjunction with today’s abundance of raw digital data, this technology has the potential to be dehumanizing in an unprecedented way. I’m not talking conveyor-belt slavery or machines-will-take-my-job anxiety. This is more subtle, but also more far-reaching and insidious.

And we, as engineers and innovators, have an outsized influence on how today’s nascent data-driven technologies will impact the world. The choice of mythology is ours.

UX is the lowest-hanging fruit

After this talk about machine learning and big data on a massive scale, let’s head back to planet Earth. To my view of a translator’s workstation, to be quite precise.

Compared to even a few years ago, there is a marvelous wealth of specialized information available online. There are pages of search results just for terminology sites. There are massive online translation memories to search. There are online dictionaries and very active discussion boards.
Without the need to name names, the user experience I get from 99% of these tools is somewhere between cringe-worthy and offensive. (Linguee being one notable exception to this.)

Here is one reason why I have a hard time enthusing about cutting-edge AI solutions for the language industry. Almost everywhere you look, there are low-hanging fruits in terms of user experience, and you don’t need clusters of 10-kilowatt GPUs to pluck them. I think it’s misguided to go messianic until we get the simple things right.

Two corollaries here. One, I myself am guilty as charged. Kilgray software is no exception. We pride ourselves that our products are way better than the industry average, but they, too, have a ways to go still. Rest assured, we are working on it.

Two, user experience also happens in the context of market constraints. All of the dismal sites I just checked operate on one of two models: ad revenues, or no revenues. I have bad news for you. These models make you the product, not the customer. This is not specific to the translation industry. The world at large has yet to figure out a non-pathological way to monetize online content.

Value in human relationships

I’ve been talking to a lot of folks recently whose job is to make complex translation projects happen on time and in good quality. Now it may be that my sample is skewed, but I saw one clear pattern emerging in these conversations.

I wasn’t told about standardized workflows. I didn’t hear about machine learning to pick the best vendor from a global pool of X hundred thousand translators. I didn’t perceive the wish to shave another few percent off the price by enhanced TM leverage.

The focus, invariably, was human relationships. How do I build a long-term working relationship based on trust with my vendors? How do I do the same with my own clients? How do I formulate the value that I add as a professional, which is not churning through 10% more words per day, but enabling millions in additional revenue from a market that was hitherto not accessible?

Those are not yet the questions I’m asking about AI, but they are closing in on my point. In a narrow sense, I see technology as an enabler: a way to reduce the drudge so humans have more time left for the meaningful stuff that only humans can do.

Fewer clicks to get a project on track? Great. More relevant information at the fingertips of translators and project managers? Awesome. Less time wasted labeling and organizing data, finding the right resources, finding the right person to answer your questions? Absolutely. Finding the right problems to work on, where your effort has the greatest impact? Prima.

AI has its place in that toolset. But let’s not forget to get the basics right, like building software with empathy for the end user.

The right question

Whether or not AI will be part of our lives is not a question. Humans have a very elastic brain, and whatever invention you give us, we will figure out a use for it and even improve on it.

I argued that technology is not a Platonic thing of its own, but the product of a specific social and economic context. I also argued that if you instrumentalize big data and machine learning within the wrong mythology, it has a disturbing potential to dehumanize.

But these are not inescapable forces of nature. The mythology we write for AI is a matter of choice, and the responsibility lies with us, engineers and innovators.

The right question is:

How do I use AI responsibly?

Is empathy at the center of my own engineering work?

No touchy-feely idealism here; let’s talk enlightened self-interest.

As a technology provider, I can create products with the potential to dehumanize work and encroach on privacy. That may give me a short-term advantage in a race to the bottom, but it will not lead to a sustainable market for my company. Or I can create products that help my customers differentiate themselves through stronger relationships, less drudge, and added value to their clients. Because I’m convinced that these customers are the ones who will be successful in the long run, I am betting on building technology for them.

That means engaging with customers (then engaging some more) to learn what problems they face every day, instead of worrying about the AI train. If the solution involves AI, great. But more likely it’ll be something almost embarrassingly simple.

--------------------

Gábor Ugray is co-founder of Kilgray, creators of the memoQ collaborative translation environment and TMS. He is now Kilgray’s Head of Innovation, and when he’s not busy building MVPs, he blogs at jealousmarkup.xyz and tweets as @twilliability.

Wednesday, December 14, 2016

What is a Truly Collaborative Translation Platform?

Many observers new to the business of translation are often surprised how little "work process automation" exists in the professional translation business. Many may have noticed the email deluge that many in the translation industry are guilty of in ALL their general business communication -- they use it almost like chat or mobile phone text messaging, and thus it is easy for details to get lost and fall through the cracks. I noticed this lack of email communication discipline when I first entered the industry many years ago.

Apart from the problems introduced by this communication style, we also see that since no real work process automation tools exist, there is an urgent need for a project management role. There still appears to be a critical need for a project manager (whose role is described here in some detail) to ensure that client projects are properly broken into pieces (work packages), assigned to the right personnel, and then re-assembled by project management to hand back to the client when finished. While Translation Management Systems (TMS) like MemoQ, Memsource and others help to some extent, the translation work management process still needs lots of detailed non-automated project management to ensure a smooth workflow and a semblance of efficiency.

I recently spoke to several SmartCAT.ai customers, both individual freelancers and translation agencies (LSPs). The sample I spoke to was mostly scattered across the EU and Russia. They were all quite consistent in their positivity about their work experience with the software and rated the following three factors (sometimes in a different order) as the reasons for their satisfaction and generally positive outlook toward the SmartCAT-based work experience:

Simplicity and ease of startup which resulted in quick productivity
Relatively lower cost (free for all users)
Collaboration capabilities that ease project management burden

More than one of the customers I spoke to had experience with traditional TMS tools and contrasted the SmartCAT experience as improved in several ways but mostly in ease of startup and the inbuilt collaboration integrity and power.

This is another guest post by "Vova" from SmartCAT where he defines his view of what collaboration means in the professional translation context. It is my opinion that the SmartCAT paradigm is a step forward from the rather heavy but perhaps much more flexible footprint that traditional TMS systems have grown accustomed to.

P.S. This is some analysis from CSA on the SmartCAT offering.

------------------------

These days, content volumes are growing faster than ever, and “going global” is a major trend in many industries. But some localization customers feel that traditional LSPs are unable to easily tackle large and urgent projects. In need of a better solution, they turn their eyes to “crowd” translation platforms. Popping up like mushrooms, these claim to provide the “good new way” to localize.
Despite the obvious effects of hiring “crowds” for translation, such services provide something traditional LSPs cannot boast: They are quick, cheap, and easy to use. You click a button, and hundreds of hands start working on your job and get it done in no time. Alas, for many clients this is a fair tradeoff for the appalling quality they get as a result. Surely, this approach will backfire, but it might be too late to fix it.

So can “real” LSPs fight crowdsourcing platforms on their own territory? Can we provide a smooth and quick collaborative translation experience while keeping the quality bar high?

In response to this challenge, almost every CAT platform today claims to be “collaborative.” But “collaboration” is a word that can be inflected in many ways, and often is. For instance, one can simply allow users to share translation memories and call that “collaboration.” One can make project managers spend hours splitting large files into “digestible” chunks and call that “collaboration.” Finally, one can have you pay a hundred dollars for each “collaborator” and, you got it, call that “collaboration” too.

Is this really what we expect when we hear the word “collaboration”? Hardly. What we expect is something like Google Docs. We expect contributors to see each other’s work in real time. We expect them to be able to communicate easily and in context. And we don’t expect them to go broke paying license fees just for being able to be there together.

And thus here are five important features to look for.

1 — Interactive collaboration between translators

Many “collaborative” CAT tools require cutting large files into smaller parts to be distributed among translators. The project manager will have to make sure that each translator gets a relatively equal volume to work on.

In essence, each translator will be just working on their own part as if it were a separate project, without seeing each other’s work. Those who finish early will have to sit there idle, wasting the precious time you need for the project. Once finished, the project manager needs to “glue” the files back together, wasting more time and bringing in human errors in the process.

In many cases, this makes the game not worth the candle. A truly collaborative translation platform rids you of the need to split or glue anything. You will just assign certain document parts to individual translators, and if someone finishes early, you will reassign more segments to them. Every translator will be able to see what others do and, if needed, bring attention to their mistakes or omissions (more on this later).
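To make the difference concrete, here is a minimal sketch that compares a fixed, equal split of segments with a shared pool from which translators pull the next segment as soon as they are free. The function names and the translator speeds are hypothetical and purely illustrative; this is a toy model, not a description of how any particular CAT tool schedules work.

```python
import heapq

def static_split_finish(total_segments, speeds):
    """Equal fixed shares: the project ends when the slowest translator finishes."""
    share = total_segments / len(speeds)
    return max(share / speed for speed in speeds)

def shared_pool_finish(total_segments, speeds):
    """Translators take the next segment from a shared pool as soon as they are free."""
    # Heap of (time at which the translator becomes free, translator index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(total_segments):
        start, i = heapq.heappop(free_at)
        done = start + 1.0 / speeds[i]  # time needed for one segment
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish

speeds = [50, 100, 150]  # hypothetical speeds, in segments per hour
static_hours = static_split_finish(600, speeds)
pooled_hours = shared_pool_finish(600, speeds)
print(f"fixed split: {static_hours:.2f} h, shared pool: {pooled_hours:.2f} h")
```

With these made-up speeds, the fixed split is held up by the slowest translator, while the shared pool finishes in roughly half the time because nobody sits idle.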

Recently, such an approach allowed one SmartCAT user, a mid-sized LSP, to translate nearly 500k words every day for several weeks straight. In “peak hours,” there were up to 100 translators working at the same time. And there was only one project manager handling the whole project!

2 — Collaborative translation and editing

Reducing work in progress is a key principle in today’s project management paradigm. And in translation, unedited work is exactly such work in progress! Let’s say you are doing a 100,000-word project with the standard TEP (translate-edit-proofread) approach. If “T” costs you $0.10 per word, you have $10,000 worth of inventory before “E” and “P” are done. That is $10,000 of unfinished words lying there like warehouse stock. Not a small amount, is it?
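The arithmetic can be sketched in a few lines. The 100,000 words and the $0.10 rate come from the example above; the overlapped case, where the editor trails the translators by 5,000 words, is an assumption added only for comparison, and the function name is hypothetical.

```python
def unedited_inventory(words_translated, words_edited, rate_cents_per_word):
    """Dollar value of words that are translated but not yet edited."""
    return (words_translated - words_edited) * rate_cents_per_word / 100

# Sequential TEP: editing starts only after all 100,000 words are translated.
sequential_peak = unedited_inventory(100_000, 0, 10)

# Overlapped TEP (assumption): the editor stays 5,000 words behind the translators.
overlapped_peak = unedited_inventory(100_000, 95_000, 10)

print(f"sequential: ${sequential_peak:,.0f}, overlapped: ${overlapped_peak:,.0f}")
```

Under that assumption, the unedited inventory at the end of translation drops from $10,000 to $500.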

If the editor has to wait for “their turn,” a whole range of issues may arise:

The translator is busy with another assignment by the time the editor asks a question and cannot recall the subject in detail.
The editor finds an error after it has been replicated tens or hundreds of times in the document and has to correct them all.
An experienced editor may not have the flexibility to move from project to project as urgent jobs might require.

Thus, the CAT tool must provide both horizontal (between peers) and vertical (between T <> E <> P) collaboration. In other words, the editor must be able to start working on the document well before its translation is completed. The same goes for proofreading and any other stages you need. From SmartCAT experience, such vertical collaboration alone can cut the delivery time almost in half.

3 — Context-specific communication

One thing that sets collaborative translation apart from mere crowdsourcing is the degree of communication between collaborators. In the latter, each “head in the crowd” doesn’t really care what the others are doing or thinking. In the former case, all translators make their contributions to the discussion, turning them into a synergistic whole.

Allowing many people to work together on a project is of no use if you can’t provide the right means for them to communicate. Otherwise, you have to either turn the manager into a “relay device” between various contributors or let them interact on an external platform. The former is a waste of resources, the latter a loss of control, and both are a hindrance to quality.

Thus, communication has to be built into your collaborative translation environment. Translators, editors, and other participants must be able to discuss both the project in general and its specific parts in context. SmartCAT users say that such context-specific commenting ability is one of the main quality drivers in the projects they do on the platform.

4 — On-demand scalability

You don’t always know in advance if a project will need scaling. Sometimes, a customer wants you to translate just a page on their website, but then realizes that they need the whole site. Or they ask you to translate it into 10 other languages. Or their business grows unexpectedly and demands more localized content and a stronger localization partner.

Often, such demands have a “deadline yesterday” and give you no time to set up the whole “collaborative translation machine” from scratch. That’s why it’s important that your CAT tool allows you to scale when it is needed, as much as you need it, and with as little additional effort as possible. If you need a separate installation just to enable collaboration, you are wasting time you can’t afford to waste.

Ideally, there shouldn’t be such a thing as “scaling” at all. If you need to translate more content, you just add files to the project. If you want more languages, you add languages. If you need more people, you just assign more of them. Ideally, the CAT tool should come with easy access to freelancers who can readily work in it. The SmartCAT marketplace (a pool of available translators), for instance, has provided many of our users with the capacity they needed when their own resources were insufficient.

5 — Affordable growth

One last (but not least!) thing to keep in mind is that collaborative translation can be costly. You might not notice this when you just start working, but the more you grow, the pricier it can get. This can be especially painful if you are a relatively small agency and cannot afford major investments. Then you are often left with no choice but to forfeit the job to a bigger vendor. And if you are big enough to afford such spending, it will often be unproductive because you will not need the purchased licenses much of the time.

Therefore, pay close attention to the pricing tables. Most of them will have some sort of user-based licensing, but some won’t. In the latter category, many will be open-source, in which case it also makes sense to study the quality and support terms, which are pain points for this kind of software.

For the record, SmartCAT is free and proprietary, with 24×7 support provided to all users at no cost. (Though agencies are expected to pay for their use of and access to the translator pool via a means that is different from the industry-standard approaches and is somewhat opaque. However, if they do not use the SmartCAT translator pool, agency use is also free.

From CSA: SmartCAT has an optional payment facility. Users are under no obligation to use it. However, if they do choose to process payments through it, the company takes a cut on the financial transaction. Smolnikov told us that many companies start out using their own financial methods but end up moving to SmartCAT because it takes the hassle out of managing them and that it is cheaper to pay this cut than to manage it themselves.)

The SmartCAT team clearly believes in this vision, and they recently published a vision statement that states the following:

Our vision of the future of the translation industry is based on three principles:

Advanced collaboration is the key to effectively manage large-scale and urgent projects,
Technology should help translators and project managers simplify time-consuming routines and increase productivity, with artificial intelligence playing a large part in setting up teams and managing their performance,
High-value and SLA-compliant linguists are the strongest success drivers in translation projects, and technology must facilitate identifying and reinforcing the choice of such professionals.

It all starts with our key belief that selling licenses for CAT software is an atavism of our industry. We believe that no one should have to continuously count licenses in a business where almost all key value producers are freelancers and teams are highly dynamic and dependent on the projects you will have tomorrow.

Relying on the number of licenses limits the efficiency of translation processes in a company and restricts its growth potential and scalability. Finally, the low technology penetration and the need to sew together multiple tools to have a more or less seamless and efficient workflow are the major factors slowing down the evolution of individual companies and the industry as a whole.

You can find the rest of the elaboration of this at the link above, where they also talk about increasing use of chatbots to improve communications between PMs and vendors, and automate an increasing number of common PM tasks and project situation responses using AI technology.

About the author

Vladimir “Vova” Zakharov is the Head of Community at SmartCAT.

"Translation is my profession and my passion, and I’m excited to be able to share it with the amazing SmartCAT community!"

Monday, October 24, 2016

10 Myths About Computer-Assisted Translation

This is a guest post by "Vova" from SmartCAT. I connected with him on Twitter and learned about the SmartCAT product, which I would describe as a TMS + CAT tool with a much more substantial collaboration framework than most translation tools I know of. It is a next-generation translation management tool that enables multi-user collaboration at a large project level, but also allows individual freelancers to use it as a simple CAT tool. It has a new and non-standard approach to pricing which I am still trying to unravel. I have talked to one LSP customer, who was very enthusiastic about his user experience and stressed the QA and productivity benefits, especially for large projects. I am still researching the product and company (which has several ex-ABBYY people but they seem eager to develop a separate identity) and will share more as I learn more. But at first glance, this product looks very interesting and even compelling, even though they, like Lilt, have a hard time describing what they do quickly and clearly. Surely they should both be looking to hire a marketing consultant - I know one who comes to mind ;-). The most complete independent review of the SmartCAT product (requires a subscription) is from Jost Zetzsche, who likes it, which to my mind is a meaningful commendation for them, even though he likes other products too.

--------------

Many translators are wary of CAT tools. They feel that computer-aided translation commoditizes and takes the creativity out of the profession. In this article, we will try to clear up some common misconceptions that lead to these fears.

1 — Computer-aided translation is the same as machine translation

The naming of the term “computer-aided translation” often leads to its being confused with “machine translation.” The first thing some of our new users write to us is “Why don’t I see the automatic translation in the right column?” For some reason, they expect it to be there (and perhaps to replace the translation effort altogether?).

In reality, machine translation is just a part — in most cases a small part — of what computer-aided translation is about. This part is usually called “PEMT” (post-editing of machine translation) and consists in correcting a translation done by one or another MT engine. We’re nowhere near replacing a human translator with a machine.

PEMT itself is like a red flag to a bull for many translators and deserves a separate article. Here we will just reiterate that equating CAT with machine translation is like equating aviating skills with using the autopilot.

Here’s where you find it, just in case — but use cautiously.

2 — Computer-aided translation is all about handling repetitions

Another widespread misconception is that CAT tools are only used to handle repetitive translations. What does it mean? Say, you have the same disclaimer printed in the beginning of each book of a given publisher. Someone would then need to translate it only once, and a CAT tool would automatically insert this translation in each new translated book of that publisher.

Here’s what a TM match looks like. (That’s an easy one.)

This “repetition handling” feature is commonly called Translation Memory (TM). Now, TM is a large part of what CAT tools do (and why they were created in the first place). But today it is just a feature, with many others supplementing it.
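The idea behind a TM lookup can be sketched with Python's standard difflib module. This is only an illustration under stated assumptions: the function name, the 0.75 threshold, and the sample segment are hypothetical, and real CAT tools use their own, more sophisticated matching.

```python
from difflib import SequenceMatcher

def tm_lookup(source, memory, threshold=0.75):
    """Return (translation, score) for the closest stored source segment, or None."""
    best = None
    for stored_source, stored_target in memory.items():
        score = SequenceMatcher(None, source.lower(), stored_source.lower()).ratio()
        if best is None or score > best[1]:
            best = (stored_target, score)
    if best is not None and best[1] >= threshold:
        return best
    return None

# Illustrative memory with a single previously translated segment.
memory = {"All rights reserved.": "Alle Rechte vorbehalten."}

exact = tm_lookup("All rights reserved.", memory)      # identical segment
fuzzy = tm_lookup("All rights are reserved.", memory)  # close, but not identical
miss = tm_lookup("Chapter One", memory)                # nothing similar stored
```

An identical segment comes back with a score of 1.0, a near match comes back with a lower score so the translator knows to review it, and an unrelated segment returns nothing.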

In SmartCAT, for instance, we have project management, terminology, quality assurance, collaboration, marketplace, and many other features. All these features are carefully integrated with each other to form a single whole that is SmartCAT and that distinguishes us from the competition. If it were all about translation memory, there would be nothing to compete about.

3 — Computer-aided translation doesn’t work for “serious” translations

Some believe that “serious” translators (whatever that means) do not use CAT tools. The truth is that “purists” do exist, just as they do in any other field, from religion to heavy metal. But not using a CAT tool as a translator is close to not using a cellphone as a CEO. According to a 2013 study by Proz, 88% of translators were using CAT tools, and we can only expect the numbers to have gone up since then.

So, why would you use a CAT tool in a “serious” translation? Here’s a very “serious” book on numismatics I translated some time ago. I made it all in SmartCAT. Why? Because I would have never managed to keep this amount of terminology in my head. Even if I used Excel sheets to keep track of all the terms — ancient kings, regions, coin names, weight systems — it would’ve taken me dozens of hours of additional work. In SmartCAT, I had everything within arm’s reach in a glossary that was readily accessible and updateable using simple key combinations.

Inserting a glossary term in SmartCAT

Another reason CAT tools can be useful for “serious” translations is quality assurance. Okay, MS Word has a spellchecker. There is also third-party software that provides more sophisticated QA capabilities. And still, having it right at hand, with downloadable reports and translation-specific QA rules is something only a good CAT tool can boast (more on this later).

4 — CAT tools are for agencies only

Many translators receive orders from agencies dictating the use of one CAT tool or another. So they start thinking that “all these CATs” are an “agency thing” and are meant to make use of them. We’ll leave that latter argument aside for a while and come back to it later.

For now, we’ll just say that there is no reason why a translator should not use a CAT tool for their own projects. If anything, it provides a distraction-free interface where one can concentrate on the work in question and not think about secondary things such as formatting, file handling, word counting, and so on.

Note the tags (orange pentagons): You don’t need to care what formatting there was in the original.

5 — CAT tools are hard to learn

Well, that’s not exactly a myth. I remember my first experience with a prominent CAT tool (it was some ten years ago). I cried for three days, considering myself a worthless piece of junk for not being able to learn something everyone around seemed to be using. When the tears dried out, I went for some googling and realized that I wasn’t the only one to struggle with the mind-boggling interfaces of the software that was en vogue back then.

Luckily, today users have plenty of options to choose from. And although the de rigueur names remain the same (so far), many modern CATs are as easy to learn as a text processor or a juicer (though some of those can be tricky, too). Here’s a video of going all the way from signing up to downloading the final document in SmartCAT in less than one minute. It’s silent and not subtitled, but sometimes looks are more telling than words:

6 — CAT tools are ridiculously expensive

Another myth that is partly true is that CAT tools cost a freaking lot. Some do. The cheapest version of the most popular desktop computer-aided translation software costs around $500. One of the most popular subscription-based solutions costs nearly $30 a month. It’s probably okay if you have a constant inflow of orders and some savings to afford the purchase (and perhaps a personal accountant). But what if you are just starting out? Or if you are an occasional semi-pro translator? Not that okay, then.

In any case, there are still options for you to go (and grow) even if you don’t want to spend on unpredictably profitable assets. SmartCAT is free for both freelancers and companies. The only thing you might opt to pay for is machine translation and image recognition. And, if you decide to market your services via the SmartCAT marketplace, a 10% commission (payable by the customer) will be added on top of your own rate. That’s it — no hidden fees involved.

7 — Computer-aided translation works for large projects only

If you think that CAT tools work best for huge projects, you might be right. If you think they don’t work for small projects at all, you are wrong.

Here’s an example. The last project I made in SmartCAT was a one-page financial document in Excel format. To translate it, I uploaded the file to SmartCAT and already had all the translation memories, terminology, word count, etc. ready. So I just did the translation, downloaded the result and sent it back with an invoice.

If I went the “simple” way, I would have spent some valuable minutes — which are the more valuable the smaller a project is — on organizational “overheads.” Putting the files in the right place in the file system. Looking for previous translations to align the terminology. Finally, doing the translation in Excel, which is a torture in itself.

In CAT tools, whether it is an Excel file, a Powerpoint presentation, a scanned PDF (for CAT tools supporting OCR, e.g. SmartCAT), you still have the same familiar two-column view for any of them. As already said, you concentrate on words, not formats.

Thus,

becomes

— in mere seconds!

8 — Computer-aided translation slows you down

Despite the evidence, some translators believe that using CAT tools will actually reduce their translation speed. The logic is that in a CAT tool you have to start a project, configure all its settings, find the TMs and terminology you need to reuse, and so on. In the end, they say, you spend more time doing this than you save as a result.

The reality is quite different. In SmartCAT, for instance, the configuration needed to start a project includes a minimum number of choices. Moreover, all the resources you need are added automatically according to the customer’s name. That saves time in addition to the streamlining of the very translation process.

8 seconds to create a project with a translation memory and terminology glossary in place

9 — CAT tools worsen the quality of translation

Some believe that by not seeing the whole text, you lose its “flow.” This, they argue, leads to errors in the style and narrative of translation. While this is true in some cases (e.g. for literary translation), the fact is that the “flow” is anyway disturbed by your seeing the original text. It always makes sense to have at least one purely proofreading stage in the end, when you don’t see the original. Then you can judge the text solely on the basis of how good or bad it sounds in the target language.

That’s what I did for a children’s book I translated recently. I made the first several “runs” in SmartCAT. Then I downloaded the result and had it reviewed several more times (and once by a native speaker). When everything was ready, I brought the whole thing back into SmartCAT. Why? Because I want to translate the next part of the book. I know I will have forgotten a lot by the time it comes, so having all the previous resources at hand will be very helpful for the quality.

Speaking of quality, modern CAT tools also allow a great degree of quality assurance, with some checking rules fine-tuned for translation tasks. Using those is much more convenient and practical than resorting to spellcheckers available in office software or externally.

QA rules available in SmartCAT. Some are more paranoid than the others.
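To give a feel for what translation-specific QA rules look like, here is a minimal sketch of three simple checks on a source/target segment pair. The rules and the function name are hypothetical examples chosen for illustration; they are not SmartCAT's actual rule set.

```python
import re

def qa_check(source, target):
    """Return a list of issues found in one source/target segment pair."""
    if not target.strip():
        return ["empty target"]
    issues = []
    # Numbers in the source should reappear unchanged in the target.
    if sorted(re.findall(r"\d+", source)) != sorted(re.findall(r"\d+", target)):
        issues.append("number mismatch")
    if "  " in target:
        issues.append("double space")
    # If the source ends a sentence, the target should end the same way.
    source_end = source.rstrip()[-1:]
    if source_end in ".!?" and target.rstrip()[-1:] != source_end:
        issues.append("end punctuation mismatch")
    return issues

clean = qa_check("Pay 30 dollars.", "Zahlen Sie 30 Dollar.")
flawed = qa_check("Pay 30 dollars.", "Zahlen Sie  300 Dollar")
print(clean, flawed)
```

A spellchecker would pass the second translation without complaint, which is why checks that compare the target against the source are worth having inside the tool.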

10 — Computer-aided translation is bad for translators

That’s the underlying cause for many of the above misconceptions. Some translators fear that computer-aided translation is bad for the profession as a whole. Here’s a very illustrative post by Steve Vitek, a long-time opponent of translation technology. (Interestingly, the post includes many of the views countered above. I’d love to see Steve’s comment on this article of mine. Can my arguments make him change his mind, I wonder?)

The argument is that translation technology deprives translators of their bread. And instead of being there for translators’ growth and profit, it grows and profits at their expense. Customers get pickier, rates get lower, translation becomes a commodity.

In my humble opinion, CAT tools are as bad for translators as hair-cutting shears are for hairdressers. Perhaps, doing a haircut with a butcher’s knife could be more fun. You could even charge more for providing such an exclusive service. But it has little to do with the profession of cutting hair (or translating). A professional strives to increase the efficiency of their work. Using cutting-edge tools is one way to do this. A very important one, that.

Yes, it can be argued that CAT tools bring down your average per-word rate. But as Gert van Assche aptly puts it, the time you spend on a job is the only thing you need to measure. I can’t say for everyone, but my own hourly rates soar with the use of CAT tools. I know that I can provide the best quality in the shortest time possible. I also know that I don’t charge unnecessarily high rates to my long-time customers, whose attitude I care about a lot.
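A quick calculation shows how a lower per-word rate can still mean a higher hourly rate. All the numbers below are hypothetical and serve only to illustrate the reasoning; they are not measurements.

```python
def hourly_rate(rate_cents_per_word, words_per_hour):
    """Effective hourly earnings in dollars."""
    return rate_cents_per_word * words_per_hour / 100

# Hypothetical figures: a lower per-word rate but higher throughput with a CAT tool.
without_cat = hourly_rate(10, 300)  # $0.10 per word at 300 words per hour
with_cat = hourly_rate(8, 500)      # $0.08 per word at 500 words per hour

print(f"without CAT: ${without_cat:.2f}/h, with CAT: ${with_cat:.2f}/h")
```

In this made-up case, the per-word rate drops by a fifth while the hourly rate rises from $30 to $40.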

That’s it — I hope I did manage to clear away your fears about computer-aided translation.
Remember, if you’re not using CAT tools, you are falling behind your colleagues, who might be equally talented but just a bit tech-savvier.

A good CAT tool will aid your growth as a professional and a freelancer. After all, aiding translators is what the whole thing is about.

P.S. If you never tried CAT tools at all, or did but didn’t enjoy the experience, I suggest that you check out SmartCAT now — it’s simple, powerful, and free to use.

----------

About the author

Vladimir “Vova” Zakharov is the Head of Community at SmartCAT.

"Translation is my profession and my passion, and I’m excited to be able to share it with the amazing SmartCAT community!"

Wednesday, November 18, 2015

Translator Perspectives on MT & Technology In General

I found an interesting series of blog posts by Christelle Maignon that I thought articulated translator perspectives on MT and the increasing use of technology in translation work very well. She herself was driven away from translation work towards coaching because PEMT was just not her cup of tea from what I could gather. Anyway I thought it would be good to highlight her work in case you are not aware of her blog.

Some posts that readers of this blog may also find interesting are listed below:

Why machine translation creates so much anger and how to deal with it

This post references Dr. Kübler-Ross’s study of grief. She describes the five stages of emotions which are experienced by people who are approaching death or dealing with the death of a loved one. Her model was widely accepted and it was found to be valid for other forms of loss, as well as situations relating to change (for instance, the loss of a job or of a familiar way of doing things). Her model has been used as a change management tool by businesses across the world.

I have written about this as well in the past, referencing this link, but it is good to get a real translator’s perspective, which interestingly uses the death and grief cycle as a reference.

Disruptive Change graphic

Another post describes the widespread use of MT, based on a presentation by Stefan Gentz, and is one of the most popular posts on her blog.

What Does The Future Hold For Translators?

I find the reaction and interpretation by a translator interesting though I don’t really see how MT is taking work away from translators or the professional translation industry. MT mostly translates stuff that would never get translated were it not possible to do it with MT.

Another that I found interesting is:

Riding The Wave Of Technological Change As A Translator

Or Future Proofing Your Career As A Translator

I think there is lots of useful information for translators on her site, and while I am regularly reminded that I am not a translator and should not be telling translators how to do what they do, I will dare to say that many will find useful information here.

I truly hope that my highlighting her blog here raises her profile and does not have a negative reaction from some who might see this as an endorsement from MT advocates.

I have not been very active in the last few months but I have a new series of ideas that I will start writing on again shortly.

Have a wonderful Thanksgiving vacation, those of you who celebrate it.

Let what comes come.
Let what goes go.
Find out what remains.

— Ramana Maharshi