Wednesday, December 7, 2016

Localization and Language Quality

This is a guest post by David Snider, Globalization Architect at LinkedIn, reprinted here with permission. I thought the article was interesting because it points out that MT quality is now quite adequate for several types of enterprise applications, even though MT might very well be a force that influences and causes the "crapification" (a word I wish I had invented) of overall language quality. While this might seem like horror to some, for a lot of business content that has a very short shelf life, and value only if the information is current, this MT quality is sufficient for most of the people who have an interest in the specific content. While David thinks that the language quality will improve, I doubt that much of this MT content will improve beyond what is possible with the raw technology itself. Business content that has value for a short time and then is forgotten simply cannot justify the effort to raise it to the level of "proper" written material.

If you go to the original post there are several comments that are worth reading as well.


People have been complaining recently about the decline of language quality (actually, they’ve been complaining for decades – or make that centuries!)  I have to admit that I sympathize: I’m from a generation that was taught to value good writing, and I still react with horror when I see obvious errors, like using “it’s” instead of “its”, or confusing ‘their’, ‘there’ and ‘they’re’.  (I’m even more horrified when I make mistakes myself, which happens more than I like to admit.)
But for my son’s generation? Not so much. Grammar, spelling, and punctuation aren’t that important to them; what matters is whether the other person understands them, and vice versa. My son is already 25 (wow, time flies!), so there’s another generation coming up behind him that’s even less concerned about ‘good’ writing; in fact, this new generation is so accustomed to seeing bad writing that for the most part they don’t even realize there are errors. This makes for a vicious circle: people grow up surrounded by bad writing, so they, in turn, write badly, which in turn exacerbates the problem. I’ve heard this referred to as the ‘crapification of language’.


Why is this happening?


Ease of publishing: in the old days, the cost of publishing content - typesetting it, grinding up trees and making paper, printing the content onto the paper, binding it, shipping it to a store and selling it - was immense. For this reason most published content was thoroughly edited and proofread, as there was no second chance. So if you read printed content like books, magazines and newspapers, you were generally exposed to correct grammar, spelling and punctuation. Since most of what people read was correctly written (even if not always well-written), people who read a lot generally learned to write well. But now anyone can create and publish content, with no editing or proofreading. The result is just what you’d expect.

Informal communications: email, texting, Twitter – they all favor speed, and when people are in a hurry, quality usually suffers.

Machine-generated content: this includes content that’s created by computers – for example, Machine Generated support content created by piecing together user traffic about problems – as well as Machine Translated content. Machine Generated content, and especially MT content, is, as we localization people know, often of very poor quality.

What does this mean for Localization?


Being in the localization business myself, I want to tie this in to the effect on localization. In some ways this ‘crapification’ works against us: garbage in, garbage out, after all, and if the source content is badly written then it’s harder for the translators to do a good job, be they humans or machines. But at the same time, this can work for us – especially when it comes to Machine Translation, where there are a couple of things that are making even raw MT more acceptable:

MT engine improvements: MT quality has steadily improved over the past 50 years (yes, it’s been around at least that long!). Major improvements, like statistical MT and now neural MT, seem to occur every 10 years or so. Perfect human-quality MT is still ‘only 5 years out’ and will undoubtedly continue to be so for a long time, but quality is steadily improving.

User expectations: The good news for MT is that, due to the crapification of language, the expectations bar has been coming down, and people are much more willing to accept raw MT, warts and all. Despite the quality problems, more and more people are using web-based MT services like Google Translate, Bing Translator, etc., to read and write content in other languages. As with texting above, they’re more concerned with content than with form: they’re OK with errors as long as they can understand the content or at least get the gist of it. This seems to be true even for countries that have traditionally had a high bar for language quality, like Japan and France. As shown in the chart below, we’ve already passed the point where raw MT is acceptable for some types of content. (Note that this chart is purely illustrative and is not based on hard data.)

Of course the bar remains high for things like legal documents, marketing content and your own personal homepage, but it’s getting lower for many other types of content, especially for things like support content (which many companies have been MTing for years), as well as for blogs and other informal content. In fact, the graph could be redrawn something like this:


Is there any hope for language quality?


As the quality of machine-generated and machine-translated content improves and as editing and proofing tools become better and more ubiquitous, the quality of all content will improve, until we approach the days of professionally edited and proofread books and magazines. As bad writing disappears and people grow accustomed to seeing well-written content, I think even unedited human language quality will start to curve back up again. (I’ve tried to capture this in the graphs above.)
So yes, I believe the crapification of language will slow and eventually reverse itself (hmm, unpleasant plumbing image there)! This doesn’t mean languages won’t continue to evolve, fortunately. That’s one of the things that make them so fascinating – and so challenging to translate.

Some key excerpts from the comments at the original post are listed below:

David Snider: The crapification 'helps' localization teams by allowing them to put more assets in the 'good enough' bucket. The real question for raw MT is: "is my customer better off having to fight their way through a badly-translated MT article and maybe get the help they need, or are they better off not getting the help at all?"
Jorge Russo dos Santos: I disagree that we will see content revert to a golden age of quality. I think that, as we see today, there are different quality levels for content and that will continue to appear. If anything, the tolerance to poorly written content will probably increase, as people consume more and more content, but there will be pockets where people will require premium content and will be willing to pay for it, either in the original languages or in the localized language(s), and this will not be only for legal.

Please go to the original posting to see all the comments at this link.

  David Snider, Globalization Architect at LinkedIn


  1. I posted the following comment there, but I think it is relevant here, too:
    Interesting to see the concept of "crapification" in connection with localisation on LinkedIn. Perhaps this explains why I only ever get clickbait e-mails from LinkedIn - the whole of the content is "crapified" even without any translation (e.g. job leads that I don't want because I'm not looking for a job but there's no way of telling LinkedIn that; mails asking me to congratulate people I hardly know on assorted non-events in their working life).

    This throws an interesting light on the theory of an explosion in the volume of content which needs instant translation and needs MT: much of it is content that nobody wants except the PR gurus who write it - so of course nobody cares about the translation quality.

    Perhaps someone could write an article explaining the distinction between the "crapification of localisation" and the age-old concept of verbal diarrhoea.

    1. Victor, I'm sorry you feel LinkedIn content isn't valuable to you; this post isn't about LinkedIn, but as a LinkedIn employee I do want to respond briefly.

      We put a lot of effort into making our content relevant to our members, but it's hard to please all of the people all of the time - for example, some people do want to know about job openings or to congratulate connections on special events. For this reason we let our members control the type and frequency of communications as much as possible. I've sent you a connection request, so if you'll accept it I'll show you the ways you can improve content relevance.


    2. Thanks for replying, Dave, and I've accepted your content request.
      However, the wider point here (and the reason why I posted the comment here too) is that many MT artists claim that the rapid rise in the global text volume is the killer argument for MT and post-editing. My suggestion is that at least some of this volume is content-for-content's-sake. So the suggestion that quality is secondary (or that TAUS's quality erosion "standards" are appropriate) is just a roundabout way of saying that much of this stuff isn't really worth saying IN ANY LANGUAGE.

      Whether this applies to LinkedIn is another issue, and I am happy to shift that part of the discussion to that forum itself.