Wednesday, December 7, 2016

Localization and Language Quality

This is a guest post by David Snider, Globalization Architect at LinkedIn, reprinted here with permission. I thought the article was interesting because it points out that MT quality is now quite adequate for several types of enterprise applications, even though MT might very well be a force that influences and causes the "crapification" (a word I wish I had invented) of overall language quality. While this might seem like horror to some, for a lot of business content that has a very short shelf life, and value only while the information is current, this MT quality is sufficient for most of the people who have an interest in the specific content. While David thinks that language quality will improve, I doubt very much that this MT content will improve beyond what is possible with the raw technology itself. Business content that has value for a short time and is then forgotten simply cannot justify the effort to raise it to the level of "proper" written material.

If you go to the original post there are several comments that are worth reading as well.


People have been complaining recently about the decline of language quality (actually, they’ve been complaining for decades – or make that centuries!)  I have to admit that I sympathize: I’m from a generation that was taught to value good writing, and I still react with horror when I see obvious errors, like using “it’s” instead of “its”, or confusing ‘their’, ‘there’ and ‘they’re’.  (I’m even more horrified when I make mistakes myself, which happens more than I like to admit.)
But for my son’s generation? Not so much. Grammar, spelling, and punctuation aren’t that important to them; what matters is whether the other person understands them, and vice versa. My son is already 25 (wow, time flies!), so there’s another generation coming up behind him that’s even less concerned about ‘good’ writing; in fact, this new generation is so accustomed to seeing bad writing that for the most part they don’t even realize there are errors.  This makes for a vicious circle: people grow up surrounded by bad writing, so they, in turn, write badly, which in turn exacerbates the problem. I’ve heard this referred to as the ‘crapification of language’.


Why is this happening?


Ease of publishing: in the old days, the cost of publishing content - typesetting it, grinding up trees and making paper, printing the content onto the paper, binding it, shipping it to a store and selling it - was immense. For this reason most published content was thoroughly edited and proofread, as there was no second chance. So if you read printed content like books, magazines and newspapers, you were generally exposed to correct grammar, spelling and punctuation. Since most of what people read was correctly written (even if not always well-written), people who read a lot generally learned to write well. But now anyone can create and publish content, with no editing or proofreading. The result is just what you’d expect.

Informal communications: email, texting, twitter – they all favor speed, and when people are in a hurry quality usually suffers.

Machine-generated content: this includes content that’s created by computers – for example, Machine Generated support content created by piecing together user traffic about problems – as well as Machine Translated content. Machine Generated content, and especially MT content, is, as we localization people know, often of very poor quality.

What does this mean for Localization?


Being in the localization business myself, I want to tie this in to the effect on localization. In some ways this ‘crapification’ works against us: garbage in garbage out, after all, and if the source content is badly written then it’s harder for the translators to do a good job, be they humans or machines. But at the same time, this can work for us – especially when it comes to Machine Translation, where there are a couple of things that are making even raw MT more acceptable:

MT engine improvements: MT quality has steadily improved over the past 50 years (yes it’s been around at least that long!) Major improvements, like statistical MT and now neural MT, seem to occur every 10 years or so. Perfect human-quality MT is still ‘only 5 years out’ and will undoubtedly continue to be so for a long time, but quality is steadily improving.

User expectations: The good news for MT is that due to the crapification of language the expectations bar has been coming down, and people are much more willing to accept raw MT, warts and all. Despite the quality problems, more & more people are using web-based MT services like Google Translate, Bing Translator, etc., to read and write content in other languages.  As with texting above, they’re more concerned with content than with form: they’re OK with errors as long as they can understand the content or at least get the gist of it. This seems to be true even for countries that have traditionally had a high bar for language quality, like Japan and France. As shown in the chart below, we’ve already passed the point that raw MT is acceptable for some types of content. (Note that this chart is purely illustrative and is not based on hard data.)

Of course the bar remains high for things like legal documents, marketing content and of course your own personal homepage, but it’s getting lower for many other types of content, especially for things like support content (which many companies have been MTing for years), as well as for blogs and other informal content. In fact, the graph could be redrawn something like this:


Is there any hope for language quality?


As the quality of machine-generated and machine-translated content improves and as editing and proofing tools become better and more ubiquitous, the quality of all content will improve, until we approach the days of professionally edited and proofread books and magazines. As bad writing disappears and people grow accustomed to seeing well-written content, I think even unedited human language quality will start to curve back up again. (I’ve tried to capture this in the graphs above.)
So yes, I believe the crapification of language will slow and eventually reverse itself (hmm, unpleasant plumbing image there)! This doesn’t mean languages won’t continue to evolve, fortunately. That’s one of the things that make them so fascinating – and so challenging to translate.

Some key excerpts from the comments at the original post are listed below:

David Snider: The crapification 'helps' localization teams by allowing them to put more assets in the 'good enough' bucket. The real question for raw MT is: "is my customer better off having to fight their way through a badly-translated MT article and maybe get the help they need, or are they better off not getting the help at all?"
Jorge Russo dos Santos: I disagree that we will see content revert to a golden age of quality. I think that, as we see today, there are different quality levels for content, and that will continue to be the case. If anything, tolerance for poorly written content will probably increase as people consume more and more content, but there will be pockets where people will require premium content and will be willing to pay for it, either in the original language or in the localized language(s), and this will not be only for legal content.

Please go to the original posting to see all the comments at this link.

  David Snider, Globalization Architect at LinkedIn

Tuesday, November 29, 2016

The Critical Importance of Simplicity

This is a post by Luigi Muzii that was initially triggered by this post and this one, but I think it has grown into a broader comment on a key issue related to the successful professional use of MT i.e. the assessment of MT quality and the extent, scope, and management of the post-editing effort. Being able to get a quick and accurate assessment of the specific quality at any given time in a production use scenario is critical, but the assessment process itself cannot be so cumbersome and so complicated a process that the measurement effort becomes a new problem in itself.

While industry leaders and academics continue to develop well-meaning metrics like MQM and DQF that are very difficult to deploy efficiently and cost-effectively, most practitioners are left with BLEU and TER as the only viable and cost-effective measures. However, these easy-to-do metrics have well-known bias issues with RbMT and now with NMT. And given that this estimation issue is the “crux of the biscuit,” as Zappa would say, it is worth ongoing consideration and review, as doing this correctly is where MT success is hidden.
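To show why BLEU and TER count as "easy-to-do," both can be approximated in a few lines of plain Python. The sketch below is a simplified, add-one-smoothed sentence-level BLEU and a crude TER-style word edit rate; the real metrics (and the libraries that implement them, such as NLTK or SacreBLEU) involve considerably more machinery, so treat this purely as an illustration of how cheap these scores are to compute:

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    # Count all n-grams of length n in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(reference, hypothesis, max_n=4):
    # Clipped n-gram precisions, geometric mean, brevity penalty.
    # Add-one smoothing keeps short sentences from scoring zero.
    log_prec = 0.0
    for n in range(1, max_n + 1):
        ref = ngram_counts(reference, n)
        hyp = ngram_counts(hypothesis, n)
        clipped = sum(min(count, ref[gram]) for gram, count in hyp.items())
        total = max(sum(hyp.values()), 1)
        log_prec += math.log((clipped + 1) / (total + 1))
    brevity = min(1.0, math.exp(1 - len(reference) / max(len(hypothesis), 1)))
    return brevity * math.exp(log_prec / max_n)

def ter_like(reference, hypothesis):
    # A crude TER-style score: word-level edit distance / reference length.
    d = [[0] * (len(hypothesis) + 1) for _ in range(len(reference) + 1)]
    for i in range(len(reference) + 1):
        d[i][0] = i
    for j in range(len(hypothesis) + 1):
        d[0][j] = j
    for i in range(1, len(reference) + 1):
        for j in range(1, len(hypothesis) + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(reference)][len(hypothesis)] / max(len(reference), 1)

ref = "the cat sat on the mat".split()
hyp = "the cat sat on a mat".split()
print(round(sentence_bleu(ref, hyp), 3))
print(round(ter_like(ref, hyp), 3))
```

The convenience is also the bias problem: scores like these reward n-gram overlap with one reference, which is exactly why fluent-but-different RbMT and NMT output tends to be under-credited.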

Luigi's insistence on keeping this measurement simple sometimes makes him unpopular with academics and industry "experts", but I believe that this issue is so often at the heart of a successful or unsuccessful MT deployment that it bears repeated exposure and frequent re-examination as we inch our way toward measurement procedures more practical and useful than BLEU, which continues to confound discussions of real progress in improving MT quality.

KISS - "Keep it simple, stupid" - is a design principle noted by the U.S. Navy in 1960, stating that most systems work best if they are kept simple rather than made complicated.

The most profound technologies are those that disappear.
Mark Weiser
The Computer for the Twenty-First Century, Scientific American, 1991, pp. 66–75

The best way to understand the powers and limitations of a technology is to use it.

This can be easily shown for any general-purpose technology, and machine translation can now be considered as such. In fact, the major accomplishment we can credit to Google Translate is that of having popularized widespread translation activity using machine translation, something most celebrated academics, and supposedly influential professional bodies, have not been able to achieve after decades of trying.

The translation quality assessment debacle is emblematic i.e. the translation quality issue is, in many ways, representative of the whole translation community.  It has been debated for centuries, mostly at conferences where insiders — always the same people — talk amongst themselves. And the people attending conferences of one kind do not talk with people attending conferences of another kind.

This ill-conceived approach to quality assessment has claimed victims even among scientists working on automatic evaluation methods. Just recently, the nonsensical notion of a “perfect translation” regained momentum. Everybody even fleetingly involved in translation should know that there is nothing easier to show as flawed than the notion of a “perfect translation”, at least according to current assessment practices, which follow a typical confirmation-bias pattern. There is a difference between an “acceptable” or “usable” translation and any notion of a perfect one.

On the verge of the ultimate disruption, translation orthodoxy still dominates even the technology landscape, eradicating simplicity, the key principle of innovation.
The expected, and yet overwhelming, growth of content has long gone hand in hand with a demand for faster translation in an ever-growing number of language pairs, with machine translation being suggested as the one solution.

The issue remains unsolved, though, of providing buyers with an easy way to know whether the game is worth the candle. Instead, the translation community has so far been unable to provide unknowing buyers with anything but an intricate maze of categories and error typologies, weights and parameters, where even an experienced linguist can have a hard time finding their way around.

The still largely widespread claim that the industry should “educate the client” is the concrete manifestation of the typical information asymmetry affecting the translation sector. By inadvertently keeping the customer in the dark, translation academics, pundits, and providers cuddle the silly illusion of gaining respect and consideration for their roles, while they are simply shooting themselves in the foot.

When KantanMT’s Poulomi Choudhury highlights the importance of the central role that the Multidimensional Quality Metrics (MQM) is supposed to play, in all likelihood she is talking to her fellow linguists. However, typical customers simply want to know whether they have to spend further to refine a translation and — possibly — understand how much. Typical customers who are ignorant of the translation production process are not interested in the kind of KPIs that Poulomi describes, while they could be interested in a totally different set of KPIs, to assess the reliability of a prospective partner.

Possibly, the perverse complexity of unnecessarily intricate metrics for translation quality assessment is meant to hide the uncertainty and resulting ambiguity of theorists and the inability and failure of theory rather than to reassure customers and provide them with usable tools.

In fact, every time you try to question the cumbersome and flawed mechanism behind such metrics, the academic community closes like a clam.

In her post, Poulomi Choudhury suggests setting exact parameters for reviewers. Unfortunately, the inception of the countless fights between translators and reviewers, between translators and reviewers and terminologists, and between translators and reviewers and terminologists and subject experts and in-country reviewers gets lost in the mist of time.

Not only are reviewing and post-editing (PEMT) instructions a rare commodity, the same translation pundits who tirelessly flood the industry with pointless standards and intricate metrics — possibly without having spent a single hour in their life negotiating with customers — have not produced even a guideline skeleton to help practitioners develop such procedural overviews.

As implementing a machine translation platform is no stroll for DIY ramblers, writing PEMT guidelines is not straightforward either: it requires specific know-how and understanding, which recalls the rationale for hiring a consultant when working with MT.

For example, although writing instructions for post-editors might seem a once-only task, different engines, domains, and language pairs require different instructions to meet the needs of different PEMT efforts. Once written, these instructions must then be kept up-to-date as new engines, language pairs, or domains are implemented, so they vary continuously. Also, to help project managers assess the PEMT effort, these instructions should address the quality issue with guidelines, thresholds, and scores for raw translation. Obviously, they should be clear and concise, and this might very well be the hardest part.

As well as being related to the quality of the raw output, the PEMT effort is a measure that any customer should be able to easily understand as a direct indicator of the potential expenditures to achieve business goals. In this respect, it should be properly described and we should go with tools that help the customer financially estimate the amount of work required to achieve the desired quality level from a machine translation output.

Indeed, the PEMT effort depends on diverse factors such as the volume of content to process, the turnaround time, and the quality expectations for the finalized output. Most importantly, it depends on the suitability of source data and input for (machine) translation.

Therefore, however assessable through automatic measurements, PEMT effort can only be loosely estimated and projected. In this respect, KantanMT offers the finest tool combination for accurate estimates.
On the other hand, a downstream measurement of the PEMT effort by comparing the final post-edited translation with the raw machine translation output is reactive (just like the typical translation quality assessment practice) rather than predictive (that is business-oriented).

Also, a downstream compensation model requires an accurate measurement of the actual work performed to infer the percentage on the hourly rate from the edit distance, as no positive correlation exists between edit distance and actual throughput.
Nonetheless, tracking the PEMT effort can be useful if the resulting data is compared with estimates to derive a historical series. After all, that’s how data empowers us.
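As a purely hypothetical sketch of that last point, one way to build such a historical series is to log, per job, the upfront estimate alongside a downstream edit-distance-based measurement of how much the post-editor actually changed (here via Python's standard difflib; the job data and the 0-to-1 "effort" scale are illustrative assumptions, not a real compensation model):

```python
from difflib import SequenceMatcher

def pemt_effort(raw_mt: str, post_edited: str) -> float:
    # Downstream measurement: 1 - similarity ratio between the raw MT
    # output and the final post-edited text (0 = untouched, 1 = rewritten).
    return 1.0 - SequenceMatcher(None, raw_mt, post_edited).ratio()

# Illustrative job log: (upfront estimate, raw MT output, post-edited result).
jobs = [
    (0.10, "the contract are signed by both party",
           "the contract is signed by both parties"),
    (0.40, "click button for to saving file",
           "click the button to save the file"),
]

# Historical series: compare each downstream measurement with its estimate,
# so future estimates can be calibrated against accumulated data.
history = [{"estimated": estimate,
            "actual": round(pemt_effort(raw, edited), 2)}
           for estimate, raw, edited in jobs]

for record in history:
    print(record)
```

Note that, as the surrounding text cautions, an edit-distance figure like this says nothing about actual throughput or time spent; its value is only in the accumulated estimate-versus-actual series.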

Predictability is a major driver in any business, and it should come as no surprise, then, that translation buyers have no interest in dealing with the intricacy of quality metrics that are irredeemably prone to subjectivity, ambiguity, and misinterpretation, and, most importantly, are irrelevant to them. When it comes to business — and real money — gambling is never the first option, but the last resort. (KV: Predictability here would mean a defined effort ($ & time) that would result in a defined outcome (sample of acceptable output)).
On the other hand, more than a quarter of a century has passed since the introduction of CAT tools in the professional field, and many books and papers have been written about them, yet many still feel the urge to explain what they are. Maybe this makes sense for the few customers who are entirely new to translation, even though what they might be interested in knowing is just that providers will use some tool of the trade and spare them some money. And yet quality would remain a major concern, as a recent SDL study showed.

An introduction to CAT tools is at the very least curious when recipients are translation professionals or translation students about to graduate. Even debunking some still popular myths about CAT is just as curious, unless considering the number of preachers thundering from their virtual pulpits against the hazards of these instruments of the devil.

In this apocalyptic scenario, even a significant leap forward could go almost unnoticed. Lilt is an innovative translation tool, with some fabulous features, especially for professional translators. As Kirti Vashee points out, it is a virtual translator assistant. It also presents a few drawbacks, though.
Post-editing is the ferry to the singularity. It could be run interactively, or downstream on an entire corpus of machine translation output.

When fed with properly arranged linguistic data from existing translation memories, Lilt could also be an extraordinary post-editing tool on bilingual files. Unfortunately, the edits made by a single user only affect the dataset associated with that account and the task that is underway. In other words, Lilt is by no means a collaborative translation environment. Yet.

This means that, for Lilt to be effective with typically large PEMT jobs involving teams, accurate PEMT instructions are essential, and, most importantly, post-editors should strictly follow them. This is a serious issue. Computers never break rules, while free will enables humans to deviate from them.

Finally, although cloud computing is now usual in business, Lilt's cloud-only availability can still present a major problem to many translation industry players, whether because of the need for a fast Internet connection or because of the vexed (although repeatedly demystified) question of data protection for IP reasons, even though the computing resources required to process such vast amounts of data would hardly make sense for a typical SME to have.

In conclusion, when you start a business, it is usually to make money, and money is not necessarily bad if you do no evil, pecunia non olet. And money usually comes from buyers, whose prime requirement can be summarized as “Give me something I can understand.”

My ignorance will excuse me.


Luigi Muzii's profile photo

Luigi Muzii has been in the "translation business" since 1982 and has been a business consultant since 2002 in the translation and localization industry through his firm. He focuses on helping customers choose and implement the best-suited technologies and redesign their business processes for the greatest effectiveness of translation and localization related work.

This link provides access to his other blog posts.

Thursday, November 24, 2016

The Thanksgiving Myth

Thanksgiving is fundamentally about giving thanks, though, according to Wikipedia and what we are generally told in the US, it is associated with the Pilgrims, the Puritans, and a harvest festival. For Native Americans, the story of Thanksgiving is not a very happy one.

“Thanksgiving” has become a time of mourning for many Native People. It serves as a period of remembering how a gift of generosity was rewarded by theft of land and seed corn, extermination of many Native people from disease, and near total elimination of many more from forced assimilation. As celebrated in America, “Thanksgiving” is a reminder of 500 years of betrayal. To many Native Americans, the Thanksgiving Myth amounts to the settlers’ justification for the genocide of Indigenous peoples, and this official U.S. celebration marks the survival of early arrivals in a European invasion that culminated in the death of more than 10 million native people. Here is a view of the holiday from one Native American, who provides some background on the source of this darker view and also shares why she has chosen to approach it in another way, with a spirit of forgiveness.

Thanksgiving is also associated with hard-core shopping in the U.S., with something called Black Friday. However, in the modern era, where few are aware of the damage done to native cultures by the original settlers and broken treaties, it is essentially about feasting, football, shopping, and expressing gratitude. This is what most of my personal experience has been: football, shopping, and turkey (apparently 45 million turkeys will die).

While I have never resonated with the commercialism of the event, I have always felt that the celebration of gratitude is wonderful. Gratitude is an emotion expressing appreciation for what one has — as opposed to, for example, a consumer-driven emphasis on what one wants. Gratitude is getting a great deal of attention as a facet of positive psychology: Studies show that we can deliberately cultivate gratitude, and can increase our well-being and happiness by doing so. In addition, gratefulness—and especially expression of it to others -- is associated with increased energy, optimism, and empathy.

What Is Gratitude?

Robert Emmons, perhaps the world’s leading scientific expert on gratitude, argues that gratitude has two key components, which he describes in a Greater Good essay, “Why Gratitude Is Good.”

“First,” he writes, “it’s an affirmation of goodness. We affirm that there are good things in the world, gifts and benefits we’ve received.”

In the second part of gratitude, he explains, “we recognize that the sources of this goodness are outside of ourselves. … We acknowledge that other people -- or even higher powers, if you’re of a spiritual mindset—gave us many gifts, big and small, to help us achieve the goodness in our lives.”

Emmons and other researchers see the social dimension as being especially important to gratitude. “I see it as a relationship-strengthening emotion,“ writes Emmons, “because it requires us to see how we’ve been supported and affirmed by other people.”
Because gratitude encourages us not only to appreciate gifts but to repay them (or pay them forward), the sociologist Georg Simmel called it “the moral memory of mankind.”

As an immigrant to America, I have always felt that the Thanksgiving story I was told, about Pilgrims and "Indians" holding hands and smiling, was at least a little bit shaky based on my very limited knowledge of American history. It just never rang true to my mind. And while I feel that any day when a family and a community gather to give thanks is special and worthy of celebration, I think we should also acknowledge that the history we are told is suspect, as history is often written by the victors and not by men of even and truthful temperance. Part of giving thanks, it seems to me, is also to acknowledge the sacrifices of the ancestors who may have made one’s plenitude possible. This would include the Native Americans if you live in North America, as they have always regarded themselves as caretakers of the land rather than owners of it. The following statement is something that you will hear from many Native Americans about their ethos.

As America’s Host People, Native Americans are the keepers of the land, that is our sacred duty. Our responsibilities include bringing the land, the people, and the rest of creation back into harmony.
On this particular Thanksgiving, near the Standing Rock Sioux Reservation, we have yet another example of Native Americans standing up for what they believe is a sacred trust: to protect against the desecration of land they consider holy, and against potential damage to the largest drinking water supply in the region. This is yet another example of the betrayal of a treaty with the US government, as many believe that this should have been prevented by treaties already in place. From one perspective the issues are complex, as described here and in looking at the oil price economics driving the project. The world has been electrified by protests against the Dakota Access pipeline. Is this a new civil rights movement where environmental and human rights meet?

For the Elders leading the protest, there are three clear reasons to try to stop this:
  1. Prevent desecration of sacred burial grounds and what is considered “holy” land,
  2. Protect a major supply of natural drinking water from potential oil spill accidents,
  3. They have a treaty in place with the US government that was supposed to protect against commercial exploitation of protected land.

Those who think that the oil spill potential is overstated should take a look at how frequently these accidents do happen, and what happens when they do. Galveston Bay, a hub for oil traffic, for example, averages close to 300 oil spills of various sizes each year. As you may have guessed, Galveston is not known for its wonderful beach experience. The Exxon Valdez spill still has a negative impact 25 years later, and the environment and wildlife have yet to fully recover from the accident. The impact of the Deepwater Horizon spill, examined 5 years later, shows that while nature does have a recovery process, some things can take decades or longer even to understand the damage, let alone recover from it.

Here is a video of a 90,000 gallon spill in May 2016 that did not even make the daily news since these kinds of spills are so common.

So on this Thanksgiving, I also give thanks to those who oppose this pipeline and make a valiant attempt to stop the potential destruction of one of the largest natural drinking water supplies in the US. The Native American ethos also has a unique view of death in such a battle: when one battles and fights for the community's well-being and for the land, it is considered a noble death, since it is a sacrifice for the well-being of others. Robbie Robertson (of The Band) wonderfully captures the emotion that these “water protectors” must feel at Standing Rock in this live rendition of “It is a good day to die”, a quote attributed to Crazy Horse. This translation is an English bastardization of a common Sioux battle-cry, "Nake nula wauŋ welo!" The phrase really means, "I am ready for whatever comes." It was meant to show the warriors were not afraid of the battle or of dying in it. So... Crazy Horse probably shouted, "Hokahey! Nake nula wauŋ welo!"

I wish you all a warm and loving Thanksgiving as you express your gratitude for your plenitude.