
Monday, May 18, 2020

Data Preparation Best Practices for Neural MT

In any machine learning task, the quality and volume of the available training data are critical determinants of the system that is developed. This is true for both Statistical MT and Neural MT: both are data-driven and produce output that is deeply influenced by the data used to train them. Some believe that Statistical MT systems have a higher tolerance for noisy data, and it is therefore assumed that more volume is better even if the data are "noisy," but in my experience, all data-driven MT systems perform better with quality data. Research shows that Neural MT is more sensitive to noise than Statistical MT. Still, as SMT has been around for 15+ years now, many of the data preparation practices established for SMT continue to be carried over to NMT model building today.


This problem has raised interest in the field of parallel data filtering to identify and correct the most problematic issues for NMT, e.g., segments where source and target are the same, and misaligned sentences. This presentation by eBay provides an overview of the importance of parallel data filtering and its best practices. It adds to the useful points made by Doctor-sahib in this post. Data cleaning and preparation have always been necessary for developing superior MT engines, and most of us agree that it is even more critical now with neural network-based models.
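
To make the filtering idea concrete, here is a minimal Python sketch of two of the checks mentioned above: dropping pairs where source and target are identical, and flagging likely misalignments with a crude length-ratio test. The file names and threshold are hypothetical, and real filtering pipelines layer many more language-aware checks on top of this.

```python
# Minimal parallel-data filtering sketch. File names and the length-ratio
# threshold are hypothetical; real pipelines add language-aware checks.

def is_suspect(src: str, tgt: str, max_ratio: float = 3.0) -> bool:
    """Flag segment pairs that are likely noise for NMT training."""
    src, tgt = src.strip(), tgt.strip()
    if not src or not tgt:
        return True           # one side is empty
    if src == tgt:
        return True           # source copied into target untranslated
    ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
    return ratio > max_ratio  # crude misalignment signal

def filter_corpus(src_path: str, tgt_path: str):
    """Yield segment pairs that pass the basic noise checks."""
    with open(src_path, encoding="utf-8") as fs, \
         open(tgt_path, encoding="utf-8") as ft:
        for src, tgt in zip(fs, ft):
            if not is_suspect(src, tgt):
                yield src.strip(), tgt.strip()

if __name__ == "__main__":
    kept = list(filter_corpus("train.en", "train.gu"))
    print(f"Kept {len(kept)} segment pairs")
```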

This guest post is by Raymond Doctor, an old and wise acquaintance of mine who has spent over a decade at the Centre for Development of Advanced Computing (C-DAC) in Pune, India. He is a pioneer in digital Indic language work and was involved in several Indic language NLP initiatives, conducting research on Indic language Parsers, Segmentation, Tokenization, Stemming, Lemmatization, NER, Chunking, Machine Translation, and Opinion Mining.



He and I also share two Indian languages in common (Hindi and Gujarati). Over the years, he has shown me many examples of output from MT systems he has developed in his research that were the best I had seen for these two languages going into and out of English. The success of his MT experiments is yet more proof that the best MT systems come from those who have a deep understanding of both the underlying linguistics, as well as the MT system development methodology. 

Overview of the SMT data alignment processes

"True inaccuracy and errors in data are at least relatively straightforward to address, because they are generally all logical in nature. Bias, on the other hand, involves changing how humans look at data, and we all know how hard it is to change human behavior."

- Michiko Wolcott

Some other wisdom about data from Michiko:

Truth #1: Data are stupid and lazy.

Data are not intelligent. Even artificial intelligence must be taught before it learns to learn on its own (even that is debatable). Data have no ability on their own. It is often said that insights must be teased out of data.

Truth #2: Data are rarely an objective representation of reality (on their own).

I want to clarify this statement: it does not say that data is rarely accurate or error-free. Accuracy and correctness are dimensions of quality of what is in the data themselves.

The text below is written by the guest author.

                                                  **************


Over the years, I have been studying the various recommendations given to prepare training data before submitting it to an NMT learning engine. I feel these recommended practices mainly emerged as best practices at the time of SMT, and have been carried over to NMT with less beneficial results.

I have identified six major pitfalls that data analysts fall into when preparing training data for NMT models. These data cleaning and preparation practices originated as best practices with SMT, where they were of benefit, and many are still being followed today. It is my opinion that they should now be avoided, and that avoiding them is likely to result in better outcomes.

Beyond the few practices listed here, many other SMT-era data preparation practices can also lead the training data to produce a sub-optimal NMT system, but the factors listed below are the most common ones, and they lower output quality compared with what is possible when they are ignored. In my research with Indic language MT systems, I disregarded the usual advice regarding punctuation, deduping, removing truncations, and MWEs, and found that the quality of NMT output improves considerably.

As far as possible, examples have been provided from a Gujarati <> English NMT system I have developed. But the same can apply to any other parallel corpus.


1. PUNCTUATION

Quite a few sites tell you to remove punctuation before submitting the data for training. It is my observation that this is not an optimal practice.

Punctuation marks help convey meaning. In a majority of languages, word order alone does not necessarily signal a question:

Tu viens? =Are you coming?

Removing the question mark creates confusion and duplicates [see my remark on deduping below].

See what happens when a comma is removed:

Anne Marie, va manger mon enfant=Anne Marie, come have your lunch

Anne Marie va manger mon enfant=Anne Marie is going to eat my child

 

The mayor says, the commissioner is a fool.

The mayor, says the commissioner is a fool.

I feel that in preparing a corpus the punctuation markers should be retained.


2. TRUNCATIONS AND SHORT SENTENCES

Quite a few sites advise you to remove short sentences. Doing this, in my opinion, is a serious error. Short sentences are crucial for translating headlines, one of the stumbling blocks of NMT. Some have no verbs and are pure nominal structures.

Curfew declared: Noun + Verb

Sweep of Covid19 over the continent: Nominal Phrase

 

Google does not handle nominal structures well, and here is an example:

Sweep of Covid over India= ભારત ઉપર કોવિડનો સ્વીપ

I have found that retaining such structures strengthens and improves the quality of NMT output.


3. MULTIWORD EXPRESSIONS

Multiword expressions (MWEs) are expressions that are made up of at least two words, and which can be syntactically and/or semantically idiosyncratic in nature. Moreover, they act as a single unit at some level of linguistic analysis.

Like short sentences, MWEs are often ignored and removed from the training corpus. These MWEs are very often fixed patterns found in a given language. These can be short expressions, titles, or phrasal constructs, just to name a few of the possibilities. MWEs cannot be literally translated and need to be glossed accurately. My experience has been that the higher the volume of MWEs provided, the better the quality of learning. A few MWEs in Gujarati are provided below:

agreement in absence =અભાવાન્વય

agreement in presence =ભવાન્વય

agriculture parity =કૃષિમૂલ્ય સમાનતા

aid and advice =સહાય અને સલાહ

aider and abettor =સહાયક અને મદદગાર

aim fire =નિશાન લગાવી ગોળી ચલાવવી
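
One straightforward way to act on this advice is to append an MWE glossary to the training corpus as short segment pairs. A minimal sketch, assuming a hypothetical tab-separated glossary file in the format of the list above:

```python
def load_mwe_glossary(path: str):
    """Read 'expression<TAB>gloss' lines into training segment pairs."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if "\t" not in line:
                continue  # skip malformed lines
            expr, gloss = line.rstrip("\n").split("\t", 1)
            pairs.append((expr.strip(), gloss.strip()))
    return pairs

# e.g. a file with lines such as: "aid and advice<TAB>સહાય અને સલાહ"
training_pairs = load_mwe_glossary("mwe_glossary.tsv")
```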


4. DUPLICATES

A large number of sites providing recommendations on NMT training data preparation tell you to remove duplicates in both the Source and Target texts, an action popularly termed deduping. The argument is that deduping the corpus makes for greater accuracy. However, it is common for an English sentence to map to two or more strings in the target language. This variation can arise from synonyms in the target language, or from the flexible word order that is especially common in Indic languages.

Change of verbal expression and word order:

How are the trade talks between China and the US moving forward now. =ચીન તથા અમેરિકા વચ્ચે વેપાર વ્યવહાર વિષયક વાતચીત હવે કેવી આગળ વધે છે.

How are the trade talks between China and the US moving forward now. =ચીન તથા અમેરિકા વચ્ચે હવે વેપાર વિષયક વાતચીત કેવી આગળ વધે છે.

Synonyms:

Experts believe. =એક્સપર્ટ્સ માને છે.

Experts believe. =જાણકારોનું માનવું છે.

Experts believe. =નિષ્ણાતોનું માનવું છે.

Deduping the data in such cases reduces the quality of the output.

The only case where deduping needs to be done is where the two strings are identical in both the Source and Target language; in other words, an exact duplicate. High-end NMT engines do not dedupe beyond this, since doing so deprives the MT system of the ability to provide variants, which can be seen by clicking on all or part of the gloss.
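
To state the rule as an algorithm: drop a pair only when both its source and target have already been seen together, never when the same source maps to several different targets. A minimal sketch of this exact-pair deduping (my illustration, not the author's code):

```python
def dedupe_exact_pairs(pairs):
    """Remove only exact (source, target) duplicates.

    A source that maps to several distinct targets is kept in all its
    variants, since those variants teach the model synonyms and
    flexible word order.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        if (src, tgt) not in seen:
            seen.add((src, tgt))
            kept.append((src, tgt))
    return kept

pairs = [
    ("Experts believe.", "એક્સપર્ટ્સ માને છે."),
    ("Experts believe.", "જાણકારોનું માનવું છે."),  # different target: kept
    ("Experts believe.", "એક્સપર્ટ્સ માને છે."),     # exact duplicate: dropped
]
print(len(dedupe_exact_pairs(pairs)))  # 2
```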


5. VERBAL PATTERNS

The inability to handle verbal patterns is the Achilles' heel of a majority of NMT engines, including Google's, insofar as English to Indic languages is concerned. This area is ignored because it is assumed that the corpus will cover all verbal patterns in both the source and target language. Even the best of corpora does not.

Providing a set of sentences covering the verbal patterns of both the source and target languages goes a long way.

Gujarati admits more than 40 verbal patterns, and NMT fails on quite a few:

They ought to have been listening to the PM's speech =તેઓએ વડા પ્રધાનનું ભાષણ સાંભળ્યું હોવું જોઈએ

Shown below is a sample of Gujarati verbal patterns with “to eat” as a paradigm:

You are eating =તમે ખાઓ છો
You are not eating =તમે ખાતા નથી
You ate =તમે ખાધું
You can eat =તમે ખાઈ શકો છો
You cannot eat =તમે નહીં ખાઈ શકો
You could not eat =તમે ખાઈ શક્યા નહીં
You did not eat =તમે ખાધું નહીં
You do not eat =તમે ખાતા નથી
You eat =તમે ખાધું
You had been eating =તમે ખાતા હતા
You had eaten =તમે ખાધું હતું
You have eaten =તમે ખાધું છે
You may be eating =તમે ખાતા હોઈ શકો છો
You may eat =તમે ખાઈ શકો છો
You might eat =તમે કદાચ ખાશો
You might not eat =તમે કદાચ ખાશો નહીં
You must eat =તમારે ખાવું જ જોઇએ
You must not eat =તમારે ખાવું ન જોઈએ
You ought not to eat =તમારે ખાવું ન જોઈએ
You ought to eat =તમારે ખાવું જોઈએ
You shall eat =તમે ખાશો

A similar issue arises with the use of a habitual marker when glossed into French by a high-quality NMT system.



6. POLE AND VECTOR VERBS

This construct is very common in Indic languages and often leads to mistranslation.

Thus, Gujarati uses verbs such as જવું and કરવું as adjuncts to the main verb. The combination of the pole verb and a vector verb such as જવું creates a new meaning.

મરી જવું is not translated as “die go” but simply as “die”.

Gujarati admits around 15-20 such verbs, as do Hindi and other Indic languages, and once again, a corpus needs to be fed this type of data in the shape of sentences to produce better output.

In the case of English, it is phrasal verbs that often create similar issues:

Pick up, pick someone up, pick up the tab


Conclusion

We noticed that when training data that ignores some of these frequent data preparation recommendations is sent in for training, the quality of MT output markedly improves. However, there is a caveat: if the training data falls below a threshold of 100,000 segments, following or not following the above recommendations makes little or no difference. Superior NMT systems require a sizeable corpus; generally, we see that at least a million or more segments are needed.

A small set of sentences from various domains is provided below as evidence of the quality of output achieved using these techniques:

Now sieve this mixture.=હવે આ મિશ્રણને ગરણીથી ગાળી લો.

It is violence and violence is sin.=હિંસા કહેવાય અને હિંસા પાપ છે.

The youth were frustrated and angry.=યુવાનો નિરાશ અને ક્રોધિત હતા.

Give a double advantage.=ચાંલ્લો કરીને ખીર પણ ખવડાવી.

The similarity between Modi and Mamata=મોદી અને મમતા વચ્ચેનું સામ્ય

I'm a big fan of Bumrah.=હું બુમરાહનો મોટો પ્રશંસક છું.

38 people were killed.=તેમાં 38 લોકોના મોત થયા હતા.

The stranger came and asked.=અજાણ્યા યુવકે આવીને પૂછ્યું.

Jet now has 1,300 pilots.=હવે જેટની પાસે 1,300 પાયલટ છે.



====================================================================
 

 
Raymond Doctor has spent over a decade at the Centre for Development of Advanced Computing (C-DAC) in Pune, India. He is a pioneer in digital Indic language work and was involved in several Indic language NLP initiatives, conducting research to further Indic language Parsers, Segmentation, Tokenization, Stemming, Lemmatization, NER, Chunking, Machine Translation, and Opinion Mining.

Tuesday, March 27, 2018

Sharing Efforts to get the most from MT and Post-Editing


This is a guest post from Luigi Muzii, made up primarily of the speaker notes of his presentation at the ELIA Together 2018 conference. I believe that Luigi offers wise words that represent best-practices thinking, and thus I offer it on this blog forum.

While some of his advice might seem obvious to some, I am always struck by how often these very commonsensical recommendations are either overlooked or simply willfully ignored.

As generic MT improves, I think it becomes more and more important for enterprises to consider bringing rampant, uncontrolled MT use by employees and outsourced translators under control. 

As always the emphasis of bold text in the post is my doing.


====





The presentation was designed to provide some practical advice about tackling the challenges that freelancers, project managers, and translation buyers face when approaching MT, implementing MT or running MT and post-editing projects.


Three target groups can be identified for three kinds of task:
  1. In the making; machine translation is used for “suggestions” while a human translator processes a document; this is probably the most common approach today;
  2. Downstream; machine translation is used to fully process a document that will be possibly post-edited; this approach is typically adopted by larger clients with established experience in the field;
  3. On constraints; machine translation is used by an LSP to finalize a translation job by asking translators to work on suggestions coming from a specialized engine.
While scenarios two and three might meet the customer’s need for confidentiality and IP protection through an in-house engine using only the client’s own data, scenario one is becoming more and more common among professional translators, given the astonishing improvement of online engines. At the same time, scenario three is slowly but steadily being adopted by LSPs who try to escape price and volume pressures through machine translation and post-editing.

Laying foundations

The three scenarios just outlined require different strategies to be devised. The first one involves the machine-translation method.

Given the circumstances, the premises and the many reservations about it—basically, the hype—a question must be asked: Is PEMT already in the past?

In this paper, the reference method is PB-SMT (Phrase-Based Statistical Machine Translation) because PB-SMT engines are inexpensive and effective, whereas customizable NMT (Neural Machine Translation) engines are still quite pricey and challenging in their technical requirements and operational complexity, and thus out of range for most customers. Translators working on unrestricted documents (scenario one applies here) would generally choose an online NMT engine. For customers requiring confidentiality and IP protection and willing to leverage their own language data (scenarios two and three apply here), a highly customizable PB-SMT engine might be a valid option, especially where no major investment in the field is envisaged, vast and suitable data is available, and/or limited volumes are processed.

In general, the main drivers in the adoption of MT are productivity (speed and volumes) and usefulness (consistency and marginal cost) especially for large projects otherwise involving many translators. Unfortunately, MT is not exactly child’s play. MT engines are complex and challenging applications requiring a combination of tools, data, and knowledge. This is a rare commodity, especially in a single person, on both sides of a translation project.

Also, in the future, while MT will continue to proliferate, the shortage of good translators will increase.

Join forces

Therefore, joining forces is important to explore and vet as many solutions as possible. According to a popular saying, everyone is needed and no one is indispensable: anyone can easily be replaced by someone with a similar profile in the same role. This also means, though, that to be valuable, everyone's effort is needed at the highest level of performance, all the time. For quite some time now, translation has no longer been a solitary feat but a collaborative effort. This is especially true with the current level of technology.

In this respect, three steps should be completed before starting any MT project.
  1. Recap your goals and requirements
    What you expect from MT and how much you are willing to rely on it.
  2. Check your MT readiness;
    Realistically analyze your knowledge, tools, and data.
  3. Plan for assistance
    Never venture into an unfamiliar territory without a guide.

Defining requirements

When planning to implement MT, keep scope, goals, and expectations clearly distinct.
Identify one or more items within your scope of business for MT, possibly picking those where the amount of data available is larger, and the quality higher.

Clearly define and prioritize your goals. Major goals may be reducing labor, boosting productivity and keeping consistency, especially in larger projects.

Be realistic in your expectations; familiarize yourself with the technology and strengthen your expertise to make the best of it. Tackle any security issues concerning confidentiality, data integrity, and protection of intellectual property. Don’t forget to scrub your data if you plan to train an engine, and plan for any relevant support. Finally, revise your pricing model to encompass MT-related tasks.

Building a platform

When building an MT platform, never forget that MT engines are not all equal; they differ in environment, configuration, and data. Therefore, although the output could be considered somewhat predictable, raw output quality can vary across systems and language combinations, and errors may not follow a consistent pattern. The performance of MT engines also varies.

In data-driven MT, data maintenance is crucial, and it is the first task when setting up an MT platform. Data must be organized, cleaned, and fixed for terminology and style. For a customized engine, at least 100,000 paired segments are necessary, and the cleaner and healthier the data, the better.

Another important factor in the effectiveness of an MT strategy is the set of tools used for data preparation, pre-translation, and post-editing. Special attention must be given to translation tool settings to allow for sub-segment recall and fuzzy match repair.

Finally, when choosing the engine, the items to consider are:
  • The total cost of ownership (TCO)
  • Integration
  • Expertise
  • Security (especially as to intellectual property and confidentiality)

 

Best practices: Running projects

When running MT projects, best practices may be different depending on whether you are a translation buyer or vendor.

In general, knowing your data and mastering quality metrics is a must. As to post-editing, devise your own compensation scheme.

A common mistake is to treat all content as equal and then mishandle the data. In the same way, absolutely avoid relying only on your own capabilities or only on vendors. In the end, everything can be summed up in the simple invitation not to expect any miracles.

Never forget to agree with the customer and the content owner about using MT, especially if you are using a SaaS/online platform, to prevent being sued.

Also, remember that data is the fuel to any SMT/NMT engine and that the output is only as good as the data used. In fact, these engines perform statistical predictions by inference, and when the amount and quality of data increase, an engine improves.

Collect as much data as possible, but always make sure it comes from a few reliable sources in a restricted domain, that it contains correct translations with no errors, that it is real and recent, and that it is terminologically and stylistically consistent.

At this point, you must accept that MT output is unpredictable. For this reason, MT quality assessment should be run in such a way as to prevent any subjectivity.

For the same reason, post-editing is and will remain a critical, integrated part of MT usage, and it is expected to be fast, unchallenging, and flowing.

Still, the amount of post-editing required can be hard to assess. To plan deadlines and allocate a budget for the job, two different measures can be used: the edit time and the post-editing effort. The first is the time required to bring raw MT output to the desired standard, and the second is the percentage of edits that must be applied to raw MT output to attain that standard.
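
The second measure can at least be computed after the fact by comparing raw MT with its post-edited version. Here is one rough proxy using Python's standard library (my sketch, not a metric from the post; production setups typically use TER or a character-level edit distance):

```python
import difflib

def pe_effort(raw_mt: str, post_edited: str) -> float:
    """Realized post-editing effort: the share of the text that changed.

    0.0 means the segment was left untouched; 1.0 means it was
    rewritten entirely.
    """
    similarity = difflib.SequenceMatcher(None, raw_mt, post_edited).ratio()
    return 1.0 - similarity

raw = "The mayor says, the commissioner is a fool."
edited = "The mayor, says the commissioner, is a fool."
print(f"Effort: {pe_effort(raw, edited):.0%}")
```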

The main problem with edit time is that it can only be computed downstream, assuming that the time spent has been entirely devoted to editing.

The post-editing effort can be estimated through probabilistic forecasts based on automatic metrics as a reverse projection of the productivity boost. In fact, translation productivity is measured as the throughput expressed in the number of words per hour, and MT is supposed to improve it by reducing the turnaround time and increasing the workable volumes. However, the post-editing effort and the turnaround time are hard to predict, especially for translation of publishable quality and/or data for incremental training of engines. In fact, it depends on diverse factors such as the quality expectations for the finalized output, the volume of content to process, and the allotted time for the task. Also, the effort required depends on the post-editing level.

The post-editing level is generally restricted to:
  1. Gisting;
  2. Light;
  3. Full.
Gisting consists of fixing recurring errors in raw MT output with automatic scripts. It is used for volatile content, e.g. messaging, conversations, etc. Light post-editing consists of fixing capitalization and punctuation errors, replacing unknown words, removing redundant words or inserting missing ones, and ignoring all stylistic issues. It is generally used for continuous delivery and reprocessing. Full post-editing consists of fixing meaning distortion, grammar, and syntax, translating untranslated terms (possibly new terms), and adjusting fluency. It is reserved for publishing and engine training.
Finally, always follow a few basic rules before embarking on a post-editing project:
  • Test before operating;
  • Ask for MT samples for negotiation;
  • Negotiate throughput rates;
  • Ask for a glossary with the list of DNT (do-not-translate) words;
  • Ask for instructions;
  • Be open to feedback.
Similarly,
  • Never use MT to sustain price competition;
  • Never process poor MT outputs;
  • Never treat post-editing as fuzzy matches.
Remember that different engines, domains, and language pairs produce different outputs, involve different post-editing efforts, and require different post-editing instructions. These should address tool selection criteria and environment setup guidelines, as well as a style guide, and a comprehensive term base. They should also address conventions and the type and number of project details as well as the general pricing model and the actual operating instructions.

As to pricing and compensation, for light post-editing of very good output where speed is the major concern and first requirement, a model should be settled on prior to any assessment of the actual MT output, based on a clear-cut predictive scheme. However, do not follow any translation-memory fuzzy-match scheme. In fact, while fuzzy matches over 85% are inherently correct and generally require minor changes, machine-translated segments may contain errors and inaccuracies, and even a supposedly light post-editing job may prove challenging. A downstream computation scheme might also be devised for full post-editing, to measure the actual work performed accurately. This is usually done by computing the edit distance and then inferring a percentage of the hourly rate.

A negotiation grid can be helpful to cross-reference the type and nature of engines, the quality of raw output, and all the relevant technical requirements with compensation based on productivity, type of performance, bid (hourly rate), and ancillary services (e.g. filling in QA forms for ongoing engine training).

In any case, a strong and clear “No, thanks!” is more than reasonable when the pay rate offered is considerably low and unrelated to the language pair and MT output quality, and/or the MT output quality is lower than that of a generic free online service.

Lastly, raw MT output should be processed before a post-editing job: automatic removal of empty and/or untranslated segments and duplications; fixing of diacritics, punctuation marks, extra spaces, noise, and numbers; a terminology consistency check; and a spellcheck. A post-processing stage should also be envisaged, involving encoding, normalization, formatting (especially tag injection), a terminology check, and, obviously, a spellcheck.
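
A hedged sketch of what such an automatic pre-processing pass might look like (illustrative checks only; fixing diacritics and numbers, and running a real spellcheck, would need language-specific tooling):

```python
import re

def preprocess_raw_mt(src_segments, mt_segments):
    """Tidy raw MT output before handing it to a post-editor."""
    cleaned = []
    for src, mt in zip(src_segments, mt_segments):
        mt = re.sub(r"\s+", " ", mt).strip()  # collapse extra spaces
        if not mt:
            continue                          # drop empty output
        if mt == src.strip():
            continue                          # drop untranslated (copied) segments
        cleaned.append((src.strip(), mt))
    return cleaned

pairs = preprocess_raw_mt(
    ["Pick up the tab.", "Hello."],
    ["Pick up the tab.", "  Bonjour.  "],     # first output is untranslated
)
print(pairs)  # [('Hello.', 'Bonjour.')]
```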

=======================


Luigi Muzii has been in the "translation business" since 1982 and has been a business consultant since 2002, serving the translation and localization industry through his firm. He focuses on helping customers choose and implement best-suited technologies and redesign their business processes for the greatest effectiveness of translation and localization related work.

This link provides access to his other blog posts.



Thursday, March 1, 2018

Machine Translation Maturity Model (MTMM)

This is a guest post by Valeria Cannavina, Project Coordinator at Donnelley Language Solutions, adapting the Common Sense Advisory’s Localization Maturity Model (LMM), which is itself an adaptation of the software industry’s Capability Maturity Model (CMM). The resulting Machine Translation Maturity Model (MTMM) is a way of assessing users’ understanding of the technology and whether they are using it in an efficient and effective manner, properly linked to other organizational processes. Valeria provides a framework for businesses to “identify where they are and what they can do to either significantly or modestly improve their existing production model to maximize the value that MT can provide to their organizations.”

This is, however, a perspective that is quite localization-centric, and process alignment for a global Enterprise MT service that might be used by thousands of users, across an enterprise to translate hundreds of millions of words could be quite different.

As we head into 2018, we continue to see excitement and hype around Neural MT (Machine Translation), which is a breakthrough approach on the verge of providing a wealth of possibilities for the creation and management of business content. But, because the technology is relatively new, many players in the translation industry are overlooking the importance of implementing aligned procedures to guide the use of MT.

Neural MT, or any other kind of MT on its own, is not a magic wand that can solve any and every translation problem. Production and work procedures need to be aligned in an informed and competent way to enable the technology to provide maximum benefits and also minimize risks and data security issues.

It is useful to always ask fundamental questions before embarking on new technology deployment initiatives. The most fundamental question for businesses looking to invest in MT may be:

Why do we want to translate the content at all? 

Content translation only makes sense if it furthers overall business objectives and improves the global customer experience in some way. Today’s markets are massively global and that means communication and collaboration need to happen at scale and in volumes that were inconceivable just a decade ago. Today, any business that seeks to have even a moderately global footprint must understand how MT can provide increasing volumes of relevant content to their customer base.

Customers all over the world expect relevant information at their fingertips as quickly as possible, and this information is increasingly more dynamic and also short-lived. Business information is very important for a brief instant and of very limited value after that. This ability to deliver the right content quickly and effectively is often critical to the impression the customer forms of the business and its product offerings.

In addition, businesses are rightly concerned about data security and privacy. Improperly implemented MT deployments, where key processes and systems are not properly aligned, can expose private and confidential data. As the sheer volume of information continues to increase, businesses need to ensure that security is not compromised when content flows through these new translation processes. This may be especially critical with new product/service developments, sensitive employment, credit-related, medical, and financial data.

As MT becomes much more pervasive, it is wise for us to understand the bigger picture. In this paper, Valeria provides a unique and valuable perspective on assessing organizational alignment with new technology deployments. I hope you find it a useful guide for assessing your business needs around MT.

This post was originally published last month and then removed so that Donnelley could prepare the more complete document that is referenced at the bottom.

 
 ====

5 different approaches to succeed with machine translation

Introduction

With the ever-increasing volume and pace of global trade, the need to communicate to multiple markets simultaneously has never been greater. Add to this the huge technological advances of recent years, and it’s easy to see why machine translation (MT) has emerged as a translation tool of choice for high volume, high-speed translations, with ever-improving quality.

MT delivers big time-saving and money-saving benefits, plus big gains in productivity. But as is often the case when technology moves at speed, many businesses are lagging behind. While the demand for MT is growing very fast, there are still some basic challenges that clients are not aware of. For example, not all language combinations, document types, and file formats are ideal for MT. In fact, the quality of the output can change considerably based on these criteria, which could affect your workflow, time to market, quality, and business targets.

This is where partnering with a specialized Language Service Provider (LSP) can give you the upper hand. A professional LSP will not only help you understand the MT landscape together with the latest developments but also advise on how it can be best used to optimize your processes, productivity and profits.

MT engines, their output, and their training require skilled professionals and solid technologies to support automated workflows. Simply put, MT is currently far from being just a plug-and-play technology.

This approach is paramount, especially when confidentiality is key to the process. Assessing the risks of publishing data and securing processes is not a standard practice for all language service providers, so while clients and regulators are setting up very strict measures around data breaches, vendors are struggling to create processes that ensure top-quality services for MT.

Whether you are a large or a small business; whether you have a little or a lot of knowledge of MT, this paper will show you how to take full advantage of it. We follow a Machine Translation Maturity Model (MTMM) which is based on the Localization Maturity Model[1] created by Common Sense Advisory, an independent market research company for the language services industry. This paper is a guide to help businesses identify where they are and what they can do to either significantly or modestly improve their existing model to maximize the value that MT can provide to their organizations.

[1] The Localization Maturity Model was created by Common Sense Advisory, an independent Massachusetts-based market research company for the language services industry. It is based on the Capability Maturity Model (CMM), a development model informed by a study of data collected from organizations contracted with the US Department of Defense, who funded the research. The term "maturity" relates to the degree of formality and optimization of processes – from ad hoc practices, to formally defined steps, to managed result metrics, to active optimization of the processes. The model's aim is to improve existing software development processes, but it can also be applied to other processes.

Machine Translation Maturity Model (MTMM)

The model has five maturity levels, each divided into different areas which we encourage companies to evaluate individually on their own merits, allowing you to freely move from one level to another, not necessarily in successive steps, although an ideal path is shown below.

We will now walk through them so you can identify where you are and what you need to do to move to a different level, and enjoy the associated benefits.


Level 1: Initial


If you're at this level, you are requesting MT only when absolutely necessary. This could be due to time or budget constraints, or because your communication is internal only. It may be that you are not satisfied with the service that you are getting. Or it could be that you have no MT investment, resources or best practices in place. This may be because you don't have a localization department in place, or because it is not directly affecting a core activity within your organization.

This model may work for some organizations with very limited usage of MT, but even they may benefit from taking the following steps to increase its value within their organization:

  • Governance: make a case to management for investment in the maturity process.
  • Organization: appoint a dedicated MT resource in the localization/translation/marketing department who will set the basis for the process investigation.
  • Process: document the main tasks of the outsourcing process so you can track those that can be repeated and those that can be deleted because they don't generate any added value. For example, to track which department translation requests come from, the type of content you receive, and the turnaround times. This will help you start setting best practice for your process.



Level 2: Repeatable


The first step to maturity through process improvement is two-fold: describing what you do, and doing what you've described. If you are at this level, you have started documenting some tasks of the process which are repeatable - for example, your criteria for identifying texts that can be sent out to MT and how to store them by category. You view terminology management as a relatively low priority, whereas in reality, it's an investment that will pay dividends by ensuring consistency.
This is where most organizations may be at and would benefit from taking the following steps to evolve their existing processes:

  • Organization: define an internal process to track feedback on the source text from your LSP. For example, you might have received a comment saying that the text wasn't suitable for MT because it was too creative or overly complex in structure. Make a note of this in the process documentation.
  • Process:
    • analyze your existing content in order to understand exactly which documents can be translated with MT and how the source text is structured.
    • organize linguistic reviews on the translated material involving internal country reviewers, so that terminology management starts to become part of your internal process.
  • Governance:
    • track the costs of the process improvements you are implementing (expectations/forecasts vs. reality).
    • define KPIs for this process to track the ROI of activities involved at this level
  • Automation: investigate available tools for automating some tasks. For example, look for tools to help you build a repository of texts that have already been sent to MT and decide a naming convention. You will then be able to identify similar content and remove text that's not suitable for MT. The automation will be run parallel to the process of analyzing the texts.

 

Level 3: Defined


At this level, you will have clear goals around integrating MT into your business, in the form of a roadmap of tasks aimed at continuous improvement through collaboration with your LSP.
Your processes will be documented and fully executed. The internal process of outsourcing MT is defined, repeatable, and managed. You have best practices in place (a process to identify the MT content, a process to collect and implement feedback, a process for internal translation review, a process to track costs, etc.) and can now measure your process.

For example, to track productivity you might want to measure word output per hour. Before a process improvement is introduced, a baseline measurement is taken. At the end of the project, the process is measured again to show whether the change resulted in more words produced per hour.

Terminology management is no longer seen as a secondary task, but as a fundamental step which adds value to your business and your brand. The benefits are now showing in your ROI. For example, you can identify which content has already been sent to MT, which means fewer words to process, fewer man-hours for both you and your LSP, reduced time to market and reduced costs.
This is the stage where most organizations should aim to reach in order to optimize their supplier relationship with their language service provider and maximize return on investment. That said, some organizations take additional steps to further mature their MT procurement strategy as follows:
  • Organization: supported by your LSP, hire specialized terminology management staff who will work closely with other departments to:
    • organize feedback received on source text for fields of application
    • pass the feedback to internal technical writers
    • check the feedback has been implemented
    • incorporate the feedback into your CMS or whichever tool you're using to store the source and target documents.
  • Process:
    • define the internal process to combat source linguistic inconsistency. For example, give clear guidelines on what to do if a word has more than one meaning, who is the decision maker, how many review cycles the work will go through, and how this will be implemented in the CMS
    • plan internal review cycles so you can send feedback to your LSP and implement it in the CMS.
  • Governance:
    • establish the budget for multilingual projects based on the forecast for previous MT projects in terms of volumes and languages
    • establish decision making to prioritize languages and markets.
  • Automation: identify a tool to automatically apply correct terminology to source content in the CMS.

Level 4: Managed


MT is now tied to your corporate goals and part of your production process. The different departments rely on the MT department to prepare documents before sending the files to translate. The idea of 'department' here is fluid; for example, it could take the form of resources performing MT tasks alongside non-MT tasks. Alternatively, if you don't have your own MT department you can contact your trusted LSP to help you manage this internally or serve as the department itself.

The size of your business will dictate its scope.

The MT department has its own budget and schedule for incoming projects during the year, with automated checks in place for managing terminology and producing the source documents. The focus at this level is on automation; you will be working closely with the technology department to improve the source, for example with a style guide for writing source text to ensure consistency.

If your organization is in this camp, consider taking the following steps to improve the maturity of your sourcing model:
  • Organization: define roles within the process; for example, a project manager (PM) to handle requests from different departments, dedicated engineers for automation, and internal reviewers.
  • Processes:
    • define delivery parameters around new products/documents. For example, you're issuing a new set of letters to shareholders and, based on previous experience, you know they will need to go through X reviews. This knowledge enables you to set realistic deadlines, review cycles, and delivery volumes
    • define text structure rules (short sentences typically translate much better than long, complicated sentences).
  • Governance: measure business benefits vis-a-vis strategic use of MT budgets.
  • Automation: technology resources work on a roadmap to automation which integrates all the previous stages: from terminology management to check that the source text follows the defined rules.


Level 5: Optimized


At level 5 you have a team of engineers, terminologists, internal reviewers and project managers running the content creation process for MT on a daily basis. You have rules in place for content editors writing source text, ad hoc terminology and an internal tool to check and select the right kind of source text for MT, extracting only the new parts to be translated. You are now looking at new ways to get the same quality output while trying to keep costs and content creation time to a minimum.


What to do to achieve continuous improvement beyond level 5?
  • Organization:
    • prepare training material for staff and build a career path in the MT office
    • plan to offer a global service 24/7/365.
  • Process
    • content creators work to minimize content sent to MT: less content = lower costs
    • customize writing rules based on target language to minimize linguistic disparity between source and target. You will already have writing rules for new content creation, but with your accrued experience, constant LSP feedback, and the help of internal reviewers with in-depth knowledge of the target languages, the rules can be customized further to optimize MT for each target market
    • allocate LSP resources strategically, based on the language combination that best fits your quality expectations.
  • Governance: based on your long-term business goals plan how to support everyone involved in the roadmap to continuous improvement.
  • Automation: connect your CMS with your LSP's Translation Management System (TMS) to:
    • speed up the process
    • send requests and import the MT content automatically
    • centralize review cycles in a familiar environment
    • ensure consistency across content with a shared repository of linguistic assets.
     

Conclusion

Even if localization is not central to your organization, it can and does have an impact on your business. To reap the maximum rewards from MT, working with a trusted LSP is key to strengthening your supply chain, improving ROI and protecting your brand and reputation.

The MTMM was designed for any size or type of organization to use to make the most of MT. It is an ideal path to maturity because it has built-in flexibility – each activity can be performed as a stand-alone step as well as in a sequential way.

We believe that MT is not just about getting the technology right. It’s also about having a strong relationship with your LSP; a partnership that is characterized by collaboration and constant feedback between the whole team. Only by analyzing your process and implementing some of the suggested tasks will you arrive at an MT roadmap which delivers against your expectations.

It is also possible to get a more complete version of this post directly from the RRD website at this link.



 ======*****=====



Valeria Cannavina holds a degree in language and culture mediation and a master's degree in technical and scientific translation from Libera Università degli Studi “San Pio V” in Rome. She has spent 10 years in the GILT industry as a Project Manager, and while working for companies like SAP and Xerox she was involved in quite a few research projects on new process implementation and Machine Translation. At present she works for Donnelley Language Solutions as a Project Manager.




Tuesday, September 12, 2017

LSP Perspective: Applying the Human Touch to MT, Qualitative Feedback in MT evaluation

In all the discussion on MT that we hear, we do not often hear much about the post-editors and what could be done to enhance and improve the often negatively viewed PEMT task. Lucía Guerrero provides useful insights from her direct experience in improving the work experience for post-editors. Interestingly, over the years I have noted that strategies to improve the post-editor experience can often make mediocre MT engines viable, and that failure to do so can make good engines fail in fulfilling the business promise. I cannot really say much beyond what Lucía says here, other than to restate what she is saying in slightly different words. The keys to success seem to be:
  1. Build Trust by establishing transparent and fair compensation and forthright work-related communication
  2. Develop ways to involve post-editors in the MT engine refinement and improvement process
  3. Demonstrate that the Feedback Cycle does in fact improve the work experience on an ongoing basis
============

Post-editing has become the most common practice when using MT. According to Common Sense Advisory (2016), more than 80% of LSPs offer Machine Translation Post-Editing (MTPE) services, and one of the main conclusions from a study presented by Memsource at the 2017 Conference of the European Association for Machine Translation (EAMT) states that less than 10% of the MT done in Memsource Cloud was left unedited. While it is true that a lot of user-generated content is machine-translated without post-editing (we see it every day at eBay, Amazon, Airbnb, to mention just a few), whether it is RBMT, SMT, or NMT, post-editors are still needed to improve the raw MT output.

Quantitative Evaluation Methods: Only Half the Picture


While these data show that post-editors are key, linguists are often excluded from the MT process and required to participate only in the post-editing task, with no interaction “in process.” Human evaluation is still seen as “expensive, time-consuming and prone to subjectivity.” Error annotation takes a lot of time compared to automated metrics such as BLEU or WER, which are certainly cheaper and faster. These tools provide quantitative data, usually obtained by automatically comparing the raw MT to a reference translation, but the post-editor’s evaluation is hardly ever taken into account. Shouldn’t that be important if the post-editor’s role is here to stay?
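
For context, this is essentially all an automated metric does: compare a hypothesis against a reference and emit a score. A minimal sketch using the sacrebleu package (assuming it is installed; the segments are invented examples):

```python
import sacrebleu  # pip install sacrebleu

hypotheses = ["The mayor says the commissioner is a fool."]
references = [["The mayor, says the commissioner, is a fool."]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}")  # a single number, with no insight into why
```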

While machines are better than we are at spotting differences, humans are better at assessing linguistic phenomena, categorizing them and giving detailed analysis.

Our approach at CPSL is to involve post-editors in three stages of the MT process:
  • For testing an MT engine in a new domain or language combination
  • For regular evaluation of an existing MT engine
  • For creating/updating post-editing guidelines
Some companies use the Likert scale for collecting human evaluation. This method involves asking people – normally the end-users, rather than linguists – to assess raw MT segments one by one, based on criteria such as adequacy (how effectively has the source text message been transferred to the translation?) or fluency (does the segment sound natural to a native speaker of the target language?).

For our evaluation purposes, we find it more useful to ask the post-editor to fill in a form with their feedback, correlating information such as source segment, raw MT and post-edited segment, type and severity of errors encountered, and personal comments.

Turning Bad Experiences Into Rewarding Jobs


One of the main issues I often have to face when I manage an MT-based project is the reluctance of some translators to work with machine-translated files due to bad previous post-editing experiences. I have heard many stories about post-editors being paid based on an edit distance that was calculated from a test that was not even close to reality, or post-editors never being asked for their evaluation of the raw MT output. They were only asked for the post-edited files and, sometimes, the time spent, but just for billing purposes. One of our usual translators even told me that he received machine-translated files that were worse than Google Translate's results (NMT had not yet been implemented). All these stories have in common the fact that post-editors are seldom involved in the system improvement and evaluation process. This can turn post-editing into an alienating job that nobody wants to do a second time.

To avoid such situations, we decided to create our own feedback form for assessing and categorizing error severity and prioritizing the errors. For example, errors such as incorrect capitalization of months and days in Spanish, word order problems in questions in English, punctuation issues in French, and other similar errors were given the highest priority by our post-editors and our MT provider was asked to fix them immediately. The complexity of the evaluation document can vary according to need. It can be as detailed as the Dynamic Quality Framework (DQF) template or be a simple list of the main errors with an example.

Post Editor Feedback Form

However, more than asking for severity and repetitiveness, what I really want to know is what I call ‘annoyance level,’ i.e. what made the post-editing job too boring, tedious or time-consuming – in short, a task that could lead the post-editor to decline a similar job in the future. These are variables that quantitative metrics cannot provide. Automated metrics cannot provide any insight on how to prioritize error fixing, either by error severity level or by ‘annoyance level.’ Important errors can go unnoticed in a long list of issues, and thus never be fixed.
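
Pulling these threads together, each row of such a feedback form can be a small structured record. A hedged sketch (the field names are my own invention, not CPSL's actual form):

```python
from dataclasses import dataclass

@dataclass
class PostEditorFeedback:
    """One row of a qualitative MT feedback form (illustrative fields)."""
    source: str
    raw_mt: str
    post_edited: str
    error_type: str   # e.g. "capitalization", "word order", "punctuation"
    severity: int     # 1 (minor) .. 5 (blocking)
    annoyance: int    # how tedious the fix felt, 1 .. 5
    comment: str = ""

row = PostEditorFeedback(
    source="Tu viens?",
    raw_mt="You coming?",
    post_edited="Are you coming?",
    error_type="grammar", severity=2, annoyance=1,
)
print(row.error_type, row.severity, row.annoyance)
```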

I have managed several MT-based projects where the edit distance was acceptable (< 30%) and the post-editors’ overall experience, to my surprise, was still unpleasant. In such cases, the post-editor came back to me saying that certain types of errors were so unacceptable for them that they didn’t want to post-edit again. Sometimes this opinion was related to severity and other times to perception, i.e. errors a human would never make. In these cases, the feedback form helped detect the errors and turned a previously bad experience into an acceptable job.

It is worth noting that one cannot rely on a single post-editor's feedback. The acceptance threshold can vary quite a lot from one person to another, and post-editing skills also differ. Thus, the most reasonable approach is to collect feedback from several post-editors, compare their comments, and use them as a complement to the automatic metrics.

We must definitely make an effort to include the post-editors’ comments as a variable when evaluating MT quality, to prioritize certain errors when optimizing the engines. If we have a team of translators whom we trust, then we should also trust them when they comment on the raw MT results. Personally, I always try my best to send machine-translated files that are in good shape so that the post-editing experience is acceptable. In this way, I can keep my preferred translators (recycled as post-editors) happy and on board, willing to accept more jobs in the future. This can make a significant difference not only in their experience but also in the quality of the final project.


5 Tips for Successfully Integrating Qualitative Feedback into your MT Evaluation Workflow

  1. Devise a tool and a workflow for collecting feedback from the post-editors.
It doesn’t have to be a sophisticated tool and the post-editors shouldn’t have to fill in huge Excel files with all changes and comments. It’s enough to collect the most awkward errors; those they wouldn’t want to fix over and over again. However, if you don’t have the time to read and process all this information, a short informal conversation on the phone from time to time can also be of help and give you valuable feedback about how the system is working.

  2. Agree on fair compensation
Much has been said about this. My advice would be to rely on the automatic metrics but to include the post-editor's feedback in your decision. Therefore, I usually offer hourly rates when language combinations are new and the effort is greater, and per-word rates when the MT systems are established and have stable edit distances. When using hourly rates, you can ask your team to use time-tracking apps in their CAT tools or ask them to report the real hours spent. To avoid last-minute surprises, for full PE it is advisable to indicate a maximum number of hours based on the expected PE speed and ask them to inform you of any deviation, whereas for light post-editing you may want to indicate a minimum number of hours to make sure the linguists are not leaving anything unchecked.
  3. Never promise the moon
If you are running a test, tell your team. Be honest about the expected quality and always explain the reason why you are using MT (cost, deadline…).
  4. Don’t force anyone to become a post-editor
I have seen very good translators becoming terrible post-editors; either they change too many things or too few, or simply cannot accept that they are reviewing a translation done by a machine. I have also seen bad translators become very good post-editors. Sometimes a quick chat on the phone can be enough to check if they are reluctant to use MT per se, or if the system really needs further improvement before the next round.
  5. Listen, listen, listen
We PMs tend to provide the translators with a lot of instructions and reference material and make heavy use of email. Sometimes, however, it’s worth it to arrange short calls and listen to the post-editors’ opinion of the raw MT. For long-term projects or stable MT-based language combinations, it is also advisable to arrange regular group calls with the post-editors, either by language or by domain.

And… What About NMT Evaluation?


According to several studies on NMT, it seems that the errors produced by these systems are harder to detect than those produced by RBMT and SMT, because they occur at the semantic level (i.e. meaning). NMT takes context into account and the resulting text flows naturally; we no longer see the syntactically awkward sentences we are used to with SMT. But the usual errors are mistranslations, and mistranslations can only be detected by post-editors, i.e. by people. In most NMT tests done so far, BLEU scores were low, while human evaluators considered that the raw MT output was acceptable, which means that with NMT we cannot trust BLEU alone. Both source and target text have to be read and assessed in order to decide if the raw MT is acceptable or not; human evaluators have to be involved. With NMT, human assessment is clearly even more important, so while the translation industry works on a valid approach for evaluating NMT, it seems that qualitative information will be required to properly assess the results of such systems.

--------------------


Lucía Guerrero is a senior Translation and Localization Project Manager at CPSL and has been in the translation industry since 1998. In the past, she managed localization projects for Apple Computer and translated children’s and art books. At CPSL she specializes in international and national institutions, machine translation, and post-editing.


About CPSL:  ------------------------------------------------------------------

CPSL (Celer Pawlowsky S.L.) is one of the longest-established language services providers in the translation and localization industry, having served clients for over 50 years in a range of industries including: life sciences, energy, machinery and tools, automotive and transport, software, telecommunications, financial, legal, electronics, education and government. The company offers a full suite of language services – translation and localization, interpreting and multimedia related services such as voice-over, transcription and subtitling.

CPSL is among a select number of language service suppliers that are triple quality certified, holding ISO 9001, ISO 17100 and ISO 13485 (for medical devices). Based in Barcelona (Spain), with production centers in Madrid, Ludwigsburg (Germany) and Boston (USA), and a sales office in the United Kingdom, the company offers integrated language solutions on both sides of the Atlantic, 24/7, 365 days a year.

CPSL has been the driving force behind the new ISO 18587 (2017) standard, which sets out requirements for the process of post-editing machine translation (MT) output. Livia Florensa, CEO of CPSL, is the architect of the recently published standard: as Project Leader, she was responsible for the proposal and for drafting and coordinating its development. ISO 18587 (2017) regulates the post-editing of content processed by machine translation systems and establishes the competences and qualifications that post-editors must have. The standard is intended for use by post-editors, translation service providers and their clients.

Find out more at: www.cpsl.com.




Tuesday, July 18, 2017

Linguistic Quality Assurance in Localization – An Overview

This is a post by Vassilis Korkas on the quality assurance and quality checking processes used in the professional translation industry. (I still find it hard to say localization, since that term is ambiguous to me: I spent many years trying to figure out how to deliver properly localized sound through digital audio platforms. To me, localized sound = cellos from the right and violins from the left of the sound stage, and I have a strong preference for instruments to stay in place on the sound stage for the duration of the piece.)

As the volumes of translated content increase, the need for automated production lines also grows. The industry is still laden with products that don't play well with each other, and buyers should insist that the vendors of the tools they use enable easy transport and downstream processing of any translation-related content. From my perspective, automation in the industry is also very limited, and there is a huge need for human project management because tools and processes don't connect well. Hopefully, we will start to see this change. I also hope that the database engines behind these new processes become much smarter about NLP and much readier to integrate machine learning elements, as this too will enable much more powerful, automated, and self-correcting tools.

As an aside, I thought this chart was very interesting (assuming it is actually based on real research): it shows why it is much more worthwhile to blog than to share content on LinkedIn, Facebook or Twitter. The quality of the content does matter, however, and other sources suggest that high-quality content has an even longer life than shown here.

Source: @com_unit_inside

Finally, CNBC had a short clip describing employment growth in the translation sector, in which they state: "The number of people employed in the translation and interpretation industry has doubled in the past seven years." Interestingly, this is exactly the period in which we have also seen the use of MT increase dramatically. Apparently, they conclude, technology has helped to drive this growth.

The emphasis in the post below is mine.


 ==========

In pretty much any industry these days, the notion of quality seems to crop up all the time. Sometimes it feels like it is used merely as a buzzword, but more often than not quality is a real concern, both for the seller of a product or service and for the consumer or customer. Quality is just as omnipresent in the language services industry. When it comes to translation and localization, the subject of quality admittedly has rather unique characteristics compared to other services; ultimately, however, it is the expected goal of any project.

In this article, we will review the established practices for monitoring and achieving linguistic quality in translation and localization, examine the challenges for linguistic quality assurance (LQA), and attempt some predictions about the future of LQA in the localization industry.

Quality assessment and quality assurance: same book, different pages


Although industry standards have been around for quite some time, in practice terms such as ‘quality assessment’ and ‘quality assurance’, and sometimes even ‘quality evaluation’, are often used interchangeably. This may be due to a misunderstanding of what each process involves, but whatever the reason, the practice leads to confusion and can create misleading expectations. So, let us take this opportunity to clarify:
  • [Translation] Quality Assessment (TQA) is the process of evaluating the overall quality of a completed translation using a model with pre-determined values assigned to a number of scoring parameters. Examples of such models are LISA, MQM and DQF.
  • Quality Assurance “[QA] refers to systems put in place to pre-empt and avoid errors or quality problems at any stage of a translation job”. (Drugan, 2013: 76)
Quality is an ambiguous concept in itself, and making ‘objective’ evaluations is very difficult. Even the most rigorous assessment model requires subjective input from the evaluator using it. When it comes to linguistic quality in particular, we would be looking to catch issues with punctuation, terminology and glossary compliance, locale-specific conversions and formatting, consistency, omissions, untranslatable items and others. It is a job that requires great attention to detail and strict adherence to rules and guidelines – and that is why LQA (most aspects of it, anyway) is a better candidate for ‘objective’ automation.

Given the volume of translated words in most localization projects these days, it is practically prohibitive in terms of time and cost to run a comprehensive QA process that would safeguard quality expectations both during and after translation. It is therefore very common for QA, much like TQA, to be reserved for the post-translation stage. A human reviewer, with or without the help of technology, is brought in when the translation is done and asked to review or revise the final product. The obvious drawback is that significant time and effort could be saved if revision could somehow happen in parallel with translation, perhaps by involving the translator herself in tracking errors and making corrections along the way.

The fact that QA only seems to take place ‘after the fact’ is not the only problem, however. Volume is another challenge: too many words to revise, too little time, and too expensive to do it all. To address this, Language Service Providers (LSPs) use sampling (the partial revision of an agreed small portion of the translation) and spot-checking (the partial revision of random excerpts of the translation). In both cases, the proportion of the translation that is checked is about 10% of the total volume, and that is generally considered sufficient to say whether the whole translation is good or not. This is an established and accepted industry practice born of necessity. However, one doesn’t need a degree in statistics to appreciate that such a small sample, whether defined or random, is hardly big enough to reflect the quality of the overall project.
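To make the spot-checking idea concrete, here is a minimal sketch of random sample selection, assuming segments are simple (source, target) pairs; the data and the 10% proportion mirror the rule of thumb described above:

```python
# Minimal sketch: selecting a random ~10% sample of segments for
# spot-checking. The segment data is purely illustrative.
import random

def spot_check_sample(segments, proportion=0.10, seed=None):
    """Return a random subset of segments for partial revision."""
    rng = random.Random(seed)
    k = max(1, round(len(segments) * proportion))
    return rng.sample(segments, k)

segments = [(f"source {i}", f"target {i}") for i in range(200)]
sample = spot_check_sample(segments, seed=42)
print(len(sample))  # 20 of 200 segments -- a thin basis for judging the
                    # whole project, which is exactly the point above.
```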

The progressive increase in the volume of text translated every year (also reflected in the growth of the total value of the language services industry, as seen below) and the increasing demand for faster turnaround times make it even harder for QA-focused technology to catch up. The need for automation is greater than ever.
Source: Common Sense Advisory (2017)


Today we could classify QA technologies into three broad groups:
  • built-in QA functionality in CAT tools (offline and online),
  • stand-alone QA tools (offline),
  • custom QA tools developed by LSPs and translation buyers (mainly offline).
Built-in QA checks in CAT tools range from the completely basic to the quite sophisticated, depending on which CAT tool you’re looking at. Stand-alone QA tools are mainly designed with error detection/correction capabilities in mind, but there are some that use translation quality metrics for assessment purposes – so they’re not quite QA tools as such. Custom tools are usually developed in order to address specific needs of a client or a vendor who happens to be using a proprietary translation management system or something similar. This obviously presupposes that the technical and human resources are available to develop such a tool, so this practice is rather rare and exclusive to large companies that can afford it.

 

Consistency is king – but is it enough?


Terminology and glossary/wordlist compliance, empty target segments, untranslated target segments, segment length, segment-level inconsistency, different or missing punctuation, different or missing tags/placeholders/symbols, different or missing numeric or alphanumeric structures – these are the most common checks that one can find in a QA tool. On the surface at least, this looks like a very diverse range that should cover the needs of most users. All these are effectively consistency checks. If a certain element is present in the source segment, then it should also exist in the target segment. It is easy to see why this kind of “pattern matching” can be easily automated and translators/reviewers certainly appreciate a tool that can do this for them a lot more quickly and accurately than they can.
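As a minimal sketch of how such “pattern matching” checks work, here are two of the consistency checks listed above, matching numbers and tags/placeholders between source and target; the regexes and segment data are illustrative, not how any particular tool implements them:

```python
# Minimal sketch: locale-independent consistency checks of the kind
# described above. Illustrative only.
import re

NUMBER = re.compile(r"\d+(?:[.,]\d+)?")
TAG = re.compile(r"<[^>]+>|\{\d+\}")  # XML-ish tags or {0}-style placeholders

def check_segment(source: str, target: str) -> list[str]:
    issues = []
    if sorted(NUMBER.findall(source)) != sorted(NUMBER.findall(target)):
        issues.append("number mismatch")
    if sorted(TAG.findall(source)) != sorted(TAG.findall(target)):
        issues.append("tag/placeholder mismatch")
    if not target.strip():
        issues.append("empty target")
    return issues

print(check_segment("Press <b>OK</b> within 30 seconds.",
                    "Drücken Sie <b>OK</b> innerhalb von 30 Sekunden."))  # []
print(check_segment("Save {0} files", "Speichern Sie die Dateien"))
# ['tag/placeholder mismatch']
```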

Despite the obvious benefits of these checks, the methodology they run on has significant drawbacks. Consistency checks are effectively locale-independent, and that creates false positives (the tool detects an error where there is none), also known as “noise”, and false negatives (the tool fails to detect a real error). Noise is one of the biggest shortcomings of currently available QA tools, precisely because their checks lack locale specificity. It is in fact rather ironic that the benchmark for QA in localization doesn’t involve locale-specific checks. To be fair, some tools let users configure checks in greater depth and define such focused checks themselves (either through existing options in the tools or with regular expressions).
 
Source: XKCD

But this makes the process more labour-intensive for the user, and it comes as no surprise that the majority of QA tool users never bother to do it. Instead, they perform their QA duties relying on the sub-optimal consistency checks that are available by default.
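To illustrate what such a user-defined, locale-specific check could look like, here is a minimal sketch of a decimal-separator rule for German targets, of the kind that could be configured with regular expressions; the rule and examples are illustrative assumptions:

```python
# Minimal sketch: a locale-specific check that plain consistency checking
# misses. German (de-DE) uses a comma as the decimal separator, so a
# period-decimal number copied verbatim into a German target is suspect.
# The rule and examples are illustrative, not from any particular QA tool.
import re

EN_DECIMAL = re.compile(r"\b\d+\.\d+\b")

def de_decimal_check(target: str) -> list[str]:
    """Flag English-style decimals (e.g. 3.5) in a German target."""
    return [f"possible wrong decimal separator: {m}"
            for m in EN_DECIMAL.findall(target)]

print(de_decimal_check("Die Datei ist 3.5 MB groß."))
# ['possible wrong decimal separator: 3.5']
print(de_decimal_check("Die Datei ist 3,5 MB groß."))  # []
```

Note that a pure consistency check would pass “3.5” here, because it matches the source; only a locale-aware rule catches it.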

 

Linguistic quality assurance is (not) a holistic approach


In practice, for the majority of large scale localization projects, only post-translation LQA takes place, mainly due to time pressure and associated costs – an issue we touched on earlier in connection with the practice of sampling. The larger implication of this reality is that:
  • a) effectively we should be talking about quality control rather than quality assurance, as everything takes place after the fact; and 
  • b) quality assurance becomes a second-class citizen in the world of localization. This contradicts everything we see and hear about the importance of quality in the industry, where both buyers and providers of language services prioritise quality as a prime directive.
As already discussed, the technology does not always help. CAT tools with integrated QA functionality have a lot of issues with noise, and that is unlikely to change anytime soon, because this kind of functionality is not a priority for a CAT tool. Stand-alone QA tools with more extensive functionality, on the other hand, work independently, which means that any ‘collaboration’ between stand-alone QA tools and CAT tools can only be achieved in a cumbersome, stop-and-go workflow: complete the translation, export it from the CAT tool, import the bilingual file into the QA tool, run the QA checks, analyse the QA report, go back to the CAT tool, find the segments with errors, make corrections, update the bilingual file, and so on.

The continuously growing demand in the localization industry to manage increasing volumes of multilingual content on pressing timelines, while complying with quality guidelines, means that the challenges described above will have to be addressed soon. As online technologies in translation and localization gain ground, there is an implicit understanding that existing workflows will have to be simplified to accommodate the industry’s future needs. This can be achieved by adopting bolder QA strategies and more extensive automation. The industry’s need for a more efficient and effective QA process is here now, and it is pressing. Is there a new workflow model that can produce tangible benefits in terms of both time and resources? I believe there is, but it will take some faith and boldness to apply it.

 

Get ahead of the curve


In the last few years, the translation technology market has been marked by substantial shifts in the market shares of offline and online CAT tools, with the online tools rapidly gaining ground. This trend is unlikely to change. At the same time, the age-old problems of connectivity and compatibility between different platforms will have to be addressed one way or another. For example, slowly transitioning to an online CAT tool while still using the same offline QA tool from your old workflow is as inefficient as it is irrational, especially in the long run.

A deeper integration between CAT and QA tools has other benefits as well. The QA process can move up a step in the translation workflow: why have QA only post-translation when you can also have it in-translation? (It goes without saying that pre-translation QA is also vital, but it applies to the source content only, so it is a different topic altogether.) This shift is possible with API-enabled applications, which are in fact already standard practice for the majority of online CAT tools. There was a time when each CAT tool had its own proprietary file formats (as they still do); then the TMX and TBX standards were introduced and the industry changed forever, as it became possible for different CAT tools to “communicate” with each other. The same will happen again, only this time APIs will be the agent of change.

Source: API Academy
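As a sketch of what such in-translation, API-driven QA might look like, here is a minimal example of a CAT tool pushing each confirmed segment to a QA service; the endpoint, payload and response shape are entirely hypothetical and describe no real product:

```python
# Minimal sketch of in-translation QA via an API. The QA service URL,
# endpoint and response format below are hypothetical -- no real product's
# API is being described. Requires the 'requests' package.
import requests

QA_SERVICE = "https://qa.example.com/v1/check"  # hypothetical endpoint

def check_on_confirm(segment_id: str, source: str, target: str,
                     locale: str) -> list[dict]:
    """Send a just-confirmed segment to the QA service, return any issues."""
    resp = requests.post(QA_SERVICE, json={
        "segment_id": segment_id,
        "source": source,
        "target": target,
        "target_locale": locale,
    }, timeout=5)
    resp.raise_for_status()
    return resp.json().get("issues", [])  # hypothetical response shape

# In this scenario the CAT tool would call check_on_confirm() each time a
# translator confirms a segment, surfacing issues immediately instead of
# in a post-translation report.
```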
Looking further ahead, there are also some other exciting ideas that could bring truly innovative changes to the quality assurance process. The first is automated correction. Much as a text can be pre-translated in a CAT tool when a translation memory or a machine translation system is available, a QA tool pre-configured with granular settings could “pre-correct” certain errors in the translation before a human reviewer even starts working on the text. In a deeper CAT-tool integration scenario, an error could be corrected in a live QA environment the moment the translator makes it.
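A minimal sketch of the pre-correction idea follows, using two deliberately simple, deterministic rules; the rules and example are illustrative assumptions, not a description of any existing tool:

```python
# Minimal sketch: rule-based "pre-correction" of unambiguous errors before
# human review. The two rules below (collapse repeated spaces, remove a
# space before sentence-final punctuation) are illustrative; a real tool
# would apply locale-aware, user-configured rules.
import re

PRE_CORRECTIONS = [
    (re.compile(r"  +"), " "),             # collapse runs of spaces
    (re.compile(r" +([.,;:!?])"), r"\1"),  # no space before punctuation (en)
]

def pre_correct(target: str) -> str:
    for pattern, replacement in PRE_CORRECTIONS:
        target = pattern.sub(replacement, target)
    return target

print(pre_correct("The file  was saved ."))
# 'The file was saved.'
```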

This kind of advanced automation in LQA could be taken a step further by applying machine learning principles. Access to big data, in the form of bilingual corpora that have been checked and confirmed by human reviewers, makes this approach even more feasible. Imagine a QA tool that collects all the corrections a reviewer has made and all the false positives the reviewer has ignored, then processes that information and learns from it. With every new text processed, the machine learning algorithms make the tool more accurate about what it should and should not consider an error. The possibilities are endless.
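Even without an actual machine learning library, the core of this feedback loop can be sketched: track how often reviewers dismiss each check’s flags and demote the chronically noisy ones. Everything below (check names, thresholds) is an illustrative assumption:

```python
# Minimal sketch of learning from reviewer feedback: if reviewers dismiss
# a check's flags most of the time, treat that check as noisy and demote
# it. Thresholds and check names are illustrative assumptions.
from collections import defaultdict

class FeedbackTracker:
    def __init__(self, noise_threshold: float = 0.8, min_events: int = 20):
        self.events = defaultdict(lambda: {"accepted": 0, "dismissed": 0})
        self.noise_threshold = noise_threshold
        self.min_events = min_events

    def record(self, check_name: str, dismissed: bool) -> None:
        key = "dismissed" if dismissed else "accepted"
        self.events[check_name][key] += 1

    def is_noisy(self, check_name: str) -> bool:
        e = self.events[check_name]
        total = e["accepted"] + e["dismissed"]
        return (total >= self.min_events and
                e["dismissed"] / total >= self.noise_threshold)

tracker = FeedbackTracker()
for _ in range(19):
    tracker.record("segment_length", dismissed=True)
tracker.record("segment_length", dismissed=False)
print(tracker.is_noisy("segment_length"))  # True: 95% of flags dismissed
```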

Despite the various shortcomings of current practices in LQA, the potential is there to streamline and improve processes and workflows alike, so much so that quality assurance will not be seen as a “burden” anymore, but rather as an inextricable component of localization, both in theory and in practice. It is up to us to embrace the change and move forward.

Reference
Drugan, J. (2013) Quality in Professional Translation: Assessment and Improvement. London: Bloomsbury.


---------------
 




Vassilis Korkas is the COO and a co-founder of lexiQA. Following a 15-year academic career in the UK, in 2015 he decided to channel his expertise in translation technologies, technical translation and reviewing into a new tech company. At lexiQA he is involved in content development, product management and business operations.

Note
This is the abridged version of a four-part article series published by the author on lexiQA’s blog: Part 1, Part 2, Part 3, Part 4.

This link will also provide specific details on the lexiQA product capabilities.