Tuesday, October 8, 2019

Post-editese is real

Ever since machine translation was introduced into the professional translation industry, there have been questions about what the impact would be on a final delivered translation service product. For much of the history of MT many translators claimed that while translation production work using a post-edited MT (PEMT) process was faster, the final product was not as good. The research suggests that this has been true from a strictly linguistic perspective, but many of us also know that PEMT worked quite successfully with technical content especially with terminology and consistency even in the days of SMT and RBMT.

As NMT systems proliferate, we are at a turning point, and I suspect that we will see many more NMT systems whose output is seen as genuinely useful and clearly enhances translator productivity, especially when the systems are built by experts. NMT will also quite likely influence output quality, and the difference between post-edited and unaided human translation is likely to become less prominent. This is what developers mean when they claim to have achieved human parity. If competent human translators cannot tell whether the segments they review came from MT, we can make a limited claim of having achieved human parity. This does not mean that this will be true for every new sentence submitted to the system.

We should also understand that MT provides the greatest value in use scenarios where you have large volumes of content (millions rather than thousands of words), short turnaround times, and limited budgets. Increasingly MT is used in scenarios where little or no post-editing is done, and by many informed estimates, we are already at a run rate of a trillion words a day going through MT engines. While post-editese may be an important consideration in localization use scenarios, this is likely no more than 2% of all MT usage.

Enterprise MT use is rapidly moving into a phase where it is an enterprise-level IT resource. The modern global enterprise needs to allow millions of words to be translated on demand in a secure and private way, and MT needs to be integrated deeply into critical communication, collaboration, and content creation and management software.

The research presented by Antonio Toral below documents the impact of post-editing on the final output across multiple different language combinations and MT systems.

==============

This is a summary of the paper “Post-editese: an Exacerbated Translationese” by Antonio Toral, which was presented at MT Summit 2019, where it won the best paper award.

Introduction

Post-editing (PE) is widely used in the translation industry, mainly because it leads to higher productivity than unaided human translation (HT). But, what about the resulting translation? Are PE translations as good as HT? Several research studies have looked at this in the past decade and there seems to be consensus: PE is as good as HT or even better (Koponen, 2016).

Most of these studies measure the quality of translations by counting the number of errors therein. Taking into account that there is more to quality than just the number of mistakes, we ask ourselves the following question instead: are there differences between translations produced with PE vs HT? In other words, does the final output created via PEs and HTs have different traits?

Previous studies have unveiled the existence of translationese, i.e. the fact that HTs and original texts exhibit different characteristics. These characteristics can be grouped along with the so-called translation universals (Baker, 1993) and fundamental laws of translation (Toury, 2012), namely simplification, normalization, explicitation and interference. Along this line of thinking, we aim to unveil the existence of post-editese (i.e. the fact that PEs and HTs exhibit different characteristics) by confronting PEs and HTs using a set of computational analyses that align to the aforementioned translation universals and laws of translation.

Data

We use three datasets in our experiments: Taraxü (Avramidis et al., 2014), IWSLT (Cettolo et al., 2015, 2016) and Microsoft “Human Parity” (Hassan et al., 2018). These datasets cover five different translation directions and allow us to assess the effect of machine translation (MT) systems from 2011, 2015-16 and 2018 on the resulting PEs.

Analyses

Lexical Variety

We assess the lexical variety of a translation (HT, PE or MT) by calculating its type-token ratio (TTR):

TTR = number of types / number of tokens

In other words, given two equally long translations (same number of words), the one with the bigger vocabulary (higher number of unique words) has a higher TTR and is therefore considered lexically richer, or higher in lexical variety.
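As a rough illustration, the type-token ratio can be sketched in a few lines of Python. The whitespace tokenisation, the lowercasing, and the two example sentences are assumptions made for this sketch only and are not taken from the paper.

```python
def type_token_ratio(tokens):
    """Return the number of unique tokens (types) divided by the total token count."""
    if not tokens:
        raise ValueError("cannot compute TTR of an empty text")
    return len(set(tokens)) / len(tokens)


# Two hypothetical six-word translations of the same source sentence.
varied = "the minister praised the new accord".lower().split()
repetitive = "the minister said the minister said".lower().split()

ttr_varied = type_token_ratio(varied)          # 5 types over 6 tokens
ttr_repetitive = type_token_ratio(repetitive)  # 3 types over 6 tokens
```

Because TTR tends to drop as a text gets longer, a comparison like this is only meaningful for texts of similar length, which is why the definition above speaks of two equally long translations.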

The following figure shows the results for the Microsoft dataset for the direction Chinese-to-English (zh–en, the results for the other datasets follow similar trends and can be found in the paper). HT has the highest lexical variety, followed by PE, while the lowest value is obtained by the MT systems. A possible interpretation is as follows: (i) lexical variety is low in MT because these systems prefer the translation solutions that are frequent in the training data used to train such systems and (ii) a post-editor will add lexical variety to some degree (difference in the figure between MT and PE), but because MT primes him/her (Green et al., 2013), the resulting PE translation will not achieve the lexical variety of HT.

Lexical Density

The lexical density of a text indicates its amount of information and is calculated as follows:

lexical density = number of content words / number of total words

where content words correspond to adverbs, adjectives, nouns, and verbs. Hence, given two equally long translations, the one with the higher number of content words would be considered to have higher lexical density, in other words, to contain more information.
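The calculation can be sketched in Python as follows, assuming the text has already been part-of-speech tagged. The tag names follow the Universal POS tag set, and treating exactly ADJ, ADV, NOUN and VERB as content words is an assumption of this sketch; the tagger and tag set used in the paper are not described in this summary.

```python
# Assumed tag names (Universal POS tags) for the four content-word classes.
CONTENT_TAGS = {"ADJ", "ADV", "NOUN", "VERB"}


def lexical_density(tagged_tokens):
    """Return the share of content words in a list of (word, tag) pairs."""
    if not tagged_tokens:
        raise ValueError("cannot compute lexical density of an empty text")
    content_words = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    return content_words / len(tagged_tokens)


# Hypothetical, hand-tagged example sentence.
tagged = [
    ("the", "DET"), ("old", "ADJ"), ("treaty", "NOUN"),
    ("was", "AUX"), ("quickly", "ADV"), ("signed", "VERB"),
]
density = lexical_density(tagged)  # 4 content words out of 6 tokens
```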

The following figure shows the results for the three translation directions in the Taraxü dataset: English-to-German, German-to-English and Spanish-to-German. The lexical density in HT is higher than in both PE and MT and there is no systematic difference between the latter two.

Length Ratio

Given a source text (ST) and a target text (TT), where TT is a translation of ST (HT, PE or MT), we compute a measure of how different in length the TT is with respect to the ST:

length ratio = |length(ST) - length(TT)| / length(ST)

This means that the bigger the difference in length between the ST and the TT (be it because TT is shorter or longer than the ST), the higher the length ratio.
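A minimal Python sketch of such a measure is given below. It assumes that length is counted in characters and that the absolute difference is normalised by the source length; the example sentences are invented for illustration.

```python
def length_ratio(source, target):
    """Absolute length difference between source and target, relative to the source."""
    if not source:
        raise ValueError("source text must not be empty")
    return abs(len(source) - len(target)) / len(source)


source = "Guten Morgen"                      # 12 characters
shorter = "Morning"                          # 7 characters
longer = "A very good morning"               # 19 characters

ratio_shorter = length_ratio(source, shorter)  # |12 - 7| / 12
ratio_longer = length_ratio(source, longer)    # |12 - 19| / 12
```

Taking the absolute value is what makes a much shorter and a much longer translation both count as diverging from the source length.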

The following figure shows the results for the Taraxü dataset. The trend is similar to the one in lexical variety; that is, HT obtains the highest result, MT the lowest and PE lies somewhere in between. We interpret this as follows: (i) MT results in a translation of similar length to that of the ST due to how the underlying MT technology works and PE is primed by the MT output while (ii) a translator working from scratch may translate more freely in terms of length.

Part-of-speech Sequences

Finally, we assess the interference of the source language on a translation (HT, PE and MT) by measuring how close the sequence of part-of-speech tags in the translation is to the typical part-of-speech sequences of the source language and to the typical part-of-speech sequences of the target language. If the sequences of a translation are similar to the typical sequences of the source language, that would indicate that there is interference from the source language in the translation.

The following figure shows the results for the IWSLT dataset. The metric used is perplexity difference; the higher it is the lower the interference (full details on the metric can be found in the paper). Again, we find a similar trend as in some of the previous analyses: HT gets the highest results, MT the lowest and PE somewhere in between. The interpretation is again similar: MT outputs exhibit a large amount of interference from the source language, a post-editor gets rid of some of that interference but the resulting translation still has more interference than an unaided translation.
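To make the idea of a perplexity difference more concrete, here is a toy Python sketch. It is not the paper's implementation: the add-one smoothed bigram model, the tiny tag set, the invented training sequences, and the direction of the subtraction (source-model perplexity minus target-model perplexity, chosen so that higher values mean less interference) are all assumptions of this sketch.

```python
import math
from collections import Counter


class BigramTagModel:
    """Add-one smoothed bigram language model over PoS tag sequences."""

    def __init__(self, tag_sequences, tagset):
        self.vocab = set(tagset) | {"</s>"}
        self.bigrams = Counter()
        self.contexts = Counter()
        for seq in tag_sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def prob(self, prev, cur):
        return (self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + len(self.vocab))

    def perplexity(self, tag_sequences):
        log_sum, n = 0.0, 0
        for seq in tag_sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                log_sum += math.log(self.prob(prev, cur))
                n += 1
        return math.exp(-log_sum / n)


def perplexity_difference(translation, source_lm, target_lm):
    """Higher values mean the translation looks less like the source language."""
    return source_lm.perplexity(translation) - target_lm.perplexity(translation)


TAGSET = ["DET", "NOUN", "VERB", "PRON"]
# Invented "typical" sequences: verb-final source language, verb-medial target language.
source_lm = BigramTagModel([["DET", "NOUN", "DET", "NOUN", "VERB"],
                            ["PRON", "DET", "NOUN", "VERB"]], TAGSET)
target_lm = BigramTagModel([["DET", "NOUN", "VERB", "DET", "NOUN"],
                            ["PRON", "VERB", "DET", "NOUN"]], TAGSET)

target_like = [["DET", "NOUN", "VERB", "DET", "NOUN"]]   # follows target word order
source_like = [["DET", "NOUN", "DET", "NOUN", "VERB"]]   # keeps source word order

diff_target_like = perplexity_difference(target_like, source_lm, target_lm)
diff_source_like = perplexity_difference(source_like, source_lm, target_lm)
```

In this toy setup the translation that keeps the source word order gets the lower score, which mirrors the reading of the figure: lower values point to more interference.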

Findings

The findings from our analyses can be summarised as follows in terms of HT vs PE:

PEs have lower lexical variety and lower lexical density than HTs. We link these to the simplification principle of translationese. Thus, these results indicate that post-editese is lexically simpler than translationese.
Sentence length in PEs is more similar to the sentence length of the source texts than sentence length in HTs. We link this finding to interference and normalization: (i) PEs have interference from the source text in terms of length, which leads to translations that follow the typical sentence length of the source language; (ii) this results in a target text whose length tends to become normalized.

Part-of-speech (PoS) sequences in PEs are more similar to the typical PoS sequences of the source language than PoS sequences in HTs. We link this to the interference principle: the sequences of grammatical units in PEs preserve to some extent the sequences that are typical of the source language.

In terms of the role of MT: we have considered not only HTs and PEs but also MT outputs, from the MT systems that were the starting point to produce the PEs. We did this to corroborate a claim in the literature (Green et al., 2013), namely that in PE the translator is primed by the MT output. We expected then to find similar trends to those found in PEs also in MT outputs and this was indeed the case in all four analyses. In some experiments, the results of PE were somewhere in between those of HT and MT. Our interpretation is that a post-editor improves the initial MT output, but due to being primed by the MT output, the result cannot attain the level of HT, and the footprint of the MT system remains in the resulting PE.

Discussion

As said in the introduction, we know that PE is faster than HT. The question I wanted to address was then: can PE not only be faster but also be at the level of HT quality-wise? In this study, this is looked at from the point of view of translation universals and the answer is clear: no. However, I'd like to point out three additional elements:

The text types in the 3 datasets that I have used are news and subtitles, both are open-domain and could be considered to a certain extent "creative". I wonder what happens with technical texts, given their relevance for industry, and I plan to look at that in the future.
As mentioned in the introduction, previous studies have compared HT vs PE in terms of the number of errors in the resulting translation. In all the studies I've encountered, PE is at the level of HT or even better. Thus, for technical texts where terminology and consistency are important, PE is probably better than HT. I thus find the choice between PE and HT to be a trade-off between consistency on one hand and translation universals (simplification, normalization and interference) on the other.
PE falls behind HT in terms of translation universals because MT falls behind HT in those terms. However, this may not be the case anymore in the future. For example, the paper shows that PE-NMT has less interference than PE-SMT, thanks to the better reordering in the former.

Antonio Toral is an Assistant Professor at the Computational Linguistics group, Center for Language and Cognition, Faculty of Arts, University of Groningen (The Netherlands). His research is in the area of Machine Translation. His main topics include resource acquisition, domain adaptation, diagnostic evaluation and hybrid approaches.

Related Work

Other work has previously looked at HT vs PE beyond the number of errors. The most related papers to this paper are Bangalore et al. (2015), Carl and Schaeffer (2017), Czulo and Nitzke (2016), Daems et al. (2017) and Farrell (2018).

Bibliography

Avramidis, Eleftherios, Aljoscha Burchardt, Sabine Hunsicker, Maja Popovic, Cindy Tscherwinka, David Vilar, and Hans Uszkoreit. 2014. The taraxü corpus of human-annotated machine translations. In LREC, pages 2679–2682.

Baker, Mona. 1993. Corpus linguistics and translation studies: Implications and applications. Text and technology: In honor of John Sinclair, pages 233–250.

Bangalore, Srinivas, Bergljot Behrens, Michael Carl, Maheshwar Gankhot, Arndt Heilmann, Jean Nitzke, Moritz Schaeffer, and Annegret Sturm. 2015. The role of syntactic variation in translation and post-editing. Translation Spaces, 4(1):119–144.

Carl, Michael and Moritz Jonas Schaeffer. 2017. Why translation is difficult: A corpus-based study of non-literality in post-editing and from-scratch translation. Hermes, 56:43–57.

Cettolo, Mauro, Jan Niehues, Sebastian Stüker, Luisa Bentivogli, Roldano Cattoni, and Marcello Federico. 2015. The IWSLT 2015 evaluation campaign. In IWSLT 2015, International Workshop on Spoken Language Translation.

Green, Spence, Jeffrey Heer, and Christopher D Manning. 2013. The efficacy of human post-editing for language translation. CHI 2013, pages 439–448.

Hassan, Hany, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, Will Lewis, Mu Li, Shujie Liu, Tie-Yan Liu, Renqian Luo, Arul Menezes, Tao Qin, Frank Seide, Xu Tan, Fei Tian, Lijun Wu, Zhuangzi Wu, Yingce Xia, Dongdong Zhang, Zhirui Zhang, and Ming Zhou. 2018. Achieving Human Parity on Automatic Chinese to English News Translation. https://arxiv.org/abs/1803.05567.

Koponen, Maarit. 2016. Is machine translation post-editing worth the effort? A survey of research into post-editing and effort. Journal of Specialised Translation, 25(25):131–148.

Cettolo, Mauro, Jan Niehues, Sebastian Stüker, Luisa Bentivogli, Roldano Cattoni, and Marcello Federico. 2016. The IWSLT 2016 evaluation campaign. In International Workshop on Spoken Language Translation.

Toury, Gideon. 2012. Descriptive translation studies and beyond: Revised edition, volume 100. John Benjamins Publishing.

Thursday, August 16, 2018

Post-Editing and NMT: Embracing a New Age of Translation Technology

This is a guest post by Rodrigo Fuentes Corradi and Andrea Stevens from the SDL MT project management team.

SDL is unique amongst LSPs as they have both deep MT development expertise, and also have a large pool of in-house professional translators who can communicate much more easily with the MT development and management team on engine specifics, and provide the kind of feedback that leads to best practices in MTPE projects and delivers overall superior quality. MT in many localization contexts only works if it indeed delivers output that is useful to translators rather than frustrates them. This quality is best achieved by an active and productive (constructive) dialogue between translators, project managers, and developers, which is the modus operandi at SDL.

While most MT developers have a strong preference for using automated quality metrics like BLEU and LEPOR, the SDL MT project management teams discovered many years ago that competent human assessments are a much more reliable way to determine whether an MT system is viable for an MTPE project. They have developed very reliable methods to make this determination efficiently and effectively. NMT presents special challenges in MTPE use scenarios, as the automated metrics are generally even less reliable than they have been with SMT. NMT fluency can sometimes veil accuracy problems, and thus special training and modified strategies are needed, as this post describes. Also, we often see that the quality of the MT output can be significantly better than a BLEU score might suggest, as this metric is often lower for NMT systems for various reasons.

We are beginning to see other research that suggests that NMT does often provide productivity benefits over SMT, but that it requires special updated training and at least some understanding of the new challenges presented to established production processes by the compelling but sometimes inaccurate fluency of NMT systems.

=====

Post-editing means different things to different people. To the corporate world, it means making localization budgets go further by translating more for less. To freelance translators, it may well mean an infringement of their craft and livelihood. We know that freelance careers are the result of several years of study, hard work and perseverance to create a client portfolio based on a reputation for quality, consistency and on-time delivery.

Language Service Providers like SDL are often stuck in the middle, working to satisfy the complex demands of our clients while nurturing the linguistic and creative talent in our supply chain. The truth is that in today’s localization market, Machine Translation Post-Editing (MTPE) is very much a reality and an answer to the unstoppable content explosion that we are experiencing.

How do we find a balance and how can we, as an LSP committed to machine translation (MT) and post-editing, do the impossible and manage both client expectations and supply chain needs?
At the heart of the conundrum lies content. What content needs to be developed and translated, what is its purpose, who does it serve, and how can our clients bring it to their customers in the most appropriate and cost-efficient way possible? Content and how well it communicates to a local audience is what defines a company in today’s fast-moving markets. As a business changes and evolves, so does content, creating the need for almost constant innovation and reinvention. This undoubtedly poses a challenge for translators who need to commit to life-long learning to understand new concepts, trends, and challenges and produce materials in the target language that are fully adapted to local markets and audiences.

Content keeps growing exponentially and our challenge is to understand the vast amounts of diverse content that our customers create and how to best deal with it. All content has value and answers a specific requirement, but for example, there is a difference between content created for a technical knowledge base, content for an advertising feature or even regulatory content. For content that has a short shelf-life, a straight MT solution without human intervention may be perfectly acceptable whereas, at the other end of the spectrum, more creative content may require specialized translators or transcreation. In between, there is a wide range of content that could be best served with a hybrid human and machine approach.

As a translation work-giver and machine translation provider, it’s up to us at SDL to navigate the challenges posed by the ongoing content explosion in partnership with our clients and our supply chain. We need to be realistic about the role that MTPE plays in today’s translation marketplace and acknowledge the advantages for all involved. At the same time, challenges and constraints cannot be swept under the rug, but need to be openly addressed and discussed for full transparency.

One of the challenges is to make post-editing sustainable. Post-editing is established as a standard solution for a wide variety of content types and language pairs. The application of MT is constantly pushed further by the commercial need to respond to the overall content growth while dealing with limited or unchanged localization budgets. As a result, the MTPE footprint continues to expand beyond initially successful languages and domains into new territories, such as regulatory life sciences content or even marketing.

To do this successfully, we rely on technology advances and improvements, some of which have only become possible over the last one to two years. Customized solutions and real-time adaptive machine translation are among the tools that improve the post-editing experience for translators, but the biggest step forward is surely the arrival of Neural Machine Translation.

Neural Machine Translation

Neural Machine Translation (NMT) has rightly been described as a revolution rather than an evolution. With its core developed entirely from scratch, NMT offers amazing opportunities for innovation. Its powerful architecture paradigm does not only capture text or syntax information but actual meaning and semantics, leading to the improvements in translation quality that we are seeing across the board. This is something all MT providers, including SDL, agree on after extensive testing across language pairs and combinations.

How is NMT achieving this? In short, NMT uses artificial neural networks which are based on mimicking the human brain with its interconnected neurons that help us understand the world around us, what we see, touch, smell, taste or hear. An NMT system learns from observing correlations between the source and the target text and modifies itself to increase the likelihood of producing a correct translation. While NMT is inspired by the way the human brain works, it does not learn languages quite like humans do – humans learn to speak to communicate with each other in a wide social context. NMT systems are still trained on bilingual data sets but promise noticeable uplift in translation quality through a more efficient framework for learning translation rules. NMT systems use an input layer where the text for translation is fed into the system, a hidden layer or multiple hidden layers where processing takes place and an output layer where we obtain the translation.

The hidden layer contains a vast network of neural nodes where the input is encoded into a vector of numbers to give a predictive output. Essentially, we are applying math to the problem of language and translation. When meaning and semantics are represented through math, words with similar meanings tend to cluster together. This is how we know that the NMT system starts to learn the semantics of words. When words have several meanings, they appear in different clusters; for example, ‘bank’ can appear in the geography or the finance cluster. It is then even possible to apply further math to the vectors, as shown in the graphic below: if you take the vector ‘king,’ subtract the vector ‘man’ and add the vector ‘woman,’ the result is a vector that is exactly or very close to the vector ‘queen.’
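The vector arithmetic can be tried out with a toy Python example. The three-dimensional vectors below are hand-made for illustration (roughly: royalty, maleness, femaleness) and are an assumption of this sketch; real NMT systems learn vectors with hundreds of dimensions from data, and the result is usually only close to 'queen' rather than exact.

```python
import math

# Hand-crafted toy vectors: [royalty, maleness, femaleness]. Purely illustrative.
TOY_VECTORS = {
    "king":     [1.0, 1.0, 0.0],
    "queen":    [1.0, 0.0, 1.0],
    "man":      [0.0, 1.0, 0.0],
    "woman":    [0.0, 0.0, 1.0],
    "princess": [0.8, 0.0, 1.0],
    "apple":    [0.0, 0.1, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def analogy(vectors, a, b, c):
    """Return the word whose vector is closest to vectors[a] - vectors[b] + vectors[c]."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


result = analogy(TOY_VECTORS, "king", "man", "woman")
```

Excluding the three input words from the candidates is a common convention in this kind of lookup, since the input vectors themselves often sit very close to the result.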

For NMT, we use deep neural networks, which are better for long-range context and dependencies. This is particularly important when it comes to languages for which the benefits of statistical machine translation were limited. Good examples are language pairs that require long-distance word reordering, such as those involving Japanese, where the clause structure is very different from English, or languages with long-distance dependencies such as German and Dutch.

One of the main advantages of NMT is the very fluent translation output. However, it is important to understand that the very fluent output can sometimes mask the fact that the content of the automated translation is not correct.

This is just one of the reasons why post-editing is still so important, even when working with NMT output. It is essential that translators understand the behaviors and patterns of NMT to be able to take full advantage of this promising technology.

To prepare translators for working with NMT, training and active engagement with the supply chain is essential. This is part of a much larger training effort that includes our soon to be updated Post-Editing Certification Program. Training will not only help our vendors to prepare for working with new technologies but also ensure that everyone is ready for the technological expansion of MTPE.

Working Together

Being a good work giver is key to jointly facing new challenges along with our supply chain. Increasingly, this means guiding projects with an expert understanding of tools, domains, processes and intended audience. The combination of these factors will bring sustained success and quality continuity.

New technologies such as NMT can prove disruptive; transparency and training will be key to reassure and prepare our vendors. One key challenge will be to align the improved NMT technology with post-editing experience and a strategy for on-boarding our supply chain.

The intent is to utilize NMT technology across a wide range of domains and content types, and it is important to collect valid data points to proactively assess the chances of success before reaching out to freelancers. Furthermore, when approaching freelancers, it has always been helpful to share these findings and provide guidelines for MT behaviors that can help their post-editing decisions.

In summary, so much of the success is centered on good communication, which is dependent on openness, sharing materials and providing channels to discuss issues and answer questions. In this respect, we see a responsibility to drive these initiatives with a structured communication plan that includes webinars and open days held in our language offices.

We are witnessing the ongoing growth of Artificial Intelligence and Machine Learning in many aspects of our daily lives, from self-driving cars to medical diagnosis and intervention. Technology is a huge part of how we live and what we do, and this particularly holds true for translators. Post-editing works at the intersection of humans and machines, and machine translation is one of the most advanced tools in the translators’ toolbox to future-proof the profession for a new generation of translators. MTPE is, of course, a choice that everyone needs to make for themselves, but with new technologies such as Adaptive or Neural MT working for translators and the growing reach of MT into new domains, this is not a choice to be taken lightly. Technology developments are not reducing the role of translators, but rather, are changing and enhancing it, opening a host of new opportunities. This is a journey we need to embark on together for continued learning, support, and feedback.


Rodrigo Fuentes Corradi

MT Business Consultant, SDL


Andrea Stevens

MT Translation Manager, SDL

The authors have produced a white paper on "Best Practices for Enterprise Scale Post-Editing" that can be accessed at this link.

eMpTy Pages
