Tuesday, March 15, 2011

The Future of Translation Memory (TM)

There have been several voices talking about the demise of TM recently, most notably Renato Beninatto, who has made it a theme of several of his talks at industry conferences in true agent provocateur spirit. More recently, Jaap van der Meer apparently said the same thing (dead in 5 years, no less) at the final LISA standards summit event. (My attempt to link to the Twitter trail failed, since @LISA_org is no more.) This resulted in comments by Peter Reynolds and some commentary by Jost Zetzsche (published in the Translation Journal) questioning these death announcements and providing a different perspective.

Since there have been several references to the value of TM to statistical MT (which, by the way, is pretty much always hybrid nowadays, as these systems try to incorporate linguistic ideas in addition to just data), I thought I would jump in and share my two cents as well.
So what is translation memory technology? At its most basic level it is a text matching technology whose primary objective is to save the professional translator from having to re-translate the same material over and over again. The basic technology has evolved from segment matching to sub-segment matching, or something called corpus-based TM (is there a difference?). In its current form it is still a pretty basic database technology applied to looking up strings of words. Many of the products in the market focus heavily on format preservation and on this horrible (and, I think, somewhat arbitrary) quantification concept called fuzzy matching, which unfortunately has become the basis for determining translation payment. This match-rate-based payment scheme is, I think, at the heart of the marginalization of professional translation work, but I digress.
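To make the "text matching" idea concrete, here is a minimal sketch of segment lookup with a fuzzy score, using Python's difflib similarity as a stand-in for the (proprietary and varied) scoring the commercial tools actually use. The segments, the threshold and the helper name are all illustrative assumptions, not anything from a real TM product:

```python
# A toy illustration of TM segment lookup with "fuzzy matching".
# Real TM tools use more elaborate (and proprietary) similarity measures.
from difflib import SequenceMatcher

# A tiny English->German translation memory (illustrative data).
tm = {
    "The printer is out of paper.": "Der Drucker hat kein Papier mehr.",
    "Press the power button to restart.": "Drücken Sie den Netzschalter, um neu zu starten.",
}

def best_match(segment, memory, threshold=0.7):
    """Return (source, target, score) for the closest stored segment,
    or None if nothing scores above the fuzzy-match threshold."""
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[2]:
            best = (source, target, score)
    if best and best[2] >= threshold:
        return best
    return None

# A new segment that is close, but not identical, to a stored one:
hit = best_match("The printer is out of ink.", tm)
```

In a real tool, the score would be reported as a "fuzzy match percentage" (e.g. 85%), and it is exactly this number that the match-rate payment schemes are built on.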

It makes great sense to me that any translator working on a translation project should be able to easily refer to their own previous work, and possibly even all other translation work in the domain of interest, to expedite their work. There are some, if not many, translators who are ambivalent about TM technology, e.g. this link and this one. My sense is that the quality of the “text matching technology” in the current products is still very primitive, but the basic concept could be poised for a significant leap forward, becoming more flexible, accurate and linguistically informed, driven by other parts of the text-oriented tools world, e.g. search, natural language processing (NLP) and text analytics, where the stakes are higher than just making translation work easier (or finding a rationale to pay translators less). Thus, I would agree that the days are numbered for the old “klunker-type TM” technology, but I also think that its replacements will probably solve this problem in much more elegant and useful ways.

The old klunker-type TM technology has an unhealthy obsession with project- and format-related metadata, and I think we will see, as this technology evolves, that linguistics will become more important. We are already seeing early examples of this next generation at Linguee. In a sense, I think we may see the kind of evolution that we saw in word-processing technology: from something used only by geeks and secretaries to something any office worker or executive could operate with ease. The ability to quickly access the reference use of phrases, related terms and context as needed is valuable, and I expect we will move forward in delivering useful, use-in-context material to translators who use such productivity tools.

It is clear that SMT-based approaches do get better with more TM data, and to some extent (up to about 8 words) they will even reproduce what they have seen in the same manner that TM does. But we have also seen that there are limits to the benefit of ever-growing volumes of data, and that it actually matters more to have the “right” data in the cleanest possible form to get the best results. For many short sentences, SMT already performs as a TM retrieval technology, and we can expect that this capability will become more visible, with more controls available to improve concordance and look-ups. We should also expect that the growing use of data-driven MT approaches will create more translation memory after post-editing, so TM is hardly going to disappear, but hopefully it will get less messy. In SMT we are already developing tools to transform existing TM for normalization and standardization reasons, to make it work better for specific purposes, especially when using pooled TM data. I think it is also likely that many translation projects will start with pre-translation from a (TM+MT) process and, hopefully, a better, more equitable payment methodology.
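To give a flavor of the kind of TM normalization and clean-up described above, here is a small sketch. The tag-stripping rules, the length-ratio filter and the function names are illustrative assumptions on my part, not taken from any real SMT toolchain:

```python
# A sketch of TM clean-up before pooled data is fed to an SMT engine:
# strip format placeholders, normalize whitespace, and drop segment
# pairs whose source/target lengths are implausibly mismatched.
import re

def normalize_segment(text):
    """Remove inline format tags/placeholders and collapse whitespace."""
    text = re.sub(r"\{\d+\}|</?[a-zA-Z][^>]*>", " ", text)  # {1}, <b>, </b>, ...
    text = re.sub(r"\s+", " ", text).strip()
    return text

def clean_tm(pairs, max_ratio=3.0):
    """Normalize each (source, target) pair and keep only those whose
    length ratio suggests a plausible translation alignment."""
    cleaned = []
    for src, tgt in pairs:
        src, tgt = normalize_segment(src), normalize_segment(tgt)
        if src and tgt and max(len(src), len(tgt)) / min(len(src), len(tgt)) <= max_ratio:
            cleaned.append((src, tgt))
    return cleaned
```

Real pipelines apply many more filters (deduplication, language identification, terminology checks), but the principle is the same: the “right” data in the cleanest possible form beats sheer volume.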

The value of TM created from the historical development of user documentation is likely to change. This user documentation TM, which makes up much of what is in the TDA repository, is seen as valuable by many today, but as we move increasingly toward making dynamic content multilingual, I think its relative value will decline. I also expect that the most valuable TM will be that which is related to customer conversations. Community-based collaboration will also play an ever-increasing role in building leverageable linguistic assets, and we are already seeing evidence of MT and collaboration software infrastructure working together on very large translation initiatives. It is reasonable to expect that the tools will get better at making collaboration smoother and more efficient.

I found the following video fascinating: it shows just how complex language acquisition is, has delightful baby sounds in it, and also shows what state-of-the-art database technology looks like in the context of the massive data we are able to collect and analyze today, all in a single package. If you imagine what could be possible as you apply this kind of database technology to our growing language assets and TM, I think you will agree that we are going to see some major advances in the not-so-distant future, since this technology already touches how we analyze large clusters of words and language today. It is just a matter of time before it starts impacting all kinds of text repositories in the translation industry, enabling us to extract new value from them. What is emerging from these amazing new data analysis tools is the ability to see social structures and dynamics that were previously unseen, and to make our work more and more relevant and valuable. This may even solve the fundamental problem of helping us all understand what matters most to the customers who consume the output of professional translation.


  1. I don't know whether Renato means translation memories or translation memory management tools. I've heard him say that translation memories will become irrelevant, which is a little different from saying they will die.
    I don't think translation memories are dead. I don't even think they are dying. I think that translation tools as we know them will become irrelevant in the medium term (in the long run we're all dead, JMK).
    TEnT people and SMT people have products to sell and families to feed, so TMs will survive. Even evangelists and consultants have products (themselves) to sell, so the quarrel will go on. Maybe it will even be productive: debating TM's destiny will probably help release better translation and writing tools, something that has actually been missing over the last two decades, at least within the translation industry.
    SMT has opened a new perspective, even for translators, on translation memories. A shift in the production and maintenance of TMs has come about to meet the demands of the SMT world, and this is definitely positive. It could grow better in the coming years, and this could lead to cleaner data, and I bet you know what I'm pointing at: the moon, not the finger.

  2. Luigi

    My interpretation of Renato's comments has been that he was referring to the current desktop tools and the need to treat TM as very valuable closely held IP. I think we all see there is at least some if not great value in the actual linguistic knowledge that is contained in TM. (He can clarify for us perhaps)

    There are many translators who find that the current TM technology does in fact provide a productivity boost, and they will continue to use it for that reason. Still, this number has always been surprisingly small for something that makes so much sense at a logical level. The lack of adoption is perhaps due to cost, usability or overall value; I am not really sure.

    Perhaps a better way to say this is that TM is more likely to evolve from an isolated desktop tool to a more collaborative shared capability that leverages a community of translators rather than an individual one. Perhaps this is what people are pointing to when they talk about the demise of TM.

    The actual fact is that it is probably the most used translation technology in the world today which is not saying that much since so many choose not to use it. Here are some translator perspectives on why TM may not be so useful.

    Corinne McKay:

    or Patent Translator:


    1. My personal experience...

      I started using a TM tool driven by a client and found out it helped me to avoid a recurring failure... skipping paragraphs! But that was eons ago... now I know better.

      Then I found out that it was useful and increased my productivity. I use it mostly for context search and welcome all 100% matches. No retyping!

      One of the key advantages of the process is that I do not have to worry about over-typing or deleting source text after translating, and CAT tools have also relieved me of the task of working differently with different file formats.

      My fuzzy rates, though, are different from Trados or those usually employed by LSPs.

  3. Step 1: Translation is cooking. You take ingredients that make sense in one language and you try to create another dish, using other ingredients, to reproduce something that tastes the same. Obviously, since this is not an exact science in practice, it does not work that way in translation either. By fixing the composition of various meals already cooked, you fix varieties that have been tasted by different customers at different times, and we do not even know what they thought of them. Matching at word level is a tragedy, and we know why.
    Imagine a shop where they sell soaps with different scents, but the whole shop has a composite smell that you love. You either move in there to live, or buy the whole stock, but buying separate bars will never give you the same joy. But I have cheated here.
    Step 2: My metaphor was not precise, because translation is not cooking, nor selling fragrances. It is more like making carbonated drinks, like Coca-Cola, which cannot be reproduced at all without knowing the exact amounts of the ingredients used. The closest thing to that is fröccs in Hungarian, which comes in several varieties that call for an expert to tell the difference. But the expert would be a drunkard who has drunk so much fröccs that he cannot tell the difference any more, as is the case with TMs.

  4. I absolutely agree with the post: translation memory itself is hardly going to disappear, but at present it is entering some new stage of existence, and the question now is what the translator should be in this connection.

  5. It's really a true vision of the problem.
    Posted by Roman Blinov

  6. Here is a link to the presentation by Jochen, the founder of Trados (who BTW used one of my graphics without attribution ;-)

    He seems to see a future of sub-segment matching, and also, interestingly, felt that TM did not live up to its promise as a helpful tool for translators, who were his real target customers in the beginning.

    I think we are already way past that and that there are much smarter text analysis and matching technologies out there not developed by people in the translation industry.

  7. Perhaps you are interested in visiting Transopedia, a collection of translation industry related terms. This glossary provides an explanation of many of the terms frequently used in connection with translation and other linguistic services. Please click the link below

    English Hungarian translator

  8. That's absolutely my opinion. This is going to be a mightily interesting decade for translation software. It's only just starting to make sense. Plus, SDL and other programs go back to the Windows 98 era, so they are stuffed with a lot of legacy code related to former Windows instability. But this is a decade where everybody is starting to make software that "just works". It's the only form of software people will pay for today. So I think we will see significant advances, most of all in performance and stability.


    ''At it’s most basic level it is a text...''

    it should be ''AT ITS MOST BASIC...'' right?