Wednesday, April 20, 2011

Exploring The Future of Translation in Rome

Rome is a special city in so many ways, but especially because you can see ancient, medieval and modern elements of the city side by side at almost every turn. Rome seems to have a different sense of time: yesterday, now and forever. At the beginning of April, Rome was also host to two conferences focused on professional translation. I had the good fortune to be at the LUSPIO Translation Automation Conference (#LTAC), held at the Libera Università degli Studi per l'Innovazione e le Organizzazioni. The event was a very pleasant and positive surprise for me, and I think it is worth watching closely to see what the organizers do in the future.
The conference was curated by Luigi Muzii (@ilbarbaro) with assistance from Anna Fellet and Valeria Cannavina, whom I fully expect will be future translation industry stars, possibly even shaping the agenda and discovering best practices for the next generation of the professional translation industry across the globe.


“It’s not what you look at that matters, it’s what you see.”    Henry David Thoreau


The conference was special for many reasons:
  • The large number of fundamental questions about the effective use of technology that were raised and explored during the sessions
  • The balance of theory and practice, and careful observation of experiments presented by many speakers
  • The strong presence of many young translators (and their teachers), whose dispassionate view of technology suggests that new approaches are possible and even likely
  • Presentations in both English and Italian
  • All the sessions were available in both Italian and English via simultaneous interpreters (eager interpreter students, who were quite effective)
  • The very thoughtful curation of the content that made the program stand out as particularly insightful amongst the many conferences I have attended
  • Many sessions drew you in and left you with new questions. Have you noticed that people who ask questions are often the most likely to discover new things?
I have always felt that it is very valuable to have translators participate in industry events, as they are instrumental in getting the work done; it is where the rubber meets the road, as they say in English. At this event you had an auditorium full of people, including many translators, who share both a skepticism about language technology and an interest in where technology has been successfully used in practice. Many were sincerely exploring how, where and when technology makes sense, and sharing their findings.
Unfortunately it has become increasingly difficult to gather the Twitter stream after the fact to get the real-time feel of the event. (Let me know if you know of a good way to retrieve or show this.) Some highlights of the questions explored and insightful observations included:
Anthony Pym: We do not know how to talk to each other and listen to each other, e.g. What does quality mean? To whom? Why? He also pointed out that research takes too long to reach commercial users and needs to be sped up to be relevant for the real world, and asked what academia should focus on in translation education to be more relevant.
Renato Beninatto argued that we will be talking about the same things in the future that we are talking about today: automation, process improvements, more languages, faster turnaround, and increasingly more voice and video. He also said that there would be no project management in the cloud and that collaboration infrastructure would become increasingly important, as myGengo and dotSUB already show. He also said: “Instead of finding mistakes, translation reviewers should be rating readability or functionality” for the business purpose.
Fernando Ferreira Alves raised the inevitable questions about the economic value of translation, made a case for translators being revenue drivers rather than cost centers, and suggested that large translator networks are emerging as a new production model. He also characterized “Translation as Cinderella – indispensable but neglected”, admitted that it is hard for translators to make a living nowadays, and made an urgent case for more certification to reshape the professional image. (This was very similar to what I heard the President of the Indian Translators Association say a few months ago in Delhi.)
Luigi Muzii (who presented in Italian) had an interesting and deep view of “controlled language” in his “Dov’è la lingua?” (“Where is the language?”) talk. He pointed out how most manuals are written under time pressure by reluctant authors and are thus often not useful from the start. I liked how he used quotes to point out what the intent of “controlled language” is or should be, e.g. “Make things as simple as possible but no simpler” (Einstein) or “The most valuable of all talents is that of never using two words when one will do” (Thomas Jefferson). He pointed out that tools do not replace knowledge and that translators should not be afraid of or hostile to them. He also suggested that translators not get stuck with the view that translation should be literary. It was an insightful presentation that is worth looking up, scholarly yet practical. There were also several other presentations on controlled language that, to me, all seemed more enlightened and profound than most discussions I have heard on the subject at other events. The term controlled language was used in its broadest possible sense. (I think that maybe all Italians have a little bit of Michelangelo in them.)
Ana Guerberof talked about her tests measuring the relative productivity of MT- versus TM-based processes, done to better understand how to pay translators for MT post-editing work. She reported some interesting findings: 1) MT segments had the highest productivity, but this varied greatly by translator; 2) some translators are better suited to MT work than others; 3) the slowest translators benefited the most from MT; 4) MT errors were more easily identified than TM errors, and TM generated much higher levels of final errors than MT processes.
Alessia Lattanzi had an interesting presentation that explored what type of content is best suited to a controlled language process and what impact this would have on different kinds of MT systems. In her tests Google MT produced the best results and SDL by far the worst. Interestingly, she also discussed the monolingual translator, quoting a study by Philipp Koehn which showed that target-language competence and subject-matter expertise plus MT can often outperform bilingual translators.
Valeria and Anna (they are a really good team and kind of a dynamic duo) provided great insight in a case study for Arrex on how to manage the translation process in a highly leverageable and efficient way that links content creation, customer support and international sales to an informed and constantly improving translation process. This is a great example of how to do business translation right. They later told me that they “strongly believe that companies wishing to change the way they write and translate their content, like ARREX did, will drive change (in the translation industry), not LSPs”, and this case study certainly was a strong validation of that.
Isabella Chiari gave an interesting presentation on corpus linguistics and made a memorable statement: “Relevance in corpora is a dangerous word, as quality is in translation”. She shared many sources of both monolingual and bilingual data.
Federico Garcea from Microsoft provided an overview of CTF (Collaborative Translations Framework), which brings MT, community TM and broad web-based collaboration together in a very cool collaboration environment that is being used by translation “amateurs” to undertake serious long-term translation projects. I could not help but be struck by how much nicer, more elegant and free-flowing these tools looked compared to the clunky TMS/TM tools created by SDL et al.
Artur Raczynski of the EPO described the EPO's experience with customized MT for European languages, saying that it was not really better than raw Google MT, hence their decision to just go forward with that. (However, he has not tried the Asia Online approach and should perhaps take a peek at the LexisNexis experience.)
Jaap van der Meer rushed in from another conference in Pisa and gave his future-of-the-industry talk, which drew some strong interest. There were several more presentations on controlled language. When I asked why such a strong focus on CL, I was told it was because authoring is a relatively easy place to bring improvement into the overall corporate globalization strategy; it creates clean data for MT and helps to distinguish winning from losing authoring processes. It also appears to be a good way to get product, sales and translation people talking productively together.

I was struck by the level of understanding and insight offered by many of the speakers and how it all came together as a cohesive picture of what is going on in translation and why these issues matter. It was also interesting to see the friction and tension between the old, abstract and theoretically dominant views and the new urge to make everything more relevant, practical and useful now. I found the curator's ability to bring controlled language, MT, terminology management, enlightened authoring and corpus linguistics together into such a cohesive picture truly commendable. The only thing missing was community collaboration / crowdsourcing, and I think they will get there soon.

In my opinion, Anna & Valeria can offer great value to many companies looking to overhaul their translation processes and update their translation automation strategies, and also to bodies like the CNGL and EU, who could fund useful research through them that would help us all. I hope they get more involved in new case studies and that we hear more from them at AMTA, as I am sure they will continue to produce useful and insightful information for the industry. If you are in the market for these services, or just want a reality check, they are worth talking to: smart, focused, and able to look out into the distance.

While in Rome, I had the good fortune to have a wonderful dinner in Trastevere with David Orban, CEO of dotSUB, and Renato (referenced in his blog), where I even got to practice my Hindi with the waiter, and also to visit a famous gelateria with the lovely Sara Nicolini, who is returning to Ireland shortly, discussing the challenges of working in Italy. I won’t even mention the other amazing meals with friends, the wonder-filled walks among the art of that amazing explosion of human creativity they call the Renaissance, and the glory of Ancient Rome that can be seen at virtually every turn. I hope I get to return, for Rome indeed is a special place.


Monday, April 11, 2011

The Rush to Manage and Control Standards

There has been a lot of talk about standards since the demise of LISA, perhaps because the collapse of LISA was announced almost immediately after its final event, a “Standards Summit”, in early March 2011. We are now seeing something of a rush, with industry groups setting up positions (perhaps even well-intentioned ones) to establish a controlling interest in “what happens next with standards”. There is still much less clarity on what standards we are talking about, and almost no clarity on why we should care or why it matters.


What are the standards that matter?

The post I wrote on the lack of standards in May 2010 is the single most influential (popular?) post I have written in this blog, according to PostRank. So what is all this new posturing on standards about? From my vantage point (as I stated last year), standards are important to enable information to flow from information creators to information consumers as efficiently as possible. Thus my view of standards is about the rules and structures that enable clean and efficient data interchange, archival, and reuse of linguistic assets in new language and text technology paradigms: search, semantic search, SMT, language search (like Linguee) and text analytics are what I am thinking about. (You may recall that I had much more clarity on what standards matter and why than on how to get there.) Good standards require that vendors play well with each other, and that language industry tools interface usefully with corporate content management systems and make life easier for both information creators and consumers, not just people involved in translation.

However, I have also seen more conflation around the issue of standards than around almost any other issue (“quality”, of course, is the winner) amongst localization professionals. I am aware of at least three different perspectives on standards:

1. End-to-End Process Standards: ISO 9001, EN 15038, Microsoft QA and LISA QA 3.1. These have a strong focus on administrative, documentation, review and revision processes, not just the quality assessment of the final translation.
2. Linguistic Quality of Translation (TQM): automated metrics like BLEU, METEOR, TERp, F-Measure, ROUGE and several others that focus only on rapidly scoring MT output, and human measurements that assess linguistic quality via error categorization and subjective human quality assessment, usually at a sentence level, e.g. SAE J2450, the LISA QA Metric and perhaps the Butler Hill TQ Metric.
3. Linguistic Data Interchange: these standards facilitate data exchange from content creation and enable transformation of textual data within a broader organizational data flow context than just translation. Good interchange standards can ensure that fast-flowing streams of content get transformed more rapidly and reach customers as quickly as possible. XLIFF and TMX are examples of this, but I think the future is likely to be more about interfacing with “real” mission-critical systems (DBMS, collaboration and CMS) used by companies rather than just TMS and TM systems, which IMO are very likely to become less relevant and important to large-scale corporate translation initiatives.

It is my sense that we have seen a lot of development on the first kind of "standard" listed above, but have a long way to go before we have meaningful standards in the second and third categories.
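To make the second category concrete, here is a minimal sketch of the idea behind BLEU-style automated scoring: clipped n-gram precision combined with a brevity penalty. This is an illustrative simplification (single reference, crude smoothing), not any official or production implementation of BLEU.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, hypothesis, max_n=4):
    """Toy sentence-level BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    ref, hyp = reference.split(), hypothesis.split()
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each hypothesis n-gram count by its count in the reference
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        # Crude smoothing so the geometric mean is defined for zero matches
        log_prec_sum += math.log(max(clipped, 1e-9) / total)
    # Brevity penalty: punish hypotheses shorter than the reference
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(log_prec_sum / max_n)
```

A perfect match scores 1.0 and an unrelated sentence scores near zero; the point is that such metrics reward surface n-gram overlap, which is exactly why they are fast to compute but say nothing directly about readability or functional adequacy.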

So it is interesting to see the new TAUS and GALA initiatives to become standards leaders when you consider that LISA was actually not very effective in developing standards that really mattered. LISA was an organization that involved buyers, LSPs and tools vendors, but it was unable to produce standards that really mattered to the industry (in spite of sincere efforts). TMX today is a weak standard at best, and there are many variations that result in data loss and leverage loss whenever data interchange is involved. (Are the other standards they produced even worth mentioning? Who uses them?) Are we going to see more of the same with these new initiatives? Take a look at the TAUS board and the GALA board, as these people will steer (and fund) the new initiatives. Pretty much all good folks, but do they really represent all the viewpoints necessary to develop standards that make sense to the whole emerging ecosystem?
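For readers who have never looked inside a TMX file, here is a minimal sketch of what the format carries and how little code is needed to read it with standard tools. The sample document and the reader below are illustrative only, built on the publicly documented tu/tuv/seg structure; real-world TMX files carry many more attributes, and it is precisely the tool-specific variations around this core that cause the data loss mentioned above.

```python
import xml.etree.ElementTree as ET

# A minimal TMX 1.4 document: a header plus one translation unit (tu)
# holding English and Italian variants (tuv), each with a segment (seg).
TMX_SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0.1"
          segtype="sentence" o-tmf="none" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Rome is a special city.</seg></tuv>
      <tuv xml:lang="it"><seg>Roma è una città speciale.</seg></tuv>
    </tu>
  </body>
</tmx>"""

# ElementTree expands the predefined xml: prefix to its namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def read_tmx(tmx_text):
    """Return a list of {language: segment} dicts, one per translation unit."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        units.append({tuv.get(XML_LANG): tuv.findtext("seg")
                      for tuv in tu.iter("tuv")})
    return units

units = read_tmx(TMX_SAMPLE)
```

The core of the format really is this simple, which is why it is frustrating that interchange between tools still loses data: the damage happens in the optional attributes, inline markup and tool-specific extensions that sit around these elements.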


Why do standards matter?

Real standards make life easier for the whole ecosystem, i.e. the content creators, the professional translation community, the content consumers and everybody else who interacts with, transforms or modifies valuable content along the way. Standards matter if you are setting up translation production lines and pushing translation volumes up. At AGIS2010, Mahesh Kulkarni made a comment about standards in localization: he called them traffic rules that ease both the user and creator experience (and of course these rules matter much more when there is a lot of traffic), and he also said that standards evolve, have to be tested, and need frequent revision before they settle. It is interesting to me that the focus in the non-profit world is on studying successful standards development in other IT areas, in contrast to what we see at TAUS and GALA, where the modus operandi seems to be to create separate new groups, with new missions and objectives, though both claim to act in the interest of “everyone”.

There was a great posting by Arle Lommel on the (now gone) LISA site on why standards matter, and there is also a perspective presented by Smith Yewell on the TAUS site on why we should care. I hope there will be more discussion on why standards matter, as this may help drive meaningful action on what to do next and produce more collaborative action.

So today we are at a point where TAUS says it is taking on the role of an "industry watchdog for interoperability" by funding activities that will track compliance of the various tools and appointing a full-time standards monitor. Jost Zetzsche has pointed out that this is fabulous, but that the TAUS initiative only really represents the viewpoint of “buyers”, i.e. localization managers (not the actual corporate managers who run international businesses). The REAL buyers (Global Customer Support, Global Sales & Marketing Management) probably care less about TM leverage rates than about getting the right information to the global customer in a timely and cost-effective way on internet schedules, i.e. really fast, so that it has an impact on market share in the near term. Not to mention that compliance and enforcement can be tricky without a system of checks and balances; but it is good to see that the issue has been recognized and a discussion has begun. TAUS is attempting to soften the language it uses in defining its role, as watchdogs are often not very friendly.

GALA announced soon after that it would also start a standards initiative, which will "seek input from localization buyers and suppliers, tool developers, and various partner localization and standards organizations." Arle Lommel, the former director of standards at LISA, will lead the effort for GALA. Their stated objective is: “The culmination of Phase I will be an industry standards plan that will lay out what standards should be pursued, how the standards will be developed in an open and unbiased way, and how the ongoing standards initiative can be funded by the industry.” Again, Jost points out (in his 188th Tool Kit Newsletter) that this will be a perspective dominated by translation service companies, and asks how the needs and views of individual translators will be incorporated into any new standards initiatives. He also appeals to translators to express their opinions on what matters to them, and suggests that a body like FIT (the International Federation of Translators) perhaps also engage in this dialogue to represent the perspective of translators.

There are clearly some skeptics who see nothing of substance coming from these new initiatives. Ultan points out how standards tend to stray, how expensive this is for users, and raises some key questions about where compliance might best belong. However, I think it is worth at least trying to see if there is some potential to channel this new energy into something useful for the industry. I, too, see some things that need to be addressed to get forward momentum on standards initiatives, which I suspect get stalled because the objectives are not clear. There are at least three things that need to be addressed.

1) Involve Content Creators – Most of the discussion has focused only on translation industry related players. Given the quality of the technology in the industry I think we really do need to get CMS, DBMS and Collaboration software/user perspectives on what really matters for textual data interchange if we actually are concerned with developing meaningful standards. We should find a way to get them involved especially for data interchange standards.
2) Produce Standards That Make Sense to Translators – The whole point of standards is to ease the data flow from creation to transformation to consumption. Translators spend an inordinately large amount of time on format-related issues rather than on translation and linguistic issue management. Standards should make it easier for translators to deal ONLY with translation-related problems and allow them to build linguistic assets that are independent of any single translation tool or product. A good standard should enhance translator productivity.
3) Having Multiple Organizations Focused on the Same Standards Is Unlikely to Succeed – By definition, standards are most effective when there is only one. Most standards initiatives in the information technology arena involve a single body or entity that reflects the needs of many different kinds of users. It would probably be worth taking a close look at the history of some of these to understand how to do this better. The best standards initiatives have hard-core techies who understand how to translate clearly specified business requirements into a fairly robust technical specification that can evolve but leaves some core always untouched.
One of the problems in establishing a constructive dialogue is that the needs and technical skills of the key stakeholders (content creators, buyers, LSPs, translators) differ greatly. A clearer understanding of this is perhaps a good place to start. If we can find common ground here, it is possible to build a kernel that matters and is valuable to everybody. I doubt that TAUS and GALA are open and transparent enough to really engage all the parties, but I hope that I am proven wrong. Perhaps the first step is to identify the different viewpoints and clearly identify their key needs before coming together and defining the standards. It is worth speaking up (as constructively as possible) whatever one may think of these initiatives. We all stand to gain if we get it right, but a functioning democracy also requires vigilance and participation, so get involved and let people know what you think.