Translation Technology & Innovation: Where Can You Learn More?

I was recently trapped in the LA County Criminal Court juror holding room for a day, waiting to do my civic duty, albeit reluctantly. As I was waiting I had a wonderful Twitter storm chat with @rinaneeman, @renatobeninatto and others about innovation in general and about innovative LSPs in particular. It was pretty intense and we covered a lot of ground given the format. We talked about the change in the overall business model (translation-as-a-utility), automation, new and more efficient ways of doing translation work, and much more. This is the best record of it I can recapture (click on "show conversation"), and I am not sure how to see the thread in a nice chronological stream with all the people involved. If you know how to do this please let me know. One of the questions that came up in this discussion was where one could learn about the new technologies and processes (MT being just one of them) that facilitate innovation and allow one to address new translation problems.

I had believed that there is very little formal training around, and then Renato reminded us that regional associations play an important role in providing training. The next ELIA conference in Dublin in particular has a very strong focus on innovation and translation automation technology in addition to the traditional localization themes. I have found these smaller regional shows to be more effective in providing useful training, and they allow a much deeper dive into the reasons why this makes sense. The ELIA event has singled out MT and affiliated technologies as worthy of serious attention in direct response to member requests. I think this is wonderful, and not only because I have a prominent role at this event: I will be presenting a keynote on broad changes impacting the overall world of translation, as well as running a detailed training session on how to get started with MT technology for those who really want to get down and dirty. It is also a sign that this technology can take the next step, with technology developers and translation practitioners working together. I am a big believer in dialog, and I think this event is an honest attempt to build this dialog.

In the keynote session I will look at how 2 billion+ internet users, community and crowdsourcing initiatives, translation technology, ever improving free MT, new attitudes to open collaboration and data sharing are impacting the professional translation world. I will explore how the shift to the project-less, translation-as-utility world will require new skills and new services from language service providers, explore and comment on emerging innovation and also point to the ever increasing market potential that becomes available to industry innovators who have competence with and understand the new dynamics.

ELIA Bridge
I will also run a training session that will go over MT technology in some detail, provide basic background on the technology fundamentals and point to what I think are the keys to being successful with MT. I will try to make this as practical and useful as possible, answering questions about RbMT vs. SMT, MT engine customization strategies, MT quality assessment and its relationship to post-editing effort, understanding data, skills required for different tasks, etc. I believe that innovative LSPs will be the driving force behind creating really amazing MT systems in future, and I will focus on the skills that I think will be most critical to enabling this kind of success. I will also explore new business opportunities that MT can enable to get you out of the software and documentation localization market. I hope this session will be highly interactive, and I am open to communication about what participants might most want to focus on and understand. The session is on Monday October 11th, so please feel free to communicate with me on this before then.
Translation Production Line
As we set up translation production lines to handle 10X or 100X more content in the future, we will need to link key processes together. Processes focused on information quality, and integrated and efficient post-editing, will also be necessary to build efficiency. MT alone is not enough to solve the problems we face in the future, and I think it will also be critical to learn how to clean up and “improve” source content before any kind of translation attempt. Frans Wijma will also provide guidance on Simplified Technical English, which will give attendees some insight into the IQ, controlled language and source simplification issue. This is something that will be increasingly valuable to learn and do in future.

Those who stay in the MT track will also get to hear Sharon O’Brien talking about post-editing MT. She will answer all the following questions: How does post-editing of Machine Translation output differ from revision or QA activities in the localization domain? Are translators the best post-editors? Do they need specific experience and training? What guidelines should be given to post-editors? What productivity enhancements can be reasonably expected? Why do translators seem to dislike this task? I saw her speak at LRC (one of the best conferences I attended last year) and she has great insight and advice to offer on this subject.

And if that weren't enough to make you sign right up, there are also some great sessions on sales strategies for LSPs from none other than Renato, on localization basics, and on next generation localization research from CSA, CNGL and the Gilbane Group. And all at a fraction of the cost of larger conferences. Check out the ELIA site for more details.

I hope to see you there and for those of you who don’t know, I am easily persuaded onto the karaoke floor. No alcohol required but unfortunately this is not because I necessarily sing so well. I went to a Jesuit (Boys only) School in India and had a teacher of mixed Indian/Portuguese (Goanese) extraction who used to exhort:

“Sing with gusto boys! Don’t worry about the notes, you will find them.” 

This is advice I have taken to heart, as my karaoke friends from the IMTT Cordoba 2009 event will also tell you. In spite of having nothing more than a laptop with tiny speakers to provide musical backing, we sang with gusto till dawn and indeed we did eventually find the notes. ;-)

The Problem with Standards in the Localization Industry

In my continuing conversation with Renato Beninatto we recently talked about standards:

It is clear from this conversation that the word “standards” is a source of great confusion in the professional translation world. Part of the problem is conflation, and part of the problem is definition, or the lack of a “clear” definition of what is meant by standards, especially as they relate to quality. In the conversation, we both appear to agree that data interchange is becoming much more critical and that it would be valuable to the industry to have robust data interchange standards. However, we both feel that overall process standards like EN15038 have very little value in practical terms.

This discussion on quality standards in particular is often difficult because of conflation, i.e. very different concepts being equated and assumed to be the same. I think at least 3 different concepts are being referenced, and confused as being the same concept, in many discussions on “quality”.
  1. End to End Process Standards: ISO 9001, EN15038, Microsoft QA and LISA QA 3.1. They have a strong focus on administrative, documentation, review and revision processes, not just the linguistic quality assessment of the final translation.
  2. Automated SMT System Output Translation Quality Metrics (TQM): BLEU, METEOR, TERp, F-Measure, Rouge and several others that only focus on rapidly scoring MT output by assessing precision and recall, referencing one or more human translations of the exact same source material to develop this score. (Useful for MT system developers but not much else.)
  3. Human Evaluation of Translation Linguistic Quality: Error categorization and subjective human quality assessment, usually at a sentence level. SAE J2450, the LISA Quality Metric and perhaps the Butler Hill TQ Metric (that Microsoft uses extensively and TAUS advocates) are examples of this. (Can vary greatly depending on the humans involved.)
To this hot mess you could also add the “container” standards discussions, to further obfuscate matters. These include TMX, TBX, SRX, GMX-V, xml:tm, etc. Are any of these standards, even by the much looser definition of “standard” in the software world? If you look at the discussions on quality and standards in translation around the web, you can see that a real dialog is difficult and clarity on this issue is virtually impossible.
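
To make the second category a little more concrete, here is a minimal sketch of what "assessing precision and recall" against a human reference translation can look like. It is a deliberately simplified, word-level illustration and not an implementation of BLEU, METEOR or any other named metric; the function name `unigram_scores` and the whitespace tokenization are my own assumptions for the example.

```python
from collections import Counter


def unigram_scores(candidate: str, reference: str) -> dict:
    """Score MT output against one human reference using word overlap.

    Hypothetical helper for illustration only: real metrics use n-grams,
    multiple references, brevity penalties, stemming and so on.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()

    # Clipped overlap: a word is credited at most as many times
    # as it appears in the reference.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())

    precision = overlap / len(cand_tokens) if cand_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)

    return {"precision": precision, "recall": recall, "f_measure": f_measure}


if __name__ == "__main__":
    mt_output = "the cat sat on the mat"
    human_reference = "the cat is on the mat"
    print(unigram_scores(mt_output, human_reference))
```

Even this toy version shows why such scores are mostly of interest to system developers: they are fast and repeatable, but they only measure similarity to a particular reference, not whether the translation is fit for purpose.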

But standards are needed to scale and handle the volume of translation that will likely be done and enable greater inter-process automation as we head into a world where we continuously translate dynamic streams of content. Free online MT services have given the global enterprise a taste for what translation as a utility looks like. Now some want to see if it can be done better and in a more focused way at higher quality levels to enhance global business initiatives and expand the dialog with the global customer. (I think it can be done much better with customized, purpose-driven MT working with and steered by skilled language professionals.) Translation as a utility is a concept that describes an always-on, on-demand, streaming translation service that can translate high value streams of content at defined quality levels for reasonable rates. Data will flow in and out of authoring, content management, social networks, translation workflow, MT and TM systems.

As this new mode of production gains momentum, I believe that it would be useful to the industry in general to have a meaningful and widely used measure of relative translation quality, i.e. the average linguistic quality of a target corpus. This would facilitate the production processes for 10X and 100X increases in content volume, and allow LSPs to define and deliver different levels of quality using different production models. I am convinced that the best translation production systems will be man-machine collaborations, as we already know what free online raw MT looks like. (It is sometimes useful for getting the gist of a text, but rarely useful for enterprise use.) Skilled humans who understand translation automation tools and know how to drive and steer linguistic quality in these new translation production models can dramatically change this reality.

It would also be useful to have robust data interchange standards. I recently wrote an entry about the lack of robust data interchange standards that seemed to resonate. We are seeing that content on the internet is evolving from an HTML to an XML perspective. This makes it easier for content to flow in and out of key business processes. Some are suggesting that soon all the data will live in the cloud and applications will decline in importance as translators zero in on what they do best and only what they do best: translate. Today, too much time is spent on making the data usable.
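
As a small illustration of why XML-based interchange makes it easier for content to flow between processes, here is a sketch that reads translation units out of a TMX-style document using only the Python standard library. The sample document, the language codes and the helper name `read_tmx_pairs` are assumptions made up for this example; real TMX files carry more metadata and inline markup, and tools differ in how faithfully they follow the specification.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A tiny hand-written sample in the general shape of a TMX 1.4 file.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Hello world</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour le monde</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Thank you</seg></tuv>
      <tuv xml:lang="de"><seg>Danke</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx_pairs(tmx_text: str, src: str = "en", tgt: str = "fr") -> list:
    """Return (source, target) segment pairs for one language pair."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() flattens any inline markup inside <seg>.
                segments[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if src in segments and tgt in segments:
            pairs.append((segments[src], segments[tgt]))
    return pairs


if __name__ == "__main__":
    print(read_tmx_pairs(SAMPLE_TMX, src="en", tgt="fr"))
```

The point is less the code than the fact that any process able to parse XML can pick up the data. The hard part, as argued here, is getting everyone to produce and consume the format consistently.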

There are some data standard initiatives that could build momentum, e.g. XLIFF 2.0, but these initiatives will need more volunteer involvement (including, as “Anonymous” reminded me, from people like me, who need to actually walk the walk and not just talk about it) and broad community support and engagement. The problem is that there is no one place to go to for standards. LISA? OASIS? W3C? ISO TC37? How do we get these separate efforts to collaborate and produce single unified specifications that have authority and MUST be adhered to? There are others who have lost faith in the industry associations and expect that the standards will most likely come from outside the industry, perhaps inadvertently from people like Google and Facebook who implement an open XML-based data interchange format. Or possibly this could come from one of the open translation initiatives that seem to be growing in strength across the globe.

There are at least two standards (that are well defined and used by many) that I think would really be helpful to make translation as a utility happen:
  1. A linguistic quality rating that is at least somewhat objective, can be easily reproduced/replicated and can be used to indicate the relative linguistic quality of both human translation and the output of various MT systems. This would be especially useful to LSPs to understand post-editing cost structures and help establish more effective pricing models for this kind of work that are fair to both customers and translators.
  2. A robust, flexible yet simple data interchange standard that protects linguistic assets (TM, terminology, glossary) but can also easily be exported to affiliated processes (CMS, DMS, Web Content).
Anyway, it is clear to me that we do need standards and that it is likely that this will require open innovation and collaboration on a scale and in ways that we have not seen yet. This can start simply though, with a discussion right here that could guide or provide some valuable feedback to the existing initiatives and official bodies. We need to both talk the talk and also walk the walk. My intent here is to raise the level of awareness on this issue, though I am also aware that I do not have the answers. I am also focused on a problem that few see as a real one yet. I invite any readers who may wish to contribute to this forum to further this discussion, even if you choose to do this anonymously or possibly as a guest post. (No censorship allowed here.)

Perhaps we should take heed of what John Anster and W H Murray (not Goethe) said as we move forward:
Until one is committed, there is hesitancy, the chance to draw back, always ineffectiveness, concerning all acts of initiative and creation. There is one elementary truth, the ignorance of which kills countless ideas and splendid plans: that the moment one definitely commits oneself, then Providence moves too. All sorts of things occur to help one that would never have otherwise occurred.

Innovation in Localization

I recently had a series of impromptu chats with Renato Beninatto on various issues affecting the professional translation and localization world. These are now being presented as short videos on MilengoTV. I thought it might be useful to expand on the themes we discuss in the video in this blog. Our last discussion was focused on innovation in localization. (There is also an older, very popular presentation by Renato, with FR and BrPt subtitles, where he touches upon innovation.)

Some of the key points we make in this chat:
  • There is very little innovation in the localization world
  • Generally, industry outsiders are the most likely innovators in many markets. Often insiders are too timid and preoccupied with preservation of the old way and don’t want to rock the boat
  • Google and Asia Online are innovators for the translation industry and Madcap has also been innovative in the multilingual CMS space
  • Innovation comes from people who look at the basic industry problems from a different angle 
  • It is important to look at problems with a really new eye, like a child, not being afraid to ask stupid questions
  • Some innovation comes from simply looking at new problems that may be related but yet quite different
  • The new problem for the translation industry is the ever growing stream of content that demands to be translated, quickly and efficiently. What production models make sense for this? 
This theme of “projectless” localization, or “translation as a utility”, is something that is beginning to gain some momentum. Translation as a utility is a concept describing an always-on, on-demand, streaming translation service that can translate high value streams of content at acceptable quality for reasonable rates. And this is more than just MT. It is a man-machine and process alignment that allows agile, efficient and continuous response. As far as I can tell, there is not much in the market today that can claim this and I expect the urgency will only grow. I have written about the driving forces behind this trend previously in:
The Data Deluge and the Growing Need for Innovation

So what might looking at the same problem in new ways involve?

This was covered to some extent in The Coming Disintermediation and Disruption in the Translation Industry conversation between Renato and Bob, but I think there are probably several other dimensions to this:

  • New Production Models: More workflow automation, customized MT integrated into production workflow, increasing process and inter-agency automation and connections
  • New Collaboration Models: Professional and amateur (community and crowdsourcing) working together in the supply chain to create new eco-systems to take on really high volume projects
  • New Tools & Processes: Language asset management, concordance, the web as a dictionary, collaborative sharing of language assets, integration with customer data and information infrastructure, solid and robust linguistic data interchange standards that are strictly adhered to.
  • New Products & Offerings: Translation of customer communication streams in support and social networks, SMS translation, High value knowledge silo translation and high quality just-in-time translations.
For any of this to happen, it is necessary to break away from the software and documentation localization (SDL ;-)) mindset that rules the industry today. The biggest translation industry opportunities lie elsewhere. Innovation by definition carries with it uncertainty and risk. Any idea about which we are confident is probably not very innovative. The innovators who figure out how best to develop translation solutions for the growing content “stream” are likely to be less invested in the old model and may not even be part of the professional translation industry today.

The Consortium for Service Innovation has considered this issue for some time and shares some wonderful nuggets of information and advice on innovation. Here are some excerpts from the CSI site and a BCG study on Innovation:
  • Innovation is more about culture and values and not about process.
  • Innovation is an inherently human activity. If you want to innovate faster, you have to learn faster.
  • Innovation requires a level of trust (humor is a strong indicator of trust) and a respect for diverse perspectives, it is about people and interaction
  • Senior executives are often major roadblocks to innovation because they have no innovation education, are risk-averse and are control freaks
  • The emerging world, long a source of cheap labor, now rivals the rich countries for business innovation with India and China leading the way in “frugal” innovation
  • BCG also points out, in When the game gets tough, change the game, that changing the business model is another approach to innovation.
“The good news is that customer-led innovation is predictably successful. The bad news is many managers and executives don’t yet believe in it” – Patty Seybold

And some tips on creating an innovative culture:
  • Alignment to a compelling purpose and a set of values replaces command and control thinking
  • Learning is more important than success or failure
  • Remove the arbitrary boundary that keeps your customer out of your business and find ways to make their presence persistent
  • Recognize the importance of collaboration and focus on continuously increasing the number and the diversity of players
  • Innovative leaders are comfortable with ambiguity.
  • Develop knowledge management systems to capture organizational learning
  • Remember that there are a lot more smart people outside your company than inside it
However, having said all that, the most disruptive innovation often seems to come from those who focus on new problems rather than improving the old ways, e.g. Microsoft with PCs left IBM in the dust, and Google in turn caught Microsoft completely unaware of the power of new free+ads value creation models. The big disruptors all focused on new problems. I think we are at a turning point and this time of change is a great opportunity for innovators. Many of us seem to sense this and evolution demands it.

So here’s to finding the fundamental new translation problem.

And for those of you wondering what other major trends are likely to affect us as we figure this much out, take a look at this video (40 minutes+) for some insight from David Siegel on how the next wave of the web’s evolution (which is sometimes called the semantic web) might evolve. More change coming.

Inspiration on the Web – TED Talks 1

One of the wonderful discoveries I made when I first signed on to Twitter in 2008 was the TED talks. These are riveting talks by remarkable people, free to the world and available as a long term resource. I happened to catch some Twitter coverage of TED2008 early in my use of Twitter and I have been hooked since. These talks cover a lot of ground and I don’t think I can really do them justice with my descriptions, so I am just going to highlight some of my favorite TED talks and hope that some of you may watch. They are as much about deep exploration of the human condition as about awareness, technology, love, passion and social trends in general. Many are truly “ideas worth spreading”, and TED also has one of the most successful crowdsourcing translation projects on the web today: almost 10,000 translations into 75 languages by inspired volunteers. TED is a regular source of inspiration and new ideas for me and I thought it would be cool to share some of my favorite talks. I probably could make a series of blog entries highlighting more, and perhaps I will.

Jill Bolte Taylor’s talk is a prime example of amazing insights presented in a crisp clear way to completely transform one’s view of everyday life while educating and inspiring you. Still moves me deeply.

Hans Rosling has several talks that are all wonderful and it is hard to choose one but I have chosen one where he talks about a Rising Asia. He makes talking about long term socioeconomic trends fun and I love his use of graphics.

José Antonio Abreu has transformed young children into an astonishing orchestra and even if you don’t really care about classical music you will realize that what these kids have accomplished is special. If you know something about classical music your reaction will likely be amazement.

Ethan Zuckerman’s essay on the Polyglot Internet has become the manifesto for the Open Translation Tools community, and I have referenced him often in my blog. Here he lays out how the internet could actually create closed communities of “sameness” rather than really connect diverse people globally, if we are not vigilant.

This one is for my daughter, who is working in the Teach For America (TFA) program. She has made a huge impact on the academic standing of her class of students in an “underprivileged” school. For those of you who define your self-worth by the money you make and the stuff you buy, you may find this offensive: Taylor Mali on the huge impact that good teachers have (in 3 minutes flat).

Here is a talk that compares the mythical foundations of East and West and uses them to explain differing attitudes in work and business styles. Useful stuff for you localization (and especially transcreation) people out there.

Finally, this is a short talk by Alisa Miller that shows how limited American news consumption is. In 5 minutes she manages to make a significant point.

I hope that you will try and get to one or two (or maybe all) of these, and I would love to hear what TED talks have really struck your fancy. I know I could make a much longer list and maybe I will do an update some time in the future. But in the meantime here is a list of the greatest TED talks made by Fast Company, which also describes the growing influence of TED very nicely as -- "the first new top-prestige education brand in more than 100 years." This force with growing momentum is described in this article. They contrast TED with the old symbol of learning and ideas (the Ivy League colleges) in this way: "unlike fearful old-school colleges, TED is finding that the more open it is, the more it becomes the global education brand of the 21st century."

We can all learn from this openness and willingness to share.