Wednesday, May 26, 2010

Are There Any Standards in Translation?

One of the things that struck me at the aQuatic conference I attended recently was how empowering functioning standards can be. In a presentation on TCS2 and globalization, MK Gupta of Adobe provided some clear examples. This may not be a big deal in the content management world but it certainly is in professional translation workflows.

What a huge improvement over the sorry mess that we call standards in localization e.g. TMX, TTX, TBX etc… I loved the fact that I can edit a document downstream with an application that did not create the original data and send it on to others who can continue the editing in other preferred applications. I think this is a big deal. I think this is the future, as data flows more freely in and out of organizations.

I do not know very much about translation industry standards except that they do not work very well, and I invite anybody who reads this to come forward and comment or even write a guest post to explain what does work to me and others who are interested. I was involved with the logical file system standard ISO 9660 early on in my career, so I know what a real working standard looks and feels like. This standard allows CDs to be read across hundreds of millions of devices, across multiple versions of PC, Mac, Unix and mainframe operating systems. Data recorded on a DOS PC in 1990 can be read today on a Mac or Windows 7 machine without problem. (Though if you saved WordPerfect files you may still have a problem.) The important factor is that your data is safe if it can be read today.

The value of standards is very clear in the physical world: electric power plugs, shipping containers, tires, CD and DVD discs etc… Life would indeed be messy if these things were not standardized. Even in communications we have standards that enable us to easily communicate: GSM, TCP/IP, HTTP, SMTP and the whole set in the OSI layers. Even regular people care and know about some of these. These standards make many things possible: exchange, interoperability, integration into larger business processes, evolving designs and architecture. In the software world it gets murkier: standards are often de facto (RTF, SQL?, PDF, DOC?, DOCX?) or just really hard to define. In software it is easier to stray, so MP3 becomes WMA and AIFF, and there is always a reason, usually involving words like better and improved, to move away from the original standard. The result: You cannot easily move your music collection from iPod to Zune or vice versa, or to a new better technology without some pain. You are stuck with data silos or a significant data conversion task.

The closest we have to a standard in translation is TMX 1.4 (not the others) and with all due respect to the good folks at LISA, it is a pretty lame “standard”, mostly because it is not standard, and mostly because some vendors choose to break away from the LISA specification. It does sort of work but is far from transparent. SDL has its own variant and so do others, and data interchange and exchange is difficult without some kind of normalization and conversion effort even amongst SDL products!!! And data exchange among tools usually means at least some loss in data value. Translation tools often trap your data in a silo because the vendors WANT to lock you in and make it painful for you to leave. (Yes Mark, I mean you). To be fair, this is the strategy that IBM, Microsoft and especially Apple follow too. (Though I have always felt that SDL is more akin to DEC.) Remember that a specification is not a standard - it has to actually be USED as a matter of course by many to really be a standard.
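
To make the interchange point concrete, here is a minimal sketch of what reading TMX with nothing but generic XML tooling might look like. The sample data and the function name read_translation_units are made up for illustration, and the sample only approximates the TMX 1.4 layout (tu, tuv and seg elements); real vendor exports may differ, which is exactly the problem described above.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample data, invented for this illustration.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="ExampleTool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en-US"
          srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Speichern Sie die Datei.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>Close the window.</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Schließen Sie das Fenster.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_translation_units(tmx_text):
    """Return one {language: segment text} dict per translation unit."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG)
            seg = tuv.find("seg")
            if lang is None or seg is None:
                continue  # skip variants this sketch cannot interpret
            # itertext() flattens any inline markup inside <seg>; this is
            # one place where value can be lost when moving between tools.
            unit[lang] = "".join(seg.itertext())
        units.append(unit)
    return units


if __name__ == "__main__":
    for unit in read_translation_units(SAMPLE_TMX):
        print(unit)
```

If every tool wrote the same elements and attributes in the same way, a few lines like these would be all anyone needed to take their data elsewhere.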

In a world with ever increasing amounts of data, the data is more important than the application that created it.

For most people it is becoming more and more about the data.  That is where the long-term value is. As tools evolve I want to be able to take my data to new and better applications easily. I want my data to be in a state where it does not matter if I change my application tool, and all related in-line applications can easily access my data and further process it as needed. I want to be able to link my data up, down, backwards and forward in the business process chain I live in, and I want to be able to do this without asking the vendor(s). I care about my data, not the vendor or the application I am using. If better tools appear, I want to be able to leave with my data, intact and portable.

So what would that look like in the translation world? If a real standard existed for translation data I would be able to move my data from Authoring and IQ systems to CMS to TM to TMS to DTP or MT or Web sites and back with relative ease. And the people in the chain would be able to use whatever tool they preferred without issue. (Wouldn’t that be nice?) It could mean that translators could use whatever single TM tool they preferred for every job they did. The long-term leverage possible from this could be huge in terms of productivity improvements, potential new applications and making translation ubiquitous. The graphic below is my mental picture of it. (Who knows if it really makes sense?)

None of the “standards” in the picture today would be able to do this and perhaps real standards will come from the CMS world or elsewhere where standards are more critical. @Localization pointed out a good article on translation related standards at Sun. I think a strong and generic XML foundation (DITA compliant according to an IBM expert I talked to) will be at the heart of a “meaningful” standard. Ultan (aka @localization) has an interesting blog entry on DITA that warns about believing the (over) promises. I keep hearing that XLIFF and possibly OAXAL could lead us to the promised land but of course it requires investment. To work, any of these need commitment and collaboration from multiple parties and this is where the industry falls short. We need a discussion focused on the data and keeping it safe and clean, not the tools. Let them add value within the tool but they should always hand over a standard format so other apps can use it. Again, Ultan who knows much more about this issue than I do says: " We need to move from bringing data to people to bringing people to data. Forget XML as a transport. Use it as structure...:)"

Meanwhile, others are figuring out what XML based standards can do. XBRL is set to become the standard way of recording, storing and transmitting business financial information.  It is capable of use throughout the world, whatever the language of the country concerned, for a wide variety of business purposes.  It will deliver major cost savings and gains in efficiency, improving processes in companies, governments and other organizations.  Check out this link to see how powerful and revolutionary this already is and will continue to be.

As we move to more dynamic content and into intelligent data applications in the “semantic web” of the future, standards are really going to matter, as continuous data interchange between key applications from content creation to SMT and Websites will be necessary. I for one hope that the old vanguard (yes it starts with an S) does not lead us into yet another rabbit hole with no light in sight. You can vote by insisting on standards-based data preservation and making it a big deal for any product you buy and use. I hope you do.

I would love to hear from any others who have an opinion on this, as you may have gathered I am really fuzzy on how to proceed on the standards issue. (My instincts tell me the two that matter the most are generic and standard XML and XLIFF, but what do I know?). Please enlighten me (us).

Friday, May 14, 2010

The Importance of Information Quality & Standards

I spent a day at the Adobe headquarters in San Jose this week at aQuatic (acrolinx quality assurance tool users), held together with the Bay Area MT User Group (BAMTUG) conference, and thought I would share some of the highlights. The conference was focused on “Information Quality” (or IQ for short) and several users of the acrolinx quality assurance tool were there describing their experiences and best practices, including users from Adobe, Cisco, Autodesk, IBM, Symantec and John Deere.

It is clear that any effort towards improving content early in the content development process makes all kinds of subsequent processes like translation and MT much easier and is clearly worthwhile. IQ, as acrolinx calls it, is sometimes mixed up with “controlled language”, which I would characterize as a 1st generation approach to making content more easily leverageable for translation and other downstream processes. Controlled Language is a strategy that was most frequently used with RbMT systems but some of the basic principles in a less stilted form can be useful to SMT as well. If you are interested in seeing a more detailed discussion and many useful links on controlled language vs. simplified English vs. source cleanup, check out the MT group in the LinkedIn forum: Discussion on the use of Controlled Language in various kinds of MT approaches.

The presentations showed that documentation creation is now a much more dynamic process and also showed how the community and Web 2.0 concepts are affecting the content production process. I have provided links to what I felt were the best presentations below. Hopefully the links work even if you are not a BAMTUG member:

Get Ready for Socially-enabled Everything – Scott Abel, The Content Wrangler (starts at 5:10) about how social networks and Web 2.0 concepts are accelerating B2C change and how important community content and social networks have become (includes the overplayed and now somewhat tired video from Socialnomics). (It was played twice during the day!) This is useful for those who still think developing corporate websites is just an internal affair.

MT Best Practices: Pre-editing, Common MT Errors and Cures - Mike Dillinger, Ph.D., Translation Optimization. Interesting anecdotes about how content cleanup can have a huge impact on success with MT or usability for any general user. Lots of good advice in here (including me rudely interrupting on Microsoft KB satisfaction rating interpretation).

The two most interesting and instructive presentations (for me anyway) were:

acrolinx Roadmap and Future Directions - Andrew Bredenkamp, CEO, acrolinx. I was happy to see that an API is being developed that would allow automated cleanup on content prior to MT and that there is a growing awareness of how IQ technology can be used to make “community content” more useful. (Andrew and I have the same birthday so that could also have given him the edge)

Adobe Technical Communication Suite 2 - Mahesh Kumar Gupta, Product Manager, Adobe Systems. I was struck by how relatively open and portable and REALLY standards based this product was for content creation and organization. What a huge improvement over the sorry mess that we call standards in localization e.g. TMX, TTX, TBX etc… I loved the fact that I can edit a document in process with an application that did not create the original and send it on to others who can continue the editing in other preferred applications. I think this kind of flexibility is a huge deal and I will explain why in a future entry. Having a real standards foundation (generic XML, DITA) allows an agility and ability to respond with ease and effectiveness to all kinds of change and dynamically shifting processes and situations. This is in dramatic contrast to the lame standards that exist in the TM and TMS world. Standards allow you to use the best tools and change these tools without compromising or losing your data. Standards give you flexibility.

Anyway it was an interesting day and I learnt much. The presentations are worth a look.

Monday, May 10, 2010

The IBM/LIOX Partnership: A Review of the Implications

There has been a fair amount of interest in this announcement and I thought it was worth a comment. I have paraphrased some of the initial comments to provide some context for my own personal observations. 

Some initial discussion of this was started by Dave in the GTS blog.   There is speculative discussion on how Google might view this, what languages will be provided, and what kind of MT technology is being used in the discussion. There is a fairly active comment section on this blog as well. (Honestly, I cannot see why Google would care at all, if they even noticed it at all.)

TAUS announced that this was “an alliance to beat their competitors and improve top lines” and they point to the magical combination of SMT technology and huge TM repositories that this alliance creates. They also explain and warn that in the past we have seen that: “Whoever owned the translation memory software, ‘owned’ the customer. The unfortunate effect of this ‘service lock-in’ model was to block innovation and put the brakes on fair competition in our industry” and they ask that the TM data created through this alliance also be copied into the TDA pool so that “we all own them collectively.” (Umm, yeah right, says @RoarELion).

This was also enough to get Mark at SDL to write a blog entry, where he informs us that basically, SDL knows how to do MT correctly, and that SDL is a real technology company with an ‘open’ (sic) architecture. He then goes on to raise doubts about this new alliance being able to get to market in a timely fashion, raises questions about the quality of the IBM SMT and also informs us that neither IBM nor LIOX has any sales and development experience with this technology. He ends by saying they have “a lot of work ahead of them.” Clearly he is skeptical.

The translator community has not been especially impressed by Translator Workspace (TW) or this announcement, and the two are being blended together in that community. The general mood is suspicious and skeptical because of the somewhat coercive approach taken by Lionbridge. ProZ is filled with rants on this. Jost Zetzsche, in his review of translation technology, raises a question that he thinks LSPs who work with LIOX will have to ask: “They want me to give them a month-by-month rundown of how much I translate?" He thinks that this fact will cause some resistance. (Umm, yeah, prolly so.) Why is it that the largest companies in this industry continue to compete (or create conflict of interest issues) with their partners and customers?

CSA sees this as another customized MT solution where LIOX could leverage IBM's system integration expertise and they also see a potential “blue ocean” market-expansion possibility and note that the translation world needs to play in the Web 2.0 space. (Finally!) They also point to all the content that is not translated and once again allude to that wonderful $67.5B+ market that MT could create.

I found the comments from Pangeanic also quite interesting as they attempt to provide some analysis of who the winners and losers could be and provide a thought-provoking view that is quite different from mine, but still related. Interestingly, they also point to the overwhelming momentum that SMT is gathering: "Statistical MT always proves much more customizable or hybridable than other technologies".

Bill Sullivan of IBM provides some business use scenarios and calmness in his LISA interview and talks about creating a larger pie and bigger market and can’t understand what all the uproar is about.

Some of the key facts gathered from LIOX on Kevin Perry’s blog:
“With this agreement, Lionbridge will have a three year exclusive agreement that:
- gives LIOX the rights to license and sell IBM real-time technology
- a patent cross-licensing agreement
- A partnership that establishes Lionbridge as IBM’s preferred deployment partner for real-time translation technology and related professional services.”

This agreement focuses on the SMT technology, not the old WebSphere RbMT: “RTTS (Real-Time Translation Service) is not based on RbMT technology. It is based on SMT (Statistical Machine Translation) technology. IBM has deployed RTTS internally through “n.Fluent,” a project that made the RTTS technology generally available to IBM’s approximately 400,000 employees for chat, email, web page, crowd sourcing, eSupport, blogs, knowledge portals, and document translation. It has been in pilot for the last 4-5 years.”

My sense is that this is not really new in terms of MT technology. It is old technology that was neglected and ignored, trying to make it into the light. IBM has been doing MT for 35 years (35 with RbMT and 5+ with SMT) and has made pretty much NO IMPACT on the MT world. I challenge you to find anything about MT products on the IBM web site without using search. IBM has a great track record for basic R&D and filing patents, but also a somewhat failed track record for commercializing much of this. SMT is one failure; there are many others including IBM PC, OS/2, Lotus Office Software, Token-Ring Networking etc… They do not have a reputation for agility and market-driving, world changing innovation IMO. IBM SMT has even done quite well against Google at the NIST MT system comparisons, so we know that they do have respectable Arabic and Chinese systems. But in spite of having a huge sales/marketing team focused on the US Federal Government market, they have not made a dent in the Federal SMT market. So what gives? Do they not believe in their technology or do they just not know how to sell/market nascent technology, or is $20M or so just not worth the bother?

Lionbridge also does not have a successful track record with SaaS (Freeway). Much of the feedback on Freeway was lackluster and the adjective I heard most often about it was “klunky” but I have never played with it so I can’t really say. The general sentiment against Translator Workspace today is quite negative. Perhaps this is a vocal minority or perhaps this is just one more translation industry technology fiasco. Generally, initiatives that do not have some modicum of trust among key participants fail. Also, neither company is known as the most cost-efficient player in any market it services, so buyers need to be aware of high overhead engagement.

There is the promise, as some have commented in the blogs, that finally this will be a way to get MT really integrated into the Enterprise. (Doesn’t MSFT count?) IBM does have considerable experience with mission-critical technologies but translation is still far from being mission critical. So while there are many questions, I do see that this initiative does the following:
  • Shows that SMT has rising momentum
  • Makes it clearer that the translation market is bigger than SDL (software and documentation localization) and MT (really SMT) is a strategic driver in making that happen
  • Increases the market (and hopefully successes) for domain focused SMT engines. Also creates the possibility of SMT becoming a key differentiation tool for LSPs (once you know how to use it).
  • Increases the opportunity for companies like Asia Online who also provide an integrated “SMT + Human” technology platform for other technology oriented LSPs to emulate and reproduce the platform that LIOX has created here, and avoid being dependent on a competitor's technology. (We will be your IBM :-) )

But I do think there are other questions that have been raised that should also be considered. The translation industry has struggled on the margin of the corporate landscape (i.e. not mission critical or corporate power-structure related) as there has never been any widely recognized leadership in the industry. I have always felt that the professional translation industry is filled with CEOs but very few leaders.


The largest vendors in this industry create confusion and conflicts of interest by being both technology providers on the one hand and competing service providers on the other. This slows progress and creates much distrust. Many also have very self-serving agendas.

So some questions that are still being asked (especially by TAUS) include:
  • Do you think this is a new “vendor lock-in” scenario?
  • Will this concentrate power and data in the wrong place, in an unfair way as TAUS claims?
  • Can translators or other LSPs get any benefit from this arrangement by working with LIOX?
  • Would most service providers prefer to work with pure technology providers?
  • What would be the incentive for any LIOX competitor to work with this?
  • Would LIOX exert unfair long-term pressure on buyers who walk into the workspace today?
So is this another eMpTy promise? Success nowadays is not built with just technology and being a large company. Remember that Microsoft was tiny when they challenged IBM on PC operating systems. Many experts said that OS/2 was technically superior, but being a developer at the time I remember that IBM was really difficult to work with: filled with vendor lock-in, bullying tactics and huge support costs. Microsoft was easy to work with and relatively open. The developers of course chose the “technically inferior” Microsoft, and the rest is history. Betamax was the battle that Sony lost to an "inferior" VHS. Microsoft in turn lost the search market to Google, because they were preoccupied with success in OS and Office software, even though Bill Gates predicted the internet would be the future. Your past successes can create a worldview that can sometimes be the cause of your future failures.

To my eye, this announcement is completely missing the most critical ingredients for success: collaboration, commitment and engagement from key elements in the supply chain. Apparently LIOX is trying to recruit the crowd (Work at Home Moms) while they also claim that TW is a good deal for professional translators. How is this interesting or beneficial for professional translators and other LSPs? Why should they trust their future to a platform run by a competitor who could change the rules? Why would LIOX's largest customers want to do all their processing with IBM whom they probably compete with?

So while it is vaguely possible that this could be a turning point in the translation industry, I think it is too early to tell, and that buyers and partners need to be wary. The openness and the quality of this initiative have yet to be established. My sense is that the future is all about collaboration, innovation and end-user empowerment and engagement. This still looks very much like the old command and control model to me, with minimal win-win scenario possibilities for partners. Actually, LIOX takes the "You give, we take" to a new extreme. But I may be wrong, big companies do sometimes get it right, and finally the proof of the pudding is in the eating.

What do you think? (I wonder if I will be invited to the LIOX party in Berlin?)

Wednesday, May 5, 2010

If Content is Exploding, Why are Translation Prices Still Falling?

In my last entry I talked about three macro trends that are causing downward pressure on translation prices. And I got a fair amount of feedback, some agreeing and some challenging things I said. I also found out that I am being lumped together with others in the “Localization 2.0” camp. This confused me, since as a child I was denied the camp experience in apartheid based South Africa (no darkies allowed!), and CSA just released research that says that “Localization 2.0” is a failed experiment that was a non-starter and really, in the end it is all about translation. However several good questions were raised from this feedback that I would like to address here.

Am I “against” the TEP model?

I am not proposing tearing down the TEP model. I am suggesting that it has a place and is best suited for static content that has a relatively long shelf life. The TEP model made a lot of sense in the late 90’s and even into the early 2000s. But it is much less useful as a production model for the more dynamic content that is more common for global enterprises today. This "new" content is more voluminous and often has a half-life of 3-6 months or even less, but could be very valuable during this peak time. Much of this new content is user or community generated and strongly influences purchase behavior, so it is risky to ignore it. However, manuals and software still need to be localized and I do not advocate discarding this proven model for some kinds of content. I do expect it will need to evolve and connect to new technology and infrastructure.

If we are experiencing a content explosion and there is a shortage of human translators, how can prices be falling? This does not jibe with standard market behavior where increasing demand typically results in higher prices.

On the surface this does sound illogical and inconsistent with standard economic theory. Usually, more demand = higher prices. However, for the longest time, we have seen enormous amounts of content on the web unavailable to customers and communities around the world. And why? Because traditional translation processes have been largely unaffordable and too slow for many of these new applications. By this, I mean content and communications such as web pages, intranets, knowledge bases, social network feedback, product documents, IM’s, blogs and emails.  I believe that the demand has always been there, but unable to fulfill itself in the cost/time scenarios that traditional translation production models were based on. But now the urgency created by globalization and internet based commerce makes this a much more pressing issue.

So we see global enterprises asking: How do we reduce costs, maintain quality and increase the productivity and speed of the translation process?  And we see localization managers asking: Is this MT/crowdsourcing stuff really going to affect the area of focus for the “commercial grade localization” market?

The explosion of content has an impact in three ways that I can see on the commercial grade localization market.

Firstly, it raises questions of value at an executive management level. 

Most professional translation is done to facilitate and drive international revenue. If the product and sales line-managers receive feedback that end customers (who generate international revenue) do not care about the “traditional” content, then I am sure some of these managers, who are P&L driven, will seek to reduce this spend. This may reduce the need and the volume of traditional content or may cause price pressure, because it is felt that the value of the traditional content is low, and thus can be cut back without damage to revenue flows. Localization managers in global enterprises are generally not power players in the corporate hierarchy, and are often just told to do more with less. However, some of this content (GUI, Brand Marketing Content, Legal Terms) cannot be compromised, and it is likely that there will not be any price pressure, and there may even be rate increases, for this type of content. So corporate localization managers have to cut budgets on documentation and other relatively “low value” content.

Secondly, as managers become aware of the automation possibilities they begin to consider it for use in “high-volume” commercial grade work. I have been involved with several automotive OEMs who produce 2,000 and 3,000 page manuals for their dealer and service networks. Customized and tuned SMT makes great sense here, as there is a lot of data to train with, the content is repetitive and MT can deliver measured productivity benefits. The net gain from using MT: manuals done significantly faster and at a much lower ongoing cost, at the SAME QUALITY levels of the old TM-based process. While editors/translators are usually paid lower rates they have much higher throughput and most actually make more money under this scenario. Note that I said CUSTOMIZED MT, not Google. 

Thirdly, for those companies that have products with very short product life-cycles (usually in consumer electronics), custom MT based automation can greatly speed up the documentation production process and over time reduce the actual net costs for each new manual. Again post-editing ensures that HT quality levels are delivered. 

Custom MT is the equivalent of building a production assembly line – it can make one or a few products very efficiently and give you a translation factory that gets more and more profitable as the production volumes increase, since the marginal cost continues to fall. In every case I know of, if you want publishable quality you still have to put it through human post-editing. However, MT is a strategic long-term investment and production capacity asset in my opinion (though several RbMT experts claim it can be used on a project to project basis). Real leverage for commercial grade localization comes from domain focus and continuing enhancement of these systems. To produce HT quality, it will ALWAYS be necessary to have a post-editing step in the process. An informed SMT engine development process will deliver continuing improvements and starting point quality. This task of producing an ever-improving translation production line is relatively new and we are just beginning to understand how to do it well. Don’t believe the experts who tell you they have done this before. This is new and we are all still learning. Just as cars have gotten much more reliable as automation technology evolved, we will see that effective man-machine collaborations will produce both compelling quality and efficiency.
The most polished human translations (TM) will, I think, generally produce the best systems and the most skilled correction feedback will produce the biggest ongoing boost to the quality. If you want human quality you will need to have a modified form of TEP to make that happen, with more objectivity and more automation. I believe growing use of SMT in particular will create new eco-systems and tools that require strong human linguistic skills. I believe there will be a need for more skilled linguists in addition to translators and cheap editors who may not even be bilingual but have subject matter expertise.
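
A back-of-the-envelope sketch of the assembly-line comparison above, assuming a one-time engine setup cost plus a flat post-editing rate per word. All figures and names here (setup_cost, post_edit_rate, the volumes) are hypothetical and do not come from any real engagement. The model is deliberately simple: it shows the average cost per word falling as volume grows, which is only a rough stand-in for the falling marginal cost described above.

```python
def average_cost_per_word(setup_cost, post_edit_rate, words):
    """Average cost per word when a fixed setup cost is spread over volume.

    setup_cost     -- one-time cost of building the custom engine (hypothetical)
    post_edit_rate -- human post-editing cost per word (hypothetical)
    words          -- total words pushed through the engine
    """
    if words <= 0:
        raise ValueError("words must be positive")
    return setup_cost / words + post_edit_rate


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the shape of the curve.
    setup_cost = 20_000.0
    post_edit_rate = 0.05
    for words in (100_000, 1_000_000, 10_000_000):
        cost = average_cost_per_word(setup_cost, post_edit_rate, words)
        print(f"{words:>10,} words: {cost:.3f} per word")
```

The post-editing rate never goes away in this model, which matches the point that publishable quality always needs a human step; only the setup cost gets diluted by volume.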

Expanding use of MT and community collaboration does not mean that QA, process control and quality concerns go away. I actually think there is much more room to build differentiation as these skills are broader and deeper than you need with TM. I expect that both at the translator and the LSP level there will be more room for differentiation than there is today.

Google continues to improve their MT systems and so will Microsoft and maybe even IBM now  (finally!) with assistance from Lionbridge. These are all largely SMT technology initiatives which are all a little bit hybrid nowadays. I am still betting that domain focused SMT systems developed by agile and open minded LSPs and translators, working in collaboration with technology partners like Asia Online will easily outperform anything the big boys will produce. The key to excellence is still the quality of the collaboration and the dialog between these players, not just the technology.