
Monday, April 11, 2011

The Rush to Manage and Control Standards

There has been a lot of talk about standards since the demise of LISA, perhaps because the collapse of LISA was announced almost immediately after their final event, a “Standards Summit” in early March 2011. We are now seeing something of a rush, with industry groups staking out positions (perhaps even well-intentioned ones) to establish a controlling interest in “what happens next with standards”. There is still much less clarity on which standards we are talking about, and almost no clarity on why we should care or why it matters.

 

What are the standards that matter?

The post I wrote on the lack of standards in May 2010 is the single most influential (popular?) post I have written on this blog, according to PostRank. So what is all this new posturing on standards about? From my vantage point (as I stated last year), standards are important to enable information to flow from the information creators to the information consumers as efficiently as possible. Thus my view of standards is about those rules and structures that enable clean and efficient data interchange, archival, and reuse of linguistic assets in new language and text technology paradigms. Search, semantic search, SMT, language search (like Linguee) and text analytics are what I am thinking about. (You may recall that I had much more clarity on what standards matter and why than on how to get there.) Good standards require that vendors play well with each other, and that language industry tools interface usefully with corporate content management systems and make life easier for both the information creators and consumers, not just people involved in translation.

However, I have also seen that there is more conflation around standards than around almost any other issue (“quality“, of course, is the winner) amongst localization professionals. I am aware of at least three different perspectives on standards:

1. End-to-End Process Standards: ISO 9001, EN15038, Microsoft QA and LISA QA 3.1. These have a strong focus on administrative, documentation, review and revision processes, not just the quality assessment of the final translation.
2. Linguistic Quality of Translation (TQM): Automated metrics like BLEU, METEOR, TERp, F-Measure, ROUGE and several others that focus only on rapidly scoring MT output, and human measurements that look at linguistic quality through error categorization and subjective human quality assessment, usually at the sentence level. SAE J2450, the LISA Quality Metric and perhaps the Butler Hill TQ Metric are examples of the latter.
3. Linguistic Data Interchange: Standards that facilitate data exchange from content creation onward and enable transformation of textual data within a broader organizational data flow than just translation. Good interchange standards can ensure that fast-flowing streams of content get transformed more rapidly and reach customers as quickly as possible. XLIFF and TMX are examples of this, but I think the future is likely to be more about interfacing with “real” mission-critical systems (DBMS, collaboration and CMS) used by companies rather than just TMS and TM systems, which IMO are very likely to become less relevant and important to large-scale corporate translation initiatives.

It is my sense that we have seen a lot of development on the first kind of "standard", but we have a long way to go before we have meaningful standards in the second and third categories. (To make the third category concrete, a minimal sketch of reading a TMX file follows below.)
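Here is that sketch: a minimal illustration (my own, not anything from LISA or OASIS) of what consuming a TMX file looks like in practice. The element names (tu, tuv, seg) and the xml:lang attribute follow the published TMX structure; the file name and language pair are hypothetical.

```python
# Minimal sketch: pulling source/target segment pairs out of a TMX file.
# TMX is XML, so any standard XML parser will do; the element names
# (tu, tuv, seg) and the xml:lang attribute come from the TMX spec.
# "memory.tmx" is a hypothetical file name used for illustration.
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def read_tmx_pairs(path, src_lang="en-US", tgt_lang="de-DE"):
    """Yield (source, target) text pairs for the requested language pair."""
    tree = ET.parse(path)
    for tu in tree.getroot().iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang")  # older TMX versions used 'lang'
            seg = tuv.find("seg")
            if lang and seg is not None:
                segs[lang.lower()] = "".join(seg.itertext()).strip()
        if src_lang.lower() in segs and tgt_lang.lower() in segs:
            yield segs[src_lang.lower()], segs[tgt_lang.lower()]

if __name__ == "__main__":
    for src, tgt in read_tmx_pairs("memory.tmx"):
        print(f"{src}\t{tgt}")
```

The point is that when every tool writes these elements consistently, a dozen lines of standard XML handling are enough to move a TM between systems; the "weak standard" problem discussed below shows up precisely when tools deviate from this structure.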

So it is interesting to see the new TAUS and GALA initiatives to become standards leaders when you consider that LISA was actually not very effective in developing standards that really mattered. LISA was an organization that apparently involved buyers, LSPs and tools vendors, but it was unable to produce standards that really mattered to the industry, in spite of sincere efforts. TMX today is a weak standard at best, and there are many variations that result in data loss and leverage loss whenever data interchange is involved. (Are the other standards they produced even worth mentioning? Who uses them?) Are we going to see more of the same with these new initiatives? Take a look at the TAUS board and the GALA board, as these people will steer (and fund) the new initiatives. Pretty much all good folks, but do they really represent all the viewpoints necessary to develop standards that make sense to the whole emerging eco-system?

 

Why do standards matter?

Real standards make life easier for the whole eco-system, i.e. the content creators, the professional translation community, the content consumers and everybody else who interacts with, transforms or modifies valuable content along the way. Standards matter if you are setting up translation production lines and pushing translation volumes up. At AGIS2010, Mahesh Kulkarni made a comment about standards in localization: he called them traffic rules that ease both the user and creator experience (and of course these rules matter much more when there is a lot of traffic), and he also said that standards evolve, have to be tested and need frequent revision before they settle. It is interesting to me that the focus in the non-profit world is on studying successful standards development in other IT areas, in contrast to what we see at TAUS and GALA, where the modus operandi seems to be to create separate new groups with new missions and objectives, though both claim to act in the interest of “everyone”.

There was a great posting by Arle Lommel on the LISA site that is now gone on why standards matter, and there is also a perspective presented by Smith Yewell on the TAUS site on why we should care. I hope there will be more discussion on why standards matter as this may help drive meaningful action on what to do next, and produce more collaborative action.

So today we are at a point where we have TAUS saying that it is taking on the role of an "industry watchdog for interoperability" by funding activities that will track compliance of the various tools and appointing a person as a full-time standards monitor. Jost Zetzsche has pointed out that this is fabulous, but the TAUS initiative only really represents the viewpoint of “buyers”, i.e. localization managers (not the actual corporate managers who run international businesses). The REAL buyer (Global Customer Support, Global Sales & Marketing Management) probably cares less about TM leverage rates than about getting the right information to the global customer in a timely and cost-effective way on internet schedules, i.e. really fast, so that it has an impact on market share in the near term. Not to mention the fact that compliance and law enforcement can be tricky without a system of checks and balances, but it is good to see that the issue has been recognized and a discussion has begun. TAUS is attempting to soften the language it uses in defining its role, as watchdogs are often not very friendly.

GALA announced soon after that it would also start a standards initiative which will "seek input from localization buyers and suppliers, tool developers, and various partner localization and standards organizations." Arle Lommel, the former director of standards at LISA, will be appointed as the GALA standards guy. Their stated objective is: “The culmination of Phase I will be an industry standards plan that will lay out what standards should be pursued, how the standards will be developed in an open and unbiased way, and how the ongoing standards initiative can be funded by the industry.” Again, Jost points out (in his 188th Tool Kit Newsletter) that this will be a perspective dominated by translation service companies, and asks how the needs and views of individual translators will be incorporated into any new standards initiatives. He also appeals to translators to express their opinions on what matters to them and suggests that a body like FIT (the International Federation of Translators) perhaps also engage in this dialogue to represent the perspective of translators.

There are clearly some skeptics who see nothing of substance coming from these new initiatives. Ultan points out how standards tend to stray, how expensive this is for users, and also raises some key questions about where compliance might best belong. However, I think it is worth at least trying to see if there is some potential to channel this new energy into something that might be useful for the industry. I, too, see some things that need to be addressed to get forward momentum on standards initiatives, which I suspect get stalled because the objectives are not that clear. There are at least three things that need to be addressed.

1) Involve Content Creators – Most of the discussion has focused only on translation industry related players. Given the quality of the technology in the industry I think we really do need to get CMS, DBMS and Collaboration software/user perspectives on what really matters for textual data interchange if we actually are concerned with developing meaningful standards. We should find a way to get them involved especially for data interchange standards.
2) Produce Standards That Make Sense to Translators – The whole point of standards is to ease the data flow from creation to transformation to consumption. Translators spend an inappropriately huge amount of time on format-related issues rather than on translation and linguistic issue management. Standards should make it easier for translators to deal ONLY with translation-related problems and allow them to build linguistic assets that are independent of any single translation tool or product. A good standard should enhance translator productivity.
3) Having Multiple Organizations Focused On The Same Standards is Unlikely to Succeed – By definition, standards are most effective when there is only one. Most standards initiatives in the information technology arena involve a single body or entity that reflects the needs of many different kinds of users. It would probably be worth taking a close look at the history of some of these to understand how to do this better. The best standards initiatives have hard-core techies who understand how to translate clearly specified business requirements into a fairly robust technical specification that can evolve while leaving a stable core untouched.
 
One of the problems in establishing a constructive dialogue is that the needs and technical skills of the key stakeholders (content creators, buyers, LSPs, translators) differ greatly. A clearer understanding of this is perhaps a good place to start. If we can find common ground here, it is possible to build a kernel that matters and is valuable to everybody. I doubt that TAUS and GALA are open and transparent enough to really engage all the parties, but I hope that I am proven wrong. Perhaps the first step is to identify the different viewpoints and clearly identify their key needs before coming together and defining the standards. It is worth speaking up (as constructively as possible) whatever one may think of these initiatives. We all stand to gain if we get it right, but a functioning democracy also requires vigilance and participation, so get involved and let people know what you think.

Tuesday, November 9, 2010

The Machine Translation Community Building Bridges with Translators

The Association for Machine Translation in the Americas (AMTA) recently held its annual conference in close proximity to the ATA conference in an attempt to build bridges and foster a growing dialogue between these two communities. When I entered the world of MT (I have always preferred the term automated translation) I had the good fortune to work with Laurie Gerber at Language Weaver, who encouraged engagement with translators. While her voice was not heard there, she has always stayed true to this vision, and she was instrumental in influencing me to also reach out to the world of professional translators as a core business strategy. She has long been a clear voice encouraging the broad MT community to reach out to translators, and she was visible in Denver last week making sure that ATA guests were engaged and making all the right connections, or just having a good time.

It is clear to me that the path to better quality MT, one that really does fulfill the promise of sharing information and knowledge more freely in the world, can only come from a close, cooperative and collaborative relationship with professional translators.

The conference began with a keynote from Nicholas Hartmann, who is the current ATA President and a past technical marketing translator. He gave what I thought was an articulate, considered and clear perspective of the translator vis-à-vis translation technology and MT, while pointing to some directions for real collaboration in future. I thought it would be valuable to restate what I heard, as there were several key messages for the MT community. A published paper version of his speech is also available on the AMTA website (but it is really hard to get to these resources, as the unique URL is not easily displayed).
 
The ATA has 11,000 members, of whom 70% are freelancers, and Nick had carefully prepared to be their voice, expressing their concerns and needs in this forum. (Here is the Twitter stream.) He stated that the bad blood with translators was originally created by the historical overstatement of MT capabilities in the 60s, when MT was expected to replace translators: FAHQT, or fully automatic high-quality translation (have you noticed that this sounds a lot like f**&ked?). He noted that many translators do in fact use some form of translation technology today, even though they find the future vision of being post-editors at “burger flipping wages” abhorrent. He gave some examples of human translations that went beyond the literal, to show how only a human could make the non-literal interpretations needed to correctly translate some example phrases. The examples proved that even a “perfect” literal translation can be nonsense at times, and he asked whether the future of MT is as T.S. Eliot says:
“That is not what I meant at all. That is not it, at all”
Some additional points he made included:
  • Translators have a very different view of quality which is linked to their code of ethics to render the source material accurately
  • MT makes sense where something is better than nothing
  • MT is really only “a probability distribution over strings of letters and sounds” (especially SMT) (part of a quote from Martin Kay, in which he specifically cautioned the MT community NOT to consider language so simplistically)
  • Translators want it to be acknowledged that their work is critical to feeding and improving SMT systems with human-translated (HT) corpora
  • MT should be matched to task and purpose and could be unfortunate in the hands of the wrong people e.g. the infamous Welsh street sign
  • SMT is often built using flawed TM, so one can hardly be surprised at some of the results; in this case past performance will be a predictor of future performance
  • MT must be edited and checked to avoid serious errors
  • Post-editing has come to mean cleaning up really bad quality MT output at very low wages even when everybody doing it understands that they could have done it faster without the MT
  • The data pollution in some SMT systems perpetuates itself and is difficult, if not impossible, to remove
He then went on to answer the question: So what do translators want and need?
 
“We want to work together constructively. We want technology that we can use. Machine-assisted translation does make sense to us, but we do not want tools that make our jobs harder.” Translators want to have a hand in making the tools but want the dialogue to be realistic. They also do not want a role of PEMT drudgery, and they ask for technology that helps translators be more productive. He ended his talk saying that he had enjoyed meeting many in the MT community and that he hoped the dialogue would continue as, “We are all in the same business.”

I did not stay long enough to really get the reaction of the MT crowd, as I rushed off to tekom in Germany that same evening. Of course there was one somewhat hostile question immediately, but I think many MT users want to know how to work with translators, and as you will see from my previous blog entries, I am a big believer in this rapprochement.

Nick was followed by Jost Zetzsche, who continued the theme of building bridges and improved communication. He pointed out how translators have a self-perception of being bridge builders, language lovers, artists and cultural intermediaries, in contrast to the techie, computer-science self-image that many MT practitioners have; clearly cultural and communication problems can arise from this. He was self-critical and admitted that translators need to learn more about MT technology and not resist it like they did with TM, but he also pointed out some foolish statements made by MT proponents that any average translator would see as unfortunate or stupid. One example: Jaap van der Meer’s statement about letting a thousand MT systems bloom. Some of you may realize that this is really close to something Mao Zedong said to flush out dissidents and eventually execute them. The other screamer he listed (without reference to the source, other than it being a major MT vendor executive) was, “It’s quite a magical technology when you see it (MT) work” by Mark Tapling of SDL (he says this with a smile at the end of the video clip). (Dude, it’s a data transformation!!! Really?!? I wonder if Tapling thinks that spreadsheets adding numbers up and PowerPoint slide transitions are also magical? Perhaps from a 19th-century frame of mind this is all pretty magical.) Jost contrasted this with a Twitter conversation he had with @kvashee ;-) about features that would make MT more useful to translators. (I assure you I did not instigate this comparison.)

Some of the things he asked for: give us (translators) challenging tasks; we want to participate in “making it better”. He stated that he wanted to see that his corrections had immediate and direct impact on the system. (One of the biggest complaints that translators have about MT is that the systems make the same error over and over again.) He asked that MT vendors talk to translators in “real” language and “admit what your tools can and cannot do”. He ended on a positive note by saying that we as humans tend to demonize the unknown (HIC SVNT DRACONES!) and invited the audience to enter into each other's terra incognita and put our myths behind us.

While this event was a good and constructive start, I hope that the dialogue between translators and MT developers continues beyond this conference and produces real innovation and collaboration. One of the first subjects that needs elucidation and better definition is “post-editing”. It was clear from several very instructive presentations in the “Commercial User Track” that the concept of post-editing needs development and clarification, and we now see many successful MT implementations being discussed on a regular basis. Check out http://www.wallofsilver.net/ and type #AMTA2010 to see a cool Twitter summary of the conference. Chris Wendt of Microsoft and I were also voted onto the AMTA board, representing commercial users, and I thank all those who voted for me and hope to help drive our common interests and agenda.

There was an interesting demo of several post-editing tools that are currently available in the market. Lingotek showed their translator workbench; PAHO showed their Word-macro-based post-editor, which is perhaps the longest-running and most widely used direct post-editing tool around; and GTS showed a promising-looking community management and basic editing environment for Wordpress blogs. These examples all suggest that the tools that make post-editing more interesting are going to continue to evolve, and that the best is yet to come. We also see that community and collaboration are increasingly intertwined with post-editing, and this connection to MT is likely to develop further as new kinds of people are drawn into the translation process. AMTA will make much of the content from this conference available on their website, and I am sure many will find it useful if they can actually find it. (They need a serious update to their website.)

Unfortunately, I had to rush off to the tekom conference in Germany at the end of the first day and missed the rest of the conference, but I kept in touch via the GTS blog, since the Twitter stream died pretty quickly after I left. I noticed Jost mentioned in his blog that Laurie had accomplished what she set out to do years ago, i.e. bring MT developers and translators closer together. However, while Laurie may be happy at this initial accomplishment, I would suggest that she stay around a while longer and make sure that we all set sail together with the wind at our backs. The journey together has just begun and we have many miles to go before we sleep.

And this was tekom – a blur of meetings and some wonderful dinner conversations. 





Friday, October 8, 2010

Highlights from the TAUS User Conference

Earlier this week, 100+ people gathered in Portland, Oregon at the TAUS annual user conference. This included a large contingent of translation buyers, mostly from the IT space (Intel, Oracle, EMC, Sun, Adobe, Cisco, Symantec, Sony), a few LSPs with real MT experience, and people from several MT and translation automation technology providers, all there to share information about what was working and what was not. Like any conference, some presentations were much more compelling and interesting than others, and I wanted to share some of the things that stood out for me. It was not always easy to tweet, since the connectivity faded in and out, but a few of us did try to get some coverage out. CSA also has a nice blog entry on the event that focuses on the high-level themes. I think that some of this will be made available as streaming video eventually.

If you are interested in the business use of machine translation, this was a useful event: it offered many examples of successful use cases as well as technology presentations from MT vendors in some depth, perhaps too much depth for some. There was a full day filled with MT-related presentations from users, MT tool developers and LSPs using MT. Some of the highlights:

Jaap stated that translation has become much more strategic and that global enterprises will need language strategies. He also said that he felt there would not be any substantial breakthroughs in MT technology research in the foreseeable future. I actually agree with him on this, in the sense that the rate of improvement from pure MT technology research alone will decline, but I believe that we are only at the beginning of the improvements possible around better man-machine collaboration. It is my opinion that many of the systems presented at the event are far from the best possible with the technology and people available today. Another way to say this is that I think the improvements in free online MT will slow down, but the systems coming from professional collaborations like the ones Asia Online has with LSP partners will rise rapidly in quality and show clearly that data, computers and algorithms alone are not enough. I predict that collaboration with skilled, focused and informed professional linguistic feedback (LSPs) will drive improvements in MT quality faster than anything done in MT research labs. The two most interesting pure MT technology presentations included an overview of morpho-syntactic SMT (a hybrid with 500M rules! from Kevin Knight at USC/ISI) and an overview of a “deep” hybrid RbMT approach from ProMT, where statistical selection is introduced all through the various stages of an enhanced RbMT transfer process. This is in contrast to the “shallow” hybrid model used by Systran, which uses SMT concepts as a post-process to improve output fluency. All the RbMT vendors/users stressed the value of upfront terminology work to get better quality. Those of you who heard the presentation that Rustin Gibbs (who did most of it, and was also a star performer at the conference) and I gave will know that I am much more bullish on what a "vendor-LSP collaboration" could accomplish than on any technology-only approach.

 

It was good to see (finally!) that many, including Jaap, admit that data quality and cleanliness do matter in addition to data volume.

 

Several people came up to me and mentioned that they had experienced that sometimes less is more when dealing with the TDA data. TM cleaning, corpus linguistics analysis to better understand the data, and assessing TM quality at least at a supplier level are now recognized as real issues and are getting more attention within TAUS. There were several presentations that mentioned data quality, “harmonizing terminology”, TM scrubbing and strategies to reduce the risk of data pollution when large amounts of TM are gathered together. The TDA database now has 2.7 billion words, and TAUS admitted that it has become more difficult to search and use; they are therefore integrating the data into a GlobalSight-based repository to make it more usable. They are hoping to add more open source components to further add value to the data and its ongoing use. Their ability to further refine this data and really standardize and normalize it in useful ways will, in my opinion, define the TDA’s success as a data resource in future.
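As an illustration of what this kind of TM scrubbing can mean in practice, here is a minimal sketch of some common sanity filters (duplicate removal, empty segments, a length-ratio check for likely misalignments). The thresholds are illustrative guesses on my part, not anything TAUS or the TDA prescribes.

```python
# Minimal sketch of the kind of TM scrubbing discussed above: drop empty or
# duplicated segment pairs, and pairs whose length ratio suggests misalignment.
# The thresholds are illustrative, not prescribed by any standard or by TAUS/TDA.

def clean_tm(pairs, max_len_ratio=3.0, min_chars=1):
    """pairs: iterable of (source, target) strings; returns a filtered list."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if len(src) < min_chars or len(tgt) < min_chars:
            continue                      # empty or near-empty segment
        if (src, tgt) in seen:
            continue                      # exact duplicate pair
        ratio = max(len(src), len(tgt)) / max(min(len(src), len(tgt)), 1)
        if ratio > max_len_ratio:
            continue                      # lengths too different: likely misaligned
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept

# Example use: feed it (source, target) pairs read out of a TMX export.
# cleaned = clean_tm(pairs)
```

Real corpus preparation goes much further (deduplication across suppliers, terminology harmonization, language identification), but even simple filters like these remove a surprising amount of the pollution mentioned above.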

There was an effective and very interesting point-counterpoint session (unlike the ones I have seen at GALA) between Keith Mills of SDL and Smith Yewell of Welocalize that positioned the “practical” legacy system view against the open systems view of the world. It really did justice to both viewpoints in a common framework and thus provided insight to the careful observer. It was interesting to see that SDL used the word “practical” as many as 30 times while presenting its view of the world. In brief, SDL claimed that it will spend $15M in 2010 on R&D to create a platform to link content creation more closely to language. Keith said that SDL does not make money on TMS because there is too little leverage due to too many one-off translation processes. He also said that SDL will not go open source but is “really into standards” and will create APIs to “let” customers integrate with other software infrastructure.

Smith presented his view of a world, in contrast to the “walled garden” of SDL, that is interoperable, “open”, collaborative and involves multiple vendors. Keith responded that they will “eventually” connect to other products and “are working on it,” while Smith countered that closed black boxes make it difficult to scale up and meet new customer requirements of translating fast-flowing content from disparate sources. It was interesting to see Jaap characterize the debate by saying that perhaps Smith was “too visionary” (a nice way of saying impractical?) and that the SDL perspective is “realistic and practical”. I have captured the flow of the debate in my Twitter stream. Keith also made good points about how MT must learn to deal with formatted data flows to be really usable, but he seemed to completely miss the growing urgency for language data to move in and out of SDL software systems. Smith also pointed out that MT is not revolutionary; rather, it is just another tool that needs to be integrated into the right business processes to add value. I liked the debate because it presented the two viewpoints in an authentic way and let you see the strengths and weaknesses of both perspectives. I, of course, have an open systems and collaboration bias, but this session gave me some perspective on the value of the walled garden view as well.

I had some back-channel Twitter chatter with @paulfilkin and @ajt_sdl about what meaningful openness means. In my view, SDL does NOT have it. My advice to SDL is to share API information the way TAUS does: the TAUS API is published for self-service use so that any customer/member can connect to get data in and out of the TDA repository efficiently. Facilitate the work to let the data flow, and give customers enough information (API, SDK) so that they can do this themselves without SDL permission and move language data easily to wherever it is needed, e.g. TMS to TMS, TMS to TM, TMS to CMS, TMS to MT, TMS to Web. They should provide basic information access for free, and customers should only be required to pay if they need engineering support. SDL needs to understand that language data can be useful outside of SDL tools and that they can make this easier by delivering real openness to both their customers and the overall market. I have written about this previously. My advice and warning to customers: stay away from SDL until they do this, or you will find yourself constantly wounded and tending to “integration friction”.
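To make "real openness" a bit more tangible, here is a purely hypothetical sketch of what self-service access to language data could look like: push a TMX file into a repository and pull it back out over a documented REST interface with nothing more than an API key. The host, endpoints and parameters are invented for illustration; this is not the TAUS, TDA or SDL API.

```python
# Purely hypothetical sketch of self-service language-data access over REST.
# The host, endpoint names and parameters are invented for illustration only;
# they do not describe the TAUS, TDA or SDL APIs.
import requests

BASE = "https://tm-repository.example.com/api/v1"   # hypothetical host
API_KEY = "your-api-key"                             # issued on self-service signup

def upload_tmx(path, domain="support"):
    """Push a TMX file into the repository under a domain label."""
    with open(path, "rb") as fh:
        resp = requests.post(
            f"{BASE}/tm",
            headers={"Authorization": f"Bearer {API_KEY}"},
            params={"domain": domain},
            files={"file": fh},
        )
    resp.raise_for_status()
    return resp.json()          # e.g. an id for the stored TM

def download_tmx(src="en-US", tgt="de-DE", domain="support"):
    """Pull matching translation memory back out as TMX."""
    resp = requests.get(
        f"{BASE}/tm",
        headers={"Authorization": f"Bearer {API_KEY}"},
        params={"source": src, "target": tgt, "domain": domain, "format": "tmx"},
    )
    resp.raise_for_status()
    return resp.content
```

The point of the sketch is the shape of the openness, not the specific calls: a documented interface, standard formats in and out, and no gatekeeping engineering engagement required just to move your own data.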

One thing that I found really painful about the conference was a horrendously detailed, continuing presentation (it refused to end) on the history of MT. I am not sure that there is anything to be learnt from such minutiae. For me this was clearly a case where history was horrifically boring and did not teach any real lessons. It made me want to poke my eyes and see how much pain I could tolerate before crying out. I hope they never do it again and that the presentation and recordings are destroyed.

I also feel that the conference was way too focused on “internal” content, that is, content created by and within corporations. It is no surprise that SDL (which originally stood for Software and Documentation Localization) is so committed to the walled garden, since in the legacy world command and control has always been the culture, even within most global companies. In an age where social networks and customers sharing information with each other increasingly drive purchase behavior, this is a path to irrelevance or obscurity. I am not a big believer in any Global Content Value Chain (GCVC) that does not have a strong link to high-value community, partner and customer-created content. The future is about BOTH internal and external content. I think the TAUS community would be wise to wake up to this, to stay relevant and attract new and higher levels of corporate sponsorship. We should not forget that the larger goal of localization and translation efforts is to build strong relationships with global customers. Conversations in social networks are how these relationships with brands, and customer loyalty, are being formed today. Localization will need to learn to connect into these customer conversations and add value on behalf of the company in these living, ever-present conversations happening mostly outside the corporation's content control initiatives. Richard Margetic from Dell said it quite clearly at Localization World Seattle: “Corporations will have to take their heads out of the sand and listen to their customers; we believe that the engagement of customers on Twitter is critical to success (and sales).” He also said, “We had to teach our corporate culture that negative comments have value. It teaches you to improve your products,” and I hope that the TAUS board will take my comments in that spirit.

There was surprisingly little discussion about data sharing; the focus had moved to much more pragmatic issues like data quality, standards, categorization and meaningful identification and extraction from the big mother database, but there were few details on this other than some basic demos of how GlobalSight and the search engine worked. If you have ever seen a database search and lookup demo, you will know that it is seriously underwhelming. The TDA is great if you are looking for Europarl or IT data but is pretty thin if you want data in other domains. A lot of this data is available elsewhere with no strings attached, so the TDA needs to give people a reason to come to their site for it. The value of the TDA in future, IMO, is going to depend on how they add value to the data, normalize it, clean it and categorize it. This to my mind is the real value creation opportunity. They also need to find ways to attract more people who are not from the localization world but have an interest in large-scale translation. Additionally, I believe they should expand the supervisory board beyond IT company people to increase the possibility of being a real force of change. When everybody has the same background, groupthink is inevitable. (Aren't they forced to watch those workforce diversity presentations that I was forced to when I was at EMC?) I suspect that open innovation / collaboration models will be more likely to come up with ways to add value to data resources and new ways to share data, and hopefully the TDA finds a way to engage with people like Meedan, TRF, Yeeyan and other information-poverty-focused initiatives. While there was a focus on innovation and collaboration, I got the feeling that it was too focused on getting more open source tools and nothing else. I think open innovation needs more diversity in opinion and ideas than we had in the room. What is open innovation? Henry Chesbrough: “Open innovation is a paradigm that assumes that firms can and should use external ideas as well as internal ideas, and internal and external paths to market, as the firms look to advance their technology.”

Here is a presentation on Data Is the New Oil. I think the point they make is very useful to TDA: Refine, Refine, Refine to create value. Value is getting the right data to the right people at the right time in the right format. I think it is worth finding out what that actually means in terms of deliverables. Make it easy to slice and dice and package and wrap. 

Some of you who know me, know that I am a Frank Zappa fan and Frank (well ahead of his time as usual) said it well in 1979 on the Joe’s Garage album: (I would add “Data is not information” for this blog and recommend you look at more Zappa quotes)

"Information is not knowledge.Knowledge is not wisdom.

Wisdom is not truth.Truth is not beauty.

Beauty is not love, Love is not music.

Music is THE BEST" 


Finally, one thing about conferences is that there are always a few conversations that stand out. While there were many professional conversations of substance, for me, the ones that stand out the most are conversations that come from the heart. I was fortunate to have four such conversations at this event. One with Elia Yuste of Pangeanic about building trust, another with Alon Lavie of AMTA about where innovation and the real exciting MT opportunities will come from, a third with Smith of Welocalize about building openness with substance, integrity & honor and finally a conversation with Jessica Roland about life, finding purpose and family. I thank them all for bringing out the best in me.


As I prepare for my ELIA keynote, I am compelled to share one more quote that I think is worth pondering:


In revolution, the best of the new is incompatible with the best of the old. It’s about doing things a whole new way… Clay Shirky

Thursday, September 16, 2010

Asia Online In The Conference Season

Asia Online principals will be very active in the conference season this autumn as we go and share our vision about best practices in using machine translation and our ideas on the continuing evolution of this technology. Please don’t hesitate to introduce yourself if we don’t recognize you. 
Dion Wiggins will be speaking at the LRC conference in Limerick, Ireland in a special free workshop that will be held just before the actual conference. This workshop will explore a number of elements of machine translation. Dion is a great speaker and I am sure this will be an entertaining and thought-provoking three hours. He will also be on a panel during the conference.
Section 1: The future of Machine Translation – What MT means to Enterprise and Language Service Providers
This session will explore key trends in the MT industry and address many misconceptions about machine translation. We will explore a variety of MT concepts and technologies and provide a core background on MT: past, present and future. We will investigate a variety of attitudes towards MT and concepts relating to translation overall that often get blurred when it comes to MT. We will look at a variety of models for LSPs and enterprises to use MT, and we will also explore models for how LSPs and enterprises can monetize MT and integrate it into their business. We will explore a mystical word called quality and examine what it is and what it means to various organizations in a variety of situations.
Section 2: Asia Online Language Studio™ Translation Platform.
This session will expand on the first, with a live demo of Asia Online Language Studio™ Pro desktop tools and Language Studio™ Enterprise translation platform. We will explore best practices in customizing a translation engine and go through the key steps one at a time that Asia Online performs in order to deliver a high quality translation engine. We will look at the creation of training data, aligning text from multiple documents in different languages, cleaning the data to ensure only high quality data is used to build the engine. Finally we will look at how to improve an engine’s quality – a process that is unique to Asia Online and shows the true impact of clean data and how the normal process of editing can rapidly improve the overall quality.
Attendance is free but people must register by emailing lrc@ul.ie
Kirti Vashee (that would be me) will be doing a detailed session on How to Get Started with MT at the ELIA Networking Days conference. I will also be presenting a keynote on the growing impact of MT on the professional translation world and what this might mean. There is also other great content on MT at this same conference presented by others. And thanks to @ParaicS I will also spend a day with CNGL researchers to share and exchange ideas on translation technology and improving collaboration in localization. Topics to be covered in the detailed session include:
  • MT Technology Overview – RbMT, SMT and Hybrids
  • Detailed SMT technology overview
  • Skills required to succeed with MT
  • Rapid Quality Assessment Of MT Output
  • Post-editing Practices & Pricing Approaches
  • New Revenue Opportunities Created by MT
  • Getting Started – Key Steps & Considerations
Additionally, we are also doing two presentations together with Moravia: one at the TAUS Annual User Conference, on “Machine Translation in the Imperfect World,” where we discuss how SMT engines can be developed when bilingual (TM) resources are scarce. We will be doing an expanded version of this session at the AMTA conference, where we discuss how strategies differ for different data availability scenarios. We are going to skip Localization World as there is very little focus on MT there.
In mid-to-late October we will also participate in a road show on the East Coast (Boston, NYC, DC) with our partners Milengo, Acrolinx, Clay Tablet and Lingoport, where we will describe end-to-end solutions for global enterprises in a “high personal interaction” seminar setting. Watch for announcements coming soon.
In early November we will also participate at tekom tcworld conference where I will speak together with Across Systems on the integration of MT and crowdsourcing into Enterprise translation processes. I will also present how MT and post-editing can be used to leverage technical customer support efforts and make much more information available in self-service environments to increase customer satisfaction and build customer loyalty.
Finally, I will also be involved in a GALA webinar on December 16th, which will be an abbreviated and probably updated version of the Dublin presentations. The links and info will be up as soon as I send in a description to GALA. (Sorry Amy.) And I have just realized that I will also need to find a way to participate in a ProZ virtual conference on October 13th while I am in Dublin. Maybe I could get a bunch of the CNGL people also involved with this, if the time zone differences permit it?

Wednesday, August 25, 2010

Translation Technology & Innovation: Where Can You Learn More?

I was recently trapped in the LA County Criminal Court juror holding room for a day, waiting to do my civic duty, albeit reluctantly. As I was waiting I had a wonderful Twitter storm chat with @rinaneeman and @renatobeninatto and others about innovation in general and about innovative LSPs in particular. It was pretty intense and we covered a lot of ground given the format. We talked about the change in the overall business model (translation-as-a-utility), automation, new and more efficient ways of doing translation work, and much more. This is the best I can do to recapture it (click on show conversation), and I am not sure how to really see the thread in a nice chronological stream with all the people involved. If you know how to do this, please let me know. However, one of the questions that came up in this discussion was where one could learn about the new technologies and processes (MT being just one of them) that facilitate innovation and allow one to address new translation problems.


I had believed that there is very little formal training around, and then Renato reminded us that regional associations play an important role in providing training. The next ELIA conference in Dublin in particular has a very strong focus on innovation and translation automation technology in addition to the traditional localization themes. I have found these smaller regional shows to be more effective in providing useful training, as they allow a much deeper dive into the reasons why something makes sense. The ELIA event has singled out MT and affiliated technologies as worthy of serious attention, in direct response to member requests. I think this is wonderful, and not only because I have a prominent role at this event: I will be presenting a keynote on the broad changes impacting the overall world of translation as well as doing a detailed training session on how to get started with MT technology for those who really want to get down and dirty. It is also a sign that this technology can take the next step, with technology developers and translation practitioners working together. I am a big believer in dialog, and this event is, I think, an example of an honest attempt to build this dialog.


In the keynote session I will look at how 2 billion+ internet users, community and crowdsourcing initiatives, translation technology, ever improving free MT, new attitudes to open collaboration and data sharing are impacting the professional translation world. I will explore how the shift to the project-less, translation-as-utility world will require new skills and new services from language service providers, explore and comment on emerging innovation and also point to the ever increasing market potential that becomes available to industry innovators who have competence with and understand the new dynamics.


I will also run a training session that will go over MT technology in some detail, provide basic background on the technology fundamentals and point to what I think are the keys to being successful with MT. I will try to make this as practical and useful as possible, answering questions about RbMT vs. SMT, MT engine customization strategies, MT quality assessment and its relationship to post-editing effort, understanding data, skills required for different tasks, etc. I believe that innovative LSPs will be the driving force behind creating really amazing MT systems in future, and I will focus on the skills that I think will be most critical to enabling this kind of success. I will also explore new business opportunities that MT can enable to get you out of the software and documentation localization market. Hopefully this session will be highly interactive, and I am open to communication about what participants might most want to focus on and understand. The session is on Monday, October 11th, so please feel free to communicate with me on this before then.
Translation Production Line
As we set up translation production lines to handle 10X or 100X more content in the future, we will need to link key processes together. Information-quality-focused processes and integrated, efficient post-editing will also be necessary to build efficiency. MT alone is not enough to solve the problems we face in the future, and I think it will also be critical to learn how to clean up and “improve” source content before any kind of translation attempt. Frans Wijma will provide guidance on Simplified Technical English, which will give attendees some insight into the information quality (IQ), controlled language and source simplification issues. This is something that will be increasingly valuable to learn and do in future.


Those who stay in the MT track will also get to hear Sharon O’Brien talking about post-editing MT. She will answer all the following questions: How does post-editing of Machine Translation output differ from revision or QA activities in the localization domain? Are translators the best post-editors? Do they need specific experience and training? What guidelines should be given to post-editors? What productivity enhancements can be reasonably expected? Why do translators seem to dislike this task? I saw her speak at LRC (one of the best conferences I attended last year) and she has great insight and advice to offer on this subject.


And if that weren't enough to make you sign right up, there are also some great sessions on sales strategies for LSPs from none other than Renato, localization basics, and next-generation localization research from CSA, CNGL and the Gilbane Group. And all at a fraction of the cost of the larger conferences. Check out the ELIA site for more details.


I hope to see you there and for those of you who don’t know, I am easily persuaded onto the karaoke floor. No alcohol required but unfortunately this is not because I necessarily sing so well. I went to a Jesuit (Boys only) School in India and had a teacher of mixed Indian/Portuguese (Goanese) extraction who used to exhort:

“Sing with gusto boys! Don’t worry about the notes, you will find them.” 


This is advice I have taken to heart, as my karaoke friends from the IMTT Cordoba 2009 event will also tell you. In spite of having nothing more than a laptop with tiny speakers to provide musical backing, we sang with gusto till dawn and indeed we did eventually find the notes. ;-)

Wednesday, August 18, 2010

The Problem with Standards in the Localization Industry

In my continuing conversation with Renato Beninatto we recently talked about standards:


It is clear from this conversation that the word “standards” is a source of great confusion in the professional translation world. Part of the problem is conflation, and part of the problem is the lack of a clear definition of what is meant by standards, especially as they relate to quality. In the conversation, we both appear to agree that data interchange is becoming much more critical and that it would be valuable to the industry to have robust data interchange standards; however, we both feel that overall process standards like EN15038 have very little value in practical terms.


This discussion on quality standards in particular is often difficult because of conflation, i.e. very different concepts being equated and assumed to be the same. I think we have at least three different concepts that are being referenced and confused as being the same concept in many discussions on “quality”:
  1. End-to-End Process Standards: ISO 9001, EN15038, Microsoft QA and LISA QA 3.1. These have a strong focus on administrative, documentation, review and revision processes, not just the linguistic quality assessment of the final translation.
  2. Automated SMT System Output Translation Quality Metrics (TQM): BLEU, METEOR, TERp, F-Measure, ROUGE and several others that focus only on rapidly scoring MT output by assessing precision and recall against one or more human translations of the exact same source material. (Useful for MT system developers but not much else; a minimal example of computing such a score is sketched below.)
  3. Human Evaluation of Translation Linguistic Quality: Error categorization and subjective human quality assessment, usually at the sentence level. SAE J2450, the LISA Quality Metric and perhaps the Butler Hill TQ Metric (which Microsoft uses extensively and TAUS advocates) are examples of this. (These can vary greatly depending on the humans involved.)
To this hot mess you could also add the “container” standards discussions, to further obfuscate matters. These include TMX, TBX, SRX, GMX-V, xml:tm, etc. Are any of these really standards, even by the much looser definition of “standard” used in the software world? If you look at the discussions on quality and standards in translation around the web, you can see that a real dialog is difficult and clarity on this issue is virtually impossible.
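For readers who have not seen the second category in action, here is a minimal sketch of how an automated reference-based score is computed, using the open-source sacrebleu package as one common BLEU implementation (the example sentences are invented). It only shows the mechanics; it says nothing about whether the score correlates with what a human reviewer would call quality.

```python
# Minimal sketch of an automated reference-based metric (category 2 above):
# score MT output against human reference translations of the same source.
# Uses the open-source sacrebleu package (pip install sacrebleu);
# the example sentences are invented for illustration.
import sacrebleu

mt_output = [
    "The cat sits on the mat .",
    "Press the power button to start the device .",
]
references = [[
    "The cat is sitting on the mat .",
    "Press the power button to switch the device on .",
]]  # one reference per segment; additional reference lists can be added

bleu = sacrebleu.corpus_bleu(mt_output, references)
print(f"BLEU: {bleu.score:.1f}")   # n-gram precision combined with a brevity penalty
```

Note that the score is only meaningful relative to the same test set and references, which is exactly why these metrics serve MT developers well but tell a buyer little on their own.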


But standards are needed to scale and handle the volume of translation that will likely be done, and to enable greater inter-process automation as we head into a world where we continuously translate dynamic streams of content. Free online MT services have given the global enterprise a taste of what translation as a utility looks like. Now some want to see if it can be done better and in a more focused way, at higher quality levels, to enhance global business initiatives and expand the dialog with the global customer. (I think it can be done much better with customized, purpose-driven MT working with and steered by skilled language professionals.) Translation as a utility is a concept that describes an always-on, on-demand, streaming translation service that can translate high-value streams of content at defined quality levels for reasonable rates. Data will flow in and out of authoring, content management, social networks, translation workflow, MT and TM systems.


As this new mode of production gains momentum, I believe it would be useful to the industry in general to have a meaningful and widely used measure of relative translation quality, i.e. the average linguistic quality of a target corpus. This would facilitate the production processes for 10X and 100X increases in content volume, and allow LSPs to define and deliver different levels of quality using different production models. I am convinced that the best translation production systems will be man-machine collaborations, as we already know what free online raw MT looks like (useful sometimes for getting the gist of a text, but rarely useful for enterprise use). Skilled humans who understand translation automation tools and know how to drive and steer linguistic quality in these new translation production models can dramatically change this reality.


It would also be useful to have robust data interchange standards. I recently wrote an entry about the lack of robust data interchange standards that seemed to resonate. We are seeing that content on the internet is evolving from an HTML to an XML perspective, which makes it easier for content to flow in and out of key business processes. Some suggest that soon all the data will live in the cloud and applications will decline in importance, letting translators zero in on what they do best and only what they do best: translate. Today, too much of their time is spent on making the data usable.


There are some data standard initiatives that could build momentum, e.g. XLIFF 2.0, but these initiatives will need more volunteer involvement (as “Anonymous” reminded me, that includes people like me actually walking the walk and not just talking about it) and broad community support and engagement. The problem is that there is no one place to go for standards. LISA? OASIS? W3C? ISO TC37? How do we get these separate efforts to collaborate and produce single, unified specifications that have authority and MUST be adhered to? There are others who have lost faith in the industry associations and expect that the standards will most likely come from outside the industry, perhaps inadvertently from players like Google and Facebook who implement an open XML-based data interchange format. Or possibly this could come from one of the open translation initiatives that seem to be growing in strength across the globe.


There are at least two standards (that are well defined and used by many) that I think would really be helpful to make translation as a utility happen:
  1. A linguistic quality rating that is at least somewhat objective, can be easily reproduced/replicated, and can be used to indicate the relative linguistic quality of both human translations and the output of various MT systems. This would be especially useful to LSPs in understanding post-editing cost structures and would help establish more effective pricing models for this kind of work that are fair to both customers and translators. (A simple edit-distance-based sketch of one building block for such a measure follows below.)
  
  2. A robust, flexible yet simple data interchange standard that protects linguistic assets (TM, terminology, glossaries) but can also easily be exported to affiliated processes (CMS, DMS, Web content).
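One candidate building block for such a rating, on the post-editing side at least, is the normalized word-level edit distance between raw MT output and its post-edited version, which is the idea behind TER-style measures. Here is a minimal, self-contained sketch; the example strings are invented, and real schemes would add tokenization, casing rules and shift operations.

```python
# Minimal sketch of a reproducible "post-editing distance": the word-level
# Levenshtein distance between raw MT output and its post-edited version,
# normalized by the length of the post-edited text (the idea behind
# TER-style measures, minus shift operations). The example strings are invented.

def edit_distance(a, b):
    """Word-level Levenshtein distance between token lists a and b."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        curr = [i]
        for j, wb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (wa != wb)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]

def post_edit_distance(mt_output, post_edited):
    """Edits per post-edited word; 0.0 means the MT segment was left untouched."""
    mt, pe = mt_output.split(), post_edited.split()
    return edit_distance(mt, pe) / max(len(pe), 1)

print(post_edit_distance("press power button for start device",
                         "press the power button to start the device"))
```

Averaged over a job, a number like this is easy to reproduce from the files both sides already have, which is exactly the property a pricing discussion between customers, LSPs and translators needs.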
Anyway, it is clear to me that we do need standards and that getting them will likely require open innovation and collaboration on a scale, and in ways, that we have not seen yet. This can start simply, though, with a discussion right here that could guide or provide some valuable feedback to the existing initiatives and official bodies. We need to both talk the talk and walk the walk. My intent here is to raise the level of awareness on this issue, though I am also aware that I do not have the answers. I am also focused on a problem that few see as a real one yet. I invite any readers who may wish to contribute to this forum to further this discussion, even if you choose to do so anonymously or possibly as a guest post. (No censorship allowed here.)


Perhaps we should take heed of what John Anster and W. H. Murray (not Goethe) said as we move forward:
Until one is committed, there is hesitancy, the chance to draw back, always ineffectiveness, concerning all acts of initiative and creation. There is one elementary truth, the ignorance of which kills countless ideas and splendid plans: that the moment one definitely commits oneself, then Providence moves too. All sorts of things occur to help one that would never have otherwise occurred.

Monday, May 10, 2010

The IBM/LIOX Partnership: A Review of the Implications

There has been a fair amount of interest in this announcement and I thought it was worth a comment. I have paraphrased some of the initial comments to provide some context for my own personal observations. 

Some initial discussion of this was started by Dave on the GTS blog. There is speculative discussion there on how Google might view this, what languages will be provided, and what kind of MT technology is being used. There is a fairly active comment section on this blog as well. (Honestly, I cannot see why Google would care, if they even noticed it at all.)

TAUS announced that this was “an alliance to beat their competitors and improve top lines” and they point to the magical combination of SMT technology and huge TM repositories that this alliance creates. They also explain and warn that in the past we have seen that: “Whoever owned the translation memory software, ‘owned’ the customer. The unfortunate effect of this ‘service lock-in’ model was to block innovation and put the brakes on fair competition in our industry “ and they ask that the TM data created through this alliance also be copied into the TDA pool so that “we all own them collectively.” (Umm, yeah right, says @RoarELion).

This was also enough to get Mark at SDL to write a blog entry, where he informs us that basically SDL knows how to do MT correctly, and that SDL is a real technology company with an ‘open’ (sic) architecture. He then goes on to raise doubts about this new alliance being able to get to market in a timely fashion, raises questions about the quality of the IBM SMT, and also informs us that neither IBM nor LIOX has any sales and development experience with this technology. He ends by saying they have “a lot of work ahead of them.” Clearly he is skeptical.

The translator community has not been especially impressed by Translator Workspace (TW) or this announcement, and the two are being blended together in that community. The general mood is suspicious and skeptical because of the somewhat coercive approach taken by Lionbridge; ProZ is filled with rants on this. Jost Zetzsche raises a question that he thinks LSPs who work with LIOX will have to ask: “They want me to give them a month-by-month rundown of how much I translate?" in his review of translation technology. He thinks that this fact will cause some resistance. (Umm, yeah, prolly so.) Why is it that the largest companies in this industry continue to compete with (or create conflict-of-interest issues for) their partners and customers?

CSA sees this as another customized MT solution in which LIOX could leverage IBM's system integration expertise. They also see a potential “blue ocean” market-expansion possibility, note that the translation world needs to play in the Web 2.0 space (finally!), and point to all the content that is not translated, once again alluding to that wonderful $67.5B+ market that MT could create.

I also found the comments from Pangeanic quite interesting, as they attempt to analyze who the winners and losers could be and offer a thought-provoking view that differs from mine but is still related. Interestingly, they also point to the overwhelming momentum that SMT is gathering: "Statistical MT always proves much more customizable or hybridable than other technologies."

In his LISA interview, Bill Sullivan of IBM provides some business use scenarios and a calming perspective, talks about creating a larger pie and a bigger market, and can’t understand what all the uproar is about.

Some of the key facts gathered from LIOX on Kevin Perry’s blog:
 “With this agreement, Lionbridge will have a three-year exclusive agreement that:
- gives LIOX the rights to license and sell IBM real-time translation technology
- includes a patent cross-licensing agreement
- establishes Lionbridge as IBM’s preferred deployment partner for real-time translation technology and related professional services.”

This agreement focuses on the SMT technology, not the old WebSphere RbMT: “RTTS (Real-Time Translation Service) is not based on RbMT technology. It is based on SMT (Statistical Machine Translation) technology. IBM has deployed RTTS internally through ‘n.Fluent,’ a project that made the RTTS technology generally available to IBM’s approximately 400,000 employees for chat, email, web page, crowdsourcing, eSupport, blog, knowledge portal, and document translation. It has been in pilot for the last 4-5 years.”
My sense is that this is not really new in terms of MT technology. It is old technology that was neglected and ignored, now trying to make it into the light. IBM has been doing MT for 35 years (35 with RbMT and 5+ with SMT) and has made pretty much NO IMPACT on the MT world. I challenge you to find anything about MT products on the IBM web site without using search. IBM has a great track record for basic R&D and filing patents, but a somewhat failed track record for commercializing much of it. SMT is one example; there are many others, including the IBM PC, OS/2, Lotus office software, Token Ring networking, etc. They do not have a reputation for agility or market-driving, world-changing innovation, IMO. IBM SMT has even done quite well against Google in the NIST MT system comparisons, so we know they have respectable Arabic and Chinese systems. But in spite of having a huge sales/marketing team focused on the US Federal Government market, they have not made a dent in the federal SMT market. So what gives? Do they not believe in their technology, do they just not know how to sell and market nascent technology, or is $20M or so simply not worth the bother?

Lionbridge also does not have a successful track record with SaaS (Freeway). Much of the feedback on Freeway was lackluster, and the adjective I heard most often about it was “klunky,” though I have never played with it so I can’t really say. The general sentiment toward Translator Workspace today is quite negative. Perhaps this is a vocal minority, or perhaps it is just one more translation industry technology fiasco. Generally, initiatives that lack some modicum of trust among key participants fail. Also, neither company is known as the most cost-efficient player in any market it serves, so buyers should be wary of high-overhead engagements.

There is the promise, as some have commented in the blogs, that this will finally be a way to get MT really integrated into the enterprise. (Doesn’t MSFT count?) IBM does have considerable experience with mission-critical technologies, but translation is still far from being mission critical. So while there are many questions, I do see that this initiative does the following:
  • Shows that SMT has rising momentum
  • Makes it clearer that the translation market is bigger than SDL (software and documentation localization), and that MT (really SMT) is a strategic driver in making that happen
  • Increases the market (and hopefully the successes) for domain-focused SMT engines, and creates the possibility of SMT becoming a key differentiation tool for LSPs (once you know how to use it)
  • Increases the opportunity for companies like Asia Online, who also provide an integrated “SMT + Human” technology platform, to help other technology-oriented LSPs emulate and reproduce the platform that LIOX has created here and avoid being dependent on a competitor’s technology. (We will be your IBM :-) )
 But I do think there are other questions that have been raised and should also be considered. The translation industry has struggled on the margin of the corporate landscape (i.e., not mission-critical or tied to the corporate power structure), as there has never been any widely recognized leadership in the industry. I have always felt that the professional translation industry is filled with CEOs but very few leaders.

 

The largest vendors in this industry create confusion and conflicts of interest by being technology providers on the one hand and competing service providers on the other. This slows progress and creates much distrust. Many also have very self-serving agendas.


So some questions that are still being asked (especially by TAUS) include:
  • Do you think this is a new “vendor lock-in” scenario?
  • Will this concentrate power and data in the wrong place, in an unfair way as TAUS claims?
  • Can translators or other LSPs get any benefit from this arrangement by working with LIOX?
  • Would most service providers prefer to work with pure technology providers?
  • What would be the incentive for any LIOX competitor to work with this?
  • Would LIOX exert unfair long-term pressure on buyers who walk into the workspace today?
So is this another eMpTy promise? Success nowadays is not built on technology and size alone. Remember that Microsoft was tiny when it challenged IBM on PC operating systems. Many experts said that OS/2 was technically superior, but as a developer at the time I remember that IBM was really difficult to work with: filled with vendor lock-in, bullying tactics, and huge support costs. Microsoft was easy to work with and relatively open. Developers of course chose the “technically inferior” Microsoft, and the rest is history. Betamax was the battle Sony lost to an “inferior” VHS. Microsoft in turn lost the search market to Google because it was preoccupied with success in OS and Office software, even though Bill Gates predicted the internet would be the future. Your past successes can create a worldview that becomes the cause of your future failures.

To my eye, this announcement is completely missing the most critical ingredients for success: collaboration, commitment, and engagement from key elements in the supply chain. Apparently LIOX is trying to recruit the crowd (work-at-home moms) while also claiming that TW is a good deal for professional translators. How is this interesting or beneficial for professional translators and other LSPs? Why should they trust their future to a platform run by a competitor who could change the rules? And why would LIOX's largest customers want to do all their processing with IBM, with whom they probably compete?

So while it is vaguely possible that this could be a turning point in the translation industry, I think it is too early to tell, and buyers and partners need to be wary. The openness and quality of this initiative have yet to be established. My sense is that the future is all about collaboration, innovation, and end-user empowerment and engagement. This still looks very much like the old command-and-control model to me, with minimal win-win possibilities for partners; in fact, LIOX takes the “you give, we take” approach to a new extreme. But I may be wrong: big companies do sometimes get it right, and ultimately the proof of the pudding is in the eating.

What do you think? (I wonder if I will be invited to the LIOX party in Berlin?)