Showing posts with label conferences. Show all posts

Tuesday, June 7, 2016

The ABRATES Conference in Rio: Translators focusing on MT

I had the honor of participating in the 7th ABRATES International Translation and Interpreting Conference in Rio de Janeiro last week, an event that drew over 500 attendees based on my casual observation. A large portion of the attendees were translators, but there were also some LSP and enterprise representatives. Since much of the information was presented in Portuguese, I had direct experience with simultaneous interpretation via a headset, which was also kind of cool, and it was fun to switch around when I was less interested in the actual subject matter.


The formidable, emotion packed sign language interpretation by Paloma Bueno, intensely focused simultaneous interpreter volunteers in the booth, and the abounding loveliness of Rio.

I found the conference surprisingly refreshing for several reasons including:
  • The high level of understanding that many translators had about MT and post-editing practice, and their general attitude that it is better to understand and use translation technology than to fight or fear it.
  • The beautiful location, as Rio is a naturally scenic and inviting spot.
  • An emotionally powerful sign language interpretation of the keynote session by Paloma Bueno, whom I could hardly believe was working in real time.
  • The eagerness and openness of many translators present to try to understand how they could engage and work with MT and develop meaningful expertise in MT-related skills.
  • The willingness to explore and understand how translation technology will continue to evolve and possibly impact their professional work.
  • Several conversations with translators who had long-term experience with MT and thus had direct knowledge of MT systems that improved over time; having seen both good and bad MT engines over the years, they were much more coherent in their criticism.
  • The shared experience of many different kinds of MT encounters from a variety of translators, ranging from DIY horror stories to expert systems whose quality evolved gradually over years, and some proprietary efforts that produce astonishing quality.
  • The presence of several very competent presentation sessions on developing MT related skills including:
    • Corpus Preparation for MT training
    • Working with the varying quality of PEMT output that translators get from LSPs
    • Using regular expressions (regex) to develop more powerful text-based editing skills when dealing with corpora
    • PEMT best practices and tools and shared experiences
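To give a flavor of the regex-for-corpora theme, here is a minimal sketch of the kind of pattern-based cleanup such skills enable; the specific patterns are my own illustration, not taken from the session:

```python
import re

def clean_segment(text):
    """Apply a few typical regex cleanup passes to a corpus segment.
    The patterns here are illustrative, not a complete recipe."""
    text = re.sub(r"\s+", " ", text).strip()      # collapse runs of whitespace
    text = re.sub(r"<[^>]+>", "", text)           # strip leftover inline markup
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)  # drop space before punctuation
    return text

print(clean_segment("The  engine <b>output</b> was reviewed ."))
# The engine output was reviewed.
```

Even a handful of patterns like these can save a translator hours of manual cleanup when preparing training corpora.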
I also found this conference special because I personally had no corporate allegiance at the event and was truly just an independent spokesperson with some knowledge of MT technology and its potential and place within the context of many of the attendees' professional lives. As I am no longer employed by or affiliated with Asia Online, I felt very comfortable sharing my opinions, with no concern about persuading anybody to go one way or another. My opinions were all truly independent and the truest expression of what I try to do in this blog, i.e. provide useful and relevant information to inquiring minds. So while I am indeed looking for professional work, I am really enjoying this independence and the focus on what really matters.

It was interesting to find that when one has this kind of openness and lack of bias as a presenter, there is an opening of perception, and I was able to see much of what I was saying with a new and fresh eye. It was like playing improvised music to a keen and attentive audience: the shared attention of the musician and the audience creates a new, more evolved version of an existing musical idea. I will share some of those insights in upcoming posts.

I also understood much more clearly that translators most often have very little control over the content they are given to translate, because of the current structure of the professional translation business, which is usually: Enterprise > MLV (Big Agency) > SLV (Small Agency) > Translator. Translators are thus often left to deal with poor-quality source that cannot by contract be corrected or changed, to work with crappy MT output produced by DIY practitioners who do not know how to actually do it themselves, or to have no say in how the MT engines evolve, since they are so far down the production line. Thus we have the current situation of unnecessarily mind-numbing PEMT work, rather than rapidly evolving MT technology driven by more efficient production processes. And very often the extremely valuable linguistic feedback that translators provide is lost or ignored. An MT paradigm that organizes and collects valuable translator feedback will surely be more competitive and produce higher quality and benefit for all concerned. Not to mention that it will be personally rewarding for the many translators who will need to be involved, as the nature of the problems they solve will evolve in value and impact beyond the typical LSP project.


Plenary session on MT
 
I had an interesting experience during a plenary panel on MT where all the other speakers spoke in Portuguese, so I needed a headset to understand what they were saying. When I started speaking, the interpreters of course started speaking in Portuguese, and I found it very strange and unsettling to hear a voice rendering everything I said in English into Portuguese in real time. Somebody once described MT as magic, a claim I felt deserved some scorn, but to me this act of listening and translating into another language in the instant, not knowing what I was going to say, was surely closer to magic.

If this conference is an indicator of what is happening in the professional translation world, it is very promising for several kinds of translation technology initiatives. I have always felt, much to the chagrin of my former employers, that the real promise of MT will be seen when translators seek it out and learn to steer, drive and enhance the ongoing evolution of this technology. If this conference is really only representative of the Brazilian reality with translation technology, then I predict that the most exciting advances in MT will come from those working with Portuguese. This community is primed for the most interesting new Adaptive MT initiatives like Lilt which can empower motivated and technically savvy translators.

You can find some Twitter coverage of the event by searching on the hashtag #abrates16 or by looking up the following accounts:

http://twitter.com/AlberoniTrans 
http://twitter.com/oscarcurros 
http://twitter.com/sidney_barros

https://storify.com/oscarcurros/abrates16-rio  

  

Ipanema Street Market

Friday, March 28, 2014

Impressions from the Localization Summit at the Games Developer Conference




I recently attended the GDC conference and was impressed to see how advanced the tools and thinking on translation and localization were amongst the game developer community. This is a new frontier for the professional translation industry, and it is amazing to see how quickly game developers are learning about localization issues and developing tools and methodologies to rapidly make their products viable in international markets. It is commonplace to see development tools that are truly both multi-platform and multilingual capable in active use throughout the community. Their comfort level with multimedia (video, voice) data and multi-platform development (mobile, desktop and TV) issues, and their willingness to consider technology, were quite refreshing, and I think it is quite possible that this community will drive translation and localization technology advances much more rapidly than the traditional software and documentation community, where things change much more slowly. While the use of MT is still very limited, the exploration of its viability is much more pragmatic and informed than the tortuous trail we have seen in the traditional documentation translation business. This is perhaps because things move much more rapidly in the games business, and most products have short and intense life cycles, so it behooves the developers to quickly capitalize on international opportunities while their games are hot. I predict that MT will be used more frequently there in future as developers become more informed about the limitations and possibilities of MT.


==============================================================
This is a guest post by Lauren Scanlan that provides a quick summary of some of the localization-related presentations at GDC. Lauren Scanlan is currently a freelance localization professional and manga proofreader looking to dive into the wide, fun world of game localization. She's based in Denver, where she spends her time baking, reading, and wrangling the odd dinosaur. You can find her online at @lsscanlan, laurenscanlan.com, or connect with her on LinkedIn at www.linkedin.com/in/lsscanlan.




On March 18th I attended the Localization Summit at the Game Developers Conference (GDC) 2014, which, in truth, was the whole reason I came to GDC in the first place. I've been working in the localization industry for three years, and as an avid fan of video games, I thought that combining the two might be a good fit for me – and I wanted to see what the state of localization is within the gaming world.

The Summit, organized by Fabio Minazzi (Localization Summit Advisory Board Member and Account Manager at game localization service provider Binari Sonori) had a variety of talks – from emerging markets (Localizing Games for Spanish Speaking America, Emerging Communities: A Snapshot of the Brazilian Indie Game Development Scene), to culturalization (Journey to the West: A Chinese Game Localization Primer) to how localization can be improved, either through communication with an LSP (Indie Games Localization: Is It Worth It?) or improving tools and processes within your own localization department (The Future of Localization Testing, LAMS: Building a Localization Tool for Everyone – both talks given by Sony Computer Entertainment Europe senior employees). 

There was also a series of five mini-talks called Localization Microtalks: Globetrotting in the Fast Lane, which covered indie game localization, an open-source localization framework, localized advertising, mo-cap dubbing technology and processes, and using app description localization as a tool to “test the waters” for localization. All of these talks should be available as part of the GDC Vault, and are well worth a watch for those with Vault access!

The two final talks of the day were uncomfortable for some of the people in the room – Crowdsourcing the Localization of Gone Home and What is the Place of Machine Translation in Today's Gaming Industry? Both took a look at methods that do not necessarily rely only on trained linguists to complete translations, and have the potential to make localization accessible to others without the budget (more so in the crowdsourcing talk than the machine translation panel), even if it is not 100% perfect or within total control of the creators.

It seems to me that, while things seem to be going well in game localization, there are areas for improvement. The first concentration seemed to be on how to get indie games, and especially indie mobile games, out to international markets, including emerging markets, which may not yet have a framework for distribution that works well for the consumer. The issues that face most indie games are cost management for the studio and accessibility for the gamers. Belén Agulló García, Language Production Manager at Pink Noise, and Jonas Waever, Creative Director at Logic Artists (Indie Games Localization), mentioned that, as localization is a part of marketing, it made more sense to use the marketing budget for their localization. I thought this was a fantastic idea, since (as all indie studios that participated in the Summit would attest) they received sizable returns on their localization investment, which could then be invested back into marketing.

Martina Santoro from Okam Game Studio and Alejandro Gonzales, CEO and Studio Director at Brainz (Localizing Games for Spanish Speaking Latin America) mentioned that some mobile games, especially freemium games, were hard to bring into the Latin American market because of the limited payment methods. Prepaid cards and vouchers seem to be popular, but this is certainly an issue that needs to be addressed as new markets start opening up. For entirely free-to-play games, or games on platforms that are already taken care of, this may not be an issue – and may make the market more enticing. Accessibility is going to be a growing issue for both localization service providers and distribution platforms, and I think a collaborative effort to grow in these new markets and find solutions would be great.

The next area for improvement seems to be in translation and culturalization. For example, contrary to what I've done in general localization, it seems that since the App Store only lists two kinds of Spanish (Mexican Spanish and Spanish), it's better to localize an app into neutral Spanish to make it easily understandable for the LATAM region. This illustrates well that distribution platforms play a large role in how effective (or not!) localization can be – though I was also pleasantly surprised to learn that the App Store and Google Play will spotlight and promote localized games in the relevant market, which has meant huge success for those games overall.

Shaun Newcomer, Vice President of Reality Squared Games (Journey to the West), gave a great talk on some of the culturalization issues of bringing Chinese games to the US market. For example, games in China usually have a short life cycle and high monetization, and the storylines can be of middling quality, since the games are not expected to last very long. However, US/Western consumers dislike high monetization in the interest of fairness, and look for a higher-quality storyline, leading Newcomer to amend both of these things in-house before releasing games to the US market. I had never before considered aspects like monetization as a part of the localization process, but it makes sense to find out which markets lean more toward monetization and which shy away from it.

Crid Yu, Vice President and Managing Director of North America for InMobi (Localization Microtalks), mentioned that it was also important to look at alternative marketing methods for each market, citing as an example the use of subway advertisements for mobile games in Seoul, South Korea. I thought this was a great idea – having been to Seoul myself and seen the number of uber-connected Seoulites with their phones out on the subway, always looking for the next new thing.

Finally, the last issue is how much machine translation should play a part in game localization. I've been working in linguistic QA for three years now, and when I first started, I was trained on how to use SDL tools like Trados and Studio as part and parcel of the localization process. Since we usually leverage translation memory only on material that needs to be the same year after year (employee satisfaction surveys) or technical specs (medical devices) that generally don't change, it makes the localizations far more cost-effective for our clients. However, games are full of mostly creative, non-repetitive text, which makes this method dicey, as the quality can suffer and the text can sound stale if it's always repeated. One of the best uses that came up within the panel (What is the Place of Machine Translation in Today's Gaming Industry?) was for live chat – it would be great to use an automated system to instantly replace the most common words in a sentence, or the most common sentences – that way, players on European servers (for example) could easily get the main points across to each other without interrupting the gameplay. Another good application, I think, would be to keep a TM of UI terms that won't often (if ever) change, such as “Play,” “Continue,” “Game Over,” etc. Also, depending on the intended quality/lifecycle/churn of the game, it may be better for some companies to use machine translation and go with an imperfect translation if it gets the game out faster... though, as a linguistic QA-er, I am certainly not advocating lower-quality games!
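The live-chat idea above could be sketched as a simple phrasebook substitution; the phrases and French renderings below are invented for illustration and are nothing like a production system:

```python
# A hypothetical phrasebook-style substitution for in-game chat; the
# phrase list and translations are invented for illustration only.
PHRASEBOOK = {
    "good game": "bonne partie",
    "need healing": "besoin de soins",
    "ready": "prêt",
}

def chat_translate(message):
    """Replace known common phrases; leave everything else untouched."""
    lowered = message.lower()
    for phrase, translation in PHRASEBOOK.items():
        if phrase in lowered:
            lowered = lowered.replace(phrase, translation)
    return lowered

print(chat_translate("GG everyone, good game!"))
# gg everyone, bonne partie!
```

A real system would need full MT behind it, but even this crude level of substitution conveys the gist across a server without interrupting play.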

I also thought Christopher Burgess, Senior Programmer at SCEE (LAMS), did a wonderful job talking us through the struggles of building the perfect project management/localization software, showing that even customizable tools can be a challenge to work with to get the results you want. Even when using machine translation, users need to be well trained and should always think critically about what they're using the MT software for.

Overall, I learned quite a bit through the Localization Summit, and I think it's a wonderful addition to the GDC programming. It was great to meet so many people who were just starting out in game development, or just about to localize, in the audience! I'm glad that non-localization people are taking the time to educate themselves about localization and their options, and I only hope that others join them next year. I'll certainly be there!

Thursday, June 14, 2012

Thoughts on an MT technology presentation at ALC New Orleans, May 2012

This is a guest post by Huiping Iler, whom I had the pleasure of meeting in New Orleans last month, and who made a very interesting presentation on how to increase the intrinsic value of an LSP firm. She runs a language services firm that is one of the growing fold of LSPs who have direct experience with post-editing MT output, and who see an increasing role for MT in the future of her business. I should add that while her feedback on my presentation here is quite flattering, there were also others who commented through the regular feedback process that my slides were too dense and information-filled, and one who even felt that my presentation was a “thinly disguised sales pitch”. (I assure you, Sir, it was not.) It is difficult to find a balance that makes sense to everybody, and all feedback is valuable. The pictures below come from the wonderful photographic eye of Rina Ne’eman, taken during her visit to New Orleans.

--------------------------------------------------
It was a real delight listening to Kirti Vashee from Asia Online presenting on the ROI of Machine Translation – Scoping and Measuring MT. It took place at the most recent annual Association of Language Companies conference in New Orleans between May 16-20, 2012.

Kirti pointed out that:

  • Much of today’s business content is dynamic and continuously flowing.
  • The need for real-time international-language content cannot be met by human translators alone due to cost and time constraints.
  • Machine translation (MT), especially statistical machine translation is gaining traction among enterprises that have large amounts of data to translate.
  • IT companies and travel review sites are examples of early adopters of statistical MT.
  • Compared to general or free MT tools such as Google, an enterprise MT tool and service like Asia Online is highly customizable and adaptable to unique customer needs.
  • It gives clients much more control over terminology, non-translatable terms, vocabulary choice and writing style. As a result, it produces much higher accuracy and translation quality, especially in highly specialized and focused domains.
This echoes the feedback I heard from one of wintranslation’s enterprise clients, who has been using statistical MT for the last few years. Our translation team has been tasked with post-editing and providing corrective feedback to the client’s MT engineering team for continuous improvement.

According to translators who have mastered the art of editing machine translation, post editing raw output requires a different skill set than the traditional editing of human translations. 

To start with, text selected for MT often tends to be “low visibility.” Kirti gave an example: on a travel review site, the four- and five-star hotel reviews are human translated, while the lower-star hotel reviews are machine translated with some or no human post-editing.

Other low-visibility text examples include car service manuals that not everybody reads, or web-based support content. High-visibility (and typically low-volume) text, such as marketing communications, rarely if ever gets selected for machine translation.

When translating low-visibility text, particularly in technical communication, it is more important for the text to be technically accurate than stylish. It is a case where a translation that sounds awkward but is technically correct IS acceptable, as long as translation efficiency is maximized without hurting accuracy.
But translators new to post-editing may be tempted to edit the text not only for accuracy but also for flow and style. This leads them to spend more time than necessary on the text, and they are also more likely to complain about the quality of MT output. After all, style and flow are not the strengths of MT; speed and consistency are. It is important to have an agreement with the human post-editors on what is good enough (i.e. technical accuracy only, not style). Improved productivity and lower cost are very important to clients using MT. The best post-editors understand this and can deliver a high number of edited words per hour that meet quality standards.

One of wintranslation’s MT post-editors commented, “When I have to review a translation, either done by a human or by a machine, I do not try to make it sound as if I had written it. I mostly correct errors, terminology inconsistencies, awkward style, problems with conveying the intended meaning and issues that really bother me. If we are able to have that mindset, then it will be less cumbersome to review machine-translated text. If we have the tendency to rewrite the translation, then the editing will be time-consuming and cumbersome.” This sums up the ideal attitude a post-editor should have.

Consistency is one of machine translation’s core strengths. When set up properly, non-translatable text, like numbers, acronyms and product names are reliably consistent throughout the translation. It is an area that MT can outperform human translators.
For example,
Source: Migration information for JKJ 5.x
MT Target: Información sobre migraciones para JKJ 5.x
When a post editor reviews this text, he or she knows for sure that “JKJ 5.x” is correct and doesn’t have to worry about it being translated as “JKJ 6.x” or “JKJ 5.s.” This is not always the case when reviewing human translations, because the editor will always have to double-check the product name, version, etc.
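This kind of non-translatable consistency can even be checked mechanically. A minimal sketch of such a check follows; the token pattern is my own assumption for illustration, not a production rule:

```python
import re

# Verify that product/version tokens (e.g. "JKJ 5.x") survive translation
# unchanged. The pattern below is a simplifying assumption: two or more
# capitals followed by a version-like number.
TOKEN = re.compile(r"\b[A-Z]{2,}\s*\d+\.\w+\b")

def tokens_preserved(source, target):
    """True if source and target contain the same non-translatable tokens."""
    return sorted(TOKEN.findall(source)) == sorted(TOKEN.findall(target))

src = "Migration information for JKJ 5.x"
tgt = "Información sobre migraciones para JKJ 5.x"
print(tokens_preserved(src, tgt))  # True
```

Checks like this are exactly where machines beat human reviewers: the comparison is exhaustive and never tires.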

The absence of spelling errors in machine translated text is a distinct advantage that saves time. But it is a good practice to spellcheck the translation before delivery, because post editors could have introduced typos while inputting corrections.

When a post editor finds an error pattern, communicating it to the client will help train the engine and improve results in the future. For example, in one MT text, the term “wireless” was always translated into Spanish as “productos inalámbricos,” which in most cases is wrong. The post editor quickly identifies and fixes the error. Because this error happens often enough to be a pattern, it is submitted to the client for dictionary updating. This and other types of pattern-based corrective work can greatly enhance the overall efficiency of post-editing work.
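Spotting such patterns could itself be automated from an editor's correction log. The sketch below is my own illustration (the log entries and threshold are invented), not a description of any real workflow:

```python
from collections import Counter

def find_patterns(corrections, threshold=3):
    """corrections: list of (wrong_phrase, fixed_phrase) pairs the editor
    logged. Any correction repeated `threshold` or more times is flagged
    as a pattern worth reporting to the client for dictionary updating."""
    counts = Counter(corrections)
    return [pair for pair, n in counts.items() if n >= threshold]

# Hypothetical log: the "wireless" mistranslation recurs, a one-off typo does not.
log = [("productos inalámbricos", "inalámbrico")] * 4 + [("typo", "fix")]
print(find_patterns(log))
# [('productos inalámbricos', 'inalámbrico')]
```

Aggregating corrections this way turns scattered editor fixes into actionable engine feedback.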

When words are not in the right order in the translated text, it is best if the post editor just drags them to the right place; that way he/she doesn’t have to retype them and delete them from the wrong place. This saves time.

Example
Source: Where to buy ABC Anti-Theft Service related products.
MT Target: Dónde comprar ABC contra robo servicio productos relacionados.
In this case, “productos relacionados” needs to be moved towards the beginning of the sentence; the post editor just highlights the two words and drags them to their right place. She also needs to move the word “servicio” and make a few quick fixes.
Final Target: Dónde comprar productos relacionados con el servicio ABC contra robo.
When the upfront linguistic setup work has been inadequate, or there is a lack of ongoing communication between the post-editing team and the MT engineers, it produces a lot of frustration for MT editors and creates unnecessary delay.

For example, a Brazilian Portuguese translator noticed the MT software was often using European Portuguese vocabulary even though the text was intended for the Brazilian market (there are significant spelling differences between Brazilian Portuguese and European Portuguese).

For instance, acção (should be ação), gestores de projectos (should be gerenciadores de projetos), etc.

She asked why the machine translation software “was not told” about that. This ability to provide feedback to the MT system is a key ingredient to getting better results and raising editor productivity and satisfaction. The best results of MT come from close collaboration between the engineering and linguistic post editing team.

The translator mentioned above also found inconsistencies in the translation of key terms such as product names. “Green Power Management” was translated as Energia verde Management, Verdes gerenciamento de energia, and Verdes poder Management. Some editing of the translation memory to reduce such inconsistency would speed up the posting editing process a lot.

Productivity gains vary from language to language. In Spanish and Portuguese, for example, where MT has made more inroads, one can expect as much as a 50% increase in the number of words translated per hour, assuming the MT engine has been properly set up and trained. But gains are harder to come by in Asian languages.

There is a real and imminent opportunity for translation companies to offer real-time translation services for select types of content that are out of reach for human translation due to time and cost. The linguistic training of statistical translation engines and the development of MT post-editors are key pieces in realizing that opportunity.
On a side note, I cannot help noticing that Mr. Vashee’s passion and sharing of MT expertise are contagious. He is one of the finest craftsmen in the sales and marketing field of technology and translation; an empathetic communicator, he is always able to see things through his clients’ eyes; when in the company of translation company owners, he presents possibilities to use a tool like Asia Online to generate new revenue and create differentiation (ask which translation company owner doesn’t like to hear that); he satisfies the data-driven analytical types with numbers and return on investment measured in quality metrics and dollars; he has an amazing ability to stay insightful and relevant in a conversation while sticking to his value proposition; he is an outstanding marketer and an entrepreneur’s dream pitch man.


About Huiping Iler:
Huiping Iler is the president of wintranslation™, a Canada-based translation company specializing in information technology and financial services. wintranslation has been coordinating post-editing of machine-translated text for the last several years.

Friday, October 28, 2011

Building Momentum for Post-Edited Machine Translation (PEMT)

This is an (opinionated) summary of interesting findings from a flurry of conferences that I attended earlier this month: the TAUS User Conference, Localization World and tekom. Even though it is tiring to have so many so close together, it is interesting to see what sticks out a few weeks later. For me, TAUS and tekom were clearly worthwhile and Localization World was not, and I believe that #LWSV is an event that is losing its mojo in spite of big attendance numbers.

Some of the big themes that stand out (mostly from TAUS) were:
  • Detailed case studies that provide clear and specific evidence that customized MT improves the productivity of traditional (TEP) translation processes
  • The Instant on-demand Moses MT engine parade
  • Initial attempts at defining post-editing effort and difficulty from memoQ and Memsource
  • A forward-looking session on the multilingual web from speakers who are actually involved with big-perspective, global web-wide changes and requirements
  • More MT hyperbole
  • The bigger context and content production chain for translation that is visible at tekom
  • Post-editor feedback at tekom
  • The lack of innovation in most of the content presented at Localization World
The archived Twitter stream from TAUS (#tausuc11) is available here; the tekom tag is #tcworld11 and Localization World’s is #lwsv. Many of the TAUS presentations will be available as web video shortly, and I recommend that you check some of them out.


PEMT Case Studies
In the last month I have seen several case studies that document the time and cost savings and overall consistency benefits of good customized MT systems. At TAUS, Caterpillar indicated that their demand for translation was rising rapidly, and thus they instituted their famed controlled-language (Caterpillar English) translation production process using MT. The MT process was initially more expensive, since 100% of the segments needed to be reviewed, but according to Don Johnson of Caterpillar they are now seeing better results on their quality measurements from MT than from human translators for Brazilian Portuguese and Russian. They expect to expand to new kinds of content as these engines mature.

Catherine Dove of PayPal described how the human translation process got bogged down in review and rework cycles (to ensure the PayPal brand’s tone and style remained intact) and was unable to meet production requirements of 15K words per week with a 3-day turnaround in 25 languages. They found that “machine-aided human translation” delivers better, more consistent terminology in the first pass, and thus they were able to focus more on style and fluency. Deadlines are easier to meet, and she also commented that MT can handle tags better than humans. They also focus on source cleanup and improvement to leverage the MT efforts, and interestingly, the MT is also useful in catching errors in the authoring phase. PayPal uses an “edit distance” measurement to determine the amount of rework and has found that the MT process reduces this effort by 20% in 8 of the 10 languages they are using MT on. An additional benefit is that there is a new quality improvement process in place that should continue to yield increasing benefits.
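PayPal's exact metric is not specified beyond “edit distance,” but a standard Levenshtein distance over tokens is a common proxy for post-editing rework; here is a minimal sketch under that assumption:

```python
def edit_distance(a, b):
    """Levenshtein distance between two token lists: the minimum number of
    insertions, deletions and substitutions to turn a into b. Often used
    as a rough proxy for post-editing effort."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete all of a's first i tokens
    for j in range(n + 1):
        d[0][j] = j                      # insert all of b's first j tokens
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[m][n]

mt = "the engine output was review".split()
post = "the engine output was reviewed".split()
print(edit_distance(mt, post))  # 1
```

Tracking this number per segment over time gives a simple, objective view of whether an engine's output is requiring less rework.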

A PEMT user case study was also presented by Asia Online and Sajan at the Localization Research Conference in September 2011. The global enterprise customer is a major information technology software developer, hardware/IT OEM manufacturer, and comprehensive IT services provider for mission-critical enterprise systems in 100+ countries. This company had a legacy MT system, developed internally, that had been used in the past by the key customer stakeholders. Sajan and Asia Online customized English-to-Chinese and English-to-Spanish engines for this customer. These MT systems have been delivering translated output that even beats the first-pass output from their human translators, due to the highly technical terminology, especially in Chinese. A summary of the use case is provided below:
  • 27 million words have been processed by this client using MT.
  • Large amounts of quality TM (many millions of words) and glossaries were provided, and these engines are expected to continue to improve with additional feedback.
  • The customized engine was focused on the broad IT domain and was intended to translate new documentation and support content from English into Chinese and Spanish.
  • A key objective of the project was to eliminate the need for full translation and limit it to MT + post-editing as a new, modified production process.
  • The custom engine output delivered higher quality than their first-pass human translators, especially in Chinese.
  • All output was proofread to deliver publication quality.
  • Using Asia Online Language Studio, the customer saved 60% in costs and 77% in time over previous production processes, based on their own structured time and cost measurements.
  • The client also produces an MT product, but the business units prefer to use Asia Online because of considerable quality and cost differences.
  • The client was extremely impressed with the results, especially when compared to the output of their own engine.
  • The new pricing model enabled by MT means that the higher the volume, the more beneficial the outcome.
The video presentation below by Sajan begins at 27 minutes (in case you want to skip over the Asia Online part), and even if you only watch the Sajan presentation for 5 minutes you will get a clear sense of the benefits delivered by the PEMT process.

A session on the multilingual web at TAUS, featuring Bruno Fernandez Ruiz (Yahoo! Fellow and Vice President), Bill Dolan (Head of NLP Research, Microsoft), and Addison Phillips (Chair, W3C Internationalization Group / Amazon), also produced many interesting observations, such as:
  • The impact of “Big Data” and the cloud will affect language perspectives of the future and the tools and processes of the future need to change to handle the new floating content.
  • Future applications will be built once and go to multiple platforms (PC, Web, Mobile, Tablets)
  • The number of small nuggets of information that need to be translated instantly will increase dramatically
  • HTML5 will enable publishers to be much freer in information creation and transformation processes, and together with CSS3 and JavaScript can handle translation of flowing data across multiple platforms
  • Semantics have not proven to be necessary to solve a lot of MT problems contrary to what many believed even 5 years ago. Big Data will help us to solve many linguistic problems that involve semantics
  • Linking text to location and topic to find cultural meaning will become more important to developing a larger translation perspective
  • Engagement around content happens in communities where there is a definable culture, language and values dimension
  • While data availability continues to explode for the major languages we are seeing a digital divide for the smaller languages and users will need to engage in translation to make more content in these languages happen
  • Even small GUI projects of 2,000 words are found to have better results with MT + crowdsourcing than with professional translation
  • More translation will be of words and small phrases where MT + crowdsourcing can outperform HT
  • Users need to be involved in improving MT, and several choices can be presented to users to determine the “best” ones
  • The community that cares about solving language translation problems will grow beyond the professional translation industry.

At TAUS, there were several presentations on Moses tools and on instant Moses MT engines built via a one- or two-step push-button approach. While these tools facilitate the creation of “quick and dirty” MT engines, I am skeptical of the value of this approach for real production-quality engines, where the objective is to provide long-term translation productivity. As Austin Powers once said, “This is emPHASIS on the wrong syllABLE.” My professional experience is that the key to long-term success (i.e. really good MT systems) is to really clean the data, and this means more than removing formatting tags and the most obvious crap. It is harder than most people think. Real cleaning also involves linguistic analysis and human-supervised bilingual alignment analysis. I have also seen that it takes perhaps thousands of attempts across many different language pairs to understand what is happening when you throw data into the hopper, and that this learning is critical to fundamental success with MT and to developing continuous-improvement architectures. I expect that some Moses initiatives will produce decent gist engines, but they are unlikely to do much better than Google/Bing for the most part. I disagree with Jaap’s call to the community to produce thousands of MT systems; what we really need to see is a few hundred really good, kick-ass systems, rather than thousands that do not even measure up to the free online engines. And so far, getting a really good MT engine is not possible without real engagement from linguists and translators, and more effort than pushing a button. We should all be wary of instant solutions, of thousands of MT engines produced rapidly but all lacking in quality, and of "new" super-semantic approaches that promise to solve the automated translation problem without human assistance. I predict that the best systems will still come from close collaboration with linguists and translators, and from insight born of experience.
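To make concrete what goes beyond "removing formatting tags", here is a minimal sketch of the kind of mechanical filters a training-corpus cleaning pass might apply before any of the harder linguistic and alignment work begins. The thresholds and the tag pattern are illustrative assumptions, not anyone's published pipeline:

```python
import re

# Strip inline markup and TM placeholder tokens like {1} (illustrative pattern).
TAG = re.compile(r"<[^>]+>|\{\d+\}")

def clean_pairs(pairs, max_ratio=3.0, max_len=200):
    """Filter (source, target) segment pairs for SMT training.

    Drops empty segments, untranslated copies, duplicates, over-long
    segments, and pairs whose length ratio suggests misalignment.
    """
    seen, out = set(), []
    for src, tgt in pairs:
        src = " ".join(TAG.sub(" ", src).split())
        tgt = " ".join(TAG.sub(" ", tgt).split())
        if not src or not tgt:
            continue                      # empty after cleanup
        if src == tgt:
            continue                      # likely untranslated
        ls, lt = len(src.split()), len(tgt.split())
        if ls > max_len or lt > max_len:
            continue                      # over-long segment
        if max(ls, lt) / max(1, min(ls, lt)) > max_ratio:
            continue                      # suspicious length ratio
        key = (src.lower(), tgt.lower())
        if key in seen:
            continue                      # exact duplicate
        seen.add(key)
        out.append((src, tgt))
    return out
```

Even this crude pass routinely discards a sizable fraction of a raw TM (recall the 26% rejection figure in the Sajan/Asia Online tweets), and it is only the automatic layer; the human-supervised alignment analysis sits on top of it.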

I was also excited to see the initiative from MemoQ to establish a measure of translator productivity, or post-editing effort expended, by creating an open-source measurement of post-edited output, where the assumption is that an untouched segment is a good one. MemoQ will use an open, published edit-distance algorithm that could help establish better pricing for MT post-editing, and they also stressed the high value of terminology in building productivity. While there is already much criticism of the approach, I think this is a great first step toward formulating a useful measurement. At tekom I also got a chance to see the scheme that MemSource has developed, in which post-edited output is mapped back to a fuzzy-matching scheme to establish a more equitable post-editing pricing model than the one advocated by some LSPs. I look forward to seeing this idea spread and hope to cover it in more detail in the coming months.

Localization World was a disappointing affair, and I was struck by how mundane, unimaginative and irrelevant much of the conference content was. While the focus of the keynotes was apparently innovation, I found the @sarahcuda presentation interesting but not very compelling or convincing in terms of insight into innovation. The second-day keynote was just plain bad, filled with clichés and obvious truisms, e.g. “You have to have a localization plan” or “I like to sort ideas in a funnel”. (Somebody needs to tell Tapling that he is not the CEO anymore, even though it might say so on his card.) I heard several others complain about the quality of many sessions, and apparently in some sessions audience members were openly upset. The MT sessions were really weak compared to TAUS, and rather than broadening the discussion they mostly succeeded in making it vague and insubstantial. The most interesting (and innovative) sessions I witnessed were the Smartling use-case studies and a pre-conference session on social translation. Both focused on how the production model is changing, and neither was particularly well attended. I am sure there were others that were worthwhile (or maybe not), but it appears that this conference will matter less and less in terms of producing compelling, relevant content that provides value in the Web 2.0 world. The event is useful for meeting people, but I truly wonder how many will attend for the quality of the content.

The tekom event is a good one for getting a sense of how technical business translation fits into the overall content creation chain, and for seeing how synergies could be created within that chain. There were many excellent sessions, and it is the kind of event that helps you broaden your perspective and understand how you fit into a bigger picture and ecosystem. With 3,300 visitors, it also offers a much wider range of viewpoints. I had a detailed conversation with some translators about post-editing. They were most concerned about compensation structures and post-editor recruitment practices. They specifically pointed out how unfair the SDL practice of paying post-editors 60% of standard rates is, and asked that more equitable systems be put in place. LSPs and buyers would be wise to heed this feedback if they want to recruit quality people in the future. I got a close look at the MemSource approach to making this fairer, and I think this approach, which measures the actual work done at the segment level, should be acceptable to many. It measures the effort after the fact; we still need to do more to make the difficulty of the task transparent before translators begin. That starts with understanding how good the individual MT system is and how much effort is needed to reach production quality levels. This is an area I hope to explore further in the coming weeks.
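The core of the MemSource idea is to score each post-edited segment against the raw MT output and pay it as if it were a TM fuzzy match of the corresponding band. The exact band boundaries and discount factors below are hypothetical placeholders, and I use Python's stdlib `difflib` similarity as a stand-in for whatever matching algorithm is actually applied:

```python
import difflib

# Hypothetical payment factors per band (fraction of the full per-word rate).
BAND_RATE = {
    "100%": 0.10, "95-99%": 0.30, "85-94%": 0.50,
    "75-84%": 0.70, "50-74%": 0.90, "0-49%": 1.00,
}

def fuzzy_band(similarity: float) -> str:
    """Map a 0..1 similarity score onto TM-style fuzzy-match bands."""
    for threshold, name in [(1.0, "100%"), (0.95, "95-99%"),
                            (0.85, "85-94%"), (0.75, "75-84%"),
                            (0.50, "50-74%")]:
        if similarity >= threshold:
            return name
    return "0-49%"

def payable_words(mt_output: str, post_edited: str) -> float:
    """Weight a segment's word count by how heavily it was actually edited."""
    sim = difflib.SequenceMatcher(None, mt_output, post_edited).ratio()
    return len(post_edited.split()) * BAND_RATE[fuzzy_band(sim)]
```

An untouched segment lands in the "100%" band and earns the smallest factor, while a rewritten one is paid at the full rate, which is precisely why this feels fairer than a flat 60% discount applied regardless of MT quality.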

I continue to see more progress on the PEMT front, and I now have good data showing measurable productivity gains even on a language pair as tough as English to Hungarian. I expect that a partnership of language and MT experts will be more likely to produce compelling results than many DIY initiatives, but hopefully we can learn from all the efforts being made.

Tuesday, September 27, 2011

The Growing Interest & Concern About the Future of Professional Translation

I have noticed of late that every conference has a session or two that focuses on the future, probably because many sense that change is in the air. Some of you may also have noticed that the protest from some quarters has grown more strident, or even outright rude, toward some of the ideas presented at these future-outlook sessions. The most vocal protests seem to be directed at predictions about the increasing use of machine translation, anything about “good enough” quality, and the process and production changes necessary to deal with increasing translation volume. (There are still some who think the data deluge is a myth.)

Some feel personally threatened by those who speak on these subjects and rush to kill, or at least stab, the messenger. I think they miss the point that what is happening in translation is just part of a larger upheaval in the way global enterprises are interacting with customers. The forces causing change in translation are also creating upheaval in marketing, operations and product development departments, as many analysts have remarked for some time now. The discussion in the professional translation blogosphere is polarized enough (translators vs. technology advocates) that dialogue is difficult, but hopefully we all continue to speak with increasing clarity, so that the polemic subsides. The truth is that none of us really knows the future for certain, but that should not stop us from making educated (or even wild) guesses at where current trends may lead. (I highly recommend you skim The Cluetrain Manifesto to get a sense of the broader forces at play.)
Brian Solis has a new book coming out, The End of Business As Usual, that describes the overall change quite succinctly. It explores each layer of the complex consumer revolution that is changing the future of business, media, and culture. As consumers further connect with one another, a vast and efficient information network takes shape and begins to steer experiences, decisions, and markets. It is nothing short of disruptive.
I was watching the Twitter feeds from two conferences last week (LRC XVI in Ireland and Translation Forum Russia), and I thought it would be interesting to summarize and highlight some of the tweets as they pertain to this changing world, and perhaps provide more clarity about the trends from a variety of voices and perspectives. The LRC conference had several speakers from large IT companies who talked about their specific experience, as well as technology vendor and LSP presentations. For those who are not aware, CSA research identifies IT as one of the single largest sectors buying professional translation services. The chart below shows the sectors with the largest share of global business. This chart is also probably a good way to understand where these changes are being felt most strongly.
[Chart: sectors with the largest share of global translation business]

Here are some Twitter highlights from LRC on the increasing volume of translation, changing content, improving MT and changing translation production needs. I would recommend that you check out @therosettafound for the most complete Twitter trail. I have made minor edits to provide context and clarify abbreviations, and have attempted to give the tweet trail some basic organization to make it more readable.
@CNGL Changing content consumption and creation models require new translation and localisation models – (according to) @RobVandenberg
@TheRosettaFound We are all authors, the enterprise is going social - implications for localisation?
@ArleLommel Quality even worse than Rob Vandenberg says: we have no real idea what it is/how to measure, especially in terms of customer impact
Issue is NOT MT vs. human translation (HT). It's becoming MT AND HT. Creates new opportunities for domain experts.
Dion Wiggins. LSPs not using MT will put themselves out of business? Prediction: yes in four/five years
CNGL says 25% of translators/linguists use MT. I wonder how many use it but say they don't use it due to (negative) perception (with peers)?
Waiting for translation technology equivalent of iPhone: something that transforms what we do in ways we can't yet imagine.

Tweets from Jason Rickard’s presentation on Collaborative Translation (i.e. Crowdsourcing) and IT going social.
@TheRosettaFound Jason of Symantec giving the enterprise perspective, added 15-20 languages to small but popular product, built tech to support this. Not just linguistic but also legal, organizational issues to be resolved in collaborative, paid-for product.
Is collaborative translation bad & not-timely? #lrcconf Not so, a lot of translators = involved users of the content/product they translate.
Review process is different in collaborative translation. Done by voting, not by editors
The smaller the language gets, the more motivated volunteer translators are and the better collaborative translation works.
Is volunteering something for people who don't have to worry that their day-to-day basics are covered?
Does collaborative translation and collaboration mean that content owners "give up the illusion of control" over their content?
Enterprises do collaborative translation for languages they would/could not cover otherwise - true, for both profit and non-profits
Collaborative/Community will not replace existing service providers but open up more content for more languages
Language Service Providers could play an important role in community translation by building, supporting, moderating communities
It's not correct to say Community Translation = bad; Professional Translation = good
Microsoft appoints moderators with a passion for the project/language for community localization
>1,200 users translated Symantec products into 35 languages
If >1,200 were needed to translate 2 small-ish products, how can millions of translators translating 1 ZB be 'managed'?
@ArleLommel Symantec research: Community involvement in support often leads to ~25% reduction in support spend
“Super users” are what make communities scalable. Key is to identify/cultivate them early in the process
Jason Rickard: Dell is a good example of using Facebook for support. One of few companies with real metrics and insight in this area.
Jason Rickard: Symantec has really cool/systemic/well-thought ways to support community


@TheRosettaFound 21st generation localisation is about the user, about user-generated content - Ellen Langer: Give up the Illusion of Control
@ArleLommel Illusion of control? You mean we can have even less control than we have now? That's a scary thought!
@TheRosettaFound The most dramatic shifts driven by the web happened because communities took over - Imagine: 100000s of user translators translating billions of words into 100s of languages - control that!
Seems the deep and complex problems of localisation are a minute drop in the ocean of digital content management
@CNGL Discovery, analysis, transformation - Alex O'Connor tells how CNGL is addressing the grand challenges of digital content management
@TheRosettaFound Is the L10N industry due for a wave of destruction or positive transformation?
@ArleLommel Yes, Most of the mainstream technologies for translators are non-ergonomic and still in 20-year-old paradigms

Tweets from Tony Allen, Product Manager Intel Localisation Solutions presentation
@TheRosettaFound 30+ langs >200k pages >40% localised @ Intel's web presence. Intel: important to have user-driven content, interaction with the customer. Integration important, e.g. multilingual support chat. Integration, Interoperability key issues for Intel L10N. To figure out how content flows, without loss of metadata, interoperates with internal/external range of systems, is crucial.
2.5b netizens, >15b connected devices, >1 zettabyte of traffic by 2015, and companies will interact with their customers using social-media-type setups; new challenges for localization.
#intel What does it mean for localization infrastructures if we have >1 zettabyte of content in 2015? Current methods won't keep up
@ArleLommel #intel says that interoperability standards are required for cloud to meet future demands. L10n must evolve to meet this need too.

@ArleLommel Alison Toon (#hp) puts it this way: “localization (people) are the garbage collectors of the documentation world”
@TheRosettaFound 600GB of data in Oracle's Translation Platform - We need concise well-structured content - then we're going to be able to deliver efficient translation services - How to get it right: analyze content, identify problems and throw it back into the face of writers and developers. I18N and l10n have to get into the core curriculum at Universities says Paul Leahy (of Oracle), since we spend too much time teaching it.

Tweets from Sajan / Asia Online MT presentation
@TheRosettaFound MT cannot perform magic on bad source text - user-generated non-native-speaker content is often 'bad'
MT errors make me laugh... but human errors make me cry - an old quote from previously recycled presentations... Asia Online
Dirty Data SMT - what kind of translations would you expect? If there are no humans involved you are training on dirty data, says Asia Online. Sajan achieved 60% reduction in costs and 77% time savings for a specific project - a miracle? Probably not, let’s see.
Millions of words faster, cheaper, better translated by Sajan using Asia Online - is this phenomenal success transferable? How?
XLIFF contributed to the success of Sajan/Asia Online's MT project. Asia Online's process rejected 26% of TM training data.

Tweets from Martin Orsted, Microsoft presentation
@TheRosettaFound Cloud will lead to improved cycle times and scalability: 100+ languages, millions of words
Extraordinary scale: 106 languages for the next version of Office. Need a process that scales up & down in proportion.
Microsoft: We have fewer people than ever and we are doing more and more languages than ever
Martin: "The Language Game - Make it fun to review Office"... here is a challenge :) Great idea to involve community via game
How can a "Game" approach be used for translation? Levels of experience, quality, domains, complexity; rewards?
No more 'stop & go', just let it flow @robvandenburg >>Continuous publishing requires continuous translation. New workflows

Tweets from Derek Coffey, Welocalize presentation: Are we the FedEx or the Walmart of words?
@TheRosettaFound TMS of SDL = burning stacks of cash - Reality: we support your XLIFF, but not your implementation
Lack of collaboration, workflow integration, content integration = most important bottle necks. Welocalize, MemoQ, Kilgray and Ontram working on reference implementation - Derek: Make it compelling for translators to work for us
It's all about the translators and they will seek to maximise their earning potential according to Derek.

Tweets from Future Panel
@TheRosettaFound Many translators don't know what XML looks like
Rob: more collaborative, community translation - Rob: Users who consume content will have a large input into the translation BINGO
Tony: users will drive localisation decision, translation live
Derek: future is in cooking xxx? Open up a whole new market - user generated, currently untranslatable content. HUGE market
Derek: need to re-invent our industry, with focus on supply chain
The end of the big projects - how are we going to make money (question from audience)
From service/product to community - the radical change for enterprises, according to Fred
No spark, big bang, revolution - but continuous change, Derek
Big Spark (Dion): English will no longer remain the almost exclusive source language

The Translation Forum Russia Twitter trail has a much more translator-oriented focus and is also bilingual. Here are some highlights below, again with minor edits to improve readability.

@antonkunin Listened to an information-packed keynote by @Doug_Lawrence at #tfru this morning. As rates keep falling, translators' income keeps rising.
@ilbarbaro Talking about "the art of interpreting and translation" in the last quarter of 2011 is definitely outdated
Language and "quality" are important for translators, speed and competence for (final) clients. Really?
Translators are the weakest link in the translation process
Bert: here and now translation more important than perfect translation
Bert on fan subbing as an unstoppable new trend in translation
Is Bert anticipating my conclusions? Noah's ark was made and run by amateurs, RMS Titanic by professionals
Carlos Incharraulde: terminology is pivotal in translators training < Primarily as a knowledge transfer tool
To @renatobeninatto who said: Translation companies can do without process standards < I don't agree
@renatobeninatto: Start looking at income rather than price/rates
Evaluating translation is like evaluating haircuts - it's better to be good on time than perfect too late
Few translation companies do like airlines: 1st class/ Economy/ Discount rates – Esselink
Traditional translation models only deal w/ tip of iceberg. New models required for other 90%. Esselink
Good enough revolution. Good enough translation for Wikileaks, for example. Bert Esselink
In 2007 English Russian was $0.22 per word, in 2010 it dropped to $0.16 @Doug_Lawrence
There's much talk on innovation but not much action - don't expect SDL and Lionbridge to be innovative
@Doug_Lawrence all languages except German and French decreased in pricing from 2007 to 2010
@AndreyLeites @ilbarbaro problem-solving is the most important feature translator should acquire - Don't teach translators technology, teach them to solve problems - language is a technology, we need to learn how to use it like technology - 85% of translators are still women
@ilbarbaro 3 points on quality: 1. Quality is never absolute, 2. Quality is defined by the customer, 3. Quality can be measured - it is necessary to learn to define quality requirements (precisely)
@Kilgraymemoq announces that they will open Kilgray Russia before the end of the year

This is of course my biased extraction from the stream, but the original Twitter trail will be there for a few more weeks and you can check it out yourself. It is clear to me from the comments above that, at the enterprise level, MT and community initiatives will continue to gather momentum. Translation volumes will continue to rise, and production processes will have to change to adapt. I also believe there are translators who are seeking ways to add value in this changing world, and I hope they will provide the example that leads the way through this state of flux.

And for a completely different view of "the future" check this out.