
Monday, May 20, 2013

Highlights from ELIA Munich ND - Translation Pricing & PEMT Process Management

My view of a conference is usually determined by the quality of the sessions related to MT and translation automation, or sometimes by other sessions that trigger new thoughts on innovation and business process evolution. The ELIA conferences I have attended stand out for me because I think they have better content in general than most, and one actually learns new things. To me it is clear that business translation is evolving beyond a focus on software and documentation localization (“the SDL mindset”), and I look for content that recognizes and addresses these emerging issues and market imperatives.

One of the most interesting sessions, and perhaps the only one by a translation buyer, was entitled “How Cloud TMSs are Changing the Relationship Between a Translation Buyer and LSPs” by Elina Lagoudaki of Turner Broadcasting. She described how cloud-based technology is used to manage a growing stream of digital media localization projects. Turner is a good example of a translation customer who has many small jobs (micro translation), often involving social media content and usually closely linked to dynamic web content that needs to go out in 15 languages. Elina presented her very organized and structured process to identify, administer and supervise translation projects, and to provide final quality feedback to translators on an ongoing basis. Some things that she pointed out about her process included:
  • A preference for a SaaS or Cloud-based TMS solution (WordBee in her case) over inflexible, costly, arcane and management-heavy onsite solutions
  • The need for a management dashboard that allowed high level and job-specific status monitoring
  • A translation management environment that allows and facilitates collaboration between translators
  • A translation management environment that allows and facilitates online review and content sign-off
  • A translation management environment that allows and facilitates ongoing feedback to translators
  • A translation management environment that allows and facilitates terminology and TM collection and centralization
  • A translation management environment that allows and facilitates vendor comparison and selection
For those who still have doubts about how much sense cloud-based solutions make for many customers, Elina presented a very clear and articulate view on how her cloud solution was superior (to archaic client-server solutions) not just in terms of cost, but also in terms of scalability and ease and speed of customization, for her unique requirements and needs. 

Some of the things that stood out from her presentation in my mind included:
  • Half the translation work was done by agencies and half directly with individual freelance translators sourced via ProZ  - and it was interesting that she used the phrase “trusted translators” to describe how a subset of the freelancers had risen to this status because they had tuned in to the writing style of the company, were reliable, and thus favored on an ongoing basis.
  • Elina also presented a slide (shown below) that revealed the large variance in rates for the same language pair. This variance will of course raise questions in a buyer’s mind about whether there is a trade-off in quality or reliability of some kind, or whether the price is just what the vendor thinks the market will bear. The slide shows why the buyer should be wary and do the due diligence to understand what trade-offs, if any, they make with higher and lower prices. LSPs should also take great care to properly understand their costs, define prices and link them to well defined quality/service deliverables, as collaboration tools like WordBee will make these comparisons easier and easier to do.
The most striking point she made, to my mind, was a slide of how different LSPs responded to an RFQ where every agency was given the exact same job specification and was also promised 20+ more projects of this kind over the coming year. The task involved translation of 3 Flash banners, which means there was very little text to translate (5-10 words at most per string), but the translated text had to be placed in a Flash banner. So we are talking about maybe 30 words to be translated into 14 languages and delivered in a multimedia format. It is kind of shocking that she received quotes that ranged from $310 to $10,430 for the exact same job description. The actual price quotes she received are listed in her slide below.
image
This points to several problems in the translation industry that range from completely random pricing practices, lack of understanding of multimedia content and translation tasks, price gouging and business model mismatches to sheer unprofessional behavior. She characterized this as “a wild west approach” in the market where anything goes. There were clearly some in the room who were upset at being exposed, and I heard that some complained that too much information was shared. I think we will increasingly see more work involving multimedia content, coming in steady dribbles but critical to building trust and credibility with a customer. It turned out that the agency with the lowest quote also had a track record of success and reliability with Turner, and thus probably understood multimedia issues much better, and so did not impose huge price penalties for simply putting text into Flash. The companies with the highest price quotes clearly did not understand the complexity of the job or perhaps simply lacked scruples.
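
To put the spread in perspective, the two extreme quotes can be worked through in a few lines of Python. This is only a back-of-the-envelope sketch: the word count is my own rough "maybe 30 words" estimate from above, and the per-word figure lumps the Flash engineering work in with the translation itself.

```python
# Figures taken from the RFQ described above; SOURCE_WORDS is a rough estimate.
LOW_QUOTE_USD = 310
HIGH_QUOTE_USD = 10_430
SOURCE_WORDS = 30        # "maybe 30 words" in total across the 3 banners
TARGET_LANGUAGES = 14


def spread_ratio(low, high):
    """How many times larger the highest quote is than the lowest."""
    return high / low


def cost_per_target_word(quote, source_words, languages):
    """Quote divided by the total number of words delivered across all languages."""
    return quote / (source_words * languages)


ratio = spread_ratio(LOW_QUOTE_USD, HIGH_QUOTE_USD)
low_rate = cost_per_target_word(LOW_QUOTE_USD, SOURCE_WORDS, TARGET_LANGUAGES)
high_rate = cost_per_target_word(HIGH_QUOTE_USD, SOURCE_WORDS, TARGET_LANGUAGES)

print(f"Highest quote is {ratio:.1f}x the lowest")
print(f"Implied cost per delivered word: ${low_rate:.2f} to ${high_rate:.2f}")
```

Under these assumptions the highest quote is roughly 34 times the lowest, for an identical job description.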

This is related, to some extent, to an interesting UnSession discussion that I also attended, where a group of LSPs (plus Elina and me) discussed how one could respond to a potential customer who said that they already had a translation agency they were working with. Much of the discussion focused on identifying “problems” with the current vendor and thus displacing them, and to my mind only one of the LSPs had a compelling differentiation story. The session made three things clear to me:
  1. It is very easy to displace an LSP that is previously engaged with a customer if you can identify problems the customer is having with their current vendor.
  2. That quality and “service” are repeatedly used as differentiators but nobody can define either, in a way that is understandable or clear to a buyer.
  3. That very few LSPs understand the business of the customer and thus have great difficulty building trust.
The best strategy that I heard in the UnSession was from an LSP who had clear domain expertise and focus, who ONLY focused on building customer relationships in that domain, and who had a long-tenured in-house team that was expert in the subject domain and thus could add overall business value in the translation process. I would bet that that particular agency is very hard to displace, can charge premium prices, and is often viewed as a real trusted extension of its customer's organization.

Building trust is a critical foundation for long-term success in a service business, and this requires that there is real transparency, clear communication and a collaborative and cooperative business approach.

The session “Post-Editing from the LSP Perspective” highlighted many issues around the LSP experience of PEMT in the market today. The session had a strong focus on the management of the PEMT process, which included managing quality and cost/price expectations with the customer, selecting and training post-editors, and ensuring that source material quality is good, since this is an area that LSPs understand and where action can have a large impact on MT quality. Some highlights from the presentation:
  • Post-editors need to have a positive attitude (to MT), be flexible and  be “system-oriented” to provide constructive feedback,
  • The technical issues that the session focused on included capitalization and punctuation, and there was much talk about the issues in handling tagging with MT, which is as messy with MT as it is with humans,
  • Several examples of MT output with various error types were shown so that others could understand the nature of the problems and the challenges,
  • The problematic issue of proper compensation was discussed and most felt it was easier to properly determine this after the project is done, though Edit Distance, BLEU, Average Words/Hour and other effort measurement approaches were also discussed. It is interesting to me that my own post on this issue, written in March last year, is still the most popular post on my blog. I find that using “trusted translators” to establish a priori rates is a very reasonable and fair way to set compensation. However, this does require some skill with proper sampling technique. For some very specific guidelines on how this could be done a priori, check out this article from Asia Online.
  • A survey of ELIA members suggested that the average post-editor throughput was 5,189 words per day and that the range seen was from 1,500 to 10,000 words/day per post-editor. 
  • The presenters felt that there was an urgent need for a good PEMT tool that facilitates error detection and error correction, since it was felt that MT had very different error patterns than TM typically does.
  • The presenters also felt that dealing with low quality MT output was worse than TM 0% matches and should perhaps be penalized and charged at much higher rates, since the translator had to spend more time making this determination. Asia Online provides a solution for this problem by providing segment level confidence indicators and thus low quality segments could be pre-identified and processed differently to minimize the bad segment detection effort.
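
Since edit distance came up as one way to measure post-editing effort after the fact, here is a minimal sketch of the idea: a word-level Levenshtein distance between the raw MT output and the post-edited result, normalized to a 0-1 score. The function names and the normalization choice are my own assumptions for illustration; this is not the specific measure any presenter or vendor described.

```python
def word_edit_distance(mt_words, pe_words):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(pe_words) + 1))
    for i, mt_word in enumerate(mt_words, start=1):
        curr = [i]
        for j, pe_word in enumerate(pe_words, start=1):
            substitution_cost = 0 if mt_word == pe_word else 1
            curr.append(min(
                prev[j] + 1,                      # delete a word from the MT output
                curr[j - 1] + 1,                  # insert a word the editor added
                prev[j - 1] + substitution_cost,  # keep or replace a word
            ))
        prev = curr
    return prev[-1]


def post_edit_effort(mt_output, post_edited):
    """Edit distance normalized by the longer segment: 0.0 = untouched, 1.0 = rewritten."""
    mt_words = mt_output.split()
    pe_words = post_edited.split()
    longest = max(len(mt_words), len(pe_words), 1)
    return word_edit_distance(mt_words, pe_words) / longest


# Made-up example segment: one word replaced and one word inserted.
example_effort = post_edit_effort("the cat sit on mat", "the cat sits on the mat")
print(f"Post-edit effort: {example_effort:.2f}")
```

A score like this only captures how much text changed, not how long the editor had to think, which is one reason words per hour and other measures were discussed alongside it.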
image
What was missing from this discussion was a focus on the HUGE impact that the MT system being used has on the post-editing experience. While I admit that all the suggestions and findings presented at the session would be useful for almost any PEMT exercise, some MT systems are more adjustable and configurable and thus ensure a better and more productive PEMT experience. I know that within the Asia Online experience, MT systems go through several rounds of tests on small representative data subsets AND corrective actions BEFORE being put into production. During this MT system refinement process, high frequency problematic error patterns are identified and addressed to both minimize post-editor frustration and maximize throughput and productivity. This molding of the MT system can only be done with some very selective MT systems, but I think this is a critical step if you wish to avoid tedious, repetitive errors like many shown in the sample slides and maximize your ROI. The slide to the side shows how a typical Asia Online system evolves and shows which error types are the easiest to correct. In general, spelling, punctuation, capitalization and basic terminology errors are the easiest to permanently correct, and the grammar and syntax errors are the hardest to completely fix.

I found the session by Diego and Guillem Vidal – NOVA interesting, as here we have an LSP that has reached a level of competence with MT (with an expert partner) and is seeing that it can provide better productivity, better terminology control, faster turnaround and lower error rates even with medical domain content. Their actual experience resulted in a 6X increase in MT word volume over two years, to 10 million MT processed words in 2012. It is refreshing to see this type of competence when we still see examples of ignorant claims being presented as fact in articles published in Multilingual.

I also had a fireside chat session with Renato where we discussed industry trends. Much of the material we covered is summarized in a previous post where we talked about how volume is growing, continuing flows of micro translation tasks are increasing, and MT and automation are gaining by the day. One point we disagreed on was the impact of “new” translation focused ventures like Smartling, Cloudwords and Lingotek. I feel these initiatives are all very interesting and relatively innovative, and make the whole translation services purchase and management process much easier and simpler. Renato felt that while they had succeeded in raising money and had a “technology story”, they had yet to prove that they could provide the same level of “service”. Given that nobody can really define “service” with anything resembling clarity, I think it is quite possible that some of these new ventures could displace some LSPs (Multi-Language Vendors – MLV) and become the new aggregators of translation purchasing activity because they do the following things well:
  1. Simplify the translation purchasing process (without the slow and laborious and often customer hostile TEP mindset where the customer is always wrong),
  2. Eliminate the need for buyers, agencies and translators to keep multiple suites of incompatible CAT tools on hand, by simply ingesting translation related content into their technology infrastructure straight from the content creation systems (CMS) and returning the translated content straight back to the customer CMS via straightforward web-based interfaces,
  3. Handle small projects as well as large projects with equal ease and efficiency,
  4. Provide collaboration focused software infrastructure for translators, buyers and project managers in the cloud, so real work related conversations can happen without hundreds of emails with receipt notifications being used,
  5. Enable translators to spend most of their time focusing on linguistic work rather than dealing with file format conversions, tag management and data transformations before they actually get to the translate step,
  6. Easily handle multimedia, video and mobile content which will continue to grow in importance,
  7. Handle, and mix and match, different customer content types with different production methodologies, which include TEP, customized MT, crowdsourcing and productive and efficient PEMT.
Elina’s slide on what she would like to see in the future is a clear indication of what lies beyond the SDL (software and documentation localization) world for a modern buyer: more competence with multimedia, new business models for microtranslation, more innovation from tools vendors and better standards (so that data can flow more quickly and easily in and out of translation processes).
image
We live in a world where faster and cheaper production at “reasonable quality” is beginning to be linked to business survival. Companies that don’t get it done in time, or don’t get enough done in time, lose market share. As the number of smaller (SME) companies going global increases, they will likely find these new portals much easier to work with than the arcane and archaic client-server software world of SDL et al. Innovation matters more and more, and while I cannot say with any real assurance that these new portals are THE winners of the future, I would bet on them over those with the SDL mindset. Innovation usually means making it simpler and more efficient. You can see investors' lack of enthusiasm for the older model reflected in the stock market performance of both LIOX and SDL, as they trade at market capitalizations way below their annual sales. Even Google is working as an aggregator for video subtitling projects, in addition to offering their widely used MT, which I assure you many translators use on a regular basis. In Stefan’s session it was mentioned that the greatest trigger for organizational change is reaction to competitive action, but in this industry it seems that change is sneaking up in such a way that many don’t even realize it is happening.

I learnt that the people of München like their beers and potato balls large, in fact very large, as you can see from this photo of Irina Voronova’s hand versus the potato ball which was about the size of an American softball.
image
I asked Stefan Gentz, who is second to none in terms of conference attendance, what he thought the best conferences were of all those that he had attended, and his almost instant reply was that GALA Miami was the best in terms of balancing quality content with great networking opportunities.

I also got to walk around a bit and wandered into the English Gardens, which inspired Hans Cousto to develop his theories about The Cosmic Octave, which later led to the creation of the planetary series of gongs made by Paiste. For those in the know, Germany is a leader in the science of sound healing, something which is gaining acceptance as a way to deal with a variety of illnesses. As somebody who is interested in really good sound, and an amateur musician who plays the sitar, I find this quite interesting, even fascinating, and I really appreciate a culture and a people who would place a well tuned piano in a public park so that anybody could walk up and play it. In the brief time I was there I saw some very accomplished pianists walk up and play Bach and Beethoven, and also some who played that old favorite “Chopsticks”.
image
For a completely different view of the conference from the lovely ladies of WTH (who some lucky attendees got to meet au naturel in the sauna) check out their blog post on the event.

Wednesday, April 3, 2013

PEMT Case Study - Advanced Language Translation

The most active advocates of machine translation today are Fortune 100 companies especially in the IT industry and the translation agencies that serve them.  The large IT companies have used MT more widely than any other group. However, MT can also be used by smaller LSPs outside of this sphere, especially when they collaborate with experts. This is an example of one such case study which provides many specifics that might be illustrative and educational for others.

Corporate Translation & Localization Services

Advanced Language Translation (www.advancedlanguage.com) is a Rochester, NY based Language Service Provider (LSP) which has skillfully incorporated MT (machine translation) into its production process, after years of resisting the technology. CEO Scott Bass admits that this anti-MT stance caused them to miss out on some larger projects, as customers increasingly looked for service providers with a coherent automation strategy. Customers were looking for a partner who understood how to deploy machine translation in order to output cost-effective and high volume translation projects. After much debate, the company finally decided to jump on the MT train.

ALT began the process by identifying certain customers who were open to a collaborative PEMT (post-edited machine translation) production model. They then began to work with Asia Online in the summer of 2012 to develop MT engines for the selected clients. For ALT’s first MT project, engines were simultaneously developed for French, Spanish, Russian and Japanese; however, there were some issues that needed addressing in order to ensure successful completion of the project. The greatest challenge initially was the scarcity of data available to build and train the MT systems; in fact, data volume was so limited that the likelihood of producing usable systems with raw SMT (statistical machine translation) approaches like Moses was nil. The other challenge was building an engine for Japanese, as it is considered an especially difficult language for MT.

To remedy these issues, ALT collaborated with Asia Online to develop a terminology-driven data manufacturing strategy. They worked to build up critical data resources that enabled productivity enhancing systems to be developed, and they leveraged relevant monolingual data that was readily available to boost the engines’ capabilities in the domain of interest. ALT relied on the broad and deep experience of the Asia Online team to maximize and leverage their limited data assets and resources.

Additionally, ALT focused on using translators who had previous PEMT experience rather than using ones who either had no PEMT experience, or were not interested in working with PEMT output. Prior to establishing production deadlines and appropriate compensation rates, ALT sent several samples of MT output to the post-editors to ensure that the scope and difficulty of the work was well understood. Bass notes, “Many companies rush into ‘instant MT’ solutions, overlooking the fact that MT takes time to develop, and coordination among all parties. While it is possible to leverage MT systems once they have been built, practitioners must understand that there is a direct relationship between this initial effort and ongoing success with the engine.” He adds, “This outlook is critical to successfully leveraging MT in the long-run, and lack of it, is one of the main reasons why MT initiatives fail.”

ALT also allowed post-editors to set their own throughput rates based on their experience with the MT output samples produced by the customized systems. They discovered that on average this process resulted in throughput rates of 750 words per hour (6,000 words per day). For Japanese, the rate was lowered to 500 words per hour, as the MT systems produced lower quality output when translating between English and Japanese. After the throughput and MT quality issues were resolved, compensation was addressed by giving the editors a 25% premium over standard human editing rates. These parameters were established to the satisfaction of all parties for this initial “test” project, and it turned out to be successful by all accounts due to cooperation and skilled implementation.
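
The throughput and compensation figures above can be worked through as a quick sketch. The 8-hour day is implied by the case study's own numbers (6,000 / 750); the base editing rate used in the example is purely hypothetical and is there only to show how the 25% premium applies.

```python
# Implied by the case study: 6,000 words per day at 750 words per hour.
HOURS_PER_DAY = 6000 / 750   # 8.0


def daily_throughput(words_per_hour, hours_per_day=HOURS_PER_DAY):
    """Words a post-editor would cover in one working day."""
    return words_per_hour * hours_per_day


def pemt_rate(editing_rate_per_word, premium=0.25):
    """Post-editing rate per word: the standard editing rate plus a premium."""
    return editing_rate_per_word * (1 + premium)


average_per_day = daily_throughput(750)    # most language pairs
japanese_per_day = daily_throughput(500)   # lowered rate for Japanese

# HYPOTHETICAL base editing rate, not a figure from the case study.
example_rate = pemt_rate(0.04)

print(f"Average: {average_per_day:.0f} words/day, Japanese: {japanese_per_day:.0f} words/day")
print(f"PEMT rate at a hypothetical $0.04 editing rate: ${example_rate:.3f} per word")
```

At the same 8-hour day, the lowered Japanese rate works out to 4,000 words per day.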

image

PEMT Best Practices

Scott Bass summarizes lessons learned and gives advice for others undertaking MT initiatives:

  • Do not rush MT engine development. A higher quality engine takes longer to develop and may require multiple iterations to build it into a usable engine.
  • Pro-actively manage the expectations of all the people involved, including clients, project managers, post-editors and LSP sales and marketing personnel.
  • Ensure that post-editors understand the very specific nature of the work.
  • Ensure that MT output reaches a quality level where post-editing is similar to a light to moderate cleanup of a human translated segment.
  • Collect as much data as possible including TMs, in-domain monolingual data in the target language and core terminology. (ALT used MemoQ LiveDocs to quickly build corpora.)
  • Test the MT engines and benchmark them prior to starting actual production work.
  • Give the post-editors insight into the kinds of edits they will have to make by producing examples with smaller representative test data sets.
  • Focus on minimizing the most frequent errors first and understand that dumb repetition can kill enthusiasm faster than anything else.
  • Ensure that the MT engine is improving through feedback from post-editors. Ask for their feedback often and give them plenty of time and attention.
  • Retune and retrain the MT engine quickly and as frequently as possible. 
  • Make sure that the strengths of MT are clearly understood, and manage any weaknesses throughout the process.
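
As a minimal sketch of the "most frequent errors first" advice above, post-editor feedback can be tallied by error type so that the highest-volume problems are addressed before the next retraining round. The feedback format, the segment IDs and the error labels are all assumptions made for illustration; they are not ALT's or Asia Online's actual categories or tooling.

```python
from collections import Counter


def rank_error_types(feedback):
    """Rank error types by frequency.

    feedback: iterable of (segment_id, error_type) pairs reported by post-editors.
    Returns a list of (error_type, count) pairs, most frequent first.
    """
    counts = Counter(error_type for _, error_type in feedback)
    return counts.most_common()


# Hypothetical feedback log from a small test batch.
sample_feedback = [
    (1, "terminology"),
    (2, "capitalization"),
    (3, "terminology"),
    (4, "word order"),
    (5, "terminology"),
    (6, "capitalization"),
]

ranking = rank_error_types(sample_feedback)
for error_type, count in ranking:
    print(f"{error_type}: {count}")
```

Even a simple tally like this makes it obvious which repetitive corrections are wearing down the post-editors.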

Overall Benefits

ALT is a fantastic example of a company that has leveraged MT properly. The company has demonstrated that when MT is used with skill and when human factors are carefully managed, the benefits go beyond mere increases in productivity. ALT has found that overall business with accounts that ventured into MT has increased by over 75%. Bass notes, “In many cases, we gained preferred vendor status because we added MT to our service mix.”

Bass also emphasizes that sitting on the fence with regard to machine translation caused ALT to deny itself the possible benefits of an MT-HT production model for far too long. Tackling the business and human challenges first was actually the most difficult facet of shifting ALT’s production model. In fact, Bass comments, “The process of customizing an MT engine is not that much different than undertaking formal terminology development or managing high-quality translation memories. Extending our toolset to include MT has been a natural extension of skills we already had in place as an LSP.”

To hear an online presentation of this case study you can also go to the Asia Online website.

 

Tuesday, February 26, 2013

Dispelling MT Misconceptions

MT in 2013 is still a complex affair, and successful deployment as a productivity enhancing technology for business translation needs requires skills, expertise and understanding that are not commonplace. While it has become much easier to build basic custom engines using a variety of Instant Moses solutions or by creating a dictionary for an RbMT system, there are still very few who know how to coax MT system output to consistent productivity enhancing levels. Getting some kind of a basic engine up and running is NOT the same thing as having a production-ready, post-editor friendly system. There are even fewer who know what to do if the first MT attempt does not work, or is lackluster. Most of these basic/instant MT systems are inferior to basically free online MT from Microsoft and Google. Building long-term productivity and strategic production advantage requires much more skill, expertise and experimentation than most LSPs or users have access to, or care to invest in. While it is sometimes possible for a user to get usable MT output after throwing some data into an instant MT/Moses engine, it is not common, even for “easy” languages like Spanish, as several TAUS case studies reveal.

It is my sense that MT is still complex enough that meaningful expertise can only be built around one methodology, i.e. RbMT or SMT, and that anybody who tells you that they can do both should be viewed with some skepticism. It is almost certain that they cannot do both well, and also quite likely that they cannot do either well if they claim expertise in both, since very different kinds of skills are required. Specialization and long-term experience are necessary to build real competence with either approach.

We have reached a point today where many more MT systems are successful, but we also have many mediocre systems that do not provide any long-term production/productivity leverage and can easily be duplicated by any competitor with minimal investment. Today it is quite easy to find many (usually bad) examples of free/instant MT, but the best custom systems are still not widely known or commonplace. Good MT system development takes work and ongoing investment, and requires overall process modifications, communication and expectation management, not only technology investments.

Recently we have seen some articles in the blogosphere and even the mainstream professional translation press that continue to provide what I believe is a lop-sided and even somewhat disingenuous view of the verifiable use and known best practices of various MT technologies. (This link gets you to the full article.) In this particular case it is somewhat clear that the author has a preference and a bias favoring an RbMT approach where value-add is generally limited to building dictionaries.

The misinformation is typically around the following concepts:
  • Rules-Based vs. Statistical MT Comparisons
  • The scope and extent of possibilities with instant MT customization
  • The degree of expertise and experience required to develop skills in any of these  approaches

Firstly let me state my own biases:
  1. I think the Rules-based MT vs. Statistical MT arguments are largely irrelevant, even though I think it is increasingly evident that SMT is becoming the preferred approach, especially as more linguistics are added to the data-driven approach. To a great extent, most systems out there, except for raw Moses systems, are hybrids of some sort. Recently MT technology has evolved to a point where SMT and RbMT concepts are being merged into a single “hybrid” approach. While there is some overlap in these approaches, there are two primary hybrid models in use today:
    a) RbMT with SMT smoothing tacked on after the RbMT translation is completed (as with Systran), to help improve the fluency and quality of the often clumsy raw RbMT output, and
    b) Linguistically informed rules that modify the source text before SMT processes it and guide the SMT process, plus additional rules after SMT processing that normalize and adjust the translation output where required. This group also includes the newer syntax and morpho-syntactic SMT approaches, which have shown limited success and are still emerging.

    Finally, what really matters is how much productivity an MT system offers, and the RbMT vs. SMT issue is largely irrelevant. The objective is to get translation work done faster and more cost effectively.

  2. In the right hands, both approaches (RbMT or SMT) can work for projects where MT is suitable. However, there are many more user controls and much simpler options available to tune MT systems in the SMT world.
  3. In general I would say that it makes sense to specialize in one MT approach (SMT or RbMT) and go deep to understand what you can control and how it works, rather than take shallow and instant approaches. It takes work and extensive experimentation to develop real expertise in either approach, and there is nobody I know in the industry who can do both well. So choose RbMT or SMT and figure out what it takes to make it really work.
Some of the specific claims made and disinformation in the Multilingual article referenced above that I would challenge and dispute are as follows:

“In our experience, languages such as Japanese and German perform best with an RBMT approach.” This was actually true in the early SMT days (~2005-2007) but is simply no longer accurate. I have seen custom SMT (if done right) outperform customized RbMT systems in both these languages, even when large amounts of data are not available.

“If you do not have enough data - we're talking millions of segments of in-domain bilingual and monolingual segments - you may not have enough corpora to train an SMT engine.” This seems to me to be a statement often made by people who have little or very shallow experience with SMT. In the large majority of SMT systems I have been involved with, this volume of training data was simply not available. However, it is possible to get productivity enhancing SMT engines with even just 50,000 segments if you know what you are doing. This is possible even for languages like Japanese and Russian, as Scott Bass of Advanced Language Translation points out in this webinar, where this was done with a fraction of the data mentioned in this misleading statement. A large majority of Moses MT engines, especially those of the instant kind, are inferior to the free MT provided by Google and Microsoft. This is more likely to be related to a lack of understanding about the technology than to any fundamental deficiency in the basic technology or the data, as the Multilingual article suggests. If data privacy or copyright is not an issue, most LSPs would probably be better off using the Microsoft Hub option than some generic instant MT option or some LSP managed Moses effort.

“If the terminology is fixed in a narrow domain such as automotive or software documentation, RBMT or a hybrid is generally the best choice. This is because the rules component protects terminology better.” While this may be true for systems developed by naïve Moses users, many SMT experts like Asia Online have figured out that terminology really matters and know how to use it. Most of the corporate SMT systems out there focus exactly on automotive and IT product user documentation of various kinds, in addition to unstructured content. It is in fact possible to build a single Automotive engine (at Asia Online) and then tune it for different clients (Toyota, Honda etc.) and have the preferred terminology dominate IF you know what you are doing. See the diagram below for an example.
image

“Wild West content where the terminology runs all over the map and would be impossible to train for, such as patents, works better with SMT.” This again suggests the author's lack of experience with the patent domain and basic unfamiliarity with SMT technology. The largest terminology effort I have seen was with a patent engine where tens of thousands of scientific and technical terms were carefully translated to ensure accurate and useful translation of patent material. SMT benefits greatly from good, consistent terminology work and we have several customers (e.g. Sajan) who have gone on record to say that terminology consistency was one of the major benefits of an Asia Online engine. In fact, the strategy deployed by Asia Online in data-scarce situations usually begins with a tightly focused terminological foundation.

“However, if there are metadata tags, you should be aware that SMT doesn't preserve tags well, so RBMT or hybrid technology will save you some headaches.” While this may be true for many Moses efforts made by technically naïve and unskilled users, any SMT developer worth his or her salt knows how to resolve this problem easily. Asia Online handles all the formatting tags in XLIFF and TMX automatically and also provides a variety of tools that allow power users to do sophisticated handling of different kinds of formatting.
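To make the tag issue concrete, here is a minimal sketch of one common workaround: swap inline tags for opaque placeholders before translation and restore them afterwards. This is not a description of how Asia Online or Moses handles tags; the function names (`mask_tags`, `unmask_tags`) and the `__TAGn__` placeholder format are assumptions made purely for illustration.

```python
import re

# Hypothetical pattern for simple inline markup such as <b> or </b>.
TAG_RE = re.compile(r"<[^>]+>")
PLACEHOLDER_RE = re.compile(r"__TAG(\d+)__")


def mask_tags(segment):
    """Replace each inline tag with a numbered placeholder.

    Returns the masked segment and the list of original tags.
    """
    tags = []

    def _swap(match):
        tags.append(match.group(0))
        return f"__TAG{len(tags) - 1}__"

    return TAG_RE.sub(_swap, segment), tags


def unmask_tags(translated, tags):
    """Put the original tags back in place of their placeholders."""

    def _restore(match):
        index = int(match.group(1))
        # Leave unknown placeholders untouched rather than failing.
        return tags[index] if index < len(tags) else match.group(0)

    return PLACEHOLDER_RE.sub(_restore, translated)
```

In use, the masked segment is what gets sent to the engine, and `unmask_tags` is applied to whatever comes back. The sketch assumes the engine passes the placeholders through unchanged, which a real system would have to verify.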

“Today's SMT systems are still hampered by a lack of predictability, which means that translators waste a lot of time verifying terminology that already ought to be automatically verified.” Asia Online ran an experiment a few years ago using TDA data from multiple sources. It was discovered that combining data or using noisy data of any kind produces much lower quality MT systems. Understanding how to get the data clean and building a quality foundation makes ongoing maintenance and updating of the engine much easier and largely eliminates this unpredictability. We also discovered that consistent terminology in the TM ensures much higher quality results and thus at Asia Online we now have tools to ensure this. Again, if you know what you are doing this is a manageable issue, and after you have built a few thousand engines you realize that unpredictability can be managed by data cleaning and ensuring terminological consistency. Kevin Nelson, Managing Director of Omnilingua, stated in a webinar that the terminology and writing style produced by his Asia Online MT system was even more consistent than a human-only approach. This was specifically noticed by his end-client, who contacted Omnilingua directly without prompting to discuss how they had accomplished recent improvements in quality.
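As a rough illustration of what a terminology consistency check on a TM can look like, the sketch below flags segment pairs where a glossary source term appears but the approved target term does not. It is not Asia Online's tooling; the function name, the glossary format and the plain substring matching are all assumptions, and real tools would need to deal with inflection and tokenization.

```python
def find_term_inconsistencies(tm_pairs, glossary):
    """Return (segment_index, source_term, expected_target_term) tuples.

    tm_pairs: list of (source, target) strings from a translation memory.
    glossary: dict mapping a source term to its approved target term.
    Matching is case-insensitive substring matching, for illustration only.
    """
    issues = []
    for index, (source, target) in enumerate(tm_pairs):
        source_lower = source.lower()
        target_lower = target.lower()
        for source_term, target_term in glossary.items():
            term_in_source = source_term.lower() in source_lower
            term_in_target = target_term.lower() in target_lower
            if term_in_source and not term_in_target:
                issues.append((index, source_term, target_term))
    return issues
```

Segments flagged this way can be corrected or dropped before training, which is the kind of data cleaning that makes engine output more predictable.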
“When post-editing SMT, that next training cycle may be six months or a year away because you usually want a fair bit of new data accumulated before you begin the process of retraining. In this case, the post-editors are not empowered to make lasting changes and it typically takes until the next training cycle to see any progress at all.” This may actually be true for many Moses systems and for most naïve users of instant MT solutions. But for the higher value-add systems like the ones produced by Asia Online this is not true. There are two ways that SMT-based systems can incorporate corrective feedback:
  1. Real-time corrections that are used on each job and can easily be done by translators every single time they run a translation. Since there is no additional cost for retranslating the same content at Asia Online, users are encouraged to resubmit the translation until it is in better shape to hand over to a post-editor. Many dumb and high-frequency error patterns can be corrected instantly by some simple analysis and corrections based on small test translation runs.
  2. Periodic retraining, which is done when sufficient corrective feedback is available. Incremental training with Asia Online can be completed in just a few days, and even a few thousand segments are enough to show meaningful improvements, especially with terminology and high-frequency phrases.
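The first kind of feedback can be pictured as a small list of corrections collected from a test run and applied to every subsequent output. The sketch below is only an illustration of that idea under simple assumptions (whole-word, case-sensitive replacement); `apply_corrections` is a hypothetical helper, not an Asia Online feature.

```python
import re


def apply_corrections(mt_output, corrections):
    """Apply (wrong, right) whole-word replacements to MT output, in order.

    corrections: list of (wrong, right) string pairs, e.g. gathered by
    reviewing a small test translation run.
    """
    fixed = mt_output
    for wrong, right in corrections:
        pattern = re.compile(rf"\b{re.escape(wrong)}\b")
        # A function replacement keeps backslashes in `right` literal.
        fixed = pattern.sub(lambda _match, text=right: text, fixed)
    return fixed
```

For example, `apply_corrections("Click the okay button.", [("okay", "OK")])` yields `"Click the OK button."`. Corrections that keep recurring are good candidates to feed into the next retraining cycle.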

Perhaps the biggest misconception of all is that More Data is Always Better.  We now have much more evidence that this is frequently not true. Even Google, the high priest of big data, admitted this some time ago: "We are now at this limit where there isn't that much more data in the world that we can use"

So be careful not to believe everything you read (including on this blog), and if you take more than a glancing look at MT technology today you will probably understand that while it is becoming much simpler to play and experiment with MT, it is still a long way from being easy to produce production-quality systems that provide long-term business leverage. Do not underestimate the expertise required to be successful with MT, and realize that even after jumping in with Asia Online or others it will take ongoing changes in process and human factor management to really achieve long-term cost advantages and build sustainable business leverage. The reward for those who figure this out will be clear differentiation and long-term production cost advantages that others with instant MT or home-brewed Moses systems will never be able to match.

MT is messy and not quite as predictable as most want it to be yet. You have to have a stomach for uncertainty and are probably better off with "real experts" than people who say they can do it all and are "technology agnostic". And the next time you see an article that says they have all the answers for you and that for a nominal service charge you could reach nirvana tonight just tell them: "Don't you jive me with that cosmic debris!". 

Watch this video and feel your face melt at 4:50 when the guitar solo happens.