Thursday, December 29, 2016

The Machine Translation Year in Review & Outlook for 2017

“I have given the record of what one man thought as he pursued research and pressed his hands against the confining walls of scientific method in his time. But men see differently. I can at best report only from my own wilderness.”

---- Loren Eiseley

This is a slightly expanded version of a presentation I gave earlier this month together with Tony O'Dowd of Kantan, who focused more on Kantan's accomplishments in 2016. The Kantan webinar can be accessed in this video, though be warned that the first few minutes of my presentation have some annoying scratchy audio disturbance. Tony speaks for the first 23 minutes and then I do, followed by some questions.

2016 was actually a really good year for machine translation technology, as MT had a lot more buzz than it has had in the past 10 years and some breakthrough advances in the basic technology. It was also the year I left Asia Online, and got to engage with the vibrant and much more exciting and rapidly moving world of MT outside of Thailand. As you can see from this blog, I had a lot more to say after my departure. The following statements are mostly just opinions (with some factual basis) and I stand ready to be corrected and challenged on every statement I have made here. Hopefully some of you who read this may have differing opinions that you may be willing to share in the comments.

MT Dominates Global Translation Activity

For those who have any doubt about how pervasive MT is today (whatever you may think of the output quality), the following graphic makes it clear. To put this in context, Lionbridge reported about 2B words translated in the year, and SDL just informed us earlier this month that they do 100M words a month (TEP) and over 20B words/month with MT. The words translated by MT vendors, together with those from the large public engines around the world, would probably easily add up to over 500B MT words a day! Google even provided us some sense of what the biggest languages are, if you look closely below. My rough estimation tells me that this means that the traditional translation industry does about 0.016% of the total words translated every day or that computers do ~99.84% of all language translation done today.
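To make the scale comparison concrete, here is a minimal sketch that puts the volumes quoted above on the same words-per-day basis. It uses only the figures mentioned in this post (2B words/year, 100M words/month, 20B words/month, and my rough 500B words/day estimate); the function names and the 365-day year are my own assumptions, and this is back-of-the-envelope arithmetic rather than a market model.

```python
# Back-of-the-envelope comparison of the word volumes quoted in the post.
# Function names and the 365-day year are illustrative assumptions.

DAYS_PER_YEAR = 365
MONTHS_PER_YEAR = 12


def words_per_day(volume, period):
    """Convert a word volume stated per 'day', 'month' or 'year' to words/day."""
    if period == "day":
        return volume
    if period == "month":
        return volume * MONTHS_PER_YEAR / DAYS_PER_YEAR
    if period == "year":
        return volume / DAYS_PER_YEAR
    raise ValueError(f"unknown period: {period}")


def share_of_total(part_per_day, total_per_day):
    """Return part as a percentage of total."""
    return 100.0 * part_per_day / total_per_day


# Rough estimate from the post: all MT engines combined, per day.
MT_WORDS_PER_DAY = 500e9

quoted = {
    "Lionbridge (as reported)": words_per_day(2e9, "year"),
    "SDL TEP": words_per_day(100e6, "month"),
    "SDL MT": words_per_day(20e9, "month"),
}

for name, per_day in quoted.items():
    pct = share_of_total(per_day, MT_WORDS_PER_DAY)
    print(f"{name}: {per_day / 1e6:,.1f}M words/day, "
          f"{pct:.4f}% of the daily MT estimate")
```

Whatever the exact inputs, the point stands: even the largest human translation operations amount to a tiny fraction of a percent of the daily MT volume.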

Neural MT and Adaptive MT Arrive

2016 is also the year that Neural MT came into the limelight. Those who follow the WMT16 evaluations and people at the University of Edinburgh already knew that NMT systems were especially promising, but once Facebook announced that they were shifting to Neural MT systems, it suddenly became hot. Google, Microsoft and Baidu all acknowledged the super coolness of NMT, and the next major announcement was from SYSTRAN, about their Pure NMT systems. This was quickly followed a month later by Google's over-the-top and somewhat false claim that their new NMT system produces MT as good as "human-quality" translation. (Though now Mike Schuster seems to be innocent here, and it looks like some marketing dude is responsible for this outrageous claim.) This claim sent the tech press into hyper-wow mode and soon the airwaves were abuzz with the magic of this new Google Neural MT. Microsoft also announced that they do 80% of their MT with Neural MT systems. By the way, the list also tells you which languages account for 80% of their translation traffic. KantanMT and tauyou also started experimenting, and SDL has been experimenting for some time, BUT experiments do not a product make. And now all the major web portals are focusing on shifting to generic NMT systems as quickly as possible.

2016 also saw the arrival of production Adaptive MT systems. These systems, though based on phrase-based SMT technology, learn rapidly and dynamically as translators work. A company called Lilt was the first to market and is the current market leader with Adaptive MT, but SDL is close behind and 2017 could present quite a different picture. ModernMT, an EU-based initiative, has also shown prototypes of their Adaptive MT. Nobody has been able to build any real momentum with translators yet, but this technology is a very good fit for an individual translator.

2016 also saw Microsoft introduce several speech-to-speech related MT offerings, and while this has not gotten the same publicity as NMT, these speech initiatives are a big deal in my opinion, as we all know from Star Trek that speech is the final frontier for automated translation.

The Outlook for 2017 and Beyond

Neural MT Outlook

Given the evidence of improved output quality and the fact that Facebook, SYSTRAN, Google, Microsoft, Baidu, Naver and Yandex are all exploring and deploying Neural MT, it will continue to improve and be more widely deployed. SYSTRAN will hopefully provide evidence of domain adaptation and specialization with PNMT for enterprise applications. And we will see many others try to get on the NMT wagon, but I don't expect that all the MT vendors will have the resources available to get market-ready solutions in place. Developing deliverable NMT products requires investment that for the moment is quite substantial and will require more than an alliance with an academic institution. However, the success of NMT, even at this early stage, suggests it is likely to replace SMT eventually as training and deployment costs fall or as quality differentials increase.

Adaptive MT Outlook

While Phrase-Based SMT is well established now, and we have many successful enterprise applications of it, this latest variant looks quite promising. Adaptive MT continues to build momentum in the professional translation world and it is the first evolution of the basic MT technology that is getting positive reviews from both experienced and new translators. While Lilt continues to lead the market, SDL is close behind and could change the market landscape if they execute well. ModernMT may also be a player in 2017 and will supposedly create an open source version. This MT model is driven by linguistic and corrective feedback, one sentence or word at a time, and thus is especially well suited to be a preferred MT model for the professional translation industry. It can also be deployed at an enterprise level or at an individual freelancer level and I think Adaptive MT is a much better strategy than a DIY Moses approach. Both Lilt and SDL have better baselines to start with than 95% (maybe 99%) of the Moses systems out there, and together with active corrective feedback can improve rapidly enough to be useful in production translation work. Remember that a feedback system improves upon what it already knows, so the quality of the foundational MT system also really matters. I suspect in 2017 these systems will outperform NMT systems, but it would be great for someone to do a proper evaluation to better define this. I would not be surprised if this technology also supersedes the current TM technology and becomes a cloud-based Super-TM that learns and improves as you work.

Understanding MT Quality

Measuring MT quality continues to be a basic challenge and the research community has been unable to come up with a better measure than BLEU. TAUS DQF is comprehensive but too expensive and complicated to deploy consistently, thus not as useful. Neither Neural MT nor Adaptive MT can really be accurately measured with BLEU, but practitioners continue to use it in spite of all its shortcomings because of longitudinal data history. We are seeing that small BLEU differences are often seen as big improvements by humans with NMT output. Adaptive MT engines that are actively being used can have a higher BLEU every hour, and probably what matters more is how rapidly the engine improves, rather than a score at any given point. There are some in the industry who diligently collect productivity metrics and then also reference BLEU and something like Edit Distance to create an Effort Score that over time can become a very meaningful and accurate measurement in a professional use context. A more comprehensive survey is something that GALA could sponsor to develop much better MT metrics. If multiple agencies collaborate and share MT and PEMT experience data we could get to a point where the numbers are much more meaningful and consistent across agencies.
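For readers who have not worked with Edit Distance directly, here is a minimal sketch of the kind of calculation involved: a word-level Levenshtein distance between the raw MT output and its post-edited version, normalized by the length of the post-edited text. This is only one plausible ingredient of an Effort Score. The function names are my own, the whitespace tokenization is a simplifying assumption, and real practitioners would combine something like this with productivity data and other metrics in ways I am not trying to reproduce here.

```python
# Word-level edit distance between MT output and its post-edited version.
# A simplified illustration; tokenization by whitespace is an assumption.


def word_edit_distance(hypothesis, reference):
    """Levenshtein distance over word tokens (insert, delete, substitute)."""
    hyp = hypothesis.split()
    ref = reference.split()
    # prev[j] = distance between the hyp words seen so far and ref[:j]
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i]
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            curr.append(min(prev[j] + 1,          # delete a hyp word
                            curr[j - 1] + 1,      # insert a ref word
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def edit_rate(mt_output, post_edited):
    """Edits per post-edited word; 0.0 means the editor changed nothing."""
    ref_len = len(post_edited.split())
    if ref_len == 0:
        return 0.0 if not mt_output.split() else float("inf")
    return word_edit_distance(mt_output, post_edited) / ref_len


mt = "the cat sat in mat"
pe = "the cat sat on the mat"
print(word_edit_distance(mt, pe))   # one substitution plus one insertion
print(round(edit_rate(mt, pe), 3))
```

Tracked over many segments and over time, a number like this says more about an actively learning engine than a single BLEU score taken at one moment.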

Post-Editing Compensation

I have noticed that a post I wrote in 2012 on PEMT compensation remains one of the most widely read posts I have ever written. PEMT compensation remains an issue that causes dissatisfaction and MT project failure to this day. Some kind of standardization is needed, to link editing and correction effort and measured MT quality in a trusted and transparent way. Practitioners need to collect productivity, editing effort and automated score data to see how these align with payment practices. Again, GALA or other industry associations can collaborate to gather this data and form more standardized recommendations. As this type of effort/pay/quality data is gathered and shared, more standardized approaches will emerge and become better understood. There is a need to reduce the black art and make all the drivers of compensation clearer. PEMT compensation can become as clear as fuzzy match related compensation in time, and will require good faith negotiation in the interim while this is worked out. While MT will proliferate and indeed get better, competent translators will become a more scarce commodity and I do not buy the 2029 singularity theory of perfect MT. It would be wise for agencies to build long-term trusted relationships with translators and editors, as this is how real quality will be created now and in the future.

Wishing you all a Happy, Healthy & Prosperous New Year

Thursday, December 22, 2016

SYSTRAN’s Continuing Neural MT Evolution

Recently, I had the opportunity and kind invitation to attend the SYSTRAN community day event where many members of their product development, marketing, and management team gathered with major customers and partners.

The objective was to share information about the continuing evolution of their new Pure Neural MT (PNMT) technology, share detailed PNMT output quality evaluation results, and provide initial customer user experience data with the new technology. Also, naturally such an event creates a more active and intense dialogue between company employees and customers and partners. This, I think, has substantial value for a company that seeks to align product offerings with its customers' actual needs.

Ongoing Enhancements of the PNMT Product Offering

The event made it clear that SYSTRAN is well down the NMT path, possibly years ahead of other MT vendors, and provided a review of the current status of their rapidly evolving PNMT technology.

Some highlights from my perspective:

Alex Rush from the Harvard NLP Department made an enthusiastic and valiant attempt to explain how NMT works, but I think lost most of the audience by the time he got to how LM/Softmax works and how dense features enable discrete predictions. Ironically the crowd was pretty glassy-eyed and stupefied by the time he got to attention mechanisms, LSTMs and the thousands of hidden dimensions underlying each machine translation. However, for those who seek these kinds of details, Harvard and SYSTRAN are making much of this information available as open source via the OpenNMT project at Harvard University.

Jean Senellart, SYSTRAN CEO and CTO, explained that they had already moved to the second generation production versions of the PNMT engines that had been released in October, and that they continued to see meaningful improvements in output quality, even though they are not using Internet-scale training data volumes.

He also announced 15 new language pairs that have been completed to make for a total of more than 45 language pairs. These new languages include French <> Chinese, English <> Russian, English <> Hungarian, English <> Hebrew, among others.

He pointed out how they had overcome one of the major hurdles of deploying NMT technology: the real-time translation throughput issue. SYSTRAN has found a way to deliver a production PNMT engine that runs just under 20% slower than their current hybrid (PB-SMT) systems.

Expert Human Evaluation Of PNMT Output

There was a very interesting presentation by Heidi Depraetere of CrossLang who is running a comprehensive human and automatic evaluation of ALL the PNMT systems output results. She is one of the few in my opinion, who can do a meaningful and accurate evaluation that will stand serious scrutiny and provide true insight into the MT output quality from a professional translator perspective. Her personal enthusiasm about the PNMT technology was quite revealing for me, as she has been around, and has an “MT Reality Meter” that I trust. Among other things, Heidi reported that:

There is a global improvement on all standard evaluation metrics like BLEU, TER, RIBES for a variety of languages with some examples shown below.

For English to French, evaluators clearly preferred SYSTRAN NMT when compared with the output from online web-scale SMT engines (Google, Bing).
She presented evidence of a strong preference by evaluators for PNMT output from an IT domain-adapted EN > KO engine. When comparing PNMT and professional translation:
- For 38% of the sentences, PNMT is preferred
- For 54% of the sentences, Human Translation is preferred
- For 18% of the sentences, PNMT translation is judged equivalent to Human

This means that 1 in 2 sentences produced by the PNMT engine was either preferred or seen to be as good as professional human translation.

She also provided some very interesting data on the types of errors by language, which is too detailed to go into here, but was quite revealing of data bias and other data related problems. Given the scale of the evaluation, more tests are still underway, and SYSTRAN will share this information as it becomes available.

Customer User Experience Findings

Several customers also presented their user experience, with several LSPs vouching for the PNMT improvements over previous systems, e.g. see the Lionbridge comments below, or here is a gushing report from Lexcelera. The most interesting use case study (for me) was presented by the travel guide publisher, Petit Futé, who can now custom publish-on-demand, heavily personalized and unique one-of-a-kind travel guides that may draw data from several source languages, into a selected target language, at a customer accepted quality level, driven by PNMT. This is something they call Augmented Tourist 2.0, which allows a customer to create a unique travel plan, and then print a custom travel guide book to provide specific information just for that unique trip, aggregating both user review and generic tourist information. This is an approach that could also be used by other kinds of popular specialized domain publishers like Romance Novellas, Hatha Yoga Manuals, Multi User Online Gaming Guides etc.

Near Term Improvements Coming in 1H2017

Jean later showed a brief demo of a capability he called Predictive Translation, which gives PNMT some of the Interactive MT/AutoSuggest capabilities that competitive products have. He also described additional capabilities to handle unknown words, which are a major concern for many in their initial explorations of Neural MT.

In the first quarter of 2017, the SYSTRAN PNMT engines will incorporate the broad infrastructural complementary functionality that is available to all SYSTRAN MT engines. Customers will benefit from the full power of this new engine in their current solution platform with all the same functionality, such as processing many file formats, customization with user dictionaries and translation memories, real time translation, named-entity recognition and integration of the engine into the Microsoft Office tools.

SYSTRAN will also work to transfer the compute intensive cloud PNMT system onto a somewhat scaled down translation server, to enable on-premise server installation and even mobile phone implementations, hopefully without compromising translation quality too much.

Domain Adaptation and Specialization

One of the major criticisms of NMT is that it is not ready for professional business use because it cannot be customized, or domain-adapted, for each enterprise customer like the most successful PB-SMT systems are today. Critics say that NMT is only a technology for generic system use. The training process is so expensive and slow that it is not feasible to use NMT for enterprise systems today, say the critics. However, SYSTRAN plans to do exactly that over the next few months, and has already begun beta testing as the EN>KO IT system tests described above suggest. NMT offers new approaches to customization that can be quick and not require the slow and expensive initial training period. This is not PB-SMT and new things are possible.

With its new PNMT engine, SYSTRAN can optimize neural networks in a new post-training process called “specialization”. Think of this as fast engine customization for unique customer or project needs, almost like Adaptive MT. This method significantly improves the quality of translation in record time.

Jean Senellart, SYSTRAN CEO, explains: “Adaptation of translation to a specific domain such as legal, marketing, technical, pharmaceutical, is an absolute necessity for global companies and organizations. Offering professionals specialized translation solutions in their trade terminology has been part of SYSTRAN's DNA for many years. This new generation of neural engines offers new capabilities in domain adaptation. PNMT is able to adapt a generic model to new data and even to each translator. Generic NMT is undoubtedly a great improvement in translation technology, but “Specialized NMT” is the technology that will truly help companies meet their global challenges.”

Impact on MT Market Dynamics

The growing feedback on the significantly better NMT Google engines also suggests that a change is coming. While SYSTRAN is easily 12 to 18 months ahead of most other MT vendors that have any possibility of delivering a market-ready Neural MT product, it is now becoming increasingly clear that the whole Expert MT Vendor market will start to move towards Neural MT. But the shift to NMT is expensive and complex and most other MT vendors do not have the funding, manpower and expertise to jump into NMT implementation easily. Having an alliance with an academic partner and funding an occasional experiment is not enough.

Open source will ease the hurdles and interestingly SYSTRAN is aiding this through the Harvard OpenNMT project, but the funding needs to properly undertake NMT development will keep this a game only for the big boys and the really smart small ones. Hopefully we do not see the same DIY foolishness we saw with SMT and Moses, as this is not a proper realm for LSPs to play in, even for very large ones. In my opinion the only one who could do this competently is SDL. Ask any one of the real NMT experts to explain how NMT works, if you have any doubt about my skepticism.

While I still do believe that Adaptive MT will produce the highest quality MT output in the near-term (2017), I think the Adaptive MT vendors too are heading towards Neural MT. Unless SYSTRAN can surprise us by providing the market with a rapid, robust, high-quality specialization capability, I would still expect that a properly engaged Adaptive MT system will produce the best professional MT engines in terms of output quality and translation productivity in the near term. 2017 will probably be the last year that phrase-based SMT systems will dominate in the professional and enterprise use arena. The Google, Microsoft, FB, Naver, Baidu evidence is clear: NMT is the way forward for generic systems, with improvements that are good enough to justify huge increases in deployment hardware costs. However, if SYSTRAN solves the domain adaptation and specialization challenge, and the large vocabulary issues, at enterprise scale, sooner rather than later, I think we will see a more rapid transition even for the smaller but usually higher quality professional translation MT world.

These are exciting times again for the MT world, as now, once again we are seeing people take big strides forward. SYSTRAN also briefly presented some new product concepts that look interesting and I hope to cover that in more detail in future. BTW, I am also talking to SDL about their Adaptive MT technology and will report on that shortly.

I wish you all a happy, healthy and joyous holiday season. And a prosperous and happy new year.

Tuesday, December 20, 2016

Private Equity, The Translation Industry & Lionbridge

Lionbridge has recently been in the news as the story of their “agreement” to be acquired by H.I.G. Capital spread. I thought it would be useful to examine more closely what Private Equity is and what PE firms do, since I found much of the commentary on the Lionbridge deal unclear, unsatisfying and quite unlikely to be accurate, as it seems to overlook why PE gets involved at all. While I am at it, I have also added Luigi’s opinion, which is also not a translation industry mainstream opinion, to provide alternate perspectives to the ones we have been hearing.

From my early business career when I worked close to Wall St. and institutional investment, private equity meant firms like Kohlberg Kravis Roberts, The Blackstone Group, and Apollo Global Management. They were often called corporate raiders, and, in fact, they made a movie about a PE raider, called Wall Street, where Hollywood made PE rock stars like Gordon Gekko look bad, and suggested he/they lived by an ethos that “greed is good”, destroying companies to extract a few million dollars to buy a private yacht. Maybe that was an exaggeration, but it is true that PE generally went into companies that they felt were undervalued or under-performing and “cleaned up and improved the situation”. They showed them “how to do it right” as Frank Zappa would say. In fact, there is research that suggests that private equity investors are corporate doctors. Perhaps things have changed since the 90’s, and now Private Equity is a much more friendly and fluffy experience, but my ample gut tells me that generally investors want to make money, and they don’t acquire assets to get a board seat and hang out with the management. They generally want change that results in them making MORE money.

An informative Harvard Business School draft paper provides some insight on the whole Private Equity view of the world. The fact that they base their observations on a study of 79 PEGs (Private Equity Groups) with $750B under management suggests to me that there could be some truth here. Top Private Equity Companies are listed here; HIG Capital, which acquired LIOX, is #70 in this list.

What do Private Equity firms do?

PE firms typically buy controlling shares of private or public firms, often funded by debt, with the hope of later taking them public or selling them to another company in order to turn a profit. Private equity is generally considered to be buyout or growth equity investments in mature companies.

What do PE firms do after they invest?

PE firms typically take three types of value increasing actions — financial engineering, governance engineering, and operational engineering. These value-increasing actions are not necessarily mutually exclusive, but it is likely that certain firms emphasize some of the actions more than others.

In financial engineering, PE investors provide strong equity incentives to the management teams of their portfolio companies. At the same time, debt puts pressure on managers not to waste money. In governance engineering, PE investors control the boards of their portfolio companies and are more actively involved in governance than public company directors and public shareholders. In operational engineering, PE firms develop industry and operating expertise that they bring to bear to add value to their portfolio companies.

In fact, because PE firms often fund investments with debt, they are usually very concerned that these investments can service the debt organically, i.e. without injecting more money. So they are often all about cash flow and look for companies that can service the debt while they do their restructuring magic. It is said that they have a pretty strict requirement on return and look for internal rates of return that are in the 20-25% range. The Harvard paper suggests that there is often evidence of both financial and governance engineering. PE investors say they provide strong equity incentives to their management teams and believe those incentives are very important. They regularly replace top management, both before and after they invest. And they structure smaller boards of directors with a mix of insiders, PE investors, and outsiders. These boards are much more concerned about real progress and have much more clout and impactful involvement than in a "normal" public company setting where a board may allow a non-performing CEO to sit for 17 years without asking any probing questions.

There is evidence that the sweet spot for private equity is a company doing okay in an industry whose fortunes are about to improve dramatically.

Several characteristics of the PE business model directly impact the operations of their portfolio companies:

First, private equity investments are illiquid and more highly leveraged than investments in publicly traded companies–hence, riskier. They need to yield high returns to be worth undertaking.
Second, the high debt that portfolio companies must service means they must quickly achieve an increased and predictable cash flow. Cutting costs by squeezing labor is the surest way to accomplish this.
Third, the PE model is the opposite of "patient capital." While limited partners make a long-term commitment to the PE fund, portfolio companies have only a short time to show results to the general partners.
Finally, PE will not undertake long-term investments in its portfolio companies unless capital markets are efficient and reward such investments with a higher price when the company is sold. There are definite incentives to turn around investments as quickly as possible to maximize internal rates of return.
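The incentive to exit quickly is easy to see with a little arithmetic. Here is a minimal sketch, assuming the simplest possible deal: one investment at the start, one sale at the end, and no interim cash flows, fees, dividends or debt paydown. The 2x exit multiple and the function names are purely hypothetical choices of mine, not anything from the HIG deal; only the 20-25% target range comes from the discussion above.

```python
import math

# Simplified internal rate of return for a single investment and a single
# exit. Real deals have interim cash flows, fees and leverage effects that
# this sketch deliberately ignores.


def annualized_return(exit_multiple, years):
    """IRR when money in grows to exit_multiple times itself after `years`."""
    if exit_multiple <= 0 or years <= 0:
        raise ValueError("exit_multiple and years must be positive")
    return exit_multiple ** (1.0 / years) - 1.0


def max_holding_period(exit_multiple, target_irr):
    """Longest holding period (years) that still meets target_irr."""
    if exit_multiple <= 1 or target_irr <= 0:
        raise ValueError("need exit_multiple > 1 and target_irr > 0")
    return math.log(exit_multiple) / math.log(1.0 + target_irr)


HYPOTHETICAL_MULTIPLE = 2.0  # sell for twice what was invested (assumption)

for years in (3, 5, 7):
    irr = annualized_return(HYPOTHETICAL_MULTIPLE, years)
    print(f"{HYPOTHETICAL_MULTIPLE:.0f}x in {years} years -> IRR {irr:.1%}")

for target in (0.20, 0.25):
    limit = max_holding_period(HYPOTHETICAL_MULTIPLE, target)
    print(f"To reach {target:.0%} IRR on a 2x exit, sell within {limit:.1f} years")
```

The same doubling of value clears a 20-25% hurdle if it happens in about three years, and falls well short if it takes five or more, which is why the clock matters so much to the general partners.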

So the research definitely seems to suggest that HIG is not here to just be in the wonderful language loving translation business. They want returns, and they probably want returns soon. As they say on their website: Flexibility, speed of execution, and delivering on commitments are our hallmarks.

Private equity firms typically excel at putting strong, highly motivated executive teams together. Good private equity firms also excel at identifying the one or two critical strategic levers that drive improved performance. They are renowned for excellent financial controls and for a relentless focus on enhancing the performance basics: revenue, operating margins, and cash flow. Plus, a governance structure that cuts out a layer of management - private equity partners play the role of both corporate management and the corporate board of directors - allows them to make big decisions fast.

The following data certainly suggests that Lionbridge (LIOX) was a lackluster if not a failing public company. At no point in the last 10 years did the company's market valuation ever reach its revenue levels. This is usually a sign that something is not quite right. But the LIOX board sat with this for over a decade. Its market value is terrible even relative to other peer translation companies as the table below shows. In October 2008 the $500M revenue company had a market value of about $70M! It is surprising that HIG is apparently making no real management and strategy changes, and from all reports will continue with the same management and the same strategy. Some additional things to note:

The company is being sold for close to the highest value it has had in 10 years and investors in LIOX must be sighing in relief that they did not use an average of the stock price of a longer time period to establish the acquisition price.
The management team has been mostly the very same for the last 17 years and, it is said, will remain in office.
The CTO, Eric Blassin, very suddenly left and joined TransPerfect as Luigi has pointed out in his comments below.
Lionbridge has made less than $50 million net income in total from 2011-2015. This is less than 2% of revenue which I suppose is better than losing money.

Perhaps HIG does not really want to discuss their real plans at this point. As a Limited Partner, I would certainly be concerned about the rationale behind this investment, if no changes are planned, and you are simply expecting to ride the coming translation industry wave of growth. But stranger things have happened, and we do live in a world where the #americanbuffoon Don Trump is the next President of the US.

Interestingly, we now have an attempted action by some investors who think the price is too low and want more cash! Luckily for them a 10-year average was not used as the takings would be a lot slimmer. The odds of this class action suit leading to any changes are pretty slim, since the data overwhelmingly suggests that the investors were lucky to get $360M.

Private Equity investing regularly in an industry is usually a sign that they think that things can be done better (more efficiently, effectively, profitably) and that they can expedite this. We shall have to wait and see if big changes are indeed made, or wait and see if we suddenly start seeing really savvy and skillful execution, and market-leading strategy coming from Lionbridge. That is rare enough that we would all notice. ;-)

I think it is also pretty safe to say that when a private equity firm gets involved with a translation company, it is a clear signal that the PE firm thinks that the company in question can be run more efficiently, and they believe strongly enough in this possibility that they would be willing to put some money on it and prove it.

On a completely different note, I read today that Mike Schuster at Google, who I had accused of hyperbole and deception, apparently did not care for the outrageous MT performance claims that were made either. This is described in a fascinating story in the NYT on how Google developed their NMT. He seems like a very smart, nice, chap with a "piston-shaped head", and I hope that I meet him some day and laugh with abandon. Here is the specific quote from this great background story:

At the party, Schuster came over to me to express his frustration with the paper’s media reception. “Did you see the first press?” he asked me. He paraphrased a headline from that morning, blocking it word by word with his hand as he recited it: ‘GOOGLE SAYS A.I. TRANSLATION IS INDISTINGUISHABLE FROM HUMANS’. Over the final weeks of the (technical overview) paper’s composition, the team had struggled with this; Schuster often repeated that the message of the paper was “It’s much better than it was before, but not as good as humans.” He had hoped it would be clear that their efforts weren’t about replacing people but helping them.

The section below the dashed line is Luigi Muzii's take on the deal.

↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓

------------------------------------------

The news everyone in the translation community is frantically analyzing and commenting on is emblematic of the irrelevance of the industry itself.

Rory Cowan summarized the experience of the company he is most probably going to keep leading for the next few years with a statement that is as simple as it is disruptive: “The U.S. markets really do not appreciate the translation industry.”

While most commenters are looking at the company’s financial data, this does not seem to be the most relevant. The really important thing to think about is that the money shareholders will receive from H.I.G Capital for buying Lionbridge is no big deal.

The other important thing is Lionbridge’s former CTO’s relocation to TransPerfect. Both Lionbridge and TransPerfect have been investing in technology over the last twenty years, although with the very same spirit that moved the general business strategy of the two companies: Buy low, sell high.

This is the most-loved kind of strategy in most business circles—although definitely not creative or innovative, while possibly conservative and just lucrative.

On the other hand, like it or not, no translation business, however big, is perceived as important. We could long debate this, but the conclusion will always be the same.

The one differentiator, especially today, is technology, and this is the discriminant even in the Lionbridge sale.

Given the company’s recognized experience with professional service organizations, financial analysts at H.I.G. Capital have most certainly gone through all industry outlooks. In Rory Cowan’s opinion, the translation industry “really has an extraordinary decade ahead of it.” This is the first discrepancy in the rationale for the sale. Why are they selling a buoyant company that is profiting well from a sound industry with such an astonishing future? Certainly not because the company’s stock, which was expected to swing high over the past three years, ended up just hovering.

Actually, this could be a reason for not delisting Lionbridge. This option could have been a blood bath for shareholders, as the company could hardly find the necessary resources.

Also, in the last few years, Lionbridge underwent some major changes to the organizational structure, although the management team remained unchanged. Incidentally, according to what Rory Cowan told Slator, the management team will remain unchanged.

Curiously, right after the announcement of the Lionbridge sale, TransPerfect announced the hiring of Lionbridge’s former CTO Eric Blassin, who spent over two decades there. This is not exactly the kind of move that happens offhand, and it usually goes with some compelling NDA.

There is also another non-negligible element: stock options and compensation. Delisting Lionbridge would possibly deprive its management team members of their stock options and certainly force them to reduce their compensation or leave the company.

What financial analysts at H.I.G. Capital have most probably done is to assess in detail Lionbridge’s financial statements first and books later, with the ability, knowledge, and due diligence that are typical of the business they are in.

So, if the people at H.I.G. Capital do not change the management team or the business strategy, the investment would prove poor and unlikely to deliver a return.

According to Lionbridge’s chief sales officer, Paula Shannon, H.I.G. Capital saw growing opportunity in the company’s business and the value in the long-term relationships that Lionbridge has with customers in verticals such as IT and financial services. Curiously, Rory Cowan told Slator that he does not see much growth in the enterprise IT sector.

As more influential commenters noted, private equity firms acquire mature companies from which they expect to extract more value than the current owners by cutting costs and selling unessential units. This means squeezing overhead (that must be huge in a company the size of Lionbridge), closing offices, reducing staff. In turn, this optimization would procure the necessary funds for M&A to strengthen the core business.

Despite the fine words that have accompanied Eric Blassin’s time and work at Lionbridge, the technological capital does not seem to be enticing either, as the buy-low-sell-high investing adage applied to this sector too, with all the challenges and the backlashes of inconsistent implementation.

Rory Cowan told Slator that Lionbridge is going to participate very actively in M&A. The best candidate in this respect is TransPerfect. Not only would consolidation help elude the ruling of the Delaware Supreme Court, it would also be the perfect basis for focusing on services while leaving technology to a distinct entity, possibly not just a business unit but a conglomerate subsidiary.

For this operation to take place, the service unit of Lionbridge could be the first to be sold, after filling its safe with sufficient cash to enable it to sail off safely, and conceivably run a leveraged acquisition.

The translation industry will soon be made up of one or two global mammoths, with all the problems of mastodons, and a dusty galaxy of scattered mid-size companies, where mid-size here, as usual, for this industry, means micro for any other sector.

Luigi Muzii has been in the "translation business" since 1982 and has been a business consultant in the translation and localization industry since 2002, working through his own firm. He focuses on helping customers choose and implement best-suited technologies and redesign their business processes for the greatest effectiveness of translation and localization related work.

This link provides access to his other blog posts.

Wednesday, December 14, 2016

What is a Truly Collaborative Translation Platform?

Many observers new to the business of translation are surprised at how little "work process automation" exists in the professional translation business. Many may have noticed the email deluge that so many in the translation industry are guilty of in ALL their general business communication -- they use it almost like chat or mobile phone text messaging, and thus it is easy for details to get lost and fall through the cracks. I noticed this lack of email communication discipline when I first entered the industry many years ago.

Apart from the problems introduced by this communication style, we also see that since no real work process automation tools exist, there is an urgent need for a project management role. There still appears to be a critical need for a project manager (whose role is described here in some detail) to ensure that client projects are properly broken into pieces (work packages), assigned to the right personnel, and then re-assembled by project management to hand back to the client when finished. While Translation Management Systems (TMS) like MemoQ, Memsource and others help to some extent, translation work management is still a process that needs lots of detailed non-automated project management to ensure a smooth workflow and a semblance of efficiency.

I recently spoke to several SmartCAT.ai customers, both individual freelancers and some translation agencies (LSPs). The sample I spoke to was mostly scattered across the EU and Russia. They were all quite consistent in their positivity about their work experience with the software and rated the following 3 factors (in sometimes different order) as the reasons for their satisfaction and generally positive outlook toward the SmartCAT-based work experience. These factors are:

Simplicity and ease of startup which resulted in quick productivity
Relatively lower cost (free for all users)
Collaboration capabilities that ease project management burden

More than one of the customers I spoke to had experience with traditional TMS tools and described the SmartCAT experience as an improvement in several ways, but mostly in ease of startup and in the integrity and power of the built-in collaboration.

This is another guest post by "Vova" from SmartCAT where he defines his view of what collaboration means in the professional translation context. It is my opinion that the SmartCAT paradigm is a step forward from the rather heavy but perhaps much more flexible footprint that traditional TMS systems have grown accustomed to.

P.S. This is some analysis from CSA on the SmartCAT offering.

------------------------

These days, content volumes are growing faster than ever, and “going global” is a major trend in many industries. But some localization customers feel that traditional LSPs are unable to easily tackle large and urgent projects. In need of a better solution, they turn their eyes to “crowd” translation platforms. Popping up like mushrooms, these claim to provide the “good new way” to localize.
Despite the obvious effects of hiring “crowds” for translation, such services provide something traditional LSPs cannot boast: They are quick, cheap, and easy to use. You click a button, and hundreds of hands start working on your job and get it done in no time. Alas, for many clients this is a fair tradeoff for the appalling quality they get as a result. Surely, this approach will backfire, but it might be too late to fix it.

So can “real” LSPs fight crowdsourcing platforms in their own territory? Can we provide a smooth and quick collaborative translation experience while keeping the quality bar high?

In response to this challenge, almost every CAT platform today claims to be “collaborative.” But “collaboration” is a word that one can inflect in many ways, and many do. For instance, one can simply allow users to share translation memories and call that “collaboration.” One can make project managers spend hours splitting large files into “digestible” chunks and call that “collaboration.” Finally, one can have you pay a hundred dollars for each “collaborator,” and... you got it.

Is this really what we expect when we hear the word “collaboration”? Hardly. What we expect is something like Google Docs. We expect contributors to see each other’s work in real time. We expect them to be able to communicate easily and in context. And we don’t expect them to go broke paying license fees just for being able to be there together.

And thus here are five important features to look for.

1 — Interactive collaboration between translators

Many “collaborative” CAT tools require cutting large files into smaller parts to be distributed among translators. The project manager will have to make sure that each translator gets a relatively equal volume to work on.

In essence, each translator will be just working on their own part as if it were a separate project, without seeing each other’s work. Those who finish early will have to sit there idle, wasting the precious time you need for the project. Once finished, the project manager needs to “glue” the files back together, wasting more time and bringing in human errors in the process.

In many cases, this makes the game not worth the candle. A truly collaborative translation platform rids you of the need to split or glue anything. You will just assign certain document parts to individual translators, and if someone finishes early, you will reassign more segments to them. Every translator will be able to see what others do and, if needed, bring attention to their mistakes or omissions (more on this later).

Recently, such an approach allowed one SmartCAT user (a mid-sized LSP, actually) to translate nearly 500k words every day for several weeks straight. In “peak hours,” there were up to 100 translators working at the same time. And there was only one project manager handling the whole project!
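To make the difference between the two approaches concrete, here is a rough sketch in Python. The translator speeds and the word count are made-up numbers, and both function names are hypothetical, not part of any CAT tool. The only point is that a fixed, equal split finishes when the slowest translator does, while a shared pool of segments keeps everyone busy until the work runs out.

```python
def hours_with_fixed_split(total_words, speeds_words_per_hour):
    """Equal chunks handed out up front: done when the slowest translator is done."""
    chunk = total_words / len(speeds_words_per_hour)
    return max(chunk / speed for speed in speeds_words_per_hour)


def hours_with_shared_pool(total_words, speeds_words_per_hour):
    """Segments reassigned as people free up: everyone works until the pool is empty.

    Idealized: ignores segment granularity and any reassignment overhead.
    """
    return total_words / sum(speeds_words_per_hour)


# Illustrative numbers only (not from any real project).
speeds = [300, 200, 100]  # words per hour for three translators
print(hours_with_fixed_split(6000, speeds))  # 20.0
print(hours_with_shared_pool(6000, speeds))  # 10.0
```

In this toy case the fastest translator sits idle for most of the fixed-split project, which is exactly the waste described above.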

2 — Collaborative translation and editing

Reducing work in progress is a key principle in today’s project management paradigm. But in terms of translation, unedited work is just such a work in progress! Let’s say you are doing a 100,000-word project with the standard TEP (translate-edit-proofread) approach. If “T” costs you $0.10 per word, you have $10,000 worth of inventory before “E” and “P” are done. That is $10,000 of unfinished words lying there like warehouse stock. Not a small amount, is it?
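The inventory arithmetic can be written out as a tiny Python sketch. The function name and the `edited_words` parameter are hypothetical, and the second call uses an invented figure purely to show how the unedited stock shrinks when editing runs alongside translation; it assumes all words have already been translated.

```python
def unedited_inventory(total_words, rate_per_word, edited_words=0):
    """Dollar value of words that are translated but not yet edited.

    Assumes every word in the project has already been translated.
    """
    return round((total_words - edited_words) * rate_per_word, 2)


# The example from the text: nothing edited yet.
print(unedited_inventory(100_000, 0.10))          # 10000.0

# Hypothetical: the editor has kept pace and already covered 90,000 words.
print(unedited_inventory(100_000, 0.10, 90_000))  # 1000.0
```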

If the editor has to wait for “their turn,” a whole range of issues may arise:

The translator is busy with another assignment by the time the editor asks a question and cannot recall the subject in detail.
The editor finds an error after it has been replicated tens or hundreds of times in the document and has to correct them all.
An experienced editor may not have the flexibility to move from project to project as urgent needs arise.

Thus, the CAT tool must provide both horizontal (between peers) and vertical (between T <> E <> P) collaboration. In other words, the editor must be able to start working on the document well before its translation is completed. The same goes for proofreading and any other stages you need. In SmartCAT's experience, such vertical collaboration alone can cut delivery time almost in half.
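A simple way to see why overlapping the stages shortens delivery is to model the document as a series of batches that flow through T, E and P. The Python sketch below is a toy model under stated assumptions: the per-batch hours are invented, the function names are hypothetical, and each stage is assumed to handle one batch at a time, starting a batch as soon as the previous stage has released it.

```python
def sequential_hours(batch_hours, n_batches):
    """Each stage waits for the previous stage to finish the whole document."""
    return n_batches * sum(batch_hours)


def overlapped_hours(batch_hours, n_batches):
    """Each stage starts a batch as soon as the previous stage has released it."""
    # finish[s] = time at which stage s finished its most recent batch
    finish = [0] * len(batch_hours)
    for _ in range(n_batches):
        ready = 0  # when the current batch leaves the previous stage
        for s, hours in enumerate(batch_hours):
            start = max(ready, finish[s])  # wait for the batch and for the stage to be free
            finish[s] = start + hours
            ready = finish[s]
    return finish[-1]


# Illustrative numbers only: hours per batch for T, E and P, over 10 batches.
stage_hours = [8, 4, 2]
print(sequential_hours(stage_hours, 10))  # 140
print(overlapped_hours(stage_hours, 10))  # 86
```

How much time is actually saved depends on how the stage durations compare; in this toy example the overlapped schedule is bounded by the slowest stage, translation.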

3 — Context-specific communication

One thing that sets collaborative translation apart from mere crowdsourcing is the degree of communication between collaborators. In the latter, each “head in the crowd” doesn’t really care what the others are doing or thinking. In the former case, all translators make their contributions to the discussion, turning them into a synergistic whole.

Allowing many people to work together on a project is of no use if you can’t provide the right means for them to communicate. Otherwise, you have to either turn the manager into a “relay device” between various contributors or let them interact on an external platform. The former is a waste of resources, the latter a loss of control, and both are a hindrance to quality.

Thus, communication has to be built into your collaborative translation environment. Translators, editors, and other participants must be able to discuss both the project in general and its specific parts in context. SmartCAT users say that such context-specific commenting ability is one of the main quality drivers in the projects they do on the platform.

4 — On-demand scalability

You don’t always know in advance if a project will need scaling. Sometimes a customer wants you to translate just one page of their website, but then realizes that they need the whole site. Or they ask for translation into 10 other languages. Or their business grows unexpectedly and demands more localized content and a stronger localization partner.

Often, such demands have a “deadline yesterday” and give you no time to set up the whole “collaborative translation machine” from scratch. That’s why it’s important that your CAT tool allows you to scale when it is needed, as much as you need it, and with as little additional effort as possible. If you need a separate installation just to enable collaboration, you are wasting time you can’t afford to waste.

Ideally, there shouldn’t be such a thing as “scaling” at all. If you need to translate more content, you just add files to the project. If you want more languages, you add languages. If you need more people, you just assign more of them. Ideally, the CAT tool should come with easy access to freelancers who can readily work in it. The SmartCAT marketplace (a pool of available translators), for instance, has provided many of our users with the capacity they needed when their own resources were insufficient.

5 — Affordable growth

One last (but not least!) thing to keep in mind is that collaborative translation can be costly. You might not notice this when you just start working, but the more you grow, the pricier it can get. This can be especially painful if you are a relatively small agency and cannot afford major investments. Then you are often left with no choice but to forfeit the job to a bigger vendor. And if you are big enough to afford such spending, it will often be unproductive because you will not need the purchased licenses a lot of the time.

Therefore, pay close attention to the pricing tables. Most of them will have some sort of user-based licensing, but some won’t. In the latter category, many will be open-source, in which case it also makes sense to study the quality and support terms, which are pain points for this kind of software.

For the record, SmartCAT is free and proprietary, with 24×7 support provided to all users at no cost. (Though agencies are expected to pay for their use of and access to the translator pool, via a means that differs from the industry-standard approaches and is somewhat opaque. However, if they do not use the SmartCAT translator pool, agency use is also free.

From CSA: SmartCAT has an optional payment facility. Users are under no obligation to use it. However, if they do choose to process payments through it, the company takes a cut on the financial transaction. Smolnikov told us that many companies start out using their own financial methods but end up moving to SmartCAT because it takes the hassle out of managing them and that it is cheaper to pay this cut than to manage it themselves.)

The SmartCAT team clearly believes in this vision, and they recently published a vision statement that states the following:

Our vision of the future of the translation industry is based on three principles:

Advanced collaboration is the key to effectively manage large-scale and urgent projects,
Technology should help translators and project managers simplify time-consuming routines and increase productivity, with artificial intelligence playing a large part in setting up teams and managing their performance,
High-value and SLA-compliant linguists are the strongest success drivers in translation projects, and technology must facilitate identifying and reinforcing the choice of such professionals.

It all starts with our key belief that selling licenses for CAT software is an atavism of our industry. We believe that no one should have to continuously count licenses in a business where almost all key value producers are freelancers and teams are highly dynamic and dependent on the projects you will have tomorrow.

Relying on the number of licenses limits the efficiency of translation processes in a company and restricts its growth potential and scalability. Finally, the low technology penetration and the need to sew together multiple tools to have a more or less seamless and efficient workflow are the major factors slowing down the evolution of individual companies and the industry as a whole.

You can find the rest of the elaboration of this at the link above, where they also talk about increasing use of chatbots to improve communications between PMs and vendors, and automate an increasing number of common PM tasks and project situation responses using AI technology.

About the author

Vladimir “Vova” Zakharov is the Head of Community at SmartCAT.

"Translation is my profession and my passion, and I’m excited to be able to share it with the amazing SmartCAT community!"

Wednesday, December 7, 2016

Localization and Language Quality

This is a guest post by David Snider, Globalization Architect at LinkedIn - Reprinted here with permission. I thought the article was interesting because it points out that MT quality is now quite adequate for several types of Enterprise applications, even though MT might very well be a force that influences and causes the "crapification" (a word I wish I had invented) of overall language quality. While this might seem like horror to some, for a lot of business content that has a very short shelf life, and value only if the information is current, this MT quality is sufficient for most of the people who have an interest in the specific content. While David thinks that the language quality will improve, I very much doubt that most of this MT content will improve beyond what is possible with the raw technology itself. Business content that has value for a short time and then is forgotten simply cannot justify the effort to raise it to the level of "proper" written material.

If you go to the original post there are several comments that are worth reading as well.

-------

People have been complaining recently about the decline of language quality (actually, they’ve been complaining for decades – or make that centuries!) I have to admit that I sympathize: I’m from a generation that was taught to value good writing, and I still react with horror when I see obvious errors, like using “it’s” instead of “its”, or confusing ‘their’, ‘there’ and ‘they’re’. (I’m even more horrified when I make mistakes myself, which happens more than I like to admit.)
But for my son’s generation? Not so much. Grammar, spelling, and punctuation aren’t that important to them; what matters is whether the other person understands them, and vice versa. My son is already 25 (wow, time flies!), so there’s another generation coming up behind him that’s even less concerned about ‘good’ writing; in fact, this new generation is so accustomed to seeing bad writing that for the most part they don’t even realize there are errors. This makes for a vicious circle: people grow up surrounded by bad writing, so they, in turn, write badly, which in turn exacerbates the problem. I’ve heard this referred to as the ‘crapification of language’.

Why is this happening?

Ease of publishing: in the old days, the cost of publishing content - typesetting it, grinding up trees and making paper, printing the content onto the paper, binding it, shipping it to a store and selling it - was immense. For this reason most published content was thoroughly edited and proofread, as there was no second chance. So if you read printed content like books, magazines and newspapers, you were generally exposed to correct grammar, spelling and punctuation. Since most of what people read was correctly written (even if not always well-written), people who read a lot generally learned to write well. But now anyone can create and publish content, with no editing or proofreading. The result is just what you’d expect.

Informal communications: email, texting, twitter – they all favor speed, and when people are in a hurry quality usually suffers.

Machine-generated content: this includes content that’s created by computers – for example, Machine Generated support content created by piecing together user traffic about problems – as well as Machine Translated content. Machine Generated content, and especially MT content, is, as we localization people know, often of very poor quality.

What does this mean for Localization?

Being in the localization business myself, I want to tie this in to the effect on localization. In some ways this ‘crapification’ works against us: garbage in garbage out, after all, and if the source content is badly written then it’s harder for the translators to do a good job, be they humans or machines. But at the same time, this can work for us – especially when it comes to Machine Translation, where there are a couple of things that are making even raw MT more acceptable:

MT engine improvements: MT quality has steadily improved over the past 50 years (yes it’s been around at least that long!) Major improvements, like statistical MT and now neural MT, seem to occur every 10 years or so. Perfect human-quality MT is still ‘only 5 years out’ and will undoubtedly continue to be so for a long time, but quality is steadily improving.

User expectations: The good news for MT is that due to the crapification of language the expectations bar has been coming down, and people are much more willing to accept raw MT, warts and all. Despite the quality problems, more & more people are using web-based MT services like Google Translate, Bing Translator, etc., to read and write content in other languages. As with texting above, they’re more concerned with content than with form: they’re OK with errors as long as they can understand the content or at least get the gist of it. This seems to be true even for countries that have traditionally had a high bar for language quality, like Japan and France. As shown in the chart below, we’ve already passed the point that raw MT is acceptable for some types of content. (Note that this chart is purely illustrative and is not based on hard data.)

Of course the bar remains high for things like legal documents, marketing content and of course your own personal homepage, but it’s getting lower for many other types of content, especially for things like support content (which many companies have been MTing for years), as well as for blogs and other informal content. In fact, the graph could be redrawn something like this:

Is there any hope for language quality?

As the quality of machine-generated and machine-translated content improves and as editing and proofing tools become better and more ubiquitous, the quality of all content will improve, until we approach the days of professionally edited and proofread books and magazines. As bad writing disappears and people grow accustomed to seeing well-written content, I think even unedited human language quality will start to curve back up again. (I’ve tried to capture this in the graphs above.)
So yes, I believe the crapification of language will slow and eventually reverse itself (hmm, unpleasant plumbing image there)! This doesn’t mean languages won’t continue to evolve, fortunately. That’s one of the things that make them so fascinating – and so challenging to translate.

Some key excerpts from the comments at the original post are listed below:

David Snider: The crapification 'helps' localization teams by allowing them to put more assets in the 'good enough' bucket. The real question for raw MT is: "is my customer better off having to fight their way through a badly-translated MT article and maybe get the help they need, or are they better off not getting the help at all?"

Jorge Russo dos Santos: I disagree that we will see content revert to a golden age of quality. I think that, as we see today, there are different quality levels for content, and that will continue. If anything, the tolerance for poorly written content will probably increase as people consume more and more content, but there will be pockets where people will require premium content and will be willing to pay for it, either in the original language or in the localized language(s), and this will not be only for legal.
