Tuesday, October 24, 2017

Never Stop Bullshitting: Or Being Popular in the Translation Industry

This is a guest post by Luigi Muzii (and his unedited post title). Luigi likes to knock down false idols and speak plainly, sometimes with obscure (to most Americans anyway) Italian literary references. I would characterize this post as an opinion on the lack of honest self-assessment and self-review that pervades the industry (at all levels) and thus slows evolution and progress. Thus, we see that industry groups struggle to promote "the industry", but in fact, the industry is still one that "gets no respect" or is "misunderstood", as we hear at many industry forums. While change can be uncomfortable, real evolution also results in a higher and better position for some in the internationalization and globalization arena. Efficiency is always valuable, even in the arts. In my view, the companies that solve the most challenging and interesting translation problems today (and thus earn the most money from translation related activity) are all VERY focused on efficiency, and interestingly are also not really part of "the translation industry." These companies are in a different league in terms of scale and process efficiency, and I think they could pave the way in providing a vision for real change in how things are done in "the industry". The old process models are not sustainable for anything but the most important and critical content. From my vantage point, most of the content that is the primary focus of "the industry" is not really important or critical. Simpler, faster and more efficient (cheaper) translation production really does matter to both the existing and emerging customer base.

I came to the "translation industry" from the IT industry, specifically, the storage virtualization and applications software development sectors. I recall that I was surprised by several things I saw when I first arrived in "the industry", and I am sure this is not unlike what many brand new translation buyers might also experience. My key observations when I had a fresh mind include:
  1. The degree of fragmentation in the translation industry which hampers movement towards more efficient production processes,
  2. How archaic and dated the most popular translation technology was, especially TM (1990's technology that was really the only automation technology in widespread use),
  3. The number of LSPs who still kept building (sub-optimal, klunky) "custom" TMS and Project Management systems, when perfectly workable, more robust options were available,
  4. The hostility and ignorance regarding MT technology and its potential and proper use cases,
  5. How labor-intensive, slow, reactive and inefficient production processes were in general.
This, I suspect, is what many others feel if they come from other industries where efficient production processes are much more deeply established and understood. Not that much has really changed in 10 years, though now you see a proliferation of really bad, self-built MT systems, and fortunately, most have given up on building that magical TMS that is allegedly waaaaay better than anything yet known to man. Maybe MT will also reach the point where most users realize that it is an undertaking best left to experts, especially in these NMT days, where new developments and techniques are occurring every week.

There are going to be many more "buyers" and new customers entering the market for translation services in the future, and we probably do need to move beyond the insubstantial fluff that pervades "the industry". Potential customers who understand your product and service are more likely to buy quickly than those who need to learn and decipher what your special obfuscating in-group language and terminology actually means.

Frankfurt (referenced below) determines that bullshit is speech intended to persuade, without regard for truth. The liar cares about the truth and attempts to hide it; the bullshitter doesn't care if what they say is true or false, but rather only cares whether or not their listener is persuaded.
For those who are offended by the term bullshit, let me share these alternatives and feel free to substitute these terms wherever you see the offending word. In the British Parliament, you are not allowed to call a member a liar as this ‘unparliamentary language’ brings dishonor to the house. As a result, various euphemistic phrases are used to indicate ‘bullshit’:

  • Winston Churchill in 1906 used the term ‘terminological inexactitude’;
  • more popular recently has been the phrase ‘economical with the truth’. This was originally coined by Edmund Burke referring to measured speech, but has come to mean ‘liar’ in parliament or court.
Bullshit Innovation - Graphical View
The emphasis and italics comments in [  ] below are mine.

-------------------

Never tell a lie when you can bullshit your way through.
Eric Ambler
 

According to Evgeny Morozov, bullshit is the new oil, everything derives from that. Many years earlier, Harry Frankfurt opened his best-selling essay with “One of the most salient features of our culture is that there is so much bullshit. Everyone knows this. Each of us contributes his share.”

It is worrisome that most people are rather confident of their ability to recognize bullshit and to avoid being taken in by it. As Frankfurt noticed, the bullshitter is, by his very nature, a mindless slob. And a presumptuous snob, too. The bullshitter’s arguments and use of language are both meant to support a kind of bluff or be a misrepresentation or deception. Indeed, his ultimate goal is to convey a plain falsehood. Marketing is the typical realm for bullshitters. Not surprisingly, storytelling is the new black in written communication.

Unfortunately, marketing, especially social media marketing, as it’s virtually free, is the new mantra of "translation industry" people.

Whether he/she is a translator, an LSP, or an advisor, a bullshitter is always there, mouth breathing, his/her words like mere vapor.

Invariably, bullshitting in the translation industry revolves around three subjects: Innovation, Technology, and Quality. They may be presented differently, but they are always interconnected. Invariably, the most active bullshitters are always the very same people. They champion innovation, without having introduced any, ever; they champion technology, especially when related to automation, without having automated anything, unless, sometimes, forced by customers; or they provide premium quality, only work for premium customers, receive premium fees, even though they won’t substantiate any of their statements.

The Quality Myth


When the bullshitters are LSPs, they are obviously always technology-savvy, localization technology developers, and positive users of the most amazing KPIs—which they always forget to enumerate. They consistently boast of having (unmistakably) seasoned linguists and localization and globalization experts in their (unquestionably) great teams, known for — guess — outstanding quality of delivery and service, and famous for their technical and technological capabilities. More and more often, the icing on the cake is a set of wonderful tools available for free.

When the bullshitters are translators, the mantra becomes perfect, absolute quality. The bravest bullshitters venture intrepidly into dispensing with hands full, their futile directions to navigate out of the wild and stormy bulk market, up to the bright and warm oasis of the premium market, a mirage if not a hoax as most believers would discover at their own expense. Indeed, if there ever were a premium market, it would most likely be a small segment of a bigger market, otherwise, it would be so large as to welcome everyone, and it wouldn’t be premium anymore. There are certainly premium clients, but everyone can tell, they are hard to reach, win, and retain. To penetrate the fabulous ‘premium market’, bullshitters all have the same strategy and advice for peers and newcomers: specialize, brand and market yourself, raise your rates, educate your prospects and customers. However trivial, it is worth reminding readers, that studying (to specialize) and networking (for branding and marketing) are costly activities that consume time and incorporate the work of others, although their costs might be indirect, hidden, postponed, redistributed, or transferred elsewhere. Not surprisingly, these bullshitters never produce a single piece of evidence of the effectiveness of their advice, and never talk about how much money they get from implementing the strategies they advocate, or where their money comes from.

When the bullshitters are advisors, their landmarks are faddy and fancy words and phrases like simship, continuous delivery, KPIs, agile, lights-out project management, augmented translation, and the inexorable big data and disruption. And if you manage to use blockchain and smart contract in the same paragraph, then you’ve really authored a masterpiece. That’s marketing, baby. Marketing! And there’s nothing you can do about it. Nothing! It’s a basic truth of the human condition that everybody lies. The only variable is about what. And you don’t need to be Seth Stephens-Davidowitz to know. On the other hand, why lie when storytelling can be equally effective?

The Innovation Myth


Even innovation is a casualty of this mouth-breathing hype. Bullshitters from all categories battle on the innovation field. Although the translation industry has not seen real innovation coming from its players for years, they have always been talking grandly, as if they were competing to rant the loudest and shoot the furthest. Indeed, the translation industry is overcrowded with futurists, visionaries, and wishful thinkers, with the latter being by far the most numerous. They can all be found in any localization conference around the world. The narrative/storytelling frenzy affects virtually every LSP. Every LSP has a brilliant ambitious idea, an innovative process or some new technology that is going to disrupt or revolutionize the industry. And this does not happen just once a year but recurrently, whatever the event. To quote Renato Beninatto and Tucker Johnson from their recent feat, “the language services industry has proven itself to be horribly equipped to actually innovate in any meaningful way.” This deluge of bombastic bullshit is nothing but the effect of a self-breeding epidemic of ‘delusions of grandeur.’ In fact, most LSPs do not have the spirit, the capability and the resources to innovate; they simply cannot afford it, let alone become the driving force for innovation. The bigger they are, and the more they are focused on basic financial indicators, making profits, and possibly growing their business, the more reluctant they are to invest in innovation.
Unfortunately, as Beninatto and Johnson say, “critical thinking and skepticism are often thrown out the window in favor of recycled ‘headlines’ and flattering commentary.” In fact, there are no contrarians in this industry, though one may just occasionally stumble into a talking cricket.

This is one reason why nobody discredits the unreliable surveys that are circulated, which constantly feed us a reassuring echo and lead people in the industry to believe everything they hear.

However, when you need the hype, it usually means you’re in trouble. Once the hype starts, it often continues on and on, and the longer the hype is sustained, the bigger the problem, and capturing the attention of the random public is not the same as revolutionizing an industry or a market.

Hype does not affect technology only; it also affects market analyses, branding, sales, and marketing policies. The essence of marketing is about narrowing the focus. Too many in the translation industry translate this fundamental principle in a self-defeating approach, using quality as a magic wand. No wonder if it works no wizardry. “Focus on the high end of the market!”, the (in)famous premium segment as if anyone is interested in the low end, where the emphasis is on price only. “Raise your rate, quality is worth a higher price!” The problem is that since no one proclaims themselves as the “un-quality” [or bulk market] player, everybody stands for quality, and as a result, nobody does. You cannot narrow the focus with quality or any other idea that doesn’t have proponents for the opposite point of view. Especially if “quality” is the only way to prove you are better than your competitors.


The "Educate The Customer" Myth



Most people want to believe they can get to the top by being better than others, but actually, the best way, if not the only way, to get there is by being first. This is the first of the 22 immutable laws of marketing enumerated and illustrated by Al Ries and Jack Trout in their 1993 book. Ries and Trout also wrote that many people believe that the basic issue in marketing is convincing prospects that you have a better product or service; despite this, people tend to stick with what they have already got. It is a common belief among industry players that educating the client is an absolute necessity, while the idea is equally widespread that salespeople fail because they don’t understand customer needs, as they are better trained on LSP processes than on client issues. In fact, most LSPs are still in the nuance-specifics-of-translation-industry state of mind, and simply dismiss people who are not from the translation industry as those who need to be ‘educated.’

Educating customers is not a cost-effective marketing strategy for most small businesses like LSPs, especially towards first-time buyers who probably don't know very much about translation. Translation industry players who may be tempted to approach their customers this way possibly do this from an illusory sense of superiority. They offer their products/services (solution) to people who don’t recognize that they have a demand (problem) and assume the client has a sort of functional illiteracy. They presume the client has an interest in features rather than in benefits, and so fail to understand and quantify the client’s needs, showing their ignorance and lack of interest in their customer’s business.

Rarely are customers willing to be instructed by someone who does not belong to the same class of business, especially if they understand the issue behind their demand and are searching for answers. On the contrary, the “I have no questions” attitude is fairly common in customers who are not interested in disentangling the intricacies of an extraneous and unimportant business. So, what word should a translation business own in the minds of prospects? Tip: customers want to see instant results.

All this to say that educating the client is bullshit too.

The Growth Myth


But how is bullshitting affecting market analyses? In the way the industry news is presented.

According to Aiman Copty, Vice President of International Product Solutions for Oracle Corporation, since translation is now increasingly at a general utility stage, “people should not need to think about it,” and “the industry is rich with translation and subject matter expertise,” the keyword is no longer cost or quality, but efficiency. According to “captain of the translation industry” Adolfo Hernandez, “localization is far too labor-intensive” and “for the foreseeable future, the best results in localization will come from the best humans using the best machines.” Luckily, SDL is creating ‘islands of stability’, whatever that means. Another captain, Rory Cowan, invites readers to observe patterns across other industries and get the pace right. Never mind that his company’s financial results over two decades could hardly be labeled astonishing, in spite of “the growing opportunity” that H.I.G. Capital is supposed to have seen, according to Lionbridge’s chief sales officer Paula Shannon, “in the company’s business and the value in the long-term relationships that Lionbridge has with customers in verticals such as IT and financial services”. Smith Yewell, founder and CEO of Welocalize, is strongly convinced that the value-add will definitely not be in technology but in the strength of the service.

What a bright future translation and translators have ahead! And forget about the Bodo Dilemma [an abundance of tools, technology, data and innovative solutions combined with a painful shortage of human talent to properly deploy them].

This is just a part of the problem. Another major issue comes from how numerical data on the industry are presented.

For example, the growth of the translation industry over the last decade or more is usually presented as linear, steady and unceasing, but it is expressed with revenues only. If the same trend is displayed on a combined graph together with percentages, things look a little bit different.


If profits or volumes are taken into account, things might be even less exciting. In fact, while volumes have possibly been growing as much as revenues (undoubtedly, if we trust the same sources; indeed, according to those sources they should have been much higher), we might discover that profits have not been growing at the same pace. Any real industry ‘veteran’ with an ungarbled memory can tell that, in the last twenty-five years, prices have been under increasing pressure and compensation has, at best, remained unchanged, i.e. in real terms it has halved. On the other hand, it does not take a genius to figure out that, over the same period, IT has made volumes increase by at least a factor of 10 while productivity has, at most, tripled. In other words, revenues are not the best metric to measure growth. Also, the translation industry is notoriously made up of very ‘light’ players, who rely almost exclusively on outsourcing. Therefore, even the average revenue per employee and the average revenue per salesperson are not reliable metrics. A volumes/revenue ratio would be more suitable, but the two figures might be very hard to get from players (remember, everybody lies). EBITDA (i.e. earnings before interest, taxes, depreciation, and amortization) might be a good metric to evaluate profitability, even though it has its drawbacks too. In fact, it is often used as an accounting gimmick to dress up a company’s earnings. More properly, it is best suited to its original purpose: indicating the ability of a company to service debt.
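The arithmetic behind these claims can be checked with a rough sketch. The inputs are the figures quoted above (twenty-five years, unchanged nominal compensation worth half as much in real terms, volumes up 10x, productivity up 3x). The function names are hypothetical, and the calculation assumes a constant annual inflation rate, which is a simplification.

```python
def implied_annual_inflation(real_value_ratio: float, years: int) -> float:
    """Constant annual inflation rate that shrinks unchanged nominal pay
    to `real_value_ratio` of its original purchasing power after `years`."""
    return (1.0 / real_value_ratio) ** (1.0 / years) - 1.0


def required_effort_growth(volume_factor: float, productivity_factor: float) -> float:
    """How much total working effort must grow when volumes grow faster
    than productivity."""
    return volume_factor / productivity_factor


if __name__ == "__main__":
    # Figures taken from the paragraph above; they are estimates, not data.
    inflation = implied_annual_inflation(real_value_ratio=0.5, years=25)
    effort = required_effort_growth(volume_factor=10, productivity_factor=3)
    print(f"Implied average inflation: {inflation:.2%} per year")
    print(f"Effort needed relative to the start: {effort:.1f}x")
```

Under these assumptions, compensation halving in real terms over twenty-five years corresponds to average inflation of a little under 3% per year, and a tenfold rise in volume against a threefold rise in productivity means roughly 3.3 times as much working effort to process it.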

The noise around the recent acquisitions or the interest in a few translation businesses by some private equity firms is just more wood for the hype fire.

The recent deal for the acquisition of Moravia by RWS is a purely industrial (i.e. not just financial) transaction. Incidentally, Moravia has been entirely held by a private equity firm since 2015. Clarion Capital Partners sold Moravia to RWS for twice the company’s revenues, or 11.8 times the 2016 EBITDA. By comparison, H.I.G. bought Lionbridge for a fraction (64%) of the company’s revenues. RWS’s acquisition of Moravia will be a typical LBO, financed through a combination of equity (60%) and debt (40%). RWS’s exposure is therefore expected to be substantial (roughly USD 400 million in total), corresponding to the combined pro-forma annual revenues.

After a bitter (to say the least) three-year war to win control of the company, the forced sale of TransPerfect is unlikely to come anywhere near the projected USD 1B.

In essence, organic growth in the translation industry has long been left to small businesses. Even medium-size businesses are now relying on M&A to expand. See Arancho Doc’s M&A history: it acquired the fellow Italian, 40-year-old LSP Soget in April, only to be subsequently acquired by Technicis.

Technology is not the primary interest of private equity funds looking for investments, nor is it the service, however profitable. In an industry where growth increasingly happens through M&A, mid- and large-sized translation businesses are easy prey and vehicles for easy money. Also, translation companies are considerably smaller than companies with comparable performance in other industries, and this makes them even more appealing.

Why the hullabaloo, then, around the alleged interest of private equity funds for translation businesses? What do you expect from an intelligence channel with a boasted base of a few thousand readers releasing the results of casual surveys run through its main outlet with a rate of response of 0.9%? Tertium non datur: either the released news is just gossip, or industry players are dancing on the Titanic.

Chasing the hype ends up with us getting lost. In 2016, the eight fastest growing industries to invest in were 3-D printing, drones, marijuana, virtual reality, AI, food e-commerce, wind energy, and green building. In 2017, they grew to eleven: virtual reality, video games, elderly health care services, physical therapy, translation and interpretation services, biotechnology, VoIP, drones, green energy, water and water treatment, and marijuana.

Maybe marijuana is the one secure investment, judging by certain analyses and those who support them. Incidentally, BLS expects a 29 percent increase in the number of jobs in the translation and interpretation service industry by 2024.

After The No Asshole Rule and The Asshole Survival Guide, do we need a No Bullshit Rule or a Bullshit Survival Guide?

A Bullshit Process Graphic



=======================


Luigi Muzii has been in the "translation business" since 1982 and has been a business consultant since 2002, in the translation and localization industry through his firm. He focuses on helping customers choose and implement best-suited technologies and redesign their business processes for the greatest effectiveness of translation and localization related work.

This link provides access to his other blog posts.



Monday, October 16, 2017

The Use of Machine Translation in eDiscovery

There are some kinds of translation applications where MT just makes sense, and it would be foolish to even attempt these kinds of projects without decent MT technology as a foundation. Usually, this is because these applications have some combination of the following factors:
  • Very large volume of source content that simply could NOT be translated without MT in any useful time frame
  • Rapid turnaround requirement (days, hours or minutes) for the content to have any value to the translation consumers
  • A user tolerance for lower quality translations at least in early stages of information review
  • A need for information and document triage when dealing with large document collections, to identify the highest-priority content within a large mass of undifferentiated content. This process also helps to identify the most important and relevant documents to send to higher quality human translation.
  • Prohibitive translation costs (usually related to volume)
One can find this combination of requirements in several customer communications oriented applications like technical support knowledge-base, eCommerce product listings, customer service, and CX reviews for all kinds of products and service experiences. However, in an increasingly digital world, we see the need to be able to process large volumes of business content to identify what is most relevant and valuable for ongoing business mission needs as well. One such business information triage application is eDiscovery. In my time in working with MT, I have seen that this is an ongoing need that will continue to build momentum as we become digitally focused workers.

SYSTRAN has been a leader amongst MT solution providers in the eDiscovery segment, has a long track record of success there, and, from my vantage point, shows a greater sensitivity to the customer needs of this segment than most others. Recently, they gave me unhindered access to a few of their eDiscovery customers, who provided insight into what really matters in terms of MT from the user perspective. This post will describe some key requirements from an active user’s perspective, especially that of Alvarez & Marsal in London. In particular, their willingness to share their insights enabled me to present and validate my own observations in this post. I have also had a previous guest post from iQwest that described the use of MT in eDiscovery applications from a service provider perspective.



What is eDiscovery?


Electronic discovery (sometimes known as e-discovery, eDiscovery, or e-Discovery) is the electronic aspect of identifying, collecting and producing electronically stored information (ESI) in response to a request for production in a lawsuit or internal corporate investigation. ESI includes, but is not limited to, emails, documents, presentations, databases, voicemail, audio and video files, social media content, and websites.

The processes and technologies around eDiscovery are often complex because of the sheer volume/variety of electronic data produced and stored. Additionally, unlike hard-copy evidence, electronic documents are more dynamic and often contain metadata such as time-date stamps, author and recipient information, and file properties. Preserving the original content and metadata for electronically stored information is required in order to eliminate claims of spoliation or tampering with evidence later in a litigation scenario.

What typically happens with an initially large mass of documents in an eDiscovery scenario is that some combination of the following activities is run to help organize and identify the most important material (I am not sure it is quite a corpus – usually it is much too unstructured to call it that). Practitioners use phrases like “analytics phase”, “predictive analytics”, “predictive coding”, or “analysis phase” to describe the process they apply to winnow the document mass into a relevant set of high-value documents. It usually includes:

  • Classification: Users gather a select, representative set of documents from the existing document mass that represents the key interests and relevant subject matter to be analyzed.
  • Clustering: Users build out from the documents selected in the classification stage to find similar documents that match the required cluster definitions and algorithms of the representative documents.
  • Summarization: This organization assists the user in selecting key sections of these documents as keywords, phrases, and summaries for use in litigation or corporate governance applications.
  • N-Grams: N-grams are basic co-occurrences of multiple words within a given context. These can help identify a set of documents that has higher relevance and value in specific investigations and reviews, and can be useful in the winnowing process or in understanding the linguistic profile of the mass of documents.
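The n-gram idea in the list above can be illustrated with a minimal sketch. It uses the common definition of an n-gram as a run of n consecutive words, counts them across a handful of made-up documents, and reports the most frequent ones. The function names and sample texts are hypothetical, and review platforms implement this quite differently.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every run of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(documents: list[str], n: int = 2, k: int = 5):
    """Count n-grams across all documents and return the k most frequent."""
    counts = Counter()
    for doc in documents:
        counts.update(ngrams(tokenize(doc), n))
    return counts.most_common(k)


if __name__ == "__main__":
    # Made-up sample documents, for illustration only.
    docs = [
        "The brake assembly failed during the road test.",
        "Engineers discussed the brake assembly recall by email.",
        "Lunch is at noon on Friday.",
    ]
    for gram, count in top_ngrams(docs, n=2, k=3):
        print(" ".join(gram), count)
```

Word pairs that recur across many documents (here "brake assembly") point to the subject matter that dominates the collection, which is one way of profiling a document mass before deciding what to translate.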
The EDRM model overviews the typical process journey to increased relevance

Thus, after organization, collation, and identification, documents are sent to a translation process which will often require MT because of the sheer volume. MT allows the right documents to be identified for further refinement (with human translation) or analysis and review. This identification of a smaller set of more important documents from a large set is the essence of the triage process.
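The triage step described here can be illustrated with a minimal sketch. It assumes the documents have already been machine translated into English and that the review team has a list of case keywords. The function names, document ids, and keyword list are all hypothetical, and real review platforms use far richer relevance models than simple keyword counting.

```python
import re


def relevance_score(translated_text: str, keywords: set[str]) -> int:
    """Count how many tokens of the machine-translated text are case keywords."""
    tokens = re.findall(r"[a-z]+", translated_text.lower())
    return sum(1 for token in tokens if token in keywords)


def triage(translated_docs: dict[str, str], keywords: set[str], top_k: int) -> list[str]:
    """Return the ids of the top_k documents, highest keyword score first.
    Documents with a score of zero are never selected."""
    scored = [(relevance_score(text, keywords), doc_id)
              for doc_id, text in translated_docs.items()]
    ranked = sorted((item for item in scored if item[0] > 0),
                    key=lambda item: (-item[0], item[1]))
    return [doc_id for _, doc_id in ranked[:top_k]]


if __name__ == "__main__":
    # Made-up MT output and keywords, for illustration only.
    mt_output = {
        "doc-1": "The brake defect was reported to the supplier.",
        "doc-2": "Please confirm the meeting room for Monday.",
        "doc-3": "Supplier knew about the defect before the recall of the brake.",
    }
    case_keywords = {"brake", "defect", "recall", "supplier"}
    # The selected ids are the candidates for human translation.
    print(triage(mt_output, case_keywords, top_k=2))
```

The point of the sketch is the shape of the workflow: rough MT output is good enough to rank a large set, and only the small, highly ranked subset goes on to human translation.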

“Our projects are varied and are not all focused around litigation. For example we often perform regulatory exercises and investigations. In these situations, it is often not known at the onset what is required; therefore, the culling of data is based more upon an investigative nous [investigative mindset] and the utilization of analytics features such as document categorization or clustering. In this instance, samples of various documents, related to different investigatory routes, are sent for translation to [MT to] help our teams develop an understanding of the data. The ability to provide our investigators with the option to translate documents on the fly is also a massive benefit in these types of matters.” Alvarez & Marsal, UK

In terms of languages that matter in eDiscovery, the sense I get from my investigation is that it is quite diverse, but a lot of the work involves going from a variety of source languages into English (or German). Some say that CJK and FIGS matter most in an increasingly global world, but the needs are always case-specific so it can be as far ranging as Greek, Norwegian, and Swedish. In terms of subject domains of focus, we see that in the litigation scenarios, product liability, and patent infringement tend to dominate, but these categories could cover a wide range of domains ranging from consumer electronics, IT, automotive, pharmaceuticals/medical equipment, to financial and also extractive industries.

While many equate eDiscovery projects only with litigation related content, the market beyond litigation seems to be growing just as rapidly. In an increasingly digital world, the need to understand electronic data flows within a global enterprise for information governance needs can be useful for many different reasons as A & M again point out:
“Alvarez & Marsal get instructed on a very wide range of matters, including contentious projects around internal investigations, dispute resolution, insolvency, and compliance programs. However, not all of them are contentious in nature – for example, performance improvement and valuations. A common thread is that they are document ‘heavy’ and therefore require our skill sets to effectively conduct them. The use of the technology differs in each scenario. As a result, understanding the client requirements and the capabilities of the technology allows us to devise suitable workflows for handling the documents. However, where foreign languages are involved we use Systran translation technologies to the same effect.”
eDiscovery is basically a data culling and relevance ranking process

What Matters in an MT Solution for eDiscovery?

  • Rapid and Straightforward Accessibility: Attorneys, corporate governance and compliance professionals who function from within an eDiscovery platform environment need to be able to operate MT with ease. And most typically this will be from directly within the document analysis and organization platform that is the key application for many of these professionals. However, in very large cases documents may be sent in bulk to MT, but again the ability to manage and review relevant documents from within the review platform is a key requirement.
  • Language Identification: One of the first steps in classification and organization of documents is to group documents by source language and thus this is a critical step in the process. The ease and efficiency of this language identification process is very important for many users, as it is the first level of triage. Also, some languages may need different processing flows if MT is not available and non-automated procedures need to be incorporated. The ability to automatically identify the source language on-the-fly for a large variety of languages is also a key requirement, as reviewers follow relevance threads and need ad-hoc translations of documents on-the-fly that are related to investigation subject matter. Often reviewers will submit a batch of documents that may be in different languages, thus an MT solution that can automatically identify and translate is an advantage, and allows batches of files to be uploaded without concern regarding what language they are in.
  • Integration with the eDiscovery Platform: This needs to be much deeper than being able to pass source and target text files back and forth. Relativity is a particularly important document review platform in eDiscovery, especially in litigation scenarios. It has also been used extensively as the review platform of choice by many who care about processing multilingual content. One reason that SYSTRAN dominates in the eDiscovery segment is that they have a native Relativity connector. This is a “deep integration” that is built to integrate seamlessly into the software interface already familiar to Relativity users, is built with Relativity best practices in mind, and is validated by Relativity and their existing customers to provide value in real-world multilingual discovery cases. The deep integration with this platform not only allows single language identification and translation but also allows for multiple language identifications and translation within a single document, which is especially important for email threads. I have noticed over many years in the MT business that integration with a document review platform is a particularly important requirement, and while Relativity is not the only eDiscovery platform available, it is probably the most important one. Here is a Gartner Magic Quadrant for eDiscovery software where you can see that kCura (Relativity) is a leader.
  • Ability to Process Primary Document Formats: This would at a minimum be emails, Office documents, text files, PDFs, web content, and increasingly social media content from Twitter and Facebook, as well as audio and video content. More and more, we see that emails are the most common document format that is processed in a review platform. Often an email thread could be in two or more languages and thus the market need for MT solutions that can handle multiple languages within the same document has become much more urgent and even a mandatory requirement.
  • Security and Data Privacy: For some matters, users care that systems can be installed on-premise and that no data is transported outside a secure firewall. There are often data custody restrictions linked to projects which also greatly constrain what MT solutions can be used.
  • Scalability - Ability to process Very Large Data Sets in addition to Ad-Hoc needs: Some cases may involve terabytes or even petabytes of data. In such cases, MT efficiency can be a significant factor and drive MT system selection. On these very large petabyte-sized projects, RBMT solutions have a clear advantage (in terms of performance and raw processing efficiency), and this perhaps also explains why SYSTRAN has been a long-term and dominant player in this market segment. They can provide a range of MT solutions that meet different user requirements. The degree of automation should be such that 10,000 documents can be submitted with the same ease as 10 documents.
  • Easily Customizable: Customization of MT systems can vary in complexity and time investment. It can be done rapidly with dictionaries and glossaries, and some vendors provide pre-built, domain-focused baseline MT engines, e.g. automotive, financial, chemical, IT, legal. For very long-running and high-value cases or subject matter, the need may arise for translation memory based customization, but the most common scenario in eDiscovery seems to be rapid customization. The availability of a range of domain glossaries and domain-focused engines makes higher-quality MT output possible with minimal effort. There seems to be a market need for a simple web-based point-and-click interface for adding dictionary terms or translation memories (TMs) that includes integrated testing and deployment features, and also for out-of-the-box domain-specific MT for a variety of domains as described above. In a typical flow, only limited customization is done at the bulk level, but once a document set is culled, it makes sense to customize the MT system further to improve MT output quality. MT output quality is an important determinant of selection, as we see from the user comment below. An effective customization process also helps to extract the most relevant set of documents for human translation efforts.
  • Special Features: There are several things that MT vendors can do to help users get better output results, and some vendors provide ways to perform rapid customization with glossaries that are driven by n-gram analysis, use monolingual data to improve fluency and quickly incorporate available TM to tune the engine on the subject matter of interest. Other capabilities that also exist in MT solutions include:
    • Some systems allow for anonymization and/or pseudonymization of review data to enable and facilitate cross-border data transfers and reviews. This allows data sharing between work groups, while still complying with international data privacy laws and legal chain of custody requirements.
    • For advanced and more technical users there are also some vendors who provide toolkits to do corpus analysis and modification. This would allow users to add linguistically informed routines to enhance the data above and beyond what the eDiscovery platform can do.
    • Audio & Video. The need to be able to handle digital “documents” now increasingly includes voicemails, conference call recordings and video.
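
To make the language identification and mixed-language email points above a little more concrete, here is a minimal Python sketch of the triage idea: split each document into segments, guess the language of each segment, and route it accordingly. Everything in it is an assumption for illustration only: the tiny stopword lists, the `MT_AVAILABLE` set, the blank-line segment split, and the function names are all hypothetical, and this is not how SYSTRAN, Relativity, or any other vendor actually implements it. A production system would use a trained language identifier rather than stopword counting.

```python
import re

# Toy stopword lists (assumption): real systems use trained language identifiers.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "please", "with"},
    "fr": {"le", "la", "et", "est", "les", "des", "nous", "vous"},
    "de": {"der", "die", "und", "ist", "das", "nicht", "wir", "mit"},
}

TARGET_LANGUAGE = "en"       # language the reviewers read (assumption)
MT_AVAILABLE = {"fr", "de"}  # source languages with an MT engine (hypothetical)


def identify_language(text):
    """Guess a language code by counting stopword hits; 'und' if nothing matches."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(word in vocab for word in words)
        for lang, vocab in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "und"


def split_segments(document):
    """Split a document into segments on blank lines (a stand-in for
    splitting an email thread into its individual messages)."""
    return [part.strip() for part in re.split(r"\n\s*\n", document) if part.strip()]


def route_batch(documents):
    """Return (doc_id, segment_index, language, route) for every segment,
    so one document can contain several languages."""
    plan = []
    for doc_id, text in documents.items():
        for index, segment in enumerate(split_segments(text)):
            lang = identify_language(segment)
            if lang == TARGET_LANGUAGE:
                route = "no_translation"
            elif lang in MT_AVAILABLE:
                route = "machine_translation"
            else:
                # Unidentified or unsupported languages go to a manual flow.
                route = "manual_review"
            plan.append((doc_id, index, lang, route))
    return plan


if __name__ == "__main__":
    batch = {
        "mail1": "Please send the report to the client.\n\n"
                 "Nous avons reçu les documents et le contrat est signé.",
        "memo": "Der Vertrag ist nicht mit der Bank abgestimmt.",
    }
    for row in route_batch(batch):
        print(row)
```

The point of routing per segment rather than per document is that a single email thread with an English message and a French reply ends up with two different routes, which is the multi-language-within-one-document case described above.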

While I am not suggesting that SYSTRAN is the only MT vendor who could serve the MT needs of the eDiscovery market, I am saying that they have solved several very specific problems that really matter to an eDiscovery user, and thus are likely to be a preferred vendor in many cases related to multilingual eDiscovery, in the same way that Relativity is for eDiscovery applications in general. In support of this, Alvarez & Marsal comments:
“A key reason for using SYSTRAN was the depth of integration with Relativity, which means our clients see it as one connected, flexible and effective solution – providing them with reassurance and comfort in only having to use one tool [Relativity]. In addition, the speed and accuracy of the translations were impressive when benchmarked against other providers, as well as the simplicity of accurately translating documents with a few mouse clicks.”
The outlook for the future suggests that eDiscovery will only gain momentum as corporate governance begins to monitor social media, and as email is increasingly understood to be a source of information governance and compliance problems. Emerging regulations, especially in Europe, suggest the need will be even greater in the EU. Several eDiscovery service providers I talk to have suggested that multilingual documents are now increasingly common and that this trend will only gain momentum in the future. A closing comment from A & M:
“The need for accurate and efficient translations is definitely growing within the eDiscovery market… We are consulting more and more with clients whose data contains a mix of various languages and we do not see this need slowing down in the near future.”