Showing posts with label digital transformation. Show all posts

Thursday, November 7, 2019

The Global Data Explosion in the Legal Industry

Looking at the forces shaping the legal industry today, we see several ongoing trends that increasingly demand attention from both inside and outside counsel. These forces are:
  • The Digital Data Momentum
  • Increasing Concern for Data Security
  • The Growing Importance of Information Governance
  • Increasing Globalization 

 

The Digital Data Momentum


Several studies by IDC, EMC and academics have predicted for years that we are facing an ever-growing data deluge and content explosion. The prediction that the digital universe will reach 44 zettabytes by 2020 means little to most of us. But state that 500 million tweets, ~300 billion emails, and 65 billion WhatsApp messages are sent, and 3.5 billion Google searches are made, every single day, and many more of us can grasp the astounding scale of the modern digital world. While only a small fraction of this data will flow into the purview of the legal profession, the impact is significant, and most legal teams will admit that this increase in content is a major challenge today.



The enterprise is also affected by this content explosion, and a recent eDiscovery Business Confidence survey identified increasing data volumes as THE primary concern for the near future. In eDiscovery settings, the information triage process becomes more complicated because we are seeing not only significant increases in volume but also a greater variety of data types. The modern legal purview can include mobile data and voice and image data from various sources, in addition to the data flowing through various enterprise IT systems.

 Increasing Concern for Data Security

 

While data security attracted less attention in the past, it is increasingly seen as a key concern. At recent Davos conferences, cybersecurity and data privacy breakdowns have been described as the biggest threats to businesses, economies, and societies around the world. According to the World Economic Forum (WEF), attacks against businesses have almost doubled in five years, and the costs are rising too. “The world depends on digital infrastructure and people depend on their digital devices and what we’ve found is that these digital devices are under attack every single day,” said Brad Smith, president and chief legal officer of Microsoft. He added that attacks by organized criminal enterprises are becoming “more prolific and more sophisticated”, often “operating in jurisdictions that are more difficult to reach through the rule of law but use the internet to seek out victims literally everywhere.”

The rise of artificial intelligence and machine learning also means that global enterprises are interested in acquiring and harvesting data wherever and whenever they can. Businesses are looking to acquire as much information as possible about customers, interactions, and brand opinions, and to extract insights that might give them an edge over the competition. Data-guzzling machine learning processes promise to amplify businesses’ ability to predict, personalize, and produce. However, some of the world’s largest consumer-facing companies have fallen victim to data breaches affecting hundreds of millions of customers. By all measures, the disruptive, data-centric forces of the so-called fourth industrial revolution appear to be outpacing the world’s ability to control them.

Legal professionals will need to play a larger role in managing these new risks, which can be devastating and cost millions in reparations and other negative consequences. Increasingly, these threats originate in foreign countries, sometimes even with support from foreign governments.


 

The Growing Importance of Information Governance

 

The modern global enterprise has a very different risk tolerance profile from that of similar companies even as recently as 10 years ago. The “datafication” of the modern enterprise creates special challenges for both inside and outside counsel. Recent surveys by Gartner suggest that legal leaders have to start investing in digital skills and capabilities, reflecting the evolving role of the legal department as a strategic business partner.

“How legal departments build capabilities to govern risk within digital initiatives matter more than the legal advice they provide” says Christina Hertzler, Practice Vice President, Gartner.

To be digitally ready, legal departments must shift their approach to manage specific changes created by digitalization — more stakeholders, more speed and iteration, and the increased technical and collaborative nature of digital work, as well as handling new information-related risks.

As organizations change the way they operate, generate revenue and create value for their customers, new compliance risks are emerging — presenting a challenge to compliance, which must identify, assess and mitigate risks like those tied to fundamentally new technologies (e.g., artificial intelligence) and processes.


There is a growing list of US companies already subjected to GDPR-related EU regulatory actions, including Amazon, Apple, Facebook, Google, Netflix, Spotify, and Twitter. Indeed, the French Data Protection Authority, CNIL, recently levied a record fine of approximately $57 million on Google for “lack of transparency, inadequate information and lack of valid consent regarding ads personalization.” The risks to US companies include having to provide proof of the measures taken to protect, process, and transfer personal data from the EU to the US in connection with regulatory investigations or litigation. A report published in late February by DLA Piper cited data from the first eight months of GDPR enforcement, during which 91 fines were imposed. "We expect that 2019 will see more fines for tens and potentially even hundreds of millions of euros, as regulators deal with the backlog of GDPR data breach notifications," the report said. Taking meaningful steps now toward GDPR compliance is the best way for US companies doing business of any kind involving EU personal data (including those with no physical presence in the EU) to prepare for and mitigate their risk.

The penalties for non-compliance with regulatory policies continue to mount. Google was fined $170 million and asked to make changes to protect children’s privacy on YouTube, as regulators said the video site had knowingly and illegally harvested personal information from children and used it to profit by targeting them with ads. We can only expect that data privacy and compliance regulations will be taken more seriously in the future and that legal teams will play an expanding role in ensuring this.

Facebook agreed to pay a record-breaking $5 billion fine as part of a settlement with the Federal Trade Commission, by far the largest penalty ever imposed on a company for violating consumers' privacy rights. Facebook also agreed to adopt new protections for the data users share on the social network and to measures that limit the power of CEO Mark Zuckerberg. Under the settlement, which concludes a year-long investigation prompted by the 2018 Cambridge Analytica scandal, the social networking giant must expand its privacy protections across Facebook itself, as well as on Instagram and WhatsApp. It must also adopt a corporate system of checks and balances to remain compliant, according to the FTC order. Facebook must also maintain a data security program, which includes protections of information such as users' phone numbers. The issue of data privacy and compliance will continue to build momentum as more people understand the extent of the data harvesting that is going on.

Taking meaningful steps now toward robust information governance and compliance for all kinds of privileged and confidential data will be necessary for the modern digital-centric enterprise, and the modern legal department will need to be an active partner that helps the enterprise prepare for and mitigate its risk.



 

Increasing Globalization = More Multilingual Data

 

While the forces we have just described continue to build momentum, driven by increasing digitalization and the resulting ever-expanding content flows, there is an additional layer of complexity: language. The modern enterprise becomes global much more rapidly and naturally than before, so the modern legal department and outside counsel need to be able to process content and information flows in multiple languages on a regular basis. The multilingual content that legal professionals need to process and monitor can include any and all of the following:
  • International contract negotiations and disputes
  • Patent-infringement litigation
  • Human Resource communications in global enterprises
  • Customer communications
  • GDPR Compliance related monitoring and analysis 
  • Cross-border regulatory compliance monitoring
  • FCPA compliance monitoring 
  • Anti-trust related matters
The volumes of multilingual content can vary greatly, from very large volumes that might involve tens of thousands of documents in litigation-related eDiscovery, to specialized monitoring of customer communications to ensure regulatory compliance, to smaller volumes of sensitive communications with global employees.

Multilingual issues are especially present in cross-border partnerships and business dealings, which are now increasingly common across many industries.

The AlixPartners Global Anticorruption Survey polled corporate counsel, legal, and compliance officers at companies based in the US, Europe, and Asia in more than 20 major industries. The perceived corruption risks are elevated in Latin America and China, while Russia, Africa, and the Middle East have emerged as regions of increasing concern. The survey found that 90% and 94% of companies with operations in Latin America and China, respectively, reported their industries are exposed to corruption risk. Of the 66% of respondents who said there are regions where it is impossible to avoid corrupt business practices, 31% said Russia is one such place and 27% cited Africa.

The sheer volume of information companies must collect, translate, and analyze is the biggest obstacle to tackling corruption, according to 75% of survey respondents. 

These concerns surrounding the management of data are expected to increase with increasing data privacy regulation such as the EU’s General Data Protection Regulation.


 

End-to-end translation solutions for the legal industry 


Thus, we see today that language translation production capabilities have become imperative for the modern global enterprise and that the needs for translation can range from rapid translation of millions of documents in an eDiscovery scenario to very careful and specialized translation of critical contract and court-ready documentation. Given the volume, variety, and velocity of the information that needs translation, legal professionals must consider a combination of technology and human services. Ideally, solving these kinds of varying translation challenges would be done by technologically informed professionals who solve complex and varied translation problems and who can adapt language technology and human expertise to the challenge at hand. 




Several MT and language service vendors provide an enterprise-class, vendor-agnostic, secure translation platform that allows you to combine regulatory compliance and translation best practice. Securing the translation supply chain needn’t come at the cost of trusted suppliers or existing relationships, nor should it affect time to market.




This blog was originally published on SDL.COM with more SDL product information.

Friday, April 26, 2019

Understanding MT Quality - What Really Matters?

This is the second post in our series on machine translation quality.

The reality of many of these comparisons today is that scores based on publicly available (i.e. not blind) news domain tests are being used by many companies and LSPs to select MT systems that translate content in the IT, customer support, pharma, and financial services domains. Clearly, this can only result in sub-optimal choices.

The use of machine translation (MT) in the translation industry has historically been heavily focused on localization use cases, with the primary intention to improve efficiency, that is, speed up turnaround and reduce unit word cost. Indeed, machine translation post-editing (MTPE) has been instrumental in helping localization workflows achieve higher levels of productivity.




Many users in the localization industry select their MT technology based on two primary criteria:
  1. Lowest cost
  2. “Best quality” assessments based on metrics like BLEU, Lepor or TER, usually done by a third party
The most common way to assess the quality of an MT system output is to use a string-matching algorithm score like BLEU. As we pointed out previously, equating a string-match score with the potential future translation quality of an MT system in a new domain is unwise, and quite likely to result in disappointing results. BLEU and other string-matching scores offer the most value to research teams building and testing MT systems. When we further consider that scores based on old news domain content are being used to select systems for customer support content in IT and software subject domains it seems doubly foolish.
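To make the limitation of string matching concrete, here is a minimal sketch of clipped n-gram precision, the basic building block of BLEU-style scores. It is not a full BLEU implementation (there is no brevity penalty and no averaging across n-gram orders), and the function names and example sentences are invented for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as many times as it occurs
    in the reference (the "clipping" used by BLEU-style metrics).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Hypothetical customer-support sentences, not taken from any real test set.
reference = "please restart the device to apply the update"
literal = "please restart the device to apply the update"
paraphrase = "reboot your machine so the update can take effect"

exact_score = clipped_precision(literal, reference)
paraphrase_score = clipped_precision(paraphrase, reference)

print(f"literal copy: {exact_score:.2f}")
print(f"paraphrase:   {paraphrase_score:.2f}")
```

The paraphrase is a perfectly usable translation, yet it shares only two words with the reference and scores far below the literal copy. The score measures overlap with one particular reference, which is why it says little about quality in a different domain.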

One problem with using news domain content is that it tends to lack tone and emotion. News stories discuss terrorism and new commercial ventures in almost exactly the same tone. As Pete Smith points out in the webinar linked below, tone really matters in business communication and in customer service and support scenarios. Enterprises that can identify dissatisfied customers and address the issues that cause dissatisfaction are likely to be more successful. Customer experience (CX) is all about tone and emotion in addition to the basic literal translation.

Many users consider only the results of comparative evaluations – often performed by means of questionable protocols and processes using test data that is invisible or not properly defined – to select which MT systems to adopt.  Most frequently, such analyses produce a score table like the one shown below, which might lead users to believe they are using the “best-of-breed” MT solution since they selected the “top” vendor (highlighted in green). 

English to French     English to Chinese    English to Dutch
Vendor A – 46.5       Vendor C – 36.9       Vendor B – 39.5
Vendor B – 45.2       Vendor A – 34.5       Vendor C – 37.7
Vendor C – 43.5       Vendor B – 32.7       Vendor A – 35.5

While this approach looks logical at one level, it often introduces errors and undermines efficiency because of the administrative inconsistency between different MT systems. Also, the suitability of the MT output for post editing may be a key requirement for localization use cases, but this may be much less important in other enterprise use cases.
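A small sketch of what "pick the top vendor" means in practice, using the scores from the table above. The post does not say which metric produced these numbers, so the code treats them as opaque scores where higher is better; the variable names are illustrative.

```python
# Scores copied from the comparison table in the post (higher is assumed better).
scores = {
    "English to French": {"Vendor A": 46.5, "Vendor B": 45.2, "Vendor C": 43.5},
    "English to Chinese": {"Vendor C": 36.9, "Vendor A": 34.5, "Vendor B": 32.7},
    "English to Dutch": {"Vendor B": 39.5, "Vendor C": 37.7, "Vendor A": 35.5},
}

# Top-scoring vendor for each language pair.
winners = {pair: max(by_vendor, key=by_vendor.get) for pair, by_vendor in scores.items()}

# Gap between the first- and second-ranked vendor for each language pair.
margins = {}
for pair, by_vendor in scores.items():
    ranked = sorted(by_vendor.values(), reverse=True)
    margins[pair] = round(ranked[0] - ranked[1], 1)

distinct_vendors = set(winners.values())

for pair in scores:
    print(f"{pair}: {winners[pair]} leads by {margins[pair]} points")
print(f"Vendors to manage if each pair uses its top scorer: {len(distinct_vendors)}")
```

With these numbers, choosing the leader for each language pair means running three different vendors for three language pairs, each leading by only a point or two. That is the kind of administrative inconsistency the paragraph above warns about.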




Assessing business value and impact


The first post in this blog series exposes many of the fallacies of automated metrics that use string-matching algorithms (like BLEU and Lepor), which are not reliable quality assessment techniques as they only reflect the calculated precision and recall characteristics of text matches in a single test set, on material that is usually unrelated to the enterprise domain of interest. 

The issues discussed challenge the notion that single-point scores can really tell you enough about long-term MT quality implications. This is especially true as we move away from the localization use case. In enterprise use cases beyond localization, speed, overall agility and responsiveness, and integration into customer experience-related data flows matter much more. The actual translation quality variance measured by BLEU and Lepor may have little to no impact on what really matters in those use cases.



The enterprise value-equation is much more complex and goes far beyond linguistic quality and Natural Language Processing (NLP) scores. To truly reflect the business value and impact, evaluation of MT technology must factor in non-linguistic attributes including:
  • Adaptability to business use cases
  • Manageability
  • Integration into enterprise infrastructure
  • Deployment flexibility   
To effectively link MT output to business value implications, we need to understand that although linguistic precision is an important factor, it often has a lower priority in high-value business use cases. This view will hopefully take hold as the purpose and use of MT is better understood in the context of a larger business impact scenario, beyond localization.

But what would more dynamic and informed approaches look like? MT evaluation certainly cannot be static, since systems must evolve as requirements change. Ideally, instead of a single-point score, we would have a richer framework that still yields an easy, single measure telling us everything we need to know about an MT system. Today, this is unfortunately not yet feasible.


A more meaningful evaluation framework


While single-point scores do provide a rough sense of an MT system’s performance, it is more useful to focus testing efforts on specific enterprise use case requirements. This is also true for automated metrics, which means that scores based on news domain tests should be viewed with care since they are not likely to be representative of performance on specialized enterprise content.

When rating different MT systems, it is essential to score key requirements for enterprise use, including:

  • Adaptability: Range of options and controls available to tune the MT system performance for very specific use cases. For example, optimization techniques applied to eCommerce catalog content should be very different from those applied to technical support chatbot content or multilingual corporate email systems.
  • Data privacy and security: If an MT system will be used to translate confidential emails and business strategy and tactics documents, human evaluation requirements will differ greatly from those for a system that only handles product documentation. Some systems will harvest data for machine learning purposes, and it is important to understand this upfront.
  • Deployment flexibility: Some MT systems need to be deployed on-premises to meet legal requirements, such as is the case in litigation scenarios or when handling high-security data. 
  • Expert services: Having highly qualified experts to assist in the MT system tuning and customization can be critical for certain customers to develop ideal systems. 
  • IT integration: Increasingly, MT systems are embedded in larger business workflows to enable greater multilingual capabilities, for example, in communication and collaboration software infrastructures like email, chat and CMS systems.
  • Overall flexibility: Together, all these elements provide flexibility to tune the MT technology to specific use cases and develop successful solutions.
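One way to turn the criteria above into a comparison is a weighted scorecard, where the weights reflect a specific use case. The sketch below is only an illustration under stated assumptions: the 1-to-5 ratings, the weights, and the two systems are all hypothetical, and the post itself does not prescribe any scoring formula.

```python
def weighted_score(ratings, weights):
    """Weighted average of per-criterion ratings (for example on a 1-5 scale)."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(ratings[name] * weight for name, weight in weights.items()) / total_weight


# Hypothetical ratings for two imaginary MT systems (1 = weak, 5 = strong).
system_x = {  # imagined as cloud-only but highly tunable
    "adaptability": 5, "data_privacy_security": 2, "deployment_flexibility": 1,
    "expert_services": 3, "it_integration": 5,
}
system_y = {  # imagined as on-premises capable with strong privacy controls
    "adaptability": 3, "data_privacy_security": 5, "deployment_flexibility": 5,
    "expert_services": 4, "it_integration": 3,
}

# Hypothetical weights: how much each criterion matters for a given use case.
litigation_weights = {
    "adaptability": 2, "data_privacy_security": 5, "deployment_flexibility": 4,
    "expert_services": 1, "it_integration": 2,
}
ecommerce_weights = {
    "adaptability": 5, "data_privacy_security": 1, "deployment_flexibility": 1,
    "expert_services": 2, "it_integration": 5,
}

for use_case, weights in [("litigation", litigation_weights), ("eCommerce", ecommerce_weights)]:
    x = weighted_score(system_x, weights)
    y = weighted_score(system_y, weights)
    print(f"{use_case}: System X = {x:.2f}, System Y = {y:.2f}")
```

With these made-up numbers the ranking flips between the two use cases, which is the point of scoring against requirements instead of relying on a single generic number.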

Ultimately, the most meaningful measures of MT success are directly linked to business outcomes and use cases. The definition of success varies by the use case, but most often, linguistic accuracy as an expression of translation quality is secondary to other measures of success. 


The integrity of the overall solution likely has much more impact than the MT output quality in the traditional sense: not surprisingly, MT output quality could vary by as much as 10-20% on either side of the current BLEU score without impacting the true business outcome. Linguistic quality matters but is not the ultimate driver of successful business outcomes. In fact, there are reports of improvements in output quality in an eCommerce use case that actually reduced the conversion rates on the post-edited sections, as this post-edited content was viewed as being potentially advertising-driven and thus less authentic and trustworthy.



True expressions of successful business outcomes for different use cases


Global enterprise communication and collaboration
  • Increased volume in cross-language internal communication and knowledge sharing with safeguarded security and privacy
  • Better monitoring and understanding of global customers 
  • Rapid resolution of global customer problems, measured by volume and degree of engagement
  • More active customer and partner communications and information sharing
Customer service and support
  • Higher volume of successful self-service across the globe
  • Easy and quick access to multilingual support content 
  • Increased customer satisfaction across the globe
  • The ability of monolingual live agents to service global customers regardless of the originating customer’s language 
eCommerce
  • Measurably increased traffic drawn by new language content
  • Successful conversions in all markets
  • Transactions driven by newly translated content
  • The stickiness of new visitors in new language geographies
Social media analysis
  • Ability to identify key brand impressions 
  • Easy identification of key themes and issues
  • A clear understanding of key positive and negative reactions
Localization
  • Faster turnaround for all MT-based projects
  • Lower production cost as a reflection of lower cost per word
  • Better MTPE experience based on post-editor ratings
  • Adaptability and continuous improvement of the MT system

A presentation and webinar that covers this subject in much more detail is available from BrightTALK.


In upcoming posts in this series, we will continue to explore the issue of MT quality assessment from a broad enterprise needs perspective. More informed practices will result in better outcomes and significantly improved MT deployments that leverage the core business mission to solve high-volume multilingual challenges more effectively.

Again, this is a slightly rawer, less polished variant of a version published on the SDL site. The first post focused on BLEU scores, which are often improperly used to make decisions on inferred MT quality even though BLEU is clearly not the best metric for drawing that inference.