The advent of Large Language Models (LLMs) has opened up exciting new possibilities. However, we also see a large volume of vague and poorly defined claims of "using AI" by practitioners with little or no experience with machine learning technology and algorithms.
The noise-to-signal (hype-to-reality) ratio has never been higher, and much of the hype fails to meet the requirements of real production business use cases. Beyond the data privacy issues, copyright problems, and potential misuse by bad actors, LLMs also continue to be plagued by hallucinations and reliability issues.
Enterprise users expect production IT infrastructure output to be reliable, consistent, and predictable on an ongoing basis, but there are very few use cases where this is currently possible with LLM output. The situation is evolving, and many expect that the expert use of LLMs could have a dramatic and favorable impact on current translation production processes.
There are several areas in and around the machine translation task where LLMs can add considerable value to the overall language translation process. These include the following:
- LLM translations tend to be more fluent and to capture more contextual information, albeit in a smaller set of languages
- Source text can be improved and enhanced before translation to produce better-quality translations
- LLMs can carry out quality assessments on translated output and identify different types of errors
- LLMs can be trained to take corrective actions on translated output to raise overall quality
- LLM MT is easier to adapt dynamically and can avoid the large re-training that typical static NMT models require
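One way to picture the dynamic adaptation mentioned in the last point is few-shot prompting: instead of retraining a model, relevant examples from a translation memory are placed in the prompt at request time. The sketch below is illustrative only and does not describe how any particular product works. The translation memory contents, the function names, and the prompt wording are all assumptions; the fuzzy matching uses `difflib` from the Python standard library.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: (source, target) pairs. Illustrative data only.
TRANSLATION_MEMORY = [
    ("Click Save to store your changes.", "Fai clic su Salva per memorizzare le modifiche."),
    ("Click Cancel to discard your changes.", "Fai clic su Annulla per annullare le modifiche."),
    ("The invoice is attached.", "La fattura è allegata."),
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def select_examples(source, memory, k=2):
    """Return the k memory entries whose source text is closest to `source`."""
    ranked = sorted(memory, key=lambda pair: similarity(source, pair[0]), reverse=True)
    return ranked[:k]


def build_prompt(source, memory, k=2, target_lang="Italian"):
    """Assemble a few-shot prompt; the wording is an assumption, not a standard."""
    parts = [f"Translate into {target_lang}, following the style of the examples."]
    for src, tgt in select_examples(source, memory, k):
        parts.append(f"Source: {src}\nTranslation: {tgt}")
    parts.append(f"Source: {source}\nTranslation:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("Click Save to keep your changes.", TRANSLATION_MEMORY))
```

Adding a new pair to the memory changes the next prompt immediately, which is the sense in which this kind of adaptation avoids a retraining cycle.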
At Translated, we have been carrying out extensive research and development over the past 18 months into these very areas, and the initial results are extremely promising, as outlined in our recent whitepaper.
The chart below shows some evidence of our progress with LLM MT. It compares Google (static), DeepL (static), Lara RAG-tuned LLM MT, GPT-4o (5-shot), and ModernMT (TM access) for nine high-resource languages. These results for Lara are expected to improve further.
There are two broad ways to apply LLMs across these categories. One approach involves using independent LLM modules to handle each category separately. The other approach is to integrate these modules into a unified workflow, allowing users to simply submit their content and receive the best possible translation. This integrated process includes machine translation quality estimation (MTQE) as well as automated review and post-editing.
While managing these tasks separately can offer more control, most users prefer a streamlined workflow that focuses on delivering optimal results with minimal effort, with the different technology components working efficiently behind the scenes.
LLM-based machine translation will need to be secure, reliable, consistent, predictable, and efficient for it to be a serious contender to replace state-of-the-art (SOTA) NMT models.
This transition is underway but will need more time to evolve and mature.
Thus, SOTA Neural MT models may continue to dominate MT use in most enterprise production scenarios for the next 12-15 months, except where the highest-quality automated translation is required.
Currently, LLM MT makes the most sense in settings where high throughput, high volume, and a high degree of automation are not requirements and where high quality can be achieved with reduced human review costs enabled by language AI.
Translators are already using LLMs for high-resource languages for all the translation-related tasks previously outlined. In the author’s opinion, there will be a transition period in which NMT and LLM MT are used together or separately for different tasks in new LLM-enriched workflows. NMT will likely perform high-volume, time-critical production work as shown in the chart below.
In the scenario shown above, information triage is at work. High-volume content is initially processed by an adaptive NMT model, followed by an efficient MTQE process that sends a smaller subset to an LLM for cleanup and refinement. These corrections can be sent back to improve the MT model and increase the quality of the MTQE (not shown in the diagram above).
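The triage flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of a production system: the `translate`, `estimate_quality`, and `refine` callables are hypothetical stand-ins for an adaptive NMT engine, an MTQE model, and an LLM post-editor, and the 0.8 threshold is arbitrary.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    source: str
    translation: str = ""
    quality: float = 0.0
    refined: bool = False


def triage(
    sources: List[str],
    translate: Callable[[str], str],                # stand-in for the NMT engine
    estimate_quality: Callable[[str, str], float],  # stand-in for MTQE, returns 0.0-1.0
    refine: Callable[[str, str], str],              # stand-in for the LLM cleanup step
    threshold: float = 0.8,                         # arbitrary cut-off for this sketch
) -> Tuple[List[Segment], List[Tuple[str, str]]]:
    """Translate everything with NMT, then send only low-scoring segments to the LLM."""
    results: List[Segment] = []
    corrections: List[Tuple[str, str]] = []
    for src in sources:
        seg = Segment(source=src, translation=translate(src))
        seg.quality = estimate_quality(seg.source, seg.translation)
        if seg.quality < threshold:
            seg.translation = refine(seg.source, seg.translation)
            seg.refined = True
            # Corrected pairs are collected so they could be fed back to the
            # MT and MTQE models later; that feedback step is not implemented here.
            corrections.append((seg.source, seg.translation))
        results.append(seg)
    return results, corrections
```

The point of the design is cost control: the fast model handles every segment, and the slower, more expensive model only sees the subset that the quality estimate flags.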
However, as LLMs get faster and it is easier to automate sequences of tasks, it may be possible to embed both an initial quality assessment and an automated post-editing step together for an LLM-based process to manage.
An emerging trend among LLM experts is the use of agents. Agentic AI represents a significant evolution in artificial intelligence, moving beyond simple text generation to autonomous, goal-driven systems capable of complex reasoning and task execution.
AI agents are systems that use LLMs as their core controller to autonomously pursue complex goals and workflows with minimal human supervision.
They potentially combine several key components:
- An LLM core for language understanding and generation
- Memory modules for short-term and long-term information retention
- Planning capabilities for breaking down tasks and setting goals
- Some ability to iterate to a goal
- Tools for accessing external information and executing actions
- Interfaces for interacting with users or other systems
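To make these components concrete, here is a deliberately small sketch of an agent loop. It is a toy under stated assumptions rather than a reference design: the `Agent` class, the `action: argument` reply format, and the tool names are all hypothetical, the LLM core is any callable that maps a prompt to text, and planning is collapsed into the model choosing one next action per step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    llm: Callable[[str], str]                   # LLM core (any prompt -> text callable)
    tools: Dict[str, Callable[[str], str]]      # tools for external information or actions
    memory: List[str] = field(default_factory=list)  # simple running log of past steps
    max_steps: int = 5                          # bound on how long it may iterate

    def run(self, goal: str) -> str:
        """Interface: take a goal, iterate until the model says 'finish' or steps run out."""
        for _ in range(self.max_steps):
            history = "\n".join(self.memory[-10:])  # only recent steps go into the prompt
            decision = self.llm(f"Goal: {goal}\nHistory:\n{history}")
            # Assumed reply format: "<action>: <argument>"
            action, _, argument = decision.partition(":")
            action, argument = action.strip(), argument.strip()
            if action == "finish":
                return argument
            tool = self.tools.get(action)
            observation = tool(argument) if tool else f"unknown tool {action!r}"
            self.memory.append(f"{action}({argument}) -> {observation}")
        return "stopped: step limit reached"
```

A real system would add long-term memory, explicit planning, and error handling; the step limit is there because an agent that iterates toward a goal also needs a guaranteed way to stop.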
One approach involves using independent LLM agents to address each of the categories below as separate and discrete steps.
The other approach is to integrate these steps into a unified and robust workflow, allowing users to simply submit content and receive the best possible output through an AI-managed process. This integrated workflow would include source cleanup, MTQE, and automated post-editing. Translated is currently evaluating both approaches to identify the best path forward in different production scenarios.
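The difference between the two approaches can be illustrated with a small composition sketch. Everything here is hypothetical: the step functions are toy stand-ins for source cleanup, translation, MTQE, and automated post-editing, and the scoring rule and 0.8 threshold are arbitrary. Each step can be called on its own (the first approach) or chained behind a single entry point (the second).

```python
from functools import reduce
from typing import Any, Callable, Dict, List

Job = Dict[str, Any]
Step = Callable[[Job], Job]


def clean_source(job: Job) -> Job:
    """Toy source cleanup: collapse repeated whitespace."""
    return {**job, "source": " ".join(job["source"].split())}


def translate(job: Job) -> Job:
    """Placeholder for an MT call; it only marks the text as translated."""
    return {**job, "translation": f"<MT of: {job['source']}>"}


def estimate_quality(job: Job) -> Job:
    """Toy MTQE rule: short sources score higher. Not a real quality model."""
    return {**job, "quality": 0.9 if len(job["source"]) < 40 else 0.6}


def post_edit(job: Job) -> Job:
    """Toy automated post-editing, applied only below an arbitrary threshold."""
    if job["quality"] >= 0.8:
        return job
    return {**job, "translation": f"{job['translation']} [post-edited]", "post_edited": True}


UNIFIED_WORKFLOW: List[Step] = [clean_source, translate, estimate_quality, post_edit]


def run_workflow(source: str, steps: List[Step] = UNIFIED_WORKFLOW) -> Job:
    """Single entry point: submit content, get back the final job record."""
    return reduce(lambda job, step: step(job), steps, {"source": source})
```

Keeping every step behind the same `Job -> Job` shape means the unified workflow is just a list, so steps can be reordered, removed, or run individually without changing the caller.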
Agentic AI systems offer several advanced capabilities, including:
- Autonomy: Ability to take goal-directed actions with minimal oversight
- Reasoning: Contextual decision-making and weighing tradeoffs
- Adaptive planning: Dynamically adjusting goals and plans as conditions change
- Natural language understanding: Comprehending and following complex instructions
- Workflow optimization: Efficiently moving between subtasks to complete processes
A thriving and vibrant open-source community will be a key requirement for ongoing progress. The open-source community has been continually improving the capabilities of smaller models and challenging the notion that scale is all you need. We see an increase in recent models that are smaller and more efficient but still capable and are thus often preferred for deployment.
All signs point to an exciting future in which technology becomes steadily better at enhancing human communication and understanding, and we are likely to see major advances in bringing a growing share of humanity into the digital sphere for productive, positive engagement and interaction.