
Wednesday, May 18, 2011

Can a Controlled Language Help Machine Translation?

Here is another guest posting initiated from the LTAC conference at LUSPIO in Rome. This post, authored by Orlando Chiarello, provides an overview of the benefits that controlled language or, in a broader sense, improved and standardized source material brings to any translation process. Historically, CL has often been associated with RbMT, but cleaning and standardizing source material benefits SMT as well, as the example below shows. Any effort made to improve or standardize source material is very likely to result in better MT quality and help any ongoing translation automation process. While the degree of control suggested by CL is not always possible with dynamic customer content, this post presents some examples of where this approach does make sense.
 
For a more complete set of links and further discussion on this subject, some may also wish to refer to the old but still relevant discussion in the LinkedIn Automated Translation Group (requires membership): "Discussion on the use of Controlled Language in SMT vs RbMT". It covers in detail how CL or source language simplification can improve the results obtained from MT.
==================================================

The ASD* Simplified Technical English Maintenance Group, or STEMG (www.asd-ste100.org) is having its Spring Meeting these days (17 – 20 May) at Airbus in Toulouse. I am the Chair of this group and I would like to take this opportunity to give a brief overview of ASD Simplified Technical English, ASD-STE100 (STE).

STE is an international specification for the preparation of maintenance documentation in a controlled language.
 
It was developed in the early Eighties (as AECMA Simplified English) to help the users of English-language documentation to understand what they read. STE provides a set of Writing Rules and a Dictionary of controlled vocabulary.


The Writing Rules cover aspects of grammar and style; the Dictionary specifies the general words that can be used. These words were chosen for their simplicity and ease of recognition. In general, there is only one word for one meaning, and one part of speech for one word. In addition to the specified general vocabulary, STE accepts the use of company-specific or project-oriented technical words (Technical Names and  Technical Verbs), provided that they fit into one of the categories listed in the specification.
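
To make the Dictionary idea concrete, here is a minimal sketch in Python of how a simple checker might flag words that are not approved. The word list is a tiny hypothetical sample, not the actual ASD-STE100 Dictionary, and the names `NOT_APPROVED` and `find_unapproved_words` as well as the suggested replacements are assumptions made only for this illustration.

```python
import re

# Hypothetical sample entries, NOT the real ASD-STE100 Dictionary.
# Each non-approved word maps to an assumed approved alternative.
NOT_APPROVED = {
    "inspect": "examine",
    "examination": "inspection",
}


def find_unapproved_words(text):
    """Return (word, suggestion) pairs for each non-approved word in text."""
    findings = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group(0).lower()
        if word in NOT_APPROVED:
            findings.append((word, NOT_APPROVED[word]))
    return findings


if __name__ == "__main__":
    for word, suggestion in find_unapproved_words("Inspect the valve."):
        print(f"'{word}' is not approved; consider '{suggestion}'")
```

A lookup like this only covers the vocabulary side. The Writing Rules on grammar and style would need separate checks, and the writer still has to decide whether the text is correct and clear.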
 
The international language of many industries and specifically of the aviation industry is English and English is the language most used for technical documentation. However, it is often the native language neither of the readers nor of the authors of such documentation. Many readers have knowledge of English that is limited, and are easily confused by complex sentence structures and by the number of meanings and synonyms which English words can have.
 
The controlled grammatical structures and vocabulary on which STE is based are intended to produce texts that are easy to understand; consequently, STE reduces errors during maintenance tasks.
 
Although this controlled language was originally designed for the aviation industry, companies from other industries and domains use it to standardize their documentation in an easy, understandable and unambiguous way. As an example, in March I gave a two-day training course on STE to a Munich-based company that produces medical devices. The course turned out to be a great success.
 
Also, the LUSPIO University in Rome was involved in a project with an Italian furniture manufacturer to develop a controlled Italian for use in all of that company's documentation. The STE principles and rules were the primary basis for the creation of this Controlled Italian. The results of this project were presented at the LTAC (LUSPIO Translation Automation Conference) on 5 and 6 April, where I was also invited to give a presentation on STE.
 
STE can also really help Machine Translation, which was one of the primary objectives when this Controlled Language was developed. As an example, the following is a paragraph in STE (taken from a component maintenance manual), translated into Spanish by simply pasting the text into the web version of Google Translate and running it:
 
Original text in STE:
The procedures in this manual are a guide to do the correct maintenance of the component. Some equivalent procedures - that come from the experience and skills of the maintenance personnel - are also satisfactory.
 
Text translated by Google:
Los procedimientos en este manual son una guía para hacer el mantenimiento correcto de los componentes. Algunos procedimientos equivalentes - que vienen de la experiencia y habilidades del personal de mantenimiento - son también satisfactorios.

As we can see, the result is quite impressive.
 
To conclude, the above example shows that if the "source text" is English and is written in STE, Machine Translation can be dramatically helped by the principle of "one word = one meaning". A further help to Machine Translation could be the availability of a "mirror" Controlled Language based on STE. For example, in the Eighties the French Aviation Industries (GIFAS) created "Rationalized French" based on STE. They used the same structure for the Writing Rules and Dictionary and adapted them to French. The result was exceptionally good, with benefits for translation in both directions. Other attempts have been made, and others are currently in progress, with other languages including Swedish, German, Spanish, Chinese and Italian.

* “ASD represents the aeronautics, space, and defense industries in Europe. ASD has 28 member associations in 20 countries, representing over 2000 companies with a further 80 000 suppliers, many of which are SMEs. Total annual industry turnover is over €137 billion.”



Orlando Chiarello is the Product Support Manager of Secondo Mona, an Italian aerospace equipment manufacturer. He is responsible for the aftermarket support of the company's products.
 
He is also the Chairman of the ASD Simplified Technical English Maintenance Group (STEMG), responsible for the development and maintenance of the ASD-STE100 Specification.

17 comments:

  1. Thanks so much for your post, Orlando. I think this is an excellent explanation of STE and its value both in industry and to MT. Strict use of ASD-STE 100 may be a bit drastic for some companies. That said, controlling your language - meaning, following a standard set of style rules and term usage - makes everything much easier to read and translate. I couldn't agree with you more. ~Val

  2. Great post, as usual, Kirti. I see two challenges and one opportunity in this space: a) the name, b) the focus on translation, and c) the usage of tools.

    Few creative people, particularly writers (even technical ones), like to be put in a situation where their creative choices are constrained, limited or "controlled". We should have a prize for whoever replaces "controlled English" with a more appealing phrase, a la "negative growth".

    On the other hand, I think that we miss the point by constantly linking CL with translation. Writers do not typically have translation in mind. Their goal is messaging and readability. This is the opportunity to "sell" CL to them. This effort is not about limiting your creative freedom. Instead, it is about expanding the number of people who can understand and profit from it. This is particularly relevant when a large percentage of your readers do not have English as their mother tongue. The fact that we can get translation improvements, especially in connection with MT, is the icing on the cake.

    Consider the following practical exercise: extract the longest 100 sentences from your documentation. Show them to the content originators. Ask them to explain what they meant. Then ask them to imagine how a remote translator, not so familiar with the subject matter, "may" interpret them. Finally, for entertainment value, run them through an MT engine.

    In terms of opportunities, we have to keep in mind that writers are not translator-hating evil individuals. They are busy people with a large number of guidelines, glossaries, style recommendations, language bulletins, etc. in mind. This abundance of sometimes contradictory data stresses the limits of human processing capacity. Once a CL standard is agreed upon, it must be codified and programmed into the authoring tool. The goal must be to make writers more productive with CL, rather than using it as a hammer against them.

    Pedro
    Posted by Pedro Gomez

  3. Pedro

    You should really be giving Orlando the credit for this post.

    I agree the phrase "Controlled Language" is not ideal, but neither is STE or IQ in my opinion. They all point to improving source content to make it more useful and effective, and provide varying degrees of assistance and guidelines.

    I also agree that just having it be useful, period, should be the goal, rather than being useful for translation.

    Language tools are really tough to get right because machines are often wrong (MT, CL, IQ, TM), yet we still need to find some kind of balance where they provide enough utility and value to be considered necessary. In MT, volume and speed are drivers for building the use case rationale in any case. I am not sure what they would be for CL.

    Orlando may also respond to this.

    Thanks

    Kirti

  4. Each communication situation requires its own language and register: writing ability corresponds to being able to cross different communication layers, managing to choose the language that suits each event best.
    The goal of every (technical) writer should be to inform his readers in a precise and concise manner. When some writers say that a controlled language is limiting their creativity they are, in fact, caring for their 'creativity' only, drowning in their narcissism, showing little or no attention and respect, if not real contempt, for the end user, who is interested only in understanding the content to use it according to his goals. Therefore, the author's ability may even be exalted by a wise use of a controlled language. As a matter of fact, the most valuable of all talents is that of never using two words when one will do, and any writer who cares for his readers writes for translation. He may not know it, but he does, even when he just cares for messaging and readability. Any text that can be 'understood' by machines can also be 'understood' by humans. Translators, as readers and like readers, should not have to interpret a writer's intentions, since fundamental accuracy of statement is the one sole morality of writing.

  5. Kirti- Thanks for asking Orlando to guest-post. I hear nothing but glowing reviews for that conference in Rome!

    Orlando- Your post is a great example of clear writing. Thank you for sharing.

    Pedro- "Once a CL standard is agreed upon, it must be codified and programmed into the authoring tool." I could not agree more. We can't expect content creators to memorize standards, glossaries, etc. They must be at their fingertips, integrated into their authoring tools.

    "The goal must be to make writers more productive with CL, rather than using it as a hammer against them." Indeed! I was speaking with an Information Architect yesterday about his biggest concerns in integrating his standards and terms into his authors' writing tools environment. His primary concern was that the writers be more productive as a result, not hampered. He also was concerned that his writers see this integration as an empowering opportunity to learn and deliver more consistent, measurable quality.

    Best wishes,
    /Mary

  6. Pedro Gomez wrote:
    I see two challenges and one opportunity in this space: a) the name, b) the focus on translation, and c) the usage of tools... Few creative people, particularly writers (even technical ones), like to be put in a situation where their creative choices are constrained, limited or "controlled".

    Pedro, I am a technical writer. I like STE. The constraints of STE help me to write clearly.

    As an alternative to 'controlled', use 'optimised'. Probably, most people agree that optimisation is good.

    Posted by Mike Unwalla

  7. Mike, yep, "optimised language" sounds very good, better than "controlled language"!

    Posted by Jean-Marie Le Ray

  8. Orlando,

    It has been pointed out to me that the example shown in the posting suggests that very little consideration was given to the User Experience.

    While it meets the CL requirements and does produce acceptable MT output, it does not really make the subject clear to the user. Shouldn't the value to the user (UX) be the most important driver of what is produced?

  9. The advantage of Controlled English is that the text is ideal for MT on a one-to-many basis. Our Controlled English and MT is handling 30,000 pages a month. This volume is not possible by traditional means. The trick is to FIX the English. Most of our work is aircraft maintenance reports, repetitive, but must be accurate. Airlines must use ASD-STE100 Simplified Technical English (STE) for insurance compliance. Would you fly on an aircraft where the manual states, "you should have adjusted the aileron"?
    Posted by John Smart

  10. In a word: yes. When we did a controlled language/MT pilot at GM about a decade ago, the benefit was substantial. The model year 2000 maintenance manual for the Chevy Metro was translated into French and we sent the MT results out to Alpnet in Montreal for post-editing. It came back at a 61% cost savings and 40% faster than by traditional means. Based on the pilot, we were able to obtain a production contract that represented cost savings even better than that: over 80%.

    Posted by Kurt Godden

  11. On the Controlled Language website page:
    https://www.box.net/shared/qug5r1m9tp

    go to each of the links for the various CLAW workshops, which provide links to the research and implementations done by various industry players (including Kurt Godden above) and tools providers (including John Smart above), and to the results obtained.

    Jeff
    former CLAW chairperson and/or technical committee member (CLAW 2000, 2003, 2006)
    Posted by Jeff Allen

  12. The case for quality source content facilitating translation (human, MT, TM) is well made. However, the English example here is very poor. It could have been written so much more concisely. If that's an improvement, I'd hate to see the original!

    It could have been something like:

    "Use the procedures in this manual to maintain the component. You may also rely on the procedures that the maintenance personnel have developed through direct experience with the component."

    It is a lack of insight into how technical writers (and indeed software developers) actually work and take pride in their work that makes it harder to introduce tools and processes that benefit everyone, and ultimately the consumer of the information, regardless of language.

  13. Disappointed O Software User, May 24, 2011 at 12:31 AM

    I won't spend a penny for a piece of software with poorly written clumsy message wording developed by a company whose employees go around the world talking about things they cannot apply and actually do not apply.

  14. Dear Kirti,
    I found Ultan's comment really disrespectful to the many people who are effectively developing user documentation for ASD companies. Perhaps he could show us a few examples of the excellent pieces of documentation he wrote for his company, or of the software user interface and documentation his company developed. We could all learn from him.

  15. Orlando Chiarello, May 25, 2011 at 5:26 AM

    Dear Kirti and all,

    Thank you for your comments.

    Due to recent engagements I have not been able to answer each specific post, but I will try to do so soon.

    Just two points of clarification, if useful: ASD-STE100 is an international specification and not a software product.
    Software products that support STE are excellent authoring tools, but they do not think in our place. In the end, it is the author who must decide if the text is correct and clear. This applies whether you write in STE or apply MT, as in the example I produced.

    The second point is that ASD is not a company but a wide Association which represents the aeronautics, space, and defense industries in Europe. Its equivalent in the United States is the AIA (Aerospace Industries Association), which has been part of the STEMG since the very beginning of the project. ASD has 28 member associations in 20 countries, representing over 2000 companies (www.asd-europe.org).

    The creation of STE was driven by the Customers (as explained in the slide presentation), and representatives of the Customers are part of the STEMG.

    Regarding the comment that little consideration was given to the User Experience, I quote the Specification itself, which states in its Introduction:
    ******************
    "The use of the specification requires a high standard of professionalism on the writer's side".
    *****************
    And also:
    *****************
    "The writer must have a good command of written English. The specification will help the writer present complex information in a simple form. Writing clearly is a complex task, and writing in STE requires language fluency."
    ****************

    I hope it helps.

    Kindest regards,

    Orlando

  16. As a member of Orlando's STEMG and a professional linguist, I would like to make a point about the difference between writing in Simplified Technical English (STE) and translation. Although translation is not, strictly speaking, always a matter of preserving exact meaning, meaning preservation is a primary goal of the translation process. The translator strives to convey the sense intended by the author of the source text. This is less true for the writer of STE, whose primary goal is to clarify language. STE authors must constantly judge the source text for two different types of information: missing information and extraneous information. When you add missing information or omit extraneous information, you are changing the meaning of the original source text, sometimes even the intentions of the original author.

    To better understand what I mean here, let's consider a simple example. STE bans the use of the verb "inspect" and the noun "examination", but it allows "inspection" and "examine". The problem with "inspect" is that there are many different ways to inspect something. One could look at it, feel it, smell it, taste it, and so on. So we want writers of maintenance procedures to avoid the use of a general verb that can hide the method of inspection from the reader. On the other hand, a writer might well cite a named convention--a type of "inspection"--that is defined elsewhere in the maintenance document.

    The verb "examine", on the other hand, is defined in our dictionary as "to look carefully at". So its method of inspection is more specific. In this way, those of us who have worked on the STE specification have designed it to try to drive writers to express themselves more clearly. In the process of doing that, we sometimes encourage them to change the meaning of the source text or ideas that they start with when they begin to formulate a set of instructions.

  17. As a member of Orlando's very hard-working STEMG, I would just add that the primary goal of STE has always been to convey a technical message as clearly and unambiguously as possible. STE is not "a goal in itself": indeed - as has been pointed out by others - it was developed with the full support of customers, and customers continue to be involved in the ongoing development process.

    Don't forget also that STE's major added value is as a unifying language to facilitate comprehension between people for whom English is not the native language. English is the second language of many mechanics / pilots / ground personnel and, while they may have a good general understanding, their skills may not stretch to comprehending idiomatic expressions and instructions that seem perfectly clear to native speakers.
    In short, some STE wording may sound "unnatural" to the native ear... but STE is technical writing, not poetry! And many, many years of research and testing have been put into this ever-evolving controlled language.

    Shirley Blume
