tag:blogger.com,1999:blog-6748877443699290050.post7715058610882400555..comments2024-03-28T22:05:39.562-07:00Comments on eMpTy Pages: The Moses Madness and Dead FlowersKirti Vasheehttp://www.blogger.com/profile/16795076802721564830noreply@blogger.comBlogger25125tag:blogger.com,1999:blog-6748877443699290050.post-12945474387202016762011-12-12T05:39:17.420-08:002011-12-12T05:39:17.420-08:00@Mr Wiggins. Your claim that @Mr Hoar challenged AO's claim to "near-human quality" is blatantly false and intentionally misleading.<br /><br />Please read my post carefully. I wrote I "missed the 'hype' from the DIY crowd claiming 'near human quality.'" In the entire post, I did not reference AO or challenge its claims to "near human quality." My comments about 'near human quality' were clearly limited to the DIY community. Surely, a word craftsman and former VP of Gartner Research can see the distinction. It was your comments that brought AO into the discussion, not once, but twice.<br /><br />By the way, has Gartner commented on AO's leveraging its Hype Cycle methodology in the blog's opening graph?tahoarhttps://www.blogger.com/profile/06893656133001786619noreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-9817987799994213232011-12-11T03:23:38.026-08:002011-12-11T03:23:38.026-08:00...Continued from previous post.
The reality is t......Continued from previous post.<br /><br />The reality is that Google and Bing will often give better results than a DIY or self-service solution, especially when the LSP does not have an understanding of the data and the necessary knowledge that is outlined above. Good MT systems can come from both DIY and self-service with this knowledge and skill. But this knowledge cannot be encapsulated behind a single click button now or ever. It takes time to build the necessary knowledge. There are some great results from DIY solutions such as Autodesk's example, but that was achieved via a lot of hard work, some failures, some experimentation and learning. Uploading data with a single click can be achieved, as can some cleaning of the data. But the real work on the data is not to remove formatting tags, it is to apply the right data in the right quantities to build the best engine for a specific task. This is why, as @Gavin correctly acknowledges, a completely customized solution will always give the best results.<br /><br />Any MT vendor that tries to convince you that you can “upload your TM, 1 click and you have a fantastic MT system almost instantly” is misleading customers and will ultimately disappoint. Once the MT system is built, customers must use it and will immediately find that they have to spend far more on post editing the MT output than if they had built a system properly in the first place. The effort must be put in somewhere. There is no magic wand for great MT systems. They come from hard work and knowledge of what it takes to build an MT system. If you don’t put the effort into building the MT system, you will put even more effort into editing the MT output and ultimately end up paying more.Dion Wigginshttp://www.asiaonline.netnoreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-18108089919395000442011-12-11T03:20:00.761-08:002011-12-11T03:20:00.761-08:00...Continued from previous post.
@Dion: "Wha......Continued from previous post.<br /><br />@Dion: "What surprises me is that some DIY and Self Service MT promoters seem to be afraid to discuss these issues and instead convey the unrealistic message of ‘upload your TM, 1 click and you have a fantastic MT system almost instantly.’"<br /><br />@Gavin: It is not unrealistic, on the contrary it is very realistic! <br /><br />I was originally encouraged by the post from @Andy in his “Overgilding the lily” blog entry. Excerpts are below:<br /><br />@Andy: “As an academic, for over 20 years I’ve been educating students – many of whom now work in the industry – to not over-hype the capability of their companies’ tools and services.”<br /><br />@Andy: “We must all act collectively to make realistic claims about what we can and cannot achieve, and to be aware when engaging in publicising recent developments in our respective companies to be aware of the rich history of developments in our field, and to ensure that claims are in synch with what has gone before, lest we alienate the very people we’re all trying to attract.”<br /><br />However, my enthusiasm was rapidly eroded as I read recent posts. Claiming that “upload your TM, 1 click and you have a fantastic MT system almost instantly” is realistic, while also stating in other recent posts that “completely customized systems will always give the best results”, sends a mixed and somewhat contradictory message. Even those who try DIY solutions will note that after the installation is done (or in the self service model, data uploaded) nothing is instant – training an MT system takes time, CPU resources, disk resources and human resources.<br /><br />I agree with Andy that we should not overhype and that when statements are made, they should be backed up with facts and proof points. 
That is why I replied to @Tom when he challenged Asia Online's claim of being able to deliver “near-human quality” MT output with actual case studies published or presented by third parties other than Asia Online. This showed clearly where our technology was used successfully and provided actual proof points without any hype. In one case (http://bit.ly/s2KPyq), a major multinational was able to deliver English->Chinese MT output that beat their first pass human translators, and Sajan, the LSP that we worked with to create the fully customized engines for this client, is on video record with their presentation at the LRC conference in Limerick earlier this year. <br /><br />@Gavin "There was a comment made in another post which suggested that anything less than a hand tailored suit was not worth having." The source of the sewing machine and tailor metaphor was Asia Online’s recent presentations on ROI (http://bit.ly/t7f7wf). Not once did we say what @Gavin has suggested. It seems @Gavin has not reviewed our presentation (slides and video) properly or he would have understood what was meant by our reference in a bullet to “off the rack suit”. In the presentation, we refer to the “off the rack suit” as representing Google and Bing. A DIY or self-service solution is not the equivalent of an off the rack suit. We point out in the presentation that a DIY or self-service solution is the equivalent of being presented with a sewing machine and a bunch of fabric and then being told to make your own suit with it. <br /><br />Continued in next post...Dion Wigginshttp://www.asiaonline.netnoreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-7941742183145862522011-12-11T03:08:28.462-08:002011-12-11T03:08:28.462-08:00@Gavin and @Andy, - I guess I (like others) am having trouble reconciling some of the mixed messages emanating from your company. 
They are confusing and I personally feel misleading in many cases due to either conflict, lack of clarity or over simplification. Examples provided below within quotes are taken verbatim from recent discussions:<br /><br />@Gavin: "Firstly I think it is worth saying that I completely agree with the fact that a completely customized SMT engine with lots of work on data cleaning will always give the best results."<br /><br />@Andy: "with self-serve MT, clients without the necessary MT and computing expertise to install Moses themselves, have for the first time the ability to build an MT system based on their own user requirements pretty much instantly."<br /><br />@Gavin: "The point and click type solutions, ourselves included are not suggesting that it will be as good as full customization." <br /><br />So if I am to understand the message coming from your company correctly it is:<br /> "We process the data that you upload quickly to make an MT system – we call this “self-service” because the only thing you need to do is upload the data and you do not need MT or computing expertise to build a system. 
However the fact is that a completely customized SMT engine with lots of work on data cleaning will always give better results than our systems.” <br /><br /> Another way to say this more simply would be:<br />“The best MT systems are always those that are fully customized.”<br />“The best MT systems are always those that have been built through a focus on the data.”<br /><br />Pretending for a moment that an “upload your TM, 1 click and you have a fantastic MT system almost instantly” approach actually was realistic, is it then realistic, in this hypothetical scenario, that the LSP has all the knowledge necessary to answer the following questions?<br />1. What is the right data to upload for my MT system?<br />2. How should I prepare my data?<br />3. What cleaning can I do that the magic 1 click button does not do?<br />4. What impact will my data have on the MT system? <br />5. Will the data I upload improve or decrease quality? <br />6. What will mixing data from multiple domains do to my MT system?<br />7. Should I add some or all of the TAUS data to my system? <br />8. Once I have a system, how can I make it better?<br />9. When I see an error in my MT output, how can I know the cause of the error?<br />10. When I see an error in my MT output, how can I fix the error?<br />11. …<br /><br />I could keep listing many areas of knowledge that builders of MT engines using any MT technology will need in order to build a quality engine, and why a “1 click instant MT” approach will always deliver lower quality. Of course it is not realistic for the LSP to possess all this knowledge – at least not without training and experience. Just to be clear – this is experience the LSP needs, not the MT provider. Several have been very quick to claim their own level of expertise, but this is no substitute for the LSP having their own expertise and knowledge. 
It does not matter how experienced the team is at the MT provider if the LSP has no concept or knowledge of what happens when they upload a specific set of data. In the report “Study on the Impact of Data Consolidation and Sharing for Statistical Machine Translation” (http://www.asiaonline.net/resources/reportID4523.aspx) this is analyzed in great depth and shows conclusively that the wrong data can have a vastly negative impact on engine quality. <br /><br />I do agree with @Gavin that a fully customized system will always give the best results. However, where I disagree is that my position is that knowledge is required – even in a single click approach, rather than the opposing viewpoint put forward by @Gavin and @Andy that knowledge is not required and a single click will deliver a fantastic MT system.<br /><br />@Dion in a response to @Andy earlier in this blog post “Surely you are not promoting that your 25 years of experience is encapsulated behind a single button that makes instant high quality MT systems.”<br /><br />Continues in next post...Dion Wigginshttp://www.asiaonline.netnoreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-28716639261845067732011-12-08T03:10:28.337-08:002011-12-08T03:10:28.337-08:00@Dion It is not unrealistic, on the contrary it is very realistic!Gavin Wheeldonhttp://www.smartmate.conoreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-40771535554704766052011-12-06T22:11:43.046-08:002011-12-06T22:11:43.046-08:00@Manuel
I am not sure what war you are referring ...@Manuel<br /><br />I am not sure what war you are referring to. What I see is a maturing of the industry. In fact, what I see is that the industry has passed a tipping point in the year 2011, with a great many steps forward:<br /><br /><br />1. Google starting to charge for MT, which led to Microsoft starting to charge for MT and many of their users looking for alternatives. If they had to pay, they wanted better quality and more control.<br />2. Google and Microsoft charging for MT was seen as a sign of maturity: a technology that was once viewed as a joke may now be worth investigating.<br />3. As investigation grew, companies began popping up to take advantage of these new opportunities. Some packaged the work of others to make it easier, while others built their own MT solutions and invested heavily in R&D.<br />4. Proof points and case studies began appearing in the market more than ever before – for both failure and success.<br />5. With the wave of investigation and case studies came new hype, new discussion and a natural pushback from various sources that one model is better than the other. <br /><br /><br />What is happening in the MT industry is far from a war. On the contrary, there are clear signs of a rapidly maturing industry, as was noted in our recent newsletter (http://www.asiaonline.net/newsletters/201110.htm#3). This is a good thing for industry players and end users of MT.<br /><br /><br />Additionally, I do not see Kirti’s post as biased. I see it as an open exploration of what is really and truly required to build successful MT systems. Other than some DIY/Self Service promoters, nearly all comments are in alignment with this, not flaming or in disagreement. Total Cost of Ownership (TCO) is being investigated and Return On Investment (ROI) metrics are being sought. 
In order to determine these, honesty is required from the market players, where the real effort, time and costs are expressed to the market. This is what Kirti and others have been exploring in this blog post. Through this exploration, issues are being identified and then addressed. The value and issues encountered with a DIY MT installer is one such example. As @Tom pointed out in one of his posts “Mastery takes time… time to learn, time to practice, time to fail, time to experiment and time to contribute.” All of these come at a cost. <br /><br /><br />As I said to @Tom, omission of this information by DIY/Self Service promoters (thus far) may not be deliberate, but now that this message has been raised by Kirti, Alon and others, there is no excuse from the DIY/Self Service promoters for omitting this important information in future. Now that the realities are being discussed openly, continued failure to make prospective customers aware of these issues would truly be misleading.<br /><br /><br />With respect to your last point about “how long initiatives/companies will last”, again this has been taken out of context. @Kirti said “The learning curve for this technology is long and arduous and it may take a while for the dust to settle from the current hype, but I fully expect that by December 21st, 2012 it will be clear that expertise, experience and knowledge does matter with something as complex as Moses.” This does not in any way refer to the companies providing these technologies or individuals driving these initiatives within those companies such as yourself. Nor does it refer to how long any such company will exist. 
It refers to the end users, who are the actual users of these tools, finding out for themselves the appropriate skill level, knowledge and expertise required through actual real world experience.<br /><br /><br />What surprises me is that some DIY and Self Service MT promoters seem to be afraid to discuss these issues and instead convey the unrealistic message of “upload your TM, 1 click and you have a fantastic MT system almost instantly”. As Kirti mentioned, this is claimed in TAUS videos and even in one of the responses to this blog entry.Dion Wigginshttp://www.asiaonline.netnoreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-19251330354893520702011-12-06T00:19:36.796-08:002011-12-06T00:19:36.796-08:00@Manuel
I think you are confusing my comments reg...@Manuel<br /><br />I think you are confusing my comments regarding the unrealistic expectations some LSPs have about Moses instantly delivering value, as criticism of your capabilities. If you look more closely at the post, you will see that the bulk of my comments are directed to users (especially LSPs) not to the tools developers. I am aware that most DIY Moses developers understand what they themselves are doing, but, many of their customers have very little understanding of what SMT involves. My point is that the path to good MT engines that provide business value, involves more than pushing a few buttons in a DIY Moses offering.<br /><br />There were in fact some DIY Moses presentations at TAUS 2011, that did in fact promise that usable MT systems can be produced almost instantly. This is the hype I refer to. Check out the videos – you will see some do say this.<br /><br />The Adobe and Autodesk efforts referenced by several here, ignore the fact that their team members are sophisticated technical users, or in the case of Adobe actually have NLP experience. Thus their efforts cannot be equated to what an average LSP might do.<br /><br />Moses can work for an LSP, but like every other MT effort, requires knowledge, data and some experience and A STEERING EFFORT. The tools are only as good as the people using them, having a sewing machine does not automatically make you a tailor. I am only claiming that pushing a button that sets Moses into motion, without understanding how it works and what might happen is not likely to give you MT engines that improve translation production.Kirti Vasheehttps://www.blogger.com/profile/16795076802721564830noreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-57932381954763095262011-12-05T19:03:26.051-08:002011-12-05T19:03:26.051-08:00Ljubomir Lukov •
Another good analysis on the bl...Ljubomir Lukov • <br /><br />Another good analysis on the blossoming of the DIY Moses engines we see these days. Still, I think that much of the success of the MT tools depends on the people who work on their output and their motivation to improve the text quality.Ljubomir Lukovnoreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-22440259853872905072011-12-05T10:01:35.972-08:002011-12-05T10:01:35.972-08:00When I wrote about "The beginning of the MT wars" in April 2010 (http://blog.pangeanic.com/2010/10/04/final-dominance-by-final-technology-the-beginning-of-the-mt-wars-ii/) , I did not expect that the "wars" would come in this shape.<br /><br />I find the original entry too opinionated and biased, Kirti. LSPs like Pangeanic, who have run the full circle of learning, adopting, implementing and exporting the technologies would not put their money in operations if the technology had not been tested.<br /><br />You make a case for AO working closely in a quality feedback loop with translators and LSPs, but it is precisely those LSPs which develop tools around the Moses kit, they deserve public scorn - despite having gone through extensive product testing. I know everyone else who has replied here has done proper product testing, too.<br /><br />I do not know of any company that claims "instant DIY Moses deployment" without an initial level/stage of customization. We do claim to provide the tools to clean and update the engine(s), which is something any good data-driven MT implementer has developed. Other initiatives provide a harness on Moses (Adobe, see their ppt in Xiamen and the EU-sponsored LetsMT). <br /><br />Personally, I spent a significant part of my life working with machines that made machines people drive (your analogies to the car industry) - so I am a big fan of automation. 
<br />I do believe the future is in empowering the community and what we are witnessing here is a case of professional jealousy or conflicts of business models.<br /><br />About the comment about how long these initiatives/companies will last, we plan to stick around for a pretty long term. Our business model is not so much output-driven (sell words cheap to LSPs) but about empowering them with the tools we test and develop.Manuel Herranzhttps://www.blogger.com/profile/12181996112717483992noreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-71079800831284127972011-12-05T06:29:53.382-08:002011-12-05T06:29:53.382-08:00Thanks for getting to the heart of the matter Kirt...Thanks for getting to the heart of the matter Kirti and Dion.<br /><br />And indeed this is where we believe TAUS has and will continue to play a pivotal role - working hard to ensure the informed use of translation automation.<br /><br />This is what Jaap and Andrew began fostering when large organizations first shared their experiences at early TAUS meetings in 2005.<br /><br />It's what reports like Manager's Guide to Implementing Open Source SMT (http://tinyurl.com/c4pccvg), How to Implement Open Source MT Solutions (http://tinyurl.com/bpxtbfl), among many others (http://tinyurl.com/bpkps8r) are aimed at.<br /><br />It's also why we began providing workshops (hands on tutorials) a couple of years ago (http://tinyurl.com/bux46ch). Something we will probably (at least partially) move online next year to open up access for many more. 
<br /><br />It's what the member-inspired TAUS Tracker (http://taustracker.com/) - a set of free translation and language directories, which enable users to provide feedback on a myriad of tools - is all about.<br /><br />Not to mention TAUS Labs (http://tauslabs.com) where we begin to work with members to operationalize our and members' visions of better applications of dynamic quality, interoperability and open source MT.<br /><br />You are all no doubt aware of our events at which so many have shared (http://www.youtube.com/user/TAUSvideos)<br /><br />All the above seek to help tackle the skills and knowledge gap.<br /><br />TAUS Data Association (http://tausdata.org) is of course aimed squarely at opening up data - for the long haul.<br /><br />No hype, no sleight of hand - just plain insights. This will remain a nuanced field for some time. We believe that by sharing knowledge and resources we will all grow our share of a growing pie.<br /><br />It's great to see that TAUS members are contributing to this discussion thread!Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-3610950305472029622011-12-05T05:53:49.758-08:002011-12-05T05:53:49.758-08:00@ Mr Wiggins. Got it.tahoarhttps://www.blogger.com/profile/06893656133001786619noreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-36708078906663356622011-12-04T19:43:32.811-08:002011-12-04T19:43:32.811-08:00This is corroborated by your statement (which I agree with) that “companies considering its benefits should tread carefully, focus on one language pair and learn what's involved, including the possibility to use the technology themselves at their pace in their budgets.” To expect to achieve results at the levels Sajan was able to achieve with a single click solution in a language pair such as Chinese (or any for that matter; we repeated this in other languages such as Spanish) is naïve. 
DIY promoters that advocate knowledge and experience are doing their target audience a favor. Unfortunately there are very few that address the real issues. <br /><br />Holding up the Autodesk case study as evidence of what can be done with Moses is good. But ignoring the skill, effort and knowledge that Autodesk put into these systems to achieve this result and making prospective MT users believe they can do the same without the effort is foolhardy and simply misleading. Omission of this information by DIY promoters may not be deliberate, but now that this message has been raised by Kirti, Alon and others, there is no excuse from the DIY community for omitting this important information in future.Dion Wigginshttp://www.asiaonline.netnoreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-73023958012081428562011-12-04T19:43:07.238-08:002011-12-04T19:43:07.238-08:00You state in your response that expertise is required that “exceeds most localization engineers, system administrators and project managers of today’s LSPs.” This is in line with @Alon’s statement that “Users may be disappointed with what they get from DIY Moses, and more detrimentally, become convinced that that's the best they can accomplish, when in fact letting expert MT developers do the work can result in far better performance results.” Neither you, Kirti, Alon nor I are in any way stating that people should not try. What I believe we are all saying is that expertise is needed. If an LSP wants to try, it should acquire the expertise, either by hiring it or from third parties. Those who attempt MT without acquiring the necessary expertise have a higher risk of failure and will be unlikely to achieve the best MT results possible with their data. <br /><br />Using the car analogy, today anyone with a small amount of training can drive a car. But how many of these drivers understand enough about the car to give it a tune-up? 
Instead the driver takes the car to a specialist. In the past the car was easier to tune. Today, as with MT, the complexity of a car’s engine is increasing. More sophistication and knowledge is required in order to tune the car to its optimal performance. Often complex and expensive tools and equipment are required to tune the car. Again, this is like MT, with professionals such as Asia Online developing comprehensive tools that optimize and tune the engine and data to give optimal performance. These tools do not ship in a DIY solution and are proprietary, several of which are in the process of being patented. You will recall that a car is supposed to be maintained by a specialist on a regular basis and drivers take their cars to a service center for this. A small percentage of drivers may change their own oil. An even smaller percentage will work directly on their car's engine to improve it. The rest take their car to a specialist for this. <br /><br />As Alon points out “these [DIY] efforts primarily target and solve the *engineering complexity* of deploying Moses.” … “I think there is a potential pitfall here that users that are not MT experts (the vast majority) would come to believe that that's all it takes to build a state-of-the-art MT system.” The DIY community keeps ignoring this very important point and frequently even pushes against it.<br /><br />With respect to my comment that you quote “I do not think that the market is mature enough to say that one model is out and the other is in... This position presented in this topic is a marketing and sales line, not reality in the market.” – this comment was in the context of @Andy’s statement that “Looks like the licence-based model for MT procurement is on its way out in favour of pay as you go”, where he immediately supported his statement with pricing information. The DIY model is one of several and suitable for some and not others. In no way am I advocating that some must not try. 
What I am stating is that those who do try should be aware of the issues and the challenges and come prepared to address them. Ignoring them will be costly and most likely result in failure.<br /><br />Continued in next post...Dion Wigginshttp://www.asiaonline.netnoreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-13389284390689673482011-12-04T19:41:30.007-08:002011-12-04T19:41:30.007-08:00@Tom
You are correct, we do use the term “near hu...@Tom<br /><br />You are correct, we do use the term “near human quality” in our marketing, but it is far from hype. We have multiple case studies from LSPs and their end customers who agree and are on record agreeing. We even have case studies with some customers where the MT system was delivering better quality than the human translators. The LSP I am referring to is a major one, Sajan. Their client is one of the largest IT companies globally; it translated millions of words and achieved a 60% saving on costs and a 77% saving on time. The client, not the LSP, came back and reported that the MT output from English into Chinese was beating their first pass human translators. It is easy for an MT vendor to make these claims and easy for competitors to call them hype. But these claims are verified by both the LSP and a major multinational IT client directly. They are on record here in both slides and video at the LRC conference in Limerick earlier this year.<br /><br />Sajan: “The client told us that the quality of the Chinese language machine translation is better than the human translators for the first translation phase of the TEP (Translate, Edit and Proof) process. In other words, it is absolutely near human and the post editing is only needed. It is absolutely there and that was corroborated by the customer.” <br /><br />Slides: http://www.localisation.ie/resources/conferences/2011/presentations/LRCXVI_Sajan_MT_LRC_2011.pdf <br />Video: http://www.youtube.com/watch?feature=player_detailpage&v=hjK17GWynoU#t=1535s<br /><br />So yes, we can claim and deliver “near human quality”. No it is not hype. When you have a sophisticated LSP like Sajan, who went through the training process with Asia Online and learned how to use the tools and worked with Asia Online to build their data correctly, this is indeed possible. 
If Sajan had just thrown the data at a server to create an “instant” solution, then they would not have achieved this level of quality and the customer would not have been as satisfied with the result. The reality is that most LSPs are not as sophisticated as Sajan. This is where Asia Online works even closer with the LSP to build the system and train the LSP and work with the data. <br /><br />Sajan’s experience was not a fluke. We have many other case studies also. In a few days from now the latest Asia Online newsletter will be sent out. Inside is another example of “near-human quality”, with the LSP (Hunnect) going on the record about their experience with Asia Online. Their language pair is English into Hungarian, which is known to be particularly difficult for MT. On Hunnect’s very first project they were able to save 46% on time and increase their profit margins from 25% to 45%. Hunnect also put the effort in and worked with Asia Online to understand how to build high quality MT systems. They even created an online training course for their translators. <br /><br />I think the analogy that you provided does not fit at all. The automobile hides much of the complexity of its inner workings. Even the engine, you buy as a complete unit. You don’t buy it in small bits. DIY Moses is the equivalent of packaging the engine as a complete unit. It still requires the user to learn how to drive once the car is assembled. Kirti is pointing out the challenges that must be met in order to deliver a system. “Learning how to drive” your MT system is not as easy as learning how to drive your car. It may be in time, but the reality is it is not simple today. Any driver can put gasoline in the tank. But what if the driver is new and puts kerosene, diesel or perhaps even water in the tank instead of gasoline? This is a more pertinent analogy. In Kirti’s messaging, he is not prohibiting people from trying to build their own MT systems. 
Rather he is ensuring that they are aware of the challenges. If they wish to attempt the challenge with this knowledge in hand, then they will have a better chance of success. <br /><br />Continued in next post...Dion Wigginshttp://www.asiaonline.netnoreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-30605243948960212872011-12-04T19:14:01.001-08:002011-12-04T19:14:01.001-08:004. SCAREMONGERING
You state that “This article ad...4. SCAREMONGERING<br /><br />You state that “This article adds to that scaremongering; the way forward is not to keep the doors closed and say that the only way in which people can access state-of-the-art MT solutions is by leaving it all to the experts who know better: that's real arrogance, IMHO ...”<br /><br />Kirti does not state that it should be left to the experts such as those with 25 years’ experience like you. What he points out is the many things you will need to know, the skills you will need to acquire and the tasks you will need to perform. The DIY promoters, such as yourself, pay little attention to these issues and instead keep sending messages that are misleading and that overpromise, “overgilding the lily” by not informing the potential users of the technology of the full picture of what is involved. <br /><br />Educating potential users about the issues, and about the things that will help make them more successful in their DIY efforts, increases the chances of success for projects using DIY approaches. This is the opposite of scaremongering. Brushing these issues under the rug or ignoring them will only come back to bite in the future. <br /><br />With a clear understanding of all that is involved and what systems can and cannot do, the prospective user of MT (in any form) can make better informed decisions.Dion Wigginshttps://www.blogger.com/profile/06923497470039599497noreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-742511942320892772011-12-04T19:13:15.905-08:002011-12-04T19:13:15.905-08:00Directly addressing the point of translator empowerment (perhaps the term “enablement” is clearer), this should be in the form of tools that make their job easier. One of the criticisms of MT is that it makes the same mistake over and over. 
<br /><br />This is where we have focused on a rapid improvement process that takes direct translator feedback. As Kirti points out, the DIY solutions require huge volumes of data, as do solutions from SDL and others that take the dirty data SMT approach. We recently replaced an SMT system provided by SDL that had very little improvement over a 2 year cycle. This was despite several hundred thousand dollars of expense in post editing feedback being provided to improve the engine. In the end, the client was told to provide even more data to get any real improvements. Naturally the client was disillusioned with MT and the promises made. <br /><br />We knew going in that we had a challenge with this customer’s perspective of MT. We did a pilot with our unique approach to clean data and the quality was immediately better from the very first system. We extended the pilot and took in editing feedback and got significant improvement within weeks. This is the area where translators are empowered, getting rapid improvement from their editing efforts and providing the means to control the quality of future translations. This case study will be presented in one of our webinars shortly. <br /><br />Our approach focuses on enabling greater productivity of the translator, not empowering them to have ownership of the entire translation process. In our approach, every edit counts and improves future quality. If the data that the engine is trained on is not right from the outset, this approach will fail. It is all too easy to mistake the ability to train with the ability to improve quickly. As Kirti correctly states, many of our competitors tell their clients they will need to add the equivalent of 20-25% of the initial training data volume to see meaningful improvements. Given this reality, being able to train often is no substitute for being able to improve quality often and at meaningful levels. 
<br /><br />Continued in next post...Dion Wigginshttps://www.blogger.com/profile/06923497470039599497noreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-43850554200045854452011-12-04T19:10:46.871-08:002011-12-04T19:10:46.871-08:00So why did these people take this path? Was it sim...So why did these people take this path? Was it simply because it was there? Perhaps they were excited by the promise of “custom” MT and the promise that a DIY solution would magically make up for 25 years of experience. This is the false promise that is being portrayed by many of the DIY or one click solutions. Kirti’s blog post provides a simple explanation to those considering going the DIY path to understand what they are going to need to do and understand in order to be successful. By not providing this information, the promoters of DIY are leaving a gap in the full story of what is required to customize MT. The fact is that software alone is not enough. Throwing random collections of data at this software is also not enough. Any MT vendor that claims it is as simple as uploading your data and having an instant MT system that is customized to a customer’s needs and high quality is misleading their customers. I am surprised that the DIY tools promoters are not more up front about this as it only helps the LSPs and others who “customize” MT with these solutions have a better chance of success. <br /><br />If you follow Kirti’s blog back through the various articles he has published, you will find that he often raises key issues that are pertinent for the industry to understand that few others have addressed. This is one of them. Having a trail of MT failures helps no one and follows in the path of MT history – 50 years of empty promises. Discussing issues with and informing those who are considering the use of MT for their business is the right thing to do. 
By not making clear the challenges and issues, it only increases the chance of failure and frustration, while increasing the already existing perception of many that MT cannot deliver. In a LinkedIn post you stated “I'm an MT 'veteran', but new to the industry. To me, we don't do ourselves any favors by overselling our capabilities, or by failing to acknowledge, what is after all, fairly recent MT history” and this leads into an article about “Overgilding the lily”. By not fully informing potential users of MT of any form, but especially users of DIY MT of what they are engaging in, this is the exact kind of hype that you spoke out against.<br /><br />In terms of delivering better performance or quality translations, the reality is that even some of the biggest LSPs do not have the data volume to do so. We work with many LSPs from large to small and deliver quality solutions. Many of the top 50 LSPs globally simply do not have enough data in a single language pair and a single domain to achieve their quality goals and as such frequently fall short on quality. Many try to pool data from multiple customers and that often makes things worse, not better. They then download data from TAUS and sometimes this helps, sometimes it makes things worse. We did an in-depth study with TAUS some time back and showed clearly how lower quality data from one source can have a significant negative impact. This report is titled “Study on the Impact of Data Consolidation and Sharing for Statistical Machine Translation” and can be downloaded from http://www.asiaonline.net/resources/reportID4523.aspx. This is a comprehensive report with 29 different SMT engines created with different combinations of the same data. The results speak very clearly for themselves that clean data is essential and that outside data can do more harm than good if it is not managed properly.<br /><br />The reality is that many LSPs would like to do SMT, but most lack the experience in how to work with the data. 
Providing easy install software does not change the experience level or increase knowledge. Most LSPs claim their TMs are very high quality. Yet, they forget that each project they do has different quality goals and budget. Often LSPs do not manage their TMs very well. This results in mixed domain data and even mixed languages. Asia Online’s data cleaning process typically rejects 20-40% of TMs that an LSP sends us. This is fact, not an opinion. <br /><br />Continued in next post...Dion Wigginshttps://www.blogger.com/profile/06923497470039599497noreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-89098350202132453662011-12-04T19:09:36.923-08:002011-12-04T19:09:36.923-08:00Surely you are not promoting that your 25 years of...Surely you are not promoting that your 25 years of experience is encapsulated behind a single button that makes instant high quality MT systems. As a person with 25 years’ experience, you would know the example above from research published by others and probably even tried it yourself. Yet, this is one of the simplest examples where linguistic knowledge, an understanding of SMT concepts and technical programming knowledge are required. This level of knowledge cannot be expected from the average LSP. In fact it cannot be expected from almost any LSP. At Asia Online we work with some of the largest LSPs in the world and I can tell you as a matter of fact that they do not have these skills. Some LSPs are certainly capable and could be trained or in some cases we are training them directly. However the vast majority of LSPs do not and will not ever have these skills. <br /><br />The reality is that most do not want to have these skills. The metaphor of a car comes back to mind. In this instance the driver does not need to assemble the car from parts and does not need to gather the parts from places around the globe or scavenge in the trash can and local tips (equivalent of crawling the web for dirty data SMT) for parts. 
Instead, they buy a mature technology designed to give them a driving experience. They may customize with accessories to make it their own, but they are not starting from the requirement that they must understand how the car works in order to drive it. They are starting from the perspective that trusted professionals have built the car as part of a production line. <br /><br />2. ACCESS TO CUSTOMIZED SMT SOLUTIONS<br /><br />You make the point that for most people access to customized SMT solutions is pretty much nil. This is not accurate. Moses has been around for many years. Putting a tool over Moses that makes it easier to install makes it more accessible for less technical people. But all this has really achieved is lowering a technical barrier at the time of installation. The issues faced with data quality and quantities have not changed just because you have a smoother install process for the software. There have been thousands of downloads of Moses by companies and academics alike. But making the install process easier is a tiny part of a huge challenge.<br /><br />The real question should be “do you need to do this yourself or should you engage a professional?” There is a DIY home improvement model also. Many may have tried this. Some succeed, but many are not satisfied with the results. Due to mistakes, learning curve and lack of experience, most projects are less than satisfactory and can even cost more than if a professional with experience had been engaged in the first place.<br /><br />The analogy that Kirti provides of the suit and the sewing machine makes a point very clearly. If you have the tools (sewing machine / Moses), it does not make you an instant expert in use of the tools. You need training, experience, skill and knowledge. You can take an off the shelf suit (Google or Bing) and have some level of control. <br /><br />We have talked with many who don’t have the experience and go ahead and “customize” their MT with Moses and their data. 
They don’t know how to prepare the data properly. They don’t have all the knowledge listed above. They don’t even know if they have enough data (usually they do not) and they don’t know which data should be used for best results. Naturally they do not get the full potential of their system, even with their “custom” data. Often they buy expensive hardware and spend months trying to make things work. This is despite using a DIY software installer for Moses or other translation tools. They find out the hard way that the data is just as important, if not even more important than the software. So they are no better off with their “custom” SMT system than when they started. <br /><br />Continued in next post...Dion Wigginshttps://www.blogger.com/profile/06923497470039599497noreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-52478493781517854272011-12-04T19:08:00.000-08:002011-12-04T19:08:00.000-08:00@Andy, I think you are taking Kirti’s words and us...@Andy, I think you are taking Kirti’s words and using them out of context. My read of the above is that you are actually both in agreement with each other.<br /><br />1. SKILLS AND KNOWLEDGE<br /><br />Kirti says “The people who build SMT engines at Microsoft, Google, Asia Online and other MT research teams have built thousands of MT engines during their MT careers. The skills developed and lessons learned during this experience are not easily replicated and embedded into open source code.” <br /><br />You talk about your 25 years’ experience and the experience of your colleagues. Surely that would classify you in the “other research teams” group mentioned by Kirti.<br />Kirti’s point is that “To expect that any “instant” Moses solution is going to capture and encapsulate all of this is naïve and somewhat arrogant.” In other words, it is naïve to think that it is possible to capture and encapsulate the equivalent of your 25 years of experience simply by clicking a button. 
<br /><br />So in this instance, you are both saying that you need experience to do MT properly. <br /><br />I agree with both you and Kirti on this point. You certainly have the experience to understand and build MT systems, as do people like Asia Online’s Philipp Koehn and Hieu Hoang, 2 of the main developers of the Moses decoder. With resources such as Philipp and Hieu on our team, we have a deeper understanding of what Moses can and cannot do than most. Without these skills and deep understanding of SMT, we could not deliver the quality systems that we have for many of our customers. <br /><br />The reality is that using Moses alone or DIY Moses solutions is a small part of the overall SMT challenge. Your 25 years of study, research and knowledge is not encapsulated in an instant click of a button. Building a high quality MT solution requires software such as Moses, but that software will not deliver without experience and knowledge, especially in the context of data. <br /><br />The naivety that Kirti is referring to would be the equivalent of installing Microsoft Word and expecting that you are instantly a good writer and can author a best-selling novel simply because you have installed word processing software. To be a good writer, you need training and experience. Similarly, building a high quality SMT system also requires training and experience. Making the installation easier does not make the “art” any easier. 
<br /><br />An SMT system will not deliver to most of its potential unless the people deploying it have:<br /><br />• Linguistic knowledge<br />• Natural language programming knowledge<br />• Knowledge of how SMT works internally (at least at a basic level)<br />• Knowledge of what data will work well with SMT<br />• Knowledge of what data will not work well or could even negatively impact SMT<br />• Knowledge of data cleaning <br />• Knowledge of the client’s requirements<br /><br />Yes, DIY Moses tools encapsulate some of the complexities of the software, but none of the complexities of data management for building and refining SMT systems. As Alon states, this solves engineering complexity, but leaves much more that is unsolved.<br /><br />In addition, knowledge is required of what SMT cannot do or does not do well. With this knowledge, technical and linguistic skills need to be combined to overcome those limitations. As a simple example, performing linguistic analysis on German when translating into English and reordering the main verb to be more German like increases the quality of the translations. This is relatively simple, but in order to do this, you would need programming skills, linguistic skills and an understanding of how SMT reordering works. As such, it would be naïve to think that this could be done out of the box. The preparation of the training data is specific to the language pair as is the runtime processing of the data. <br /><br />Continued in next post...Dion Wigginshttps://www.blogger.com/profile/06923497470039599497noreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-3860328647829195202011-12-04T18:50:10.296-08:002011-12-04T18:50:10.296-08:00As Kirti is part of the Asia Online team, we talk ...As Kirti is part of the Asia Online team, we talk often about many issues facing the translation industry. Kirti does not write his posts or debate points on a whim. 
He gives his perspective from firsthand experience, much of which is dealing with Asia Online customers and prospective customers. <br /><br />I commend Kirti for taking his personal time to help the market as a whole to better understand the issues in MT. As this is his personal blog, he does not always discuss with me what he is writing about. When I saw this article, I immediately thought this was going to be one that was provocative as it addresses the issue of skill and knowledge that the DIY MT community seems to brush under the rug a little too easily. I waited to see the response, fully expecting Kirti’s opinions to be challenged, especially by those in the DIY MT community, in particular those that are marketing DIY MT solutions. And I can see from the responses already that I was correct. Already 2 DIY MT providers have come out on the attack. <br /><br />I would like to state for the record that I do agree with nearly all that Kirti has posted in this blog post. As this is Kirti’s personal blog and his personal opinion, I seldom comment on his posts. However in this case, @Andy has commented with information that is taken out of context and states a position of Asia Online that is factually incorrect and misleading. @Tom has also taken my words from another context and used them for a different purpose. As such, I am compelled to respond to set the record straight and clear up any misleading information from these 2 DIY MT promoters.Dion Wigginshttps://www.blogger.com/profile/06923497470039599497noreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-63764417065647150342011-12-04T09:09:39.599-08:002011-12-04T09:09:39.599-08:00@Tom
I think that your analogy is inaccurate and ...@Tom<br /><br />I think that your analogy is inaccurate and irrelevant in this case.<br /><br />I feel that a better way to describe this is: Moses is a sewing machine that can empower people who are interested in sewing or empower professional tailors to produce custom clothes faster and more efficiently. My point is that owning or having access to a sewing machine does not make you a tailor. Even when somebody gives you a pattern. You still need interest and some knowledge.<br /><br />While some amateurs can definitely learn to sew and produce professional looking clothes after they get some experience, professional tailors are the ones who are most likely to make the best custom fit clothes, and benefit the most from new sewing machine technology. At least for the initial period until the knowledge of tailoring becomes more widespread.<br /><br />Anybody interested in learning to be a tailor is likely to benefit, just as anybody who is willing to make the investment to learn about SMT/NLP could benefit from Moses. However, simply being able to push the button to start Moses (without understanding anything else) is far from this, and in most cases is unlikely to produce anything of real business value as I stated in the post.<br /><br />Knowledge matters. Experience matters and learning takes time.Kirti Vasheehttps://www.blogger.com/profile/16795076802721564830noreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-438973696031948222011-12-03T07:37:42.782-08:002011-12-03T07:37:42.782-08:00I couldn't agree more. SMT is a complex domain...I couldn't agree more. SMT is a complex domain. However, I think I missed the "hype" in the DIY crowd claiming "near human quality." Oh! For fair and full disclosure, my company owns and distributes DoMY, the first package distribution of all Moses components.<br /><br />The Moses Toolkit is comprehensive in its abilities. 
However, achieving excellent results across all possible language combinations requires expertise that exceeds that of the localization engineers, system administrators and project managers at most of today's LSPs. Companies considering its benefits should tread carefully, work with one language pair and learn what's involved, including using the technology themselves at their own pace and in their own budgets.<br /><br />However, I believe a 2-5 year outlook is over-simplistic and shortsighted. To say that only academics and NLP/computational linguists can unlock the secrets of SMT or should experience the technology in that period smacks of the stereotyping at the turn of the 20th century regarding women drivers. Are there similarities to a 1998 paper titled, The Automobile and Gender: An Historical Perspective, by Martin Wachs of the University of California, Berkeley?<br /><br />"While women who drove in the first decades of the century were assumed to have at least some interest in the mechanical properties of automobiles, during the twenties in order to preserve the boundaries between mens’ and women’s spheres it was increasingly asserted that women lacked interest in or aptitude for mechanical devices."<br /><br />Here we are 110 years later. Women driver jokes abound in some circles but, like the stereotyping, many consider them to be in poor taste.<br /><br />In a recent linkedin.com discussion about "licence-based model for MT procurement" by Andy Way, Mr Wiggins chimed in "I do not think that the market is mature enough to say that one model is out and the other is in... This position presented in this topic (discussion) is a marketing and sales line, not reality in the market." -- dittotahoarhttps://www.blogger.com/profile/06893656133001786619noreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-65663328068894301642011-12-02T12:43:49.697-08:002011-12-02T12:43:49.697-08:00@Andy
Thanks for your comments. You are right, i...@Andy<br /><br />Thanks for your comments. You are right, it’s only a blog and only my opinion and I guess we have different views on the value of DIY Moses.<br /><br />If you look at the post more closely, you may see that my comments are directed at casual users of instant/DIY Moses solutions, not the developers of the tools. I do not question your (team) competence and credibility in any way, and clearly you have much more experience with MT than I do for certain. I am sure all the other developers are also competent to use the tools themselves.<br /><br />However, I remain skeptical that a casual user (an average LSP or a power translator) who has little or no understanding of SMT/NLP is likely to benefit from throwing in his bucket of data and producing an instant engine that will beat what he could get for free. I am even more skeptical that they will know what to do next. I am saying that it does matter that they understand the technology to some extent, and that this understanding and informed actions thus will produce better systems. I am also saying that working with experts will produce the “best” systems in terms of long-term business value which you can of course disagree with.<br /><br />To the extent that your tools help create superior systems, I expect we will see long-term use, as long as these MT systems do in fact improve the throughput of translators/post-editors using them. It is possible I might be wrong in my assumption, but at this point in time I would bet I am right.<br /><br />As far as empowering translators, I think that train has left the station. GTT allows individual translators to customize already usable MT engines instantly, for free, and has been allowing this for years now. 
The fact that they might keep and use your TM data for years after, does not seem to stop translators, and I would not be surprised if tens of thousands of translators use the public engines regularly as part of their standard work process.<br /><br />DIY Moses is not equivalent to empowering translators in my opinion. Of course any and all opinions can be wrong, and if I see evidence to the contrary I will admit I was wrong.<br /><br />We live in a world where complexity is everywhere. In medicine we have reached a point where no individual doctor can handle or even know all the diagnosis codes (12,000+ by the way). Specialization is necessary and specialists have to work together with general family doctors to solve patient problems. <br /><br />In the same way, it is my opinion that specialists (MT experts, LSPs, Translators and linguists) working together are likely to produce much better results, especially in the long-term. And it is my opinion that people who go down this path are likely to get the maximum benefit from MT and produce the “best” and “most effective” systems in terms of business value and ROI.<br /><br />Honestly, I am not even sure that everybody at Asia Online agrees with me, these are really only personal opinions. I speak as an individual observer and I do understand that my views are not completely impartial.Kirti Vasheehttps://www.blogger.com/profile/16795076802721564830noreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-15123900639763865042011-12-02T05:14:56.329-08:002011-12-02T05:14:56.329-08:00Hey, what a lovely plea for quality in the impleme...Hey, what a lovely plea for quality in the implementation of MT! I'm sure that none of us translator "quality wimps" could have phrased it more eloquently! <br />Sorry about the sarcasm spawned by my rather strange British sense of humour. If you come looking for me you'll find me hiding under the table! 
;-) ;-) ;-)Victor Dewsberyhttps://www.blogger.com/profile/18342577630994069368noreply@blogger.comtag:blogger.com,1999:blog-6748877443699290050.post-5278549562368561402011-12-02T03:50:36.339-08:002011-12-02T03:50:36.339-08:00Kirtee,
I take exception to pretty much all that y...Kirtee,<br />I take exception to pretty much all that you say here. Far from being "naive" or "arrogant", ALS' self-serve MT platform SmartMATE is founded on a group of MT expertise which you would be hard-pressed to beat anywhere, even in Asia Online. I've been working on MT for nearly 25 years now, and have over 200 peer-reviewed published papers. My colleagues, including Jie Jiang, Sergio Penkale and Rejwanul Haque, and I have a massive amount of MT experience with Moses and many other systems, and I assure you we know exactly what we're doing. <br /><br />Your article ignores completely the fact that for most people, access to _customised_ SMT solutions is pretty much nil. This is where your analogy with Google & Microsoft fails; with self-serve MT, clients without the necessary MT and computing expertise to install Moses themselves, have for the first time the ability to build an MT system based on their own user requirements pretty much instantly. My bet would be that the vast majority of those engines would deliver better performance than Google or Bing.<br /><br />Asia Online seems to be the only company on the Language Technology LinkedIn discussion that thinks that empowering translators in this way is a bad thing. This article adds to that scaremongering; the way forward is not to keep the doors closed and say that the only way in which people can access state-of-the-art MT solutions is by leaving it all to the experts who know better: that's real arrogance, IMHO ...<br /><br />If you really want me to, I could go through your article and disassemble your arguments one by one. I know it's only a blog, and your opinion, but I think most people will see through it for what it is. You can expect many more flames on this, for sure ...<br />Andy.Andy Wayhttp://www.appliedlanguage.comnoreply@blogger.com