TAUS Post-editing Webinar for Spanish language module

Just quickly, a couple of words about the webinar series we have started at TAUS. Well, first of all, the outline of this webinar. As I mentioned, I will give you an introduction to the TAUS post-editing course and the certification around it, and then we will have a Q&A section, an informal section, with Lucia and Marta from CPSL. They will share their experience of post-editing and also tell you a little bit more about how they work with machine translation at CPSL, and at the end you will have a chance to ask questions.

So why a webinar? Well, this webinar will introduce post-editing. As I mentioned, we would like the series to be a kind of discussion platform about best practices and to offer an overview of available tools and methods for post-editing. Each webinar will have a language focus, so today we have Spanish and we invited speakers from a Spanish LSP. Each time, a speaker or a number of speakers will elaborate on post-editing problems for that given language, and participants can ask questions or bring up different topics related to post-editing to be discussed in further sessions.

Okay, so the objective of the webinar is to address the challenges for linguists who are deciding to work on post-editing. I would also like to say a couple of words about the post-editing course, which was launched around Easter time, so in April. It's basically a common reference for developing skills and best practices for successful post-editing. We already have a couple of positive testimonials and a good amount of feedback on the course, so a number of people have already done the course and we are happy that we have received so much positive feedback.

The TAUS post-editing course outline: a couple of words about the making of the course, how it was created. It has been a collaborative industry initiative, so it wasn't only TAUS involved in creating the course; we did it with a couple of member and also non-member
companies. It was coordinated by TAUS in close cooperation with Welocalize and CNGL, and we received contributions from different LSPs. Here are the LSPs that were involved in the first phase, creating the language assignments: Arabize for Arabic, CPSL contributed the Spanish language assignment, and we had Global Textware for Dutch, Hunnect for Hungarian and Version internationale for French.

So a little bit more about the course. The course is in English, so why do I talk about language assignments? Because there are language-specific assignments implemented in the course. There are two big assignments. In the first one you have to evaluate MT output in different languages: Arabic, Dutch, French, Hungarian and Spanish, and we are busy adding new languages. We think that theory is not enough: you need to do post-editing, real-life post-editing, to be able to gain knowledge and to become an experienced post-editor, so we thought 10,000 words would be a good start to train post-editors. We are also adding Japanese, Korean, Italian, Turkish, German and Polish this summer, and we are hoping to add even more languages in the future.

So the course is made up of six modules, including quizzes and two longer assignments, as I mentioned, and at the end of each module there are also a couple of pages with relevant literature, so if you want to read on you can consult the references section. It takes around six hours to complete the theoretical part; the practical assignments can take longer, depending on the level of experience of the participant. It's of course a post-editing course, but we wanted to put post-editing in a broader framework, so we are also giving lots of information on machine translation: the different types, the history of machine translation, how to set up a machine translation project, so as to introduce the subject gently to the participants. Some facts and trends on post-editing are also included in the course. And just some more points: there is a glossary with post-editing-related terms and definitions; as I mentioned, we have the language assignment, so you have the chance to do real-life post-editing of MT output; and we have a language-specific module where you will be able to address the particular errors of a given language.

So here is the structure of the course. First an introduction, then the different types of MT systems, then how to evaluate MT systems. TAUS has also developed a quality evaluation framework, DQF, the Dynamic Quality Framework, which also features in the post-editing course in this third module, so you also get an introduction to this state-of-the-art evaluation platform. Module 4 is about controlled language and pre-editing, which is of course very much related to the quality of the output you can expect from an MT engine; based on this knowledge you will be able to predict, or assess, the quality of the MT output. Post-editing is the main subject of Module 5; here you will get all the tips and tricks on post
editing, and finally Module 6 will give you some background information on how to set up an MT project and how to organize a post-editing project at your company. Just a quick example of what, for example, Module 5 looks like: there is an overview, there are definitions, the different skills and competencies are explained, as well as how to provide feedback to, for instance, the MT engineer at your company, and there are some instructions on how to complete the post-editing exercise. Every module contains a summary and references.

Who is the post-editing course aimed at? I could say everyone, but especially people interested in translation automation, mainly linguists, translators, students and researchers, and also employees of LSPs: project managers, terminologists, language technologists, engineers. So even people who are not directly involved in post-editing in practice but who would like to learn about this topic for their job.

So, the benefits for certified post-editors: they will receive the skills necessary to provide high-quality content, they will gain thorough background knowledge on machine translation and on post-editing, and they will receive a post-editing certificate upon completion. There are also benefits for companies. We are also offering the course as a group subscription, so a company would be able to take a group subscription for 10 in-house translators, for instance, and let them do the post-editing course. This means your translators will get certified for the market and they will learn to use the right tools to gain higher productivity. It will also give translators a greater understanding of the benefits of MT, but also of the risks of using an MT engine. And the course and the certification will give a common reference for the pricing and duration of post-editing projects and the quality of these projects.

So here's the slide with a picture of the certificate we are offering. And let me offer all participants a twenty-five percent discount on this course. If you would like to do the course and you are on this webinar, please take this code, and if you fill it in while registering for the course you will get a twenty-five percent discount. So I would like to offer you this opportunity to get your certificate for post-editing.

This is the moment when I will have a look to see if there are any questions. I don't see any questions right now, so I'm now going to unmute Marta, let's see if it goes well, and unmute Lucia, and I will also share Lucia's screen. Lucia, are you ready for that? Yeah? Okay. I will make you a presenter and you will talk about your experience at CPSL with machine translation post-editing, and then after your presentation there will be a chance to ask Lucia and Marta questions. I also have some questions, so I will have a chance to ask them too. So let me see, I will make you a presenter, and if you want to share your screen...

Yes.

Yes, something is coming up.

Do you see it
now, Attila?

Yes, yeah.

Okay, fine. Well, hello everyone again. I would like to start by talking a bit about CPSL's approach to machine translation and post-editing. As you can imagine, at CPSL, even though we have offices in Barcelona, Madrid and Germany, we not only translate from English into Spanish but into many other language combinations. Our main industries are medical and pharmaceutical, and also the public sector; as I mentioned before, among our biggest clients are the European Commission and the European Parliament. We also deal with texts from the transport and engineering domains, and we've been here for more than 20 years, so this is quite a long time.

Well, it all started four or five years ago. We had been working with CAT tools for many, many years, different types of CAT tools depending on the customer, the language combination, expectations, etc., and we had the feeling that we were very limited by this type of tool, because even if you use, you know, remote translation memories, you are still limited by the number of words that a human translator can translate or proofread per day. So there's a very specific limitation there. I think there's another limitation with CAT tools as well, which is the fact that they are very useful when there are repetitions at the segment level (you know that computer-aided translation tools usually segment by sentence), but when there are other kinds of repetition, such as at the word or collocation level, CAT tools are not really useful. We also had the feeling, and you know this as well as I do, that in the translation sector we are being required to cut costs and also shorten deadlines as much as possible. So we thought, why don't we jump into machine translation, why don't we give it a try?

We contacted a company called tauyou, I think that was in 2009 or 2010. This is a machine translation provider, and together we developed several machine translation engines. But as you can see in the slide, we didn't do it all at once. We started with a first phase, and in this first phase we chose very specific language combinations: from Spanish into the co-official languages in Spain, which are Catalan, Galician and, well, Valencian. Many people say that Catalan and Valencian are exactly the same, but you know that for Spanish public institutions they are different, so we decided it was better to set up a separate engine. These engines were hybrid, which means that they combined rule-based machine translation and statistical machine translation. You will learn everything about that if you do the course; this is just to show you what we do. They covered the generic domain, so they were trained with any kind of material, and the results were impressive because they were very, very good. And these language
combinations have an advantage, which is that these languages are very, very similar, both at the morphological and at the lexical level, and this really helps in obtaining good results. So in this first phase we created those engines, and of course we keep on using them and also fine-tuning them, but they really require very little fine-tuning, for the reason I mentioned before: those languages are just so similar. We also tried a different language combination at that time, which was English into Spanish, but unfortunately the results were not so good. In fact, I would say that the raw MT output was almost useless, and I think that Marta can elaborate a little bit more on that because she was involved in that project. Marta?

Yes, hello. I can say that the first project I was involved in that was related to post-editing was actually a software project, English into Spanish translation, and it was quite frustrating, because when we saw the quality of the output, it was horrible. So there wasn't any post-editing at all; it was a real translation, almost from scratch. It was, I don't know, like when you put 50,000 words into Google Translate and that's what you get; it was very similar to that. So we've come a long way from that project. It was approximately four years ago, and later on we'll see a few examples from another post-editing project and we'll see the difference, I mean the difference in quality.

Yeah, I think the good thing about machine translation is that there's always trial and error, I mean, you can learn from the past. I think that what we did wrong was to train the engine with any kind of content, and based on our experience, the more domain-related those contents, those corpora, are, the better the results you get. So, as Marta said, we've learned quite a lot since then, and fortunately the English-to-Spanish IT engine now gives much better results; you will be able to see them at the end of this webinar.

In the second phase we tried Spanish to Portuguese and Spanish to French, and I think the first one works better than the second one, again because of the similarities between the languages. But we are trying to improve the results of the Spanish-into-French machine translation engine with more specific content. I think that's one of the key strategies, along with trying to involve all the post-editors in giving as much feedback as possible, because that allows us to introduce new rules regarding grammar, spelling, punctuation and anything else that can help us obtain better results next time.

And then there was a big surprise for us in the third phase. We have been working for quite a long time for several companies translating technical patents, and we decided to give it a try with these two language combinations, English into German and English into French. It was a surprise for us because we thought it was not going to work, because those languages are very different, and you know that German has a morphology that I think is very different from the other languages. But the results were quite good. We are currently working on a regular basis with machine translation in these two language combinations in this particular field, technical patents, which is very, very specific. I think the key to success was again being able to train the engine with very specific content, and also to
involve the whole team and create feedback templates, which they have to send me every time they complete a job, with their suggestions. So you can take this as a suggestion if you are planning to build a new machine translation engine: have everyone involved in giving feedback, and try to be as specific as possible with the content.

As to the domains we are currently using in machine translation: as I said before, it is the generic domain for the first-phase language combinations, and for the others we are working with pharmaceutical texts and also automotive and technical patents. It's really very specific, but we don't take this as the end of the road, of course not. We would like to keep on adding more language combinations and more domains because, as I said before, we work with many different types of customers and many language combinations, and based on our experience I see that anything is possible. So hopefully in the next couple of years we will add more languages and domains. We will keep on working.

I would also like to mention a very interesting project we did for an international organization, which consisted of evaluating a machine translation system developed by them; we were assessing the final quality. It was meant to translate technical patents, so it's a field we know quite well, and the results were more or less the same as we had with our English-to-French technical patent engine. The results were quite good. It was also a statistical engine, and our client was very happy with our feedback because they were able to improve their system. I am sorry I cannot mention the name, because it's rather confidential at this time, but I'm sure that we will all know very soon.

So, well, before we go into the... yes, we can go on with the statistics. In this graph you can see the evolution of the percentage of words translated with machine translation, which is the lower one, compared to the total number of words translated per year here at CPSL. It might not seem like much, but I think it's very interesting to see the evolution, because it almost doubled from one year to the next, and hopefully next year we'll get to 10 percent. Let's see how it goes.

And before Marta comments on the examples that we have shared with Attila and that are part of the post-editing course, I would also like to talk a little bit about how we work with the post-editing team. I'm trying to involve them as much as possible, as I mentioned before, and we always try to create some guidelines, but this is something that you will also learn if you take the course. I think it's not possible to create general post-editing guidelines, because they depend heavily on the language combination, on the type of machine translation engine and of course on the domain. So we try to create very specific guidelines, and these are very helpful, especially when we have regular projects or long translation projects, because they can be shared among all the translators and providers involved.

So that was the guidelines. As to the price, which is also a tricky question and a very hard one to determine: I've heard that some companies tend to pay per hour, and at the beginning we were quite puzzled, I must say, because we didn't know what to do. We are more used to paying per word, and we thought that it was easier to keep on paying for post-editing like that, using a per-word rate. But we think it's impossible, again, to set a single discount for all kinds of post-editing; that's absolutely impossible. I think it's quite the same as trying
to create generic guidelines. We work on a per-project basis, and we have our own metric system, which allows us to know exactly how many changes were made in the post-editing, so we know exactly how many words or how many characters were changed. With this information, plus more subjective information and the feedback templates that we get from the post-editors, we try to offer a fair word rate, and this is again done on a project-by-project basis. We think this is currently the fairest way to set a price: it is fair for the post-editor, it's also fair for us and it's also fair for the customer.

And now that we are talking about customers: our experience is that for some customers it's not easy to understand what machine translation is. They would run scared from us if we told them that we are machine translating their content, because they would think that we are not post-editing it at all. So I think we still have to explain a little bit more to our customers about machine translation, and I think that, Attila, the organization is doing great work in this, trying to involve the translation buyers as well. I think we haven't reached the point where machine translation is, you know, very welcome everywhere. But I was very surprised when I first heard about the European Commission. You know that we translate a lot of documents for them; they are one of our biggest customers, and they set up annual meetings with the Spanish translation unit. I was very surprised to hear a couple of years ago that they use machine translation for everything and the results are very, very good. And really that's no surprise, because the quality of their content is very good: they have an excellent team of translators in Brussels and Luxembourg and they proofread everything. So if the quality of the corpora you use for your engine is good, the results will be good.

So our approach with customers really depends, again, on the project and on the customer. We have had a few of them asking specifically for machine translation without post-editing, because they had, you know, thousands and thousands of words to be translated by an unrealistic date, and we offered machine translation without post-editing as a solution. There was no time for anything other than putting all the content into the machine, and when they saw the result they didn't like it at all, even though we had explained at the beginning what it was going to be like and what the risks and the advantages were. But again, that's not the end of the road. There's always the opportunity to tell our customers about the advantages, not only in cost but also in improving deadlines, and we will have more chances to explain that they might obtain exactly the same result as with human translators by using machine translation and full post-editing. But this really is our work.

And now let's talk a little bit about what our contribution to the post-editing course was. We were required to machine translate a document of about 10,000 words from English into Spanish from the IT domain, and we chose a software user's guide; I think it was for a modem. With the raw MT output we created a list of errors and examples, and based on this information we classified all these errors by linguistic categories such as grammar,
spelling, mistranslations, etc. I would like to comment on them later on. Then we chose a couple of examples for each category, and we were also required, apart from showing the raw machine translation output, to propose a post-edited translation in two options, as light post-editing and as full post-editing. I'm not sure if everyone knows, but just in case: light post-editing consists of making a few modifications, the minimum required so that the text can be understood, whereas full post-editing is closer to human translation, and style modifications are also involved. You'll find these examples in the post-editing course, but we've also brought three of them here to this webinar to show you what you will find in the course. And now I think it's Marta's turn to comment on them.

Yes, thank you, Lucia. So we see here grammar, wrong tense.

Uh-huh, it's Marta's turn now.

Yes, Marta's turn to share her screen. Okay, so I will make Marta presenter, and then let me see. So, Marta... yes, I see your screen.

Hello?

Yes, hi Marta. Let me see, we just saw your screen.

Okay, good. Shall I carry on then?

Yes, please.

Okay, so, just very quickly, so that there is time for you all to ask any questions or raise any concerns you may have. We thought we would show you three examples from the project which, as Lucia mentioned, was part of a software project and was English into Spanish. So we chose three examples to show you, so you can all understand what we talk about when we talk about post-editing.

The first example has a grammar error, wrong tense use, and the original string in English was this: "View and control security cameras connected to your [unclear]". Let's see, this is the raw translation that the engine gave: [unclear]. As you can see, there's a problem there with the use of verbs. So these are the post-edited versions: "Ver y controlar las cámaras de seguridad" and so on, or "Vea y controle". And you are probably asking yourself right now why there are two different possibilities. Is it because you need to give two possibilities for every string? No, it's because depending on the context you could use one or the other, so both of them are correct. For example, the second one, "Vea y controle las cámaras de seguridad", etc., you would use if, for example, you have a list of instructions for the end user; in Spanish you usually use an imperative, like "vea y controle", right? Whereas the first one, "ver y controlar", could be, for example, the title of a section. So depending on the context you would use one or the other; it doesn't mean that you would use both of them at the same time.

The second mistake that we chose is a very typical mistake between English and Spanish: the capitalization of words. As you know, the two languages have different systems for capitalizing words, so if you work in post-editing you will most certainly find this mistake quite often. This is the English string: "Appendix A: Contacting Technical Support".
Very typical: you will find this if you ever work with software or support projects; at some point you will have a section or a new chapter with this title or a very similar one. So this is "Appendix A". Let's see the translation that we got from the engine. It was "APÉNDICE A...", and we don't know why, but the engine decided to capitalize the whole first word, and also decided to capitalize the beginning of every single other word. This is probably a mistake that could be corrected with a rule, if we defined a rule for the translation engine, so we can tell the engine not to reproduce the English use of capitalization in the translation. And again we have two possibilities open, although the translation itself is the same. The first one: "Apéndice A", where obviously in the first word you don't need to capitalize every single letter, just the first letter, and then "Ponerse en contacto con el servicio técnico". And why is it that we put the second one? Well, because it depends on the customer. This is real; I've seen it in many projects I collaborated on. Some customers, particularly for titles of user guides, specifically say that we should keep the capitalization of every important word in the titles, and important words are always everything apart from prepositions and articles. So, as you see, the A from "Apéndice", the P from "Ponerse" and so on are capitalized because of that. So depending on the style guide of the customer you would use the first option or the second option.

And finally we've chosen an example of what light versus full post-editing means. Lucia has just explained what these two types of editing are, so I won't explain it again. Let's move on to the example. Here is the original string in English: "If you aren't able to connect to the mydlink registration site, a warning message notifies you of an unsuccessful connection." And this is the original translation that the engine gave. It's quite good, as you can see, but suddenly something goes wrong at the end of the sentence and you have something very strange, something like "mensaje de advertencia [unclear] indicarle que [unclear] una conexión". So the post-editor needs to correct that, needs to intervene there, and choose between light or full post-editing.

For light post-editing, this is the solution we gave, intervening and correcting as few elements as possible in the original translated sentence. We just changed the end of the sentence: "... un mensaje de advertencia para indicarle que no se ha conectado". And for full post-editing, the changes are a bit more profound, and we have: "Si no se ha podido conectar con el sitio de registro de mydlink, aparecerá un mensaje de advertencia para indicárselo". So why do we say that this is full post-editing, as opposed to the other version, the light post-editing? Well, at this point it is very difficult for a machine to turn the original English into that final "para indicárselo". The translation engine will always try to translate exactly what the English says, so if the English sentence is "a warning message notifies you of an unsuccessful connection", the translation engine will give you an exact translation of that, like "indicarle que no se ha conectado". If you compare light versus full post-editing, you see that both of them are understandable and both of them convey the original meaning, so the end user of this particular software will understand the meaning of the sentence. But the one on the right, the full post-editing sentence, is somehow a bit more elegant, if you will. And you may
think that this example is not very relevant, but think about it, because this is just one example taken out of context. If you think about a 50,000-word translation of a user guide...

Marta? I think we lost Marta for a second. Marta, can you hear us? Hello, Marta, can you hear us?

Yes, hi.

Okay, I just lost you. Maybe it was me, but I just lost you for a second. So let's see, Lucia. Hi Lucia, can you hear us?

Yes, I can hear you.

Okay. Did you also lose Marta for a second, or was it me?

No, no, I think it was only you.

It was me, okay, great. So sorry about that. I just missed the last part, so I'm wondering if you were finishing your talk, or did I just interrupt?

Oh no, no, go ahead. I had finished my presentation.

Okay. So I would like to thank you for this presentation, and I was wondering if there are questions right now. There are two options: the audience can ask questions through the chat box, or you can raise your hand, which is also possible in GoToWebinar. So if you have a question, please write it down in the chat box or raise your hand.

In the meantime, I have a question for you, because you mentioned that post-editors regularly give feedback to you. I was wondering how you do it. You said you have feedback templates. What does the template include? Are there error types they have to fill in, or is it just an informal text they have to write about the quality of the text? How does it work?

Well, at the beginning I started receiving too many informal emails, and honestly I preferred to create a feedback template, a specific one for machine translation feedback. The template is just an Excel file, very simple, with several columns, and it contains much the same as the examples shown by Marta: a column for the source text, another one for the raw MT output, another one for the suggested post-editing, if they want to add it, and of course the last one for comments, where they can explain the issue as they want.

Okay. I have here a question from Simo, so I will unmute Simo, and maybe you can ask your question.

Yes, hi, how are you?

All right. Nice to meet you, Simo.

Nice to meet you too. I'd like to ask Marta and Lucia if they could give us maybe a general estimate of performance in post-editing, both light and full post-editing. I understand that there are different levels of difficulty in post-editing, but maybe, going back to the example that Marta gave us, could you give us some estimate of the performance that you've seen in the projects you've worked on?

You mean like a percentage of failure or success?

I mean in time. I mean, how long does it take to post-edit, for example, 10,000 words or 1,000 words?

Oh, that's really very language- and domain-dependent. I don't really have those figures,
but, well, obviously light post-editing will be faster, and full post-editing will take longer because style is involved. I don't really have those figures, but when you have to decide whether you need full or light post-editing, based on our experience it's very important to define both options with your end customer: what are their expectations? Otherwise there might be some misunderstanding; you might be working harder than you need to, you might be fully post-editing a text when the customer doesn't need that level of quality. So first of all you have to be very precise about what the customer needs and what the final result should be, and then the figures you're asking me for will depend on that. But I'm sorry, I can't really tell you how many words per day or per hour someone can post-edit. For instance, I can tell you that post-editing Spanish into Catalan and Galician is very, very quick because you find very few errors, and on the contrary English into German or English into French takes quite a long time, even if the results are acceptable. I'm sorry if that hasn't answered your question.

Well, I was thinking specifically of English into Spanish. I've always worked with, for example, a performance rate of 250 words per hour for a translator and around 1,000 words per hour for a reviewer, and I assume that the post-editing rate will be something in between, but I would like to know if it is closer to the translation rate or closer to the review rate.

Yeah, based on my experience, not only with English into Spanish but with other language combinations, it really is something in between translation and proofreading. Many people think that it should be exactly the same speed as proofreading, but it is not; it's something between translation and proofreading, that's absolutely true, and I agree with this. But again, I think it really depends not only on the languages but also on the kind of engine, on the content that the engine is based on, and also on the domain. As we explained before, the first time we tried English into Spanish with a text from the IT domain it was a complete disaster, but fortunately with these examples you can see that there's not too much to change. There are some important omissions and some awkward sentences, especially in terms of grammar, which I think is the most affected area when translating from English into Spanish because of the several morphological differences between the two languages. But again, I think the speed should be something between translation and proofreading; it really depends on the content and on the raw MT output.

Okay, so I also have a question here from Lorena Guerra, and this will be our last question. She asks: how do you choose your post-editors? Which experience or qualifications are requested?

That's a good one. Now that we have the course, we will only choose those taking the course. No, I'm joking, I'm joking. Now seriously, you know, if you work in the translation industry, that not all translators are suitable for reviewing or proofreading, and it's more or less the same with post-editing. I think that the profile of a post-editor is closer to that of a proofreader, so it should be someone who can read the translation, check what's wrong and be able not to change everything just because
he or she would have liked to say it in a different way, but be able to distinguish what should be corrected to get a fully understandable text, if it’s light post-editing, or a high-quality text like human translation, if it’s full post-editing. So I think in general terms the profile is more or less that of a proofreader, but with specific knowledge of post-editing. I think that Martha can also add something to this question, because we were talking about this yesterday, and she said that the first time she heard about post-editing, it was like, “What’s this?” It was the same for me. So I think this is something that you can learn. I’m not sure if Martha would like to share something else.

OK, yes. Oh, Lucia, you shared my little secret now.

Which is our little secret. It was exactly the same for me, Martha.

Well, it doesn’t sound very good. Yeah, that was my first post-editing project, and the project manager said, “Well, Martha, we have a new software project, but this one is a little bit different because it’s post-editing,” and my reaction was, “Come again? What did you say?” Post-editing... it was just a few years ago. You knew more or less what they meant by that, but still you needed to double-check. But after all, it wasn’t a problem, because, as Lucia said, if your profile is more that of a proofreader and you have experience in the specific topic, whether patents, technical patents, software, pharmaceutical, medical, whatever it is, if you have some experience with that, I think you can develop the skills of a post-editor.

And that’s a very good final remark, and I see that we are just a little bit over time, so I would like to thank you both for presenting on this webinar, for participating, and for sharing your experience. I would also like to thank the participants and show you one more time the slide which contains the code. Thank you, Lucia. Thank you, Martha. I will collect some of the questions which we haven’t answered. There are some more questions, so I will try to send them to you, and maybe you can send me back the information and the answers. So yes, I would like to thank you all, and thank you to the participants. Maybe we will see some of you next week. We will also have some webinars on different languages, and I hope you will be able to find time to do the TAUS post-editing course. Have a nice evening.

Thank you, [inaudible]. Yes, everyone.

OK, thank you. Thanks a lot. Bye-bye.

Bye-bye. All right, thank you.