Wąsaty tłumacz

memoQ templates for translators

admin — Fri, 08 Jul 2022 10:25:48 +0000

Translation and revision almost always require some organization and management. We receive jobs from our clients, often with additional resources such as TMs, TBs and reference materials. We also need to track our progress on jobs that involve different tasks. All this led to the introduction of project-based organization of work in CAT tools: to translate one or more documents you create a project in your tool of choice, import or select the required resources and perform the tasks the job requires: analyze the documents against the translation memories used in the project, perhaps pre-translate the files and possibly customize some settings. After you finish the actual translation or revision, other tasks may be required, for example a bilingual export for backup purposes. In memoQ, many of these tasks can be automated with project templates.

Projects are a very convenient way of organizing and tracking translation-related work – you decide what resources you are going to use and what settings to apply for things like QA, custom export path rules, user dictionaries or TM penalties. However, if you need to translate just one small file, creating a project may seem like a lot of work. And that may be true, which is why modern CAT tools include options such as drag-and-drop file translation or “Translate in/with XXX”, where you can jump straight into translating the file with very few clicks. What you may not realize is that a project is still created in this case; it is just less visible to the user, because its creation is automated with project templates.

Templates are a way of automating project creation: template settings are used for assigning or creating resources like TMs and TBs and for configuring various other aspects of the project. memoQ comes with several default templates for quick and easy project creation, and one of them (One TM and one TB per language pair) is used by default for projects created by dragging and dropping files into memoQ’s “Start translating” Dashboard area. However, memoQ templates give users many more options for automating everyday tasks. Some of them are specific to project creation, like project naming rules, translation memories, term bases and various light resources with settings used in the project. Others can be an important help even in long-standing projects used for multiple jobs done for a particular client: these include an automatic analysis report for files added to the project and the export of a bilingual backup when you remove a file from the project. And more.

The templates offer different levels of functionality depending on the license you have: owners of a PM license with access to memoQ TMS (memoQ server) have all the functions of the ‘Translator Pro’ license plus some extras, which include a greater variety of reports and automated action triggers (project milestones used to execute automated actions) and some server-specific tasks.

In general, for users with a ‘Translator Pro’ license, memoQ templates can be used to automate the following:

Create a project with a pre-defined source language or language pair
Name a project according to pre-defined rules
Populate metadata fields with pre-defined values or values selected from pre-defined pick lists
Include any heavy resources (TMs, TBs, Muses and corpora): selected explicitly and/or automatically based on metadata (e.g. client name)
Include relevant light resources (e.g. export path rules, non-translatables, ignore lists, custom QA settings, etc.)
Apply file import filters with non-standard settings (e.g. import Word files with comments for translation or with change tracking)
Automatically apply custom file filters for defined file extensions and/or automatically select the correct filter for a given XML file type
Perform source file modifications before import (find & replace) and apply custom filters for one-step importing of non-standard files, and reverse changes on export
Automatically perform actions like analysis (Statistics) on files, pre-translate, X-translate, lock/unlock segments, export mono- and bilingual files at certain project milestones
Perform TM maintenance-related operations: confirm and update rows, delete working translation memories

Many memoQ users who discover templates try to create their own, but are often overwhelmed by the available options and quickly give up, sticking to manual project creation and losing the potential benefits that templates provide. But now there is a resource that can help you use memoQ templates effectively: the eBook ‘memoQ Inside Out: Templates for Translators’. This publication for memoQ users with a ‘Translator Pro’ license provides a comprehensive overview of template functionality, offering the following information:

Usage scenarios for templates, from very minimalistic (but still very helpful) to very complex ones, with descriptions of several sample templates.
Descriptions of default memoQ templates: what they do and how you can use them as a starting point for your own templates with similar functionality.
Information on how to create a project based on a particular template and customize the template selection list.
Detailed description of all available template settings, including automated actions, with information on usage scenarios for them.
Instructions on how to configure scripts that can be used for pre-processing of imported files and post-processing of exported files to customize files for import, e.g. for automated handling of file formats that are not normally supported; this also includes the ‘Execute custom code’ automated action.

The eBook is available here: https://payhip.com/b/agrxM

For a limited time this and other eBooks are available with a 20% discount: use promo code YFGQSKHBJM (valid until July 31st). If you are unsure about the purchase, you can download a preview file with the table of contents and the introductory chapter.

My other publications in the memoQ Inside Out series:

memoQ Inside Out: Machine Translation

The publication is a comprehensive overview of setting up machine translation with the MT plug-ins available in memoQ, with information on the various ways MT content can be used, how to evaluate the usefulness of a particular MT engine in your work, how to hide the use of MT from your clients, and tips on how to spot MT use in memoQ if you are a project manager. The pseudo-translation plug-in is also covered.

Available also in French, Dutch and Polish: https://payhip.com/Wasaty

memoQ Inside Out: Tags

A publication discussing all aspects of tags in memoQ: from a basic introduction to tags (what they are and why they are important) to the various ways of inserting tags, commands available for tag handling, tag editing, conversion of text into tags (and back) and all other tag-related stuff.

Machine translation in memoQ

Wasaty — Thu, 03 Mar 2022 07:24:15 +0000

Machine translation (MT) is here to stay. Used properly, it can be a tremendously useful tool, improving translators’ productivity and simplifying work on relatively simple translations within certain domains. At the same time, improper use of MT can be a hindrance for translators, slowing them down and frustrating them with useless suggestions. For “proper” use of MT, you need two things: an MT engine that’s right for your language pair and subject domain, and a way to use the generated content as efficiently as possible. memoQ is a CAT tool that can help you with the latter, if you know how to configure it according to your needs.

memoQ offers well over 20 plugins for various MT engines out of the box (all of them require a commercial account with the MT provider), and you can find multiple additional plugins you can download and add to your configuration, if you want to try something different. There are also multiple ways you can use machine translation to augment your work in memoQ:

Machine translation suggestions can be displayed as interactive translation options, the same way as translation memory and LiveDocs corpora matches.
MT can be configured in interactive mode so that the engine works as a ‘dictionary’ for selected source text, i.e. it provides MT concordance results.
MT engines can be used to correct fuzzy substitutions automatically with the MatchPatch feature.
An entire project, selected documents or parts of a document can be populated in batch mode using the Pre‑Translate feature.

What’s more, while memoQ’s defaults offer you some flexibility in how you use machine translation content in your work, with some additional knowledge you can tweak memoQ’s behaviour quite extensively. For example, you can change the match value below which the tool sends segment content to the MT engine in interactive mode: the default options are sending everything, matches below 100% or matches below 95%, but you can set your own custom value. This also applies to MT used in pre-translation, where you do it by changing the “Good match” threshold in the TM settings.
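The threshold rule described above can be pictured with a few lines of Python. This is only a sketch of the decision logic, not memoQ’s implementation: the function name `should_request_mt` and the idea of modelling “send everything” as a threshold of 101 are assumptions made purely for illustration.

```python
def should_request_mt(best_tm_match: int, threshold: int) -> bool:
    """Hypothetical rule: ask the MT engine only when the best TM match
    (a percentage from 0 to 100) is below the chosen threshold."""
    return best_tm_match < threshold


# Assumed modelling of the options mentioned in the text:
SEND_EVERYTHING = 101   # every match, even 100%, is below this value
BELOW_100 = 100
BELOW_95 = 95

decisions = [
    should_request_mt(100, BELOW_95),         # exact match, MT not needed
    should_request_mt(80, BELOW_95),          # weak fuzzy match, ask MT
    should_request_mt(100, SEND_EVERYTHING),  # MT requested regardless
]
```

A custom value simply moves the cut-off: with a threshold of 85, for example, a 90% fuzzy match would be kept without consulting the MT engine.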

Other MT-related features of memoQ include options for evaluation of MT usefulness with Edit distance reports and tracking time spent on projects with Editing time reports.

When you use MT in your translation, information on the origin of matches is stored in the bilingual files and can be easily seen. If – for whatever reason – you don’t want to provide that information, there are ways you can hide the origin of matches.

I wrote an eBook on how to use MT in memoQ to its full potential. It has been available for some time, but I am only writing this post now because:

I’m not good at marketing, to put it mildly,
there are Polish, French and Dutch translations available now.

If you want to improve your knowledge of machine translation use in memoQ, I suggest you give it a try. Links to all language variants are below, and at the end of the page you’ll find the table of contents for the eBook.

memoQ Inside Out: Machine Translation

Tout sur memoQ : la traduction automatique

Meer doen met memoQ: Machinevertaling

memoQ na wylot: tłumaczenia maszynowe

Oh, and if you want to know how to obtain keys to some of the more popular MT engines, you’ll find detailed information here: https://marcinbas.github.io/api_mt/#4-how-to-obtain-api-keys-for-mt-plugins

Contents
Introduction

Configuration of MT settings

Using MT in translation

Interactive mode MT

MT concordance

MatchPatch with MT

Pre translation with MT

Evaluation of MT benefits

Tracking editing time

Edit distance reports

Hiding the use of MT in the translation process

Methods to hide the use of MT

Editing the MQXLIFF file

Copy-paste

LiveDocs corpora or TM

Pre translation method

Term base

Project manager perspective

MT use in online projects

Blocking or enforcing MT usage

No enforced restrictions on MT use

Documents pre-translated with MT

MT content provided as TM or LiveDoc

Evaluation of an MT engine

Spotting illicit MT use

Pseudo translation

Defining the Good match threshold value

Literary translations with CAT tools

admin — Sun, 20 Oct 2019 17:51:07 +0000

What are CAT tools

If you deal with literary and creative translation only, you have probably never used any special translation-related software other than dictionaries (and a word processor), but you may have heard of “Computer Assisted Translation” tools or CATs for short. These were developed with technical translation in mind, and their main premise is to “never translate the same sentence twice”, which makes a lot of sense in the world of repetitive instruction manuals, but can this be useful in literary translation? Let’s see, but please note that this text does not touch the subject of machine translation (MT).

Let me start with a short introduction: I started translating literature in 2000, while still working on my (failed) Ph.D. in chemistry, “retyping” paper books into a text editor. Several ergonomic improvements later, it was finally electronic text in two windows on a single screen. And somewhere along the line, I started translating “technical” texts as well (medicine and chemistry), where the use of CAT software was a requirement. And after I got used to the software, I started using it for literary translation too at some point. And never looked back.

Default layout of memoQ, one of the popular Computer Assisted Translation (CAT) tools

So what’s the deal with the CAT tools? The main idea behind them is that the text for translation is “segmented” into sentences, and once you translate a sentence (segment), both source and target are stored in a database called a “translation memory” (or “TM”), which you can use in the future translations and share with others. When you encounter identical segments in the future, the software will insert the prior text into your current translation, so you don’t have to waste time on something you – or someone else, if you received a translation memory along with the files for translation – already did, while ensuring consistency, which is very important in technical communication. If you encounter a sentence similar to something you already translated (a so-called “fuzzy match”), the software will also show you the previous translation with differences between the current and previous segment texts highlighted, again helping you work faster and in a more consistent way.

Translation memory match in memoQ: (1) Current sentence for translation, (2) Similar sentence found in translation memory, highlighting differences between current text and TM match (3) Translation of the segment found in TM, (4) Information on TM match: who translated/edited it, when, what was the document name, similarity score (match percentage) etc.
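The exact and fuzzy matching described above can be sketched in a few lines of Python. This is a toy illustration, not how memoQ or any other CAT tool computes match rates: the functions `similarity` and `lookup`, the 70% cut-off and the character-based score from the standard library’s `difflib` are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """Rough similarity between two segments as a percentage (0-100)."""
    return round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)


def lookup(tm: dict[str, str], source: str, min_match: int = 70):
    """Return (score, stored_source, translation) for the best TM hit,
    or None when nothing reaches the minimum match rate."""
    best = None
    for stored_source, translation in tm.items():
        score = similarity(source, stored_source)
        if score >= min_match and (best is None or score > best[0]):
            best = (score, stored_source, translation)
    return best


# A one-entry "translation memory": source segment -> stored translation.
tm = {"Press the red button.": "Naciśnij czerwony przycisk."}

exact = lookup(tm, "Press the red button.")        # identical segment
fuzzy = lookup(tm, "Press the green button.")      # similar segment
unrelated = lookup(tm, "It was a dark and stormy night.")
```

Here `exact` comes back with a score of 100, `fuzzy` with a lower score together with the stored translation to be adapted, and `unrelated` is `None` because nothing in the memory is similar enough.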

Of course, that’s not necessarily something we want in a literary translation, but it’s sometimes useful, and CAT tools offer way more than just help with consistency, definitely helping me work in a more comfortable, efficient way. Let me tell you how.

Benefits

I never actually did the “translation in Word” thing: I used a Linux text editor, but the idea was the same: open the source text in an editor window, open a second window to the right/left/above/below, make sure the windows are the right size and in the right places, and then you can translate. Also, once in a while you need to switch to the source text window to scroll it. And if you take a longer break for any reason or close the source text window, you need to find your place in the source text again. It doesn’t take long, but it adds up over the course of a work day.

When you work with a CAT tool, some preparation is required at the beginning in most cases, as most tools employ the concept of a “project”: you need to create a project with a name, define a language pair, create a new translation memory and term base or mark existing ones for use, and then import the source file. For something you’ll do once in several months it’s really not a big deal. Once you have done this, you can run an analysis in which the software will tell you how many segments/words/characters the source file has and whether there are any repetitions – identical segments that occur more than once in the text. For literary texts, in most cases these will be things like “Chapter:” and, for an English source text, also “he/she said” and the like. Running an analysis is a great way to track your progress: while the software will display progress information in real time, I like being able to record this information, so I always run an analysis at the end of my working day to be able to track and compare daily progress. But it’s optional, and the number of characters/words should be the same as reported by a word processor.
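To make the idea of an analysis more concrete, here is a minimal Python sketch that counts segments, words, characters and repetitions. It assumes a very naive segmentation rule (split after a full stop, exclamation mark or question mark); real CAT tools use far more elaborate segmentation rules and counting conventions, so the function `analyze` is illustrative only.

```python
import re
from collections import Counter


def analyze(text: str) -> dict[str, int]:
    """Count segments, words, characters and repeated segments in a text."""
    # Naive segmentation: split on whitespace that follows ., ! or ?
    segments = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    counts = Counter(segments)
    # A repetition is every occurrence of a segment after its first one.
    repetitions = sum(n - 1 for n in counts.values())
    return {
        "segments": len(segments),
        "words": sum(len(s.split()) for s in segments),
        "characters": sum(len(s) for s in segments),
        "repetitions": repetitions,
    }


report = analyze("He said no. She left. He said no.")
```

For this three-sentence sample the report shows 3 segments, 8 words and 1 repetition (the second “He said no.”). Running such a count at the end of each day on the untranslated remainder would give the kind of daily progress log described above.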

Let’s start the translation. Once the text is imported, you can open it for translation. Depending on the actual software used, various levels of formatting will be shown – some programs, like Trados Studio, replicate font color, size and typeface from Microsoft Word documents, while others, like memoQ, show only the most basic formatting (bold, italic, underline), using a single, customizable font for all text. I actually prefer this approach, since it’s easier to focus on content.

Let’s list the actual benefits of working with a CAT tool

Focus/ergonomics – Regardless of the fonts, in most CAT tools the source text will be displayed as segments: each sentence separately, and you are supposed to type in the translation – depending on software or settings – to the right of the source text or below it. This has three benefits: it helps you focus on a single sentence and makes it easy to find the current text to work on – it’s usually in the middle of the screen, highlighted in some way. It’s also really hard to forget to translate some part of the text: you’ll get a warning if you try to export a translation that’s not completed.
Formatting – You will see more or less “clean” text (the amount of formatting depends on the software and your preferences). If the paragraphs are formatted in some complex way, you don’t have to worry about that: the software will reapply that formatting when you export the finished translation. You can just focus on the translation, applying simple stuff like bold or italics along the way or using special tags for more complex formatting.
Original source text formatting is usually displayed as a live preview in the CAT tool interface (but not in all such programs), and the preview is updated as the translation progresses.

Term bases – You can use term base (glossary) features to speed up your work and make it more consistent. Do you need to translate some place name? Add the source term and its translation to the term base (TB). If the name shows up in a source segment, it will be highlighted, and the translation will be shown somewhere in the CAT tool’s user interface. You can then insert the translation quickly by double-clicking, using a shortcut key or just by starting to type it and using the predictive typing suggestion. Do you have a long, complex place/person/company/product name? Add it to a TB to facilitate quick typing or insertion. Are you translating from English into an inflected language and there’s a character in your novel whose gender you can’t remember? Add the name to the TB with a note on gender. You can type it faster and see the gender info quickly.

Faster typing – Term base hits can be inserted very quickly, but this also works for short segments. Plus in some programs like memoQ or Trados Studio you can generate special predictive typing dictionaries that will suggest words or even multi-word phrases based on source segment content. This works best if you have large translation memories and languages which are not inflected.

Concordance – All CAT tools offer a concordance feature: simply select a source word or phrase and use the corresponding keyboard shortcut or function button to look up how it was translated before. No more scrolling through documents to match the source file with the target. All instances in which the expression occurs are shown in one window, with context. This makes it much easier to ensure consistent translation of some particular phrase used by one of the characters, or just the opposite – ensure diverse translations if preferred.

Auto-concordance – It gets even better: short, repeated segments (like “he said”) and their translations can be shown automatically. You can use this feature to ensure consistency or as a sort of thesaurus for increasing diversity, which is often needed in a literary context.

Quotations – Does your author repeat statements made earlier? How well do you recall these? You don’t have to look for the quote; the CAT tool will show you the previous translation automatically. And even show the differences if the author changed something (deliberately or accidentally).
Comments – Do you use comments to note something for later? No problem! You can use the commenting feature in your CAT tool and perhaps assign one of various comment categories (e.g. information, warning, etc.) for different purposes. Later you can filter to find the segments with comments quickly and even export those comments (all of them or just a selected category) to the target document.
No sentence is lost – Once you confirm a segment (usually with Ctrl-Enter), it gets marked as confirmed in a translation editor, but it’s also stored in a translation memory (database). Some programs save the file at the same time (in others this happens at pre-defined time intervals). Even if your computer crashes or there’s a power outage, the translation is safe – you can always restore it from the translation memory. This provides another level of safety/backup for your work.
Progress tracking / improved productivity – A CAT tool will show your progress based on the number of segments, words or characters. I already mentioned the analysis feature and real-time progress information. But working with segments has an additional benefit: you can use them for a variant of the Pomodoro technique, where instead of working for a set time, you focus only on your translation for a given number of segments. For me, in literary texts it’s usually 50 – when I start working, I ignore emails and other distractions until I have translated 50 segments. Then I take a shorter or longer break (for Facebook, preparing tea or loading the washing machine) and start another batch. The number can be different depending on the complexity of your source text, but this technique allowed me to improve my productivity considerably. Of course, you don’t have to use a CAT tool for this; you can do this based on pages instead of segments, but I find it easier with segment numbers.
Filtering for words/phrases – You already know that it’s easy to check how something was translated before, but what if you changed your mind and you want to use a different translation? Use Find and replace… or filter the text based on a source or target expression and edit the selected segments containing the words you need in context. Please note this feature is not available in all CAT tools, only those which use an editor with a “table” interface.
Built-in/optional web search tools – You won’t find this in every tool, but the good ones have it. Do you need to run a Google/Bing search or use some web dictionary? Once you configure the built-in/optional web search feature, you can just select a word or phrase and use a shortcut key to look up that text on multiple web sites at the same time.
Automated fuzzy match correction – This doesn’t happen very often in literary texts and it may not be something you rely on, but sometimes a fuzzy match (a sentence stored in the TM that is similar to your current one) can be “fixed” automatically to create a correct translation. And it may even be something worth keeping without rephrasing.

There are also other useful features, like: being able to generate a Microsoft Word file with source and target texts in a table for quick verification in the word processor’s environment; auto-correction lists; the ability to use monolingual reference files; quality assurance functions; and much more.

Limitations

So is it all roses? As with any software, there are some limitations.

Sentence-based translation – As mentioned before, with CAT tools we work with segments, which are mostly sentences. But in literary/creative translation we need to consider the larger context, usually a whole paragraph or more, and the stylistic impact on the reader. Context and style are, of course, important in technical translation too, but they are often weighted differently. Breaking text up into sentence-level chunks makes it easier to focus on the current segment, but at the same time a bit harder to think about the whole paragraph. It’s not as if the rest of the text were hidden, though – just the opposite: the current segment is the highlighted element, with the previous and next segments plainly visible, so it’s just a matter of adjusting how you look at things. You can also easily change the segmentation to paragraph-based, but this may negate some of the CAT benefits (e.g. it’s easier to miss a sentence). Quite often you will also have to change sentence length – joining two or more source sentences (segments) into one in the translated text or splitting longer sentences into shorter ones. This is actually quite easy: if you need to turn a long source sentence into two shorter ones in translation, simply write those two sentences in a single target segment. If you need to merge several short sentences into one, you can either join the segments, so the software shows two or more sentences in a single source segment (translation table “cell”), or just translate the fragments in their respective segments. They will form a single sentence in the exported target file.

If you need to split text into separate paragraphs, you will have to use some placeholder symbol (I use “\\”) and use Find and Replace in Microsoft Word after exporting the translation.
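The same placeholder trick can also be applied outside Word. The sketch below assumes the exported translation is available as plain text and that the marker is two backslashes, as in the paragraph above; the function name `split_paragraphs` is made up for the example.

```python
PLACEHOLDER = "\\\\"  # two literal backslashes, the marker typed during translation


def split_paragraphs(exported_text: str, placeholder: str = PLACEHOLDER) -> str:
    """Replace each placeholder with a paragraph break and tidy the spacing."""
    parts = [p.strip() for p in exported_text.split(placeholder)]
    return "\n".join(p for p in parts if p)


result = split_paragraphs(r"First paragraph. \\ Second paragraph.")
```

After the call, `result` holds the two sentences on separate lines, which is what the Find and Replace step achieves in the word processor.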

And once you finish your translation, do the final proofreading of the exported target file in Microsoft Word, not in the CAT tool. This will help you see paragraphs and larger blocks of text, not sentences, in an overview which makes it easier to polish the text.

Learning curve – You can grok the basics of CAT tool functionality with several hours of online webinars/training courses and a little practice, but you have to be prepared for some frustration and a steep learning curve, especially if you have never used any such tool before and need to learn new concepts. Different programs offer different levels of complexity, and while general workflows are the same, implementations differ. When working with literary texts you won’t need all the features of a modern CAT tool, and some of the features mentioned above (such as web search) take some work to configure.
Cost – Let’s face it: CAT programs are professional software, and they are priced accordingly. Will your productivity improve enough to justify an investment of several hundred euros? Maybe, with time. And maybe not. But you can start with free software, like the open-source OmegaT, the free-to-use SmartCAT (read the EULA carefully) or Wordfast Anywhere, try subscription programs, e.g. Memsource, or experiment with trial versions of commercial tools like memoQ (my favorite, with a 45-day trial period) or SDL Trados Studio. But remember the old adage: you get what you pay for. I don’t want to knock the free tools, but you do get more with commercial desktop software.

So, let’s summarize. Can CAT tools be used in literary/creative translations? Definitely. Will they help? Yes, in many ways. Enough to invest in commercial tools? You have to decide for yourself.

Full disclosure: I’m not employed by any software company and I don’t get commission on any sales, but as a certified trainer, I do run commercial memoQ training courses.

CAT tools in literary translations

admin — Sun, 20 Oct 2019 09:30:22 +0000

What CAT tools are

If you deal exclusively with literary and creative translation, you may never have used any special translation software other than dictionaries and a word processor, but you may have heard of “computer-assisted translation” (CAT) programs. They were created with technical translation in mind, and their main premise is the idea that “you never need to translate the same sentence twice”, which makes a lot of sense in the world of repetitive technical communication (e.g. instruction manuals), but can such software be of any use in literary translation? Please note that this text does not deal with machine translation (MT) in the slightest. Read on.

Let me start with a few words about myself: I started translating literary texts in 2000, while still working on my (failed) Ph.D. in chemistry, and I did it by “retyping” a paper book in a text editor. After several successive ergonomic improvements, one of the later books was already text in electronic form, and I typed the translation in a second window on the screen. Somewhere along the way I also started translating “technical” texts (chemistry and medicine), where the use of CAT tools was required. And once I had got sufficiently used to this software, I started using it for literary translations too, of which I now have almost 40 behind me. And I value it very much.

Default interface layout of memoQ, one of the popular computer-assisted translation (CAT) tools

How do CAT tools work? The basic principle is that the text for translation is divided (segmented) into sentences, and once you translate a sentence (segment), the source text and its translation are saved in a database called a translation memory (TM), which you can use in later translations and send to others. When you come across an identical segment in the future, the software will insert the existing translation, so you don’t have to waste time on something that you translated earlier – or that someone else did, if you received a translation memory along with the files for translation. This also ensures consistency, which is an important element of technical communication. When you come across a sentence similar to something that has already been translated (a so-called “fuzzy match”), the program will also display the earlier translation, highlighting the differences between the current source segment and the one from the translation memory, which again speeds up translation and makes it easier to stay consistent.

Translation memory match in memoQ: (1) current sentence for translation, (2) similar sentence found in the translation memory, with the differences between the current text and the TM match indicated, (3) translation of the segment found in the TM, (4) information about the match: who translated/edited it and when, the name of the document it comes from, similarity level (match percentage), etc.

Of course, identical translation of identical or very similar segments is not necessarily something we want in literary translation, although it is sometimes needed, and CAT tools offer much more than just help with consistency; they definitely help me work far more comfortably and efficiently. Let’s see how.

Benefits

In truth, I never translated books in Word – I used a text editor under Linux, but the principle was the same: the source text in one editor window, the translation in a second window to the right/left/above/below, make sure the windows are the right size, and off we go. You just have to switch windows every few dozen lines to scroll the source text. After a longer break you also need to spend a moment finding the right place in the text, which may take longer if for any reason you had to close the editor window. Of course, such a search does not take long on any single occasion, but even these moments add up, and now and then you will skip a sentence.

When you start translating with a CAT tool, some preparation is usually required, because most programs take a “project” approach: you need to create a project with a chosen name, select the source and target languages, create a new translation memory and term base or select existing ones, and then import the file(s) for translation. However, for literary translations that take a couple of months on average, preparation lasting a few minutes at most is not a problem – of course, you need the source text in electronic form. Once the project is created, you can run an analysis: the software will tell you how many segments/words/characters the source file contains and whether there are any repetitions – identical segments occurring more than once. In literary texts these will usually be short elements such as “Chapter”, and in translations from English also short dialogue fragments such as “he/she said” and the like. Running an analysis is also a great way to track your progress: although the program displays progress information in real time, I personally value being able to follow progress over time, so I always run an analysis at the end of the working day to record how far I have got. The analysis is not strictly necessary, though; in the vast majority of cases the number of words/characters will be the same as reported by Word.

Time to translate the imported text. Depending on the software, the tool will display different levels of formatting – some tools, such as SDL Trados Studio, reproduce the full formatting of Word documents, including font colour and size, while others, like memoQ, display only basic formatting (bold, italics and underline) and use a uniform font for the whole text. Personally I prefer the latter approach, because it makes it easier to focus on the content. Also remember to switch off the “auto-propagate” feature, which automatically inserts confirmed translations into identical segments elsewhere in the text (e.g. bits of dialogue; in memoQ: Translations > Translations settings > Auto-propagation > switch off).

Now let's look at the main benefits offered by CAT tools:

Focus/ergonomics – regardless of fonts, most CAT tools display the source text in segments: each sentence separately, with the translation typed (depending on the tool and/or settings) to the right of the source or below it. This has three advantages. It helps you focus on a single sentence (see the note in the section on drawbacks). It makes finding the current text to translate very easy – it is usually in the middle of the screen, highlighted in some way. And it is hard to skip a part of the text, because the tool will display a warning if you try to export a text with missing segments.
Formatting – you work with more or less “clean” text, and the amount of formatting visible on screen depends on the tool and your preferences. If the paragraphs of the source text are formatted in some complicated way, you do not have to worry about it at all, because the tool will recreate the formatting when the finished translation is exported. You can focus on translating and only take care of simple formatting such as bold, or insert special tags where the formatting is more complex.
The original formatting is usually also shown in a live preview in the tool's interface, with the translation replacing the source segments that are already done.

Term bases – you can use a term base/glossary to type faster and be more consistent. Need to translate the name of a place or, say, a ship? Just add the source name and its translation to the term base (usually by simply selecting the text and using a command from the interface or a keyboard shortcut). When the name appears in a source segment it is highlighted, and the correct translation is displayed somewhere in the tool's user interface. You can then quickly insert the translation with a double click, a keyboard shortcut, or simply by typing the first few letters and accepting the typing suggestion. Does the text contain long and complex place names, first names, surnames, or company or product names? Just add them to the term base and insert them quickly. Translating from English into Polish, where you need to know a character's gender to inflect correctly, and a given person appears only rarely? Just put the name in the term base with the gender noted: you will be able to enter the name quickly and check the correct form at a glance.

Faster typing – term base entries can be inserted very quickly, but this also works for whole short segments, and on top of that some tools, like memoQ or SDL Trados Studio, can generate special “predictive typing” dictionaries that suggest words and even multi-word phrases based on the content of the source segment. This works best with larger translation memories, although for inflected languages such as Polish the sheer number of word forms can be a problem.

Concordance – all CAT tools offer a concordance function: select a word or phrase in the source text and press a keyboard shortcut to see how it was translated before. No more scrolling through documents and matching up places in the source and target text. All occurrences appear in one window, with context. This makes it much easier to ensure, for example, that a characteristic phrase is always translated the same way… or, on the contrary, to make the text more varied and rich.

Automatic concordance – it gets even better: short, frequently occurring segments (e.g. “he said” in English dialogue) and their translations can be displayed automatically. Again, you can use this to ensure consistency or to increase the variety that is often desirable in literary texts.

Quotations – does the author sometimes repeat passages of text? Perhaps someone recalls an earlier statement? This is quite a common device, for example in crime novels. You no longer have to look up how you translated the passage earlier; the tool will suggest the existing translation by itself. It will even show the differences if the author changed something in the quotation (deliberately or not).
Comments – do you use comments on the text, for example to note something to check later? No problem: use the comment function and assign the note to one of several available categories (e.g. information, warning etc.). Later you can filter the text to quickly show the segments with comments, and even export the comments (all of them or a selected category) to the target document.
Nothing gets lost – when you confirm a segment (Ctrl-Enter), its status in the translation editor changes to confirmed and it is also saved in the translation memory (a database). Some tools save the whole file at the same time, while in others the whole file is saved at defined intervals. Even if the system crashes or the power goes out, the translation is safe – you can always restore it from the translation memory. This gives you an additional level of safety and a backup of your work.
Tracking progress / increased productivity – a CAT tool shows progress in real time based on the number of segments, words or characters. I have already mentioned the analysis function and progress information, but working with segments offers one more advantage: you can use them for a modified Pomodoro technique in which, instead of working for a set time, you focus on translating until you have completed a set number of segments. For literary translations I stick to 50 – when I start working I ignore e-mails, notifications and other distractions until I have translated 50 segments. Then I take a break (it can be Facebook, making tea or putting on the laundry) and sit down to the next batch. The number of segments can vary depending on the complexity of the source text, but this technique has allowed me to improve my productivity considerably. What is more, some tools (e.g. memoQ) can record working time automatically (Options > Miscellaneous > Editing time > Record editing time when I am working; reports are generated by selecting Project home > Overview > Reports > Editing time), which may be useful if you are interested in your actual working efficiency.
Of course you do not need a CAT tool to use the Pomodoro technique, and you can focus on working through, say, one page at a time, but for me the number of segments works very well.
Filtering by words/phrases – I have already written that it is easy to check how something was translated earlier, but what if you change your mind and want to use a different translation? You can always use find and replace, but CAT tools also offer filtering on the source or target text. This feature is available only in tools with a “grid” interface, like the one in the screenshots shown here, but it lets you display all occurrences of the phrase you are looking for and adjust the translation in context, modifying the grammatical forms of the rest of the text where needed.
Built-in/optional web search tools – these are not available in all tools, but the better ones offer them. Need to check something in Google/Bing or use an online dictionary? Once the built-in/optional web search is configured, just select a word or phrase and press a keyboard shortcut to look it up in all the defined services at once (in memoQ: Options > Default resources > Web search).
Automatic fuzzy match correction – in literary texts this feature is not often useful and I do not necessarily use translations modified this way, but sometimes a fuzzy match (a sentence stored in the TM that is similar to the current one) can be automatically “corrected” to fit the current text. And sometimes such a text can be kept without modification.

There are of course other useful features as well, such as the option to generate a Word document with a table containing the source and target text in adjacent cells for quick review in Word, auto-correct lists, the option to use earlier monolingual translations as reference files, quality assurance functions (technical ones, such as checking that numbers are correct) and many more.

Limitations

So, nothing but advantages? As with any software, there are some limitations.

Translating sentence by sentence – as I have already mentioned, in CAT tools you work with segments, which are mostly sentences. Although it is very easy to switch to paragraph-based segmentation, you lose many of the advantages of CAT tools that way. In literary/creative translation you need to take a broader context into account, usually at least the whole paragraph (note that context matters in technical translation too). Sentence segmentation makes it easier to focus on the current sentence, but at the same time makes it a little harder to think about the whole paragraph, especially for beginners. That said, the text outside the current segment is not hidden – on the contrary, the current segment is a highlighted element of the interface in which the previous and following segments are clearly visible, so it is only a matter of the right approach. And if you need even more context, just look at the preview, whose window can be set up to show, for example, a whole page.
When working with a literary text you often need to depart from the source sentence division – merging two or more short source sentences (segments) into one sentence in the translation, or splitting long sentences into shorter ones. In practice this is very simple: to split a long sentence into shorter ones you simply type two sentences in one segment (although you can also use the function for splitting the source segment). Turning several short source sentences into a single sentence in the translation can be done by joining the source segments into one (a single table “cell”), or simply by translating the fragments in their respective segments with suitable punctuation. In the finished translation file they will appear as one sentence.

If you need to split the text into separate paragraphs, you have to use some placeholder symbol (I use \\) and then use find and replace in Word after exporting the target file.

Once the translation is finished, the final read-through/edit is best done in Word (or another text editor) rather than in the CAT tool, which makes it easier to work with larger blocks of text and helps smooth out any “discontinuities” resulting from translating sentence by sentence.

Learning curve – the basic functions of a CAT tool can be mastered in a couple of hours using the free training materials available online and a bit of practice, but you should expect some frustration and a steep learning curve, especially if this is your first contact with this kind of tool and you have to learn new concepts and a (more or less) enforced workflow. The tools vary in complexity, and although the general workflow is the same everywhere, the individual implementations and required steps may differ. Note that working with literary texts usually does not require you to know all the (often very extensive) functions of modern CAT tools, but some of the features I have mentioned (e.g. web search) do require digging through the settings and some configuration.
Cost – let's be blunt: CAT tools are professional software and are priced accordingly. Will your productivity with such tools increase enough to justify spending several hundred euros? Maybe, with time. Or maybe not. You can, however, start with free software such as OmegaT (open source) or the online SmartCAT (I recommend reading the licence agreement carefully) or Wordfast Anywhere, try tools with a monthly fee such as Memsource, or explore trial versions of purely commercial tools such as memoQ (my favourite, with a 45-day trial) or SDL Trados Studio. Still, it is worth remembering the old saying: you get what you pay for. Free tools can be entirely sufficient, but by paying for commercial desktop software you can get a lot more.

To sum up

Can CAT tools be used for literary/creative translation? Definitely. Will they help? Yes, in many ways. Enough to invest in commercial software? Maybe, it depends.

Full disclosure: I do not sell memoQ or work for the company, but I do run commercial training courses for this tool.

Memsource files in memoQ

Wasaty — Tue, 30 Jan 2018 21:16:57 +0000

When it comes to online CAT tools, I personally consider Memsource one of the better ones – it’s reasonably fast and offers decent functionality and usability. It’s even better with the local (but not offline) client, which can undock some windows, so you can have concordance and TM matches on screen at the same time. Still, it’s not the same as memoQ when it comes to comfort and features, so whenever possible I process Memsource files in memoQ – that is, if the PM allows downloading .mxliff files for work with a local tool.

To do so, I run pre-translation with a relatively low threshold, download the project files and open them in memoQ, with another copy open in Memsource for concordance and TB checks. After translation I open the files in the Memsource editor and use the upload functionality to synchronize them with the server.
Unfortunately, the XLIFF format is a bit loose when it comes to how certain features should be implemented, so things like match rate and translation status are not imported by default and need some tinkering. I have created a template that can be used to import Memsource files a bit more comfortably.

There is a substantial edit concerning the template:

Originally, .mxliff files were modified only before import into memoQ, and I left cleaning up the extra content to the Memsource editor. Now, when you export the finished translation, the extra bits added for memoQ compatibility are removed.

Additionally:

Segments confirmed in memoQ will show as confirmed in Memsource. Of course you should still update the Memsource TM.
The content of “Alt-trans”, which memoQ by default imports from Memsource as a comment, is now ignored. Multiple users asked me to disable this. If you want alternative translations from Memsource, you can enable them in the memoQ filter options.
I removed my export path rules from the template settings – they were causing problems for some users. Still, it’s a good idea to edit the export path rules so that memoQ overwrites the original files and you don’t have to manually rename files exported from memoQ (to remove the default “_iso” target language addition).

Original post content (still valid):

The template works by performing automated actions: the files are edited with regex-based Find and Replace rules to insert additional attributes for match rate and segment states (the regex rules are included in .xml configuration files). Subsequently a customized XLIFF filter is used, configured to recognize the introduced attributes, and regex-based tagging is run to convert Memsource tags into memoQ tags.

To use the template, follow the import and setup instructions below. When everything is configured correctly, you can create a project from the template and work with Memsource files more comfortably. But always remember to back up your files in case something goes wrong. Also, while the conversions and the template were tested by two people without any issues (at the time of publication), I can’t predict every possible case and setup, so you are doing this at your own risk – if something goes wrong, I may try to help, but I won’t be held responsible.

Before you start: the template works only with memoQ 8.x and 7.8, not with older versions. However, the executable file “FindAndReplace.exe” is installed only with memoQ version 8.1 and newer. If you have an older version, you need to download it separately (see below).

Template Memsource does the following:

match rate will be visible in memoQ
locked status will be kept (segments locked in memsource will be locked in memoQ)
“translated” status will be kept
segments populated with machine translation will have “MT” status in memoQ
memsource tags will be converted into memoQ tags

Preparation:

1. memoQ version 8.1 and newer: none
2. memoQ version 7.8: download and unzip the FindAndReplace executable (alternatively, download and install the newest memoQ version; you can still use 7.8, and FindAndReplace will be installed in the default path).

Installation

1. Download this file: memsource_updated.zip
2. Unzip the content (remember where you unzipped it).
3. Start memoQ, open Resource console.
4. Select Filter configurations.
5. Select Import new and import: ChainedConverter#memsource-tagged.mqres
6. Select Project templates.
7. Select Import new > Memsource.mqres
8. The template contains a hard-coded path for the configuration file: C:\memoQ\FindAndReplace\memsource.xml. If you don’t want to edit the template, create this folder and put “memsource.xml” in this path. Alternatively, edit the template:
Select Resource console > Templates > memsource with states > Edit > Automated actions > Script before import and edit the path in the Command line arguments field, then click Update.
9. You can now create a template-based project and import Memsource files.

Project creation

1. From the main memoQ screen select New Project.
2. The Create new project from template dialog will be displayed.
3. Add documents or Add folder structure, then click Next.
4. Select Memsource from the Project template drop-down, select languages and fill in the metadata fields.

memoQ will import the files, applying the filter settings. The tagging mechanism will create empty (single) tags for numbers enclosed in curly brackets ( {1} ), opening (left) tags for any content starting with a left curly bracket and closed by a “greater than” sign (e.g. {1> or {i> ) and closing (right) tags for any content starting with a “less than” sign and closed by a right curly bracket (e.g. <1} or

Please note that the project won’t have any TMs or TBs and only default resources will be attached, so you need to add TM and other resources manually using relevant Project home cards (Translation memories, Term bases, etc.).
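The three tag shapes described in the project creation steps above can be sketched with a single regular expression. This is only an illustration in Python, not the rule set shipped in the template: the pattern, the function name `classify_tags` and the assumption that tag ids consist of word characters (digits or short letter codes) are mine.

```python
import re

# Assumption: a tag id is made of word characters (digits or letters such as "i").
TAG = re.compile(
    r"\{(\d+)\}"    # {1}  -> empty (single) tag
    r"|\{(\w+)>"    # {1>  -> opening (left) tag
    r"|<(\w+)\}"    # <1}  -> closing (right) tag
)

def classify_tags(segment):
    """Return (kind, tag_text) pairs for Memsource-style inline tags in a segment."""
    found = []
    for m in TAG.finditer(segment):
        if m.group(1) is not None:
            kind = "empty"
        elif m.group(2) is not None:
            kind = "open"
        else:
            kind = "close"
        found.append((kind, m.group(0)))
    return found

tags = classify_tags("Press {1>Enter<1} to confirm{2}.")
print(tags)
```

A quick check like this on a few exported segments can help you spot tag shapes the filter will not recognize before you import a whole project.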

PRO tip: The file extension “mxliff” is not recognized by memoQ, so when you are adding files to an existing memoQ project, you need to select “Show all”. But when you create a project based on this template you can skip the “Add files” step (just don’t add any files in step 2 above), and when the template-based project is created, simply drag and drop the mxliff files into the “Translations” memoQ window. The files will be recognized and the correct filter will be applied automatically.

Troubleshooting

If, instead of importing the files into the project, memoQ displays the Document import options dialog with a red exclamation mark to the left of the file name, the filter configuration is not properly installed or recognized. Repeat steps 3-5 of the Installation section and try again.

If that doesn’t help, or if you get the message “A filter configuration with the same name already exists in this location” when importing the filter, you can try the following steps:

1. Open Resource console and go to Filter configurations section.
2. Click memsource-tagged filter and select Properties command.
3. Rename the filter, e.g. to memsource-tagged2.
4. Still in the Resource console go to Project templates section.
5. Select Memsource and click Edit.
6. Click Settings > Language-independent resources.
7. In the Filter configuration section select new name entered in step 3 (e.g. “memsource-tagged2”) from the Filter configuration drop-down.
8. Click OK and create new template-based project.

If the above procedure also fails, you can try contacting me, but before you do, please make sure all your paths and file names are correct and you followed all the steps described here.

To update already installed template you need to:

Use Resource console to delete the existing filter configuration and project template, and install the updated version
OR
Use Resource console to rename the existing filter configuration and project template, and install the updated version
OR
Import the updated filter configuration with a new name, import the updated project template with a new name and modify it to include the updated filter configuration name.

Please note: the templates reference the FindAndReplace.exe file, which should be available in the C:\Program Files (x86)\Kilgray\FindReplace Tool folder (for memoQ 8.1 and up). If you have a non-standard memoQ installation path, you need to edit the exe configuration in the template settings (see below). If you have a memoQ version older than 8.1, you need to download the file separately and edit the path in the template settings:

Select Resource console > Templates > memsource with states > Edit > Automated actions > Script before import > Select findandreplace.exe and click Delete.
Click Add files…, browse to and select FindAndReplace.exe.
Click Update.

Template can be further customized with your default languages, TMs, light resources etc.

Additional help for templates with find and replace scripts can be found here: https://help.memoq.com/8-3/en/index.html?edit-template-find-and-replace.html
Template configuration file (actual find and replace commands) is commented and you can customize it any way you like.

Terminology mining from Eur-Lex corpora

Wasaty — Sat, 29 Oct 2016 15:29:11 +0000

The #TranslatingEurope Forum 2016 conference was held on October 27-28 in Brussels. Besides presentations, the conference included two sessions of mini-workshops: the idea was to present some practical aspect of translation to a small group – three people at a time. The workshops included my submission on terminology mining. Since I realized from the start that participants wouldn’t be able to remember anything from a 15-minute session with three different software tools, I prepared handouts with some background information I had no time to introduce and a description of all the steps presented in the procedure. Because the workshop was quite popular (all groups were far larger than 3), I’m publishing a somewhat extended version of the handout, which is available here: Terminology_mining.pdf.

You can use the text freely, but if you use it for any derivative work, please credit the author.

Safety data sheet translation – useful resources

Wasaty — Sun, 13 Mar 2016 08:33:15 +0000

The materials posted here supplement my presentation at the Konferencja Tłumaczy (Translators' Conference) on 13 March 2015 and largely repeat materials already published on this blog. A PowerPoint presentation with the full text of the lecture is also available for those interested.

Eur-lex: European Union legislation. Type keywords in the search field, or enter the year and number of the document you are looking for (e.g. 1907/2006).
ECHA: guidance on compiling documents related to safety data sheets, and sample documents. Some are available only in English, some also in other languages, including Polish.
ECHA terminology database: over 1,300 terms from chemicals legislation, with definitions and sources. The whole database or selected sections and languages can be downloaded.
IATE terminology database: Interactive Terminology for Europe, European Union terminology covering the whole range of domains, including those related to REACH and CLP.
Linguee: a service for searching multilingual corpora. Thanks to its large base of aligned documents it is a great starting point for finding terminology/translations of EU documents – but you should always check where a translation comes from.
Glossary of acronyms: a glossary of acronyms found in safety data sheets, compiled and maintained by me. Use at your own risk.
List of H and P statements: the article contains links to a list of all statements in their original form for all official EU languages, and an extended list for the English-Polish pair.
List of R and S phrases: formally obsolete, but they will keep appearing in translations for many years.
List of UN names of dangerous goods from the ADR/ADN/RID acts, 2015: a spreadsheet downloaded in November 2015 from the website of the Ministry of Infrastructure and Development, Road Transport Department. Use at your own risk.

The presentation on chemical translations that I gave at the conference is available here.

Crash course in regular expressions

Wasaty — Thu, 10 Mar 2016 16:49:10 +0000

I was invited to write a guest post on regular expressions in FrameMaker on the Adobe blog platform. While it is written with FrameMaker in mind, I think the text can help anyone grasp the basics of regex for any application, including translation tools like memoQ or Trados Studio.

memoQ auto-translatables for numbers

Wasaty — Sun, 15 Nov 2015 12:24:23 +0000

memoQ is an excellent CAT/TEnT tool which offers plenty of advanced productivity features out of the box, but with a little tinkering it can be made even more useful. One of the advanced memoQ features is Auto-translatables: using regular expressions you can define how certain source text should be modified in the target language. The rules can be quite advanced, e.g. for date format conversion from English “Monday, January 3rd, 2016” to Polish “Poniedziałek, 3 stycznia 2016 r.”. One of the most frequent uses for auto-translatables is the conversion of numeric strings according to target language rules. It used to be the most basic auto-translatable application – it’s not needed so much now, since memoQ offers automatic, out-of-the-box number format conversion: if your source segment contains numbers formatted according to source language rules, just press Ctrl and select the value formatted according to target language rules from the displayed list. The thing is, the source format does not always follow the rules (defined in the Microsoft libraries used by memoQ).

So sometimes it’s still more convenient to use auto-translatables for format conversion. memoQ comes with some rules defined for the following target languages: English, French, German, Hungarian and Swiss, so in theory, with some basic knowledge of regular expressions (regexp for short), you should be able to modify them if they don’t work exactly as needed. Unfortunately, the rules shipped with memoQ have two drawbacks:

they are “one size fits all” with regard to source number format, so some trade-offs were necessary. And there are specific cases where they fail,
they are quite messy, not easy to interpret and modify.

Since the subject of auto-translatable rules for numbers is regularly raised on the memoQ mailing list, I decided to share the simple but rather robust rules I created for English to Polish format conversion, which are easy to customize for any source/target language combination. There is also a second, more complex set of En-Pl rules that I actually use in my daily work.

If you want to modify the rules, I strongly suggest editing them in a text editor before importing them into memoQ. Unfortunately, the built-in memoQ editor for auto-translation rules is not very convenient: because of its fixed size you can see only a small part of longer expressions, and the rules are re-ordered every time you change one of them. I suggest using Notepad++ for editing, but any text editor will do. If you do use Notepad++, after opening the file click the Language menu and select XML – this will make editing much easier, because the editor will show elements in different colors (see screenshot below). The rules are numbered, but you won’t see the numbers (or comments) once you import the file into memoQ.

Auto-translatable rules for number format conversion opened as an .mqres file in Notepad++ with XML syntax coloring

First some background information:

The file is designed to convert English numbers into Polish formats – that means comma (,) as thousands separator and period (.) as decimal separator for the source, and non-breaking space ( ) as thousands separator and comma (,) as decimal separator for the target. Several examples:

English	Polish
123	123
12,345	12 345
123.45	123,45
12,345.67	12 345,67

English financial documents sometimes use a space instead of a comma as the thousands separator. The rules won’t work in such a case; they would require modification (see near the end of the post). To explain how this works, I’ll describe rule number 2 (screenshot above).

(?<!\d,|\d\.|\d)([-–]?\d{1,3}),(\d{3})(?!,\d|\.\d|\d)

$1 $2

Match rule:

1. (?<!\d,|\d\.|\d) – this part says: do not match if the number is preceded by a digit (\d) followed by a comma (,), or (|) a digit followed by a period (\.), or a digit. In the .mqres file the “<” is written as “&lt;”, which is just the XML way of encoding “<”; a plain “<” is what you will see once you import the rules into memoQ.
2. ([-–]?\d{1,3}),(\d{3}) – this part says: match one, two or three digits (\d{1,3}, first numbered group), a comma (,) and exactly three digits (\d{3}, second numbered group). Optionally there can be a hyphen or dash used as a minus sign at the beginning ([-–]?); if present, it becomes part of the first group.
3. (?!,\d|\.\d|\d) – this part says: do not match if the text matched in point 2 is followed by a comma and a digit (,\d), or a period and a digit (\.\d), or a digit (\d).

This will match: 1,234 or 12,345 or 123,456

Replace rule:

$1 $2 – this means: take the content of the first group (point 2 above), then a non-breaking space, then the content of the second group (also point 2 above).

The conversion result: 1 234 or 12 345 or 123 456
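For readers who want to experiment outside memoQ, rule 2 can be tried in Python. This is a sketch under two stated adaptations, not the rule as memoQ runs it: Python's re module does not accept a lookbehind whose alternatives have different widths, so (?<!\d,|\d\.|\d) is split into three consecutive lookbehinds with the same meaning, and the replacement uses \1 and \2 where memoQ uses $1 and $2. The names `RULE_2`, `NBSP` and `convert_thousands` are made up for this example.

```python
import re

NBSP = "\u00a0"  # non-breaking space, the Polish thousands separator

RULE_2 = re.compile(
    r"(?<!\d,)(?<!\d\.)(?<!\d)"   # point 1, split into fixed-width lookbehinds
    r"([-–]?\d{1,3}),(\d{3})"     # point 2: 1-3 digits, comma, exactly 3 digits
    r"(?!,\d|\.\d|\d)"            # point 3: no further number part may follow
)

def convert_thousands(text):
    """Apply rule 2; memoQ's '$1 $2' replacement becomes group 1, NBSP, group 2."""
    return RULE_2.sub(rf"\1{NBSP}\2", text)

result = convert_thousands("Sold 1,234 units, 12,345 boxes and 123,456 pallets.")
print(result)

# Longer numbers and numbers with decimals are left for the other rules:
print(convert_thousands("1,234,567"), convert_thousands("1,234.56"))
```

Numbers such as 1,234,567 or 1,234.56 pass through unchanged here, which is the intended behaviour: in the full rule set they are handled by their own rules.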

As you can see, the actual number matching is done by the expression in point 2, so what are points 1 and 3 for (blue text)? They are called “assertions” and they are there to limit matching to only the group you want. Let’s examine what would happen if we wrote the following two rules to match thousands in two ranges:

([-–]?\d{1,3}),(\d{3}) for numbers xxx,xxx

([-–]?\d{1,3}),(\d{3}),(\d{3}) for numbers xxx,xxx,xxx

Let’s use them on actual segment with numbers:

As you can see, the second rule matches as it should, but the first one matches too, because it also matches a part of a longer number. That’s why I have used the assertions – to limit matching. To exclude matching inside longer numbers, one must exclude the combinations of number formatting symbols and digits. That’s why in the “before” part I have excluded “\d,” (digit followed by comma), “\d\.” (digit followed by period) and “\d” (just a digit), and in the “after” part the same in reverse order: “,\d” (comma followed by digit), “\.\d” (period followed by digit) and “\d” (just a digit).
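The effect of the assertions can be reproduced with a small Python experiment. The sample segment below is made up for illustration, and, as an adaptation for Python's re module, the single lookbehind of the original rule is written as three fixed-width lookbehinds with the same meaning.

```python
import re

# The two naive rules from the text, without assertions.
NAIVE_THOUSANDS = re.compile(r"([-–]?\d{1,3}),(\d{3})")
NAIVE_MILLIONS = re.compile(r"([-–]?\d{1,3}),(\d{3}),(\d{3})")

# The thousands rule guarded by "before" and "after" assertions.
GUARDED_THOUSANDS = re.compile(
    r"(?<!\d,)(?<!\d\.)(?<!\d)([-–]?\d{1,3}),(\d{3})(?!,\d|\.\d|\d)"
)

segment = "Revenue was 1,234,567 and costs were 12,345."  # hypothetical sample

naive_hits = [m.group(0) for m in NAIVE_THOUSANDS.finditer(segment)]
million_hits = [m.group(0) for m in NAIVE_MILLIONS.finditer(segment)]
guarded_hits = [m.group(0) for m in GUARDED_THOUSANDS.finditer(segment)]

print("naive thousands:  ", naive_hits)    # also grabs the start of 1,234,567
print("naive millions:   ", million_hits)
print("guarded thousands:", guarded_hits)  # only the stand-alone 12,345
```

The naive thousands rule picks up “1,234” from inside the longer number, which would produce a half-converted result; the guarded version leaves that number to the millions rule.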

So the important thing to remember is that if you want to adapt the rules to a different number format, you need to replace the current thousands separator and decimal symbol in both the matching part and the assertions.

Before we proceed to actual modifications, one more important explanation:

Source (matchRule) and target (replaceRule) are interpreted differently when it comes to symbols. In regular expressions the period (.) has a special meaning (“any character”), so if we want to match exactly a period, we need to use (\.). However, on the replacement side everything except “$number” is treated literally, so if you want to use a period, just write a period. Similarly, while “\s” matches a whitespace character in the matching part, to get a space on the replacement side just type a space. Or paste a non-breaking space from any program (e.g. Word or the memoQ translation editor).
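The difference between the two sides is easy to see in a short Python test. This is only an illustration with made-up patterns, and note that Python writes group references in the replacement as \1, \2 where memoQ uses $1, $2.

```python
import re

# Match side: an unescaped period means "any character"...
UNESCAPED = re.compile(r"(\d+).(\d+)")
# ...while an escaped period matches only a literal period.
ESCAPED = re.compile(r"(\d+)\.(\d+)")

# Replacement side: the comma is typed literally, no escaping needed.
REPLACEMENT = r"\1,\2"

wrong = UNESCAPED.sub(REPLACEMENT, "12x34")      # "x" is treated as a separator
untouched = ESCAPED.sub(REPLACEMENT, "12x34")    # no literal period, no change
converted = ESCAPED.sub(REPLACEMENT, "123.45")   # decimal period becomes a comma

print(wrong, untouched, converted)
```

The unescaped version silently rewrites “12x34”, which is exactly the kind of over-matching that escaping the period on the match side prevents.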

Now I will show you how to modify the rules for a different source/target combination, using the Swiss format as the source (apostrophe (’) as thousands separator, period (.) as decimal separator) and the Norwegian format as the target (period (.) as thousands separator, comma (,) as decimal separator). Examples:

Swiss	Norwegian
123	123
1’234	1.234
12’345	12.345
123.45	123,45
12’345.67	12.345,67

Of course, start by downloading the file, then open it in a text editor.

Let’s try to modify rule number 2:

(?<!\d,|\d\.|\d)([-–]?\d{1,3}),(\d{3})(?!,\d|\.\d|\d)

$1 $2

We need to replace the comma (,) with an apostrophe (’) in the source part:

(?<!\d’|\d\.|\d)([-–]?\d{1,3})’(\d{3})(?!’\d|\.\d|\d)

And replace the space with a period in the target part:

$1.$2

That’s it. I have replaced the comma with an apostrophe, but I didn’t touch the period, because it’s also used in this source system, just in a different role. To cover everything, let’s try rule number 6, with both the number grouping symbol and the decimal separator:

(?<!\d,|\d\.|\d)([-–]?\d{1,3}),(\d{3})\.(\d+)(?!,\d|\.\d|\d)

$1 $2,$3

We need to change it to:

(?<!\d’|\d\.|\d)([-–]?\d{1,3})’(\d{3})\.(\d+)(?!’\d|\.\d|\d)

$1.$2,$3
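To check adapted rules against the Swiss/Norwegian example table before importing them, something like the following sketch may help. It is written for Python's re module, which is an assumption of this example: the “before” assertion is split into chained fixed-width lookbehinds, and \1 replaces $1. The function name is hypothetical, and only rules 2 and 6 are included, so a plain decimal such as 123.45 is not covered here.

```python
import re

# Stand-in for (?<!\d’|\d\.|\d); Python needs each lookbehind to have a fixed width.
BEFORE = r"(?<!\d’)(?<!\d\.)(?<!\d)"
AFTER = r"(?!’\d|\.\d|\d)"

RULES = [
    # Rule 6: thousands separator plus decimals, e.g. 12’345.67 -> 12.345,67
    (re.compile(BEFORE + r"([-–]?\d{1,3})’(\d{3})\.(\d+)" + AFTER), r"\1.\2,\3"),
    # Rule 2: thousands separator only, e.g. 1’234 -> 1.234
    (re.compile(BEFORE + r"([-–]?\d{1,3})’(\d{3})" + AFTER), r"\1.\2"),
]


def swiss_to_norwegian(text):
    """Apply the adapted rules 6 and 2 in turn (hypothetical helper)."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


for sample in ["123", "1’234", "12’345", "12’345.67"]:
    print(sample, "->", swiss_to_norwegian(sample))
```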

These kinds of changes need to be made for every rule in the file, both for source (matchRule) and target (replaceRule). I advise against using “Find and replace” for editing the rules (especially “Replace all”), because it’s easy to break something by mistake.

And since there are two types of apostrophes (straight and curly: ' and ’), to make the rules more foolproof you may want to create two sets of auto-translatables, one for each kind, and use both in your projects (if you prefer to keep regexes simple). Or you may try to use one set of rules, making it a bit more complex:

(?<!\d’|\d'|\d\.|\d)([-–]?\d{1,3})(?:’|')(\d{3})\.(\d+)(?!'\d|’\d|\.\d|\d)

Now there are separate conditions in the assertions and an alternative matching condition for the separator: (?:’|'). The “?:” makes this group non-capturing (unnumbered), so it does not affect the group numbers used in the target.
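A quick way to confirm that the extra group does not shift the numbering, sketched in Python under the same assumptions as before (chained lookbehinds instead of one alternation, \1 instead of $1):

```python
import re

pattern = re.compile(
    r"(?<!\d’)(?<!\d')(?<!\d\.)(?<!\d)"
    r"([-–]?\d{1,3})(?:’|')(\d{3})\.(\d+)"  # (?:...) is not counted as a group
    r"(?!'\d|’\d|\.\d|\d)"
)

curly = pattern.sub(r"\1.\2,\3", "12’345.67")
straight = pattern.sub(r"\1.\2,\3", "12'345.67")

print(pattern.groups)   # 3 capturing groups, same as in the simpler rule
print(curly, straight)  # both apostrophe kinds give the same result
```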

And one last example, for a space as the thousands separator instead of a comma:

(?<!\d\s|\d\.|\d)([-–]?\d{1,3})\s(\d{3})(?!\s\d|\.\d|\d)

Once you finish your edits, save the file (it has to be plain text with the .mqres extension and UTF-8 encoding) and simply import it into memoQ (Resource console > Auto-translatables > Import). If memoQ complains:

during import – it means that the XML structure is broken. Make sure all the <> parts are intact and that for every opening tag (<tag>) there is exactly one closing tag of the same type (</tag>),
when adding rules to a project – it means there is an error in the regular expressions. When you click the “More” button in the error notification dialog, memoQ will show the first offending line. Make sure all brackets, parentheses and curly brackets are paired and the syntax is correct.
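The first kind of error can be caught before importing. Below is a minimal sketch using Python's standard xml.etree.ElementTree parser; the function name and the file name in the usage line are hypothetical. It only checks that the XML is well-formed. It does not validate the regular expressions, because Python's regex syntax is not identical to the one memoQ accepts.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_data):
    """Return None if the XML parses, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_data)
    except ET.ParseError as error:
        return str(error)  # includes the line and column of the problem
    return None


# Made-up element names, just to show both outcomes.
print(check_well_formed("<rules><rule>x</rule></rules>"))
print(check_well_formed("<rules><rule>x</rules>"))
```

For a real file, passing the raw bytes lets the parser deal with the encoding itself, e.g. check_well_formed(open("my_rules.mqres", "rb").read()).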

I also encourage you to check the auto-translatable set I use for number conversion – there are additional rules for matching telephone numbers (rules 11 and 12), proper recognition and conversion of percentage values (15, 16) and temperatures (°C and °F, 13 and 14). Plus rules for the special case of numbers between 1,000 and 9,999, where the thousands separator is not used in Polish (so 1000 and 9999 respectively).

* * *

An additional note regarding ease of rules editing. memoQ offers an excellent mechanism of #lists#, and in theory it should be easy to create a rule set where all you have to do to adapt the rules to your source format is change the content of the #thousands_separator# and #decimal_place# lists. I tried; that was my original idea behind this post. Unfortunately it’s not possible, because you can’t use lists in assertions (well, you can, but the results are not what one would expect).

Importing Studio TM and TB into memoQ

Wasaty — Sat, 07 Nov 2015 15:13:34 +0000

memoQ is a great tool with many interoperability features, including easy SDLXLIFF file import/export, the ability to import Studio packages and the generation of Studio return packages. And while packages are imported with their translation memories and term bases, strangely the ability to import stand-alone Studio TMs (SDLTM) and TBs (SDLTB) is nowhere to be found in the UI. This can be a serious problem if a client sends you SDLXLIFF and SDLTM files instead of a proper package (which happens often enough). And while there is a solution to this problem, I’m offering a relatively simple alternative.

Since memoQ can import both TM and TB native formats as parts of Studio package, all we need is to add the files we need into a proper Studio package and import it into memoQ. Using a simple package as a starting point I’ve created a bogus Studio package you can use to import the files you need. Here’s how:

Download this file.
Unpack its content to an empty folder (e.g. “Package”)
Go to TM folder.
Copy the TM file you need to import (e.g. “My_Studio_TM”) into this folder (optional: you can delete “Example_TM.sdltm” file)
Browse one level up, to folder “Package”.
Go to Termbases folder.
Copy the TB file you need to import (e.g. “My_Studio_TB”) into this folder (optional: you can delete “Example_TB.sdltb” file).
Browse one level up, to folder “Package”.
Open “Import_wrapper.sdlproj” with text editor (right click the file, select “Open with” and choose Notepad or any other text editor).
Optionally you can rename the file, adding the extension .txt, which will help with file editing.
Find string “Example_TM.sdltm” and replace it with the name of the TM you want to import (e.g. “My_Studio_TM.sdltm”).
Find string “Example_TB.sdltb” and replace it with the name of the TB you want to import (e.g. “My_Studio_TB.sdltb”).
Find string “Example TB” and replace it with the name of your TB (e.g. “My Studio TB”).
Replace all occurrences of string “en-GB” with the code for your source language (e.g. “de-DE” for German-Germany).
Replace all occurrences of string “pl-PL” with the code for your target language (e.g. “fr-FR” for French-France).
If you import a term base, remember to replace the index languages (English, Polish) with your source and target languages.
Save the edited “Import_wrapper.sdlproj” file. If you added the .txt extension earlier, remember to rename the file back to .sdlproj.
Create a ZIP archive of the content of the folder (e.g. “Package”).
Rename the archive by changing the .zip extension to .sdlppx.
Import content to memoQ with “Import package” command. memoQ will notify you during import that there are no files to translate, but TM and/or TB will be imported and can be used in different memoQ projects.
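If you need to do this often, the editing and zipping steps can be scripted. The following Python sketch is only an illustration under several assumptions: the function name and its parameters are hypothetical, the TM and TB files are assumed to be already copied into the TM and Termbases folders, and Import_wrapper.sdlproj is assumed to be stored in UTF-8. The index language names (English, Polish) are not touched, so a term base import still needs that manual edit.

```python
import re
import zipfile
from pathlib import Path


def build_import_package(package_dir, tm_file, tb_file, tb_name,
                         source_lang, target_lang):
    """Patch Import_wrapper.sdlproj and zip the folder content as .sdlppx."""
    package_dir = Path(package_dir)
    project_file = package_dir / "Import_wrapper.sdlproj"

    # Placeholder strings from the steps above, mapped to the user's values.
    replacements = {
        b"Example_TM.sdltm": tm_file.encode("utf-8"),
        b"Example_TB.sdltb": tb_file.encode("utf-8"),
        b"Example TB": tb_name.encode("utf-8"),
        b"en-GB": source_lang.encode("utf-8"),
        b"pl-PL": target_lang.encode("utf-8"),
    }
    # A single pass over the file, so swapping en-GB and pl-PL cannot collide.
    pattern = re.compile(b"|".join(re.escape(key) for key in replacements))
    data = project_file.read_bytes()
    project_file.write_bytes(pattern.sub(lambda m: replacements[m.group(0)], data))

    # Zip the folder content (not the folder itself); the archive is created
    # next to the folder so that it does not end up inside itself.
    output = package_dir.parent / (package_dir.name + ".sdlppx")
    with zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(package_dir.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(package_dir).as_posix())
    return output
```

A call could look like build_import_package("Package", "My_Studio_TM.sdltm", "My_Studio_TB.sdltb", "My Studio TB", "de-DE", "fr-FR"), after which the resulting Package.sdlppx is imported with “Import package” as described above.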

Please note that since Import_wrapper.sdlppx file does not contain any files for translation, it can’t be imported into Studio. And if you have problems with changing file extensions, please see here.