Jun 05

MemoQ automation – auto-translatables

MemoQ is a fast, easy-to-use and very powerful CAT tool. However, in its default configuration it has two features I don’t like: the QA module often gives warnings about numbers (“invalid number format in source/target prevents strict matching”), and the software does not let you copy numerical values from source segments with keyboard shortcuts, as Trados does. Both problems can be addressed by simple operations: turning on auto-translatables and modifying the QA rules.

Let’s start with turning on “automatic” insertion of numbers. The feature detects numerical values in the source text and, if they are correctly formatted for the source language, offers the same value formatted for the target language in the translation results pane. Why the format change? Because different languages use different formatting rules for numbers. For example, “10,189.53” in English should be written as “10 189,53” in Polish. The reformatted values can be inserted in exactly the same way as other items in the translation results pane, e.g. by double-clicking or by pressing Ctrl plus the number of the hit in the list.
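
To make the format change concrete, here is a minimal Python sketch of the English-to-Polish conversion described above. It is not how memoQ implements the feature; the function name `en_to_pl` and the pattern are assumptions made only for illustration, and a plain space is used as the thousands separator, as in the example.

```python
import re
from typing import Optional

# Assumed pattern: English-style number with comma-grouped thousands
# and an optional decimal part after a dot, e.g. "10,189.53".
EN_NUMBER = re.compile(r"\d{1,3}(?:,\d{3})*(?:\.\d+)?")

def en_to_pl(number: str) -> Optional[str]:
    """Return the Polish-style rendering, or None if the input is not
    a number in the expected English format."""
    if not EN_NUMBER.fullmatch(number):
        return None
    integer, _, fraction = number.partition(".")
    integer = integer.replace(",", " ")  # thousands separator: comma -> space
    # decimal separator: dot -> comma
    return f"{integer},{fraction}" if fraction else integer

print(en_to_pl("10,189.53"))  # 10 189,53
```

A value that does not fit the source-language format yields `None`, which mirrors the point that only properly formatted numbers are offered.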

The procedure for enabling this function depends on whether we want to turn on auto-translatables for the current project only or for all new projects.

Let’s start with enabling automatic number “translation” for the current project, say, to check how it works.

  1. Open/create a project.
  2. In the Settings panel of the current project click Auto-translation rules. The list of available national standards will be displayed.
  3. Choose your target language (for Polish, you can use the Hungarian rules instead). The name of the active rule set is displayed in bold, exactly as for other resources. Below you can see an example of the numerical values recognized by the default auto-translation rules.

    Example of numerical values recognized by default auto-translation rules. The numbers separated by hyphens are recognized individually.

This will definitely simplify the translation of texts with lots of numbers, especially if they are in “standard” format. On the other hand, this will not help much if the text we are dealing with contains a lot of non-standard numbers, e.g. catalog numbers where numbers are grouped with letters, numbers separated by hyphens or references to sub-points of some documentation or legal texts. As you can see above, not all of them will be recognized.

Fortunately, auto-translation rules can be easily defined using regular expressions. To make my life easier, I’ve defined a group of four rules for recognizing non-standard numbers (item 5 was added later):

  1. Letters and numbers combined (e.g. catalog numbers). Example: N665, Alpha9beta
  2. Combined text and numbers separated by hyphen (unlimited number groups, e.g. catalog numbers, regulations). Example: IEC60001-11-1, CAP1283-34
  3. Groups of numbers separated by at least two dots (e.g. software version numbers, sub-subparagraph numbers, legal texts). Example: 654.123.456, 84.12.79.1
  4. Groups of numbers separated by at least two hyphens (e.g. catalog numbers). Example: 6-234-957, 543-21-8
  5. ADDED: Rules for recognizing email addresses and URLs (http and ftp). If you need to localize the URLs, just remove the rules starting with “http”.
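
The four number categories above can be sketched with regular expressions. The patterns below are my own guesses written for Python's `re` module, not the contents of the downloadable rule file, and the rule names and the `classify` helper are hypothetical; memoQ's own regex dialect may differ in details.

```python
import re

# Illustrative patterns only; the rules in the downloadable file may differ.
RULES = [
    # 1. Letters and digits combined: N665, Alpha9beta
    ("letters+digits",
     re.compile(r"(?=[A-Za-z0-9]*[A-Za-z])(?=[A-Za-z0-9]*\d)[A-Za-z0-9]+")),
    # 2. Text plus number groups separated by hyphens: IEC60001-11-1
    ("text-hyphen-numbers", re.compile(r"[A-Za-z]+\d+(?:-\d+)+")),
    # 3. Number groups separated by at least two dots: 654.123.456
    ("dotted-groups", re.compile(r"\d+(?:\.\d+){2,}")),
    # 4. Number groups separated by at least two hyphens: 6-234-957
    ("hyphenated-groups", re.compile(r"\d+(?:-\d+){2,}")),
]

def classify(token):
    """Return the name of the first rule that matches the whole token,
    or None if no rule applies."""
    for name, pattern in RULES:
        if pattern.fullmatch(token):
            return name
    return None

for token in ["N665", "IEC60001-11-1", "84.12.79.1", "543-21-8", "12.5", "45-69"]:
    print(token, "->", classify(token))
```

In this sketch, `12.5` and `45-69` match no rule, because the `{2,}` quantifier requires at least two separators.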

To use my rules, download this file, then in the auto-translation settings (see step 2 above) use the Import new command and select the downloaded file (Non-standard numbers.mqres). Then select it for the current project. Examples of recognized number groups are shown in the picture below.

Number groups recognized by the “Non-standard numbers” rule set.

Numbers with a single dot are not recognized, to avoid a conflict with the standard rules, and groups with only one hyphen are not recognized because they usually (but not always) have to be treated as separate numbers in Polish (e.g. 45-69 should be formatted as 45 – 69). If we activate both rule sets (the standard one for the target language and my non-standard numbers), the result for my example file looks like this:

Active rules "Numbers-PolishGroup" and "Non-standard numbers".

If the auto-translation rules meet your expectations, you can activate them for all newly created projects. To do so:

  1. In the Tools menu select Options. The Default resources is the first active card.
  2. From the list on the Default resources pane click Auto-translation rules. Select the rules you want to activate for new projects. In my case these are Numbers-PolishGroup and (optionally) Non-standard numbers.
  3. Click OK.

It is worth knowing that auto-translatables can be used for many things besides numbers. For example, if there are HTML tags in a Word or Excel file, we can detect them in the source segments and insert them with a keyboard shortcut. In this case you have to use the rules editor: enter “(\<.+?\>)” in the Auto-translation rules field and “$1” in the Replace order rules field. Similar rules can be created, e.g., for consistently marked text strings that are not to be translated. Auto-translation rules can also be used for automatic unit conversion; this subject is described in the memoQ help file.
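
To see what the tag rule captures, the same expression can be tried outside memoQ. The sketch below uses Python's `re` module as a stand-in, where `\1` plays the role of memoQ's `$1`; the `find_tags` helper and the sample segment are made up for illustration.

```python
import re

# The rule from the text: a lazy match from "<" to the nearest ">".
TAG_RULE = re.compile(r"(\<.+?\>)")

def find_tags(segment):
    """Return every tag found in the segment, copied unchanged
    (the equivalent of the "$1" replacement)."""
    return [match.expand(r"\1") for match in TAG_RULE.finditer(segment)]

source = "Click <b>Save</b> or press <br/>Enter."
print(find_tags(source))  # ['<b>', '</b>', '<br/>']
```

The lazy quantifier `.+?` matters here: a greedy `.+` would swallow everything from the first `<` to the last `>` in the segment as one hit.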

Now, the use of auto-translation rules can simplify our job, but there are still “non-standard number format” alerts we have to deal with. How can we remove them, or at least reduce their number?

For the current project:

  1. In the project Settings panel select QA settings. Since the default settings cannot be modified, you have to clone them using the Clone/use new command in the lower right part of the window. Enter a meaningful name, e.g. Default-corrected numbers.
  2. Select the new QA settings and click the Edit command.
  3. In the Edit QA settings window select the Check auto-translatables option.

  4. Click the Numbers tab of the Edit QA settings window.
  5. Deselect all checking options except Verify that numbers are matched on the target side; leave that one checked (EDIT: see Denis Hay’s comment below the post).

With these settings memoQ will no longer verify the source/target numbers using internal rules over which we don’t have any control, but auto-translatables verification won’t let us miss any incorrect number in the target text. Depending on the type of text and the target language rules, we can drastically reduce or even eliminate the “unknown format” warnings. Some will still be there, though: if the text contains e.g. “point 6.8” or “version 12.1”, we’ll still get warnings there (at least for Polish, where the dot should be changed to a comma).
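
The idea of checking quantities regardless of their format can be illustrated with a small sketch. This is not memoQ's algorithm, which is not documented here; the function names and the separator set are assumptions. Each number is reduced to its bare digits, so “10,189.53” and “10 189,53” compare as equal.

```python
import re
from collections import Counter

# Assumed notion of "a number": digit groups joined by a space,
# non-breaking space, dot or comma.
NUMBER = re.compile(r"\d+(?:[ .,\u00a0]\d+)*")

def digit_signatures(segment):
    """Count the numbers in a segment, each reduced to its digits only."""
    return Counter(re.sub(r"\D", "", m.group()) for m in NUMBER.finditer(segment))

def numbers_match(source, target):
    """True if both segments contain the same numbers, ignoring formatting."""
    return digit_signatures(source) == digit_signatures(target)

print(numbers_match("Total: 10,189.53 USD", "Razem: 10 189,53 USD"))  # True
print(numbers_match("version 12.1", "wersja 12.2"))                    # False
```

The sketch is deliberately crude: two separate numbers divided only by a space would be merged into one, so a real check needs language-specific rules.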

To apply the above change in QA settings to all new projects:

  1. From the Tools menu select Options. The Default resources is the first active card.
  2. From the list on the Default resources pane click QA settings.
  3. Select the modified QA settings (e.g. Default-corrected numbers).
  4. Click OK.

The modified rules will be active as defaults for all new memoQ projects.

EDIT:

Added rules for recognizing email addresses and URLs (http and ftp). If you need to localize the URLs, just remove the rules starting with “http”.

 

6 comments


  1. Nice article!

    Just as a comment on numbers QA: I wouldn’t uncheck ALL checks; I would still leave “Verify that numbers are matched on the target side” checked.

    At least, the “quantities” are checked, regardless of their format. This is a useful check. The ones often producing too much “noise” in QA results are the strict interpretation on source/target side, as well as the check on numbers format.

    As for HTML and other similar tags, memoQ 5.0 will be able to deal with them properly, there will be no need to “overload” your translations results pane with too many auto-translatables offered.

    But thanks for the article! It is refreshing to see people investigating into advanced productivity boosters such as auto-translatables.

  2. Thanks for the advice – I’ve checked, it works like a charm.
    And I can’t wait to get the 5.0 version, the promised feature list is very impressive.

    • Monika on 2011/06/23 at 15:04

    “1.Open/create a project.
    2.In the Settings panel of the current project click Auto-translation rules. The list of available national standards will be displayed.
    3.Choose your target language (in case of Polish one can use Hungarian instead). The name of active rule set is displayed with bold font, exactly as for other resources. Below you can see the example of the numerical values recognized by default auto-translation rules. ”

    Hi Marek,
    I’ve got a Polish target file, but when I open Settings and Autotranslation rules I have 5 language options but not Polish.
    What do I do then?

    Regards

    Monika

  3. Use Hungarian rules. See the Polish version of the article.

  4. Hi,
    Thank you very much for that article and also the settings file. It is exactly what I was looking for.

    Have a very nice day.

    Viszlat.

    Gabriel

  5. Hi Pawel,
    I hadn’t noticed that this was your site. I attended your presentation at TriKonf last year. It was very interesting.
    Hope to see you again soon.

    Gabriel
