Feb 27

How to export a multilingual MultiTerm glossary

MultiTerm is one of the most powerful terminology management tools on the market. Unfortunately, it is very far from being user-friendly. MultiTerm is very flexible, but the logic behind the UI is… twisted (do you know H.P. Lovecraft’s works?). Anyway, while it is quite simple to export a bilingual termbase to a CSV file via the Tab-delimited export definition, exporting a multilingual TB is another matter entirely. I’m going to detail the process of exporting a MultiTerm termbase/glossary with metadata.

When dealing with a MultiTerm (MT) termbase (TB), you can encounter four possible TB states:

  1. Bilingual TB. Export is pretty straightforward: just go to the Catalog tab, click the Export node in the Catalog pane, then right-click “Tab-delimited export definition”, choose Process (how intuitive) and use the Wizard. You’ll get a CSV file with terms and all metadata, which you can easily filter using Excel or Calc.
  2. Multilingual TB with translations of all terms in all languages. Export should work exactly as in point 1. However, in case it does not (languages may be mixed up in the export), see point 3.
  3. Multilingual TB with all terms in one language and mixed coverage in the others (example: all terms are in English, but some are translated only into French, some into German, others into Spanish and some into all of them). You can export all languages with metadata in pairs (base language plus any other), then combine them into a single multilingual file (Excel, CSV) and import it into another tool (e.g. memoQ) as a multilingual glossary.
  4. Multilingual TB with no single language containing all terms. You can export all language pairs with metadata and import them individually into other tools as bilingual TBs.

OK, but how do you export terms from a multilingual TB? Unfortunately, MT exports the data to CSV without headers and does not fill “empty” languages, so if some entries lack some of the languages, the exported file is useless. I haven’t checked this, but it’s quite possible that languages may also be mixed up in case 2 above. What you have to do is define an export filter and export only two languages at a time. For cases 2 and 3, the exports can easily be combined later into a single multilingual Excel file. For case 4, unfortunately not (at least, not easily).

Now the actual exporting. To illustrate the process I’ve created a TB with 5 languages (EN, DE, PL, FR, ES). All terms are available in English; the other languages are incomplete. In each language there’s one term with a synonym, just to show how to deal with them. Additionally, there are two entry-level metadata fields which will be exported: Source and Note. The goal is to export the multilingual TB from MT and import it into memoQ, preserving the metadata from the Source and Note fields, but you can use the same procedure to prepare files for other tools.

This is the content of my termbase:

01_starting_point

Let’s start the conversion process. You have to start by defining the export filter for the first language pair (in this case, EN-DE):

  1. Click the Catalog tab of the MultiTerm window, then click the Export node in the Catalog pane.
  2. Right-click within the right-side pane of the MT window and choose Create from the context menu (see picture below).
    03_Create_filter
    The Export Wizard window will open. Click Next on the first page.
  3. Export Wizard, step 1: enter the name of the filter. I suggest using the language pair, in this case EN-DE. Click Next.
  4. Enter (choose) the path and the name of the export file. Add a .txt or .csv extension. The log file name field will be filled in automatically. Click Next.
  5. Export Wizard, step 3: choose Custom export. Click Next.
  6. Export Wizard, step 4: Field selection. Right-click the “Entry level” entry in the “Export definition” field and choose the entry-level fields you want to include in your export. See picture below.
    04_Adding_fields
    In this case these are Entry number, Note, Source, EN and DE. If you want to add any term-level fields, like Synonym, right-click the term and add the desired field. You can rearrange fields using the mouse.

    WARNING: there are two entry-level “Source” fields. One is the source language selected for the current TB view in MT, the other is the actual “Source” metadata field. Choose the right one. If you add any unnecessary fields, you can remove them by clicking the field and pressing the Delete key. The finished export definition structure should look like the picture below (of course, more fields can be added).

    WARNING: If you add a metadata field (entry or term level) which is not present for all terms, the exported file will be worthless.

    05_Structure_of_export_file
    When finished, click Next.

  7. Export Wizard, step 5: General options. This is one of the most crucial steps. You have to define what will actually be exported and the separators for the exported data.

    Before I go into details, it is important to understand the reasons for the choices made for these fields. I’m assuming the MT data will be imported into memoQ. While MT can store a virtually unlimited number of entry-level, free-form metadata fields, at this point memoQ offers only one such field (Note), plus several predefined ones (like Project, Domain, etc.). So, we’ll import the names and contents of several fields (in this case Note and Source) into just one (Note). It is also possible to import term-level data (in memoQ: the Usage field), but if a term has several synonyms, the term-level data will be imported only for the last synonym on the list.

    Keeping the above in mind:
    Click the Entry level node. Make sure that the “Export field label” checkbox is NOT checked. Enter “\n” in the “After structure” field (this is the symbol for a new line).
    Click the Entry number node. Make sure that the “Export field label” checkbox is NOT checked. Enter “\t” in the “After structure” field (this is the symbol for a tab).
    Click the Note node. “Export field label” should be checked. In the “After field label” field enter “: ” (colon and space). In the “After field contents:” field enter “; ” (semicolon and space).
    Click the Source node. “Export field label” should be checked. In the “After field label” field enter “: ” (colon and space). In the “After field contents:” field enter “\t”.
    Click the EN node. Make sure that the “Export field label” checkbox is NOT checked. Enter “\t” in the “After structure” field.
    Click the Term node. Make sure that the “Export field label” checkbox is NOT checked.
    Click the Synonym node. Make sure that the “Export field label” checkbox is NOT checked. In the “Before field contents:” field enter “;” (or any other unique symbol for separating synonyms, like @ or #).
    Click the DE node. Make sure that the “Export field label” checkbox is NOT checked.
    Click the Term node. Make sure that the “Export field label” checkbox is NOT checked.
    Click the Synonym node. Make sure that the “Export field label” checkbox is NOT checked. In the “Before field contents:” field enter “;” (or any other unique symbol for separating synonyms, like @ or #).
    Click Next.

    (This will create a file with the following structure: entry number, tab, the text “Note: ” and the note content, then a semicolon, the text “Source: ” and the source content, tab, the entries for the EN language with synonyms separated by semicolons, then a tab and the entries for the DE language with synonyms separated by semicolons. After the last synonym a new line symbol will be inserted.)

  8. In the “File header” field enter “Entry_ID\tNote\tEN\tDE\n”. This will be the header for the file with column names (the first column contains the entry number, the second the combined Note and Source, the third the EN term and the fourth the DE term, followed by the new line symbol). You can omit this step, but then you’ll have to add the header manually in Excel later on. Click Next, then Finish.
  9. Right-click the newly created filter and choose Process from the context menu. Follow the wizard. The export for the first language pair will be created.
  10. Right-click the newly created filter (e.g. EN-DE) and choose Duplicate. Right-click the copy of the filter and choose Edit. Follow the wizard, repeating the steps above, but remove the second language (e.g. DE) and add another one (leave EN unchanged). Remember to use exactly the same settings for the new language (e.g. PL) as for the one you removed, and to change the language code in the file header.
  11. Repeat for all additional languages in your TB.
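If you would rather sanity-check an export before opening it in Excel, the row layout described in step 7 can be read with a few lines of Python. This is only a rough sketch, not anything MultiTerm provides: the function name parse_export_line is made up for illustration, and it assumes the separators suggested above (tab between columns, “;” between synonyms) and that a missing target language simply leaves the last column empty.

```python
def parse_export_line(line, synonym_sep=";"):
    """Split one exported row into entry ID, combined note and term lists.

    Assumed layout (from the export definition above):
    Entry_ID <tab> "Note: ...; Source: ..." <tab> EN terms <tab> DE terms
    """
    fields = line.rstrip("\r\n").split("\t")
    # Pad short rows so a missing target language becomes an empty column.
    fields += [""] * (4 - len(fields))
    entry_id, note, source_cell, target_cell = fields[:4]

    def split_terms(cell):
        # The first item is the main term, the rest are synonyms.
        return [t.strip() for t in cell.split(synonym_sep) if t.strip()]

    return {
        "entry_id": entry_id,
        "note": note,
        "source_terms": split_terms(source_cell),
        "target_terms": split_terms(target_cell),
    }


if __name__ == "__main__":
    sample = "3\tNote: checked; Source: ISO\tcar;automobile\tAuto;Wagen\n"
    print(parse_export_line(sample))
```

The combined Note column is deliberately left as one string, since that is how it will end up in memoQ’s Note field anyway.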

After exporting all language pairs to CSV files, it is time to process them in Excel.

  1. Start Excel. From the Data ribbon/menu choose Data -> From text. Browse to and select the first of the language pair exports. Excel should automatically choose the correct separation method (Tab). Import the file (choose UTF-8 for Encoding). The file should look somewhat like this:
    07_Exported_language_pair
  2. Repeat for all exported language pairs. Copy the non-English language columns to the first Excel sheet. This way you’ll get an Excel sheet with all languages present in the MT TB. After copying all languages to one sheet, move the “Note” column so that it becomes the last data column in the sheet (e.g.: Entry_ID, EN, DE, PL, FR, ES, Note). When importing to memoQ, the actual terms must be imported before additional data. Save the file as “Unicode Text”.

    OPTIONAL
    If you want to generate files for each language pair:

  3. Select all columns containing data, click Data -> Sort and choose “DE” as the sort column. This will reorder the data, grouping all rows with a DE translation:
    08_Exported_language_pair_sorted
  4. Remove the rows without a “DE” translation and reorder the columns as in the picture below:
    08_Exported_reordered_language_pair
    Save the file as Unicode Text. If necessary, repeat for all language pairs.
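If you have many languages, the copy-and-paste work in Excel can also be scripted. Below is a minimal Python sketch of the same idea, assuming every pair export has the header from step 8 (Entry_ID, Note, EN plus one target language) and that EN is the base language. The function names merge_pair_exports and rows_for_pair are my own invention for this example. Rows are matched on Entry_ID rather than on row position, and Note is moved to the last column.

```python
import csv
import io


def merge_pair_exports(pair_streams, base_lang="EN"):
    """Combine bilingual tab-delimited exports into one multilingual table."""
    merged = {}     # Entry_ID -> combined row
    languages = []  # target languages, in the order the files were given
    for stream in pair_streams:
        reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
        fixed = ("Entry_ID", "Note", base_lang)
        # The one remaining column is this file's target language.
        target = [c for c in reader.fieldnames if c not in fixed][0]
        languages.append(target)
        for row in reader:
            entry = merged.setdefault(
                row["Entry_ID"],
                {"Entry_ID": row["Entry_ID"],
                 base_lang: row[base_lang],
                 "Note": row["Note"]},
            )
            # Short rows yield None for missing columns; store "" instead.
            entry[target] = row[target] or ""
    # Terms first, Note last, as memoQ expects.
    columns = ["Entry_ID", base_lang, *languages, "Note"]
    rows = [{c: e.get(c, "") for c in columns} for e in merged.values()]
    return columns, rows


def rows_for_pair(rows, lang):
    """Optional: keep only the rows that have a translation in `lang`."""
    return [r for r in rows if r[lang]]


if __name__ == "__main__":
    en_de = io.StringIO("Entry_ID\tNote\tEN\tDE\n1\tNote: a; Source: b\tcar\tAuto\n")
    en_pl = io.StringIO("Entry_ID\tNote\tEN\tPL\n1\tNote: a; Source: b\tcar\t\n")
    print(merge_pair_exports([en_de, en_pl]))
```

With real files you would pass opened files instead of the in-memory samples (opening them with encoding "utf-8-sig" should cope with a byte order mark, if the export has one), and the resulting rows can be written back out with csv.DictWriter using delimiter="\t".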

Now the final step: importing to memoQ.

  1. Create a memoQ termbase with all the languages you need: in memoQ click the Resource console icon, then the Termbase tab, and choose Create new. Select the appropriate languages.
  2. Click Import from CSV/TMX. Select the appropriate settings (see picture below). If you have synonyms, remember to select “Split alternatives in field by” and enter the correct separator (I suggested “;”). If the fields have correct headers, memoQ should map them automatically; if not, map the columns to the appropriate fields using the drop-down menus in the center of the dialog.

09_Importing

And… that’s all.

BTW, if you are not sure whether one language in your termbase contains all the terms, there’s a relatively simple way to check this. Just contact me.
