Sep 27

Dealing with Sajan TMS exports

One of my end clients recently introduced the Sajan TM management system. As a result, instead of the Trados TMs and TTX files that used to be the standard, I began to receive jobs as Excel files containing source segments and pre-translation (from 80% matches up) in separate columns. Additionally, there’s a column with the TM match percentage and some additional info/comments. And the Excel file has a macro for propagating exact repetitions. Great. My task is to translate what’s missing and make sure the 100% matches strictly follow a mandatory glossary (in a separate Excel file). The general attitude was “we have a great new system, deal with it”. So I did. And since I’m sure I’m not the only one, I decided to share my solution.

Excel is a great tool for spreadsheets, but it really sucks for text editing. Checking for correct translation of glossary terms in Excel would take an eternity. Also, formatting tags are just text, easy to miss or damage. It was obvious that I had to find a workaround for working with these files in a decent CAT environment. Fortunately, the task turned out to be quite simple, and preparation takes just about 5-15 minutes, depending on the number of sheets, because every chapter of a file for translation is placed on a separate Excel sheet.

Take a look at the Sajan-exported file:

If only 100% matches were pre-translated, I would just copy the source and translation into a separate file, create a TM using this tool, then copy the source column into the target and import the selected columns for translation. However, with non-100% matches the task was more complicated… or so I thought. Eventually I found an easier and less time-consuming solution. The procedure was designed to import text into memoQ, but it should work with any tool that accepts the Trados bilingual DOC format.

The procedure:

  1. Create a backup copy of the source file.
  2. From all Excel sheets, select the Match %, Basis, Source and Target columns and copy them to a new Excel file.
  3. Remove the Basis column.
  4. Move the Match % column between Source and Target, so that the column order is Source, Match %, Target (see picture below).
  5. Select all three columns and copy them (Ctrl-C).
  6. Open Word.
  7. Click the triangle below the Paste icon and select Paste Special.
  8. Select Unformatted Unicode Text. After pasting, turn on the option to display hidden characters.
  9. Insert an empty line before pasted text by pressing Enter at the beginning of the first line.
  10. Press Ctrl-H to open the Find and Replace dialog.
  11. Click More to display additional dialog options.
  12. Select Use wildcards check box.
  13. In the Find field enter: ^9([0-9]*)^9
  14. In the Replace with field enter: <}\1{>
  15. With cursor in the Replace with field click Format > Style and select tw4winMark.
    If you don’t have this style on the style list, open any bilingual Trados .doc or .rtf file and copy a line or two from it to the beginning of your file. Then try again: the tw4winMark style will now be on the style list.
  16. Click Replace all.
  17. In the Find field enter: ^13
  18. In the Replace with field enter: <0}^13{0>
    The field must still have tw4winMark style.
  19. Click Replace all.
  20. Remove empty first line and any additional text pasted in step 15 (if any).
  21. Save the file as Word 97-2003 (.doc) or .rtf file.
  22. Start memoQ (or use other CAT tool).
  23. Create a project in memoQ and add the document saved in step 21.
  24. Open the file for translation, then click Format > Run regex tagger.
  25. In the Regular expression field enter: {.*?}
  26. Click Add.
  27. Click Run tagger now (click OK in the warning dialog).
  28. Translate/review, importing Excel glossary into a project term base.
    If no review of 100% matches is needed, you can sort the file by match percentage, select all 100% matches and confirm them in one go by pressing Ctrl-Enter.
  29. When finished, open the original Excel file again.
  30. On every sheet, select the Source column and copy it to the Target column. Save.
  31. Open the Excel file in memoQ, importing only the target columns for each sheet (Add document as).
  32. Pre-translate file, perform QA.
  33. Export finished file.
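
To make the goal of the two Word replacements (steps 13-19) easier to see, here is a minimal Python sketch of the segment markup they produce. It is only an illustration, not part of the procedure: the function name and the sample rows are made up, and plain text cannot carry the tw4winMark style, so the output shows where the markers go but is not itself an importable bilingual file.

```python
def to_bilingual_lines(rows):
    """Wrap (source, match_percent, target) rows in Trados-style segment markers.

    Mirrors the result of the Word replacements: tab + match + tab becomes
    <}match{>, and each paragraph break becomes <0} ... {0>.
    The function name is hypothetical, chosen for this example.
    """
    lines = []
    for source, match, target in rows:
        lines.append("{0>" + source + "<}" + str(match) + "{>" + target + "<0}")
    return lines


# Made-up sample rows in the column order Source, Match %, Target.
sample_rows = [
    ("Open the file.", 100, "TARGET TEXT 1"),
    ("Save the file now.", 85, "TARGET TEXT 2"),
]

for line in to_bilingual_lines(sample_rows):
    print(line)
```

Each printed line has the shape `{0>source<}match{>target<0}`, which is what you should see in Word after step 20.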

The procedure does create some additional overhead, but it makes it possible to work with the received files in a decent environment, with all the features of a modern CAT tool, including QA on tags. You can also save the file as XLIFF to perform an additional QA in an external tool, like Xbench or CheckMate.
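
For a rough idea of what QA on tags means here, the sketch below uses the same expression as the regex tagger in step 25 to compare tags between a source and a target segment. It assumes that formatting tags appear as text in curly braces, which is what the tagger expression suggests; the braces are escaped for Python, and the function name and sample segments are invented for the example. A CAT tool's own QA does much more than this.

```python
import re
from collections import Counter

# Same idea as the tagger expression {.*?}, with the braces escaped for Python.
TAG_PATTERN = re.compile(r"\{.*?\}")


def missing_tags(source, target):
    """Return tags that occur in source more often than in target.

    Hypothetical helper for illustration; it compares tag counts only
    and ignores tag order.
    """
    source_tags = Counter(TAG_PATTERN.findall(source))
    target_tags = Counter(TAG_PATTERN.findall(target))
    return sorted((source_tags - target_tags).elements())


# Made-up segments: the closing tag was lost in the target.
print(missing_tags("Click {b}OK{/b} to continue.", "TARGET {b}OK TEXT."))
```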
