{"id":770,"date":"2014-06-22T14:18:24","date_gmt":"2014-06-22T13:18:24","guid":{"rendered":"http:\/\/wasaty.pl\/blog\/?p=770"},"modified":"2014-06-22T15:45:48","modified_gmt":"2014-06-22T14:45:48","slug":"xml-translation-part-1-introduction","status":"publish","type":"post","link":"http:\/\/wasaty.pl\/blog\/2014\/06\/22\/xml-translation-part-1-introduction\/","title":{"rendered":"XML translation &#8211; part 1: introduction"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/xml_ico.png\" alt=\"xml_ico\" width=\"80\" height=\"80\" class=\"alignleft size-full wp-image-786\" \/><\/p>\n<p style=\"text-align: left;\">XML has a bad reputation amongst translators &#8211; quite often it is perceived as something complicated and terribly difficult to translate. However, armed with basic knowledge of XML file structure and a modern translation environment tool, it&#8217;s usually very easy to translate XML correctly. This is the first installment of a three-part series of posts I&#8217;m going to write about XML. Part one is a very basic introduction to XML &#8211; why and what. Part two will cover XML import in memoQ and part three will cover import in Trados Studio.<br \/>\n<!--more--><br \/>\nThe posts are an adaptation of the presentation I gave at the <a href=\"http:\/\/www.translation-conference.com\/\">Translation Conference<\/a> in Warsaw, March 2014.<\/p>\n<p>So, what exactly is this XML thing?<\/p>\n<p>XML stands for eXtensible Markup Language, and it is really a <em>metalanguage<\/em>, that is, a language for defining markup languages. 
XML specifies rules and general syntax for such markup languages, which are themselves called applications. Generally, a markup language is a set of codes or tags that surround content and describe what that content is, or in some cases what it should look like when displayed. In XML applications, therefore, tags surround the content, and tags may have attributes that further qualify that content.<br \/>\nLet\u2019s say we want to catalogue our book collection. To do so, we need to record some information about each book: its author, title, publisher, publication year, genre or subject, and maybe a note. How can we do that?\u00a0 Well, the simplest way would be just to write all that data in a text file. For example:<\/p>\n<div id=\"attachment_771\" style=\"width: 400px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/01.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-771\" class=\"size-medium wp-image-771\" alt=\"Example of a simple bibliographic note\" src=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/01-300x187.png\" width=\"400\" height=\"249\" srcset=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/01-300x187.png 300w, http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/01.png 996w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/a><p id=\"caption-attachment-771\" class=\"wp-caption-text\">Example of a simple bibliographic note<\/p><\/div>\n<p>That&#8217;s easy. It&#8217;s not difficult for a human to distinguish and correctly assign different types of information. 
But what about this example?<\/p>\n<div id=\"attachment_772\" style=\"width: 400px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/02.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-772\" class=\"size-medium wp-image-772\" alt=\"Somewhat less unambiguous bibliographic note\" src=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/02-300x187.png\" width=\"400\" height=\"249\" srcset=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/02-300x187.png 300w, http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/02.png 996w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/a><p id=\"caption-attachment-772\" class=\"wp-caption-text\">Somewhat less unambiguous bibliographic note<\/p><\/div>\n<p>Which of the first three fields contains the author, which the publisher and which the title?\u00a0 Of course I intentionally picked a name which won\u2019t be hard to recognize, but what about someone less well known?\u00a0 As you can see, the correct classification of information may become a challenge. 
Sure, we can overcome this quite easily:<\/p>\n<div id=\"attachment_773\" style=\"width: 400px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/03.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-773\" class=\"size-medium wp-image-773\" alt=\"Bibliographic note with human-readable description\" src=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/03-300x187.png\" width=\"400\" height=\"249\" srcset=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/03-300x187.png 300w, http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/03.png 996w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/a><p id=\"caption-attachment-773\" class=\"wp-caption-text\">Bibliographic note with human-readable description<\/p><\/div>\n<p>Now it\u2019s easy, right?\u00a0 But how do we know where one kind of information ends and another begins?\u00a0 Line breaks?\u00a0 If so, then how do we know the note does not end after the first line?\u00a0 And what if we write both title and author in the same paragraph?\u00a0 Well, that\u2019s where XML tags come in handy. 
We can use them to explicitly describe the information.<\/p>\n<div id=\"attachment_774\" style=\"width: 400px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/04.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-774\" class=\"size-medium wp-image-774\" alt=\"XML tags for bibliographic note\" src=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/04-300x187.png\" width=\"400\" height=\"249\" srcset=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/04-300x187.png 300w, http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/04.png 996w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/a><p id=\"caption-attachment-774\" class=\"wp-caption-text\">XML tags for bibliographic note<\/p><\/div>\n<p>The &lt;something&gt; markers around the data are\u00a0<em>tags<\/em>; a pair of tags together with the content between them forms an\u00a0<em>element<\/em>. Tags describe the information inside them, so they are basically information about information. Now the information is unambiguous and clear. What&#8217;s more, we can re-arrange the content in any way we like and it won\u2019t matter, because it will still be clear. 
And we can add certain attributes to our tags:<\/p>\n<div id=\"attachment_775\" style=\"width: 400px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/05.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-775\" class=\"size-medium wp-image-775\" alt=\"XML tags with attributes\" src=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/05-300x187.png\" width=\"400\" height=\"249\" srcset=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/05-300x187.png 300w, http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/05.png 996w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/a><p id=\"caption-attachment-775\" class=\"wp-caption-text\">XML tags with attributes<\/p><\/div>\n<p>As you can see, in this case I have defined attributes concerning the look of the content. Additionally, there are some tags which do not define the type of content, just formatting. The text from such an XML file can be transformed using special rules stored in an XSLT file (eXtensible Stylesheet Language Transformations) to generate graphical output like this:<\/p>\n<div id=\"attachment_776\" style=\"width: 400px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/06.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-776\" class=\"size-medium wp-image-776\" alt=\"Formatted output of an XML file after XSLT transformation\" src=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/06-300x187.png\" width=\"400\" height=\"249\" srcset=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/06-300x187.png 300w, http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/06.png 996w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/a><p id=\"caption-attachment-776\" class=\"wp-caption-text\">Formatted output of an XML file after XSLT transformation<\/p><\/div>\n<p>However, a proper XML file requires some more 
data than just these tags.<\/p>\n<p>The first line of an XML document should contain the XML declaration, which specifies the XML version and the character encoding used in the document.<\/p>\n<div id=\"attachment_777\" style=\"width: 400px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/07.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-777\" class=\"size-medium wp-image-777\" alt=\"Added XML version and encoding declaration\" src=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/07-300x187.png\" width=\"400\" height=\"249\" srcset=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/07-300x187.png 300w, http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/07.png 996w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/a><p id=\"caption-attachment-777\" class=\"wp-caption-text\">Added XML version and encoding declaration<\/p><\/div>\n<p>If there is no XML declaration, XML version 1.0 is assumed and the default encoding is UTF-8 (8-bit Unicode Transformation Format). 
On the next line, many XML files contain a DTD declaration (or the whole definition) \u2013 a Document Type Definition defines the elements, attributes and other markup allowed in the given XML document.<\/p>\n<div id=\"attachment_778\" style=\"width: 400px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/08.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-778\" class=\"size-medium wp-image-778\" alt=\"Added DTD declaration\" src=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/08-300x187.png\" width=\"400\" height=\"249\" srcset=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/08-300x187.png 300w, http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/08.png 996w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/a><p id=\"caption-attachment-778\" class=\"wp-caption-text\">Added DTD declaration<\/p><\/div>\n<p>In this case a DTD for this particular file type is referenced. It\u2019s a file I created myself and it is not very complex (although it does contain a definition for an additional attribute, translate, which is not used in the examples):<\/p>\n<div id=\"attachment_779\" style=\"width: 400px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/09.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-779\" class=\"size-medium wp-image-779\" alt=\"DTD file for the XML file used in examples\" src=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/09-300x129.png\" width=\"400\" height=\"210\" \/><\/a><p id=\"caption-attachment-779\" class=\"wp-caption-text\">DTD file for the XML file used in examples<\/p><\/div>\n<p>Instead of a DTD, you may see an XSD file referenced in the document \u2013 XSD stands for XML Schema Definition and is somewhat more advanced than DTD. 
But you don\u2019t really have to know anything about DTD or XSD other than the fact that it\u2019s a file containing a description (or definition) of possible tags and attributes in an XML file. So, back to our XML file.<\/p>\n<p>What we were really missing before was the <strong>root<\/strong> element \u2013 it can be called anything, but the most common names are just &lt;root&gt; or &lt;body&gt;. In our case, since we are creating a book catalogue, we can name our root element &lt;collection&gt;.<\/p>\n<div id=\"attachment_780\" style=\"width: 400px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/10.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-780\" class=\"size-medium wp-image-780\" alt=\"Fully valid XML file with a single entry\" src=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/10-300x187.png\" width=\"400\" height=\"249\" srcset=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/10-300x187.png 300w, http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/10.png 996w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/a><p id=\"caption-attachment-780\" class=\"wp-caption-text\">Fully valid XML file with a single entry<\/p><\/div>\n<p>Still, it&#8217;s just a single book, while we want to have a whole catalogue, so we need to add some more structure:<\/p>\n<div id=\"attachment_781\" style=\"width: 400px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/11.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-781\" class=\"size-medium wp-image-781\" alt=\"Fully valid XML file with two &quot;book&quot; entries\" src=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/11-300x187.png\" width=\"400\" height=\"249\" srcset=\"http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/11-300x187.png 300w, 
http:\/\/wasaty.pl\/blog\/wp-content\/uploads\/2014\/06\/11.png 996w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/a><p id=\"caption-attachment-781\" class=\"wp-caption-text\">Fully valid XML file with two &#8220;book&#8221; entries<\/p><\/div>\n<p>The &#8220;collection&#8221; is our root element. The &#8220;book&#8221; is a first-level element, and it is the parent of elements like title, author or publisher. These elements are its children. Each element can have many child elements, but each child has exactly one parent.<\/p>\n<p>Of course this is a very simple example XML file, but it does illustrate the basics: XML file structure, tags and attributes. I also think that it should be obvious to you what we actually want to translate in this file.<\/p>\n<p>In the next part I&#8217;ll describe the process of importing an XML file into memoQ and how to adapt an XML filter to a particular XML file.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>XML has a bad reputation amongst translators &#8211; quite often it is perceived as something complicated and terribly difficult to translate. However, armed with basic knowledge of XML file structure and a modern translation environment tool, it&#8217;s usually very easy to translate XML correctly. 
This is the first installment of a &hellip; <\/p>\n<p><a class=\"more-link btn\" href=\"http:\/\/wasaty.pl\/blog\/2014\/06\/22\/xml-translation-part-1-introduction\/\">Continue reading<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[41,24,43,35],"class_list":["post-770","post","type-post","status-publish","format-standard","hentry","category-wskazowki","tag-filtry","tag-porady","tag-xml","tag-znaczniki","item-wrap"],"_links":{"self":[{"href":"http:\/\/wasaty.pl\/blog\/wp-json\/wp\/v2\/posts\/770","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/wasaty.pl\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/wasaty.pl\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/wasaty.pl\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/wasaty.pl\/blog\/wp-json\/wp\/v2\/comments?post=770"}],"version-history":[{"count":5,"href":"http:\/\/wasaty.pl\/blog\/wp-json\/wp\/v2\/posts\/770\/revisions"}],"predecessor-version":[{"id":787,"href":"http:\/\/wasaty.pl\/blog\/wp-json\/wp\/v2\/posts\/770\/revisions\/787"}],"wp:attachment":[{"href":"http:\/\/wasaty.pl\/blog\/wp-json\/wp\/v2\/media?parent=770"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/wasaty.pl\/blog\/wp-json\/wp\/v2\/categories?post=770"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/wasaty.pl\/blog\/wp-json\/wp\/v2\/tags?post=770"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}