[TAG] OmegaT - an open source tool for Computer Assisted Translation

Jimmy O'Regan joregan at gmail.com
Mon Oct 1 02:34:27 MSD 2007


OmegaT (http://www.omegat.org/omegat/omegat_en/omegat.html) is a tool
for Computer Assisted Translation. It is a Translation Memory system:
it remembers sentences that have already been translated, so that their
translations can be reused. Several tools of this kind edit po files,
but OmegaT can read several kinds of files: po files as well as HTML
and OpenOffice.org documents.

It supports projects containing several files, and can use several glossaries
and translation memories. It uses TMX as its native format - TMX
(Translation Memory eXchange - http://www.lisa.org/standards/tmx/) is
"the vendor-neutral open XML standard for the exchange of Translation
Memory (TM) data created by Computer Aided Translation (CAT) and
localization tools." It's widely used: "Corpas Comhthreomhar
Gaeilge-Béarla" (http://borel.slu.edu/corpas/index.html), an
English-Irish parallel corpus, is able to return search results in TMX
format, and bitext2tmx (http://bitext2tmx.sourceforge.net/) is a tool
to convert already translated documents into TMX format.
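As a rough illustration of the TMX layout, here is a minimal Python sketch that writes (source, target) pairs in the same TMX 1.1 shape as the OmegaT file later in this message. The function name `build_tmx` and the header values are placeholders chosen for the example, not anything OmegaT itself produces or requires.

```python
import xml.etree.ElementTree as ET


def build_tmx(pairs, srclang="PL", tgtlang="EN-GB"):
    """Build a minimal TMX 1.1 document (UTF-8 bytes) from (source, target) pairs."""
    tmx = ET.Element("tmx", version="1.1")
    # Placeholder header values; only the attribute names follow the TMX header.
    ET.SubElement(tmx, "header", {
        "creationtool": "build_tmx sketch",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain",
        "adminlang": "EN-US",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        # One translation unit per pair, one <tuv> per language.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", lang=lang)
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="UTF-8", xml_declaration=True)
```

For example, `build_tmx([("Pozdrawiam!!!", "Greetings!!!")])` returns bytes starting with an XML declaration and containing a single `<tu>`.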

The glossary format is a tab-separated text file with up to three
columns: source term, target term, and an optional comment:
``
No cóż	Oh well
straszna lipa	load of rubbish	Slang
Pozdrawiam	Greetings
Pozdrawiam	Love	(Closing a letter)
na bieżąco	up to date
na bieżąco	without delay
''
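A minimal Python sketch of reading this format, assuming the first two tab-separated columns are the source and target terms and the optional third is a comment. `parse_glossary` and `entries_in_segment` are hypothetical helpers; the lookup is a plain case-insensitive substring test, which is a simplification of however OmegaT decides that a term appears in a segment.

```python
from typing import NamedTuple


class GlossaryEntry(NamedTuple):
    source: str
    target: str
    comment: str = ""  # the third column is optional


def parse_glossary(text):
    """Parse tab-separated glossary text: source, target, optional comment."""
    entries = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or not fields[0].strip():
            continue  # assumption: blank or malformed lines are skipped
        comment = fields[2].strip() if len(fields) > 2 else ""
        entries.append(GlossaryEntry(fields[0].strip(), fields[1].strip(), comment))
    return entries


def entries_in_segment(entries, segment):
    """Return the entries whose source term occurs in the segment (case-insensitive)."""
    lowered = segment.lower()
    return [e for e in entries if e.source.lower() in lowered]
```

Note that the same source term may appear on several lines (as "Pozdrawiam" does above), and each line becomes its own entry.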

(To use UTF-8 files, whether as glossaries or as translation sources,
give the file a .utf8 extension.)
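A small Python sketch of that rule. The UTF-8 case follows the rule above; falling back to the platform's default encoding for every other extension is an assumption made for illustration, and `source_encoding` and `read_source` are hypothetical names.

```python
import locale
from pathlib import Path


def source_encoding(path):
    """Choose the encoding for a glossary or translation source file."""
    if Path(path).suffix == ".utf8":
        return "utf-8"
    # Assumption for illustration: any other file is read in the
    # platform's default encoding.
    return locale.getpreferredencoding(False)


def read_source(path):
    """Read a glossary or source file using the encoding chosen above."""
    return Path(path).read_text(encoding=source_encoding(path))
```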

As a simple test project, I translated this email:

``
Date: 15 Aug 2006 19:23:08 +0200
Subject: Re: Hurling
To: Jimmy O'Regan <joregan at gmail.com>
X-ORIGINATE-IP:83.7.212.114
X-Mailer: PSE3
Message-Id: <20060815172308.95824250499 at poczta.interia.pl>

Jimmy O'Regan napisał(a):
> http://www.tpi.poznan.pl/index.php?pid=11
>
>
Czesc!!!
To chyba moj pierwszy e-mail do Ciebie :)
Nie ma lekko - bedzie po polsku :)
Jak znalazles ta strone? Zawiera ciekawe informacje...
Poczytam sobie bo troche mi sie nudzi na urlopie :)
Te moje wakacje to straszna lipa a tak na nie czekalam...
No coz bede musiala to sobie jakos odbic :)
Pozdrawiam!!!
''
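The TMX file below declares segtype="sentence", and the email line "Jak znalazles ta strone? Zawiera ciekawe informacje..." appears in it as two segments. A rough Python approximation of that split, assuming a break after `.`, `!` or `?` followed by whitespace and treating each line separately; OmegaT applies its own segmentation rules, and this regex would, for instance, also split after abbreviations such as "np.". `split_sentences` is a hypothetical stand-in.

```python
import re

# Assumption: a sentence ends at ".", "!" or "?" (possibly repeated, as in
# "!!!" or "...") and the next one starts after whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split text into sentence-like segments, one line at a time."""
    segments = []
    for line in text.splitlines():
        # Empty strings (from blank lines) are dropped.
        segments.extend(s for s in _BOUNDARY.split(line.strip()) if s)
    return segments
```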

Glossary entries are suggested automatically when they appear in the
segment you're working on, and so are close matches from the translation
memory. I added a second file, a copy of the email with the headers
stripped and the Polish letters restored, and didn't have to retype any
of my translation. The resulting TMX file is below:

``
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE tmx SYSTEM "tmx11.dtd">
<tmx version="1.1">
  <header
    creationtool="OmegaT"
    creationtoolversion="1.6.1"
    segtype="sentence"
    o-tmf="OmegaT TMX"
    adminlang="EN-US"
    srclang="PL"
    datatype="plaintext"
  >
  </header>
  <body>
    <tu>
      <tuv lang="PL">
        <seg>Cześć!!!</seg>
      </tuv>
      <tuv lang="EN-GB">
        <seg>Hi!!!</seg>
      </tuv>
    </tu>
    <tu>
      <tuv lang="PL">
        <seg>To chyba mój pierwszy e-mail do Ciebie :)</seg>
      </tuv>
      <tuv lang="EN-GB">
        <seg>I suppose this is my first e-mail to you :)</seg>
      </tuv>
    </tu>
    <tu>
      <tuv lang="PL">
        <seg>Nie ma lekko - będzie po polsku :)</seg>
      </tuv>
      <tuv lang="EN-GB">
        <seg>Not lightly - it'll be in Polish :)</seg>
      </tuv>
    </tu>
    <tu>
      <tuv lang="PL">
        <seg>Jak znalazłeś tę stronę?</seg>
      </tuv>
      <tuv lang="EN-GB">
        <seg>How did you find that page?</seg>
      </tuv>
    </tu>
    <tu>
      <tuv lang="PL">
        <seg>Zawiera ciekawe informacje...</seg>
      </tuv>
      <tuv lang="EN-GB">
        <seg>It contains interesting information...</seg>
      </tuv>
    </tu>
    <tu>
      <tuv lang="PL">
        <seg>Poczytam sobie bo trochę mi się nudzi na urlopie :)</seg>
      </tuv>
      <tuv lang="EN-GB">
        <seg>I'll read it because I'm a little bored on holiday :)</seg>
      </tuv>
    </tu>
    <tu>
      <tuv lang="PL">
        <seg>Te moje wakacje to straszna lipa a tak na nie czekałam...</seg>
      </tuv>
      <tuv lang="EN-GB">
        <seg>This holiday is a load of rubbish and that's not what I was waiting for...</seg>
      </tuv>
    </tu>
    <tu>
      <tuv lang="PL">
        <seg>No cóż będę musiała to sobie jakoś odbić :)</seg>
      </tuv>
      <tuv lang="EN-GB">
        <seg>Oh well I'll have to recapture it for myself :)</seg>
      </tuv>
    </tu>
    <tu>
      <tuv lang="PL">
        <seg>Pozdrawiam!!!</seg>
      </tuv>
      <tuv lang="EN-GB">
        <seg>Greetings!!!</seg>
      </tuv>
    </tu>
  </body>
</tmx>
''
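To reuse a memory like this outside OmegaT, the translation pairs can be pulled out with Python's standard xml.etree.ElementTree. A minimal sketch: `read_tmx_pairs` is a hypothetical helper. It reads the plain `lang` attribute used by TMX 1.1 (as in the file above) and also `xml:lang`, which later TMX versions use, and it does not treat inline markup inside `<seg>` specially.

```python
import xml.etree.ElementTree as ET

# ElementTree expands xml:lang to this key; TMX 1.1 uses a plain "lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(data, src="PL", tgt="EN-GB"):
    """Return (source, target) segment pairs from TMX data given as bytes."""
    root = ET.fromstring(data)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            lang = (tuv.get("lang") or tuv.get(XML_LANG) or "").upper()
            seg = tuv.find("seg")
            if seg is not None:
                # Joins all text in the segment; inline tags are not interpreted.
                segs[lang] = "".join(seg.itertext())
        # Keep only units that have both requested languages.
        if src.upper() in segs and tgt.upper() in segs:
            pairs.append((segs[src.upper()], segs[tgt.upper()]))
    return pairs
```

Reading the file above with `read_tmx_pairs(Path("project.tmx").read_bytes())` (file name hypothetical) would give one pair per `<tu>`.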

I can now use this on any further translations I do, or use it as
input to apertium-transfer-tools
(http://xixona.dlsi.ua.es/apertium-www/?id=apertium-transfer-tools),
which can generate translation patterns from existing translations.
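A sketch of how a stored memory gets reused on a new segment: score each stored source sentence against the segment and offer the close ones, best first. This uses difflib.SequenceMatcher from Python's standard library as a stand-in similarity measure with an arbitrary 0.7 threshold; OmegaT computes its own fuzzy-match scores, and `fuzzy_matches` is a hypothetical name.

```python
from difflib import SequenceMatcher


def fuzzy_matches(segment, memory, threshold=0.7):
    """Return (score, source, target) for memory entries similar to segment, best first."""
    results = []
    for source, target in memory:
        # ratio() is 1.0 for identical strings and falls towards 0.0 as they differ.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            results.append((score, source, target))
    results.sort(reverse=True)
    return results
```

With a memory holding ("Pozdrawiam!!!", "Greetings!!!"), the segment "Pozdrawiam!" scores about 0.92 and is offered as a close match.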


