2.1 - Translation Memory

Reading Activity

How does a TM system work?

This technology works by automatically comparing a new source text against a database of texts that have already been translated. When a translator has a new segment to translate, the TM system consults the database to see whether this new segment corresponds to a previously translated segment. If a matching segment is found, the TM system presents the translator with the previous translation. The translator can consult this previous translation and decide whether or not to incorporate it into the new translation.

 

Step 1: Segmentation

Before the actual translation process can start, the source text has to be split into smaller units called SEGMENTS. In most cases, the basic unit of segmentation in a TM system is the sentence. The software looks for symbols that the translator has defined as segment boundaries and splits the text at those points.

However, not all text is written in sentence form. Headings, list items, and table cells are familiar elements of text, but they may not strictly qualify as sentences. Even though the process is fairly straightforward, automatic segmentation has its problems. For instance, a full stop also appears in decimal numbers and after the numbers of numbered list items, where it does not mark the end of a sentence. Fortunately, some of these problems can be prevented by the so-called STOPLIST. A stoplist in this context is a list of abbreviations, such as Mr. or Mrs., that do not indicate the end of a sentence (another type of stoplist will be mentioned in Unit 8).
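The segmentation step described above can be sketched in a few lines of Python. This is only an illustration, not the algorithm of any particular TM system: the boundary symbols (full stop, exclamation mark, question mark followed by whitespace), the contents of STOPLIST and the function name segment are all assumptions made for the example.

```python
import re

# Hypothetical stoplist: abbreviations whose full stop does not end a sentence.
STOPLIST = {"Mr.", "Mrs.", "Dr.", "e.g.", "i.e."}

# Assumed boundary rule: '.', '!' or '?' followed by whitespace.
# A full stop inside a decimal number (3.5) is not followed by
# whitespace, so it is never treated as a boundary.
BOUNDARY = re.compile(r"(?<=[.!?])\s+")

def segment(text):
    """Split text into segments, skipping boundaries that follow a stoplist item."""
    segments = []
    start = 0
    for match in BOUNDARY.finditer(text):
        candidate = text[start:match.start()]
        last_token = candidate.split()[-1]
        if last_token in STOPLIST:
            continue  # abbreviation, not the end of a sentence
        segments.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments

print(segment("Mr. Smith paid 3.5 euros. He left!"))
# ['Mr. Smith paid 3.5 euros.', 'He left!']
```

Note that this sketch does not handle numbered list items such as "1. Install the tool."; a rule for them would have to be added separately.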

__________________

Reflection #1: What is the basic unit of translation? Which symbols should count as segment boundaries?

 

Step 2: Matching

Once the source text is segmented, the real translation job can start. When translators open their WORKBENCHES (a common name for CAT software that integrates several features), they move from one segment to the next. Each segment is read by the machine and compared against a TM, i.e. matched. If a similar segment is found, its translation is automatically offered to the translator.

Most TM systems present the user with a number of different types of segment matches. These may be "exact", "full", "fuzzy" and "term" matches.

An EXACT MATCH occurs when a segment is exactly the same as one already stored in a TM, including formatting and typeface.

A FULL MATCH occurs when the only difference is a so-called variable element (numbers, dates, times, currencies, etc.).
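The difference between an exact and a full match can be illustrated with a small Python sketch. It rests on two simplifying assumptions: segments are plain strings (so formatting and typeface are ignored), and only digit sequences such as numbers, dates and times count as variable elements. The names VARIABLE and classify are hypothetical.

```python
import re

# Assumption: a variable element is a run of digits, optionally joined by
# '.', ',', ':' or '/' (covers 3, 3.5, 10:30 or 12/03/2008).
VARIABLE = re.compile(r"\d+(?:[.,:/]\d+)*")

def classify(new_segment, tm_segment):
    """Return 'exact', 'full' or None for a pair of plain-text segments."""
    if new_segment == tm_segment:
        return "exact"
    # Replace every variable element with a placeholder and compare again.
    if VARIABLE.sub("<VAR>", new_segment) == VARIABLE.sub("<VAR>", tm_segment):
        return "full"
    return None

print(classify("Press the button 3 times.", "Press the button 3 times."))
# exact
print(classify("The meeting starts at 10:30.", "The meeting starts at 9:00."))
# full
```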

The most interesting and important type, though, is the FUZZY MATCH (see Figure 2), i.e. a segment that is similar to one in a TM but not identical.

Translators can generally set the sensitivity threshold as a percentage (see Figure 3), so that the TM system automatically locates and offers previously translated segments that differ from the new source text segment to a varying degree. Normally, translators prefer to set the sensitivity threshold somewhere between 60 and 70 percent. Although fuzzy matching is arguably the key feature of CAT tools, a fuzzy match requires careful proofreading and editing before it can be used in the new target text.
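To make the idea of a sensitivity threshold more concrete, here is a minimal Python sketch of fuzzy matching. Commercial TM systems use their own similarity measures; the character-based ratio from Python's standard difflib module is used here only as a stand-in. The small TM dictionary, its Spanish translations and the function name fuzzy_matches are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segment -> stored translation.
TM = {
    "Save the file before closing the program.":
        "Guarde el archivo antes de cerrar el programa.",
    "The printer is out of paper.":
        "La impresora no tiene papel.",
}

def fuzzy_matches(segment, tm, threshold=0.70):
    """Return (score in %, source, target) for TM entries at or above the threshold."""
    scored = []
    for source, target in tm.items():
        # ratio() returns a similarity between 0.0 and 1.0.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            scored.append((round(score * 100), source, target))
    return sorted(scored, reverse=True)  # best match first

for score, source, target in fuzzy_matches(
        "Save the file before closing the window.", TM):
    print(f"{score}% | {source} | {target}")
```

Only the first TM entry is offered for this segment. The stored translation still refers to the program rather than the window, which is why a fuzzy match has to be edited before it is used.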

TERM MATCHES occur when the software identifies a word or phrase that is identical to an item contained in the so-called termbase (i.e. a list of recurring items that the translator has recorded on purpose). This means that when no exact or fuzzy matches are found for a source text segment, the translator may at least find translation equivalents for individual terms in the termbase. A large termbase associated with the project also saves many keystrokes, as the translator does not have to retype the same item repeatedly.
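A term lookup of this kind might look roughly as follows in Python. The termbase entries, the function name term_matches and the decision to ignore upper and lower case are assumptions made for the example. The sketch only finds terms in exactly the recorded form, so inflected forms (e.g. plurals) are not recognised.

```python
import re

# Hypothetical termbase: source term -> recorded equivalent.
TERMBASE = {
    "translation memory": "memoria de traducción",
    "segment": "segmento",
    "termbase": "base terminológica",
}

def term_matches(segment, termbase):
    """Return (term, equivalent) pairs for termbase items found in the segment."""
    found = []
    for term, equivalent in termbase.items():
        # \b keeps 'segment' from matching inside 'segmentation'.
        pattern = r"\b" + re.escape(term) + r"\b"
        match = re.search(pattern, segment, flags=re.IGNORECASE)
        if match:
            found.append((match.start(), term, equivalent))
    # Report terms in the order in which they appear in the segment.
    return [(term, equivalent) for _, term, equivalent in sorted(found)]

print(term_matches("The translation memory stores each segment.", TERMBASE))
# [('translation memory', 'memoria de traducción'), ('segment', 'segmento')]
```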

(Bowker 2008, 94-101)

__________________

Reflection #2: What is probably the most useful functionality of most CAT tools? What makes it possible? How is it done?

