Automatic Translation Software Technology

Automatic translation software systems (also referred to as Machine Translation) combine comprehensive dictionaries with a collection of linguistic rules to translate one language into another without relying on human translators.

An automatic translation software system interprets the structure of sentences in the source language (the language the user is translating from) and generates a translation based on the rules of the target language (the language the user is translating to). The process involves breaking down complex and varying sentence structures, identifying parts of speech, resolving ambiguities, and synthesizing the information into the components and structure of the new language.
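The stages named above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the lexicon, the Spanish glosses, the tag names, and the single disambiguation rule are all hypothetical, and a real engine uses far richer analysis at every stage.

```python
import re

# Hypothetical toy lexicon: English word -> candidate (part of speech, Spanish) readings.
LEXICON = {
    "i": [("PRON", "yo")],
    "see": [("VERB", "veo")],
    "the": [("DET", "el")],
    "book": [("VERB", "reservar"), ("NOUN", "libro")],
}

def tokenize(sentence):
    """Stage 1: break the sentence into lowercase word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())

def analyze(tokens):
    """Stage 2: attach every candidate part of speech to each token."""
    return [(tok, LEXICON.get(tok, [("UNK", tok)])) for tok in tokens]

def disambiguate(analyzed):
    """Stage 3: keep one reading per token; after a determiner, prefer a noun."""
    chosen = []
    prev_pos = None
    for _, readings in analyzed:
        pick = readings[0]
        if prev_pos == "DET":
            pick = next((r for r in readings if r[0] == "NOUN"), pick)
        chosen.append(pick)
        prev_pos = pick[0]
    return chosen

def synthesize(chosen):
    """Stage 4: emit the target-language words in order."""
    return " ".join(word for _, word in chosen)

def translate(sentence):
    return synthesize(disambiguate(analyze(tokenize(sentence))))

print(translate("I see the book"))  # yo veo el libro
```

Keeping each stage as its own function mirrors the description: the ambiguity in "book" is resolved before any target-language text is produced.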

Automatic language translation software is considered a “gisting” application, producing translations that enhance the end-user’s understanding of the original document. It does not produce the same level of translation that a human translator could provide. The “gisting” level can be improved to publishable quality by combining the software with a human translator.

Types of Translation Technology

New Pure Neural Translation: Human-quality translations are possible with this groundbreaking new technology, which delivers higher accuracy than any previous technology.


Older Technologies:

Dictionary-Based: This is also referred to as word-for-word translation. A document is translated using a dictionary filled with pre-translated terms. The software performs a search and replace on the individual words without regard to the context of each word or how it was used in the sentence. This is the least accurate form of automatic translation.
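A minimal sketch of word-for-word replacement, assuming a hypothetical English-to-Spanish glossary (the entries and the function name are illustrative, not taken from any product):

```python
import re

# Hypothetical toy glossary of pre-translated terms.
GLOSSARY = {
    "the": "el",
    "ship": "barco",
    "is": "es",
    "big": "grande",
}

def translate_word_for_word(sentence, glossary):
    """Replace each word with its glossary entry, ignoring context."""
    def swap(match):
        word = match.group(0)
        # Words missing from the glossary pass through unchanged.
        return glossary.get(word.lower(), word)
    return re.sub(r"[A-Za-z']+", swap, sentence)

print(translate_word_for_word("The ship is big.", GLOSSARY))
# el barco es grande.
print(translate_word_for_word("We ship the package.", GLOSSARY))
# We barco el package.
```

The second call shows the weakness described above: the verb "ship" is replaced with the noun because the lookup never considers how the word is used.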

Dictionary-Based Context Sensitive: This is very close to the original word-for-word translation, with the improvement that translations can be controlled by the part of speech of each word. For example, the word ‘ship’ can have two entries in the dictionary, each with its part of speech: one as a noun and one as a verb. When used in conjunction with Rule Based Translation software, the words surrounding ‘ship’ determine whether the translation should refer to a ship (noun) on the ocean or to ship (verb) a package. This form of dictionary improves translation results dramatically.
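The ‘ship’ example might look like the sketch below. The dictionary entries, the cue-word lists, and the one-word lookbehind are assumptions made for illustration; a real rule-based tagger considers much more of the sentence.

```python
# Hypothetical English-to-Spanish entries keyed by (word, part of speech).
POS_DICTIONARY = {
    ("ship", "noun"): "barco",
    ("ship", "verb"): "enviar",
}

# Hypothetical cue words used by this toy tagger.
DETERMINERS = {"a", "an", "the", "this", "that", "my", "our"}
VERB_CUES = {"to", "i", "we", "you", "they", "will", "can", "must"}

def guess_part_of_speech(previous_word):
    """Toy stand-in for a rule-based tagger: look only at the previous word."""
    prev = previous_word.lower()
    if prev in DETERMINERS:
        return "noun"
    if prev in VERB_CUES:
        return "verb"
    return "noun"  # arbitrary default for this sketch

def translate_ambiguous(word, previous_word):
    """Pick the dictionary entry that matches the guessed part of speech."""
    pos = guess_part_of_speech(previous_word)
    return POS_DICTIONARY[(word.lower(), pos)]

print(translate_ambiguous("ship", "the"))  # barco
print(translate_ambiguous("ship", "to"))   # enviar
```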

Rule Based Translation: The translation engine uses a set of linguistic rules for each language. The rule based system analyzes every word, the sentence structure, grammar, and punctuation, and determines the translated results. By using the grammar rules of each language, this form of translation can produce more fluid results that retain the meaning of the original sentence. For example, if the original language uses a different verb word order, the rule based system will change the order to the correct usage in the results. This has been considered the most accurate approach for over 30 years.
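The verb word order example can be sketched as a single reordering rule. The sketch assumes the sentence has already been tagged by hand, contains one clause, and targets a hypothetical verb-final language; the tag names are illustrative.

```python
def reorder_verb_final(tagged_tokens):
    """Move verbs to the end of the clause, as a verb-final target language requires."""
    verbs = [t for t in tagged_tokens if t[1] == "VERB"]
    others = [t for t in tagged_tokens if t[1] != "VERB"]
    return others + verbs

# Hand-tagged input; a real engine derives these tags during analysis.
tagged = [("she", "PRON"), ("reads", "VERB"), ("the", "DET"), ("book", "NOUN")]

print([word for word, _ in reorder_verb_final(tagged)])
# ['she', 'the', 'book', 'reads']
```

A production rule set holds many such rules per language pair; this shows only the shape of one.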

Statistical Translation: This system is based on training. Out of the box the system cannot do any automatic translation until it has been trained. To train the system, millions of previously translated documents, each in both languages, are fed into the database. The software then looks for patterns and statistical relationships between the words. This technology produces more fluid translations and now provides impressive translation quality.
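As a rough illustration of how statistical relationships between words can be learned from translated text, the sketch below runs a few rounds of expectation-maximization in the style of IBM Model 1, a classic word-alignment model. It is not claimed to be the method any particular product uses; the three-sentence German-English corpus and all function names are assumptions chosen to keep the example small.

```python
# Hypothetical miniature parallel corpus (source sentence, target sentence).
PARALLEL = [
    ("das Haus", "the house"),
    ("das Buch", "the book"),
    ("ein Buch", "a book"),
]

def train_word_model(pairs, iterations=10):
    """Estimate t[(target_word, source_word)] with IBM Model 1 style EM."""
    corpus = [(src.split(), tgt.split()) for src, tgt in pairs]
    # Start with the same probability for every co-occurring word pair.
    t = {(e, f): 1.0 for src, tgt in corpus for e in tgt for f in src}
    for _ in range(iterations):
        count = {}
        total = {}
        for src, tgt in corpus:
            for e in tgt:
                # Share one unit of evidence for e among the source words.
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    share = t[(e, f)] / norm
                    count[(e, f)] = count.get((e, f), 0.0) + share
                    total[f] = total.get(f, 0.0) + share
        # Re-estimate probabilities from the collected counts.
        t = {(e, f): c / total[f] for (e, f), c in count.items()}
    return t

def best_translation(t, source_word):
    """Return the target word with the highest learned probability."""
    candidates = {e: p for (e, f), p in t.items() if f == source_word}
    return max(candidates, key=candidates.get)

model = train_word_model(PARALLEL)
print(best_translation(model, "Haus"))  # house
print(best_translation(model, "Buch"))  # book
```

Before training, the model has no preference between "the" and "house" for "Haus"; the preference emerges only from patterns across the sentence pairs, which is why such systems cannot translate until they have been trained.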

Of the technologies listed above, Rule Based and Statistical are the most accurate, but each has pros and cons.

Rule Based

Pros

System will attempt to translate any sentence automatically, and can produce results even if part of the text is not in the dictionary / database.
Higher accuracy.

Cons

Requires grammatically correct documents.
Translations sound more stilted.
Produces more literal translations.

Statistical

Pros

System can produce smoother, more fluid translations learned from human input.
System can learn slang and produce more localized results after training.

Cons

If a segment / phrase does not match anything in the database, the system cannot translate it.
Slow and CPU intensive; huge banks of data are required in order to produce usable translations.
Training required.

SYSTRAN uses a combination of both Rule Based (more accurate) and Statistical (more flexible) technology.

SYSTRAN's HYBRID Translation Engine

SYSTRAN developed the first HYBRID Translation Engine, which utilizes a combination of both Statistical and Rule Based technology. This gives you the accuracy of Rule Based technology with the flexibility of Statistical Machine Translation. Statistical and Rule Based were both introduced back in the 1960s, but Rule Based was much more accurate because there simply was not enough data to properly create Statistical models. With the internet and the huge amounts of digitized data in foreign languages, along with the ability to crowdsource, the Statistical method has once again become a contender.
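One simple way to picture combining the two approaches is a fallback: use a learned translation when the segment is known, and otherwise fall back to rules that always produce something. This is an assumption made for illustration only and is not a description of how SYSTRAN's engine is actually built; the phrase table, word dictionary, and function names are all hypothetical.

```python
# Hypothetical phrase table standing in for a trained statistical model.
PHRASE_TABLE = {"good morning": "buenos días"}

# Hypothetical word dictionary standing in for a rule based engine.
WORD_DICTIONARY = {"good": "bueno", "night": "noche"}

def statistical_translate(text):
    """Return a learned translation, or None when the segment was never seen."""
    return PHRASE_TABLE.get(text.lower())

def rule_based_translate(text):
    """Always produce something, word by word, leaving unknown words as-is."""
    return " ".join(WORD_DICTIONARY.get(w, w) for w in text.lower().split())

def hybrid_translate(text):
    """Prefer the learned translation; fall back to the rule based one."""
    learned = statistical_translate(text)
    return learned if learned is not None else rule_based_translate(text)

print(hybrid_translate("Good morning"))  # buenos días
print(hybrid_translate("Good night"))    # bueno noche
```

The second result is literal and stilted, which matches the trade-off listed earlier: the rule based side always answers, while the statistical side sounds more natural when it has seen the segment.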

Expect to see further developments in upcoming versions as the technology improves.

NEW SYSTRAN's PURE NEURAL Translation Engine

With the development of Pure Neural Machine Translation technology, everything has changed. Using deep learning methods, the computer learns to translate a language much as a human learns it. This technology is so advanced that it can be difficult to tell the difference between the output of automated translation software and that of a human translator. In many cases the software can do a better job because it is more consistent and easier to control. New languages are also developed at a much faster pace, so expect to see many additions within weeks, whereas it used to take years to develop a new language for software.

