Machine translation is the area of natural language processing concerned with translating text in a source language into a target language. Researchers have worked on the problem for decades, and it is considered one of the earliest pursuits in computer science. However, language depends on unstated assumptions, relations, expectations, and conditions, and because of these difficulties, high-quality machine translation has proven to be an elusive goal.

Machine translation starts with analysis of the input text. Sentences are analyzed and often classified by how difficult they are to translate. Treating each sentence as an isolated unit is often insufficient, because adjacent sentences are interconnected and ideas carry over from one sentence to the next. General knowledge, as well as more localized knowledge (especially knowledge specific to the country or countries where the source language is spoken), may be needed to achieve high-quality results.
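As a rough illustration of this first analysis step, the Python sketch below splits a text into sentences and assigns each one a difficulty label. The labels, the 12-word threshold, and the list of "anaphoric" starter words are hypothetical heuristics chosen for the example: a sentence that opens with a pronoun such as "It" is flagged as depending on an earlier sentence, and a long sentence is flagged as harder.

```python
import re

# Hypothetical heuristic: words that usually refer back to an earlier sentence.
ANAPHORIC_STARTERS = {"it", "they", "this", "that", "these", "those", "he", "she"}


def split_sentences(text):
    """Naive segmentation: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def classify(sentence, max_easy_words=12):
    """Assign a rough difficulty label using two illustrative heuristics."""
    words = re.findall(r"[A-Za-z']+", sentence)
    if words and words[0].lower() in ANAPHORIC_STARTERS:
        return "needs-context"  # meaning depends on a previous sentence
    if len(words) > max_easy_words:
        return "hard"  # long sentences usually have more structure to resolve
    return "easy"


def analyze(text):
    """Return (sentence, label) pairs for every sentence in the text."""
    return [(s, classify(s)) for s in split_sentences(text)]


if __name__ == "__main__":
    for sentence, label in analyze("The cat sat on the mat. It was asleep."):
        print(label, "|", sentence)
```

Running it labels the first sentence `easy` and the second `needs-context`, since "It" cannot be translated correctly (for example, choosing a gendered pronoun) without looking back at "The cat". Production systems use trained segmenters and far richer difficulty signals.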

Deformatting and reformatting may also be necessary, since the source material may contain charts and diagrams that don't require translation; such material is set aside before translation and restored afterward. Analysis and transfer are also essential. Morphological analysis examines word forms to determine and tag parts of speech, while syntactic analysis determines whether a word functions as a subject or an object. This analysis allows a sentence to be transferred accurately into the target natural language. It is closely tied to tagging and parsing: tagging identifies the linguistic properties of each word, and parsing determines how the words relate to one another.
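To make tagging, parsing, and transfer concrete, here is a toy Python sketch of a rule-based English-to-Spanish transfer step. The lexicon, the tag names (`DET`, `ADJ`, `NOUN`, `VERB`), and the helper functions are hypothetical. It assumes simple subject-verb-object sentences made only of words in the lexicon, and it ignores gender and number agreement.

```python
# Hypothetical toy lexicon: English word -> (part of speech, Spanish gloss).
LEXICON = {
    "the": ("DET", "el"),
    "black": ("ADJ", "negro"),
    "small": ("ADJ", "pequeño"),
    "cat": ("NOUN", "gato"),
    "fish": ("NOUN", "pez"),
    "eats": ("VERB", "come"),
}


def tag(sentence):
    """Tagging: attach a part-of-speech label to each word via lexicon lookup."""
    return [(w, LEXICON[w][0]) for w in sentence.lower().split()]


def parse(tagged):
    """Parsing: split the tagged words into subject, verb, and object (assumes S-V-O order)."""
    verb_index = next(i for i, (_, pos) in enumerate(tagged) if pos == "VERB")
    return {
        "subject": tagged[:verb_index],
        "verb": tagged[verb_index],
        "object": tagged[verb_index + 1:],
    }


def transfer_np(phrase):
    """Transfer a noun phrase: in Spanish, these adjectives follow the noun."""
    dets = [w for w, pos in phrase if pos == "DET"]
    nouns = [w for w, pos in phrase if pos == "NOUN"]
    adjs = [w for w, pos in phrase if pos == "ADJ"]
    return [LEXICON[w][1] for w in dets + nouns + adjs]


def translate(sentence):
    """Analyze the English sentence, then generate Spanish in the target word order."""
    tree = parse(tag(sentence))
    verb = LEXICON[tree["verb"][0]][1]
    return " ".join(transfer_np(tree["subject"]) + [verb] + transfer_np(tree["object"]))


if __name__ == "__main__":
    print(translate("the black cat eats the small fish"))
```

Here "the black cat eats the small fish" becomes "el gato negro come el pez pequeño": the subject/object split comes from parsing, and moving each adjective after its noun is the kind of structural change the transfer stage handles. Real systems rely on much larger lexicons, full parsers, and agreement rules.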

One notable approach is example-based machine translation (EBMT), in which existing translations serve as the basis for translating new texts on similar topics. The process has three stages: matching, alignment, and recombination. In the matching stage, the system finds stored examples that are similar to the input and can therefore contribute to its translation. Matching can be done through sequence comparison, in which the texts are compared character by character. Parts of speech may also be tagged as they are identified during this comparison, which shortens the time needed to produce the final translation. In the alignment stage, EBMT algorithms identify which pieces of the corresponding translation will be reused, either with a bilingual dictionary or by comparison with other examples. Finally, in recombination, the algorithms assemble the reusable pieces so that the result reads as natural language.
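The three EBMT stages can be sketched in Python as follows. This is an illustration under strong assumptions, not a reference implementation: the example base and the bilingual dictionary are hypothetical, matching uses the standard library's `difflib.SequenceMatcher` as a stand-in for character-level sequence comparison, alignment links words through the dictionary, and recombination only handles one-for-one word substitutions.

```python
from difflib import SequenceMatcher

# Hypothetical example base of (English, Spanish) translation pairs.
EXAMPLES = [
    ("the cat eats fish", "el gato come pescado"),
    ("the dog sleeps", "el perro duerme"),
]

# Hypothetical bilingual dictionary used during alignment and recombination.
DICTIONARY = {
    "the": "el", "cat": "gato", "dog": "perro",
    "eats": "come", "fish": "pescado", "sleeps": "duerme",
}


def match(sentence):
    """Matching: pick the example whose source side is most similar, character by character."""
    return max(EXAMPLES, key=lambda ex: SequenceMatcher(None, sentence, ex[0]).ratio())


def align(src_words, tgt_words):
    """Alignment: link each source position to the target position holding its dictionary translation."""
    links, used = {}, set()
    for i, word in enumerate(src_words):
        gloss = DICTIONARY.get(word)
        for j, tgt in enumerate(tgt_words):
            if tgt == gloss and j not in used:
                links[i] = j
                used.add(j)
                break
    return links


def recombine(sentence, example, links):
    """Recombination: reuse the example translation, swapping in translations of differing words."""
    new_words, src_words = sentence.split(), example[0].split()
    out = example[1].split()
    for op, i1, i2, j1, j2 in SequenceMatcher(None, src_words, new_words).get_opcodes():
        if op == "equal":
            continue  # these words are already translated in the example
        if op != "replace" or (i2 - i1) != (j2 - j1):
            raise ValueError("this sketch only handles one-for-one word substitutions")
        for i, j in zip(range(i1, i2), range(j1, j2)):
            if i not in links or new_words[j] not in DICTIONARY:
                raise ValueError(f"cannot translate {new_words[j]!r}")
            out[links[i]] = DICTIONARY[new_words[j]]
    return " ".join(out)


def translate(sentence):
    """Run matching, alignment, and recombination in sequence."""
    example = match(sentence)
    links = align(example[0].split(), example[1].split())
    return recombine(sentence, example, links)


if __name__ == "__main__":
    print(translate("the dog eats fish"))
```

For "the dog eats fish", matching selects "the cat eats fish" as the closest example. Alignment links "cat" to "gato", and recombination swaps in "perro", giving "el perro come pescado". Reusing the example's target word order, rather than building the sentence from scratch, is what distinguishes this approach from the rule-based transfer described earlier.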