A translation tool built with NLP concepts and the TensorFlow and Keras libraries. The dataset is tokenized and fed to a GRU (Gated Recurrent Unit) for encoding; a decoder layer then converts the GRU output into complete sentences.

HEMANGANI/Language-Translator

Language-Translator

Goal:

Build a language translation layer that can sit on top of a chatbot with minor adjustments. Such a layer should account for custom business-centric keywords and provide a way to update them as required. The layer should be easily accessible via an API. Keep in mind that the API should connect with other services and platforms (WhatsApp, Slack, Teams, etc.). The layer should also identify and omit offensive and inappropriate language.

Introduction:

In today’s world, technology has brought people closer than ever. Information now flows at a scale that facilitates business operations, and a business may have to interact with many entities around the world, in different capacities, to carry out its work. Translation is necessary to spread new information, knowledge, and ideas worldwide, and it is essential for effective communication between cultures. By spreading new information, translation can change history.

Methodology:

PART 1: LANGUAGE TRANSLATION

We create a translation tool using Natural Language Processing concepts and the TensorFlow library. The model is trained on the dataset provided, and a separate set of weights is generated and stored for each language pair (English-French, English-German, etc.).
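One way to keep per-pair weights organized is a small lookup helper. The sketch below is illustrative only: the `weights` directory, the `weights_path` function, and the `<source>-<target>.weights.h5` naming scheme are assumptions, not the layout this repository actually uses.

```python
from pathlib import Path

# Hypothetical storage location; the project may keep its weights elsewhere.
WEIGHTS_DIR = Path("weights")


def weights_path(source_lang: str, target_lang: str) -> Path:
    """Return the assumed weights file for one language pair, e.g. en-fr."""
    src = source_lang.strip().lower()
    tgt = target_lang.strip().lower()
    if not src or not tgt:
        raise ValueError("both a source and a target language are required")
    if src == tgt:
        raise ValueError("source and target languages must differ")
    # Each ordered pair gets its own file, so en-fr and fr-en never collide.
    return WEIGHTS_DIR / f"{src}-{tgt}.weights.h5"
```

With a scheme like this, the translation service can load the right weights from the "from" and "to" languages alone.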

The NLP model works as follows:

  1. The dataset is tokenized, i.e., sentences are broken into individual words, with punctuation marks removed. The Tokenizer class from Keras performs this operation and maps each word to an integer index.
  2. The neural network used here is a GRU (Gated Recurrent Unit), which encodes the token sequences from the previous step. A GRU is a recurrent network with feedback connections whose gates let it retain context across a sentence, which makes it well suited to sentence translation in a business setting.
  3. Finally, the decoder layer converts the output from the GRU to complete sentences. Training uses an optimizer such as RMSProp together with the EarlyStopping callback, which halts training once the monitored metric stops improving. This helps keep the model from overfitting the training data so that it performs robustly in a test environment.
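The steps above can be sketched in Keras roughly as follows. This is a minimal sketch under stated assumptions, not the repository's actual model: the layer sizes, the `RepeatVector` bridge between encoder and decoder, and the `build_translator` and `train` helpers are all illustrative choices. The inputs are assumed to be padded integer sequences produced by the tokenization step.

```python
import numpy as np
import tensorflow as tf

layers = tf.keras.layers


def build_translator(src_vocab, tgt_vocab, src_len, tgt_len, units=64):
    """Encoder GRU -> repeated context vector -> decoder GRU -> per-step softmax."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(src_len,), dtype="int32"),
        layers.Embedding(src_vocab, units),        # token ids -> dense vectors
        layers.GRU(units),                         # encoder: one context vector per sentence
        layers.RepeatVector(tgt_len),              # copy the context to every output step
        layers.GRU(units, return_sequences=True),  # decoder: one state per output step
        layers.TimeDistributed(layers.Dense(tgt_vocab, activation="softmax")),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def train(model, x, y, epochs=20):
    """Fit with EarlyStopping so training halts when validation loss stalls."""
    stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=3, restore_best_weights=True
    )
    return model.fit(
        x, y, validation_split=0.2, epochs=epochs, callbacks=[stop], verbose=0
    )
```

Here `x` has shape `(samples, src_len)` and `y` has shape `(samples, tgt_len)`, both holding integer token ids. Taking the `argmax` over the last axis of the model output gives one predicted target token id per position, which can then be mapped back to words.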

PART 2: API INTEGRATION

A web application interface is built using Django. The trained weights from Part 1 are integrated with a Django database to display the translated output message in a chat interface.

The chat interface works as follows:

  1. The user is given a dropdown menu to pick the “from” and “to” languages.
  2. They then type their message in the chatbox.
  3. The web app interfaces with the Google Colab notebook on which the NLP model is stored and returns the translated text.
  4. An API is also used to integrate this translation tool with chat platforms like WhatsApp and Discord so that the user does not have to rely on the above chat interface alone.
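The request flow above could be handled by a small, framework-independent function that a Django view or a chat-platform webhook then wraps. The sketch below is an assumption about how such an endpoint might look: the `handle_translate_request` name, the JSON field names (`from`, `to`, `text`), and the injected `translate` callable are hypothetical and not taken from this repository.

```python
import json
from typing import Callable, Tuple


def handle_translate_request(
    body: str, translate: Callable[[str, str, str], str]
) -> Tuple[int, str]:
    """Validate a JSON request and return (HTTP status, JSON response body).

    `translate(text, source_lang, target_lang)` stands in for the call to the
    trained model; it is passed in so this function stays easy to test.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "request body must be valid JSON"})
    if not isinstance(payload, dict):
        return 400, json.dumps({"error": "request body must be a JSON object"})

    src, tgt, text = payload.get("from"), payload.get("to"), payload.get("text")
    if not all(isinstance(v, str) and v.strip() for v in (src, tgt, text)):
        return 400, json.dumps({"error": "'from', 'to' and 'text' are required"})
    if src == tgt:
        return 400, json.dumps({"error": "'from' and 'to' must differ"})

    translation = translate(text, src, tgt)
    return 200, json.dumps({"from": src, "to": tgt, "translation": translation})
```

Because the translation call is injected, the same handler can serve the web chat interface and any chat-platform integration without duplicating the validation logic.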
