This repository is archived as of October 2024. While it was a fun project, it turned out that scaling this application beyond a certain complexity in Python is just not feasible. I do think there is value in the project, but it would require a complete rewrite in a programming language better suited to this use case.
This application's goal is to help people learn languages while conversing with native speakers. It is not a standalone language-learning app; instead, it aims to provide translations for everyday phrases while explaining the grammatical structure and vocabulary of those sentences.
lingolift uses a mixture of Generative AI and Natural Language Processing (NLP) to perform translation and sentence analysis. For example, both the idiomatic translation of the input sentence and the literal translations are generated by an LLM (currently using the OpenAI API); however, the syntactical analysis of sentences is largely achieved using the spaCy library.
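To give a rough idea of the generative side, here is a minimal sketch of how an idiomatic translation could be requested from the OpenAI API. The prompt wording and model name are illustrative assumptions, not lingolift's actual prompts or configuration.

```python
# Minimal sketch of the LLM-backed translation step. The prompt wording and
# model name are assumptions for illustration; lingolift's actual prompts
# and configuration differ.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def translate_idiomatically(sentence: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name, not necessarily what lingolift uses
        messages=[
            {
                "role": "system",
                "content": "Translate the user's sentence into natural, idiomatic English.",
            },
            {"role": "user", "content": sentence},
        ],
    )
    return response.choices[0].message.content


print(translate_idiomatically("Das ist nicht mein Bier."))
```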
As of now, lingolift can do the following:
- Auto-detect the language of the input sentence
- Translate sentences from other languages to English
- Provide a literal translation of each word in the input sentence (up to a certain sentence length)
- Provide a coherent syntactical analysis of the input sentence based on part-of-speech tagging (see the spaCy sketch after this list)
- Provide response suggestions for the user to continue the conversation
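For the syntactical analysis, spaCy's part-of-speech tagging and morphological features do most of the heavy lifting. The snippet below is an illustrative sketch of that kind of analysis, not lingolift's actual code; it assumes the German model `de_core_news_sm` has been downloaded.

```python
# Illustrative sketch of spaCy-based part-of-speech tagging; this is not
# lingolift's actual analysis code. Assumes the German model has been
# installed via: python -m spacy download de_core_news_sm
import spacy

nlp = spacy.load("de_core_news_sm")
doc = nlp("Ich habe gestern ein Buch gelesen.")

for token in doc:
    # token.pos_ is the coarse part-of-speech tag; token.morph contains
    # morphological features such as tense, case and number.
    print(f"{token.text:<12} {token.pos_:<6} {token.morph}")
```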
Currently, those features can be accessed via chatbot-like UIs on both the Streamlit Community Cloud and Telegram.
At the moment, I'm working on error detection. I'm also looking to move away from language detection and instead focus on specific languages: language detection is difficult and takes time, and this application won't work equally well for all languages anyway, so it makes more sense to support a few languages and make them work well.
I am hosting an instance of the application on the Streamlit Community Cloud and on Telegram. The backend, as defined in this repository, is hosted as a set of serverless functions on AWS Lambda, abstracted behind an API Gateway.
You can run lingolift locally as a dockerized Flask server. To do so, you need to have Docker installed on your machine. You can simply pull a pre-built Docker image (amd64 only) for a given language from Docker Hub:
```bash
docker pull tobiaswaslowski/lingolift-webserver-de:latest
docker run -p 5001:5001 -e OPENAI_API_KEY="$OPENAI_API_KEY" tobiaswaslowski/lingolift-webserver-de:latest
```
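Once the container is running, you can send requests to it from Python. Note that the endpoint path and payload shape below are hypothetical placeholders for illustration; consult the Flask routes in this repository for the actual API.

```python
# Hypothetical usage sketch: the endpoint path and payload shape below are
# placeholders for illustration only; check the Flask routes in this
# repository for the actual API.
import requests

response = requests.post(
    "http://localhost:5001/translation",  # placeholder endpoint name
    json={"sentence": "Wie geht es dir?"},  # placeholder payload shape
    timeout=30,
)
print(response.status_code)
print(response.json())
```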
Note that this image can only perform syntactical analysis for German. I host another model for the Russian language (`tobiaswaslowski/lingolift-webserver-ru`); if you would like images for other languages, you have to build them yourself, which is not terribly difficult. You can build an image for a given language with the following commands:
```bash
# Build the image for the Spanish language
# Retrieve model id here: https://spacy.io/models
./do build_webserver --spacy_model es_core_news_sm --source-lang es
./do run_webserver es
```
The easiest option to interact with the provided endpoints is to clone the Streamlit-based frontend and run it locally:
```bash
git clone git@github.com:twaslowski/lingolift-frontend.git && cd lingolift-frontend
poetry install --no-root
./do run
```
All contributions are welcome! If you want to contribute, please fork the repository and create a pull request.
You can run tests with `./do test` and perform linting, import sorting and formatting with `./do pc` or `pre-commit run --all-files`.
The codebase for this project is split into four distinct repositories:

- This repository (the one you are currently in), which provides the backend functionality.
- The primary frontend, hosted in the lingolift-frontend repository.
- The Telegram bot, hosted in the lingolift-telegram-bot repository.
- A shared repository containing client functionality for accessing the API provided here, as well as models for all tasks to ensure type safety.