This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Lingolift supports you in learning a language by breaking down the syntax of sentences. It shines when learning strongly inflected languages, such as German, English, Russian and most other Indo-European languages. Serverlessly hosted on AWS and accessible via the Streamlit Community Cloud and a Telegram Bot.


twaslowski/lingolift-core


Archival

This repository is archived as of October 2024. While it was a fun project, it turned out that scaling this application beyond a certain complexity in Python is not feasible. I do think there is value in this project, but I would have to rewrite it entirely in a programming language better suited to this use case.

About

This application's goal is to help people learn languages while conversing with native speakers. It is not a standalone language-learning app; instead, it aims to provide translations for everyday phrases while explaining the grammatical structure and vocabulary of those sentences.

lingolift uses a mixture of Generative AI and Natural Language Processing (NLP) to perform translation and sentence analysis. For example, both the idiomatic translation of the input sentence and the literal translations are generated by an LLM (currently using the OpenAI API); however, the syntactical analysis of sentences is largely achieved using the spaCy library.
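The spaCy side of this split can be sketched as follows. This is a minimal illustration, not lingolift's actual implementation: the `TokenAnalysis` class and both function names are hypothetical, and the sketch assumes the `de_core_news_sm` model has been installed (`python -m spacy download de_core_news_sm`). It only reads the part-of-speech tag, lemma and morphological features that spaCy attaches to each token.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TokenAnalysis:
    """Hypothetical container for one analysed token (not lingolift's real model)."""
    text: str
    lemma: str
    pos: str
    morphology: str


def analyze_tokens(doc: Iterable) -> List[TokenAnalysis]:
    """Collect POS tag, lemma and morphology from spaCy-style tokens.

    Works on any iterable of objects exposing `text`, `lemma_`, `pos_`
    and `morph`, which is what a spaCy `Doc` yields. Punctuation is skipped.
    """
    return [
        TokenAnalysis(
            text=token.text,
            lemma=token.lemma_,
            pos=token.pos_,
            morphology=str(token.morph),
        )
        for token in doc
        if token.pos_ != "PUNCT"
    ]


def analyze_sentence(sentence: str, model: str = "de_core_news_sm") -> List[TokenAnalysis]:
    """Run a spaCy pipeline over `sentence` and analyse the resulting tokens.

    spaCy is imported lazily so that `analyze_tokens` stays usable without it.
    Assumes the given model is installed locally.
    """
    import spacy  # third-party dependency

    nlp = spacy.load(model)
    return analyze_tokens(nlp(sentence))
```

Calling `analyze_sentence("Ich habe einen Hund.")` would then return one entry per word, each carrying the tag and morphological features that a grammatical explanation could be built from.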

Features

As of now, lingolift can do the following:

  • Auto-detect the language of the input sentence
  • Translate sentences from other languages to English
  • Provide a literal translation of each word in the input sentence (up to certain sentence lengths)
  • Provide a coherent syntactical analysis of the input sentence based on part-of-speech tagging
  • Provide response suggestions for the user to continue the conversation

Currently, those features can be accessed via Chatbot-like UIs on both the Streamlit Community Cloud and Telegram.

I'm currently working on error detection. I'm also looking to move away from language detection and focus on specific languages instead. Language detection is difficult and takes time, and this application won't work equally well for all languages anyway, so it makes more sense to focus on a few languages and make them work well.

Usage

I am hosting an instance of the application on the Streamlit Community Cloud and as a Telegram bot. The backend, as defined in this repository, is hosted as a set of serverless functions on AWS Lambda, abstracted behind an API Gateway.

Running

You can run lingolift locally as a dockerized Flask server. To do so, you need to have Docker installed on your machine. You can simply pull a pre-built Docker image (amd64 only) for a given language from Docker Hub:

docker pull tobiaswaslowski/lingolift-webserver-de:latest
docker run -p 5001:5001 -e OPENAI_API_KEY="$OPENAI_API_KEY" tobiaswaslowski/lingolift-webserver-de:latest
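Before pointing a client at the container, it can help to wait until the published port accepts connections. The following is a small sketch using only the Python standard library; the `wait_for_port` helper is hypothetical and not part of lingolift. It assumes the port mapping from the command above (5001) and only checks that a TCP connection succeeds, not that the Flask application behind it is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the container started above, `wait_for_port("localhost", 5001)` would block for up to 30 seconds and report whether the server port became reachable.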

Note that this image can only perform syntactical analysis for German. I host another image for the Russian language (tobiaswaslowski/lingolift-webserver-ru); if you would like images for other languages, you have to build them yourself. This is not terribly difficult. You can build and run an image for a given language with the following commands:

# Build the image for the Spanish language
# Retrieve model id here: https://spacy.io/models
./do build_webserver --spacy_model es_core_news_sm --source-lang es
./do run_webserver es

The easiest option to interact with the provided endpoints is to clone the Streamlit-based frontend and run it locally:

git clone git@github.com:twaslowski/lingolift-frontend.git && cd lingolift-frontend
poetry install --no-root
./do run

Contributing

All contributions are welcome! If you want to contribute, please fork the repository and create a pull request. You can run tests with ./do test and perform linting, import sorting and formatting with ./do pc or pre-commit run --all-files.

Project Overview

The codebase for this project is split into four distinct repositories. You are currently in the main repository that provides the backend functionality. The primary frontend is hosted in the lingolift-frontend repository. The Telegram bot is hosted in the lingolift-telegram-bot repository. Lastly, there is a shared repository that contains client functionality for accessing the API provided here as well as models for all tasks to ensure type safety.
