harvard-lil/open-french-law-rag-pipeline

Open French Law RAG Pipeline

CLI utility for running the Open French Law RAG experiment.

This Retrieval Augmented Generation pipeline:

  • Ingests the COLD French Law Dataset into a vector store.
    • Only French content is ingested. English translations present in the dataset are not part of this experiment.
  • Uses the resulting vector store and a combination of text generation models to answer a series of questions.
    • Questions are asked both in English and French.
    • Questions are asked both with and without context retrieved from the vector store.
    • Questions are asked against both an OpenAI model and an open-source model run via Ollama.
  • Outputs raw results to CSV.
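
The "with and without context" comparison can be sketched as follows. This is a minimal illustration, not the pipeline's actual prompt: the `build_prompt` helper and the prompt wording are assumptions, and the excerpts stand in for whatever the vector store returns.

```python
from typing import Optional, Sequence


def build_prompt(question: str, excerpts: Optional[Sequence[str]] = None) -> str:
    """Return the bare question, or the question preceded by numbered excerpts.

    Hypothetical helper: the real pipeline may format its prompts differently.
    """
    if not excerpts:
        # "Without RAG" case: the model only sees the question.
        return question
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(excerpts, start=1))
    # "With RAG" case: retrieved excerpts come first, then the question.
    return f"Context:\n{numbered}\n\nQuestion: {question}"
```

Calling `build_prompt(question)` and `build_prompt(question, excerpts)` on the same question gives the two variants that would be sent to each model.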

Usage

This pipeline requires Python 3.11+ and Python Poetry.

Pulling and pushing data from Hugging Face may require the Hugging Face CLI and valid authentication.

1. Clone this repository

git clone https://github.com/harvard-lil/open-french-law-chabot.git

2. Install dependencies

poetry install

3. Configure the application

Copy .env.example to .env, then edit .env to give the pipeline its credentials for the OpenAI API and Ollama.

cp .env.example .env

The pipeline's configuration can be further edited via /const/__init__.py.
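
Before running the scripts, it can help to confirm that .env is filled in. The sketch below parses a .env-style file with the standard library only and reports empty or missing entries. The key names in `REQUIRED_KEYS` are assumptions for illustration; the authoritative list is whatever .env.example contains.

```python
# Hypothetical key names; check .env.example for the real ones.
REQUIRED_KEYS = ("OPENAI_API_KEY", "OLLAMA_API_URL")


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(parse_env(open(".env", encoding="utf-8").read()))` returns an empty list when every required key has a value.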

4. Run the "ingest" script

This script builds a vector store from the content of the COLD French Law Dataset.

# See: ingest.py --help for a list of available options
poetry run python ingest.py

See output under /database.

5. Run the "ask" script

This script runs the full list of questions through the pipeline and writes the output to CSV.

# See: ask.py --help for a list of available options
poetry run python ask.py

See output under /output/*.csv.
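
Once the CSV files exist, they can be loaded for inspection with the standard library. This is a minimal sketch: `load_results` is a hypothetical helper, and it makes no assumption about the column names, which are read from each file's header row.

```python
import csv
from pathlib import Path


def load_results(output_dir: str = "output") -> dict:
    """Map each CSV file name (without extension) to its rows as dicts."""
    results = {}
    for path in sorted(Path(output_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            # DictReader uses the first row of the file as column names.
            results[path.stem] = list(csv.DictReader(handle))
    return results
```

Iterating over `load_results().items()` and printing `len(rows)` for each entry gives a quick row count per output file.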


Output groups

The experiment's output is organized in groups:

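| Group name | Text gen. model | RAG | Language |
| --- | --- | --- | --- |
| a_en | Llama 2 70B | NO | EN |
| a_fr | Llama 2 70B | NO | FR |
| b_en | Llama 2 70B | YES | EN |
| b_fr | Llama 2 70B | YES | FR |
| c_en | GPT-4 | NO | EN |
| c_fr | GPT-4 | NO | FR |
| d_en | GPT-4 | YES | EN |
| d_fr | GPT-4 | YES | FR |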
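
The groups are the cross product of two models, two RAG settings and two languages. The sketch below enumerates them, assuming one letter per (model, RAG) pair, so that the two GPT-4 rows with RAG are named d_en and d_fr. The `output_groups` function is illustrative and not part of the pipeline.

```python
from itertools import product
from string import ascii_lowercase

MODELS = ["Llama 2 70B", "GPT-4"]
RAG_SETTINGS = [False, True]
LANGUAGES = ["en", "fr"]


def output_groups() -> dict:
    """Build the group table: one letter per (model, RAG) pair, one suffix per language."""
    groups = {}
    # product() varies the RAG setting fastest, giving a, b for the first model
    # and c, d for the second (assumed naming, see the lead-in).
    for letter, (model, rag) in zip(ascii_lowercase, product(MODELS, RAG_SETTINGS)):
        for lang in LANGUAGES:
            groups[f"{letter}_{lang}"] = {
                "model": model,
                "rag": rag,
                "language": lang.upper(),
            }
    return groups
```

A lookup such as `output_groups()["b_fr"]` then tells which model, RAG setting and language a given output group corresponds to.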
