
A utility that enables users to talk to any repository of their choice leveraging LLMs


pramodv1993/CodeQA


CodeQA

Problem Statement:
Given the URL of a GitHub repository, the proposed solution lets the user build a preliminary understanding of the repository by asking the system questions in a conversational setup.

High Level Design:

[Diagram: high-level design]
The solution has three main modules, each of which runs as a standalone dockerized microservice:

  • API: Handles the main tasks of downloading the repository, processing it, embedding it, and so on. More details can be found here
  • UI: A simple interface to see the solution in action.
  • VecDB: A vector database that supports CRUD operations on vector embeddings and their associated metadata.
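
To make the VecDB module's role concrete, here is a minimal in-memory sketch of the CRUD-plus-metadata behaviour described above. It is not the project's code and does not use the Qdrant client: `InMemoryVecDB`, `Point`, `cosine`, and the example paths are hypothetical names chosen for illustration, and a brute-force cosine ranking stands in for a real vector index.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Point:
    """One stored embedding plus its metadata payload."""
    vector: list
    payload: dict = field(default_factory=dict)


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVecDB:
    """Hypothetical stand-in for the VecDB service (assumption, not the real API)."""

    def __init__(self) -> None:
        self._points = {}

    def upsert(self, point_id: str, vector: list, payload: dict) -> None:
        # Create or update: reusing an ID overwrites the previous entry.
        self._points[point_id] = Point(vector, payload)

    def get(self, point_id: str) -> Optional[Point]:
        # Read: returns None when the ID is unknown.
        return self._points.get(point_id)

    def delete(self, point_id: str) -> None:
        # Delete: silently ignores unknown IDs.
        self._points.pop(point_id, None)

    def search(self, query: list, top_k: int = 3) -> list:
        # Rank every stored vector by cosine similarity to the query.
        scored = [(pid, cosine(query, p.vector)) for pid, p in self._points.items()]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]


# Usage: store two chunk embeddings with file-path metadata, then query.
db = InMemoryVecDB()
db.upsert("a", [1.0, 0.0], {"path": "api/main.py"})
db.upsert("b", [0.0, 1.0], {"path": "ui/app.py"})
best_id, _score = db.search([0.9, 0.1], top_k=1)[0]
```

In this sketch the payload carries the metadata (here, a file path) that lets a retrieved vector be traced back to its source file.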

Demo:

See demo here

QuickStart:

  • Add a .env file containing the key OPENAI_API_KEY="" (set to your OpenAI API key) in the API module (i.e. in this path)
  • Download the embedding model from here and place it in this location
  • The microservices are packaged as Docker Compose services, so run
    docker-compose up --build
    at the root location.
  • You can find each of the modules at the following URLs:

More technical details:

  • UI: Streamlit is used to build a basic interactive app. Screenshot:
    [Screenshot of the Streamlit UI]
  • API: FastAPI is used to implement the RESTful services. More details about the supported APIs can be found here.
  • VecDB: A self-hosted Qdrant instance serves as the vector database.

Next steps:

  • Several areas could be improved, including (but not limited to):
    • Filtering strategies while downloading the repo
    • Detecting the programming language of each script and applying appropriate cleaning strategies
    • Creating richer metadata for the scripts at both the document and chunk level, such as summaries of functions, comments, function names, etc.
    • More advanced strategies for embedding the scripts, to enable "Contextual RAG".
  • I have tried to mark these with @TODO comments at the appropriate places in the scripts.
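
As a rough illustration of two of the ideas above (filtering files and extracting function-level metadata), here is a small sketch using only the Python standard library. It assumes Python sources only; `keep_file`, `function_metadata`, `ALLOWED_SUFFIXES`, and `SKIPPED_DIRS` are hypothetical names, not part of the existing scripts, and the filter rules are example choices.

```python
import ast
from pathlib import PurePosixPath

# Assumed filter rules for illustration; a real setup would make these configurable.
ALLOWED_SUFFIXES = {".py"}
SKIPPED_DIRS = {".git", "node_modules", "__pycache__"}


def keep_file(path: str) -> bool:
    """Return True if a repo-relative path should be downloaded and embedded."""
    p = PurePosixPath(path)
    if any(part in SKIPPED_DIRS for part in p.parts):
        return False
    return p.suffix in ALLOWED_SUFFIXES


def function_metadata(source: str) -> list:
    """Collect name, starting line, and docstring for each function in a Python source string."""
    records = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            records.append({
                "name": node.name,
                "lineno": node.lineno,
                "docstring": ast.get_docstring(node),
            })
    return records


# Usage: filter a file listing, then extract metadata from one script.
paths = ["src/app.py", ".git/hooks/pre_commit.py", "README.md"]
kept = [p for p in paths if keep_file(p)]

sample = 'def load(url):\n    """Download a repo."""\n    return url\n'
meta = function_metadata(sample)
```

Records like these could be attached to each chunk as metadata before embedding, so that a retrieved chunk carries its function name and docstring along with the raw text.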
