Data-Copilot

An interpretable natural language database query system (RAG) built on large language models (LLM). Query databases in natural language: the system uses a large language model to intelligently parse the database structure, perform structured multi-table queries and statistical calculations, and intelligently generate various types of charts from the query results. The entire generation process is open, modifiable, and interpretable, enabling reliable natural language data analysis. PyWebIO interactive front-end web page, no OpenAI API required, 100% pure Python code. 🚩 简体中文

📺 Project Online Demonstration Link

Related Projects

Function Overview

    1. Query using natural language
    2. Implement multi-table structured queries and statistical calculations
    3. Intelligent generation of various types of ECharts
    4. Intelligent parsing of database structures, no extra configuration needed for different MySQL databases
    5. ✨ The entire generation process is open, modifiable, and interpretable, achieving reliable natural language data analysis
    6. Handle anomalies such as instability in large language models
    7. Support local offline deployment (requires GPU) of huggingface-format models (e.g., qwen-7b)
    8. Support openai-format (e.g., glm, deepseek) and dashscope qwen API interfaces

Technical Innovations

  • Independent of frameworks like langchain; completely open implementation
  • ✨ The entire generation process is open, modifiable, and interpretable, achieving reliable natural language data analysis

How to Use

Install Dependencies

Python version 3.10

pip install -r requirement.txt

Fill in Configuration Information

./config/config.yaml is the configuration file.

Database Configuration

Once connected, the system automatically reads the database structure; no extra configuration is needed.

mysql: mysql+pymysql://root:password@address:port/data_copilot
# mysql: mysql+pymysql://username:password@address:port/databasename
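Optionally, the connection string can be sanity-checked outside the project. Below is a minimal sketch (not part of the project code), assuming SQLAlchemy and pymysql are installed and the placeholders are replaced with real credentials:

from sqlalchemy import create_engine, inspect

# Use the same connection string as in ./config/config.yaml
engine = create_engine("mysql+pymysql://username:password@address:port/databasename")
print(inspect(engine).get_table_names())  # lists the tables the system will parse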

Large Language Model Configuration

If using the deepseek API (recommended)

llm:
  model_provider: openai #qwen #openai
  model: deepseek-chat
  url: "https://api.deepseek.com/v1/"
# https://api-docs.deepseek.com/

If using an OpenAI-format API (the example below fills in the OpenAI-compatible API for glm)

llm:
  model_provider: openai
  model: glm-4
  url: "https://open.bigmodel.cn/api/paas/v4/"
# glm-4
# https://open.bigmodel.cn

For local offline deployment, the relevant code is in ./llm_access/qwen_access.py
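For reference, here is a minimal sketch of loading a huggingface-format model locally with the transformers library. The model name, dtype, and generation settings are illustrative assumptions; the project's own loading logic is in ./llm_access/qwen_access.py.

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "Qwen/Qwen2-7B-Instruct"  # example; any local huggingface-format path works

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Hello"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))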

Obtain API Key

  • If using an openai-format API, an API key is required.
  • For the deepseek large language models, obtain the API key from the deepseek official website.
  • For the chatglm large language models, obtain the API key from the bigmodel official website.
  • Save the API key to llm_access/api_key_openai.txt.
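Once the key is saved, a quick connectivity check can be run outside the project. This is a minimal sketch using the openai Python client with the deepseek settings from the configuration above; it is not the project's own access code:

from openai import OpenAI

with open("llm_access/api_key_openai.txt") as f:
    api_key = f.read().strip()

client = OpenAI(api_key=api_key, base_url="https://api.deepseek.com/v1/")
resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)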

Run

main.py is the entry point of the project. Run this file to start the server.

python main.py

Interface Display

    1. Intelligently parse the structure of any database; the user inputs a natural language question
    2. The LLM intelligently generates SQL; the user checks and executes it
    3. If the query results are unsatisfactory, the SQL can be modified or regenerated
    4. The LLM intelligently generates the Python code for plotting
    5. Automatic plotting
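For illustration only, a conceptual sketch of the natural-language-to-SQL part of this flow. The question, schema string, prompt wording, and credentials below are assumptions, and the project's real implementation adds schema parsing, error handling, and chart generation:

import pandas as pd
from openai import OpenAI
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://username:password@address:port/databasename")
client = OpenAI(api_key="your-api-key", base_url="https://api.deepseek.com/v1/")

question = "How many orders were placed last month?"
schema = "orders(id, user_id, amount, created_at)"  # in practice, read from the database

prompt = (
    f"Database schema:\n{schema}\n\n"
    f"Write a MySQL query for: {question}\nReturn SQL only."
)
sql = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content

print(sql)                     # the user reviews, edits, or regenerates the SQL here
df = pd.read_sql(sql, engine)  # execute only after the user confirms
print(df)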

Open Source License

This translation is for reference only; the English version in the LICENSE file prevails.

MIT Open Source License

Copyright (c) 2025 bytesc

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and