This is the repository for the text2GQL generator implementation. Awesome-Text2GQL aims to generate cyphers/GQLs and corresponding prompts as a training corpus for fine-tuning large language models (LLMs). Based on TuGraph-DB, the training corpus helps train Text2GQL and Text2Cypher models suited to the capabilities of the TuGraph-DB query engine.
On Linux, it is recommended to use miniconda to manage your Python environment, though other tools may also work.
conda create --name text2gql python=3.10
conda activate text2gql
git clone https://github.com/TuGraph-family/Awesome-Text2GQL
cd Awesome-Text2GQL
mkdir output
Install the required Python dependency packages:
pip install .
To run the LLM-based question generation and generalization functions, apply for an API-KEY before running the whole flow.
- Apply for an API-KEY
We build the corpus generalization module on the Qwen inference service hosted by Aliyun; refer to the Aliyun documentation to apply for an API-KEY.
- Set API-KEY via environment variables (recommended)
# replace YOUR_DASHSCOPE_API_KEY with your API-KEY
echo "export DASHSCOPE_API_KEY='YOUR_DASHSCOPE_API_KEY'" >> ~/.bashrc
source ~/.bashrc
echo $DASHSCOPE_API_KEY
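The generalization module reads the key from the environment at runtime. A minimal sketch for checking that the key is visible from Python (the helper name `check_api_key` is illustrative, not part of this repo):

```python
import os

def check_api_key(env=os.environ):
    """Return True if DASHSCOPE_API_KEY is present and non-empty."""
    return bool(env.get("DASHSCOPE_API_KEY"))

if __name__ == "__main__":
    if check_api_key():
        print("DASHSCOPE_API_KEY is set")
    else:
        print("DASHSCOPE_API_KEY is not set; revisit the steps above")
```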
Make sure you have completed the preparations above. To experience the recommended whole flow, run:
sh ./scripts/run_the_whole_flow.sh
The following steps will be executed in sequence:
- generate cyphers with the generation module (based on Antlr4), using templates as input.
- generate questions with the LLM-based generalization module, using the cyphers generated in the previous step as input.
- generalize the questions generated in the previous step with the LLM-based generalization module.
- transform the corpus generated above into the model training format.
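Each step above pairs a generated query with natural-language questions. A minimal sketch of what one corpus record might look like (the field names and layout here are hypothetical; the actual training format is produced by the transformation step):

```python
import json

# Hypothetical record layout; the real field names used by the
# transformation step may differ.
record = {
    "db_id": "movie",
    "question": "Which movies did Tom Hanks act in?",
    "query": "MATCH (p:person {name: 'Tom Hanks'})-[:acted_in]->(m:movie) RETURN m.title",
}
print(json.dumps(record, ensure_ascii=False))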
sh ./scripts/gen_query.sh
The corpus generation module can run in two modes: generating queries with the instantiator, or generating questions with the translator.
Set GEN_QUERY=true
to generate queries from templates in batch.
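Instantiation can be pictured as filling a query template's placeholders with concrete schema values. A minimal sketch (the template syntax and `instantiate` helper are illustrative; the real instantiator is Antlr4-based and more involved):

```python
# Minimal illustration of template instantiation: substitute placeholder
# tokens in a Cypher template with concrete labels and property values.
def instantiate(template: str, bindings: dict) -> str:
    query = template
    for placeholder, value in bindings.items():
        query = query.replace(f"${{{placeholder}}}", value)
    return query

template = "MATCH (n:${label}) WHERE n.${prop} = '${value}' RETURN n"
print(instantiate(template, {"label": "person", "prop": "name", "value": "Alice"}))
```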
- generate questions based on LLMs, with templates as additional input (recommended)
sh ./scripts/gen_question_with_template_llm.sh
- generate questions based on LLMs without templates as input. This helps generate questions that initially have no corresponding template.
sh ./scripts/gen_question_directly_llm.sh
- generate questions based on Antlr4 (deprecated)
Set GEN_QUERY=false
to generate questions using the generation module's Antlr4-based translator.
sh ./scripts/gen_question.sh
- generalize the corpus with query and question as input (recommended)
sh ./scripts/generalize_llm.sh
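Generalization asks the LLM to paraphrase a (query, question) pair into additional phrasings. The prompt might be assembled along these lines (the wording and `build_generalize_prompt` helper are illustrative, not the module's actual prompt):

```python
def build_generalize_prompt(query: str, question: str, n: int = 3) -> str:
    # Illustrative prompt only; the module's real prompt differs.
    return (
        f"Given the graph query:\n{query}\n"
        f"and the question it answers:\n{question}\n"
        f"Rewrite the question in {n} different natural phrasings "
        "without changing its meaning."
    )

print(build_generalize_prompt(
    "MATCH (n:person) RETURN n.name",
    "List the names of all persons.",
))
```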
- generalize questions without queries as input (deprecated)
sh ./scripts/general_questions_directly_llm.sh
Transform the corpus generated above into the model training format:
sh ./scripts/generate_dataset.sh
This project is still under development; suggestions and issues are welcome.