- 3/18/2024: CoPilot is available now in Alpha (v0.0.1). It uses a Large Language Model (LLM) to convert your question into a function call, which is then executed on the graph in TigerGraph. We would love to hear your feedback to keep improving it so that it can bring more value to you. If you are trying it out, it would be helpful if you could fill out this sign-up form so we can keep track of it (no spam, we promise). And if you would just like to provide feedback, please feel free to fill out this short survey after you have played with CoPilot. Thank you for your interest and support!
TigerGraph CoPilot is a natural language query service that allows users to ask questions about their graph data in plain English. The service uses a Large Language Model (LLM) to convert the user's question into a function call, which is then executed on the graph database. The service is designed to be easily extensible, allowing for the addition of new LLM providers and graph schemas. TigerGraph CoPilot consists of three components: InquiryAI (available now), SupportAI (available Q2 2024), and QueryAI (available Q4 2024).
InquiryAI is the first component of TigerGraph CoPilot. It is a natural language query service that allows users to ask questions about their graph data in plain English. The service uses a Large Language Model (LLM) to convert the user's question into a function call, which is then executed on the graph database. The service is designed to be easily extensible, allowing for the addition of new LLM providers and LangChain tools. InquiryAI is available now in alpha.
InquiryAI leverages a Large Language Model (LLM) to convert a user's question into a function call, which is then executed on the graph database. To do this, InquiryAI uses an AI agent that has two tools available to it: `MapQuestionToSchema` and `GenerateFunction`.
The `MapQuestionToSchema` tool maps the user's question to the graph schema.
The `GenerateFunction` tool generates a function call based on the user's question and the graph schema. The function is then executed on the graph database, and the result is processed by the LLM service and returned to the user. `GenerateFunction` utilizes a vector RAG (Retrieval-Augmented Generation) process to suggest the functions most relevant to the user's question. Currently, the vector store used by `GenerateFunction` is ephemeral, meaning that the available queries will not be persisted when the container is shut off. If the container dies, the queries you want available to the CoPilot service will have to be re-registered.
InquiryAI is currently in alpha and is being actively developed. The roadmap for InquiryAI is as follows:
- February 2024: InquiryAI alpha is released, with support for OpenAI, GCP, and Azure hosted models. The service is packaged as a Docker container and is available via GitHub. To store the available queries and function calls, an ephemeral vector store (FAISS) is used, which means the available queries will not be persisted when the container is shut off. Support for LangChain and pyTigerGraph interaction with the CoPilot service is also available.
- March 2024: InquiryAI will support the ability to connect with AWS hosted LLMs. LlamaIndex integration will begin.
- April 2024: InquiryAI will move into beta stage. A persistent vector store will be implemented, allowing for the storage of available queries and function calls.
- May 2024 +: InquiryAI will support continuous conversations, with the agent aware of previous questions and answers. Integration with common open-source models such as Llama and Mistral will be performed. InquiryAI will be available on TigerGraph Cloud by July 2024.
SupportAI is the second component of TigerGraph CoPilot. It is designed to ingest a set of documents, extract a knowledge graph from the information, and enable hybrid search of the documents and graph data through natural language queries. This functionality will enrich RAG (Retrieval-Augmented Generation) pipelines with graph data, enabling more accurate and informative responses to user queries. SupportAI is under active development and will be publicly released in alpha in Q2 2024. If you are interested in evaluating it at its current stage, please contact us at [email protected].
QueryAI is the third component of TigerGraph CoPilot. It is designed to be used as a developer tool to help generate graph queries in GSQL from an English language description. This will enable developers to write GSQL queries more quickly and accurately, and will be especially useful for those who are new to GSQL. QueryAI will be available in alpha in Q4 2024.
- Copy the docker-compose file or clone the repo
  - compose file: docker-compose.yml
  - clone: `git clone https://github.com/tigergraph/Copilot`
- Set up the config files: `touch configs/db_config.json configs/llm_config.json configs/milvus_config.json`
  - LLM config
  - TigerGraph config
  - Milvus config (optional)
  - Logging config (optional)
- Run `docker compose up`
- Use CoPilot!
To configure the logging level of the service, edit the CoPilot service's `LOGLEVEL` environment variable in the docker-compose file. By default, the logging level is set to `"INFO"`.
This variable can be changed to any of the logging levels described below (an example follows the list):
- CRITICAL: A serious error
- ERROR: Failing to perform functions
- WARNING: Indication of unexpected problems, e.g. failure to map a user's question to the graph schema
- INFO: Confirming that the service is performing as expected
- DEBUG: Detailed information, e.g. the functions retrieved during the `GenerateFunction` step
- DEBUG_PII: Finer-grained information that could potentially include PII, such as a user's question, the complete function call (with parameters), and the LLM's natural language response.
- NOTSET: All messages are processed
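For example, to raise the logging level to DEBUG when starting the service with docker run (a sketch based on the run command shown later in this guide; if you use docker compose, set the same `LOGLEVEL` variable in the CoPilot service's environment section instead):
docker run -d -v $(pwd)/configs/llm_config.json:/llm_config.json -v $(pwd)/configs/db_config.json:/db_config.json -e LOGLEVEL=DEBUG --name copilot -p 80:80 tigergraphml/copilot:latest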
In the `configs/llm_config.json` file, copy your provider's JSON config template below, and fill out the appropriate fields.
In addition to the `OPENAI_API_KEY`, `llm_model` and `model_name` can be edited to match your specific configuration details.
{
"model_name": "GPT-4",
"embedding_service": {
"embedding_model_service": "openai",
"authentication_configuration": {
"OPENAI_API_KEY": "YOUR_OPENAI_API_KEY_HERE"
}
},
"completion_service": {
"llm_service": "openai",
"llm_model": "gpt-4-0613",
"authentication_configuration": {
"OPENAI_API_KEY": "YOUR_OPENAI_API_KEY_HERE"
},
"model_kwargs": {
"temperature": 0
},
"prompt_path": "./app/prompts/openai_gpt4/"
}
}
Follow the GCP authentication information found here: https://cloud.google.com/docs/authentication/application-default-credentials#GAC and create a Service Account with VertexAI credentials. Then add the following to the docker run command:
-v $(pwd)/configs/SERVICE_ACCOUNT_CREDS.json:/SERVICE_ACCOUNT_CREDS.json -e GOOGLE_APPLICATION_CREDENTIALS=/SERVICE_ACCOUNT_CREDS.json
Your JSON config should then be as follows:
{
"model_name": "GCP-text-bison",
"embedding_service": {
"embedding_model_service": "vertexai",
"authentication_configuration": {}
},
"completion_service": {
"llm_service": "vertexai",
"llm_model": "text-bison",
"model_kwargs": {
"temperature": 0
},
"prompt_path": "./app/prompts/gcp_vertexai_palm/"
}
}
In addition to the `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, and `azure_deployment`, `llm_model` and `model_name` can be edited to match your specific configuration details.
{
"model_name": "GPT35Turbo",
"embedding_service": {
"embedding_model_service": "azure",
"azure_deployment":"YOUR_EMBEDDING_DEPLOYMENT_HERE",
"authentication_configuration": {
"OPENAI_API_TYPE": "azure",
"OPENAI_API_VERSION": "2022-12-01",
"AZURE_OPENAI_ENDPOINT": "YOUR_AZURE_ENDPOINT_HERE",
"AZURE_OPENAI_API_KEY": "YOUR_AZURE_API_KEY_HERE"
}
},
"completion_service": {
"llm_service": "azure",
"azure_deployment": "YOUR_COMPLETION_DEPLOYMENT_HERE",
"openai_api_version": "2023-07-01-preview",
"llm_model": "gpt-35-turbo-instruct",
"authentication_configuration": {
"OPENAI_API_TYPE": "azure",
"AZURE_OPENAI_ENDPOINT": "YOUR_AZURE_ENDPOINT_HERE",
"AZURE_OPENAI_API_KEY": "YOUR_AZURE_API_KEY_HERE"
},
"model_kwargs": {
"temperature": 0
},
"prompt_path": "./app/prompts/azure_open_ai_gpt35_turbo_instruct/"
}
}
"model_name": "Claude-3-haiku",
"embedding_service": {
"embedding_model_service": "bedrock",
"embedding_model":"amazon.titan-embed-text-v1",
"authentication_configuration": {
"AWS_ACCESS_KEY_ID": "ACCESS_KEY",
"AWS_SECRET_ACCESS_KEY": "SECRET"
}
},
"completion_service": {
"llm_service": "bedrock",
"llm_model": "anthropic.claude-3-haiku-20240307-v1:0",
"authentication_configuration": {
"AWS_ACCESS_KEY_ID": "ACCESS_KEY",
"AWS_SECRET_ACCESS_KEY": "SECRET"
},
"model_kwargs": {
"temperature": 0,
},
"prompt_path": "./app/prompts/aws_bedrock_claude3haiku/"
}
}
Copy the below into `configs/db_config.json` and edit the `hostname` and `getToken` fields to match your database's configuration. Set the timeout, memory threshold, and thread limit parameters as desired to control how much of the database's resources are consumed when answering a question.
If you are running TigerGraph outside of docker compose, change the hostname to match its address (`http://localhost`, `https://your-TgCloud-hostname`). Once authentication is enabled in TigerGraph, set `getToken` to `true`.
You can also disable the consistency_checker, which reconciles Milvus and TigerGraph data, within this config. It is `true` by default.
{
"hostname": "http://tigergraph",
"getToken": false,
"default_timeout": 300,
"default_mem_threshold": 5000,
"default_thread_limit": 8,
"enable_consistency_checker": true
}
Note: Milvus is not necessary for deployment or development currently. Copy the below into `configs/milvus_config.json` and edit the `host` and `port` fields to match your Milvus configuration (keeping in mind docker configuration). `username` and `password` can also be configured below if required by your Milvus setup. `enabled: "true"` means Milvus is used as the embedding store; `enabled: "false"` means FAISS is used instead. (The `enabled` flag will be removed eventually.)
{
"host": "milvus-standalone",
"port": 19530,
"username": "",
"password": "",
"enabled": "true"
}
Copy the below into `configs/log_config.json` and edit the appropriate values to suit your needs. Log rotation is based on size, and backups are kept. These configurations are applied in the LogWriter to the standard Python logging package. Operational and audit logs are recorded. Outputs include log.ERROR, log.INFO, and log.AUDIT-COPILOT.
{
"log_file_path": "logs",
"log_max_size": 10485760,
"log_backup_count": 10
}
To run the CoPilot Docker image directly (instead of using docker compose):
docker run -d -v $(pwd)/configs/llm_config.json:/llm_config.json -v $(pwd)/configs/db_config.json:/db_config.json --name copilot -p 80:80 tigergraphml/copilot:latest
There are five ways to interact with the TigerGraph CoPilot service: through the Swagger documentation page, the chat interface, the pyTigerGraph client, the LangChain client, or the REST API.
There are two options to authenticate with the service. The first is a username/password pair generated from the TigerGraph database. The second is a GSQL secret, also obtained from the database. However, when using the GSQL secret, the username field must be specified as `__GSQL__secret`, with the password field containing the secret. If pyTigerGraph is being used and a connection is created with the `gsqlSecret` parameter, this will already be done for you.
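For example, a minimal sketch of both authentication options with pyTigerGraph (the host, graph name, and credential values are placeholders):
from pyTigerGraph import TigerGraphConnection

# Option 1: username/password pair generated from the TigerGraph database
conn = TigerGraphConnection(
    host="DATABASE_HOST_HERE",
    graphname="GRAPH_NAME_HERE",
    username="USERNAME_HERE",
    password="PASSWORD_HERE"
)

# Option 2: GSQL secret; pyTigerGraph fills in the __GSQL__secret username for you
conn = TigerGraphConnection(
    host="DATABASE_HOST_HERE",
    graphname="GRAPH_NAME_HERE",
    gsqlSecret="YOUR_GSQL_SECRET_HERE"
)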
To access the Swagger documentation page, navigate to `http://localhost/docs` when the Docker container is running.
To use the chat interface, navigate to `http://localhost/graphname/chat` when the Docker container is running, where graphname is the name of the graph you want to query.
First, update pyTigerGraph to utilize the latest build:
pip install -U git+https://github.com/tigergraph/pyTigerGraph.git
Then, the endpoints are available when configured with a `TigerGraphConnection`:
from pyTigerGraph import TigerGraphConnection
# create a connection to the database
conn = TigerGraphConnection(host="DATABASE_HOST_HERE", graphname="GRAPH_NAME_HERE", username="USERNAME_HERE", password="PASSWORD_HERE")
### ==== CONFIGURE INQUIRYAI HOST ====
conn.ai.configureInquiryAIHost("INQUIRYAI_HOST_HERE")
### ==== RETRIEVE TOP-K DOCS FROM LIBRARY ====
# `top_k` parameter optional
conn.ai.retrieveDocs("How many papers are there?", top_k = 5)
### ==== RUN A NATURAL LANGUAGE QUERY ====
print(conn.ai.query("How many papers are there?"))
# prints: {'natural_language_response': 'There are 736389 papers.', 'answered_question': True, 'query_sources': {'function_call': "getVertexCount('Paper')", 'result': 736389}}
### ==== REGISTER A CUSTOM QUERY ====
# Prompt for PageRank query - could be read in as JSON file.
pr_prompt = {
"function_header": "tg_pagerank",
"description": "Determines the importance or influence of each vertex based on its connections to other vertices.",
"docstring": "The PageRank algorithm measures the influence of each vertex on every other vertex. PageRank influence is defined recursively: a vertex’s influence is based on the influence of the vertices which refer to it. A vertex’s influence tends to increase if either of these conditions are met:\n* It has more referring vertices\n* Its referring vertices have higher influence\nTo run this algorithm, use `runInstalledQuery('tg_pagerank', params={'v_type': 'INSERT_V_TYPE_HERE', 'e_type': 'INSERT_E_TYPE_HERE', 'top_k': INSERT_TOP_K_HERE})`, where the parameters are:\n* 'v_type': The vertex type to run the algorithm on.\n* 'e_type': The edge type to run the algorithm on.\n* 'top_k': The number of top scoring vertices to return to the user.",
"param_types": {
"v_type": "str",
"e_type": "str",
"top_k": "int"
}
}
# Register Query
conn.ai.registerCustomQuery(pr_prompt["function_header"], pr_prompt["description"], pr_prompt["docstring"], pr_prompt["param_types"])
# Run Query
print(conn.ai.query("What are the 5 most influential papers by citations?"))
# prints: {'natural_language_response': 'The top 5 most cited papers are:\n\n1. [Title of paper with Vertex_ID 428523]\n2. [Title of paper with Vertex_ID 384889]\n3. [Title of paper with Vertex_ID 377502]\n4. [Title of paper with Vertex_ID 61855]\n5. [Title of paper with Vertex_ID 416200]', 'answered_question': True, 'query_sources': {'function_call': "runInstalledQuery('tg_pagerank', params={'v_type': 'Paper', 'e_type': 'CITES', 'top_k': 5})", 'result': [{'@@top_scores_heap': [{'Vertex_ID': '428523', 'score': 392.8731}, {'Vertex_ID': '384889', 'score': 251.8021}, {'Vertex_ID': '377502', 'score': 149.1018}, {'Vertex_ID': '61855', 'score': 129.7406}, {'Vertex_ID': '416200', 'score': 129.2286}]}]}}
To use LangChain with InquiryAI, first install the LangChain fork here in your Python environment:
pip install git+https://github.com/langchain-ai/langchain.git
Then, you can get answers from the graph with the below:
import pyTigerGraph as tg
conn = tg.TigerGraphConnection(host="DATABASE_HOST_HERE", graphname="GRAPH_NAME_HERE", username="USERNAME_HERE", password="PASSWORD_HERE")
### ==== CONFIGURE INQUIRYAI HOST ====
conn.ai.configureInquiryAIHost("INQUIRYAI_HOST_HERE")
from langchain_community.graphs import TigerGraph
graph = TigerGraph(conn)
result = graph.query("How many servers are there?")
print(result)
# {'natural_language_response': 'There are 46148 servers.',
# 'answered_question': True,
# 'query_sources': {'function_call': 'getVertexCount(vertexType="BareMetalNode")',
# 'result': 46148}}
The REST API can be used to interact with the service. The endpoints can be found on the Swagger documentation page.
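For example, a minimal sketch of a natural language query over HTTP with Python's requests library (the /{graph_name}/query path and request body shown here are assumptions; confirm the exact routes and payloads on the Swagger documentation page):
import requests

COPILOT_HOST = "http://localhost"  # where the CoPilot container is exposed
GRAPH_NAME = "GRAPH_NAME_HERE"

# Hypothetical route and payload; verify against the Swagger documentation page
response = requests.post(
    f"{COPILOT_HOST}/{GRAPH_NAME}/query",
    json={"query": "How many papers are there?"},
    auth=("USERNAME_HERE", "PASSWORD_HERE")  # or ("__GSQL__secret", "YOUR_GSQL_SECRET_HERE")
)
print(response.json())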
If you wish to deploy TigerGraph CoPilot and disable the Swagger documentation page, you can do so by setting the `PRODUCTION` environment variable to `true` in the Docker run command.
docker run -d -v $(pwd)/configs/llm_config.json:/llm_config.json -v $(pwd)/configs/db_config.json:/db_config.json -e PRODUCTION=true --name copilot -p 80:80 tigergraphml/copilot:latest
TigerGraph CoPilot is designed to be easily extensible. The service can be configured to use different LLM providers, different graph schemas, and different LangChain tools, and it can also be extended to use different embedding services and different LLM generation services. For more information on how to extend the service, see the Developer Guide.
If you want to use Weights and Biases, your API key needs to be set in an environment variable on the host machine.
export WANDB_API_KEY=KEY_HERE
Make sure that all your LLM service provider configuration files are working properly. The configs will be mounted for the container to access.
docker build -f Dockerfile.tests -t copilot-tests:0.1 .
docker run -d -v $(pwd)/configs/:/ -e GOOGLE_APPLICATION_CREDENTIALS=/GOOGLE_SERVICE_ACCOUNT_CREDS.json -e WANDB_API_KEY=$WANDB_API_KEY -it --name copilot-tests copilot-tests:0.1
docker exec copilot-tests bash -c "conda run --no-capture-output -n py39 ./run_tests.sh all all"
docker-compose -f ./docker-compose.yml up -d --build
To edit what tests are executed, one can pass arguments to the `./run_tests.sh` script. Currently, one can configure what LLM service to use (defaults to all), what schemas to test against (defaults to all), and whether or not to use Weights and Biases for logging (defaults to true). Instructions for the options are found below:
The first parameter to `run_tests.sh` is what LLMs to test against. Defaults to `all`. The options are:
- `all` - run tests against all LLMs
- `azure_gpt35` - run tests against GPT-3.5 hosted on Azure
- `openai_gpt35` - run tests against GPT-3.5 hosted on OpenAI
- `openai_gpt4` - run tests on GPT-4 hosted on OpenAI
- `gcp_textbison` - run tests on text-bison hosted on GCP
The second parameter to `run_tests.sh` is what graphs to test against. Defaults to `all`. The options are:
- `all` - run tests against all available graphs
- `OGB_MAG` - the academic paper dataset provided by https://ogb.stanford.edu/docs/nodeprop/#ogbn-mag
- `DigtialInfra` - digital infrastructure digital twin dataset
- `Synthea` - synthetic health dataset
If you wish to log the test results to Weights and Biases (and have the correct credentials set up above), the final parameter to `run_tests.sh` defaults to true. If you wish to disable Weights and Biases logging, use `false`.
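For example, to run the tests against GPT-3.5 hosted on Azure, using only the OGB_MAG graph, with Weights and Biases logging disabled, mirror the docker exec command above:
docker exec copilot-tests bash -c "conda run --no-capture-output -n py39 ./run_tests.sh azure_gpt35 OGB_MAG false"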
To add a new InquiryAI test suite to TigerGraph CoPilot, read the documentation here.
If you would like to contribute to TigerGraph CoPilot, please read the documentation here.