GPTZoo

GPTZoo is a large-scale dataset designed to support academic research on GPTs. This repository contains 730,420 instances of GPTs, each with rich metadata, instructions, knowledge files, and information on third-party services used during its development.

To promote open research and innovation, the GPTZoo dataset will be updated continuously.

Overview

GPTZoo aims to provide researchers with a comprehensive resource to study the real-world applications, performance, and potential of GPTs. The dataset includes:

Metadata: 21 attributes describing each GPT instance.
Instructions: Detailed prompt instructions used to create each GPT instance.
Knowledge files: Supporting documents and files used during the development of each GPT instance.
Third-party services: Information on external services integrated with each GPT instance.

Due to copyright and ethical considerations, we provide only partial access to the instructions, knowledge files, and third-party services data. If you require full access for scientific research purposes, please fill out the Google Form. We will respond as soon as possible.

Getting Started

Prerequisites

Ensure you have Python (with pip) and Git installed; the installation steps below rely on both.

Installation

Clone the repository to your local machine:

git clone https://github.com/security-pride/GPTZoo.git
cd GPTZoo

Install the required Python packages:

pip install -r requirements.txt

Usage

Command-Line Help

The CLI supports keyword-based searching of the dataset. To use the CLI, navigate to the repository directory and run:

python gptzoo.py -help

Data Retrieval

Retrieve GPT instances based on specific criteria:

python gptzoo.py -search --tags "programming" "software guidance" --description "software development"
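To give a sense of what a search like this might do, here is a minimal Python sketch. It is not the code in automated_cli/data_retrieval.py. The record layout (a dict with a "tags" list and a "description" string), the function name search_records, and the matching rule (case-insensitive, every criterion must hold) are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List, Optional


def search_records(
    records: Iterable[Dict[str, Any]],
    tags: Optional[List[str]] = None,
    description: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Return records matching every given tag and the description keyword.

    Hypothetical helper: assumes each record has a "tags" list of strings
    and a "description" string. Matching is case-insensitive.
    """
    wanted_tags = [t.lower() for t in (tags or [])]
    keyword = description.lower() if description else None

    matches = []
    for record in records:
        record_tags = [str(t).lower() for t in record.get("tags") or []]
        record_desc = str(record.get("description") or "").lower()

        # Every requested tag must appear among the record's tags.
        if any(t not in record_tags for t in wanted_tags):
            continue
        # The description keyword must occur as a substring.
        if keyword is not None and keyword not in record_desc:
            continue
        matches.append(record)
    return matches


# Made-up sample records, for demonstration only.
sample = [
    {"name": "Code Helper", "tags": ["programming", "software guidance"],
     "description": "Assists with software development tasks."},
    {"name": "Recipe Bot", "tags": ["cooking"],
     "description": "Suggests recipes."},
]

hits = search_records(
    sample,
    tags=["programming", "software guidance"],
    description="software development",
)
print([r["name"] for r in hits])
```

Whether the real CLI combines criteria with AND or OR, or matches whole words rather than substrings, is not stated here, so check the output of the help command before relying on either behavior.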

Data Analysis

Analyze specific subsets of the dataset:

python gptzoo.py -analyze --name "Unknown" --chat_count
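One plausible reading of this command is "select the instances whose name is Unknown and summarize their chat counts". The Python sketch below illustrates that reading only; it is not the code in automated_cli/data_analysis.py. The field names "name" and "chat_count", the helper name summarize_chat_count, and the choice of summary statistics are assumptions.

```python
import statistics
from typing import Any, Dict, Iterable


def summarize_chat_count(
    records: Iterable[Dict[str, Any]], name: str
) -> Dict[str, Any]:
    """Summarize chat counts for records whose name equals `name`.

    Hypothetical helper: assumes each record is a dict with a "name"
    string and a numeric "chat_count". Records with a missing or
    non-numeric chat count are skipped.
    """
    counts = [
        r["chat_count"]
        for r in records
        if r.get("name") == name
        and isinstance(r.get("chat_count"), (int, float))
    ]
    if not counts:
        return {"count": 0}
    return {
        "count": len(counts),
        "total": sum(counts),
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "max": max(counts),
    }


# Made-up sample records, for demonstration only.
sample = [
    {"name": "Unknown", "chat_count": 10},
    {"name": "Unknown", "chat_count": 30},
    {"name": "Other", "chat_count": 5},
    {"name": "Unknown", "chat_count": None},  # skipped: no usable count
]

summary = summarize_chat_count(sample, "Unknown")
print(summary)
```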

Dataset Structure

The dataset is structured as follows:

GPTZoo
├── automated_cli/
│      ├── data_analysis.py
│      ├── data_retrieval.py
│      └── help.py
├── crawling/
│      ├── crawl_links.py
│      ├── crawl_metadata.py
│      ├── links.txt
│      └── try_gpt_links/
├── data_processing/
│      ├── deduplication.py
│      ├── standardization.py
│      └── statistical_analysis/
│            ├── chat_count/
│            │      ├── chat_count.xlsx
│            │      └── export_chat_count.py
│            ├── description/
│            │      ├── description.py
│            │      ├── description.txt
│            │      ├── wordcloud.pdf
│            │      └── wordcloud.py
│            ├── rating/
│            └── tags/
├── dataset/
│      ├── meta_info_0.json
│      ├── meta_info_1.json
│      ├── ...
│      ├── meta_info_41.json
│      └── meta_info_42.json
├── gptzoo.py
├── requirements.txt
└── result/
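The metadata is split across numbered files in dataset/ (meta_info_0.json through meta_info_42.json). As a minimal sketch, the Python below reads such shards in numeric order. The helper names shard_paths and iter_records are hypothetical, and the code assumes each shard holds either a JSON array of records or a single JSON object; the actual layout of the files is not documented here. The demo writes its own tiny shards to a temporary directory so that it runs without the real dataset.

```python
import json
import re
import tempfile
from pathlib import Path
from typing import Any, Iterator, List

# Matches file names such as meta_info_7.json and captures the index.
SHARD_NAME = re.compile(r"meta_info_(\d+)\.json")


def shard_paths(dataset_dir: str) -> List[Path]:
    """Return meta_info_<n>.json paths sorted by their numeric index."""
    indexed = []
    for path in Path(dataset_dir).glob("meta_info_*.json"):
        match = SHARD_NAME.fullmatch(path.name)
        if match:
            indexed.append((int(match.group(1)), path))
    # Sort numerically so that shard 2 comes before shard 10.
    indexed.sort(key=lambda pair: pair[0])
    return [path for _, path in indexed]


def iter_records(dataset_dir: str) -> Iterator[Any]:
    """Yield records from every shard, one shard at a time."""
    for path in shard_paths(dataset_dir):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Assumption: a shard is a JSON array of records or one object.
        if isinstance(data, list):
            yield from data
        else:
            yield data


# Demo with made-up shards in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for index in (10, 2, 0):
        Path(tmp, f"meta_info_{index}.json").write_text(
            json.dumps([{"shard": index}]), encoding="utf-8"
        )
    names = [p.name for p in shard_paths(tmp)]
    records = list(iter_records(tmp))

print(names)
print(len(records))
```

Reading one shard at a time keeps memory use bounded by the largest file rather than by the whole dataset. To try it on the real data, pass the path of the cloned dataset/ directory to iter_records.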

Contributing

We welcome contributions from the community. Please feel free to open an issue or submit a pull request.

Acknowledgement

We would like to acknowledge GPTs App and the OpenAI GPT Store as the sources of the data used in this project.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Citation

The collection of the GPTZoo dataset builds on related work by our research group. If you find GPTZoo useful, please consider citing our papers:

@article{zhao2024llm,
  title={LLM App Store Analysis: A Vision and Roadmap},
  author={Zhao, Yanjie and Hou, Xinyi and Wang, Shenao and Wang, Haoyu},
  journal={arXiv preprint arXiv:2404.12737},
  year={2024}
}

@article{su2024gpt,
  title={GPT Store Mining and Analysis},
  author={Su, Dongxun and Zhao, Yanjie and Hou, Xinyi and Wang, Shenao and Wang, Haoyu},
  journal={arXiv preprint arXiv:2405.10210},
  year={2024}
}

@article{hou2024gptzoo,
  title={GPTZoo: A Large-scale Dataset of GPTs for the Research Community},
  author={Hou, Xinyi and Zhao, Yanjie and Wang, Shenao and Wang, Haoyu},
  journal={arXiv preprint arXiv:2405.15630},
  year={2024}
}
