Prime Intellect

---

# GENESYS: Reasoning Data Generation & Verification

| Blog | X Thread | SYNTHETIC-1 Dashboard |

---
Genesys is a library for synthetic reasoning data generation and verification, used to generate [SYNTHETIC-1]().

The library has two main entrypoints:
- `src/genesys/generate.py` samples responses to tasks from a given dataset using a teacher model.
- `src/genesys/verify.py` verifies responses and assigns rewards using verifiers.

# Usage

## Installation

**Quick Install:** Run the following command:

```
curl -sSL https://raw.githubusercontent.com/PrimeIntellect-ai/genesys/main/scripts/install/install.sh | bash
```

## Data Generation

To check that your installation succeeded, you can run one of the following commands to generate data with a small model:

```
# with a config file; see /configs for all configurations
uv run python src/genesys/generate.py @ configs/debug.toml

# otherwise, with --flags
uv run python src/genesys/generate.py \
    --name_model "Qwen/Qwen2.5-Coder-0.5B" \
    --num_gpus 1 \
    --sample_per_file 8 \
    --temperature 0.6 \
    --max_tokens 16384 \
    --data.max_samples 16 \
    --data.batch_size 8 \
    --data.path "PrimeIntellect/verifiable-math" # follow the schema of "PrimeIntellect/verifiable-math", "PrimeIntellect/verifiable-coding", etc.
```

Your file with responses will be saved to `/output`.

**Run with Docker:** You can also generate data using the Docker image:

```
sudo docker run --gpus all -it primeintellect/genesys:latest uv run python src/genesys/generate.py @ configs/debug.toml
```

## Verification

To verify model responses, use the `src/genesys/verify.py` script on the output file that `src/genesys/generate.py` wrote to `/output`:

```
uv run python src/genesys/verify.py --file <path_to_output_file> # the output file is usually at /output/out_<...>.jsonl
```

The verification loop runs asynchronously to parallelize verification and speed up processing.

## Adding your own Tasks & Verifiers

Genesys is built to be easily extendable with your own tasks and verifiers. You can generate responses for your own data by using a Hugging Face dataset that follows our schema, and you can add your own verifier with minimal code.

### Using your own Data

To generate data for your own tasks with `src/genesys/generate.py`, pass a Hugging Face dataset with the same schema as `PrimeIntellect/verifiable-math` and the other verifiable datasets. This is what the schema looks like:

```python
class Task(BaseModel):
    problem_id: str
    source: str  # source of the dataset
    task_type: str  # used to map the response to a verifier
    in_source_id: Optional[str]
    prompt: str
    gold_standard_solution: Optional[str]
    verification_info: Dict  # empty dict if no extra data is needed
    metadata: Dict  # empty dict if no extra data is needed
```
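As a rough sketch of what such a dataset might look like in practice, the snippet below builds a single row with the `datasets` library and pushes it to the Hugging Face Hub. The field values, the `length_adherance` task type (referring to the custom length verifier added further down), and the repository name are illustrative assumptions, not part of Genesys:

```python
# Sketch only: a tiny dataset that follows the Task schema above.
# The row values, task_type and repository name are illustrative assumptions.
from datasets import Dataset

rows = [
    {
        "problem_id": "custom_0001",
        "source": "my_custom_tasks",
        "task_type": "length_adherance",  # must match a key in the verifier registry (see below)
        "in_source_id": None,
        "prompt": "Explain the Pythagorean theorem and prove it in two different ways.",
        "gold_standard_solution": None,
        "verification_info": {"length_threshold_small": 500, "length_threshold_large": 2000},
        "metadata": {},
    },
]

Dataset.from_list(rows).push_to_hub("your-username/my-verifiable-tasks")
# afterwards, point generate.py at it: --data.path "your-username/my-verifiable-tasks"
```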
The output from the generation script is a `.jsonl` file in which each line contains a `Response` object:

```python
class Response(BaseModel):
    problem_id: str
    source: str
    task_type: str
    in_source_id: Optional[str]
    prompt: str
    gold_standard_solution: Optional[str]
    verification_info: Dict
    metadata: Dict
    llm_response: str  # raw LLM response string
    response_id: str
    model_name: str
    generation_config: Dict  # sampling parameters
```
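Since each line of the output file is one serialized `Response`, post-processing it only takes a few lines of standard-library Python. The file name below is a placeholder for whatever `generate.py` wrote to `/output`:

```python
# Sketch: inspect generated responses; the file name is a placeholder.
import json

with open("output/out_example.jsonl") as f:
    responses = [json.loads(line) for line in f]

for r in responses[:3]:
    print(r["problem_id"], r["task_type"], r["model_name"])
    print(r["llm_response"][:200])  # first 200 characters of the model output
    print("-" * 40)
```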
### Adding a Verifier

To implement a verifier, you have to 1) add a verifier class that implements a `verify` function receiving a `Response` object and 2) register that class in the verifier registry.

You can implement your own verifier in `src/genesys/verifiers`:

```python
from genesys.schemas import Response
from genesys.verifiers.base_verifier import BaseVerifier


class LengthVerifier(BaseVerifier):
    max_parallel = 30  # maximum number of verification tasks run in parallel - relevant for LLM calls with rate limits
    timeout = 10  # timeout in seconds - when running LLM-generated code, we want to avoid getting stuck in an infinite loop

    # optional: an __init__ function for setup (e.g. an LLM API client)

    def verify(self, result: Response):
        """
        Required: this example verify function checks the length of the LLM response
        and rewards responses that exceed the thresholds specified in the dataset.

        The output should be a dict with a score from 0-1 and a 'verification_result_info'
        dict containing metadata.
        """
        response = result["llm_response"]
        threshold_small = result["verification_info"]["length_threshold_small"]
        threshold_large = result["verification_info"]["length_threshold_large"]

        if len(response) > threshold_large:
            score = 1.0
        elif len(response) > threshold_small:
            score = 0.5
        else:
            score = 0.0

        return dict(score=score, verification_result_info={})  # add metadata to verification_result_info if needed

    # optional: a shutdown() function for cleanup, called after all responses are verified.
    # For instance, the code verifier uses it to shut down its Docker containers.
```

Now, add your verifier to the verifier registry in `src/genesys/verifiers/registry.py`:

```python
from genesys.verifiers.code_test_verifier import CodeVerifier
from genesys.verifiers.math_verifier import MathVerifier
from genesys.verifiers.llm_judge_verifier import LlmJudgeVerifier
from genesys.verifiers.code_output_prediction_verifier import CodeUnderstandingVerifier
from genesys.verifiers.length_verifier import LengthVerifier  # your verifier

VERIFIER_REGISTRY = {
    "verifiable_code": CodeVerifier,
    "verifiable_math": MathVerifier,
    "llm_judgeable_groundtruth_similarity": LlmJudgeVerifier,
    "verifiable_code_understanding": CodeUnderstandingVerifier,
    "length_adherance": LengthVerifier,
}
```

Every task from your dataset with `"task_type": "length_adherance"` will now be verified with this length verifier when running `src/genesys/verify.py`.
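
As a final sanity check before running `src/genesys/verify.py`, you can resolve the new task type through the registry and call `verify` directly on a hand-built response dict. This is only a sketch: it assumes the registry edit above is in place and that the verifier can be constructed without arguments:

```python
# Sanity-check sketch; assumes the registry edit above and an argument-free constructor.
from genesys.verifiers.registry import VERIFIER_REGISTRY

verifier = VERIFIER_REGISTRY["length_adherance"]()

fake_response = {
    "llm_response": "x" * 600,  # a 600-character response
    "verification_info": {"length_threshold_small": 500, "length_threshold_large": 2000},
}

result = verifier.verify(fake_response)
print(result["score"])  # 0.5: above the small threshold, below the large one
```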