Large Dataset Generator

This Python script generates a large dataset and saves it to a CSV file in chunks. It is designed to handle the creation of datasets with potentially millions of records without overwhelming memory resources.

Description

The script uses the Pandas and NumPy libraries to generate a dataset with specified features and labels. It saves the data in chunks to ensure efficiency and manageability for large datasets.

Features

Generate a customizable number of records.
Save the dataset in chunks for efficient memory management.
Automatically label data for potential use in machine learning training.

Requirements

Python 3.x
Pandas
NumPy

Install the required packages using pip:

pip install pandas numpy

Usage
To run the script and generate data, you can specify the number of records you want to generate. 
For example, to generate 1,000,000 records:

python <script_name>.py
You can adjust the num_records parameter in the script to change the number of records generated.

Configuration
num_records: Total number of records to generate (default is 1,000,000).
chunk_size: Number of records per chunk (default is 100,000).
Output
The script outputs a CSV file named large_dataset.csv containing the following columns:

ID
Feature1
Feature2
Feature3
Label

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README.md		README.md
dataset.py		dataset.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Large Dataset Generator

Description

Features

Requirements

About

Releases

Packages

Languages

54J4N/Dataset_cvs_Creations

Folders and files

Latest commit

History

Repository files navigation

Large Dataset Generator

Description

Features

Requirements

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages