A dockerized healthcare data generator based on Synthea
- Git clone this repository into your preferred directory.
- Run `make setup` to download the necessary dependencies.
- If needed, modify the properties for synthetic data generation in `app/synthea.properties`.
- Run `make build` to build the Docker image.
- Run `make generate p=20` to generate data for 20 synthetic patients; `p` sets the number of patients to generate.
- Otherwise, run `make generate` to generate data for the default of 10 synthetic patients.
- Generated patient data is stored in the `/data` folder.
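If you want to script several runs, the `make generate` step can be wrapped in a few lines of Python. This is a minimal sketch, assuming `make` is on your `PATH`, the script runs from the repository root, and `p` is passed to the Makefile exactly as in the commands above. The helper names `generate_command` and `run_generate` are hypothetical and not part of this repository.

```python
import subprocess
from typing import List, Optional


def generate_command(p: Optional[int] = None) -> List[str]:
    """Build the `make generate` invocation, optionally adding p=<count>.

    Hypothetical helper: with p=None the Makefile's default (10 patients) applies.
    """
    cmd = ["make", "generate"]
    if p is not None:
        if p < 1:
            raise ValueError("p must be a positive integer")
        cmd.append(f"p={p}")
    return cmd


def run_generate(p: Optional[int] = None) -> None:
    """Run the make target; raises CalledProcessError if make exits non-zero."""
    subprocess.run(generate_command(p), check=True)
```

Calling `run_generate(20)` from the repository root is equivalent to running `make generate p=20`, and `run_generate()` matches plain `make generate`.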
- Pull the image with `docker pull jackleejm/healthcare-data-gen:0.1.0`.
- Create a temporary folder called `data/`.
- Run `make generate` or `make generate p=20` to generate synthetic patient data.
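After a run, a quick sanity check is to count the rows in `data/csv/patients.csv`. This is a minimal sketch, assuming the default CSV export and a `data/` folder in the current working directory; `count_rows` and `count_patients` are hypothetical helper names. Synthea may also export patients who died during the simulation, so the count can be greater than `p`.

```python
import csv
from pathlib import Path
from typing import Iterable


def count_rows(lines: Iterable[str]) -> int:
    """Count non-empty data rows in CSV text, excluding the header row."""
    reader = csv.reader(lines)  # handles quoted fields containing commas
    next(reader, None)  # skip the header row, if there is one
    return sum(1 for row in reader if row)


def count_patients(data_dir: str = "data") -> int:
    """Count patient rows in <data_dir>/csv/patients.csv."""
    path = Path(data_dir) / "csv" / "patients.csv"
    with path.open(newline="", encoding="utf-8") as f:
        return count_rows(f)
```

For example, after `make generate p=20`, `count_patients()` should return at least 20.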
- With the standard, unchanged `synthea.properties` config file, the generated data should have the following folder structure.
- For more details on the specifics of the CSVs, see the official Synthea wiki.
data/
├── csv/
│   ├── allergies.csv
│   ├── careplans.csv
│   ├── claims.csv
│   ├── claims_transactions.csv
│   ├── conditions.csv
│   ├── devices.csv
│   ├── encounters.csv
│   ├── imaging_studies.csv
│   ├── immunizations.csv
│   ├── medications.csv
│   ├── observations.csv
│   ├── organizations.csv
│   ├── patients.csv
│   ├── payer_transitions.csv
│   ├── payers.csv
│   ├── procedures.csv
│   ├── providers.csv
│   └── supplies.csv
├── fhir/
│   ├── hospitalInformation1717732754594.json
│   └── practitionerInformation1717732754594.json
├── metadata/
│   └── 2024_06_07T03_59_14Z_100_Massachusetts_59f5e3a2_c0f2_4600_a4df_21d854747e53.json
└── symptoms/
    └── csv/
        └── symptoms.csv
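To confirm that a run produced the CSVs listed in the tree above, you can compare the contents of `data/csv/` against that list. This is a minimal sketch, assuming the unchanged default `synthea.properties` and a `data/` folder in the current working directory; `EXPECTED_CSVS`, `missing_csvs`, and `check_data_dir` are hypothetical names. Only `csv/` is checked, because the `fhir/` and `metadata/` file names contain run-specific timestamps.

```python
from pathlib import Path
from typing import Iterable, List

# CSV file names taken from the default output tree for data/csv/.
EXPECTED_CSVS = {
    "allergies.csv", "careplans.csv", "claims.csv", "claims_transactions.csv",
    "conditions.csv", "devices.csv", "encounters.csv", "imaging_studies.csv",
    "immunizations.csv", "medications.csv", "observations.csv",
    "organizations.csv", "patients.csv", "payer_transitions.csv",
    "payers.csv", "procedures.csv", "providers.csv", "supplies.csv",
}


def missing_csvs(present: Iterable[str]) -> List[str]:
    """Return the expected CSV names absent from `present`, sorted."""
    return sorted(EXPECTED_CSVS - set(present))


def check_data_dir(data_dir: str = "data") -> List[str]:
    """List expected CSVs missing from <data_dir>/csv; an empty list means all are present."""
    csv_dir = Path(data_dir) / "csv"
    present = [p.name for p in csv_dir.glob("*.csv")] if csv_dir.is_dir() else []
    return missing_csvs(present)
```

If you changed the exporter settings in `app/synthea.properties`, adjust `EXPECTED_CSVS` to match.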