conda create --name mlops_env python
conda activate mlops_env
pip install -r requirements.txt
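A minimal requirements.txt for this workflow might list the following (versions omitted; the exact set is an assumption for this project):

```text
dvc
dvc-gdrive
dvclive
lightning
torch
```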
https://dvc.org/doc/command-reference/init
git init
dvc init
git add .dvc/.gitignore
git add .dvc/config
git add .dvcignore
git commit -m "Initialize dvc"
dvc add data/..
Then add the generated data/*.dvc files (and the updated data/.gitignore) to git tracking:
git add data/*.dvc data/.gitignore
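Under the hood, `dvc add` computes an MD5 hash of each file's contents, records it in the small `.dvc` pointer file that git tracks, and moves the data itself into the cache, addressed by that hash. A stdlib-only sketch of the hashing step (`hash_file` is a hypothetical helper, not DVC's API):

```python
import hashlib
import tempfile

def hash_file(path, chunk_size=1 << 20):
    """Compute an MD5 content hash, reading the file in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

# Demo on a throwaway file; DVC 3.x stores the cached copy under
# .dvc/cache/files/md5/<first two hex chars>/<remaining chars>
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"patient_id,image_path\n")

digest = hash_file(tmp.name)
print(f".dvc/cache/files/md5/{digest[:2]}/{digest[2:]}")
```

Because the cache is content-addressed, re-adding an unchanged file is a no-op, and `dvc pull`/`dvc push` only transfer hashes the other side is missing.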
You first need to install the DVC plugin for the remote storage:
pip install dvc-gdrive
dvc remote add --default drive gdrive://<Folder ID>
dvc remote modify drive gdrive_acknowledge_abuse true
This will prompt you to authorize access to your Google account and save your credentials to gdrive_credentials.json. The .dvc/config file is updated to reflect the remote directory.
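After these commands, .dvc/config should look roughly like this (the folder ID is a placeholder):

```ini
[core]
    remote = drive
['remote "drive"']
    url = gdrive://<Folder ID>
    gdrive_acknowledge_abuse = true
```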
Push the data to the remote directory:
dvc push
Pull the data:
dvc pull
If you are not able to authorize access via the internet, you can point the DVC remote to the location of the credentials file:
dvc remote modify drive gdrive_user_credentials_file ..\gdrive_credentials.json
dvc stage add --name preprocess \
    --deps data/MontgomerySet --deps data/ChinaSet_AllFiles \
    --outs data/datalist.csv \
    python src/pipeline/preprocess.py
The resulting dvc.yaml:
preprocess:
  cmd: python src/pipeline/preprocess.py
  deps:
    - data/MontgomerySet
    - data/ChinaSet_AllFiles
  outs:
    - data/datalist.csv
train:
  cmd: python src/pipeline/train_dvc.py --params src/pipeline/params.yaml
  deps:
    - ./data/datalist.csv
  params:
    - ./src/pipeline/params.yaml:
        - dataset.data_dir
        - training_parameter.batch_size
        - training_parameter.learning_rate
        - network_parameter.input_size
        - network_parameter.num_classes
        - dataset.num_workers
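The train stage assumes a src/pipeline/params.yaml containing the tracked keys; a sketch, with illustrative values only:

```yaml
dataset:
  data_dir: data/
  num_workers: 4
training_parameter:
  batch_size: 8
  learning_rate: 0.001
network_parameter:
  input_size: 256
  num_classes: 2
```

DVC watches only the listed keys, so edits elsewhere in the file do not invalidate the stage.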
dvc repro
First install DVCLive, the DVC library for experiment tracking:
pip install dvclive
DVCLive provides a logger that integrates with PyTorch Lightning.
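The real integration is just `from dvclive.lightning import DVCLiveLogger` passed to `Trainer(logger=...)`. To illustrate the step-wise logging pattern DVCLive uses without pulling in the library, here is a dependency-free stand-in (`MiniLive` is a hypothetical toy, not the real `dvclive.Live` API):

```python
import json
from pathlib import Path

class MiniLive:
    """Toy stand-in for dvclive.Live: collects per-step metrics and
    dumps the latest values to a metrics.json summary file."""

    def __init__(self, dir="dvclive"):
        self.dir = Path(dir)
        self.dir.mkdir(parents=True, exist_ok=True)
        self.step = 0
        self.history = []   # one dict of metric values per step
        self.summary = {}   # latest value of each metric

    def log_metric(self, name, value):
        self.summary[name] = value

    def next_step(self):
        self.history.append({"step": self.step, **self.summary})
        self.step += 1
        # a JSON summary lets experiments be compared side by side
        (self.dir / "metrics.json").write_text(json.dumps(self.summary))

# Usage: the same shape as a training loop instrumented with dvclive
live = MiniLive()
for epoch in range(3):
    live.log_metric("train/loss", 1.0 / (epoch + 1))
    live.next_step()
print(live.summary)
```

With the real library, `dvc exp run` picks these files up automatically and shows the metrics in `dvc exp show`.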
To run the experiment:
dvc exp run --name <exp_name>
dvc exp run --name <exp_name> --set-param training_parameter.batch_size=6
dvc exp list --all-commits # View all experiments
dvc exp push <git_remote> <experiment_name> # Add --rev to push experiments from a specific commit
dvc exp list --all-commits origin # See the experiments that exist in the remote repo