DVC MLOps tutorial

Set up the Python environment

conda create --name mlops_env python
conda activate mlops_env
pip install -r requirements.txt

If starting a new project:

Initialize DVC

https://dvc.org/doc/command-reference/init

git init
dvc init

Commit dvc files to git

git add .dvc/.gitignore
git add .dvc/config
git add .dvcignore
git commit -m "Initialize dvc"

Add the data to dvc tracking

dvc add data/..

Then add the data/*.dvc files to git tracking
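For example, assuming the two dataset folders used later in this tutorial were added, the generated .dvc files and the auto-updated data/.gitignore can be committed like this:

git add data/MontgomerySet.dvc data/ChinaSet_AllFiles.dvc data/.gitignore
git commit -m "Track datasets with DVC"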

Push the data to a Google Drive remote folder

You first need to install the DVC plugin for the Google Drive remote:

pip install dvc-gdrive
dvc remote add --default drive gdrive://<Folder ID>
dvc remote modify drive gdrive_acknowledge_abuse true

This will ask you to authorize access to your Google account and save your credentials to gdrive_credentials.json. The .dvc/config file is updated to reflect the remote directory.
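After these commands, the remote section of .dvc/config should look roughly like this (the folder ID is a placeholder):

[core]
    remote = drive
['remote "drive"']
    url = gdrive://<Folder ID>
    gdrive_acknowledge_abuse = true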

Push the data to the remote directory

dvc push
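Note that dvc push only uploads the tracked data to the DVC remote; the .dvc files and config changes still need to be committed and pushed with git so another machine can clone them:

git push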

On another machine/remote server

Git clone the repository, then pull the data:

dvc pull
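For example, using this tutorial repository:

git clone https://github.com/sadhana-r/DVC_MLFlow_Tutorial.git
cd DVC_MLFlow_Tutorial
dvc pull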

If you are not able to authorize access via the internet, you can point the DVC remote to the location of the credential file:

dvc remote modify drive gdrive_user_credentials_file ..\gdrive_credentials.json

DVC pipelines and experiment tracking

Add stages to a dvc.yaml file

Option 1: Through the command line

dvc stage add --name preprocess \
  --deps data/MontgomerySet --deps data/ChinaSet_AllFiles \
  --outs data/datalist.csv \
  python src/pipeline/preprocess.py

Option 2: Manually add the stage to dvc.yaml

stages:
  preprocess:
    cmd: python src/pipeline/preprocess.py
    deps:
    - data/MontgomerySet
    - data/ChinaSet_AllFiles
    outs:
    - data/datalist.csv

You can also add parameter dependencies; the train stage below goes under the same stages: key:

  train:
    cmd: python src/pipeline/train_dvc.py --params src/pipeline/params.yaml
    deps:
    - ./data/datalist.csv
    params:
    - ./src/pipeline/params.yaml:
      - dataset.data_dir
      - training_parameter.batch_size
      - training_parameter.learning_rate
      - network_parameter.input_size
      - network_parameter.num_classes
      - dataset.num_workers
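The keys above refer to entries in src/pipeline/params.yaml. A minimal sketch of what that file might contain (all values are illustrative placeholders, not the project's actual settings):

dataset:
  data_dir: ./data
  num_workers: 4

training_parameter:
  batch_size: 4
  learning_rate: 0.001

network_parameter:
  input_size: 256
  num_classes: 2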

Running the pipeline

Option 1: Without experiment tracking:

dvc repro
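dvc repro re-runs only the stages whose dependencies changed. You can also reproduce a single stage by name, for example:

dvc repro train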

Option 2: With experiment tracking

First you need to install DVCLive, the DVC library for experiment tracking:

pip install dvclive

DVCLive provides a logger that integrates with PyTorch Lightning.
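A minimal sketch of attaching the DVCLive logger to a Lightning Trainer, using a toy model and random data in place of the project's actual training code in src/pipeline/train_dvc.py:

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl
from dvclive.lightning import DVCLiveLogger


# Toy model standing in for the network defined in src/pipeline/train_dvc.py
class ToyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(16, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self.layer(x), y)
        self.log("train_loss", loss)  # picked up by DVCLiveLogger
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


# Random tensors standing in for the real training set
dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
loader = DataLoader(dataset, batch_size=8)

# DVCLiveLogger writes metrics where DVC experiments can read them
trainer = pl.Trainer(max_epochs=2, logger=DVCLiveLogger())
trainer.fit(ToyModel(), loader)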

To run the experiment:

dvc exp run --name NAME
dvc exp run --name NAME --set-param training_parameter.batch_size=6

Sharing experiments

dvc exp list --all-commits # View all local experiments

dvc exp push [git_remote] [experiment_name] --rev [commit] # --rev optionally restricts to experiments based on a given commit

dvc exp list origin --all-commits # See the experiments that exist in the remote repo
