Improve naming of state machines and executions #46

Open
3 tasks
alesolano opened this issue Oct 18, 2021 · 0 comments
Labels: mlops (MLOps related issues), not-agent (Here we don't affect the agent per se)

Description

The naming of AWS state machines and executions can be improved. Let's do it as @DelgadoPanadero cleverly proposed:

  • State machines are linked to branches. The name of the branch must be linked to the experiment name.
  • Executions are linked to commits. The commit id must be linked to the execution name.

What follows is a reflection on ML experiments that came up while writing this issue.

Experiments
An experiment is a well-defined change in our model with an uncertain result.

The goal is often to see how some metrics vary when a small set of parameters changes in a particular direction. The metrics are often related to the accuracy or performance of the model. The change in parameters can be anything, from trying different values of the learning rate to adding a new preprocessing technique. This change (this experiment), though, can't be too large. For example, if we want to move from using behavior trees to using Q-learning, even if we measure our model performance with the same metrics, we'd better create a different repo.

Experiment side

Issue -> Branch -> State machine

To keep good track of all the experiments that we try, we should first and foremost explain our intention in an issue: which parameters we are going to change or add, and which metrics we are going to track. Then, we should create a new branch with the name experiment/<issue number>-first-words-of-the-issue-title.
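
As an illustration of that branch naming convention, here is a minimal sketch. The function name `experiment_branch_name` and the five-word cutoff are assumptions made for the example; the issue only asks for the "first words" of the title.

```python
import re


def experiment_branch_name(issue_number: int, issue_title: str, max_words: int = 5) -> str:
    """Build 'experiment/<issue number>-first-words-of-the-issue-title'.

    max_words is an assumed cutoff; the convention does not fix a number.
    """
    # Lowercase the title and keep only alphanumeric runs as words.
    words = re.findall(r"[a-z0-9]+", issue_title.lower())
    slug = "-".join(words[:max_words])
    return f"experiment/{issue_number}-{slug}"


print(experiment_branch_name(46, "Improve naming of state machines and executions"))
# prints: experiment/46-improve-naming-of-state-machines
```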

Job
A job is the experiment put to the test.

With our experiment well defined and our code modified accordingly, it's time to run it on a machine. This machine will be in the cloud for two reasons:

  • Reproducibility. We want to make sure that the experiment can be run by anyone and that the results do not depend on any one person's computer. Docker and the cloud are our friends here.
  • Speed. We want to have our results as fast as possible. Often, our experiments involve training, and training takes time and resources. Using the cloud we can pick a powerful machine and complete the experiment fast and without freezing our laptop. This also allows us to run several experiments in parallel, saving us more time.

Job side

Commit -> Execution

A job can have multiple outputs (artifacts), but the essential one is the log file. In that log file we will check how our metrics behaved in our experiment. Did the accuracy improve? Did the inference run faster? Additionally, in training jobs, we can also have the resulting models as artifacts.

Jobs are mainly defined by:

  • A Docker image that holds the dependencies and code at a particular commit.
  • The type of machine that will run the job.
  • The address where the artifacts will be saved.
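
The three elements above can be captured in a small data structure. This is only a sketch: `JobSpec` and `job_for_commit` are hypothetical names, and tagging the image with the commit id and storing artifacts under a per-commit prefix are assumptions, not something the issue prescribes. The registry URI and bucket in the usage lines are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpec:
    image_uri: str      # Docker image holding dependencies and code at one commit
    instance_type: str  # type of machine that will run the job
    output_path: str    # address where the artifacts will be saved


def job_for_commit(repository_uri: str, commit_id: str,
                   instance_type: str, artifacts_root: str) -> JobSpec:
    """Derive a job description from a commit id (assumed convention)."""
    return JobSpec(
        # Assumption: the image is tagged with the commit id it was built from.
        image_uri=f"{repository_uri}:{commit_id}",
        instance_type=instance_type,
        # Assumption: each commit gets its own artifacts prefix.
        output_path=f"{artifacts_root.rstrip('/')}/{commit_id}",
    )


job = job_for_commit(
    "registry.example.com/my-experiments",  # placeholder registry
    "abc1234",
    "ml.m5.xlarge",
    "s3://my-bucket/experiments/",          # placeholder bucket
)
print(job)
```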

AWS: state machines and executions
We have picked AWS as our cloud provider and we need to stick to its way of managing jobs. In order to create and run a job, we need to encapsulate it in a state machine. A state machine is a sequence of tasks defined in a JSON document. Normally, we will only have one main task in our state machine: our job. AWS defines different types of jobs (processing, inference, training...) depending on their inputs and the artifacts they produce. Right now, we will focus mainly on training jobs.
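
To make the "one main task" idea concrete, here is a hedged sketch of what such a JSON definition could look like, built as a Python dictionary. It assumes the job is a SageMaker training job started through the Step Functions service integration; the issue does not name SageMaker explicitly, so treat that as an assumption. The function name, the role ARN, the volume size and the stopping condition are illustrative placeholders, and reusing the execution name as the training job name assumes that name also satisfies SageMaker's own naming rules.

```python
import json


def training_state_machine_definition(image_uri: str, instance_type: str,
                                      output_path: str, role_arn: str) -> dict:
    """Return a single-task state machine definition (illustrative sketch)."""
    return {
        "Comment": "One training job per execution",
        "StartAt": "TrainingJob",
        "States": {
            "TrainingJob": {
                "Type": "Task",
                # Assumption: SageMaker training job via the service integration.
                "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
                "Parameters": {
                    # Reuse the execution name, taken from the context object.
                    "TrainingJobName.$": "$$.Execution.Name",
                    "AlgorithmSpecification": {
                        "TrainingImage": image_uri,
                        "TrainingInputMode": "File",
                    },
                    "ResourceConfig": {
                        "InstanceType": instance_type,
                        "InstanceCount": 1,
                        "VolumeSizeInGB": 30,  # placeholder value
                    },
                    "OutputDataConfig": {"S3OutputPath": output_path},
                    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},  # placeholder
                    "RoleArn": role_arn,
                },
                "End": True,
            }
        },
    }


definition = training_state_machine_definition(
    "registry.example.com/my-experiments:abc1234",   # placeholder image
    "ml.m5.xlarge",
    "s3://my-bucket/experiments/abc1234",            # placeholder bucket
    "arn:aws:iam::123456789012:role/example-role",   # placeholder role
)
print(json.dumps(definition, indent=2))
```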

Creating a state machine is not the same as running it. A state machine can be run several times and each time—each execution—has a different name.

Naming guidelines
Aaand, after this extensive reflection, the conclusion that we wanted to reach: the naming guidelines.

  • A state machine is linked to a particular experiment (branch). The name of the state machine must be linked to the name of the issue that defines the experiment. Since our branch name is already linked to that issue, we can pass a variation of this name automatically to AWS via GitHub Actions.
  • An execution is linked to a particular job (commit). The name of this execution will be linked to the commit id. Again, we can use GitHub Actions for this.
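
A possible way to derive both names inside a GitHub Actions step is sketched below. It reads the default `GITHUB_REF_NAME`, `GITHUB_SHA`, `GITHUB_RUN_ID` and `GITHUB_RUN_ATTEMPT` environment variables. The helper names, the 12-character commit prefix and the run suffix are assumptions: the suffix is there only because each execution needs a different name, so re-running the same commit would otherwise collide. As far as we know, Step Functions names are limited to 80 characters and cannot contain characters such as `/`, which is why the branch name is passed as a "variation" rather than verbatim.

```python
import os
import re

MAX_NAME_LENGTH = 80  # assumed Step Functions limit for names


def _sanitize(text: str) -> str:
    # Replace every run of characters other than letters, digits,
    # hyphens and underscores with a single hyphen.
    return re.sub(r"[^A-Za-z0-9_-]+", "-", text).strip("-")


def state_machine_name(branch: str) -> str:
    """Variation of the branch name that is safe to use as a name."""
    return _sanitize(branch)[:MAX_NAME_LENGTH]


def execution_name(commit_id: str, run_suffix: str) -> str:
    """Commit id prefix plus a suffix that keeps re-runs unique."""
    return _sanitize(f"{commit_id[:12]}-{run_suffix}")[:MAX_NAME_LENGTH]


if __name__ == "__main__":
    # Fallback values are placeholders for running outside GitHub Actions.
    branch = os.environ.get("GITHUB_REF_NAME", "experiment/46-improve-naming")
    commit = os.environ.get("GITHUB_SHA", "0123456789abcdef0123456789abcdef01234567")
    run = "{}-{}".format(os.environ.get("GITHUB_RUN_ID", "0"),
                         os.environ.get("GITHUB_RUN_ATTEMPT", "1"))
    print(state_machine_name(branch))
    print(execution_name(commit, run))
```

Neither name is typed in by hand here: both come from the branch and the commit that triggered the workflow.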

Requirements

Things you need in order to complete the issue.

Acceptance Criteria

Criteria that must be met in order to close the issue successfully.

  • The name of the state machine is set automatically, not by input.
  • The name of the state machine is linked to the branch name.
  • The name of the execution is linked to the commit id.

@alesolano added the not-agent label on Oct 18, 2021
@alesolano self-assigned this on Oct 18, 2021
@alesolano added the mlops label on Oct 18, 2021