Fix issues with lambda labs machines #285

Merged: 4 commits merged into main from ll/remote-machine-bugfixes on Jun 6, 2024

@luke-lombardi (Contributor) commented Jun 5, 2024

  • Add an optional flag to skip TLS verification (disabled by default)
  • Encapsulate remote config creation in a helper function
  • Clean up pool_external and ensure the GPU count is set properly on remote machines
  • Pass the remote config as JSON for simplicity
  • Remove the secret mount logic, which is no longer needed now that the config is passed as JSON in an env var
  • Add total CPU / total memory / total GPU count fields to track each worker's maximum capacity
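
For readers who want a concrete picture of the "config as JSON in an env var" and "skip TLS verification" items above, here is a minimal sketch. It is not the code from this PR: the `RemoteConfig` struct, its fields, the `REMOTE_CONFIG_JSON` variable name, and all function names are hypothetical and chosen only for illustration. It relies on the standard library's `encoding/json` and `crypto/tls` packages.

```go
package main

import (
	"crypto/tls"
	"encoding/json"
	"fmt"
	"net/http"
)

// RemoteConfig is a hypothetical stand-in for the config handed to a
// remote worker. The real struct in the repository may look different.
type RemoteConfig struct {
	GatewayHost string `json:"gateway_host"`
	// SkipTLSVerify defaults to false (the zero value), so certificate
	// verification stays enabled unless someone opts out explicitly.
	SkipTLSVerify bool `json:"skip_tls_verify"`
}

// remoteConfigEnvVar is an assumed env var name, not taken from the PR.
const remoteConfigEnvVar = "REMOTE_CONFIG_JSON"

// EncodeRemoteConfig serializes cfg and returns a "NAME=value" entry
// suitable for appending to a process or container environment.
func EncodeRemoteConfig(cfg RemoteConfig) (string, error) {
	b, err := json.Marshal(cfg)
	if err != nil {
		return "", fmt.Errorf("marshal remote config: %w", err)
	}
	return remoteConfigEnvVar + "=" + string(b), nil
}

// DecodeRemoteConfig parses the JSON value read back from the env var.
func DecodeRemoteConfig(raw string) (RemoteConfig, error) {
	var cfg RemoteConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return RemoteConfig{}, fmt.Errorf("unmarshal remote config: %w", err)
	}
	return cfg, nil
}

// TLSConfigFor maps the opt-in flag onto crypto/tls. Verification is
// only disabled when SkipTLSVerify is explicitly true.
func TLSConfigFor(cfg RemoteConfig) *tls.Config {
	return &tls.Config{InsecureSkipVerify: cfg.SkipTLSVerify}
}

// NewHTTPClient builds an HTTP client that honors the flag.
func NewHTTPClient(cfg RemoteConfig) *http.Client {
	return &http.Client{
		Transport: &http.Transport{TLSClientConfig: TLSConfigFor(cfg)},
	}
}
```

On the worker side, the value would be read with `os.Getenv` and passed to `DecodeRemoteConfig`. Carrying the whole config in one env var is what removes the need for a mounted secret file, at the cost of exposing the config to anything that can read the process environment.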

@luke-lombardi requested a review from jsun-m June 6, 2024 00:04
@luke-lombardi changed the title from "Fix a bunch of issues with lambda labs machines" to "Fix issues with lambda labs machines" Jun 6, 2024
@@ -199,10 +204,12 @@ func (wpc *ExternalWorkerPoolController) createWorkerOnMachine(workerId, machine
return nil, err
}

log.Printf("Created worked: %+v\n", worker)

return worker, nil
}
Contributor

Remove log

Contributor Author

Done

@luke-lombardi merged commit 147c623 into main Jun 6, 2024
2 checks passed
@luke-lombardi deleted the ll/remote-machine-bugfixes branch June 6, 2024 00:31
2 participants