Expand GPU instances or pick new ones for GPU CI #1152

Closed
mikemhenry opened this issue Jan 31, 2023 · 4 comments · Fixed by #1065

Comments

@mikemhenry
Contributor

Right now we are getting a lot of these errors: InsufficientInstanceCapacity: We currently do not have sufficient p2.xlarge capacity in the Availability Zone you requested (***d). Our system will be working on provisioning additional capacity. You can currently get p2.xlarge capacity by not specifying an Availability Zone in your request or choosing ***a, ***b, ***c, ***e.

I will first see if I can expand our availability zones, and if that fails, use a slightly more expensive GPU instance.
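The fallback order described above (try every availability zone first, then move to another instance type) could be sketched roughly as follows. This is only an illustration: `launch_with_fallback`, `InsufficientCapacity`, `fake_launch`, and the zone names are hypothetical stand-ins, not part of AWS, boto3, or our CI configuration.

```python
from itertools import product


class InsufficientCapacity(Exception):
    """Hypothetical stand-in for EC2's InsufficientInstanceCapacity error."""


def launch_with_fallback(launch, instance_types, zones):
    """Try each instance type in every zone before moving to the next type.

    `launch` is an assumed callable (instance_type, zone) -> instance id
    that raises InsufficientCapacity when the zone has no capacity.
    """
    # product() varies the zone fastest, so all zones are tried for the
    # first instance type before the next (fallback) type is considered.
    for instance_type, zone in product(instance_types, zones):
        try:
            return launch(instance_type, zone)
        except InsufficientCapacity:
            continue  # no capacity here; try the next zone, then the next type
    raise InsufficientCapacity("no capacity in any requested zone or instance type")


# Fake launcher for illustration: p2.xlarge is unavailable everywhere.
def fake_launch(instance_type, zone):
    if instance_type == "p2.xlarge":
        raise InsufficientCapacity(f"{instance_type} unavailable in {zone}")
    return f"{instance_type}@{zone}"


result = launch_with_fallback(
    fake_launch, ["p2.xlarge", "g4dn.xlarge"], ["zone-a", "zone-b"]
)
print(result)  # g4dn.xlarge@zone-a
```

In a real setup the `launch` callable would wrap whatever actually starts the runner instance; the sketch only shows the retry ordering.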

@mikemhenry mikemhenry added tests Unit tests priority: high priority high labels Jan 31, 2023
@mikemhenry mikemhenry added this to the 0.10.2 Bugfix release milestone Jan 31, 2023
@mikemhenry mikemhenry self-assigned this Jan 31, 2023
@jchodera
Member

jchodera commented Feb 1, 2023

g4dn.xlarge instances are about half the price of p2.xlarge instances and feature an NVIDIA T4, which is more modern than (but relatively comparable to) the K80 from the p2.xlarge instances. Either probably works well for us!

@ijpulidos
Contributor

I tried changing the instance type to g4dn.xlarge. We no longer get the capacity error, but we now hit the registration error that we have discussed before, as in https://github.com/choderalab/perses/actions/runs/4168318343/jobs/7214977022#step:3:56

I also tried changing how we use the runner, based on the discussion in machulav/ec2-github-runner#127, without success.

@ijpulidos
Contributor

For the consistency tests in #1065 we are already using g4dn.xlarge. @mikemhenry can you confirm this is no longer an issue and that we can close it? Thanks!

@mikemhenry
Contributor Author

Yes, once we merge #1065 this issue will be resolved.

@mikemhenry mikemhenry linked a pull request Mar 6, 2023 that will close this issue