Use max of instance type per ASG #333

thorro · 2019-03-25T08:51:00Z

Issue type

Feature Idea

At the moment the cheapest spot instance type is used, which is fine. The issue arises when AWS reclaims one instance type: all of those instances can go away at the same time.

We are willing to sacrifice some cost savings by diversifying instances more.

A new setting could be defined as:

max_instances_of_same_type = number or percentage

So AutoSpotting would only launch instances of a given type up to that number. After that, it would look for the second-cheapest option, and so on.
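A minimal sketch of the selection rule proposed above, not AutoSpotting's actual code. The `Offer` type and the `maxPerType` and `pickType` helpers are hypothetical names introduced for illustration, and treating a limit of zero or less as "no cap" is an assumption.

```go
package main

import "sort"

// Offer is a hypothetical candidate: an instance type and its current spot price.
type Offer struct {
	Type  string
	Price float64
}

// maxPerType resolves the proposed max_instances_of_same_type setting.
// A positive percent is applied to the group size (with a floor of 1);
// otherwise the absolute count is used as-is.
func maxPerType(count, percent, groupSize int) int {
	if percent > 0 {
		n := groupSize * percent / 100
		if n < 1 {
			n = 1
		}
		return n
	}
	return count
}

// pickType returns the cheapest instance type whose running count is still
// below limit. A limit <= 0 is treated as "no cap" (an assumption).
// The second return value is false when every candidate is at the cap.
func pickType(offers []Offer, running map[string]int, limit int) (string, bool) {
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]Offer(nil), offers...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Price < sorted[j].Price })

	for _, o := range sorted {
		if limit <= 0 || running[o.Type] < limit {
			return o.Type, true
		}
	}
	return "", false
}
```

With a cap of 2 and two instances of the cheapest type already running, `pickType` falls through to the second-cheapest type, which is the behavior the setting asks for.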

cristim · 2019-03-25T09:29:29Z

@thorro Thanks for reporting this issue.

We used to have such a feature hardcoded into the logic: it automatically switched the instance type if more than 20% of the instances were in the same AZ/type combination, since those would all be outbid and then terminated if the spot price increased.

About a year ago we removed that because AWS no longer terminates all the instances at once. Instead it randomly reclaims instances of a given instance type over time, regardless of the spot price and bid, which hopefully allows us to launch a replacement with a different instance type.

Ever since this was changed I haven't seen anyone complain that all their instances are gone.

Have you actually seen this happen in practice?

I would be open to re-adding this and making it configurable as you suggested, as long as:

- enough people report this issue
- someone contributes a PR implementing it

Alternatively, I can also implement it if at least a couple of Patrons ask for it.

thorro · 2019-03-25T09:57:00Z

Hi @cristim

I don't have that much experience with this, as we don't run that many spot instances yet.

About five days ago AWS terminated all 3 instances of the same type in one ASG. I think they were all t3.2xlarge.

But just on Friday I noticed AWS terminated one spot instance of type i3.4xlarge while the other four kept running. So this looks more like the case you describe.

I think it may depend on how many instances they need. If they need a lot, all or almost all could go down. If not, they take one here and there so as not to upset a single customer too much.

Could you post the removed hardcoded code or a link to a commit? It would be a good starting point for our own mods. Thanks.

cristim · 2019-03-25T10:34:08Z

Considering how the spot market works, we can't exclude such scenarios, especially for popular instance types where there may be a lot of churn. Did your group lose all the capacity before any new instances were started?

It's definitely better to be prepared for this if possible, and as I said we can have this brought back if enough people complain about it.

As for the code, have a look around here:

https://github.com/AutoSpotting/AutoSpotting/blob/20fced19162c4ee1de87852fc7297e1bcf6c8353/core/instance.go#L147-L160

thorro · 2019-03-25T10:59:39Z

That ASG lost all capacity; luckily for us it was not a production workload.

Hope some more people chime in. Thanks for the code pointer, will brush up on my Go skills. :)

cristim · 2019-04-12T14:57:24Z

@ChienHuey just volunteered on GitHub to implement this as part of a hackathon.
A few things I mentioned that may make it a bit more challenging:

- I'd love for it to be configurable, similar to how AWS does it for mixed spot ASGs
- that configurability work may need more time than a full day of work
- basically, it should be possible to toggle it on/off using stack parameters and override it using tags, like we have for other config options, and also to control the level of instance type spread per type/AZ combination, maybe defaulting to 2 when enabled
- the value 2 should be configurable to a higher value if wanted, also via stack params and overridable by tags
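The configuration layering described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `spreadConfig` type, the `resolveSpread` helper, and both tag names are hypothetical and do not exist in AutoSpotting. Stack parameters supply the defaults, per-ASG tags override them, and the limit falls back to 2 when the feature is enabled without an explicit value.

```go
package main

import "strconv"

// spreadConfig is a hypothetical holder for the instance type spread settings.
type spreadConfig struct {
	Enabled      bool
	MaxPerTypeAZ int // max instances per instance type/AZ combination
}

// Hypothetical tag names, invented for this sketch.
const (
	tagSpreadEnabled = "autospotting_spread_enabled"
	tagSpreadMax     = "autospotting_spread_max_per_type_az"
)

// defaultMaxPerTypeAZ mirrors the "maybe defaulting to 2" idea from the thread.
const defaultMaxPerTypeAZ = 2

// resolveSpread starts from the stack-level values and lets ASG tags
// override them. Tag values that fail to parse are ignored, so the
// stack-level setting stays in effect.
func resolveSpread(stack spreadConfig, tags map[string]string) spreadConfig {
	cfg := stack

	if v, ok := tags[tagSpreadEnabled]; ok {
		if b, err := strconv.ParseBool(v); err == nil {
			cfg.Enabled = b
		}
	}
	if v, ok := tags[tagSpreadMax]; ok {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			cfg.MaxPerTypeAZ = n
		}
	}

	// Apply the default only when the feature is on and no valid limit was set.
	if cfg.Enabled && cfg.MaxPerTypeAZ <= 0 {
		cfg.MaxPerTypeAZ = defaultMaxPerTypeAZ
	}
	return cfg
}
```

Ignoring malformed tag values rather than failing is a design choice made for the sketch; a real implementation might prefer to log or reject them.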

cristim · 2019-09-18T16:05:06Z

@ChienHuey have you made any progress on this work?

cristim · 2023-03-06T16:24:41Z

@thorro is this issue still of interest to you?

@ChienHuey let me know if you're still interested in working on this.

thorro · 2023-03-06T16:35:11Z

@cristim no, we don't use AutoSpotting at the moment.

cristim · 2023-03-06T16:41:41Z

Thanks @thorro!

I'd love to learn your reasons why and what you're using instead, as well as any other feedback about AutoSpotting you may have.

BTW, last week I released this open source Spot savings estimator tool: https://github.com/LeanerCloud/savings-estimator/

I hope you find it useful and I'd also love to hear some honest feedback about it.

thorro · 2023-03-07T08:55:14Z

We've moved to EKS Managed Node Groups. I don't know if AutoSpotting would work with those at all?

Will check out the tool, thanks!

cristim · 2023-03-07T15:40:29Z

Yes, it should work, but not if you configured them to use Spot. I'm intentionally skipping those in order not to interfere or cause race conditions.

lenucksi added the "Status: Help wanted" and "Component: Core Logic" labels on Mar 25, 2019
