[RFC] What cloud providers can we use for bare-metal dev & testing? #13

Open
yonch opened this issue Nov 25, 2024 · 6 comments

Comments

@yonch
Contributor

yonch commented Nov 25, 2024

We'd like to have a list of suggested cloud providers that have affordable bare-metal machines that support memory bandwidth and cache monitoring.

These providers can be used for our CI system to test PRs automatically for contributors, and also for contributors to perform development (if they don't have a machine with hardware support).

Criteria:

  • Spin-up times: Should be almost instantaneous (shouldn't need a human in the loop to provision)
  • Minimum commitment: At most an hourly commitment; ideally the provider charges per second (like AWS)
  • Supported CPU: At this point the majority of documentation and resources are for Intel platforms, so let's at least require a CPU that supports Intel RDT.

"Definition of Done":
There is a document in docs/ that describes the found providers, instance types that support RDT, and their cost (with note of when pricing was last reviewed).
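One possible shape for an entry in that document, as a minimal sketch: the `ProviderEntry` class, its field names, and the table layout are all hypothetical and not an agreed format, and the sample values are placeholders rather than real offerings.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ProviderEntry:
    """One row of the providers document (hypothetical layout)."""
    provider: str
    instance_type: str
    cpu: str
    hourly_usd: float
    reviewed: date  # when the pricing was last checked

    def as_row(self) -> str:
        # Render the entry as a pipe-delimited table row.
        return (
            f"| {self.provider} | {self.instance_type} | {self.cpu} "
            f"| ${self.hourly_usd:.2f} | {self.reviewed.isoformat()} |"
        )


# Placeholder values for illustration only.
entry = ProviderEntry("ExampleCloud", "x1.metal", "Example CPU", 1.25, date(2024, 11, 25))
print(entry.as_row())
```

Keeping the review date next to each price makes stale pricing easy to spot when the document is updated.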

@omokpo

omokpo commented Nov 26, 2024

I found RDT support for both monitoring and allocation on https://www.latitude.sh/

c2.large.x86
CPU: Dual Silver 4210, 20 Cores @ 2.2 GHz
RAM: 128 GB RAM
STR: 2 x 1 TB SSD
NIC: 10 Gbps

I will try the other bare-metal providers I know that offer hourly billing and start the doc. I'll make a PR this week.

@yonch
Contributor Author

yonch commented Nov 26, 2024

Thanks! I don't see c2.large.x86 on the pricing list, is it still available?

@omokpo

omokpo commented Nov 27, 2024

> Thanks! I don't see c2.large.x86 on the pricing list, is it still available?

It was available earlier. It now seems to be sold out: https://www.latitude.sh/dashboard/opensource/memory-collector/usage/billing

@yonch
Contributor Author

yonch commented Nov 27, 2024

GCP

Bare metal support

It seems that https://cloud.google.com/bare-metal is titled "Bare Metal Solution for Oracle" and might require talking to sales.

There is a sole-tenancy feature that avoids sharing nodes with other customers, but workloads still run under a hypervisor, so RDT support seems to boil down to whether the hypervisor allows some VM sizes to access RDT.

So to use GCP, we'd probably need to check if some VM sizes allow access to RDT, maybe when allocating a VM with all available cores.
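A quick way to run that check on a candidate VM, as a minimal sketch: it assumes a Linux guest and looks for the RDT-related feature flags the kernel lists in /proc/cpuinfo. The `rdt_flags` helper is hypothetical, and a flag being visible does not by itself guarantee that monitoring or allocation works end to end under the hypervisor.

```python
from pathlib import Path

# Flag names as the Linux kernel reports them in /proc/cpuinfo.
MONITORING_FLAGS = {"cqm", "cqm_llc", "cqm_occup_llc", "cqm_mbm_total", "cqm_mbm_local"}
ALLOCATION_FLAGS = {"rdt_a", "cat_l3", "cat_l2", "cdp_l3", "cdp_l2", "mba"}


def rdt_flags(cpuinfo_text: str) -> dict:
    """Return the RDT monitoring and allocation flags found in cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # Each logical CPU has one "flags : ..." line; collect them all.
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
    return {
        "monitoring": sorted(flags & MONITORING_FLAGS),
        "allocation": sorted(flags & ALLOCATION_FLAGS),
    }


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print(rdt_flags(path.read_text()))
    else:
        print("/proc/cpuinfo not found; this check only works on Linux")
```

Two empty lists on a given VM size would suggest the hypervisor hides RDT from that guest.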

Instance types

From the instance type -> CPU mapping page, RDT is supported on:

  • C4, N4: Yes
  • X4: Yes
  • C3, Z3, H3, A3: Yes
  • N2, M2: Yes
  • C2: Yes
  • M2: Yes
  • A2, G2: Yes
  • E2, m1-megamem, N1 (on Intel® Xeon® Scalable Platinum 8173M Processor): Yes

Not supported on:

  • m1-ultramem: No
  • E2 (on Intel® Xeon® E5-2696V4 Processor), N1 (on non 8173M processors): No

@yonch
Contributor Author

yonch commented Nov 27, 2024

AWS

Instance types

GP instance types

  • m5, m5d: Intel Xeon Platinum 8175: Likely no RDT
  • m5dn, m5n: Intel Xeon Platinum 8259: Likely no RDT
  • m5zn: Intel Xeon Platinum 8252: Likely no RDT
  • m6i, m6id, m6idn, m6in: Intel Xeon Ice Lake
  • m7i: Intel Xeon Sapphire Rapids

Compute optimized

  • c5.metal, c5d.metal: 2nd Gen Intel Xeon Platinum 8275CL: Likely no RDT
  • c5n: Intel Xeon Platinum 8124M: Likely no RDT
  • c6i, c6id, c6in: Intel Xeon Ice Lake
  • c7i: Intel Xeon Sapphire Rapids

Memory optimized

  • r5, r5d: Intel Xeon Platinum 8175: Likely no RDT
  • r5b, r5dn, r5n: Intel Xeon Platinum 8259: Likely no RDT
  • r6i, r6idn, r6in, r6id: Intel Xeon Ice Lake
  • r7i, r7iz: Intel Xeon Sapphire Rapids
  • U-3tb1, U-6tb1, U-9tb1: Intel Xeon Platinum 8176M: Likely no RDT
  • U-18tb1, U-24tb1: Intel Xeon Platinum 8280L: Likely no RDT
  • U-{12,16,24,32}tb: Intel Xeon Sapphire Rapids

RunsOn identified processors:

  • r7iz: Intel Xeon Gold 6455B: Yes
  • m7i, r7i, c7i: Intel Xeon Platinum 8488C : Yes
  • r6i, r6idn, c6i, m6id, m6idn, m6i, c6id, r6id: Intel Xeon Platinum 8375C : Yes
  • z1d: Intel Xeon Platinum 8151 : No

An observation: Intel's "Which Intel Processors Supports Intel® Resource Director Technology (Intel® RDT)?" claims that most of the following support RDT:

  • Intel® Xeon® Scalable Processor Family generations 3, 4, and 5
  • Intel® Xeon® 6 Processor Family

The first group appears to be the {Silver,Gold,Platinum} models whose second digit is {3,4,5}. The second group is named 6???P.
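The naming rule above can be turned into a rough filter for processor names. This is a sketch of the heuristic only, not an authoritative capability check; `likely_supports_rdt` and both regular expressions are hypothetical helpers written for illustration.

```python
import re

# {Silver,Gold,Platinum} followed by a four-digit model; capture the second digit.
SCALABLE = re.compile(r"\b(?:Silver|Gold|Platinum)\s+\d(\d)\d{2}")
# Xeon 6 models of the form 6???P.
XEON_6 = re.compile(r"\b6\d{3}P\b")


def likely_supports_rdt(model: str) -> bool:
    """Apply the model-number heuristic from the thread to a CPU name."""
    match = SCALABLE.search(model)
    if match and match.group(1) in {"3", "4", "5"}:
        return True
    return bool(XEON_6.search(model))


for name in ["Intel Xeon Platinum 8375C", "Intel Xeon Gold 6455B", "Intel Xeon Platinum 8151"]:
    print(name, likely_supports_rdt(name))
```

A False result is not conclusive: this thread reports RDT working on a Silver 4210 and lists a Gold 6248R among RDT-capable instances, and both have a second digit of 2.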

Spot pricing:

From the pricing page, searching for "i.metal":

c7i.metal-24xl $0.46
m7i.metal-24xl $0.5064
c6i.metal $0.5592
r7i.metal-24xl $0.635
m6i.metal $0.8427
c7i.metal-48xl $0.8628
m7i.metal-48xl $1.0308
i4i.metal $1.0982
r7i.metal-48xl $1.2701
r6i.metal $2.0085
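To see how the minimum-commitment criterion interacts with these rates, here is a small sketch that estimates the cost of one CI run under different billing granularities. It assumes the figures above are hourly rates in USD; `billed_cost` and the 10-minute job length are hypothetical, and spot prices change over time.

```python
import math


def billed_cost(hourly_usd: float, runtime_s: float, granularity_s: int = 1) -> float:
    """Cost of a run when usage is rounded up to whole billing units."""
    units = math.ceil(runtime_s / granularity_s)
    return hourly_usd * units * granularity_s / 3600


# A few rates copied from the list above (assumed to be USD per hour).
rates = {"c7i.metal-24xl": 0.46, "m7i.metal-24xl": 0.5064, "c6i.metal": 0.5592}

job_seconds = 10 * 60  # hypothetical 10-minute CI job
for name, rate in sorted(rates.items(), key=lambda item: item[1]):
    per_second = billed_cost(rate, job_seconds, granularity_s=1)
    per_hour = billed_cost(rate, job_seconds, granularity_s=3600)
    print(f"{name}: ${per_second:.4f} per-second billing, ${per_hour:.4f} hourly billing")
```

For short CI jobs the billing granularity matters more than the differences between these rates: a 10-minute run billed hourly costs six times as much as the same run billed per second.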

@yonch
Contributor Author

yonch commented Nov 28, 2024

Vultr

Several bare metal instances with GPUs support RDT:

  • Nvidia L40S: 2x Intel Gold 6448H $0.848
  • A100 PCIe: 2x Intel Gold 6248R $1.29
  • H100, HGX A100 with 2x Intel Platinum 8480+ at $1.49, $2.30

Limestone Networks

This page does not list any RDT-capable processors.

However, the AMD EPYC 7301, 7282, and 7402 appear to support AMD QoS. Quoted prices are $0.67, $0.70, and $1.03 respectively (the 7301 is even dual-socket).

phoenixNAP

From the pricing page, RDT-capable instances:

  • s4.* Intel Xeon 6 instances @ $2.52, $1.72 hourly
  • d3.{c,m}{1,2,3}.* instances, 5th gen Xeon Scalable: $1.27 to $1.98 hourly
  • d2.*, 3rd gen Xeon Scalable: $0.61 to $1.41 hourly
  • d3.c{4,5,6}: 4th gen Xeon Scalable: $1.20 to $1.47 hourly
  • d3.m*: 4th gen Xeon Scalable: $1.59 to $2.56 hourly
  • similar for memory instances

Scaleway

The pricing pages do not appear to list compatible Intel hardware. The AMD hardware might have support (did not check).

Atlantic.net

Billing appears to be monthly.
