Hardcode VRAM size for AWS L4 (#92)
The way AWS reports the VRAM size for L4 is
inconsistent with other GPUs. The reported size is
24 base-10 gigabytes, while L4 is actually 24
base-2 gibibytes, same as many other NVIDIA GPUs.
Additionally, for multi-unit L4 instances AWS
reports the total VRAM size instead of the size of
a single unit.

This commit hardcodes the correct L4 VRAM size
until there are changes from AWS' end.
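The arithmetic behind the mismatch can be sketched as follows; the MiB values are the ones listed in the hardcoded check in the diff below, and this is an illustration of the commit message, not code from the repository:

```python
# AWS reports the L4's 24 GB (decimal) converted to MiB, so the naive
# MiB -> GiB division used for other GPUs comes out ~1.6 GiB short.

MIB = 2**20  # bytes per MiB

one_l4 = round(24 * 10**9 / MIB)      # MiB AWS reports for a single L4
print(one_l4)                         # 22888
print(one_l4 / 1024)                  # ~22.35, not the true 24 GiB

# Multi-unit instances report the total rather than the per-unit size:
print(round(4 * 24 * 10**9 / MIB))    # 91553
print(round(8 * 24 * 10**9 / MIB))    # 183105
```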
jvstme authored Aug 19, 2024
1 parent bdda169 commit bca64bb
Showing 1 changed file with 17 additions and 1 deletion.
18 changes: 17 additions & 1 deletion src/gpuhunt/providers/aws.py
@@ -138,7 +138,7 @@ def fill_gpu_details(self, offers: List[RawCatalogItem]):
             gpu = i["GpuInfo"]["Gpus"][0]
             gpus[i["InstanceType"]] = (
                 gpu["Name"],
-                gpu["MemoryInfo"]["SizeInMiB"] / 1024,
+                _get_gpu_memory_gib(gpu["Name"], gpu["MemoryInfo"]["SizeInMiB"]),
             )
 
         regions = {
@@ -230,6 +230,22 @@ def filter(cls, offers: List[RawCatalogItem]) -> List[RawCatalogItem]:
         ]
 
 
+def _get_gpu_memory_gib(gpu_name: str, reported_memory_mib: int) -> float:
+    """
+    Fixes L4 memory size misreported by AWS API
+    """
+
+    if gpu_name != "L4":
+        return reported_memory_mib / 1024
+
+    if reported_memory_mib not in (22888, 91553, 183105):
+        logger.warning(
+            "The L4 memory size reported by AWS changed. "
+            "Please check that it is now correct and remove the hardcoded size if it is."
+        )
+    return 24
+
+
 def parse_memory(s: str) -> float:
     r = re.match(r"^([0-9.]+) GiB$", s)
     return float(r.group(1))
