Disable cuda malloc by default.
comfyanonymous committed Aug 14, 2024
1 parent e60e19b commit 50bf66e
Showing 2 changed files with 8 additions and 4 deletions.
4 changes: 2 additions & 2 deletions comfy/cli_args.py
@@ -51,8 +51,8 @@ def __call__(self, parser, namespace, values, option_string=None):
 parser.add_argument("--disable-auto-launch", action="store_true", help="Disable auto launching the browser.")
 parser.add_argument("--cuda-device", type=int, default=None, metavar="DEVICE_ID", help="Set the id of the cuda device this instance will use.")
 cm_group = parser.add_mutually_exclusive_group()
-cm_group.add_argument("--cuda-malloc", action="store_true", help="Enable cudaMallocAsync (enabled by default for torch 2.0 and up).")
-cm_group.add_argument("--disable-cuda-malloc", action="store_true", help="Disable cudaMallocAsync.")
+cm_group.add_argument("--cuda-malloc", action="store_true", help="Enable cudaMallocAsync.")
+cm_group.add_argument("--disable-cuda-malloc", action="store_true", help="Disable cudaMallocAsync (The current default).")


 fp_group = parser.add_mutually_exclusive_group()
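The mutually exclusive group in the diff above means the two flags cannot be passed together; argparse rejects the combination with a usage error. A minimal standalone sketch of that behavior (trimmed to just the two flags in question):

```python
import argparse

# Reproduce the mutually exclusive pair from comfy/cli_args.py.
parser = argparse.ArgumentParser()
cm_group = parser.add_mutually_exclusive_group()
cm_group.add_argument("--cuda-malloc", action="store_true",
                      help="Enable cudaMallocAsync.")
cm_group.add_argument("--disable-cuda-malloc", action="store_true",
                      help="Disable cudaMallocAsync (The current default).")

# One flag alone parses normally.
args = parser.parse_args(["--cuda-malloc"])
print(args.cuda_malloc)          # True
print(args.disable_cuda_malloc)  # False

# Both flags together are a usage error (argparse exits with status 2).
try:
    parser.parse_args(["--cuda-malloc", "--disable-cuda-malloc"])
except SystemExit:
    print("mutually exclusive")
```

With neither flag given, both attributes default to False, which is why the commit can treat "cuda malloc off" as the new default without touching the parser defaults.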
8 changes: 6 additions & 2 deletions cuda_malloc.py
@@ -2,6 +2,7 @@
 import importlib.util
 from comfy.cli_args import args
 import subprocess
+import logging

 #Can't use pytorch to get the GPU names because the cuda malloc has to be set before the first import.
 def get_gpu_names():
@@ -63,7 +64,7 @@ def cuda_malloc_supported():
     return True


-if not args.cuda_malloc:
+if args.cuda_malloc:
     try:
         version = ""
         torch_spec = importlib.util.find_spec("torch")
@@ -74,8 +75,11 @@ def cuda_malloc_supported():
                 module = importlib.util.module_from_spec(spec)
                 spec.loader.exec_module(module)
                 version = module.__version__
+        supported = False
         if int(version[0]) >= 2: #enable by default for torch version 2.0 and up
-            args.cuda_malloc = cuda_malloc_supported()
+            supported = cuda_malloc_supported()
+        if not supported:
+            logging.warning("WARNING: cuda malloc enabled but not supported.")
     except:
         pass
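The comment at the top of cuda_malloc.py notes that the allocator has to be selected before torch's first import. A minimal sketch of how that selection works, assuming PyTorch's documented `PYTORCH_CUDA_ALLOC_CONF` environment variable (the helper name here is hypothetical, not from the repo):

```python
import os

def enable_cuda_malloc_async():
    # The allocator backend is read once, at torch's first import, from
    # PYTORCH_CUDA_ALLOC_CONF -- which is why ComfyUI does this in
    # cuda_malloc.py before anything imports torch.
    env = os.environ.get("PYTORCH_CUDA_ALLOC_CONF", "")
    if "backend" not in env:
        # Append to any user-provided settings rather than clobbering them.
        parts = [p for p in env.split(",") if p]
        parts.append("backend:cudaMallocAsync")
        os.environ["PYTORCH_CUDA_ALLOC_CONF"] = ",".join(parts)

enable_cuda_malloc_async()
# import torch  # must happen only *after* the variable is set
```

Setting the variable after torch has already been imported has no effect, which is the constraint the whole file is built around.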


4 comments on commit 50bf66e

@Thireus

Why is it now disabled by default? Is this causing issues, or does it have a performance impact?

@Danamir

The new default option greatly degrades the performance of Flux NF4 on my system (roughly 5x slower). With --cuda-malloc the performance is back to normal.

The changes should be documented and/or mentioned in the Manager Announcements tab.

@RandomGitUser321 (Contributor) commented on 50bf66e Aug 14, 2024

Without --cuda-malloc, Flux NF4 is about 2x slower on my setup. Normally I'm in the 2.5 sec/it range; with it off, it's around 5 sec/it. With the launch flag, it goes back to the faster speed.

This could be because I only have 8GB VRAM, though; maybe cuda malloc is needed for some shortcuts in driver/Windows-level memory operations with shared/system RAM when there isn't enough VRAM.
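For anyone comparing speeds like this, one way to confirm which allocator a session actually ended up with is `torch.cuda.get_allocator_backend()`, available in recent PyTorch releases. A hedged sketch (the wrapper function is illustrative, not part of ComfyUI):

```python
import importlib.util

def allocator_backend():
    """Best-effort report of the active CUDA caching-allocator backend."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    if not torch.cuda.is_available():
        return "no cuda"
    # Reports "native" (the default caching allocator) or "cudaMallocAsync".
    return torch.cuda.get_allocator_backend()

print(allocator_backend())
```

Running this in the same environment with and without --cuda-malloc would show whether the flag is actually switching the backend before blaming it for a speed difference.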

@Thireus

Same remark here: I had to add --cuda-malloc back because performance with Flux was just terrible without it.
