
GCSFuse is extremely slow for StableDiffusionPipeline.from_single_file #2828

Closed
wlhee opened this issue Dec 22, 2024 · 1 comment
Labels
p2 P2 question Customer Issue: question about how to use tool

Comments

wlhee commented Dec 22, 2024

Describe the issue
It takes more than 13 minutes for StableDiffusionPipeline.from_single_file to load 7.2 GB of weights (stable-diffusion-v1-5/stable-diffusion-v1-5/v1-5-pruned.safetensors).

System & Version (please complete the following information):

  • OS: N/A
  • Platform: Cloud Run
  • Version: GCSFuse at repo HEAD

Steps to reproduce the behavior with following information:

  1. Mount command, including all command-line and config flags used to mount the bucket:
gcsfuse -o ro --implicit-dirs --client-protocol=http1 --max-conns-per-host=100 \
          $BUCKET_NAME $MOUNT_PATH
  2. Rerun with --log-severity=TRACE --foreground as additional flags to enable debug logs.
  3. Monitor the logs and capture screenshots or copy the relevant logs to a file (--log-format and --log-file can be used as well).
  4. Log file attached to this bug report:
    downloaded-logs-20241222-111514.csv

Additional context

  1. Why isn't the file-cache feature used? Cloud Run doesn't support local SSD or PD, and its local filesystem is backed by memory, so a file cache would consume too much RAM given the large model files.
  2. What is the problem? StableDiffusionPipeline.from_single_file uses safetensors, which uses mmap under the hood (see "Add a disable_mmap option to the from_single_file loader to improve load performance on network mounts", huggingface/diffusers#10305). The resulting access is not a pure sequential read pattern. The reads have the following characteristics: (a) multiple processes share a single fd and file handle, so the shared handle seeks back and forth; (b) each pid's own reads are also somewhat seeky. [More investigation can be found in Google-internal bug ID: 381955920]
  3. GCSFuse could handle this case much faster by (a) keeping a per-pid GCS reader/stream within the same file handle, so that each pid can seek independently, and (b) being less aggressive when determining the range for each stream, so that each pid is not trapped in a 1 MB stream forever.
  4. Early results show the optimization can reduce the read time from 13 minutes to 1 minute.
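The shared-handle seeking described in item 2 and the per-pid reader idea in item 3 can be illustrated with a small, self-contained sketch (this is not GCSFuse code; the file contents and offsets are made up for illustration):

```python
import os
import tempfile

# ~1 MiB stand-in for a large safetensors weights file.
data = bytes(range(256)) * 4096
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(data)
tmp.close()

fd = os.open(tmp.name, os.O_RDONLY)

# Two readers ("pids") alternate reads at far-apart offsets through one
# shared handle, so the shared file position travels back and forth.
offsets = [0, 512 * 1024, 4096, 516 * 1024]
seek_distance = sum(abs(offsets[i] - offsets[i - 1]) for i in range(1, len(offsets)))
print(f"shared handle seeks across {seek_distance} bytes")

# os.pread reads at an explicit offset without touching the shared file
# position -- the analogue of giving each pid its own GCS reader/stream.
chunks = [os.pread(fd, 4096, off) for off in offsets]
assert all(len(c) == 4096 for c in chunks)

os.close(fd)
os.unlink(tmp.name)
```

With a shared offset, each alternating read forces a half-megabyte seek; with per-reader offsets, each reader's stream advances sequentially on its own.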

SLO:
We strive to respond to all bug reports within 24 business hours provided the information mentioned above is included.

@abhishek10004 (Collaborator) commented:

Discussed offline with @wlhee. For now, we will not take changes for a per-pid reader, as this read pattern is counter-intuitive. We will keep monitoring workloads, and if this turns out to be a common read pattern, we will reconsider.
For the random-read range issue, we will investigate (tracking it internally) whether we should change the heuristic, and update it if required. Please reopen the issue if needed.
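A purely illustrative sketch of the kind of heuristic change under discussion (this is not GCSFuse's actual read-ahead logic; the constants and growth policy are assumptions): a read range that grows on sequential reads and only halves after a seek, so a seeky reader is not pinned to the minimal 1 MB range forever.

```python
MIN_RANGE = 1 << 20   # 1 MiB
MAX_RANGE = 64 << 20  # 64 MiB

class ReadAhead:
    """Toy adaptive read-range planner (illustration only)."""

    def __init__(self):
        self.next_offset = 0
        self.range_size = MIN_RANGE

    def plan(self, offset: int) -> int:
        """Return the range size to request from object storage for a read at `offset`."""
        if offset == self.next_offset:
            # Sequential read: double the window up to the cap.
            self.range_size = min(self.range_size * 2, MAX_RANGE)
        else:
            # Seek: shrink, but keep half the window rather than
            # resetting to the minimum, so throughput recovers quickly.
            self.range_size = max(self.range_size // 2, MIN_RANGE)
        self.next_offset = offset + self.range_size
        return self.range_size
```

For example, two sequential reads grow the window from 1 MiB to 2 MiB and then 4 MiB, while a far seek drops it back to 2 MiB instead of 1 MiB.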
