
GCSFuse is extremely slow for StableDiffusionPipeline.from_single_file #2828

Closed
wlhee opened this issue Dec 22, 2024 · 1 comment
Labels
p2 P2 question Customer Issue: question about how to use tool

Comments

wlhee commented Dec 22, 2024

Describe the issue
It takes more than 13 minutes for StableDiffusionPipeline.from_single_file to load 7.2 GB of weights (stable-diffusion-v1-5/stable-diffusion-v1-5/v1-5-pruned.safetensors).

System & Version (please complete the following information):

  • OS: N/A
  • Platform: Cloud Run
  • Version: GCSFuse at repo HEAD

Steps to reproduce the behavior with following information:

  1. Mount command, including all command-line and config flags used to mount the bucket:
gcsfuse -o ro --implicit-dirs --client-protocol=http1 --max-conns-per-host=100 \
          $BUCKET_NAME $MOUNT_PATH
  2. Rerun with --log-severity=TRACE --foreground as additional flags to enable debug logs.
  3. Monitor the logs and capture screenshots or copy the relevant logs to a file (--log-format and --log-file can be used as well).
  4. Log file attached to this bug report:
    downloaded-logs-20241222-111514.csv

Additional context

  1. Why isn't the file-cache feature used? Cloud Run doesn't support local SSD or PD, and its local filesystem is backed by memory, so a file cache would consume too much RAM given the large model files.
  2. What is the problem? StableDiffusionPipeline.from_single_file uses safetensors, which uses mmap under the hood (see "Add a disable_mmap option to the from_single_file loader to improve load performance on network mounts", huggingface/diffusers#10305). The resulting access is not a pure sequential read pattern. The reads have the following characteristics: (a) multiple processes share a single fd and file handle, so the shared handle seeks back and forth; (b) each pid's own reads are also somewhat seeky. [More investigation can be found in Google-internal bug ID: 381955920]
  3. GCSFuse could handle this case much faster by (a) keeping a per-pid GCS reader/stream within the same file handle, so that each pid can seek independently, and (b) being less aggressive when determining the range for each stream, so that each pid is not trapped in a 1 MB stream forever.
  4. Early results show the optimization can reduce the read time from 13 minutes to 1 minute.
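The shared-handle seeking described in item 2 and the per-pid reader idea in item 3 can be illustrated with a small, self-contained sketch (this is not GCSFuse code; the file contents and offsets are made up for illustration):

```python
import os
import tempfile

# ~1 MiB stand-in for a large safetensors weights file.
data = bytes(range(256)) * 4096
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(data)
tmp.close()

fd = os.open(tmp.name, os.O_RDONLY)

# Two readers ("pids") alternate reads at far-apart offsets through one
# shared handle, so the shared file position travels back and forth.
offsets = [0, 512 * 1024, 4096, 516 * 1024]
seek_distance = sum(abs(offsets[i] - offsets[i - 1]) for i in range(1, len(offsets)))
print(f"shared handle seeks across {seek_distance} bytes")

# os.pread reads at an explicit offset without touching the shared file
# position -- the analogue of giving each pid its own GCS reader/stream.
chunks = [os.pread(fd, 4096, off) for off in offsets]
assert all(len(c) == 4096 for c in chunks)

os.close(fd)
os.unlink(tmp.name)
```

With a shared offset, each alternating read forces a half-megabyte seek; with per-reader offsets, each reader's stream advances sequentially on its own.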

SLO:
We strive to respond to all bug reports within 24 business hours provided the information mentioned above is included.

@abhishek10004 (Collaborator) commented:

Discussed offline with @wlhee. For now, we will not take changes for a per-pid reader, as this read pattern is counter-intuitive. We will keep monitoring workloads, and if this turns out to be a common read pattern, we will reconsider.
For the random-read range issue, we will investigate (tracking it internally) whether we should change the heuristic, and update it if required. Please reopen the issue if needed.
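A purely illustrative sketch of the kind of heuristic change under discussion (this is not GCSFuse's actual read-ahead logic; the constants and growth policy are assumptions): a read range that grows on sequential reads and only halves after a seek, so a seeky reader is not pinned to the minimal 1 MB range forever.

```python
MIN_RANGE = 1 << 20   # 1 MiB
MAX_RANGE = 64 << 20  # 64 MiB

class ReadAhead:
    """Toy adaptive read-range planner (illustration only)."""

    def __init__(self):
        self.next_offset = 0
        self.range_size = MIN_RANGE

    def plan(self, offset: int) -> int:
        """Return the range size to request from object storage for a read at `offset`."""
        if offset == self.next_offset:
            # Sequential read: double the window up to the cap.
            self.range_size = min(self.range_size * 2, MAX_RANGE)
        else:
            # Seek: shrink, but keep half the window rather than
            # resetting to the minimum, so throughput recovers quickly.
            self.range_size = max(self.range_size // 2, MIN_RANGE)
        self.next_offset = offset + self.range_size
        return self.range_size
```

For example, two sequential reads grow the window from 1 MiB to 2 MiB and then 4 MiB, while a far seek drops it back to 2 MiB instead of 1 MiB.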
