Describe the issue
It takes 13+ minutes for StableDiffusionPipeline.from_single_file to load the 7.2 GB weights (stable-diffusion-v1-5/stable-diffusion-v1-5/v1-5-pruned.safetensors).
System & Version (please complete the following information):
OS: N/A
Platform: Cloud Run
Version: HEAD of the repo
Steps to reproduce the behavior with the following information:
Please share the mount command, including all command-line or config flags used to mount the bucket.
Please rerun with --log-severity=TRACE --foreground as additional flags to enable debug logs.
Monitor the logs and capture screenshots or copy the relevant logs to a file (--log-format and --log-file can be used as well).
Attach the screenshot or the log file to the bug report here: downloaded-logs-20241222-111514.csv
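For reference, a minimal reproduction sketch of the load path (the gcsfuse mount point /mnt/gcs and the --implicit-dirs flag below are assumptions; the checkpoint path follows the bucket layout from the report):

```python
# Assumed mount (flags illustrative): gcsfuse --implicit-dirs <bucket> /mnt/gcs
# Loading the 7.2 GB single-file checkpoint through the FUSE mount takes 13+ minutes.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "/mnt/gcs/stable-diffusion-v1-5/stable-diffusion-v1-5/v1-5-pruned.safetensors"
)
```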
Why is the file-cache feature not used? Cloud Run doesn't support local SSDs or PDs, and its local filesystem is backed by memory, so a file cache would consume too much RAM given the large model files.
What is the problem? The application calls StableDiffusionPipeline.from_single_file, which uses safetensors, which in turn uses mmap under the hood (see "Add a disable_mmap option to the from_single_file loader to improve load performance on network mounts", huggingface/diffusers#10305). The resulting access pattern is not a pure sequential read. It has the following characteristics: (a) multiple processes share a single fd and file handle, so there is seeking back and forth on the same file handle; (b) each pid's reads are also somewhat seeky on their own. [More investigation can be found in Google-internal bug ID: 381955920]
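As a point of comparison, the option added by that PR bypasses the mmap-based loading path. A sketch of the workaround, assuming a diffusers version that includes huggingface/diffusers#10305 and the same assumed mount path as above:

```python
from diffusers import StableDiffusionPipeline

# disable_mmap=True (from huggingface/diffusers#10305) makes safetensors read the
# file with regular buffered I/O instead of mmap, avoiding the seeky,
# page-fault-driven read pattern on FUSE mounts.
pipe = StableDiffusionPipeline.from_single_file(
    "/mnt/gcs/stable-diffusion-v1-5/stable-diffusion-v1-5/v1-5-pruned.safetensors",  # assumed mount path
    disable_mmap=True,
)
```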
We could make GCSFuse much faster for this case by (a) keeping a per-pid GCS reader/stream within the same file handle, so that each pid can seek independently, and (b) being less aggressive when determining the range for each stream, so that no pid is trapped in a 1 MB stream forever.
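To make idea (a) concrete, here is a conceptual sketch (not GCSFuse code; it just illustrates the per-pid stream idea against a local file): each reading pid gets its own reader with its own offset, so one process seeking does not disturb another's position or read-ahead window.

```python
import io

class PerPidFileHandle:
    """Illustrative only: one independent reader per pid for the same open file."""

    def __init__(self, path):
        self.path = path
        self._readers = {}  # pid -> file object with its own offset

    def _reader_for(self, pid):
        if pid not in self._readers:
            self._readers[pid] = open(self.path, "rb", buffering=io.DEFAULT_BUFFER_SIZE)
        return self._readers[pid]

    def read_at(self, pid, offset, size):
        reader = self._reader_for(pid)
        reader.seek(offset)  # seeks only affect this pid's stream
        return reader.read(size)

    def close(self):
        for reader in self._readers.values():
            reader.close()
```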
Early results show the optimization can reduce the read time from 13 minutes to 1 minute.
SLO:
We strive to respond to all bug reports within 24 business hours provided the information mentioned above is included.
wlhee changed the title to "GCSFuse is extremely slow for StableDiffusionPipeline.from_single_file" on Dec 22, 2024.
Discussed offline with @wlhee. For now, we will not take changes for the per-pid reader, as this read pattern is counter-intuitive. We will keep monitoring workloads, and if this turns out to be a common read pattern, we will reconsider.
For the random-read range issue, we will investigate (tracking it internally) whether we should change the heuristic, and update it if required. Please reopen the issue if needed.