When using streaming data saved locally, how does read speed during training compare to just using a PyTorch DataLoader? #9

yja1 · 2025-01-20T03:31:24Z

When using streaming data, what is the read speed like during training if the data is just saved locally? How does it compare to just using a PyTorch DataLoader?

VSehwag · 2025-01-20T06:04:54Z

Although StreamingDataset supports loading from a remote, I recommend loading from local storage because it is fast. When loading from a remote (e.g., AWS S3 buckets), you need a large cache to avoid thrashing, i.e., repeatedly evicting shards and downloading them again. When training large models on machines with decent SSDs and 8 or fewer GPUs, data loading shouldn't be a bottleneck.
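To illustrate why a small cache thrashes when streaming from a remote, here is a toy simulation. It is a simplified model, not StreamingDataset's actual logic: it assumes data is stored in fixed-size shards, that a cache miss means re-downloading a whole shard, that the cache uses plain LRU eviction, and that samples are read in a fully random order. The function name `count_shard_downloads` and all sizes are hypothetical.

```python
import random
from collections import OrderedDict


def count_shard_downloads(access_order, samples_per_shard, cache_shards):
    """Count shard downloads for a given sample order under a toy LRU shard cache.

    Hypothetical model: sample i lives in shard i // samples_per_shard, and
    every cache miss means fetching that whole shard from remote storage.
    """
    cache = OrderedDict()  # shard id -> None, ordered least to most recently used
    downloads = 0
    for sample_idx in access_order:
        shard = sample_idx // samples_per_shard
        if shard in cache:
            cache.move_to_end(shard)  # hit: mark shard as most recently used
            continue
        downloads += 1  # miss: the shard must be downloaded (again)
        cache[shard] = None
        if len(cache) > cache_shards:
            cache.popitem(last=False)  # evict the least recently used shard
    return downloads


if __name__ == "__main__":
    num_shards, samples_per_shard = 100, 1000
    order = list(range(num_shards * samples_per_shard))
    random.Random(0).shuffle(order)  # fully random, sample-level shuffle

    for cache_shards in (10, 50, 100):
        n = count_shard_downloads(order, samples_per_shard, cache_shards)
        print(f"cache={cache_shards:3d} shards -> {n} shard downloads per epoch")
```

When the cache holds every shard, each shard is downloaded exactly once per epoch; with a cache much smaller than the dataset, most random reads miss and trigger a fresh download. Reading from local disk sidesteps this, since nothing has to be fetched over the network.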
