Replies: 1 comment
@HarideP There are a few changes you can make to prevent the cache overflow. You could readjust your memory allocation settings. One option is to enable expandable segments, which avoids fragmentation, by setting `export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`. Alternatively, you could experiment with the `max_split_size_mb` option to control the size of memory allocations and reduce fragmentation: `export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128`.
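As a minimal sketch of the same idea from the Python side: the variable has to be set before the first CUDA allocation, because the caching allocator only reads `PYTORCH_CUDA_ALLOC_CONF` when it initializes. The tensor and sizes below are hypothetical, just to illustrate inspecting allocator state.

```python
import os

# Must be set before the first CUDA allocation; once the caching
# allocator has initialized, changing this has no effect.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
# Alternative: cap the allocator's split block size to reduce fragmentation.
# os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

# Hypothetical workload, just to have something on the GPU.
x = torch.randn(4096, 4096, device="cuda")

# Memory held by live tensors vs. memory reserved by the caching
# allocator; a large gap between the two usually means fragmentation.
print(f"allocated: {torch.cuda.memory_allocated() / 1e6:.1f} MB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1e6:.1f} MB")

# Returns cached-but-unused blocks to the driver (does not free live
# tensors, but can help other processes sharing the GPU).
torch.cuda.empty_cache()
```

The reason `expandable_segments` tends to help is that it lets the allocator grow existing memory segments instead of requesting new fixed-size blocks, so workloads with varying tensor shapes fragment less.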
CUDA out of memory
When I reproduce this code, a GPU cache overflow always occurs. How do I fix this problem?
sft config_full.yaml:
accelerate_config zero3.yaml: