Set HuggingFace metadata timeout (in seconds) for large clusters (#447)
shimomut authored Oct 7, 2024
1 parent 77c0e94 commit 559e018
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions 3.test_cases/10.FSDP/1.distributed-training.sbatch
@@ -35,6 +35,9 @@ export NCCL_DEBUG=INFO
## https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__UNIFIED.html
export FI_EFA_SET_CUDA_SYNC_MEMOPS=0

## Set HuggingFace metadata timeout (in seconds) for large clusters
export HF_HUB_ETAG_TIMEOUT=60

###########################
####### Torch Dist #######
###########################
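
The hunk above hard-codes the timeout at 60 seconds. `HF_HUB_ETAG_TIMEOUT` is the environment variable `huggingface_hub` reads to decide how long to wait for a repository's metadata response before falling back to locally cached files. On a large cluster many ranks can query the Hub at once, which is presumably why the commit raises it. The snippet below is only a sketch of one possible variant, not part of this commit: it keeps 60 as the default but lets the submitter override the value from the calling environment. The helper name `resolve_etag_timeout` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the commit): print the requested timeout if it
# consists only of digits, otherwise fall back to the commit's value of 60.
resolve_etag_timeout() {
    local requested="${1:-}"
    local fallback=60
    case "$requested" in
        ''|*[!0-9]*) echo "$fallback" ;;   # empty or non-numeric: use the fallback
        *)           echo "$requested" ;;  # digits only: accept as given
    esac
}

## Set HuggingFace metadata timeout (in seconds) for large clusters.
## Honors a value already present in the environment, else uses 60.
HF_HUB_ETAG_TIMEOUT="$(resolve_etag_timeout "${HF_HUB_ETAG_TIMEOUT:-}")"
export HF_HUB_ETAG_TIMEOUT
echo "HF_HUB_ETAG_TIMEOUT=${HF_HUB_ETAG_TIMEOUT}"
```

With this variant, a submission such as `HF_HUB_ETAG_TIMEOUT=120 sbatch 1.distributed-training.sbatch` would use 120 seconds, assuming the cluster's Slurm configuration propagates the submitting shell's environment to the job.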