sagemaker use reserved instances (#258)
* sagemaker use reserved instances

* lint
sedrick-keh-tri authored Apr 26, 2024
1 parent 5e4e7f9 commit 083fa31
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions sagemaker_train/launch_sagemaker_train.py
@@ -189,6 +189,7 @@ def get_job_name(base):
 max_wait=5 * 24 * 60 * 60 if args.spot_instance else None,
 input_mode="FastFile",
 # environment={"TORCH_DISTRIBUTED_DEBUG": "DETAIL", "TORCH_CPP_LOG_LEVEL": "INFO"},
+environment={"SM_USE_RESERVED_CAPACITY": "1"},
 keep_alive_period_in_seconds=30 * 60 if not args.spot_instance else None,  # 30 minutes
 )
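
Note: the sketch below isolates the keyword arguments visible in this hunk so the spot versus on-demand branching is easier to follow. It is not the repository's code. `build_estimator_kwargs` is a hypothetical helper, `spot_instance` stands in for `args.spot_instance`, and the returned dictionary is assumed to be passed to the estimator constructor that the hunk's closing parenthesis belongs to. What `SM_USE_RESERVED_CAPACITY` does is inferred from the commit title only; the diff does not show how the variable is consumed.

```python
from typing import Any, Dict


def build_estimator_kwargs(spot_instance: bool) -> Dict[str, Any]:
    """Hypothetical helper mirroring the keyword arguments shown in the hunk."""
    return {
        # Upper bound on total job time (5 days, in seconds); set only for spot jobs.
        "max_wait": 5 * 24 * 60 * 60 if spot_instance else None,
        "input_mode": "FastFile",
        # The line this commit adds: the variable is set for every job,
        # regardless of whether spot instances are requested.
        "environment": {"SM_USE_RESERVED_CAPACITY": "1"},
        # Keep instances around for 30 minutes after the job; set only for non-spot jobs.
        "keep_alive_period_in_seconds": 30 * 60 if not spot_instance else None,
    }


if __name__ == "__main__":
    for spot in (False, True):
        print(f"spot_instance={spot}: {build_estimator_kwargs(spot)}")
```

In this sketch `max_wait` and `keep_alive_period_in_seconds` are mutually exclusive, as in the diff: exactly one of them is non-`None` for a given value of `spot_instance`, while `environment` is identical in both cases.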
