Make `braket_container.py` thread safe for CUDA-Q BYOC image #679
Labels: enhancement (New feature or request)
Is your feature request related to a problem? Please describe.
The `braket_container.py` script, which the CUDA-Q BYOC image uses to launch the user-provided algorithm script, is not thread safe. This can create race conditions, in particular in the step that downloads, extracts, and makes available the customer code to be executed in the job. This is a problem specifically for (single- and multi-instance) multi-GPU workflows, an area where CUDA-Q can provide acceleration. The original script in the amazon-braket-containers repository does not take multiple processes running in an MPI context into account at all. The script in this repository at least performs some basic handling of the MPI ranks, but this handling is both inefficient and ultimately not bulletproof (for example, if the download of the user-provided algorithm code from S3 takes longer than expected).
Describe the solution you'd like
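The script should be refactored so that it is genuinely safe under concurrent execution, i.e. when several MPI ranks on the same instance run it at the same time.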
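One possible direction, as a minimal sketch only: serialize the download-and-extract step with an exclusive file lock and publish the result with an atomic rename, so that exactly one process per instance does the work and the others block until it is finished, regardless of how long the S3 download takes. The names `prepare_once`, `populate`, and the `.ready` marker are hypothetical and do not come from the existing script; the sketch also assumes a POSIX host with a node-local filesystem, since `fcntl.flock` is not available on Windows and may behave differently on network filesystems.

```python
import fcntl
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def prepare_once(target_dir: str, populate: Callable[[str], None]) -> bool:
    """Populate target_dir at most once, even if many processes call this.

    `populate` (hypothetical) receives a staging directory and fills it,
    e.g. by downloading and extracting the customer code.
    Returns True if this caller did the work, False if it was already done.
    """
    target = Path(target_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    lock_path = target.parent / (target.name + ".lock")
    marker = target / ".ready"

    with open(lock_path, "w") as lock_file:
        # Blocks until this caller holds the exclusive lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            if marker.exists():
                # Another rank already finished the setup.
                return False
            if target.exists():
                # Leftover from an interrupted attempt: discard it.
                shutil.rmtree(target)
            # Stage next to the target so the rename stays on one filesystem.
            staging = tempfile.mkdtemp(dir=target.parent, prefix=target.name + ".tmp")
            populate(staging)
            Path(staging, ".ready").touch()
            # Atomic publish: readers see either nothing or the full directory.
            os.rename(staging, target)
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Each MPI rank would call `prepare_once` with the same target path; no rank needs to know its rank number or sleep for a fixed time. For multi-instance jobs the lock is per instance, which matches the assumption that every instance needs its own copy of the code.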
Describe alternatives you've considered
It would be even better, IMO, to improve the original script (https://github.com/amazon-braket/amazon-braket-containers/blob/main/src/braket_container.py) and copy it directly in the Dockerfile rather than duplicating it locally, e.g.: