feat: DF solution #24
base: main
Hey @Reversaidx, the solution failed to launch. There are no logs from the container itself, but here is the error state from the pod manifest:
lastState:
  terminated:
    exitCode: 126
    reason: ContainerCannotRun
    message: >-
      failed to create shim task: OCI runtime create failed: runc create
      failed: unable to start container process: exec: "./start.sh":
      permission denied: unknown
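For reference: exit code 126 together with `permission denied` on `./start.sh` usually means the script lacks the executable bit. The thread does not say how this was actually fixed, so the snippet below is only a minimal sketch under that assumption. The helper name `ensure_executable` is hypothetical and not part of the submitted solution.

```python
import os
import stat


def ensure_executable(path: str) -> bool:
    """Add the execute bits to `path` if any are missing.

    Hypothetical helper: assumes the failure was caused by a missing
    executable bit on the entrypoint script.
    Returns True if the file mode was changed, False otherwise.
    """
    mode = os.stat(path).st_mode
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if mode & exec_bits == exec_bits:
        # Already executable for user, group, and others.
        return False
    os.chmod(path, mode | exec_bits)
    return True
```

Calling `ensure_executable("start.sh")` before building the image would have the same effect as running `chmod +x start.sh`. If the script is tracked in git, the mode bit also needs to be committed for it to survive a fresh checkout.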
Fixed, thank you.
The second iteration launched, but CUDA reported an out-of-memory (OOM) error on start. Full logs attached:
Could you run it again? I decreased the number of workers.
Thank you, @Reversaidx, for this great solution. Decreasing the number of workers helped. Here are our test results for your latest commit.
If you would like to work on your solution further, you can continue optimizing/improving it and re-request our review once done. Any contribution during the challenge period will be taken into account while choosing a winner. Many thanks!