after helm install gpu-operator, no kata-qemu-nvidia-gpu runtimeclass, only kata-nvidia-gpu #59

Open
acblbtpccc opened this issue Jul 26, 2024 · 4 comments

Comments

@acblbtpccc

acblbtpccc commented Jul 26, 2024

OS: Ubuntu 20.04
CPU: AMD EPYC 9354
GPU: NVIDIA RTX A6000 * 8

image

I have already labeled the node (master and worker are on the same machine):
image

If I use kata-qemu-nvidia-gpu (which is included in the docs for 24.3.0), the pod cannot start:

image

If I use the kata-nvidia-gpu runtimeclass (which is not in the docs for 24.3.0), the output is as follows:
image
image
image

After comparing the helm manifests, I guess that the difference may be due to the kata-manager version.

image image

The helm command used is:
image

The results above seem to indicate that the docs are for kata-manager v0.1.0 rather than kata-manager v0.2.0. Is there any documentation for kata-manager v0.2.0? Or can I downgrade to kata-manager v0.1.0?
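
For anyone who wants to script the runtimeclass check described above, here is a minimal sketch. It is not taken from the gpu-operator docs: the helper name `missing_runtime_classes` is hypothetical, and it assumes the input is the JSON printed by `kubectl get runtimeclass -o json`. The sample data only mirrors the situation in this report (kata-nvidia-gpu present, kata-qemu-nvidia-gpu absent).

```python
import json

# RuntimeClass name referenced in the 24.3.0 docs, according to this issue.
EXPECTED = {"kata-qemu-nvidia-gpu"}


def missing_runtime_classes(kubectl_json, expected=EXPECTED):
    """Return the expected RuntimeClass names that are absent.

    `kubectl_json` is assumed to be the output of
    `kubectl get runtimeclass -o json` (a List object with an `items` array).
    """
    items = json.loads(kubectl_json).get("items", [])
    present = {item["metadata"]["name"] for item in items}
    return sorted(set(expected) - present)


# Illustrative input only: a cluster that has kata-nvidia-gpu but not
# kata-qemu-nvidia-gpu, as described in this report.
sample = json.dumps({"items": [{"metadata": {"name": "kata-nvidia-gpu"}}]})
print(missing_runtime_classes(sample))
```

On a real cluster, the output of `kubectl get runtimeclass -o json` would be passed in place of `sample`.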

@acblbtpccc
Author

Hi, I found that this problem occurs because the artifact image needed by the k8s-kata-manager is no longer accessible:

image
image

May I ask whether anyone has the rights to fix this?

@goutnet

goutnet commented Sep 27, 2024

@zvonkok Hi, I am a colleague of @acblbtpccc. We are trying to reproduce the steps in the documentation provided by NVIDIA here:

https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-operator-kata.html

Sorry for the bump on an old issue, I think we could have done better introducing ourselves ^^;

Would you have a few minutes to spare to give us some pointers on what we obviously did wrong on this?

@zvonkok your help would be greatly appreciated, thank you so much in advance!

@acblbtpccc
Author

@zvonkok
Hi Zvonkok,

I hope this message finds you well. I wanted to bring to your attention that I've opened a related issue, kata-containers/kata-containers#10360, after attempting to run GPU passthrough directly with Kata Containers. I would greatly appreciate it if you could take a look at that issue when you have a moment. I'm looking forward to your insights, and thank you in advance for your time and expertise.

Additionally, I watched your interview videos on YouTube, which were very informative. If possible, would you be willing to share the environment configuration you used? It would be incredibly helpful for us as a reference when trying to reproduce the setup.

Thank you again for your consideration and assistance.

@goutnet

@acblbtpccc
Author

acblbtpccc commented Sep 29, 2024

@cdesiniotis

Hi Christopher, I noticed your comments in this issue. Are these artifacts still not publicly available? Does this mean we are still unable to reproduce the results in the official docs?

We are looking forward to your insights regarding some challenges we've encountered while using GPU-Operator with Kata. Your expertise would be greatly appreciated.

Thank you in advance for your time and assistance.

/cc @goutnet
