Commit 13870ef (parent 212e726), showing 4 changed files with 127 additions and 1 deletion.
@@ -209,3 +209,11 @@ Diátaxis
 reStructuredText
 localhost
 HTML
+Nvidia
+nvidia
+AWSCLI
+SSM
+CUDA
+G4DN
+DN
@@ -0,0 +1,116 @@

Install NVIDIA drivers on a GPU-enabled EC2 instance
=====================================================

AWS provides GPU-enabled instance types for workloads that require GPU compute power. G4DN instances are powered by an Nvidia Tesla T4 GPU. This guide will walk you through the driver installation process, including CUDA for machine learning workloads.
Launch your instance
--------------------

Launch your Ubuntu 22.04 VM using either `AWSCLI or the web console`_. Ensure that you have enough disk space (at least 30 GB), as the driver installation requires a significant amount of space. You will need more space if you plan to train or run ML models later.
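As a rough sketch of the AWSCLI route (the AMI ID, key pair name, and security group below are placeholders, not values from this guide), a G4DN instance with a 30 GB root volume can be launched like this:

.. code::

   # Launch a g4dn.xlarge Ubuntu 22.04 instance with a 30 GB root volume.
   # Replace the placeholder AMI, key pair, and security group with your own.
   aws ec2 run-instances \
       --image-id ami-0123456789abcdef0 \
       --instance-type g4dn.xlarge \
       --key-name my-key-pair \
       --security-group-ids sg-0123456789abcdef0 \
       --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":30}}]'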
SSH access is required, so make sure to either open port 22 or enable SSM to access the machine through Session Manager.
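If you choose the SSM route, a session can be opened from your workstation with the AWS CLI (this sketch assumes the Session Manager plugin is installed locally; the instance ID is a placeholder):

.. code::

   # Open an interactive shell through Session Manager (no open port 22 needed)
   aws ssm start-session --target i-0123456789abcdef0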
Install the Nvidia driver
-------------------------

First, log in to your instance and check whether the GPU is present with this command:

.. code::

   sudo lshw -c video
If you are using the correct instance type (G4DN in this case), you should see the following results:

.. code-block:: none

   *-display:0 UNCLAIMED
        description: VGA compatible controller
        product: Amazon.com, Inc.
        vendor: Amazon.com, Inc.
        physical id: 3
        bus info: pci@0000:00:03.0
        version: 00
        width: 32 bits
        clock: 33MHz
        capabilities: vga_controller
        configuration: latency=0
        resources: memory:fe400000-fe7fffff memory:c0000-dffff
   *-display:1 UNCLAIMED
        description: 3D controller
        product: TU104GL [Tesla T4]
        vendor: NVIDIA Corporation
        physical id: 1e
        bus info: pci@0000:00:1e.0
        version: a1
        width: 64 bits
        clock: 33MHz
        capabilities: pm pciexpress msix cap_list
        configuration: latency=0
        resources: iomemory:40-3f iomemory:40-3f memory:fd000000-fdffffff memory:440000000-44fffffff memory:450000000-451ffffff
The Nvidia Tesla T4 GPU should be listed as UNCLAIMED. Now, install the Nvidia driver:

.. code::

   sudo apt install nvidia-headless-535-server nvidia-utils-535-server -y

.. note:: Since we are using a headless server (no desktop), the headless driver is sufficient. If you are running this in a full desktop environment (AWS Workspaces or your own EC2 desktop), use ``nvidia-driver-535``.
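If a newer driver series is available by the time you follow this guide, and assuming the ``ubuntu-drivers-common`` package is present on your image, you can list the drivers Ubuntu recommends for the detected GPU and adjust the package names above accordingly:

.. code::

   # Show the detected GPU and the driver packages Ubuntu recommends for it
   sudo ubuntu-drivers devices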
After the installation, reboot the instance:

.. code::

   sudo reboot
Test that everything was installed properly:

.. code::

   sudo lshw -c video
.. code-block:: none

   *-display:0 UNCLAIMED
        description: VGA compatible controller
        product: Amazon.com, Inc.
        vendor: Amazon.com, Inc.
        physical id: 3
        bus info: pci@0000:00:03.0
        version: 00
        width: 32 bits
        clock: 33MHz
        capabilities: vga_controller
        configuration: latency=0
        resources: memory:fe400000-fe7fffff memory:c0000-dffff
   *-display:1
        description: 3D controller
        product: TU104GL [Tesla T4]
        vendor: NVIDIA Corporation
        physical id: 1e
        bus info: pci@0000:00:1e.0
        version: a1
        width: 64 bits
        clock: 33MHz
        capabilities: pm pciexpress msix bus_master cap_list
        configuration: driver=nvidia latency=0
        resources: iomemory:40-3f iomemory:40-3f irq:10 memory:fd000000-fdffffff memory:440000000-44fffffff memory:450000000-451ffffff
The Tesla T4 should no longer be marked as "UNCLAIMED", and its configuration line should now show ``driver=nvidia``.
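As an additional quick check (a minimal sketch, assuming the module was loaded at boot), you can confirm that the NVIDIA kernel module is active:

.. code::

   # The nvidia kernel modules should appear among the loaded modules
   lsmod | grep nvidia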
You can also perform an additional test to check if CUDA was installed:

.. code::

   nvidia-smi

This should display the Nvidia GPU information, including the CUDA version in the top-right corner. If CUDA was not installed, you can visit the `Nvidia website`_ to download the CUDA version that matches the driver you just installed.
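For a script-friendly variant of the same check, ``nvidia-smi`` can also print selected fields only; this sketch queries the GPU model and driver version:

.. code::

   # Print the GPU model and installed driver version as plain CSV
   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader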
.. _`AWSCLI or the web console`: https://discourse.ubuntu.com/t/how-to-deploy-ubuntu-pro-in-aws-in-2023/23367
.. _`Nvidia website`: https://developer.nvidia.com/cuda-downloads