Unknown GPU and CPU #667

Open

RomainWarlop opened this issue Sep 13, 2024 · 3 comments
@RomainWarlop

  • CodeCarbon version: 2.7.1
  • Python version: 3.10.14
  • Operating System: Linux

Description

I'm trying to estimate the carbon impact of LLMs using models available on Hugging Face. So far I'm running this BLOOM model on GPU, on an NVIDIA Tesla P100.

Here is my code:

from transformers import AutoModelForCausalLM, AutoTokenizer
from codecarbon import EmissionsTracker

checkpoint = "bigscience/bloomz-7b1"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype="auto", device_map="auto")

# track emissions around a single generation call
tracker = EmissionsTracker()
tracker.start()
inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
tracker.stop()

I obtained the following output, with the GPU tracking error repeated:

[codecarbon INFO @ 13:46:18] [setup] RAM Tracking...
[codecarbon INFO @ 13:46:18] [setup] GPU Tracking...
[codecarbon INFO @ 13:46:18] Tracking Nvidia GPU via pynvml
[codecarbon WARNING @ 13:46:18] Failed to retrieve gpu total energy consumption
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/codecarbon/core/gpu.py", line 116, in _get_total_energy_consumption
    return pynvml.nvmlDeviceGetTotalEnergyConsumption(self.handle)
  File "/opt/conda/lib/python3.10/site-packages/pynvml/nvml.py", line 2411, in nvmlDeviceGetTotalEnergyConsumption
    _nvmlCheckReturn(ret)
  File "/opt/conda/lib/python3.10/site-packages/pynvml/nvml.py", line 833, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.nvml.NVMLError_NotSupported: Not Supported
[codecarbon INFO @ 13:46:18] [setup] CPU Tracking...
[codecarbon WARNING @ 13:46:18] No CPU tracking mode found. Falling back on CPU constant mode. 
 Linux OS detected: Please ensure RAPL files exist at \sys\class\powercap\intel-rapl to measure CPU

[codecarbon WARNING @ 13:46:19] We saw that you have a Intel(R) Xeon(R) CPU @ 2.30GHz but we don't know it. Please contact us.
[codecarbon INFO @ 13:46:19] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.30GHz
[codecarbon WARNING @ 13:46:19] Failed to retrieve gpu total energy consumption
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/codecarbon/core/gpu.py", line 116, in _get_total_energy_consumption
    return pynvml.nvmlDeviceGetTotalEnergyConsumption(self.handle)
  File "/opt/conda/lib/python3.10/site-packages/pynvml/nvml.py", line 2411, in nvmlDeviceGetTotalEnergyConsumption
    _nvmlCheckReturn(ret)
  File "/opt/conda/lib/python3.10/site-packages/pynvml/nvml.py", line 833, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.nvml.NVMLError_NotSupported: Not Supported
[codecarbon INFO @ 13:46:19] >>> Tracker's metadata:
[codecarbon INFO @ 13:46:19]   Platform system: Linux-5.10.0-31-cloud-amd64-x86_64-with-glibc2.31
[codecarbon INFO @ 13:46:19]   Python version: 3.10.14
[codecarbon INFO @ 13:46:19]   CodeCarbon version: 2.7.1
[codecarbon INFO @ 13:46:19]   Available RAM : 50.999 GB
[codecarbon INFO @ 13:46:19]   CPU count: 8
[codecarbon INFO @ 13:46:19]   CPU model: Intel(R) Xeon(R) CPU @ 2.30GHz
[codecarbon INFO @ 13:46:19]   GPU count: 1
[codecarbon INFO @ 13:46:20]   GPU model: 1 x Tesla P100-PCIE-16GB
[codecarbon INFO @ 13:46:20] Saving emissions data to file /home/jupyter/carbon genAI/emissions.csv
[codecarbon WARNING @ 13:46:20] Failed to retrieve gpu total energy consumption
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/codecarbon/core/gpu.py", line 116, in _get_total_energy_consumption
    return pynvml.nvmlDeviceGetTotalEnergyConsumption(self.handle)
  File "/opt/conda/lib/python3.10/site-packages/pynvml/nvml.py", line 2411, in nvmlDeviceGetTotalEnergyConsumption
    _nvmlCheckReturn(ret)
  File "/opt/conda/lib/python3.10/site-packages/pynvml/nvml.py", line 833, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.nvml.NVMLError_NotSupported: Not Supported
/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py:1258: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
[codecarbon INFO @ 13:46:20] Energy consumed for RAM : 0.000004 kWh. RAM Power : 19.12470817565918 W
[codecarbon WARNING @ 13:46:20] Failed to retrieve gpu total energy consumption
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/codecarbon/core/gpu.py", line 116, in _get_total_energy_consumption
    return pynvml.nvmlDeviceGetTotalEnergyConsumption(self.handle)
  File "/opt/conda/lib/python3.10/site-packages/pynvml/nvml.py", line 2411, in nvmlDeviceGetTotalEnergyConsumption
    _nvmlCheckReturn(ret)
  File "/opt/conda/lib/python3.10/site-packages/pynvml/nvml.py", line 833, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.nvml.NVMLError_NotSupported: Not Supported
[codecarbon INFO @ 13:46:20] Energy consumed for all GPUs : 0.000000 kWh. Total GPU Power : 0.0 W
[codecarbon INFO @ 13:46:20] Energy consumed for all CPUs : 0.000008 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 13:46:20] 0.000012 kWh of electricity used since the beginning.

Could you help me please?

@antgioia

I have the same problem as you, but I haven't found a solution.

@RomainWarlop
Author

I switched from the P100 GPU to a T4 GPU and now it's working, so I think energy tracking for the P100 is not supported by the pynvml library.
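
A quick way to test that hypothesis is to query the energy counter directly through pynvml. A minimal sketch, assuming pynvml is installed and the GPU is device index 0:

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
print("GPU:", pynvml.nvmlDeviceGetName(handle))
try:
    # cumulative energy in millijoules since the driver was last loaded;
    # this is the call codecarbon makes and the one failing on the P100
    energy_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
    print("Total energy counter supported:", energy_mj, "mJ")
except pynvml.NVMLError as err:
    print("Total energy counter not available:", err)
pynvml.nvmlShutdown()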

@benoit-cty
Contributor

Yes, it seems that the P100 drivers support pynvml for pynvml.nvmlDeviceGetName but not for nvmlDeviceGetTotalEnergyConsumption.

Maybe you could call pynvml.nvmlSystemGetDriverVersion().decode() to check whether you have the latest drivers available.
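
For reference, a minimal sketch of that check, assuming a single GPU at index 0 (older pynvml versions return bytes, newer ones return str, hence the optional decode):

import pynvml

pynvml.nvmlInit()
version = pynvml.nvmlSystemGetDriverVersion()
if isinstance(version, bytes):  # older pynvml returns bytes, newer returns str
    version = version.decode()
print("Driver version:", version)

handle = pynvml.nvmlDeviceGetHandleByIndex(0)
# instantaneous power draw (milliwatts); exposed on more GPUs than the cumulative energy counter
print("Current power draw:", pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0, "W")
pynvml.nvmlShutdown()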
