Kernel attributes #360
base: main
Conversation
/ok to test
I have a design question for any reviewers to weigh in on. There is another change in the works to add device properties to the `Device` class. The way I've implemented that is to have `device_instance.properties -> DeviceProperties`, where `DeviceProperties` lazily queries the properties and exposes them. In short, you would get a property like so:

```python
device = Device()
device.properties.property_a
```

The reason I put all of the properties in a subclass is that there are a lot of them, and adding them straight to `Device` would make it very bloated. The question is whether you think I should do the same thing here. Prior to making the device property change, I thought the current approach was the best way to implement it, but I am now leaning towards putting the attributes in a subclass so they would be accessed like:

```python
kernel.attributes.attribute_a = True
variable = kernel.attributes.attribute_b
```

One considerable difference is that all the device properties are read-only, while some of the kernel attributes are read/write. |
I really think this is the way to go! We definitely do not want to bloat the kernel/device instance when hitting tab. |
ok cool, I agree. Change made |
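A minimal sketch of the agreed pattern, assuming a hypothetical `KernelAttributes` helper; the `query` callable and the fixed value `1024` are stand-ins for the real driver attribute getter, not the PR's actual implementation:

```python
class KernelAttributes:
    """Lazily exposes kernel attributes without bloating the Kernel namespace."""

    def __init__(self, query):
        self._query = query  # stand-in for the driver attribute getter
        self._cache = {}

    @property
    def max_threads_per_block(self) -> int:
        # Query the backend once, then serve from the cache on later accesses.
        if "max_threads_per_block" not in self._cache:
            self._cache["max_threads_per_block"] = self._query("MAX_THREADS_PER_BLOCK")
        return self._cache["max_threads_per_block"]


class Kernel:
    @property
    def attributes(self) -> KernelAttributes:
        # Create the helper on first access so tab completion on Kernel stays clean.
        if not hasattr(self, "_attributes"):
            self._attributes = KernelAttributes(lambda name: 1024)  # fake value
        return self._attributes


k = Kernel()
print(k.attributes.max_threads_per_block)  # 1024
```

The helper keeps the attribute namespace off the `Kernel` instance itself, which was the main concern raised above.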
/ok to test
Updated the review to remove the setters on read/write properties, in line with the discussion about deadlock between properties and launch config, plus a couple of formatting improvements to the docs. |
/ok to test
```python
        This attribute is read-only."""
        return handle_return(
            driver.cuKernelGetAttribute(
                driver.CUfunction_attribute.CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK, self._handle, None
            )
        )
```
Q: Here and below, why is `device` `None`?
```python
    @property
    def shared_size_bytes(self) -> int:
        """int : The size in bytes of statically-allocated shared memory required by this function.

        This attribute is read-only."""
        return handle_return(
            driver.cuKernelGetAttribute(
                driver.CUfunction_attribute.CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES, self._handle, None
            )
        )
```
Could you do a little experiment for me: compare the average timing of getting an attribute via

- the current implementation, and
- a Python-layer cache, something like (can be improved):

```python
@property
def shared_size_bytes(self) -> int:
    out = self.cache.get("CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES")
    if out is None:
        out = handle_return(
            driver.cuKernelGetAttribute(
                driver.CUfunction_attribute.CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES, self._handle, None
            )
        )
        self.cache["CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES"] = out
    return out
```

The idea is to check whether the C API call overhead is larger or smaller than the Python overhead.
```python
@pytest.fixture(scope="module")
def cuda_version():
    # WAR: this is a workaround for the fact that checking the version using
    # cuDriverGetVersion doesn't actually return the driver version, but rather
    # the latest CUDA version supported by the installed driver.
    version = handle_return(runtime.cudaRuntimeGetVersion())
    major_version = version // 1000
    minor_version = (version % 1000) // 10
    return major_version, minor_version
```
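The packed-integer decoding used by the fixture can be checked in isolation. A small sketch, where the sample values are illustrative rather than queried from any runtime:

```python
def decode_cuda_version(version: int) -> tuple[int, int]:
    # CUDA packs the version as major * 1000 + minor * 10, e.g. 12040 -> (12, 4).
    major = version // 1000
    minor = (version % 1000) // 10
    return major, minor

print(decode_cuda_version(12040))  # (12, 4)
print(decode_cuda_version(11080))  # (11, 8)
```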
This is not correct. In `cuda.bindings`, this refers to the CUDA Runtime version, which is static. (FWIW we also have `cudart.getLocalRuntimeVersion()`, but it's not usable here either; see below.)
```python
if cuda_version[0] < 12:
    pytest.skip("CUDA version is less than 12, and doesn't support kernel attribute access")
```
This is not working because `Kernel` encapsulates

- `CUkernel` starting with CUDA driver/bindings 12+, which needs `cuKernelGetAttribute` for the attribute getter, or
- `CUfunction` for CUDA 11, which needs `cuFuncGetAttribute` (notice the function signature is different).

I believe we can just reuse this check in the tests:

```python
self._backend_version = "new" if (_py_major_ver >= 12 and _driver_ver >= 12000) else "old"
```
(I feel we've done similar things for either `Program` or `Linker`? Something along the same lines.)
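A hedged sketch of the suggested dispatch, assuming the two stand-in getters below in place of the real `cuKernelGetAttribute` (CUDA 12+) and `cuFuncGetAttribute` (CUDA 11) calls, whose signatures differ in the actual bindings:

```python
def new_backend_get(attrib, handle, device):
    # Stand-in for cuKernelGetAttribute: takes a device argument.
    return ("cuKernelGetAttribute", attrib, device)

def old_backend_get(attrib, handle):
    # Stand-in for cuFuncGetAttribute: no device argument.
    return ("cuFuncGetAttribute", attrib)


class Kernel:
    def __init__(self, py_major_ver, driver_ver, handle=None):
        self._handle = handle
        # Same check as suggested above for picking the backend.
        self._backend_version = "new" if (py_major_ver >= 12 and driver_ver >= 12000) else "old"

    def get_attribute(self, attrib, device=None):
        # Route to the backend matching the encapsulated handle type.
        if self._backend_version == "new":
            return new_backend_get(attrib, self._handle, device)
        return old_backend_get(attrib, self._handle)


print(Kernel(12, 12040).get_attribute("MAX_THREADS_PER_BLOCK")[0])  # cuKernelGetAttribute
print(Kernel(11, 11080).get_attribute("MAX_THREADS_PER_BLOCK")[0])  # cuFuncGetAttribute
```

The same branch would let the tests choose which attribute path to exercise instead of skipping on CUDA 11.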
Adds getters and setters for the kernel attributes.
Closes #205