Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kernel attributes #360

Open
wants to merge 28 commits into
base: main
Choose a base branch
from

Conversation

ksimpson-work
Copy link
Contributor

@ksimpson-work ksimpson-work commented Jan 6, 2025

Add getters and setters for the kernel attributes.

close #205

Copy link
Contributor

copy-pr-bot bot commented Jan 6, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ksimpson-work ksimpson-work self-assigned this Jan 6, 2025
@ksimpson-work ksimpson-work added enhancement Any code-related improvements P0 High priority - Must do! cuda.core Everything related to the cuda.core module labels Jan 6, 2025
@ksimpson-work
Copy link
Contributor Author

/ok to test

@ksimpson-work
Copy link
Contributor Author

/ok to test

@leofang leofang added this to the cuda.core beta 3 milestone Jan 7, 2025
@leofang leofang added feature New feature or request and removed enhancement Any code-related improvements labels Jan 8, 2025
@ksimpson-work
Copy link
Contributor Author

/ok to test

@ksimpson-work
Copy link
Contributor Author

/ok to test

Copy link

@ksimpson-work
Copy link
Contributor Author

/ok to test

@ksimpson-work
Copy link
Contributor Author

I have a design question for any reviewers to weigh in on. There is another change in the works to add device properties to the Device class, and the way I've implemented that, is to have device_instance.properties -> DeviceProperties, where DeviceProperties lazy queries the properties and exposes them. In short you would get a property like such:

device = Device()
device.properties.property_a

The reason I put all of the properties in the subclass, is because there are a lot of them, and adding them straight to device would cause device to be very bloated.

The question is whether you think I should do the same thing here. Prior to making the deivce property change, I thought this was the best way to implement it, but I am now leaning towards sticking the attributes in a subclass so they would be accessed like:

kernel.attributes.attribute_a = True
variable = kernel.attributes.attribute_b

One considerable difference is that all the device properties are read only, while some of the kernel attributes are read/write.

@ksimpson-work ksimpson-work marked this pull request as ready for review January 21, 2025 21:26
@ksimpson-work ksimpson-work requested a review from leofang January 21, 2025 21:26
@leofang
Copy link
Member

leofang commented Jan 21, 2025

The question is whether you think I should do the same thing here. Prior to making the deivce property change, I thought this was the best way to implement it, but I am now leaning towards sticking the attributes in a subclass so they would be accessed

I really think this is the way to go! We definitely do not want to bloat the kernel/device instance when hitting tab.

@ksimpson-work
Copy link
Contributor Author

ok cool, I agree. Change made

@ksimpson-work
Copy link
Contributor Author

/ok to test

@ksimpson-work
Copy link
Contributor Author

updated the review to remove the setters on read/write properties in line with the discussion about deadlock between properties and launch config. + a couple formatting improvements to the docs

@ksimpson-work
Copy link
Contributor Author

/ok to test

This attribute is read-only."""
return handle_return(
driver.cuKernelGetAttribute(
driver.CUfunction_attribute.CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK, self._handle, None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Here and below, why is device None?

Comment on lines +66 to +74
@property
def shared_size_bytes(self) -> int:
"""int : The size in bytes of statically-allocated shared memory required by this function.
This attribute is read-only."""
return handle_return(
driver.cuKernelGetAttribute(
driver.CUfunction_attribute.CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES, self._handle, None
)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you do a little experiment for me: Compare the average timing of getting an attribute via

  1. the current implementation, and
  2. a Python layer cache, something like (can be improved)
    @property
    def shared_size_bytes(self) -> int:
        out = self.cache.get("CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES")
        if out is None:
            out = handle_return(
                driver.cuKernelGetAttribute(
                    driver.CUfunction_attribute.CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES, self._handle, None
                )
            self.cache["CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES"] = out
        return out

The idea is to check whether the C API call overhead is larger or smaller than the Python overhead.

Comment on lines +22 to +30
@pytest.fixture(scope="module")
def cuda_version():
# WAR this is a workaround for the fact that checking the runtime version using the cuGetDriverVersion
# doesnt actually return the driver version buyt rather the latest cuda whcih is supported by the installed drive3r.

version = handle_return(runtime.cudaRuntimeGetVersion())
major_version = version // 1000
minor_version = (version % 1000) // 10
return major_version, minor_version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct. In cuda.bindings, this refers to the CUDA Runtime version which is static. (FWIW we also have cudart.getLocalRuntimeVersion(). But it's not usable for here either, see below.)

Comment on lines +92 to +93
if cuda_version[0] < 12:
pytest.skip("CUDA version is less than 12, and doesn't support kernel attribute access")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not working because Kernel encapsulates

  • CUkernel starting CUDA driver/bindings 12+, which needs cuKernelGetAttribute for attribute getter
  • CUfunction for CUDA 11, which needs cuFuncGetAttribute (notice the function signature is different)

I believe we can just reuse this check in the tests:

self._backend_version = "new" if (_py_major_ver >= 12 and _driver_ver >= 12000) else "old"

(I feel we've done similar things for either Program or Linker? Something along the same line.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda.core Everything related to the cuda.core module feature New feature or request P0 High priority - Must do!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Kernel attribute getter/setter
2 participants