Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent ICD from loading outdated amdocl binary 1.Keep track of libra… #181

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

lttamd
Copy link

@lttamd lttamd commented Jul 12, 2022

Prevent ICD from loading outdated amdocl binary
1.Keep track of library name in KHRicdVendorRec
2.Check if library with the same name has already been loaded before adding it in ICD list

There are going to be two sets of amdocl in registry for ICD to load if we install a new driver on top of a previously installed old driver.
It will cause subsequent undefined behaviors such as duplicate devices.

This is because AMD driver installation switched from c-inf to u-inf which no longer uses SOFTWARE\Khronos\OpenCL\Vendors.
If a new diver is installed by either user or Windows update without manual cleanup, there will be two libraries to load according to registry.
This is not fixable from driver or installation level.

(The other vendors do not have this issue only because they haven't switched to u-inf installation yet as far as I know.)

@CLAassistant
Copy link

CLAassistant commented Jul 12, 2022

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

@Kerilk Kerilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very impactful change, and I think it will need to be discussed. Especially, I don't know how many users could be loading several version of the same OpenCL implementation for legitimate uses.

Also I am not a Windows expert, so I can't really comment on the specifics. But if other vendors will encounter the same issue, they should be part of this discussion.

loader/icd.c Outdated Show resolved Hide resolved
loader/linux/icd_linux.c Outdated Show resolved Hide resolved
loader/windows/icd_windows.c Outdated Show resolved Hide resolved
…ry name in KHRicdVendorRec 2.Check if library with the same name has already been loaded before adding it in ICD list
@lttamd lttamd force-pushed the outdated-lib-check branch from 2420c66 to 1bea2c2 Compare August 2, 2022 14:07
@lttamd
Copy link
Author

lttamd commented Aug 2, 2022

This is a very impactful change, and I think it will need to be discussed. Especially, I don't know how many users could be loading several version of the same OpenCL implementation for legitimate uses.

Also I am not a Windows expert, so I can't really comment on the specifics. But if other vendors will encounter the same issue, they should be part of this discussion.

As far as I understand, it is wrong to load multiple versions of the same OpenCL implementation from the same vendor. (unlike loading OpenCL implementations from different vendors)
At least for AMD OpenCL, loading multiple amdocl binaries will definitely causes undefined behaviors.

Also, even before this bug showing up as result of new u-inf installation using the new registry location, I don't think ICD ever allowed "multiple versions of the same OpenCL implementation" to be loaded at the same time.
See the code from line 82 to line 90 in icd.c, ICD has been doing check to "ensure that we haven't already loaded this vendor"(quoted from the comment).
But it only checks library object pointer, so it won't really filter out all duplicates since amdocl now can come from different pointers(registry locations) because of the new u-inf installation.

Therefore, I do not think this is an impactful change, but a merely a bug fix following existing ICD loading rule.
For AMD OpenCL, we have this change as workaround for many releases without causing any issue.
For other vendors, it should have no impact.

Comment on lines +85 to +91
for (c = libraryName; *c; ++c)
{
if (*c == DIRECTORY_SYMBOL)
{
libName = c + 1;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a non-issue, but so we go in with eyes open: As I understand this change strips all path information from the library name and only loads the first library with a given name. While this will prevent unintentional scenarios where the same library name would be loaded multiple times, it also means that library names must be unique, even if they are in different directories, or the ICD loader will filter them out. Is this the behavior we want?

I'm not aware of any implementors doing this, but a scenario I've been thinking of where two libraries could legitimately have the same name is if one device generation is supported by one software package and installed into one directory and another device generation is supported by a different package and installed into a different directory, but both device generations have the same library file name.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is the behavior we want.
See my previous comment. ICD's rule on duplicate libraries has always been like this.

There is no such scenario at least for us. (Support for multiple device generations is not implemented in this way.)
Unlike loading OpenCL libraries form different vendors, loading multiple libraries from the same vendor is undefined behavior.

@lttamd lttamd requested review from Kerilk and bashbaug and removed request for Kerilk and bashbaug September 30, 2022 12:11
@Kerilk Kerilk added the agenda Needs Working Group Discussion label Oct 4, 2022
@lttamd
Copy link
Author

lttamd commented Aug 12, 2024

Do we have any update on this?
We can discuss this change in working group meeting.
I am available during pacific time working hours.

@bashbaug
Copy link
Contributor

bashbaug commented Dec 7, 2024

As per my action item from the December 3rd teleconference, here were a few prior teleconferences where we discussed this issue:

  • There was a very brief discussion on August 13th.
  • There was an in-depth discussion the following week, on August 20th, but without any clear next steps or conclusions.
  • I don't see any subsequent discussion in teleconferences or in the October f2f.

For next steps, perhaps we can converge on solution requirements? Briefly, in prior discussions proposed requirements were:

  1. Desire to order platforms – not just for single vendor
  2. Exclusivity
  3. Multiple platforms exposing the same device

Is this correct? If so, are some of these requirements more important than others and hence require a short-term solution, whereas others could be solved longer-term?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agenda Needs Working Group Discussion
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants