PCI (GPU) passthrough hardening: option ROM edition #1087

Open
marmarek opened this issue Oct 12, 2024 · 19 comments
Labels
enhancement New feature or request

Comments

@marmarek

The problem you're addressing (if any)

Using GPU (or any PCI device for that matter) passthrough with a less trusted VM may allow it to reflash firmware of such device. Just after reboot (during firmware and OS startup) such device is not isolated in a VM and may try to compromise the whole host. This can be done in at least two ways:

  • doing DMA into arbitrary addresses - this should be covered by the early boot DMA protection already
  • providing malicious option ROM for firmware to execute

Theoretically, reflashing malicious firmware should not be possible due to (at least) the signature check done by the GPU firmware update mechanism, but history shows this check sometimes turns out to be buggy or ineffective, and in some cases does not exist at all.

Describe the solution you'd like

I see two solutions:

  1. An option to disable loading option ROM, either globally, or per-device. This of course is acceptable only if OS driver (within a VM) can work correctly if option ROM wasn't loaded. This also kinda assumes the dGPU is not the only GPU in the system (which is true in setups where dGPU is used for passthrough).
  2. An option to enforce UEFI SecureBoot signature on option ROM independently of enabling SecureBoot for loading OS. This is less robust, as it still may allow attacks where properly signed option ROM is used (maybe downgraded to an earlier version by malicious actor?) but some configuration data is changed to exploit it. But on the other hand, it's probably more compatible (especially if executing option ROM is necessary for later using the dGPU in a VM, or if it's the only GPU in the system).

In either case, there needs to be a mechanism for the OS to verify if the mechanism was enabled to inform the user if passthrough is safe for a given device. And similarly, OS needs to be informed if early boot DMA was enabled. Maybe there is some ACPI table that can be used to pass this info to the OS? Or maybe OS can inspect coreboot config (cbfs?) to check if the option is enabled?

Where is the value to a user, and who might that user be?

Use GPU passthrough with reduced risk of compromising the whole system.

Describe alternatives you've considered

An alternative would be to reliably block the VM from reflashing the dGPU firmware, and to ensure that device reset on reboot works reliably too. In other words: ensure that all VM-controlled state is discarded on reboot.

I think this solution would require changes to the board design, and thus be significantly harder to make in practice.

Additional context

We consider making a feature like this mandatory for allowing Qubes OS certification of systems with dGPU. Without such feature, we don't consider dGPU passthrough safe enough to certify such system, and thus it doesn't make much sense for users to buy systems like this if dGPU would be allowed only in dom0, as it would be mostly wasted.

This is especially relevant for V5x models with nvidia.

@marmarek marmarek added the enhancement New feature or request label Oct 12, 2024
@zirblazer

  1. An option to disable loading option ROM, either globally, or per-device. This of course is acceptable only if OS driver (within a VM) can work correctly if option ROM wasn't loaded. This also kinda assumes the dGPU is not the only GPU in the system (which is true in setups where dGPU is used for passthrough).

The MSI boards already have a global Option ROM disable, but I couldn't convince them to offer per-slot granularity. It is either fully on, fully off, or GPU Option ROMs only.

  2. An option to enforce UEFI SecureBoot signature on option ROM independently of enabling SecureBoot for loading OS. This is less robust, as it still may allow attacks where properly signed option ROM is used (maybe downgraded to an earlier version by malicious actor?) but some configuration data is changed to exploit it. But on the other hand, it's probably more compatible (especially if executing option ROM is necessary for later using the dGPU in a VM, or if it's the only GPU in the system).

You want to enable Secure Boot, then tell it to skip checking boot loaders so that it validates Option ROMs only. So, Point 8 here: #929
Never expected someone from Qubes to ask for something to be made less secure, heh.

An alternative would be to reliably block the VM from reflashing the dGPU firmware, and to ensure that device reset on reboot works reliably too. In other words: ensure that all VM-controlled state is discarded on reboot.

And how can you possibly do that? As far as I know, VBIOS flashing works by using vendor tools that tell the GPU to use its internal I2C/SPI/whatever Controller to flash the ROM. If you passed the card through, these vendor tools would work as in a bare metal environment, and I don't see how you are going to block that.
What you can do is sideload the VBIOS of the card on VM launch (not sure about Xen, but on standalone QEMU it is possible and I have been sporadically using it since 2015). So you dump the VBIOS and tell QEMU to load it every time the VM is launched.
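
As a rough illustration of the sideloading idea, the sketch below builds the QEMU `-device vfio-pci` argument that points the guest at a previously dumped VBIOS through the `romfile` property. The helper name, PCI address, and file path are placeholders made up for this example. Note that this only changes what the guest sees; it does nothing about the flash on the card itself.

```python
def vfio_gpu_args(host_bdf: str, vbios_path: str) -> list[str]:
    """Build QEMU arguments for GPU passthrough with a sideloaded VBIOS.

    host_bdf and vbios_path are placeholders supplied by the caller.
    """
    # romfile= makes QEMU expose this file as the device's expansion ROM
    # instead of reading the ROM from the physical card.
    return ["-device", f"vfio-pci,host={host_bdf},romfile={vbios_path}"]
```

For example, `vfio_gpu_args("0000:01:00.0", "/var/lib/vbios/gpu.rom")` yields the two arguments to append to an existing QEMU command line.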

I think this solution would require changes to the board design, and thus be significantly harder to make in practice.

Sure, you can try asking one of the third-party vendors to add a Flash Write disable jumper, which I recall having seen on a few historical PC motherboards when Flash ROM was first introduced. But unless you're a high bidder that can order a few thousand custom cards, it is not gonna happen, so I won't bother with it.

@marmarek
Author

The MSI boards already have a global Option ROM disable, but I couldn't convince them to offer per-slot granularity. It is either fully on, fully off, or GPU Option ROMs only.

This may be enough, if there are no other devices needing Option ROM. In this particular case we are talking about a laptop, so the customization is limited (yes, I know you can still attach almost any PCIe device, but it's much less common in practice).

You want to enable Secure Boot, then tell it to skip checking boot loaders so that it validates Option ROMs only. So, Point 8 here: #929

Yes, exactly.

Never expected someone from Qubes to ask for something to be made less secure, heh.

Well, I want to enable it for Option ROM even if you need it disabled for the OS.

And how can you possibly do that? As far as I know, VBIOS flashing works by using vendor tools that tell the GPU to use its internal I2C/SPI/whatever Controller to flash the ROM. If you passed the card through, these vendor tools would work as in a bare metal environment, and I don't see how you are going to block that.

Yes, exactly, that's the problem.

What you can do is sideload the VBIOS of the card on VM launch (not sure about Xen, but on standalone QEMU it is possible and I have been sporadically using it since 2015). So you dump the VBIOS and tell QEMU to load it every time the VM is launched.

That doesn't help much if internal flash can still be modified, even if not loaded by that VM. Option ROM could still be changed and will be used by firmware on next reboot (unless you do similar trick in firmware to side-load Option ROM?).

Sure, you can try asking one of the third-party vendors to add a Flash Write disable jumper, which I recall having seen on a few historical PC motherboards when Flash ROM was first introduced. But unless you're a high bidder that can order a few thousand custom cards, it is not gonna happen, so I won't bother with it.

Yes, it would be technically better solution (as it's more comprehensive than just Option ROM), but not feasible at this scale.

@zirblazer

And how can you possibly do that? As far as I know, VBIOS flashing works by using vendor tools that tell the GPU to use its internal I2C/SPI/whatever Controller to flash the ROM. If you passed the card through, these vendor tools would work as in a bare metal environment, and I don't see how you are going to block that.

Yes, exactly, that's the problem.

I don't see that as a problem you can actually fix.

What you can do is sideload the VBIOS of the card on VM launch (not sure about Xen, but on standalone QEMU it is possible and I have been sporadically using it since 2015). So you dump the VBIOS and tell QEMU to load it every time the VM is launched.

That doesn't help much if internal flash can still be modified, even if not loaded by that VM. Option ROM could still be changed and will be used by firmware on next reboot (unless you do similar trick in firmware to side-load Option ROM?).

Not if you disable loading Option ROMs. It may also be possible to hash the Option ROM (Point 7 of my writeup) so that you know it didn't change. And yeah, putting an Option ROM in Firmware and loading it for that device instead of its own one QEMU style should also be possible.

@marmarek
Author

What I care about is that a reflashed GPU (which, as we already established, is hard to prevent in the first place) cannot attack the host. There are many ideas on how to achieve this: in the issue description, in the comments, and in the other issue.

@pietrushnic

doing DMA into arbitrary addresses - this should be covered by the early boot DMA protection already

The only problem I see here is that it is not validated. We would need grants to enable the DMA attacking tool in the automation process. We have capable hardware. That could confirm in every release that DMA protection is correctly applied.

providing malicious option ROM for firmware to execute

OptionROMs are typically signed by Microsoft Option ROM UEFI CA 2023 or older one. We can have such in our DB, the problem is if we trust MSFT. OTOH at "clean" state user could populate the hash of OptionROM into DB as it is done by sbctl users to remove MSFT from the chain of trust.

Theoretically, reflashing malicious firmware should not be possible due to (at least) the signature check done by the GPU firmware update mechanism, but history shows this check sometimes turns out to be buggy or ineffective, and in some cases does not exist at all.

This is an exciting part. Do you have any examples of such issues? Because of that, OCP requested a standard update mechanism for GPU firmware, and a document was created.

An option to disable loading option ROM, either globally, or per-device.

This was probably already requested and at least partially implemented. Please check #139

This of course is acceptable only if OS driver (within a VM) can work correctly if option ROM wasn't loaded.

And for complex modern devices, that can be the core issue.

This also kinda assumes the dGPU is not the only GPU in the system (which is true in setups where dGPU is used for passthrough).

To prevent soft-bricking, one would imagine that boot firmware would detect that fact and warn the user or even not allow the user to self-soft-brick.

An option to enforce UEFI SecureBoot signature on option ROM independently of enabling SecureBoot for loading OS.

This and many other improvements could be employed in UEFI Secure Boot. I already have a ton of requirements in that space. I will explore our options as part of my training campaign in 2025. It may not be hard to implement that at least partially.

In either case, there needs to be a mechanism for the OS to verify if the mechanism was enabled to inform the user if passthrough is safe for a given device.

TPM measurement + event log? There are also UEFI variables dedicated to exposing firmware capabilities to OS like OsIndicationsSupported.
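
For the UEFI variable route, a minimal sketch of how an OS could read `OsIndicationsSupported` on Linux follows. It assumes the usual efivarfs layout, where each file holds a 4-byte attributes field followed by the variable data, and the EFI global variable GUID. Which bit (if any) would signal option ROM verification is not defined anywhere in this thread, so the sketch only returns the raw bitmask.

```python
import struct
from pathlib import Path

# EFI global variable GUID; efivarfs names files "<Name>-<GUID>".
EFI_GLOBAL_GUID = "8be4df61-93ca-11d2-aa0d-00e098032b8c"
OS_INDICATIONS_SUPPORTED = Path(
    f"/sys/firmware/efi/efivars/OsIndicationsSupported-{EFI_GLOBAL_GUID}"
)


def parse_os_indications(raw: bytes) -> int:
    """Return the 64-bit bitmask from the bytes of an efivarfs read."""
    if len(raw) < 12:
        raise ValueError("expected 4 attribute bytes plus an 8-byte value")
    # Little-endian: UINT32 attributes, then the UINT64 variable value.
    _attributes, value = struct.unpack_from("<IQ", raw)
    return value
```

On a UEFI system, `parse_os_indications(OS_INDICATIONS_SUPPORTED.read_bytes())` would return the capability bitmask the firmware advertises.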

Maybe there is some ACPI table that can be used to pass this info to the OS?

I guess we should employ guidance from here and expose things in ACPI DMAR table, some information already should be there, but the point is there is no validation of that.
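
As a sketch of what inspecting the DMAR table from the OS could look like: Linux exposes the raw table at `/sys/firmware/acpi/tables/DMAR` (normally readable by root only), and the VT-d specification places a one-byte Flags field right after the 36-byte ACPI header and the Host Address Width byte, with bit 2 being `DMA_CTRL_PLATFORM_OPT_IN_FLAG`. The function name is made up for this example. As noted above, the flag only states what the platform requests; it does not prove that early boot DMA protection was actually in effect.

```python
from pathlib import Path

DMAR_TABLE = Path("/sys/firmware/acpi/tables/DMAR")

ACPI_HEADER_LEN = 36                  # standard ACPI table header
FLAGS_OFFSET = ACPI_HEADER_LEN + 1    # skip the Host Address Width byte
DMA_CTRL_PLATFORM_OPT_IN = 1 << 2     # bit 2 of the DMAR Flags field


def dmar_opt_in(table: bytes) -> bool:
    """Report whether the DMAR table sets the DMA protection opt-in flag."""
    if len(table) <= FLAGS_OFFSET or table[:4] != b"DMAR":
        raise ValueError("not a DMAR table")
    return bool(table[FLAGS_OFFSET] & DMA_CTRL_PLATFORM_OPT_IN)
```

Usage would be `dmar_opt_in(DMAR_TABLE.read_bytes())` on a system with an Intel IOMMU.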

Or maybe OS can inspect coreboot config (cbfs?) to check if the option is enabled?

There are better directions than this. Relying on some custom coreboot files exposed will create technical debt, and appropriate mechanisms already exist in the UEFI world. We should ask what to do with non-UEFI builds. Still, I think we should get back to the question of what the standard behavior OSes use for such capability is, and standard most likely will mean what Windows uses for that. Also, checking the Linux approach would be useful.

@zirblazer It is hard to read your write-up. It should be split, TBH. Every point is separate (it could be linked for better context).

@marmarek I don't think it is possible to make boot firmware responsible for controlling peripheral updates when those peripherals have their closed-source verification mechanism. We cannot handle all possible mocking of buses in the system without affecting correct operation. Unless we reach SPDM and device authentication for the whole system, the feature is unlikely to be implemented. Getting updates only from reasonably trustworthy sources with known paths for escalation, e.g., LVFS, can be done, but that does not prevent malicious actors from gaining privileges in the system and abusing those to deliver the wrong firmware to peripherals if those allow unauthenticated updates. That is on the peripheral vendor to provide the correct update mechanism or on the open-source firmware community to deliver support for a transparent mechanism. The best thing we can do is to look for best practices regarding peripheral firmware updates, test that on given hardware, and provide advice on what hardware is recommended now. Even together, we do not have enough resources to solve that problem.

P.S. Maybe this is good discussion for December DUG?

@mkopec
Member

mkopec commented Jan 29, 2025

And yeah, putting an Option ROM in Firmware and loading it for that device instead of its own one QEMU style should also be possible.

Sure, but then you have issues with blob redistribution 😫

To prevent softbricks but still ensuring oprom integrity, I was thinking about something along these lines:

  • platform boots
  • for each device with oprom:
    • make hash of oprom
    • if oprom hash found in dbx, reject loading
    • else if oprom hash found in db, allow loading
    • else if device is gpu:
      • defer loading
    • else ask whether to add hash to db or dbx
  • If no gop instances were instantiated:
    • Load deferred oproms
    • Display a big fat warning that loaded oprom is not known, do you want to add it to db or dbx
      • if dbx: ask are you really really sure, you will lose graphics, etc etc
    • Reboot
  • else:
    • ask whether to add deferred oprom hash to db or dbx

and these db and dbx would need to be independent of secureboot, ideally with an option to use secureboot or these separate oprom db / dbx
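
The flow above can be sketched in Python to make the branching explicit. This is only a model, not firmware code: the `Device`, `Action`, `classify`, and `plan_boot` names are invented for illustration, SHA-256 over the whole ROM stands in for whatever hash the firmware would really use, and the rule for when a GOP instance exists (native graphics init, or a GPU whose oprom was found in db) is an assumption.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    REJECT = "reject"   # hash found in dbx
    LOAD = "load"       # hash found in db
    DEFER = "defer"     # unknown GPU oprom: decide after the loop
    ASK = "ask"         # unknown non-GPU oprom: prompt for db or dbx


@dataclass(frozen=True)
class Device:
    name: str
    oprom: bytes
    is_gpu: bool


def classify(dev: Device, db: set[str], dbx: set[str]) -> Action:
    digest = hashlib.sha256(dev.oprom).hexdigest()
    if digest in dbx:
        return Action.REJECT
    if digest in db:
        return Action.LOAD
    return Action.DEFER if dev.is_gpu else Action.ASK


def plan_boot(devices, db, dbx, native_gop=False):
    """Return (per-device actions, whether deferred oproms must be loaded)."""
    actions = {d.name: classify(d, db, dbx) for d in devices}
    # Assumption: a GOP instance exists if firmware has native graphics
    # init or if some GPU's oprom was found in db and loaded.
    gop = native_gop or any(
        d.is_gpu and actions[d.name] is Action.LOAD for d in devices
    )
    has_deferred = Action.DEFER in actions.values()
    # Without any GOP, deferred oproms are loaded so the warning can be
    # shown at all; with a GOP, the prompt runs on the already available
    # console and the deferred oproms are not executed.
    load_deferred = has_deferred and not gop
    return actions, load_deferred
```

The second return value is the risky branch: it is true only when an unknown GPU oprom is the sole way to get a display.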

@marmarek
Author

marmarek commented Feb 1, 2025

Generally this looks like a good plan. I have just one concern:

  • Display a big fat warning that loaded oprom is not known, do you want to add it to db or dbx

If that optionrom was malicious, since it got loaded it could modify the firmware to avoid the warning. Or display something else here, including a different hash (than the one actually computed and later added to db/dbx).
But I don't have a better idea, and since in the NovaCustom use case there is always (trusted) internal GPU too, that shouldn't be an issue in practice, as you won't hit the "no gop" case.

@marmarek
Author

marmarek commented Feb 1, 2025

And one more thing: I'd like to see from the OS level if the current optionrom for a given device is included in db/dbx. This way, I can see if the trusted hash was recorded before connecting the dGPU to an untrusted VM for the first time (and if not - ask the user to reboot first, to record trusted hash before potentially having it reflashed by untrusted VM).
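
A minimal sketch of what such an OS-level check might look like on Linux follows. It reads the expansion ROM through the sysfs `rom` file (which has to be enabled by writing `1` first) and compares a digest against two sets of known hashes. Several things here are assumptions: how the OS would obtain the db/dbx contents is not specified in this thread, so plain sets of hex digests stand in for them, and a raw SHA-256 over the ROM is used even though Secure Boot databases normally hold Authenticode hashes of the PE images inside the ROM. The ROM content is also served by the device itself, so a compromised device could in principle return different data at different times.

```python
import hashlib
from pathlib import Path


def read_option_rom(bdf: str) -> bytes:
    """Read a device's expansion ROM through Linux sysfs (needs root)."""
    rom = Path(f"/sys/bus/pci/devices/{bdf}/rom")
    rom.write_text("1")          # enable reads of the ROM file
    try:
        return rom.read_bytes()
    finally:
        rom.write_text("0")      # disable again


def rom_status(rom_image: bytes, db: set[str], dbx: set[str]) -> str:
    """Classify a ROM image as 'revoked', 'trusted', or 'unknown'."""
    digest = hashlib.sha256(rom_image).hexdigest()
    if digest in dbx:
        return "revoked"
    if digest in db:
        return "trusted"
    return "unknown"
```

An "unknown" result before the first assignment to an untrusted VM would be the cue to ask the user to reboot and record the hash first.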

@miczyg1
Contributor

miczyg1 commented Feb 3, 2025

Display a big fat warning

It all looks well, but from vboot autopsy, we know that warnings mostly scare users rather than do anything good. The whole thing should be rather optional at build time for security-conscious people.

and these db and dbx would need to be independent of secureboot, ideally with an option to use secureboot or these separate oprom db / dbx

Why independent? You will have to create another verification mechanism alongside Secure Boot. Also, how will you bypass secureboot over your own verification mechanism? Or will it be just another layer on top of Secure Boot?

Other relevant requests: #929 (point 7)

@marmarek
Author

marmarek commented Feb 3, 2025

Why independent?

I'm not sure if independent db/dbx is needed, but I'd like to have independent options for this - for example to enable option rom verification, while still allowing to boot any kernel.

@mkopec
Member

mkopec commented Feb 4, 2025

I've had a longer think about this, it makes sense to reuse secureboot for this, but add a mode for option rom verification only. Users would enroll their GPUs while SB is in setup mode, then set SB user mode to enable enforcement.

if another console (serial or GOP) is available, then we can defer loading and have a popup asking whether to load. EFI has EFI_DEFERRED_IMAGE_LOAD_PROTOCOL exactly for this so that part is easy-ish.

With regards to brick prevention (GOP or GPU changes while oprom verification is enabled, no other consoles are available), I'm not sure if there's any way to do this securely:

  • if we load the oprom and show a warning later, it's too late and option rom can already have control
  • if we beep or blink and have the user confirm on their keyboard whether to trust this GPU, that's also not secure. It's conceivable that a malicious GPU also has a malicious USB device that emulates a keyboard to confirm immediately. GPUs have xHCI controllers for USB-C so it's perfectly plausible.

The safest option would be to deny execution always, but have an external way to reset settings (e.g. CMOS reset).
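
To make the "option ROM verification only" mode concrete, here is a small model of the decision it implies. The `Origin`, `Mode`, and `may_execute` names are hypothetical and do not correspond to any existing firmware API; the sketch only encodes the idea that boot loaders are exempt in that mode while unverified option ROMs are always denied.

```python
from enum import Enum


class Origin(Enum):
    OPTION_ROM = "option_rom"
    BOOT_MEDIA = "boot_media"


class Mode(Enum):
    OFF = "off"                # no verification at all
    OPROM_ONLY = "oprom_only"  # verify option ROMs, boot any OS loader
    FULL = "full"              # regular Secure Boot: verify everything


def may_execute(origin: Origin, verified: bool, mode: Mode) -> bool:
    """Decide whether an image may run under the given verification mode."""
    if mode is Mode.OFF:
        return True
    if mode is Mode.OPROM_ONLY and origin is Origin.BOOT_MEDIA:
        return True
    # Deny unverified images outright: no prompt and no deferred load.
    return verified
```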

@zirblazer

With regards to brick prevention (GOP or GPU changes while oprom verification is enabled, no other consoles are available), I'm not sure if there's any way to do this securely:

  • if we load the oprom and show a warning later, it's too late and option rom can already have control

This can be made super easy if you only support platforms that have a static trusted GPU, namely, any Intel or AMD Processor with an integrated GPU, or platforms with a BMC that also provides its own GPU. This feature is completely unviable on platforms where you need video output working to present an interface for the user to authorize Option ROM in the first place.
Also, it is possible to flash a passed-through dGPU's Option ROM from within a VM; I have done so in the past. One more reason to never do this without an always-trusted GPU.

  • if we beep or blink and have the user confirm on their keyboard whether to trust this GPU, that's also not secure. It's conceivable that a malicious GPU also has a malicious USB device that emulates a keyboard to confirm immediately. GPUs have xHCI controllers for USB-C so it's perfectly plausible.

I think only nVidia GeForce 2xxx series had a builtin XHCI Controller, it was removed on the next generation and I don't recall Radeons implementing this.

@mkopec
Member

mkopec commented Feb 4, 2025

I don't recall Radeons implementing this.

My friend's 6800XT has USB-C with USB 3 and DP, 7000 series also have this

This can be made super easy if you only support platforms that have a static trusted GPU, namely, any Intel or AMD Processor with an integrated GPU, or platforms with a BMC that also provides its own GPU.

Fair enough. I was mostly thinking about MSI users with K-series CPUs, for example

@zirblazer

I don't recall Radeons implementing this.

My friend's 6800XT has USB-C with USB 3 and DP, 7000 series also have this

You're right, Radeons began implementing an xHCI Controller too. Visible in lspci and all that.
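
For anyone wanting to check this programmatically rather than by eye in lspci, a rough sketch follows. It assumes the USB controller shows up as another function of the same PCI device as the display function, and it relies on the standard PCI class codes (0x03 for display controllers, 0x0C0330 for xHCI). The function names are made up for this example.

```python
from pathlib import Path

XHCI_CLASS = 0x0C0330      # serial bus / USB / xHCI programming interface
DISPLAY_BASE_CLASS = 0x03  # any display controller


def read_pci_classes(root: str = "/sys/bus/pci/devices") -> dict[str, int]:
    """Map each PCI address to its 24-bit class code using Linux sysfs."""
    return {
        dev.name: int((dev / "class").read_text(), 16)
        for dev in Path(root).iterdir()
    }


def gpus_with_xhci(devices: dict[str, int]) -> list[str]:
    """Return xHCI functions that share a PCI device with a display function."""
    def slot(bdf: str) -> str:
        return bdf.rsplit(".", 1)[0]   # drop the function number

    display_slots = {
        slot(bdf) for bdf, cls in devices.items()
        if cls >> 16 == DISPLAY_BASE_CLASS
    }
    return sorted(
        bdf for bdf, cls in devices.items()
        if cls == XHCI_CLASS and slot(bdf) in display_slots
    )
```

Calling `gpus_with_xhci(read_pci_classes())` lists the USB controllers that live on a graphics card, which are the ones relevant to the emulated-keyboard concern.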

This can be made super easy if you only support platforms that have a static trusted GPU, namely, any Intel or AMD Processor with an integrated GPU, or platforms with a BMC that also provides its own GPU.

Fair enough. I was mostly thinking about MSI users with K-series CPUs, for example

K no, F series. K includes IGP, except the KF ones. Fixable by avoiding purchasing any F series. And, as a matter of fact, for Coreboot purposes I always recommended to buy Processor with IGP because dGPU compatibility was never perfect anyways, so you were already risking it.

@marmarek
Author

marmarek commented Feb 4, 2025

IMHO it's an okay tradeoff to support the strictest option only if there is always a trusted iGPU. That's also why I'm asking for those settings to be visible from the OS - so I can inform the user whether GPU passthrough is safe or not, and about the associated risks.
The option of storing a trusted Option ROM on the ESP is an interesting one, but IMO it's something to consider as a future extension, not something that should block the initial simpler version. The main motivation for this feature is to support safe GPU passthrough on NovaCustom, which also has an internal GPU.

@wessel-novacustom

IMHO it's an okay tradeoff to support the strictest option only if there is always a trusted iGPU. That's also why I'm asking for those settings to be visible from the OS - so I can inform the user whether GPU passthrough is safe or not, and about the associated risks. The option of storing a trusted Option ROM on the ESP is an interesting one, but IMO it's something to consider as a future extension, not something that should block the initial simpler version. The main motivation for this feature is to support safe GPU passthrough on NovaCustom, which also has an internal GPU.

@marmarek Checking such feasibility on the OS side would be great, since firmware releases are way slower than software releases. Quite some people would love to see a Qubes-certified laptop with NVIDIA dGPU.

@wessel-novacustom

wessel-novacustom commented Feb 5, 2025

Related comment: due to the fact that we can no longer get new stock of the NVIDIA variants of the V54 and V56 Series, it will be financially unfeasible to release a Heads firmware version for those NVIDIA variants. The last deliveries will take place within two months from now and we expect to have enough stock for over one year.

@miczyg1
Contributor

miczyg1 commented Feb 5, 2025

This can be made super easy if you only support platforms that have a static trusted GPU, namely, any Intel or AMD Processor with an integrated GPU, or platforms with a BMC that also provides its own GPU. This feature is completely unviable on platforms where you need video output working to present an interface for the user to authorize Option ROM in the first place. Also, it is possible to flash a passed-through dGPU's Option ROM from within a VM; I have done so in the past. One more reason to never do this without an always-trusted GPU.

With an integrated GPU or ASPEED BMC graphics you always have native initialization of that GPU in coreboot: either by FSP (for the Intel iGPU, which we have to trust anyway), by libgfxinit, or by coreboot native code (ASPEED BMC). So integrated GPUs are really out of scope here.

@mkopec
Member

mkopec commented Feb 5, 2025

@miczyg1 I think the point here is that with native gfx init in coreboot, the initialization code can't be as easily reflashed as an oprom, so there's always a "trusted" GPU driver in the system, which invalidates the entire soft brick problem.
