Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rebase onto edk2-stable202402 #129

Closed
wants to merge 357 commits into from
Closed

Rebase onto edk2-stable202402 #129

wants to merge 357 commits into from

Conversation

SergiiDmytruk
Copy link
Member

@SergiiDmytruk SergiiDmytruk commented Apr 25, 2024

Not a real PR, since this is a rebase (Dasharo/dasharo-issues#432), but could still be used for comments.

Things of note:

Issues discovered while testing that must be resolved:

@mkopec
Copy link
Member

mkopec commented May 28, 2024

Looks pretty good already, boots on NovaCustom V540TU (with #136)

Some issues I've noticed already:

  • RAM capacity on the frontpage is 0
  • UEFI BootManagerMenuApp entry visible in one time boot menu
  • Builtin PS/2 keyboard isn't working in UEFI
    • Works after manually enabling it for ConIn in BootMaintenanceManager
  • No TPM menu

@krystian-hebel
Copy link
Contributor

It doesn't work on coreboot with edk2 as a payload on QEMU, failing very early:

[INFO ]  CBFS: Found 'fallback/payload' @0x80 size 0x188061 in mcache @0x7fefe9ac
[DEBUG]  Checking segment from ROM address 0xff8002ac
[DEBUG]  Checking segment from ROM address 0xff8002c8
[DEBUG]  Loading segment from ROM address 0xff8002ac
[DEBUG]    code (compression=1)
[DEBUG]    New segment dstaddr 0x00800000 memsize 0xe00000 srcaddr 0xff8002e4 filesize 0x188029
[DEBUG]  Loading Segment: addr: 0x00800000 memsz: 0x0000000000e00000 filesz: 0x0000000000188029
[DEBUG]  using LZMA
[DEBUG]  Loading segment from ROM address 0xff8002c8
[DEBUG]    Entry Point 0x00800850
[DEBUG]  BS: BS_PAYLOAD_LOAD run times (exec / console): 142 / 4 ms
[DEBUG]  ICH-NM10-PCH: watchdog disabled
[DEBUG]  Jumping to boot code at 0x00800850(0x7fe86000)
ASSERT [PeiCore] /home/coreboot/coreboot/payloads/external/edk2/workspace/Dasharo/MdePkg/Library/PeiHobLib/HobLib.c(45): HobList != ((void *) 0)
ASSERT [PeiCore] /home/coreboot/coreboot/payloads/external/edk2/workspace/Dasharo/MdePkg/Library/PeiHobLib/HobLib.c(76): HobStart != ((void *) 0)

@SergiiDmytruk any idea what may be causing it? I haven't seen too much changes in such early code done by us, so maybe this is something that changed upstream.

On a tangent note, there is OvmfPkg/Library/PeilessStartupLib, perhaps such approach makes sense for our use case as well?

@SergiiDmytruk
Copy link
Member Author

@mkopec, thanks, will take a look whether these are caused by a missing commit, conflict resolution gone wrong or new base needs somewhat different changes.

@SergiiDmytruk any idea what may be causing it? I haven't seen too much changes in such early code done by us, so maybe this is something that changed upstream.

Not really, seems like upstream changes. Weird thing is that I just tried using rebased EDK on APU6 to check whether RAM size is reported and saw exactly the same asserts. I'd expect this to either work or not work for DasharoPayloadPkg.

On a tangent note, there is OvmfPkg/Library/PeilessStartupLib, perhaps such approach makes sense for our use case as well?

Don't we need PEI for TPM drivers of EDK? IIUC UefiPayloadPkg in upstream now has no PEI which is why it can't be used by us without reworking code somehow.

@SergiiDmytruk
Copy link
Member Author

Not really, seems like upstream changes.

Well, maybe indirectly... The code is old, but something must be different. The error occurs at

GuidHob = GetFirstGuidHob (&gUefiAcpiBoardInfoGuid);

@SergiiDmytruk
Copy link
Member Author

It shouldn't fail there for OVMF which uses its own AcpiTimerLib without accessing HOBs, but something similar is probably happening. On APU BlSupportPei seems to not be executed as if dependencies of PEIMs are not resolved correctly.

@SergiiDmytruk
Copy link
Member Author

So AcpiTimerLibConstructor() needs HOBs which are created by BlSupportPei which is loaded by PEI core, but in the rebased EDK AcpiTimerLib is somehow necessary for PEI core to even start its execution (not sure what causes this dependency).

After working around that APU is able to boot, although it seems to have issues with input/output (no boot menu, can't enter setup UI). Q35 seems to act similar to APU. Should probably find out what caused this new dependency in the first place to find a proper mitigation.

@SergiiDmytruk
Copy link
Member Author

Should probably find out what caused this new dependency in the first place to find a proper mitigation.

There is probably no specific reason. Older version happened to work without trouble in the configuration used by coreboot and the new one just doesn't because code has changed. The workaround in 5ffa33a is probably the way to go.

@SergiiDmytruk SergiiDmytruk force-pushed the rebased branch 2 times, most recently from d2f19a9 to d471c3f Compare June 2, 2024 21:28
@SergiiDmytruk
Copy link
Member Author

Looks like issue with serial on APU was really an issue with serial consoles in rebased version, I just used KVM and didn't notice it initially. Turned out that 35e5619 doesn't actually revert 4628519. The latter moves PlatformConsoleInit() from top to bottom while the former moves it just couple statements up. The change should be present for serial consoles to be registered properly and now serial console gets enabled automatically.

@krystian-hebel, Q35 seems to work now.

@mkopec, I see RAM size on MSI Z690-A DDR5, Q35 and APU4. Was it displayed before the rebase?

  • Builtin PS/2 keyboard isn't working in UEFI
    • Works after manually enabling it for ConIn in BootMaintenanceManager

I think this could have been fixed along with serial output, PS/2 keyboard seems to be registered in PlatformConsole.c as well and it was invoked before devices were discovered.

No TPM menu

No "Device Manager" -> "TCG2 Configuration" menu? Was present on MSI but not on APU4 due to TPM error:

TPM error!
Error: Image at 0007E219000 start failed: Device Error

Turned out that f6ba45d commit was necessary, at least APU now detects TPM and probably NovaCustom V540TU.

UEFI BootManagerMenuApp entry visible in one time boot menu

Still couldn't find what was suppressing it from appearing on "One time boot" menu.

@krystian-hebel
Copy link
Contributor

So AcpiTimerLibConstructor() needs HOBs which are created by BlSupportPei which is loaded by PEI core, but in the rebased EDK AcpiTimerLib is somehow necessary for PEI core to even start its execution (not sure what causes this dependency).

Is it possible to add BlSupportPei to APRIORI PEI, together with PCD as it is its dependency?

@SergiiDmytruk
Copy link
Member Author

SergiiDmytruk commented Jun 3, 2024

Is it possible to add BlSupportPei to APRIORI PEI, together with PCD as it is its dependency?

Doesn't work when I try:

  1. coreboot requires SecCore to be the first one in UEFI payload:
    • E: Not a usable UEFI firmware volume. on build
    • [EMERG] Payload not loaded when run
  2. Making APRIORI PEI { follow SecCore in FDF file breaks EDK autogen probably because APRIORI PEI must be the first one if present.

Not sure how hard it would be to adjust coreboot to be more flexible, it must have trouble finding entrypoint. And in the end it might not help, because APRIORI PEI changes ordering in which PEIMs get processed and those asserts followed by segfault happen before PEIMs dispatcher even starts. The timer calls probably happen due to performance measurement code somewhere running as part of PEI core.

@krystian-hebel
Copy link
Contributor

Few remaining asserts can be removed by dropping PerformanceLib from PEI_CORE, I believe the same problem occurs in SEC, except it isn't printed due to BaseDebugLibNull.

It still fails for me on Q35, although much further:

Loading driver at 0x000069F1000 EntryPoint=0x000069FBA31 dpDynamicCommand.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 6BB9B98
InstallProtocolInterface: 6A1EE763-D47A-43B4-AABE-EF1DE2AB56FC 69FFCB0
ProtectUefiImageCommon - 0x6BB8B40
  - 0x00000000069F1000 - 0x00000000000111C0
PROGRESS CODE: V03040002 I0
InstallProtocolInterface: 3C7200E9-005F-4EA4-87DE-A3DFAC8A27C3 69FF2C0
PROGRESS CODE: V03040003 I0
!!!! X64 Exception Type - 0E(#PF - Page-Fault)  CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000010  I:1 R:0 U:0 W:0 P:0 PK:0 SS:0 SGX:0
RIP  - FFFFFFFFC7CCD4A0, CS  - 0000000000000038, RFLAGS - 0000000000000286
RAX  - FFFFFFFFC7CCD4A0, RCX - 0000000000000080, RDX - 0000000007C9FC30
RBX  - 000000000746EA98, RSP - 0000000007C9FC20, RBP - 0000000007C9FD70
RSI  - 00000000070A6C18, RDI - 0000000000000001
R8   - 0000000007C9FC80, R9  - 000000000746E928, R10 - 00000000000003F8
R11  - 0000000000000010, R12 - 000000000746E929, R13 - 0000000000000000
R14  - 000000000746EAA0, R15 - 0000000007CCD4A0
DS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030
GS   - 0000000000000030, SS  - 0000000000000030
CR0  - 0000000080010011, CR2 - FFFFFFFFC7CCD4A0, CR3 - 0000000007601000
CR4  - 0000000000000228, CR8 - 0000000000000000
DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 00000000000F0400
GDTR - 00000000075EC000 0000000000000047, LDTR - 0000000000000000
IDTR - 000000000709C018 0000000000000FFF,   TR - 0000000000000000
FXSAVE_STATE - 0000000007C9F880
!!!! Find image based on IP(0x6BB8E18) (No PDB)  (ImageBase=0000000006BAD018, EntryPoint=0000000006BB47C5) !!!!

After removing dpDynamicCommand.efi it failed in the same way on tftpDynamicCommand.efi, and after removing that:

Loading driver at 0x00006A1B000 EntryPoint=0x00006A209A4 OpalPasswordDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 7039E18
ProtectUefiImageCommon - 0x7039AC0
  - 0x0000000006A1B000 - 0x0000000000018500
PROGRESS CODE: V03040002 I0
InstallProtocolInterface: 18A031AB-B443-4D1A-A5C0-0C09261E9F71 6A33100
InstallProtocolInterface: 107A772C-D5E1-11D4-9A46-0090273FC14D 6A32FB0
InstallProtocolInterface: 6A7A5CFF-E8D9-4F70-BADA-75AB3025CE14 6A32F90
InstallProtocolInterface: 330D4706-F2A0-4E4F-A369-B66FA8D54385 6A331E0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 6A32FE0
PROGRESS CODE: V03040003 I0
!!!! X64 Exception Type - 0E(#PF - Page-Fault)  CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000010  I:1 R:0 U:0 W:0 P:0 PK:0 SS:0 SGX:0
RIP  - FFFFFFFFC7CCD4A0, CS  - 0000000000000038, RFLAGS - 0000000000000286
RAX  - FFFFFFFFC7CCD4A0, RCX - 0000000000000080, RDX - 0000000007C9FC30
RBX  - 000000000746E098, RSP - 0000000007C9FC20, RBP - 0000000007C9FD70
RSI  - 00000000070DB298, RDI - 0000000000000001
R8   - 0000000007C9FC80, R9  - 000000000746E8A8, R10 - 00000000000003F8
R11  - 0000000000000010, R12 - 000000000746E8A9, R13 - 0000000000000000
R14  - 000000000746E0A0, R15 - 0000000007CCD4A0
DS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030
GS   - 0000000000000030, SS  - 0000000000000030
CR0  - 0000000080010011, CR2 - FFFFFFFFC7CCD4A0, CR3 - 0000000007601000
CR4  - 0000000000000228, CR8 - 0000000000000000
DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 00000000000F0400
GDTR - 00000000075EC000 0000000000000047, LDTR - 0000000000000000
IDTR - 00000000070BD018 0000000000000FFF,   TR - 0000000000000000
FXSAVE_STATE - 0000000007C9F880
!!!! Find image based on IP(0x7039D18) /home/coreboot/coreboot/payloads/external/edk2/workspace/Build/DasharoPayloadPkgX64/DEBUG_COREBOOT/X64/MdeModulePkg/Universal/FaultTolerantWriteDxe/FaultTolerantWriteDxe/DEBUG/FaultTolerantWriteDxe.dll (ImageBase=0000000007033000, EntryPoint=0000000007036F9A) !!!!

I'm not sure how image lookup works, but according to logs 0x7039D18 isn't FaultTolerantWriteDxe code, it is .debug_info in .debug file which is stripped from .efi. Nevertheless, after stopping execution and analyzing memory, 0x7039D18 has ASCII 'hndl' followed by mostly zeros, it isn't valid x86 code.

gEfiHiiDatabaseProtocolGuid is visible on stack, along with pointers to addresses that don't seem to belong any driver whose base address has been logged. Most of those seem to be magic sequences (hndl, evnt, ptal). Last address that I'm 100% sure is code is instruction immediately following call to CoreDispatcher done by DxeCore.efi, and I'm pretty sure that the page fault happens somewhere in that function.

@krystian-hebel
Copy link
Contributor

Interestingly, after changing the amount of RAM I got different panic:

!!!! X64 Exception Type - 06(#UD - Invalid Opcode)  CPU Apic ID - 00000000 !!!!
RIP  - 000000007BD4857D, CS  - 0000000000000038, RFLAGS - 0000000000010282
RAX  - 000000007F46E098, RCX - 00000000000000E0, RDX - 000000007FC9FC30
RBX  - 000000003FCCD494, RSP - 000000007FC9FC20, RBP - 000000007FC9FD70
RSI  - 000000007F0DB298, RDI - 0000000000000001
R8   - 000000007FC9FC80, R9  - 000000007F46E8A8, R10 - 00000000000003F8
R11  - 0000000000000010, R12 - 000000007F46E8A9, R13 - 0000000000000000
R14  - 000000007F46E0A0, R15 - 000000007FCCD4A0
DS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030
GS   - 0000000000000030, SS  - 0000000000000030
CR0  - 0000000080010011, CR2 - 0000000000000000, CR3 - 000000007F601000
CR4  - 0000000000000228, CR8 - 0000000000000000
DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 00000000000F0400
GDTR - 000000007F5EC000 0000000000000047, LDTR - 0000000000000000
IDTR - 000000007F0BD018 0000000000000FFF,   TR - 0000000000000000
FXSAVE_STATE - 000000007FC9F880
!!!! Find image based on IP(0x7BD4857D) /home/coreboot/coreboot/payloads/external/edk2/workspace/Build/DasharoPayloadPkgX64/DEBUG_COREBOOT/X64/CrScreenshotDxe/CrScreenshotDxe/DEBUG/CrScreenshotDxe.dll (ImageBase=0000000000E50498, EntryPoint=0000000000E57E99) !!!!

@SergiiDmytruk
Copy link
Member Author

Few remaining asserts can be removed by dropping PerformanceLib from PEI_CORE, I believe the same problem occurs in SEC, except it isn't printed due to BaseDebugLibNull.

Well, I assumed it was added for a purpose hence the workaround instead of removing performance measurements.

It still fails for me on Q35, although much further:

Weird given that it works for me on your fw_update_smi2 branch with Dasharo/coreboot@455debb cherry-picked on top of it and a tiny change, see the spoiler below.

Kconfig change, defconfig, bash script

One more change is

diff --git a/src/vendorcode/intel/Kconfig b/src/vendorcode/intel/Kconfig
index b481aa0090b..f09aebea2cd 100644
--- a/src/vendorcode/intel/Kconfig
+++ b/src/vendorcode/intel/Kconfig
@@ -8,6 +8,7 @@ config UEFI_2_4_BINDING
 	select UDK_BASE
 
 config UDK_2017_BINDING
+	def_bool y if BOARD_EMULATION_QEMU_X86_Q35
 	def_bool n
 	select UDK_BASE
 

defconfig:

CONFIG_OPTION_BACKEND_NONE=y
CONFIG_BOARD_EMULATION_QEMU_X86_Q35=y
# CONFIG_DASHARO_FIRMWARE_UPDATE_MODE is not set
CONFIG_VGA_TEXT_FRAMEBUFFER=y
CONFIG_RESOURCE_ALLOCATION_TOP_DOWN=y
CONFIG_DRIVERS_EFI_VARIABLE_STORE=y
CONFIG_PAYLOAD_EDK2=y
CONFIG_EDK2_REPOSITORY="https://github.com/Dasharo/edk2.git"
CONFIG_EDK2_TAG_OR_REV="d471c3f6626ec99dfda3993da8ea53f5bf2c291d"
CONFIG_EDK2_USE_EDK2_PLATFORMS=y
CONFIG_EDK2_PLATFORMS_REPOSITORY="https://github.com/Dasharo/edk2-platforms"
CONFIG_EDK2_PLATFORMS_TAG_OR_REV="3323ed481d35096fb6a7eae7b49f35eff00f86cf"
CONFIG_EDK2_DEBUG=y
# CONFIG_EDK2_SECURE_BOOT_DEFAULT_ENABLE is not set
CONFIG_EDK2_DASHARO_SYSTEM_FEATURES=y
CONFIG_EDK2_DASHARO_SERIAL_REDIRECTION_DEFAULT_ENABLE=y

Script:

#!/bin/bash
params=(
    -M q35,smm=on,max-fw-size=16M
    -enable-kvm
    -m 4096
    -drive if=pflash,format=raw,unit=0,file=build/coreboot.rom
    -serial stdio
    -action reboot=shutdown
)

exec qemu-system-x86_64 "${params[@]}"

@mkopec
Copy link
Member

mkopec commented Jun 4, 2024

TPM and PS/2 KB issues are fixed, thanks.

@mkopec, I see RAM size on MSI Z690-A DDR5, Q35 and APU4. Was it displayed before the rebase?

I saw it on a different device with different DIMM modules, but when testing on the same unit, it's not displayed on old and new edk2. It's probably an issue with MTL and DDR5, not with EDK2

@krystian-hebel
Copy link
Contributor

krystian-hebel commented Jun 4, 2024

Interesting, it seems to work with coreboot-sdk:2022-09-18_c8870b1334 but not with coreboot-sdk:2023-11-24_2731fa619b. 2021-09-23_b0d87f753c fails due to the lack of imagemagick, and 2024-05-20_b4949d3de5 can't build iPXE, but image built without it also works. I haven't tried other versions.


2023-11-24_2731fa619b uses gcc version 13.2.0 (Debian 13.2.0-7), both 2022-09-18_c8870b1334 (older) and 2024-05-20_b4949d3de5 (newer) use 12.2.0 (Debian 12.2.0-3 and Debian 12.2.0-14 respectively). Perhaps whatever was hidden by any of -Wno-vla-parameter -Wno-stringop-overflow -Wno-use-after-free -Wno-dangling-pointer is UB or plain bugs that should be fixed.

@krystian-hebel
Copy link
Contributor

krystian-hebel commented Jun 5, 2024

One more problem on QEMU, which may be connected to a problem that @miczyg1 was debugging on Protectli.
Zrzut ekranu z 2024-06-05 15-24-54
Zrzut ekranu z 2024-06-05 15-24-59
Zrzut ekranu z 2024-06-05 15-25-07
Zrzut ekranu z 2024-06-05 15-25-12

By checking CR2 address against log of loaded EFI modules, I found out that this is DxeCore, in particular CoreAllocatePool():

DXE IPL Entry
Loading PEIM D6A2CB7F-6A18-4E2F-B43B-9920A733700A
Loading PEIM at 0x0007FC9E000 EntryPoint=0x0007FCBAE5B DxeCore.efi
PROGRESS CODE: V03021001 I0
Loading DXE CORE at 0x0007FC9E000 EntryPoint=0x0007FCBAE5B
AddressBits=40 5LevelPaging=0 1GPage=0
Pml5=1 Pml4=2 Pdp=512 TotalPage=1027
Install PPI: 605EA650-C65C-42E1-BA80-91A52AB618C6
Notify: PPI Guid: 605EA650-C65C-42E1-BA80-91A52AB618C6, Peim notify entry point: 810B84
HandOffToDxeCore() Stack Base: 0x7FC7E000, Stack Size: 0x20000
PROGRESS CODE: V03040003 I0

Zrzut ekranu z 2024-06-05 16-03-15

By comparing code from Linux's panic (in particular address modulo page size) it seems to be called from VariableRuntimeDxe:

0000000000003ada <InternalAllocatePool>:
    3ada:       55                      push   %rbp
    3adb:       89 f9                   mov    %edi,%ecx
    3add:       48 89 f2                mov    %rsi,%rdx
    3ae0:       48 89 e5                mov    %rsp,%rbp
    3ae3:       48 83 ec 30             sub    $0x30,%rsp
    3ae7:       48 8b 05 32 da 01 00    mov    0x1da32(%rip),%rax        # 21520 <gBS>
    3aee:       4c 8d 45 f8             lea    -0x8(%rbp),%r8
    3af2:       ff 50 40                callq  *0x40(%rax)
    3af5:       48 83 c4 20             add    $0x20,%rsp
    3af9:       48 85 c0                test   %rax,%rax
    3afc:       79 08                   jns    3b06 <InternalAllocatePool+0x2c>
    3afe:       48 c7 45 f8 00 00 00    movq   $0x0,-0x8(%rbp)
    3b05:       00 
    3b06:       48 8b 45 f8             mov    -0x8(%rbp),%rax
    3b0a:       c9                      leaveq 
    3b0b:       c3                      retq

This code tries to use Boot Services after call to ExitBootServices(), which obviously shouldn't happen. Wrong or bugged MemoryAllocationLib maybe?

I haven't checked whether this is something limited to Dasharo or if it applies to upstream edk2 as well.

@krystian-hebel
Copy link
Contributor

Ubuntu 24.04 works slightly better (above was on 22.04), it can read UEFI variables, but not write to them nor create new ones. There is no oops in dmesg. I think it may use static copy of variables obtained on boot, or something else changed in the way efivars are used in newer kernel.

@SergiiDmytruk
Copy link
Member Author

Perhaps whatever was hidden by any of -Wno-vla-parameter -Wno-stringop-overflow -Wno-use-after-free -Wno-dangling-pointer is UB or plain bugs that should be fixed.

Could be. I build with my system's 14.1.0.

This code tries to use Boot Services after call to ExitBootServices(), which obviously shouldn't happen. Wrong or bugged MemoryAllocationLib maybe?

https://github.com/tianocore/edk2/blob/90cb1ec33225a070e9fea1d94c72ff590bd38731/MdePkg/Library/UefiMemoryAllocationLib/UefiMemoryAllocationLib.inf#L21:

LIBRARY_CLASS                  = MemoryAllocationLib|DXE_DRIVER DXE_RUNTIME_DRIVER DXE_SMM_DRIVER UEFI_APPLICATION UEFI_DRIVER

https://github.com/tianocore/edk2/blob/90cb1ec33225a070e9fea1d94c72ff590bd38731/MdePkg/Library/UefiMemoryAllocationLib/MemoryAllocationLib.c#L43

  Status = gBS->AllocatePages (AllocateAnyPages, MemoryType, Pages, &Memory);

I'm not sure if page allocation should be done by runtime services though, maybe the invoking code is wrong. Also I don't see a more suitable memory library.

@krystian-hebel
Copy link
Contributor

krystian-hebel commented Jun 6, 2024

I'm getting the same behavior on MTL-H laptop, with expected deviations caused by different memory layout. It worked fine with older edk2 before rebase.

I've looked at the recent changes done to RuntimeDxe and didn't see anything obviously wrong with them. It may indeed be the calling code. Some of those allocations are to be done once, on first call, maybe this is what's missing. On the other hand, this tries to allocate BootServicesData... Maybe it was always broken, but worked by accident because allocated pool was on the same page as RuntimeServicesData, but IIRC UEFI allocates whole pages event for AllocatePool.

There is also AtRuntime() that controls execution of some functions, it is another possible cause.

@SergiiDmytruk
Copy link
Member Author

SergiiDmytruk commented Jun 6, 2024

Was very puzzled on seeing the crash even after adding if (EfiAtRuntime()) return EFI_INVALID_PARAMETER; to about every function assigned to gRT and turned out that VariableDxe.c uses SystemTable->RuntimeServices to set them up instead... Did get a backtrace in the end pointing at MbedTLS as an offender (frames 3 and 4):

#0  0x000000007f8fbaf2 in InternalAllocatePool (MemoryType=MemoryType@entry=EfiBootServicesData, AllocationSize=AllocationSize@entry=3340)
    at $EDK_PATH/MdePkg/Library/UefiMemoryAllocationLib/MemoryAllocationLib.c:376
#1  0x000000007f8fbba4 in InternalAllocateZeroPool (PoolType=PoolType@entry=EfiBootServicesData, AllocationSize=AllocationSize@entry=3340)
    at $EDK_PATH/MdePkg/Library/UefiMemoryAllocationLib/MemoryAllocationLib.c:469
#2  0x000000007f8fbc0e in AllocateZeroPool (AllocationSize=3340) at $EDK_PATH/MdePkg/Library/UefiMemoryAllocationLib/MemoryAllocationLib.c:496
#3  0x000000007f90cee8 in WrapPkcs7Data (P7Data=0xffff94690aaa0028 "0\202\f\365\002\001\0011\0170\r\006\t`\206H\001e\003\004\002\001\005", P7Length=3321, WrapFlag=0xffffac7e4006b32f ";\f\r", WrapData=0xffffac7e4006b338, WrapDataSize=0xffffac7e4006b330)
    at $EDK_PATH/CryptoPkg/Library/BaseCryptLibMbedTls/Pk/CryptPkcs7VerifyCommon.c:992
#4  0x000000007f90d66b in Pkcs7GetSigners (P7Data=P7Data@entry=0xffff94690aaa0028 "0\202\f\365\002\001\0011\0170\r\006\t`\206H\001e\003\004\002\001\005", P7Length=P7Length@entry=3321, CertStack=CertStack@entry=0xffffac7e4006ba70, 
    StackLength=StackLength@entry=0xffffac7e4006ba78, TrustedCert=0xffffac7e4006ba60, CertLength=0xffffac7e4006ba68)
    at $EDK_PATH/CryptoPkg/Library/BaseCryptLibMbedTls/Pk/CryptPkcs7VerifyCommon.c:1211
#5  0x000000007f9102d9 in Pkcs7GetSigners (CertLength=0xffffac7e4006ba68, TrustedCert=0xffffac7e4006ba60, StackLength=0xffffac7e4006ba78, CertStack=0xffffac7e4006ba70, P7Length=3321, 
    P7Data=0xffff94690aaa0028 "0\202\f\365\002\001\0011\0170\r\006\t`\206H\001e\003\004\002\001\005") at $EDK_PATH/SecurityPkg/Library/AuthVariableLib/AuthService.c:2199
#6  VerifyTimeBasedPayload (VarPayloadSize=<synthetic pointer>, VarPayloadPtr=<synthetic pointer>, OrgTimeStamp=0x7f94096c, AuthVarType=AuthVarTypePk, Attributes=103, DataSize=<optimized out>, Data=<optimized out>, VendorGuid=0xffff94690006b400, 
    VariableName=0xffff94690006b000) at $EDK_PATH/SecurityPkg/Library/AuthVariableLib/AuthService.c:2199
#7  VerifyTimeBasedPayloadAndUpdate (VariableName=0xffff94690006b000, VendorGuid=0xffff94690006b400, Data=Data@entry=0xffff94690aaa0000, DataSize=<optimized out>, Attributes=103, AuthVarType=AuthVarTypePk, VarDel=0xffffac7e4006bbcf "")
    at $EDK_PATH/SecurityPkg/Library/AuthVariableLib/AuthService.c:2487
#8  0x000000007f910c23 in ProcessVarWithPk (VariableName=0xffff94690006b000, VendorGuid=0xffff94690006b400, Data=0xffff94690aaa0000, DataSize=7085, Attributes=103, IsPk=<optimized out>)
    at $EDK_PATH/SecurityPkg/Library/AuthVariableLib/AuthService.c:768
#9  0x000000007f8fef92 in ProcessVarWithPk (IsPk=0 '\000', Attributes=103, DataSize=7085, Data=0xffff94690aaa0000, VendorGuid=0xffff94690006b400, VariableName=0xffff94690006b000)
    at $EDK_PATH/SecurityPkg/Library/AuthVariableLib/AuthVariableLib.c:435
#10 AuthVariableLibProcessVariable (Attributes=103, DataSize=7085, Data=0xffff94690aaa0000, VendorGuid=0xffff94690006b400, VariableName=0xffff94690006b000)
    at $EDK_PATH/SecurityPkg/Library/AuthVariableLib/AuthVariableLib.c:439
#11 VariableServiceSetVariable (VariableName=0xffff94690006b000, VendorGuid=0xffff94690006b400, Attributes=103, DataSize=7085, Data=0xffff94690aaa0000)
    at $EDK_PATH/MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c:2896

The failure occurs when Ubuntu updates dbx which requires processing of authenticated variables.

OpenSSL version uses malloc() in this case which is unavailable for MbedTLS. OpenSSL's implementation is here and we probably need to reuse that file for MbedTLS and update other allocations as well.

@SergiiDmytruk
Copy link
Member Author

OpenSSL version uses malloc() in this case which is unavailable for MbedTLS.

Actually, a copy of it exists and is used for MbedTLS as well, but it wasn't linked to PKCS7 implementation. This was easy to fix, however code of MbedTLS also invokes AllocateZeroPool() so now it fails in the same place but with a different stack.

Updating that to unconditional malloc() breaks boot, so will try to extract one file to 2 libraries to be able to use a different implementation for DXE_RUNTIME_DRIVER which seems to be the only way to do this because MbedTLS is of BASE type.

@SergiiDmytruk
Copy link
Member Author

SergiiDmytruk commented Jun 7, 2024

That worked (no more crashes in dmesg) but SecureBoot got broken because compiler optimizes away malloc() in PKCS7 implementation thinking that it can't work. Will need to make use of malloc() only in runtime phase in that file as well, not sure yet how.

@SergiiDmytruk
Copy link
Member Author

Will need to make use of malloc() only in runtime phase in that file as well, not sure yet how.

Reused the same libraries. There are no more page faults, but EFI variables aren't writable as in 24.04 case.

The reason is that SMI handler doesn't respond and EDK produces EFI_NO_RESPONSE which somehow ends up being reported as invalid parameter on command-line.

SMI handler doesn't work in runtime phase because Linux executes SetVirtualAddressMap() which relocates EFI runtime by moving it to 0xFFFFFFFE80000000 that a 32-bit SMI handler stub (https://github.com/Dasharo/coreboot/blob/a76d969c0717381e805515ea0e009ad58252bb80/src/cpu/x86/smm/smm_stub.S#L211-L216) just can't handle: upper half is lost and SMI gets ignored.

I tried to expose two tables from EDK to tell Linux to not relocate EFI runtime and deprecated EFI_PROPERTIES_TABLE did achieve that, as did passing efi=novamap kernel parameter from GRUB. This however didn't make anything work, I got CPA detected W^X violation in dmesg (seems to happen on doing one-to-one mapping) and a non-working EFI vars again.

So I attempted to make coreboot run in long mode by setting ARCH_ALL_STAGES_X86_64, which resulted in a reboot on SMI after SetVirtualAddressMap(). @krystian-hebel, should Dasharo/coreboot#514 work fine in this case?

By the way, I know that it's virtual addressing because if I break ConvertPointer() EDK keeps using original addresses and everything works (maybe messing up some memory in the process).

To sum up:

  • page faults related to MbedTLS were fixed
  • 32-bit coreboot's SMI handler doesn't work after SetVirtualAddressMap()
  • Linux doesn't work without SetVirtualAddressMap()
  • 64-bit coreboot's SMI handler doesn't work after SetVirtualAddressMap() either
  • it doesn't seem to be EDK's fault

This is a follow up for 6ca2060
and 784750e which provide more detailed
information on the issue and how this addresses it.

The files modified by this commit were chosen based on the list of
sources in CryptoPkg/Library/BaseCryptLibMbedTls/RuntimeCryptLib.inf

The only source permitted to request memory from boot services is
SysCall/RuntimeMemAllocation.c which does it in constructor before
ExitBootServices() is called.

Trying to update minimal set of files because some of the API which does
allocations get used outside of BaseCryptLibMbedTls and can do
FreePool().  In the updated files, allocations are of two types:
 - temporary allocations within a function (they don't get returned or
   set to some output parameter)
 - paired alloc/free kind of functions which remain in control of how
   the memory is treated

Signed-off-by: Sergii Dmytruk <[email protected]>
@macpijan
Copy link
Contributor

It's a rebase. Can't be merged, only force-pushed.

Yes, apologies for inaccurate wording.

So we would want to proceed as described here: #129 (review)

Can we do that?

Let's gather confirmation from @mkopec and @krystian-hebel and proceed if there are no problems with that.

@SergiiDmytruk
Copy link
Member Author

So we would want to proceed as described here: #129 (review)

Yes, current dasharo should probably be renamed to dasharo-stable202002-168 to give a hint about its upstream base (dd7523b).

Can we do that?

Probably won't hurt to merge #163 first (if nobody else is going to test/review it) and then anyone who can modify protected branches can do the update.

@macpijan
Copy link
Contributor

macpijan commented Sep 3, 2024

@miczyg1 I will leave this one to you then.

@miczyg1
Copy link
Contributor

miczyg1 commented Sep 4, 2024

Yes, current dasharo should probably be renamed to dasharo-stable202002-168 to give a hint about its upstream base (dd7523b).

What is 168?

@SergiiDmytruk
Copy link
Member Author

What is 168?

It's from the output of git describe: number of commits since stable202002 tag. dasharo isn't based on a tag exactly.

@miczyg1
Copy link
Contributor

miczyg1 commented Sep 5, 2024

It's from the output of git describe: number of commits since stable202002 tag. dasharo isn't based on a tag exactly.

Not exactly a tag, but based on stable202002 + something/ I don't think 168 introduces anything meaningful to branch name. I think dasharo-stable202002 would be more appropriate. We will not have anything more based stable202002 anyways

@SergiiDmytruk
Copy link
Member Author

I don't think 168 introduces anything meaningful to branch name.

It's not a big deal either way.

mkopec and others added 3 commits September 16, 2024 12:28
This is a port of upstream commits c248802 and bfefdc2

On modern platforms with TBT devices the coreboot resource allocator
opens large PCI bridge MMIO windows above 4GiB to place hotplugable
PCI BARs there as they won't fit below 4GiB. In addition modern
GPGPU devices have very big PCI bars that doesn't fit below 4GiB.

The PciHostBridgeLib made lots of assumptions about the coreboot
resource allocator that were not verified at runtime and are no
longer true.

Remove all of the 'coreboot specific' code and implement the same
logic as OvmfPkg's ScanForRootBridges.

Fixes assertion
"ASSERT [PciHostBridgeDxe] Bridge->Mem.Limit < 0x0000000100000000ULL".

Co-authored-by: Patrick Rudolph <[email protected]>
Signed-off-by: Michał Kopeć <[email protected]>
Tcg2PhysicalPresenceLibSubmitRequestToPreOSFunction writes to TPM2
physical presence PPI provided by coreboot (a memory region preserved
across reboots). CPU caches must be explicitly flushed prior to platform
reboot or request written to PPI will be lost.

Signed-off-by: Artur Kowalski <[email protected]>
Flush cache not only when placing request in PPI, but also after
clearing old request from PPI.

Signed-off-by: Artur Kowalski <[email protected]>
@SergiiDmytruk
Copy link
Member Author

Almost forgot to provide an observation regarding DasharoPayloadPkg after moving capsule update changes from DasharoPayloadPkg to UefiPayloadPkg for upstreaming: some things had to be adjusted but there weren't that many conflicts and some were caused purely by formatting changes.

The takeaway is that we can reduce the difference between the two packages over time by gradually switching to libraries/DXEs of UefiPayloadPkg where it makes sense (small or no differences). This can make future maintaining and upstreaming easier. "Gradually" would mean that when we are changing some code in DasharoPayloadPkg, it won't hurt to compare it to corresponding code in UefiPayloadPkg and evaluate if that code can be used instead.

It has been observed on MinnowBoard Turbot that the detected CPU
count is lower than the number of all cores. In the tested unit, CPU is
dual core, so only the BSP is detected. However, similar situation is
observed on MTL laptop, where a total of 22 cores should be reported,
but only 1 is detected.

After EDK2 rebase a new PCD has been added, PcdFirstTimeWakeUpAPsBySipi
which, when enabled (by default), sends only SIPI to APs. When disabled
sends full INIT-SIPI-SIPI sequence. Only the latter case causes all the
APs to wake up and be detected.

Signed-off-by: Michał Żygowski <[email protected]>
Signed-off-by: Sergii Dmytruk <[email protected]>
wiktormowinski and others added 5 commits September 27, 2024 13:36
This fixes "SecurityPkg: measure Dasharo variables before boot".

gRT->GetNextVariableName() doesn't return variables in any fixed order.
Seems like the order matches order in SMMSTORE.  This means that
measuring variables while enumerating them will produce different
results depending on which variables were update last (setting a
variable in SMMSTORE is marking old entry as deleted and appending of a
new one).  Sort list of variables that share the same GUID before
measuring any of them to impose a fixed order.

Also fix spacing in several places.

Signed-off-by: Sergii Dmytruk <[email protected]>
@macpijan
Copy link
Contributor

macpijan commented Nov 6, 2024

@SergiiDmytruk @miczyg1 I am confused, is there any conclusion of this discussion and how to approach with finalizing the work here?

@SergiiDmytruk
Copy link
Member Author

I think the branches could have been updated long time ago. Even if something else from dasharo is missing (as was pointed out by @miczyg1 internally in the task), so what? Those changes will just be added to a branch with a different name.

@miczyg1
Copy link
Contributor

miczyg1 commented Nov 7, 2024

@SergiiDmytruk @miczyg1 I am confused, is there any conclusion of this discussion and how to approach with finalizing the work here?

Nothing to finalize, except branch name changes. The rebased branch is already used in production and is a part of some releases...

@miczyg1
Copy link
Contributor

miczyg1 commented Nov 7, 2024

Changed branches:

  • current dasharo -> dasharo-stable202002
  • current rebased -> dasharo

Closing this PR and removing branches: rebased and rebase-base

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.