Unable to PXE boot VMs with TripleO, out of date PXE on 8.21 #559

Open
timolow opened this issue Jul 8, 2022 · 6 comments

Comments

@timolow

timolow commented Jul 8, 2022

Linking the issue I raised in ipxe repo here: xcp-ng-rpms/ipxe#1

I have converted my lab from VMware to XCP-ng 8.21. I am trying to get my OpenStack (TripleO) lab deployed again, but I am hitting a snag with the PXE version of the Xen guest. Is there any way to get PXE upgraded for the guest VMs? When the VMs attempt to PXE boot, they spam "Unrecognized Option --timeout" and refuse to boot. https://bugzilla.redhat.com/show_bug.cgi?id=1343649

TripleO/RDO is a deployment stack based on Ansible/Puppet/Podman that creates a fully running "Overcloud". PXE is used to inspect the hardware config, install the base OS, and clean the nodes for retirement or redeploy. PXE booting is a fundamental technology in managing the lifecycle of overcloud nodes. A quick primer on the use of PXE with OpenStack: https://tripleo-docs.readthedocs.io/en/latest/environments/baremetal.html

@stormi
Member

stormi commented Jul 8, 2022

Here are the RPM sources for our ipxe and ipxe-efi packages: https://github.com/xcp-ng-rpms/ipxe and https://github.com/xcp-ng-rpms/ipxe-efi. They were inherited from XenServer when we forked.

Last time I discussed the matter with a developer at Citrix, they told me that things tended to break in subtle ways when they upgraded ipxe, which would explain why the version shipped is so old.

Maybe the issue could be fixed with a simple patch to ipxe to make it handle the --timeout option. Or maybe you could try to build a more recent version (https://xcp-ng.org/docs/develprocess.html#local-rpm-build), replace it and see how it goes?

We have different versions for BIOS and UEFI, by the way. Do both cause the issue you reported?

@timolow
Author

timolow commented Jul 13, 2022

I got a bit further on this issue but hit a dead end. I did enable UEFI and got the introspection going; however, when deploying the overcloud, things fall apart.

It tries to PXE boot and gets this far before dropping me into the UEFI shell.

Start PXE over IPv4
Station IP address is 10.1.2.5
Server IP address is 10.1.2.2
NBP filename is undioly.kpxe
NBP filesize is 73125 Bytes

Download NBP file...

NBP file downloaded successfully.

Start PXE over IPv4.

Some time later it drops me into an EFI shell.

@stormi
Member

stormi commented Jul 26, 2022

So this looks like a different issue now.

"NBP filename is undioly.kpxe": is this a typo in your comment? It should be undionly.kpxe, if I remember correctly.

@timolow
Author

timolow commented Aug 3, 2022

Yes, it was a transcription error between the screen output and GitHub.
[Screenshot: pxe-boot-issue]

@timolow
Author

timolow commented Aug 7, 2022

I ended up spending a few hours working on this and troubleshooting. I first tried to edit out the --timeout option from my TripleO deployment to bypass the issue. That worked for the introspection of the nodes, but once the system tried to load the overcloud VM, it would just time out or reboot the VM.
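
Editing out the option, as described above, can be sketched roughly as follows. This is an illustration only: the function name strip_timeout and the sample script lines are hypothetical, and the real TripleO/Ironic template may place the option differently.

```python
import re

# Matches "--timeout <number>" plus the whitespace in front of it.
# Assumption: the option always takes a single numeric argument.
_TIMEOUT_OPT = re.compile(r"[ \t]+--timeout[ \t]+\d+")


def strip_timeout(script: str) -> str:
    """Return the iPXE script text with every '--timeout N' option removed."""
    return _TIMEOUT_OPT.sub("", script)


# Hypothetical script fragment, not copied from a real deployment.
sample = (
    "#!ipxe\n"
    "kernel --timeout 60000 http://10.1.2.2/agent.kernel\n"
    "initrd --timeout 60000 http://10.1.2.2/agent.ramdisk\n"
    "boot\n"
)

print(strip_timeout(sample))
```

Lines without the option pass through unchanged, so the same function can be run over a whole template without special-casing.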

I then moved on and extracted the ROMs from ipxe-roms-qemu-20180825-3.git133f4c.el7.noarch.rpm and ipxe-roms-qemu-20160127-5.git6366fa7a.el7.noarch.rpm (both of these versions work fine). I first tried "cat rtl8139.rom 8086100e.rom > /usr/share/ipxe/ipxe.bin": the Realtek Ethernet card PXE booted, but the e1000 ROM never loaded and I was greeted by "no boot devices found, shutting down in 30 seconds". I spent some time on it and ended up finding that if you simply copy 8086100e.rom to /usr/share/ipxe/ipxe-1000e.bin, both the Realtek and e1000 cards fully PXE boot the TripleO environment.
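
The ROM replacement described above could be scripted along these lines. This is a sketch under assumptions: the helper install_rom and the ".orig" backup suffix are made up for illustration, Python 3 is assumed to be available in dom0, and only the file names come from this comment. It keeps a copy of the shipped file so the change can be reverted.

```python
import shutil
from pathlib import Path


def install_rom(src: Path, dest: Path) -> Path:
    """Copy src over dest, keeping the first original as '<dest>.orig'.

    Returns the backup path (it will not exist if dest was absent).
    """
    backup = dest.with_name(dest.name + ".orig")
    # Only back up once, so a second run does not overwrite the
    # pristine file with an already-replaced one.
    if dest.exists() and not backup.exists():
        shutil.copy2(dest, backup)
    shutil.copy2(src, dest)
    return backup


# Example call (hypothetical usage, run as root in dom0):
# install_rom(Path("8086100e.rom"), Path("/usr/share/ipxe/ipxe-1000e.bin"))
```

Reverting would then be a matter of copying the ".orig" file back over the destination.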

@stormi
Member

stormi commented Aug 22, 2022

Thanks for the feedback. So as I understand it, you currently have a workaround which consists of:

  • editing out the --timeout option from TripleO, which solves the initial issue but only moves you to the next error
  • replacing /usr/share/ipxe/ipxe-1000e.bin in dom0

Do you also still need to modify /usr/share/ipxe/ipxe.bin for the realtek card?
