
Openness Score seems to be broken for Dasharo (coreboot+SeaBIOS) for PC Engines v24.05.00.01 #917

Closed
pietrushnic opened this issue Jun 28, 2024 · 18 comments · Fixed by Dasharo/Openness-Score#10

Comments

@pietrushnic

Openness Score for pcengines_apu2_seabios_v24.05.00.01.rom

Open-source code percentage: 13.1%
Closed-source code percentage: 86.9%

  • Image size: 8388608 (0x800000)
  • Number of regions: 13
  • Number of CBFSes: 1
  • Total open-source code size: 370290 (0x5a672)
  • Total closed-source code size: 2453728 (0x2570e0)
  • Total data size: 684738 (0xa72c2)
  • Total empty size: 4879852 (0x4a75ec)
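
For reference, the headline percentages follow directly from the byte counts above (a minimal sketch; the 13.1% figure implies the denominator is open + closed code only, with data and empty regions excluded):

```python
# Reproducing the headline percentages from the byte counts reported above.
# The denominator is open + closed code only; data and empty regions are
# excluded, which is why 370290/(370290 + 2453728) gives the reported 13.1%.
open_size = 370290     # Total open-source code size
closed_size = 2453728  # Total closed-source code size

code_total = open_size + closed_size
open_pct = 100 * open_size / code_total
closed_pct = 100 * closed_size / code_total
print(f"Open-source: {open_pct:.1f}%, closed-source: {closed_pct:.1f}%")
# prints "Open-source: 13.1%, closed-source: 86.9%"
```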

[Openness charts for pcengines_apu2_seabios_v24.05.00.01.rom (per-category and full-image) omitted]

Numbers given above already include the calculations from CBFS regions
presented below

FMAP regions

| FMAP region | Offset | Size | Category |
| --- | --- | --- | --- |
| RW_VPD | 0x1000 | 0x4000 | data |
| SMMSTORE | 0x5000 | 0x20000 | data |
| RO_VPD | 0x200000 | 0x4000 | data |
| FMAP | 0x204000 | 0x800 | data |
| RO_FRID | 0x204800 | 0x40 | data |
| RO_FRID_PAD | 0x204840 | 0x7c0 | data |
| GBB | 0x205000 | 0x40000 | data |

CBFS COREBOOT

  • CBFS size: 6008832
  • Number of files: 27
  • Open-source files size: 370290 (0x5a672)
  • Closed-source files size: 504032 (0x7b0e0)
  • Data size: 254658 (0x3e2c2)
  • Empty size: 4879852 (0x4a75ec)

Numbers given above are already normalized (i.e. they already include size
of metadata and possible closed-source LAN drivers included in the payload
which are not visible in the table below)

| CBFS filename | CBFS filetype | Size | Compression | Category |
| --- | --- | --- | --- | --- |
| fallback/romstage | stage | 24528 | none | open-source |
| fallback/ramstage | stage | 83143 | LZMA | open-source |
| fallback/dsdt.aml | raw | 6962 | none | open-source |
| fallback/postcar | stage | 22112 | none | open-source |
| fallback/payload | simple elf | 53612 | none | open-source |
| img/memtest | simple elf | 47478 | none | open-source |
| img/setup | simple elf | 26983 | none | open-source |
| genroms/pxe.rom | raw | 89088 | none | open-source |
| bootblock | bootblock | 16384 | none | open-source |
| AGESA | raw | 504032 | none | closed-source |
| cbfs_master_header | cbfs header | 32 | none | data |
| config | raw | 3107 | LZMA | data |
| revision | raw | 702 | none | data |
| build_info | raw | 88 | none | data |
| spd.bin | spd | 256 | none | data |
| payload_config | raw | 1599 | none | data |
| payload_revision | raw | 239 | none | data |
| etc/boot-menu-key | raw | 8 | none | data |
| etc/boot-menu-wait | raw | 8 | none | data |
| etc/sercon-port | raw | 8 | none | data |
| (empty) | null | 3549092 | none | empty |
| (empty) | null | 675492 | none | empty |
| (empty) | null | 655268 | none | empty |
@miczyg1
Contributor

miczyg1 commented Jun 28, 2024

Ohh.. Found a couple of bugs and fixed them: Dasharo/Openness-Score#10

There were some uncaught cases because of them. It seems the real ratio is 33.1% open-source to 66.9% closed-source.

@pietrushnic
Author

@miczyg1 I have no idea why. Can you explain to me why Dasharo (coreboot+UEFI) v0.9.0 has a higher Openness Score? At least it would be comparable. Maybe that is because of binary size, and we essentially benefit from a bigger binary size, which seems weird.

@mkopec
Member

mkopec commented Jul 1, 2024

Probably comes down to binary size: SeaBIOS is 53 KiB, while UEFIPayload is almost 2 MB.

@pietrushnic
Author

We should consider modifying the Openness Score to promote empty space. A smaller codebase means a smaller TCB, which means potentially fewer places where bugs can occur, lower maintenance costs, etc. Otherwise, we end up in a situation like here, where firmware with 2M UEFI code as payload has theoretically better statistics than something with a minimal footprint. Another improvement we need is to have a summary table, on the Supported Hardware page, where we compare various code bases and can show the real footprint openness of the given solution.

@miczyg1
Contributor

miczyg1 commented Jul 1, 2024

@miczyg1 I have no idea why. Can you explain to me why Dasharo (coreboot+UEFI) v0.9.0 has a higher Openness Score? At least it would be comparable. Maybe that is because of binary size, and we essentially benefit from a bigger binary size, which seems weird.

Most likely because of the payload. UEFI Payload is ~2MB of open-source code and we compare the raw open-source to closed-source ratio. Obviously coreboot+UEFI will have more open-source code, thus more openness. The tool does the pure maths. It neither lies nor returns results based on feelings.

Otherwise, we end up in a situation like here, where firmware with 2M UEFI code as payload has theoretically better statistics than something with a minimal footprint.

It is what it is. It is also not a good situation where you have a minimalistic implementation, but the blobs outweigh the open-source code a few times over. Say, 200KB of coreboot native code plus 1MB of FSP (and let's not forget a couple of MBs of ME FW). What should the Openness Score actually say about it? How can one otherwise represent the open-source percentage? I'm open to suggestions @pietrushnic

We should consider modifying the Openness Score to promote empty space.

The diagrams have a visual representation of empty space and how much % it takes in the image.

@pietrushnic
Author

What should the Openness Score actually say about it? How can one otherwise represent the open-source percentage? I'm open to suggestions @pietrushnic

I think TCB is a good direction.

The diagrams have a visual representation of empty space and how much % it takes in the image.

Yes, but the architecture of information focuses on the openness of the code deployed, not on the size. When we have many scores published, we can say something about various releases and builds. So essentially, two requests:

  • Change the architecture of information in the output so free space is promoted; we can use the discussion from this thread to write a justification for that thinking. In general, my opinion is similar to Ron's about firmware getting out of the way as fast as possible, which means a smaller code base serving the purpose of users is better than a more extensive code base that is fully featured. This could point to the direction innovation in the firmware space should take. Here is another exciting perspective on the same problem: https://world.hey.com/dhh/finished-software-8ee43637, which sticks with the KISS and UNIX design ideas.
  • Code that can generate a comparison table of various firmware already scored on the Dasharo page; I could present that on the next DUG because it could be interesting for the community.

@miczyg1
Contributor

miczyg1 commented Jul 1, 2024

What should the Openness Score actually say about it? How can one otherwise represent the open-source percentage? I'm open to suggestions @pietrushnic

I think TCB is a good direction.

OK, how do we measure/calculate TCB in the context of firmware images?

The diagrams have a visual representation of empty space and how much % it takes in the image.

Yes, but the architecture of information focuses on the openness of the code deployed, not on the size.

I don't quite understand it. So code can be more open or less open? How does that help in calculations? Either the source is open or closed, period. Anything other than that is just delusion.

When we have many scores published, we can say something about various releases and builds. So essentially, two requests:

  • Change the architecture of information in the output so free space is promoted; we can use the discussion from this thread to write a justification for that thinking. In general, my opinion is similar to Ron's about firmware getting out of the way as fast as possible, which means a smaller code base serving the purpose of users is better than a more extensive code base that is fully featured. This could point to the direction innovation in the firmware space should take. Here is another exciting perspective on the same problem: https://world.hey.com/dhh/finished-software-8ee43637, which sticks with the KISS and UNIX design ideas.

Well, you could compare how many closed-source bytes you have removed to achieve the same goal (i.e. booting) vs proprietary firmware. That would rather be a liberation percentage, not an openness percentage. But I think it is still worth giving it a try (I think initially we wanted to have a comparison mode for this utility). Of course, that only makes sense if you have proprietary firmware for a given platform (which is not the case for apus).
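
For illustration, the proposed liberation percentage could be computed like this (the function name and the stock firmware size are hypothetical, not taken from any measurement):

```python
# Hypothetical sketch of the "liberation percentage" idea: the share of the
# stock firmware's closed-source bytes that an open build eliminates.
# The 8 MB stock figure below is illustrative, not a measurement.
def liberation_pct(stock_closed_bytes: int, open_build_closed_bytes: int) -> float:
    removed = stock_closed_bytes - open_build_closed_bytes
    return 100 * removed / stock_closed_bytes

print(f"{liberation_pct(8_000_000, 2_453_728):.1f}% liberated")
# prints "69.3% liberated"
```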

  • Code that can generate a comparison table of various firmware already scored on the Dasharo page; I could present that on the next DUG because it could be interesting for the community.

That calls for some database which would serve such comparison tables, or a service to which one can upload the output of the Openness Score utility to feed it. The request is valid, but creating such a service/database is rather out of scope of this utility at first glance.

@pietrushnic
Author

OK, how do we measure/calculate TCB in the context of firmware images?

That is a good question, and it should be answered in code. Still, since we have no way to explore, scan, and report which paths are executed effectively and at scale, we have to assume that potentially all code inside the binary can be executed and belongs to the TCB. We can say we have already measured that, because it is the size of the code components in the firmware.

So code can be more open or less open?

More or less code can be open; this is what the openness score measures in relation to the whole image size.

How does that help in calculations? Either the source is open or closed, period. Anything other than that is just delusion.

Where are you going with this? Probably, I didn't express myself correctly. You wrote, "The diagrams have a visual representation of empty space and how much % it takes in the image." But how does this help? It doesn't help, because a device with 1MB and 99% open is different from a device with 100MB and 99% open. This is a question about the goal of the Openness Score and what it should promote. It should encourage better solutions, which means as tiny as possible a TCB with as big as possible an open-source part. My main problem is that we have two releases for PC Engines and the Openness Score says the UEFI one is better. I see no reason why; I would even say this is misleading, because the UEFI version has a much bigger footprint and potentially many more bugs, assuming the same distribution of bugs in code.

Well, you could compare how many closed-source bytes you have removed to achieve the same goal (i.e. booting) vs proprietary firmware. That would be rather liberation percentage, not openness percentage.

This is a very good idea.

Of course, that only makes sense if you have a proprietary firmware for given platform (which is not the case for apus).

Yes, but for apu, we have multiple firmware flavors between which we could compare.

That calls for some database which would serve such comparison tables, or a service to which one can upload the output of the Openness Score utility to feed it. The request is valid, but creating such a service/database is rather out of scope of this utility at first glance.

We already have such a "database," which is called docs.dasharo.com. All published open score docs could be parsed to get results and put in the table to compare.

@miczyg1
Contributor

miczyg1 commented Jul 1, 2024

OK, how do we measure/calculate TCB in the context of firmware images?

That is a good question, and it should be answered in code. Still, since we have no way to explore, scan, and report which paths are executed effectively and at scale, we have to assume that potentially all code inside the binary can be executed and belongs to the TCB. We can say we have already measured that, because it is the size of the code components in the firmware.

Please note that data files and bits are already excluded/separated from the code bits.

So code can be more open or less open?

More or less code can be open; this is what the openness score measures in relation to the whole image size.

The utility calculates the code size in both categories, open-source and closed-source. There is no more or less open code. It is either closed or open componentwise. So it is focused around the size, despite what you say: "architecture of information focuses on the openness of the code deployed, not on the size". The binary as a whole can be more or less open based on the percentage, but not the code.

How does that help in calculations? Either the source is open or closed, period. Anything other than that is just delusion.

Where are you going with this? Probably, I didn't express myself correctly. You wrote, "The diagrams have a visual representation of empty space and how much % it takes in the image." But how does this help? It doesn't help, because a device with 1MB and 99% open is different from a device with 100MB and 99% open. This is a question about the goal of the Openness Score and what it should promote. It should encourage better solutions, which means as tiny as possible a TCB with as big as possible an open-source part. My main problem is that we have two releases for PC Engines and the Openness Score says the UEFI one is better. I see no reason why; I would even say this is misleading, because the UEFI version has a much bigger footprint and potentially many more bugs, assuming the same distribution of bugs in code.

I meant: "how does division into more or less open code help?". To me there is no division into more or less open. Either open or closed. I wasn't referring to empty space. I'm open to suggestions on how we can emphasize empty space.

Yes, but for apu, we have multiple firmware flavors between which we could compare.

Results would not be that meaningful since both flavours are partially open-source. One would simply get a different empty space and open-source code ratios.

We already have such a "database," which is called docs.dasharo.com. All published open score docs could be parsed to get results and put in the table to compare.

Why would we write a parser for content that we generate from more structured data in a programming language? Parsing MD is a waste of resources. It would be better to feed the Openness Score utility with all those binaries and export the data to some CSV for further processing, rather than write a parser just to extract the data and gather it in one place... It's like going to Rome via China.

@tlaurion

tlaurion commented Jul 1, 2024

This "libération score" is exactly what the FSF freedom ladder would be looking for to promote FOSS firmware alternatives that they try to bring up in front of RYF. That would be really helpful.

@mkopec
Member

mkopec commented Jul 1, 2024

IMO comparing binary sizes to determine openness is always going to be unfavorable towards the open components:

  • Blobs are one-size-fits-all; coreboot is built specifically for one board and doesn't include unnecessary code
  • Open components are split into code and data, closed blobs have an unknown structure and we have to assume it's all closed. Intel ME surely has some unused space and data.
  • Compression, we don't know if / how blobs are compressed while coreboot is compressed as much as possible. Turn off CBFS compression and you get a higher openness score

@pietrushnic
Author

The utility calculates the code size in both categories, open-source and closed-source. There is no more or less open code. It is either closed or open componentwise.

I meant: "how does division into more or less open code help?". To me there is no division into more or less open. Either open or closed.

I admit I used the wrong words. You are right. There is either open or closed source code. There is a gray area where code is open for some and not open for others, e.g., FSP, but it is irrelevant for Openness Score. It was a mistake that should not distract us from achieving a conclusion about what should be improved.

So it is focused around the size, despite what you say: "architecture of information focuses on the openness of the code deployed, not on the size".

I'm afraid I have to disagree with that. When you go to the report, you see graphs which are expressed in %. On top of the report you have:

Open-source code percentage: X%
Closed-source code percentage: Y%

The architecture of information, as our culture perceives it, is that things are read from left to right and from top to bottom. It is expected that the information on top will be the most important. I am saying that the information architecture in the report focuses on %, a value relative to size. It is not the size itself, because you may have two identical graphs based on different sizes, and you would not know that unless you dive deeper into the report. Even if you do so, you still need help understanding what you compare to. What does 370290 mean? Is it a lot or not? What units is it in?

So, my point is that we must align information architecture in the report with what is essential. How to do that is a good question; I have already made some proposals.

I wasn't referring to empty space. I'm open to suggestions how we can emphasize empty space.

Great, that we agree on.

Results would not be that meaningful since both flavours are partially open-source. One would simply get a different empty space and open-source code ratios.

In numbers, yes, but depending on how you present that, code space has a big difference. If you show % of the flash occupied by open and closed sources side by side, it may be meaningful to some users. Let's not fall into the false consensus effect.

Why would we write a parser for content that we generate from more structured data in a programming language? Parsing MD is a waste of resources. It would be better to feed the Openness Score utility with all those binaries and export the data to some CSV for further processing, rather than write a parser just to extract the data and gather it in one place... It's like going to Rome via China.

It is excellent that you already know a better solution. It sounds much better than your previous comment:

That calls for some database which would serve such comparison tables, or a service to which one can upload the output of the Openness Score utility to feed it. The request is valid, but creating such a service/database is rather out of scope of this utility at first glance.

Initially, you describe the need for a database; if we consider CSV a database you had in mind, I have nothing against it. My point was that this feature needs no additional infrastructure, as you said, and is not that complex.

@pietrushnic
Author

This "libération score" is exactly what the FSF freedom ladder would be looking for to promote FOSS firmware alternatives that they try to bring up in front of RYF. That would be really helpful.

Are there any next steps we would like to take on that? I wonder if we should lobby anything regarding that. Please note there are competing solutions like LVFS HIS.

@pietrushnic
Author

IMO comparing binary sizes to determine openness is always going to be unfavorable towards the open components:

It is true, but why not turn that into an advantage? A smaller footprint means a better solution. The comparison is valid if we compress the open part and ME is also compressed. We cannot avoid errors while evaluating the impact of a compressed binary blob. Still, it is better to make that comparison and provide a note about that instead of not providing any information.

OTOH, your point is valid because our goal should not be to optimize for the amount of open-source code but to minimize the amount of closed-source code. Open-source code here is just a tool that helps to achieve that. This may lead to the further conclusion that cutting binary blobs and eliminating parts of them is good, although I doubt you can distribute modified binaries (maybe we could optimize FSP for size). In adjusting it, we should provide only tools and validate outcomes instead of distributing modified versions of binary blobs.

So, the conclusion leans toward liberation, but what in the case of already liberated releases? We should have measures of TCB size, which could be expressed as the amount of free space to some extent and, of course, should be presented in the comparison table. Are there better ideas?
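
Since suggestions were requested: the "TCB expressed as free space" idea could be prototyped as a weighted score, for example (a sketch with an assumed weighting, purely for discussion, not a metric the thread agreed on):

```python
# Purely illustrative: one way a footprint-aware Openness Score could look.
# The weighting below is an assumption for discussion, not an agreed metric:
# the open-source ratio is scaled by how little of the flash the code occupies,
# so a smaller TCB (more empty space) scores higher.
def footprint_aware_score(open_b: int, closed_b: int, image_b: int) -> float:
    code = open_b + closed_b
    openness = open_b / code        # share of code that is open-source
    slimness = 1 - code / image_b   # share of flash not occupied by code
    return 100 * openness * slimness

# Numbers from the SeaBIOS report in this issue (pre-fix ratio):
print(f"{footprint_aware_score(370290, 2453728, 8388608):.1f}")
```

With such a weighting, a build that achieves the same openness ratio in a smaller footprint would rank higher than a larger one.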

@miczyg1
Contributor

miczyg1 commented Jul 2, 2024

I'm afraid I have to disagree with that. When you go to the report, you see graphs which are expressed in %. On top of the report you have:

Open-source code percentage: X%
Closed-source code percentage: Y%

But that's still based on size of the open and closed code itself. Same metric but expressed in different aspect/view.

The architecture of information, as our culture perceives it, is that things are read from left to right and from top to bottom. It is expected that the information on top will be the most important. I am saying that the information architecture in the report focuses on %, a value relative to size. It is not the size itself, because you may have two identical graphs based on different sizes, and you would not know that unless you dive deeper into the report. Even if you do so, you still need help understanding what you compare to. What does 370290 mean? Is it a lot or not? What units is it in?

Everything is bytes. Is it a lot or not - it is a subjective question. The utility is not supposed to give feeling-based answers, but pure facts and mathematical statistics. Interpretation is left to the reader of the report. For some people some number may be high or low... Probably also depends on threat model.

Results would not be that meaningful since both flavours are partially open-source. One would simply get a different empty space and open-source code ratios.

In numbers, yes, but depending on how you present that, code space has a big difference. If you show % of the flash occupied by open and closed sources side by side, it may be meaningful to some users. Let's not fall into the false consensus effect.

I would also say to watch out and not expose too much information, to keep the reports readable and containing only important information. Wherever possible, I would like to stick to strict calculations. "code space has a big difference": of course it has, even right now. The size of the code is the base for all numbers shown in the report. I think I did my job to extract and summarize the sizes, so to me the report contains everything I would like to know about the binary. That said, it is difficult for me to understand these "new needs" in the data representation. I guess PRs are welcome to change the report format to your liking.

Initially, you describe the need for a database; if we consider CSV a database you had in mind, I have nothing against it. My point was that this feature needs no additional infrastructure, as you said, and is not that complex.

CSV would be trivial from the utility perspective. CSV may be later imported by something more complex and easily converted to a format/database/structure that is more suitable for querying/searching/summarizing/comparing etc. At least that's what I had in mind when proposing a CSV output.
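
A minimal sketch of what such a CSV export could look like (the column names and the single example row are assumptions based on the numbers in this report, not the utility's actual output format):

```python
import csv
import io

# Sketch of the proposed CSV export. Column names and the example row are
# assumptions based on the numbers in this report, not the utility's
# actual output format.
rows = [
    ("pcengines_apu2_seabios_v24.05.00.01.rom", 370290, 2453728, 684738, 4879852),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["image", "open_bytes", "closed_bytes",
                 "data_bytes", "empty_bytes", "open_pct"])
for image, open_b, closed_b, data_b, empty_b in rows:
    # Openness percentage derived the same way as in the report: open/(open+closed).
    writer.writerow([image, open_b, closed_b, data_b, empty_b,
                     round(100 * open_b / (open_b + closed_b), 1)])

print(buf.getvalue(), end="")
```

A CSV like this could then be imported into a spreadsheet or a small database to build the comparison table discussed above.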

@miczyg1
Contributor

miczyg1 commented Jul 2, 2024

IMO comparing binary sizes to determine openness is always going to be unfavorable towards the open components:

That's correct. The blobs will only increase in size, while minimalistic implementations such as coreboot are not growing in size significantly.

  • Blobs are one-size-fits-all; coreboot is built specifically for one board and doesn't include unnecessary code

I could argue a little bit with that.

  1. There is always some unnecessary code, because there are things you detect at runtime across different CPUs/memories/peripherals... Sometimes you have to include drivers you don't need for one configuration, but maybe for another, to be flexible.
  2. Also there are some code paths that are not taken depending on board variants, e.g. DDR4 vs DDR5 training with FSP, or a bunch of IFs in coreboot code checking for some feature capability...

I would say coreboot doesn't include code that you would never use on a given platform, but includes code that you can potentially use.

  • Open components are split into code and data, closed blobs have an unknown structure and we have to assume it's all closed. Intel ME surely has some unused space and data.

Yes it does, but detecting it is not trivial. I would like to avoid parsing ME IFWI header if possible... For UEFI I already extract this from the report generated by UEFIExtract/UEFITool, because UEFITool parses the ME FW structures.

  • Compression, we don't know if / how blobs are compressed while coreboot is compressed as much as possible. Turn off CBFS compression and you get a higher openness score

Accounting for compression would cause irregular results when calculating the %. Displaying more bytes of code than the binary size would also be confusing. Also, determining blob compression methods sounds unfeasible, given that each blob may have a custom/different compression algorithm. The utility also checks whether all files have been classified and whether the sum of classified files/regions adds up to the total binary size. For these reasons I have kept the original, compressed sizes.

@pietrushnic
Author

But that's still based on size of the open and closed code itself. Same metric but expressed in different aspect/view.

I am curious how it matters to the user. Looking at those numbers, users cannot learn how extensive the considered code base was (1% of 10MB vs. 1% of 100MB: you still see 1%, but the conclusion is entirely different). We have to change that to something that would better express the value of open-source firmware.

Everything is bytes.

It should be mentioned in the report wherever we provide a number.

Is it a lot or not - it is a subjective question. The utility is not supposed to give feeling-based answers, but pure facts and mathematical statistics. Interpretation is left to the reader of the report. For some people some number may be high or low... Probably also depends on threat model.

Everything depends on the threat model, but we are treated as advisors here (the same thing is true for fwupd HSI and FSF RYF). Pure numbers only help if one knows how to interpret them. Our job is to assist in interpretation; we have enough experience, data, and knowledge to do that.

I would also say to watch out and not expose too much information, to keep the reports readable and containing only important information. Wherever possible, I would like to stick to strict calculations. "code space has a big difference": of course it has, even right now. The size of the code is the base for all numbers shown in the report. I think I did my job to extract and summarize the sizes, so to me the report contains everything I would like to know about the binary. That said, it is difficult for me to understand these "new needs" in the data representation. I guess PRs are welcome to change the report format to your liking.

I will work on an improved version of this report when the time comes.

CSV may be later imported by something more complex and easily converted to a format/database/structure that is more suitable for querying/searching/summarizing/comparing etc.

I don't think there's a need for that.

@tlaurion

I think the value shows best when comparing against the stock BIOS for each firmware on each platform.

Historical comparison shows, e.g.:

  • d16:
    • Stock : 100% closed source
    • Dasharo : 100% open (minus microcode if provided)

OptiPlex. Talos.

There would be interesting historical analysis to be done with raw data of all those stock vs open firmware releases, with changes again happening for old boards like Haswell with NRI, once merged, replacing the MRC blob with native RAM init.

My point here is that the scores alone, without comparing to stock, show empty space where there would otherwise be blobs. Likewise, platforms running LinuxBoot would be interesting to compare versus their stock UEFI alternatives prior to stripping, with DXE replaced by LinuxBoot.
