Guidance on supporting static libraries #40

Open
xnox opened this issue Aug 17, 2024 · 5 comments

@xnox
Member

xnox commented Aug 17, 2024

Many language ecosystems have good support for documenting and tracing static library linking.
For example, in Go, running go version -m /path/to/dir/or/binary prints all the Go modules that were statically linked.
Similarly, when Rust binaries are built with cargo-auditable, they can report all the crates they were built with.

I am only just starting to explore package notes, and they work really well for ELF binaries, shared libraries, and the dlopen stanzas.

I really, really want this to work for statically linked libraries too, and for it to work with LTO.

A static library is just an archive of object files, so one cannot really attach a package note to it directly. One could append an object file to the library that contains only the package note, in the hope that the note still propagates into a statically linked binary. As an alternative to a single object with a package note, maybe we should attach the package note to every .o object and then deduplicate that information at link time? Or something else?
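
If every .o carried a copy of its package's note, a statically linked binary would end up with one copy per object file pulled in, so something at link time would have to collapse them. A minimal Python sketch of that deduplication step, assuming each note is a single JSON object and that two notes describe the same package when their canonical JSON forms are equal; the libfoo records are made up, and this illustrates the idea rather than the behaviour of any particular linker.

```python
import json


def merge_object_notes(notes: list[str]) -> list[dict]:
    """Collapse per-object package notes into unique records, keeping first-seen order."""
    seen: set[str] = set()
    merged: list[dict] = []
    for raw in notes:
        record = json.loads(raw)
        # Canonical form (sorted keys, no whitespace), so that two objects whose
        # notes differ only in formatting or key order count as duplicates.
        key = json.dumps(record, sort_keys=True, separators=(",", ":"))
        if key not in seen:
            seen.add(key)
            merged.append(record)
    return merged


# Notes collected from three object files: the app's own object, plus two
# objects pulled from a hypothetical libfoo.a that carry the same note.
notes = [
    '{"type":"rpm","name":"my-static-app","version":"10","architecture":"x86_64"}',
    '{"type":"rpm","name":"libfoo","version":"1.0-1","architecture":"x86_64"}',
    '{ "name": "libfoo", "type": "rpm", "version": "1.0-1", "architecture": "x86_64" }',
]
merged = merge_object_notes(notes)
print([r["name"] for r in merged])  # ['my-static-app', 'libfoo']
```

Keying on the canonical form rather than the raw string matters because different build tools may serialise the same record differently.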

More fun stuff: could the static .a library be replaced by a linker script that specifies the package-note flags and the underlying static library? :-)

I am also not sure about the package-note spec: for example, whether we need to extend the schema, allow multiple sections, or simply rely on a JSON streaming convention where several objects follow one after another.

That matters because I really want to expose information about statically linked libraries/packages in binaries.
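
Strictly, a JSON document holds exactly one top-level value, so several records need one of the streaming conventions. As a sketch of how a consumer could accept either the current single-object payload or a stream of records, here is a Python reader built on the standard library's json.JSONDecoder.raw_decode; it handles concatenated objects as well as one object per line.

```python
import json

JSON_WHITESPACE = " \t\r\n"


def parse_json_stream(text: str) -> list:
    """Parse zero or more JSON values written one after another."""
    decoder = json.JSONDecoder()
    records = []
    pos = 0
    while True:
        # raw_decode() does not skip leading whitespace, so do it here;
        # this also covers the newline between JSON Lines records.
        while pos < len(text) and text[pos] in JSON_WHITESPACE:
            pos += 1
        if pos == len(text):
            return records
        value, pos = decoder.raw_decode(text, pos)
        records.append(value)


# A single object (today's format) and several records both parse.
print(parse_json_stream('{"name":"openssl"}'))
print(parse_json_stream('{"name":"a"}{"name":"b"}\n{"name":"c"}'))
```

Because a single object is also a valid one-record stream, existing notes would keep parsing unchanged.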

@xnox
Member Author

xnox commented Aug 17, 2024

I am also not sure whether a linker script can be used instead of a static library, as I have only ever seen that done for .so files, most notably /usr/lib64/libncurses.so.

@xnox
Member Author

xnox commented Aug 17, 2024

Hm, maybe it could be an ld plugin (https://sourceware.org/binutils/docs/ld.html#Plugins) that writes out the package-note data as a text file inside the ar archive and later uses it during linking.
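
An ld plugin would be written in C against the linker's plugin interface; the Python sketch below only illustrates the storage half of the idea: carrying the note as a plain-text member of the archive next to the object files, and reading it back. The member name package.note is hypothetical, only short (GNU-style) member names are handled, and whether every tool that consumes .a files tolerates such a non-object member is an assumption to verify.

```python
AR_MAGIC = b"!<arch>\n"


def ar_member(name: str, data: bytes) -> bytes:
    """Encode one member using the common ar header layout (GNU-style short name)."""
    if len(name) > 15:
        raise ValueError("long member names need a name table; not handled here")
    # Fixed 60-byte header: name(16) mtime(12) uid(6) gid(6) mode(8) size(10) magic(2).
    header = (
        f"{name + '/':<16}{0:<12}{0:<6}{0:<6}{'100644':<8}{len(data):<10}"
    ).encode("ascii") + b"`\n"
    assert len(header) == 60
    # Member data is padded to an even length with a newline.
    return header + data + (b"\n" if len(data) % 2 else b"")


def ar_read(archive: bytes) -> dict[str, bytes]:
    """Return {member name: data} for a simple archive built by ar_member()."""
    if not archive.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = {}
    pos = len(AR_MAGIC)
    while pos < len(archive):
        header = archive[pos:pos + 60]
        if header[58:60] != b"`\n":
            raise ValueError("corrupt member header")
        name = header[:16].decode("ascii").rstrip(" ").rstrip("/")
        size = int(header[48:58].decode("ascii"))
        start = pos + 60
        members[name] = archive[start:start + size]
        pos = start + size + size % 2
    return members


note = b'{"type":"rpm","name":"openssl","version":"3.2.2-11.fc41","architecture":"x86_64"}\n'
archive = AR_MAGIC + ar_member("foo.o", b"placeholder, not a real object") + ar_member("package.note", note)
print(sorted(ar_read(archive)))  # ['foo.o', 'package.note']
```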

@smcv
Contributor

smcv commented Feb 7, 2025

> One can append an object into the library that just have the package note. And I hope that will be still propagate to a statically linked binary.

I think this would not work the way you are hoping, because the linker usually only copies object files from a static library into a binary if the binary needs them. For example if libfoo.a contains foo.o (implementing foo()) and bar.o (implementing bar()), and you link libfoo.a into a static binary that calls bar(), the linker will copy bar.o into the static binary, but it will discard foo.o as unused.
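
The selection rule described above can be modelled in a few lines of Python. This is a deliberately simplified sketch (real linkers consult the archive's symbol index and also deal with weak and common symbols, options such as --whole-archive, and so on); it only shows why a member that defines no needed symbol, like a hypothetical note-only note.o, never reaches the output.

```python
def select_members(
    archive: dict[str, tuple[set[str], set[str]]], undefined: set[str]
) -> list[str]:
    """Return the archive members a linker would copy into the output.

    archive maps member name -> (symbols it defines, symbols it references).
    A member is pulled in only if it defines a symbol that is still undefined;
    its own references may then pull in further members, so keep scanning
    until a full pass adds nothing.
    """
    undefined = set(undefined)
    defined: set[str] = set()
    selected: list[str] = []
    changed = True
    while changed:
        changed = False
        for name, (defines, references) in archive.items():
            if name in selected or not (defines & undefined):
                continue
            selected.append(name)
            defined |= defines
            undefined = (undefined | references) - defined
            changed = True
    return selected


libfoo_a = {
    "foo.o": ({"foo"}, set()),
    "bar.o": ({"bar"}, set()),
    "note.o": (set(), set()),  # carries only the package note; defines nothing
}
print(select_members(libfoo_a, {"bar"}))  # ['bar.o']
```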

So I think package notes for static library linking would have to use one of the more complicated approaches you've described.

@bluca
Member

bluca commented Feb 7, 2025

We talked about it on Monday, and the conclusion was to add a JSON sub-object inside the existing package-notes JSON object, that lists all the static dependencies used by the binary. This would be filled in at package build time.
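
To make the proposal concrete, here is roughly what such a nested note could look like, built in Python. The key name staticDependencies is an assumption (no name is given above); the top-level fields stay where existing consumers expect them, so readers that ignore unknown keys would keep working.

```python
import json


def with_static_dependencies(note: dict, deps: list[dict]) -> dict:
    """Return a copy of a package note with a nested list of static dependencies."""
    # "staticDependencies" is a hypothetical key; no name was agreed above.
    return {**note, "staticDependencies": [dict(d) for d in deps]}


note = {"type": "rpm", "name": "my-static-app", "version": "10", "architecture": "x86_64"}
deps = [{"type": "rpm", "name": "openssl", "version": "3.2.2-11.fc41", "architecture": "x86_64"}]
nested = with_static_dependencies(note, deps)
print(json.dumps(nested, indent=2))
```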

@xnox
Member Author

xnox commented Feb 7, 2025

> We talked about it on Monday, and the conclusion was to add a JSON sub-object inside the existing package-notes JSON object, that lists all the static dependencies used by the binary. This would be filled in at package build time.

Yes, but I dislike, and object to, sub-objects, as they are difficult to parse and construct. I would much prefer something like JSON Lines (https://jsonlines.org/) or any of the other formats described in https://en.wikipedia.org/wiki/JSON_streaming, where the format is extended to support multiple records in a backwards-compatible way, and the linker is modified to aggregate/consume multiple records. Also, when people build their own binaries, they mix and match packages from different vendors (primary archive + PPAs + SIG archives + OBS projects, etc.), and I would not want to teach every tool how to construct a nested object/structure to accumulate all of these dependencies.

And yes, the points from @smcv above and from @bluca in person stand: it is likely impractical to encode this in the .o or the .a itself, given how freeform the construction of those files is and how selectively they are used.

I also dislike making it complicated, so I would prefer a deliberate, explicit mechanism to encode this information, consume it, and propagate it.

Update pkg-config files to declare a package-metadata flag in Libs.private (and rinse and repeat for CMake modules, Bazel modules, etc.):

E.g.:

```
# grep Libs /usr/lib64/pkgconfig/libcrypto.pc
Libs: -L${libdir} -lcrypto
Libs.private: -lz -ldl -pthread --package-metadata='{"type":"rpm","name":"openssl","version":"3.2.2-11.fc41","architecture":"x86_64"}'
```

(It could also be a -specs file location instead.)
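
--package-metadata is the flag proposed here, not an existing linker or compiler option. As a sketch of the consuming side, a build tool could pull the records out of a Libs.private value with Python's shlex and json modules. This deliberately ignores how pkg-config itself re-quotes the flags when printing them, which would need checking.

```python
import json
import shlex

PREFIX = "--package-metadata="


def package_metadata_from_libs(libs_private: str) -> list[dict]:
    """Extract the proposed --package-metadata='{...}' records from a Libs.private value."""
    records = []
    # shlex.split() applies POSIX shell quoting, so the single-quoted JSON
    # (with its inner double quotes) comes back as part of one token.
    for token in shlex.split(libs_private):
        if token.startswith(PREFIX):
            records.append(json.loads(token[len(PREFIX):]))
    return records


line = """Libs.private: -lz -ldl -pthread --package-metadata='{"type":"rpm","name":"openssl","version":"3.2.2-11.fc41","architecture":"x86_64"}'"""
libs_private = line.split(":", 1)[1]  # drop the "Libs.private:" key
print(package_metadata_from_libs(libs_private))
```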

Then, when building a binary and choosing to link it statically against such a library, multiple --package-metadata flags would be passed to the link command:

```
--package-metadata='{"type":"rpm","name":"my-static-app","version":"10","architecture":"x86_64"}' --package-metadata='{"type":"rpm","name":"openssl","version":"3.2.2-11.fc41","architecture":"x86_64"}'
```

The expectation is that the resulting ELF note aggregates these into JSON Lines format (or another JSON streaming format) by joining all such arguments with "\n" (alternatives exist too, e.g. using a record-separator character as the delimiter).
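
The joining step could look like the following Python sketch, assuming each --package-metadata value must be a single JSON object. Re-serialising each record compactly is a deliberate design choice: a value written with embedded newlines would otherwise be split across several "lines" and break the JSON Lines framing.

```python
import json


def aggregate_package_metadata(args: list[str]) -> str:
    """Join several --package-metadata values into one JSON Lines payload."""
    lines = []
    for arg in args:
        record = json.loads(arg)
        if not isinstance(record, dict):
            raise ValueError(f"expected a JSON object, got: {arg!r}")
        # Compact, single-line form keeps exactly one record per line.
        lines.append(json.dumps(record, separators=(",", ":")))
    return "\n".join(lines)


payload = aggregate_package_metadata([
    '{"type":"rpm","name":"my-static-app","version":"10","architecture":"x86_64"}',
    '{\n  "type": "rpm",\n  "name": "openssl"\n}',  # pretty-printed input
])
for line in payload.split("\n"):
    print(json.loads(line)["name"])
```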

```
readelf --notes my-static-app
Displaying notes found in: .note.package
  Owner                Data size 	Description
  FDO                  0x0000007c	FDO_PACKAGING_METADATA
    Packaging Metadata: {"type":"rpm","name":"my-static-app","version":"10","architecture":"x86_64"}
                        {"type":"rpm","name":"openssl","version":"3.2.2-11.fc41","architecture":"x86_64"}
```
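
Only the descriptor string would change under this proposal; the note framing stays as it is. A Python sketch that packs and unpacks such a note, assuming the layout used by the package-notes linker script (owner "FDO", note type 0xcafe1a7e, NUL-terminated descriptor, 4-byte alignment) and little-endian byte order. It produces a standalone note blob, not a .note.package section inside a real ELF file.

```python
import struct

# Note type from the package-notes specification (FDO_PACKAGING_METADATA).
NT_FDO_PACKAGING_METADATA = 0xCAFE1A7E


def pad4(data: bytes) -> bytes:
    """Pad with NUL bytes to a multiple of 4, as the note fields are aligned."""
    return data + b"\0" * (-len(data) % 4)


def build_package_note(payload: str, endian: str = "<") -> bytes:
    """Build one ELF note: namesz, descsz, type, then padded name and descriptor."""
    name = b"FDO\0"
    desc = payload.encode("utf-8") + b"\0"  # descriptor is NUL-terminated
    header = struct.pack(endian + "III", len(name), len(desc), NT_FDO_PACKAGING_METADATA)
    return header + pad4(name) + pad4(desc)


def read_package_note(note: bytes, endian: str = "<") -> str:
    """Return the descriptor text of a note built by build_package_note()."""
    namesz, descsz, ntype = struct.unpack_from(endian + "III", note)
    if ntype != NT_FDO_PACKAGING_METADATA:
        raise ValueError(f"unexpected note type {ntype:#x}")
    start = 12 + namesz + (-namesz % 4)  # skip header and padded owner name
    return note[start:start + descsz].rstrip(b"\0").decode("utf-8")


payload = "\n".join([
    '{"type":"rpm","name":"my-static-app","version":"10","architecture":"x86_64"}',
    '{"type":"rpm","name":"openssl","version":"3.2.2-11.fc41","architecture":"x86_64"}',
])
note = build_package_note(payload)
print(read_package_note(note) == payload)  # True
```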

If a distribution has special knowledge of statically linked or generated components (e.g. Debian's Built-Using field), it can generate such --package-metadata flags (or write them out as a specs file) and immediately start populating accurate linking information.
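
Built-Using lists source packages with exact versions, in the form name (= version). A hedged Python sketch of turning such a value into the proposed flags: the package names and versions below are made-up examples, "type":"deb" is assumed to be the value used for Debian packages in the package-notes spec, and using the binary's own architecture for every entry is an assumption, since Built-Using records source packages, which have no single architecture.

```python
import json
import re

# One Built-Using entry: "<source package> (= <version>)".
ENTRY = re.compile(r"^\s*([a-z0-9][a-z0-9+.-]+)\s*\(\s*=\s*([^\s)]+)\s*\)\s*$")


def built_using_to_flags(built_using: str, architecture: str) -> list[str]:
    """Map a Built-Using value to proposed --package-metadata arguments."""
    flags = []
    for entry in built_using.split(","):
        match = ENTRY.match(entry)
        if not match:
            raise ValueError(f"unexpected Built-Using entry: {entry!r}")
        name, version = match.groups()
        record = {"type": "deb", "name": name, "version": version, "architecture": architecture}
        # Each flag is meant to be one argv element (e.g. in a subprocess
        # argument list), so no shell quoting is added around the JSON.
        flags.append("--package-metadata=" + json.dumps(record, separators=(",", ":")))
    return flags


for flag in built_using_to_flags("libfoo (= 1.0-1), libbar (= 2.3-4)", "amd64"):
    print(flag)
```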

The primary goal is to light up security scanners for distributed binaries (distribution-vendor static binaries, but also self-built static binaries on a given distribution) that statically link or otherwise vendor in their dependencies. Hence I envision that things like Boost header-only libraries could potentially be declared as well. Or pure data: a tonne of binaries compile JavaScript apps and vendor them in as data inside the binary, without giving software composition analysis (SCA) any visibility into what was vendored in (this seems to be particularly popular with Go ELF binaries).

If the relative importance of the records matters, we could extend each record with an extra field such as "composition": "main|static-lib|data|header-only|..." to hint at what the component is and how it got into the binary, if known. That is hard to tell, though, as a statically included library may not know how or why it was included. Alternatively, the records could be ordered by some convention.
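
A small Python sketch of the "order them by some convention" option, using the proposed composition field as the sort key. Both the field and the ranking are hypothetical (the ranking simply follows the order the values are listed in above), as are the bundled-js-app and unknown-origin records; because sorted() is stable, records with the same or no composition keep the order in which they were aggregated.

```python
# Hypothetical ranking for the proposed "composition" field; records without
# a known value sort last.
COMPOSITION_ORDER = ["main", "static-lib", "data", "header-only"]


def order_records(records: list[dict]) -> list[dict]:
    """Sort records by composition; sorted() is stable, so ties keep their order."""
    def rank(record: dict) -> int:
        value = record.get("composition")
        if value in COMPOSITION_ORDER:
            return COMPOSITION_ORDER.index(value)
        return len(COMPOSITION_ORDER)

    return sorted(records, key=rank)


records = [
    {"name": "bundled-js-app", "composition": "data"},
    {"name": "unknown-origin"},
    {"name": "my-static-app", "composition": "main"},
    {"name": "openssl", "composition": "static-lib"},
]
print([r["name"] for r in order_records(records)])
# ['my-static-app', 'openssl', 'bundled-js-app', 'unknown-origin']
```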
