--bootstrap_impl=script breaks pkg_tar, bazel-lib tar and py_binary with py_package + py_wheel #2489
Comments
I think this is somewhat WAI. A plain py_binary can't really be given as something for py_package to process, and the plain py_binary isn't meant to be redistributable. The first-order problem is that zip doesn't support symlinks. There are some extensions that allow it to store them, though. However, that might be tricky because most of the files it is given are going to be symlinks to something else, so we'd need some way to say "dereference these symlinks, but not these symlinks". Maybe by reading the symlink and seeing if it's non-absolute? IDK. If you really want to package the whole binary, then you're probably better off packaging the zip file version of the binary. That has special code in its startup to handle the case of coming from a zip file that couldn't store symlinks. Why do you want to pass a py_binary to py_package?
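A minimal Python sketch of the "is it non-absolute?" heuristic floated above, assuming that absolute symlinks are the bazel-bin forest's convenience links (whose content should be copied) and relative ones are intentional (to be stored as links). The names `should_dereference` and `classify` are hypothetical; this is not how py_package or rules_pkg actually decide.

```python
from __future__ import annotations

import os


def should_dereference(path: str) -> bool:
    """Return True if `path` is a symlink whose target is absolute.

    Hypothetical heuristic: absolute links are assumed to be Bazel's
    symlink-forest "convenience" links, whose content should be copied.
    Relative links are assumed to be intentional (e.g. a venv's bin/python)
    and should be stored as symlinks.
    """
    if not os.path.islink(path):
        return False  # Regular files and dirs have nothing to dereference.
    return os.path.isabs(os.readlink(path))


def classify(root: str) -> tuple[list[str], list[str]]:
    """Split every symlink under `root` into (dereference, keep_as_symlink)."""
    deref = []
    keep = []
    # os.walk does not descend into symlinked dirs by default, but lists them.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                (deref if should_dereference(full) else keep).append(full)
    return sorted(deref), sorted(keep)
```

The heuristic only guesses from link text, so it misclassifies any rule that deliberately declares an absolute symlink.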
Is this also affecting the
Yeah, it's the same problem. The problem, as I see it, is that the venv is created both in the runfiles directory and in the directory containing the runfiles directory (is there a name for this?). Only the former is required; the latter isn't functional (broken symlink) and never gets invoked anyway, since the stage-1 bootstrap runs the former. I didn't see an easy way to fix this. Usually I would've used runfiles symlinks if I wanted to create a symlink only inside the runfiles directory, but they don't let you create arbitrary relative symlinks. Personally I see this as a deficiency of the related
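To illustrate why the copy outside the runfiles directory is a broken symlink, here is a small Python sketch under an assumed, simplified layout (`app.runfiles/_main/venv/bin/python` and `python_x/bin/python3` are hypothetical names, not the exact paths rules_python uses). A relative target computed for the copy inside runfiles resolves there, but the same link text placed next to the runfiles directory points at nothing.

```python
from __future__ import annotations

import os


def make_link(link_path: str, target: str) -> None:
    """Create parent dirs, then a symlink at link_path with the given text."""
    os.makedirs(os.path.dirname(link_path), exist_ok=True)
    os.symlink(target, link_path)


def demo(root: str) -> dict[str, bool]:
    # Hypothetical layout: the interpreter only exists inside runfiles.
    interp = os.path.join(root, "app.runfiles", "python_x", "bin", "python3")
    os.makedirs(os.path.dirname(interp))
    open(interp, "w").close()

    # Relative target computed for the copy that lives inside runfiles.
    in_runfiles = os.path.join(root, "app.runfiles", "_main", "venv", "bin", "python")
    target = os.path.relpath(interp, os.path.dirname(in_runfiles))

    # The same link text placed in the directory containing the runfiles.
    beside_runfiles = os.path.join(root, "venv", "bin", "python")

    make_link(in_runfiles, target)
    make_link(beside_runfiles, target)
    # os.path.exists follows the link, so a dangling link reports False.
    return {
        "in_runfiles": os.path.exists(in_runfiles),
        "beside_runfiles": os.path.exists(beside_runfiles),
    }
```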
I don't know; it is affecting pkg_tar. It should be related to bazelbuild/rules_pkg#115.
This is a very old use case for us, where we build wheels for an Apache Beam application for Dataflow jobs.
I personally think the Do I understand correctly that the As for That said, I think the current
@aignas just tested with
I think
Is there any downside to containerizing the zip file, then? Or can we maintain a function similar to
I think it would help if you explained your use case a bit more thoroughly. This sounds like a deficiency in the workflow or the way that Apache Beam is consuming runtime code for execution. Building a "fat wheel" (like a "fat jar" in Java) isn't the remit of the py_package and py_wheel rules. The closest thing to a "fat jar" is the Python zip support via output groups. But it's unclear to me whether this is supported by the target runtime (Dataflow?) that you're using.
Sorry @groodt, I was referring to building Docker images, not wheels, since --bootstrap_impl=script is broken for both pkg_tar and We have migrated most of our Dataflow jobs to Docker images; just a couple still use wheels, so that is less of a problem.
Probably? Ultimately, I'd like to remove all the zip stuff from py_binary itself:
Presumably, if one can create e.g. a zip from a py_binary, then using a different format, e.g. tar, would be fairly simple. To clarify, though: the output is a zipapp-based thing. That is not quite the same as just putting all of a py_binary into a tar file (or equivalent). The former means deriving a slightly different "runnable thing" from the py_binary. The latter means simply putting all the py_binary files into a tar file (essentially If you want the latter, then you're probably better off using e.g. rules_pkg, since it has various facilities to tar up an arbitrary binary and its runfiles.
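For comparison, a sketch using Python's stdlib `zipapp` and `tarfile` modules, not rules_python's own zip machinery: the first derives a single runnable archive, the second just bundles the files, which must be extracted before anything can run. `build_zipapp` and `build_tar` are illustrative names.

```python
import os
import tarfile
import zipapp


def build_zipapp(src_dir: str, out_path: str) -> str:
    """Derive a single runnable archive; src_dir must contain __main__.py."""
    zipapp.create_archive(src_dir, target=out_path)
    return out_path


def build_tar(src_dir: str, out_path: str) -> str:
    """Just bundle the files; they must be extracted before they can run."""
    with tarfile.open(out_path, "w:gz") as tar:
        # Stores the tree under its directory name, e.g. app/__main__.py.
        tar.add(src_dir, arcname=os.path.basename(src_dir))
    return out_path
```

Running `python app.pyz` executes `__main__.py` straight from the archive, while the tarball is only a container for the files.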
I came to this line of thinking because Unfortunately I cannot use
I think that's just incidental. The zip file doesn't contain the venv
Ah, hm, yeah. Because the input is the bazel-bin symlink forest, you can't distinguish a "real" symlink from a "convenience" symlink? Actually, maybe File.is_symlink would allow solving that. Basically, it can tell us which files are supposed to be symlinks. Then it's just a matter of telling the tool not to dereference paths where File.is_symlink is true. Looking at the tar CLI, this looks rather tedious, but possible. I think what you'd have to do is one invocation with --dereference (passing it all files for which is_symlink=False), then a second invocation with --no-dereference (passing it all files for which is_symlink=True).
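The same split can be sketched in a single pass with Python's `tarfile` module instead of two `tar` invocations. Each entry carries an `is_symlink` flag standing in for `File.is_symlink`; how a packaging rule would obtain those flags is assumed, and `write_tar` is a hypothetical helper, not a rules_pkg API.

```python
from __future__ import annotations

import os
import tarfile


def write_tar(out_path: str, entries: list[tuple[str, str, bool]]) -> None:
    """Write (src_path, arcname, is_symlink) entries into a tar archive.

    is_symlink=True: the file was declared as a symlink, so its link text is
    stored verbatim (like --no-dereference).
    is_symlink=False: store the content the path ultimately points at
    (like --dereference). Only regular files are handled in this sketch.
    """
    with tarfile.open(out_path, "w") as tar:
        for src, arcname, is_symlink in entries:
            if is_symlink:
                info = tarfile.TarInfo(arcname)
                info.type = tarfile.SYMTYPE
                info.linkname = os.readlink(src)  # Works even if dangling.
                tar.addfile(info)
            else:
                real = os.path.realpath(src)  # Follow the whole link chain.
                info = tar.gettarinfo(real, arcname=arcname)
                with open(real, "rb") as f:
                    tar.addfile(info, f)
```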
Just a thought I had just now and I am not sure if the responsibility of what I am gonna describe below is really within the scope of In order for the
When packaging, we almost always want option 2, because we are probably creating an archive anyway, so the extra operations should not matter. So the only way Not sure I really like my suggestion of creating an extra configuration and using transitions to work around the symlink issue, but I will leave it here in case it sparks some better ideas.
First, I think this should be handled at the level of the tools, not Having separate configurations will not make much of a difference because, in most cases, we have tests that involve testing the packages, and even container_structure tests will require packaging, so option 2 would have to be used at all times.
I'm also running into this. I created https://github.com/philsc/rules_python-tar-failure before I found this issue. Our use case is putting py_binary targets into Docker containers.
This issue came up in Slack with a couple more people also affected, so I think I'll prioritize figuring out some sort of workaround. I spent the morning looking at rules_pkg to see if it could be made happy. It looks fairly easy to make it support raw symlinks using File.is_symlink. The catch is that File.is_symlink is only available on Bazel 8+. To do this:
This appeared to work in my prototyping. Next, can we use existing rules_pkg functionality to work around this? Ehhh.... I wasn't able to figure out how to convince rules_pkg to accept these raw symlinks, though. It allows creating symlinks at arbitrary locations, so you can manually define a e.g.
So yeah, that took up my allotted time this morning. A couple remaining ideas are:
But really, it starts to look easier to fix rules_pkg. On the rules_python side, I think the two main options we have are:
Neither of these is particularly appealing. I'm not sure which is the lesser evil. Both seem like headaches to implement and to deal with edge cases.
I have been working on this PR to fix this for
I created #2586 re: my desire to split the py_binary-builtin zip file creation out. Such a refactor would probably also help this situation. I spent some time over the weekend working on a solution that creates the venv at runtime in /tmp, enabled by setting a command line flag. It seems to work, and will probably work well enough until packaging rules catch up. Though I think I'll add an envvar to control where the venv gets put (/tmp isn't friendly to a long-running process). That File.is_symlink is Bazel 8-only is annoying. Maybe have py_binary put a list of its declare_symlink()'d files into a provider somewhere? This would give consuming packaging rules a way to identify such files in Bazel 7.
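A rough Python sketch of the create-venv-at-runtime idea, assuming the location override is an environment variable named `EXAMPLE_VENV_ROOT` (a hypothetical name; the comment does not say what rules_python will call it). Because the symlink is created on the target machine at startup, it never has to survive an archiver.

```python
import os
import tempfile

VENV_ROOT_ENV = "EXAMPLE_VENV_ROOT"  # Hypothetical name, not a rules_python setting.


def create_runtime_venv(interpreter, env=None):
    """Create a minimal venv-style directory at startup, not at build time.

    The venv goes under $EXAMPLE_VENV_ROOT if set, else the default temp dir.
    Only pyvenv.cfg and bin/python are written; site-packages wiring is
    omitted from this sketch.
    """
    env = os.environ if env is None else env
    parent = env.get(VENV_ROOT_ENV) or None  # None -> tempfile.gettempdir()
    venv_dir = tempfile.mkdtemp(prefix="venv-", dir=parent)
    interp = os.path.abspath(interpreter)
    with open(os.path.join(venv_dir, "pyvenv.cfg"), "w") as f:
        # "home" names the directory containing the base interpreter.
        f.write(f"home = {os.path.dirname(interp)}\n")
    bin_dir = os.path.join(venv_dir, "bin")
    os.makedirs(bin_dir)
    # Absolute target is fine: this link is never packaged.
    os.symlink(interp, os.path.join(bin_dir, "python"))
    return venv_dir
```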
+1 for putting things into a provider.
🐞 bug report
Affected Rule
py_package + py_wheel
Is this a regression?
Yes, related commit #2409
Description
Using the resulting py_binary with py_package+py_wheel fails to find the interpreter symlink. This also breaks pkg_tar from rules_pkg.
🔬 Minimal Reproduction
🔥 Exception or Error