Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Rules #16

Open
apt-itude opened this issue Jun 11, 2019 · 13 comments
Open

New Rules #16

apt-itude opened this issue Jun 11, 2019 · 13 comments
Assignees
Milestone

Comments

@apt-itude
Copy link
Owner

The current pip rules have a few major flaws that really warrant a rewrite. This repo doesn't have a whole lot of traction, but I plan to make some very breaking changes, and I don't want to rip the rug out from under anyone who might actually be using these rules without an explanation of the overhaul.

@tmc @Globegitter @kerinin @chaoran: If any of you are using the current rules, I would love feedback on whether the proposed changes would make your lives easier or harder and whether it would be difficult to migrate. Also open to ideas, although I have a POC of what I'm proposing that's about 85% done available via the repository-rule-per-distribution branch.

Current Problems

All Python distributions are obtained via a single repository rule (pip_repository) which basically just executes pip wheel

This causes the following problems:
1. This forces all dependencies to be downloaded in order to build/test any Python target, which can slow in a large repository
1. This forces all dependencies to be re-downloaded any time one dependency changes
1. If any wheel fails to download or build, the user gets a vague error about a pip package not containing a BUILD file, and then has to run bazel clean --expunge and re-download everything yet again

The pip_repository rule runs pip wheel under the host configuration,

This means it may build platform-specific wheels for packages that are only available as source distributions which, as far as I can tell, makes Bazel remote execution impossible.

The pip_repository rule may only be used with a single Python version

In a workspace that contains both Python 2 and Python 3 code, you must create separate @pip2 and @pip3 repositories. This makes it difficult for a library to be compatible with both Python 2 and Python 3 unless all dependencies happen to be universal wheels. Even then, it seems wrong for a Python 3 target to transitively depend on a @pip2 package.

It requires developers of the rules to compile and source control .par files for each of the tools whenever changes are made

This is a PITA and it's hard to enforce through CI that developers actually do this every time.

Requirements

To solve these issues, the rules should be enhanced or rewritten to support the following goals:

A single repository rule should be created for each Python distribution in the requirement set

Each distribution in the locked, exhaustive set of requirements should correspond to a separate repository rule that will be responsible for fetching that distribution and creating an external workspace for it.

This will allow Bazel to fetch only the distributions that are actually needed for a given target, and it will allow new dependencies to be fetched incrementally instead of re-fetching everything when the requirements change.

The repository rules should operate exclusively on wheels

When compiling the locked, exhaustive requirement set (i.e. the current compile_pip_requirements rule), wheels must be built for all distributions that do not publish a wheel for the target platform. The built wheels then must either be committed to source control or published to a custom Python package index. These steps must be repeated for all supported platforms.

This means that no build step will be necessary at fetch time, so there is no risk of non-hermetic actions failing and forcing the user to use --expunge. It also means that the correct wheel can be fetched based on the target platform instead of the host platform, enabling remote execution.

The correct wheel should be selected based on Python version and platform

Every required distribution should be available as a py_library via a simple @pip//<name> label (e.g @pip//lxml or @pip//pytest_mock). This label should select the correct wheel for the target platform and Python version.

This can be accomplished via an alias that uses select statements with the @bazel_tools//tools/python and @bazel_tools//platforms conditions.

This will allow a library with pip dependencies to be a dependency of a binary target with python_version set to PY2 or PY3.

Any Python tool used by a repository rule should be a single file with no dependencies outside the standard library

This eliminates the need to compile and source-control .par files.

New Rules

The rules are actively being rewritten or replaced to accomplish the above goals. The new rule set will consist of the following:

pip_lock

pip_lock = rule(
    implementation = _pip_lock_impl,
    doc = """
        Defines a binary target that may be executed via `bazel run` in order to compile
        any number of `requirements.txt` files into a single `requirements-lock.json`
        file. This binary should be executed on all supported platforms to add the 
        correct set of requirements to the lock file for each platform. 
    """,
    attrs = {
        "requirements": attr.label_list(
            allow_files = True,
            doc = """
                Files following the standard requirements file format 
                (https://pip.pypa.io/en/stable/reference/pip_install/#requirements-file-format)

                These should define direct dependencies only, and should not pin
                dependency versions unless necessary.
            """,
        ),
        "requirements_lock": attr.string(
            default = "requirements-lock.json",
            doc = """
                A path relative to the package in which this rule is defined to which
                the compiled lock file should be written. This file should 
                be source-controlled and provided as input to the `pip_repository` rule.
            """,
        ),
        "python_version": attr.string(
            values = ["PY2", "PY3", "PY2AND3"],
            default = "PY2AND3",
            doc = """
                The Python versions for which to compile the requirement set. The
                requirements lock file will contain one environment per Python version 
                per platform. Each environment will define its locked requirement set.
            """,
        ),
        "wheel_dir": attr.string(
            default = "wheels",
            doc = """
                A path to a directory relative to the package in which this rule is
                defined in which built wheels will be stored. If the given directory
                does not already exist, it will be created. 
                
                Wheels will be built for any required distribution that is not already 
                available as a wheel for the given environment. These wheels may be
                committed to source control or published to a custom Python package 
                index. If the latter approach is used, this binary should be run again
                with the `index_url` attribute set to the URL of that index in order to 
                resolve the new locations of those wheels.
            """,
        ),
        "index_url": attr.string(
            default = "https://pypi.org/simple",
            doc = "The URL of a custom Python package index to use instead of PyPI.",
        ),
    },
    executable = True,
)

pip_repository

pip_repository = repository_rule(
    implementation = _pip_repository_impl,
    doc = """
        Defines an external workspace containing `py_library` targets for every
        requirement found in the given requirements lock file.
        
        Each requirement in the lock file will be available via a label with the 
        following format:
        
        `@<repo_name>//<distro_name>`

        where `<repo_name>` is the name of this repository and `<distro_name>` is the
        canonical name of the Python distribution found in the lock file. The canonical
        name is all lowercase, with hyphens replaced by underscores. For example,
        `PyYAML` would become `@pip//pyyaml` and `pytest-mock` would become 
        `@pytest_mock`.

        This rule also generates a `requirements.bzl` file containing a `pip_install` 
        macro that should be loaded and invoked in the WORKSPACE file. This declares all 
        of the repository rules for the required Python distributions.
    """,
    attrs = {
        "requirements": attr.label(
            allow_single_file = True,
            doc = """
                The label of a requirements lock file, generated by the `pip_lock` rule, 
                defining the required Python distributions.
            """,
        ),
        "rules_pip_repo_name": attr.string(
            default = "com_apt_itude_rules_pip",
            doc = """
                The workspace name that was used for the `rules_pip` repository, if not
                the default.
            """,
        ),
    }
)

Example usage

WORKSPACE

git_repository(
    name = "com_apt_itude_rules_pip",
    remote = "https://github.com/apt-itude/rules_pip.git",
    tag = "1.0.0",
)

load("@com_apt_itude_rules_pip//rules:dependencies.bzl", "rules_pip_dependencies")

rules_pip_dependencies()

thirdparty/pip/BUILD

load("@com_apt_itude_rules_pip//rules:lock.bzl", "pip_lock")

pip_lock(
    name = "lock",
    requirements = ["requirements.txt"],
)

thirdparty/pip/requirements.txt

pytest-mock
PyYAML

Generate lock file

bazel run //thirdparty/pip:lock

Add to WORKSPACE

load("@com_apt_itude_rules_pip//rules:repository.bzl", "pip_repository")

pip_repository(
    name = "pip",
    requirements = "//thirdparty/pip:requirements-lock.json",
)

load("@pip//:requirements.bzl", "pip_install")

pip_install()

some/package/BUILD

py_library(
    name = "dopecode"
    srcs = ["dopecode.py"],
    deps = [
        "@pip//pytest_mock",
        "@pip//pyyaml",
    ]
)
@apt-itude apt-itude self-assigned this Jun 11, 2019
@apt-itude apt-itude added this to the v1.0.0 milestone Jun 12, 2019
@apt-itude apt-itude pinned this issue Jun 12, 2019
@codebreach
Copy link

Thanks for the writeup. We have been using this for our internal monorepo, everything works great other than things due to lack of target platform support. This becomes super visible when building docker images on a Mac (which just does not work due to Linux deps required but OsX deps provided into the built image). We are going to test this branch out over the weekend and will let you our findings.

One change we had to make was to enable support for purelib/platlib (like tensorflow). You can see the changes SpotDraft@88ec9ae. While there are a lot of TODOs and NOTEs in there, we have been running 50+ rules off this depending on pure/platlibs. I can open a separate bug for this if you support of such pip deps is something you intend to support in this repo.

Also, was just wondering how this would be potentially supported now that you have updated to always build wheels/use prebuilt wheels. Or in other words, would be great to have some clarity around this part:

wheels must be built for all distributions that do not publish a wheel for the target platform. The built wheels then must either be committed to source control or published to a custom Python package index. These steps must be repeated for all supported platforms.

Specifically does rules_pip handle the building of wheels? Is the https://github.com/apt-itude/rules_pip/blob/repository-rule-per-distribution/src/bin/create_wheel_repository.py binary the right place to target to make changes for purelib/platlib support?

@apt-itude
Copy link
Owner Author

@codebreach sorry for the delay. I would love to hear your results if you've tested it out for building Docker images.

Regarding the "built wheels" confusion, I meant that the pip_repository and pip_install rules would only support wheels, rather than source distributions, because building wheels hermetically is a real challenge. This also means that all sources in the requirements-lock.json file must be wheels.

It does not mean that you're on your own for building those wheels. The pip_lock rule, which is used as a manual bazel run step to generate/update your requirements-lock.json, will build wheels for all packages that are only available as sdists on PyPI. However, it builds the wheels for the host platform, not the target platform, so you need to run the rule on each of your target platforms.

Your use case sounds like you want to run Bazel on macOS to build a Linux Docker image, so you could potentially use something like dazel to run the pip_lock rule in a Docker container and mount the wheels directory as a volume.

Regarding purelib/platlib support, you are correct that the create_wheel_repository.py script would be the place to do it. That is the tool used to convert a wheel file into a py_library. We already have issue #15 to track this, and you are welcome to help tackle that.

@tmc
Copy link
Contributor

tmc commented Jul 12, 2019 via email

@apt-itude
Copy link
Owner Author

@tmc can you elaborate on what you mean by that?

@tmc
Copy link
Contributor

tmc commented Jul 14, 2019 via email

@apt-itude
Copy link
Owner Author

Ah gotcha. I really didn't intend to expose or support pytest rules (the repo is named rules_pip, after all). I just implemented those for the sake of testing this repo. I will likely update the existing pytest_test rule to use the @pip//pytest target instead of switching on the old @pip2//pytest and @pip3//pytest targets, but I wasn't planning on documenting and really supporting it for external use since. It's really not a complicated rule, however, so it could easily be ported into another repo.

@tmc
Copy link
Contributor

tmc commented Jul 21, 2019

@apt-itude how are things going here? might have some spare cycles to help hack on these improvements.

@apt-itude
Copy link
Owner Author

@tmc the new rules are actually very much functional. I have a development branch going with the new rules in our repo at 128 Technology, which has over 250 py_library targets and over 100 pip dependencies, and it works without any issues. At the moment, I've paused a little bit on development of the actual rules while I try to sort out the changes to our CI process since we need to publish the wheels built by the pip_lock rule to our internal PyPI repo on Artifactory.

You can track the status of this effort in the New Rules Project as well as the status of the broader 1.0.0 Milestone. Issues that are done aren't actually closed since they haven't been merged into master but are labeled as done. Feel free to tackle any of the open issues, but I would especially appreciate an assist on #20 because I was struggling a bit to get that working.

Honestly, one of the most helpful things would be to just try out the new rules and provide feedback on any issues you run into or whether they even improve your development experience. I'll try to whip up a migration section in the README to help with the switch.

Thanks for the help/interest!

@apt-itude
Copy link
Owner Author

Update: I added a whole bunch of detail to the README in the new branch, including a migration section.

@marcromeyn
Copy link

This looks awesome! Am I correct to say that these new rules would work with remote execution?

And do you have an example by any chance of the setup you described to build Linux docker images on a Mac through Dazel? If remote execution is enabled (which would run on Linux I assume), Dazel wouldn’t be necessarily needed I guess?

@groodt
Copy link

groodt commented Oct 14, 2019

This does indeed look awesome! @apt-itude are you still likely to continue pushing this forward?

@apt-itude
Copy link
Owner Author

Hey guys,
I'm sorry I haven't been active with this project lately. I actually just changed jobs, and I'm no longer actively using Bazel, so unfortunately I don't think I will have the time to push this across the finish line. It's possible that somebody else from @128technology will pick this up in the future, but I can't guarantee what their priorities will be.

The new rules are actually pretty functional in their current state, but as we started to use them at 128, it became clear that using a single requirements-lock.json file for all environments is cumbersome and causes problems if you want wheels to come from different repos for different environments.

Obviously, anyone is welcome to fork this and take it over. I'd be happy to answer questions about it.

@groodt
Copy link

groodt commented Oct 21, 2019

Thanks for your honesty @apt-itude

Good luck with your new gig!

It's a shame (from a purely selfish perspective), but hopefully the community can pick this up or there another promising set of rules for Python dependencies emerges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants