Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(cot): don't use parent repo when evaluating pull request json-e c… #646

Merged
merged 2 commits into from
May 22, 2024

Conversation

ahal
Copy link
Contributor

@ahal ahal commented Apr 26, 2024

…ontext on forks

Issue: #645

@ahal ahal self-assigned this Apr 26, 2024
@ahal
Copy link
Contributor Author

ahal commented Apr 26, 2024

I'm looking for some extra scrutiny on this PR. I think this is correct, but I don't feel very confident. There's a deeper explanation of what this is fixing in issue #645

@bhearsum
Copy link
Contributor

I don't have any prior knowledge of whether this is important or not :(. It looks like this was added by @JohanLorenzo in 93f061f. I don't see any discussion of this specific point there - but maybe @JohanLorenzo will have some memory of this.

I do think that this specific change is very low risk, as literally the only thing we pull from it is the pull request data. There is certainly no point in trying to pull that data from the wrong repository. If there's something wrong or insecure about pulling it from a fork, we should simply desupport this behaviour altogether.

I will note that many of our staging repos are not created as forks. For example https://github.com/mozilla-releng/staging-adhoc-signing and https://github.com/mozilla-releng/staging-xpi-manifest. This was done at least in part to ensure that they are fully disconnected from production, and as close as possible to a production scenario. I don't know whether this code is scriptworker was a reason we created them like this, though. It is, at least, another way to deal with the accute issue in the staging-firefox-ios repository. The fact that this workaround exists also seems to suggest that this fix is a good one - nothing about the security properties of the repository changes with it being a fork vs. not being a fork. Scopes remain the same, access control on the repo remains the same, etc.

@ahal
Copy link
Contributor Author

ahal commented Apr 29, 2024

I will note that many of our staging repos are not created as forks.

Ah interesting, I recall a conversation sclements drove where we agreed that staging repos should be forks. IIRC, she even went and made a bunch of them forks. This might have happened while you were off the team.

@ahal
Copy link
Contributor Author

ahal commented Apr 29, 2024

nothing about the security properties of the repository changes with it being a fork vs. not being a fork. Scopes remain the same, access control on the repo remains the same, etc.

Yes, this was my thinking too. Also we don't typically run tasks on user forks anyway so I don't think we would ever hit this condition outside of forked staging-* repos.

@JohanLorenzo
Copy link
Contributor

JohanLorenzo commented Apr 29, 2024

It looks like this was added by @JohanLorenzo in 93f061f. I don't see any discussion of this specific point there - but maybe @JohanLorenzo will have some memory of this.

For an unknown reason, the commit should be d08d306 instead, which in turns point to #307. Tom, Aki, and I had discussion there. Let me read them again to see if they refresh my memory

@JohanLorenzo
Copy link
Contributor

Ah interesting, I recall a conversation sclements drove where we agreed that staging repos should be forks.

Oh, I thought they should be forks too. E.g.: https://github.com/mozilla-releng/staging-firefox-android. I didn't know we had some that weren't.

I think this is correct, but I don't feel very confident.

Here are some cases I tested when I introduced the behavior back then: #307 (comment). My main worry here is that we break PRs made against the main repo. If that works, then we covered the main case, in my opinion.

I don't remember why I added this code in the first place. I'm fine removing it as long as we don't regress anything. Unlike then, testing CoT today against several known scenarios should be easier. We don't have to schedule graphs anymore, we can just CoT locally.

@ahal
Copy link
Contributor Author

ahal commented Apr 29, 2024

Good call @JohanLorenzo! This does break regular pull requests. Looking at the error it makes sense.. we do need to get pull request data from the parent repo (in my mind I was thinking that the tasks run on the base repo so it would never be in the context of the fork.. but looks like I was wrong).

I guess the issue is that whether the current repo is a fork or not, is not the right check to use here. I'll poke around a bit.. but this patch introduces a much worse regression than it solves, so if I don't find anything I'll WONTFIX.

@ahal ahal marked this pull request as draft April 29, 2024 18:48
@ahal ahal marked this pull request as ready for review April 29, 2024 21:30
@ahal
Copy link
Contributor Author

ahal commented Apr 29, 2024

This latest iteration works both with regular pull requests and Isabel's case. I'm feeling more confident about this one.

@ahal ahal marked this pull request as draft April 29, 2024 21:34
@ahal
Copy link
Contributor Author

ahal commented Apr 29, 2024

Oops, missed some tests, will fix tomorrow.

Copy link
Contributor

@bhearsum bhearsum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking as request changes due to the test issues. I assume you'll add some variants for the various base/head fork/not-fork cases in addition to fixing the existing tests?

@@ -171,7 +171,7 @@
"glean": ("glean-1/decision", "glean-3/decision", "glean-1/decision-gcp", "glean-3/decision-gcp"),
"xpi": ("xpi-1/decision", "xpi-3/decision", "xpi-1/decision-gcp", "xpi-3/decision-gcp"),
"adhoc": ("adhoc-1/decision", "adhoc-3/decision", "adhoc-1/decision-gcp", "adhoc-3/decision-gcp"),
"scriptworker": ("scriptworker-1/decision", "scriptworker-3/decision" "scriptworker-1/decision-gcp", "scriptworker-3/decision-gcp"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously this is broken:

>>> ("scriptworker-1/decision", "scriptworker-3/decision" "scriptworker-1/decision-gcp", "scriptworker-3/decision-gcp")
('scriptworker-1/decision', 'scriptworker-3/decisionscriptworker-1/decision-gcp', 'scriptworker-3/decision-gcp')

Is it part of this fix too, or just a ridealong? (I don't care either way, just wondering, and also wondering if this is actively causing issues somewhere...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hit this bug while trying to run verify_cot manually in order to test this PR. So it is causing active issues, but I guess we don't run into it in day to day. It's not related to this in a technical capacity.

@@ -1176,7 +1179,7 @@ async def _get_additional_github_pull_request_jsone_context(decision_link):
github_repo = GitHubRepository(repo_owner, repo_name, token)
repo_definition = github_repo.definition

if repo_definition["fork"]:
if repo_definition["fork"] and base_repo_url != repo_url:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine. It looks like the original github_repo comes from head_repository (via get_repo). From there we have a few cases:

  • Head github_repo is not a fork, and we keep it. Eg: opening a PR to firefox-ios from a branch in that repo. Obviously we need the original github_repo in this case (and redoing it would end up with the same data anyways...)
  • Head github_repo is a fork, and the base repo is different. Eg: opening a PR to firefox-ios from staging-firefox-ios, or even staging-firefox-ios from some user's repository. In this case we must redefine github_repo again, because we can't trust whatever data comes out of the head github_repo. I guess we can trust whatever repo_definition["parent"] provides us because there's no way it could be anything except the repo that the event is firing from?
  • Head github_repo is a fork and the base repo is the same. Eg: opening a PR to staging-firefox-ios from another branch in that repository. We can use the existing head github_repo in this case because the only people that could open this PR would be those that can modify the one on main anyways? (I guess that's mostly true...depending on branch protections we could theoretically have a case where someone could push to a branch, open a PR, but be unable to modify things on main directly. I'm not sure if that's worth dealing with, and it's certainly not something to deal with here if so.)

Two suggestions:

  • Add a comment that explains this stuff a bit - it's a little bit head spinning IMO (doesn't need to be my textwall 😆)
  • Consider defaulting to the base github_repo. Obviously this doesn't change anything now, but it might protect against accidents in a future refactor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added!

It occurred to me that we could have the case that github_repo is not a fork, but it still opens a pull request on an upstream repo. E.g, we open a pull request from staging-xpi-manifest to xpi-manifest. But in this case I don't think the parent repo information would be populated in the Github API return value.. so I'm not sure there's anything we can do about it unless we trust what's in the BASE_REPOSITORY env

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider defaulting to the base github_repo. Obviously this doesn't change anything now, but it might protect against accidents in a future refactor.

Sorry, I'm confused by what you mean here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider defaulting to the base github_repo. Obviously this doesn't change anything now, but it might protect against accidents in a future refactor.

Sorry, I'm confused by what you mean here

I guess what I'm trying to guard against is some future refactor that causes a bug where we don't set github_repo a second time, and we wrongly end up with the original github_repo definition. In my head when I wrote the earlier comment that might be a security issue...but looking at it again the worst that can happen is a failure to verify CoT....so I don't really care now!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Head github_repo is a fork, and the base repo is different. Eg: opening a PR to firefox-ios from staging-firefox-ios, or even staging-firefox-ios from some user's repository. In this case we must redefine github_repo again, because we can't trust whatever data comes out of the head github_repo.

I'm not sure if it's a matter of trust (maybe it's that too), but I think it's more just that the pull request doesn't exist on the head repo, so querying that endpoint on Github with the head repo would be a 404.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It occurred to me that we could have the case that github_repo is not a fork, but it still opens a pull request on an upstream repo.

I don't think github lets you do that.

@ahal ahal marked this pull request as ready for review April 30, 2024 19:28
@ahal ahal requested a review from bhearsum April 30, 2024 19:28
Copy link
Contributor

@jcristau jcristau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish we just used the base repo for everything all the time.. This looks like a reasonable fix though.

@@ -1176,7 +1179,7 @@ async def _get_additional_github_pull_request_jsone_context(decision_link):
github_repo = GitHubRepository(repo_owner, repo_name, token)
repo_definition = github_repo.definition

if repo_definition["fork"]:
if repo_definition["fork"] and base_repo_url != repo_url:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It occurred to me that we could have the case that github_repo is not a fork, but it still opens a pull request on an upstream repo.

I don't think github lets you do that.

@jcristau
Copy link
Contributor

Side note: the test changes will conflict with #648.

@bhearsum
Copy link
Contributor

I wish we just used the base repo for everything all the time.. This looks like a reasonable fix though.

How would this work with PRs? (I totally agree with regard to any other event type...)

@jcristau
Copy link
Contributor

I wish we just used the base repo for everything all the time.. This looks like a reasonable fix though.

How would this work with PRs? (I totally agree with regard to any other event type...)

I don't think CoT really cares? and run-task could fetch refs/pull/ID/head from the base repo instead of pulling from the head repo.

@bhearsum
Copy link
Contributor

I wish we just used the base repo for everything all the time.. This looks like a reasonable fix though.

How would this work with PRs? (I totally agree with regard to any other event type...)

I don't think CoT really cares? and run-task could fetch refs/pull/ID/head from the base repo instead of pulling from the head repo.

Maybe I'm missing something, but given that decision tasks are built based on pull request versions of .taskcluster.yml (See this PR and decision task), CoT presumably needs to find that version of the .taskcluster.yml?

@jcristau
Copy link
Contributor

It should be enough to have the base repo and commit hash for that, e.g. https://github.com/mozilla-releng/balrog/raw/10dc4169f8828751e93b084c8e2bc550665a4031/.taskcluster.yml?

@bhearsum
Copy link
Contributor

It should be enough to have the base repo and commit hash for that, e.g. https://github.com/mozilla-releng/balrog/raw/10dc4169f8828751e93b084c8e2bc550665a4031/.taskcluster.yml?

Ah right, I forgot that you could access fork content like that.

Copy link
Contributor

@bhearsum bhearsum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want a follow-up issue or two given some of the discussion here, but I agree this is a fine fix for now, modulo the needed rebase.

@ahal ahal force-pushed the fix_verify_cot branch from 4bff7dd to 484a638 Compare May 21, 2024 20:10
@ahal ahal force-pushed the fix_verify_cot branch from 484a638 to adbb1b1 Compare May 21, 2024 20:19
@ahal ahal merged commit 9b01b3e into mozilla-releng:main May 22, 2024
10 checks passed
@ahal ahal deleted the fix_verify_cot branch May 22, 2024 13:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants