Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

first pass at identifying enterprise owner users #1378

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

danbrauer
Copy link

Preamble

This PR is a copy of #1373, identical in content to that PR as of the time of this new PR's creation. (That PR will now appear empty because of some cleanup I did on my fork... and some learning I got on how forks work. 😄). That PR does have some relevant discussion, which I suppose we will continue here. Apologies for the noise, it'll hopefully be avoided in future PRs.

Summary

This PR adds to the Github graph, marking users as enterprise owners.

We think this is a valuable addition to the graph in general, because these users are not all necessarily visible in the graph at the moment but have broad access. Less generally (but still maybe relevant to others) our analysts at Etsy need to review these users as part of our UAR (User Access Review) process, which we hope Cartography will eventually help to power.

We wanted to do this in a light-touch way, without breaking existing relationships or removing properties. We also wanted to follow how similar properties are graphed on the user node: org ownership, for example, is noted by the 'user.role' property; similarly, the 'user.is_site_admin' property notes whether a user is a site admin). To that end, we did the following:

  1. add an 'is_enterprise_owner' property to all user nodes
  2. add a new type of user-org relationship: 'UNAFFILIATED'. The terminology is Github's, and it is used for enterprise owners who are not also members of the graphed organization.

Here is an illustration of before/after (I will also add some screencap below but thought the high-level illustration might help):
Cartography AMPS User Owns Enterprise (1)

Other notes on the PR

  1. I refactored the integration tests, taking cues from how the testing for Github teams was done by testing the 'sync' function as a whole instead of just the 'load' function.
  2. In general I tried to do things in keeping with the style I saw around me. I am happy to change anything.
  3. In our slack conversation, it was mentioned PRs should use the new models. I’d already written this when I read that, but, when I looked I saw there are no models for this. Is that okay? Should they be added and, if so, could it be in a separate PR or must it be here?

Related issues or links

None.

Screencaps

(I could get other screencaps... if anything would be helpful, please let me know.)

In this case was helpful that we had an enterprise owner who was also a user in one of our orgs, but not another. I highlighted them specifically in a query here, showing both the new property and relationship type.

Before
User Org Before

After
User Org After

Checklist

Provide proof that this works (this makes reviews move faster). Please perform one or more of the following:

  • Update/add unit or integration tests.
  • Include a screenshot showing what the graph looked like before and after your changes.
  • Include console log trace showing what happened before and after your changes.

If you are changing a node or relationship:

  • Update the schema and readme. NOTE: I updated the schema but not the README, which seemed like it was out of date, did not already include github, and suggested using a javascript dependency to update it... please advise, if this needs update. 😄

If you are implementing a new intel module:

@danbrauer
Copy link
Author

For folks interested, as noted in the description above, this is a copy of PR #1373 and there are some prior comments there that talk about some of the work I will be doing to this PR.

But in summary: in discussion with @achantavy we decided this PR should create a schema for the github Users node. And then in discussing that, we realized there were some quirks to how Users relate to github Organizations. It was decided that the User - Org relationship should be schema'ed via the other_relationships field instead of as a sub_resource_relationship. (And for more detail, again, see the comments over in #1737 with apologies for the jump.)

@danbrauer
Copy link
Author

👋 This PR is in the middle of being refactored. In the first commit, it was complete, but done without schemas because it was focused on the User node, and no schema existed for the User node. @achantavy asked that a User schema be created and used. This PR begins that work.

In the process of doing this, I hit a blocker that, if left as is, will I think wipe out GitHubOrganization nodes from the graph. Which would be bad. Because of this, I also couldn't fully test all the other cases. So, this is very much WIP, and THIS SHOULD NOT BE MERGED. I am sharing to get feedback on the blocker.

Details on the main blocking issue:

If you looked at the current loading Cypher, either in master or in this PR's first commit, you can see the first thing done is the creation of a GitHubOrganization node, if one does not already exist:

MERGE (org:GitHubOrganization{id: $OrgUrl})
ON CREATE SET org.firstseen = timestamp()
SET org.username = $OrgLogin,
org.lastupdated = $UpdateTag
WITH org

In contrast, if you look at the Cypher auto-generated from the new schema, there is no similar creation of GitHubOrganization. In case it helps, here is that Cypher code for one of the two loads in this PR:

        UNWIND $DictList AS item
            MERGE (i:GitHubUser{id: item.url})
            ON CREATE SET i.firstseen = timestamp()
            SET
                i.lastupdated = $lastupdated,
                i.fullname = item.name,
                i.username = item.login,
                i.is_site_admin = item.isSiteAdmin,
                i.is_enterprise_owner = item.isEnterpriseOwner,
                i.email = item.email,
                i.company = item.company,
                i.has_2fa_enabled = item.hasTwoFactorEnabled,
                i.role = item.role
            
        WITH i, item
        CALL {
            
        WITH i, item
        OPTIONAL MATCH (n0:GitHubOrganization)
        WHERE
            n0.id = item.MEMBER_OF
        WITH i, item, n0 WHERE n0 IS NOT NULL
        MERGE (i)-[r0:MEMBER_OF]->(n0)
        ON CREATE SET r0.firstseen = timestamp()
        SET
            r0.lastupdated = $lastupdated
        
        }

So, as far as I can see, the org will not be created. I have thoughts, but, before continuing, I wanted to pause and ask if this makes sense and if I am just misunderstanding something or doing something incorrectly. Thank you!

@danbrauer
Copy link
Author

@achantavy and I discussed! High level summary: I was accurate in describing the issue above, and for a solution, we will be adding a Schema obj for GitHubOrganization and creating a separate load for it. I'll aim to do that on Tue (Nov 12), the next day I'm in office, and we'll take it from there.

{
"query": "MATCH (:GitHubUser)-[r:UNAFFILIATED]->(:GitHubOrganization) WHERE r.lastupdated <> $UPDATE_TAG WITH r LIMIT $LIMIT_SIZE DELETE (r)",
"iterative": true,
"iterationsize": 100
}],
"name": "cleanup GitHub users data"
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note I renamed this file to make it clear that it cleans up organization nodes along with user nodes. (This was always the case, and I think the name tweak helps the code self-document a bit.)

@danbrauer danbrauer marked this pull request as ready for review November 12, 2024 14:24
@danbrauer
Copy link
Author

@achantavy following our discussion last week, this is now updated to account for GitHubOrganizations. Orgs have their own node schema now. As was the case before, the Orgs are created in the course of the creation of Users. So in summary, this PR should now match the behavior from before, but updated to define Users and Orgs in schemas, and do the thing the PR initially intended (adding identifying enterprise owners).

(I may be be offline this afternoon (US Eastern) but will be here tomorrow to field any comments and make changes if needed.)

@danbrauer
Copy link
Author

danbrauer commented Nov 13, 2024

I am noticing failed tests here. I thought I ran them locally and they passed... but I realize I didn't have my venv environment activated (I'm used to using poetry in my other py projects, apologies.). I will poke around at this later today and try to get it fixed.

UPDATE: should be fixed now

Signed-off-by: Daniel Brauer <[email protected]>
Signed-off-by: Daniel Brauer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant