This repository has been archived by the owner on Jul 18, 2024. It is now read-only.

Github Graphql improvements #281

Merged
merged 25 commits into from
Nov 20, 2023

Conversation

@RonaldEAM RonaldEAM commented Nov 8, 2023

Main changes:

  • src/client/GraphQLClient/batchUtils.ts: withBatching() function
    Implemented an algorithm to reduce the number of requests made to the GitHub API (explained in more detail here: https://youtu.be/i5pIszu9MeM?feature=shared&t=1292).
    First, we fetch the totalCount of each entity connection (e.g. repository -> issues -> totalCount).
    Then we group the repos that have fewer than 100 issues and fetch them in a single call using the nodes(ids: [ID!]!) { ... on Repository { issues(...) { ... } } } query.
    For repos that have more than 100 issues, we make a separate call per repo using the repository(id) { issues(first: ..., after: ...) { ... } } query.

  • src/client/GraphQLClient/timeoutHandler.ts: withTimeoutHandler() function
    GitHub returns a "timeout" error when it determines that a query will take more than 10 seconds to process. Its recommendation is to lower the "first" argument (the limit).
    This function handles those errors by halving the max limit. If the error persists, it keeps halving the limit until it reaches 1. If that fails as well, it throws and stops retrying.

  • Batched Queries: Created new query files for the batched versions of the normal queries

  • Steps: Applied the withBatching function to the steps that could use it (steps that depend on a parent entity)
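
The halving behaviour described for withTimeoutHandler() can be sketched roughly as follows. This is a minimal illustration, not the code in timeoutHandler.ts: the signature, the isTimeoutError helper, and the starting limit of 100 are assumptions. Only the error-message substring comes from a snippet quoted later in this conversation.

```typescript
// Hypothetical sketch; the real withTimeoutHandler() signature may differ.
type LimitedQuery<T> = (limit: number) => Promise<T>;

// Assumption: timeouts are recognised by this message substring.
const isTimeoutError = (err: unknown): boolean =>
  err instanceof Error &&
  err.message.includes('This may be the result of a timeout');

async function withTimeoutHandler<T>(
  runQuery: LimitedQuery<T>,
  maxLimit = 100, // assumed starting page size
): Promise<T> {
  let limit = maxLimit;
  for (;;) {
    try {
      return await runQuery(limit);
    } catch (err) {
      // Rethrow anything that is not a timeout, and a timeout at limit 1.
      if (!isTimeoutError(err) || limit <= 1) throw err;
      // Halve the limit (never below 1) and try again.
      limit = Math.max(1, Math.floor(limit / 2));
    }
  }
}
```

With a starting limit of 100, a query that keeps timing out is attempted at 100, 50, 25, 12, 6, 3 and 1 before the error is rethrown.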

Execution time difference (with JupiterOne org instance):

[image: execution time comparison]

Version

Published prerelease version: v4.0.0-beta.0

Changelog

💥 Breaking Change

🐛 Bug Fix

Authors: 2

@RonaldEAM RonaldEAM requested a review from a team as a code owner November 8, 2023 19:56
gastonyelmini
gastonyelmini previously approved these changes Nov 14, 2023
singleCb: (entityId: string) => Promise<void>;
};

export const withBatching = async ({

Please add jsDocs. It'd be great to understand the batching approach directly in the code.

await batchCb(entityIds);
} catch (err) {
if (err.message?.includes('This may be the result of a timeout')) {
const newTotalConnectionsById = batchedEntityKeys

Having this much logic inside of a catch block feels off.
Consider moving it into a separate function or consider a different way to handle the control flow.
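
One way to act on this suggestion is to move the timeout handling into a small helper that reports which ids need a per-entity retry, so the catch block only classifies the error. The sketch below is hypothetical: runBatch, isBatchTimeout, and the fallback policy (retry every id of a timed-out batch individually) are assumptions, not the PR's actual logic.

```typescript
// Hypothetical helper names; only the message substring comes from the snippet above.
const BATCH_TIMEOUT_HINT = 'This may be the result of a timeout';

const isBatchTimeout = (err: unknown): boolean =>
  err instanceof Error && err.message.includes(BATCH_TIMEOUT_HINT);

// Runs one batch and returns the ids that should be retried one by one.
async function runBatch(
  entityIds: string[],
  batchCb: (ids: string[]) => Promise<void>,
): Promise<string[]> {
  try {
    await batchCb(entityIds);
    return []; // nothing to retry
  } catch (err) {
    if (!isBatchTimeout(err)) throw err;
    // Assumption: a timed-out batch falls back to single-entity calls.
    return entityIds;
  }
}
```

The caller can then collect the returned ids into its retry list without nesting further logic inside the catch.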

});

for (const entityId of [...retrySingleEntityKeys, ...singleEntityKeys]) {
await singleCb(entityId);

@VDubber VDubber Nov 14, 2023

Do these callbacks need to be sequential, or could Promise.all be used?
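
For illustration, a middle ground between the sequential loop and a single Promise.all over every id is to run the callbacks in small chunks. The helper below is a hypothetical sketch (forEachInChunks and its concurrency parameter are not part of the PR). Whether any concurrency is safe here depends on the API's rate limits, which this sketch does not address.

```typescript
// Hypothetical helper: runs singleCb over ids, at most `concurrency` at a time.
async function forEachInChunks(
  ids: string[],
  concurrency: number,
  singleCb: (entityId: string) => Promise<void>,
): Promise<void> {
  // Guard against zero or negative values, which would never advance the loop.
  const size = Math.max(1, Math.floor(concurrency));
  for (let i = 0; i < ids.length; i += size) {
    const chunk = ids.slice(i, i + size);
    // Promise.all rejects as soon as any callback in the chunk rejects.
    await Promise.all(chunk.map((id) => singleCb(id)));
  }
}
```

Passing a concurrency of 1 reproduces the sequential behaviour of the original loop.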

}
};

const batchSeparateKeys = (

jsdoc please

nit: batchSeparateKeys -> separateKeysByThresholdIntoSingleAndBatches

}
return acc;
},
{ lessThanThreshold: new Map(), moreThanThreshold: new Map() } as {

Did you consider a lodash function for this? Not positive but this may already be implemented.
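
For context, lodash's partition splits a collection into two arrays by a predicate, so it would not directly produce the two Maps seen in the snippet above. A dependency-free version of the split might look like the sketch below. The function name, the Map<string, number> shape, and the handling of totals equal to the threshold are all assumptions.

```typescript
// Hypothetical shape: entity id -> totalCount of its child connection.
type TotalsById = Map<string, number>;

function splitByThreshold(totals: TotalsById, threshold: number) {
  const lessThanThreshold: TotalsById = new Map();
  const moreThanThreshold: TotalsById = new Map();
  totals.forEach((total, id) => {
    // Assumption: totals equal to the threshold go to the single-call side.
    const target = total < threshold ? lessThanThreshold : moreThanThreshold;
    target.set(id, total);
  });
  return { lessThanThreshold, moreThanThreshold };
}
```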

};
};

const groupEntitiesByTotal = (

@VDubber VDubber Nov 14, 2023

Please add jsdoc. Please include the "why" of this approach
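
As an illustration of the kind of JSDoc being requested, here is a hypothetical stand-in for the grouping step. It assumes that the function packs entity ids into batches whose summed totalCount stays within the threshold. That is one plausible reading of the name and arguments, not something confirmed by the snippets shown here.

```typescript
/**
 * Greedily packs entity ids into groups whose summed totalCount stays at or
 * below `threshold`, preserving the Map's insertion order.
 *
 * Why (assumed): if the children of every entity in a group fit within one
 * page, a single batched request can fetch them all without pagination.
 *
 * An entity whose own total exceeds the threshold ends up alone in a group.
 */
function groupIdsByTotal(
  totals: Map<string, number>,
  threshold: number,
): string[][] {
  const groups: string[][] = [];
  let current: string[] = [];
  let currentTotal = 0;
  totals.forEach((total, id) => {
    // Close the current group when adding this entity would overflow it.
    if (current.length > 0 && currentTotal + total > threshold) {
      groups.push(current);
      current = [];
      currentTotal = 0;
    }
    current.push(id);
    currentTotal += total;
  });
  if (current.length > 0) groups.push(current);
  return groups;
}
```

For example, totals of 40, 50, 30, 99 and 1 with a threshold of 100 yield three groups: the first two entities, the third alone, and the last two.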

threshold,
);

const batchLoop = async ({

@VDubber VDubber Nov 14, 2023

please add jsdoc. Does batchLoop need to be defined within withBatching?
Complexity of this logic is rather high imo.


const query = `
query (
$repoIds: [ID!]!

Cool!
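
For readers without the diff at hand, a batched query of the shape described in the PR summary might look roughly like this. Only the $repoIds: [ID!]! variable comes from the snippet above. The $maxLimit variable, the selected fields, and the buildBatchedRequest helper are assumptions added for illustration.

```typescript
// Illustrative batched query; the field selection is an assumption.
const batchedIssuesQuery = `
  query ($repoIds: [ID!]!, $maxLimit: Int!) {
    nodes(ids: $repoIds) {
      ... on Repository {
        id
        issues(first: $maxLimit) {
          totalCount
          nodes {
            id
            title
          }
        }
      }
    }
  }
`;

// Hypothetical helper that pairs the query with its variables.
function buildBatchedRequest(repoIds: string[], maxLimit = 100) {
  return { query: batchedIssuesQuery, variables: { repoIds, maxLimit } };
}
```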

};
};

const processResponseData: ProcessResponse<

What testing/code practices have been used to ensure that the data from single entity requests are the same for batch requests?
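
One possible practice, sketched here as an assumption rather than something the PR is known to do, is to run both code paths against the same recorded fixtures and compare the results. The IssueRef type and diffIssueRefs helper are hypothetical, and the check compares identities only, not field values.

```typescript
// Hypothetical minimal record: which issue belongs to which repository.
type IssueRef = { repoId: string; issueId: string };

const keyOf = (ref: IssueRef): string => `${ref.repoId}/${ref.issueId}`;

// Returns the keys that appear in only one of the two result sets.
function diffIssueRefs(single: IssueRef[], batched: IssueRef[]) {
  const singleKeys = new Set(single.map(keyOf));
  const batchedKeys = new Set(batched.map(keyOf));
  return {
    onlyInSingle: single.map(keyOf).filter((k) => !batchedKeys.has(k)),
    onlyInBatched: batched.map(keyOf).filter((k) => !singleKeys.has(k)),
  };
}
```

A test would then assert that both lists are empty after ingesting the same repositories through the single-entity and batched queries.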

@VDubber VDubber left a comment

I've added a few comments in some of the new code. I'm excited to see the improvements with this new approach!

Some additional thoughts and why I did not approve:
There is a larger than average risk to releasing this PR (size and complexity).
What is your deploy/release approach?
Please also consider breaking this PR down into smaller chunks.
A single PR with 10 files is something that can be reviewed and logically understood very easily.
It seems like we could introduce your awesome new pattern in one location, release that first, and then spread the pattern to the other locations.
As we work towards a stable platform, please consider these thoughts.
Let me know if you have additional questions/thoughts about these items I mentioned.

Updated and removed dependencies detected. Learn more about Socket for GitHub ↗︎

| Packages | Version | New capabilities | Transitives | Size | Publisher |
| --- | --- | --- | --- | --- | --- |
| @jupiterone/integration-sdk-testing | 11.0.3...11.2.0 | None | +2/-84 | 986 kB | jupiterone-dev |
| @jupiterone/integration-sdk-core | 11.0.3...11.2.0 | None | +0/-0 | 261 kB | jupiterone-dev |
| @jupiterone/integration-sdk-dev-tools | 11.0.3...11.2.0 | None | +4/-86 | 1.85 MB | jupiterone-dev |

🚮 Removed packages: @types/[email protected]

@RonaldEAM RonaldEAM changed the base branch from main to beta November 17, 2023 16:57

@Gonzalo-Avalos-Ribas Gonzalo-Avalos-Ribas left a comment

🔥

@RonaldEAM RonaldEAM merged commit ad7c0bf into beta Nov 20, 2023
3 checks passed
@RonaldEAM RonaldEAM deleted the optimization-dh branch November 20, 2023 18:12
@j1-internal-automation

🚀 PR was released in v4.0.0-beta.0 🚀

5 participants