-
Notifications
You must be signed in to change notification settings - Fork 7
Conversation
singleCb: (entityId: string) => Promise<void>; | ||
}; | ||
|
||
export const withBatching = async ({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add jsDocs. It'd be great to understand the batching approach directly in the code.
await batchCb(entityIds); | ||
} catch (err) { | ||
if (err.message?.includes('This may be the result of a timeout')) { | ||
const newTotalConnectionsById = batchedEntityKeys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having this much logic inside of a catch block feels off.
Consider moving it into a separate function or consider a different way to handle the control flow.
}); | ||
|
||
for (const entityId of [...retrySingleEntityKeys, ...singleEntityKeys]) { | ||
await singleCb(entityId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these callbacks need to be sequential? Or could a promise.all be used.
} | ||
}; | ||
|
||
const batchSeparateKeys = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jsdoc please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: batchSeparateKeys -> separateKeysByThresholdIntoSingleAndBatches
} | ||
return acc; | ||
}, | ||
{ lessThanThreshold: new Map(), moreThanThreshold: new Map() } as { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you consider a lodash function for this? Not positive but this may already be implemented.
}; | ||
}; | ||
|
||
const groupEntitiesByTotal = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add jsdoc. Please include the "why" of this approach
threshold, | ||
); | ||
|
||
const batchLoop = async ({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add jsdoc. Does batchLoop need to be defined within withBatching
?
Complexity of this logic is rather high imo.
|
||
const query = ` | ||
query ( | ||
$repoIds: [ID!]! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool!
}; | ||
}; | ||
|
||
const processResponseData: ProcessResponse< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What testing/code practices have been used to ensure that the data from single entity requests are the same for batch requests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a few comments in some of the new code. I'm excited to see the improvements with this new approach!
Some additional thoughts and why I did not approve:
There is a larger than average risk to releasing this PR (size and complexity).
What is your deploy/release approach?
Please also consider breaking this PR down into smaller chunks.
A single PR with 10 files is something that can be reviewed and logically understood very easily.
It seems like we could introduce your awesome new pattern in one location and release that first and next, spread the pattern to the other locations.
As we work towards a stable platform, please consider these thoughts.
Let me know if you have additional questions/thoughts about these items I mentioned.
Updated and removed dependencies detected. Learn more about Socket for GitHub ↗︎
🚮 Removed packages: @types/[email protected] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥
🚀 PR was released in |
Main changes:
src/client/GraphQLClient/batchUtils.ts
: withBatching() functionImplemented an algorithm to improve the number of requests made to the Github API (better explained here https://youtu.be/i5pIszu9MeM?feature=shared&t=1292)
Basically we fetch the totalCount of the entity connections (e.g. repository -> issues -> totalCount)
Then, we group the repos that have less than 100 issues and make a single call using the
nodes(ids: [ID!]!) { ...on Repository { issues() {
query.For repos that have more than 100 issues we make a call for each repo using
repository(id) { issues(first: ..., after: ...)
querysrc/client/GraphQLClient/timeoutHandler.ts
: withTimeoutHandler() functionGithub returns a "timeout" error when it sees that the query is going to take more than 10 seconds to process, their recommendation is to lower the
first
argument (limit).This function handle those errors and halves the max limit, if the error keeps happening it keeps dividing the limit until it reaches 1, if that fails as well it throws and stops retrying.
Batched Queries: Created new query files for the batched version of the normal query
Steps: applied the
withBatching
function to the steps that could use it (steps that depend on a parent entity)Execution time difference (with JupiterOne org instance):
Version
Published prerelease version:
v4.0.0-beta.0
Changelog
💥 Breaking Change
🐛 Bug Fix
Authors: 2