Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

services/horizon: Remove --parallel-job-size config parameter used for reingestion. #5484

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

urvisavla
Copy link
Contributor

@urvisavla urvisavla commented Oct 3, 2024

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've reviewed the changes in this PR and if I consider them worthwhile for being mentioned on release notes then I have updated the relevant CHANGELOG.md within the component folder structure. For example, if I changed horizon, then I updated (services/horizon/CHANGELOG.md. I add a new line item describing the change and reference to this PR. If I don't update a CHANGELOG, I acknowledge this PR's change may not be mentioned in future release notes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

  • Removed the --parallel-job-size config parameter.
  • buffer_size parameter is capped to the job/range size.

Why

Fixes #5468

Known limitations

This could potentially disrupt any automation scripts using --parallel-job-size parameter, although it's unlikely since reingestion is generally run as a batch job on as needed basis.

@urvisavla urvisavla marked this pull request as ready for review October 3, 2024 23:47
assert.Eventually(t, func() bool { return len(ledgerBuffer.ledgerQueue) == 15 }, time.Second*1, time.Millisecond*50)
assert.NoError(t, err)

for i := uint32(0); i < endLedger; i++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does i start at 0 instead of startLedger?

func calculateParallelLedgerBatchSize(rangeSize uint32, workerCount uint) uint32 {
// let's try to make use of all the workers
batchSize := rangeSize / uint32(workerCount)

// Use a minimum batch size to make it worth it in terms of overhead
Copy link
Contributor

@tamirms tamirms Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should there be a maximum batch size as well? I remember you mentioned that it could be helpful to have a maximum batch size because the ledger ingestion time varies based on whether the ledger is from recent history vs very old history. So, in the scenario where you want to reingest full history with, for example, 4 workers, the workers which handle the first half of history will finish first and be idle for a long time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like adding max batch size here would be trying to address a larger concern of maximizing worker pool throughput(minimizing idle workers) which seems like more scope than this function is meant for, it's a good point on performance, wondering if it warrants a separate feature ticket to investigate how constant worker throughput could be accomplished?

one wild thought, workers could be interruptable, so that an idle worker could interrupt one that is running, and take some of it's upper range, the worker would stop it's captive core when it receives lcm for it's adjusted 'to' range in any case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

services/horizon: Remove --parallel-job-size config parameter
3 participants