
Setup benchmarks at CI/CD #560

Closed
norberttech opened this issue Oct 10, 2023 · 0 comments
Assignees: stloyd
Labels: ci/cd, Developer Experience
Milestone: 0.5.0

norberttech (Member) commented Oct 10, 2023

As the project grows, it becomes easier to miss performance degradations, as happened in #558.

We should create a set of benchmarks for each adapter/core/lib and run them in CI/CD so we can at least manually check which PRs introduce bottlenecks.
The ideal solution would be to store those benchmark results as workflow artifacts after each merge to 1.x and compare them with the benchmarks of newly opened PRs; a sketch of how that comparison could work follows.
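
The issue doesn't name a tool, but assuming PHPBench were picked (an assumption, not something decided here), its tag/ref mechanism maps directly onto the artifact idea: a run after merging to 1.x stores tagged results, phpbench's local storage directory gets uploaded as a workflow artifact, and a PR run compares itself against the downloaded baseline. A minimal sketch:

```sh
# After merge to 1.x: run the suite and tag the results, then upload
# phpbench's local storage directory as a workflow artifact.
vendor/bin/phpbench run --tag=baseline --progress=none

# On a newly opened PR: download that artifact first, then run the same
# suite against the stored baseline to surface per-benchmark regressions.
vendor/bin/phpbench run --ref=baseline --report=aggregate
```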


I was thinking about creating separate benchmarks for specific building blocks, for example:

  1. Extractors - we could come up with a dataset schema, save it in all supported file formats, and benchmark extraction alone, without performing any operations on the dataset (see the sketch after this list).
  2. Transformers - since we reduced the number of transformers, keeping only the critical ones, we might want to start with the most frequently used, like the one that evaluates expressions. We can take a similar approach here, but instead of using extractors, we can pass prepared Rows directly and measure the performance of the transformations themselves.
  3. Expressions - just like with Transformers, but here we don't even need Rows; a single Row should be enough.
  4. Loaders - similarly to Transformers, prepare Rows and load them into the destination directly.
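
To make ideas 1 and 3 concrete, here is a minimal PHPBench-style sketch. Only the `PhpBench\Attributes` classes are the real PHPBench API; `CsvExtractor`, `Row::create()`, `int_entry()`, `str_entry()`, `ref()` and `lit()` approximate Flow's building blocks, and their exact names and signatures are assumptions rather than the final harness:

```php
<?php

declare(strict_types=1);

namespace Flow\ETL\Benchmark;

use PhpBench\Attributes\Iterations;
use PhpBench\Attributes\Revs;

final class BuildingBlocksBench
{
    #[Revs(5)]
    #[Iterations(3)]
    public function bench_csv_extraction() : void
    {
        // (1) Extractors: drain the generator so the full extraction cost
        // is measured, without applying any operations to the dataset.
        $extractor = new CsvExtractor(__DIR__ . '/Fixtures/dataset_10k.csv');

        foreach ($extractor->extract() as $rows) {
            // no-op, we only measure extraction itself
        }
    }

    #[Revs(1000)]
    #[Iterations(5)]
    public function bench_expression_eval() : void
    {
        // (3) Expressions: a single prepared Row is enough; evaluate the
        // expression directly, bypassing extractors and transformers.
        // (In a real suite the Row would be built in a #[BeforeMethods]
        // hook so that only eval() itself is measured.)
        $row = Row::create(int_entry('id', 1), str_entry('name', 'flow'));

        ref('id')->plus(lit(1))->eval($row);
    }
}
```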

Those are very granular benchmarks that test each building block in isolation, providing clear insights about every element. On top of that, I would still benchmark entire pipelines on a selected subset of the most frequently used extractors/loaders/transformers (we would need to develop a few scenarios here); one such scenario is sketched below.
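
Sketching that pipeline-level idea under the same caveats: the fluent `read()`/`withEntry()`/`write()`/`run()` chain below mirrors the general shape of Flow's API, but the exact DSL calls are assumptions.

```php
<?php

declare(strict_types=1);

namespace Flow\ETL\Benchmark;

use PhpBench\Attributes\Iterations;
use PhpBench\Attributes\Revs;

final class PipelineBench
{
    #[Revs(3)]
    #[Iterations(3)]
    public function bench_csv_to_parquet_scenario() : void
    {
        // One end-to-end scenario: extract from CSV, apply a frequently
        // used transformation, load to Parquet, measuring the full pipeline.
        (new Flow())
            ->read(new CsvExtractor(__DIR__ . '/Fixtures/dataset_10k.csv'))
            ->withEntry('id_plus_one', ref('id')->plus(lit(1)))
            ->write(new ParquetLoader(\sys_get_temp_dir() . '/dataset.parquet'))
            ->run();
    }
}
```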

@norberttech norberttech converted this from a draft issue Oct 10, 2023
@stloyd stloyd moved this from Todo to In Progress in Roadmap Oct 14, 2023
@stloyd stloyd self-assigned this Oct 16, 2023
@stloyd stloyd added the ci/cd and Developer Experience labels Oct 16, 2023
@stloyd stloyd changed the title Setup benchmakrs at CI/CD Setup benchmarks at CI/CD Oct 16, 2023
@norberttech norberttech moved this from In Progress to Done in Roadmap Oct 25, 2023
@norberttech norberttech added this to the 0.5.0 milestone Nov 6, 2023