optimize various bits around scan profiles #4050

Merged
21 commits merged into main on Feb 12, 2025


underdarknl
Contributor

@underdarknl underdarknl commented Jan 28, 2025

Changes

I noticed we do a periodic recalculate_scan_profiles, which by itself loops over the list of profiles three times.
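
As a rough illustration of the idea (not the actual Octopoes code), several passes over the same list of profiles can be folded into a single loop. The `ScanProfile` dataclass, its fields, the `"declared"` type string, and the function name `split_declared_profiles` are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanProfile:
    # Hypothetical stand-in for the real scan profile model.
    reference: str
    level: int
    scan_profile_type: str  # assumed values: "declared", "inherited", "empty"


def split_declared_profiles(profiles):
    """Collect declared profiles, their levels and their references in one pass."""
    all_declared_scan_profiles = []
    assigned_scan_levels = {}
    source_scan_profile_references = set()
    for profile in profiles:
        if profile.scan_profile_type != "declared":
            continue
        all_declared_scan_profiles.append(profile)
        assigned_scan_levels[profile.reference] = profile.level
        source_scan_profile_references.add(profile.reference)
    return all_declared_scan_profiles, assigned_scan_levels, source_scan_profile_references


profiles = [
    ScanProfile("Hostname|internet|example.com", 2, "declared"),
    ScanProfile("IPAddressV4|internet|192.0.2.1", 1, "inherited"),
    ScanProfile("URL|internet|example.com", 3, "declared"),
]
declared, levels, references = split_declared_profiles(profiles)
print(len(declared), levels, sorted(references))
```

The result is the same as filtering the list three times, but the list is only walked once per recalculation.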

QA notes

Scan profiles should still be handled the exact same way. The code is functionally the same as before.

Code Checklist

  • All the commits in this PR are properly PGP-signed and verified.
  • This PR only contains functionality relevant to the issue.
  • I have written unit tests for the changes or fixes I made.
  • I have checked the documentation and made changes where necessary.
  • I have performed a self-review of my code and refactored it to the best of my abilities.
  • Tickets have been created for newly discovered issues.
  • For any non-trivial functionality, I have added integration and/or end-to-end tests.
  • I have informed others of any required .env file changes and changed the .env-dist accordingly.
  • I have included comments in the code to elaborate on what is not self-evident from the code itself, including references to issues and discussions online, or implicit behavior of an interface.

Checklist for code reviewers:

Copy-paste the checklist from the docs/source/templates folder into your comment.


Checklist for QA:

Copy-paste the checklist from the docs/source/templates folder into your comment.

@underdarknl underdarknl added the octopoes (Issues related to octopoes) and tech-debt labels Jan 28, 2025
@underdarknl underdarknl requested a review from a team as a code owner January 28, 2025 09:18
@underdarknl
Contributor Author

Further improvements:
Possibly fetch the state of the scan levels only if there have been any changes. Can we select the highest transaction ID from XTDB somehow?
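
A minimal sketch of what such a guard could look like, assuming the latest transaction ID can be obtained cheaply. The class `RecalculationGuard` and the `fetch_latest_tx_id` callable are hypothetical; how the ID is actually read from XTDB is left open here.

```python
from typing import Callable, Optional


class RecalculationGuard:
    """Skip work when the database reports no new transactions."""

    def __init__(self, fetch_latest_tx_id: Callable[[], int]) -> None:
        # fetch_latest_tx_id is a hypothetical callable that returns the
        # highest transaction ID currently known to the database.
        self._fetch_latest_tx_id = fetch_latest_tx_id
        self._last_seen: Optional[int] = None

    def should_recalculate(self) -> bool:
        latest = self._fetch_latest_tx_id()
        if latest == self._last_seen:
            # Nothing was written since the previous check.
            return False
        self._last_seen = latest
        return True


# Simulated transaction IDs: unchanged on the second check, advanced on the third.
tx_ids = iter([7, 7, 8])
guard = RecalculationGuard(lambda: next(tx_ids))
results = [guard.should_recalculate() for _ in range(3)]
print(results)  # [True, False, True]
```

In a real setup the last seen ID should probably only be stored after the recalculation has finished successfully, so that a failed run is retried on the next cycle.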

ammar92
ammar92 previously approved these changes Jan 29, 2025
Contributor

@ammar92 ammar92 left a comment

Nice work! It's clever to build all_declared_scan_profiles and assigned_scan_levels in one pass. One more small optimization would be to create source_scan_profile_references after the loop. You gain a little performance because it's faster to build a set from an existing list (or any other iterable) than to construct it element by element via set.add (e.g. the previous implementation: source_scan_profile_references = {sp.reference for sp in all_declared_scan_profiles}).

Example of benchmark code below.

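import timeit

# Variant 1: fill the list and the set element by element inside one loop.
code1 = """
x = set()
l = []
for i in range(50000):
    l.append(i)
    x.add(i)
"""

print(timeit.timeit(code1, number=2000))  # 2.3204293749295175

# Variant 2: build the list first, then construct the set from it afterwards.
code2 = """
l = [i for i in range(50000)]
x = set([i for i in l])
"""

print(timeit.timeit(code2, number=2000))  # 2.087212207959965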

But the performance gain is negligible and the suggested implementation is already fast and understandable, so I'll leave it up to you.
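
Applied to the profile loop, the suggestion could look roughly like the sketch below. The `Profile` namedtuple and the sample data are hypothetical; only the variable names and the set comprehension are taken from the comment above.

```python
from collections import namedtuple

# Hypothetical minimal profile type; the real model has more fields.
Profile = namedtuple("Profile", ["reference", "level"])

profiles = [Profile("a", 1), Profile("b", 2), Profile("a", 1)]

all_declared_scan_profiles = []
assigned_scan_levels = {}
for profile in profiles:
    # Inside the loop only the list and the level mapping are filled.
    all_declared_scan_profiles.append(profile)
    assigned_scan_levels[profile.reference] = profile.level

# The set of references is derived once, after the loop, which avoids
# a separate set.add method call in every iteration.
source_scan_profile_references = {sp.reference for sp in all_declared_scan_profiles}
print(sorted(source_scan_profile_references))  # ['a', 'b']
```

Whether this is measurably faster depends on the number of profiles; as noted, the difference is small either way.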

@underdarknl
Contributor Author

> Nice work! It's clever to do all_declared_scan_profiles and assigned_scan_levels in one pass. There could be one step more for little optimization, and that is by creating source_scan_profile_references after the loop. You gain a little performance improvement because it's faster to setup a set on an existing list (or any sequence or iterable) than constructing it element by element via set.add (e.g. previous implementation source_scan_profile_references = {sp.reference for sp in all_declared_scan_profiles})

Yes, I was pondering this too, as adding to a set one element at a time is slower than doing an update or mass assignment. There are a bunch of other optimizations that can be done, but I'm focusing first on not running the entire loop when it isn't needed, as that's the main win and would save 90% or more for most installs.

Furthermore, I'm guessing all of this will be undone once the nibbles start taking care of propagation, as they do all this taint tracking in a much more efficient way.

@underdarknl underdarknl changed the title from "optimize loops in recalculate_scan_profiles" to "optimize various bits around scan profiles" Jan 30, 2025
@underdarknl
Contributor Author

The tests are now failing on ill-formatted test references.

@stephanie0x00
Contributor

Checklist for QA:

  • I have checked out this branch, and successfully ran a fresh make reset.
  • I confirmed that there are no unintended functional regressions in this branch:
    • I have managed to pass the onboarding flow
    • Objects and Findings are created properly
    • Tasks are created and completed properly
  • I confirmed that the PR's advertised feature or hotfix works as intended.
  • I checked the logs for errors and/or warnings and made issues where necessary

What works:

I think it works. Did identify some 'weird' things in the propagation, but those seem to already be on main.

What doesn't work:

n/a

Bug or feature?:

n/a

@underdarknl underdarknl added this to the OpenKAT v1.19 milestone Feb 6, 2025
@Donnype Donnype merged commit 83e2f5e into main Feb 12, 2025
33 checks passed
@Donnype Donnype deleted the optimize/recalculate_scan_profiles_loops branch February 12, 2025 15:34
jpbruinsslot added a commit that referenced this pull request Feb 19, 2025
…uler

* main:
  add 1.18 release notes (#4083)
  Combined schedulers (#3839)
  remove inline styling / svg graph as not compatible with out CSP (#4075)
  Hotfix for empty report in history table (#4087)
  optimize various bits around scan profiles (#4050)
jpbruinsslot added a commit that referenced this pull request Feb 27, 2025
* main: (502 commits)
  Fix 'created by' in report and add 'created from recipe' (#4094)
  Remove deprecated queryparams (#4117)
  Pin Ubuntu runners to version `24.04` (#4120)
  Update client.py, reflect earlier changes in katalogus api (#4107)
  Updated `cryptography` (#4121)
  Updated packages (#4114)
  fix task list for boefjes, normalizer and ooi detail (#4115)
  Updated testcase for `Schedule` should result in `schedule_id` of `Task` to be set to `None` (#4104)
  Remove the empty keiko package and container (#4110)
  1.18 release notes improvements (#4109)
  Remove unused queue_uri from boefje settings (#4068)
  add 1.18 release notes (#4083)
  Combined schedulers (#3839)
  remove inline styling / svg graph as not compatible with out CSP (#4075)
  Hotfix for empty report in history table (#4087)
  optimize various bits around scan profiles (#4050)
  Update build-rdo-package.yml (#4081)
  Lock down codeowner edit rights to operations (#4086)
  Translations update from Hosted Weblate (#4085)
  fix logout and styling (#4080)
  ...
Labels
octopoes (Issues related to octopoes), tech-debt
Projects
Status: Done
5 participants