Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update existing lists #144

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Update existing lists #144

wants to merge 4 commits into from

Conversation

bsyk
Copy link
Contributor

@bsyk bsyk commented Dec 1, 2024

To avoid the need to delete all lists and recreate them, we can update existing lists only when their contents had changed.

This processes the diffs between the desired list of domains and the existing lists. Removing entires that are no longer in the desired list and appending any new entries. This prefers to minimize the number of PATCH calls by appending entries to the lists we're already patching for the removals.

The priority for additions is:

  1. Add to lists we're already patching for removals, filling up to LIST_ITEM_SIZE entries.
  2. Add to existing lists with fewer than LIST_ITEM_SIZE entries.
  3. Create a new list.

@bsyk
Copy link
Contributor Author

bsyk commented Dec 1, 2024

This is quite a large diff. Happy to iterate if there are things you want to change.

@vietthedev
Copy link
Contributor

I'm not sure list patching can work reliably. A domain can be in another list instead of the previous list after blocklist update and would mess up the patching process.

For example: We have two lists and the first one is full. If a new domain is added to the first list after the blocklist update, the last domain would end up being in the second list. We wouldn't be able to check its existence without going through all the current lists.

Before blocklist update

1st list
domainA
domainAA
...
domainB

2nd list
domainC
domainD

After blocklist update

1st list
domainA
domainAA
domainAAA
...

2nd list
domainB
domainC
domainD

Things become even more complicated if the next update is from a different set of blockklists.

Therefore, to make it work reliably, we can only go through all the current lists but that would defeat the purpose to save time and requests because there isn't an API to get all items of all the lists.

@bsyk
Copy link
Contributor Author

bsyk commented Dec 3, 2024

This change does check all existing lists. It is a reliable operation and idempotent such that should an error occur, the same command can be used again to complete any missing changes.

The benefits of this change are not only (slightly) faster operations, but avoiding any period where the lists or rules are unapplied which would leave the network and users vulnerable to accessing otherwise blocked hosts.

The maximum requests in this flow is 2x number_of_lists (GET + PATCH to every list) in the worst case.
The normal case is > 1x < 2x number_of_lists where it would be unusual to have to patch every list.

The requests in the current flow is 2x number_of_lists (DELETE + POST to recreate every list) in the normal case.

To avoid the need to delete all lists and recreate them, we can update existing lists only when their contents had changed.
This processes the diffs between the desired list of domains and the existing lists. Removing entires that are no longer in the desired lists and appending any new entries. This prefers to minimize the number of PATCH calls by appending entries to the lists we're already patching for the removals.
The priority for additions is:
1. Add to lists we're already patching for removals.
2. Add to existing lists with fewer than LIST_ITEM_SIZE entries.
3. Create a new list.
@kieranbrown
Copy link

I've been using this PR and the performance difference is pretty huge, using only OISD small it takes the average runtime from 2m45s to about 30s. (Comparing against a full delete+create & this PR to compare+patch).

I have came across a bug though, the script errors when no lists currently exist, it only works where at least 1 list already exists.

Example workflow:
https://github.com/kieranbrown/dns/actions/runs/12520357125/job/34925911614

file:///home/runner/work/dns/dns/lib/api.js:78
  const cgpsLists = lists.filter(({ name }) => name.startsWith("CGPS List"));
                          ^

TypeError: Cannot read properties of null (reading 'filter')
    at synchronizeZeroTrustLists (file:///home/runner/work/dns/dns/lib/api.js:78:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async file:///home/runner/work/dns/dns/cf_list_create.js:143:3

@bsyk
Copy link
Contributor Author

bsyk commented Dec 30, 2024 via email

@kieranbrown
Copy link

Thanks for the feedback and bug report. Will get that bug patched today.

Can confirm that issue is resolved 🙏

I've just encountered another issue when adding a new blocklist. Running a full delete+create fixed the issue so it seems to be related to adding a new Cloudflare list during synchronisation.

Should be able to reproduce by starting off with a small blocklist, let it create at least 1 Cloudflare list, then add another blocklist causing it to create more Cloudflare lists.

Example workflow:
https://github.com/kieranbrown/dns/actions/runs/12583072895/job/35069887006

Could not create "CGPS List - Chunk NaN" - Error: undefined - Error: HTTP error! Status: 409
file:///home/runner/work/dns/dns/lib/helpers.js:59
    throw new Error(`${(data && 'errors' in data) ? data.errors[0].message : data} - ${error}`);
          ^

Error: undefined - Error: HTTP error! Status: 409
    at request (file:///home/runner/work/dns/dns/lib/helpers.js:59:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async createZeroTrustListsOneByOne (file:///home/runner/work/dns/dns/lib/api.js:188:7)
    at async synchronizeZeroTrustLists (file:///home/runner/work/dns/dns/lib/api.js:169:5)
    at async file:///home/runner/work/dns/dns/cf_list_create.js:[143](https://github.com/kieranbrown/dns/actions/runs/12583072895/job/35069887006#step:8:144):3
    ```

Where there are 0 existing CGPS List Chunks, but other existing list(s), the calculation for the next list number would result in NaN
@bsyk
Copy link
Contributor Author

bsyk commented Jan 2, 2025

Apologies for this oversight and thanks for the continued testing and feedback.
Patched the issue in the above commit.

@mrrfv
Copy link
Owner

mrrfv commented Jan 2, 2025

Magnificent work; thank you for being patient and quickly fixing new problems as they come up. Because there are so many moving parts involved in this process, I'd continue testing the new code further before merging.
I've noticed 3 minor issues so far:

  1. Switching from a large blocklist (e.g., the current default) to a smaller one (this to be exact) causes the script to remove all items from some lists; this means a lot of lists end up completely empty. Such lists should probably be removed, as long as that wouldn't cause additional issues.
  2. After the same operation, some lists only contain a handful of items and could be consolidated into one for simplicity.
  3. If there are only additions to a list, the log entry will end with a comma, which ends up looking awkward:
    console.log(`Updating list "${cgpsLists.find(list => list.id === listId).name}", ${appends ? `${appends} additions, ` : ''}${removals ? `${removals} removals` : ''}`);

Aside from what I've mentioned, it works beautifully. Thanks again.

@bsyk
Copy link
Contributor Author

bsyk commented Jan 3, 2025

Fixed your 3. comment since it's an easy change.

For 1. and 2. the change would be more intrusive and other than organization, doesn't change the outcome much.
I think we can address that in a separate PR and possibly as a separate functionality. Some kind of "defrag".
There would be benefits to keeping lists more stable, i.e. hosts that have been consistently in the block list should be grouped into lists together so that any adds/removes affect as few lists as possible.

@kieranbrown
Copy link

I’m still coming across the Chuck NaN problem when the script wants to create a new list

https://github.com/kieranbrown/dns/actions/runs/12702722844/job/35409402648

@bsyk
Copy link
Contributor Author

bsyk commented Jan 10, 2025 via email

@umpire7777777
Copy link

umpire7777777 commented Jan 14, 2025

I would like to test this out myself.
After merging this branch with my own repo, what do I need to change in order to try this? Do I need to update the action in workflow/main.yml?

ps: I'm very sorry to hear you are impacted by the CA fires. That is terrible and I wish you all the best.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants