Improve fragment handling #35

Open
AMDmi3 opened this issue Sep 26, 2022 · 0 comments

AMDmi3 commented Sep 26, 2022

Handle the case where there are a lot of links which differ only in the fragment part, such as nix package recipe URLs for Haskell modules which all come from a single file (many URLs differing only by line number):

https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/development/haskell-modules/hackage-packages.nix#L23075

It doesn't make sense to check them all, as they are in fact a single URL.

Possible solutions:

  • Simple: when checking, group links by fragmentless URL. This allows a huge batch of new URLs to be processed quickly, but it is not an effective long-term solution, as link recheck times are random and few URLs will end up in a single batch for recheck.
  • Instead of checking, take the status from the fragmentless URL. However, we may not have one in the database (to have one, it must be mentioned explicitly in some package, which is not the case for the example above). We could add such a URL, but to work cleanly this would require a more complicated refcounting mechanism, as links would reference other links (that would be useful for redirects too, however).
  • Strip fragments at the package level, i.e. keep fragmentless URLs in the links database and keep the fragment part in the package.
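A minimal sketch of the first ("simple") option, assuming a batch of URLs and a checker callable are already available. The names `group_by_fragmentless_url`, `check_batch` and `check_url` are hypothetical and not taken from the linkchecker code; only `urllib.parse.urldefrag` is a real standard library call.

```python
from collections import defaultdict
from urllib.parse import urldefrag


def group_by_fragmentless_url(urls):
    """Map each fragmentless URL to the list of original URLs sharing it."""
    groups = defaultdict(list)
    for url in urls:
        # urldefrag splits "page#frag" into ("page", "frag")
        base, _fragment = urldefrag(url)
        groups[base].append(url)
    return dict(groups)


def check_batch(urls, check_url):
    """Check each distinct fragmentless URL once and share the status.

    check_url is a hypothetical callable: it takes a URL, returns a status.
    """
    statuses = {}
    for base, members in group_by_fragmentless_url(urls).items():
        status = check_url(base)  # one request per fragmentless URL
        for url in members:
            statuses[url] = status
    return statuses
```

With this grouping, a batch holding thousands of `hackage-packages.nix#L…` links would trigger a single check. The saving only applies to URLs that land in the same batch, which is the long-term limitation noted in the first option.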
AMDmi3 added a commit that referenced this issue Sep 26, 2022
Handle the case where there are a lot of links which differ only in
the fragment part, such as nix package recipe URLs for Haskell modules
which all come from a single file:

https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/development/haskell-modules/hackage-packages.nix#L23075

This is not really a complete solution, as its efficiency drops once
link rechecks become evenly distributed over time, but it allows new
batches of URLs to be processed much quicker.
AMDmi3 added a commit that referenced this issue Nov 25, 2022
…)"

This reverts commit 08bbb27.

After commit 83b6a17 in repology-updater, we no longer have fragments in
links table, so we no longer need this optimization.
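Keeping fragments out of the links table amounts to splitting each URL into a fragmentless part and a fragment. The following is only an illustration of that split, not the actual repology-updater change; `split_fragment` and `restore_fragment` are hypothetical helper names.

```python
from urllib.parse import urldefrag


def split_fragment(url):
    """Return (fragmentless_url, fragment); fragment is '' if absent."""
    base, fragment = urldefrag(url)
    return (base, fragment)


def restore_fragment(base, fragment):
    """Rebuild the full URL from the stored link and the package's fragment."""
    return f"{base}#{fragment}" if fragment else base
```

Under this scheme the fragmentless part would be the key in the links table, while the fragment stays with the package and is reattached only when the link is displayed.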