Priority does not flow down to child crawls #108

Open
tomdickman opened this issue Jan 20, 2020 · 1 comment

@tomdickman
Contributor

Issue #90 introduced a crawl priority, and issue #99 gave modified courses and course modules a high crawl priority. However, the priority does not currently flow down to child nodes: when a crawled page is scraped for redirect URLs and those URLs are marked for crawling, they do not receive the parent's priority.
It would make sense for the priority given to a crawl to be passed down to any child crawls created as a result of that crawl.

@tomdickman tomdickman self-assigned this Jan 20, 2020
@tomdickman
Contributor Author

While developing the solution, it became clear that flowing the priority down marks all subsequent children (and their children, recursively) with the same priority. This can leave a large number of nodes that are more than one step removed from the original parent being crawled at a high priority.
This is antithetical to the intent of the priority, so we need a way to determine which nodes are parent nodes and apply the priority only to them and their direct children, ignoring the higher priority for subsequent children.
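
To make the problem concrete, here is a minimal sketch of naive inheritance, where every newly discovered URL simply copies the priority of the page that linked to it. It is not the plugin's code: the link graph, the priority constants and `propagate_naively` are all hypothetical names invented for illustration.

```python
from collections import deque

# Hypothetical priority values, for illustration only.
PRIORITY_DEFAULT = 0
PRIORITY_HIGH = 50


def propagate_naively(links, root, root_priority):
    """Give every URL reachable from root the priority of its parent.

    Because each child copies its parent's priority, the root's priority
    ends up on every descendant, however far removed.
    """
    priorities = {root: root_priority}
    queue = deque([root])
    while queue:
        url = queue.popleft()
        for child in links.get(url, []):
            if child not in priorities:
                priorities[child] = priorities[url]
                queue.append(child)
    return priorities


# A modified course links to a module, which links to a top level page,
# which in turn links to unrelated pages.
links = {
    "course": ["module"],
    "module": ["home"],
    "home": ["page-a", "page-b"],
}

result = propagate_naively(links, "course", PRIORITY_HIGH)
high = [url for url, priority in result.items() if priority == PRIORITY_HIGH]
print(high)
```

In this sketch all five URLs end up at the high priority, including pages that are only reachable through the top level page.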

tomdickman pushed a commit that referenced this issue Jan 23, 2020
Add parent priority to child nodes when marking for crawl
tomdickman pushed a commit that referenced this issue Jan 23, 2020
This enhancement aims to flow priority down to direct child nodes only.
Through the implementation of node levels and a level check when
marking a node to be crawled, we only assign a parent priority to a
child node if it is a direct descendant of the original node.
This prevents priority from being passed down recursively. Otherwise,
if for example a child node is a top level node, the priority would
filter down to effectively all nodes, which is undesirable behaviour.
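
The level check described in this commit message could look roughly like the sketch below. It is written under assumptions, not taken from the plugin: `Node`, `mark_for_crawl`, `LEVEL_PARENT` and the priority values are hypothetical names, and the real implementation may store and compare levels differently.

```python
from dataclasses import dataclass

# Hypothetical constants, for illustration only.
PRIORITY_DEFAULT = 0
PRIORITY_HIGH = 50
LEVEL_PARENT = 0  # level of a node that was given a priority directly


@dataclass
class Node:
    url: str
    priority: int = PRIORITY_DEFAULT
    level: int = LEVEL_PARENT


def mark_for_crawl(parent, url):
    """Create a child node for url, found while crawling parent.

    The child inherits the parent's priority only when the parent is an
    original (parent level) node, so the priority stops after one step.
    """
    child = Node(url=url, level=parent.level + 1)
    if parent.level == LEVEL_PARENT:
        child.priority = parent.priority
    return child


course = Node("course", priority=PRIORITY_HIGH)
module = mark_for_crawl(course, "module")  # direct child: inherits
home = mark_for_crawl(module, "home")      # grandchild: does not inherit

print(module.priority, home.priority)
```

Here the direct child keeps the high priority, while the grandchild falls back to the default, even if it happens to be a top level page.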
tomdickman pushed a commit that referenced this issue Jan 23, 2020
brendanheywood pushed a commit that referenced this issue Jan 28, 2020
* issue108: Priority does not flow down to child nodes #108

Add parent priority to child nodes when marking for crawl

* enhancement: add priority to direct child nodes #108

This enhancement aims to flow priority down to direct child nodes only.
Through the implementation of node levels and a level check when
marking a node to be crawled, we only assign a parent priority to a
child node if it is a direct descendant of the original node.
This prevents priority from being passed down recursively. Otherwise,
if for example a child node is a top level node, the priority would
filter down to effectively all nodes, which is undesirable behaviour.

* fix: Add priority check to node

* fix: Remove extra table closing tag in install.xml

* style: remove additional line in upgrade script

* tests: Add unit tests for issue #108

* tests: Add priority provider to test all possible parent priorities