Until the documentation is updated: for PageRank, the algorithm used is Personalized PageRank, as originally described here. This means that the higher the spider-defined score of a page, the higher the probability of a random jump to that page.
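To make the random-jump idea concrete, here is a minimal sketch of Personalized PageRank by power iteration. It is not the project's actual implementation: the function name `personalized_pagerank`, the `damping` and `iterations` parameters, and the dict-based graph are all assumptions made for illustration. The only point it shows is that the jump distribution is proportional to the spider-defined scores.

```python
def personalized_pagerank(links, spider_scores, damping=0.85, iterations=50):
    """Illustrative sketch only; names and defaults are hypothetical.

    links: dict mapping page -> list of pages it links to
    spider_scores: dict mapping every page -> non-negative spider score
    """
    pages = sorted(spider_scores)
    total = sum(spider_scores.values())
    # Random-jump distribution: proportional to the spider-defined score.
    jump = {p: spider_scores[p] / total for p in pages}
    rank = dict(jump)
    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps according to `jump`.
        new_rank = {p: (1.0 - damping) * jump[p] for p in pages}
        for p in pages:
            out = links.get(p, [])
            if out:
                # Otherwise the surfer follows one outgoing link at random.
                share = damping * rank[p] / len(out)
                for q in out:
                    new_rank[q] += share
            else:
                # Page without outgoing links: fall back to the jump distribution.
                for q in pages:
                    new_rank[q] += damping * rank[p] * jump[q]
        rank = new_rank
    return rank


# Two pages linking to each other; "a" has the higher spider score.
ranks = personalized_pagerank({"a": ["b"], "b": ["a"]}, {"a": 3.0, "b": 1.0})
```

In this symmetric two-page graph the link structure alone would give both pages the same rank, so the higher rank of `a` comes only from its higher spider score.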
The job of the scheduler is to decide what to do with the final scores (spider score + PageRank/HITS). The BFS scheduler simply picks the highest-scored web page that is still uncrawled. The FreqScheduler ignores this information and simply tries to (re)crawl the pages with the desired frequency: if one page has frequency 8 and another has frequency 2, then the first one is crawled 4 times more often.
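The two behaviours can be sketched as follows. This is a simplified illustration, not the real scheduler code: `pick_next_bfs` and `frequency_schedule` are hypothetical helper names, and the frequency sketch assumes that a page with frequency `f` is crawled at evenly spaced intervals of `1/f` time units.

```python
def pick_next_bfs(final_scores, crawled):
    """Return the highest-scored page that has not been crawled yet."""
    candidates = [page for page in final_scores if page not in crawled]
    if not candidates:
        return None
    return max(candidates, key=lambda page: final_scores[page])


def frequency_schedule(frequencies, duration):
    """Return the crawl order over `duration` time units, ignoring scores.

    Assumption: a page with frequency f is crawled every 1/f time units.
    """
    events = []
    for page, freq in frequencies.items():
        crawls = int(freq * duration)
        events.extend((i / freq, page) for i in range(crawls))
    events.sort()
    return [page for _, page in events]


next_page = pick_next_bfs({"a": 0.2, "b": 0.9, "c": 0.5}, crawled={"b"})
order = frequency_schedule({"x": 8, "y": 2}, duration=1)
```

Here `next_page` is `"c"`, because `"b"` has the highest score but is already crawled, and `order` contains `"x"` 8 times and `"y"` 2 times, matching the 4-to-1 ratio described above.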
Hi, I didn't find any documentation on how the link scores affect/influence their scheduling. It would be nice to understand the relation between:
Thanks