Page Finder

This module detects which links on a web page are pagination links. You mark at least one link on the page as a pagination link by hand; the algorithm then uses label propagation, with a Gaussian kernel over the Levenshtein edit distance as the similarity measure, to decide which of the other links are also pagination links. A small demo is included to show how to use and test it.
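The paragraph above names the ingredients but not the exact formulas, so the following is a minimal sketch under stated assumptions, not the code in page_finder.py. It assumes links are compared as raw URL strings, that the kernel is exp(-d^2 / (2 sigma^2)) with d the Levenshtein distance, and that propagation uses the label-spreading update F <- alpha * P F + (1 - alpha) * Y, where P is the row-normalized similarity matrix and Y marks the hand-labelled links. The function names levenshtein, gaussian_similarity and propagate, and the values sigma=2.0 and alpha=0.9, are hypothetical choices for illustration. Only the standard library is used; numpy, which the project depends on, would let the same update be written as matrix products.

```python
import math
from typing import List, Sequence, Set


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep the DP row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def gaussian_similarity(a: str, b: str, sigma: float) -> float:
    """Assumed Gaussian kernel over edit distance: exp(-d^2 / (2 sigma^2))."""
    d = levenshtein(a, b)
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def propagate(links: Sequence[str], seeds: Set[int], sigma: float = 2.0,
              alpha: float = 0.9, tol: float = 1e-9,
              max_iter: int = 1000) -> List[float]:
    """Hypothetical label spreading: F <- alpha * P F + (1 - alpha) * Y."""
    n = len(links)
    # Similarity matrix W without self-loops, then row-normalized into P.
    p = [[0.0 if i == j else gaussian_similarity(links[i], links[j], sigma)
          for j in range(n)] for i in range(n)]
    for row in p:
        total = sum(row)
        if total > 0:
            row[:] = [w / total for w in row]
    y = [1.0 if i in seeds else 0.0 for i in range(n)]  # hand-marked links
    f = y[:]
    for _ in range(max_iter):
        new_f = [alpha * sum(w * fj for w, fj in zip(p[i], f)) + (1 - alpha) * y[i]
                 for i in range(n)]
        delta = max((abs(u - v) for u, v in zip(new_f, f)), default=0.0)
        f = new_f
        if delta < tol:  # the update is a contraction, so this terminates
            break
    return f


if __name__ == "__main__":
    links = [
        "https://news.ycombinator.com/news?p=2",  # marked by hand as pagination
        "https://news.ycombinator.com/news?p=3",
        "https://news.ycombinator.com/news",
        "https://news.ycombinator.com/newest",
        "https://news.ycombinator.com/jobs",
        "https://news.ycombinator.com/ask",
    ]
    scores = propagate(links, seeds={0})
    for score, link in sorted(zip(scores, links), reverse=True):
        print(f"{score:.3f}  {link}")
```

With news?p=2 as the only seed, news?p=3 (one edit away) should rank above /jobs and /ask in this sketch. The restart term alpha is a deliberate choice: a variant that clamps the seeds and has no negative examples tends to push every link in a connected similarity graph toward 1, whereas label spreading lets scores decay with dissimilarity.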

Install

python setup.py develop

Dependencies: numpy and scrapely

pip install -r requirements.txt

Demo

python demo.py https://news.ycombinator.com

Enter link to follow (tab autocompletes): news?<TAB>
Enter link to follow (tab autocompletes): https://news.ycombinator.com/news?p=2 <RET>

0) Quit
1) Enter link directly
2) https://news.ycombinator.com/news?p=3
3) https://news.ycombinator.com/news
4) https://news.ycombinator.com/newest
5) https://news.ycombinator.com/jobs
6) https://news.ycombinator.com/ask
Select link to follow: 
2 <RET>

plafl/page_finder

Folders and files

Latest commit

History

Repository files navigation

Page Finder

Install

Demo

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages