Courlan does not load /page/ links #16

Open
sbusso opened this issue Mar 6, 2023 · 3 comments

sbusso commented Mar 6, 2023

Regarding the navigation filter: courlan will not extract links containing a /page/ path. Also, I think page and tag|category should be handled separately. I need to get all blog posts on my website, which are paginated, but I don't want to get tags and categories.

Owner

adbar commented Mar 6, 2023

Hi @sbusso, I'm not sure what you mean regarding the /page/ pattern; maybe it's a documentation issue. I added tests. Could you please look at the commit above and see if you can make the code work in your case, or provide more details?

The separation of pagination from the rest makes sense, I'll think about how it could be implemented.

Author

sbusso commented Mar 6, 2023

URLs containing /page/1 or /page/2 won't be extracted by extract_links unless with_nav=True is set, but that option also includes other index pages such as tags and categories. I think page, and maybe archives, could be separated out or exposed as extended options.

NAVIGATION_FILTER = re.compile(
r"/(archives|auth?or|cat|category|kat|kategorie|page|schlagwort|seite|tags?|topics?|user)/|\?p=[0-9]+",
re.IGNORECASE,
)
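
To make the request concrete, here is a minimal sketch of what separating pagination from the other index pages could look like. It only uses the standard library and the pattern quoted above. The names PAGINATION_FILTER, TAXONOMY_FILTER and keep_link are hypothetical and not part of courlan, and putting ?p= in the pagination group is an assumption on my side.

```python
import re

# Pattern as quoted in this issue.
NAVIGATION_FILTER = re.compile(
    r"/(archives|auth?or|cat|category|kat|kategorie|page|schlagwort|seite|tags?|topics?|user)/|\?p=[0-9]+",
    re.IGNORECASE,
)

# Hypothetical split of the pattern above (not courlan code).
# Assumption: "page", "seite" and "?p=" are treated as pagination.
PAGINATION_FILTER = re.compile(r"/(page|seite)/|\?p=[0-9]+", re.IGNORECASE)
TAXONOMY_FILTER = re.compile(
    r"/(archives|auth?or|cat|category|kat|kategorie|schlagwort|tags?|topics?|user)/",
    re.IGNORECASE,
)


def keep_link(url, with_pagination=False, with_taxonomy=False):
    """Hypothetical helper: decide whether a URL passes the split filters."""
    # Check taxonomy first so that e.g. /tag/x/page/2/ stays excluded
    # unless taxonomy pages are explicitly requested.
    if TAXONOMY_FILTER.search(url):
        return with_taxonomy
    if PAGINATION_FILTER.search(url):
        return with_pagination
    return True


urls = [
    "https://example.org/blog/page/2/",
    "https://example.org/tag/python/",
    "https://example.org/category/news/",
    "https://example.org/2023/03/some-post/",
]

# Keep paginated blog indexes and posts, drop tags and categories.
kept = [u for u in urls if keep_link(u, with_pagination=True)]
print(kept)
```

With with_pagination=True only the /page/2/ index and the post URL remain in this example, which is the behaviour I am after.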

Owner

adbar commented Mar 7, 2023

I see, let's keep an eye on that.
