Haikson/sitemap-generator

Latest commit: a6413b9 (Jul 8, 2023)

pysitemap

Sitemap generator

installing

pip install sitemap-generator

requirements

asyncio
aiofile
aiohttp

example

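import logging
import os
import sys

from pysitemap import crawler
from pysitemap.parsers.lxml_parser import Parser

if __name__ == '__main__':
    # Without this, the logging.info() call below prints nothing
    logging.basicConfig(level=logging.INFO)

    # Optional: pass --iocp on Windows to use the IOCP-based proactor event loop
    if '--iocp' in sys.argv:
        import asyncio
        sys.argv.remove('--iocp')
        logging.info('using iocp')
        asyncio.set_event_loop(asyncio.ProactorEventLoop())  # Windows only

    # root_url = sys.argv[1]
    root_url = 'https://www.haikson.com'
    os.makedirs('debug', exist_ok=True)  # make sure the output directory exists
    crawler(
        root_url,
        out_file='debug/sitemap.xml',
        exclude_urls=[".pdf", ".jpg", ".zip"],  # URL patterns to leave out
        http_request_options={"ssl": False},    # ssl=False skips certificate verification
        parser=Parser,
    )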
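A quick way to spot-check the generated file is to list its `<loc>` entries. This is a minimal sketch that assumes the output follows the standard sitemaps.org `<urlset>` format; `sitemap_urls` is a hypothetical helper, not part of pysitemap, and the sample URLs are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import List

# XML namespace of the sitemaps.org protocol (assumed to be what the generator emits).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(root: ET.Element) -> List[str]:
    """Hypothetical helper: return the text of every <url><loc> under a <urlset> root."""
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


if __name__ == "__main__":
    # For a real run: sitemap_urls(ET.parse("debug/sitemap.xml").getroot())
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://www.example.com/</loc></url>"
        "<url><loc>https://www.example.com/about/</loc></url>"
        "</urlset>"
    )
    print(sitemap_urls(ET.fromstring(sample)))
    # → ['https://www.example.com/', 'https://www.example.com/about/']
```

Passing a parsed root element (rather than a string) lets `ET.parse` read the file as bytes, so the XML declaration's encoding is handled by the parser.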

TODO

  • Big sites with more than 100K pages use more than 100 MB of memory. Move the queue and done lists into a database by writing Queue and Done backend classes based on:
      • lists
      • an SQLite database
      • Redis
  • Write an API that lets users plug in their own backends.
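The backend idea in this TODO can be sketched as two small interfaces, one for the queue of URLs still to fetch and one for the set of URLs already crawled, each with an in-memory and an SQLite implementation. All class and function names here (`QueueBackend`, `DoneBackend`, `ListQueue`, `ListDone`, `SQLiteQueue`, `SQLiteDone`, `crawl_order`) are hypothetical and not part of pysitemap, and `crawl_order` is a simplified synchronous breadth-first walk over an in-memory link graph rather than the library's aiohttp crawler.

```python
import sqlite3
from abc import ABC, abstractmethod
from collections import deque
from typing import Dict, List, Optional


class QueueBackend(ABC):
    """Hypothetical interface: URLs waiting to be fetched (FIFO)."""

    @abstractmethod
    def put(self, url: str) -> None: ...

    @abstractmethod
    def get(self) -> Optional[str]:
        """Remove and return the oldest URL, or None if the queue is empty."""


class DoneBackend(ABC):
    """Hypothetical interface: URLs that have already been crawled."""

    @abstractmethod
    def add(self, url: str) -> None: ...

    @abstractmethod
    def __contains__(self, url: str) -> bool: ...


class ListQueue(QueueBackend):
    """In-memory queue: everything stays in RAM."""

    def __init__(self) -> None:
        self._items: deque = deque()

    def put(self, url: str) -> None:
        self._items.append(url)

    def get(self) -> Optional[str]:
        return self._items.popleft() if self._items else None


class ListDone(DoneBackend):
    def __init__(self) -> None:
        self._seen: set = set()

    def add(self, url: str) -> None:
        self._seen.add(url)

    def __contains__(self, url: str) -> bool:
        return url in self._seen


class SQLiteQueue(QueueBackend):
    """Queue stored in an SQLite table; AUTOINCREMENT ids keep FIFO order."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS queue "
                     "(id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT NOT NULL)")

    def put(self, url: str) -> None:
        self._conn.execute("INSERT INTO queue (url) VALUES (?)", (url,))

    def get(self) -> Optional[str]:
        row = self._conn.execute("SELECT id, url FROM queue ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return None
        self._conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        return row[1]


class SQLiteDone(DoneBackend):
    """Crawled URLs stored in an SQLite table with the URL as primary key."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS done (url TEXT PRIMARY KEY)")

    def add(self, url: str) -> None:
        self._conn.execute("INSERT OR IGNORE INTO done (url) VALUES (?)", (url,))

    def __contains__(self, url: str) -> bool:
        query = "SELECT 1 FROM done WHERE url = ?"
        return self._conn.execute(query, (url,)).fetchone() is not None


def crawl_order(start: str, links: Dict[str, List[str]],
                queue: QueueBackend, done: DoneBackend) -> List[str]:
    """Breadth-first walk over a link graph; any backend pair can be plugged in."""
    order: List[str] = []
    queue.put(start)
    while (url := queue.get()) is not None:
        if url in done:
            continue  # the same URL may have been queued twice
        done.add(url)
        order.append(url)
        for link in links.get(url, []):
            if link not in done:
                queue.put(link)
    return order
```

Swapping `ListQueue()`/`ListDone()` for `SQLiteQueue(conn)`/`SQLiteDone(conn)` leaves the crawl loop unchanged; with an on-disk database (for example `sqlite3.connect("crawl.db")`) you would also call `conn.commit()` periodically. A Redis backend could implement the same two interfaces.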

changelog
