Releases: webrecorder/browsertrix-crawler
Browsertrix Crawler 0.5.0 Beta 8
This release includes the following fixes:
- Improved capture of non-HTML pages, fixes #129
- For scopeType: domain, if the specified URL starts with 'www.', include the non-www version.
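The www handling described above can be illustrated with a minimal Python sketch. This is not the crawler's actual implementation: the function names `domain_scope_hosts` and `in_domain_scope` are hypothetical, and matching by exact hostname is an assumption made to keep the example small.

```python
from urllib.parse import urlsplit


def domain_scope_hosts(seed_url: str) -> set[str]:
    """Return the hostnames treated as in scope for a seed URL (hypothetical helper)."""
    host = urlsplit(seed_url).hostname or ""
    hosts = {host}
    # If the seed starts with "www.", also accept the non-www version.
    if host.startswith("www."):
        hosts.add(host[len("www."):])
    return hosts


def in_domain_scope(seed_url: str, candidate_url: str) -> bool:
    """Check a candidate URL against the seed's hosts (assumption: exact host match)."""
    candidate = urlsplit(candidate_url)
    # Both http and https pages count as in scope.
    if candidate.scheme not in ("http", "https"):
        return False
    return (candidate.hostname or "") in domain_scope_hosts(seed_url)
```

With a seed of https://www.example.com/, this sketch accepts pages on both www.example.com and example.com over either http or https, and rejects pages on unrelated hosts.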
Browsertrix Crawler 0.5.0 Beta 7
Browsertrix Crawler 0.5.0 Beta 6
Fixes include:
- Fix a regression introduced in the previous release, where the check for ERR:NET_ABORTED could cause a null exception.
- Support for downloading profiles via a URL, e.g. --profile https://example.com/path/to/profile.tar.gz
Browsertrix Crawler 0.5.0 Beta 5
- Support for saving state incrementally when saveState: always is set, saving every saveStateInterval seconds and keeping the last saveStateHistory states.
- Make direct capture apply only to 200 responses and load all others (e.g. redirects) via the browser. Print just the error message, not the stack trace, and ignore ERR_ABORTED caused by trying to load a PDF (the file cannot be loaded as a page but is still archived).
- When writing pages, ensure previous page write is awaited.
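The incremental save behavior listed above (save at a fixed interval, keep only the most recent states) can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `IncrementalStateSaver` and its method names are hypothetical, states are kept in memory rather than written to disk, and the caller passes in the current time so the example stays deterministic.

```python
from collections import deque


class IncrementalStateSaver:
    """Keep the last `history_size` states, saved at most once per interval (hypothetical)."""

    def __init__(self, interval_seconds: float, history_size: int):
        self.interval_seconds = interval_seconds    # plays the role of saveStateInterval
        self.history = deque(maxlen=history_size)   # plays the role of saveStateHistory
        self._last_saved_at = None

    def maybe_save(self, state: dict, now: float) -> bool:
        """Save a copy of `state` if the interval has elapsed; return True if saved."""
        if (
            self._last_saved_at is not None
            and now - self._last_saved_at < self.interval_seconds
        ):
            return False
        # deque(maxlen=...) silently drops the oldest entry once it is full.
        self.history.append((now, dict(state)))
        self._last_saved_at = now
        return True
```

A bounded deque is used here so that discarding old states needs no extra bookkeeping; a real implementation would also have to delete the corresponding files.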
Browsertrix Crawler 0.5.0 Beta 4
- Update to py-wacz 0.4.3, which is more tolerant of pages with invalid full-text search data (skips such pages instead of failing WACZ creation)
- Support for scopeType: domain, and include http/https pages in scope by default
Browsertrix Crawler 0.5.0 Beta 3
Various fixes, including:
- Screencasting refactor, support screencast via redis, add new 'init' message
- Support for retrying pending URLs after a limited amount of time
- Redis: load queues gracefully to avoid large redis data load
Browsertrix Crawler 0.5.0 Beta 2
Add support for WACZ signing (experimental), enabled via WACZ_SIGN_URL and WACZ_SIGN_TOKEN env vars.
Browsertrix Crawler 0.5.0 Beta 1
Support for uploading WACZ to S3-compatible storage!
Browsertrix Crawler 0.5.0 Beta 0
Initial Build of 0.5.0 beta for testing!
Browsertrix Crawler 0.4.4
This release includes fixes to the block rules system and README improvements:
- Page Block Rules Fix: Avoid 'request already handled' errors by not adding duplicate handlers to the same page.
- Page Block Rules Fix: await all continue/abort() calls and catch errors.
- Page Block Rules: Don't apply to top-level page, print warning and recommend scope rules instead.
- Setup: Attempt to create the crawl working directory (cwd) specified via --cwd if it doesn't exist.
- Scope Types: Rename 'none' -> 'page' (single page only) and 'page' -> 'page-spa' (page with hashtags).
- README: Add more scope rule examples, clarify distinction between scope rules and block rules.
- README: Update old type -> scopeType, list new scope types.
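For anyone updating existing configurations, the scope type renames above can be expressed as a small lookup. This is a hedged sketch, not part of the crawler: `SCOPE_TYPE_RENAMES` and `migrate_scope_type` are hypothetical names. The point it shows is that the renames must be applied in a single lookup, since applying them one after another would turn 'none' into 'page' and then into 'page-spa'.

```python
# Old scope type name -> new scope type name, per the 0.4.4 release notes.
SCOPE_TYPE_RENAMES = {
    "none": "page",      # single page only
    "page": "page-spa",  # page with hashtags
}


def migrate_scope_type(old_value: str) -> str:
    """Map an old scope type to its new name; other values pass through unchanged."""
    # A single dict lookup avoids chaining 'none' -> 'page' -> 'page-spa'.
    return SCOPE_TYPE_RENAMES.get(old_value, old_value)
```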