webrecorder / browsertrix-crawler Public

Releases: webrecorder/browsertrix-crawler

Browsertrix Crawler 0.5.0 Beta 8

23 Mar 01:08

ikreymer

0.5.0-beta.8

7ed5586

Pre-release

This release includes the following fixes:

Improved capture of non-HTML pages, fixes #129
For scopeType: domain, if the specified URL starts with www., also include the non-www version.
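
The www. handling for scopeType: domain can be illustrated with a minimal sketch. This is not the crawler's actual code: the function names are hypothetical, and treating subdomains and both http and https as in scope is an assumption made for the illustration.

```python
from urllib.parse import urlsplit


def domain_for_scope(seed_url: str) -> str:
    """Return the seed's host with a leading 'www.' dropped (hypothetical helper)."""
    host = urlsplit(seed_url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def in_domain_scope(seed_url: str, candidate_url: str) -> bool:
    """Check whether candidate_url falls in the seed's domain scope.

    Assumption: any http/https URL on the bare domain or one of its
    subdomains (including www.) counts as in scope.
    """
    parts = urlsplit(candidate_url)
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname or ""
    domain = domain_for_scope(seed_url)
    return host == domain or host.endswith("." + domain)
```

With a seed of https://www.example.com/, this sketch accepts https://example.com/page as well as http://www.example.com/a, and rejects unrelated hosts.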

Browsertrix Crawler 0.5.0 Beta 7

18 Mar 18:50

ikreymer

0.5.0-beta.7

09082e8

Pre-release

This beta includes the following fixes:

Refactor Chrome args and disable LazyFrameLoading to avoid page.goto() never finishing.
Fix userAgent customization not working, #90
Fix possible Cloudflare wait, #110
Tweak profile creation, support running with pywb proxy
Update wacz dependency to 0.4.4

Browsertrix Crawler 0.5.0 Beta 6

14 Mar 21:45

ikreymer

0.5.0-beta.6

12d96f2

Pre-release

Fixes include:

Fix a regression introduced in the previous release, where the check for ERR:NET_ABORTED could cause a null exception.
Support for downloading profiles via a URL, e.g. --profile https://example.com/path/to/profile.tar.gz

Browsertrix Crawler 0.5.0 Beta 5

14 Mar 18:15

ikreymer

0.5.0-beta.5

ab096cd

Pre-release

Support for saving state incrementally when saveState: always is set, saving every saveStateInterval seconds, keeping the last saveStateHistory states.
Apply direct capture only to 200 responses; load all others (e.g. redirects) via the browser. Print only the error message, not the stack trace, and ignore ERR_ABORTED errors caused by trying to load a PDF (the file cannot be loaded as a page but is still archived).
When writing pages, ensure previous page write is awaited.
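
The interval and history behavior described for saveState: always can be sketched as follows. This is an illustration only, not the crawler's implementation: the class name, the JSON file format, the file naming, and the choice to save immediately on the first call are all assumptions. The interval_seconds and history arguments stand in for saveStateInterval and saveStateHistory.

```python
import json
import time
from pathlib import Path


class IncrementalStateSaver:
    """Hypothetical saver: write state at most once per interval, keep the last N files."""

    def __init__(self, directory, interval_seconds, history, clock=time.monotonic):
        if history < 1:
            raise ValueError("history must be at least 1")
        self.directory = Path(directory)
        self.interval_seconds = interval_seconds
        self.history = history
        self.clock = clock  # injectable so the timing can be tested
        self._last_save = None
        self._counter = 0
        self.directory.mkdir(parents=True, exist_ok=True)

    def maybe_save(self, state) -> bool:
        """Save state if the interval has elapsed; return True if a file was written."""
        now = self.clock()
        if self._last_save is not None and now - self._last_save < self.interval_seconds:
            return False
        self._counter += 1
        path = self.directory / f"state-{self._counter:06d}.json"
        path.write_text(json.dumps(state))
        self._last_save = now
        self._prune()
        return True

    def _prune(self) -> None:
        """Delete all but the newest `history` state files."""
        saved = sorted(self.directory.glob("state-*.json"))
        for old in saved[:-self.history]:
            old.unlink()
```

Zero-padded counters keep the lexicographic sort of file names in save order, which is what makes the pruning step a simple slice.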

Browsertrix Crawler 0.5.0 Beta 4

07 Mar 17:30

ikreymer

0.5.0-beta.4

affa45a

Pre-release

Update to py-wacz 0.4.3, which is more tolerant of pages with invalid full text search data (skips such pages instead of failing WACZ creation).
Support scopeType: domain, and include both http and https pages in scope by default.

Browsertrix Crawler 0.5.0 Beta 3

02 Mar 21:30

ikreymer

0.5.0-beta.3

e160382

Pre-release

Various fixes, including:

Screencasting refactor, support screencast via redis, add new 'init' message
Support for retrying pending URLs after a limited amount of time
Redis: load queues gracefully to avoid large redis data load
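
The idea of retrying pending URLs after a limited amount of time can be sketched with an in-memory stand-in. The crawler itself keeps its queues in Redis, so this is only an illustration of the timeout logic; the class and method names are hypothetical.

```python
import time


class PendingTracker:
    """Hypothetical tracker: URLs pending longer than the timeout become retryable."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock  # injectable so the timing can be tested
        self._started = {}  # url -> time the URL was marked pending

    def mark_started(self, url: str) -> None:
        """Record that a worker has begun loading this URL."""
        self._started[url] = self.clock()

    def mark_finished(self, url: str) -> None:
        """Record that the URL completed, so it is no longer pending."""
        self._started.pop(url, None)

    def take_expired(self) -> list:
        """Remove and return URLs that have been pending for at least the timeout."""
        now = self.clock()
        expired = [
            url for url, started in self._started.items()
            if now - started >= self.timeout_seconds
        ]
        for url in expired:
            del self._started[url]
        return expired
```

A caller would push the URLs returned by take_expired() back onto the crawl queue, so a page whose worker stalled or died is eventually attempted again.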

Browsertrix Crawler 0.5.0 Beta 2

27 Jan 01:32

ikreymer

0.5.0-beta.2

66ce668

Pre-release

Add support for WACZ signing (experimental), enabled via WACZ_SIGN_URL and WACZ_SIGN_TOKEN env vars.

Browsertrix Crawler 0.5.0 Beta 1

23 Nov 21:01

ikreymer

0.5.0-beta.1

9f541ab

Pre-release

Support for uploading WACZ to S3-compatible storage!

Browsertrix Crawler 0.5.0 Beta 0

25 Sep 17:10

ikreymer

0.5.0-beta.0

be4e061

Pre-release

Initial Build of 0.5.0 beta for testing!

Browsertrix Crawler 0.4.4

18 Aug 04:28

ikreymer

0.4.4

8c8cf23

This release includes fixes to the block rules system and README improvements:

Page Block Rules Fix: Avoid 'request already handled' errors by not adding duplicate handlers to the same page.
Page Block Rules Fix: await all continue/abort() calls and catch errors.
Page Block Rules: Don't apply to top-level page, print warning and recommend scope rules instead.
Setup: Attempt to create the crawl working directory (cwd) specified via --cwd if it doesn't exist.
Scope Types: Rename 'none' -> 'page' (single page only) and 'page' -> 'page-spa' (page with hashtags).
README: Add more scope rule examples, clarify distinction between scope rules and block rules.
README: Update old type -> scopeType, list new scope types.
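
Because the scope type rename reuses the name 'page' for a different meaning, an existing configuration has to be migrated in a single step. A minimal sketch, assuming a hypothetical helper that is not part of the crawler:

```python
# Old scope type -> new scope type, per the rename in this release.
RENAMED_SCOPE_TYPES = {
    "none": "page",      # single page only
    "page": "page-spa",  # page with hashtags
}


def migrate_scope_type(old_scope_type: str) -> str:
    """Map a pre-0.4.4 scope type to its new name; other values pass through.

    The dictionary lookup happens once, so 'none' becomes 'page' and is not
    renamed a second time to 'page-spa'.
    """
    return RENAMED_SCOPE_TYPES.get(old_scope_type, old_scope_type)
```

Running the migration twice on the same value would turn an old 'none' into 'page-spa', so it should be applied exactly once per configuration.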
