Releases · apify/crawlee

09 Mar 17:06

mnmkng

v0.20.2

ac53642

Fix an error where persistence of SessionPool would fail if a cookie included an invalid
expires value.
Skipping one patch version because of an error in publishing via CI.

03 Mar 13:06

mnmkng

v0.20.0

53ff5ee

BREAKING: Apify.utils.requestAsBrowser() no longer aborts the request on status code 406
or when a content type other than text/html is received. Use options.abortFunction if you
want to retain this functionality.
BREAKING: Added useInsecureHttpParser option to Apify.utils.requestAsBrowser() which
is true by default and forces the function to use an HTTP parser that is less strict than
the default Node 12 parser, but also less secure. It is needed to bypass certain
anti-scraping walls and fetch websites that do not comply with the HTTP spec.
BREAKING: RequestList now removes all the elements from the sources array on
initialization. If you need to use the sources somewhere else, make a copy. This change
was added as one of several measures to improve memory management of RequestList
in scenarios with a very large number of Request instances.
DEPRECATED: RequestListOptions.persistSourcesKey is now deprecated. Please use
RequestListOptions.persistRequestsKey.
RequestListOptions.sources can now be an array of string URLs as well.
Added sourcesFunction to RequestListOptions. It enables dynamic fetching of sources
and will only be called if persisted Requests were not retrieved from key-value store.
Use it to reduce memory spikes and also to make sure that your sources are not re-created
on actor restarts.
Updated stealth hiding of webdriver to avoid recent detections.
Apify.utils.log now points to an updated logger instance which prints colored logs (in TTY)
and supports overriding with custom loggers.
Improved Apify.launchPuppeteer() code to prevent triggering bugs in Puppeteer by passing
more than required options to puppeteer.launch().
Documented BasicCrawler.autoscaledPool property, and added CheerioCrawler.autoscaledPool
and PuppeteerCrawler.autoscaledPool properties.
SessionPool now persists state on teardown. Before, it only persisted state every minute.
This ensures that after a crawler finishes, the state is correctly persisted.
Added TypeScript typings and typedef documentation for all entities used throughout the SDK.
Upgraded proxy-chain NPM package from 0.2.7 to 0.4.1 and many other dependencies.
Removed all usage of the now deprecated request package.
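
For anyone who relied on the old abort behavior of Apify.utils.requestAsBrowser(), the following is a minimal sketch of an abortFunction that approximates it. It assumes that the function receives the HTTP response (with statusCode and headers) and returns true to abort. The name abortOnUnwantedResponse is made up for this example and is not part of the SDK.

```javascript
// Sketch of an abortFunction that approximates the pre-0.20.0 behavior.
// Assumption: the function is called with the HTTP response and returns
// true when the request should be aborted.
function abortOnUnwantedResponse(response) {
    const { statusCode, headers = {} } = response;
    // Content-Type may carry parameters, e.g. "text/html; charset=utf-8",
    // so only the part before the first semicolon is compared.
    const contentType = String(headers['content-type'] || '').toLowerCase();
    const mimeType = contentType.split(';')[0].trim();
    return statusCode === 406 || mimeType !== 'text/html';
}

// Usage with the SDK (not executed here, it requires the apify package):
// const response = await Apify.utils.requestAsBrowser({
//     url: 'https://example.com',
//     abortFunction: abortOnUnwantedResponse,
// });
```

Note that this sketch also aborts responses that have no Content-Type header at all. Adjust the condition if such responses should be downloaded.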

30 Jan 16:13

mnmkng

v0.19.1

e65f98f

BREAKING (EXPERIMENTAL): session.checkStatus() -> session.retireOnBlockedStatusCodes().
Session API is no longer considered experimental.
Updates documentation and introduces a few internal changes.

20 Jan 12:01

mnmkng

v0.19.0

342c727

BREAKING: APIFY_LOCAL_EMULATION_DIR env var is no longer supported (deprecated on 2018-09-11).
Use APIFY_LOCAL_STORAGE_DIR instead.
SessionPool API updates and fixes. The API is no longer considered experimental.
Logging of system info moved from require time to Apify.main() invocation.
Use native RegExp instead of xregexp for unicode property escapes.
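
As an illustration of the last item, native Unicode property escapes are available in JavaScript regular expressions when the u flag is set (Node 10 and higher). The sketch below is not the SDK's internal code. The helper name extractWords is hypothetical and only shows the language feature that replaced xregexp.

```javascript
// \p{L} matches any Unicode letter; the `u` flag enables property escapes.
const LETTERS = /\p{L}+/gu;

// Hypothetical helper: returns all runs of letters found in the text.
function extractWords(text) {
    return text.match(LETTERS) || [];
}

// "café 東京 42" written with escapes to keep the source ASCII-only.
const words = extractWords('caf\u00e9 \u6771\u4eac 42');
console.log(words.length); // 2 (the digits are not letters)
```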

08 Jan 08:19

mnmkng

v0.18.1

db460f5

Fix SessionPool not automatically working in CheerioCrawler.
Fix incorrect management of page count in PuppeteerPool.

06 Jan 12:16

petrpatek

v0.18.0

343366d

BREAKING: CheerioCrawler now ignores SSL errors by default (options.ignoreSslErrors: true).
Add SessionPool implementation to CheerioCrawler.
Add SessionPool implementation to PuppeteerPool and PuppeteerCrawler.
Fix Request constructor not making a copy of objects such as userData and headers.
Fix desc option not being applied in local dataset.getData().

25 Nov 16:02

mnmkng

v0.17.0

b22fee9

BREAKING: Node 8 and 9 are no longer supported. Please use Node 10.17.0 or higher.
DEPRECATED: Apify.callTask() body and contentType options are now deprecated.
Use input instead. It must be of content-type: application/json.
Add default SessionPool implementation to BasicCrawler.
Add the ability to create ad-hoc webhooks via Apify.call() and Apify.callTask().
Add an example of form filling with Puppeteer.
Add country option to Apify.getApifyProxyUrl().
Add Apify.utils.puppeteer.saveSnapshot() helper to quickly save HTML and screenshot of a page.
Add the ability to pass options supported by got to requestOptions in CheerioCrawler,
thus supporting things such as cookieJar again.
Switch Puppeteer to web socket again due to suspected pipe errors.
Fix an issue where some encodings were not correctly parsed in CheerioCrawler.
Fix parsing bad Content-Type headers for CheerioCrawler.
Fix custom headers not being correctly applied in Apify.utils.requestAsBrowser().
Fix dataset limits not being correctly applied.
Fix a race condition in RequestQueueLocal.
Fix RequestList persistence of downloaded sources in key-value store.
Fix Apify.utils.puppeteer.blockRequests() always including default patterns.
Fix inconsistent behavior of Apify.utils.puppeteer.infiniteScroll() on some websites.
Fix retry histogram statistics sometimes showing invalid counts.
Added regexps for YouTube videos (YOUTUBE_REGEX, YOUTUBE_REGEX_GLOBAL) to utils.social.
Added documentation for option json in handlePageFunction of CheerioCrawler.

31 Oct 10:34

drobnikj

v0.16.1

d7462c8

Add useIncognitoPages option to PuppeteerPool to enable opening new pages in incognito
browser contexts. This is useful to keep cookies and cache unique for each page.
Added options to load every content type in CheerioCrawler.
There are new options body and contentType in handlePageFunction for this purpose.
DEPRECATED: CheerioCrawler html option in handlePageFunction was replaced with body option.
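
A rough sketch of how the new body and contentType options might be consumed in a handlePageFunction follows. It assumes that body is a string or Buffer and that contentType is an object whose type property holds the MIME type. Both are assumptions to verify against the CheerioCrawler documentation. The helper decodeBody is hypothetical.

```javascript
// Hypothetical helper that a CheerioCrawler handlePageFunction could call.
// Assumptions: `body` is a string or Buffer, and `contentType` is an object
// with a `type` property holding the MIME type.
function decodeBody({ body, contentType }) {
    switch (contentType.type) {
        case 'application/json': {
            const text = Buffer.isBuffer(body) ? body.toString('utf8') : String(body);
            return { kind: 'json', data: JSON.parse(text) };
        }
        case 'text/html':
            return { kind: 'html', data: String(body) };
        default:
            // Leave other content types (e.g. binary data) untouched.
            return { kind: 'other', data: body };
    }
}

// Usage inside a crawler (not executed here, it requires the apify package):
// handlePageFunction: async ({ request, body, contentType }) => {
//     const result = decodeBody({ body, contentType });
//     console.log(request.url, result.kind);
// },
```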

30 Sep 09:51

mnmkng

v0.16.0

e1e1ab5

Update @apify/http-request to version 1.1.2.
Update CheerioCrawler to use requestAsBrowser() to better disguise itself as a real browser.

19 Aug 07:46

mnmkng

v0.15.5

ad847c7

This release just updates some dependencies (not Puppeteer).
