Releases: crwlrsoft/crawler

v3.4.0

06 Mar 22:44

Added

  • Two new methods on the base class of all Http steps:
    • skipCache() – Lets you keep the cache enabled on the loader while bypassing it for one specific loading step.
    • useBrowser() – Switches the loader to use a (headless) Chrome browser for loading calls in a specific step and then reverts the loader to its previous setting.
  • Introduced the new BrowserAction::screenshot() post browser navigate hook. It accepts an instance of the new ScreenshotConfig class, allowing you to configure various options (see the methods of ScreenshotConfig). If successful, the screenshot file paths are included in the RespondedRequest output object of the Http step.
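
A minimal sketch of how the two new step methods might be used. It assumes the library is installed via Composer and that Http::get() is used to create the step; the autoload path and variable names are illustrative assumptions, not part of this release note.

```php
<?php
// Assumption: dependencies were installed with Composer.
require __DIR__ . '/vendor/autoload.php';

use Crwlr\Crawler\Steps\Loading\Http;

// Default behavior: the response may be served from the cache,
// if the loader has one configured.
$defaultStep = Http::get();

// Bypasses the cache for this step only; other steps keep using it.
$freshStep = Http::get()->skipCache();

// Loads via the (headless) Chrome browser for this step only; the
// loader reverts to its previous setting afterwards.
$browserStep = Http::get()->useBrowser();
```

The steps would then be added to a crawler as usual; that part is omitted here.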

v3.3.0

02 Mar 10:58

Added

  • New BrowserActions to use with the postBrowserNavigateHook() method:
    • BrowserAction::clickInsideShadowDom()
    • BrowserAction::moveMouseToElement()
    • BrowserAction::moveMouseToPosition()
    • BrowserAction::scrollDown()
    • BrowserAction::scrollUp()
    • BrowserAction::typeText()
    • BrowserAction::waitForReload()
  • A new method in HeadlessBrowserLoaderHelper to include the HTML content of shadow DOM elements in the returned HTML. Use it like this: $crawler->getLoader()->browser()->includeShadowElementsInHtml().

Changed

  • The BrowserAction::clickElement() action now automatically waits for an element matching the selector to be rendered before performing the click. This means you no longer need to put a BrowserAction::waitUntilDocumentContainsElement() before it. The new BrowserAction::clickInsideShadowDom() and BrowserAction::moveMouseToElement() actions behave the same way.

Deprecated

  • BrowserAction::clickElementAndWaitForReload() and BrowserAction::evaluateAndWaitForReload(). As a replacement, please use BrowserAction::clickElement() or BrowserAction::evaluate() and BrowserAction::waitForReload() separately.
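
The suggested replacement could look roughly like the following sketch. The selector '#next-page' is hypothetical, the BrowserAction namespace is an assumption based on the package layout, and the sketch assumes that postBrowserNavigateHook() may be called more than once to register several hooks in order.

```php
<?php
// Assumption: dependencies were installed with Composer.
require __DIR__ . '/vendor/autoload.php';

use Crwlr\Crawler\Steps\Loading\Http;
// Assumed namespace of the BrowserAction class.
use Crwlr\Crawler\Steps\Loading\Http\Browser\BrowserAction;

$step = Http::get()
    ->useBrowser()
    // Previously one combined action:
    // BrowserAction::clickElementAndWaitForReload('#next-page')
    // Now two separate actions, registered in the order they should run.
    ->postBrowserNavigateHook(BrowserAction::clickElement('#next-page'))
    ->postBrowserNavigateHook(BrowserAction::waitForReload());
```

The same pattern applies to BrowserAction::evaluate() followed by BrowserAction::waitForReload().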

v3.2.5

25 Feb 23:58

Fixed

  • When a child step is nested in the extract() method of an Html or Xml step and does not use each() as its base, the extracted value is a single array with the keys defined in the extract() call, rather than an array of such arrays, as it would be with each() as the base.
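
To illustrate the two shapes, here is a sketch with plain PHP arrays. The keys and values are hypothetical and only stand in for whatever the extract() call defines.

```php
<?php
// Hypothetical extracted values; keys and data are illustrative only.

// Child step without each() as base: one array, keyed as in extract().
$withoutEach = ['name' => 'Jane', 'role' => 'Editor'];

// Child step with each() as base: a list of such arrays.
$withEach = [
    ['name' => 'Jane', 'role' => 'Editor'],
    ['name' => 'John', 'role' => 'Author'],
];
```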

v3.2.4

25 Feb 10:49

Fixed

  • Trying to load a relative reference URI (no scheme and host/authority, only path) via the HttpLoader now immediately logs (or throws when loadOrFail() is used) an error instead of trying to actually load it.
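
As a rough illustration of what "relative reference" means here, the following sketch uses PHP's built-in parse_url(). The function isRelativeReference() is a hypothetical helper, not the library's implementation.

```php
<?php
/**
 * Hypothetical helper: returns true when the URI has neither a scheme
 * nor a host, i.e. it consists only of a path (plus optional query
 * or fragment).
 */
function isRelativeReference(string $uri): bool
{
    $parts = parse_url($uri);

    // parse_url() returns false for seriously malformed URIs.
    if ($parts === false) {
        return false;
    }

    return !isset($parts['scheme']) && !isset($parts['host']);
}
```

With this helper, '/foo/bar' and 'foo/bar' count as relative references, while 'https://www.example.com/foo' does not.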

v3.2.3

28 Jan 17:45

Fixed

  • Fixed a deprecation warning triggered in the DomQuery class when trying to get the value of an HTML/XML attribute that does not exist on the element.

v3.2.2

17 Jan 13:31

Fixed

  • Warnings about loader hooks being called multiple times when using a BotUserAgent (and therefore loading and respecting the robots.txt file), or when using the Http::stopOnErrorResponse() method.

v3.2.1

13 Jan 10:34

Fixed

  • The previously opened page is now reused when using the (headless) Chrome browser, instead of opening a new page for each request.

v3.2.0

12 Jan 21:29

Added

  • RespondedRequest::isServedFromCache() to determine whether a response was served from cache or actually loaded.

v3.1.5

10 Jan 11:59

Fixed

  • Another improvement for getting XML source when using the browser, in cases where Chrome doesn't identify the response as an XML document (even though a Content-Type header is sent).

v3.1.4

10 Jan 10:33

Fixed

  • HttpLoader::dontUseCookies() now also works when using the Chrome browser. Cookies are cleared before every request.