v0.1.0

Pre-release
@otsch released this 18 Apr 11:58
· 298 commits to main since this release
e54edf7

Initial version, containing:

  • A Crawler class, the main unit that executes all the steps you add to it and handles the input and output of those steps.
  • An HttpCrawler class that uses the PoliteHttpLoader (a version of the HttpLoader that sticks to robots.txt rules), works with any PSR-18 HTTP client under the hood, and has its own cookie jar implementation.
  • Some ready-to-use steps for HTTP, HTML, XML, JSON and CSV.
  • Loops and Groups.
  • The Crawler has a PSR-3 LoggerInterface and passes it on to all the steps. The included steps log messages about what they're doing. The package includes a simple CliLogger.
  • The Crawler requires a user agent; the included BotUserAgent class provides an easy interface for bot user agent strings.
  • Stores that save the final results can be added to the Crawler. A simple CSV file store is shipped with the package.
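
To illustrate the "crawler executes steps and handles their input and output" idea from the list above, here is a minimal, self-contained sketch in plain PHP (8.0+). It does not use this package at all: `SketchStep`, `SketchCrawler`, `SplitWords` and `Uppercase` are hypothetical names made up for the illustration, and the real classes and method signatures may well differ. It only shows the general pattern, assuming each step can yield any number of outputs per input and every output becomes an input of the next step.

```php
<?php

declare(strict_types=1);

// Hypothetical interface, NOT the package's API: a step takes one input
// and yields zero or more outputs.
interface SketchStep
{
    public function invoke(mixed $input): iterable;
}

// Example step: splits a string into words, one output per word.
final class SplitWords implements SketchStep
{
    public function invoke(mixed $input): iterable
    {
        foreach (preg_split('/\s+/', trim((string) $input)) as $word) {
            if ($word !== '') {
                yield $word;
            }
        }
    }
}

// Example step: yields exactly one output per input.
final class Uppercase implements SketchStep
{
    public function invoke(mixed $input): iterable
    {
        yield strtoupper((string) $input);
    }
}

// Hypothetical crawler: runs the steps in order and feeds every output
// of one step into the next step as an input.
final class SketchCrawler
{
    /** @var SketchStep[] */
    private array $steps = [];

    public function addStep(SketchStep $step): self
    {
        $this->steps[] = $step;

        return $this;
    }

    /** @return list<mixed> outputs of the last step */
    public function run(mixed $input): array
    {
        $inputs = [$input];

        foreach ($this->steps as $step) {
            $outputs = [];

            foreach ($inputs as $in) {
                foreach ($step->invoke($in) as $out) {
                    $outputs[] = $out;
                }
            }

            $inputs = $outputs;
        }

        return $inputs;
    }
}

$results = (new SketchCrawler())
    ->addStep(new SplitWords())
    ->addStep(new Uppercase())
    ->run('hello crawler world');
```

In this sketch `$results` holds the outputs of the last step. A store, as mentioned in the list, would be the place where such final results get written, for example to a CSV file.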