Releases: scrapinghub/frontera
Strategy worker hardening and bug fixes
From now on, the strategy worker continues to operate after an internal exception. There were also minor improvements.
Bug fix
The graphs import was removed from the frontera module, so SQLAlchemy is no longer required when it isn't used.
Crawling strategy improvements and native logging
Here is the change log:
- latest SQLAlchemy unicode-related crashes are fixed,
- a corporate-website-friendly canonical solver has been added,
- the crawling strategy concept evolved: an arbitrary URL can now be added to the queue (with a transparent state check), and `FrontierManager` is available on construction,
- strategy worker code was refactored,
- a default state was introduced for links generated during crawling strategy operation,
- got rid of Frontera logging in favor of Python native logging,
- the logging system can now be configured from a file by means of `logging.config` (see the sketch after this list),
- partitions can now be assigned to instances from the command line,
- improved test coverage from @Preetwinder.
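Here is a minimal sketch of file-based logging configuration via the standard library's `logging.config`; the file name and logger name are illustrative assumptions, not Frontera's shipped defaults.

```python
# a minimal sketch of file-based logging configuration; 'logging.conf' is an
# assumed file with standard [loggers]/[handlers]/[formatters] sections
import logging
import logging.config

logging.config.fileConfig('logging.conf', disable_existing_loggers=False)

logger = logging.getLogger('frontera')  # hypothetical logger name
logger.info('logging configured from file')
```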
Enjoy!
Kafka-python bug fix release
This release prevents installing kafka-python package versions newer than 0.9.5. Newer versions have significant architectural changes and require Frontera code adaptation and testing. If you are using the Kafka message bus, you are encouraged to install this update.
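In practice the pin amounts to a version constraint like the one below; this is a hedged sketch of a setuptools-based setup.py for a project depending on the Kafka message bus, not Frontera's actual file.

```python
# an illustrative dependency pin in a setuptools-based setup.py
from setuptools import setup

setup(
    name='my-crawler',  # hypothetical project using Frontera's Kafka message bus
    install_requires=[
        'kafka-python<=0.9.5',  # newer versions need adaptation and testing
    ],
)
```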
Bug fix release
- fixed API docs generation on RTD,
- added `body` field in Request objects, to support POST-type requests (see the sketch after this list),
- guidance on how to set `MAX_NEXT_REQUESTS`, and settings docs fixes,
- fixed colored logging.
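Here is a minimal sketch of a POST request carrying a payload in the new field; the keyword arguments are assumptions based on `frontera.core.models.Request`, and the endpoint is illustrative.

```python
# a minimal sketch of building a POST request with the new body field;
# the exact keyword arguments are assumptions
from frontera.core.models import Request

request = Request(
    url='https://example.com/api/search',  # hypothetical endpoint
    method='POST',
    headers={'Content-Type': 'application/x-www-form-urlencoded'},
    body='query=frontera&page=1',  # payload travels in the new body field
)
```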
Distributed and easy to use
A tremendous amount of work was done:
- `distributed-frontera` and `frontera` were merged into a single project, to make it easier to use and understand,
- Backend was completely redesigned: it now consists of `Queue`, `Metadata`, and `States` objects for low-level code, and higher-level `Backend` implementations for crawling policies (see the sketch after this list),
- added definitions of run modes: single process, distributed spiders, and distributed spiders and backend.
- The overall distributed concept is now integrated into Frontera, making the difference between using components in single-process and distributed spiders/backend run modes clearer.
- Significantly restructured and augmented documentation, addressing user needs in a more accessible way.
- Much smaller configuration footprint.
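To illustrate the new layout, here is a hedged sketch of a higher-level backend exposing the low-level objects; the base-class location and property names are assumptions based on the description above, and the component values are placeholders.

```python
# a hedged sketch of the redesigned Backend layout; the import path and
# property names are assumptions, and the components are placeholders
from frontera.core.components import Backend

class PolicyBackend(Backend):
    """Hypothetical crawling-policy backend built from low-level components."""

    def __init__(self, manager):
        # each component would be a storage-specific implementation
        self._queue = None     # Queue: requests ordered for fetching
        self._metadata = None  # Metadata: per-document information
        self._states = None    # States: per-URL crawl state

    @property
    def queue(self):
        return self._queue

    @property
    def metadata(self):
        return self._metadata

    @property
    def states(self):
        return self._states
```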
Enjoy this New Year release and let us know what you think!
Numerous bug fixes and improvements
- tldextract is no longer a minimum required dependency,
- SQLAlchemy backend now persists headers, cookies, and method; a `_create_page` method was also added to ease customization (see the sketch after this list),
- canonical solver code (needs documentation),
- other fixes and improvements.
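As an illustration of the customization hook, here is a hedged sketch of overriding `_create_page`; the backend import path and the method's signature are assumptions, not confirmed API.

```python
# a hedged sketch of customizing page creation via the _create_page hook;
# the import path and method signature are assumptions
from frontera.contrib.backends.sqlalchemy import SQLAlchemyBackend

class CustomSQLAlchemyBackend(SQLAlchemyBackend):
    def _create_page(self, obj):
        # let the base class build the page record, then decorate it
        page = super(CustomSQLAlchemyBackend, self)._create_page(obj)
        page.depth = obj.meta.get('depth', 0)  # hypothetical extra field
        return page
```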
Frontera configuration from Scrapy settings
Now it's possible to configure Frontera from Scrapy settings. The order of precedence for configuration sources is the following (a sketch follows the list):
- settings defined in the module pointed to by FRONTERA_SETTINGS (highest precedence),
- settings defined in the Scrapy settings,
- default frontier settings.
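Here is a minimal sketch of what this looks like in a Scrapy project's settings.py; the module path and the particular frontier options are illustrative assumptions.

```python
# a minimal sketch of a Scrapy settings.py configuring Frontera;
# module path and option values are illustrative
FRONTERA_SETTINGS = 'myproject.frontera_settings'  # highest precedence

# frontier settings may also be defined directly among Scrapy settings,
# at lower precedence than the FRONTERA_SETTINGS module
BACKEND = 'frontera.contrib.backends.memory.FIFO'
MAX_NEXT_REQUESTS = 10
```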
Better support of ordinary Scrapy spiders and cold start problem fix
The main issue solved in this version is that request callbacks and request.meta contents are now successfully serialized and deserialized in the SQLAlchemy-based backend. Therefore, the majority of Scrapy extensions shouldn't suffer from losing meta or callbacks when requests pass over Frontera anymore. Second, there is a hotfix for the cold start problem, where seeds are added and Scrapy quickly finishes with no further activity. A well-thought-out solution for this will be offered later.
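To illustrate, here is a minimal Scrapy spider sketch; with this fix, the callback and meta below should survive the trip through the SQLAlchemy-based backend. The spider and URLs are purely illustrative.

```python
# a minimal illustrative Scrapy spider; the request's callback and meta are
# now preserved when requests are serialized by the SQLAlchemy backend
import scrapy

class ExampleSpider(scrapy.Spider):
    name = 'example'
    start_urls = ['https://example.com/']

    def parse(self, response):
        yield scrapy.Request(
            response.urljoin('/section'),
            callback=self.parse_section,  # callback survives serialization
            meta={'category': 'news'},    # meta contents survive as well
        )

    def parse_section(self, response):
        self.logger.info('category=%s', response.meta.get('category'))
```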
New name, improved scheduling and other
- Frontera is the new name for Crawl Frontier.
- The signature of the get_next_requests method has changed; it now accepts arbitrary keyword arguments (see the sketch after this list).
- Added an overused buffer (subject to removal in the future in favor of the downloader's internal queue).
- Backend internals became more customizable.
- The scheduler now asks for new requests when there is free space in the Scrapy downloader queue, instead of waiting for it to be completely empty.
- Several Frontera middlewares are disabled by default.
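For illustration, here is a hedged sketch of the new signature in use; `downloader_info` is an assumed keyword, standing in for whatever key-value arguments a backend cares to receive.

```python
# a hedged sketch of the changed get_next_requests signature; the keyword
# argument shown is illustrative and is simply forwarded to the backend
from frontera.core.manager import FrontierManager

manager = FrontierManager.from_settings()
next_requests = manager.get_next_requests(
    max_next_requests=64,
    downloader_info={'free_slots': 16},  # arbitrary kwargs reach the backend
)
```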