-
I recently started using Apify storage with my own crawler instances and immediately realised I need a single request queue that can be processed by always-available crawler instances. Is that possible to do?
-
We just merged support for v2 of the request queue API, which supports request locking, in other words parallel runs. This is an experimental feature, so I can't recommend using it in production yet, but that is definitely coming soon. Here is an example test using the feature:
https://github.com/apify/crawlee/blob/master/test/e2e/cheerio-request-queue-v2/actor/main.js
It currently works only on the Apify platform, and we already know about some issues, but we would appreciate any reports if you want to try it out yourself right now.
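In crawler code, opting in looks roughly like the linked test. A minimal sketch (the start URL is just a placeholder):

```ts
import { CheerioCrawler } from "crawlee"

const crawler = new CheerioCrawler({
  // Opt into the experimental request queue v2 client with request locking.
  experiments: {
    requestLocking: true,
  },
  async requestHandler({ request, enqueueLinks, log }) {
    log.info(`Processing ${request.url}`)
    await enqueueLinks()
  },
})

await crawler.run(["https://crawlee.dev"])
```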
-
@B4nan I just tried it like this:

```ts
import { PlaywrightCrawler } from "crawlee"
// `router`, `app` (an express instance) and `manyRequests` are defined elsewhere

const getCrawler = ({
  keepAlive,
}: {
  keepAlive?: boolean
} = {}) => {
  return new PlaywrightCrawler({
    experiments: {
      requestLocking: true,
    },
    keepAlive,
    requestHandler: router,
  })
}

const alwaysRunningCrawler = getCrawler({ keepAlive: true })
const alwaysRunningCrawler2 = getCrawler({ keepAlive: true })

app.get("/crawl", async (req, res) => {
  const crawler = getCrawler()
  // addRequests expects an array of requests
  await crawler.addRequests(manyRequests)
})

await Promise.all([alwaysRunningCrawler.run(), alwaysRunningCrawler2.run()])
```

I am currently using my local file storage to store the requests while testing this. When the requests are added through the express endpoint, nothing happens. Is it because I am not using Apify storage in this test?
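For completeness, here is a minimal sketch of the setup I am aiming for: one explicitly opened request queue shared by the keep-alive crawlers and the express endpoint, instead of each crawler's implicit default queue. The queue name, handler and port are placeholders, and per the reply above the locking experiment currently only works against Apify storage:

```ts
import express from "express"
import { PlaywrightCrawler, RequestQueue, createPlaywrightRouter } from "crawlee"

// Placeholder router; the real handlers live elsewhere in the project.
const router = createPlaywrightRouter()
router.addDefaultHandler(async ({ request, log }) => {
  log.info(`Processing ${request.url}`)
})

// One named queue, handed to every crawler instance explicitly.
const requestQueue = await RequestQueue.open("shared-queue")

const getCrawler = () =>
  new PlaywrightCrawler({
    experiments: { requestLocking: true },
    keepAlive: true,
    requestQueue,
    requestHandler: router,
  })

const app = express()
app.get("/crawl", async (req, res) => {
  // Enqueue into the shared queue; the running crawlers pick the requests up.
  await requestQueue.addRequests([{ url: "https://example.com" }])
  res.sendStatus(202)
})
app.listen(3000)

await Promise.all([getCrawler().run(), getCrawler().run()])
```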
-
Alright, so it seems to work partially now with the Apify storage. I want to understand what happens if the client that is currently executing requests stops for some reason and doesn't get a chance to clean up. Do the requests lying in the storage get unlocked after some time? Is it 15 minutes? Can I configure it somehow?