Can't find document/discussion/issue about how to resume the crawling after ctrl-c without erasing the pulled dataset #2149
-
If it does not work for you, you are probably setting it too late (it's the very first async call that touches some storage that triggers the clean-up). Try using the …
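The concrete suggestion after "Try using the" is lost in this export, so here is only a sketch of one way to make sure the option takes effect before the first storage access: pass the purgeOnStart flag in a Configuration instance directly to the crawler constructor (the handler and start URL below are placeholders):

```ts
import { CheerioCrawler, Configuration } from 'crawlee';

const crawler = new CheerioCrawler(
    {
        async requestHandler({ request, $, pushData }) {
            // Placeholder handler: store the page title for each crawled URL.
            await pushData({ url: request.url, title: $('title').text() });
        },
    },
    // Passing the Configuration here means it is in place before the
    // crawler opens any storage, so nothing gets purged on start.
    new Configuration({ purgeOnStart: false }),
);

await crawler.run(['https://example.com']);
```

Setting the `CRAWLEE_PURGE_ON_START=false` environment variable before launching the process should have the same effect without touching the code.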
-
@B4nan I have a similar question. If for any reason my crawler stops, I would like not to repeat the already-crawled URLs. I have read the docs trying to find something about this, but I had no luck. To clarify: I do not want to crawl or request URLs that are already in my Dataset.
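Not an official answer, but a rough sketch of one way to do this: Crawlee's RequestQueue already deduplicates requests by uniqueKey, so with purging disabled a restarted run should not re-handle requests it already processed. If the requirement is specifically to skip URLs that are already stored in the Dataset, the dataset can be scanned at startup (assuming each pushed item contains a `url` field):

```ts
import { CheerioCrawler, Dataset } from 'crawlee';

// Collect URLs that are already present in the default Dataset.
// Assumption: every item pushed in previous runs contains a `url` field.
const dataset = await Dataset.open();
const { items } = await dataset.getData();
const alreadyStored = new Set(items.map((item) => item.url));

const crawler = new CheerioCrawler({
    async requestHandler({ request, $, enqueueLinks, pushData }) {
        // Only store pages that were not saved in a previous run.
        if (!alreadyStored.has(request.url)) {
            await pushData({ url: request.url, title: $('title').text() });
        }
        // Links are still enqueued; the RequestQueue drops duplicates.
        await enqueueLinks();
    },
});

await crawler.run(['https://example.com']);
```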
-
Since crawling is time-consuming, it is common to stop the crawler, whether intentionally or accidentally. However, by default the storage is removed when a new job starts. Even with purgeOnStart set to false, the files in storage/datasets/default are removed. I believe pause/resume is a common feature of crawlers, so could anyone help with this question?
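In case it helps, a minimal resume-friendly setup might look like the sketch below. It assumes the default file-system storage and disables purging globally before the first storage access, so the request queue and dataset under ./storage survive a ctrl-c and the next run continues from the pending requests (the start URL and handler are placeholders; the `CRAWLEE_PURGE_ON_START=false` environment variable should be an equivalent, code-free alternative):

```ts
import { CheerioCrawler, Configuration } from 'crawlee';

// Must run before anything touches storage, otherwise the purge has
// already happened by the time the flag is flipped.
Configuration.getGlobalConfig().set('purgeOnStart', false);

const crawler = new CheerioCrawler({
    async requestHandler({ request, $, enqueueLinks, pushData }) {
        await pushData({ url: request.url, title: $('title').text() });
        await enqueueLinks();
    },
});

// The default RequestQueue is reused across runs, so an interrupted crawl
// resumes from its pending requests; re-adding the same start URL is
// harmless because requests are deduplicated by uniqueKey.
await crawler.run(['https://example.com']);
```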