Datasets are abstracted; you can provide your own storage implementation or use an existing one, e.g. the ApifyClient if you want to run on the Apify platform. So as long as you run things on your own infrastructure and always want to store JSON, it's quite similar. The main difference is locking support, which is implemented when you use the storage methods, but since a dataset is in general append-only, locking does not really matter there. We also wait for async storage methods to finish before resolving the crawler, but as long as you use a sync method for writing the file, it should work fine.
This feels weird, could you debug this a bit, or ideally tell me how to reproduce? Storing 2.5 MB in a dataset definitely won't "spread" to 100 MB this way. Maybe you are using named storages? You need to clean those up yourself; they are persistent.
-
I'm playing around with Crawlee and Dataset and love it!
I'm wondering, what are the benefits of using Dataset vs. just writing to a JSON file? In my example, the Dataset folder is 100 MB, but writing the same information to a JSON file is only 2.5 MB.
When should I choose Dataset, and when should I just write to a JSON file?