Replies: 1 comment
Hello, and thank you for your interest in Crawlee! Unfortunately, I wasn't able to reproduce the error with your scenario. The increase from ~30 MB to ~40 MB between your snapshots is completely normal and is not, by itself, a sign of any problem. Mark-sweep garbage collection is an expensive operation, so Node.js always tries to defer it: if you have 4 GB of RAM available, Node won't run a collection just because of 10 extra megabytes. You can try using …
Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/http (HttpCrawler)
Issue description
I'm not very experienced with Crawlee or with debugging memory leaks in Node.js and Express, but I suspect there might be a memory leak in Crawlee. I have been running it on my server for the past few days, handling tens of thousands of requests each day, and I keep running into memory problems: every time I check, memory usage is very high and still growing.
Below is some simple code using the HTTP crawler. I might not be disposing of the Crawlee instance properly, which could be causing memory to increase with every run. The attached picture shows the memory growth of the process; each snapshot was taken after another 100 individual runs. The increase is small each time, but it adds up significantly over time.
Snapshot 3 - 0 runs (server just started)
Snapshot 4 - 100 runs since start
Snapshot 5 - 200 runs since start
Snapshot 6 - 300 runs since start
After each batch of 100 runs, I let the crawler sit idle for a few minutes so memory could settle.
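Readings such as `process.memoryUsage()` can include garbage that V8 simply has not collected yet, and waiting a few minutes does not guarantee that a full mark-sweep collection has run. A minimal TypeScript sketch of a more controlled measurement, assuming Node is started with the `--expose-gc` flag (which makes `globalThis.gc` available): it performs batches of runs, forces a full GC after each batch, and records `heapUsed`. `runOnce` and `measureBatches` are hypothetical names; `runOnce` only allocates throwaway objects and stands in for one crawler run, since the original code sample is not included here.

```typescript
import { memoryUsage } from "node:process";

// Hypothetical stand-in for one crawler run. The issue's real code is not
// shown, so this only allocates short-lived objects to simulate work.
async function runOnce(): Promise<void> {
  const tmp = Array.from({ length: 10000 }, (_, i) => ({ i }));
  void tmp.length;
}

// Hypothetical helper: performs `batches` batches of `runsPerBatch` runs and
// returns heapUsed in MB after each batch. If node was started with
// --expose-gc, a full GC is forced first so uncollected garbage is not
// counted as growth; otherwise the raw reading is recorded.
async function measureBatches(
  batches: number,
  runsPerBatch: number,
  run: () => Promise<void> = runOnce,
): Promise<number[]> {
  const gc = (globalThis as unknown as { gc?: () => void }).gc;
  const samples: number[] = [];
  for (let b = 0; b < batches; b++) {
    for (let r = 0; r < runsPerBatch; r++) {
      await run();
    }
    if (typeof gc === "function") {
      gc();
    }
    samples.push(memoryUsage().heapUsed / 1024 / 1024);
  }
  return samples;
}

// Example: three batches of 100 runs, printing MB with one decimal place.
measureBatches(3, 100).then((samples) => {
  console.log(samples.map((mb) => mb.toFixed(1)).join(" MB, ") + " MB");
});
```

If the post-GC numbers keep climbing batch after batch, something is likely being retained between runs; if they level off, the growth is more likely deferred garbage than a leak.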
This is the code I ran to test the Express server:
Please let me know if I am doing something wrong or if this is the expected behavior! Again, I'm not very experienced with all of this, so I'm not sure, but I do know that my server, which has 4 GB of memory, keeps running out of it.
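When a Node.js process dies with a "JavaScript heap out of memory" error, the relevant ceiling is V8's heap size limit rather than the machine's total RAM. A short sketch for checking that limit from inside the process, using `getHeapStatistics()` from the `node:v8` module; `heapLimitMB` is a hypothetical helper name. The limit can be changed with Node's `--max-old-space-size=<megabytes>` flag. Whether the out-of-memory errors described here come from the V8 heap or from the operating system is not something this sketch settles.

```typescript
import { getHeapStatistics } from "node:v8";

// Hypothetical helper: V8's current heap size limit in megabytes.
// This is the ceiling the JavaScript heap can grow to before V8 reports
// "JavaScript heap out of memory"; it is not the machine's total RAM.
function heapLimitMB(): number {
  return getHeapStatistics().heap_size_limit / 1024 / 1024;
}

// Run with e.g. `node --max-old-space-size=3072 script.js` (hypothetical
// file name) to see how the flag changes the reported limit.
console.log(`V8 heap limit: ${heapLimitMB().toFixed(0)} MB`);
```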
I also searched for previous memory-leak issues with Crawlee and found a few threads. I believe I implemented what was suggested there; sorry if I missed anything!
Code sample
Package version
3.10
Node.js version
20.11.1
Operating system
No response
Apify platform
I have tested this on the next release
No response
Other context
No response