Recordings copy speed from cache to NAS - Unable to keep up with recording segments in cache errors #5392
-
Hi all, I am looking for a bit of help. Thanks to the recent Frigate betas I have identified what seems like a very intermittent issue writing to my NAS. The NAS is dedicated to Frigate, with a single 2TB WD20PURX-64PFUY0 (the purple surveillance drive) HDD. Only /recordings/ is stored on the NAS; the DB etc. is local to the host. Example of the error: During the error, recording is lost for its duration, usually around 30-40 seconds' worth. This can happen several times in an hour, or it can go half a day or more without appearing once! Thanks to Blake's recommendation I added the following to my config:
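For anyone following along, the config change being referred to is most likely Frigate's logger section with the record module set to debug. This is a hedged reconstruction based on Frigate's documented logging options, not necessarily the poster's exact config:

```yaml
# Hedged example: enable debug logging for the recording maintainer
# so cache-to-storage copy times appear in the Frigate logs.
logger:
  logs:
    frigate.record: debug
```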
This then shows the copy procedure from the cache to the NAS in the logs. On average a copy takes around 1 second, but around the time it goes wrong it can take 5-15 seconds for a few clips afterwards. One example:
This NAS is old and only has 256MB of RAM, but it is dedicated to this alone. It is connected over a gigabit link through a gigabit switch to the Raspberry Pi host on which Frigate runs as a Docker container. The HDD passes SMART tests with 0 warnings, but I haven't done anything more extensive on it. CPU usage on the Pi never goes above 50%, checked with Portainer and with htop; RAM usage is around 50%. Does anyone have any ideas on what the issue could be? I've turned on all of my NAS's performance options, such as disabling journaling and enabling write caching. I have restarted my NAS, and I have fully updated and restarted my Raspberry Pi. I have tried accessing events, seeking within them, etc., and this doesn't trigger the issue; I had wondered whether that activity was slowing writes. I left a continuous ping running from the Pi to the NAS and it was always 2ms or less, with no drops or packet loss. The switch ports confirm no packet errors or loss. Even while the issue is occurring, pings stay below 2ms and no packets are lost. Thank you.
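One way to separate the disk from the network in a setup like this is to benchmark sequential write speed directly against the mounted share. A minimal, hedged sketch using `dd` (the target path is an assumption; point it at your own NAS mount):

```shell
#!/bin/sh
# Hedged sketch: measure sequential write throughput to a target
# directory. TARGET defaults to /tmp here; set it to your NAS
# mount point (e.g. /media/frigate) to test the actual storage.
TARGET="${TARGET:-/tmp}"
# conv=fsync forces a flush at the end, so the reported speed
# reflects the storage device rather than just the page cache.
dd if=/dev/zero of="$TARGET/ddtest.bin" bs=1M count=64 conv=fsync
rm -f "$TARGET/ddtest.bin"
```

If the figure over the share is far below what the same test gives on a local disk, the bottleneck is the NAS or the protocol rather than the Pi.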
Replies: 11 comments 28 replies
-
If your NAS only has 256MB of RAM, I doubt it has a fast HDD. It's probably just slow, outdated hardware.
-
As this issue comes and goes for me, after a very brief look at the record.py code in Frigate that performs this function, I have made a suggested pull request changing the value of 5 to 20. I don't think mine has gone quite that high yet, but it gives a bit of a buffer; perhaps 15 might be a middle ground? The only downside I could see, with my very, very limited knowledge, is slightly increased cache usage, but I don't think by much. Then again, perhaps merging the change would completely break Frigate; I really don't know enough about it :D. I think it'd fix the issue for me, though. Maybe one day when I have the time I'll try building and testing it myself. I can see why so many products have analytics, so vendors can easily tell whether an issue like this affects some of their users or not. (I'm not suggesting that at all! haha!)
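For context, the logic under discussion keeps only the N most recent cached segments per camera and discards the rest once the copier falls behind. A simplified, hedged Python sketch of that idea (names and structure are illustrative, not Frigate's actual code):

```python
# Hedged sketch of "keep the N most recent segments" pruning.
# KEEP_COUNT plays the role of the hard-coded 5 discussed above;
# raising it trades cache (shm/tmpfs) space for resilience to
# temporarily slow writes to the storage backend.
KEEP_COUNT = 5

def prune_segments(segments, keep=KEEP_COUNT):
    """Return (kept, discarded), keeping the `keep` most recent
    segments. Each segment is a (start_time, path) tuple."""
    ordered = sorted(segments, key=lambda s: s[0])
    if len(ordered) <= keep:
        return ordered, []
    return ordered[-keep:], ordered[:-keep]
```

Raising 5 to 20 would roughly quadruple the worst-case per-camera cache footprint, since each pending segment stays in the cache until copied, which matches the "slightly increased cache usage" concern above.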
-
Just for info, it also happens when both cameras are set to record on motion rather than record always.
-
I'm getting similar log messages, except mine is Unraid with an SSD cache and spinning drives for storage, as well as 64GB of RAM. I have bumped up shm to 1GB and tmpfs to 16GB; I'll see if that makes a difference.
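For anyone wanting to try the same bump, both sizes can be set in a compose file. A hedged fragment, using the values mentioned above and Frigate's usual /tmp/cache location (adjust both to your hardware):

```yaml
# Hedged docker-compose fragment: enlarge /dev/shm and mount
# Frigate's segment cache as a 16 GB tmpfs, matching the sizes
# described above.
services:
  frigate:
    shm_size: "1gb"
    tmpfs:
      - /tmp/cache:size=16g
```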
-
Hi all @Red-ington (tagging as I saw you posted above). I have found a workaround that almost, but not quite fully, resolves the issue for me.

The segment (recording) drop issue was happening, on average, for 15-25 minutes of elevated segment counts at completely random times, 4-6 times a day. The build that Nick kindly created was better, but during these periods I recall the count going up to 50-60 at one point!! Other times it was around 30-40.

I checked the full syslogs on my Raspberry Pi and NAS, any other logs, and any cron jobs I could find. There was absolutely nothing apart from Frigate's segment drop messages. CPU usage never went above 50% on either device. RAM usage was high on the NAS, but that is somewhat expected as it only has 256MB of RAM and is probably caching a lot; nothing was obviously hogging it, and SMB wasn't using much.

Anyway, I thought I'd switch the NAS share from SMB/CIFS/Samba to NFS. The difference is night and day, with exactly the same setup otherwise. Netgear decided to make the share path quite different when using NFS, so that caught me out for a little bit... PS for anyone reading: use showmount -e to check what NFS path is actually being advertised, as it's not always what you set. The nfs-common package needs to be installed to act as an NFS client and to run showmount.

However, since I applied this fix ~20 hours ago there have still been about 5 occasions where it rose to a high of 6-8 segments for a few minutes. The duration is clearly lower, and it went no higher than 8 segments at a time, a big improvement. Best of all, no recording losses for the first time in ages!!!!

Perhaps the issue was with SMB/Samba oplocks, although I believe these are requested by the client, which would be Frigate, Docker, or Portainer; I am not quite sure what counts as the client in this case. The NFS version I use doesn't support oplocks. Hopefully this helps at least one other person.
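To illustrate the NFS switch and the showmount tip above, here is a hedged sketch; the NAS IP and export path are hypothetical placeholders, and the real export path is whatever showmount reports for your device:

```
# Check what the NAS actually exports (requires nfs-common):
#   showmount -e 192.168.1.50
#
# Example /etc/fstab entry using a hypothetical IP and export path;
# "hard" makes the client retry indefinitely rather than error out.
192.168.1.50:/data/frigate  /media/frigate  nfs  defaults,vers=3,hard  0  0
```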
Nick kindly created a new build of Frigate which has an extended cache size before discarding and it has additional logging (please note this was a one off build for testing):
Also tagging this issue in as it's a similar issue. #5818 There is a lot of useful information in that issue too, it's worth a read. Thanks |
-
For anyone who is still wrestling with this issue, the solution for me was to completely disable swap inside the container. Despite a very capable system, I had loads of "Unable to keep up with recording segments in cache..." errors, which is what brought me here. Previously I had an already conservative
Please note that I have plenty of RAM dedicated to the container, as well as hard limits on utilization so that any memory leaks leading to OOM don't bring down the system. Be sure to check nvidia-smi, vmstat 1, iotop, iostat -sxy --human 1 1, podman stats, etc. to make sure you don't have an unrelated issue before trying this. For anyone interested, below is my current podman run command for initializing the container. We are now humming along with 21 cameras and barely breaking a sweat:
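The original podman command was not captured above, so here is a hedged sketch of the flags relevant to the swap change; the memory values and image tag are illustrative assumptions, not the poster's actual settings:

```
# Hedged example: hard memory limit with swap disabled.
# Setting --memory-swap equal to --memory means the container
# gets no swap beyond its RAM limit; --memory-swappiness=0
# tells the kernel not to swap the container's anonymous pages
# (cgroups v1 only).
podman run -d --name frigate \
  --memory=6g \
  --memory-swap=6g \
  --memory-swappiness=0 \
  ghcr.io/blakeblackshear/frigate:stable
```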
-
I had the same issue using an external (USB) spinning disk (2TB) that had been used at near max capacity for some 2 years. I ran some tests on the disk and found it did not perform fast at all (~5Mb/sec). I ended up reformatting the partition, and performance was then back to normal. The disk was originally formatted with exFAT, and I suspect the issue was heavy fragmentation. A suggestion that might prevent this would be an option to control the deletion of old recordings by percentage of the volume rather than 1 hour of recordings at a time, since 1 hour is a very small percentage of a large volume and deleting in such small chunks will increase fragmentation. However, I think this problem mostly affects exFAT file systems on spinning disks.
-
############################## EDIT ####################################
Frigate is running on
The storage is on NFS to my NAS running TrueNAS with 4 spinning HDDs in RAID 1. The two computers are connected via a 10-gig cable; cameras are on a 1-gig switch. htop shows
I have 10 cameras running. I am currently getting
Here is my docker container for frigate: What say you?
-
I am running into this now too, after upgrading from 0.12 to 0.14. I've never seen this error before: same hardware, same [number of] cameras, same everything. Why is it having a problem now that it never had before? :(
-
This has happened 2 or 3 times now with 14.1 (0.14.1-f4f3cfa), running in Docker on a far more powerful Intel x86 platform than in my original post. It clears up after restarting Frigate but then eventually seems to reappear (with all cameras).
Everything looks healthy to my limited Linux knowledge... I/O seemed normal. I have loads of free disk space and it's all correctly mounted; I checked inside the Frigate container itself as well, so it's not that. However, when this happens, Frigate constantly complains about being unable to keep up with recording segments every few seconds, non-stop for days. Frigate was last restarted 5 days before the error this time. I do have a 5120MB memory limit on the Frigate Docker container (out of 8GB); perhaps this is the cause? I'm now setting a 6.5GB limit, a 1GB reservation, and mem_swappiness: 0. Thanks to f1d094 for sharing this potential fix with us! What is odd is how well it worked for quite a while after the 14.1 release, which makes you think perhaps another update caused it; either that, or my near-constant Frigate config tinkering (and therefore Frigate restarting) for the first few months was enough to prevent it, haha...
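For docker-compose users, the limits described above map to something like this hedged fragment (values taken from the post; mem_swappiness is a compose v2 file format option):

```yaml
# Hedged docker-compose fragment matching the limits described:
# 6.5 GB hard limit, 1 GB reservation, swappiness 0 so the
# container's pages are not swapped out.
services:
  frigate:
    mem_limit: "6500m"
    mem_reservation: "1g"
    mem_swappiness: 0
```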
-
I have been running Frigate for a few months and just started receiving this error with the latest Docker version on Manjaro, Frigate (0.14.1-f4f3cfa): "WARNING : Unable to keep up with recording segments in cache for Cam1. Keeping the 5 most recent segments out of 6 and discarding the rest..."