Replies: 2 comments
-
Good point, I forgot to add this. The classifier jobs can be scaled per storage. I.e., if you have a hundred users, each with a home storage, you get 100 classifier jobs that you can run in parallel. If all your 100k files are in one storage, you're currently out of luck, I'm afraid.
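A rough illustration of what "scaled per storage" means for the achievable parallelism. This is only a sketch of the idea, not Recognize's actual code: the function name and the storage IDs are made up for the example.

```python
from collections import defaultdict


def jobs_per_storage(files):
    """Group (storage_id, file_id) pairs into one work list per storage.

    Hypothetical helper: each key stands for one classifier job
    that could run independently of the others.
    """
    jobs = defaultdict(list)
    for storage_id, file_id in files:
        jobs[storage_id].append(file_id)
    return dict(jobs)


# 100 users with 10 files each (storage IDs are invented for the example)
spread = [(f"home::user{u}", u * 10 + i) for u in range(100) for i in range(10)]
# the same 1000 files, all in a single storage
single = [("local::/data", i) for i in range(1000)]

print(len(jobs_per_storage(spread)))  # number of jobs that can run in parallel
print(len(jobs_per_storage(single)))  # a single job, no parallelism
```

The number of files is the same in both cases; only the number of distinct storages decides how many jobs there are to run side by side.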
-
Thanks. I also found the "reserved_at" column in the jobs table, which is probably responsible for reservations and allows some jobs to run in parallel. Luckily, in my case the majority of photos are spread across mainly two users.
-
@marcelklehr, would you mind sharing whether, given the current state of the code, adding more processing nodes (systems with Recognize installed) should speed up processing of the data set?
It certainly depends on how and when files are taken out of the "to be processed" queue.
Looking through the tables in the NC DB, it doesn't look like adding more nodes really helps, and here is why. (Both situations below are written without a 100% understanding of the code; I haven't seen such logic there, but I could be completely wrong and may have missed something.)
Situation a) - this could enable parallel processing, but seems vulnerable to losing files if NC crashes (or anything similar happens) mid-process.
A process runs, picks up a list of file IDs to be processed and instantly removes them from the queue. Should the process/instance crash, are the IDs lost and never processed again? I didn't find a table or column that would serve as an "in progress" queue.
Situation b) - this makes it impossible to have processes running in parallel on different files.
A process picks up a list of file IDs to be processed and removes them from the list only upon completion. Whilst this saves us from silently dropping files, it blocks real parallel processing: the moment a second node comes up, it starts working on the very same set, because the first process has not yet finished and removed those files. The situation then repeats: once the second node finishes, it again picks the same set as the other system, and so on.
Hopefully there's a Situation c) already covered in the code, though I'm unsure, hence the question to @marcelklehr.
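To make the question concrete, here is a minimal sketch of what such a third option could look like: entries are claimed with a timestamp instead of being deleted, and a claim expires after a lease period so a crashed node's files become available again. This is not taken from the Recognize code. The table layout, the `LEASE_SECONDS` value and all function names are assumptions for illustration, with SQLite standing in for the NC DB.

```python
import sqlite3

LEASE_SECONDS = 600  # hypothetical: how long a claim stays valid


def make_queue(file_ids):
    """Create an in-memory queue table (assumed layout, not Recognize's schema)."""
    # isolation_level=None: autocommit mode, transactions are managed explicitly
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE queue (file_id INTEGER PRIMARY KEY, reserved_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO queue (file_id) VALUES (?)", [(f,) for f in file_ids]
    )
    return conn


def claim(conn, batch_size, now):
    """Reserve up to batch_size files that are unclaimed or whose lease expired."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        rows = conn.execute(
            "SELECT file_id FROM queue "
            "WHERE reserved_at IS NULL OR reserved_at <= ? "
            "ORDER BY file_id LIMIT ?",
            (now - LEASE_SECONDS, batch_size),
        ).fetchall()
        ids = [row[0] for row in rows]
        conn.executemany(
            "UPDATE queue SET reserved_at = ? WHERE file_id = ?",
            [(now, file_id) for file_id in ids],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return ids


def complete(conn, ids):
    """Remove files from the queue only after they were processed."""
    conn.executemany(
        "DELETE FROM queue WHERE file_id = ?", [(file_id,) for file_id in ids]
    )


conn = make_queue(range(1, 7))
node_a = claim(conn, 2, now=1000)  # first node reserves a batch
node_b = claim(conn, 2, now=1001)  # second node gets a different batch
complete(conn, node_a)             # first node finishes; second node "crashes"
retry = claim(conn, 4, now=1001 + LEASE_SECONDS)  # expired lease is picked up again
```

With something like this, two nodes never work on the same batch while a claim is valid (unlike b), and a crash only delays the affected files by the lease time instead of dropping them (unlike a).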
Doing my RTFM, I found https://github.com/nextcloud/recognize/wiki/Behind-the-scenes, though it is not covered there either.
So I'll ask the question here instead of raising an RFE, which hopefully is not needed.