Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autocomplete slower earlier queries redraw over newer ones #766

Open
vwbusguy opened this issue Jul 27, 2024 · 5 comments · May be fixed by #769
Open

Autocomplete slower earlier queries redraw over newer ones #766

vwbusguy opened this issue Jul 27, 2024 · 5 comments · May be fixed by #769

Comments

@vwbusguy
Copy link

vwbusguy commented Jul 27, 2024

Expected behaviour

As you name someone and begin typing their name, the helpful auto-complete drop down box gives suggested names based on what is currently in the text box. If a more recent typing query has already been drawn for autocomplete, it should abandon the earlier query results.

Actual behaviour

With many people named, typing a name that begins with a letter in a lot of other names, the queries for the first letter (and subsequent shorter letters) might return after the queries against more letters in the name, redrawing the auto-complete box, even after a name has been selected, making naming more tedious and leading to occasional accidental wrong names.

Example from MariaDB slow log of queries for 'avo' returning before 'av' and before 'a':
image

Steps to reproduce

  1. Quickly type the first three letters of a name that starts with a letter in many other names
  2. Note that the auto-complete is overwritten by the earlier query
  3. Note that this earlier query auto-complete draw-over still happens even if a name has been selected

Server configuration

  • Operating system:
    Fedora 40

  • Pdlib version:
    v1.1.0

  • How is DLib installed: Make sure it is working correctly with this tool
    From Fedora repository

  • How is PDlib installed: Make sure it is working correctly with this tool
    Compiled from git repo

  • PHP version:
    8.3.9

  • Web server:
    Apache httpd 2.4.62

  • Database:
    mariadb-10.11.8

  • Nextcloud version:
    29.0.4

Client configuration

  • Browser:
    Firefox 127.0.2

  • Operating system:
    Fedora 40

Logs

Background task log with debug.

sudo -u apache php occ -vvv face:background_job
Jul 26 20:14:17 Tauriel systemd[1]: facerecognition.service: Consumed 1min 50.475s CPU time.
Jul 26 20:19:19 Tauriel systemd[1]: Started facerecognition.service - Nextcloud FaceRecognition Job.
Jul 26 20:19:20 Tauriel php[532341]: 1/8 - Executing task CheckRequirementsTask (Check all requirements)
Jul 26 20:19:20 Tauriel php[532341]: 2/8 - Executing task CheckCronTask (Check that service is started from either cron or from command)
Jul 26 20:19:20 Tauriel php[532341]: 3/8 - Executing task DisabledUserRemovalTask (Purge all the information of a user when disable the analysis.)
Jul 26 20:19:20 Tauriel php[532341]: 4/8 - Executing task StaleImagesRemovalTask (Crawl for stale images (either missing in filesystem or under .nomedia) and remove them from DB)
Jul 26 20:19:20 Tauriel php[532341]: 5/8 - Executing task AddMissingImagesTask (Crawl for missing images for each user and insert them in DB)
Jul 26 20:19:20 Tauriel php[532341]:         Skipping image scan for user csaenz that has disabled the analysis
Jul 26 20:19:20 Tauriel php[532341]:         Skipping image scan for user fcc that has disabled the analysis
Jul 26 20:19:20 Tauriel php[532341]:         Skipping image scan for user kodi that has disabled the analysis
Jul 26 20:19:20 Tauriel php[532341]: 6/8 - Executing task EnumerateImagesMissingFacesTask (Find all images which don't have faces generated for them)
Jul 26 20:19:20 Tauriel php[532341]: 7/8 - Executing task ImageProcessingTask (Process all images to extract faces)
Jul 26 20:19:20 Tauriel php[532341]:         NOTE: Starting face recognition. If you experience random crashes after this point, please look FAQ at https://github.com/matiasdelellis/facerecognition/wiki/FAQ
Jul 26 20:19:20 Tauriel php[532341]: 8/8 - Executing task CreateClustersTask (Create new persons or update existing persons)
Jul 26 20:19:20 Tauriel php[532341]:         Skipping cluster creation, not enough data (yet) collected. For cluster creation, you need either one of the following:
Jul 26 20:19:20 Tauriel php[532341]:         * have 1000 faces already processed
Jul 26 20:19:20 Tauriel php[532341]:         * or you need to have 95% of you images processed
Jul 26 20:19:20 Tauriel php[532341]:         Use stats command to track progress
Jul 26 20:19:20 Tauriel php[532341]:         Skipping cluster creation, not enough data (yet) collected. For cluster creation, you need either one of the following:
Jul 26 20:19:20 Tauriel php[532341]:         * have 1000 faces already processed
Jul 26 20:19:20 Tauriel php[532341]:         * or you need to have 95% of you images processed
Jul 26 20:19:20 Tauriel php[532341]:         Use stats command to track progress
Jul 26 20:19:20 Tauriel php[532341]:         Face clustering will be created for the first time.
Jul 26 20:19:39 Tauriel php[532341]:         There are 100359 faces for clustering
Jul 26 20:20:11 Tauriel php[532341]:         64334 clusters found after clustering
Jul 26 20:21:26 Tauriel php[532341]:         Skipping cluster creation, not enough data (yet) collected. For cluster creation, you need either one of the following:
Jul 26 20:21:26 Tauriel php[532341]:         * have 1000 faces already processed
Jul 26 20:21:26 Tauriel php[532341]:         * or you need to have 95% of you images processed
Jul 26 20:21:26 Tauriel php[532341]:         Use stats command to track progress
Jul 26 20:21:26 Tauriel php[532341]:         Clusters already exist, estimated there is no need to recreate them
Jul 26 20:21:27 Tauriel systemd[1]: facerecognition.service: Deactivated successfully.
Jul 26 20:21:27 Tauriel systemd[1]: facerecognition.service: Consumed 1min 48.558s CPU time.
Count of names for my Nextcloud user
select count(DISTINCT name) from oc_facerecog_persons where user = 'scott'\G
*************************** 1. row ***************************
count(DISTINCT name): 290
1 row in set (0.040 sec)

Oddly enough, I haven't noticed this happening until recently - around the time I upgraded the database from MariaDB from 10.5 to 10.11. I turned on slow log and the slow log queries are exclusively coming from facerecognition.

@vwbusguy
Copy link
Author

vwbusguy commented Jul 27, 2024

I wonder if a better approach might be to query for DISTINCT names once on initial page load via ajax and only update it if a name is submitted for a cluster that doesn't already exist in the browser-side cache. That would be a cheaper database query and a lot less queries in general during the whole cluster naming process.

For comparison select DISTINCT name from oc_facerecog_persons where user = 'my_nextcloud_user' ORDER BY name ASC; returns 291 rows in 0.042 seconds.

Then filter the names in browser with javascript on key up instead of server side queries.

@vwbusguy
Copy link
Author

vwbusguy commented Jul 30, 2024

This has been a really annoying issue lately.

I've looked through the code and it isn't going to be as easy as I thought it might be in when I made my last comment.

It looks like the vendor autocomplete library supports caching natively, so I thought that enabling that in the vendor library calls might improve the experience all around:

https://github.com/realsuayip/autocomplete

However, enabling that cache doesn't appear to cause any less API calls as far as I can tell.

I also tried a few things to optimize the lookup query (using outer join on images, etc.) but it didn't produce any meaningful improvement, if any at all. 🫤

EDIT: This looks like a bug upstream as the CachedResults is never actually populated.

@vwbusguy
Copy link
Author

vwbusguy commented Jul 31, 2024

It looks like the reason that turning on caching didn't help is that when you're naming clusters, the Autocomplete() object gets reinstantiated in between each naming, so enabling cache does work in the library, but the cache is lost in between presenting new clusters to name.

I also spent some time with doing some database query analytics yesterday (EXPLAIN, etc.) to see if I can make things more efficient at the database level, such as adding an index on name,user in the person table and tweaking the query, but again - nothing that made a notable improvement.

If there was a way to persist the Autocomplete between cluster namings in assignName and turning cache, that would go a long way in improving things, but I'm not sure how to do this the way that it currently works since the input object is destroyed. Otherwise, turning cache on was a simple matter of adding cache: true in dialogs.js.

@vwbusguy
Copy link
Author

vwbusguy commented Aug 1, 2024

I got it to work much better on the browser side by evaluating a request counter with a little bit of javascript. While I'm hopeful that the upstream library might get some improvements, I have a working POC locally to make this perform better in the meantime. I'll try to clean it up and get a PR to you for it.

@vwbusguy
Copy link
Author

This is still an issue for the new release. I might make a video to show how cumbersome it is to try to do this without the patch. It's not just that the autocomplete isn't perfect, but it often ends up overwriting a selection with a different person unless you put a good 10 seconds or so between typing each individual letter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant