Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsorted results prefix performance #8396

Closed
macdrevx opened this issue Oct 13, 2023 · 2 comments
Closed

Unsorted results prefix performance #8396

macdrevx opened this issue Oct 13, 2023 · 2 comments

Comments

@macdrevx
Copy link

macdrevx commented Oct 13, 2023

How frequently does the bug occur?

Always

Description

I'm working on optimizing a query of a large dataset where I only need the first N results.

In #4886 (comment), there's the following blurb:

Because you're sorting the Results, Realm can't just terminate the query search after it finds a small number of objects matching the predicate, as that might not be the first match after the sort operation. So all 80,000 objects must be sorted before Realm can even serve you one of them. Skipping the sort operation would drastically improve the performance here.

My use case doesn't require sorting, so I was expecting that I could iterate through the results with a for … in loop, breaking after finding N matches, and that it would take a similar amount of time for realm to find each next result.

What I'm actually seeing is that, even without sorting, accessing the first result takes dramatically longer than accessing each subsequent result. Is this expected?

Stacktrace & log output

…
created 1000000
[0.30182695388793945, 1.4066696166992188e-05, 2.0265579223632812e-06, 0.0, 9.5367431640625e-07, 0.0, 0.0, 0.0, 9.5367431640625e-07, 7.033348083496094e-06, 0.0, 0.0, 0.0, 0.0, 0.0, 9.5367431640625e-07, 4.506111145019531e-05, 0.0, 0.0, 0.0, 9.5367431640625e-07]

Each array entry shows the time it took to get the next element in the for…in loop. Note how much longer it takes to access the first element despite the fact that the results are unsorted.

Can you reproduce the bug?

Always

Reproduction Steps

PrefixPerfDemo.zip

Run attached Xcode project.

Version

10.43.0

What Atlas Services are you using?

Local Database only

Are you using encryption?

No

Platform OS and version(s)

iOS Simulator 16.4

Build environment

Xcode version: 14.3.1
Dependency manager and version: Swift Package Manager (whatever version comes with Xcode 14.3.1)

@tgoyne
Copy link
Member

tgoyne commented Oct 13, 2023

The query is run and the full object set is assembled the first time anything which needs it is access, and we don't do an incremental find. The part which is lazy is creating the accessor object for each object in the Results (which for a non-sorted Results is typically much slower than the actual query), and reading any of the data for those objects which wasn't used in the query. We don't have an offset: parameter on index(matching:) because it'd be a performance trap; objects[objects.index(matching: query)] is a bit faster than objects.filter(query).first, but doing multiple searches is typically dramatically slower than just finding all of them even if you end up discarding most of the results without using them.

There isn't any good reason for Results to not have a .limit() function, but I guess it's just never really come up as something anyone has needed in the time since it was implemented in the query engine.

@macdrevx
Copy link
Author

Thanks for the quick answer!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants