
Implement Yaroslavskiy-Bentley-Bloch Quicksort. #80

Closed
n3vu0r wants to merge 14 commits

Conversation

@n3vu0r commented Jun 21, 2021

This is a dual-pivot 3-way quicksort, which is less likely to run into worst cases that can end in stack overflows for large arrays.

The previous single-pivot 2-way quicksort used by the FreedmanDiaconis strategy overflows its stack with this NPY array.

@jturner314 (Member) left a comment

I added a few comments but haven't reviewed the PR in detail. This approach is an improvement over the current implementation. Does this approach work well if all the elements of the input are the same?

src/sort.rs Outdated
/// }
/// ```
fn partition_mut(&mut self, pivot_index: usize) -> usize
fn partition_mut(&mut self) -> (usize, usize)
@jturner314 (Member)

We should keep the existing partition_mut method, because it's a public method which is useful for users. The dual-pivot partitioning can be an internal-only function just used by _get_many_from_sorted_mut_unchecked, or if you think it would be useful to users for some reason, we could make it public, but as a separate method.

@n3vu0r (Author)

Oh right, no problem. In case we make it public as well, I would suggest separating dual-pivot sampling from partitioning into two methods. For now, I would keep them private so that we can easily change their interface if needed.

src/sort.rs Outdated
i += 1;
let lowermost = 0;
let uppermost = self.len() - 1;
if self.len() < 42 {
@jturner314 (Member)

Where does 42 come from?

@n3vu0r (Author)

47 is used as the recursion cutoff by the JDK 7 dual-pivot implementation. It doesn't really apply here, because we just use it to stop sampling small arrays. I chose the next smaller integer multiple of 7 since we divide by 7, so the sample elements are slightly better spaced for small arrays. The reason to test for small arrays at all is that the sampling doesn't work for arrays shorter than 7 elements: seventh becomes 0 and the sample indexes are no longer unique.
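
For reference, a minimal sketch of the equally spaced five-element sampling being described; the function name and the exact spacing around the middle are assumptions for illustration, not this PR's code:

    // Hypothetical sketch: five sample indexes spaced by `seventh = len / 7`
    // around the middle of the array.
    fn sample_indexes(len: usize) -> [usize; 5] {
        let seventh = len / 7; // 0 for len < 7 ...
        let mid = len / 2;
        [
            mid - 2 * seventh,
            mid - seventh,
            mid,
            mid + seventh,
            mid + 2 * seventh, // ... so all five indexes collapse to `mid`
        ]
    }

For len < 7 all five indexes degenerate to mid, which is why the small-array branch is needed at all.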

src/sort.rs Outdated
let lowermost = 0;
let uppermost = self.len() - 1;
if self.len() < 42 {
// Sort outermost elements and use them as pivots.
@jturner314 (Member)

I suspect that using the outermost elements as pivots won't work well if the array is already sorted. I suppose it's not too big an issue since the array is known to be small in this branch, but it's still not ideal.

A couple of other ideas for a deterministic strategy would be:

  1. Use pivot indices at 1/3 and 2/3 (or 1/4 and 3/4) of the length of the array. This would handle sorted inputs better.

  2. Use randomly-selected pivots, but use an RNG with a fixed seed instead of thread_rng.
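
For illustration, a minimal sketch of these two options (the function names, the fixed seed, and rand 0.8's gen_range range syntax are assumptions, not part of this PR):

    use rand::rngs::StdRng;
    use rand::{Rng, SeedableRng};

    // Option 1: deterministic pivot indices at 1/3 and 2/3 of the length
    // (intended for arrays past the small-array cutoff).
    fn thirds_pivots(len: usize) -> (usize, usize) {
        (len / 3, 2 * len / 3)
    }

    // Option 2: random pivot indices from a fixed-seed RNG, so runs stay reproducible.
    fn seeded_pivots(len: usize) -> (usize, usize) {
        let mut rng = StdRng::seed_from_u64(42);
        let (a, b) = (rng.gen_range(0..len), rng.gen_range(0..len));
        (a.min(b), a.max(b))
    }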

@n3vu0r (Author) Jun 25, 2021

Yes, not ideal for sorted input. I'm happy with both of your ideas. I was also thinking about it and am currently in favor of this:

The sampling has quite a high overhead for small arrays. Instead of sorting the sample with insertion sort, why not sort the whole small array itself with insertion sort? Then the original motivation for the constant ~47 holds again, as it becomes a recursion cutoff. At first I was against this, since the method is called partition_mut() and should not do a full sort, but thinking about it, a full sort of small arrays is totally fine: it is the natural edge case of sorting the sample when the sample length and the array length coincide. We just have to return valid but artificial pivot indexes, and the ones trivial to compute are (lowermost, uppermost).

        let lowermost = 0;
        let uppermost = self.len() - 1;
        // Recursion cutoff at an integer multiple of 7.
        if self.len() < 42 {
            // Sort array instead of sample.
            for mut index in 1..self.len() {
                while index > 0 && self[index - 1] > self[index] {
                    self.swap(index - 1, index);
                    index -= 1;
                }
            }
            return (lowermost, uppermost);
        }
        // Continue with sampling and quick sort.

It seems to be faster than the current PR for my two data sets. And if we want to make the dual-pivot partitioning public, we should probably split the code into separate methods: the recursion cutoff then moves into the get_ methods, which invoke separate pivot sampling and partitioning methods. But functionality-wise, the above code at the top of partition_mut() is equivalent and requires fewer modifications to the rest of the code.

@n3vu0r (Author) Jun 25, 2021

The recursion cutoff is not perfect if it happens within partition_mut(): recursion still continues for the middle partition. I think it's best to cleanly separate the code into appropriate methods.

src/sort.rs Outdated
self.swap(i, j);
i += 1;
j -= 1;
// Use 1st and 3rd element of sorted sample as skewed pivots.
@jturner314 (Member)

This seems like a good strategy. It's possible to design a pathological input for this strategy, but it would look pretty weird and seems unlikely to occur accidentally. Is there a reference for this strategy, or did you come up with it?

@n3vu0r (Author)

A symmetric strategy is used by the JDK 7 implementation; they use the 2nd and 4th of five. This master's thesis suggests, on page 183 in the second paragraph starting with "Finally, the case k = 5" (where k is the sample length), that the 1st and 3rd of five is a better choice. But in the end it depends on the input distribution.
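
For illustration only, the two sample-based choices in 0-based indexing (hypothetical helpers, not this PR's code):

    // Pick two pivot positions from a sorted 5-element sample.
    // JDK 7 uses the 2nd and 4th (symmetric); the thesis suggests the 1st and
    // 3rd (skewed) for a sample of k = 5.
    fn symmetric_pivots(sample: &[usize; 5]) -> (usize, usize) {
        (sample[1], sample[3])
    }

    fn skewed_pivots(sample: &[usize; 5]) -> (usize, usize) {
        (sample[0], sample[2])
    }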

@n3vu0r (Author) commented Jun 25, 2021

I will also test with arrays of equal elements and refactor this PR as soon as I have time.

@n3vu0r (Author) commented Jun 28, 2021

Since we keep the single-pivot partitioning method, I've implemented a variant of Sesquickselect. It suggests the optimal pivot selection from fixed-size samples with respect to the relative sought rank (index / length) and also switches from dual-pivot to single-pivot partitioning for extreme sought ranks (page 17, figure 3). The benches show a speedup with adaptive pivot sampling and with smaller recursion cutoff thresholds, at least on my machine. For the bulk version, I kept the recommended skewed pivots for quicksort, on the assumption that multiple indexes shift the characteristics from quickselect towards quicksort (and there is no single sought rank).

It works well with arrays of equal elements and with sorted and reverse-sorted arrays. Tested with up to 1_000_000 elements.

I would suggest making sample_mut() generic over the sample size via const generics, if an MSRV of 1.51 is fine?

The sampling does not have to be equally spaced; it could also be randomized. I have no favorite yet.
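
A rough sketch of the rank-adaptive idea described above (the thresholds, names, and which sample positions are picked are assumptions; the actual cutoffs come from the Sesquickselect paper):

    // Pick pivot positions in a sorted 5-element sample depending on the
    // relative sought rank alpha = index / len: a single, skewed pivot for
    // extreme ranks, two pivots otherwise.
    enum Pivots {
        Single(usize),
        Dual(usize, usize),
    }

    fn adaptive_pivots(index: usize, len: usize) -> Pivots {
        let alpha = index as f64 / len as f64;
        if alpha < 0.1 {
            Pivots::Single(0) // skewed towards the low end
        } else if alpha > 0.9 {
            Pivots::Single(4) // skewed towards the high end
        } else if alpha < 0.5 {
            Pivots::Dual(0, 2) // 1st and 3rd of five
        } else {
            Pivots::Dual(2, 4) // 3rd and 5th of five
        }
    }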

@n3vu0r (Author) commented Jun 30, 2021

I used adaptive pivot sampling for the bulk version as well, but only for branches with a single index remaining. I don't know how the benches are configured, but I would like to add larger input arrays.

  * Add test with large array of equal floats.
  * Enable optimization for test profile to reduce execution time.
@n3vu0r (Author) commented Apr 1, 2023

Fixes #86 by letting the stack grow onto the heap whenever necessary. Using dual pivoting should already be superior to just splitting >= pivot into == pivot and > pivot, or am I missing something?
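
One common way to let the call stack spill onto the heap in Rust (not necessarily what this PR does) is to wrap the recursion in stacker::maybe_grow, which allocates a new heap-backed stack segment whenever the remaining stack falls below a red zone. A minimal sketch with a hypothetical Lomuto-style quicksort:

    // Hypothetical single-pivot quicksort; only the `maybe_grow` wrapper is
    // the point here.
    fn quicksort(data: &mut [f64]) {
        if data.len() < 2 {
            return;
        }
        // Lomuto partition with the last element as pivot.
        let pivot = data.len() - 1;
        let mut store = 0;
        for i in 0..pivot {
            if data[i] < data[pivot] {
                data.swap(i, store);
                store += 1;
            }
        }
        data.swap(store, pivot);
        let (lower, upper) = data.split_at_mut(store);
        // Grow onto a 1 MiB heap segment if less than 32 KiB of stack remains.
        stacker::maybe_grow(32 * 1024, 1024 * 1024, || {
            quicksort(lower);
            quicksort(&mut upper[1..]);
        });
    }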

@n3vu0r (Author) commented Apr 2, 2023

This dynamic stack growth on the heap should in general be used for every recursive implementation whose depth depends on the input data. But we should still try to avoid worst-case complexities. I found that the bulk version's single-pivoting branch is the problem. I will try to split >= pivot here too, like it's done in the non-bulk version. If this doesn't work out, the single-pivoting branch of the bulk version should be removed.

Simply removing that branch reduces the runtime of the test for large arrays of equal elements from 700 ms to 50 ms.

  * Delegate single-index invocation to non-bulk implementation.
  * Non-bulk implementation skips recursion if `index == pivot`.
@n3vu0r (Author) commented Apr 2, 2023

Reusing the non-bulk version for single-index invocations of the bulk version does the job even better and reduces code complexity. I've increased the array length by a factor of 10 in the test and the execution time was unchanged.

  * Move single-index delegation into recursion allowing single-pivoting
    in bulk mode.
  * Avoid worst case complexity in non-bulk and bulk mode by using dual-pivot
    if both sample choices of single-pivot are equal.
  * Fix first sample index from being skipped.
@n3vu0r (Author) commented Apr 3, 2023

Ready for review/merge.

@n3vu0r (Author) commented Apr 7, 2023

Closing in favor of #92.

@n3vu0r closed this Apr 7, 2023