-
Notifications
You must be signed in to change notification settings - Fork 561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
List.sort() very slow if list is already sorted or nearly so. #1141
Comments
For the last snippet I was wondering. I'm assuming so but did you measure only the sort or both lines? If not what does this version measure at? var list = List.filled(10000, 1)
list.sort() |
Thanks for digging btw, we can probably add some tests around this kinda thing to give us a baseline to compare with and validate against. |
I just measured the sort. The time to create the list is virtually the same at 0.0183 seconds whether one uses: Incidentally, the Wikipedia article on Quicksort suggests that using a 'median of three' may be better than a simple midpoint. It was actually a shade slower in my own tests possibly because the cost of interpreting the extra code outweighed any efficiency gained. |
Oops, sorry, I messed up the timings for the list creation.
|
Did you tried with a more dumb algorithm, like bubble sort? That way it
would use less stack memory and could compute a little less.
|
here's one I had in my libs laying around static bubble_sort(list, compare) {
if(list.count == 0) return
var done = false
while(!done) {
done = true
for(i in 0 ... list.count-1) {
var comp = compare.call(list[i], list[i+1])
if(comp > 0) {
var tmp = list[i]
list[i] = list[i+1]
list[i+1] = tmp
done = false
}
}
}
} //bubble_sort |
Using the following optimized implementation of an ascending bubble sort: var bubbleSort = Fn.new { |a|
var n = a.count
while (true) {
var n2 = 0
for (i in 1...n) {
if (a[i - 1] > a[i]) {
a.swap(i, i - 1)
n2 = i
}
}
n = n2
if (n == 0) break
}
} the results are a bit variable but it's typically taking around 0.0008 seconds for both the 'already sorted' and 'all the same' cases. @ruby0x1's version is slightly slower than this. As expected, this is much quicker than Quicksort even with the modified pivot but, of course, it would be a completely different story if the list contained random values. |
one has a predicate so not a direct comparison (and not that it matters grand scheme, interesting none the less!) |
After reading through the salient parts of the Wikipedia article, I wonder whether the best solution here would be to change the partition scheme from Lomuto (which we seem to have at present) to Hoare? On the face of it, this would give reasonable (albeit sub-optimal) performance for the 'already sorted' and 'all the same' cases and should be no slower for unsorted data. The interface of The alternative would be to stick with what we have, changing the pivot choice to midpoint (or median of three) to give reasonable performance for already sorted lists and just accept the hit on cases where a lot of the data is the same on the grounds that this shouldn't occur very often in practice. We could perhaps put a warning in the docs about this so that knowledgeable users could use an alternative algorithm for this kind of scenario. |
If the Hoare version works better in all cases @PureFox48, then it makes sense to consider it. A PR would be the ideal path then, if you're up for it. |
OK, when I have a clearer head, I'll try and do a Hoare version and see how it compares. |
Do you have numbers with random values with Buble sort? I mean, the stress
on the stack is less heavy with it. And in our case it could be enough
unless we have a very large amount of data.
|
You optimized version can optimized a little bit more with: `while (n !=
0)`and removing: `if (n == 0) break`.
|
Both bubble sort and insertion sort are hopelessly slow when sorting random data even for lists as small as 100 elements. Check out the Wren results for this Rosetta code task, comparing the performance of various sorting methods, which I did some time ago. To keep things simple, I think we must stick with quicksort for the |
OK, I've done the Hoare version and it seems to be working OK. To make it easier to play about with them, I've put it (and also the modified Lomuto code) in separate classes for now. import "random" for Random
// Lomuto partitioning (as per List.sort()) but with midpoint pivot.
class List2 {
static sort(a) { sort(a) {|low, high| low < high } }
static sort(a, comparer) {
if (!(comparer is Fn)) {
Fiber.abort("Comparer must be a function.")
}
quicksort_(a, 0, a.count - 1, comparer)
return a
}
static quicksort_(a, low, high, comparer) {
if (low < high) {
var p = partition_(a, low, high, comparer)
quicksort_(a, low, p - 1, comparer)
quicksort_(a, p + 1, high, comparer)
}
}
static partition_(a, low, high, comparer) {
var mid = ((low + high)/2).floor // added this line
a.swap(mid, high) // ditto
var p = a[high]
var i = low - 1
for (j in low..(high-1)) {
if (comparer.call(a[j], p)) {
i = i + 1
var t = a[i]
a[i] = a[j]
a[j] = t
}
}
var t = a[i+1]
a[i+1] = a[high]
a[high] = t
return i+1
}
}
// Hoare partitioning as per Wikpedia article.
class List3 {
static sort(a) { sort(a) {|low, high| low < high } }
static sort(a, comparer) {
if (!(comparer is Fn)) {
Fiber.abort("Comparer must be a function.")
}
quicksort_(a, 0, a.count - 1, comparer)
return a
}
static quicksort_(a, low, high, comparer) {
if (low >= 0 && high >= 0 && low < high) {
var p = partition_(a, low, high, comparer)
quicksort_(a, low, p, comparer)
quicksort_(a, p+1, high, comparer)
}
}
static partition_(a, low, high, comparer) {
var mid = ((low + high)/2).floor
var p = a[mid]
var i = low - 1
var j = high + 1
while (true) {
while (true) {
i = i + 1
if (!comparer.call(a[i], p)) break
}
while (true) {
j = j - 1
if (!comparer.call(p, a[j])) break
}
if (i >= j) return j
a.swap(i, j)
}
}
}
var n = 20
var sorted = (1..n).toList
var same = List.filled(n, 1)
var rand = Random.new()
var unsorted = sorted.toList
rand.shuffle(unsorted)
System.print("Tests for List3.sort (Hoare):\n")
System.print(sorted)
List3.sort(sorted)
System.print(sorted)
System.print()
System.print(same)
List3.sort(same)
System.print(same)
System.print()
var unsorted2 = unsorted.toList // make a copy
System.print(unsorted)
List3.sort(unsorted) { |a, b| a < b }
System.print(unsorted)
System.print()
System.print(unsorted2)
List3.sort(unsorted2) { |a, b| a > b } // descending sort
System.print(unsorted2)
/* output:
Tests for List3.sort (Hoare):
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[8, 12, 14, 19, 15, 1, 20, 11, 3, 6, 2, 16, 13, 18, 5, 10, 9, 4, 17, 7]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
[8, 12, 14, 19, 15, 1, 20, 11, 3, 6, 2, 16, 13, 18, 5, 10, 9, 4, 17, 7]
[20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
*/ I'm doing some comparative benchmarks and will post those shortly. |
OK, here's the performance figures for the existing Benchmarks for List2.sort (modified Lomuto): Running 'Sorted data size 10' over 10 iteration(s):Best 0.004 ms Running 'Same data size 10' over 10 iteration(s):Best 0.005 ms Running 'Random data size 10' over 10 iteration(s):Best 0.004 ms Running 'Sorted data size 100' over 10 iteration(s):Best 0.060 ms Running 'Same data size 100' over 10 iteration(s):Best 0.304 ms Running 'Random data size 100' over 10 iteration(s):Best 0.076 ms Running 'Sorted data size 1000' over 10 iteration(s):Best 0.840 ms Running 'Same data size 1000' over 10 iteration(s):Best 27.987 ms Running 'Random data size 1000' over 10 iteration(s):Best 1.144 ms Running 'Sorted data size 10000' over 10 iteration(s):Best 11.382 ms Running 'Same data size 10000' over 10 iteration(s):Best 2773.198 ms Running 'Random data size 10000' over 10 iteration(s):Best 16.246 ms |
...and here's the performance figures for the Hoare partitioning version: Benchmarks for List3.sort (Hoare): Running 'Sorted data size 10' over 10 iteration(s):Best 0.005 ms Running 'Same data size 10' over 10 iteration(s):Best 0.005 ms Running 'Random data size 10' over 10 iteration(s):Best 0.005 ms Running 'Sorted data size 100' over 10 iteration(s):Best 0.065 ms Running 'Same data size 100' over 10 iteration(s):Best 0.080 ms Running 'Random data size 100' over 10 iteration(s):Best 0.084 ms Running 'Sorted data size 1000' over 10 iteration(s):Best 0.839 ms Running 'Same data size 1000' over 10 iteration(s):Best 1.080 ms Running 'Random data size 1000' over 10 iteration(s):Best 1.183 ms Running 'Sorted data size 10000' over 10 iteration(s):Best 10.046 ms Running 'Same data size 10000' over 10 iteration(s):Best 13.034 ms Running 'Random data size 10000' over 10 iteration(s):Best 14.275 ms |
For good measure, here are some figures for lists containing 100,000 and 1,000,000 random elements. Benchmarks for existing list.sort (Lomuto): Running 'Random data size 100000' over 10 iteration(s):Best 175.197 ms Running 'Random data size 1000000' over 10 iteration(s):Best 2273.706 ms Benchmarks for List2.sort (modified Lomuto): Running 'Random data size 100000' over 10 iteration(s):Best 188.390 ms Running 'Random data size 1000000' over 10 iteration(s):Best 2380.935 ms Benchmarks for List3.sort (Hoare): Running 'Random data size 100000' over 10 iteration(s):Best 174.246 ms Running 'Random data size 1000000' over 10 iteration(s):Best 2028.855 ms This demonstrates just how fast the quicksort algorithm is, even for large amounts of data and using an interpreted language such as Wren. |
Whilst we're at it, I think we should also add ordering operators to the String class so that (amongst other things) we can easily sort strings lexicographically by codepoint. This code should do it: class String is Sequence {
/* existing stuff */
// Compares this to another string lexicographically by codepoint.
// Returns -1, 0 or +1 depending on whether
// this < other, this == other or this > other respectively.
compareTo(other) {
if (!(other is String)) Fiber.abort("Other must be a string")
if (this == other) return 0
var cp1 = this.codePoints
var cp2 = other.codePoints
var len = (cp1.count <= cp2.count) ? cp1.count : cp2.count
for (i in 0...len) {
if (cp1[i] < cp2[i]) return -1
if (cp1[i] > cp2[i]) return 1
}
return (cp1.count < cp2.count) ? -1 : 1
}
// Comparison operators.
< (other) { compareTo(other) < 0 }
<= (other) { compareTo(other) <= 0 }
> (other) { compareTo(other) > 0 }
>= (other) { compareTo(other) >= 0 }
} |
Please open a separate discussion for this, it can be optimized more than
that.
|
Done. See #1142. |
Thanks for the detailed posts @PureFox48, it's always appreciated ✨ |
For reference, can you benchmark the algorithm provided by |
The only one we haven't benchmarked so far is the existing Benchmarks for existing list.sort (Lomuto): Running 'Sorted data size 10' over 10 iteration(s):Best 0.007 ms Running 'Same data size 10' over 10 iteration(s):Best 0.005 ms Running 'Random data size 10' over 10 iteration(s):Best 0.003 ms Running 'Sorted data size 100' over 10 iteration(s):Best 0.578 ms Running 'Same data size 100' over 10 iteration(s):Best 0.296 ms Running 'Random data size 100' over 10 iteration(s):Best 0.069 ms Running 'Sorted data size 1000' over 10 iteration(s):Best 53.978 ms Running 'Same data size 1000' over 10 iteration(s):Best 27.065 ms Running 'Random data size 1000' over 10 iteration(s):Best 1.151 ms Running 'Sorted data size 10000' over 10 iteration(s):Best 5471.917 ms Running 'Same data size 10000' over 10 iteration(s):Best 2754.123 ms Running 'Random data size 10000' over 10 iteration(s):Best 14.600 ms
To ensure all 3 methods use exactly the same data, I've updated the figures for the other two in my earlier posts. My overall conclusion is that we should change to the Hoare method which gives reasonable results across the board and seems a shade faster than Lomuto for large lists. |
wren-lang#1141 identified two problems with the existing implementation, namely that it was very slow when presented with lists which were either already sorted or all the same. The purpose of this PR is to solve those problems by changing from Lomuto to Hoare partitioning. As shown by the figures in the above issue, this should not lead to performance degradation for small 'random' lists and may even be a little quicker for large lists. This change would be invisible to users as only private methods are affected and does not require any changes to the documentation or tests.
wren-lang#1141 identified two problems with the existing implementation, namely that is was very slow when presented with lists which were already sorted or all the same. The purpose of this PR is to fix these problems by switching from Lomuto to Hoare partitioning. As far as 'random' lists are concerned, the figures in the above issue indicate that there will be no noticeable change in performance for small lists and possibly a small improvement for large lists. This change will be invisible to users as only two private methods are affected. No change is required to either the documentation or tests.
As it seems clear from these figures that a change to Hoare partitioning will be beneficial, I've now opened a PR for it. |
By having the preconditions of `count > 1` are the conditions `low >= 0 &&
high >= 0` not always true and could be removed?
|
The only time when So, yes, we can get rid of the |
I noticed recently that
List.sort()
, which is based on the Quicksort algorithm, is very slow indeed when applied to lists which are already sorted or nearly so.For example, this code takes around 5.7 seconds to run on my Core i7 machine:
I tracked down the problem to the choice of pivot in the private
partition_
method in wren_core.wren:If instead of always using the high value we use the mid value:
then the code now runs in around 0.014 seconds. Moreover, the effect of adding these two lines has negligible effect on the time taken to sort unsorted lists.
The implementation also has a problem when the list contains repeated elements. For example this code takes around 2.8 seconds to run whatever the pivot choice:
However, I can't think of a simple way to solve this particular problem which probably occurs much less often than the 'already sorted' one in practice. If anyone has an idea on this, please post it.
These, of course, are known issues with Quicksort generally which is also very slow when sorting small lists. The latter could be solved by using an insertion sort for lists under 10 elements say but, frankly, I doubt whether it's worth the effort in a simple scripting language such as Wren.
The text was updated successfully, but these errors were encountered: