perf: improve equality comparison performance #173

Merged: 17 commits, Nov 21, 2024

Conversation

Maksimka101
Contributor

Status

READY

Breaking Changes

NO

Description

I've noticed that comparing objects that contain collections can be made significantly faster. In my application, these optimizations increased FPS roughly threefold. I've also run tests comparing the performance of the old and new approaches.

Todos

  • Tests
  • Documentation
  • Examples

Impact to Remaining Code Base

This PR will affect:

  • comparison logic

@felangel
Owner

Thanks for the PR! Are you able to share the benchmarks you used? I would love to run them locally and validate the results.

@Maksimka101
Contributor Author

Maksimka101 commented Dec 11, 2023

Sure, here is the link: https://github.com/Maksimka101/equatable_benchmark

@Maksimka101
Contributor Author

Hello

Any news?🙂

Do I need to fix something or provide more information?

@felangel
Owner

felangel commented Jan 2, 2024

Apologies for the delay! I was slow to respond due to the holidays but I should have time to review this in the next day or two. Thanks again!

@Maksimka101
Contributor Author

Thanks. No need to hurry. I forgot that normal people relax on holidays🙂. Happy new year

@Maksimka101
Contributor Author

Hello

Can we merge this?

@escamoteur

Just saw this. Besides that, I think it would definitely be great to get this merged. Why not borrow the optimization of fast_equatable and cache the hashCode, if that really improves performance that much? :-)

@felangel felangel closed this May 24, 2024
@felangel felangel reopened this May 24, 2024
@felangel
Owner

felangel commented May 24, 2024

Will review this later today. Apologies that it fell through the cracks. It looks like CI was failing, so I just re-triggered it to see the logs.

@felangel
Owner

Looks like some tests are missing. I’ll take a look at the benchmarks and will see if I can get this merged later today. Apologies for the delay!

@felangel felangel added the enhancement New feature or request label May 24, 2024
@felangel
Owner

felangel commented May 26, 2024

@Maksimka101 I've added benchmarks and run them against the implementation on master and am seeing the following results:

branch: master

EmptyEquatable
          total runs:  2 076 295   
          total time:     2.0000  s
         average run:          0 μs
         runs/second:   Infinity
               units:        100   
        units/second:   Infinity
       time per unit:     0.0000 μs

PrimitiveEquatable
          total runs:    810 588   
          total time:     2.0000  s
         average run:          2 μs
         runs/second:    500 000   
               units:        100   
        units/second: 50 000 000   
       time per unit:     0.0200 μs

CollectionEquatable (small)
          total runs:    443 978   
          total time:     2.0000  s
         average run:          4 μs
         runs/second:    250 000   
               units:        100   
        units/second: 25 000 000   
       time per unit:     0.0400 μs

CollectionEquatable (medium)
          total runs:    442 368   
          total time:     2.0000  s
         average run:          4 μs
         runs/second:    250 000   
               units:        100   
        units/second: 25 000 000   
       time per unit:     0.0400 μs

CollectionEquatable (large)
          total runs:    450 915   
          total time:     2.0000  s
         average run:          4 μs
         runs/second:    250 000   
               units:        100   
        units/second: 25 000 000   
       time per unit:     0.0400 μs

branch: feat/performance-improvement

EmptyEquatable
          total runs:  2 069 828   
          total time:     2.0000  s
         average run:          0 μs
         runs/second:   Infinity
               units:        100   
        units/second:   Infinity
       time per unit:     0.0000 μs

PrimitiveEquatable
          total runs:    823 014   
          total time:     2.0000  s
         average run:          2 μs
         runs/second:    500 000   
               units:        100   
        units/second: 50 000 000   
       time per unit:     0.0200 μs

CollectionEquatable (small)
          total runs:    490 253   
          total time:     2.0000  s
         average run:          4 μs
         runs/second:    250 000   
               units:        100   
        units/second: 25 000 000   
       time per unit:     0.0400 μs

CollectionEquatable (medium)
          total runs:    494 469   
          total time:     2.0000  s
         average run:          4 μs
         runs/second:    250 000   
               units:        100   
        units/second: 25 000 000   
       time per unit:     0.0400 μs

CollectionEquatable (large)
          total runs:    494 548   
          total time:     2.0000  s
         average run:          4 μs
         runs/second:    250 000   
               units:        100   
        units/second: 25 000 000   
       time per unit:     0.0400 μs

MacBook Pro (M1 Pro, 16GB RAM)
Dart SDK version: 3.3.4 (stable) (Tue Apr 16 19:56:12 2024 +0000) on "macos_arm64"

I'm not able to reproduce the significant performance increase you describe. Maybe DeepCollectionEquality has been optimized since you last tested?

Let me know if you're still able to reproduce the ~20% performance improvement, thanks!

@felangel
Owner

Just saw this. Besides that, I think it would definitely be great to get this merged. Why not borrow the optimization of fast_equatable and cache the hashCode, if that really improves performance that much? :-)

Because the classes should be immutable (e.g. all fields should be final and have a const constructor). Classes that use package:fast_equatable are not immutable and cannot have a const constructor.
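
The trade-off can be sketched as follows. This is an illustration only, not code from package:equatable or package:fast_equatable; `ImmutablePoint` and `CachedPoint` are hypothetical names. Caching a hash code needs a field that is written after construction, and Dart does not allow a const constructor on a class with a non-final field.

```dart
/// All fields are final, so a const constructor is allowed and equal
/// const expressions canonicalize to the same instance.
class ImmutablePoint {
  const ImmutablePoint(this.x, this.y);

  final int x;
  final int y;

  @override
  bool operator ==(Object other) =>
      other is ImmutablePoint && other.x == x && other.y == y;

  // Recomputed on every call; nothing is stored on the instance.
  @override
  int get hashCode => Object.hash(x, y);
}

/// Hypothetical cached variant: the cache is a mutable field, so the
/// constructor can no longer be const.
class CachedPoint {
  CachedPoint(this.x, this.y);

  final int x;
  final int y;

  // Filled in lazily on the first hashCode access.
  int? _cachedHash;

  @override
  bool operator ==(Object other) =>
      other is CachedPoint && other.x == x && other.y == y;

  @override
  int get hashCode => _cachedHash ??= Object.hash(x, y);
}
```

Adding `const` to the `CachedPoint` constructor would be rejected at compile time because `_cachedHash` is not final.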

@Maksimka101
Contributor Author

The thing is, our benchmarks differ. It's hard to notice, but in mine all the collections are identical, while in yours almost all of them differ. That's why mine reports such low throughput: the comparison has to go through all the fields and all the elements of each collection. In your benchmark, differences are found almost immediately and the method completes quickly.

Try to add this benchmark:

_runBenchmark(
  'CollectionEquatable (large) (all equal)',
  (index) => CollectionEquatable(
    list: List.generate(100, (i) => 1024),
    map: Map.fromEntries(
      // ignore: prefer_const_constructors
      List.generate(100, (i) => MapEntry('${1024}', 1024)),
    ),
    set: Set.from(List.generate(100, (i) => 1024)),
  ),
);

It'll give you the following results:

CollectionEquatable (large) (all equal)
          total runs:     20 718   
          total time:     2.0000  s
         average run:         96 μs
         runs/second:     10 417   
               units:        100   
        units/second:  1 041 667   
       time per unit:     0.9600 μs

In my app, one field in a huge store object was changing, and this happened very frequently. My case is not the most common, but it's also not rare. The comparison operation should be fast not only for different values but also for identical ones :)
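
A small sketch of why the two benchmarks behave so differently. `countingListEquals` is a hypothetical helper written for this illustration (it is not part of the PR and needs Dart 3 records); it counts how many element comparisons are needed before the result is known.

```dart
/// Compares two lists element by element and reports how many element
/// comparisons were performed before the answer was known.
({bool equal, int comparisons}) countingListEquals(
  List<Object?> a,
  List<Object?> b,
) {
  if (a.length != b.length) return (equal: false, comparisons: 0);
  var comparisons = 0;
  for (var i = 0; i < a.length; i++) {
    comparisons++;
    // A mismatch ends the comparison immediately.
    if (a[i] != b[i]) return (equal: false, comparisons: comparisons);
  }
  // Equal inputs are the worst case: every element was visited.
  return (equal: true, comparisons: comparisons);
}
```

Two equal lists of 100 elements need all 100 comparisons, while two lists that already differ at index 0 need only one, which is the gap between the "all equal" benchmark and the ones with differing data.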

@felangel
Owner

felangel commented Jun 4, 2024

Thanks for the reply! I updated the benchmarks and am able to reproduce the slowdown in larger static datasets. Will look at it a bit more closely in the next few days. I want to take a closer look at why DeepCollectionEquality is suboptimal and ideally open a PR to improve the performance in package collection so that more packages benefit from it.

@Maksimka101
Contributor Author

I've researched it a bit, and I think DeepCollectionEquality is slow by design due to its flexible API. It creates a map to compare sets, and probably maps too. Also, it doesn't use the == operator directly, so it has some overhead.

But hope I'm wrong :)
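
For comparison, a set check that avoids any intermediate map could look roughly like this. It is a minimal sketch, not the implementation from this PR or from package:collection, and `simpleSetEquals` is a hypothetical name. It assumes the elements' own `==` and `hashCode` are what should decide membership, so nested collections are compared by identity rather than deeply.

```dart
/// Order-independent set comparison that relies on the sets' own
/// lookup instead of building an intermediate map.
bool simpleSetEquals(Set<Object?> a, Set<Object?> b) {
  if (identical(a, b)) return true;
  if (a.length != b.length) return false;
  for (final element in a) {
    // Uses the == / hashCode of the elements via Set.contains.
    if (!b.contains(element)) return false;
  }
  return true;
}
```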

@felangel felangel changed the title Feat/performance improvement perf: improve equality comparison performance Oct 11, 2024
@felangel
Owner

Planning to land this ASAP, but it is still missing quite a few tests for the various equatable_utils APIs (e.g. iterableEquals, objectEquals, setEquals, etc.). I already found several bugs in the original implementations and wouldn't be surprised if there were still some lingering bugs in the current version of this PR.

final unitA = a[i];
final unitB = b[i];
if (_isEquatable(unitA) && _isEquatable(unitB)) {
return unitA == unitB;
Owner

This implementation is incorrect: it returns true as soon as the first pair of elements in a list of props is equal, even if the remaining elements are not:

class Person with EquatableMixin {
  Person({required this.name});

  final String name;

  @override
  List<Object?> get props => [name];
}

test('...', () {
  final alice = Person(name: 'Alice');
  expect(equals([alice, null], [alice, -1]), isFalse);
});

The above test fails because equals incorrectly returns true.

if (_isEquatable(unitA) && _isEquatable(unitB)) {
return unitA == unitB;
} else if (unitA is Set && unitB is Set) {
return _setEquals(unitA, unitB);
Owner

Same goes for the rest of these -- we can't just return true early without actually checking each unit in the list.
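
A minimal sketch of the fix these two review comments ask for, assuming a simplified helper structure (`listEqualsSketch` and `unitEquals` are hypothetical names, not the functions in the PR): the loop may only exit early on a mismatch, and returns true once every pair has been checked.

```dart
/// Returns true only if every pair of elements matches.
bool listEqualsSketch(List<Object?> a, List<Object?> b) {
  if (identical(a, b)) return true;
  if (a.length != b.length) return false;
  for (var i = 0; i < a.length; i++) {
    // Exit early only when a pair differs; a matching pair says
    // nothing about the elements that follow it.
    if (!unitEquals(a[i], b[i])) return false;
  }
  return true;
}

/// Compares a single pair, recursing into nested lists and falling
/// back to == for everything else (including Equatable-style objects).
bool unitEquals(Object? a, Object? b) {
  if (a is List && b is List) return listEqualsSketch(a, b);
  return a == b;
}
```

With this shape, the case from the review (`['alice', null]` against `['alice', -1]`) is reported as not equal, because the second pair is still checked after the first one matches.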

@Maksimka101
Contributor Author

You are right. I found one bug in the setEquals function

Also, I've added the @pragma('vm:prefer-inline') annotation to the listEquals function, which improved performance by 10-20% in some cases:

Benchmark details (when compiled to the exe file)

With the inline pragma

EmptyEquatable
          total runs:  1 598 292   
          total time:     2.0000  s
         average run:          1 μs
         runs/second:  1 000 000   
               units:        100   
        units/second: 100 000 000   
       time per unit:     0.0100 μs

PrimitiveEquatable
          total runs:    678 313   
          total time:     2.0000  s
         average run:          2 μs
         runs/second:    500 000   
               units:        100   
        units/second: 50 000 000   
       time per unit:     0.0200 μs

CollectionEquatable (static, small)
          total runs:    139 323   
          total time:     2.0000  s
         average run:         14 μs
         runs/second:     71 429   
               units:        100   
        units/second:  7 142 857   
       time per unit:     0.1400 μs

CollectionEquatable (static, medium)
          total runs:    101 675   
          total time:     2.0000  s
         average run:         19 μs
         runs/second:     52 632   
               units:        100   
        units/second:  5 263 158   
       time per unit:     0.1900 μs

CollectionEquatable (static, large)
          total runs:     33 898   
          total time:     2.0000  s
         average run:         59 μs
         runs/second:     16 949   
               units:        100   
        units/second:  1 694 915   
       time per unit:     0.5900 μs

CollectionEquatable (dynamic, small)
          total runs:    458 845   
          total time:     2.0000  s
         average run:          4 μs
         runs/second:    250 000   
               units:        100   
        units/second: 25 000 000   
       time per unit:     0.0400 μs

CollectionEquatable (dynamic, medium)
          total runs:    460 127   
          total time:     2.0000  s
         average run:          4 μs
         runs/second:    250 000   
               units:        100   
        units/second: 25 000 000   
       time per unit:     0.0400 μs

CollectionEquatable (dynamic, large)
          total runs:    465 185   
          total time:     2.0000  s
         average run:          4 μs
         runs/second:    250 000   
               units:        100   
        units/second: 25 000 000   
       time per unit:     0.0400 μs

Without the inline pragma

EmptyEquatable
          total runs:  1 332 271   
          total time:     2.0000  s
         average run:          1 μs
         runs/second:  1 000 000   
               units:        100   
        units/second: 100 000 000   
       time per unit:     0.0100 μs

PrimitiveEquatable
          total runs:    585 970   
          total time:     2.0000  s
         average run:          3 μs
         runs/second:    333 333   
               units:        100   
        units/second: 33 333 333   
       time per unit:     0.0300 μs

CollectionEquatable (static, small)
          total runs:    133 655   
          total time:     2.0000  s
         average run:         14 μs
         runs/second:     71 429   
               units:        100   
        units/second:  7 142 857   
       time per unit:     0.1400 μs

CollectionEquatable (static, medium)
          total runs:     99 270   
          total time:     2.0000  s
         average run:         20 μs
         runs/second:     50 000   
               units:        100   
        units/second:  5 000 000   
       time per unit:     0.2000 μs

CollectionEquatable (static, large)
          total runs:     33 109   
          total time:     2.0001  s
         average run:         60 μs
         runs/second:     16 667   
               units:        100   
        units/second:  1 666 667   
       time per unit:     0.6000 μs

CollectionEquatable (dynamic, small)
          total runs:    423 572   
          total time:     2.0000  s
         average run:          4 μs
         runs/second:    250 000   
               units:        100   
        units/second: 25 000 000   
       time per unit:     0.0400 μs

CollectionEquatable (dynamic, medium)
          total runs:    410 959   
          total time:     2.0000  s
         average run:          4 μs
         runs/second:    250 000   
               units:        100   
        units/second: 25 000 000   
       time per unit:     0.0400 μs

CollectionEquatable (dynamic, large)
          total runs:    417 545   
          total time:     2.0000  s
         average run:          4 μs
         runs/second:    250 000   
               units:        100   
        units/second: 25 000 000   
       time per unit:     0.0400 μs
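
For readers unfamiliar with the annotation mentioned above, this is roughly how it is applied. The function body is a simplified stand-in, not the PR's listEquals, and `inlinedListEquals` is a hypothetical name. The pragma is a hint to the Dart VM compilers and does not change what the function returns.

```dart
/// Asks the Dart VM to inline calls to this function where it can.
/// This is a hint, not a guarantee, and it has no effect on behavior.
@pragma('vm:prefer-inline')
bool inlinedListEquals(List<Object?> a, List<Object?> b) {
  if (identical(a, b)) return true;
  if (a.length != b.length) return false;
  for (var i = 0; i < a.length; i++) {
    if (a[i] != b[i]) return false;
  }
  return true;
}
```

Whether inlining helps depends on the call site and the compilation mode, so any gain should be measured with benchmarks like the ones above rather than assumed.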

bool iterableEquals(Iterable<Object?> a, Iterable<Object?> b) {
assert(
Owner

Thoughts on calling setEquals instead of asserting here?

if (a is Set && b is Set) return setEquals(a, b);

Contributor Author

I don't expect anyone to use this function. I'm not sure it's even exported. So I'd prefer to leave it simple, without extra ifs
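
For context, the suggested dispatch would look roughly like this. It is only a sketch of the reviewer's idea, with the hypothetical name `iterableEqualsSketch`; it is not the code that was merged.

```dart
/// Compares two iterables. Sets are compared by membership because
/// they have no defined order; everything else is compared in order.
bool iterableEqualsSketch(Iterable<Object?> a, Iterable<Object?> b) {
  if (identical(a, b)) return true;
  if (a is Set && b is Set) {
    return a.length == b.length && a.every((e) => b.contains(e));
  }
  final itA = a.iterator;
  final itB = b.iterator;
  while (true) {
    final hasA = itA.moveNext();
    final hasB = itB.moveNext();
    if (hasA != hasB) return false; // one iterable ended first
    if (!hasA) return true; // both ended at the same time
    if (itA.current != itB.current) return false;
  }
}
```

The extra branch costs one type check per call, which is the overhead the author preferred to avoid for a function that is not expected to be called directly.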

@felangel
Owner

felangel commented Oct 16, 2024

@Maksimka101 with the latest changes, the benchmarks indicate the performance is worse than the original (in which we used DeepCollectionEquality) so I'm not sure there's any reason to proceed. FYI, I'm just comparing the benchmark results via dart run (JIT)

@Maksimka101
Contributor Author

Maksimka101 commented Oct 16, 2024

You're right, but only partially. In JIT mode, the current version is indeed faster in 5 out of 8 tests. But in AOT mode, the new implementation is significantly faster in 7 out of 8 tests. And this is very important because most Dart programs are compiled AOT

I'll try to figure out why it performs so poorly in JIT, and I'll also look at the difference when compiling to JS

Edit: New implementation is significantly faster in 6 of 8 tests and slightly slower in 2 of 8 tests when compiled to JS. (compiled with the -O2 flag, launched on node v22.4.0)

@Maksimka101
Contributor Author

@felangel I improved performance in almost every test across all build options. The largest performance drop is 6.3% in JS for the PrimitiveEquatable benchmark. The biggest performance boost is also in JS, in CollectionEquatable (static, medium), with a 356% increase :)

In the end, I just marked 2 functions for inline, but I ran so many experiments...

By the way, you can check out the performance comparison charts here: https://docs.google.com/spreadsheets/d/1e5g_URJ6oFc76e-YYhDXVBeqqzV7ZN0vDuh-3nPxGGY/edit?usp=sharing

@felangel
Owner

Thanks! I’ll take a closer look later today 💙

@felangel
Owner

Will update the benchmarks and include both AOT and JIT versions and plan to merge and publish this later today. Sorry for the delay and thanks for all your contributions and time 💙

Owner

@felangel felangel left a comment

LGTM thanks so much for the energy and time you put into this -- I really appreciate it! 💙

@felangel felangel merged commit a679c3d into felangel:master Nov 21, 2024
3 checks passed
Labels
enhancement New feature or request
3 participants