Skip to content

Commit

Permalink
Prepare blog post for v4 release
Browse files Browse the repository at this point in the history
  • Loading branch information
rubensworks committed Oct 11, 2024
1 parent 3d1981a commit c05e09e
Showing 1 changed file with 103 additions and 0 deletions.
103 changes: 103 additions & 0 deletions pages/blog/2024-10-15-release_4_0.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
---
title: 'Release 4.0: 🚄 Faster actor testing and modularized expressions'
---

Earlier this year, [Comunica version 3.0 was released](/blog/2024-03-19-release_3_0/),
which introduced better querying planning across data sources and several convenience features.
Since then, we had several minor releases introducing multiple additions and performance improvements.
In [release 3.2](/blog/2024-07-05-release_3_2/), we introduced better techniques for discovering performance bottleneck.
This allowed us identify some low-level bottlenecks in the internals of Comunica which limited performance.
Fixing these issues requires breaking changes to the internal API of Comunica,
which is the main reason for this major update.
As a result of these changes, we see **performance improvements of 20% to 40%**.
Breaking changes are limited to internal Comunica APIs.
This means that developers that make use of Comunica can have their cake _and_ eat it 🎂;
no breaking changes _and_ gaining a performance boost.

<!-- excerpt-end -->

## 🪢 New `Actor.test` contract for better performance

To determine which actors can answer a certain action,
all Comunica actors expose a `test` method, which used to look something like this:

```typescript
class MyActor extends Actor {
public async test(action: IAction): Promise<IActorTest> {
if (conditionNotMet(action)) {
throw new Error('This actor can not handle the action');
}
return true;
}
}
```

The problem with the above is that JavaScript engines such as V8
will eagerly build internal stacktraces when creating `Error` objects.
Since Comunica has a large number of actors (240 at the time of writing),
an average query execution can lead to a huge number of internal `Error` objects being created.
According to our measurements, this produced a non-negligible performance overhead.

As such, we refactored the contract of the `test` method to not rely on these `Error` objects anymore.
Instead, `test` methods now make use of `TestResult` objects,
which in practise look like this:

```typescript
class MyActor extends Actor {
public async test(action: IAction): Promise<TestResult<IActorTest>> {
if (conditionNotMet(action)) {
return failTest('This actor can not handle the action');
}
return passTestVoid();
}
}
```

For various benchmarks on in-memory triple stores, this change makes queries up to 20% faster.

## 🧩 Modularization of expressions logic

Thanks to [Jitse De Smet](https://jitsedesmet.be/)'s monumental effort,
all expressions-related logic in Comunica is now fully modularized.
Previously, the handling of filters and aggregates were all delegated to the singular `sparqlee` package.
While this package did a great job of handling filters and aggregates,
it lacked the modularity that existed for all other parts of query execution.
For example, it was not possible to easily plug in your own actor to evaluate the `SUM` aggregator in a different way.

With this release, the `sparqlee` has been split up into multiple buses and actors,
which are responsible for term comparators, function, and aggregators.
For this, we avoided any kind of performance degradation.

Learn more about [expressions evaluation in our documentation](https://comunica.dev/docs/modify/advanced/expression-evaluator/).

## 🚄 Performance improvements

Besides the changes mentioned above,
there are a number of smaller changes that have a positive impact on performance that are worth mentioning:

- [Optimize Bindings merge logic](https://github.com/comunica/comunica/commit/20daf1761b1ec4c82357909a45fa4f84e2754080)
- [Fix internal cardinalities being wrong for SPARQL endpoints with VoID](https://github.com/comunica/comunica/commit/075f5dde03354f0516398235146d9a93883e8b66)
- [Hash Joins now always use 32-bit numbers](https://github.com/comunica/comunica/commit/c1474e4167f1978e144bee9b7663fcb2bafc21bf), which speeds up operations in the V8 engine.
- [Hash joins use the faster Murmur3 hash method](https://github.com/comunica/comunica/commit/331865c773882be7e061a690833fc841ece27339)
- [Only consider overlapping vars when testing undef in join actors](https://github.com/comunica/comunica/commit/5ddb64bb1df4b1179529f3e2ea0c05fb321dfce9)
- [Refactor HTTP fetch and retry logic](https://github.com/comunica/comunica/commit/b37d7e008b77067f6c6249aadfc901baf42a6914), which leads to more stable query execution over servers that make use of rate limits

## 🤝 Contributors

This release has been made possible thanks to the help of the following contributors (in no particular order):

- [Jonni Hanski](https://github.com/surilindur)
- [Jitse De Smet](https://github.com/jitsedesmet)
- [Ruben Eschauzier](https://github.com/RubenEschauzier)
- [Karel Klíma](https://github.com/karelklima)
- [Ieben Smessaert](https://github.com/smessie)
- [Maarten Vandenbrande](https://github.com/maartyman)
- [Bryan-Elliott Tam](https://github.com/constraintAutomaton)
- [Jesse Wright](https://github.com/jeswr)
- [Ruben Taelman](https://github.com/rubensworks/)

## Full changelog

While this blog post explained the primary changes in Comunica 4.x,
there are actually many more smaller changes internally that will make your lives easier.
If you want to learn more about these changes, check out the [full changelog](https://github.com/comunica/comunica/blob/master/CHANGELOG.md#v401---2024-10-15).

0 comments on commit c05e09e

Please sign in to comment.