diff --git a/pages/blog/2024-10-15-release_4_0.md b/pages/blog/2024-10-15-release_4_0.md new file mode 100644 index 00000000..f7aedb5a --- /dev/null +++ b/pages/blog/2024-10-15-release_4_0.md @@ -0,0 +1,103 @@ +--- +title: 'Release 4.0: 🚄 Faster actor testing and modularized expressions' +--- + +Earlier this year, [Comunica version 3.0 was released](/blog/2024-03-19-release_3_0/), +which introduced better querying planning across data sources and several convenience features. +Since then, we had several minor releases introducing multiple additions and performance improvements. +In [release 3.2](/blog/2024-07-05-release_3_2/), we introduced better techniques for discovering performance bottleneck. +This allowed us identify some low-level bottlenecks in the internals of Comunica which limited performance. +Fixing these issues requires breaking changes to the internal API of Comunica, +which is the main reason for this major update. +As a result of these changes, we see **performance improvements of 20% to 40%**. +Breaking changes are limited to internal Comunica APIs. +This means that developers that make use of Comunica can have their cake _and_ eat it 🎂; +no breaking changes _and_ gaining a performance boost. + + + +## 🪢 New `Actor.test` contract for better performance + +To determine which actors can answer a certain action, +all Comunica actors expose a `test` method, which used to look something like this: + +```typescript +class MyActor extends Actor { + public async test(action: IAction): Promise { + if (conditionNotMet(action)) { + throw new Error('This actor can not handle the action'); + } + return true; + } +} +``` + +The problem with the above is that JavaScript engines such as V8 +will eagerly build internal stacktraces when creating `Error` objects. +Since Comunica has a large number of actors (240 at the time of writing), +an average query execution can lead to a huge number of internal `Error` objects being created. +According to our measurements, this produced a non-negligible performance overhead. + +As such, we refactored the contract of the `test` method to not rely on these `Error` objects anymore. +Instead, `test` methods now make use of `TestResult` objects, +which in practise look like this: + +```typescript +class MyActor extends Actor { + public async test(action: IAction): Promise> { + if (conditionNotMet(action)) { + return failTest('This actor can not handle the action'); + } + return passTestVoid(); + } +} +``` + +For various benchmarks on in-memory triple stores, this change makes queries up to 20% faster. + +## 🧩 Modularization of expressions logic + +Thanks to [Jitse De Smet](https://jitsedesmet.be/)'s monumental effort, +all expressions-related logic in Comunica is now fully modularized. +Previously, the handling of filters and aggregates were all delegated to the singular `sparqlee` package. +While this package did a great job of handling filters and aggregates, +it lacked the modularity that existed for all other parts of query execution. +For example, it was not possible to easily plug in your own actor to evaluate the `SUM` aggregator in a different way. + +With this release, the `sparqlee` has been split up into multiple buses and actors, +which are responsible for term comparators, function, and aggregators. +For this, we avoided any kind of performance degradation. + +Learn more about [expressions evaluation in our documentation](https://comunica.dev/docs/modify/advanced/expression-evaluator/). + +## 🚄 Performance improvements + +Besides the changes mentioned above, +there are a number of smaller changes that have a positive impact on performance that are worth mentioning: + +- [Optimize Bindings merge logic](https://github.com/comunica/comunica/commit/20daf1761b1ec4c82357909a45fa4f84e2754080) +- [Fix internal cardinalities being wrong for SPARQL endpoints with VoID](https://github.com/comunica/comunica/commit/075f5dde03354f0516398235146d9a93883e8b66) +- [Hash Joins now always use 32-bit numbers](https://github.com/comunica/comunica/commit/c1474e4167f1978e144bee9b7663fcb2bafc21bf), which speeds up operations in the V8 engine. +- [Hash joins use the faster Murmur3 hash method](https://github.com/comunica/comunica/commit/331865c773882be7e061a690833fc841ece27339) +- [Only consider overlapping vars when testing undef in join actors](https://github.com/comunica/comunica/commit/5ddb64bb1df4b1179529f3e2ea0c05fb321dfce9) +- [Refactor HTTP fetch and retry logic](https://github.com/comunica/comunica/commit/b37d7e008b77067f6c6249aadfc901baf42a6914), which leads to more stable query execution over servers that make use of rate limits + +## 🤝 Contributors + +This release has been made possible thanks to the help of the following contributors (in no particular order): + +- [Jonni Hanski](https://github.com/surilindur) +- [Jitse De Smet](https://github.com/jitsedesmet) +- [Ruben Eschauzier](https://github.com/RubenEschauzier) +- [Karel Klíma](https://github.com/karelklima) +- [Ieben Smessaert](https://github.com/smessie) +- [Maarten Vandenbrande](https://github.com/maartyman) +- [Bryan-Elliott Tam](https://github.com/constraintAutomaton) +- [Jesse Wright](https://github.com/jeswr) +- [Ruben Taelman](https://github.com/rubensworks/) + +## Full changelog + +While this blog post explained the primary changes in Comunica 4.x, +there are actually many more smaller changes internally that will make your lives easier. +If you want to learn more about these changes, check out the [full changelog](https://github.com/comunica/comunica/blob/master/CHANGELOG.md#v401---2024-10-15).