update blaze version 2.0.8-SNAPSHOT (#386)

richox and zhangli20 authored Feb 2, 2024
Co-authored-by: zhangli20 <[email protected]>
1 parent 591b8d7 commit 1aecd9c
Showing 10 changed files with 184 additions and 172 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build-ce7-releases.yml
@@ -12,7 +12,7 @@ jobs:
     strategy:
       matrix:
         sparkver: [spark303, spark333]
-        blazever: [2.0.7]
+        blazever: [2.0.8]
 
     steps:
       - uses: actions/checkout@v4
6 changes: 3 additions & 3 deletions README.md
@@ -108,16 +108,16 @@ spark-sql -f tpcds/q01.sql
 
 ## Performance
 
-Check [Benchmark Results](./benchmark-results/20231108.md) with the latest date for the performance
+Check [Benchmark Results](./benchmark-results/20240202.md) with the latest date for the performance
 comparison with vanilla Spark on the TPC-DS 1TB dataset. The benchmark result shows that Blaze saved
 ~40% query time and ~45% cluster resources on average, with a ~5x speedup in the best case (q06).
 Stay tuned and join us for more upcoming thrilling numbers.
 
 Query time:
-![20231108-query-time](./benchmark-results/blaze-query-time-comparison-20231108.png)
+![20240202-query-time](./benchmark-results/blaze-query-time-comparison-20240202.png)
 
 Cluster resources:
-![20231108-resources](./benchmark-results/blaze-cluster-resources-cost-comparison-20231108.png)
+![20240202-resources](./benchmark-results/blaze-cluster-resources-cost-comparison-20240202.png)
 
 We also encourage you to benchmark Blaze and share the results with us. 🤗
42 changes: 27 additions & 15 deletions RELEASES.md
@@ -1,20 +1,32 @@
-# blaze-v2.0.7
+# blaze-v2.0.8
 
 ## Features
-* Supports native BroadcastNestedLoopJoinExec.
-* Supports multithread UDF evaluation.
-* Supports spark.files.ignoreCorruptFiles.
-* Supports input batch statistics.
+* Enables nested complex data types by default.
+* Supports writing parquet tables with dynamic partitions.
+* Supports partial aggregate skipping (sketched below).
+* Enables converting the first() aggregate function.
+* Adds spill metrics.
 
 ## Performance
-* Improves get_json_object() performance by reducing duplicated json parsing.
-* Improves parquet reading performance by skipping utf-8 validation.
-* Supports cached expression evaluator in native AggExec.
-* Supports column pruning during native evaluation.
+* Prefers native sort even if the child is non-native.
+* Implements batch updating/merging in aggregates.
+* Uses a slim box for storing bytes.
+* get_json_object() uses Cow to avoid copying (sketched below).
+* Reduces the probability of unexpected off-heap memory overflows.
+* Introduces multiway merge sort to SortExec and SortRepartitioner (sketched below).
+* SortExec removes redundant columns from batches.
+* Implements the loser tree with inlined comparable traits.
+* Uses unchecked indexing in LoserTree for a slight performance improvement.
+* Removes BucketRepartitioner.
+* Reduces the number of awaits in sort-merge join.
+* Pre-merges records in sorting mode if cardinality is low.
+* Uses gxhash as the default hasher in AggExec.
+* Optimizes the collect_set/collect_list functions with SmallVec (sketched below).
+* Implements an async IPC reader.
 
 ## Bugfix
-* Fix missing outputPartitioning in NativeParquetExec.
-* Fix missing native converting checks in parquet scan.
-* Fix inconsistency: implement spark-compatible float to int casting.
-* Avoid closing hadoop fs for reusing in cache.
+* Fixes buggy GetArrayItem/GetMapValue native converter pattern matching.
+* Fixes parquet pruning with NaN values.
+* Fixes map type conversion with an incorrect nullable value.
+* Fixes ffi-export errors in some cases.
+* Fixes incorrect behavior of get_index_field with an incorrect number of rows.
+* Fixes task hanging in some cases with ffi-export.
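
On partial aggregate skipping: a partial (map-side) aggregate that is not actually reducing its input only burns CPU and memory, so it can stop aggregating and stream rows through to the final aggregate. The Rust sketch below shows one way such a heuristic could look; the struct, thresholds, and names are illustrative assumptions, not Blaze's actual implementation.

```rust
// Sketch of a partial-aggregate-skipping heuristic (parameters are made up):
// the partial aggregate tracks how well it is reducing its input, and if the
// number of distinct groups stays close to the number of input rows, it stops
// aggregating and forwards rows to the final aggregate unchanged.
struct PartialAggMonitor {
    input_rows: usize,
    output_groups: usize,
}

impl PartialAggMonitor {
    // Assumed thresholds for illustration; the real values may differ.
    const MIN_ROWS: usize = 10_000;
    const MAX_REDUCTION: f64 = 0.9; // groups/rows above this => not reducing

    fn should_skip(&self) -> bool {
        self.input_rows >= Self::MIN_ROWS
            && (self.output_groups as f64 / self.input_rows as f64) > Self::MAX_REDUCTION
    }
}

fn main() {
    let m = PartialAggMonitor { input_rows: 50_000, output_groups: 49_000 };
    assert!(m.should_skip()); // high cardinality: partial agg isn't helping
    let m = PartialAggMonitor { input_rows: 50_000, output_groups: 1_000 };
    assert!(!m.should_skip()); // strong reduction: keep aggregating
}
```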
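On the get_json_object() Cow change: Rust's std::borrow::Cow lets a function return a borrowed slice on the common fast path and allocate only when the value actually has to be rewritten. A minimal sketch of the pattern (not Blaze's actual code, and handling only trivial escapes):

```rust
use std::borrow::Cow;

// Return the input unchanged when no unescaping is required, and allocate a
// new String only when the value contains escape sequences.
fn unescape_json_str(raw: &str) -> Cow<'_, str> {
    if !raw.contains('\\') {
        // Fast path: no escape sequences, borrow the input without copying.
        Cow::Borrowed(raw)
    } else {
        // Slow path: build an unescaped copy (only `\"` and `\\` here).
        let mut out = String::with_capacity(raw.len());
        let mut chars = raw.chars();
        while let Some(c) = chars.next() {
            if c == '\\' {
                if let Some(next) = chars.next() {
                    out.push(next);
                }
            } else {
                out.push(c);
            }
        }
        Cow::Owned(out)
    }
}

fn main() {
    assert!(matches!(unescape_json_str("plain"), Cow::Borrowed(_)));
    assert_eq!(unescape_json_str(r#"a\"b"#), "a\"b");
}
```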
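On multiway merge sort: k sorted runs are merged by repeatedly taking the smallest head among them. The release notes say Blaze uses a loser tree with inlined comparable traits for this; the self-contained sketch below expresses the same k-way merge idea with std's BinaryHeap, which does slightly more comparisons per step but is easier to show in a few lines.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Merge k sorted runs by keeping one (value, run, position) cursor per run in
// a min-heap and repeatedly popping the smallest head.
fn multiway_merge(runs: Vec<Vec<i32>>) -> Vec<i32> {
    let mut heap = BinaryHeap::new();
    // Seed the heap with the first element of each non-empty run.
    for (run_idx, run) in runs.iter().enumerate() {
        if let Some(&v) = run.first() {
            heap.push(Reverse((v, run_idx, 0)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((v, run_idx, pos))) = heap.pop() {
        out.push(v);
        // Advance the cursor of the run we just consumed from.
        if let Some(&next) = runs[run_idx].get(pos + 1) {
            heap.push(Reverse((next, run_idx, pos + 1)));
        }
    }
    out
}

fn main() {
    let merged = multiway_merge(vec![vec![1, 4, 7], vec![2, 5], vec![3, 6, 8]]);
    assert_eq!(merged, vec![1, 2, 3, 4, 5, 6, 7, 8]);
}
```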
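On the collect_set/collect_list optimization: the smallvec crate stores up to N elements inline and spills to the heap only for larger collections, which avoids a per-group allocation when most groups hold only a few values. A sketch under that assumption (the element type and inline capacity of 4 are made up for illustration):

```rust
use smallvec::SmallVec;

// A collect_list-style accumulator: up to 4 elements live inline in the
// struct itself; larger groups transparently spill to a heap allocation.
type ListAcc = SmallVec<[i64; 4]>;

fn collect_list(values: impl IntoIterator<Item = i64>) -> ListAcc {
    let mut acc = ListAcc::new();
    for v in values {
        acc.push(v);
    }
    acc
}

fn main() {
    let small = collect_list([1, 2, 3]);
    assert!(!small.spilled()); // still inline, no heap allocation
    let big = collect_list(0..16);
    assert!(big.spilled()); // exceeded inline capacity, moved to the heap
}
```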
152 changes: 0 additions & 152 deletions benchmark-results/20231108.md

This file was deleted.
