update blaze version 2.0.8-SNAPSHOT (#386)

richox and zhangli20 authored Feb 2, 2024
Co-authored-by: zhangli20 <[email protected]>
1 parent 591b8d7 commit 1aecd9c
Showing 10 changed files with 184 additions and 172 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build-ce7-releases.yml
@@ -12,7 +12,7 @@ jobs:
     strategy:
       matrix:
         sparkver: [spark303, spark333]
-        blazever: [2.0.7]
+        blazever: [2.0.8]
 
     steps:
       - uses: actions/checkout@v4
6 changes: 3 additions & 3 deletions README.md
@@ -108,16 +108,16 @@ spark-sql -f tpcds/q01.sql
 
 ## Performance
 
-Check [Benchmark Results](./benchmark-results/20231108.md) with the latest date for the performance
+Check [Benchmark Results](./benchmark-results/20240202.md) with the latest date for the performance
 comparison with vanilla Spark on the TPC-DS 1TB dataset. The benchmark result shows that Blaze saved
 ~40% query time and ~45% cluster resources on average, with a ~5x speedup in the best case (q06).
 Stay tuned and join us for more upcoming thrilling numbers.
 
 Query time:
-![20231108-query-time](./benchmark-results/blaze-query-time-comparison-20231108.png)
+![20240202-query-time](./benchmark-results/blaze-query-time-comparison-20240202.png)
 
 Cluster resources:
-![20231108-resources](./benchmark-results/blaze-cluster-resources-cost-comparison-20231108.png)
+![20240202-resources](./benchmark-results/blaze-cluster-resources-cost-comparison-20240202.png)
 
 We also encourage you to benchmark Blaze and share the results with us. 🤗
42 changes: 27 additions & 15 deletions RELEASES.md
@@ -1,20 +1,32 @@
-# blaze-v2.0.7
+# blaze-v2.0.8
 
 ## Features
-* Supports native BroadcastNestedLoopJoinExec.
-* Supports multithread UDF evaluation.
-* Supports spark.files.ignoreCorruptFiles.
-* Supports input batch statistics.
+* Enables nested complex data types by default.
+* Supports writing parquet tables with dynamic partitions.
+* Supports partial aggregate skipping (sketched below).
+* Enables converting the first() aggregate function.
+* Adds spill metrics.
 
 ## Performance
-* Improves get_json_object() performance by reducing duplicated json parsing.
-* Improves parquet reading performance by skipping utf-8 validation.
-* Supports cached expression evaluator in native AggExec.
-* Supports column pruning during native evaluation.
+* Prefers native sort even if the child is non-native.
+* Implements batch updating/merging in aggregates.
+* Uses a slim box for storing bytes.
+* get_json_object() uses Cow to avoid copying (sketched below).
+* Reduces the probability of unexpected off-heap memory overflows.
+* Introduces multiway merge sort to SortExec and SortRepartitioner (sketched below).
+* SortExec removes redundant columns from batches.
+* Implements the loser tree with inlined comparable traits.
+* Uses unchecked indexing in LoserTree for a slight performance improvement.
+* Removes BucketRepartitioner.
+* Reduces the number of awaits in sort-merge join.
+* Pre-merges records in sorting mode if cardinality is low.
+* Uses gxhash as the default hasher in AggExec.
+* Optimizes the collect_set/collect_list functions with SmallVec (sketched below).
+* Implements an async IPC reader.
 
 ## Bugfix
-* Fix missing outputPartitioning in NativeParquetExec.
-* Fix missing native converting checks in parquet scan.
-* Fix inconsistency: implement spark-compatible float to int casting.
-* Avoid closing hadoop fs for reusing in cache.
+* Fixes buggy GetArrayItem/GetMapValue native converter pattern matching.
+* Fixes parquet pruning with NaN values.
+* Fixes map type conversion with an incorrect nullable value.
+* Fixes ffi-export errors in some cases.
+* Fixes incorrect behavior of get_index_field with an incorrect number of rows.
+* Fixes task hanging in some cases with ffi-export.
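
On partial aggregate skipping: a partial (map-side) aggregate that is not actually reducing its input only burns CPU and memory, so it can stop aggregating and stream rows through to the final aggregate. The Rust sketch below shows one way such a heuristic could look; the struct, thresholds, and names are illustrative assumptions, not Blaze's actual implementation.

```rust
// Sketch of a partial-aggregate-skipping heuristic (parameters are made up):
// the partial aggregate tracks how well it is reducing its input, and if the
// number of distinct groups stays close to the number of input rows, it stops
// aggregating and forwards rows to the final aggregate unchanged.
struct PartialAggMonitor {
    input_rows: usize,
    output_groups: usize,
}

impl PartialAggMonitor {
    // Assumed thresholds for illustration; the real values may differ.
    const MIN_ROWS: usize = 10_000;
    const MAX_REDUCTION: f64 = 0.9; // groups/rows above this => not reducing

    fn should_skip(&self) -> bool {
        self.input_rows >= Self::MIN_ROWS
            && (self.output_groups as f64 / self.input_rows as f64) > Self::MAX_REDUCTION
    }
}

fn main() {
    let m = PartialAggMonitor { input_rows: 50_000, output_groups: 49_000 };
    assert!(m.should_skip()); // high cardinality: partial agg isn't helping
    let m = PartialAggMonitor { input_rows: 50_000, output_groups: 1_000 };
    assert!(!m.should_skip()); // strong reduction: keep aggregating
}
```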
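On the get_json_object() Cow change: Rust's std::borrow::Cow lets a function return a borrowed slice on the common fast path and allocate only when the value actually has to be rewritten. A minimal sketch of the pattern (not Blaze's actual code, and handling only trivial escapes):

```rust
use std::borrow::Cow;

// Return the input unchanged when no unescaping is required, and allocate a
// new String only when the value contains escape sequences.
fn unescape_json_str(raw: &str) -> Cow<'_, str> {
    if !raw.contains('\\') {
        // Fast path: no escape sequences, borrow the input without copying.
        Cow::Borrowed(raw)
    } else {
        // Slow path: build an unescaped copy (only `\"` and `\\` here).
        let mut out = String::with_capacity(raw.len());
        let mut chars = raw.chars();
        while let Some(c) = chars.next() {
            if c == '\\' {
                if let Some(next) = chars.next() {
                    out.push(next);
                }
            } else {
                out.push(c);
            }
        }
        Cow::Owned(out)
    }
}

fn main() {
    assert!(matches!(unescape_json_str("plain"), Cow::Borrowed(_)));
    assert_eq!(unescape_json_str(r#"a\"b"#), "a\"b");
}
```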
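On multiway merge sort: k sorted runs are merged by repeatedly taking the smallest head among them. The release notes say Blaze uses a loser tree with inlined comparable traits for this; the self-contained sketch below expresses the same k-way merge idea with std's BinaryHeap, which does slightly more comparisons per step but is easier to show in a few lines.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Merge k sorted runs by keeping one (value, run, position) cursor per run in
// a min-heap and repeatedly popping the smallest head.
fn multiway_merge(runs: Vec<Vec<i32>>) -> Vec<i32> {
    let mut heap = BinaryHeap::new();
    // Seed the heap with the first element of each non-empty run.
    for (run_idx, run) in runs.iter().enumerate() {
        if let Some(&v) = run.first() {
            heap.push(Reverse((v, run_idx, 0)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((v, run_idx, pos))) = heap.pop() {
        out.push(v);
        // Advance the cursor of the run we just consumed from.
        if let Some(&next) = runs[run_idx].get(pos + 1) {
            heap.push(Reverse((next, run_idx, pos + 1)));
        }
    }
    out
}

fn main() {
    let merged = multiway_merge(vec![vec![1, 4, 7], vec![2, 5], vec![3, 6, 8]]);
    assert_eq!(merged, vec![1, 2, 3, 4, 5, 6, 7, 8]);
}
```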
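On the collect_set/collect_list optimization: the smallvec crate stores up to N elements inline and spills to the heap only for larger collections, which avoids a per-group allocation when most groups hold only a few values. A sketch under that assumption (the element type and inline capacity of 4 are made up for illustration):

```rust
use smallvec::SmallVec;

// A collect_list-style accumulator: up to 4 elements live inline in the
// struct itself; larger groups transparently spill to a heap allocation.
type ListAcc = SmallVec<[i64; 4]>;

fn collect_list(values: impl IntoIterator<Item = i64>) -> ListAcc {
    let mut acc = ListAcc::new();
    for v in values {
        acc.push(v);
    }
    acc
}

fn main() {
    let small = collect_list([1, 2, 3]);
    assert!(!small.spilled()); // still inline, no heap allocation
    let big = collect_list(0..16);
    assert!(big.spilled()); // exceeded inline capacity, moved to the heap
}
```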
152 changes: 0 additions & 152 deletions benchmark-results/20231108.md

This file was deleted.
