Optimize Iceberg table count #46525

Samrose-Ahmed · 2024-06-02T01:24:59Z

Enhancement

The count(*) of an Iceberg table can be obtained from the Iceberg metadata without a full scan of the data. Currently, StarRocks performs a full scan of the table data when running a count(*) query on an external Iceberg lake table. This should be optimized to use only the Iceberg metadata, which is already available via the statistics.

E.g.

StarRocks > explain select count(*) as cnt from tbl1;
+-------------------------------------------+
| Explain String                            |
+-------------------------------------------+
| PLAN FRAGMENT 0                           |
|  OUTPUT EXPRS:19: count                   |
|   PARTITION: UNPARTITIONED                |
|                                           |
|   RESULT SINK                             |
|                                           |
|   4:AGGREGATE (merge finalize)            |
|   |  output: count(19: count)             |
|   |  group by:                            |
|   |                                       |
|   3:EXCHANGE                              |
|                                           |
| PLAN FRAGMENT 1                           |
|  OUTPUT EXPRS:                            |
|   PARTITION: RANDOM                       |
|                                           |
|   STREAM DATA SINK                        |
|     EXCHANGE ID: 03                       |
|     UNPARTITIONED                         |
|                                           |
|   2:AGGREGATE (update serialize)          |
|   |  output: count(*)                     |
|   |  group by:                            |
|   |                                       |
|   1:Project                               |
|   |  <slot 21> : 1                        |
|   |                                       |
|   0:IcebergScanNode                       |
|      TABLE: iceberg.db.tbl1               |
|      cardinality=13219153                 |
|      avgRowSize=2.0                       |
+-------------------------------------------+

The cardinality reported on the IcebergScanNode already contains the result, so the query should not need to scan any data.
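To illustrate the idea, here is a minimal sketch of how a row count could be derived from manifest metadata alone. It is not StarRocks code: `FileEntry` and `metadata_row_count` are hypothetical names, and the entries are plain in-memory stand-ins for manifest entries. It assumes the `content` codes and the per-file `record_count` field from the Iceberg table spec, and it only applies to a count(*) with no filter. When delete files are present the sum is no longer exact, so the sketch falls back to signalling that a scan is needed.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Content codes as defined by the Iceberg table spec.
DATA = 0
POSITION_DELETES = 1
EQUALITY_DELETES = 2


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for one manifest entry (not a real Iceberg class)."""
    content: int
    record_count: int


def metadata_row_count(entries: Iterable[FileEntry]) -> Optional[int]:
    """Return count(*) using metadata only, or None if a scan is still required."""
    total = 0
    for entry in entries:
        if entry.content != DATA:
            # Delete files hide rows, so record counts alone are not exact.
            return None
        total += entry.record_count
    return total


if __name__ == "__main__":
    entries = [FileEntry(DATA, 100), FileEntry(DATA, 250)]
    print(metadata_row_count(entries))  # prints 350
```

Returning `None` instead of a partial sum keeps the fast path safe: the caller can use the metadata answer when it is exact and run the normal scan otherwise.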


Samrose-Ahmed · 2024-06-02T01:26:59Z

Related: #44387 and #43460

github-actions · 2024-12-02T11:00:46Z

We have marked this issue as stale because it has been inactive for 6 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to StarRocks!

Samrose-Ahmed · 2024-12-04T00:38:28Z

Still relevant.

dirtysalt · 2025-02-21T21:22:35Z

@Samrose-Ahmed maybe you can try this PR: #43616

It's merged into 3.2.
To enable this optimization, you have to run "set enable_rewrite_simple_agg_to_hdfs_scan = true".

dirtysalt · 2025-02-21T21:24:03Z

But that PR does not rely on Iceberg metadata. However, it can speed things up a lot, and it's a general optimization (it works with any table format).

Samrose-Ahmed · 2025-02-27T20:11:48Z

Thanks a lot! That's a good PR in general. I just tried it and it does improve things, but this can be much faster for Iceberg (almost instant) because we have the metadata.

Samrose-Ahmed added the type/enhancement label Jun 2, 2024

github-actions bot added the no-issue-activity label Dec 2, 2024

github-actions bot removed the no-issue-activity label Dec 9, 2024

dirtysalt self-assigned this Feb 21, 2025
