Optimize Iceberg table count #46525

Open
Samrose-Ahmed opened this issue Jun 2, 2024 · 6 comments
Labels
type/enhancement Make an enhancement to StarRocks

@Samrose-Ahmed
Contributor

Enhancement

One can obtain the count(*) for an Iceberg table from the Iceberg metadata without doing a full scan of the data. Currently, StarRocks performs a full scan of the Iceberg table data when running a count(*) query on an external Iceberg lake table. This should be optimized to use only the Iceberg metadata (which is already available via the statistics).

E.g.

StarRocks > explain select count(*) as cnt from tbl1;
+-------------------------------------------+
| Explain String                            |
+-------------------------------------------+
| PLAN FRAGMENT 0                           |
|  OUTPUT EXPRS:19: count                   |
|   PARTITION: UNPARTITIONED                |
|                                           |
|   RESULT SINK                             |
|                                           |
|   4:AGGREGATE (merge finalize)            |
|   |  output: count(19: count)             |
|   |  group by:                            |
|   |                                       |
|   3:EXCHANGE                              |
|                                           |
| PLAN FRAGMENT 1                           |
|  OUTPUT EXPRS:                            |
|   PARTITION: RANDOM                       |
|                                           |
|   STREAM DATA SINK                        |
|     EXCHANGE ID: 03                       |
|     UNPARTITIONED                         |
|                                           |
|   2:AGGREGATE (update serialize)          |
|   |  output: count(*)                     |
|   |  group by:                            |
|   |                                       |
|   1:Project                               |
|   |  <slot 21> : 1                        |
|   |                                       |
|   0:IcebergScanNode                       |
|      TABLE: iceberg.db.tbl1               |
|      cardinality=13219153                 |
|      avgRowSize=2.0                       |
+-------------------------------------------+

The cardinality in the IcebergScanNode already holds the result, so the query does not need to perform any scan.
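
To illustrate the idea (this is not StarRocks code), here is a minimal Python sketch that derives count(*) from an Iceberg snapshot summary. It assumes the summary carries the optional `total-records`, `total-position-deletes`, and `total-equality-deletes` fields described in the Iceberg spec. The function name `count_from_summary` and the sample dict are hypothetical, and the sample value is simply borrowed from the plan above. The shortcut is only safe when the snapshot has no row-level deletes, so the sketch signals a fallback to a real scan in every other case.

```python
from typing import Mapping, Optional

# Summary keys that count row-level deletes in the current snapshot.
DELETE_KEYS = ("total-position-deletes", "total-equality-deletes")


def count_from_summary(summary: Mapping[str, str]) -> Optional[int]:
    """Return the row count from a snapshot summary, or None if a scan is needed.

    Assumption: `summary` is the string-to-string map stored with an Iceberg
    snapshot. All of the fields read here are optional in the spec, so a
    missing field is treated as "unknown" rather than as zero.
    """
    total = summary.get("total-records")
    if total is None:
        return None  # no record total available: fall back to a scan

    for key in DELETE_KEYS:
        deletes = summary.get(key)
        if deletes is None or int(deletes) != 0:
            # Unknown or non-zero deletes: total-records may overcount.
            return None

    return int(total)


# Hypothetical summary for a table without delete files.
sample = {
    "total-records": "13219153",
    "total-position-deletes": "0",
    "total-equality-deletes": "0",
}
print(count_from_summary(sample))
```

Treating a missing delete counter as "unknown" is a conservative choice: it gives up the shortcut for some tables in exchange for never returning a count that ignores deleted rows.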

@Samrose-Ahmed Samrose-Ahmed added the type/enhancement Make an enhancement to StarRocks label Jun 2, 2024
@Samrose-Ahmed
Contributor Author

Related: #44387 and #43460

github-actions bot commented Dec 2, 2024

We have marked this issue as stale because it has been inactive for 6 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to StarRocks!

@Samrose-Ahmed
Contributor Author

Still relevant.

@dirtysalt
Contributor

@Samrose-Ahmed Maybe you can try PR #43616.

It's merged into 3.2.
To enable this optimization, you have to run "set enable_rewrite_simple_agg_to_hdfs_scan = true".

@dirtysalt
Contributor

But that PR does not rely on Iceberg metadata. However, it can speed things up a lot, and it's a general optimization (it works with any table format).

@dirtysalt dirtysalt self-assigned this Feb 21, 2025
@Samrose-Ahmed
Contributor Author

Samrose-Ahmed commented Feb 27, 2025

Thanks much! That's a good PR in general. I just tried it and it does improve things, but count(*) could be much faster for Iceberg (almost instant) because we have the metadata.
