Improve docs for table statistics #24892

Merged (1 commit) on Feb 6, 2025
docs/src/main/sphinx/connector/iceberg.md (1 change: 1 addition & 0 deletions)
@@ -1917,6 +1917,7 @@ ORDER BY _change_ordinal ASC;
 The connector includes a number of performance improvements, detailed in the
 following sections.
 
+(iceberg-table-statistics)=
 ### Table statistics
 
 The Iceberg connector can collect column statistics using {doc}`/sql/analyze`
docs/src/main/sphinx/optimizer/statistics.md (21 changes: 17 additions & 4 deletions)
@@ -4,7 +4,9 @@ Trino supports statistics based optimizations for queries. For a query to take
 advantage of these optimizations, Trino must have statistical information for
 the tables in that query.
 
-Table statistics are provided to the query planner by connectors.
+Table statistics are estimates about the stored data. They are provided to the
+query planner by connectors and enable performance improvements for query
+processing.
 
 ## Available statistics
 
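The updated text describes table statistics as estimates about the stored data. As a rough illustration of what such per-column estimates contain, the Python sketch below computes a row count plus a few values analogous to columns reported by `SHOW STATS` (`distinct_values_count`, `nulls_fraction`, `low_value`, `high_value`). This is a hypothetical model, not how Trino or any connector gathers statistics. `ColumnStats` and `collect_stats` are illustrative names, and the sketch computes exact values over in-memory rows where real connectors typically store estimates.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class ColumnStats:
    """Hypothetical per-column summary, loosely modeled on SHOW STATS columns."""
    column_name: str
    distinct_values_count: int
    nulls_fraction: float
    low_value: Optional[Any]
    high_value: Optional[Any]


def collect_stats(
    columns: Sequence[str], rows: Sequence[Sequence[Any]]
) -> tuple[int, list[ColumnStats]]:
    """Compute exact statistics over in-memory rows (illustration only).

    Assumes the non-null values within each column are mutually comparable,
    so that min() and max() work.
    """
    row_count = len(rows)
    per_column = []
    for index, name in enumerate(columns):
        values = [row[index] for row in rows]
        non_null = [v for v in values if v is not None]
        per_column.append(
            ColumnStats(
                column_name=name,
                distinct_values_count=len(set(non_null)),
                # Fraction of rows where this column is NULL; 0.0 for an empty table.
                nulls_fraction=(row_count - len(non_null)) / row_count if row_count else 0.0,
                low_value=min(non_null) if non_null else None,
                high_value=max(non_null) if non_null else None,
            )
        )
    return row_count, per_column


row_count, stats = collect_stats(
    ["id", "region"],
    [(1, "EU"), (2, "US"), (3, None), (4, "EU")],
)
print(row_count)  # 4
print(stats[1].distinct_values_count, stats[1].nulls_fraction)  # 2 0.25
```

Exact computation is used here only for clarity. The planner needs these values to be approximately right, which is why the docs describe them as estimates.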
@@ -27,6 +29,17 @@ being used and can also vary by table. For example, the
 Hive connector does not currently provide statistics on data size.
 
 Table statistics can be displayed via the Trino SQL interface using the
-{doc}`/sql/show-stats` command. For the Hive connector, refer to the
-{ref}`Hive connector <hive-analyze>` documentation to learn how to update table
-statistics.
+[](/sql/show-stats) command.
+
+Depending on the connector support, table statistics are updated by Trino when
+executing [data management statements](sql-data-management) like `INSERT`,
+`UPDATE`, or `DELETE`. For example, the [Delta Lake
+connector](delta-lake-table-statistics), the [Hive connector](hive-analyze), and
+the [Iceberg connector](iceberg-table-statistics) all support table statistics
+management from Trino.
+
+You can also initialize statistics collection with the [](/sql/analyze) command.
+This is needed when other systems manipulate the data without Trino, and
+therefore statistics tracked by Trino are out of date. Other connectors rely on
+the underlying data source to manage table statistics or do not support table
+statistics use at all.
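The reasoning for running `ANALYZE` can be illustrated with a toy model. It assumes a table whose statistics are updated only when writes go through the engine. The class and method names (`ToyTable`, `insert_via_engine`, `write_externally`, `analyze`) are hypothetical and do not correspond to any Trino or connector API. Only a row count is tracked.

```python
class ToyTable:
    """Hypothetical table whose row-count statistic is tracked separately from its data."""

    def __init__(self) -> None:
        self.rows: list[tuple] = []
        self.stats_row_count = 0  # statistic as tracked by the query engine

    def insert_via_engine(self, new_rows: list[tuple]) -> None:
        # Writes through the engine update data and statistics together,
        # analogous to INSERT on a connector that manages statistics.
        self.rows.extend(new_rows)
        self.stats_row_count += len(new_rows)

    def write_externally(self, new_rows: list[tuple]) -> None:
        # Another system changes the data; the tracked statistic is left untouched.
        self.rows.extend(new_rows)

    def analyze(self) -> None:
        # Recompute the statistic from the stored data, analogous to ANALYZE.
        self.stats_row_count = len(self.rows)

    def stats_are_stale(self) -> bool:
        return self.stats_row_count != len(self.rows)


table = ToyTable()
table.insert_via_engine([(1,), (2,)])
print(table.stats_are_stale())  # False
table.write_externally([(3,)])
print(table.stats_are_stale())  # True
table.analyze()
print(table.stats_row_count)  # 3
```

The external write leaves the tracked count behind the actual data. Recomputing from storage brings it back in line, which mirrors the docs' point about data changed without Trino.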