
[Enhancement] optimize the performance for topn with large offset #55886

Merged: stdpain merged 1 commit into StarRocks:main from opt_topn_with_large_limit on Feb 27, 2025

Conversation

stdpain (Contributor) commented Feb 13, 2025

Why I'm doing:

SSB100G, dop=4, 1 BE

select lo_shipmode from lineorder order by lo_shipmode limit 50000000, 400

baseline: 1m42s, patched: 28s339ms

The reason baseline performance is so low:
1. A merge operation runs every 256 buffered chunks, but the total input is very large, so far too many merge operations are performed.

This PR adds max_buffer_size, derived from (offset + limit) / chunk_size, to reduce the frequency of merge operations, at the cost of potentially using more memory. It additionally optimizes memory usage when merging chunks, which reduces the peak memory of the merge.
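
To make the buffering policy concrete, here is a minimal, hypothetical sketch of how such a cap could be derived. The names (compute_max_buffer_size, kDefaultBufferedChunks) and the exact formula are illustrative assumptions, not the actual StarRocks code.

```cpp
#include <algorithm>
#include <cstddef>

// Baseline behavior: a merge pass runs every 256 buffered chunks.
constexpr size_t kDefaultBufferedChunks = 256;

// Hypothetical cap: a deep offset (e.g. LIMIT 50000000, 400) must carry the
// first offset + limit rows through every merge pass, so buffering roughly
// (offset + limit) / chunk_size chunks lets a single pass absorb far more
// input before merging.
size_t compute_max_buffer_size(size_t offset, size_t limit, size_t chunk_size) {
    size_t rows_needed = offset + limit;
    size_t chunks_needed = (rows_needed + chunk_size - 1) / chunk_size; // ceiling division
    return std::max(kDefaultBufferedChunks, chunks_needed); // never below the baseline
}
```

With chunk_size = 4096 and the query above, this sketch would buffer roughly 12,208 chunks per merge pass instead of 256, cutting the number of merge passes by about 48x, which is consistent in direction with the reported drop from 1m42s to 28s339ms.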

What I'm doing:

  1. Change max_buffered_size to chunk_size/4096 when the limit is greater than 65535.
  2. Avoid large permutations that take up too much memory.
  3. Reduce memory when merging large chunks (see the sketch after this list).
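
For points 2 and 3, the sketch below illustrates only the memory pattern, under stated assumptions: merged output is emitted in fixed-size slices and fully drained inputs are freed immediately, so no single permutation or result buffer ever covers all offset + limit rows. Chunk is a plain vector of ints standing in for a sorted column chunk; merge_sorted_runs and emit are hypothetical names, not StarRocks' real API.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

using Chunk = std::vector<int>; // stand-in for a sorted column chunk

// K-way merge of sorted runs that emits chunk_size-row slices instead of one
// huge result, and releases each input run the moment it is drained.
void merge_sorted_runs(std::vector<std::unique_ptr<Chunk>> runs, size_t chunk_size,
                       const std::function<void(Chunk&&)>& emit) {
    using Cursor = std::pair<size_t, size_t>; // (run index, position within run)
    auto greater = [&](const Cursor& a, const Cursor& b) {
        return (*runs[a.first])[a.second] > (*runs[b.first])[b.second];
    };
    std::priority_queue<Cursor, std::vector<Cursor>, decltype(greater)> heap(greater);
    for (size_t i = 0; i < runs.size(); ++i) {
        if (runs[i] && !runs[i]->empty()) heap.push({i, 0});
    }

    Chunk out;
    out.reserve(chunk_size);
    while (!heap.empty()) {
        auto [run, pos] = heap.top();
        heap.pop();
        out.push_back((*runs[run])[pos]);
        if (pos + 1 < runs[run]->size()) {
            heap.push({run, pos + 1});
        } else {
            runs[run].reset(); // run fully drained: free its memory right away
        }
        if (out.size() == chunk_size) {
            emit(std::move(out)); // hand off a full slice; peak memory stays flat
            out = Chunk();
            out.reserve(chunk_size);
        }
    }
    if (!out.empty()) emit(std::move(out)); // flush the final partial slice
}
```

Peak memory in this pattern stays near one output slice plus the still-live inputs, rather than growing with offset + limit the way a single materialized permutation would.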

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport PR

Bugfix cherry-pick branch check:

  • I have checked the version labels which the PR will be auto-backported to the target branch
    • 3.4
    • 3.3
    • 3.2
    • 3.1
    • 3.0

@stdpain stdpain requested a review from a team as a code owner February 13, 2025 13:24
@stdpain stdpain force-pushed the opt_topn_with_large_limit branch from 8eb9d93 to 2bf3116 on February 14, 2025 06:44
satanson previously approved these changes Feb 14, 2025
@stdpain stdpain force-pushed the opt_topn_with_large_limit branch 2 times, most recently from 6735781 to d421e44 on February 17, 2025 02:10
satanson previously approved these changes Feb 24, 2025
@stdpain stdpain force-pushed the opt_topn_with_large_limit branch 4 times, most recently from 160e90d to e765fd7 on February 25, 2025 06:15
@stdpain stdpain force-pushed the opt_topn_with_large_limit branch from e765fd7 to 4262201 on February 25, 2025 08:11

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)


[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)


[BE Incremental Coverage Report]

pass : 296 / 344 (86.05%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
| --- | --- | --- | --- | --- |
| 🔵 be/src/exec/topn_node.cpp | 0 | 2 | 00.00% | [234, 235] |
| 🔵 be/src/exec/chunks_sorter_topn.h | 9 | 13 | 69.23% | [81, 141, 145, 147] |
| 🔵 be/src/exec/sorting/merge.h | 29 | 39 | 74.36% | [152, 184, 185, 186, 188, 189, 190, 210, 211, 212] |
| 🔵 be/src/exec/chunks_sorter_topn.cpp | 163 | 194 | 84.02% | [87, 109, 174, 175, 176, 493, 497, 498, 499, 501, 503, 504, 505, 506, 507, 509, 510, 511, 512, 514, 515, 520, 521, 575, 732, 749, 762, 763, 772, 773, 774] |
| 🔵 be/src/exec/sorting/merge_cascade.cpp | 16 | 17 | 94.12% | [282] |
| 🔵 be/src/exec/sorting/merge.cpp | 47 | 47 | 100.00% | [] |
| 🔵 be/src/exec/sorting/merge_column.cpp | 30 | 30 | 100.00% | [] |
| 🔵 be/src/exec/pipeline/sort/partition_sort_sink_operator.cpp | 2 | 2 | 100.00% | [] |

@stdpain stdpain merged commit 11b98b0 into StarRocks:main Feb 27, 2025
52 checks passed

@Mergifyio backport branch-3.3

@github-actions github-actions bot removed the 3.3 label Feb 27, 2025
mergify bot (Contributor) commented Feb 27, 2025

backport branch-3.3

✅ Backports have been created


@Mergifyio backport branch-3.4

@github-actions github-actions bot removed the 3.4 label Feb 27, 2025
mergify bot (Contributor) commented Feb 27, 2025

backport branch-3.4

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Feb 27, 2025
[Enhancement] optimize the performance for topn with large offset (#55886)

1. Change max_buffered_size to chunk_size/4096 when the limit is greater than 65535.
2. Avoid large permutations that take up too much memory.
3. Reduce memory when merging large chunks.

SSB100G, dop=4, 1 BE
```
select lo_shipmode from lineorder order by lo_shipmode limit 50000000, 400
```
baseline: 1m42s, patched: 28s339ms

The reason baseline performance is so low:
1. A merge operation runs every 256 buffered chunks, but the total input is very large, so far too many merge operations are performed.

This PR adds max_buffer_size, derived from (offset + limit) / chunk_size, to reduce the frequency of merge operations, at the cost of potentially using more memory. It additionally optimizes memory usage when merging chunks, which reduces the peak memory of the merge.

Signed-off-by: stdpain <[email protected]>
(cherry picked from commit 11b98b0)

# Conflicts:
#	be/src/exec/chunks_sorter.cpp
#	be/src/exec/chunks_sorter.h
#	be/src/exec/chunks_sorter_topn.h
#	be/src/exec/pipeline/sort/local_partition_topn_context.cpp
#	be/src/exec/sorting/merge.h
mergify bot pushed a commit that referenced this pull request Feb 27, 2025
[Enhancement] optimize the performance for topn with large offset (#55886)


Signed-off-by: stdpain <[email protected]>
(cherry picked from commit 11b98b0)
wanpengfei-git pushed a commit that referenced this pull request Feb 27, 2025