Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICU-23061: LowercaseTransliterator has poor scalability due to synchronized state #3417

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

stevenschlansker
Copy link

@stevenschlansker stevenschlansker commented Feb 25, 2025

LowercaseTransliterator tries to reuse buffers to avoid allocation. Historically, this may have been an important optimization.
However on modern machines you are much more likely to use multiple CPUs, and modern JVMs are very efficient at GC, especially for local temporary data. Contending on a monitor is a big deal nowadays, especially for something so low level and hidden.

I've prepared a benchmark using JMH to show the change in scalability:

Linux 6.12.13-200.fc41.x86_64 AMD Ryzen 7 3700X 8-Core Processor
JMH version: 1.37
JDK 23.0.2, OpenJDK 64-Bit Server VM, 23.0.2+7

Original `synchronized`:

1 Thread Benchmark                         Mode  Cnt     Score    Error   Units
LowercaseTransliteratorPerf.testSentence  thrpt    5   789.169 ±  2.779  ops/ms
LowercaseTransliteratorPerf.testShort     thrpt    5  5217.250 ± 35.501  ops/ms

8 Thread Benchmark                         Mode  Cnt     Score    Error   Units
LowercaseTransliteratorPerf.testSentence  thrpt    5   759.934 ±  4.880  ops/ms
LowercaseTransliteratorPerf.testShort     thrpt    5  3916.430 ± 70.545  ops/ms

New `stateless`:

1 Thread Benchmark                         Mode  Cnt     Score    Error   Units
LowercaseTransliteratorPerf.testSentence  thrpt    5   779.504 ±  6.507  ops/ms
LowercaseTransliteratorPerf.testShort     thrpt    5  5641.104 ± 29.747  ops/ms

8 Thread Benchmark                         Mode  Cnt      Score      Error   Units
LowercaseTransliteratorPerf.testSentence  thrpt    5   6340.495 ±  240.436  ops/ms
LowercaseTransliteratorPerf.testShort     thrpt    5  39132.015 ± 7102.311  ops/ms

The current implementation exhibits negative scalability: adding more threads hurts throughput (~5% loss going from 1T to 8T)
By removing the synchronized state, the implementation now scales perfectly (~8x increase going from 1T to 8T for the longer test case)
There is a very minor (~1%) penalty for pure single-threaded throughput since we no longer reuse state.
I believe this is acceptable in the context of allowing multiple threads to work without coordination.

Checklist

  • Required: Issue filed: ICU-NNNNN
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

@stevenschlansker stevenschlansker force-pushed the translit-lower-scalability-jmh branch from 95a0533 to 2af99d6 Compare February 25, 2025 18:55
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/LowercaseTransliteratorPerf.java is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@stevenschlansker stevenschlansker force-pushed the translit-lower-scalability-jmh branch from 2af99d6 to 2197adb Compare February 25, 2025 18:58
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4j/perf-tests/README.txt is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@stevenschlansker stevenschlansker changed the title LowercaseTransliterator has poor scalability due to synchronized state ICU-23061: LowercaseTransliterator has poor scalability due to synchronized state Feb 25, 2025
…nitor contention

Current implementation synchronizes the whole lowercase operation, so it is
always single-threaded

Removing the shared state lets us utilize multiple CPU cores
JMH is the OpenJDK microbenchmark harness. It is a standardized tool with
many integrations and provides a lot of out-of-the-box functionality.

No existing perf-tests are changed

https://github.com/openjdk/jmh
@stevenschlansker stevenschlansker force-pushed the translit-lower-scalability-jmh branch from 2197adb to f332c3c Compare February 25, 2025 19:11
@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant