Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deps: upgrade Shopify/sarama v1.38.1 to IBM/sarama v1.42.1 #117544

Merged
merged 1 commit into from
Jan 30, 2024

Conversation

wenyihu6
Copy link
Contributor

@wenyihu6 wenyihu6 commented Jan 9, 2024

This patch updates sarama library to the latest version. Note that the ownership
of the sarama library has been transferred from Shopify to IBM.

Fixes: #117522
Release note: none

Here is the list of commits between the two versions upgrade.

d88a48a chore: update CHANGELOG.md to v1.42.1 (#2711)
385b3b4 fix(config): relax ClientID validation after 1.0.0 (#2706)
3364ff0 chore(doc): add CODE_OF_CONDUCT.md
768496e chore(ci): bump actions/dependency-review-action from 3.1.0 to 3.1.1 (#2707)
27710af fix: make fetchInitialOffset use correct protocol (#2705)
a46917f chore(ci): bump actions/dependency-review-action from 2.5.1 to 3.1.0 (#2702)
4168f7c chore(ci): bump ossf/scorecard-action from 2.1.2 to 2.3.1 (#2703)
7155d51 chore(ci): add kafka 3.6.0 to FVT and versions
e0c3c62 fix(txmgr): ErrOffsetsLoadInProgress is retriable
2e077cf Fix default retention time value in offset commit (#2700)
f97ced2 Merge pull request #2678 from lzakharov/fix-data-race-in-async-produce
56d5044 fix: data race in Broker.AsyncProduce
a15034a Fix data race on Broker.done channel (#2698)
82f0e48 Asynchronously close brokers during a RefreshBrokers (#2693)
ee1944b chore(ci): bump github/codeql-action from 2.22.4 to 2.22.5 (#2695)
1d4de95 chore(ci): bump actions/upload-artifact from 3.1.0 to 3.1.3 (#2696)
3ca69a8 chore(doc): add OpenSSF Scorecard badge (#2691)
d2023bf feat(test): add a simple fuzzing example (#2039)
b8b29e1 chore(doc): add OpenSSF badge (#2690)
d38f08c fix(ci): always run CodeQL on every commit (#2689)
c5815ae chore(ci): bump github/codeql-action from 2.2.4 to 2.22.4 (#2686)
3a893f5 Merge pull request #2688 from IBM/dnwe/security-dot-md
7ae18cb fix(ci): ignore markdown changes for dep review
3b0f32e fix(doc): add SECURITY.md for vuln reporting
40ec971 chore(ci): bump actions/checkout from 3.1.0 to 4.1.1 (#2687)
25137dc chore(ci): add Dependency Review Actions
8ce03ed chore(ci): add golangci-lint and gitleaks checks
9658e0e chore(ci): ensure GH actions are pinned by hash
3d56b4c chore(ci): ensure gh permissions are explicit
8892f3f chore(ci): add dependabot to /examples tree
05af18e chore(ci): ossf scorecard.yml (#2683)
c42b2e0 fix(client): ignore empty Metadata responses when refreshing (#2672)
6678dd1 chore(deps): bump the golang-org-x group with 1 update (#2671)
24f1249 fix: pre-compile regex for parsing kafka version (#2663)
64d2044 fix(docs): correct topic name in rebalancing strategy example (#2657)
44f6db5 chore(deps): bump the golang-org-x group with 2 updates (#2661)
e16473b chore(ci): bump docker/setup-buildx-action from 2 to 3 (#2653)
98ec384 fix: use least loaded broker to refresh metadata
4b55bb3 perf: Alloc records in batch (#2646)
05cb9fa fix(consumer): don't retry session if ctx canceled
0b17025 chore(deps): bump the golang-org-x group with 1 update (#2641)
9e75986 chore(ci): bump actions/checkout from 3 to 4
9b0419d fix(consumer): guard against nil client
f3c4194 fix: typo
87229d9 fix: add retry logic to AlterUserScramCredentials
ae5eee5 fix(client): force Event Hubs to use V1_0_0_0 (#2633)
dedd86d fix: make clear that error is configuration issue not server error (#2628)
a4eafb4 chore(proto): doc CreateTopics/JoinGroup fields
503ade3 fix: add paragraph break to fix markdown render
ffaa252 fix(gh): correct issue template comments
78c7b63 chore(gh): add new style issue templates
c7e6bca chore(ci): ignore .md-only changes
09395f6 chore(docs): remove gopkg.in link
261043a chore(ci): add workflow_dispatch to stale
bbf6ee4 chore(ci): improve stale behaviour
b1bf950 chore(docs): add 1.41.0 to CHANGELOG.md
2b4ba74 chore(lint): bump golangci-lint and tweak config
9282d75 fix(doc): add missing doc for mock consumer
e9bd1b8 fix(proto): handle V3 member metadata and empty owned partitions
96c37d1 fix(docs): use go install for fetching tools
5cd9fa6 fix: flaky TestFuncOffsetManager
1bcf2d9 feat(fvt): test wider range of kafkas
827ec18 fix(fvt): reduce minimum compression-ratio metric
d44ebdc fix(fvt): fresh metrics registry for each test
2b54832 fix(fvt): ensure correct version in consumer tests
270f507 chore(fvt): tweak to work across more versions
b4e0554 feat(ci): experiment with tcpdump during FVT
0bb3316 fix(fvt): versioned cfg for invalid topic producer
d4dc7bc fix(examples): sync exactly_once and consumergroup
913b18f fix(fvt): Metadata version in ensureFullyReplicated
8681621 fix(fvt): handle msgset vs batchset
26792a3 feat(fvt): add healthcheck, depends_on and --wait
d2dba29 feat(gzip): switch to klauspost/compress gzip (#2600)
f8daee4 chore(deps): bump github.com/eapache/go-resiliency from 1.3.0 to 1.4.0 (#2598)
868ed33 Merge pull request #2595 from IBM/dnwe/close
b0363d1 fix(test): ensure some more clients are closed
31a8693 fix(fvt): disable keepalive on toxiproxy client
45313c3 fix(test): add missing closes to admin client tests (#2594)
a5b6e6a Merge pull request #2593 from IBM/dnwe/toxiproxy
0b9db06 chore(ci): implement toxiproxy client
4d8bb31 chore(ci): replace toxiproxy client dep
3d7b37f feat(fvt): experiment with per-kafka-version image
bd81a11 chore(fvt): tidyup broker await
8d0df91 chore(deps): bump module github.com/klauspost/compress to v1.16.7
6ff3567 chore(test): fix a couple of leaks
f033fc7 chore(deps): bump github.com/eapache/go-xerial-snappy digest to c322873
0409ed9 chore(deps): bump module github.com/jcmturner/gokrb5/v8 to v8.4.4
9dc4305 chore(deps): bump module github.com/pierrec/lz4/v4 to v4.1.18
108e264 chore(fvt): roll some tests back to DefaultVersion
f4e6453 chore(test): use modern protocol versions in FVT
991b2b0 chore(test): speedup some slow tests
fa7db9a chore(ci): pre-build FVT docker image
e31a540 chore(ci): use latest Go in actions (#2580)
500399c chore(test): ensure all mockresponses use version
43eae9b feat: add new error for MockDeleteTopicsResponse (#2475)
4cde6b3 chore(config): make DefaultVersion V2_1_0_0 (#2574)
8a09ef3 Merge pull request #2575 from IBM/dnwe/mockbroker
f4f435c chore(test): add verbose logger for unittests
03368ff chore(test): ensure MockBroker closed within test
e8808a6 chore(proto): match HeartbeatResponse version (#2576)
be809f9 chore(ci): remove manual go cache
a3024e7 chore(test): add V2_1_0_0 ApiVersions
00741ec feat(proto): add support for TxnOffsetCommit V2
0c39b9f feat(proto): add support for ListOffsetsRequest V4
765bfa3 chore(config): make DefaultVersion V2_0_0_0
bb864d7 fix(test): shutdown MockBroker
1c9ebab Merge pull request #2570 from IBM/dnwe/proto
d9cb01e fix(proto): use full ranges for remaining proto
826ef81 fix(proto): use full range of Txn protocols
09c8186 fix: avoid logging value of proxy.Dialer
76ca69a feat(proto): support for Metadata V6-V10
10dd922 fix(proto): use full range of ListGroupsRequest
4175433 fix(proto): use full range of SyncGroupRequest
29487f1 bug: implement unsigned modulus for partitioning with crc32 hashing
f35d212 fix: a rebalance isn't triggered after adding new partitions
4659dd0 chore(deps): bump the golang-org-x group with 1 update (#2561)
d8d9e73 Merge pull request #2558 from IBM/dnwe/proto
e4bf4df fix(proto): tweak LeaveGroupRequest requiredVersion
24b54f6 fix(proto): use full range of OffsetFetchRequest
57969b4 fix(proto): use full range of HeartbeatRequest
3d1c345 fix(proto): use full range of FindCoordinator
cf96776 chore: add .pre-commit-config
02d1209 chore(ci): tidyup and improve actions workflows
09ced0b fix(proto): use full range of MetadataRequest
8c40629 fix(proto): use up to V3 of OffsetRequest
cdf36d5 fix(proto): use range of OffsetCommitRequest vers
1a8a3ed fix(consumer): support JoinGroup V4
40b52c5 fix(consumer): use full range of FetchRequest vers
5530d61 fix(example): check if msg channel is closed
68312a5 chore(test): add V2_0_0_0 ApiVersions
c10bd1e fix(test): test timing error
82a6d57 fix(proto): correct JoinGroup usage for wider version range
23d4561 chore(proto): permit DeleteGroupsRequest V1
6010af0 chore(proto): permit AlterConfigsRequest V1
c32ffd1 chore(proto): permit CreatePartitionsRequest V1
1532d9f fix(proto): ensure req+resp requiredVersion match (#2548)
2a5f0f6 chore: add 1.40.1 to CHANGELOG.md
4cce955 Fix printing of final metrics
1d8f80e fix(test): use correct v7 mock ProduceResponse
973a9b7 fix(producer): use newer ProduceReq as appropriate
fb761f2 feat: support up to v4 of the ListGroups API (#2541)
017083e fix(consumer): use newer LeaveGroup as appropriate (#2544)
ce1ac25 Merge pull request #2538 from IBM/dnwe/is-valid-version
a9126ad fix(proto): use DeleteRecordsRequest v1
b8cc2b1 feat(proto): add test around supported versions
ee2872c fix(admin): remove group member needs >= 2.4.0
3b82606 fix(proto): correct consumer metadata shim
c240c67 fix(proto): extend txn types for identical versions
40fa609 fix(proto): ensure req+resp requiredVersion match
fa37d61 fix(proto): use DescribeLogDirsRequest v1
3dfbf99 feat(proto): add isValidVersion to protocol types
02c5de3 chore(deps): bump the golang-org-x group with 1 update (#2542)
6d094b8 Merge the two CONTRIBUTING.md's (#2543)
bbee916 chore: rollup fvt kafka to latest three (#2537)
87209f8 Merge pull request #2536 from hindessm/mrh/sleep-when-throttled
102513a test: add throttling support test
5ac5dc0 feat: support for sleeping when throttled
7d7ac52 Implement resolve_canonical_bootstrap_servers_only (#2156)
f07b129 Merge pull request #2533 from hindessm/mrh/extend-throttling-metric-scope
b678d34 chore(typo): trivial typo
e18c6cf feat: refactor throttle metrics to handle more responses
2d7ccb8 feat: add throttleTime function for responses with time.Duration fields
fa93017 feat: add throttleTime function for responses with int32 ms fields
ae24dbf chore(typo): fix field documentation typo
aa72f59 fix: avoiding burning cpu if all partitions are paused (#2532)
34bc8f9 fix: correct unsupported version check (#2528)
c28ecc0 fix(fvt): ensure fully-replicated at test start (#2531)
849c8b1 fix(test): allow testing of skipped test without IsTransactional panic (#2525)
ecf43f4 fix: concurrent issue on updateMetaDataMs (#2522)
e07f521 Merge pull request #2520 from hindessm/mrh/admin-retry-logic
aad8cf3 fix: add retry logic to ListPartitionReassignments
66ef5a9 fix: add retry logic to DescribeCluster
c7ce32f fix: add retry logic to DescribeTopics
80899bf Added support for Kerberos authentication with a credentials cache (KRB5_CCACHE_AUTH). (#2457)
735f33b feat(consumer): use buffer pools for decompression (#2484)
669d2bc fix: comments for PauseAll and ResumeAll (#2523)
63ff8d1 Merge pull request #2519 from hindessm/mrh/fix-retry-logic-again
df58534 fix: use safer condition
3ba807b fix: admin retry logic
08ff0ff Merge pull request #2517 from hindessm/mrh/fix-some-retry-issues
39c18fc fix: off-by-one errors in attempts remaining logging
7888004 fix: admin retry logic
6ecdb50 Merge pull request #2514 from hindessm/main
f6ccc6f fix(test): ubi-minimal seems to be missing zoneinfo files
b53dbe9 fix(tools): remove default duplication from help
d439508 chore(docs): fix iotuil.Discard reference
441b083 chore(typos): random typos spotted while browsing code
12c24a8 chore(deps): bump github.com/stretchr/testify from 1.8.1 to 1.8.3 (#2512)
55ea700 chore(deps): bump github.com/klauspost/compress from 1.15.14 to 1.16.6 (#2513)
2420fcd chore(deps): bump the golang-org-x group with 2 updates
cc72418 chore(ci): bump actions/setup-go from 3 to 4 (#2508)
dcf5196 chore(ci): remove empty scope from dependabot.yml
fb81408 Merge pull request #2504 from EladLeev/golangci-cleanup
5185d46 chore(ci): tweak scope in dependabot.yml
492d4f9 chore(ci): remove scope from dependabot.yml
eb52957 chore(ci): add dependabot.yml
da48ff2 chore(ci): add depguard config
3654162 chore: add 1.40.0 to CHANGELOG.md
0863085 chore(ci): bump golangci, remove deprecated linters
848522f chore(ci): add simple apidiff workflow
ee207f8 chore(ci): fix stale action params
3f22fd3 chore(ci): migrate probot-stale to actions/stale
4b9e8f6 fix: restore (*OffsetCommitRequest) AddBlock func
c2cab9d chore: migrate module to github.com/IBM/sarama
fd35e17 chore: bytes.Equal instead bytes.Compare (#2485)
fd21bd2 chore(ci): remove Shopify/shopify-cla-action (#2489)
9127f1c fix: data race in balance strategy (#2453)
2f8dcd0 fix(mock consumer): HighWaterMarkOffset (#2447)
7dbf0b5 fix: use version 4 of DescribeGroupsRequest only if kafka broker version is >= 2.4
1015b4f chore(deps): bump golang.org/x/net from 0.5.0 to 0.7.0 (#2452)
397cee4 fix: simplify some balance_strategy.go logic
d8bcfcc chore: refresh CHANGELOG.md from github-releases
40329aa chore: add kafka 3.3.2 (#2434)
0b15695 fix(metrics): fix race condition when calling Broker.Open() twice (#2428)
66e60c7 fix(consumer): don't retry FindCoordinator forever (#2427)

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@wenyihu6
Copy link
Contributor Author

wenyihu6 commented Jan 9, 2024

@wenyihu6
Copy link
Contributor Author

I tested this version with cdc/tpcc-1000 and seems to work expectedly.

:145: test status: watching changefeed[default]: end-to-end steady latency 32.171s; max steady latency so far 45.481s; highwater 2024-01-10 02:58:28.370546981 +0000 UTC
...
03:11:29 test_runner.go:1020: [w0] --- PASS: cdc/tpcc-1000/sink=kafka (7981.09s)
--- PASS: cdc/tpcc-1000/sink=kafka (7981.09s)

@wenyihu6 wenyihu6 self-assigned this Jan 17, 2024
@wenyihu6 wenyihu6 marked this pull request as ready for review January 22, 2024 22:12
@wenyihu6 wenyihu6 requested review from a team as code owners January 22, 2024 22:12
@wenyihu6 wenyihu6 requested review from herkolategan, DarrylWong and miretskiy and removed request for a team, herkolategan and DarrylWong January 22, 2024 22:12
@wenyihu6
Copy link
Contributor Author

Does anyone might know why the dependency for "github.com/Shopify/sarama" is still there although I've deleted every places in the codebase that imports it? 32c4548 Could it be because other packages still depend on it?

@miretskiy miretskiy requested review from rharding6373 and jayshrivastava and removed request for miretskiy January 26, 2024 20:42
@jayshrivastava
Copy link
Contributor

Does anyone might know why the dependency for "github.com/Shopify/sarama" is still there although I've deleted every places in the codebase that imports it? 32c4548 Could it be because other packages still depend on it?

I think it's expected that you have to manually delete it from DEPS.bzl. If it keeps automatically coming back, then I would ask dev inf about it.

Also this LGTM, but technically it needs a TL to approve it. They are still using the MIT license, so it should be okay to merge. I think the testing is sufficient enough to say that this won't cause problems.

@rharding6373
Copy link
Collaborator

rharding6373 commented Jan 29, 2024

I tested this version with cdc/tpcc-1000 and seems to work expectedly.

:145: test status: watching changefeed[default]: end-to-end steady latency 32.171s; max steady latency so far 45.481s; highwater 2024-01-10 02:58:28.370546981 +0000 UTC
...
03:11:29 test_runner.go:1020: [w0] --- PASS: cdc/tpcc-1000/sink=kafka (7981.09s)
--- PASS: cdc/tpcc-1000/sink=kafka (7981.09s)

How does this compare to using the previous version of sarama? Could you point me to where the latest benchmark numbers are?

Copy link
Collaborator

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jayshrivastava and @wenyihu6)


DEPS.bzl line 7882 at r2 (raw file):

        importpath = "github.com/Shopify/sarama",
        sha256 = "ca251ac94fc78893afd2c2debf9ae061223ff4cb174daa508e2b0146f66de40e",
        strip_prefix = "github.com/Shopify/[email protected]",

Interesting that this is not the same version that you deleted..

Copy link
Collaborator

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jayshrivastava and @wenyihu6)


DEPS.bzl line 7882 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Interesting that this is not the same version that you deleted..

It looks like there's a dependency from "github.com/bsm/sarama-cluster" to "github.com/Shopify/sarama", which could potentially be causing this. Interestingly, sarama-cluster is deprecated, and I can't find any other uses of it in the code base. Do you happen to know what we use it for?

Copy link

blathers-crl bot commented Jan 29, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@wenyihu6
Copy link
Contributor Author

DEPS.bzl line 7882 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

It looks like there's a dependency from "github.com/bsm/sarama-cluster" to "github.com/Shopify/sarama", which could potentially be causing this. Interestingly, sarama-cluster is deprecated, and I can't find any other uses of it in the code base. Do you happen to know what we use it for?

Good point. I tried looking into this more. It looks like there are some other packages that depend on the previous version of sarama which caused the old sarama package to stay.

❯ go mod graph | grep github.com/Shopify/sarama
github.com/jaegertracing/[email protected] github.com/Shopify/[email protected]
github.com/openzipkin/[email protected] github.com/Shopify/[email protected]
github.com/openzipkin/[email protected] github.com/Shopify/[email protected]
github.com/openzipkin/[email protected] github.com/Shopify/[email protected]
github.com/openzipkin/[email protected] github.com/Shopify/[email protected]

I'm not sure if we want to upgrade those packages as well. Their latest version also depend on the previous version of sarama.

@wenyihu6
Copy link
Contributor Author

wenyihu6 commented Jan 29, 2024

@rharding6373 Good point. I vaguely remember this discussion when we were discussing this PR. iirc, I don't think we have a benchmark numbers that could be reliably used for performance comparison. We can run more tests and closely introspect different metrics more.

Copy link
Collaborator

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Thanks for squashing!

RE: "seems to work expectedly" with benchmark results -- My real question (which is mostly educational) is what does "work expectedly" mean in this context? The benchmark successfully built and ran? The latency and tput #s are comparable to before the change? They're below some threshold that people familiar with the benchmark would find egregious? I don't think this is a change that necessarily needs detailed benchmarking. I'm just curious. Sorry for digging into an old comment!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jayshrivastava and @wenyihu6)


-- commits line 4 at r3:
Maybe update the description to indicate the an older version of sarama is kept to satisfy other dependencies by third party libraries.


DEPS.bzl line 7882 at r2 (raw file):

Previously, wenyihu6 (Wenyi Hu) wrote…

Good point. I tried looking into this more. It looks like there are some other packages that depend on the previous version of sarama which caused the old sarama package to stay.

❯ go mod graph | grep github.com/Shopify/sarama
github.com/jaegertracing/[email protected] github.com/Shopify/[email protected]
github.com/openzipkin/[email protected] github.com/Shopify/[email protected]
github.com/openzipkin/[email protected] github.com/Shopify/[email protected]
github.com/openzipkin/[email protected] github.com/Shopify/[email protected]
github.com/openzipkin/[email protected] github.com/Shopify/[email protected]

I'm not sure if we want to upgrade those packages as well. Their latest version also depend on the previous version of sarama.

Nice sleuthing.

@wenyihu6
Copy link
Contributor Author

@wenyihu6
Copy link
Contributor Author

wenyihu6 commented Jan 30, 2024

:lgtm: Thanks for squashing!

RE: "seems to work expectedly" with benchmark results -- My real question (which is mostly educational) is what does "work expectedly" mean in this context? The benchmark successfully built and ran? The latency and tput #s are comparable to before the change? They're below some threshold that people familiar with the benchmark would find egregious? I don't think this is a change that necessarily needs detailed benchmarking. I'm just curious. Sorry for digging into an old comment!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jayshrivastava and @wenyihu6)

-- commits line 4 at r3: Maybe update the description to indicate the an older version of sarama is kept to satisfy other dependencies by third party libraries.

DEPS.bzl line 7882 at r2 (raw file):

Previously, wenyihu6 (Wenyi Hu) wrote…
Nice sleuthing.

I meant "worked expectedly" as the test cdc/tpcc-1000/sink=kafka passes. It is a relatively large scaled roachtest in cdc (1000 warehouses, the workload ran for 120 minutes). Roachtests don't do any correctness testing now but just runs runFeedLatencyVerifier which polls latencies for the changefeed job and asserts that everything is running as expected below the specific latency target.

ct.runFeedLatencyVerifier(feed, latencyTargets{
initialScanLatency: 3 * time.Minute,
steadyLatency:      10 * time.Minute,
})

@wenyihu6
Copy link
Contributor Author

I was able to repro this test flake TestChangefeedEndTimeWithCursor/webhook on master pretty consistently with ./dev test pkg/ccl/changefeedccl -f TestChangefeedEndTimeWithCursor --stress. I will open an issue for it.

@rharding6373
Copy link
Collaborator

Thanks for the explanation!

This patch updates sarama library to the latest version. Note that the ownership
of the sarama library has been transferred from Shopify to IBM.

Fixes: cockroachdb#117522
Release note: none
@wenyihu6
Copy link
Contributor Author

TFTRs!

bors r=rharding6373

The last two pushes was empty - just to trigger CI to avoid the known test flakes.

@craig
Copy link
Contributor

craig bot commented Jan 30, 2024

Build succeeded:

@craig craig bot merged commit b6d1474 into cockroachdb:master Jan 30, 2024
7 of 9 checks passed
wenyihu6 added a commit to wenyihu6/cockroach that referenced this pull request Feb 2, 2024
Now that this patch (cockroachdb#117544) has been merged, sarama now acknowledges and
reacts to kafka server's throttling messages by slowing down. To provide better
observability into sarama code, this patch adds a metrics registry interceptor
and a new metrics `changefeed.kafka_throttling_hist_nanos` which tracks time (in
nanos) spent in sarama's throttling when cockroachdb exceed the kafka quota.

Fixes: cockroachdb#117618

Release note: changefeed.kafka_throttling_hist_nanos has now been added to
metrics to monitor sarama throttling behavior resulting from exceeding kafka
quota.
wenyihu6 added a commit to wenyihu6/cockroach that referenced this pull request Feb 2, 2024
Now that this patch (cockroachdb#117544) has been merged, sarama now acknowledges and
reacts to kafka server's throttling messages by slowing down. To provide better
observability into sarama code, this patch adds a metrics registry interceptor
and a new metrics `changefeed.kafka_throttling_hist_nanos` which tracks time (in
nanos) spent in sarama's throttling when cockroachdb exceed the kafka quota.

Fixes: cockroachdb#117618

Release note: changefeed.kafka_throttling_hist_nanos has now been added to
metrics to monitor sarama throttling behavior resulting from exceeding kafka
quota.
wenyihu6 added a commit to wenyihu6/cockroach that referenced this pull request Feb 5, 2024
Now that this patch (cockroachdb#117544) has been merged, sarama now acknowledges and
reacts to kafka server's throttling messages by slowing down. To provide better
observability into sarama code, this patch adds a metrics registry interceptor
and a new metrics `changefeed.kafka_throttling_hist_nanos` which tracks time (in
nanos) spent in sarama's throttling when cockroachdb exceed the kafka quota.

Fixes: cockroachdb#117618

Release note: changefeed.kafka_throttling_hist_nanos has now been added to
metrics to monitor sarama throttling behavior resulting from exceeding kafka
quota.
craig bot pushed a commit that referenced this pull request Feb 6, 2024
117117: kv: log slow requests on replica level in addition to range level r=shralex a=shralex

Previously, slow requests were only logged at the range level, but the logs did not indicate which replica is slow. Moreover, the SlowRPC metric attempted to represent the number of requests currently being retried, however it was done on the range level and therefore missed a second level of replica-level retries being done underneath.

This PR adds logging on the replica level, removes a confusing log line, and changes the metric to count the number of slow requests in a simpler manner.

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-33510
Fixes: #114431

117693: changefeedccl: add observability metrics into sarama code r=rharding6373 a=wenyihu6

Now that this patch (#117544) has been merged, sarama now acknowledges and
reacts to kafka server's throttling messages by slowing down. To provide better
observability into sarama code, this patch adds a metrics registry interceptor
and a new metrics `changefeed.kafka_throttling_hist_nanos` which tracks time (in
nanos) spent in sarama's throttling when cockroachdb exceed the kafka quota.

Fixes: #117618

Release note: changefeed.kafka_throttling_hist_nanos has now been added to
metrics to monitor sarama throttling behavior resulting from exceeding kafka
quota.

118372: sql: fix flake in TestTxnContentionEventsTable r=yuzefovich a=michae2

In causeContention we deliberately hold a transaction open using pg_sleep to block an update statement. The timing we're trying to achieve is:

1. transaction insert
2. update starts and blocks
3. transaction held open using pg_sleep

We were using a WaitGroup to order (2) after (1), but there was no synchronization to ensure (3) came after (2).

This commit adds a retry loop that checks `crdb_internal.cluster_queries` to ensure (3) comes after (2).

Fixes: #118236

Release note: None

118760: builtins: allow VIEWACTIVITY priv to use crdb_internal.request_statem… r=xinhaoz a=xinhaoz

…ent_bundle

Previously only those with the VIEWACTIVITY role could use the crdb_internal.request_statement_bundle builtin. We should allow the VIEWACTIVITY privilege as well since role options are now deprecated. This allow also allow stmt bundle requests to be made from db-console for users with this granted privilege.

Epic: none
Fixes: #118759

Release note (bug fix): Those with VIEWACTIVITY privilege can now request statement bundles using crdb_internal.requets_statement_bundle or via db-console's sql activity page.

118767: release: confirm yum install r=celiala a=rail

This adds `-y` flag to install `yum` without user prompt.

Epic: none
Release note: None

118789: jobs,application_api: replace calls to `skip.Stress` with `skip.Duress` r=celiala a=rickystewart

`skip.Duress()` seems like it should have been used in this case as it gives more time under both `race` and `deadlock`. This will give these tests some extra time if they run in a heavyweight configuration but not "under stress".

Epic: CRDB-8308
Release note: None

118792: kvfollowerreadsccl: skip test under `race` not `stressrace` r=celiala a=rickystewart

Epic: CRDB-8308
Release note: None

118797: bincheck: do not run geos tests on Windows r=celiala a=rail

In #106642 we stopped shipping libgeos on Windows, but didn't update the bincheck test to reflect the change.

Epic: none
Release note: None

Co-authored-by: shralex <[email protected]>
Co-authored-by: Wenyi Hu <[email protected]>
Co-authored-by: Michael Erickson <[email protected]>
Co-authored-by: Xin Hao Zhang <[email protected]>
Co-authored-by: Rail Aliiev <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
wenyihu6 added a commit to wenyihu6/cockroach that referenced this pull request Feb 21, 2024
Now that this patch (cockroachdb#117544) has been merged, sarama now acknowledges and
reacts to kafka server's throttling messages by slowing down. To provide better
observability into sarama code, this patch adds a metrics registry interceptor
and a new metrics `changefeed.kafka_throttling_hist_nanos` which tracks time (in
nanos) spent in sarama's throttling when cockroachdb exceed the kafka quota.

Fixes: cockroachdb#117618

Release note: changefeed.kafka_throttling_hist_nanos has now been added to
metrics to monitor sarama throttling behavior resulting from exceeding kafka
quota.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cdc: test and upgrade sarama version
4 participants