Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[to #295] Speed up cdc integrated tests #294

Merged
merged 11 commits into from
Nov 8, 2022
Merged

Conversation

pingyu
Copy link
Collaborator

@pingyu pingyu commented Nov 4, 2022

Signed-off-by: pingyu [email protected]

What problem does this PR solve?

Issue Number: close #295, #296

Problem Description: Speed up integrated tests

What is changed and how does it work?

  1. Speed up integrated tests by:
    a. Disable checking TiDB status (but add checking TiKV status).
    b. Shorten some during of sleep.
  2. Try to fix unstable cdc: integrated test "cdc_hang_on" is not stable #296.
  3. Fix integrated-test.Dockfile.
  4. Add tests/up.sh to run integrated tests in a isolated environment.
  5. By the way update master version as 1.1.0-master.
  6. By the way add shellcheck. As we are using lots of shell scripts, I think it would be better to use some static lint tool to improve the codes. Use make tools/bin/shellcheck and tools/bin/shellcheck xxx to check modified scripts in our PRs. make shellcheck should be enabled in CI later.

Code changes

  • No code

Check List for Tests

This PR has been tested by at least one of the following methods:

  • Integration test

Side effects

  • No side effects

Related changes

  • No related changes

@codecov
Copy link

codecov bot commented Nov 4, 2022

Codecov Report

Merging #294 (addf175) into main (b07ce74) will decrease coverage by 6.6615%.
The diff coverage is n/a.

Additional details and impacted files

Impacted file tree graph

@@               Coverage Diff                @@
##               main       #294        +/-   ##
================================================
- Coverage   61.0519%   54.3904%   -6.6616%     
================================================
  Files           240        238         -2     
  Lines         20286      20294         +8     
================================================
- Hits          12385      11038      -1347     
- Misses         6780       8379      +1599     
+ Partials       1121        877       -244     
Flag Coverage Δ *Carryforward flag
br 39.5359% <ø> (-20.9650%) ⬇️ Carriedforward from 405c41e
cdc 61.2001% <ø> (-0.1075%) ⬇️

*This pull request uses carry forward flags. Click here to find out more.

Impacted Files Coverage Δ
br/pkg/task/backup.go 0.0000% <0.0000%> (-100.0000%) ⬇️
br/pkg/storage/local_unix.go 0.0000% <0.0000%> (-100.0000%) ⬇️
br/pkg/task/restore_raw_config.go 0.0000% <0.0000%> (-71.4286%) ⬇️
br/pkg/task/backup_raw.go 0.0000% <0.0000%> (-67.9739%) ⬇️
br/pkg/task/restore.go 0.0000% <0.0000%> (-67.6471%) ⬇️
br/pkg/utils/tls_config.go 0.0000% <0.0000%> (-65.2174%) ⬇️
br/pkg/summary/summary.go 0.0000% <0.0000%> (-62.5000%) ⬇️
br/pkg/task/restore_raw.go 0.0000% <0.0000%> (-62.2450%) ⬇️
br/pkg/task/common.go 18.6813% <0.0000%> (-51.2641%) ⬇️
br/pkg/restore/client.go 10.6976% <0.0000%> (-50.6977%) ⬇️
... and 46 more

@pingyu pingyu changed the title [close #xxx][Draft] speed up integrated tests [to #295] Speed up cdc integrated tests Nov 5, 2022
@pingyu
Copy link
Collaborator Author

pingyu commented Nov 5, 2022

Mark:
Captures seems to still be stuck after PD stopped. See https://ci.pingcap.net/blue/organizations/jenkins/common-check/detail/common-check/112055/pipeline/

And this may be the reason of "restart" argument of run_cdc_server not working ?

@pingyu
Copy link
Collaborator Author

pingyu commented Nov 5, 2022

Mark:
It can be observed that there are more than 2 captures. The result of "check capture count" may be not reliable.

failed to check capture count, expected: 2, got: 4, retry: 5
captures: [
  {
    "id": "313b963a-9386-4665-92ba-0b32422b9253",
    "is-owner": false,
    "address": "127.0.0.1:8601"
  },
  {
    "id": "a5a02ae1-8dc1-4eb8-85d7-68770a16779f",
    "is-owner": true,
    "address": "127.0.0.1:8602"
  },
  {
    "id": "aa44cd6d-d672-4982-99a9-f68fa7cf737a",
    "is-owner": false,
    "address": "127.0.0.1:8601"
  },
  {
    "id": "c49a38d6-7061-4a21-96e9-ec64a8825e0f",
    "is-owner": false,
    "address": "127.0.0.1:8602"
  }
]

Signed-off-by: pingyu <[email protected]>
Signed-off-by: pingyu <[email protected]>
Signed-off-by: pingyu <[email protected]>
@pingyu
Copy link
Collaborator Author

pingyu commented Nov 5, 2022

@pingyu
Copy link
Collaborator Author

pingyu commented Nov 5, 2022

/verify

@pingyu
Copy link
Collaborator Author

pingyu commented Nov 6, 2022

Mark:
Unstable case of multi_capture:

[2022-11-05T16:41:28.333Z] + sleep 1
[2022-11-05T16:41:29.946Z] + (( i++ ))
[2022-11-05T16:41:29.946Z] + (( i <= 50 ))
[2022-11-05T16:41:29.946Z] ++ curl -vsL http://127.0.0.1:8601/debug/info
[2022-11-05T16:41:29.946Z] * About to connect() to 127.0.0.1 port 8601 (#0)
[2022-11-05T16:41:29.946Z] *   Trying 127.0.0.1...
[2022-11-05T16:41:29.946Z] * Connection refused
[2022-11-05T16:41:29.946Z] * Failed connect to 127.0.0.1:8601; Connection refused
[2022-11-05T16:41:29.946Z] * Closing connection 0
[2022-11-05T16:41:29.946Z] + res=
[2022-11-05T16:41:29.946Z] + echo ''
[2022-11-05T16:41:29.946Z] + grep -q 'failed to get info:'
[2022-11-05T16:41:29.946Z] + echo ''
[2022-11-05T16:41:29.946Z] + grep -q 'etcd info'
[2022-11-05T16:41:29.946Z] + '[' 50 -eq 50 ']'
[2022-11-05T16:41:29.946Z] + echo 'Failed to start TiKV-CDC'
[2022-11-05T16:41:29.946Z] Failed to start TiKV-CDC
[2022-11-05T16:41:29.946Z] + exit 1

log: https://ci.pingcap.net/blue/organizations/jenkins/common-check/detail/common-check/112072/pipeline/

@pingyu
Copy link
Collaborator Author

pingyu commented Nov 6, 2022

/verify

@pingyu
Copy link
Collaborator Author

pingyu commented Nov 7, 2022

@haojinming @zeminzhou PTAL, thanks~

@zeminzhou
Copy link
Contributor

Mark: Captures seems to still be stuck after PD stopped. See https://ci.pingcap.net/blue/organizations/jenkins/common-check/detail/common-check/112055/pipeline/

And this may be the reason of "restart" argument of run_cdc_server not working ?

At the end of run_cdc_server, it will check cdc status and return.

50s maybe is not enough to stop and restart cdc.

RUN make integration_test_build kafka_consumer cdc
COPY --from=downloader /root/download/bin/* ./bin/
RUN make integration_test_build cdc
COPY --from=downloader /root/download/bin/* ./scripts/bin/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What to copy?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tidb-server, tikv-server, etc., which are downloaded by ./scripts/download-integration-test-binaries.sh.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now make integration_test_build will not download these binaries, we should call make prepare_test_binaries to download these binaries.

Copy link
Collaborator Author

@pingyu pingyu Nov 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The binaries are prepared at the beginning. See line 8 here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it

# We always need to clean before we build, please don't adjust its order.
RUN make clean
RUN make integration_test_build kafka_consumer cdc
COPY --from=downloader /root/download/bin/* ./bin/
RUN make integration_test_build cdc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove cdc?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tikv-cdc is necessary as test cases using $(tikv-cdc cli xxx) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean make integration_test_build cdc -> make integration_test_build, because make integration_test_build will build tikv-cdc.test & tikv-cdc

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see.
Fixed, PTAL~

cdc/tests/up.sh Outdated
echo "Usage: $0 [OPTIONS]"
echo "OPTIONS:"
echo " --help Display this message"
echo " --tag (TAG) Specify images tag used in br tests"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

br -> cdc ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. PTAL~

echo "Run \"make integration_test\" to start integrated tests"
# shellcheck disable=SC2068
exec docker run -it \
-v "$host_tmp":/tmp/tikv_cdc_test \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why binds mount a volume?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To write logs to host directory, for addressing issues. Otherwise after exit the container, all logs gone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get it, that's good idea!

Copy link
Contributor

@haojinming haojinming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zeminzhou
Copy link
Contributor

/verify

Copy link
Contributor

@zeminzhou zeminzhou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM~

@pingyu pingyu merged commit b22ec23 into tikv:main Nov 8, 2022
zeminzhou pushed a commit to zeminzhou/migration that referenced this pull request Nov 10, 2022
* speed up integrated tests

Signed-off-by: pingyu <[email protected]>

* wip

Signed-off-by: pingyu <[email protected]>

* update release version; add usage comment for up.sh

Signed-off-by: pingyu <[email protected]>

* print more debug info

Signed-off-by: pingyu <[email protected]>

* fix fmt

Signed-off-by: pingyu <[email protected]>

* add shellcheck

Signed-off-by: pingyu <[email protected]>

* fix CI error

Signed-off-by: pingyu <[email protected]>

* remove "br" in comment

Signed-off-by: pingyu <[email protected]>

* remove useless cdc target

Signed-off-by: pingyu <[email protected]>

Signed-off-by: pingyu <[email protected]>
Signed-off-by: zeminzhou <[email protected]>
pingyu added a commit that referenced this pull request Nov 10, 2022
* add a integration test case for disk full

Signed-off-by: zeminzhou <[email protected]>

* [to #295] Speed up cdc integrated tests (#294)

* speed up integrated tests

Signed-off-by: pingyu <[email protected]>

* wip

Signed-off-by: pingyu <[email protected]>

* update release version; add usage comment for up.sh

Signed-off-by: pingyu <[email protected]>

* print more debug info

Signed-off-by: pingyu <[email protected]>

* fix fmt

Signed-off-by: pingyu <[email protected]>

* add shellcheck

Signed-off-by: pingyu <[email protected]>

* fix CI error

Signed-off-by: pingyu <[email protected]>

* remove "br" in comment

Signed-off-by: pingyu <[email protected]>

* remove useless cdc target

Signed-off-by: pingyu <[email protected]>

Signed-off-by: pingyu <[email protected]>
Signed-off-by: zeminzhou <[email protected]>

* fix check

Signed-off-by: zeminzhou <[email protected]>

* remove unused code & remove modified it cases (#301)

Signed-off-by: zeminzhou <[email protected]>

Signed-off-by: zeminzhou <[email protected]>
Co-authored-by: Ping Yu <[email protected]>
Signed-off-by: zeminzhou <[email protected]>

* address comment

Signed-off-by: zeminzhou <[email protected]>

* make check

Signed-off-by: zeminzhou <[email protected]>

* Update cdc/tests/integration_tests/disk_full/run.sh

Co-authored-by: Ping Yu <[email protected]>
Signed-off-by: zeminzhou <[email protected]>

* fix stop_downstream

Signed-off-by: zeminzhou <[email protected]>

* fix check

Signed-off-by: zeminzhou <[email protected]>

* fix it

Signed-off-by: zeminzhou <[email protected]>

* fix disk full

Signed-off-by: zeminzhou <[email protected]>

* make check

Signed-off-by: zeminzhou <[email protected]>

Signed-off-by: zeminzhou <[email protected]>
Signed-off-by: pingyu <[email protected]>
Co-authored-by: Ping Yu <[email protected]>
Co-authored-by: Ping Yu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cdc: Speed up integrated tests
3 participants