Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Buildkite] Failed pre-flight in ftest causes exit code not propagate to buildkite correctly #3158

Open
jedrazb opened this issue Jan 29, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@jedrazb
Copy link
Member

jedrazb commented Jan 29, 2025

Bug Description

If connector service fails at pre-flight check, e.g. when attempting to create connector / sync job index this will cause the ftest to not report the error back to buildkite.

See this build https://buildkite.com/elastic/connectors-nightly/builds/3038#0194a6b2-2b54-4465-b15f-017e44589e82

To Reproduce

Steps to reproduce the behavior:

  1. Add a change that causes preflight check in connector framework to fail
  2. See that the ftest step will take 24hrs until the buildkite agent is considered as lost

Expected behavior

  1. Exit code should be propagated back to buildkite immediately

Screenshots

See: reported exit code is 10 but buildkite thinks it's -1 (agent lost)

Image

Environment

Buildkite nightly pipeline

Additional context

Logs

... (truncated)
[FTEST][07:41:29][INFO] Executing action: 'check_stack'.
--
  | [FMWK][07:41:30][INFO] Waiting for Elasticsearch at http://127.0.0.1:9200 (so far: 0 secs)
  | [FMWK][07:41:30][ERROR] Could not connect to the Elasticsearch server
  | [FMWK][07:41:30][ERROR] Server disconnected
  | [FMWK][07:41:35][INFO] Waiting for Elasticsearch at http://127.0.0.1:9200 (so far: 5 secs)
  | [FMWK][07:41:35][ERROR] Could not connect to the Elasticsearch server
  | [FMWK][07:41:35][ERROR] Server disconnected
  | [FMWK][07:41:45][INFO] Waiting for Elasticsearch at http://127.0.0.1:9200 (so far: 15 secs)
  | [FMWK][07:41:45][ERROR] Could not connect to the Elasticsearch server
  | [FMWK][07:41:45][ERROR] [Errno 104] Connection reset by peer
  | [FMWK][07:42:05][INFO] Waiting for Elasticsearch at http://127.0.0.1:9200 (so far: 35 secs)
  | None
  | + /opt/buildkite-agent/builds/bk-agent-prod-gcp-1737963489030934493/elastic/connectors-nightly/tests/../.venv/bin/fake-kibana --index-name search-mysql --service-type mysql --config-file mysql/config.yml --connector-definition mysql/connector.json --debug
  | [FMWK][07:42:06][INFO] Loading config from mysql/config.yml
  | [KBN-FAKE][07:42:06][INFO] Loaded connector definition from mysql/connector.json
  | [KBN-FAKE][07:42:06][INFO] Creating 'mysql-functional-test' connector
  | /opt/buildkite-agent/builds/bk-agent-prod-gcp-1737963489030934493/elastic/connectors-nightly/connectors/es/client.py:238: GeneralAvailabilityWarning: This API is in beta and is subject to change. The design and code is less mature than official GA features and is being provided as-is with no warranties. Beta features are not subject to the support SLA of official GA features.
  | result = await func()
  | [KBN-FAKE][07:42:07][INFO] Updating scheduling for 'mysql-functional-test' connector
  | [KBN-FAKE][07:42:07][INFO] Updating configuration for 'mysql-functional-test' connector
  | [KBN-FAKE][07:42:07][INFO] Prepare search-mysql
  | [KBN-FAKE][07:42:07][DEBUG] Creating index search-mysql
  | [KBN-FAKE][07:42:07][INFO] Done
  | Unclosed client session
  | client_session: <aiohttp.client.ClientSession object at 0x76e501398f10>
  | Unclosed connector
  | connections: ['deque([(<aiohttp.client_proto.ResponseHandler object at 0x76e50090ada0>, 223.03005257)])']
  | connector: <aiohttp.connector.TCPConnector object at 0x76e5013990f0>
  | + /opt/buildkite-agent/builds/bk-agent-prod-gcp-1737963489030934493/elastic/connectors-nightly/tests/../.venv/bin/python fixture.py --name mysql --action load
  | [FTEST][07:42:08][INFO] Executing action: 'load'.
  | Adding data from table #0...
  | Inserting 500 lines in 5 batches
  | Inserting batch #0 of 100 documents.
  | Inserting batch #1 of 100 documents.
  | Inserting batch #2 of 100 documents.
  | Inserting batch #3 of 100 documents.
  | Inserting batch #4 of 100 documents.
  | + [[ yes == \y\e\s ]]
  | + /opt/buildkite-agent/builds/bk-agent-prod-gcp-1737963489030934493/elastic/connectors-nightly/tests/../.venv/bin/python fixture.py --name mysql --action description
  | [FTEST][07:42:18][INFO] Executing action: 'description'.
  | [FTEST][07:42:19][INFO] Running an e2e test for mysql with a small corpus.
  | + [[ linux == \d\a\r\w\i\n ]]
  | + PID=4043
  | + /opt/buildkite-agent/builds/bk-agent-prod-gcp-1737963489030934493/elastic/connectors-nightly/tests/../.venv/bin/perf8 --refresh-rate 5 -t /opt/buildkite-agent/builds/bk-agent-prod-gcp-1737963489030934493/elastic/connectors-nightly/tests/../perf8-report-mysql --asyncstats --psutil --max-duration 600 --description description.txt -c /opt/buildkite-agent/builds/bk-agent-prod-gcp-1737963489030934493/elastic/connectors-nightly/tests/../.venv/bin/elastic-ingest --config-file mysql/config.yml --debug
  | + /opt/buildkite-agent/builds/bk-agent-prod-gcp-1737963489030934493/elastic/connectors-nightly/tests/../.venv/bin/python fixture.py --name mysql --action monitor --pid 4043
  | [FTEST][07:42:20][INFO] Executing action: 'monitor'.
  | [perf8][07:42:20][INFO] [perf8] Running /opt/buildkite-agent/builds/bk-agent-prod-gcp-1737963489030934493/elastic/connectors-nightly/.venv/bin/python3 -m perf8.runner -t /opt/buildkite-agent/builds/bk-agent-prod-gcp-1737963489030934493/elastic/connectors-nightly/tests/../perf8-report-mysql --ppid 4043 --plugins perf8.plugins._asyncstats:EventLoopMonitoring -s '/opt/buildkite-agent/builds/bk-agent-prod-gcp-1737963489030934493/elastic/connectors-nightly/tests/../.venv/bin/elastic-ingest --config-file mysql/config.yml --debug'
  | [perf8][07:42:20][INFO] [perf8] Plugins: psutil, asyncstats
  | [perf8][07:42:20][INFO] Starting the probing -- every 5.0 seconds
  | [FMWK][07:42:22][INFO] Running connector service version 9.0.0
  | [FMWK][07:42:22][INFO] Loading config from mysql/config.yml
  | [FMWK][07:42:22][DEBUG] ESManagementClient connecting to http://localhost:9200
  | [FMWK][07:42:22][DEBUG] Initial Elasticsearch node configuration is NodeConfig(scheme='http', host='localhost', port=9200, path_prefix='', headers={}, connections_per_node=10, request_timeout=10.0, http_compress=False, verify_certs=True, ca_certs=None, client_cert=None, client_key=None, ssl_assert_hostname=None, ssl_assert_fingerprint=None, ssl_version=None, ssl_context=None, ssl_show_warn=True, _extras={})
  | [FMWK][07:42:22][DEBUG] Connecting using Basic Auth (user: elastic)
  | [FMWK][07:42:22][DEBUG] Verifying cert with system certificates
  | [FMWK][07:42:22][INFO] Running preflight checks
  | [FMWK][07:42:22][DEBUG] Wait for Elasticsearch (max: 120)
  | [FMWK][07:42:22][INFO] Waiting for Elasticsearch at http://localhost:9200 (so far: 0 secs)
  | [FMWK][07:42:22][DEBUG] Seed node configuration: (NodeConfig(scheme='http', host='localhost', port=9200, path_prefix='', headers={'user-agent': 'elasticsearch-py/8.17.0 (Python/3.10.16; elastic-transport/8.15.1)'}, connections_per_node=10, request_timeout=10.0, http_compress=False, verify_certs=True, ca_certs=None, client_cert=None, client_key=None, ssl_assert_hostname=None, ssl_assert_fingerprint=None, ssl_version=None, ssl_context=None, ssl_show_warn=True, _extras={}),)
  | [FMWK][07:42:22][DEBUG] Connected to Elasticsearch server with version '9.0.0-SNAPSHOT' and build flavor 'default'
  | [FMWK][07:42:22][INFO] Elasticsearch 9.0.0-SNAPSHOT and Connectors 9.0.0 are compatible
  | [FMWK][07:42:22][INFO] Extraction service is not configured, skipping its preflight check.
  | [FMWK][07:42:22][DEBUG] Checking index .elastic-connectors-v1
  | [FMWK][07:42:22][DEBUG] Checking index .elastic-connectors-sync-jobs-v1
  | [FMWK][07:42:22][WARNING] Client method 'create' retry 1: api error with status 403
  | [FMWK][07:42:22][WARNING] Attempt 1/10 failed. Retrying...
(10 retries ...)
  | [FMWK][07:47:22][DEBUG] Checking index .elastic-connectors-v1
  | [FMWK][07:47:22][DEBUG] Checking index .elastic-connectors-sync-jobs-v1
  | [FMWK][07:47:22][WARNING] Client method 'create' retry 1: api error with status 403
  | [FMWK][07:47:22][CRITICAL] Could not perform preflight check after 10 retries.
  | [FMWK][07:47:22][DEBUG] Cancelling sleeps...
  | [FMWK][07:47:22][DEBUG] Cancelling sleeps...
  | [FMWK][07:47:22][DEBUG] Cancelling sleeps...
  | [FMWK][07:47:22][INFO] Bye
  | [perf8][07:47:22][INFO] Script is over -- sending a signal to 4043
  | [perf8][07:47:22][INFO] We got a 10 signal, the app finished execution
  | [perf8][07:47:22][INFO] The app wrapper is now building in-process reports
  | [perf8][07:47:22][INFO] Wrote /opt/buildkite-agent/builds/bk-agent-prod-gcp-1737963489030934493/elastic/connectors-nightly/perf8-report-mysql/report.json
  | [perf8][07:47:22][INFO] [4043][psutil] Loaded 61 data points from /opt/buildkite-agent/builds/bk-agent-prod-gcp-1737963489030934493/elastic/connectors-nightly/tests/../perf8-report-mysql/report.csv
  | [perf8][07:47:22][INFO] [4043][psutil] Saved plot file at /opt/buildkite-agent/builds/bk-agent-prod-gcp-1737963489030934493/elastic/connectors-nightly/tests/../perf8-report-mysql/rss.png
  | [perf8][07:47:22][INFO] [4043][psutil] Saved plot file at /opt/buildkite-agent/builds/bk-agent-prod-gcp-1737963489030934493/elastic/connectors-nightly/tests/../perf8-report-mysql/cpu.png
  | [perf8][07:47:22][INFO] [4043][psutil] Saved plot file at /opt/buildkite-agent/builds/bk-agent-prod-gcp-1737963489030934493/elastic/connectors-nightly/tests/../perf8-report-mysql/threads.png
  | [perf8][07:47:23][INFO] [4043][psutil] Saved plot file at /opt/buildkite-agent/builds/bk-agent-prod-gcp-1737963489030934493/elastic/connectors-nightly/tests/../perf8-report-mysql/disk_io.png
  | [perf8][07:47:23][INFO] [4043][psutil] Saved plot file at /opt/buildkite-agent/builds/bk-agent-prod-gcp-1737963489030934493/elastic/connectors-nightly/tests/../perf8-report-mysql/disk.png
  | [perf8][07:47:23][INFO] Command execution time 303.10 seconds.
  | [perf8][07:47:23][INFO] Reports generated:
  | [perf8][07:47:23][INFO] Plugin psutil generated 7 report(s)
  | [perf8][07:47:23][INFO] Plugin asyncstats generated 1 report(s)
  | [perf8][07:47:23][INFO] Find the full report at /opt/buildkite-agent/builds/bk-agent-prod-gcp-1737963489030934493/elastic/connectors-nightly/tests/../perf8-report-mysql/index.html
  | [perf8][07:47:23][INFO] 🎉 Looking sharp!
  | [perf8][07:47:23][INFO] Writing status in /opt/buildkite-agent/builds/bk-agent-prod-gcp-1737963489030934493/elastic/connectors-nightly/tests/../perf8-report-mysql/status
@jedrazb jedrazb added the bug Something isn't working label Jan 29, 2025
@jedrazb jedrazb changed the title [Buildkite] Failed pre-flight in ftest causes exit code not propagated correctly [Buildkite] Failed pre-flight in ftest causes exit code not propagate to buildkite correctly Jan 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant