
Graceful stop is ignored when running SQL tests via xk6-sql plugin #3958

Open
alex-dubrouski opened this issue Sep 16, 2024 · 6 comments

alex-dubrouski commented Sep 16, 2024

Brief summary

We are using k6 with the xk6-sql plugin to load test our local TiDB cluster. On top of regular load tests we run basic chaos-engineering scenarios, such as cluster failover and node restarts, during the test. We noticed that TiDB cluster disruptions cause the k6 process to hang for a long time.

k6 version

v0.53.0

OS

Linux x86-64

Docker version and image (if applicable)

No response

Steps to reproduce the problem

When running a simple select test and manually restarting one of the cluster nodes:

./k6 run -e executor="ramping-arrival-rate" -e preAllocatedVUs=10000 -e gracefulStop=30s -e stages="10s:5,30s:10000,4m:10000,10s:5" -e K6_DNS="ttl=0,select=roundRobin,policy=preferIPv6" simple-range.js
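
For reference, common.js maps those -e variables onto a ramping-arrival-rate scenario roughly like this (a simplified sketch, not the exact file):

// common.js (simplified sketch; the real file also exports the helpers used by the test script)
export const options = {
    scenarios: {
        open_model: {
            executor: __ENV.executor,                    // "ramping-arrival-rate"
            startRate: 1,
            timeUnit: '1s',
            preAllocatedVUs: parseInt(__ENV.preAllocatedVUs, 10),
            gracefulStop: __ENV.gracefulStop,            // "30s"
            // "10s:5,30s:10000,..." -> [{ duration: '10s', target: 5 }, ...]
            stages: __ENV.stages.split(',').map((stage) => {
                const [duration, target] = stage.split(':');
                return { duration, target: parseInt(target, 10) };
            }),
        },
    },
};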

This test is expected to finish in 4 min 50 s (the sum of the stage durations), but it actually takes:

running (19m19.8s), 00000/10000 VUs, 2597641 complete and 2502 interrupted iterations
open_model ✓ [======================================] 02502/10000 VUs  4m50s  00114.01 iters/s

Note the running (19m19.8s) above: traffic correctly drops to 0 after 4 min 50 s, but the process then sits idle for over 15 more minutes.
I built k6 locally from trunk with debug symbols and profiled the app while it hangs, using perf and bcc-tools:
Off-CPU profile (idle, descheduled threads): [flame graph screenshot]
Off-wake profile (re-scheduled threads and the threads that woke them, combined in one stack): [flame graph screenshot]
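
For anyone who wants to reproduce those profiles, the bcc-tools invocations are roughly as follows (the PID placeholder and duration are illustrative):

/usr/share/bcc/tools/offcputime -f -p <k6-pid> 30
/usr/share/bcc/tools/offwaketime -f -p <k6-pid> 30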
and perf:

# Overhead  Command  Shared Object      Symbol
# ........  .......  .................  ......................................
#
    13.89%  k6       k6                 [.] runtime.checkTimers
    13.69%  k6       k6                 [.] runtime.findRunnable
    13.65%  k6       [kernel.kallsyms]  [k] __perf_event_task_sched_in
    13.52%  k6       k6                 [.] runtime.checkdead
    10.98%  k6       k6                 [.] runtime/internal/syscall.EpollWait
    10.66%  k6       [kernel.kallsyms]  [k] __schedule
    10.60%  k6       [kernel.kallsyms]  [k] touch_atime
     3.35%  k6       k6                 [.] runtime.runtimer
     2.78%  k6       [kernel.kallsyms]  [k] fput_many
     1.82%  k6       k6                 [.] runtime.slicebytetostring
     1.50%  k6       [kernel.kallsyms]  [k] _raw_read_lock_irqsave
     1.23%  k6       k6                 [.] runtime.retake
     1.19%  k6       [kernel.kallsyms]  [k] perf_event_groups_first
     0.34%  k6       [kernel.kallsyms]  [k] syscall_return_via_sysret
     0.23%  k6       k6                 [.] runtime.runOneTimer
     0.22%  k6       k6                 [.] runtime.stealWork
     0.18%  k6       [kernel.kallsyms]  [k] htab_map_update_elem
     0.15%  k6       [kernel.kallsyms]  [k] finish_task_switch.isra.0
     0.03%  k6       [kernel.kallsyms]  [k] native_write_msr
     0.00%  k6       [kernel.kallsyms]  [k] native_apic_msr_write

The test script is very simple; it just selects a random data range from a set of configured tables:

import sql from 'k6/x/sql';
import { check } from 'k6';
import * as db_config from './db-config.js';
import * as common from './common.js';

export const options = common.options;

export default function() {
    const randomId = common.getRandomId();
    const randomTable = common.getRandomTable();

    const results = sql.query(db_config.db, 'SELECT c FROM ' + randomTable + ' WHERE id BETWEEN ? AND ?',
        randomId, randomId + common.rangeSize);

    check(results, {
        'is length >= 1': (r) => r.length >= 1,
    });
}

export function teardown() {
    db_config.db.close();
}
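
The helpers referenced above look roughly like this (a simplified sketch of common.js; the table names and key-space size below are hypothetical):

// common.js helpers (simplified sketch; actual tables/ranges are configured per test)
const tables = ['range_test_0', 'range_test_1', 'range_test_2']; // hypothetical names
const maxId = 1000000;                                           // hypothetical key space

export const rangeSize = 100;

export function getRandomId() {
    return Math.floor(Math.random() * maxId);
}

export function getRandomTable() {
    return tables[Math.floor(Math.random() * tables.length)];
}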

Expected behaviour

The k6 process is supposed to exit after the configured graceful stop.

Actual behaviour

The k6 process keeps sitting idle for a prolonged period of time.


joanlopez commented Oct 3, 2024

Hi @alex-dubrouski,

I'll allocate some time to try to reproduce this issue, but meanwhile, here are a couple of questions:

  • Do you think this is something specific to TiDB and how it manages connections? Or could it be reproduced with MySQL as well?
  • Do you think this is something specific to a TiDB cluster? Or would it likely be the same with a single instance?

Also, could you please share a redacted example of how you set up db_config.db? That is, how do you call sql.open? I'm not interested in the connection details (host, port, etc.), but in whether you use any specific configuration beyond the basic connection string.

Thanks!

@alex-dubrouski
Author

Hi @joanlopez,

Here is the config:

import sql from 'k6/x/sql';

const connectionString = __ENV.db_user + ':'
        + __ENV.db_pwd + '@tcp('
        + __ENV.db_host + ':'
        + __ENV.db_port + ')/'
        + __ENV.db_name;

export let db = sql.open("mysql", connectionString);

I also asked the TiDB team to provide details on their config and the experiment.

@ol-neostyle

Hi @joanlopez,

We currently don't know if this is specific to TiDB, but I think it could be reproducible with MySQL as well.
It should not be specific to a TiDB cluster, either.

In our experiment we started a load with k6 that was supposed to complete in X minutes, and after some time we sent SIGSTOP to one of the tidb-server processes on a TiDB node; the other TiDB nodes kept working fine. This causes k6 to hang: it seems to consider those connections alive and keeps polling them.

In our test we used one HAProxy instance (round-robining across the TiDB hosts) in front of a TiDB cluster of 3 PD nodes, 4 TiDB nodes, and 4 TiKV nodes.

But I think it might be reproducible with 1-2 TiDB hosts as well, potentially even with MySQL.

@joanlopez
Contributor

Great, thanks @alex-dubrouski and @ol-neostyle for the additional details.
I'll try to reproduce it ASAP, hopefully within this week.


sharathjag commented Feb 20, 2025

@joanlopez

To reproduce the issue, you can use this script:

import sql from "k6/x/sql";

// use the actual database driver for your target database (MySQL here)
import driver from "k6/x/sql/driver/mysql";

const db = sql.open(driver, "some-db"); // placeholder connection string

export function setup() {
  db.exec(`
    CREATE TABLE IF NOT EXISTS roster
      (
        id INTEGER NOT NULL AUTO_INCREMENT,
        given_name VARCHAR(64) NOT NULL,
        family_name VARCHAR(64) NOT NULL,
        PRIMARY KEY (id)
      );
  `);
}

export function teardown() {
  db.close();
}

export default function () {
  let result = db.exec(`
    INSERT INTO roster
      (given_name, family_name)
    VALUES
      ('Peter', 'Pan'),
      ('Wendy', 'Darling'),
      ('Tinker', 'Bell'),
      ('James', 'Hook');
  `);
  console.log(`${result.rowsAffected()} rows inserted`);

  let rows = db.query("SELECT * FROM roster WHERE given_name = ?;", "Peter");
  for (const row of rows) {
    console.log(`${row.family_name}, ${row.given_name}`);
  }
}
  • Start MySQL locally.
  • ./k6 run --vus 10 --duration 60s ./example.js
  • kill -SIGSTOP 76699 (the PID of the local mysqld process)
  • Watch the query/VU hang indefinitely instead of finishing after 60s.

We have a fix/workaround on our fork: https://github.com/sharathjag/xk6-sql/blob/conn-options-with-timeouts/sql/module.go#L275, but we're not sure if this is a generic bug or needs a fix in the SQL extension.

Let me know; happy to coordinate with the xk6-sql maintainers if you think it needs to be handled in the extension.
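
In the meantime, a similar effect can probably be achieved without a fork by setting the go-sql-driver/mysql timeout parameters in the DSN (a sketch; host, credentials, and values are illustrative):

import sql from "k6/x/sql";
import driver from "k6/x/sql/driver/mysql";

// timeout = dial timeout; readTimeout/writeTimeout put a deadline on each I/O,
// so a stopped server fails the query instead of blocking forever
const db = sql.open(
  driver,
  "user:pass@tcp(127.0.0.1:3306)/testdb?timeout=5s&readTimeout=10s&writeTimeout=10s"
);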


szkiba commented Feb 21, 2025

Hey folks, I think the context handling in the xk6-sql extension could cause a problem like this: currently queries run with the background context rather than the VU context, so in-flight queries are not cancelled when the scenario winds down. I'll fix it, so don't waste your time investigating; I'll report back when the context handling is fixed.

Track here: grafana/xk6-sql#124
