
Panic "index out of bounds" while inserting a lot of data #4696

Closed
akhilman opened this issue Sep 8, 2024 · 14 comments
Labels
C-bug Category Bugs

Comments

akhilman commented Sep 8, 2024

What type of bug is this?

Crash

What subsystems are affected?

Standalone mode

Minimal reproduce step

I am having an issue while inserting a lot of data (~100M rows) into the database. The crash seems to happen at a stable point, but that point changes when I change the data order and the insert batch size.
I am unable to reproduce this issue with synthetic data.

I am inserting the data from Python using mysql-connector-python.

What did you expect to see?

All data in database.

What did you see instead?

The MySQL client exits with an error.

What operating system did you use?

ArchLinux x64

What version of GreptimeDB did you use?

0.9.3 from docker.io

Relevant log output and stack trace

name = "GreptimeDB"
operating_system = "Ubuntu 22.4.0 [64-bit]"
crate_version = "0.9.3"
explanation = """
Panic occurred in file 'library/core/src/unicode/unicode_data.rs' at line 80
"""
cause = "index out of bounds: the len is 54 but the index is 54"
method = "Panic"
backtrace = """

   0: 0x5d099067d4f3 - core::panicking::panic_fmt::h088f9f84f0369284
   1: 0x5d099067d707 - core::panicking::panic_bounds_check::h8d8570f0c291a3bc
   2: 0x5d0990689ff1 - core::unicode::unicode_data::alphabetic::lookup::ha3afd12d2ef9c31c
   3: 0x5d09967d45c2 - <sqlparser::dialect::mysql::MySqlDialect as sqlparser::dialect::Dialect>::is_identifier_start::hb7160ac9833879dc
   4: 0x5d099677d845 - sqlparser::tokenizer::Tokenizer::next_token::h96a647e5f1c84413
   5: 0x5d099677851b - sqlparser::tokenizer::Tokenizer::tokenize_with_location_into_buf::h6c37888545515fb1
   6: 0x5d09967782fc - sqlparser::tokenizer::Tokenizer::tokenize_with_location::h5dcc8f918a10dccf
   7: 0x5d09965c9a78 - sqlparser::parser::Parser::try_with_sql::h0d1d0257366c368d
   8: 0x5d099656f139 - sql::parser::ParserContext::create_with_dialect::h13e0ced9455ecc62
   9: 0x5d09924230c7 - <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll::h9c609bd0d0a5476c
  10: 0x5d099231b4ff - <frontend::instance::Instance as servers::query_handler::sql::SqlQueryHandler>::do_query::{{closure}}::hc86a060ee36d0352
  11: 0x5d0992d45f50 - <servers::query_handler::sql::ServerSqlQueryHandlerAdapter<E> as servers::query_handler::sql::SqlQueryHandler>::do_query::{{closure}}::h6d3b6eae57ad2f8e
  12: 0x5d0995d27786 - servers::mysql::handler::MysqlInstanceShim::do_query::{{closure}}::{{closure}}::h26669b5fea190e35
  13: 0x5d0995d274a3 - servers::mysql::handler::MysqlInstanceShim::do_query::{{closure}}::h70cc041954d67259
  14: 0x5d0995cf6274 - <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll::h420661b5a60ea6d6
  15: 0x5d0995e5bde8 - <servers::mysql::handler::MysqlInstanceShim as opensrv_mysql::AsyncMysqlShim<W>>::on_query::{{closure}}::hbdddc472e4324239
  16: 0x5d099615d4b5 - opensrv_mysql::AsyncMysqlIntermediary<B,R,W>::run::{{closure}}::h8366bb57693a4ede
  17: 0x5d09961b2100 - servers::mysql::server::MysqlServer::do_handle::{{closure}}::h02d38e99c087ce48
  18: 0x5d09961ac3b6 - servers::mysql::server::MysqlServer::accept::{{closure}}::{{closure}}::{{closure}}::h997f054de4a445ec
  19: 0x5d099618ced3 - tokio::runtime::task::core::Core<T,S>::poll::h19c1d9b4ee7d9785
  20: 0x5d0995dceb1e - tokio::runtime::task::raw::poll::hf216e17bd5b4eeeb
  21: 0x5d0996d15e60 - tokio::runtime::scheduler::multi_thread::worker::Context::run_task::h543058108c555f4c
  22: 0x5d0996d149c8 - tokio::runtime::context::scoped::Scoped<T>::set::h2b29d0a095b218fc
  23: 0x5d0996d06f0e - tokio::runtime::context::runtime::enter_runtime::h44e72ba45108f872
  24: 0x5d0996d15cc8 - tokio::runtime::scheduler::multi_thread::worker::run::ha96f8790afda8481
  25: 0x5d0996cf64e1 - <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll::hb1dcc9e543fe117c
  26: 0x5d0996cffd91 - tokio::runtime::task::core::Core<T,S>::poll::he73d2bf735c749e7
  27: 0x5d0996cf39e2 - tokio::runtime::task::harness::Harness<T,S>::poll::hde14b7ac3601b678
  28: 0x5d0996d10d22 - tokio::runtime::blocking::pool::Inner::run::h7b7485da1443ffc2
  29: 0x5d0996d0a4c4 - std::sys_common::backtrace::__rust_begin_short_backtrace::hcadaa69c551331e8
  30: 0x5d0996d0b266 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hd5561996febc676c
  31: 0x5d09969793cb - std::sys::pal::unix::thread::Thread::new::thread_start::h68df5163d04f9130
  32: 0x73a0db094ac3 - <unresolved>
  33: 0x73a0db126850 - <unresolved>
  34:        0x0 - <unresolved>"""
@akhilman akhilman added the C-bug Category Bugs label Sep 8, 2024
evenyag commented Sep 8, 2024

sqlparser::tokenizer::Tokenizer::next_token::h96a647e5f1c84413

Maybe this is related to the parser crate or the MySQL service crate we use.

Could you also share your logs with us? It'd be helpful if you could change the log level to "debug", for example:

./greptime --log-level="info,sqlparser=debug" standalone start

This only enables debug logging for the sqlparser crate. You can also use the debug level globally for more verbose logs:

./greptime --log-level="debug" standalone start

discord9 commented Sep 9, 2024

It seems we are getting an invalid char from somewhere, since this test also fails with the same panic:

#[test]
fn test_unicode() {
    for idx in 0..u32::MAX {
        // Not a real safety guarantee: most u32 values are not valid chars;
        // we transmute anyway to see whether is_alphabetic panics on them.
        let c: char = unsafe { std::mem::transmute(idx) };
        std::hint::black_box(c.is_alphabetic());
    }
}

Also, as a quick patch (for the panic, not for correctness), we should use is_ascii_alphabetic here anyway, since is_alphabetic also matches other languages' alphabets: '京'.is_alphabetic() is true, for example!

edited: we still need to find the source of the invalid unicode, though
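For background, a minimal sketch (not from the GreptimeDB codebase) of why the safe constructor avoids this class of panic: char::from_u32 rejects invalid scalar values instead of producing a char that can send is_alphabetic out of bounds in the Unicode tables, while is_ascii_alphabetic never touches those tables at all.

```rust
// Sketch: safe char construction vs. the unsafe transmute above.
fn main() {
    // A valid scalar value round-trips fine, and '京' really is alphabetic
    // per Unicode while not being ASCII-alphabetic.
    assert_eq!(char::from_u32(0x4EAC), Some('京'));
    assert!('京'.is_alphabetic());
    assert!(!'京'.is_ascii_alphabetic());

    // Surrogates and out-of-range values are rejected up front instead of
    // yielding an invalid char whose classification would be undefined.
    assert_eq!(char::from_u32(0xD800), None);
    assert_eq!(char::from_u32(0x110000), None);
}
```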

akhilman commented Sep 9, 2024

I am currently trying to get the DB to fail with all logging enabled.
I got it to fail with only sqlparser's debug logging, but the DB did not panic; it just closed the connection. In the log I can see that the query for the last batch was not parsed in full, only half of it.

The query looks like this:

INSERT INTO foo (fields...)
VALUES (...), (...), ...

1_000 rows in one batch.

With only 100 rows per batch I was able to push all my data into the database.

When I tried to push just the time range that caused the DB to fail, I got no errors.

I tend to think this is some kind of back-pressure problem, and that the parser is somehow getting a truncated query.

Once I finish the run with all logging enabled, I'm going to push my data in 10_000-row batches, but with sleep calls between batches to slow down the data stream, to see if the DB fails in that case too.

evenyag commented Sep 9, 2024

The opensrv crate already validates the string, so the input string to the parser should be valid, I think. @discord9
https://github.com/datafuselabs/opensrv/blob/66140cc266e1eb712a0821c112a3ec743f9cccd4/mysql/src/lib.rs#L619-L626
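As a sanity check (illustrative values, not opensrv's actual code path), std::str::from_utf8 is the kind of validation that should make an invalid char unreachable from the parser:

```rust
// Sketch: UTF-8 validation rejects byte sequences that could never decode
// into valid chars, so a &str handed to the tokenizer is always well-formed.
fn main() {
    let good: &[u8] = b"INSERT INTO foo VALUES (1)";
    assert!(std::str::from_utf8(good).is_ok());

    // 0xFF can never occur in valid UTF-8, so validation fails here.
    let bad: &[u8] = &[b'I', b'N', 0xFF];
    assert!(std::str::from_utf8(bad).is_err());
}
```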

akhilman commented Sep 9, 2024

To be clear, DEBUG on_query{protocol="mysql"}:mysql::do_query:do_query: log: Parsing sql ... shows the entire batch.
But DEBUG on_query{protocol="mysql"}:mysql::do_query:do_query: log: ... stops at a field in the middle of a row, and then the connection is closed.
greptimedb-fail-sqlparser-debug.log.gz

@evenyag
Copy link
Contributor

evenyag commented Sep 9, 2024

It appears that there is not enough information in the debug log 😢 I only see the parser parsing the input query. I couldn't reproduce this using the query in the debug log.

I am inserting data from python using mysql-connector-python.

Would you mind sharing your project with us? You can remove your sensitive information from it and only provide a minimal program that can reproduce this issue. Then we can debug this locally.

And then connection is closed.

You can also enable all the debug logs to see if there are other helpful messages.

./greptime --log-level="debug" standalone start

akhilman commented Sep 9, 2024

Unfortunately I can't reproduce the problem with all logs enabled due to lack of disk space.
I'll prepare all data and code.

evenyag commented Sep 9, 2024

Unfortunately I can't reproduce the problem with all logs enabled due to lack of disk space. I'll prepare all data and code.

Thanks!

discord9 commented Sep 9, 2024

I can't reproduce it either with mysql-connector-python on Python 3.10.12, using:

CREATE TABLE public_trades (
    exchange STRING,
    symbol STRING,
    ts TIMESTAMP TIME INDEX,
    price DOUBLE,
    amount DOUBLE,
    side STRING,
    PRIMARY KEY (exchange, symbol)
);

and the insert SQL from the log file. Are there more instructions on how to reproduce it? Maybe run the db under gdb, set a breakpoint on rust_panic, and print the full backtrace for further analysis? I'm especially interested in what invalid char is_identifier_start is dealing with when the panic happens.

akhilman commented Sep 9, 2024

I am able to reproduce the error with a small amount of data and full logging enabled using the MySQLdb library. Logs are attached to this message.

greptimedb-err.2024-09-09-12.gz
greptimedb.2024-09-09-12.gz
greptimedb-err.2024-09-09-11.gz
greptimedb.2024-09-09-11.gz

  • greptimedb.2024-09-09-11 shows an extra-large query that appears to be split into two invalid queries;
  • greptimedb.2024-09-09-12 shows two failed inserts with small 100-row batches: in the first, a unicode symbol appears from nowhere (line 1533185) and there is a panic (line 1518357); in the second, a parser error (line 1538201).

I was able to insert all my data with MySQLdb using different tags (two short tags instead of a uuid) with no problem.

I'll share data and code in a few days.

evenyag commented Sep 10, 2024

We suspect this is caused by databendlabs/opensrv#49

The next_async method may have a safety problem.
https://github.com/datafuselabs/opensrv/blob/3142eb545fab811bc9289091781ce6d771cf0bf7/mysql/src/packet_reader.rs#L125-L142
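For context on why a bug in the packet reader can surface as this particular panic (a general illustration, not the opensrv code): safe Rust guarantees that a char is always a valid Unicode scalar value, so the only way the tokenizer can see an invalid one is if unsafe code upstream already triggered undefined behavior, e.g. by exposing racy or uninitialized buffer contents as a &str.

```rust
// Sketch: invalid chars cannot be constructed in safe Rust.
fn main() {
    // The safe constructor refuses out-of-range values and surrogates...
    assert!(char::from_u32(0x110000).is_none());
    assert!(char::from_u32(0xDFFF).is_none());

    // ...while the unsafe constructor's contract makes passing an invalid
    // value immediate undefined behavior (we only pass a valid one here).
    let a = unsafe { char::from_u32_unchecked(0x41) };
    assert_eq!(a, 'A');
}
```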

@discord9

Now we can also reliably reproduce the issue. We are working on a fix for that data race in opensrv, and have also filed an issue to notify the owner of that crate: databendlabs/opensrv#66

@akhilman

This explains why I cannot consistently get these errors.

Here is my test script and data:
https://github.com/akhilman/greptimedb-sqlparse-fault

Thank you.

@discord9

Closed by #4732
