Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug](mark-join) make mark join column insert false when not null aware probe is null #29311

Closed
wants to merge 4,509 commits into from

Conversation

BiteTheDDDDt
Copy link
Contributor

Proposed changes

make mark join column insert false when not null aware probe is null

Further comments

If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...

jackwener and others added 30 commits December 20, 2023 17:35
…etting mv related partition table (apache#28699)

1. Fix data wrong using mv rewrite
2. Ignore case when getting mv related partition table
3. Enable infer expression column name without alias when create mv
1. parser's partitionSpec changed unexpectly by PR apache#26492
2. delete without using should support un-equals expression
…v to be unable to be reclaimed, resulting in excessive memory usage by FE (apache#28723)

when replay addTaskResult log,will create one ConnectContext,and set Env.getCurrentEnv,then store this ctx in ConnectContext.threadLocalInfo,threadLocalInfo is static,so this ctx can not be recycling,Env of replay thread also can not be recycling
…che#28737)

```
4# __gnu_cxx::__verbose_terminate_handler() [clone .cold] at ../../../../libstdc++-v3/libsupc++/vterminate.cc:75
5# __cxxabiv1::__terminate(void (*)()) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
6# 0x00005622F33D22B1 in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be
7# 0x00005622F33D2404 in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be
8# fmt::v7::detail::error_handler::on_error(char const*) in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be
9# char const* fmt::v7::detail::parse_replacement_field<char, fmt::v7::detail::format_handler<fmt::v7::detail::buffer_appender<char>, char, fmt::v7::basic_format_context<fmt::v7::detail::buffer_appender<char>, char> >&>(char const*, char const*, fmt::v7::detail::format_handler<fmt::v7::detail::buffer_appender<char>, char, fmt::v7::basic_format_context<fmt::v7::detail::buffer_appender<char>, char> >&) in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be
10# void fmt::v7::detail::vformat_to<char>(fmt::v7::detail::buffer<char>&, fmt::v7::basic_string_view<char>, fmt::v7::basic_format_args<fmt::v7::basic_format_context<fmt::v7::detail::buffer_appender<fmt::v7::type_identity<char>::type>, fmt::v7::type_identity<char>::type> >, fmt::v7::detail::locale_ref) in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be
11# doris::vectorized::NewJsonReader::_append_error_msg(rapidjson::GenericValue<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool*) at /root/doris/be/src/vec/exec/format/json/new_json_reader.cpp:924
12# doris::vectorized::NewJsonReader::_set_column_value
```
* fix explode with array<decimal> has specific precision at old planner
…es (apache#28698)

Co-authored-by: yiguolei <[email protected]>Add a new class broadcastbufferholderqueue to manage holders
Using shared ptr to manage holders, not use ref and unref, it is too difficult to maintain.
…8638)

There is a circuit breaker lasting for 2 minutes in grpc, then if a be is down and up again, send fragments to the be fails lasting for 2 minutes.
* [Improve](compile) add `__AVX2__` macro for JsonbParser

* throw exception instead of CHECK
…oject(logicalFilter(logicalOlapScan())) without agg (apache#28747)

support match logicalAggregate(logicalProject(logicalFilter(logicalOlapScan())) without agg
…ed (apache#28650)

`ScannerContext` will schedule scanners even after stopped, and confused with `_is_finished` and `_should_stop`.
 Only Fix the concurrency bugs when scanner is stopped or finished reported in apache#28384
fe matchExactType function should call type.matchTypes for its own logic, do not switch case to do special logic otherwise we may meet core in be like this.
 ```
F20231208 18:54:39.359673 680131 block.h:535] Check failed: _data_types[i]->is_nullable()  target type: Struct(l_info:Nullable(Array(Nullable(String)))) src type: Struct(col:Nullable(Array(Nullable(UInt8))))
*** Check failure stack trace: ***
    @     0x5584e952b926  google::LogMessage::SendToLog()
    @     0x5584e9527ef0  google::LogMessage::Flush()
    @     0x5584e952c169  google::LogMessageFatal::~LogMessageFatal()
    @     0x5584cf17201e  doris::vectorized::MutableBlock::merge_impl<>()
    @     0x5584ceac4b1d  doris::vectorized::MutableBlock::merge<>()
    @     0x5584d4dd7de3  doris::vectorized::VUnionNode::get_next_const()
    @     0x5584d4dd9a45  doris::vectorized::VUnionNode::get_next()
    @     0x5584bce469bd  std::__invoke_impl<>()
    @     0x5584bce466d0  std::__invoke<>()
    @     0x5584bce465c7  _ZNSt5_BindIFMN5doris8ExecNodeEFNS0_6StatusEPNS0_12RuntimeStateEPNS0_10vectorized5BlockEPbEPS1_St12_PlaceholderILi1EESC_ILi2EESC_ILi3EEEE6__callIS2_JOS4_OS7_OS8_EJLm0ELm1ELm2ELm3EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
    @     0x5584bce46358  std::_Bind<>::operator()<>()
    @     0x5584bce46208  std::__invoke_impl<>()
    @     0x5584bce46178  _ZSt10__invoke_rIN5doris6StatusERSt5_BindIFMNS0_8ExecNodeEFS1_PNS0_12RuntimeStateEPNS0_10vectorized5BlockEPbEPS3_St12_PlaceholderILi1EESD_ILi2EESD_ILi3EEEEJS5_S8_S9_EENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EESL_E4typeEOSM_DpOSN_
    @     0x5584bce45c18  std::_Function_handler<>::_M_invoke()
    @     0x5584bce6412f  std::function<>::operator()()
    @     0x5584bce56382  doris::ExecNode::get_next_after_projects()
    @     0x5584bce26218  doris::PlanFragmentExecutor::get_vectorized_internal()
    @     0x5584bce2431b  doris::PlanFragmentExecutor::open_vectorized_internal()
    @     0x5584bce22a96  doris::PlanFragmentExecutor::open()
    @     0x5584bce27c9d  doris::PlanFragmentExecutor::execute()
    @     0x5584bcbdb3f8  doris::FragmentMgr::_exec_actual()
    @     0x5584bcbf982f  doris::FragmentMgr::exec_plan_fragment()::$_0::operator()()
    @     0x5584bcbf9715  std::__invoke_impl<>()
    @     0x5584bcbf96b5  _ZSt10__invoke_rIvRZN5doris11FragmentMgr18exec_plan_fragmentERKNS0_23TExecPlanFragmentParamsERKSt8functionIFvPNS0_12RuntimeStateEPNS0_6StatusEEEE3$_0JEENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EESH_E4typeEOSI_DpOSJ_
    @     0x5584bcbf942d  std::_Function_handler<>::_M_invoke()
    @     0x5584b9dfd883  std::function<>::operator()()
    @     0x5584bd6e3929  doris::FunctionRunnable::run()
    @     0x5584bd6cf8ce  doris::ThreadPool::dispatch_thread()
```
xinyiZzz and others added 21 commits December 28, 2023 11:39
…n column (apache#29107)

pred column also needs to be filtered by expr, exclude delete condition column, delete condition column not need to be filtered, query engine does not need it, after _output_column_by_sel_idx, delete condition materialize column will be erase at the end of the block.
Eg:
delete from table where a = 10;
select b from table;
a column only effective in segment iterator, the block from query engine only contain the b column, so no need to filter a column by expr.
…er when load finish (apache#28901)

[Goal]
When building the rowset writer, avoid waiting for inflight segcompaction
to elimite long tail latency for load.

[Current situation]
1. The segcompaction of a rowset is executed serially. During the build phase,
we need to wait for the completion of the inflight segcompaction task.

2. If the rowset writer finishes writing and starts building meta, then segments
that have not been compacted will not be submitted to segcompaction worker.
We simply ignore them to accelerate the build process.

3. But this is not enough. If a segcompaction task has already been submitted to
the worker thread pool, we will set a cancelled flag for the worker,
and nothing will be done during execution to complete the task ASAP.

4. But this is still not enough. Although the latency of the segcompaction task
has been shortened by aforemetioned method, tasks may still be queuing in the
thread pool.

[Solution]
We can increase the worker thread pool to avoid queuing congestion, but this is
not the best solution.
Segcompaction should be a best effort work, and should not use too many CPU and
memory resources. So we adopted the strategy of unbinding build and segcompaction,
specifically:

1. For the segcompaction task that is performing compaction operations, we should
not interrupt it, otherwise it may cause file corruption

2. For those tasks still queued, we no longer care about their results (because
these tasks will know they are cancelled and will not perform any actual operations),
so we just ignore them and continue with the subsequent rowset build process

Signed-off-by: freemandealer <[email protected]>
…#29045)

* fix wrong result after reorder mor table

* update
…pache#29205)

Opening a BDBJournal will acquire the max journal id, but it doesn't
need to check whether the replica txn is matched with the master.
… to reduce IO(apache#28810)

Normally we write the separate index files to disk before we merge the index files into an idx compound file.
In high-frequency load scenarios, disk IO can become a bottleneck. 
In order to reduce the pressure on the disk, we write the standalone index file to the RAM directory for the first time, and then write it to the disk when merging it into a composite file.

Add config `index_inverted_index_by_ram_dir_enable`, default is `false`.
@BiteTheDDDDt
Copy link
Contributor Author

run buildall

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 37.86% (8015/21170)
Line Coverage: 29.53% (65176/220677)
Region Coverage: 28.97% (33496/115605)
Branch Coverage: 24.83% (17188/69222)
Coverage Report: http://coverage.selectdb-in.cc/coverage/739097df898ceed883225fb05f7cb44bf6fc6306_739097df898ceed883225fb05f7cb44bf6fc6306/report/index.html

@github-actions github-actions bot added area/pipeline kind/docs Categorizes issue or PR as related to documentation. labels Dec 29, 2023
@xiaokang xiaokang marked this pull request as draft December 30, 2023 09:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/pipeline kind/docs Categorizes issue or PR as related to documentation.
Projects
None yet
Development

Successfully merging this pull request may close these issues.