The memory problems are close to traumatizing me (how can a join eat this much memory?) #22806

liuanxin · 2023-08-10T04:55:09Z

liuanxin
Aug 10, 2023

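32 GB of RAM, 1 FE + 1 BE, Doris version 2.0-beta. There are 6 main tables with about 30 columns each, all using the unique key model on id + times, where times is the partition key with automatic monthly partitions. A kafka-consumer batch-writes into these 6 tables.

Several scheduled jobs run insert into select into result tables. The select is usually a single table, or a small table left join a large table, always with a filter on the small table's time column, and each statement processes one day or one month. The time column is the partition key, one partition per month. What gets written is already aggregated data (bitmap or sum).

This shouldn't count as complicated processing. Each table holds between 4 million and 90 million rows, and all tables combined take in about 300,000 rows per day. Honestly, with this little data... and yet the logs are full of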

Caused by: java.sql.SQLException: errCode = 2, detailMessage = (xxx)
[CANCELLED]Process has no memory available, cancel top memory usage query: 
query memory tracker <Query#Id=xxxx> consumption 32.77 KB, 
backend xxx process memory used 20.55 GB exceed limit 25.01 GB or 
sys mem available 1.57 GB less than low water mark 1.60 GB. 
Execute again after enough memory, details see be.INFO.

followed by a flood of

nested exception is java.sql.SQLException:
Could not retrieve transaction read-only status from server

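After enough of these, the operating system simply killed the BE process. A few hours after a restart it starts all over again. I use supervisor to restart it automatically, but I can't put into words how helpless this feels.

!!!This is close to giving me trauma!!! There is no reason this little data should need a cluster with 64 GB+ per machine (this volume really won't convince management to buy more hardware), and machines aren't cheap!!! Can't you optimize memory usage?!!! Or is there some BE setting for "when memory runs short, be slower if you must, but don't crash"? I went through the entire official documentation and found nothing on this.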

liuanxin · 2023-08-10T09:55:06Z

liuanxin
Aug 10, 2023
Author

Sorry, I misspoke: it's not a few hours, it's a bit over ten minutes.

6 replies

liuanxin Aug 11, 2023
Author

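The release notes brag about "saying goodbye to OOM for good"; looking at it now, words fail me. As for "users don't need to think about memory usage when memory is sufficient", I can only laugh bitterly.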

Hby118 Aug 11, 2023

I'm not even on 2.0; I'm on 1.2.4.

dataroaring Aug 15, 2023
Collaborator

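You can look at the memory statistics logged in the BE log be.INFO just before the query failed. They print the memory used by loads, queries, compaction, and so on.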

xinyiZzz Aug 25, 2023
Collaborator

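@liuanxin This query used only 32.77 KB of memory, but the system had only 1.57 GB of memory left, so the cancellation has nothing to do with this query itself or with the data volume.

"Saying goodbye to OOM for good" refers to the general case. Your cluster is co-located with other services: judging from the log, processes other than the Doris BE are using at least 6 GB of memory. That makes memory control based on the BE mem_limit ineffective, leaving only the limit on remaining available system memory. So for co-located deployments we generally recommend lowering mem_limit in be.conf (default 80%).

"Users don't need to think about memory usage when memory is sufficient": the reason for your error right now is that system memory ran short, and when memory is short, users naturally do need to be aware of it.

Also, the BE process used 20 GB of memory. Search be.INFO for "Memory Tracker Summary" to see where that 20 GB went. See this document: https://doris.apache.org/zh-CN/docs/dev/admin-manual/maint-monitor/memory-management/memory-limit-exceeded-analysis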
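The error text joins two separate conditions with "or", so it helps to check which one actually holds. A minimal sketch, assuming the message always uses the wording and GB units seen in the logs quoted in this thread; the function name `which_limit_tripped` is made up for illustration and is not part of Doris:

```python
import re

# Matches the memory clause seen in the BE error messages quoted in this thread.
# Assumption: values are always reported in GB, as in those messages.
MEM_CLAUSE = re.compile(
    r"process memory used (?P<used>[\d.]+) GB exceed limit (?P<limit>[\d.]+) GB"
    r" or\s+sys mem available (?P<avail>[\d.]+) GB"
    r" less than low water mark (?P<mark>[\d.]+) GB"
)


def which_limit_tripped(message: str) -> dict:
    """Report which of the two memory conditions in the message actually holds."""
    m = MEM_CLAUSE.search(message)
    if m is None:
        raise ValueError("no memory clause found in message")
    used, limit, avail, mark = (
        float(m.group(k)) for k in ("used", "limit", "avail", "mark")
    )
    return {
        "process_limit_exceeded": used > limit,
        "system_low_water_mark_hit": avail < mark,
        "process_headroom_gb": round(limit - used, 2),
    }


sample = (
    "backend xxx process memory used 20.55 GB exceed limit 25.01 GB or "
    "sys mem available 1.57 GB less than low water mark 1.60 GB."
)
print(which_limit_tripped(sample))
```

For the message in the original post, only the system-available check holds: the BE was still below its own limit, which matches the point that something outside the BE's mem_limit accounting had used up the rest of the machine.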

xinyiZzz Aug 25, 2023
Collaborator

@Hby118 Your BE process used 200 GB+ of memory. See this document: https://doris.apache.org/zh-CN/docs/dev/admin-manual/maint-monitor/memory-management/memory-limit-exceeded-analysis. Likewise, search be.INFO for "Memory Tracker Summary" to see where the memory went.

liuanxin · 2023-08-11T02:07:50Z

liuanxin
Aug 11, 2023
Author

I tried the memory-release config item described at https://doris.apache.org/zh-CN/docs/dev/admin-manual/config/be-config

$ curl -X POST "http://ip:be_port/api/update_config?memory_mode=compact&persist=true"
{
    "status": "BAD",
    "msg": "set memory_mode=compact failed, reason: [NOT_IMPLEMENTED_ERROR]'memory_mode' is not support to modify"
}

I have no idea which versions the content of the official documentation actually applies to.

2 replies

learner1212 Aug 11, 2023

This one probably doesn't support dynamic modification, does it?

xinyiZzz Aug 25, 2023
Collaborator

This parameter was usable before 1.2 and has been deprecated since 2.0; the docs haven't been updated yet.

liuanxin · 2023-08-11T12:10:01Z

liuanxin
Aug 11, 2023
Author

select count(*) from t_yyy;
+----------+
| count(*) |
+----------+
| 94043916 |
+----------+
1 row in set (1.66 sec)



select count(*) from t_xxx where times between '2023-01-01 00:00:00' and '2023-01-01 23:59:59';
+----------+
| count(*) |
+----------+
|    9358  |
+----------+
1 row in set (0.05 sec)



insert into t_result(...)
select x.a, x.b, x.c,  y.d
from t_xxx x left join t_yyy y on x.xid = y.xid
where x.times between '2023-01-01 00:00:00' and '2023-01-01 23:59:59';

Query OK, 9358 rows affected (19.20 sec)
{'label':'insert_xx_yy', 'status':'VISIBLE', 'txnId':'zz'}

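With just one insert statement like this: before it ran, the system had 19 GB of free memory. While it ran, free memory kept dropping, 17 GB... 11 GB, bottoming out at 4.3 GB, then climbed back to 6.2 GB, and after it finished it settled back at 11 GB.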

...

Can anyone offer some ideas?

7 replies

dataroaring Aug 15, 2023
Collaborator

You can evaluate with 2.0; join was optimized substantially in 2.0. Remember to run analyze table db_name.table_name before running the query.

liuanxin Aug 15, 2023
Author

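I was on 2.0-beta from the start and upgraded to 2.0.0 yesterday. I also changed the join column to a key column. I didn't test the non-key case on 2.0.0 and don't want to fiddle with it any further.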

liuanxin Aug 15, 2023
Author

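@dataroaring Honestly, even after changing the join column to a key column and upgrading from 2.0-beta to 2.0.0, things are only slightly better. Run a few more jobs and there is still a flood of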

Caused by: java.sql.SQLException: errCode = 2, detailMessage = (ip)[INTERNAL_ERROR]create expr failed, TExprNode=TExprNode {
  01: node_type (i32) = 3,
  02: type (struct) = TTypeDesc {
    01: types (list) = list<struct>[1] {
      [0] = TTypeNode {
        01: type (i32) = 0,
        02: scalar_type (struct) = TScalarType {
          01: type (i32) = 2,
        },
      },
    },
    03: byte_size (i64) = -1,
  },
  04: num_children (i32) = 0,
  06: bool_literal (struct) = TBoolLiteral {
    01: value (bool) = false,
  },
  20: output_scale (i32) = -1,
  29: is_nullable (bool) = false,
}, reason=[E11] Allocator sys memory check failed: Cannot alloc:32, consuming tracker:<Query#Id=27a5c127a7fd476a-95862dff43aff8de>, exec node:<>, process memory used 19.70 GB exceed limit 23.45 GB or sys mem available 1.52 GB less than low water mark 1.60 GB.
0. /root/src/doris/be/src/common/stack_trace.cpp:298: StackTrace::tryCapture() @ 0x000000000b36ece7 in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
1. /root/src/doris/be/src/common/stack_trace.h:0: doris::get_stack_trace[abi:cxx11]() @ 0x000000000b36d2bd in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
2. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:173: doris::Exception::Exception(int, std::basic_string_view<char, std::char_traits<char> >) @ 0x000000000ae1c7ae in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
3. /root/src/doris/be/src/vec/common/allocator.cpp:0: Allocator<false, false, false>::sys_memory_check(unsigned long) const @ 0x000000000d0d9bf5 in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
4. /root/src/doris/be/src/vec/common/allocator.cpp:150: Allocator<false, false, false>::memory_check(unsigned long) const @ 0x000000000d0dab02 in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
5. /root/src/doris/be/src/vec/common/allocator.h:116: Allocator<false, false, false>::alloc(unsigned long, unsigned long) @ 0x000000000a8bb0cb in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
6. /root/src/doris/be/src/vec/common/pod_array.h:128: doris::vectorized::ColumnVector<unsigned char>::reserve(unsigned long) @ 0x000000000af20960 in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
7. /root/src/doris/be/src/vec/common/cow.h:198: doris::vectorized::IDataType::create_column_const(unsigned long, doris::vectorized::Field const&) const @ 0x000000000d100492 in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
8. /root/src/doris/be/src/vec/common/cow.h:143: doris::vectorized::VLiteral::init(doris::TExprNode const&) @ 0x000000000e8ebbaf in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
9. /root/src/doris/be/src/vec/exprs/vliteral.h:0: doris::vectorized::VExpr::create_expr(doris::TExprNode const&, std::shared_ptr<doris::vectorized::VExpr>&) @ 0x000000000e8d067e in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
10. /root/src/doris/be/src/common/status.h:414: doris::vectorized::VExpr::create_tree_from_thrift(std::vector<doris::TExprNode, std::allocator<doris::TExprNode> > const&, int*, std::shared_ptr<doris::vectorized::VExpr>&, std::shared_ptr<doris::vectorized::VExprContext>&) @ 0x000000000e8d26d4 in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
11. /root/src/doris/be/src/common/status.h:414: doris::vectorized::VExpr::create_expr_tree(doris::TExpr const&, std::shared_ptr<doris::vectorized::VExprContext>&) @ 0x000000000e8d305c in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
12. /root/src/doris/be/src/common/status.h:414: doris::vectorized::VExpr::create_expr_trees(std::vector<doris::TExpr, std::allocator<doris::TExpr> > const&, std::vector<std::shared_ptr<doris::vectorized::VExprContext>, std::allocator<std::shared_ptr<doris::vectorized::VExprContext> > >&) @ 0x000000000e8d34aa in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
13. /root/src/doris/be/src/common/status.h:414: doris::vectorized::VUnionNode::init(doris::TPlanNode const&, doris::RuntimeState*) @ 0x000000000e89c971 in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
14. /root/src/doris/be/src/common/status.h:414: doris::ExecNode::create_tree_helper(doris::RuntimeState*, doris::ObjectPool*, std::vector<doris::TPlanNode, std::allocator<doris::TPlanNode> > const&, doris::DescriptorTbl const&, doris::ExecNode*, int*, doris::ExecNode**) @ 0x000000000b1e2d89 in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
15. /root/src/doris/be/src/exec/exec_node.cpp:232: doris::ExecNode::create_tree(doris::RuntimeState*, doris::ObjectPool*, doris::TPlan const&, doris::DescriptorTbl const&, doris::ExecNode**) @ 0x000000000b1e2902 in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
16. /root/src/doris/be/src/common/status.h:414: doris::pipeline::PipelineFragmentContext::prepare(doris::TPipelineFragmentParams const&, unsigned long) @ 0x0000000011abbb03 in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
17. /root/src/doris/be/src/runtime/fragment_mgr.cpp:0: doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0::operator()(int) const @ 0x000000000b16e66b in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
18. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701: doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&) @ 0x000000000b16d31d in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
19. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:244: doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&) @ 0x000000000b16c67d in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
20. /root/src/doris/be/src/common/status.h:414: doris::PInternalServiceImpl::_exec_plan_fragment_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, doris::PFragmentRequestVersion, bool) @ 0x000000000b2ac996 in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
21. /root/src/doris/be/src/common/status.h:335: doris::PInternalServiceImpl::_exec_plan_fragment_in_pthread(google::protobuf::RpcController*, doris::PExecPlanFragmentRequest const*, doris::PExecPlanFragmentResult*, google::protobuf::Closure*) @ 0x000000000b2ac345 in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
22. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646: doris::WorkThreadPool<false>::work_thread(int) @ 0x000000000b2c6a4b in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
23. /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85: execute_native_thread_routine @ 0x0000000014a24c10 in /path.../apache-doris-2.0.0-bin-x64/be/lib/doris_be
24. start_thread @ 0x0000000000007dc5 in /usr/lib64/libpthread-2.17.so
25. __clone @ 0x00000000000f776d in /usr/lib64/libc-2.17.so

liuanxin Aug 17, 2023
Author

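@dataroaring Same kind of SQL statement as above: the left side of the left join is the small table, and the id in both tables is a key column. Available memory was 17 GB before the run and dropped as low as 4.9 GB during it. Run more than 2 statements like this at once and memory blows up (is this really "optimized substantially"?). The memory issue has been sheer torment lately. MySQL with a mere 2 GB innodb_buffer_pool_size doesn't crash just as a statement is about to finish. I'd accept it being slower.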

mysql> show load profile "/"\G
*************************** 1. row ***************************
   Profile ID: 8614dbc504d0464e-a3d45284e5fe23fd
    Task Type: LOAD
   Start Time: 2023-08-17 12:46:55
     End Time: 2023-08-17 12:47:12
        Total: 17s3ms
   Task State: ERR
         User: default_cluster:xx
   Default Db: default_cluster:xxx
Sql Statement: insert into xxxxx(...)
select ...
from xxx i left join yyy md on md.id = i.id
where `timestamp` between '2023-05-01 00:00:00' and '2023-05-31 23:59:59'
1 row in set (0.00 sec)

mysql> show load profile "/8614dbc504d0464e-a3d45284e5fe23fd"\G
*************************** 1. row ***************************
    TaskId: 8614dbc504d0464e-a3d45284e5fe23fd
ActiveTime: 17s14ms
1 row in set (0.00 sec)

mysql> show load profile "/8614dbc504d0464e-a3d45284e5fe23fd/8614dbc504d0464e-a3d45284e5fe23fd"\G
*************************** 1. row ***************************
Fragments: 
                                          ┌───────────────────────┐
                                          │[-1: OlapTableSink]    │
                                          │Fragment: 0            │
                                          │MaxActiveTime: 16s963ms│
                                          └───────────────────────┘
                                                      │
                                                      │
                                           ┌────────────────────┐
                                           │[2: VHASH_JOIN_NODE]│
                                           │Fragment: 0         │
                                           └────────────────────┘
                   ┌──────────────────────────────────┴────┬───────────────────────────┬──────────────┐
                   │                                       │                           │              │
┌─────────────────────────────────────┐          ┌───────────────────┐          ┌────────────┐ ┌────────────┐
│[0: VNewOlapScanNode(data_user_role)]│          │[3: VEXCHANGE_NODE]│          │[BuildPhase]│ │[ProbePhase]│
│Fragment: 0                          │          │Fragment: 0        │          │Fragment: 0 │ │Fragment: 0 │
└─────────────────────────────────────┘          └───────────────────┘          └────────────┘ └────────────┘
                   │                                       │
                   │                                       │
             ┌───────────┐                     ┌──────────────────────┐
             │[VScanner] │                     │[3: VDataStreamSender]│
             │Fragment: 0│                     │Fragment: 1           │
             └───────────┘                     │MaxActiveTime: 4s502ms│
                   │                           └──────────────────────┘
                   │                                       │
          ┌─────────────────┐                              │
          │[SegmentIterator]│           ┌─────────────────────────────────────┐
          │Fragment: 0      │           │[1: VNewOlapScanNode(mapping_device)]│
          └─────────────────┘           │Fragment: 1                          │
                                        └─────────────────────────────────────┘
                                                           │
                                                           │
                                                     ┌───────────┐
                                                     │[VScanner] │
                                                     │Fragment: 1│
                                                     └───────────┘
                                                           │
                                                           │
                                                  ┌─────────────────┐
                                                  │[SegmentIterator]│
                                                  │Fragment: 1      │
                                                  └─────────────────┘

1 row in set (0.01 sec)

xinyiZzz Aug 25, 2023
Collaborator

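On "I'd accept it being slower": spill-to-disk for the Join operator has not yet been merged into 2.0 and opened to users. It will be opened up once testing is complete.

Join uses a lot of memory, or more precisely the Hash Table does. This is indeed a current problem in Doris, and we are already working on optimizing it.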

Liangjiajjj · 2023-08-14T10:10:55Z

Liangjiajjj
Aug 14, 2023

In my tests insert into performed very poorly. I recommend using Stream Load (HTTP) for batch inserts.

2 replies

Hby118 Aug 14, 2023

Is there any faster way to copy table B, something like INSERT INTO A SELECT * FROM B?

liuanxin Aug 15, 2023
Author

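I'm curious: with stream load, at what point do you build the CSV? Apart from exporting data from 1.2 and importing it into 2.0, where it felt somewhat useful (I struggled with tabs, newlines, and commas for a long time), I don't know when I would do it.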

dataroaring · 2023-08-15T02:24:57Z

dataroaring
Aug 15, 2023
Collaborator

Quoting Liangjiajjj: "In my tests insert into performed very poorly. I recommend using Stream Load (HTTP) for batch inserts."
Are you using insert into values or insert into select?

8 replies

liuanxin Aug 15, 2023
Author

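Consuming Kafka with routine load felt too slow to me. The longer it ran, the larger the gap between CURRENT-OFFSET and LOG-END-OFFSET grew; consumption couldn't keep up with production, and adjusting desired_concurrent_number and a few other parameters had no effect. This approach is also a black box however you look at it. I later switched to my own kafka-consumer that batch-writes insert into values( ... ), ( ... ), 500 rows at a time, and the backlog was consumed in minutes. Stream load performs well, sure, but I really don't understand the usage scenario. Build the CSV myself? Commas, tabs, and newlines in data are very common, so avoiding them needs extra handling. Encode the data? There's no decode support on write. Change the delimiter? Who can guarantee the data never contains the new character? And since the data is in Kafka, when to build the CSV is also a question.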

Liangjiajjj Aug 15, 2023

It doesn't have to be CSV; doesn't it support a JSON format too?
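To illustrate why JSON sidesteps the delimiter worry raised above, here is a minimal sketch that serializes rows as one JSON object per line. Commas, tabs, and newlines inside values are escaped by the encoder, so no delimiter has to be chosen. The helper name `to_json_lines` and the sample rows are made up for illustration. How the payload is sent (for example the Stream Load headers that select JSON and line-by-line reading) is not shown and should be checked against the Doris documentation for your version.

```python
import json
from typing import Iterable, Mapping


def to_json_lines(rows: Iterable[Mapping[str, object]]) -> str:
    """Serialize rows as newline-delimited JSON, one object per line."""
    # json.dumps escapes tabs and newlines inside strings (as \t and \n),
    # so every record stays on exactly one physical line.
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)


rows = [
    {"id": 1, "note": "has, a comma"},
    {"id": 2, "note": "has\ta tab and\na newline"},
]
payload = to_json_lines(rows)

# Round trip: splitting on real newlines recovers the original rows.
decoded = [json.loads(line) for line in payload.split("\n")]
print(decoded == rows)
```

The trade-off is payload size, since every row repeats its field names.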

liuanxin Aug 15, 2023
Author

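For data on the order of millions of rows, exporting to CSV produces maybe 2 files (select * from t_xxx into outfile "file:///path/xx_"), while exporting to JSON takes about 5 files (1 GB per file). Of course, when the CSV data is problematic, this is an option too.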

Liangjiajjj Aug 15, 2023

On my side, one program receives the messages and calls stream load once per 10,000 rows.

liuanxin Aug 15, 2023
Author

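If traffic is slow and 10,000 rows don't arrive for a while, timeliness could suffer a lot. With my current insert into values( ... ), ( ... ) approach (if each statement carried only a single row, performance would indeed be poor), I occasionally get Cause: java.sql.SQLException: timeout when waiting for send fragments RPC. Wait(sec), accompanied by java.sql.SQLException: Could not retrieve transaction read-only status from server. This is probably caused by Doris's own protection mechanism when memory is short; once that period passes, things return to normal.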
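One common way to address the timeliness concern is to flush a batch when it is either full or old enough, whichever comes first. The sketch below is an illustration under stated assumptions, not anyone's actual consumer: the class name `SizeOrAgeBatcher`, the `flush` callback (which would wrap a multi-row insert or a stream load call), and the 5-second default are all hypothetical.

```python
import time
from typing import Callable, List


class SizeOrAgeBatcher:
    """Buffer rows and flush when the batch is full or its oldest row is too old."""

    def __init__(
        self,
        flush: Callable[[List[dict]], None],
        max_rows: int = 10_000,
        max_wait_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._max_rows = max_rows
        self._max_wait_s = max_wait_s
        self._clock = clock
        self._rows: List[dict] = []
        self._first_at = 0.0

    def add(self, row: dict) -> None:
        if not self._rows:
            # Remember when the first row of this batch arrived.
            self._first_at = self._clock()
        self._rows.append(row)
        self._maybe_flush()

    def tick(self) -> None:
        """Call periodically (e.g. when a consumer poll times out) so a
        partially filled batch is still written during quiet periods."""
        self._maybe_flush()

    def _maybe_flush(self) -> None:
        if not self._rows:
            return
        full = len(self._rows) >= self._max_rows
        stale = self._clock() - self._first_at >= self._max_wait_s
        if full or stale:
            batch, self._rows = self._rows, []
            self._flush(batch)
```

With this shape, a busy period still produces full batches, while a quiet period delays a row by at most roughly `max_wait_s` plus the interval between `tick()` calls.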

liuanxin · 2023-08-18T08:17:16Z

liuanxin
Aug 18, 2023
Author

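Can someone please take a look? @yagagagaga @dataroaring I read a pile of articles saying how great this is, so I used it to rebuild our company's statistics service. The data volume is just what I described above.

The implementation isn't complicated either: kafka-consumer -> insert into values(...), (...) into Doris. Previously, scheduled jobs ran insert into select into result tables, and memory kept blowing up within minutes of starting. With 20 GB of data on a 32 GB machine, even a simple show create table from the client could take 20-plus seconds to return. So I stopped the scheduled jobs and now trigger the statistics manually through HTTP requests. That is as serial as it gets: only one statistics SQL runs at any time. Yet even so, after 10 minutes or so there is a flood of Caused by: java.sql.SQLException: errCode = 2, detailMessage = There is no scanNode Backend available errors, followed by 20 ~ 30 seconds of unresponsiveness, so a single statistics job ends abnormally without ever producing a result.

But when I check the server, the process is still there and there are still 10+ GB of available memory. I confidently told management that Doris was powerful and brought it in to rebuild the statistics service, promising that adding new metrics later would only need SQL-level changes. The result is one pit after another!!! MySQL with a few GB wouldn't do this, even if it were slower. It really fits the joke: "Applying for an accounting job. 123 * 456? 12345. Never mind whether it's right, just tell me: was that fast or what?"

Not everywhere has unlimited memory.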

1 reply

xinyiZzz Aug 25, 2023
Collaborator

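For the OOM issue, as in my first reply above: if your cluster is co-located with other services, lower mem_limit in be.conf (system memory - memory of other services - roughly 20% reserved as buffer). The OOM analysis doc is here: https://doris.apache.org/zh-CN/docs/dev/admin-manual/maint-monitor/memory-management/be-oom-analysis

As for why insert into uses so much memory, that needs a case-by-case analysis of the tables, SQL, concurrency, per-load data volume, and logs. As in my second reply above, SQL with a join may use more memory than you expect. Once the cause is confirmed, there are also some parameters that can be tuned.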
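The mem_limit rule of thumb in this reply can be worked through as simple arithmetic. A minimal sketch, assuming the 20% buffer is taken as a fraction of total system memory (the reply does not say what the 20% is relative to) and using the 32 GB machine and the "at least 6 GB" used by other processes mentioned earlier in the thread; the function name is hypothetical:

```python
def suggested_mem_limit_gb(
    total_gb: float, other_services_gb: float, buffer_fraction: float = 0.20
) -> float:
    """system memory - memory of other services - reserved buffer.

    Assumption: the buffer is a fraction of total system memory.
    """
    if not 0 <= buffer_fraction < 1:
        raise ValueError("buffer_fraction must be in [0, 1)")
    limit = total_gb - other_services_gb - buffer_fraction * total_gb
    if limit <= 0:
        raise ValueError("nothing left for the BE after other services and buffer")
    return limit


limit_gb = suggested_mem_limit_gb(total_gb=32, other_services_gb=6)
percent_of_total = 100 * limit_gb / 32
print(f"about {limit_gb:.1f} GB, i.e. roughly {percent_of_total:.0f}% of system memory")
```

Under these assumptions the result is well below the 80% default quoted above, which is the direction the reply recommends for co-located deployments. The real figure for "other services" should come from measuring the machine, not from this example.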

young138120 · 2023-08-24T06:08:35Z

young138120
Aug 24, 2023

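Sigh, I was just about to open a support request about join. Your case is actually on the better side, since you can partition.
My case can't be partitioned, and pruning is hard.
It's basically a full scan: MySQL takes a few hundred milliseconds, Doris takes tens of seconds. That's unacceptable.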

7 replies

young138120 Aug 25, 2023

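Right now there is very little official documentation on join. I'm on 2.0.
Also, the profile differs quite a bit from the docs, which probably haven't been updated. I tried making the join column the key and the bucketing key, and neither helped much.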

xinyiZzz Aug 25, 2023
Collaborator

@young138120
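The Doris community currently plans a major overhaul of the query profile, and the docs haven't caught up yet.
For join optimization, first check whether the runtime filter is being used. It usually has a very large impact on performance (several-fold to tens-fold). See: https://doris.apache.org/zh-CN/docs/dev/query-acceleration/join-optimization/runtime-filter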

xinyiZzz Aug 25, 2023
Collaborator

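Also, join tuning cases can be reported to community members to look at - -. In earlier TPCDS and TPCH benchmark runs, manual join tuning had a very visible effect 😂. The community's next step is to improve out-of-the-box behavior; for specifics you'd have to ask the experts.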

young138120 Aug 25, 2023

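I've already talked to community members and adjusted some parameters,
for example ones related to pipeline_task_num, and I ran analyse table with sync before running the SQL. None of it has helped much so far.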

young138120 Aug 25, 2023

No worries, I'll wait a bit longer for concrete feedback from the community folks 😂

hli191 · 2023-08-25T00:31:12Z

hli191
Aug 25, 2023

I have roughly 2 TB of data and want to migrate my current architecture to Doris. Reading this feedback, I'm wavering too.

4 replies

xinyiZzz Aug 25, 2023
Collaborator

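@hli191 Doris has many users with tens to hundreds of TB of data, including JD.com, Meituan, and Xiaomi. Admittedly, Doris is not fully out-of-the-box today; some areas need manual tuning, and there is some cost of adoption. For memory shortage and performance issues, see the comments above.
Memory shortage: Join does currently use more memory than competing products. It is still being optimized, and spill-to-disk hasn't been opened up yet.
OOM: the default parameters are on the conservative side. They need adjusting for co-located deployments, and possibly for stress-test environments too.
Performance: see the docs for Doris's current results on benchmarks such as TPCH and TPCDS, though they do require manual tuning…, which is also something to improve going forward.

If you have questions, try to discuss them in the community user group; replies there are much faster than here.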

Hby118 Aug 25, 2023

How do I join the community user group?

xinyiZzz Aug 25, 2023
Collaborator

@Hby118 You can add this person on WeChat: ApacheDoris_Zaki, who handles community operations. There is also a QR code on the WeChat official account; search for Apache Doris.

hli191 Aug 25, 2023

@xinyiZzz Thanks for the reply. I'll run the relevant technical validation next.

Replies: 8 comments · 37 replies
