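In a multidimensional analysis scenario with roughly 30 dimension combinations, data expansion produces a large number of bitmaps, which makes GC time very long. Is there a good solution? #15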
Comments
1. Before the expansion, you can pre-aggregate the detail data first: to_bitmap -> bitmap_union_count
Does your computed result store the Bitmap itself, or the deduplicated value (bitmap_count)?
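That is how we do it: we first built a lightly aggregated table, built a mapping table for user_id as well, and persisted to_bitmap(user_id) to disk. We then ran a cube operation on the data in this lightly aggregated layer. There are many dimensions, and the Spark GC time is very long.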
The lightly aggregated layer stores bitmaps. We then run the cube on top of this layer, using bitmap_union and bitmap_count.
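
For reference, the flow discussed in this thread (pre-aggregate the detail rows into bitmaps, then roll up over dimension combinations and keep only the counts) can be sketched roughly as below. This is only an illustration in plain Python, not the project's API: Python integers stand in for bitmaps, the `to_bitmap` and `bitmap_count` helpers here are local stand-ins named after the functions mentioned above, and the `city`/`device` dimensions and sample rows are made up. It does not claim to resolve the GC problem; it only shows that the expansion then runs once per pre-aggregated group rather than once per detail row.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical detail rows: (city, device, user_id).
# user_id is assumed to be already mapped to a small integer.
rows = [
    ("bj", "ios", 1),
    ("bj", "ios", 2),
    ("bj", "android", 2),
    ("sh", "ios", 3),
    ("sh", "ios", 1),
    ("bj", "ios", 1),  # duplicate detail row
]

def to_bitmap(user_id):
    """Stand-in bitmap: an int with only bit `user_id` set."""
    return 1 << user_id

def bitmap_count(bitmap):
    """Number of set bits, i.e. the distinct-user count."""
    return bin(bitmap).count("1")

def pre_aggregate(detail_rows):
    """Union bitmaps per finest-grain dimension key (the lightly aggregated layer)."""
    base = defaultdict(int)
    for city, device, user_id in detail_rows:
        base[(city, device)] |= to_bitmap(user_id)
    return dict(base)

def cube_counts(base, n_dims):
    """Roll the pre-aggregated bitmaps up over every dimension subset.

    None marks a dimension that has been rolled up ("all values").
    Only the final counts are returned, not the bitmaps.
    """
    acc = defaultdict(int)
    for key, bitmap in base.items():
        for r in range(n_dims + 1):
            for kept in combinations(range(n_dims), r):
                group = tuple(key[i] if i in kept else None for i in range(n_dims))
                acc[group] |= bitmap  # bitmap union
    return {group: bitmap_count(bm) for group, bm in acc.items()}

base = pre_aggregate(rows)
counts = cube_counts(base, n_dims=2)
```

Here the six detail rows collapse into three pre-aggregated bitmaps before the cube step, so each bitmap is expanded into its dimension subsets once instead of once per detail row.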