
[DocDB] google tcmalloc can crash with a SEGV on stack trace dump #17875

Closed
1 task done
mbautin opened this issue Jun 21, 2023 · 0 comments
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue

@mbautin (Contributor) commented Jun 21, 2023

Jira Link: DB-6957

Description

When a stack trace is captured from a signal handler, the process can crash if the thread receiving the signal is in the middle of a memory allocation or deallocation.

Filed upstream as google/tcmalloc#189.

@mbautin mbautin added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Jun 21, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 21, 2023
mbautin added a commit to yugabyte/yugabyte-db-thirdparty that referenced this issue Jun 21, 2023
This allows the application to know in a signal handler whether a memory allocation/deallocation operation was interrupted by the signal. Needed to fix the issue yugabyte/yugabyte-db#17875.

Also update the LLVM version used to 16.0.5.
mbautin added a commit to yugabyte/yugabyte-db-thirdparty that referenced this issue Jun 21, 2023
Upgrade tcmalloc to pick up IsCurThreadInAllocDealloc. This allows the application to know in a signal handler whether a memory allocation/deallocation operation was interrupted by the signal. Needed to fix the issue yugabyte/yugabyte-db#17875.

The previous commit a93cc9d that attempted to do this had an incorrect tcmalloc tag. The tag/commit used here is https://github.com/yugabyte/tcmalloc/releases/tag/e116a66-yb-4 ( yugabyte/tcmalloc@677ba2d ).

Also update LLVM to 16.0.6.
mbautin added a commit that referenced this issue Jun 23, 2023
Summary:
When a stack trace is captured from a signal handler, the process can crash if the thread receiving the signal is in the middle of a memory allocation or deallocation. Google TCMalloc issue: google/tcmalloc#189.

In this diff, we use the IsCurThreadInAllocDealloc malloc extension API we added in yugabyte/tcmalloc@677ba2d to skip capturing the stack trace when the signal interrupted a thread that is currently allocating or deallocating memory. In such cases, we produce an empty stack trace, which is later omitted from the overall threads dump. #17889 is a follow-up issue for retrying stack trace capture in such cases.

The TCMalloc version we are upgrading to also includes yugabyte/tcmalloc@d1b0e69, which adds an option not to seed the lifetime profiler with live allocations. We now set seed_with_live_allocs to false when capturing an allocation profile.

Test Plan: Jenkins

Reviewers: asrivastava

Reviewed By: asrivastava

Subscribers: ybase, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D26349
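The guard described in the commit summary (check in the signal handler whether the interrupted thread is inside malloc/free, and if so return an empty trace instead of unwinding) can be sketched as below. This is a minimal, self-contained illustration assuming Linux/glibc, not the YugabyteDB code: tcmalloc tracks the "inside the allocator" state internally and exposes it through IsCurThreadInAllocDealloc, whereas here that state is simulated with a hypothetical thread-local flag. `tls_in_alloc_dealloc`, `IsCurThreadInAllocDeallocSketch`, `ScopedAllocDeallocMarker` and `CaptureHelperThreadStack` are illustrative names, and SIGUSR2 is an arbitrary choice of signal.

```cpp
#include <atomic>
#include <csignal>
#include <execinfo.h>  // backtrace() (glibc)
#include <pthread.h>
#include <signal.h>
#include <thread>

// Hypothetical stand-in for the allocator's per-thread "inside malloc/free" state.
// In the real fix this lives inside tcmalloc and is queried via IsCurThreadInAllocDealloc.
thread_local volatile std::sig_atomic_t tls_in_alloc_dealloc = 0;

bool IsCurThreadInAllocDeallocSketch() { return tls_in_alloc_dealloc != 0; }

// Hypothetical RAII marker an allocator could place around its critical sections.
struct ScopedAllocDeallocMarker {
  ScopedAllocDeallocMarker() { tls_in_alloc_dealloc = 1; }
  ~ScopedAllocDeallocMarker() { tls_in_alloc_dealloc = 0; }
};

constexpr int kMaxFrames = 64;
void* g_frames[kMaxFrames];
std::atomic<int> g_depth{-1};  // -1 means the handler has not run yet.

// Runs on the interrupted thread. If that thread was inside the allocator, skip
// unwinding and publish an empty trace (depth 0), which the caller can drop.
extern "C" void StackTraceSignalHandler(int /*signo*/) {
  if (IsCurThreadInAllocDeallocSketch()) {
    g_depth.store(0);
    return;
  }
  g_depth.store(backtrace(g_frames, kMaxFrames));
}

// Signals a helper thread and returns the number of frames captured from it.
// simulate_in_alloc makes the helper look as if it were inside malloc/free.
int CaptureHelperThreadStack(bool simulate_in_alloc) {
  // glibc's backtrace() may load libgcc (and allocate) on first use,
  // so call it once here rather than for the first time inside the handler.
  void* warm_up[1];
  backtrace(warm_up, 1);

  struct sigaction sa {};
  sa.sa_handler = StackTraceSignalHandler;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGUSR2, &sa, nullptr);

  g_depth.store(-1);
  std::atomic<bool> ready{false};
  std::thread helper([&] {
    // Announce readiness, then spin until the handler has run on this thread.
    auto wait_for_handler = [&] {
      ready.store(true);
      while (g_depth.load() < 0) std::this_thread::yield();
    };
    if (simulate_in_alloc) {
      ScopedAllocDeallocMarker marker;  // pretend we are inside malloc/free
      wait_for_handler();
    } else {
      wait_for_handler();
    }
  });

  while (!ready.load()) std::this_thread::yield();
  pthread_kill(helper.native_handle(), SIGUSR2);
  helper.join();
  return g_depth.load();
}
```

With this sketch, `CaptureHelperThreadStack(true)` returns 0 (an empty trace that a thread dump would omit), while `CaptureHelperThreadStack(false)` returns a positive frame count. A retry along the lines of #17889 could re-send the signal a bounded number of times while the result is 0; that is outside what this sketch shows.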
@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Jun 23, 2023