content: linux coredumps #545

Open
wants to merge 1 commit into
base: master
312 changes: 312 additions & 0 deletions _drafts/linux_coredump.md
@@ -0,0 +1,312 @@
---
title: Coredumps at Memfault Part 1 - Introduction to Linux Coredumps
description: Post Description (~140 words, used for discoverability and SEO)
author: blake
---

One of the core features of the Memfault Linux SDK is the ability to capture and
analyze crashes. Since the inception of the SDK, we've been slowly expanding our
crash capture and analysis capabilities. Starting from the standard ELF
coredump, we've added support for capturing only the stack memory, and even
capturing just the stack trace with no registers and locals present. This
article series will give you a high-level overview of that journey, and give you
a deeper understanding of how coredumps work on Linux.

<!-- excerpt start -->

In this article we'll start by taking a look at how a Linux coredump is
formatted, how you capture them, and how we use them at Memfault.

<!-- excerpt end -->

{% include newsletter.html %}

{% include toc.html %}

## What is a Linux Coredump

A Linux coredump represents a snapshot of the crashing process' memory. It is
written as an ELF[^elf_format] file. The entirety of the ELF format is outside
the scope of this article, but we will touch on a few of the more important bits
when looking at an ELF core file.

## What triggers a coredump

Coredumps are triggered by certain signals generated by or sent to a program.
The full list of signals can be found in the signal man page[^man_signal]. Here
are the most common signals whose default action is to terminate the process and
dump core:

- SIGABRT: Abnormal termination of the program, such as a call to abort.
- SIGBUS: Bus error (bad memory access).
- SIGFPE: Floating-point exception.
- SIGILL: Illegal instruction.
- SIGQUIT: Quit from keyboard.
- SIGSEGV: Invalid memory reference.
- SIGSYS: Bad system call.
- SIGTRAP: Trace/breakpoint trap.

Of these, the most common culprits you'll see are `SIGSEGV`, `SIGBUS`, and
`SIGABRT`. These signals are generated when a program tries to access memory
that it doesn't have access to, tries to dereference a null pointer, or calls
`abort`. They typically indicate a fairly serious bug in either your program or
the libraries that it uses.

Coredumps are very useful in these situations, as you will generally want to
inspect the state of the process at the time of the crash. From the coredump you
can get a backtrace of the crashing thread, the values of the registers at the
time of the crash, and the values of the local variables at each frame of the
backtrace.
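
To see the signal side of this from a parent process, here is a minimal sketch
that runs a crashing function in a child and reports which signal killed it.
The names `crash_with_abort` and `signal_that_killed` are made up for this
example. Whether a core file is actually produced still depends on the system
configuration, so the sketch only reports what `waitpid` says.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical crash routine: abort() raises SIGABRT in the caller. */
static void crash_with_abort(void) { abort(); }

/*
 * Run fn in a child process. Returns the number of the signal that killed
 * the child, 0 if it exited normally, or -1 if fork/waitpid failed.
 * *dumped_core is set to 1 when the kernel reports that core was dumped.
 */
static int signal_that_killed(void (*fn)(void), int *dumped_core) {
  *dumped_core = 0;

  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    fn();
    _exit(0); /* only reached if fn did not crash */
  }

  int status = 0;
  if (waitpid(pid, &status, 0) < 0) return -1;
  if (!WIFSIGNALED(status)) return 0;

#ifdef WCOREDUMP
  /* WCOREDUMP is a common extension and not available everywhere. */
  *dumped_core = WCOREDUMP(status) ? 1 : 0;
#endif
  return WTERMSIG(status);
}
```

Calling `signal_that_killed(crash_with_abort, &dumped)` should return `SIGABRT`,
with `dumped` telling you whether the kernel wrote a coredump for the child.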

## How are coredumps enabled/collected

Enabling coredumps on your Linux device requires a few configuration options. To
start with, you'll need at least the following options enabled in your kernel
configuration:

```text
CONFIG_COREDUMP=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
```

`CONFIG_COREDUMP` enables the kernel to generate coredumps, and
`CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS` sets the default for which mappings are
present in the coredump so that the ELF headers of mapped files are included.
`man core`[^man_core] provides a good overview of the options available to you
when configuring coredumps.
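
Kernel support is not the only gate. When core files are written to disk, the
crashing process also needs a non-zero core size limit (`RLIMIT_CORE`, which is
what `ulimit -c` controls in a shell). As a minimal sketch, a process can raise
its own soft limit up to the hard limit at startup. The function name
`raise_core_limit` is an assumption made for this example.

```c
#include <sys/resource.h>

/*
 * Raise this process's soft core-size limit to its hard limit.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
static int raise_core_limit(void) {
  struct rlimit lim;
  if (getrlimit(RLIMIT_CORE, &lim) != 0) return -1;
  lim.rlim_cur = lim.rlim_max;
  return setrlimit(RLIMIT_CORE, &lim);
}
```

Resource limits are inherited across `fork` and `exec`, so a service manager or
init script that sets the limit once covers the processes it starts.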

### core_pattern

The kernel provides an interface for controlling where and how coredumps are
written. The `/proc/sys/kernel/core_pattern`[^man_core] file provides two
methods for capturing coredumps from crashed processes. The first is to write
the coredump directly to a file by providing a path template. For example, if we
wanted to write the core file to our `/tmp` directory with both the process name
and the PID in its name, we would write the following to
`/proc/sys/kernel/core_pattern`:

```bash
/tmp/core.%e.%p
```

In this example `%e` expands to the name of the crashing process, and `%p`
expands to the PID of the crashing process. More information on the available
expansions can be found in the `man core`[^man_core] page.
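
To make the expansion concrete, here is a small sketch that mimics what happens
to the template above. `expand_core_pattern` is a hypothetical helper written
for this article, not a kernel or SDK function, and it only understands `%e`,
`%p`, and `%%`. The kernel performs the real expansion itself.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Illustrative only: expand the %e, %p and %% specifiers of a core_pattern
 * template into out. Returns 0 on success, or -1 if out is too small or
 * the template uses a specifier this sketch does not handle.
 */
static int expand_core_pattern(const char *pattern, const char *exe,
                               pid_t pid, char *out, size_t out_len) {
  size_t used = 0;

  for (const char *p = pattern; *p != '\0'; p++) {
    char piece[64];

    if (*p != '%') {
      piece[0] = *p;
      piece[1] = '\0';
    } else {
      p++;
      if (*p == 'e') {
        /* The process name is limited to 15 characters. */
        snprintf(piece, sizeof piece, "%.15s", exe);
      } else if (*p == 'p') {
        snprintf(piece, sizeof piece, "%ld", (long)pid);
      } else if (*p == '%') {
        piece[0] = '%';
        piece[1] = '\0';
      } else {
        return -1; /* unsupported specifier, or a trailing '%' */
      }
    }

    size_t n = strlen(piece);
    if (used + n + 1 > out_len) return -1;
    memcpy(out + used, piece, n);
    used += n;
  }

  if (out_len == 0) return -1;
  out[used] = '\0';
  return 0;
}
```

With the template `/tmp/core.%e.%p`, a process named `boom` with PID 1234 would
end up at `/tmp/core.boom.1234`.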
Comment on lines +90 to +91
Member

Are there any other expansions that you think are useful, or are these basically the only two you use?


We can also pipe a coredump directly to a program. This is useful when we want
to modify the coredump in flight. The coredump is streamed to the provided
program via `stdin`. The configuration is similar to saving directly to a file,
except that the first character must be a `|`. This is how we capture coredumps
in the Memfault SDK, and it will be covered in more depth later in the article.
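
The simplest possible pipe handler just copies its `stdin` to a file. The sketch
below shows the core of such a program. It is not the Memfault handler, and
`copy_stream` is a name chosen for this example.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Copy everything readable from in_fd to out_fd using a fixed-size buffer.
 * Returns the number of bytes copied, or -1 on a read or write error.
 */
static long long copy_stream(int in_fd, int out_fd) {
  char buf[4096];
  long long total = 0;

  for (;;) {
    ssize_t n = read(in_fd, buf, sizeof buf);
    if (n == 0) return total; /* end of the coredump stream */
    if (n < 0) {
      if (errno == EINTR) continue;
      return -1;
    }

    /* write() may accept fewer bytes than requested, so loop until done. */
    for (ssize_t off = 0; off < n;) {
      ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
      if (w < 0) {
        if (errno == EINTR) continue;
        return -1;
      }
      off += w;
    }
    total += n;
  }
}
```

A handler built around this would call `copy_stream(STDIN_FILENO, fd)` with
`fd` opened on the destination file. Because the input is a pipe, the data can
only be consumed once, from start to finish.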

## ELF Core File Layout

Linux coredumps use a subset of the ELF format. The coredump itself is a
snapshot of the crashing process' memory, as well as some metadata to help
debuggers understand the state of the process at the time of the crash. We will
touch on the most important aspects of the core file in this article rather than
doing an exhaustive dive into the ELF format. If you are interested in learning
more, the ELF File Format[^elf_format] specification is a great resource.

![]({% img_url linux-coredump/elf-core-layout.png %})

The above image gives us a very high-level view of the layout of a coredump. The

Member

Since you have a good chunk of info on the header, lets give it its own section:

Suggested change
The above image gives us a very high level view of the layout of a coredump. The
### ELF Header
The above image gives us a very high level view of the layout of a coredump. To start, the

ELF header outlines the layout of the file and source of the file. We can see if
the producing system was 32-bit or 64-bit, little or big endian, and the
architecture of the system. Additionally it shows the offset to the program
headers. Here is the layout of the ELF header[^elf_format]:

```c
typedef struct {
  unsigned char e_ident[EI_NIDENT];
  Elf32_Half    e_type;
  Elf32_Half    e_machine;
  Elf32_Word    e_version;
  Elf32_Addr    e_entry;
  Elf32_Off     e_phoff;
  Elf32_Off     e_shoff;
  Elf32_Word    e_flags;
  Elf32_Half    e_ehsize;
  Elf32_Half    e_phentsize;
  Elf32_Half    e_phnum;
  Elf32_Half    e_shentsize;
  Elf32_Half    e_shnum;
  Elf32_Half    e_shstrndx;
} Elf32_Ehdr;
```

There is a lot going on here, but the fields that are most important to our
discussion are broken down below:

- `e_ident`: This field is an array of bytes that identifies the file as an ELF
  file. It starts with the magic bytes `0x7f 'E' 'L' 'F'`, and also records
  whether the file is 32-bit or 64-bit and little or big endian.
- `e_type`: This field tells us what type of file we are looking at. For our
  purposes this will always be `ET_CORE`.
- `e_machine`: This field tells us the architecture of the system that produced
  the file. Common values here are
  [`EM_ARM`](https://github.com/torvalds/linux/blob/c45323b7560ec87c37c729b703c86ee65f136d75/include/uapi/linux/elf-em.h#L26)
  for 32-bit ARM, and
  [`EM_AARCH64`](https://github.com/torvalds/linux/blob/c45323b7560ec87c37c729b703c86ee65f136d75/include/uapi/linux/elf-em.h#L46)
  for aarch64.
- `e_phoff`: This field tells us the offset to the program headers.
- `e_phentsize`: This field tells us the size of each program header.
- `e_phnum`: This field tells us how many program headers there are.
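
As a rough sketch of how these fields are used, the function below checks that
a buffer starts with an ELF core header and pulls out the values discussed
above. It relies on the `<elf.h>` definitions shipped with glibc, assumes the
file has the same byte order as the host, and uses the hypothetical names
`core_header_info` and `parse_core_header`.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of the header fields discussed above. */
struct core_header_info {
  int is_64bit;
  unsigned machine;         /* e_machine */
  unsigned long long phoff; /* e_phoff */
  unsigned phentsize;       /* e_phentsize */
  unsigned phnum;           /* e_phnum */
};

/*
 * Returns 0 and fills info if buf starts with an ELF core header, else -1.
 * Assumes the file's byte order matches the host's.
 */
static int parse_core_header(const unsigned char *buf, size_t len,
                             struct core_header_info *info) {
  if (len < EI_NIDENT || memcmp(buf, ELFMAG, SELFMAG) != 0) return -1;

  if (buf[EI_CLASS] == ELFCLASS64) {
    Elf64_Ehdr eh;
    if (len < sizeof eh) return -1;
    memcpy(&eh, buf, sizeof eh);
    if (eh.e_type != ET_CORE) return -1;
    info->is_64bit = 1;
    info->machine = eh.e_machine;
    info->phoff = eh.e_phoff;
    info->phentsize = eh.e_phentsize;
    info->phnum = eh.e_phnum;
  } else if (buf[EI_CLASS] == ELFCLASS32) {
    Elf32_Ehdr eh;
    if (len < sizeof eh) return -1;
    memcpy(&eh, buf, sizeof eh);
    if (eh.e_type != ET_CORE) return -1;
    info->is_64bit = 0;
    info->machine = eh.e_machine;
    info->phoff = eh.e_phoff;
    info->phentsize = eh.e_phentsize;
    info->phnum = eh.e_phnum;
  } else {
    return -1; /* unknown ELF class */
  }
  return 0;
}
```

The 32-bit and 64-bit headers have the same fields in the same order. Only the
widths of the address and offset fields differ, which is why the class byte in
`e_ident` has to be checked before anything else is read.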

### Program Headers and Segments

The meat of our coredump exists in the program headers. There are a wide variety
of program header types defined in the ELF File Format[^elf_format]. From the
perspective of the coredump, however, we are primarily interested in the
`PT_NOTE` and `PT_LOAD` program headers.

Program headers have the following layout[^elf_format]:

```c
typedef struct {
  Elf32_Word p_type;
  Elf32_Off  p_offset;
  Elf32_Addr p_vaddr;
  Elf32_Addr p_paddr;
  Elf32_Word p_filesz;
  Elf32_Word p_memsz;
  Elf32_Word p_flags;
  Elf32_Word p_align;
} Elf32_Phdr;
```

Here is a brief breakdown of the fields in the program header:

Member


Since p_flags is omitted:

Suggested change
Here is a brief breakdown of the fields in the program header:
Here is a brief breakdown of the fields we care about in the program header:


- `p_type`: This field tells us what type of segment we are looking at. For our
  purposes this will be either `PT_NOTE` or `PT_LOAD`.
- `p_offset`: This field tells us the offset from the beginning of the file
  where the segment starts.
- `p_vaddr`: This field tells us the virtual address where the segment was
  mapped in the crashing process.
- `p_paddr`: This field tells us the physical address where the segment is
  loaded. It is not meaningful for a userspace coredump.
- `p_filesz`: This field tells us the size of the segment in the file.
- `p_memsz`: This field tells us the size of the segment in memory. It can be
  larger than `p_filesz` when part of a mapping was not written to the coredump.
- `p_align`: This field tells us the alignment of the segment.
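
Putting the ELF header and program header fields together, walking the program
header table is a loop over `e_phnum` entries of `e_phentsize` bytes starting
at `e_phoff`. The sketch below counts segments of a given type in a 64-bit core
image held in memory. `count_segments` is a hypothetical helper, and it assumes
the file's byte order matches the host's.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the program headers of the given p_type in a 64-bit ELF image.
 * Returns -1 if the header or the program header table does not fit in buf.
 */
static int count_segments(const unsigned char *buf, size_t len,
                          Elf64_Word type) {
  Elf64_Ehdr eh;
  if (len < sizeof eh) return -1;
  memcpy(&eh, buf, sizeof eh);
  if (eh.e_phnum > 0 && eh.e_phentsize < sizeof(Elf64_Phdr)) return -1;

  int count = 0;
  for (size_t i = 0; i < eh.e_phnum; i++) {
    /* Entry i lives at e_phoff + i * e_phentsize. */
    size_t off = (size_t)eh.e_phoff + i * eh.e_phentsize;
    Elf64_Phdr ph;
    if (off > len || len - off < sizeof ph) return -1;
    memcpy(&ph, buf + off, sizeof ph);
    if (ph.p_type == type) count++;
  }
  return count;
}
```

Calling it once with `PT_NOTE` and once with `PT_LOAD` gives a quick picture of
a core file: usually a single note segment followed by one load segment per
memory mapping.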

We'll start by taking a look at the format of the `PT_NOTE` segments. Below is
the layout of a `PT_NOTE` segment.

![]({% img_url linux-coredump/elf-note-layout.png %})

The first two fields of each note are fairly self-explanatory: they give the
sizes of the name and the descriptor. The `name` field is a string that
identifies who defined the note. The `desc` field is a structure that contains
the actual data of the note. The `type` field tells us what type of note we are

Member


Suggested change
actual data of the note. The type field tells us what type of note we are
actual data of the note. The `type` field tells us what type of note we are

looking at. It is an unsigned integer whose meaning depends on the `name` field:
the `name` field works as a kind of namespace for the `type` field, so two notes
with the same `type` value can be differentiated by their `name` field.
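
A sketch of walking the notes inside a `PT_NOTE` segment can make the layout
and the namespace behavior more concrete. It uses `Elf64_Nhdr` from glibc's
`<elf.h>` and assumes the usual 4-byte padding after the name and descriptor.
The helper names `note_align` and `count_notes` are invented for this example.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Round n up to the 4-byte boundary used to pad the name and descriptor. */
static size_t note_align(size_t n) { return (n + 3u) & ~(size_t)3u; }

/*
 * Count the notes in a PT_NOTE segment whose name and type both match.
 * Returns -1 if a note runs past the end of the segment.
 */
static int count_notes(const unsigned char *seg, size_t len, const char *name,
                       Elf64_Word type) {
  size_t off = 0;
  int count = 0;

  while (off < len) {
    Elf64_Nhdr nh; /* namesz, descsz, type: three 32-bit words */
    if (len - off < sizeof nh) return -1;
    memcpy(&nh, seg + off, sizeof nh);
    off += sizeof nh;

    size_t name_sz = note_align(nh.n_namesz);
    size_t desc_sz = note_align(nh.n_descsz);
    if (len - off < name_sz || len - off - name_sz < desc_sz) return -1;

    /* n_namesz includes the terminating NUL of the name string. */
    if (nh.n_type == type && nh.n_namesz == strlen(name) + 1 &&
        memcmp(seg + off, name, nh.n_namesz) == 0) {
      count++;
    }
    off += name_sz + desc_sz; /* skip to the next note */
  }
  return count;
}
```

Matching on both `name` and `type` is the important part: a lookup that only
compares `type` can pick up a note from a different namespace that happens to
use the same number.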

The `PT_LOAD` segment is a bit more straightforward. Each one represents a
region of memory that was mapped into the process at the time of the crash.
These can represent the stack, the heap, or any other region of memory that was
mapped into the process.

## `procfs` Shallow Dive

Member


I'm thinking this section would fit better after core_pattern, or within core_pattern? Feels slightly out of place here


An additional benefit to the `core_pattern` pipe interface is that until the
program that is being piped to exits, we have access to the `procfs` of the
crashing process. But what is `procfs`, and how does it help us with a coredump?

`procfs` gives us direct, usually read-only, access to some of the kernel's data
structures[^man_proc]. This can be system-wide information, or information about
individual processes. For our purposes we are mostly interested in the
information about the process that is currently crashing. We can read any mapped
memory by its address through `/proc/<pid>/mem`[^man_proc_pid_mem], or look at
the command line arguments of the process through
`/proc/<pid>/cmdline`[^man_proc_pid_cmdline].
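
As a small illustration of the `procfs` side, the sketch below reads the raw
contents of `/proc/<pid>/cmdline`, where the arguments are stored back to back
and separated by NUL bytes. `read_cmdline` is a name made up for this example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to len bytes of /proc/<pid>/cmdline into buf. The arguments are
 * separated by NUL bytes. Returns the number of bytes read, or -1 on error.
 */
static ssize_t read_cmdline(pid_t pid, char *buf, size_t len) {
  char path[64];
  snprintf(path, sizeof path, "/proc/%ld/cmdline", (long)pid);

  int fd = open(path, O_RDONLY);
  if (fd < 0) return -1;

  size_t total = 0;
  while (total < len) {
    ssize_t n = read(fd, buf + total, len - total);
    if (n < 0) {
      close(fd);
      return -1;
    }
    if (n == 0) break; /* end of file */
    total += (size_t)n;
  }
  close(fd);
  return (ssize_t)total;
}
```

Inside a pipe handler, the PID to pass would be the one handed over by the
kernel through the `core_pattern` arguments.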

## Coredumps at Memfault: Rev. 1

Our first crack at coredumps at Memfault had one goal: leveraging existing tools
to capture info about a crashing process. To have feature parity with our
existing offerings we needed a few basic things:

- A symbolicated backtrace for each running thread in the crashing process
- The values of registers at the time of crash
- Symbolicated local variables at each frame

Based on what we've learned about Linux core files so far, they are an obvious
fit for these requirements. We can use an established system to route
information about crashed processes, add metadata that gives us information
about the device in question, and do all of this without making any source
modifications to anything running on the system. For this reason, our first pass
at coredumps leaves them largely untouched from what the kernel provides. The
only addition is a note that contains the metadata we use to identify devices
and the version of software they're running. This takes advantage of the fact
that the `PT_NOTE` segment is a free-form segment that can be used to add any
metadata we want to the coredump.

Handling the coredump in our own program also allows us to gather additional
information about the process that crashed, and to stream memory more easily so
that we avoid unnecessary allocations or memory usage.

Now that we've covered all the background information we can start to dive into
the innards of the `memfault-core-handler`. First we use the pipe operation that
was outlined earlier.
[Here](https://github.com/memfault/memfault-linux-sdk/blob/49adfe0ce0cb6082360012b0f0092a31e8030048/meta-memfault/recipes-memfault/memfaultd/files/memfaultd/src/coredump/mod.rs#L14)
is the pattern we write to `/proc/sys/kernel/core_pattern` to pipe the coredump
to our handler:

```bash
|/usr/sbin/memfault-core-handler -c /path/to/config %P %e %I %s
```

This tells the kernel to pipe the coredump to our handler, and provides the
handler with the PID of the crashing process (`%P`), the name of the crashing
process (`%e`), the TID of the thread that triggered the coredump (`%I`), and
the number of the signal that caused the crash (`%s`).
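
On the receiving end, those four expansions arrive as ordinary command line
arguments. The sketch below shows one way a handler could validate them. It is
not the parsing code used by `memfault-core-handler`. The `crash_info` struct
and both function names are assumptions for illustration, and `%I` is treated
as a thread ID as documented in `man core`[^man_core].

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical container for the four values passed by the pattern above. */
struct crash_info {
  pid_t pid;         /* %P: PID of the crashing process */
  const char *comm;  /* %e: name of the crashing process */
  pid_t tid;         /* %I: TID of the thread that triggered the dump */
  int signal_number; /* %s: number of the signal that caused the dump */
};

/* Parse a whole string as a base-10 integer. Returns 0 on success. */
static int parse_long(const char *s, long *out) {
  char *end = NULL;
  errno = 0;
  long value = strtol(s, &end, 10);
  if (errno != 0 || end == s || *end != '\0') return -1;
  *out = value;
  return 0;
}

/*
 * Fill info from the four trailing arguments, in the order %P %e %I %s.
 * Returns 0 on success, -1 if the count is wrong or a number is malformed.
 */
static int parse_crash_args(int count, char *const args[],
                            struct crash_info *info) {
  long pid, tid, sig;
  if (count != 4) return -1;
  if (parse_long(args[0], &pid) != 0 || parse_long(args[2], &tid) != 0 ||
      parse_long(args[3], &sig) != 0) {
    return -1;
  }
  info->pid = (pid_t)pid;
  info->comm = args[1];
  info->tid = (pid_t)tid;
  info->signal_number = (int)sig;
  return 0;
}
```

A real handler would first consume its own options (such as `-c` and the config
path) and then pass the remaining arguments to a function like this.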

When a crash occurs the kernel will write the coredump to the `stdin` of the
handler. The handler will then read all the program headers into memory. This
sets us up to do two things. First we'll read all of the `PT_NOTE` segments and
save them in memory. For the first iteration of the handler, we won't do
anything further with them until we write them to a file. They'll become more
important in later articles as we get into more of the special sauce of the
handler.

The next thing the handler does is read the memory range described by each
`PT_LOAD` segment in the coredump. Instead of storing this in memory, we stream
it directly to the output file from `/proc/<pid>/mem`. This reduces the memory
footprint of the handler, and avoids any situation where we would need to seek
backwards in the stream: `stdin` is a one-way stream, so we can't seek backwards
in it.
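
The streaming step can be sketched as a chunked copy from `/proc/<pid>/mem` to
the output, so that memory use stays bounded no matter how large the segment
is. This is a simplified illustration rather than the SDK's implementation
(which is written in Rust). `open_process_memory` and `stream_memory_range` are
hypothetical names, and the 4 KiB chunk size is an arbitrary choice.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Open /proc/<pid>/mem for reading. Returns a file descriptor or -1. */
static int open_process_memory(pid_t pid) {
  char path[64];
  snprintf(path, sizeof path, "/proc/%ld/mem", (long)pid);
  return open(path, O_RDONLY);
}

/*
 * Copy len bytes starting at virtual address start from mem_fd to out_fd,
 * one fixed-size chunk at a time. Returns 0 on success, -1 on error.
 */
static int stream_memory_range(int mem_fd, uintptr_t start, size_t len,
                               int out_fd) {
  unsigned char chunk[4096];
  size_t done = 0;

  while (done < len) {
    size_t want = len - done < sizeof chunk ? len - done : sizeof chunk;
    /* In /proc/<pid>/mem the file offset is the virtual address. */
    ssize_t n = pread(mem_fd, chunk, want, (off_t)(start + done));
    if (n < 0 && errno == EINTR) continue;
    if (n <= 0) return -1; /* unreadable or unmapped memory */

    for (ssize_t off = 0; off < n;) {
      ssize_t w = write(out_fd, chunk + off, (size_t)(n - off));
      if (w < 0 && errno == EINTR) continue;
      if (w < 0) return -1;
      off += w;
    }
    done += (size_t)n;
  }
  return 0;
}
```

For each `PT_LOAD` program header, a handler along these lines would call
`stream_memory_range(mem_fd, p_vaddr, p_filesz, out_fd)`, never holding more
than one chunk in memory at a time.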

Now you're probably wondering: why did we go through all of this trouble to end
up with a file that's largely the same as what the kernel would have produced?
Well, for one, it allows us to add metadata to the coredump, but it also sets
the stage for the more advanced coredump handling that we'll cover in the next
article.
Comment on lines +275 to +279
Member


This is really great, I like the anticipation of reader questions a lot


## Conclusion

We've covered the basics of coredumps on Linux, and how they're used in the
Memfault SDK. You should now have a pretty good idea of how things look under
the hood. While the baseline coredumps are useful, and a known quantity, there
are a few things that aren't great about them. The biggest issue is that they
can be quite large for processes that have many threads or allocate a large
amount of memory. This can be a real problem for embedded devices that may not
have a lot of room to store large files. In the next article we'll take a look
at the steps we've taken to reduce the size of coredumps.

<!-- Interrupt Keep START -->

{% include newsletter.html %}

{% include submit-pr.html %}

<!-- Interrupt Keep END -->

{:.no_toc}

## References

<!-- prettier-ignore-start -->
[^elf_format]: [ELF File Format](https://refspecs.linuxfoundation.org/elf/elf.pdf)
[^man_core]: [`man core`](https://man7.org/linux/man-pages/man5/core.5.html)
[^man_proc]: [`man proc`](https://man7.org/linux/man-pages/man5/procfs.5.html)
[^man_proc_pid_mem]: [`man proc_pid_mem`](https://man7.org/linux/man-pages/man5/proc_pid_mem.5.html)
[^man_proc_pid_cmdline]: [`man proc_pid_cmdline`](https://man7.org/linux/man-pages/man5/proc_pid_cmdline.5.html)
[^man_ulimit]: [`man ulimit`](https://man7.org/linux/man-pages/man3/ulimit.3.html)
[^man_signal]: [`man signal`](https://www.man7.org/linux/man-pages/man7/signal.7.html)
<!-- prettier-ignore-end -->
14 changes: 14 additions & 0 deletions example/linux-coredump/boom.c
@@ -0,0 +1,14 @@
//! Compile. Run. Boom.

void make_the_boom(void)
{
  int *p = 0;
  *p = 0;
}

int main(void)
{
  make_the_boom();

  return 0;
}
Binary file added img/linux-coredump/elf-core-layout.png
Binary file added img/linux-coredump/elf-note-layout.png