Skip to content

Commit

Permalink
Update DwarFS format documentation
Browse files Browse the repository at this point in the history
  • Loading branch information
mhx committed May 24, 2023
1 parent 50ef8bc commit e58c807
Showing 1 changed file with 36 additions and 3 deletions.
39 changes: 36 additions & 3 deletions doc/dwarfs-format.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,9 @@ This document describes the DwarFS file system format, version 2.3.

## FILE STRUCTURE

A DwarFS file system image is just a sequence of blocks. Each block has the
following format:
A DwarFS file system image is just a sequence of blocks, optionally
prefixed by a "header", which is typically some sort of shell script.
Each block has the following format:

┌───┬───┬───┬───┬───┬───┬───┬───┐
0x00 │'D'│'W'│'A'│'R'│'F'│'S'│MAJ│MIN│ MAJ=0x02, MIN=0x03 for v2.3
Expand Down Expand Up @@ -63,9 +64,24 @@ A couple of notes:
larger than the one it supports. However, a new program will still
read all file systems with a smaller minor version number.

### Header Detection

In order to access the file system data when it is prefixed by a header,
the size of the header must be known. It can either be given to the
tools or the FUSE driver explicitly (using e.g. the `--image-offset` or
`-o offset` options), or it can be determined automatically (by passing
`auto` as the argument to the aforementioned options).

Automatic detection works by scanning the file for the section header
magic (`DWARFS`) and validating the match by looking up the second
section header using the length of the first section and also checking
its magic. It is rather unlikely that a file is created accidentally
that would pass this check, although one could be crafted manually
without any problems.

### Section Types

There are currently 3 different section types.
There are currently 4 different section types.

- `BLOCK` (0):
A block of data. This is where all file data is stored. There can be
Expand All @@ -81,6 +97,23 @@ There are currently 3 different section types.
each list and structure depends on the actual data and is stored
separately in `METADATA_V2_SCHEMA`.

- `SECTION_INDEX` (9):
The section index is, well, an index of all sections in the file
system. If present (creation of the index can be suppressed with
`--no-section-index`), this is *required* to be the last section.
Each entry in the section index is a 64-bit value with the upper
16 bits being the section type and the lower 48 bits being the
offset relative to the first section. That is, the section index
is independent of whether or not a header is present before the
first section. The whole point of the section index is to avoid
having to build an index by visiting all section headers.
In order to find the start of the section index, you only have
to read the last 64-bit value from the file, check if the upper
16 bits match the `SECTION_INDEX`, then add the image offset
(header size) to the lower 48 bits. At that position in the
file, you should find a valid section header for the section
index.

## METADATA FORMAT

Here is a high-level overview of how all the bits and pieces relate
Expand Down

0 comments on commit e58c807

Please sign in to comment.