Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test btrfs-sb-mod #825

Open
wants to merge 35 commits into
base: devel
Choose a base branch
from
Open

Test btrfs-sb-mod #825

wants to merge 35 commits into from

Conversation

kdave
Copy link
Owner

@kdave kdave commented Jun 19, 2024

Fails an assertion "btrfs-sb-mod image.test sectorsize +4096"

adam900710 and others added 30 commits June 12, 2024 20:16
[BUG]
There is a long existing failure in my local VM that with any newer
kernel (6.x) the test case misc/004 would fail with ENOSPC during
balance:

    [TEST]   misc-tests.sh
    [TEST/misc]   004-shrink-fs
 failed: /home/adam/btrfs-progs/btrfs balance start -mconvert=single -sconvert=single -f /home/adam/btrfs-progs/tests/mnt
 test failed for case 004-shrink-fs
 make: *** [Makefile:547: test-misc] Error 1

[CAUSE]
With more testing, it turns out that just before the balance, the
filesystem still have several empty data block groups.

The reason is the new default discard=async behavior, as it also changes
the empty block groups to be async, this leave the empty block groups
there, resulting no extra space for the convert balance.

[FIX]
I do not understand why for loop block devices we also enable
discard, but at least disable discard for the test case so that we can
ensure the empty block groups get cleaned up properly.

Pull-request: #809
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
[BUG]
There is a bug report that a canceled checksum conversion (still
experimental feature) resulted in unexpected super flags:

csum_type		0 (crc32c)
csum_size		4
csum			0x14973811 [match]
bytenr			65536
flags			0x1000000001
			( WRITTEN |
			  CHANGING_FSID_V2 )
magic			_BHRfS_M [match]

While for a filesystem under checksum conversion it should have either
CHANGING_DATA_CSUM or CHANGING_META_CSUM.

[CAUSE]
It turns out that, due to btrfs-progs keeps its own extra flags inside
its own ctree.h headers, not the shared uapi headers, we have
conflicting super flags:

kernel-shared/uapi/btrfs_tree.h:#define BTRFS_SUPER_FLAG_METADUMP_V2	(1ULL << 34)
kernel-shared/uapi/btrfs_tree.h:#define BTRFS_SUPER_FLAG_CHANGING_FSID	(1ULL << 35)
kernel-shared/uapi/btrfs_tree.h:#define BTRFS_SUPER_FLAG_CHANGING_FSID_V2 (1ULL << 36)
kernel-shared/ctree.h:#define BTRFS_SUPER_FLAG_CHANGING_DATA_CSUM	(1ULL << 36)
kernel-shared/ctree.h:#define BTRFS_SUPER_FLAG_CHANGING_META_CSUM	(1ULL << 37)

Note that CHANGING_FSID_V2 is conflicting with CHANGING_DATA_CSUM.

[FIX]
Cross port the proper updated uapi headers into btrfs-progs, and remove
the definition from ctree.h.

This would change the value for CHANGING_DATA_CSUM and
CHANGING_META_CSUM, but considering they are experimental features, and
kernel would reject them anyway, the damage is not that huge and we can
accept such change before exposing it to end users.

Pull-request: #810
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Although we already have a pretty good array defined for all
super/compat_ro/incompat flags, we still rely on a manually defined mask
to do the printing.

This can lead to easy de-sync between the definition and the flags.

Change it to automatically iterate through the array to calculate the
flags, and add the remaining super flags.

Pull-request: #810
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
There's an update to CI hosted runners,
https://github.com/actions/runner-images/blob/main/images/ubuntu/Ubuntu2404-Readme.md

- kernel 6.8
- e2fsprogs 1.47
- gcc 13.2
- clang 18.1.3

Switch the workflow files to use it as ubuntu-latest still points to the
22.04 version. The updated versions let us avoid workarounds due to old
version if e2fsprogs.

The musl 32bit build seems to fail so pin the version to the last one
where it's known to work.

Signed-off-by: David Sterba <[email protected]>
The support of Leap 15.4 has ended, not much point to test build there
so move it to 15.6 (upper bound). The lower bound is still 15.3 to catch
potential backward compatibility problems.

The hub image exists (https://hub.docker.com/r/kdave/ci-opensuse-leap-15.6-x86_64).

Signed-off-by: David Sterba <[email protected]>
Fix the following issues in the test suite:

- lack of quoting for variables
- declare function variables local when missing (prevent accidental
  overwrite of global variables)
- for variables with underscore in the name use plain "$VAR_NAME"
  instead of { } (unless necessary)
- minor style adjustments like moving quotes to the end of the same
  string

Signed-off-by: David Sterba <[email protected]>
All the three commands need a file, update the completion rules.

Signed-off-by: David Sterba <[email protected]>
There's no Centos 9, use one of it's replacemnts for to continue the
familiy of LTS distros. No exceptions for feature support (zoned, udev).

Signed-off-by: David Sterba <[email protected]>
The test check/037-freespacetree-repair sometimes fails (does not in the
CI though) that the last unmount does not work due to -EBUSY. This
depends on timing, but what is obviously wrong is the 'rm' before
umount. The quoting of "*" does not remove all the files before
fallocate but is otherwise silent due to '-f'.

Add the run_check so it's recorded in the logs taht it happens and alos
add sudo so it can actually work. This makes the test work more
reliably.

Signed-off-by: David Sterba <[email protected]>
…tale

[BUG]
There is a random chance that misc/055 would fail with stale qgroups
detected.

[CAUSE]
Commit 82f7d6c ("btrfs-progs: qgroup: handle stale qgroup deletion
more accurately") changed the behavior of "btrfs qgroup clear-stale"
that it will no longer try to remove qgroup under deletion.

And the test case itself relies on the return value of "btrfs qgroup
clear-stale" to do the extra wait for subvolume deletion.

This means after that commit, the test case would skip the wait if there
is a subvolume waiting for cleanup, resulting in a race window where the
subvolume can be dropped and become stale, and eventually trigger the
stale qgroup detection and cause false alerts.

[FIX]
Fix the test case by always wait for the subvolume to be dropped, so
that later "btrfs qgroup clear-stale" can properly remove all stale
qgroups.

Pull-request: #814
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
[BUG]
There is a random failure for misc/028 in github CI, but unable to
reproduce locally.

[EXTRA DUMP]
The failure is at mount time, which is mostly out of our control (kernel
is provided by the CI image), but at least we can do extra kernel dump
if a mount fails.

This dump is done inside run_check() where if we detects there is a
"mount" word inside the command string.

The detection is not the best, it may detect other commands that
contains the word "mount", but it should be good enough to detect real
mount failure.

Pull-request: #815
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
The raid-stripe-tree (RST) feature is for zoned devices to support extra
data profiles, and is not yet a stable feature (still requires
CONFIG_BTRFS_DEBUG enabled kernel to support it).

Furthermore the supported filesystems (ext*, reiserfs and ntfs) don't
even support zoned devices, and even we force RST support for
btrfs-convert, we would only create an empty tree for RST, as btrfs
convert would only result SINGLE data profile with SINGLE/DUP metadata
profile, neither needs RST at all.

Enabling RST for btrfs-convert would only cause problems for false test
failures as we incorrectly allow RST feature for btrfs-convert.

Fixes the problem by removing raid-stripe-tree support from
btrfs-convert and remove the test cases support for RST.

This patch is mostly reverting commit 346a381 ("btrfs-progs:
convert: add raid-stripe-tree to allowed features"), but keeps the test
infrastructure to support bgt features for convert.

Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
…uilds

Currently we unconditionally run mkfs/029 and mkfs/030 as we export
RST feature through mkfs.btrfs as supported features.

But considering the mkfs.btrfs features don't match kernel support (only
a BTRFS_DEBUG kernel can support that), we're going to hide RST behind
experimental builds.

In that case RST related tests cases also need to be behind experimental
builds as regular builds of mkfs.btrfs will no longer support RST
feature.

Introduce two helpers:

- _test_config()
  To verify if certain config is set to 1

- check_experimental_build()
  A wrapper of "_test_config EXPERIMENTAL", and skip the test if
  btrfs-progs has no experimental features.

So that test cases can be skipped for non-experimental builds.

Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Currently raid-stripe-tree feature is still experimental as it requires
a BTRFS_DEBUG kernel to recognize it.  To avoid confusion move it back
to experimental so regular users won't incorrectly set it.

And since RST is no longer supported by default, also change the RST
profile detection so that for non-experimental build we won't enable RST
according to the data profiles.

Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
This test case will check the following behviour:

- Regular zoned supported data/metadata profiles
  Should success

- RST with zoned feature
  Should fail for non-experimental builds

- zoned with data profiles that only RST supports
  Should fail for non-experimental builds.

  Meanwhile for experimental builds it should success and enable RST
  feature automatically (verified in mkfs/030)

Pull-request: #813
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
The tags can't find structures defined in kernel-shared/uapi/*.h due to
missing path.

Signed-off-by: David Sterba <[email protected]>
The label is of a fixed size 256 bytes and expects the zero terminator.
Using __strncpy_null is correct as it makes sure there's always the zero
termination but the argument passed in skips the last character.

Signed-off-by: David Sterba <[email protected]>
The macro strncpy_null uses sizeof the first argument for the length,
but there are no checks and this works only for buffers with static
length, i.e. not pointers. This is error prone.  Use the open coded
variant that makes the sizeof visible.

Signed-off-by: David Sterba <[email protected]>
Now that there's only __strncpy_null we can drop the underscore and move
it to string-utils as it's a generic string function rather than
something for paths.

Signed-off-by: David Sterba <[email protected]>
Use the safe version of strncpy that makes sure the string is
terminated.

To be noted:

- the conversion in scrub path handling was skipped
- sizes of device paths in some ioctl related structures is
  BTRFS_DEVICE_PATH_NAME_MAX + 1

Recently gcc 13.3 started to detect problems with our use of strncpy
potentially lacking the null terminator, warnings like:

cmds/inspect.c: In function ‘cmd_inspect_logical_resolve’:
cmds/inspect.c:294:33: warning: ‘__builtin_strncpy’ specified bound 4096 equals destination size [-Wstringop-truncation]
  294 |                                 strncpy(mount_path, mounted, PATH_MAX);
      |                                 ^

Signed-off-by: David Sterba <[email protected]>
The xattr names are user strings but still can potentially contain
special characters (as reported). There doesn't seem to be a restriction
on the name defined.

The xattr values care length-encoded byte arrays so escaping needs be
done.

The clone source is a path and by mistake lacked the encoding.

Issue: #818
Signed-off-by: David Sterba <[email protected]>
Add cases for clone source path and xattr name and value escapin in
receive dump.

Signed-off-by: David Sterba <[email protected]>
The separator of key=value is only one or more space character, the
'encoded_write' also uses ',' which is inconsistent with the rest.

Signed-off-by: David Sterba <[email protected]>
There's more information available in sysfs
(/sys/fs/btrfs/FSID/allocation) that we can print in 'fi df'. This is
still meant for debugging or deeper analysis of the filesystem, the
values need to be correctly interpreted with respect to the profiles,
persistence and other conditonal features.

The extended output is not printed by default and for now is behind the
verbosity options:

$ btrfs -vv fi df /mnt
Data, single: total=47.06GiB, used=25.32GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=1.44GiB, used=961.20MiB
GlobalReserve, single: total=125.62MiB, used=0.00B
Data:
  bg_reclaim_threshold                   0%
  bytes_may_use                     8.00KiB
  bytes_pinned                        0.00B
  bytes_readonly                   64.00KiB
  bytes_reserved                      0.00B
  bytes_used                       25.32GiB
  bytes_zone_unusable                 0.00B
  chunk_size                       10.00GiB
  disk_total                       47.06GiB
  disk_used                        25.32GiB
  total_bytes                      47.06GiB
Metadata:
  bg_reclaim_threshold                   0%
  bytes_may_use                   126.62MiB
  bytes_pinned                        0.00B
  bytes_readonly                      0.00B
  bytes_reserved                      0.00B
  bytes_used                      961.20MiB
  bytes_zone_unusable                 0.00B
  chunk_size                      256.00MiB
  disk_total                        2.88GiB
  disk_used                         1.88GiB
  total_bytes                       1.44GiB
System:
  bg_reclaim_threshold                   0%
  bytes_may_use                       0.00B
  bytes_pinned                        0.00B
  bytes_readonly                      0.00B
  bytes_reserved                      0.00B
  bytes_used                       16.00KiB
  bytes_zone_unusable                 0.00B
  chunk_size                       32.00MiB
  disk_total                       64.00MiB
  disk_used                        32.00KiB
  total_bytes                      32.00MiB

Signed-off-by: David Sterba <[email protected]>
- variable quoting
- cd error handling
- `` to $()
- command output instead of command (008-subvolume-get-set-default)

Signed-off-by: David Sterba <[email protected]>
Add parsing of user defined sorting specification and some helpers.

Example usage:

  sortdef = "key1,key2,key3"
  do {
    id = compare_parse_key_to_id(&comp, sortdef)
    if (id < 0) return error;
    compare_add_sort_id(&comp, id)
  } while(id >= 0);

Signed-off-by: David Sterba <[email protected]>
Enhance the sorting capabilities of 'inspect list-chunks' to allow
multiple keys. Drop the gaps, this works only for pstart and it's hard
to make it work with arbitrary sort keys.

Usage is printed by default, assuming this is an interesting info and
even if it slows down the output (due to extra lookups) it's more
convenient to print it rather than not.

The options related to usage and empty were removed.

Output changes:

- rename Number to PNumber, meaning physical number on the device
- print Devid, device number, can be also sort key

Examples:

btrfs inspect list-chunks /mnt
btrfs inspect list-chunks --sort length,usage
btrfs inspect list-chunks --sort lstart

Depending on the sort key order, the output can be wild, for that the
PNumber and LNumber give some hint where the chunks lie in their space.

Example output:

$ sudo ./btrfs inspect list-chunks --sort length,usage /
Devid PNumber      Type/profile    PStart    Length      PEnd LNumber    LStart Usage%
----- ------- ----------------- --------- --------- --------- ------- --------- ------
    1       7       Data/single   1.52GiB  16.00MiB   1.54GiB      69 191.68GiB  86.04
    1       3     System/DUP    117.00MiB  32.00MiB 149.00MiB      40 140.17GiB   0.05
    1       2     System/DUP     85.00MiB  32.00MiB 117.00MiB      39 140.17GiB   0.05
    1      15       Data/single   8.04GiB  64.00MiB   8.10GiB      61 188.60GiB  94.46
    1       1       Data/single   1.00MiB  84.00MiB  85.00MiB      68 191.60GiB  74.24
    1       5   Metadata/DUP    341.00MiB 192.00MiB 533.00MiB      60 188.41GiB  82.58
    1       4   Metadata/DUP    149.00MiB 192.00MiB 341.00MiB      59 188.41GiB  82.58
    1      20   Metadata/DUP      9.29GiB 256.00MiB   9.54GiB      38 139.92GiB  57.76
    1      19   Metadata/DUP      9.04GiB 256.00MiB   9.29GiB      37 139.92GiB  57.76
    1      22   Metadata/DUP      9.79GiB 256.00MiB  10.04GiB      25 113.15GiB  57.93
    1      21   Metadata/DUP      9.54GiB 256.00MiB   9.79GiB      24 113.15GiB  57.93
    1      46   Metadata/DUP     29.29GiB 256.00MiB  29.54GiB      43 142.71GiB  62.38

Signed-off-by: David Sterba <[email protected]>
[BUG]
'btrfstune --csum' would always fail for a newly created btrfs:

  # truncate -s 1G test.img
  # ./mkfs.btrfs -f test.img
  # ./btrsftune --csum xxhash test.img
  WARNING: Experimental build with unstable or unfinished features
  WARNING: Switching checksums is experimental, do not use for valuable data!

  Proceed to switch checksums
  ERROR: failed to start transaction: Unknown error -28
  ERROR: failed to start transaction: No space left on device
  ERROR: failed to generate new data csums: No space left on device
  ERROR: btrfstune failed

[CAUSE]
After commit e79f18a ("btrfs-progs: introduce a basic metadata free
space reservation check"), btrfs_start_transaction() would check the
metadata space.

But at the time of introduction of csum conversion, the parameter for
btrfs_start_transaction() was incorrect.

The 2nd parameter is the *number* of items to be added (if we're
deleting items, just pass 1).
However commit 08a3bd7 ("btrfs-progs: tune: add the ability to
generate new data checksums") is using the item size, not the number of
items to be added.

This means we're passing a number 8 * nodesize times larger than the
original size, no wonder we would error out with -ENOSPC.

[FIX]
Use proper calcuation to convert the new csum item size to number of
leaves needed, and double it just in case.

Pull-request: #820
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
adam900710 and others added 5 commits June 19, 2024 03:31
Test the following:

- Create a btrfs with crc32c csum
- Populate the filesystem
- Convert the filesystem to the following csums:
  * xxhash
  * blake2
  * sha256
  * crc32c

Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Free memory after errors in sum(), this is reported by gcc 13 on the CI.

This was reproduced by:

$ make D=asan TEST=019-receive-clones-on-mounted-subvol test-misc

Signed-off-by: David Sterba <[email protected]>
There's a report on the CI after base ubuntu image update:

geninfo: WARNING:
/home/runner/work/btrfs-progs/btrfs-progs/common/device-scan.c:429:
unexecuted block on non-branch line with non-zero hit count.  Use
"geninfo --rc geninfo_unexecuted_blocks=1 to set count to zero.
	(use "geninfo --ignore-errors gcov,gcov ..." to suppress this warning)

Signed-off-by: David Sterba <[email protected]>
The CI lcov generation fails due to:

Processing ./common/device-utils.gcda
geninfo: ERROR: Unexpected negative count '-6' for /home/runner/work/btrfs-progs/btrfs-progs/common/device-utils.h:69.
	Perhaps you need to compile with '-fprofile-update=atomic
	(use "geninfo --ignore-errors negative ..." to bypass this error)

Use a safer way to gather the profile stats.

Signed-off-by: David Sterba <[email protected]>
Use btrfs-sb-mod and do some arbitrary changes to super blocks. Then run
'btrfs check'. Check will fail to read the filesystem most of the time
and cannot do any actual repair but must not crash.

Signed-off-by: David Sterba <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants