Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structure layout disagreeing with C compilers #63

Open
diversenok opened this issue Dec 15, 2023 · 2 comments
Open

Structure layout disagreeing with C compilers #63

diversenok opened this issue Dec 15, 2023 · 2 comments

Comments

@diversenok
Copy link

diversenok commented Dec 15, 2023

I noticed that cstruct mixes field packing rules, which makes it compute wrong offsets/sizes and disagree with C compilers about the layout of some structures.

Here is an example C program to use as a staring point:

#include <windows.h>
#include <stdio.h>

typedef struct _TEST {
    __int32 Field32;
    __int64 Field64;
} TEST;

int wmain(int argc, wchar_t\* argv[]) {
    wprintf_s(L"End/size of Field32 = %zd\r\n", RTL_SIZEOF_THROUGH_FIELD(TEST, Field32));
    wprintf_s(L"Start of Field64 = %d\r\n", FIELD_OFFSET(TEST, Field64));
    wprintf_s(L"Total size of TEST = %zd\r\n", sizeof(TEST));
    return 0;
}

It gives the following output:

End/size of Field32 = 4
Start of Field64 = 8
Total size of TEST = 16

As you can see, the compiler inserted a 4-byte padding before Field64 due to its natural alignment of 8 and adjusted the total size to 16 bytes,

Now here is a program that parses the same definition using the default settings in cstruct:

from dissect import cstruct

cparser = cstruct.cstruct()
cparser.load("""
struct TEST {
    __int32 Field32;
    __int64 Field64;
}
""")

print('End/size of', cparser.TEST.fields[0].name, '=', cparser.TEST.fields[0].type.size)
print('Start of', cparser.TEST.fields[1].name, '=', cparser.TEST.fields[1].offset)
print('Total size', cparser.TEST.size, 'with alignment', cparser.TEST.alignment)

It gives a different result:

End/size of Field32 = 4
Start of Field64 = 4
Total size 12 with alignment 8

As you can see, cstruct did not insert a 4-byte padding before Field64.
Loading the same definition with align=True yields the correct result:

End/size of Field32 = 4
Start of Field64 = 8
Total size 16 with alignment 8

If I understand it right, align=False is supposed to act similar to wrapping the definition in #include <pshpack1.h>/#include <poppack.h> and should trigger compact packing of fields that ignores their natural alignment. This is not, however, the default behavior for C. I think you should consider changing the defaults to use align=True to make sure that cstruct's defaults agree with C defaults.

Additionally, it might be useful to add support for different packing overrides, i.e., #include <pshpack1.h>, #include <pshpack2.h>, #include <pshpack4.h>, etc.

UPD: removed the wrong example which lead to wrong conclusions, sorry about that

@diversenok
Copy link
Author

There is also a similar issue with unions:

#include <windows.h>
#include <stdio.h>

typedef union _TEST2 {
    __int64 Field64;
    __int32 Fields32x3[3];
} TEST2;

int wmain(int argc, wchar_t* argv[]) {
    wprintf_s(L"Total size %zd\r\n", sizeof(TEST2));
    return 0;
}

This C program outputs Total size 16. The union inherits its alignment of 8 from the first field and the size of 12 from the second. Because this result size does not agree with the alignment, the compiler ends up including a 4-byte tail padding. Surrounding the definition with #include <pshpack1.h>/#include <poppack.h> removes the padding and yields Total size 12.

Parsing the same definition with cstruct always prints Total size 12 with alignment 8 both when using align=True and align=False:

from dissect import cstruct

cparser = cstruct.cstruct()
cparser.load("""
union TEST2 {
    __int64 Field64;
    __int32 Fields32x3[3];
}
""", align=True)

print('Total size', cparser.TEST2.size, 'with alignment', cparser.TEST2.alignment)

So, it looks like you also forgot to align the total size in Union's _calc_size_and_offsets().

@Schamper
Copy link
Member

Schamper commented Dec 24, 2023

I don't think we should change the default for align, a lot of things will break. The most common use-case for dissect.cstruct is probably eyeballing hexdumps and unpacking (usually tightly packed) binary data. Doing alignment by default might be unexpected in those cases. At least internally, it's quite rare that we use dissect.cstruct with something we already have the structure definitions for (and that is aligned). dissect.cstruct originally didn't have the alignment feature for a very long time, it was only added just a few years ago by demand for working with memory dumps.

The alignment feature should be better documented though, with a few examples like the ones you demonstrated.

Additionally, it might be useful to add support for different packing overrides

Agreed. We do not currently have plans or a use-case for this ourselves, though. If you would be willing to work on this then your contribution would be more than welcome!

The union alignment looks like a bug that needs to be fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants