Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial register and scattered arguments should be using the partialregister_t instead of phrase_t. #195

Open
arizvisa opened this issue Jul 23, 2024 · 1 comment
Assignees
Labels

Comments

@arizvisa
Copy link
Owner

arizvisa commented Jul 23, 2024

The function.args.storage function and its peer (function.arg.storage) supports the ALOC_REG1, ALOC_RREL, and ALOC_DIST argument locations using a phrase_t. This is wrong. The semantics of phrase_t are essentially "register+offset" whereas ALOC_REG1 is used to represent a field that is specified at a specific byte within a register. Thus, the type that these functions return should actually be partialregister_t. The partialregister_t class was introduced by commit 89dd9fe in order to represent an interval spread through a single register. If I recall correctly, I hacked this together in order to support some parts of ALOC_DIST that I had encountered.

To elaborate, the boundaries for ALOC_REG1 are determined by the "regoff" field and the size for the type, whereas the "regidx" field contains the register being described. The ALOC_RREL location type, despite not having any of IDA's magic syntax (represented via "name@<[reg+offset]>") should represent a "register+offset" (hence, correct in returning a phrase_t). Presently, the ALOC_STATIC type returns a straight-up integer. Returning an integer for ALOC_STATIC, however, might be incorrect as technically it has an address and a size (as per the type) which allows it to fit into a proper location_t...sorta like ALOC_STACK.

One thing that's worth noting is that the documentation for the "REG^OFFSET.SIZE" syntax (SetType) specifies that all these values should be in bytes. I noticed, however, that you can use "int@<eax^31.1>" or "int@<eax^30.2>" in a 32-bit 8.4 database (without complaints), whereas "int@<eax^31.2>", "int@<eax^32.1>", and "int@<eax^32.0> all produce an error (the latter is probably due to the 0-size). Despite this, when analysis is turned on, IDA corrects the "SIZE" to the size of the type. Perhaps Hex-Rays is planning to change the semantics of this from bytes to bits in the future in order to facilitate work on the decompiler, or perhaps it is just a bug. I haven't tested how this works with bitfield types or anything (to confirm), but this is probably worth a support@ email to confirm...

@arizvisa arizvisa added the bug label Jul 23, 2024
@arizvisa arizvisa self-assigned this Jul 23, 2024
@arizvisa
Copy link
Owner Author

arizvisa commented Jul 23, 2024

Fixing this issue will require updating the implementation of tinfo.location from the base/_interface.py module, but will also require correcting the function.type.*.storage functions which seem to explicitly assume that a tuple is a "register+offset" pair. So to fix this, the function.type.*.storage implementation will probably need to avoid this assumption and perform a type check to differentiate what is being returned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant