SH2 Improvements #113

Open
sozud opened this issue Jun 19, 2023 · 1 comment

@sozud
Contributor

sozud commented Jun 19, 2023

I'm running into some issues with data that make decompiling SH2 annoying. SH2 has only 8-bit immediates, so most data is loaded with pc-relative instructions, and the data is interspersed with the function's code. This scratch shows an example of these issues: https://decomp.me/scratch/SUNET

Lines 18-20 in the asm are actually a jump table.
The 0xB0 in the switch is at line 5A in the asm.
Lines 5C-5E in the asm are the pointer to D_800A7734.

Longer functions have this issue worse: because the pc-relative offset is limited, there will be a block of instructions and a jump, a block of data, the next block of instructions and a jump, another block of data, and so on.
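To make the reach limit concrete, here is a minimal sketch of the address arithmetic for the two pc-relative loads, `mov.w @(disp,PC),Rn` (encoded `0x9ndd`) and `mov.l @(disp,PC),Rn` (encoded `0xDndd`). The function name is hypothetical, and the sketch assumes the instruction is not in a delay slot.

```python
from typing import Optional


def pc_relative_target(insn_addr: int, insn: int) -> Optional[int]:
    """Return the address a pc-relative load reads from, or None.

    `insn_addr` is the address of the 16-bit instruction word `insn`.
    Assumption: the instruction is not in a branch delay slot.
    """
    disp = insn & 0xFF          # 8-bit unsigned displacement
    opcode = insn >> 12
    if opcode == 0x9:           # mov.w @(disp,PC),Rn
        return insn_addr + 4 + disp * 2
    if opcode == 0xD:           # mov.l @(disp,PC),Rn (PC is aligned down to 4)
        return (insn_addr & ~3) + 4 + disp * 4
    return None


# Furthest literal a load can reach, in bytes past the instruction.
MAX_REACH_W = 4 + 0xFF * 2
MAX_REACH_L = 4 + 0xFF * 4
```

Since the displacement is unsigned, literals always sit after the load that uses them, which is why the pools end up wedged between blocks of code.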

I've been thinking about different ways to solve this issue and was wondering if anyone has suggestions. Do any other architectures have this sort of problem?

I've thought about:

Replace objdump with a better disassembler for the SH2 case? My disassembler has gnu as-compatible output, but it's written in Rust, so that would be a significant dependency.
Add more parsing to asm-differ to try to make the objdump output better? I'm not sure exactly how much additional parsing is needed, but it seems like it would basically be re-implementing a disassembler for certain patterns.
(Not really asm-differ related) Allow linker arguments in decomp.me so that the pointers can be set to the right locations? This would help with cases like struct->offset where the offset and the base pointer are added together by gcc.
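As a rough idea of what the extra-parsing option could involve, the sketch below walks already-parsed lines in address order, records where each pc-relative load points, and re-renders those locations as data instead of instructions. Everything here is an assumption for illustration: the `(address, word, text)` tuple shape, the function names, and big-endian byte order; none of it is asm-differ's actual code, and jump tables and delay slots are not handled.

```python
from typing import Dict, List, Optional, Set, Tuple

Line = Tuple[int, int, str]  # (address, 16-bit word, disassembler text)


def literal_target(addr: int, word: int) -> Optional[Tuple[int, int]]:
    """Return (literal address, size in bytes) for a pc-relative load."""
    disp = word & 0xFF
    opcode = word >> 12
    if opcode == 0x9:   # mov.w @(disp,PC),Rn
        return addr + 4 + disp * 2, 2
    if opcode == 0xD:   # mov.l @(disp,PC),Rn
        return (addr & ~3) + 4 + disp * 4, 4
    return None


def mark_literal_pools(lines: List[Line]) -> List[Tuple[int, str]]:
    """Rewrite lines that are targets of pc-relative loads as data."""
    words = {addr: word for addr, word, _ in lines}
    data: Dict[int, int] = {}   # literal start address -> size in bytes
    consumed: Set[int] = set()  # second halves of 32-bit literals
    out: List[Tuple[int, str]] = []
    # Literals always follow the load that uses them, so one forward pass
    # has seen every reference before it reaches the literal itself.
    for addr, word, text in sorted(lines):
        if addr in consumed:
            continue
        size = data.get(addr)
        if size == 4 and addr + 2 in words:
            value = (word << 16) | words[addr + 2]  # assumes big-endian
            out.append((addr, f".long 0x{value:08x}"))
            consumed.add(addr + 2)
        elif size is not None:
            out.append((addr, f".word 0x{word:04x}"))
        else:
            out.append((addr, text))  # treated as code; never decode data
            target = literal_target(addr, word)
            if target is not None:
                start, tsize = target
                data[start] = max(tsize, data.get(start, 0))
    return out
```

Because data lines are never decoded as loads, a literal whose bits happen to look like `mov.l` does not create bogus pool entries.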

@simonlindholm
Owner

> I've been thinking about different ways to solve this issue and was wondering if anyone has suggestions. Do any other architectures have this sort of problem?

I think arm does, but its offsets reach further so it's less of a problem. Haven't used it myself though.

Switching to a different disassembler seems fine to me as long as it's an optional dependency, and as long as the different syntax doesn't end up bloating the code too much by requiring SH2-specific handling everywhere. (So some amount of objdump mimicry might be required? Or we could change the code to separate instruction parsing from diffing, having all other arches use regexes just for instruction parsing instead of the current mess, but that would be a much larger change.)

I wonder if objdump could be made to emit .word for all literal pools by marking those parts of the .o as not STT_FUNC. Maybe some tool could fix up .o files just before diffing or just after compilation?

> (Not really asm-differ related) Allow linker arguments in decomp.me so that the pointers can be set to the right locations? This would help with cases like struct->offset where the offset and the base pointer are added together by gcc.

decomp.me doesn't run a linking step atm. Is the goal here to avoid diffs like 0x1244 vs D_1234 + 0x10? If so that's normally solved by improving the disassembly and using relocs (e.g. lw $a0, %lo(D_1234 + 0x10)($a0) or .text D_1234 + 0x10 in MIPS), which is nice because it preserves names in the output.

From your scratch it looks like asm-differ might be ignoring relocs altogether, which isn't great (so the above wouldn't even be 0x1244 vs D_1234 + 0x10, it'd be 0x1244 vs 0x10). But I understand that it might be needed if relocs could apply to literal pools and crash asm-differ if it's not aware those are literal pools...
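A small sketch of the reloc-aware rendering described above, applied to a literal-pool entry: when a relocation targets the word, show the symbol plus addend rather than the raw value. The function name and the `.long` output format are hypothetical, and how the addend is obtained from the object file is left out.

```python
from typing import Optional


def format_pool_word(addend: int, symbol: Optional[str] = None) -> str:
    """Render a 32-bit literal-pool entry, using its relocation if present.

    `symbol` is the name the relocation refers to (None if there is no
    relocation); `addend` is the constant added to that symbol, or the raw
    value when there is no relocation.
    """
    if symbol is None:
        return f".long 0x{addend:x}"
    if addend == 0:
        return f".long {symbol}"
    return f".long {symbol} + 0x{addend:x}"


print(format_pool_word(0x10, "D_1234"))
print(format_pool_word(0x10))
```

With this, both sides of a diff show `D_1234 + 0x10` when they refer to the same location, instead of one side showing a linked address and the other a bare addend.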
