Commit

Update wiki (#293)
* Update wiki

* address pr comments, tweak a bit more

* INCLUDE_RODATA

* Update docs/General-Workflow.md

Co-authored-by: Anghelo Carvajal <[email protected]>

---------

Co-authored-by: Anghelo Carvajal <[email protected]>
mkst and AngheloAlf authored Oct 2, 2023
1 parent 90bc096 commit dc33b04
Showing 5 changed files with 123 additions and 79 deletions.
6 changes: 3 additions & 3 deletions docs/Adding-Symbols.md
@@ -1,4 +1,4 @@
Symbols (i.e. labelling a function or variable) are controlled by the `symbols_addrs.txt` file.
Symbols (i.e. labelling a function or variable) are controlled by the `symbol_addrs.txt` file.

The format for defining symbols is:

@@ -10,7 +10,7 @@ e.g.
osInitialize = 0x801378C0; // type:func
```

:information_source: The file used can be overridden via the `symbol_addrs_path` setting in the global `options` section of the splat yaml. This option can also accept multiple paths, allowing to organize symbols in multiple files.
:information_source: The file used can be overridden via the `symbol_addrs_path` setting in the global `options` section of the splat yaml. This option can also accept a list of paths, allowing for symbols to be organized in multiple files.
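
As a minimal sketch of the list form, assuming a second symbols file named `symbol_addrs_libultra.txt` (a hypothetical name used only for illustration), the `options` section might look like:

```yaml
options:
  # Each entry is a path to a file using the symbol format described above.
  symbol_addrs_path:
    - symbol_addrs.txt
    - symbol_addrs_libultra.txt  # hypothetical second file, e.g. for library symbols
```

Splitting symbols across files this way is purely organisational; every file uses the same format.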

## symbol

@@ -34,7 +34,7 @@ Override splat's automatic type detection, possible values are:
- `s8`, `u8`: To specify data/rodata to be disassembled as `.byte`s
- `s16`, `u16`: To specify data/rodata to be disassembled as `.short`s
- `s32`, `u32`: To specify data/rodata to be disassembled as `.word`s (the default)
- `s64`, `u64`: :man_shrugging:
- `s64`, `u64`: :man_shrugging:
- `f32`, `Vec3f`: To specify data/rodata to be disassembled as `.float`s
- `f64`: To specify data/rodata to be disassembled as `.double`s
- `asciz`, `char*`, `char`: C strings (disassembled as `.asciz`)
1 change: 1 addition & 0 deletions docs/Advanced.md
@@ -4,3 +4,4 @@ The following list contains examples of custom segments:

- [RNC](https://github.com/mkst/sssv/blob/master/tools/splat_ext/rnc.py)
- [Vtx](https://github.com/mkst/sssv/blob/master/tools/splat_ext/sssv_vtx.py)
- [Multiple](https://github.com/pmret/papermario/tree/main/tools/splat_ext)
122 changes: 72 additions & 50 deletions docs/General-Workflow.md
@@ -1,10 +1,9 @@
This describes an example of how to iteratively edit the splat segments config, when decompiling
This describes an example of how to iteratively edit the splat segments config in order to maximise code and data migration from the binary.

(If you have no idea what this is about, please head over to the [Quickstart](https://github.com/ethteck/splat/wiki/Quickstart) to get an initial configuration for your ROM.)
# 1 Initial configuration

# 1 Initially
After successfully following the [Quickstart](https://github.com/ethteck/splat/wiki/Quickstart), you should have an initial configuration like the one below:

After succesfully following the [Quickstart](https://github.com/ethteck/splat/wiki/Quickstart), you should get an initial configuration like the one below:
```yaml
- name: main
type: code
@@ -16,37 +15,44 @@ After succesfully following the [Quickstart](https://github.com/ethteck/splat/wi
- [0x1060, asm]
# ... a lot of additional `asm` sections
# This section is found out to contain __osViSwapContext
- [0x25C20, asm, "energy_orb_wave"]
# ... a lot of addtiional `asm` sections
- [0x25C20, asm, energy_orb_wave]
# ... a lot of additional `asm` sections
- [0x2E450, data]

- [0x3E330, rodata]
# ... a lot of addtional `rodata` sections
# ... a lot of additional `rodata` sections
- { start: 0x3F1B0, type: bss, vram: 0x800E9C20 }

- [0x3F1B0, bin]
```
## 1.1 Match rodata to asm sections
## 1.1 Match `rodata` to `asm` sections

It's good practice to start pairing `rodata` sections with `asm` sections _before_ changing the `asm` sections into `c` files. This is because rodata may need to be explicitly included within the `c` file (via `INCLUDE_RODATA` or `GLOBAL_ASM` macros).

In order to simplify decompilation, it's good practice to start pairing `rodata` sections with `asm` sections.
`splat` provides hints about which `rodata` segments are referenced by which `asm` segments based on references to these symbols within the disassembled functions.

`splat` gives hints about what `rodata` is used in which `asm` segment. These look like:
These messages are output when splitting and look like:

```
Rodata segment '3EE10' may belong to the text segment 'energy_orb_wave'
Based on the usage from the function func_0xXXXXXXXX to the symbol D_800AEA10
```

To pair these two sections, simply add the name of the suggested text (`asm`) segment to the `rodata` segment:
To pair these two sections, simply add the _name_ of the suggested text (i.e. `asm`) segment to the `rodata` segment:

```yaml
- [0x3EE10, rodata, "energy_orb_wave"]
- [0x3EE10, rodata, energy_orb_wave] # segment will be paired with a text (i.e. asm or c) segment named "energy_orb_wave"
```

### Useful knowledge about splitting
**NOTE:**

By default `migrate_rodata_to_functions` functionality is enabled. This causes splat to include paired rodata along with the disassembled assembly code, allowing it to be linked via `.rodata` segments from the get-go. This guide assumes that you will disable this functionality until you have successfully paired up the segments.
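
A minimal sketch of disabling this while you pair up segments, assuming the setting lives in the global `options` section of the splat yaml like the other options mentioned in these docs:

```yaml
options:
  # Assumption: set in the global options section.
  # Turn this back on once rodata and text segments have been paired.
  migrate_rodata_to_functions: false
```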

### Troubleshooting

#### Multiple `rodata` segments for a single text segment

#### Multiple `rodata`
Using the following configuration:
```yaml
# ...
@@ -65,43 +71,57 @@ Rodata segment '3E930' may belong to the text segment '16100'
Based on the usage from the function func_800862C0 to the symbol jtbl_800AE530
```

This hint tells you that `splat` thinks one text (`asm`) segment seems to have two `rodata` sections. This usually means that either there should not be a split at `0x3E930` or `0x16100` is missing a file split, since one text segment should only have one `rodata` segment.
This hint tells you that `splat` believes one text segment references two `rodata` sections. This usually means that either the `rodata` should not be split at `0x3E930`, or that there is a missing split in the `asm` at `0x16100`, as a text segment can only have one `rodata` segment.

Please note that this could be a false positive, and you should do your own investigation to figure out the truth. If you, however, feel confident that the `rodata` should not be split, simply remove the second split from the configuration:
If we assume that the rodata split is incorrect, we can remove the extraneous split:

```yaml
# ...
- [0x3E900, rodata, "16100"]
# begone!
# ...
```

### **TODO** Multiple `asm` referring to the same `rodata`
**NOTE:** Splat uses heuristics to determine `rodata` and `asm` splits and is not perfect - false positives are possible and, if in doubt, double-check the assembly yourself before changing the splits.


### Multiple `asm` segments referring to the same `rodata` segment

Sometimes the opposite is true, and `splat` believes two `asm` segments belong to a single `rodata` segment. In this case, you can split the `asm` segment to make sure two files are not paired with the same `rodata`. Note that this too can be a false positive.
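
As a rough sketch of such a split, assuming the shared `rodata` is referenced by functions on either side of a file boundary at `0x16A00` (a hypothetical address chosen only for illustration; the real boundary has to be found by inspecting the assembly):

```yaml
# ...
- [0x16100, asm]
- [0x16A00, asm]  # hypothetical new split: functions from here on form their own file
# ...
```

After re-running splat, check whether the hints now point each `rodata` segment at a single text segment.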

Sometimes the opposite from above is true, and `splat` shows you two `asm` segments belonging to one `rodata` segment. In this case, try to split the `asm` segment to make sure two files are not paired with the same `rodata`. Note this too can be a false positive.

# 2 Disassemble text, data, rodata

Let's say you want to start decompiling the subsegment at `0x25C20` (`energy_orb_wave`). Start by replacing the `asm` type with `c`.
Let's say you want to start decompiling the subsegment at `0x25C20` (`energy_orb_wave`). Start by replacing the `asm` type with `c`, and then re-run splat.

```yaml
- [0x25C20, c, energy_orb_wave]
# ...
# ...
- [0x3EE10, rodata, energy_orb_wave]
```

This will disassemble `0x25C20` to individual `.s` files for each function found. The output will be located in `asm/nonmatchings/energy_orb_wave` (depending on the `asm_path` setting, found in the configuration).
This will disassemble the ROM at `0x25C20` as code, creating individual `.s` files for each function found. The output will be located in `{asm_path}/nonmatchings/energy_orb_wave/<function_name>.s`.

It will also generate `asm/energy_orb_wave.data.s` (if it is paired with a `data` segment), and `energy_orb_wave.rodata.s` (using information gained during the disassembly of the functions).
Assuming `data` and `rodata` segments have been paired with the `c` segment, splat will generate `{asm_path}/energy_orb_wave.data.s` and `{asm_path}/energy_orb_wave.rodata.s` respectively.

Finally, it will generate a C file at `src/energy_orb_wave.c` (depending on the `src_path` setting, found in the configuration) containing `GLOBAL_ASM()` and `GLOBAL_RODATA()` macros to include all disassembled functions. (This macro is to be defined in an included header, which splat currently does not produce. For an example, see [the include.h for Dr. Mario](https://github.com/AngheloAlf/drmario64/blob/master/include/include_asm.h),)
Finally, splat will generate a C file at `{src_path}/energy_orb_wave.c`, containing macros that include the assembly of each disassembled function.

**NOTE:**
- the path where assembly is written can be configured via `asm_path`; the default is `{base_path}/asm`
- the source code path can be configured via `src_path`; the default is `{base_path}/src`
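
A minimal sketch of these settings, assuming they all live in the global `options` section and that `asm_path` and `src_path` are given relative to `base_path` (the values shown simply restate the defaults described above):

```yaml
options:
  base_path: .    # assumption: project root that the other paths are relative to
  asm_path: asm   # disassembled .s files are written under here
  src_path: src   # generated .c files are written under here
```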

## Macros

The macros used to include text/rodata assembly differ between the GCC and IDO compilers:

- **GCC**: `INCLUDE_ASM` & `INCLUDE_RODATA` (text/rodata respectively)
- **IDO**: `GLOBAL_ASM`

These macros must be defined in an included header, which splat currently does not produce.

For a GCC example, see the [include_asm.h](https://github.com/AngheloAlf/drmario64/blob/master/include/include_asm.h) header from the Dr. Mario project.

For IDO, you will need to use [asm-processor](https://github.com/simonlindholm/asm-processor) in order to include assembly code within the C files.

Figuring out the data and rodata addresses is to be done manually. Just disassembling the whole segment may help:
```yaml
- [0x42100, c, energy_orb_wave]
```
to locate data

# 3 Decompile text

@@ -111,47 +131,49 @@ This involved back and forth between `.c` and `.s` files:
- decompiling functions, declaring symbols (`extern`s) in the `.c`

The linker script links
- `.text` (only) from the .o built from `energy_orb_wave.c`
- `.data` (only) from the .o built from `energy_orb_wave.data.s`
- `.rodata` (only) from the .o built from `energy_orb_wave.rodata.s`

.data (respectively .rodata) is not linked from the .o built from `energy_orb_wave.c`, because the subsegments include segments with the `data` (respectively `rodata`) segment type
- `.text` (only) from the `.o` built from `energy_orb_wave.c`
- `.data` (only) from the `.o` built from `energy_orb_wave.data.s`
- `.rodata` (only) from the `.o` built from `energy_orb_wave.rodata.s`

# 4 Decompile data
# 4 Decompile (ro)data

Move (decompile) data/rodata to the .c, using structs or relying on strings used in the code, or other things
Migrate data to the .c file, using raw values, lists or structs as appropriate.

Again, the .data/.rodata sections from the .o built from the .c will not be linked as long as there are any `data`/`rodata` subsegment in the code segment (and not just for `energy_orb_wave`, any other subsegment too)
Once you have paired the rodata and text segments together, you can enable `migrate_rodata_to_functions`. This will add the paired rodata into each individual function's assembly file, and therefore the rodata will end up in the compiled .o file.

To link the .data/.rodata from the .o built from the .c (instead of from the .s files), the subsegments should be changed from
To link the .data/.rodata from the .o built from the .c file (instead of from the .s files), the subsegments must be changed from:

```yaml
- [0x42100, c, energy_orb_wave]
- [0x42200, data, energy_orb_wave]
- [0x42300, rodata, energy_orb_wave]
- [0x42200, data, energy_orb_wave] # extract data at this ROM address as energy_orb_wave.data.s
- [0x42300, rodata, energy_orb_wave] # extract rodata at this ROM address as energy_orb_wave.rodata.s
```

to
to:

```yaml
- [0x42100, c, energy_orb_wave]
- [0x42200, .data, energy_orb_wave]
- [0x42300, .rodata, energy_orb_wave]
- [0x42200, .data, energy_orb_wave] # take the .data section from the compiled c file named energy_orb_wave
- [0x42300, .rodata, energy_orb_wave] # take the .rodata section from the compiled c file named energy_orb_wave
```

If using `auto_all_section` and there is no other `data`/`.data`/`rodata`/`.rodata` in the subsegments in the code segment, the subsegments can also be changed to

**NOTE:**
If using `auto_all_sections` and there are no other `data`/`.data`/`rodata`/`.rodata` subsegments in the code segment, the subsegments can also be changed to:

```yaml
- [0x42100, c, energy_orb_wave]
- [0x42200]
```

# 5 Done!
# 5 Decompile bss

`.text`, `.data` and `.rodata` are linked from the .o built from `energy_orb_wave.c` which now has everything to match when building
`bss` works in a similar way to data/rodata. However, `bss` is usually discarded from the final binary, which makes it somewhat trickier to migrate.

The assembly files (functions .s, data.s and rodata.s files) can be deleted
The `bss` segment type will create assembly files that consist of `.space` directives. The `.bss` segment type will instead link the `.bss` section of the referenced `c` file.
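
By analogy with the `data`/`.data` change above, a sketch of the switch might look like the following. It reuses the `bss` entry from the initial configuration; the `name` key in the dictionary form is an assumption used here to tie the entry to `energy_orb_wave`:

```yaml
- [0x25C20, c, energy_orb_wave]
# ...
# before: bss is extracted to an assembly file that only reserves space
# - { start: 0x3F1B0, type: bss, vram: 0x800E9C20, name: energy_orb_wave }
# after: the .bss section is linked from the object built from energy_orb_wave.c
- { start: 0x3F1B0, type: .bss, vram: 0x800E9C20, name: energy_orb_wave }
```

Because `bss` takes up no space in the ROM, the `vram` address is what tells splat where the section lives at runtime.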

# 6 Done!

# BSS
`.text`, `.data`, `.rodata` and `.bss` are now all linked from the .o built from `energy_orb_wave.c`, which contains everything needed to produce a matching build.

Note: this explanation lacks .bss handling
The assembly files (function .s, data.s and rodata.s files) can now be deleted.
14 changes: 8 additions & 6 deletions docs/Home.md
@@ -1,21 +1,23 @@
### What is splat?

**splat** is a binary splitting tool, written in Python.
**splat** is a binary splitting tool, written in Python. Its goal is to support the successful disassembly and then rebuilding of binary data.

It is the spiritual successor to [n64split](https://github.com/queueRAM/sm64tools/blob/master/n64split.c). Originally written to handle N64 ROMs, it also has limited support for PSX binaries.
It is the spiritual successor to [n64split](https://github.com/queueRAM/sm64tools/blob/master/n64split.c). Originally written to handle N64 ROMs, it now has limited support for PSX and PS2 binaries.

MIPS code disassembly is handled via [spimdisasm](https://github.com/Decompollaborate/spimdisasm/).

There are a number of asset types built-in (e.g. various image formats, N64 Vtx data, etc), and it is designed to be simple to write your own custom types that can do anything you want and fit right into the splat pipeline.
There are a number of asset types built-in (e.g. various image formats, N64 Vtx data, etc), and it is designed to be simple to extend by writing your own custom types that can do anything you want as part of the **splat** pipeline.


### How does it work?

**splat** takes a [yaml](https://en.wikipedia.org/wiki/YAML) configuration file which tell it *where* and *how* to split a given file. Splat loads the yaml and an optional "symbol_addrs" file that can give it information about symbols that will be used during disassembly. It then runs the two main phases: scan and split.
**splat** takes a [yaml](https://en.wikipedia.org/wiki/YAML) configuration file which tells it *where* and *how* to split a given file. Symbols can be mapped to addresses (and their types provided) via an optional "symbol_addrs" file.

The scan phase is for making a first pass over the data and for doing initial disassembly. During the split phase, information gathered during the scan phase is used and files are written out to disk.
**splat** runs two distinct phases: scan and split.

After scanning and splitting, splat will output a linker script that can be used to re-build the input file.
The _scan_ phase makes a first pass over the data and performs the initial disassembly of code and data. During the _split_ phase, information gathered during the _scan_ phase is used and files & data are written out to disk.

After scanning and splitting, **splat** will output a linker script that can be used as part of re-building the input file.


### Sounds great, how do I get started?