Comments (12)
As for the reverse-engineering process, I usually proceed like this:
Using BGB (for dynamic analysis)
- Compile the game (`make all`). This generates the game from 1. the (partially) disassembled sources, 2. the dumped binary banks. It produces `game.gbc` (a compiled ROM identical to the original) and, more importantly, `game.map`, the debug symbols.
- Open `game.gbc` in the BGB emulator, which has a nice debugger.
- Open the debugger, and jump to the `0000:0150` address. You'll see a function named `Start`. BGB knows the name of this function from the debug symbols.
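The `0000:0150` notation reads as a bank number followed by a CPU address. As a minimal sketch (the helper name `rom_offset` is hypothetical, not part of this repository), this is how such a pair maps to a byte offset in the ROM file, assuming the usual Game Boy layout of 16 KiB banks, with bank 0 mapped at `0000`-`3FFF` and the switchable banks at `4000`-`7FFF`:

```python
BANK_SIZE = 0x4000  # 16 KiB per ROM bank


def rom_offset(bank: int, addr: int) -> int:
    """Convert a (bank, CPU address) pair to an offset in the ROM file."""
    if bank == 0:
        if not 0x0000 <= addr < BANK_SIZE:
            raise ValueError("bank 0 is mapped at 0000-3FFF")
        return addr
    if not BANK_SIZE <= addr < 2 * BANK_SIZE:
        raise ValueError("switchable banks are mapped at 4000-7FFF")
    return bank * BANK_SIZE + (addr - BANK_SIZE)


print(hex(rom_offset(0x00, 0x0150)))  # 0x150
print(hex(rom_offset(0x2F, 0x4000)))  # 0xbc000
```

The second result lines up with the `BC000` in the dumped file name `bank_2F_BC000.bin`, which suggests the dumps are named after the file offset where each bank starts.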
From there, the goal is to pick a function or a memory location and understand what it does, so we can label it in the disassembled code.
- Pick a function instruction (for instance `call label_0A43`)
- Jump to this function (e.g. by placing a breakpoint)
- Understand what it does. For this you can either:
  - read the assembly,
  - see what memory locations it reads or changes,
  - observe the values changing in the memory viewer while the game runs,
  - use the debugger to replace the function with a `nop`, and see what changes occur in the game
- Open the assembly source (`src/main.asm`, or `src/bank1.asm`) or memory map (`src/constants/*`), and label the code or memory you identified the purpose of.
- Rinse and repeat.
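The `nop` replacement in the steps above is done interactively in BGB. A rough offline equivalent, applied to a copy of the ROM bytes, could look like the sketch below. It relies only on the instruction encoding (`call a16` is opcode `CD` followed by a 16-bit little-endian address, and `nop` is `00`); the helper name and the example bytes are made up for illustration.

```python
CALL_OPCODE = 0xCD  # `call a16`: 1 opcode byte + 2 address bytes
NOP_OPCODE = 0x00


def nop_out_call(rom: bytearray, offset: int) -> int:
    """Overwrite the `call` at `offset` with three `nop`s.

    Returns the address the call used to target.
    """
    if rom[offset] != CALL_OPCODE:
        raise ValueError(f"no call instruction at offset {offset:#x}")
    target = rom[offset + 1] | (rom[offset + 2] << 8)  # little-endian
    rom[offset:offset + 3] = bytes([NOP_OPCODE] * 3)
    return target


# Illustrative bytes only: `call $0A43` followed by `ret` (C9).
code = bytearray([0xCD, 0x43, 0x0A, 0xC9])
print(hex(nop_out_call(code, 0)))  # 0xa43
print(code.hex())                  # 000000c9
```

Replacing all three bytes matters: overwriting only the opcode would leave the two address bytes behind to be executed as instructions.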
Using awake (for static analysis)
Awake is a static Game Boy assembly explorer, specially tuned for exploring ZeldaGB and ZeldaDX. While still at an experimental stage, it lets you identify functions and loops, and jump easily from function to function.
I'm currently writing some improvements to this tool, so that it can read debug symbols (otherwise no functions are labelled), and label functions from within the explorer. So this is still experimental.
from ladx-disassembly.
@Ayymoose @Drenn1 @Xkeeper0 btw, I just merged some improvements to this project:
- Added these "Disassembling HOWTOs" to the README.md file (plus some additional info);
- Simplified the `src` directory organization: it should be easier to follow how sources are laid out.
This should make it easier to understand how to contribute. If the project structure or tools weren't that clear before, please take another look!
Link's Awakening is quite large, it seems :)
All 61 banks seem to be holding code or data. Plus there are some additional graphics and routines for the DX edition.
Thanks for your reply. I was also wondering why there is so much dead code in the disassembly? For example, in bank0.asm there are quite a few labels whose sole purpose is just to execute nop instructions. Also in bank20.asm, the starting label at the top is peppered with nops between instructions. Is this to implement some kind of delay, or to wait for some hardware register to increment, or something like that? I'm just very curious.
That's not actually code, it's data being misinterpreted as code. Not that I know what the data represents, though. Separating the data from the code is one of the more time-consuming tasks. There may be specialized emulators that can help with that, but personally I didn't go with that approach in my disassembly. (It makes little difference if the final goal is to label everything by hand, anyway.)
The files in the "disassembled-banks" folder are just for reference, anyway, they're not actually being assembled.
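One reason data shows up as `nop` in a linear disassembly: the byte `00` decodes as `nop`, so zero padding or tables with many zero entries come out as rows of `nop`s. A minimal sketch (hypothetical helper, arbitrary threshold) that flags long zero runs in a bank as candidates for data or padding rather than code:

```python
def find_zero_runs(data: bytes, min_length: int = 8):
    """Yield (start, length) for each run of 00 bytes of at least min_length."""
    start = None
    for i, byte in enumerate(data):
        if byte == 0:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                yield start, i - start
            start = None
    # A run may extend to the very end of the data.
    if start is not None and len(data) - start >= min_length:
        yield start, len(data) - start


# Made-up bytes: one opcode, ten zeros, two more bytes, then a short zero run.
sample = bytes([0xC9]) + bytes(10) + bytes([0x3E, 0x01]) + bytes(3)
print(list(find_zero_runs(sample)))  # [(1, 10)]
```

This is only a heuristic: it points at regions worth inspecting by hand and does not prove that a region is data.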
I thought it was actual code so I forked this repository and tried to decipher the disassembly myself. I assumed that the data would not be mixed in with code. Why does this happen?
Also, could you tell me your approach for analysing the disassembly, or how you started out this whole project? I basically whipped out the Z80 opcode table and memory map from the manual and tried to follow the code (main.asm) from the top, but it quickly gets confusing, because some labels are referenced but I cannot find the code that follows in any of the files.
There's no way to tell the difference between code and data. The Game Boy will start executing from a certain location; the only way to tell is to actually track what code is being run or to "guess" at possible execution paths, both of which have pitfalls in that you might run into a glitch or unexpected path that throws everything off.
Edit: It's also possible that some ROM data is actually both, for example using some block of code for pseudo-randomness.
Actually there are some ways to tell the difference between code and data.
First we can assume most banks contain either code or data (and not mixed content). This is not always true, but it helps. Then we can attempt to convert a whole binary bank to PNG, and see if we recognize sprites in the resulting picture. For this you can use the `gfx.py` script in this repository.
- Take a binary bank you want to look at from `bin/banks` (dumped from the original ROM), for instance bank `2F`.
- Copy it somewhere, and rename it to add a `.2bpp` file extension.
- Run `gfx.py` to convert it to PNG: `./gfx.py png bank_2F_BC000.bin.2bpp`
- Look at the resulting `bank_2F_BC000.bin.png`.
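To make the `.2bpp` step less opaque, here is a minimal sketch of how the Game Boy 2bpp tile format decodes. It is not the code of `gfx.py` (whose internals are not shown here), and the function name is made up. Each 8x8 tile takes 16 bytes, two per row: the first byte holds the low bit of each pixel, the second the high bit, with bit 7 as the leftmost pixel.

```python
def decode_tile(tile: bytes) -> list[list[int]]:
    """Decode one 16-byte 2bpp tile into 8 rows of 8 color indices (0-3)."""
    if len(tile) != 16:
        raise ValueError("a 2bpp tile is exactly 16 bytes")
    rows = []
    for y in range(8):
        low, high = tile[2 * y], tile[2 * y + 1]  # two bit planes per row
        row = []
        for bit in range(7, -1, -1):  # bit 7 is the leftmost pixel
            row.append((((high >> bit) & 1) << 1) | ((low >> bit) & 1))
        rows.append(row)
    return rows


# Made-up tile: row 0 has only the low plane set, row 1 only the high plane.
tile = bytes([0xFF, 0x00, 0x00, 0xFF]) + bytes(12)
pixels = decode_tile(tile)
print(pixels[0])  # [1, 1, 1, 1, 1, 1, 1, 1]
print(pixels[1])  # [2, 2, 2, 2, 2, 2, 2, 2]
```

Code or other data pushed through the same decoding produces pixel noise, which is why a garbled picture hints that a bank does not hold graphics.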
If you recognize pictures and sprites in the resulting PNG picture, congratulations, you found a gfx data bank! But if it all looks garbled, this is probably a code bank, or a bank that contains other data (like dungeon maps, or enemy stats).
If you recognize pictures, you can now even move the PNG file into the `src/gfx` directory. Then edit the `main.asm` file to tell it: "To compile bank 2F, instead of importing the binary bank from `bin/banks/bank_2F_BC000.bin`, use the data from `src/gfx/bank_2F.png`". When compiling the ROM, the Makefile will convert the PNG file back to a 2bpp binary file, and inject it into the ROM.
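The conversion back to 2bpp at build time is handled by the Makefile and the repository's own tools. As an illustration of what that packing involves (a sketch with a made-up function name, not the project's implementation), each row of eight color indices is split into a low-bit byte and a high-bit byte:

```python
def encode_tile(pixels: list[list[int]]) -> bytes:
    """Pack 8 rows of 8 color indices (0-3) into a 16-byte 2bpp tile."""
    if len(pixels) != 8 or any(len(row) != 8 for row in pixels):
        raise ValueError("a tile is 8 rows of 8 pixels")
    out = bytearray()
    for row in pixels:
        low = high = 0
        for color in row:  # the leftmost pixel ends up in bit 7
            low = (low << 1) | (color & 1)
            high = (high << 1) | ((color >> 1) & 1)
        out += bytes([low, high])
    return bytes(out)


# Made-up tile: first row has color 3 on the left and color 1 on the right.
pixels = [[3, 0, 0, 0, 0, 0, 0, 1]] + [[0] * 8 for _ in range(7)]
print(encode_tile(pixels).hex())
```

Because the ROM must stay identical to the original, a round trip through PNG has to reproduce the exact same bytes.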
Once this is done, you can even start splitting this large PNG file into smaller fragments, sprite by sprite (have a look at `src/gfx` to see some already extracted sprites, and to see that much work still needs to be done :) )
Another possibility is to read some existing documentation about the banks. See for instance this bank map, made by @devdri.
It is, of course, possible to differentiate between code and data. You just can't rely on a computer to do it for you. Not 100%.
Thanks for all your replies, they have cleared some things up for me. I was wondering if any of you have identified the contents of all 61 banks?
> I was wondering if any of you have identified the contents of all 61 banks?
Not everything yet, but it's coming together! (see the updated ROM map)
I labeled and extracted all the graphics banks. Now the missing part is to identify the data (dungeons, etc.), the rest of the code, and the audio files.
I'm closing this issue, as the original question was answered. Thanks for the discussion!