04 — The String Encoding Mystery
"Hello" doesn’t mean "Hello" here.
The Bug
The grey screen was gone. The UI screens were visible. The text was garbled — random symbols, wrong characters, visual noise where words should be.
Text worked in the base game’s menus. It also worked in the gear names, which were generated from lookup tables. Any text written directly as a C string literal came out as garbage.
The Discovery
Pokémon FireRed does not use ASCII.
The game has its own character encoding — a custom charmap defined in charmap.txt at the root of the project. In this encoding:
| Character | ASCII value | FireRed value |
|---|---|---|
| A | 0x41 | 0xBB |
| E | 0x45 | 0xBF |
| a | 0x61 | 0xD5 |
| space | 0x20 | 0x00 |
| string terminator | 0x00 | 0xFF |
Write "Hello" in C and the compiler emits bytes 48 65 6C 6C 6F, followed by a 00 terminator. FireRed’s text renderer wants C2 D9 E0 E0 E3, followed by an FF terminator. Every character wrong. The compiler never said a word about it.
The _() Macro
The decompilation project has a solution: the _() macro. In the build pipeline:
```
source.c → cpp (C preprocessor) → preproc (custom tool) → cc1 (compiler) → as (assembler)
```
The custom preproc step intercepts _("text"), converts each character using charmap.txt, and appends the 0xFF terminator. So _("Hello") becomes the correct byte sequence for FireRed’s encoding.
The catch: on macOS (for IDE support), _() is a no-op:
```c
#if defined(__APPLE__)
#define _(x) (x)
#endif
```
This keeps syntax highlighting and autocomplete working. In the editor, _("Hello") looks like a normal C string. It isn’t; the build pipeline transforms it.
The Double Catch
You can’t use _() inline. This fails:
```c
// WRONG: _() emits a brace-enclosed initializer, not a pointer
AddTextPrinterParameterized(windowId, FONT_NORMAL, _("Hello"), 0, 0, 0, NULL);
```
Static constants are required:
```c
// CORRECT
static const u8 sText_Hello[] = _("Hello");
AddTextPrinterParameterized(windowId, FONT_NORMAL, sText_Hello, 0, 0, 0, NULL);
```
_() expands to something like { 0xC2, 0xD9, 0xE0, 0xE0, 0xE3, 0xFF }: a brace initializer. It can initialize an array, but it can’t be used as an expression.
The Audit
Claude found raw ASCII strings — (const u8 *)"text" casts — in 4 of the UI files. The gear UI had been written correctly from the start (it used lookup tables populated with _()-encoded strings). The relic, altar, merchant, and chest UIs all had handwritten strings that bypassed encoding.
The fix was mechanical but thorough: replace every raw string with a static const u8 array initialized via _():
```c
// Before (broken):
AddTextPrinterParameterized(win, font, (const u8 *)"EQUIPPED", x, y, 0, NULL);
// After (working):
static const u8 sText_Equipped[] = _("EQUIPPED");
AddTextPrinterParameterized(win, font, sText_Equipped, x, y, 0, NULL);
```
Other Charmap Surprises
- Tilde (~) is not in the charmap → use a hyphen (-)
- Left bracket ([) is not in the charmap → use parentheses
- RIGHT_ARROW is character 0x7C (not 0x72, as initially assumed)
- Ellipsis (…) IS in the charmap, as 0xB0: a happy surprise
- The yen symbol (¥) is 0xB7, adopted as the shard currency symbol
- {A_BUTTON} and {B_BUTTON} are special escape sequences for button glyphs
The Lesson
GBA ROM hacking via decompilation looks like C, feels like C, but isn’t quite C. The build pipeline has invisible transformations. The character set is custom. The string terminator is 0xFF, not 0x00. You’re writing C that a custom tool rewrites before the compiler ever sees it.
Once this was understood, garbled text never came back. It took debugging 4 separate UI files to fully settle the rule: every user-facing string must go through _().
By the Numbers
| Metric | Value |
|---|---|
| Commits | ~9 |
| Copilot requests | ~39 |
| Tool executions | ~780 |
| Sub-agents | 3 |
Next: 05 — Making It Beautiful