04 — The String Encoding Mystery

"Hello" doesn’t mean "Hello" here.


The Bug

The grey screen was gone. The UI screens were visible. The text was garbled — random symbols, wrong characters, visual noise where words should be.

Text worked in the base game’s menus. It worked in the gear names, which were generated from lookup tables. Any text written directly as a C string literal came out as garbage.

The Discovery

Pokémon FireRed does not use ASCII.

The game has its own character encoding — a custom charmap defined in charmap.txt at the root of the project. In this encoding:

Character           ASCII value   FireRed value
A                   0x41          0xBB
E                   0x45          0xBF
a                   0x61          0xD5
space               0x20          0x00
string terminator   0x00          0xFF

Write "Hello" in C and the compiler emits bytes 48 65 6C 6C 6F. FireRed’s text renderer wants C2 D9 E0 E0 E3, terminated by FF. Every character is wrong, and the compiler never says a word about it.
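To make the mapping concrete, here is a minimal sketch of an ASCII-to-FireRed encoder. It is not the project's preproc tool. The names fr_encode_char and fr_encode are hypothetical, and the code assumes that A-Z and a-z form contiguous runs starting at the 0xBB and 0xD5 values from the table. The real charmap.txt covers far more characters than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FR_SPACE 0x00 /* space, per the table */
#define FR_EOS   0xFF /* string terminator, per the table */

/* Map one ASCII character to its FireRed byte.
 * Assumption: A-Z and a-z are contiguous runs starting at 0xBB and 0xD5.
 * Returns -1 for anything this sketch does not cover. */
static int fr_encode_char(char c)
{
    if (c >= 'A' && c <= 'Z') return 0xBB + (c - 'A');
    if (c >= 'a' && c <= 'z') return 0xD5 + (c - 'a');
    if (c == ' ') return FR_SPACE;
    return -1;
}

/* Encode a NUL-terminated ASCII string into out and append FR_EOS.
 * Returns the number of bytes written (terminator included), or 0 if a
 * character is unsupported or the buffer is too small. */
static size_t fr_encode(const char *ascii, uint8_t *out, size_t cap)
{
    size_t n = 0;
    assert(ascii != NULL && out != NULL);
    for (; *ascii != '\0'; ascii++) {
        int b = fr_encode_char(*ascii);
        /* Leave room for this byte plus the terminator. */
        if (b < 0 || n + 1 >= cap) return 0;
        out[n++] = (uint8_t)b;
    }
    if (n >= cap) return 0;
    out[n++] = FR_EOS;
    return n;
}
```

Under those assumptions, fr_encode("A a", buf, sizeof buf) writes BB 00 D5 FF and returns 4. Note the 0x00 in the middle: that is the space, not the end of the string.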

The _() Macro

The decompilation project has a solution: the _() macro. In the build pipeline:

source.c → cpp (C preprocessor) → preproc (custom tool) → cc1 (compiler) → as (assembler)

The custom preproc step intercepts _("text") and converts each character using charmap.txt. So _("Hello") becomes the correct byte sequence for FireRed’s encoding.

The catch: on macOS (for IDE support), _() is a no-op:

#if defined(__APPLE__)
#define _(x) (x)
#endif

This makes syntax highlighting and autocomplete work. During development, _("Hello") looks like a normal C string. It isn’t. The build pipeline transforms it.

The Double Catch

You can’t use _() inline. This fails:

// WRONG — _() emits a brace-enclosed initializer, not a pointer
AddTextPrinterParameterized(windowId, FONT_NORMAL, _("Hello"), 0, 0, 0, NULL);

Static constants are required:

// CORRECT
static const u8 sText_Hello[] = _("Hello");
AddTextPrinterParameterized(windowId, FONT_NORMAL, sText_Hello, 0, 0, 0, NULL);

_() expands to something like { 0xC2, 0xD9, 0xE0, 0xE0, 0xE3, 0xFF }: a brace-enclosed initializer. It can initialize an array, but it can’t be used as an expression.
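One consequence is worth a sketch: since space encodes to 0x00 and the terminator is 0xFF, standard C string functions misread these arrays. The block below writes out by hand what _("A a") would presumably expand to, using only byte values from the charmap table. The names sText_Demo and fr_strlen are hypothetical, and the u8 typedef stands in for the project's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h> /* only for the strlen comparison in the note below */

typedef uint8_t u8; /* stand-in for the project's u8 typedef */

/* Hand-written equivalent of: static const u8 sText_Demo[] = _("A a");
 * A = 0xBB, space = 0x00, a = 0xD5, terminator = 0xFF. */
static const u8 sText_Demo[] = { 0xBB, 0x00, 0xD5, 0xFF };

/* Hypothetical helper: count bytes up to, but not including,
 * the 0xFF terminator. */
static size_t fr_strlen(const u8 *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0xFF)
        n++;
    return n;
}
```

Here fr_strlen(sText_Demo) returns 3, while strlen((const char *)sText_Demo) returns 1, because strlen stops at the encoded space. Anything that treats these arrays as ordinary C strings will truncate at the first space.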

The Audit

Claude found raw ASCII strings, written as (const u8 *)"text" casts, in four of the UI files. The gear UI had been written correctly from the start, because it used lookup tables populated with _()-encoded strings. The relic, altar, merchant, and chest UIs all had handwritten strings that bypassed the encoding.

The fix was mechanical but thorough — replace every raw string with a static const u8 using _():

// Before (broken):
AddTextPrinterParameterized(win, font, (const u8 *)"EQUIPPED", x, y, 0, NULL);

// After (working):
static const u8 sText_Equipped[] = _("EQUIPPED");
AddTextPrinterParameterized(win, font, sText_Equipped, x, y, 0, NULL);

Other Charmap Surprises

The Lesson

GBA ROM hacking via decompilation looks like C and feels like C, but isn’t quite C. The build pipeline has invisible transformations. The character set is custom. The string terminator is 0xFF, not 0x00. You’re writing C that passes through a custom preprocessing tool before the compiler ever sees it.

Once this was understood, garbled text never came back. It took debugging four separate UI files to fully settle the rule: every user-facing string must go through _().

By the Numbers

Metric             Value
Commits            ~9
Copilot requests   ~39
Tool executions    ~780
Sub-agents         3

Next: 05 — Making It Beautiful