04 — The String Encoding Mystery

Why "Hello" isn’t "Hello" on a GBA


The Bug

After fixing the grey screen, I could finally see my UI screens. But the text was garbled — random symbols, wrong characters, visual noise where words should be.

The text worked in the base game’s menus. It worked in the gear names (generated from lookup tables). But any text written directly in C strings came out as garbage.

The Discovery

Pokémon FireRed does not use ASCII.

The game has its own character encoding — a custom charmap defined in charmap.txt at the root of the project. In this encoding:

Character           ASCII value   FireRed value
A                   0x41          0xBB
E                   0x45          0xBF
a                   0x61          0xD5
space               0x20          0x00
string terminator   0x00          0xFF

When you write "Hello" in C, the compiler emits the ASCII bytes 48 65 6C 6C 6F, followed by a 00 terminator. FireRed’s text renderer expects C2 D9 E0 E0 E3, followed by FF. Every byte is wrong, including the terminator.
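To make the mapping concrete, here is a minimal sketch of an ASCII-to-FireRed encoder. It is not code from the project: fr_encode_char and fr_encode are hypothetical names, and the sketch assumes the uppercase and lowercase letters sit in contiguous runs starting at the A = 0xBB and a = 0xD5 entries listed above. Digits, punctuation, and control codes are left out.

```c
#include <stddef.h>
#include <stdint.h>

#define FR_SPACE      0x00  /* space in the FireRed charmap */
#define FR_TERMINATOR 0xFF  /* end-of-string marker */

/* Map one ASCII character to its FireRed byte.
 * Assumption: letters are contiguous, starting at 'A' = 0xBB and 'a' = 0xD5.
 * Anything that is not a letter falls back to a space in this sketch. */
uint8_t fr_encode_char(char c)
{
    if (c >= 'A' && c <= 'Z')
        return (uint8_t)(0xBB + (c - 'A'));
    if (c >= 'a' && c <= 'z')
        return (uint8_t)(0xD5 + (c - 'a'));
    return FR_SPACE;
}

/* Encode an ASCII string into out and append the 0xFF terminator.
 * Returns the number of bytes written (terminator included),
 * or 0 if out is too small to hold the result. */
size_t fr_encode(const char *ascii, uint8_t *out, size_t out_size)
{
    size_t n = 0;
    while (ascii[n] != '\0') {
        if (n + 1 >= out_size)  /* need room for this byte and the terminator */
            return 0;
        out[n] = fr_encode_char(ascii[n]);
        n++;
    }
    if (n >= out_size)
        return 0;
    out[n] = FR_TERMINATOR;
    return n + 1;
}
```

With an 8-byte buffer, fr_encode("A a", buf, sizeof buf) writes BB 00 D5 FF and returns 4, which matches the table rows one for one.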

The _() Macro

The decompilation project has a solution: the _() macro. In the build pipeline:

source.c → cpp (C preprocessor) → preproc (custom tool) → cc1 (compiler) → as (assembler)

The custom preproc step intercepts _("text") and converts each character using charmap.txt. So _("Hello") becomes the correct byte sequence for FireRed’s encoding.
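A rough sketch of that rewriting step, to show the kind of text-to-text transformation involved. This is not the real preproc source: expand_underscore_macro and charmap_lookup are hypothetical names, the lookup covers only letters and space instead of reading charmap.txt, and escape sequences and control codes are ignored.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a charmap.txt lookup: letters and space only. */
static unsigned char charmap_lookup(char c)
{
    if (c >= 'A' && c <= 'Z') return (unsigned char)(0xBB + (c - 'A'));
    if (c >= 'a' && c <= 'z') return (unsigned char)(0xD5 + (c - 'a'));
    return 0x00; /* space; unknown characters also fall back to it here */
}

/* Rewrite the first _("...") found in line as a brace initializer.
 * Returns 1 on success, 0 if nothing matched or out is too small. */
int expand_underscore_macro(const char *line, char *out, size_t out_size)
{
    const char *start = strstr(line, "_(\"");
    if (start == NULL) return 0;
    const char *text = start + 3;            /* first character of the literal */
    const char *end = strstr(text, "\")");   /* closing quote and parenthesis */
    if (end == NULL) return 0;

    size_t used;
    int n;

    /* Copy everything before the macro unchanged, then open the brace list. */
    n = snprintf(out, out_size, "%.*s{ ", (int)(start - line), line);
    if (n < 0 || (size_t)n >= out_size) return 0;
    used = (size_t)n;

    /* One hex byte per source character. */
    for (const char *p = text; p < end; p++) {
        n = snprintf(out + used, out_size - used, "0x%02X, ",
                     (unsigned)charmap_lookup(*p));
        if (n < 0 || (size_t)n >= out_size - used) return 0;
        used += (size_t)n;
    }

    /* Terminator, closing brace, then the rest of the line. */
    n = snprintf(out + used, out_size - used, "0xFF }%s", end + 2);
    if (n < 0 || (size_t)n >= out_size - used) return 0;
    return 1;
}
```

Given the line static const u8 sText_Ok[] = _("OK"); this produces static const u8 sText_Ok[] = { 0xC9, 0xC5, 0xFF }; which is the form the compiler then receives.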

Here’s the catch: on macOS (for IDE support), _() is defined as a no-op:

#if defined(__APPLE__)
#define _(x) (x)
#endif

This makes syntax highlighting and autocomplete work. But it means that during development, _("Hello") looks like it produces a normal C string. It doesn’t. The build pipeline transforms it.

The Double Catch

You can’t use _() inline. This fails:

// WRONG — _() emits a brace-enclosed initializer, not a pointer
AddTextPrinterParameterized(windowId, FONT_NORMAL, _("Hello"), 0, 0, 0, NULL);

You must use static constants:

// CORRECT
static const u8 sText_Hello[] = _("Hello");
AddTextPrinterParameterized(windowId, FONT_NORMAL, sText_Hello, 0, 0, 0, NULL);

This is because _() expands to something like { 0xC2, 0xD9, 0xE0, 0xE0, 0xE3, 0xFF }, which is a brace initializer, not a string literal. It can initialize an array but can’t be used as an expression.
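The same rule can be seen in plain C, outside the game. The sketch below assumes a u8 typedef equivalent to the project's and uses a hypothetical fr_strlen helper; it also shows why the standard strlen is the wrong tool for these strings.

```c
#include <stddef.h>

typedef unsigned char u8; /* stand-in for the project's u8 typedef */

/* Roughly what the compiler sees after the preproc step:
 * a byte array that ends in the 0xFF terminator. */
static const u8 sText_Ok[] = { 0xC9, 0xC5, 0xFF }; /* "OK" */

/* Count bytes up to, but not including, the 0xFF terminator.
 * strlen would stop at the first 0x00, which is a space in this encoding. */
size_t fr_strlen(const u8 *str)
{
    size_t n = 0;
    while (str[n] != 0xFF)
        n++;
    return n;
}

/* An array name decays to a pointer, so it can be passed as const u8 *. */
size_t ok_length(void)
{
    return fr_strlen(sText_Ok);
}

/* By contrast, this does not compile, because a brace list is not an expression:
 *     fr_strlen({ 0xC9, 0xC5, 0xFF });
 */
```

Here ok_length() returns 2. A string containing a space, such as { 0xC2, 0x00, 0xC3, 0xFF }, has length 3 under fr_strlen, whereas a 0x00-terminated scan would report 1.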

The Audit

Claude found raw ASCII strings (written as (const u8 *)"text" casts) in four of the UI files. The gear UI had been written correctly from the start, because it used lookup tables populated with _()-encoded strings. The relic, altar, merchant, and chest UIs all had handwritten strings that bypassed the encoding.

The fix was mechanical but tedious — replace every raw string with a static const u8 using _():

// Before (broken):
AddTextPrinterParameterized(win, font, (const u8 *)"EQUIPPED", x, y, 0, NULL);

// After (working):
static const u8 sText_Equipped[] = _("EQUIPPED");
AddTextPrinterParameterized(win, font, sText_Equipped, x, y, 0, NULL);
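A check for the broken pattern is easy to sketch. The helper below is hypothetical and is not how the audit was actually run; it only flags lines where a string literal directly follows a (const u8 *) cast, and it will miss other spellings such as (const u8*).

```c
#include <string.h>

/* Return 1 if line contains a string literal cast straight to const u8 *,
 * i.e. the text (const u8 *) followed by optional whitespace and a quote. */
int has_raw_string_cast(const char *line)
{
    const char *cast = "(const u8 *)";
    const char *p = line;

    while ((p = strstr(p, cast)) != NULL) {
        p += strlen(cast);               /* move past the cast */
        const char *q = p;
        while (*q == ' ' || *q == '\t')  /* allow spacing before the literal */
            q++;
        if (*q == '"')
            return 1;                    /* the cast is applied to a literal */
    }
    return 0;
}
```

Run over each line of a UI source file, it returns 1 for the "Before" call above and 0 for the "After" call, so any remaining hits point at strings that still need a static const u8 array.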

Other Charmap Surprises

The Lesson

GBA ROM hacking via decompilation looks like C, feels like C, but isn’t quite C. The build pipeline has invisible transformations. The character set is custom. The string terminator is 0xFF, not 0x00. You’re writing C that passes through a custom tool before the compiler ever sees it.

Once I understood this, garbled text never appeared again. But it took debugging four separate UI files to fully internalize the rule: every user-facing string must go through _().

By the Numbers

Metric             Value
Commits            ~9
Copilot requests   ~39
Tool executions    ~780
Sub-agents         3

Next: 05 — Making It Beautiful