04 — The String Encoding Mystery
Why "Hello" isn’t "Hello" on a GBA
The Bug
After fixing the grey screen, I could finally see my UI screens. But the text was garbled — random symbols, wrong characters, visual noise where words should be.
The text worked in the base game’s menus. It worked in the gear names (generated from lookup tables). But any text written directly in C strings came out as garbage.
The Discovery
Pokémon FireRed does not use ASCII.
The game has its own character encoding — a custom charmap defined in charmap.txt at the root of the project. In this encoding:
| Character | ASCII value | FireRed value |
|---|---|---|
| A | 0x41 | 0xBB |
| E | 0x45 | 0xBF |
| a | 0x61 | 0xD5 |
| space | 0x20 | 0x00 |
| string terminator | 0x00 | 0xFF |
When you write "Hello" in C, the compiler emits bytes 48 65 6C 6C 6F. But FireRed’s text renderer expects C2 D9 E0 E0 E3 (lowercase letters start at 0xD5, so "e" is 0xD9, not the uppercase 0xBF). Every byte is wrong.
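To make the mapping concrete, here is a minimal sketch of an encoder covering only letters and the space. It assumes the uppercase and lowercase alphabets are contiguous in the charmap, starting at the table's values for A (0xBB) and a (0xD5). The names `fr_encode_char` and `fr_encode` are hypothetical and are not part of the decompilation project, where the real conversion happens at build time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FR_SPACE 0x00 /* space in the FireRed charmap */
#define FR_EOS   0xFF /* FireRed string terminator */

/* Map one ASCII character to its FireRed byte.
 * Assumption: A-Z and a-z are contiguous runs starting at 0xBB and 0xD5.
 * Returns -1 for anything this sketch does not cover. */
static int fr_encode_char(char c)
{
    if (c >= 'A' && c <= 'Z') return 0xBB + (c - 'A');
    if (c >= 'a' && c <= 'z') return 0xD5 + (c - 'a');
    if (c == ' ') return FR_SPACE;
    return -1;
}

/* Encode a NUL-terminated ASCII string into out (capacity out_size)
 * and append the 0xFF terminator.
 * Returns the number of bytes written, terminator included,
 * or 0 if a character is unsupported or the buffer is too small. */
static size_t fr_encode(const char *ascii, uint8_t *out, size_t out_size)
{
    size_t n = 0;
    assert(ascii != NULL && out != NULL);
    for (; *ascii != '\0'; ascii++) {
        int b = fr_encode_char(*ascii);
        /* keep one byte free for the terminator */
        if (b < 0 || n + 1 >= out_size) return 0;
        out[n++] = (uint8_t)b;
    }
    if (n >= out_size) return 0;
    out[n++] = FR_EOS;
    return n;
}
```

Calling `fr_encode("A a", buf, sizeof buf)` on an 8-byte buffer would fill it with BB 00 D5 FF: the space becomes 0x00 and the string ends at 0xFF, which is exactly what trips up code that assumes ASCII.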
The _() Macro
The decompilation project has a solution: the _() macro. In the build pipeline:
source.c → cpp (C preprocessor) → preproc (custom tool) → cc1 (compiler) → as (assembler)
The custom preproc step intercepts _("text") and converts each character using charmap.txt. So _("Hello") becomes the correct byte sequence for FireRed’s encoding.
Here’s the catch: on macOS (for IDE support), _() is defined as a no-op:
#if defined(__APPLE__)
#define _(x) (x)
#endif
This makes syntax highlighting and autocomplete work. But it means that during development, _("Hello") looks like it produces a normal C string. It doesn’t. The build pipeline transforms it.
The Double Catch
You can’t use _() inline. This fails:
// WRONG — _() emits a brace-enclosed initializer, not a pointer
AddTextPrinterParameterized(windowId, FONT_NORMAL, _("Hello"), 0, 0, 0, NULL);
You must use static constants:
// CORRECT
static const u8 sText_Hello[] = _("Hello");
AddTextPrinterParameterized(windowId, FONT_NORMAL, sText_Hello, 0, 0, 0, NULL);
This is because _() expands to something like { 0xC2, 0xD9, 0xE0, 0xE0, 0xE3, 0xFF }: a brace initializer, not a string literal. It can initialize an array but can’t be used as an expression.
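The sketch below shows the array-versus-expression distinction in plain C, with the expansion written out by hand. It uses the table's values for A and E, so the byte list is an illustration of what the preprocessing step would produce, not its captured output. The `u8` typedef and `EOS` are redefined locally so the snippet compiles on its own, and `encoded_length` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8; /* local stand-in for the project's u8 type */

#define EOS 0xFF /* FireRed string terminator, defined locally for this sketch */

/* Roughly what _("AE") would become: a brace list is a valid
 * initializer for an array declaration. */
static const u8 sText_AE[] = { 0xBB, 0xBF, EOS };

/* The same braces are not an expression, so passing them straight
 * to a function would not compile:
 *     SomeFunction({ 0xBB, 0xBF, 0xFF });
 */

/* Count the bytes before the 0xFF terminator.
 * strlen() would be wrong here because 0x00 is the space character. */
static size_t encoded_length(const u8 *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != EOS)
        n++;
    return n;
}
```

Here `sizeof sText_AE` is 3 (two characters plus the terminator) and `encoded_length(sText_AE)` is 2.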
The Audit
Claude found raw ASCII strings (using (const u8 *)"text" casts) in 4 of the UI files. The gear UI had been written correctly from the start (it used lookup tables populated with _()-encoded strings), but the relic, altar, merchant, and chest UIs all had handwritten strings that bypassed encoding.
The fix was mechanical but tedious — replace every raw string with a static const u8 using _():
// Before (broken):
AddTextPrinterParameterized(win, font, (const u8 *)"EQUIPPED", x, y, 0, NULL);
// After (working):
static const u8 sText_Equipped[] = _("EQUIPPED");
AddTextPrinterParameterized(win, font, sText_Equipped, x, y, 0, NULL);
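An audit like this can also be partly automated. The following is a rough sketch of a line-based check for the raw-cast pattern, not the method actually used here. The function names are hypothetical, and the check only matches the two spellings `(const u8 *)"` and `(const u8*)"`, so other whitespace variants or multi-line casts would slip through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if the line casts a plain string literal to const u8 *. */
static bool has_raw_string_cast(const char *line)
{
    assert(line != NULL);
    return strstr(line, "(const u8 *)\"") != NULL
        || strstr(line, "(const u8*)\"") != NULL;
}

/* Count matching lines in an already opened source file.
 * Lines longer than the buffer are read in pieces, so a pattern
 * that straddles a piece boundary would be missed. */
static int count_raw_string_casts(FILE *fp)
{
    char line[1024];
    int hits = 0;
    assert(fp != NULL);
    while (fgets(line, sizeof line, fp) != NULL) {
        if (has_raw_string_cast(line))
            hits++;
    }
    return hits;
}
```

A non-zero count for a UI source file means at least one string is still bypassing the charmap conversion.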
Other Charmap Surprises
- Tilde (~) is not in the charmap → use hyphen (-)
- Left bracket ([) is not in the charmap → use parentheses
- RIGHT_ARROW is character 0x7C (not 0x72 as initially assumed)
- Ellipsis … IS in the charmap as 0xB0, a happy surprise
- The yen symbol ¥ is 0xB7; I adopted it as the shard currency symbol
- {A_BUTTON} and {B_BUTTON} are special escape sequences for button glyphs
The Lesson
GBA ROM hacking via decompilation looks like C, feels like C, but isn’t quite C. The build pipeline has invisible transformations. The character set is custom. The string terminator is 0xFF, not 0x00. You’re writing C that gets rewritten by a custom tool before the compiler ever sees it.
Once I understood this, garbled text never appeared again. But it took debugging 4 separate UI files to fully internalize the rule: every user-facing string must go through _().
By the Numbers
| Metric | Value |
|---|---|
| Commits | ~9 |
| Copilot requests | ~39 |
| Tool executions | ~780 |
| Sub-agents | 3 |
Next: 05 — Making It Beautiful