Recently, I’ve been working on a translation of a Japanese PlayStation Portable game called Nichijou: Uchuujin.
Much of the technical work needed for the translation of Nichijou: Uchuujin was done by other people! I’d like to thank FShadow, jjjewel, JamRules, and CompCom of GBATemp.net for the RE work they’ve done on the game engine Uchuujin runs on. Thanks for the documentation!
In this series, I’ll be detailing some of the things I learned about the aforementioned game engine (and PSP ROM hacking in general) during the translation of the game. I’ll also be going into some of the binary modifications I made to the game in order to facilitate English text display.
PSP games were originally packaged in a proprietary, disc-based physical media format known as a UMD (“Universal” Media Disc). Over time, Sony phased out UMDs in favor of distributing games through their online store. However, the games themselves were still distributed as disk images.
At the time of writing, no commercial UMD readers or writers are available. However, disk images can be obtained by rooting a PSP that has a UMD drive.
Here is the annotated structure of Uchuujin’s disk image when extracted.
dump/
├── PSP_GAME
│   ├── ICON0.PNG (game icon)
│   ├── INSDIR
│   │   ├── 0000.DNS
│   │   └── ICON0.PNG (game icon)
│   ├── PARAM.SFO (metadata of the game)
│   ├── PIC1.PNG (preloader background)
│   ├── SYSDIR
│   │   ├── BOOT.BIN (decrypted game binary)
│   │   ├── EBOOT.BIN (encrypted game binary)
│   │   ├── OPNSSMP.BIN
│   │   └── UPDATE (system updates)
│   │       ├── DATA.BIN
│   │       ├── EBOOT.BIN
│   │       └── PARAM.SFO
│   └── USRDIR
│       └── DATA
│           ├── ed.pmf (video; credits)
│           ├── ICON0.PNG (game icon)
│           ├── koutyou.pmf (video; cutscene)
│           ├── lt.bin (323kb; font)
│           ├── op.pmf (video; opening)
│           ├── PIC1.PNG (game background)
│           ├── pr.bin (1mb; assorted bitmaps)
│           ├── sc.cpk (2mb; dialog/"scripts")
│           ├── se.acx (862kb; ???)
│           ├── union.cpk (228mb; music & pictures)
│           ├── vo.cpk (voices)
│           └── yokoku.pmf (video; cutscene)
└── UMD_DATA.BIN
The bulk of the game data exists as .CPK files. This is a proprietary archive format designed by CRIWARE. Although one can obtain CRIWARE’s own tool for extracting and modifying .CPKs through dubious means, an open source alternative exists, albeit with reduced functionality.
Some cutscenes are stored as .PMF files. This is a proprietary video format developed for the PSP. VLC can “play” these out of the box, albeit without audio and with major glitching.
One issue I encountered while translating Uchuujin was that the English translations often took up more space than the Japanese sentences. (Surprisingly, a paper published by the DDL at the Université de Lyon suggests that Japanese has a lower information density and higher syllabic rate than English does, which would lead you to believe the Japanese would be longer than the English. My guess is that the dataset used by the paper is very different from the Japanese found in-game.)
EDIT 12/28/2020: (this is, of course, due to the fact that Japanese uses kanji, which encode lots of semantic meaning into one character :) )
This was a problem, as the game seemed to have a hard size limit for script files. The game would refuse to load modified script files that were larger than the originals.
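A quick pre-flight check for this limit can be sketched in Python. This is only an illustration: the function name, the file names, and the idea of holding script contents in dictionaries are my own assumptions, not part of the game or of any CPK tool.

```python
def oversized_scripts(originals: dict[str, bytes], modified: dict[str, bytes]) -> list[str]:
    """Return the names of modified scripts that are larger than their originals."""
    return sorted(
        name
        for name, data in modified.items()
        if name in originals and len(data) > len(originals[name])
    )


# Hypothetical script contents, purely for demonstration.
originals = {"s001.bin": b"\x00" * 8, "s002.bin": b"\x00" * 8}
modified = {"s001.bin": b"\x00" * 8, "s002.bin": b"\x00" * 9}

print(oversized_scripts(originals, modified))  # ['s002.bin']
```

Any script reported here would be rejected by the game as described above, so it has to shrink before repacking.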
I had the following options:
1. Shorten the English translations.
2. Investigate why the game wasn’t accepting larger script files.
3. Make the script files smaller.
I left No. 1 as a last resort.
No. 2 pointed to an underlying problem with either how CRIWARE’s middleware loads CPKs or how the tool I was using re-packaged modified CPKs. Investigating it would have meant a lot of digging around for information and possibly digging through a lot of middleware code.
I decided to go with No. 3 and investigated ways to make the script files smaller.
The text was encoded in a variant of Shift-JIS, a Japanese character encoding. Shift-JIS makes use of double-byte characters in order to represent the wide breadth of kanji available in Japanese.
However, since we were going to be displaying English text in the translation, we didn’t need two bytes to represent a character! I could encode my English text using one byte for each character, leading to size parity with the original script files.
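The size difference is easy to see with Python's built-in `shift_jis` codec. This is a rough illustration only: the game uses a variant of Shift-JIS, so the standard codec is a stand-in, and the sample strings are my own.

```python
japanese = "日常"  # two kanji
english = "Hi"     # two Latin letters

sjis_bytes = japanese.encode("shift_jis")
ascii_bytes = english.encode("ascii")

print(len(sjis_bytes))   # 4: two bytes per kanji
print(len(ascii_bytes))  # 2: one byte per letter
```

In other words, a slot that held one Japanese character has room for two single-byte English characters.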
This meant that the binary had to be modified in order to correctly draw each “double-byte character” as two single characters.
To do this, I identified the routine responsible for printing the “double-byte” character using PPSSPP’s debugger. I then modified the routine in the ROM to detour into my own code. From there, the code calculates the positions we should print the two English characters at and calls the appropriate function to draw them. We then jump back into game logic after having cleaned up our registers.
Below is the routine in assembly form, annotated for reading.
; the new instructions in the ROM
; (located at 0x0884e824 in PSP memory)
addr_0884e824:
    ; we save the return address in register v0
    ; v0 always contains zeros at this point during execution,
    ; so we're not losing critical information
    move v0, ra
    jal expand_and_print    ; jump to our routine (overwrites register ra)
    nop

; the original instructions from the ROM
; (replaced by instructions above)
addr_0884e824_original:
    move a3, s3
    jalr t1
    li t0, 0x200

...

; our expand & print routine
; (located somewhere? in PSP memory)
expand_and_print:
    move a3, s3             ; run an instruction we overwrote...
    addiu sp, sp, -0x40     ; allocate 64 bytes of stack

    ; save all registers needed to run the original print routine
    ; the routine will trash these registers, but we can then
    ; reinitialize them with the values we save here in the stack,
    ; allowing us to run the print routine multiple times
    sw a0, 0x8(sp)
    sw a1, 0xC(sp)
    sw a2, 0x10(sp)
    sw a3, 0x14(sp)
    sw t0, 0x18(sp)
    sw t1, 0x1C(sp)
    sw t2, 0x20(sp)
    sw t3, 0x24(sp)
    sw t4, 0x28(sp)
    sw t5, 0x2C(sp)
    sw t6, 0x30(sp)
    sw v0, 0x38(sp)         ; save the return address from the original routine
    sw ra, 0x3C(sp)         ; save our return address

    ; calculate position of the first character we want to print & put it into s4
    ;; REGISTER CONTENTS AT THIS POINT ;;
    ; a2 is the original X-position of the Japanese character (e.g. 13, 28, 33, 38)
    ; s4 is the value we will read into a2 right before running the original
    ;   print function
    ; 0x0(s6) contains the "column" number that the original Japanese character
    ;   would be in (e.g. 0,1,2,3)
    lb t7, 0x0(s6)          ; load column number into t7
    li s3, 0x5              ; load the number 5 (the width of a character) into s3
    mult s3, t7             ; multiply t7 * s3
    mflo s4                 ; store in s4
    add s4, a2, s4          ; add the original X-position to s4

    ; print the first character
    lw v0, 0x8(sp)          ; load word that represents the characters into v0
    andi v0, v0, 0xFF       ; grab LSB (our first character)
    jal call_original_print_routine ; call routine
    nop

    ; print the second character
    lw v0, 0x8(sp)          ; load word that represents the characters into v0
    srl v0, v0, 8           ; grab MSB (our second character)
    addi s4, s4, 0x5        ; shift over the position of the character by one character to the right
    jal call_original_print_routine ; call routine
    nop

    ; clean up the registers we used & jump back to the original code
    lw ra, 0x38(sp)         ; load the original routine's return address
    lw t7, 0x3C(sp)         ; load our return address
    li s2, 0x4              ; restore original value of s2
    li s4, 0x1              ; restore original value of s4
    li s3, 0x2              ; restore original value of s3
    jr t7                   ; return
    addiu sp, sp, 0x40      ; (delay slot) release the 64 bytes of stack allocated above

; our subroutine that calls the original print routine
; (located somewhere? in PSP memory)
;
; registers of interest for the original print routine:
;   a0: the character to print
;   a2: the horizontal position of the character to print
;
call_original_print_routine:
    ; save our return address in s2
    ; the original print routine does not trash any of the s* registers
    move s2, ra

    ; set up the registers needed for the print routine to run
    lw a0, 0x8(sp)
    lw a1, 0xC(sp)
    move a2, s4             ; load our calculated position instead of the original
    lw a3, 0x14(sp)
    lw t0, 0x18(sp)
    lw t1, 0x1C(sp)
    lw t2, 0x20(sp)
    lw t3, 0x24(sp)
    lw t4, 0x28(sp)
    lw t5, 0x2C(sp)
    lw t6, 0x30(sp)
    move a0, v0             ; load the character from v0

    ; run original routine
    jalr t1
    li t0, 0x200

    ; return
    move ra, s2
    jr ra
    nop
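For readers who don't read MIPS, the arithmetic in the routine can be modelled in a few lines of Python. This is a simplified sketch, not the code that runs on the PSP: `pack_pair` and `expand` are hypothetical names, the character value is assumed to fit in 16 bits, and how the game builds that value from the bytes in the script file is not covered here.

```python
CHAR_WIDTH = 5  # width of one character, as in `li s3, 0x5`


def pack_pair(first: str, second: str) -> int:
    """Pack two ASCII characters into one 16-bit value, first character in the low byte."""
    return ord(first) | (ord(second) << 8)


def expand(word: int, base_x: int, column: int) -> list[tuple[str, int]]:
    """Return (character, x-position) for both halves of a packed value."""
    x = base_x + CHAR_WIDTH * column  # mult / mflo / add
    first = word & 0xFF               # andi v0, v0, 0xFF
    second = (word >> 8) & 0xFF       # srl v0, v0, 8 (masked here; assumes a 16-bit value)
    return [(chr(first), x), (chr(second), x + CHAR_WIDTH)]


word = pack_pair("H", "i")
print(hex(word))                          # 0x6948
print(expand(word, base_x=13, column=2))  # [('H', 23), ('i', 28)]
```

The low byte is drawn first and the high byte is drawn one character width to its right, which mirrors the two calls to `call_original_print_routine` in the listing.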