background

Recently, I’ve been working on a translation of a Japanese PlayStation Portable game called Nichijou: Uchuujin.

Much of the technical work needed for the translation of Nichijou: Uchuujin was done by other people! I’d like to thank FShadow, jjjewel, JamRules, and CompCom of GBATemp.net for the reverse-engineering (RE) work they’ve done on the game engine Uchuujin runs on. Thanks for the documentation!

In this series, I’ll detail some of the things I learned about the aforementioned game engine (and PSP ROM hacking in general) while translating the game. I’ll also go into some of the binary modifications I made to the game so that it could display English text.

game files

PSP games were originally distributed on a proprietary optical disc format known as the UMD (“Universal” Media Disc).

[image: a UMD disc]

Over time, Sony phased out UMDs in favor of distributing games through their online store. However, the games themselves were still distributed as disk images.

At the time of writing, no commercial UMD readers or writers are available. However, disk images can be dumped by rooting a PSP that has a UMD drive.

Here is the annotated structure of Uchuujin’s disk image when extracted.

dump/
├── PSP_GAME
│   ├── ICON0.PNG (game icon)
│   ├── INSDIR
│   │   ├── 0000.DNS 
│   │   └── ICON0.PNG (game icon)
│   ├── PARAM.SFO (metadata of the game)
│   ├── PIC1.PNG (preloader background)
│   ├── SYSDIR 
│   │   ├── BOOT.BIN (decrypted game binary)
│   │   ├── EBOOT.BIN (encrypted game binary)
│   │   ├── OPNSSMP.BIN
│   │   └── UPDATE (system updates)
│   │       ├── DATA.BIN
│   │       ├── EBOOT.BIN
│   │       └── PARAM.SFO
│   └── USRDIR
│       └── DATA
│           ├── ed.pmf (video; credits)
│           ├── ICON0.PNG (game icon)
│           ├── koutyou.pmf (video; cutscene)
│           ├── lt.bin (323 KB; font)
│           ├── op.pmf (video; opening)
│           ├── PIC1.PNG (game background)
│           ├── pr.bin (1 MB; assorted bitmaps)
│           ├── sc.cpk (2 MB; dialog/"scripts")
│           ├── se.acx (862 KB; ???)
│           ├── union.cpk (228 MB; music & pictures)
│           ├── vo.cpk (voices)
│           └── yokoku.pmf (video; cutscene)
└── UMD_DATA.BIN

The bulk of the game data lives in .CPK files, a proprietary archive format designed by CRIWARE. Although CRIWARE’s own tool for extracting and modifying .CPKs can be obtained through dubious means, an open-source alternative exists, albeit with reduced functionality.

Some cutscenes are stored as .PMF files, a proprietary video format developed for the PSP. VLC can “play” these out of the box, albeit without audio and with major glitching.
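Extensions aside, these containers can usually be told apart by their first four bytes. Below is a minimal Python sketch that sorts files by magic number. It assumes the commonly documented signatures (`CPK ` for CPK archives, `PSMF` for PMF movies), which are not stated above. `identify` and `identify_file` are hypothetical helper names:

```python
from pathlib import Path

# Assumed magic numbers, taken from public format notes rather than this post.
MAGICS = {
    b"CPK ": "CRIWARE CPK archive",
    b"PSMF": "PSP movie (PMF)",
}


def identify(header: bytes) -> str:
    """Guess a container type from the first bytes of a file."""
    for magic, name in MAGICS.items():
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: Path) -> str:
    """Read just enough of `path` to check its magic number."""
    with path.open("rb") as f:
        return identify(f.read(4))


if __name__ == "__main__":
    import sys

    # e.g. python identify.py dump/PSP_GAME/USRDIR/DATA/*
    for arg in sys.argv[1:]:
        print(f"{arg}: {identify_file(Path(arg))}")
```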

text encoding

One issue I ran into while translating Uchuujin was that the English translations were often longer than the original Japanese sentences.

(Surprisingly, a paper published by the DDL at the Université de Lyon suggests that Japanese has a lower information density and a higher syllabic rate than English, which would lead you to believe that the Japanese text would be longer than the English. My guess is that the dataset used in the paper is very different from the Japanese found in-game.)

This was a problem, as the game seemed to have a hard size limit for script files: it refused to load modified script files that were larger than the originals.
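Given that limit, it helps to check each rebuilt script against the file it replaces before repacking. Below is a minimal Python sketch. `size_overflow` and `check_script` are hypothetical names, and the only rule they encode is the one observed above: a modified script must not be larger than the original.

```python
from pathlib import Path


def size_overflow(original: bytes, modified: bytes) -> int:
    """Number of bytes by which `modified` exceeds `original` (0 means it fits)."""
    return max(len(modified) - len(original), 0)


def check_script(original_path: Path, modified_path: Path) -> None:
    """Raise if a rebuilt script file is larger than the one it replaces."""
    over = size_overflow(original_path.read_bytes(), modified_path.read_bytes())
    if over:
        raise ValueError(f"{modified_path} is {over} bytes too large")
```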

I had the following options:

  1. Shorten the English translations.
  2. Investigate why the game wasn’t accepting larger script files.
  3. Make the script files smaller.

I left No. 1 as a last resort.

No. 2 pointed to an underlying problem, either in how CRIWARE’s middleware loads CPKs or in how the tool I was using repackages modified CPKs. Solving it would have meant a lot of digging around for information, and possibly digging through a lot of middleware code.

I decided to go with No. 3 and look for ways to make the script files smaller.

The text was encoded in a variant of Shift-JIS, a Japanese character encoding that uses double-byte characters to represent the thousands of kanji used in Japanese.
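To make the byte cost concrete, here is a small Python example. It uses Python's standard `shift_jis` codec as a stand-in for the game's variant, which may differ in details:

```python
# Python's built-in "shift_jis" codec approximates the game's Shift-JIS variant.
japanese = "日常"      # "Nichijou" written in two kanji
english = "Nichijou"

sjis = japanese.encode("shift_jis")
ascii_ = english.encode("ascii")

print(len(japanese), "kanji ->", len(sjis), "bytes")     # 2 kanji -> 4 bytes
print(len(english), "letters ->", len(ascii_), "bytes")  # 8 letters -> 8 bytes

# Each double-byte character starts with a lead byte in 0x81-0x9F or
# 0xE0-0xFC, which is how a decoder tells it apart from a single ASCII byte.
assert all(0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC for b in sjis[0::2])
```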

However, since the translation would display English text, we didn’t need two bytes per character! I could encode my English text using one byte per character and pack two characters into each double-byte slot. That brought the script files back to roughly the same size as the originals.
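The following is a minimal Python sketch of that packing. It rests on two assumptions:

* The engine reads each two-byte slot as a little-endian 16-bit value, since the PSP's CPU is little-endian. Under this assumption, the first character ends up in the low byte, which is the byte the patched print routine in this post prints first.
* Odd-length lines are padded with a space.

`pack_english` and `slots` are hypothetical names.

```python
import struct


def pack_english(text: str, pad: bytes = b" ") -> bytes:
    """Encode English text one byte per character, padded to whole 2-byte slots."""
    data = text.encode("ascii")
    if len(data) % 2:
        data += pad  # assumption: the engine still consumes text two bytes at a time
    return data


def slots(data: bytes) -> list[int]:
    """View packed text as 16-bit values, assuming little-endian reads."""
    return [value for (value,) in struct.iter_unpack("<H", data)]


line = "Hello!!"
packed = pack_english(line)
print(len(line) * 2, "bytes at one character per slot")
print(len(packed), "bytes packed")
for value in slots(packed):
    print(f"0x{value:04X} -> {chr(value & 0xFF)!r} then {chr(value >> 8)!r}")
```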

This meant that the game binary had to be modified to draw each “double-byte character” as two single-byte characters.

To do this, I used PPSSPP’s debugger to identify the routine responsible for printing each “double-byte” character. I then patched that routine in the ROM to detour into my own code. From there, my code calculates the positions at which the two English characters should be printed and calls the game’s original print function once for each. After cleaning up the registers it used, it jumps back into the game logic.
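The arithmetic of the detour is easier to follow outside of assembly. Here is a Python model of it, not code that runs on the PSP. `expand` and `CHAR_WIDTH` are made-up names. The width of 5 and the column-based offset come from the assembly below.

```python
CHAR_WIDTH = 5  # the character width the patch loads into s3


def expand(value: int, original_x: int, column: int) -> list[tuple[str, int]]:
    """Split one two-byte slot into two glyphs and compute their X-positions.

    Mirrors the patch: the first glyph is drawn at
    original_x + CHAR_WIDTH * column and the second one CHAR_WIDTH further right.
    """
    first_x = original_x + CHAR_WIDTH * column
    low, high = value & 0xFF, (value >> 8) & 0xFF
    return [(chr(low), first_x), (chr(high), first_x + CHAR_WIDTH)]


print(expand(0x6548, 13, 2))  # [('H', 23), ('e', 28)]
```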

Below is the patch in MIPS assembly, annotated for reading.

; the new instructions in the ROM
; (located at 0x0884e824 in PSP memory)
addr_0884e824:
    ; we save the return address in register v0
    ; v0 always contains zeros at this point during execution,
    ; so we're not losing critical information
    move v0, ra 
    jal expand_and_print ; jump to our routine (overwrites register ra)
    nop

; the original instructions from the ROM
; (replaced by instructions above)
addr_0884e824_original:
    move a3, s3 
    jalr t1
    li t0, 0x200

...

; our expand & print routine
; (located somewhere else in PSP memory)
expand_and_print:
    move a3, s3         ; run the first instruction we overwrote in the original code
    addiu sp, sp, -0x40 ; allocate 64 bytes of stack

    ; save all the registers needed to run the original print routine
    ; the routine will trash these registers, but we can then
    ; reinitialize them with the values we save here in the stack,
    ; allowing us to run the print routine multiple times
    sw a0, 0x8(sp)
    sw a1, 0xC(sp)
    sw a2, 0x10(sp)
    sw a3, 0x14(sp)
    sw t0, 0x18(sp)
    sw t1, 0x1C(sp)
    sw t2, 0x20(sp)
    sw t3, 0x24(sp)
    sw t4, 0x28(sp)
    sw t5, 0x2C(sp)
    sw t6, 0x30(sp)

    sw v0, 0x38(sp) ; save the return address from the original routine
    sw ra, 0x3C(sp) ; save our return address

    ; calculate position of the first character we want to print & put it into s4 

    ;; REGISTER CONTENTS AT THIS POINT ;;
    ; a2 is the original X-position of the Japanese character (e.g. 13, 28, 33, 38)
    ; s4 is the value we will read into a2 right before running the original
    ;   print function
    ; 0x0(s6) contains the "column" number that the original Japanese character
    ;   would be in (e.g. 0,1,2,3)

    lb t7, 0x0(s6)  ; load column number into t7
    li s3, 0x5      ; load the number 5 (the width of a character) into s3
    mult s3, t7     ; multiply s3 * t7 (result goes to the hi/lo registers)
    mflo s4         ; store the low word of the product in s4
    addu s4, a2, s4 ; add the original X-position to s4

    ; print the first character
    lw v0, 0x8(sp)                  ; load word that represents the characters into v0
    andi v0, v0, 0xFF               ; grab the low byte (our first character)
    jal call_original_print_routine ; call routine
    nop

    ; print the second character
    lw v0, 0x8(sp)                  ; load word that represents the characters into v0
    srl v0, v0, 8                   ; shift the high byte (our second character) down
    andi v0, v0, 0xFF               ; mask off anything above it
    addiu s4, s4, 0x5               ; move the position one character width to the right
    jal call_original_print_routine ; call routine
    nop

    ; clean up the registers we used & jump back to the original code
    lw ra, 0x38(sp)    ; load the original routine's return address
    lw t7, 0x3C(sp)    ; load our return address
    li s2, 0x4         ; restore original value of s2
    li s4, 0x1         ; restore original value of s4
    li s3, 0x2         ; restore original value of s3
    jr t7              ; return
    addiu sp, sp, 0x40 ; (delay slot) free the 64 bytes of stack we allocated

; our subroutine that calls the original print routine
; (located somewhere else in PSP memory)
;
; registers of interest for the original print routine:
; a0: the character to print
; a2: the horizontal position of the character to print
;
call_original_print_routine:
    ; save our return address in s2
    ; the original print routine does not trash any of the s* registers
    move s2, ra 
    
    ; set up the registers needed for the print routine to run
    lw a0, 0x8(sp)
    lw a1, 0xC(sp)
    move a2, s4         ; load our calculated position instead of the original
    lw a3, 0x14(sp)
    lw t0, 0x18(sp)
    lw t1, 0x1C(sp)
    lw t2, 0x20(sp)
    lw t3, 0x24(sp)
    lw t4, 0x28(sp)
    lw t5, 0x2C(sp)
    lw t6, 0x30(sp)

    move a0, v0         ; replace a0 with the single character to print

    ; run original routine
    jalr t1
    li t0, 0x200        ; (delay slot) runs before the jump, as in the original code

    ; return 
    move ra, s2 
    jr ra
    nop
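The hook ultimately has to be written into the game binary as machine code. The Python sketch below encodes the three replacement instructions at 0x0884e824. A few caveats:

* The address of `expand_and_print` isn't given in this post, so it's a parameter here. The value used in the example is made up.
* Mapping a memory address back to a file offset in the binary is left out.
* `encode_jal` and `hook_bytes` are hypothetical names.
* The `move` encoding shown is one common assembler expansion (`addu v0, ra, zero`).

```python
import struct

HOOK_ADDR = 0x0884E824   # address of the patched instructions (from the listing)
MOVE_V0_RA = 0x03E01021  # addu v0, ra, zero: a common encoding of `move v0, ra`
NOP = 0x00000000         # sll zero, zero, 0


def encode_jal(pc: int, target: int) -> int:
    """Encode a MIPS `jal target` instruction located at address `pc`."""
    if target % 4:
        raise ValueError("jal target must be word-aligned")
    # jal keeps the top 4 bits of the delay slot's address, so the target
    # has to sit in the same 256 MB region
    if ((pc + 4) & 0xF0000000) != (target & 0xF0000000):
        raise ValueError("jal target is out of range")
    return (0x03 << 26) | ((target >> 2) & 0x03FFFFFF)


def hook_bytes(routine_addr: int) -> bytes:
    """Return the three replacement instructions as little-endian bytes."""
    words = [
        MOVE_V0_RA,                               # move v0, ra
        encode_jal(HOOK_ADDR + 4, routine_addr),  # jal expand_and_print
        NOP,                                      # nop (delay slot)
    ]
    return struct.pack("<3I", *words)


# Example with a made-up routine address:
print(hook_bytes(0x08A00000).hex(" "))  # -> 21 10 e0 03 00 00 28 0e 00 00 00 00
```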

In the next article, I’ll explain how images and textures are stored in the game and how I extracted them from the game files!