PSXMinimise

So it was late one night and I couldn’t sleep, which is nothing abnormal for me, and I got thinking: is there any way I can improve the compression ratio of my collection of PS1 games, or is 7zip with LZMA the de facto best option? The truth, it turns out, is a little more complicated.

Updated 2021-07-15: Article finished, downloads and final conclusions added!

Finding The ‘Best’ Compression

I’ve tried numerous compression formats in the search for the best one for PlayStation games. I needed a test game to find the best archive solution for the game data. I would tackle the audio content of the discs later. I grabbed my backup copy of Oddworld: Abe’s Oddysee (one of my favourite games) and tried all the regular formats that PeaZip could handle in their “Best” modes.

| Compression Format | Options | Compression Time | Resulting Filesize |
|---|---|---|---|
| None | | | 685,461 KB |
| 7Zip | Ultra | 2 minutes | 403,753 KB |
| ARC (FreeArc) | 9 | 4 minutes | 289,820 KB |
| Brotli | 9 | 35 minutes | 417,106 KB |
| BZip2 | Ultra | 16 minutes | 512,282 KB |
| GZip | Ultra | 7 minutes | 514,581 KB |
| XZ | Ultra | 3 minutes | 403,755 KB |
| ZIP | Ultra | 6 minutes | 514,582 KB |
| Zstd | 19 | 3 minutes | 447,022 KB |
| ZPAQ | Ultra | 34 minutes | 409,185 KB |

Until I saw the FreeArc result, it seemed that 7zip’s Ultra compression was the best option for PS1 data. At first, I thought there’d been some kind of error and that the archive couldn’t really be that small. However, after some checking I confirmed that the result was accurate: FreeArc does give an excellent compression ratio on PlayStation 1 game data.

Error Correction Data

A few years ago, I used to visit the EmuParadise website, and I recall that they used to compress their bin files with a tool called ECM. Armed with this vague piece of information, I started searching the web for more about it. I managed to find the ECM Tools package (on archive.org), written in 2002 by Neill Corlett, which is used to encode and decode files. ECM stands for “Error Code Modeler”: the tools strip out, or put back, the error detection and correction codes (EDC/ECC) stored in the sectors of a CD image. Strictly speaking this is not sub-channel data, which is stored separately from the sectors. Because these codes can be recalculated from the rest of each sector, removing them saves some space in an image file, and they can be added back in such a way that the result is an identical binary.

| Filename | Original Size | Size (after ECM Encoding) |
|---|---|---|
| Destruction Derby 2 (Europe) (Track 01).bin | 71,603 KB | 64,141 KB |
| Dino Crisis (Europe) (Track 1).bin | 378,702 KB | 340,398 KB |
| Driver (Europe).bin | 729,961 KB | 651,308 KB |
| Grand Theft Auto (Europe) (Track 01).bin | 91,673 KB | 81,870 KB |
| MediEvil (Europe) (Track 1).bin | 516,356 KB | 478,157 KB |
| Oddworld – Abe’s Oddysee (Europe).bin | 685,461 KB | 612,203 KB |
| Tomb Raider II (Europe) (Track 01).bin | 224,530 KB | 218,150 KB |

So it seems well worth processing the ECM data: every game I’ve tested so far shows some saving, from about 3% for Tomb Raider II up to nearly 11% for Driver.
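To put numbers on those savings, here is a small Python sketch that works out the percentage saved for each game. The sizes are copied from the table above; the function and variable names are only illustrative and are not part of PSXMinimise.

```python
# Sizes in KB, copied from the ECM table: (original, after ECM encoding).
ECM_RESULTS = {
    "Destruction Derby 2 (Track 01)": (71_603, 64_141),
    "Dino Crisis (Track 1)": (378_702, 340_398),
    "Driver": (729_961, 651_308),
    "Grand Theft Auto (Track 01)": (91_673, 81_870),
    "MediEvil (Track 1)": (516_356, 478_157),
    "Oddworld - Abe's Oddysee": (685_461, 612_203),
    "Tomb Raider II (Track 01)": (224_530, 218_150),
}


def saving_percent(original_kb, encoded_kb):
    """Return the space saved by ECM encoding as a percentage of the original."""
    return 100.0 * (original_kb - encoded_kb) / original_kb


def report(results):
    """Print one line per game, smallest saving first."""
    ranked = sorted(results.items(), key=lambda item: saving_percent(*item[1]))
    for name, (original, encoded) in ranked:
        print(f"{name}: {saving_percent(original, encoded):.1f} % saved")


report(ECM_RESULTS)
```

Most of the games land at around a tenth of the file saved, with Tomb Raider II as the outlier at the low end.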

Audio Compression

To compress the audio portions of the PS1 games, I decided to use FLAC, the Free Lossless Audio Codec. I chose it because it is lossless: a file decoded from FLAC is exactly the same as the original that was passed in. In a typical PS1 game, Track 01.bin is usually the game data, and any other tracks (Track 02.bin, etc.) are CDDA audio, which FLAC can compress if you tell it how to interpret the raw files.

flac -8 -V --force-raw-format --endian=little --channels=2 --bps=16 --sample-rate=44100 --sign=signed *.bin

To explain these options: -8 sets the maximum level of FLAC compression, and -V makes FLAC verify the files it has written after they have been created. The --force-raw-format argument instructs the command line version of FLAC to treat the input files as raw data. The --endian=little setting tells FLAC the byte order of the files, --channels=2 indicates that the audio stream is stereo, --bps=16 says that the samples are 16-bit, --sample-rate=44100 is the standard audio sampling rate for a compact disc, and finally --sign=signed tells FLAC that the samples are signed. I am indebted to Cacovsky for the vital information on how raw CD audio can be converted to FLAC.
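As a rough sketch of how that command could be driven from a script, the Python below builds the same argument list and, as an assumption on my part, a matching decode command. The decode side is not taken from the PSXMinimise scripts: it relies on FLAC needing --force-raw-format, --endian and --sign again when writing raw output. All function names here are made up for illustration.

```python
import shutil
import subprocess

# The raw-format flags from the command above, describing CD audio.
RAW_CD_AUDIO = [
    "--force-raw-format",
    "--endian=little",
    "--channels=2",
    "--bps=16",
    "--sample-rate=44100",
    "--sign=signed",
]


def flac_encode_command(track_paths):
    """Build the encode command shown above for a list of raw audio tracks."""
    return ["flac", "-8", "-V", *RAW_CD_AUDIO, *track_paths]


def flac_decode_command(flac_path, output_path):
    """Build a decode command that writes raw samples back out (assumed usage)."""
    return [
        "flac", "-d",
        "--force-raw-format", "--endian=little", "--sign=signed",
        "-o", output_path,
        flac_path,
    ]


def run(command):
    """Run a command, failing early if the executable is not on the PATH."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} was not found on the PATH")
    subprocess.run(command, check=True)
```

For example, run(flac_encode_command(["Track 02.bin"])) would encode a single audio track, leaving the data track alone.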

Putting It All Together

I had a few design goals for anything I created. It needed to rely mostly on open tools, so that the techniques are operating-system agnostic and could fairly easily be ported to another system, and it should use nothing with encumbering license terms.

Compression speed isn’t much of a factor, as these files will really only be compressed once; however, decompression shouldn’t be prohibitively slow. The methods used needed to be lossless, so that the games are byte-for-byte the same after decompressing as they were beforehand, and I also wanted a way to verify that the files were indeed exactly the same as the originals.

From the compression results earlier, FreeArc was the obvious choice: it gave a much better compression ratio than the other archive formats (289,820 KB against 403,753 KB for 7zip on the test game) and it didn’t take significantly longer to compress the data.

The overall method I came up with is to remove the ECM data from the first track, compress the audio data in the other tracks with FLAC to reduce its file size, and then package the entire game up in a .arc file.
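The method can be sketched as a small planning function in Python. This is not the actual batch script: it assumes file names carry a “(Track NN)” marker like the ones in the ECM table, treats the lowest-numbered track as the data track, and the step labels are invented purely for illustration.

```python
import re

# Matches the "(Track 01)" / "(Track 1)" style used in the file names above.
TRACK_RE = re.compile(r"\(Track (\d+)\)", re.IGNORECASE)


def track_number(filename):
    """Return the track number in the name, or 1 for a single-file image."""
    match = TRACK_RE.search(filename)
    return int(match.group(1)) if match else 1


def plan_steps(bin_files, archive_name):
    """List the processing steps for one game, in the order they would run."""
    ordered = sorted(bin_files, key=track_number)
    steps = []
    for index, name in enumerate(ordered):
        # First track is assumed to be game data; the rest are CD audio.
        step = "ecm-encode" if index == 0 else "flac-encode"
        steps.append((step, name))
    steps.append(("freearc-pack", archive_name + ".arc"))
    return steps


for step in plan_steps(
    ["Game (Track 02).bin", "Game (Track 01).bin"], "Game"
):
    print(step)
```

Sorting by the extracted number rather than by name keeps “Track 10” after “Track 2” even when the numbers are not zero-padded.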

Before doing this, however, I needed a way to make sure that the files produced by the decompression process would be the same as the originals. On Linux there is a utility called sha256sum, which calculates a hash of each file; the hashes can be saved to a list and checked against the files later to ensure that nothing has changed. Windows has a built-in hash tool, but it doesn’t seem to be able to check files against a list of checksums stored in a single file. Here I am indebted to Dave Benham, who maintains HASHSUM.BAT at the DOSTips Forum: a script for Windows that is compatible with sha256sum and works (mostly) in batch script.
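For anyone who wants the same check without sha256sum or HASHSUM.BAT, here is a minimal Python sketch of the idea. It writes one “hash *filename” line per .bin file, which is the line format sha256sum uses for binary mode, and later re-hashes the files to compare. The manifest name checksums.sha256 and the function names are my own choices, not something the scripts use.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large disc images don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder, manifest_name="checksums.sha256"):
    """Record a hash for every .bin file in the folder before compressing."""
    folder = Path(folder)
    lines = [f"{sha256_of(p)} *{p.name}" for p in sorted(folder.glob("*.bin"))]
    text = "".join(line + "\n" for line in lines)
    (folder / manifest_name).write_text(text, encoding="utf-8")
    return lines


def verify_manifest(folder, manifest_name="checksums.sha256"):
    """Return the names of files that are missing or no longer match."""
    folder = Path(folder)
    failures = []
    manifest = (folder / manifest_name).read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        # 64 hex digits, a space, a mode character, then the file name.
        expected, name = line[:64], line[66:]
        target = folder / name
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(name)
    return failures
```

Calling write_manifest on the game folder before compression and verify_manifest after decompression gives an empty list when every track has come back byte-for-byte identical.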

The Scripts

So after a few sleepless nights, I’ve put together some utilities for automatically compressing and decompressing the games.

My compression script is a bit uglier than the unpacker, since I imagine people will unpack the games more often than they will pack them.

Further Testing

| Game | Original Filesize | 7zip Filesize | 7z Percentage | PSXMinimise | PSXMinimise Percentage |
|---|---|---|---|---|---|
| Destruction Derby 2 | 620 MB | 533 MB | 85 % | 393 MB | 63 % |
| Dino Crisis | 405 MB | 202 MB | 49 % | 170 MB | 41 % |
| Driver | 712 MB | 357 MB | 50 % | 298 MB | 41 % |
| Grand Theft Auto | 714 MB | 612 MB | 85 % | 452 MB | 63 % |
| MediEvil | 535 MB | 322 MB | 60 % | 283 MB | 52 % |
| Oddworld: Abe’s Oddysee | 669 MB | 387 MB | 57 % | 241 MB | 36 % |
| Tomb Raider II | 708 MB | 460 MB | 64 % | 317 MB | 44 % |
| WipEout 2097 | 681 MB | 566 MB | 83 % | 413 MB | 60 % |
| Average | 630.5 MB | 429 MB | 67.22 % | 320 MB | 50.60 % |
| Average Saving | | 200 MB | 32.77 % | 309 MB | 49.40 % |

Downloads

After a little while of having “writer’s block”, I have finally released a download of my scripts for working with PS1 files. These are batch scripts written for Windows, bundled with the executables they require; however, the techniques employed here can be used on Linux-based operating systems too.

You can compile ECM Tools with GCC. One problem you might have is that FreeArc only seems to run on 32-bit builds of Linux, and it needs a library that doesn’t seem to exist for newer operating systems; however, you can link to a newer version of that library and it seems to run okay. If anyone has any good ideas for working around these problems, I could then package up a build that works on Linux with a shell script.

PSXMinimise-0.52.zip 1.6M

This package includes everything you need to compress and decompress PS1 games on Windows, including ECM Tools, FLAC, FreeArc and HASHSUM.BAT.

SCED-00816 – Demo One (Version 5) (Europe).arc 227M

This is the original demo CD that I got with my PlayStation back in 1997, and I include it as an example of the power of PSXMinimise. I’ve included a demo as there aren’t any commercial games that I could provide which are free of copyright, and I haven’t found a PS1 homebrew that is big enough to demonstrate PSXMinimise.

Usage Instructions

Decompression:
Using the files is relatively easy if you just want to uncompress one file. Simply place the downloaded .arc file into a separate folder, navigate to it, extract the contents of the PSXMinimise package into that folder, then run the psxminimise-decompress.bat file.

Compression:
If you want to compress a game, place all of its individual .bin files into a folder named after the game (the folder name will become the name of the archive). If the game has multiple tracks, each file name should have the track number after the game name. Then run the psxminimise-compress.bat file.

Advanced Usage:
If you want to work with multiple files, or use the decompress-all or compress-all tools, you need to add the PSXMinimise folder to your PATH variable. A script is included to add the current directory to your user PATH. To enable verbose output, add the number 1 after the command; the scripts will then output all of their information as they run. The scripts will not change any colours if run in this mode.

Using psxminimise-compress-all.bat will go into each subfolder and run psxminimise-compress. Running psxminimise-decompress-all.bat will extract all archives to individual folders.

Conclusions

Take the SCED-00816 “Demo One” PlayStation disc (from above), which is 498,163 K uncompressed. Compressing it with 7z results in a size of 304,084 K (about 61% of the original), and with my PSXMinimise method it’s further reduced to 232,940 K (about 47%).

You could get smaller game files if you removed some of the game’s content (a process known as “ripping”), using something like PocketISO to remove audio tracks and FMV sequences; however, I wanted the games to be identical to their original release, and I think this has done quite a good job.

In total, running PSXMinimise on a dataset of 1,492 games improved the compression of the entire folder from 393 GB in 7z format to 319 GB, a saving of over 70 GB on files already compressed with the “de facto best” archive format, with an overall compression ratio of around 50%. Very nice.