The Hunt for the Lost Compiler: Descend

Last time, we confirmed that the app was compiled on Windows, which is an advantage and a disadvantage at the same time. It may be as easy as downloading some `MinGW` Windows builds and checking them one by one, or as hard as searching for a zip file lost in time.

It is the second one.

Package Manager Ghost

It turns out my friend, the original creator of 3DWalker, didn’t just use a vanilla MinGW installer. He used something called Win-builds. It was a project that aimed to be a package manager for Windows, you know, easy installation of GCC, libraries, the works. Awesome tool for developers.

It’s dead. Like, completely frickin’ dead.

installer window that looks like it belongs on Windows XP

Gives me nostalgia for software I have never used, although Windows 7 was fire!

This distribution had its own specific build of GCC. If I want a byte-accurate replica (and I do, because I have problems), I need this specific compiler. I need the exact binaries provided by this specific defunct project.

So, I went to the archives. I found the repository. I tried to clone it.
And as one might expect, the git repository in the archive is broken.

Parsing HTML

This is the part where a sane person quits. The repo data is corrupted, meaning I can’t just git clone the build scripts I need to reconstruct the environment.
However, the archive does contain a mirror of the cgit web interface. You know, those old-school git web viewers that render commit logs and diffs as HTML pages?

The data is there. It’s just… in HTML.

So, I decided to scrape the HTML to reconstruct the git history. I wrote a script to parse the cgit pages, extract the commit metadata, parse the diffs, and apply them to recreate the files.
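For illustration, here is a minimal sketch of what the diff-extraction step could look like. It is not the actual scraper. It assumes that cgit wraps each diff line in a `<div>` whose class is one of `head`, `hunk`, `ctx`, `add` or `del`, and that the file header uses `<br/>` as a line separator; the names `CgitDiffParser` and `extract_diff` are made up for this example.

```python
from html.parser import HTMLParser

# Assumption: these are the div classes cgit uses for diff lines.
# A customised theme may use different names.
DIFF_CLASSES = {"head", "hunk", "ctx", "add", "del"}


class CgitDiffParser(HTMLParser):
    """Collect the text of every <div> whose class marks a diff line."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []       # finished diff lines, in page order
        self._buffer = None   # text pieces of the div being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("class") in DIFF_CLASSES:
            self._buffer = []
        elif tag == "br" and self._buffer is not None:
            # Assumed: the header div separates its lines with <br/>.
            self._buffer.append("\n")

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self._buffer is not None:
            self.lines.append("".join(self._buffer))
            self._buffer = None


def extract_diff(page_html):
    """Return the plain-text diff embedded in one commit page."""
    parser = CgitDiffParser()
    parser.feed(page_html)
    parser.close()
    return "\n".join(parser.lines) + "\n"
```

Links inside the header are flattened to their text and HTML entities such as `&lt;` are decoded, so the result reads like ordinary `git diff` output.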

I looked at the code I wrote to achieve this. It consists entirely of string splitting, regex hacks, and desperation. At one point, I just stared at the monitor and typed to my friends the only accurate description of what I was looking at:

“gruz.”

(Translation for the non-Polish friends: “what rubble/garbage.”)

It’s a classic case of a piece of software that barely functions, but it did the one thing I needed: it started spitting out files.
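The "apply them to recreate the files" step can be sketched roughly like this. It is a simplified illustration under stated assumptions, not the code I actually ran: it handles one file per diff, ignores renames and binary patches, and always ends the output with a newline. The function name `apply_unified_diff` is hypothetical.

```python
import re

# Matches a hunk header such as "@@ -12,4 +12,6 @@"; only the old-file
# range is needed to know where the hunk starts.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+\d+(?:,\d+)? @@")


def apply_unified_diff(old_text, diff_text):
    """Apply a single-file unified diff to old_text and return the new text."""
    old = old_text.splitlines()
    new = []
    pos = 0           # index of the next unconsumed line in `old`
    in_hunk = False
    for line in diff_text.splitlines():
        match = HUNK_RE.match(line)
        if match:
            start = int(match.group(1))
            count = 1 if match.group(2) is None else int(match.group(2))
            # Line numbers are 1-based; a zero-length range names the line
            # before the insertion point, so it needs no adjustment.
            target = start - 1 if count else start
            new.extend(old[pos:target])   # copy untouched lines before the hunk
            pos = target
            in_hunk = True
        elif not in_hunk or line.startswith("\\"):
            continue   # file headers and the "\ No newline at end of file" marker
        elif line.startswith("+"):
            new.append(line[1:])
        else:
            # Context and removed lines must both match the old file.
            if pos >= len(old) or old[pos] != line[1:]:
                raise ValueError(f"patch does not apply at old line {pos + 1}")
            if not line.startswith("-"):
                new.append(old[pos])      # context line: keep it
            pos += 1
    new.extend(old[pos:])                 # copy whatever follows the last hunk
    return "\n".join(new) + ("\n" if new else "")
```

Raising on a context mismatch matters here: when the diffs come from scraped HTML, a silently misapplied hunk would corrupt every later version of the file.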

Downloaded commits with filenames as hash and in html from cgit

json object from html commit, hash, author, etc.

Basically, I’ve parsed every HTML file into a handy JSON object with the diff and everything else necessary.
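As a rough idea of how a commit page could become such an object, here is a small regex-based sketch. The layout it expects (a table with `<th>author</th>`, `<th>commit</th>` and similar rows) is an assumption about cgit's commit page, and the function name `commit_record` and the resulting field names are invented for the example; the real objects may look different.

```python
import html
import json
import re

# Assumption: metadata sits in rows shaped like <th>key</th><td ...>value</td>.
ROW_RE = re.compile(r"<th>(\w+)</th>\s*<td[^>]*>(.*?)</td>", re.S)
TAG_RE = re.compile(r"<[^>]+>")


def commit_record(page_html, diff_text=""):
    """Build a JSON-serialisable dict from one commit page."""
    record = {"diff": diff_text}
    for key, cell in ROW_RE.findall(page_html):
        # Strip tags first, then decode entities, so "&lt;email&gt;" survives.
        text = html.unescape(TAG_RE.sub("", cell)).strip()
        if key in {"commit", "tree", "parent"}:
            text = text.split(" ")[0]   # drop trailing link text after the hash
        record.setdefault(key, text)    # keep the first row for repeated keys
    return record


if __name__ == "__main__":
    page = (
        "<tr><th>author</th><td>Jane Doe &lt;jane@example.org&gt;</td></tr>"
        "<tr><th>commit</th><td class='sha1'><a href='#'>abc123</a> "
        "(<a href='#'>patch</a>)</td></tr>"
    )
    print(json.dumps(commit_record(page), indent=2))
```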

To be clear about why this is less than ideal: I am not hitting a nice REST API or anything close. I am downloading raw HTML pages intended for human eyes and locating the divs that contain the diffs. If the cgit theme adds an extra whitespace or changes a CSS class, my entire recovery process detonates. That said, I don’t expect that to happen anytime soon: this repo is basically dead, and I am amazed it is still online at all, and not only in the web archive.

Why Am I Doing This?

Honestly, I spent about 30 minutes just staring at the terminal output, wondering if this was all worth it. It probably isn’t.

But we’re here now, so:

  • We have identified the target: Win-builds. We have a way to extract the source code for the packages (SDL2, GLEW, etc.) from the broken archives using my awesome cursed HTML scraper. We are dangerously close to having the necessary build scripts to compile everything from the ground up.
  • I don’t have the compiler running yet. I don’t have the game compiling yet on this lost compiler. But I have basically everything I need.

Next time, we actually build the packages. We take this pile of scraped scripts and try to make an OS-level environment out of it.

I won’t be doing it anytime soon. Probably gonna take a break and play some actual games instead of reverse-engineering dead ones.
