The Dink Network

All about .d script files and why you should never use them

December 14th 2014, 05:08 AM
yeoldetoast
Peasant They/Them Australia
LOOK UPON MY DEFORMED FACE! 
This is the second part of my little series about Dink data files that were never included in WC's "ultimate" file format guide. For this segment we'll be covering .d files or "compiled scripts" as they're commonly (and wrongly) called.

These files use a compression algorithm called Byte Pair Encoding (BPE), which Phil Gage released in '94. The Wikipedia page explains it quite well and even provides a worked example. Frequently occurring pairs of characters are each replaced with a single substitute byte to reduce file size.

For this to make sense, the most useful way to think of a script is as a series of numbers in a file. Every character in a script is simply a numerical value that maps to a letter when the file is opened in a text editor. In plain ASCII these numbers run from 0 to 127, with "A" corresponding to 65, "B" to 66 and so on. BPE uses the numbers above 127 as substitutes for pairs, so if 65 and 66 occur together often they may be replaced with 128. The two replaced numbers are then recorded in a table at the beginning of the file so that the substitution can be undone later on. The process looks through all possible pairs for one to substitute, then performs more passes until it has used the maximum number of pairs (127) or there aren't any more common pairs in the text.
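To make the substitution passes concrete, here's a rough Python sketch of the idea. It is not the code Dink's compressor actually uses: the function name, the "most frequent pair first" choice and the `min_count` cutoff for what counts as a common pair are all my own assumptions for illustration.

```python
from collections import Counter

MAX_PAIRS = 127    # limit quoted above; substitutes run 128, 129, ...
FIRST_CODE = 128

def bpe_compress(text: bytes, min_count: int = 2):
    """Return (pairs, body): the substitution table and the rewritten bytes.

    min_count is an assumed threshold for "common"; the real compressor
    may use a different rule.
    """
    if any(b > 127 for b in text):
        raise ValueError("input must be plain 7-bit ASCII")
    body = list(text)
    pairs = []
    while len(pairs) < MAX_PAIRS:
        # Count every adjacent pair in the current body.
        counts = Counter(zip(body, body[1:]))
        if not counts:
            break
        (a, b), n = counts.most_common(1)[0]
        if n < min_count:
            break
        code = FIRST_CODE + len(pairs)
        pairs.append((a, b))
        # Replace each non-overlapping occurrence of (a, b), left to right.
        out, i = [], 0
        while i < len(body):
            if i + 1 < len(body) and body[i] == a and body[i + 1] == b:
                out.append(code)
                i += 2
            else:
                out.append(body[i])
                i += 1
        body = out
    return pairs, bytes(body)

pairs, body = bpe_compress(b"ABAB")
print(pairs, list(body))  # [(65, 66)] [128, 128]
```

Note that later passes can pair up substitutes with each other, so a pair in the table may itself contain values above 127.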

After this, you're left with a file containing the compressed text, the table of pairs, and a counter indicating how many pairs have been replaced. Here is a crappy diagram illustrating what .d files actually contain:
<1 byte: number of pairs + 128>
<pair table: (first byte - 128) pairs of 2 bytes each>
<compressed text>
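As a sketch of that layout, this is how you could glue the three parts together by hand in Python. `pack_d_file` is a made-up helper name, not something from the Dink tools.

```python
def pack_d_file(pairs, body: bytes) -> bytes:
    """Assemble the indicator byte, the pair table and the compressed text."""
    if len(pairs) > 127:
        raise ValueError("at most 127 pairs fit in the one-byte indicator")
    header = bytes([128 + len(pairs)])
    table = bytes(value for pair in pairs for value in pair)
    return header + table + body

# One pair ("A", "B") standing in for 128, then the text 128 128 "C".
blob = pack_d_file([(65, 66)], bytes([128, 128, 67]))
print(blob.hex(" "))  # 81 41 42 80 80 43
```

Reading the hex dump left to right: 81 is 128 + 1 pair, 41 42 is the pair, and 80 80 43 is the compressed text.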

In order to decompress, you must read the first byte's value, subtract 128, and then read as many pairs as the result indicates. The first pair maps onto the number 128, the next onto 129, and so on. From there you go through the compressed text, find values greater than 127 and replace each one with its corresponding pair. Because a pair can itself contain substitute values, this process may need to be repeated several times before there are no values left to change.
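The steps above can be sketched in Python like so. This follows the repeated-pass description rather than whatever the engine does internally, and treating a first byte below 128 as "no pair table, plain text" is my assumption. The pass limit is only there so a broken table can't loop forever.

```python
def bpe_decompress(data: bytes) -> bytes:
    """Expand a .d-style blob laid out as: indicator, pair table, text."""
    if not data:
        return b""
    if data[0] < 128:
        # Assumption: no indicator byte means there is nothing to expand.
        return data
    n = data[0] - 128
    table = data[1:1 + 2 * n]
    if len(table) < 2 * n:
        raise ValueError("pair table is truncated")
    # Pair k stands in for the value 128 + k.
    pairs = [(table[i], table[i + 1]) for i in range(0, 2 * n, 2)]
    body = list(data[1 + 2 * n:])
    passes = 0
    while any(b > 127 for b in body):
        if passes == n:
            # A sane table never nests deeper than its own length.
            raise ValueError("could not resolve all substitutes")
        expanded = []
        for b in body:
            if b > 127:
                if b - 128 >= n:
                    raise ValueError(f"no pair recorded for value {b}")
                expanded.extend(pairs[b - 128])
            else:
                expanded.append(b)
        body = expanded
        passes += 1
    return bytes(body)

print(bpe_decompress(bytes([0x81, 0x41, 0x42, 0x80, 0x80, 0x43])))  # b'ABABC'
```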

This is all well and good, but there's a rather large problem with this implementation that you may already be aware of. If you check an extended ASCII code list (look in the DEC column) you'll notice there are tons of values above 127 that map to special characters and foreign letters. Those are the very values BPE uses as substitutes, so if you use any such character in a script the compressor will fail. You can open up a .d script in a hex editor or Notepad and see characters like € and ƒ (128 and 131 in Windows-1252) in use as substitutes. This is why you should never make .d scripts.
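If you want to check whether a script would trip over this, a quick Python sketch like the following lists every byte above 127. `high_bytes` is just an illustrative name, and the example assumes the script was saved as Windows-1252.

```python
def high_bytes(script: bytes):
    """Return (offset, value) for every byte above 127 in a script."""
    return [(i, b) for i, b in enumerate(script) if b > 127]

# "ä" is a single byte, 228, in Windows-1252.
print(high_bytes("Pärnu".encode("cp1252")))  # [(1, 228)]
```

An empty list means the script sticks to values 0 to 127 and leaves everything above that free for substitutes.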
December 14th 2014, 05:22 AM
CocoMonkey
Bard He/Him United States
Please Cindy, say the whole name each time. 
A handful of DMODs are completely broken due to this problem.
December 14th 2014, 06:09 AM
yeoldetoast
Peasant They/Them Australia
I was unaware of that. Which ones are affected?
December 14th 2014, 02:31 PM
CocoMonkey
Bard He/Him United States
None of the scripts in "Gung's Attack" work. "Legend of Pärnu" is missing several key scripts, and you can't get much of anywhere. "Dinkaventure" is missing a few scripts, including its boss script, so there's no ending. In each case, the .d files contain just a character or two. One of the characters always seems to be "ÿ."
December 14th 2014, 11:24 PM
yeoldetoast
Peasant They/Them Australia
I had a look at Gungs/Gnugs and it's unrelated to what I am on about, since the scripts are all normal .c files. I opened them up and it seems that they were saved as UTF-16 for some reason. Re-saving them as ANSI (1252) or UTF-8 in Notepad2 made them work perfectly. Magicman investigated this issue a little bit and found that UTF-16 scripts will work on standard 1.08 on Windows but not in 1.09 or FreeDink.

If you resaved the scripts it would work perfectly and you could fix up your review of it and add screenshots or whatever.

The lesson here for DMOD authors is to never use anything other than UTF-8 when writing scripts.
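For anyone who'd rather fix a pile of scripts in bulk than re-save each one in an editor, here's a rough Python sketch of the conversion. It assumes the UTF-16 files start with a byte order mark, and `to_utf8` is a name I made up.

```python
import codecs

def to_utf8(raw: bytes) -> bytes:
    """Re-encode a UTF-16 script (with BOM) as UTF-8; pass anything else through."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        return raw.decode("utf-16").encode("utf-8")
    return raw

sample = codecs.BOM_UTF16_LE + "void main( void )".encode("utf-16-le")
print(to_utf8(sample))  # b'void main( void )'
```

Files without a BOM are returned untouched, so it should be safe to run over scripts that are already fine.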
December 14th 2014, 11:27 PM
CocoMonkey
Bard He/Him United States
Oh, is that the case? That's interesting. I don't think I really care to go back to those DMODs, though.
December 14th 2014, 11:44 PM
yeoldetoast
Peasant They/Them Australia
The other two you mention do use .d files though, and some of them are blank, so it does appear that those scripts may have failed to compress and ended up empty.