Author Topic: Slow compression issue (and a crash)  (Read 3727 times)

saga

  • Posts: 2662
Slow compression issue (and a crash)
« on: 14 Sep '20 - 18:57 »
Since the MO3 format's LZ compression theoretically allows the window to grow infinitely large, I was wondering what the largest window size the official MO3 encoder actually uses. So I created an IT file with a very long and repetitive song message (~22K space characters), and it seems like that upset the encoder. :) I have attached two files, crash.it causes a crash apparently because there are no samples and the pattern is completely empty. The second file, hang.it, doesn't crash but probably hits a very unoptimized code path in the encoder: It took about 15 to 20 minutes to process the file.

Anyway, on the original topic I was researching: For the sample compression algorithm in MO3, a clear upper decompression size can be established because every sampling point is represented by at least two bits, so the decompressed data can at most be 4 times larger than the uncompressed data. For the music data it's more complicated and I think the decompressed size is essentially unbounded.
Given the behaviour I observed with hang.it, I suppose the official MO3 encoder has no upper bound on string lengths and just keeps growing its window if the input data allows for it? The reason I am wondering is to reduce the chance of denial-of-service attacks when loading MO3 files - right now for example the file header could claim that the decompressed size of the music blob is a gigabyte while the file itself is only a kilobyte in size (and as hang.it demonstrates, this could even be a syntactically valid MO3 file, it doesn't have to be malformed garbage). I have eliminated that risk with sample data as described above, but for the music data I think this isn't really possible by just looking at the algorithm. I could of course assume that the variable that receives the LZ string length and offset is a 32-bit integer but that doesn't really establish a helpful upper bound.
« Last Edit: 14 Sep '20 - 21:49 by saga »

Ian @ un4seen

  • Administrator
  • Posts: 25454
Re: Slow compression issue (and a crash)
« Reply #1 on: 17 Sep '20 - 17:13 »
Since the MO3 format's LZ compression theoretically allows the window to grow infinitely large, I was wondering what the largest window size the official MO3 encoder actually uses. So I created an IT file with a very long and repetitive song message (~22K space characters), and it seems like that upset the encoder. :) I have attached two files, crash.it causes a crash apparently because there are no samples and the pattern is completely empty. The second file, hang.it, doesn't crash but probably hits a very unoptimized code path in the encoder: It took about 15 to 20 minutes to process the file.

Oops! The MO3 2.4.2 encoder is indeed crashing when there's no sample data. Here's an update that should fix that, and the hanging problem too:

   www.un4seen.com/stuff/mo3.exe

Let me know if you still see any problems with it.

Given the behaviour I observed with hang.it, I suppose the official MO3 encoder has no upper bound on string lengths and just keeps growing its window if the input data allows for it? The reason I am wondering is to reduce the chance of denial-of-service attacks when loading MO3 files - right now for example the file header could claim that the decompressed size of the music blob is a gigabyte while the file itself is only a kilobyte in size (and as hang.it demonstrates, this could even be a syntactically valid MO3 file, it doesn't have to be malformed garbage).

Yes, matches could in theory be any length. It's probably safe to assume that the data will never legitimately be anywhere near 1GB though :) ... It might be possible to calculate a ballpark limit from the size of the MO3 structures and the maximum number of them in a file.

By "denial-of-service attacks", I guess you mean using up memory? The memory should only be in use briefly anyway, as an app would usually release it straight after parsing the decoded data (it isn't needed any more after that).

saga

  • Posts: 2662
Re: Slow compression issue (and a crash)
« Reply #2 on: 17 Sep '20 - 17:48 »
I can confirm that the new version no longer crashes / hangs. :)

By "denial-of-service attacks", I guess you mean using up memory? The memory should only be in use briefly anyway, as an app would usually release it straight after parsing the decoded data (it isn't needed any more after that).
It's true that it's only used for a short while, and in a 32-bit process it might even fail to be allocated. Depending on the scenario the code is used in, even a brief moment of allocating a lot of memory can still be troublesome, though (think of a server scanning module files uploaded by users). I just want to limit that risk as much as possible with libopenmpt - for most formats, only slightly more memory is required to decode the file than the actual file size, but compressed formats are of course more dangerous in that regard.
For MO3 I only see two possible (not-really-)solutions: I suppose for most real-world files one could assume that the resulting file will not be, say, 1000 times bigger than the compressed file. So a 1KB file would not cause more than 1MB of memory to be allocated. The other alternative would only apply to reject corrupted files: Try to decode the music data without actually decompressing it, and if the resulting size is not what the header claimed, reject the file. The disadvantage here is of course that decoding takes longer for all "real" MO3 files, so I don't like that approach a lot. I guess this is a dead-end for now. :)