NBT compression
As NBT files can be stored with a variety of compression types, it's sometimes useful to detect transparently which one is in use. This doesn't help for writes unless you also track where the NBT structure came from and how it was compressed, but it can simplify tools such as map exporters.
There are three types of storage currently possible:
- GZip (defined in RFC 1952)
- Zlib (defined in RFC 1950)
- Uncompressed
The two compressed types amount to DEFLATE plus a small header, and most languages already have libraries or bindings for both. The stream type can be detected by examining just the first two bytes.
GZip
GZip starts with a two-byte identification header: 0x1f 0x8b. If a stream doesn't start with those bytes, it isn't GZip.
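As a minimal Python sketch (the helper name `is_gzip` is hypothetical), the GZip check is a plain comparison against the two magic bytes. The standard library's `gzip.compress` is used only to produce a sample stream.

```python
import gzip

# ID1 and ID2 from RFC 1952; every GZip member starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(data: bytes) -> bool:
    """Return True if data starts with the GZip identification header."""
    return data[:2] == GZIP_MAGIC


# Usage: a stream produced by the gzip module carries the magic bytes,
# while an uncompressed NBT stream (starting with 0x0a) does not.
print(is_gzip(gzip.compress(b"\x0a\x00\x00\x00")))  # True
print(is_gzip(b"\x0a\x00\x00\x00"))                 # False
```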
Zlib
Zlib starts with a two-byte header whose fields are packed into individual bits:

| Byte offset | Byte name | Bit offset | Field name |
|---|---|---|---|
| 0 | CMF | 7-4 | CINFO |
| 0 | CMF | 3-0 | CM |
| 1 | FLG | 7-6 | FLEVEL |
| 1 | FLG | 5 | FDICT |
| 1 | FLG | 4-0 | FCHECK |
Check that these fields obey the following invariants:
- CM == 8 (the compression method is DEFLATE).
- CINFO <= 7 (CINFO is the base-2 logarithm of the LZ77 window size minus 8, so 7 means a 32 KiB window).
- The entire header, read as a big-endian 16-bit word, is divisible by 31, i.e. (CMF*256 + FLG) % 31 == 0. The FCHECK field exists solely so that the encoder can make the header satisfy this.
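The three invariants translate directly into bit operations on the first two bytes. This is a minimal Python sketch; `is_zlib` is a hypothetical helper name, and `zlib.compress` from the standard library is used only to produce sample input.

```python
import zlib


def is_zlib(data: bytes) -> bool:
    """Check the RFC 1950 header invariants on the first two bytes of data."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    cm = cmf & 0x0F   # low nibble of CMF: compression method
    cinfo = cmf >> 4  # high nibble of CMF: log2(window size) - 8
    return cm == 8 and cinfo <= 7 and (cmf * 256 + flg) % 31 == 0


# Usage: zlib.compress emits a valid header (0x78 0x9c at the default level).
print(is_zlib(zlib.compress(b"\x0a\x00\x00\x00")))  # True
print(is_zlib(b"\x1f\x8b"))                         # False: CM would be 15
```

Note that neither the GZip magic (0x1f) nor TAG_Compound (0x0a) has 8 in its low nibble, so this check cannot misclassify either of the other two storage types.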
Technically, the CM field could signal methods other than DEFLATE, but none are defined: RFC 1950 specifies only CM = 8 and reserves CM = 15. As a result, it's reasonably safe to consider only the DEFLATE case of Zlib.
Uncompressed
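NBT files are required by the standard to start with a TAG_Compound, whose type ID is 0x0a. That's only one byte of identification, but it works as a basic sanity check. The root tag's name follows, prefixed by its length as a big-endian 16-bit integer. Because all known root names are shorter than 256 bytes, real-world NBT streams start with 0x0a 0x00, but longer names are legal, so a reader should not depend on the second byte being 0x00.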
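Putting the three checks together gives a small detector. This Python sketch is illustrative, not a reference implementation: the names `detect_compression` and `decompress_nbt` are hypothetical, and it relies only on the standard `gzip.decompress` and `zlib.decompress` functions. The order of the checks doesn't matter, because the three headers can't overlap. Following the note above, only the 0x0a byte is required for uncompressed data; the second byte is not checked.

```python
import gzip
import zlib

TAG_COMPOUND = 0x0A


def detect_compression(data: bytes) -> str:
    """Classify an NBT stream as 'gzip', 'zlib' or 'none' from its first bytes."""
    if data[:2] == b"\x1f\x8b":  # RFC 1952 magic bytes
        return "gzip"
    if len(data) >= 2:
        cmf, flg = data[0], data[1]
        # RFC 1950: CM == 8, CINFO <= 7, header divisible by 31.
        if cmf & 0x0F == 8 and cmf >> 4 <= 7 and (cmf * 256 + flg) % 31 == 0:
            return "zlib"
    if data[:1] == bytes([TAG_COMPOUND]):  # root TAG_Compound, uncompressed
        return "none"
    raise ValueError("unrecognized NBT stream")


def decompress_nbt(data: bytes) -> bytes:
    """Return the raw NBT bytes, decompressing first if needed."""
    kind = detect_compression(data)
    if kind == "gzip":
        raw = gzip.decompress(data)
    elif kind == "zlib":
        raw = zlib.decompress(data)
    else:
        raw = data
    # Sanity check: the payload itself must start with TAG_Compound.
    if raw[:1] != bytes([TAG_COMPOUND]):
        raise ValueError("payload does not start with TAG_Compound")
    return raw


# Usage: an empty root compound (type 0x0a, empty name, TAG_End) in all three forms.
payload = b"\x0a\x00\x00\x00"
for stream in (gzip.compress(payload), zlib.compress(payload), payload):
    print(detect_compression(stream), decompress_nbt(stream) == payload)
```

As the introduction notes, this only helps on the read side. A tool that needs to write the file back in its original form would have to keep the value returned by `detect_compression` alongside the decoded data.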