FSArchiver - Internals: file format

About the file format

The file format is made of two sort of structures: headers and data-blocks. The headers are all a dictionary where the key is an integer. That way we can add new things in headers in next versions without having to break the file format.

Global archive layout

1. All fsarchiver archives are made of one or multiple volumes. Each volume starts with an FSA_MAGIC_VOLH header and terminates with a FSA_MAGIC_VOLF header. These headers are used to determine that the archive is the fsarchiver format, and which volume it is.
1. The beginning of the archive (which is always at the beginning of the first volume) contains an archive header (FSA_MAGIC_MAIN) where se store information such as the archive creation date. Then we write all the filesystem information header (FSA_MAGIC_FSIN). It stores things such as the filesystem type (ext3, reiserfs, …) and all its attributes (label, uuid, space used, …). Some of these attributes are specific to a particular filesystem. For instance, it saves the compatibility flags of ext2/ext3/ext4 filesystem so that fsarchiver can recreates a filesystem with the same properties.
1. Then the archive contains all the filesystems contents: there is an header that marks the beginning of a filesystem data (FSA_MAGIC_FSYB) and another header is used to mark its end (FSA_MAGIC_DATF). The filesystem contents is a list of object headers (FSA_MAGIC_OBJT) and data-blocks.
1. The archive splitting mechanism will never split the archive in the middle of an archive or datablock. An in-memory structure called s_writebuf has been created to store all the data which are written to the archive. All the bytes that are part of an header / block are first accumulated in a s_writebuf. When it’s ready, it’s passed to the write-to-archive function in charge of the file splitting. That way an s_writebuf is like an atomic item that cannot be split. The consequence it that it’s not possible to respect exactly the volume size specified by the user, it will always be a bit smaller.

About regular files management

Creating a normal tar.gz file is like compressing a tar file. It means if there is a corruption in the tar.gz file, we can’t ungzip it and then we can untar it. Just a single corruption in one byte at the beginning of the file make it unreadable.

For this reason fsarchiver is working the other way. It’s like a gz.tar: we archive files that have already been compressed before. It provides a better safety for your data: if one byte if corrupt, we just ignore the current file and we can extract the next one.

There is a problem with that: in case there are a lot of small files, like in the linux-sources-2.6, we compress each small files in a single data block of only few KB, which produces a bad compression ratio compared to compression of a big tar archive of all the small files.

For that reason, all the small files are processed in a different way in fsarchiver-0.4.4 and later: it creates a set of small files that can fit in a single data block (256KB by default) and then it compresses a block which is quite big compared to the size of a single file.

There is a threshold which is used to know whether a regular file will be considered as a small file (sharing a data block with other small files) or if it will be considered as a normal/big one having its own data blocks. This threshold depends of the size of a data block, which depends itself on the algorithm chosen to compress the data. In general, a data block is 256KB (it can be bigger) and the threshold will be 256KB/4 = 64KB.

How files are stored in the archive

There are many sort of objects in a filesystem:

1. special files (symbolic links, hard links, directories, …) all these files just have one header in the archive. It contains all their attributes (name, permissions, xattr, time stats, …)
1. normal/big regular files (bigger than the threshold) [REGFILE ] these files have their header first (FSA_MAGIC_OBJT) with the file attributes, and then one or several data blocks (depending on how big the file is), and then a file footer (FSA_MAGIC_FILF) with the md5sum of the contents. The md5sum can’t be written in the object header (before the data blocks) because it would require to read the file twice: first pass to compute the checksum, and a second pass to copy the contents the data blocks. (think about a very large file, say 5GB which is written to an fsa archive which is split into small volumes, say 100MB)
1. small regular files (smaller than the threshold) [REGFILEM] small files are written to the archive when we have a full set of small files, or at the end of the savefs/savedir operation. During the savefs, all the small files are stored in a structure called s_regmulti. When this structure is full, it’s copied to the archive and a new one is created. So we end up with the following data in the archive: we first write one object header (FSA_MAGIC_OBJT) for each small file in the s_regmulti. This headers contains extra keys (not found in object-header for normal files): DISKITEMKEY_MD5SUM, DISKITEMKEY_MULTIFILESCOUNT, DISKITEMKEY_MULTIFILESOFFSET. It’s the md5sum for the file (we can compute it before it’s written to disk because the files are quite small, and it saves an extra FSA_MAGIC_FILF footer), the number of small-files to expect in the current set (used at the extraction to know what read), and the offset of the data for the current file in the shrared data block. After the individual small-files headers, we write a single shared data-block (which is compressed and may be encrypted as any other data block). There is no header/footer after the shared data lock in the archive.

About datablocks

Each data block which is written to the archive has its own header (FSA_MAGIC_BLKH). This header stores block attributes: original block size, compressed size, compression algorithm used, encryption algorithm used, offset of the first byte in the file, … When both compression and encryption are used, the compression is done first. It’s more efficient to process that way because the encryption algorithm will have less things to do because the compressed block is smaller. When the compression makes a block bigger than the original one, fsarchiver automatically ignores the compressed version and keeps the uncompressed block.

About endianess

fsarchiver should be endianess safe. All the integers are converted to little-endian before they are written to the disk. So it’s neutral on intel platforms, and it requires a conversion on all the big-endian platforms. The only things that have to be converted are the integers which are part of the headers. These conversions don’t pollute the source code since they are all converted in the functions that manage the headers: write-header-to-disk and read-header-from-disk functions. All the higer level functions in the code just have to add a number to the header, they don’t have to worry about endianess.

About headers

All the information which is not the files contents itself (meta-data) are stored in a generic management system called headers. It’s not just a normal C structure which is written to disk. A specific object oriented mechanism has been written for that. The headers are quite high-level data structures that are reused a lot in the sources. They provide endianess management, checksumming, and extensibility. Since the data are stored as a dictinnary, where the keys are 16bit integers, it will be possible to extend the headers in the future by adding new keys to add new information in the next versions. These headers are just implemented as s_dico in the program, and the magic and checksum are added when we write it to the archive file.

Here is how an header is stored in an archive:

32bit magic string used to identify the type of header it is
32bit archive id (random number generated suring the savefs / savedir)
16bit filesystem id (which filesystem of the archive it belongs to)
dictionary with its own internal data (length, 32bit checksum, …)

The random 32bit archive id is used to make sure the program won’t be disturbed by nested archive in case the archive is corrupt. When an archive is corrupt, we can skip data and search for the next header using the known magic strings. In case we have nested archives, we could find the magic string that belongs to a nested archive. It won’t be used since we also compare the random 32bit archive id, and the id of the nested archive header will be different so the program will know that it has to ignore that header and continue to search.

About checksumming

Almost everything in the archive is checksummed to make sure the program will never silently restore corrupt data. The headers that contains all the meta-data (file names, permissions, attributes, …) are checksummed using a 32bit fletcher32 checksum. The files contents are written in the archive as blocks of few hundreds kilo-bytes. Each block is compressed (except if the compression makes the block bigger) and it may also be encrypted (if the user provided a password). Each block also has its own 32bit fletcher32 checksum. All the files also have an individual md5 checksum that makes sure the whole file is exactly the same as the original one (the blocks are checksummed but it allows to make sure we did not drop one of the block of a file for instance). Because of the md5 checksum, we can be sure that the program is aware of the corruption if it happens.