Today most new processors are dual-core or quad-core, but the standard compression tools are single-threaded. When you use “tar cfz” or “tar cfj” to compress a tarball with gzip or bzip2, only one CPU core is used, so you get just 50% or 25% of the power you have. FSArchiver is multi-threaded: with the “-j” option it creates several compression jobs, so it can use all the power of your processors to compress faster.
FSArchiver uses three kinds of threads even if you don’t use the “-j” option: a main thread, an archive-io thread, and one or more compression/decompression threads. The “-j” option simply creates more than one compression/decompression thread.
To implement multi-threading, FSArchiver uses the pthread library (POSIX threads). This is a standard threads implementation available on many operating systems.
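The thread layout described above can be sketched with plain pthread calls. This is only an illustration under stated assumptions: `run_threads`, `archio_thread`, `compress_thread`, and `MAX_JOBS` are hypothetical names invented for this sketch, and the thread bodies do nothing except record that they ran. The calling thread plays the role of the main thread.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_JOBS 32   /* hypothetical upper bound for this sketch */

/* Counts how many secondary threads actually ran (protected by a mutex). */
static int g_started = 0;
static pthread_mutex_t g_started_mutex = PTHREAD_MUTEX_INITIALIZER;

static void note_started(void)
{
    pthread_mutex_lock(&g_started_mutex);
    g_started++;
    pthread_mutex_unlock(&g_started_mutex);
}

/* Hypothetical stand-ins for the real thread functions. */
static void *archio_thread(void *arg)   { (void)arg; note_started(); return NULL; }
static void *compress_thread(void *arg) { (void)arg; note_started(); return NULL; }

/* Starts one archive-io thread and `jobs` compression threads, waits for
 * all of them, and returns how many secondary threads ran (-1 on error). */
int run_threads(int jobs)
{
    pthread_t archio;
    pthread_t comp[MAX_JOBS];
    int i, created = 0;

    if (jobs < 1 || jobs > MAX_JOBS)
        return -1;
    g_started = 0;   /* safe: no other thread exists yet */

    if (pthread_create(&archio, NULL, archio_thread, NULL) != 0)
        return -1;
    for (i = 0; i < jobs; i++) {
        if (pthread_create(&comp[i], NULL, compress_thread, NULL) != 0)
            break;
        created++;
    }

    /* Only join the threads that were really created. */
    for (i = 0; i < created; i++)
        pthread_join(comp[i], NULL);
    pthread_join(archio, NULL);

    assert(g_started <= jobs + 1);
    return (created == jobs) ? g_started : -1;
}
```

Calling `run_threads(1)` corresponds to running without “-j”: one archive-io thread plus a single compression thread. A larger argument only adds compression threads.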
The most important thing in the multi-threading implementation is the queue, which is implemented in queue.c. All the data that are read from or written to a file first go through the queue. The queue is a list of items that have to be written: each item is either a header or a data block. The contents are split into blocks of a few hundred kilobytes. For instance, if you want to archive a 10MB file, it can be split into 50 data blocks (about 200KB each). The queue must be big enough to contain multiple data blocks at a time. The compression/decompression threads repeatedly look for the first block in the queue that still has to be processed, process it, and then update that block in the queue. For instance, if the queue is able to store 10 data blocks at a given time, a quad-core processor will have enough blocks to feed each of its cores, and so can use all its power. The size of the queue is defined by FSA_MAX_QUEUESIZE, which says how many data blocks can be stored in the queue at a given time. When this limit is reached, the thread which fills the queue has to wait.
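The "wait when the queue is full" behaviour can be illustrated with a small bounded queue. This is a minimal sketch, not the code from queue.c: the `toy_` names and the capacity of 10 are assumptions made for the example, the items are plain integers instead of headers and data blocks, and the use of condition variables is a choice made here that may differ from how the real queue waits.

```c
#include <assert.h>
#include <pthread.h>

#define TOY_MAX_QUEUESIZE 10   /* illustrative; not the real FSA_MAX_QUEUESIZE value */
#define TOY_BLOCKS 50          /* 50 blocks, as in the 10MB example */

struct toy_queue {
    int items[TOY_MAX_QUEUESIZE];   /* ring buffer of block numbers */
    int head, count;
    pthread_mutex_t mutex;
    pthread_cond_t not_full, not_empty;
};

void toy_queue_init(struct toy_queue *q)
{
    q->head = q->count = 0;
    pthread_mutex_init(&q->mutex, NULL);
    pthread_cond_init(&q->not_full, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Adds a block; the caller waits while the queue is full. */
void toy_queue_push(struct toy_queue *q, int block)
{
    pthread_mutex_lock(&q->mutex);
    while (q->count == TOY_MAX_QUEUESIZE)
        pthread_cond_wait(&q->not_full, &q->mutex);
    q->items[(q->head + q->count) % TOY_MAX_QUEUESIZE] = block;
    q->count++;
    assert(q->count <= TOY_MAX_QUEUESIZE);
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->mutex);
}

/* Removes the first block; the caller waits while the queue is empty. */
int toy_queue_pop(struct toy_queue *q)
{
    int block;
    pthread_mutex_lock(&q->mutex);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->mutex);
    block = q->items[q->head];
    q->head = (q->head + 1) % TOY_MAX_QUEUESIZE;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->mutex);
    return block;
}

/* The thread that fills the queue: it blocks whenever 10 blocks are pending. */
static void *toy_filler(void *arg)
{
    struct toy_queue *q = arg;
    int i;
    for (i = 0; i < TOY_BLOCKS; i++)
        toy_queue_push(q, i);
    return NULL;
}

/* Returns 1 if all 50 blocks come out of the queue in order, else 0. */
int toy_queue_demo(void)
{
    struct toy_queue q;
    pthread_t filler;
    int i, ok = 1;

    toy_queue_init(&q);
    if (pthread_create(&filler, NULL, toy_filler, &q) != 0)
        return 0;
    for (i = 0; i < TOY_BLOCKS; i++)
        if (toy_queue_pop(&q) != i)
            ok = 0;
    pthread_join(filler, NULL);
    pthread_cond_destroy(&q.not_full);
    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.mutex);
    return ok;
}
```

The sketch covers only the bounded, blocking part of the design. In the queue described above, compression threads also update blocks while they are still in the queue, which this toy version does not model.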
Here is how the threads work:
The queue is what links all the threads together. It’s a critical section of the code, so it’s very important that it contains no bugs. The consistency of this queue is guaranteed with a mutex (to make sure that two threads can’t change the same thing at the same time). It’s very important that each function that locks this mutex unlocks it before it exits, otherwise there will be a deadlock. It’s also useful to keep the queue management quite simple in order to avoid bugs.
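One common way to make sure the mutex is always released is to give each function a single exit point that performs the unlock. The following is a minimal sketch of that pattern, not code taken from queue.c; `toy_slots`, `toy_slots_reserve`, and `toy_slots_demo` are hypothetical names used only for this example.

```c
#include <assert.h>
#include <pthread.h>

struct toy_slots {
    pthread_mutex_t mutex;
    int used;       /* slots currently reserved */
    int capacity;   /* maximum number of slots */
};

/* Reserves n slots. Returns 0 on success, -1 if they do not fit.
 * Every path, including the error path, goes through the unlock. */
int toy_slots_reserve(struct toy_slots *s, int n)
{
    int ret = -1;

    pthread_mutex_lock(&s->mutex);
    if (n < 0 || s->used + n > s->capacity)
        goto out;            /* error: do NOT return here, the mutex is held */
    s->used += n;
    assert(s->used <= s->capacity);
    ret = 0;
out:
    pthread_mutex_unlock(&s->mutex);
    return ret;
}

/* Shows that a failed call leaves the mutex usable for the next call. */
int toy_slots_demo(void)
{
    struct toy_slots s;

    pthread_mutex_init(&s.mutex, NULL);
    s.used = 0;
    s.capacity = 10;
    toy_slots_reserve(&s, 7);   /* succeeds: used is now 7 */
    toy_slots_reserve(&s, 7);   /* fails: 14 would exceed the capacity */
    toy_slots_reserve(&s, 3);   /* succeeds: used is now 10 */
    pthread_mutex_destroy(&s.mutex);
    return s.used;
}
```

If the error branch returned directly instead of jumping to `out`, the third call in `toy_slots_demo` would block forever on the still-locked mutex, which is exactly the deadlock described above.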
To synchronize threads, there are two attributes:
The threads have to play with these two attributes to manage events such as errors or the end of data.
In a normal situation:
In case of errors:
When the user presses Ctrl+C, the signal handler sets g_abort to true and the main thread has to manage that case.
If the queue writer has a problem and wants to stop, it just has to set end_of_archive to true and the queue reader will stop reading it:
    queue_set_end_of_queue(&g_queue, true);
If the queue reader has a problem and wants to stop, it has to set g_stopfillqueue to true, to tell the queue writer that it has to stop. The queue writer then checks g_stopfillqueue and stops filling the queue. Finally, the queue reader has to remove the remaining items from the queue, using queue_destroy_first_item():
    set_stopfillqueue(); // don’t need any more data
    while (queue_get_end_of_queue(&g_queue)==false)
        queue_destroy_first_item(&g_queue);
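The reader-side shutdown can be tried out on a toy model. This is a sketch under assumptions, not the real queue: the `toy_` functions only mimic the names used above, the queue stores a simple item count, and "end of queue" is assumed to mean that the writer has finished and no item is left. The writer here would produce data forever if it were not told to stop.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define TOY_CAPACITY 4   /* illustrative queue size */

struct toy_queue {
    pthread_mutex_t mutex;
    pthread_cond_t changed;
    int count;           /* items currently queued */
    bool end_of_queue;   /* set by the writer: nothing more will be added */
    bool stop_fill;      /* set by the reader: stands in for g_stopfillqueue */
};

/* Writer: fills the queue until the reader asks it to stop. */
static void *toy_writer(void *arg)
{
    struct toy_queue *q = arg;

    pthread_mutex_lock(&q->mutex);
    while (!q->stop_fill) {
        if (q->count < TOY_CAPACITY) {
            q->count++;
            pthread_cond_broadcast(&q->changed);
        } else {
            pthread_cond_wait(&q->changed, &q->mutex);   /* queue is full */
        }
    }
    q->end_of_queue = true;   /* tell the reader that no more data will come */
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->mutex);
    return NULL;
}

static void toy_set_stopfillqueue(struct toy_queue *q)
{
    pthread_mutex_lock(&q->mutex);
    q->stop_fill = true;
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->mutex);
}

/* Assumption: true only when the writer is done AND the queue is empty. */
static bool toy_get_end_of_queue(struct toy_queue *q)
{
    bool res;
    pthread_mutex_lock(&q->mutex);
    res = q->end_of_queue && q->count == 0;
    pthread_mutex_unlock(&q->mutex);
    return res;
}

/* Drops the first item, waiting for one unless the writer has finished. */
static void toy_destroy_first_item(struct toy_queue *q)
{
    pthread_mutex_lock(&q->mutex);
    while (q->count == 0 && !q->end_of_queue)
        pthread_cond_wait(&q->changed, &q->mutex);
    if (q->count > 0) {
        q->count--;
        pthread_cond_broadcast(&q->changed);
    }
    assert(q->count >= 0);
    pthread_mutex_unlock(&q->mutex);
}

/* Reader hits an error, stops the writer, drains the queue.
 * Returns the number of items left in the queue afterwards. */
int toy_reader_abort_demo(void)
{
    struct toy_queue q;
    pthread_t writer;
    int remaining;

    pthread_mutex_init(&q.mutex, NULL);
    pthread_cond_init(&q.changed, NULL);
    q.count = 0;
    q.end_of_queue = false;
    q.stop_fill = false;
    if (pthread_create(&writer, NULL, toy_writer, &q) != 0)
        return -1;

    toy_set_stopfillqueue(&q);            /* don't need any more data */
    while (toy_get_end_of_queue(&q) == false)
        toy_destroy_first_item(&q);

    pthread_join(writer, NULL);
    remaining = q.count;
    pthread_cond_destroy(&q.changed);
    pthread_mutex_destroy(&q.mutex);
    return remaining;
}
```

Draining matters because the writer may be blocked on a full queue when the error happens. Removing items wakes it up, so it can notice the stop request and mark the end of the queue.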