One thing you may have noticed when using the z switch with tar is that the compression can take some time! If you look at your CPU usage, though, you’ll notice that only one core is being utilised to compress the files. On a modern system 4 or 8 cores are common, so there is plenty of potential to speed up the process if you could utilise more of them. As gzip itself only uses one core, we need to look elsewhere.
Fortunately, there is a gzip-compatible tool which uses multiple cores; it’s called pigz. To install it, type:
sudo apt-get install pigz
Once that is installed we can tell the tar command to use it like so:
tar -c --use-compress-program=pigz -f [tar file] [directory or files]
tar -c --use-compress-program=pigz -f backupOfMovies.tar.gz /opt/movies
Note the double hyphen before use-compress-program. The resulting file is gzip-compressed, so the .tar.gz extension is the conventional choice. Check your CPU usage while the command is running; you should see all available cores being utilised!
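If you would rather control how many cores pigz takes, you can also pipe tar's output through pigz yourself. The following is only a sketch under a few assumptions: the sample directory and archive name are made up for illustration, the thread count of 2 is arbitrary, and it falls back to plain gzip when pigz is not installed so that it can be tried anywhere.

```shell
# Hypothetical scratch area and file names, used only for this illustration.
WORK=$(mktemp -d)
ARCHIVE="$WORK/backupOfMovies.tar.gz"
mkdir -p "$WORK/movies"
printf 'sample data\n' > "$WORK/movies/sample.txt"

# Use pigz limited to 2 threads (-p 2) when available, otherwise plain gzip.
if command -v pigz >/dev/null 2>&1; then
    compress() { pigz -p 2; }
else
    compress() { gzip; }
fi

# tar writes the archive to stdout (-f -) and the compressor reads it from stdin.
tar -C "$WORK" -cf - movies | compress > "$ARCHIVE"

# pigz output is ordinary gzip data, so gzip can read it back for a listing.
LISTING=$(gzip -dc "$ARCHIVE" | tar -tf -)
echo "$LISTING"

rm -rf "$WORK"
```

Because the archive is in standard gzip format, it can be extracted later on a machine that does not have pigz installed.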
If you have compressible data you may save space on your tapes by using compression; this comes at a cost of CPU cycles to do the compressing, which can often be a worthwhile tradeoff for a long-term backup. Doing this is quite simple: add the -z switch to your tar command.
tar -cvzf /dev/[tape-device] [folder or files to back up]
tar -cvzf /dev/st0 /opt/movies
For some file types (e.g. movies, mp3s, compressed picture files and the like) you probably won’t see a great deal of space saved, though if it is enough to let you use one tape instead of two, it may be worth it even so. Text and other file types may compress more easily and you may see more of a saving; it will vary greatly depending on your dataset. Try it and see!
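After writing a compressed backup it is worth reading it back to confirm that it can be listed. Here is a rough sketch of that write-then-verify step. The TAPE variable and the sample files are assumptions for illustration: TAPE defaults to an ordinary file so the commands can be tried without a drive, and you would point it at your real device (such as /dev/st0) for an actual backup.

```shell
# Hypothetical scratch area; TAPE falls back to a regular file standing in
# for the tape device so this can be run without any tape hardware.
WORK=$(mktemp -d)
TAPE=${TAPE:-$WORK/fake-tape.tar.gz}

mkdir -p "$WORK/movies"
printf 'sample data\n' > "$WORK/movies/sample.txt"

# Write a gzip-compressed archive (-z) to the "tape".
tar -C "$WORK" -czf "$TAPE" movies

# Read it back: -t lists the contents and fails if the archive is unreadable.
if VERIFY=$(tar -tzf "$TAPE"); then
    echo "archive OK:"
    echo "$VERIFY"
else
    echo "archive could not be read back" >&2
fi

rm -rf "$WORK"
```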
Sometimes you may see people using the -j switch instead – this uses the bzip2 algorithm rather than the gzip algorithm (the -z switch). You will probably find that gzip is slightly better supported and bzip2 sometimes provides slightly better compression but takes longer. If you are chasing better compression it may be worth replacing the z switch with j to see if it helps.
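A quick way to decide between z and j is to archive a sample of your own data both ways and compare the sizes. The sketch below generates some repetitive text as a stand-in for real data (an assumption, so expect very different numbers on your own files) and skips the bzip2 step if bzip2 is not installed.

```shell
# Hypothetical sample data: repetitive text, which compresses very well.
WORK=$(mktemp -d)
mkdir -p "$WORK/docs"
awk 'BEGIN { for (i = 0; i < 20000; i++) print "log line", i, "with some repetitive text" }' \
    > "$WORK/docs/sample.log"

# Print a file's size in bytes as a plain number.
size_of() { echo $(( $(wc -c < "$1") )); }

tar -C "$WORK" -cf  "$WORK/plain.tar"   docs   # no compression
tar -C "$WORK" -czf "$WORK/gzip.tar.gz" docs   # gzip via -z
PLAIN_SIZE=$(size_of "$WORK/plain.tar")
GZIP_SIZE=$(size_of "$WORK/gzip.tar.gz")
echo "uncompressed: $PLAIN_SIZE bytes"
echo "gzip (-z):    $GZIP_SIZE bytes"

# tar's -j switch needs the bzip2 program to be present.
if command -v bzip2 >/dev/null 2>&1; then
    tar -C "$WORK" -cjf "$WORK/bzip2.tar.bz2" docs
    echo "bzip2 (-j):   $(size_of "$WORK/bzip2.tar.bz2") bytes"
else
    echo "bzip2 not installed, skipping -j"
fi

rm -rf "$WORK"
```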
When compression is simply switched on, older ZFS releases use the lzjb compression algorithm (current OpenZFS releases use lz4 instead); you can select others when setting compression on a ZFS dataset. To try another one, do the following:
sudo zfs set compression=gzip [zfs dataset]
This changes the compression algorithm to gzip. By default this sets it to gzip-6 compression; we can actually specify what level we want with:
sudo zfs set compression=gzip-[1-9] [zfs dataset]
sudo zfs set compression=gzip-8 kepler/data
Note that you don’t need the leading / for the pool, and that you can set this at a pool level and not just on sub-datasets. gzip-1 is the lowest level of compression (less CPU-intensive, less compressed) while gzip-9 is the opposite: often quite CPU-intensive, but offering the most compression. This isn’t necessarily a linear scale, mind, and the type of data you are compressing will have a huge impact on what sort of returns you’ll see. Try various levels out on your data, checking the CPU usage as you go and the compression efficiency afterwards; you may find that 9 is too CPU-intensive, or that you don’t get a great deal of benefit after a certain point. Note that when you change the compression level it only affects new data written to the ZFS dataset, so existing data stays as it was. An easy way of testing is to make several datasets, set a different level of compression on each, and copy some typical data to them one by one while observing. We discussed checking your compression efficiency in a previous post.
Compression doesn’t just benefit us in terms of space saved, however. It can also improve disk performance at a cost of CPU usage, since less data has to be written to and read from the disks. Try some benchmarks on compression-enabled datasets and see if you notice any improvement; it can be anywhere from slight to significant, depending on your setup.
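The testing approach described above can be sketched as a small script that creates one dataset per compression level. Treat it as an illustration only: the test-gzip-N dataset names are hypothetical, the pool name is taken from the example above, and by default it only prints the commands rather than running them, so nothing is changed until you set DRY_RUN=0.

```shell
# POOL and the test-gzip-N dataset names are hypothetical; adjust to suit.
POOL=${POOL:-kepler}
# Leave DRY_RUN at 1 to only print the commands; set DRY_RUN=0 to run them.
DRY_RUN=${DRY_RUN:-1}

# Either print a command or execute it, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create one test dataset per gzip level, with compression set at creation.
make_test_datasets() {
    for level in 1 6 9; do
        run sudo zfs create -o compression="gzip-$level" "$POOL/test-gzip-$level"
    done
}

make_test_datasets
```

Once the datasets exist, copy the same sample data into each one and compare their compressratio values while watching CPU usage during the copy.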
If you have enabled compression on a ZFS folder you can check to see just how much disk space you’re saving. Use the following command:
sudo zfs get all [poolname]/[folder] | grep compressratio
sudo zfs get all backup01/data | grep compressratio
returns the following:
backup01/data compressratio 1.50x -
Here we can see we have a compression ratio of 1.5x. Compression is an excellent way of reducing disk space used and improving performance, so long as you have a modern CPU with enough spare power to handle it. Some data will not be easily compressible and you may see less benefit – other data will be much more compressible and you may reach quite high compression ratios.
If we run the same command on a folder full of already-compressed RAW image files:
sudo zfs get all backup01/photos | grep compressratio
backup01/photos compressratio 1.05x
…we can see that they do not compress as easily as the documents in the data folder, giving us only a 1.05x compression ratio. You can see the compression ratio of all of your ZFS pools and folders with the following:
sudo zfs get all | grep compressratio
Check your own datasets and see how much you are saving!
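A compression ratio can be turned into a "space saved" figure with 1 - 1/ratio, so 1.50x means the data takes up two thirds of its original size. The sketch below applies that formula to the two sample values shown earlier. The saved_percent helper is a made-up name for illustration; with ZFS available, you could feed it real values from sudo zfs get -H -o name,value compressratio instead of the sample lines.

```shell
# Hypothetical helper: reads "name ratio" pairs on stdin and prints the
# space saved as a percentage, using saved = 1 - 1/ratio.
saved_percent() {
    awk '{
        ratio = $2
        sub(/x$/, "", ratio)    # strip the trailing "x" from e.g. 1.50x
        printf "%s %.1f%%\n", $1, (1 - 1 / ratio) * 100
    }'
}

# Sample values copied from the output shown earlier in this post.
REPORT=$(printf '%s\n' 'backup01/data 1.50x' 'backup01/photos 1.05x' | saved_percent)
echo "$REPORT"
# prints:
# backup01/data 33.3%
# backup01/photos 4.8%
```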