Tar: How to speed up tar by avoiding disk seeks, especially with many small files
I've faced this problem archiving a directory with around 50,000 images of 100 KB or less each. It used to take more than 5 minutes to archive all those files into a .tar because of the massive seeking of the HDD heads. This happens because tar itself does essentially no buffering, so the heads are moved back and forth for every file.
But I've found a solution. Instead of
tar cvf arc.tar dir_to_archive
I've used mbuffer to provide the large write buffer that tar is missing.
tar cvf - dir_to_archive | mbuffer -t -m500M -P80 -f -o arc.tar
This way, instead of 5+ minutes it only takes 15 seconds!!!
To install mbuffer
apt install mbuffer
The parameters I've used above mean
- -t Use RAM for buffering.
- -m How much RAM.
- -P Percentage of the buffer to fill before starting to write to disk. 80 = 80%.
- -f Force overwriting if the output file already exists.
- -o Tell mbuffer to write its output directly to a file instead of stdout (recommended: in my tests, going through stdout was slower; see the comparison below).
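For comparison, the slower alternative (presumably what the last point refers to) is letting mbuffer write to its stdout and redirecting that with the shell:
tar cvf - dir_to_archive | mbuffer -t -m500M -P80 > arc.tar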
If you want to compress with zstd, this is the way
tar --zstd -cf - ./dir_to_archive | mbuffer -t -m500M -P80 -f -o arc.tar.zst
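On a multi-core machine you can also try letting zstd use several threads. This is only a sketch: it assumes a reasonably recent GNU tar (one whose -I / --use-compress-program option accepts arguments) and zstd installed, and whether it helps depends on whether the CPU or the disk is the bottleneck.
# -T0 tells zstd to use all available cores
tar -I 'zstd -T0' -cf - dir_to_archive | mbuffer -t -m500M -P80 -f -o arc.tar.zst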
EVEN BETTER
Sorting the files by inode number prevents seeking while reading, giving a further huge speedup while feeding the buffer.
# Sort
find dir_to_archive -type f -print0 | xargs -0 stat --format='%i %n' | sort -n | cut -d' ' -f2- > filelist_sorted.txt
# Archiving
tar -cf - -T filelist_sorted.txt | mbuffer -t -m500M -P80 -f -o arc.tar
# Or archiving with zstd: a bit of compression at basically the same speed
tar --zstd -cf - -T filelist_sorted.txt | mbuffer -t -m500M -P80 -f -o arc.tar.zst
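With GNU find, the sorted list can also be produced in a single command using -printf (%i is the inode number, %p the path); this sketch assumes file names without embedded newlines:
find dir_to_archive -type f -printf '%i %p\n' | sort -n | cut -d' ' -f2- > filelist_sorted.txt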
Example
find ./backup -type f -print0 | xargs -0 stat --format='%i %n' | sort -n | cut -d' ' -f2- > filelist_sorted.txt
tar --zstd -cf - -T filelist_sorted.txt | mbuffer -t -m500M -P80 -f -o backup.tar.zst
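As a quick sanity check, the number of entries in the archive should match the number of lines in the file list:
tar --zstd -tf backup.tar.zst | wc -l
wc -l < filelist_sorted.txt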