Tar: How to speed up tar preventing seek especially with many small files

I faced this problem while archiving a directory of around 50,000 images, each 100 KB or smaller. Packing all those files into a .tar used to take more than 5 minutes because of the massive seeking of the HDD heads. This happens because tar does almost no buffering (GNU tar writes in 10 KiB records by default), so the heads move back and forth between reading and writing for every file.

But I've found a solution. Instead of

tar cvf arc.tar dir_to_archive

I've piped tar through mbuffer, which supplies the buffering that tar lacks:

tar cvf - dir_to_archive | mbuffer -t -m500M -P80 -f -o arc.tar

This way it takes only 15 seconds instead of 5+ minutes! :-o

To install mbuffer on Debian/Ubuntu:

apt install mbuffer

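The parameters I've used above mean:

  • -t Use a memory-mapped temporary file as the buffer (according to the mbuffer man page; without -t the buffer is held in plain RAM).
  • -m Size of the buffer (500M here).
  • -P Percentage of the buffer to fill before starting to write to disk. 80 = 80%. Note the capital P: lowercase -p is a different option.
  • -f Force overwriting if the output file already exists.
  • -o Tell mbuffer to write to a file instead of stdout (strongly recommended: in my tests, writing through stdout was slower).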
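
As a minimal sketch, the options above could be wrapped in a small shell function. The name tar_buffered, the optional buffer-size argument, the -q (quiet) flag and the fallback to plain tar are my own assumptions and not part of the recipe above. The sketch also leaves out -t, so the buffer stays in ordinary memory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original recipe): archive a directory
# through mbuffer with an adjustable buffer size.
# Usage: tar_buffered <dir> <archive.tar> [buffer_size]
tar_buffered() {
    local src=$1 out=$2 size=${3:-500M}

    if command -v mbuffer >/dev/null 2>&1; then
        # -q: no status display, -m: buffer size,
        # -P 80: start writing once the buffer is 80% full,
        # -f: overwrite an existing output file, -o: write to this file
        tar -cf - "$src" | mbuffer -q -m "$size" -P 80 -f -o "$out"
    else
        # Assumption: fall back to unbuffered tar so the call still works
        echo "mbuffer not found, falling back to plain tar" >&2
        tar -cf "$out" "$src"
    fi
}
```

For example, `tar_buffered dir_to_archive arc.tar 500M` should behave like the mbuffer pipeline shown earlier, minus the -t and -v options.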

If you want to compress with zstd, do it this way:

tar --zstd -cf - ./dir_to_archive | mbuffer -t -m500M -P80 -f -o arc.tar.zst

EVEN BETTER

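Sorting the files by inode number reduces seeking while reading, which gives a further large speedup while feeding the buffer. This presumably works because, on filesystems like ext4, inode order tends to follow the order of the data on disk.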

# Build a file list sorted by inode number (assumes no newlines in file names)
find dir_to_archive -type f -print0 | xargs -0 stat --format='%i %n' | sort -n | cut -d' ' -f2- > filelist_sorted.txt

# Archive the files in list order
tar -cf - -T filelist_sorted.txt | mbuffer -t -m500M -P80 -f -o arc.tar

# Or archive with zstd: a bit of compression at basically the same speed
tar --zstd -cf - -T filelist_sorted.txt | mbuffer -t -m500M -P80 -f -o arc.tar.zst
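
The line-based list above breaks if a file name contains a newline, and it runs stat through xargs on every file. A possible variant, sketched here assuming GNU find, sort, cut and tar, keeps everything NUL-delimited. The function names inode_sorted_list and tar_from_list are made up for illustration, and mbuffer is left out so the sketch stays self-contained.

```shell
#!/usr/bin/env bash
# Sketch, assuming GNU find, sort, cut and tar.

# Write a NUL-delimited list of the regular files under $1 to $2,
# sorted by inode number.
inode_sorted_list() {
    local src=$1 list=$2
    # find prints "<inode><TAB><path><NUL>" for each file,
    # sort orders the records numerically by the leading inode,
    # cut drops the inode field and keeps the path.
    find "$src" -type f -printf '%i\t%p\0' \
        | sort -z -n \
        | cut -z -f2- > "$list"
}

# Archive the files named in the NUL-delimited list $1 into tar file $2.
# --null has to come before -T so that the list is read as NUL-delimited.
tar_from_list() {
    local list=$1 out=$2
    tar --null -T "$list" -cf "$out"
}
```

To combine it with the buffering, the output of `tar --null -T list -cf -` can be piped into the same mbuffer command as before.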

Example

find ./backup -type f -print0 | xargs -0 stat --format='%i %n' | sort -n | cut -d' ' -f2- > filelist_sorted.txt
 
tar --zstd -cf - -T filelist_sorted.txt | mbuffer -t -m500M -P80 -f -o backup.tar.zst
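
Since the archive is built from a file list, a quick sanity check is to compare the number of entries in the list with the number of entries in the archive. This is only a sketch: the function name verify_archive is hypothetical, and it assumes one path per line in the list (no newlines inside file names).

```shell
#!/usr/bin/env bash
# Hypothetical check: does the archive hold as many entries as the file list?
# Usage: verify_archive <filelist> <archive>
verify_archive() {
    local list=$1 archive=$2
    local expected actual

    expected=$(wc -l < "$list")
    # GNU tar detects the compression (e.g. zstd) by itself when reading
    actual=$(tar -tf "$archive" | wc -l)

    if [ "$expected" -eq "$actual" ]; then
        echo "OK: $actual files archived"
    else
        echo "MISMATCH: list has $expected entries, archive has $actual" >&2
        return 1
    fi
}
```

With the example above, this would be called as `verify_archive filelist_sorted.txt backup.tar.zst`.
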
Last modified: 2024/06/21 21:09 by rik