The original post: /r/datahoarder by /u/ds3534534 on 2024-12-30 08:31:18.
Thought I'd share an interesting (probably well-known) TIL observation.
I'm backing up around 900,000 JPEGs and XMP files - a local copy is fine, it's just a 'get out of jail free' copy before I start messing about with the originals.
These files are stored on a 12yo HP Microserver Gen8, running a Celeron CPU, with a 4x10TB Hardware RAID5 5200rpm array, running VMware ESXi 5.5, and Win10 on top of that. Horribly slow, of course.
I tried a few different options, but trying to copy those files was going to take at least a day, maybe 20-30.
The best method I've landed on so far is this:
- External (old 5200rpm SATA) drive in spare USB3 caddy
- USB3 caddy mounted as new USB device in VMware
- Use 7Zip in Windows to archive an entire folder of ~100,000 JPGs, with 0 compression, from the source (Win VDisk on Hardware RAID5) to dest (NTFS-formatted USB Drive)
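For anyone who would rather script the last step than click through the GUI, here is a minimal sketch. It assumes the 7-Zip command-line tool is installed and reachable as `7z` on PATH (that name, the helper functions and the example paths are illustrative, not from the post). The `a` command adds files to an archive and `-mx=0` sets compression level 0, i.e. store only.

```python
import shutil
import subprocess


def build_store_command(source_dir: str, archive_path: str, exe: str = "7z") -> list[str]:
    """Return a 7-Zip command line that adds source_dir to archive_path uncompressed."""
    # "a" = add to archive; "-mx=0" = compression level 0 (copy/store only).
    return [exe, "a", "-mx=0", archive_path, source_dir]


def store_folder(source_dir: str, archive_path: str) -> None:
    """Run the command, assuming an executable named '7z' can be found on PATH."""
    exe = shutil.which("7z")
    if exe is None:
        raise FileNotFoundError("7z executable not found on PATH")
    # check=True raises CalledProcessError if 7-Zip exits with an error code.
    subprocess.run(build_store_command(source_dir, archive_path, exe), check=True)


# Hypothetical usage (paths are placeholders):
# store_folder(r"D:\Photos\2019", r"E:\backup\2019.7z")
```

Writing one big archive means the destination drive sees a single large sequential write instead of ~100,000 small file creations, which is presumably where the speedup comes from.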
I had tested the USB drive at 150MB/s write speed using large movie files, which is acceptable enough. It was also twice as fast as an internal drive-to-drive copy within the RAID5 array, even though it's on a max-RAM hardware RAID card.
However, Windows-copying the small JPGs to backup to the external NTFS was running at only 100kB/s, no doubt due to NTFS overhead on an old spinning rust drive.
So what I've found to be fastest is using 7Zip at 0 compression to write the backups to the external drive. Even with my puny 2-core Celeron CPU, I'm getting 60-90MB/s sustained from the RAID5 array to the external drive, against the previous best case of 150MB/s for single large files.
Surprisingly, running 10 x 7Zip archive jobs at zero compression in parallel is faster than a single run (which ran at 20-30MB/s). I would have thought that many parallel copies would be slower than some optimal lower count of 2-3, but it seems not.
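The parallel setup can be sketched roughly like this. To keep it runnable without 7-Zip installed, it swaps in Python's standard `zipfile` module with `ZIP_STORED` (no compression) as a stand-in, so it is not what I actually ran and makes no claim about matching 7Zip's speed. The function names and the default of 10 jobs are just illustrative.

```python
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def store_folder(src: Path, dest_zip: Path) -> int:
    """Write every file under src into dest_zip without compression; return the file count."""
    count = 0
    # ZIP_STORED copies the bytes as-is, so the CPU does no compression work.
    with zipfile.ZipFile(dest_zip, "w", compression=zipfile.ZIP_STORED, allowZip64=True) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                zf.write(path, arcname=path.relative_to(src).as_posix())
                count += 1
    return count


def store_folders_in_parallel(folders: list[Path], dest_dir: Path, jobs: int = 10) -> dict[str, int]:
    """Archive each folder to dest_dir/<folder name>.zip, running up to `jobs` at once."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = {
            folder.name: pool.submit(store_folder, folder, dest_dir / f"{folder.name}.zip")
            for folder in folders
        }
        # result() re-raises any exception from a failed job.
        return {name: future.result() for name, future in futures.items()}
```

One archive per source folder keeps the jobs independent, so the job count is easy to vary when looking for the point where extra parallelism stops helping.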
At this rate, I'll back up 900,000 small files totalling 1TB in around 3-4hrs, which is way better than every other solution I had tried.
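As a sanity check on that estimate, here is the arithmetic as a small sketch. It assumes decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and that the sustained rate holds for the whole run. The helper name is made up for illustration.

```python
def transfer_hours(total_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours needed to move total_bytes at a constant rate."""
    return total_bytes / rate_bytes_per_s / 3600


TB = 1e12  # decimal terabyte (assumption)
MB = 1e6
KB = 1e3

best_case_hours = transfer_hours(1 * TB, 90 * MB)       # roughly 3.1 hours
worst_case_hours = transfer_hours(1 * TB, 60 * MB)      # roughly 4.6 hours
plain_copy_days = transfer_hours(1 * TB, 100 * KB) / 24  # roughly 116 days at 100kB/s
average_file_mb = 1 * TB / 900_000 / MB                  # roughly 1.1MB per file
```

So 60-90MB/s works out to about 3-4.6 hours, in the same ballpark as the estimate above, while the 100kB/s plain-copy rate would have taken months.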
So my takeaway is that 7Zip with 0 compression seems to be the answer for copying lots of small files far faster than the other methods I tried, running at close to the speed of the (old) disk even on a 12yo small Celeron CPU.