this post was submitted on 30 Dec 2024

 
The original post: /r/datahoarder by /u/ds3534534 on 2024-12-30 08:31:18.

Thought I'd share an interesting (probably well-known) TIL observation.

I'm backing up around 900,000 JPEGs and XMP files - even a local copy is fine, it's just a 'get out of jail free' copy before I start messing about with the originals.

These files are stored on a 12yo HP Microserver Gen8 with a Celeron CPU and a 4x10TB hardware RAID5 array of 5200rpm drives, running VMware ESXi 5.5 with Win10 on top of that. Horribly slow, of course.

I tried a few different options, but trying to copy those files was going to take at least a day, maybe 20-30.

The best method I've landed on so far is this:

  • External (old 5200rpm SATA) drive in a spare USB3 caddy
  • USB3 caddy mounted as a new USB device in VMware
  • Use 7-Zip in Windows to archive an entire folder of ~100,000 JPGs, with 0 compression, from the source (Win VDisk on hardware RAID5) to the destination (NTFS-formatted USB drive)
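
A rough sketch of how that last step could be scripted, for anyone who prefers that to clicking through the GUI. It assumes the command-line `7z` executable is installed and on the PATH (the steps above don't say whether the GUI or the command line was used), and the folder and archive paths are made-up placeholders. `-mx=0` is 7-Zip's switch for compression level 0, i.e. files are stored without compression.

```python
import subprocess

def build_store_command(source_dir, archive_path, exe="7z"):
    """Build a 7-Zip command that adds a folder to an archive with no compression.

    'a' means "add to archive"; '-mx=0' sets compression level 0 (store only).
    'exe' is an assumption: the name/path of the 7-Zip command-line binary.
    """
    return [exe, "a", "-mx=0", str(archive_path), str(source_dir)]

def run_store_archive(source_dir, archive_path, exe="7z"):
    """Run the command and raise CalledProcessError if 7-Zip reports a failure."""
    subprocess.run(build_store_command(source_dir, archive_path, exe), check=True)

if __name__ == "__main__":
    # Hypothetical paths: one source folder on the RAID volume, one archive on the USB drive.
    print(build_store_command(r"D:\Photos\2014", r"E:\backup\2014.7z"))
```

Calling `run_store_archive(...)` with real paths would write one large archive to the destination drive instead of ~100,000 individual files.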

I had tested the USB drive at 150MB/s write speed using large movie files, which is acceptable enough. It was also twice as fast as an internal drive-to-drive copy within the RAID5 array, even though the array sits on a max-RAM hardware RAID card.

However, a plain Windows copy of the small JPGs to the external NTFS drive was running at only 100kB/s, no doubt due to NTFS overhead on an old spinning rust drive.

So - the fastest approach I've found is to use 7-Zip at 0 compression to write the backups to the external drive. Even with my puny 2-core Celeron CPU, I'm getting 60-90MB/s sustained from the RAID5 array to the external drive, compared with the best case of 150MB/s I measured for single large files.

Surprisingly, running 10 7-Zip archive jobs at zero compression in parallel seems to be faster than a single job (which ran at 20-30MB/s). I would have thought that many parallel copies would be slower than some optimal lower count of 2-3, but it seems not.
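
Here's a minimal sketch of the "several store-only archive jobs at once" idea. To keep it runnable without 7-Zip installed, it swaps in Python's built-in `zipfile` module in stored (uncompressed) mode as a stand-in, so it is not the same tool or archive format as above. The function names are hypothetical, and the default of 10 jobs just mirrors the number mentioned above.

```python
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def store_folder(folder: Path, dest_dir: Path) -> Path:
    """Write every file under 'folder' into one uncompressed .zip in 'dest_dir'."""
    archive = dest_dir / (folder.name + ".zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_STORED) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Keep the folder name as the top-level directory inside the archive.
                zf.write(path, arcname=path.relative_to(folder.parent).as_posix())
    return archive

def store_folders_in_parallel(folders, dest_dir, jobs=10):
    """Archive several folders at once, one archive per folder.

    Threads are used because the work is mostly disk I/O rather than CPU.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # pool.map returns results in the same order as the input folders.
        return list(pool.map(lambda f: store_folder(Path(f), dest_dir), folders))
```

Whether 10 jobs beats 2-3 will depend on the hardware, so it's worth timing a couple of values of `jobs` on a sample folder before committing to a full run.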

At this rate, I'll back up 900,000 small files totalling 1TB in around 3-4hrs, which is way better than every other solution I had tried.
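
As a quick sanity check on that estimate, the arithmetic can be sketched like this. It assumes decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and that the 60-90MB/s rate holds for the whole run.

```python
def hours_to_copy(total_bytes, rate_bytes_per_s):
    """Time in hours to move total_bytes at a constant transfer rate."""
    return total_bytes / rate_bytes_per_s / 3600

TOTAL_BYTES = 1e12    # 1TB, decimal units (assumption)
FILE_COUNT = 900_000  # number of files from the post

if __name__ == "__main__":
    for rate_mb_s in (60, 90):
        hours = hours_to_copy(TOTAL_BYTES, rate_mb_s * 1e6)
        print(f"{rate_mb_s}MB/s -> {hours:.2f} hours")
    print(f"average file size: {TOTAL_BYTES / FILE_COUNT / 1e6:.2f}MB")
```

That works out to roughly 3.1 hours at 90MB/s and 4.6 hours at 60MB/s, with an average file size of about 1.1MB.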

So my takeaway is that 7-Zip with 0 compression seems to be the answer for copying small files far faster than other methods, running at near-(old) disk speeds even on a 12yo small Celeron CPU.
