this post was submitted on 08 Aug 2024
10 points (91.7% liked)

Everything ZFS

273 readers
1 users here now

A community for the ZFS filesystem.

ZFS is an opensource COW filesystem used by enterprise and serious homelabbers for it's data safety and extensive feature set.

OpenZFS is the active branch now developed primarily for Linux with a port to it's FreeBSD roots.

This community is here to answer questions and discuss topics related to the use of ZFS in the wild.

Rules:

As always, the main rule is Don't Be a Dick. Be polite with new users asking questions that you may consider obvious. If you don't have something constructive to offer, downvote and move on.

No dirty deletes: your posts are here for posterity, perhaps the next person will get something out of it, even if it's wrong.

founded 1 year ago
MODERATORS
 

Looking for thoughts/opinions

I have a 5 disc raidz1 array. The volumes are accumulating CKSUM errors - fairly evenly distributed over the discs. I've been lazy and let this progress to the point where there are permanent errors in files.

# zpool status -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 748K in 06:17:19 with 1 errors on Sun Jul 14 06:41:22 2024
config:

        NAME                                 STATE     READ WRITE CKSUM
        tank                                 ONLINE       0     0     0
          raidz1-0                           ONLINE       0     0     0
            ata-ST8000VN004-2M2101_WSD13YBW  ONLINE       0     0     6
            ata-ST8000VN004-2M2101_WSD13YE4  ONLINE       0     0     7
            ata-ST8000VN004-2M2101_WSD1454G  ONLINE       0     0     8
            ata-ST8000VN004-2M2101_WSD1454W  ONLINE       0     0     6
            ata-ST8000VN004-2M2101_WSD14563  ONLINE       0     0     7

errors: Permanent errors have been detected in the following files:

        /you/do/not/need/this/level of detail.txt

I've done some research and believe (hope) that the cause of these errors is the "domestic" onboard SATA controllers I'm using and I have ordered a LSI SAS3008 9300-8i HBA as an upgrade.

I know I can fix the permanent error by deleting and restoring it and then running a scrub. But, I'm torn - should I scrub now and risk stressing it more on the crappy SATA controllers, or wait until I get the new HBA (in a few weeks - free cheap, slow, shipping)?

you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 1 points 3 months ago (2 children)

I’d shut it down before it corrupts even more, replace HBA when it arrives and run a scrub to see what’s the damage

[–] [email protected] 1 points 3 months ago (1 children)

I know that's the correct response. But, it's been running like this for many months, maybe even years - as I said in the post, I've been lazy. There's nothing on it that can't easily be restored, or replaced, and shutting it down would be a PITA.

[–] [email protected] 1 points 3 months ago

There’s always a chance your backups might get corrupted too if you let it continue like that