this post was submitted on 22 Nov 2024
Linux
I may have posted this before, but... late last year I realized my Debian server, with circa-2009 hardware (4 GB of RAM and a Core 2 Quad processor), was no longer up to the tasks I wanted it to perform, in particular running a Home Assistant server. Back in 2018 or so, I added a software Linux RAID5 array with 5 active 3 TB drives and one hot spare, along with a "cold spare" that I've never actually used.
So, early this year, I bought hardware to upgrade my desktop machine, which was still plenty fast for me, and move the guts to my server. This is how my server usually gets upgraded. Upgrade the desktop machine, give it a few days or weeks to make sure it's stable, and then upgrade the server.
I installed the hardware without a problem, booted it up, and everything seemed okay, except that I... couldn't access the RAID. At first it was like, well, I'm sure it's nothing serious, but then when mdadm couldn't even FIND it, I started to get extremely worried. Fear set in.
Long story short: When I built the RAID, I followed directions that used the entire discs as the RAID, instead of making a partition on the disc and using that partition. The old motherboard didn't care, but the new one saw the bare discs and was like, "Hey, those are messed up, I'll fix the partition table for you!" Turns out, building Linux RAIDs by using the full discs like that is a VERY BAD IDEA for exactly this reason - but there are still guides out there showing that method and not mentioning the risks.
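For anyone wondering whether their own array is built this way, here is a minimal Python sketch that scans `/proc/mdstat`-style text and flags members that look like whole disks rather than partitions. The name patterns are a heuristic I'm assuming (`sdb1` or `nvme0n1p1` count as partitions, anything else as a bare disk), and the sample text, including its block counts, is made up for illustration.

```python
import re

# Heuristic: names like sdb1 or nvme0n1p1 are partitions; anything else is
# treated as a whole disk. This is an assumption, not an exhaustive rule.
PARTITION_RE = re.compile(
    r"^(?:[shv]d[a-z]+\d+|(?:nvme\d+n\d+|mmcblk\d+)p\d+)$"
)
# Member entries in /proc/mdstat look like "sdb[0]" or "sdg[5](S)".
MEMBER_RE = re.compile(r"([A-Za-z0-9]+)\[\d+\]")


def whole_disk_members(mdstat_text):
    """Return {array_name: [members that look like whole disks]}."""
    result = {}
    for line in mdstat_text.splitlines():
        match = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if not match:
            continue  # skip "Personalities" and the indented detail lines
        array, rest = match.groups()
        bare = [m for m in MEMBER_RE.findall(rest) if not PARTITION_RE.match(m)]
        if bare:
            result[array] = bare
    return result


# Made-up sample in the general shape of /proc/mdstat output.
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb[0] sdc[1] sdd[2] sde[3] sdf[4] sdg[5](S)
      11720540160 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
md1 : active raid1 sdh1[0] sdi1[1]
      976630464 blocks super 1.2 [2/2] [UU]
"""

if __name__ == "__main__":
    for array, members in whole_disk_members(SAMPLE).items():
        print(f"{array} uses whole disks: {', '.join(members)}")
```

On a real machine you would pass in the contents of `/proc/mdstat` instead of `SAMPLE`.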
I was panicking. I spent days trying to figure out what to do and nothing was working. I was asking for help on the Linux-RAID list (and most of them were as helpful as they could be). Unfortunately my backups were NOT up to par (something I should have checked before starting), and I was at the point where I was like, well, I've lost x, y, and z.
I had basically given up and was just recreating the RAID using mdadm's "create" command, then trying to see if I could mount the drive read-only. With 6 drives, there are quite a few possible combinations that could be the right one. If I remember correctly, I was able to figure out which drive was the spare, so I could limit my searches to the other 5, and knowing all 5 were in use, it was a matter of trying different orders. I think I got close one time and ext4 gave me a weird read error, so after that I swapped two drives, and hit the right order.
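To give a sense of the search space: 5 drives can be ordered 5! = 120 ways. Here is a minimal Python sketch that only builds the candidate `mdadm --create` command lines, one per ordering, and runs nothing. The device names and `/dev/md0` are hypothetical, and a real attempt would also need the original chunk size, metadata version, and data offset to match, none of which are shown here.

```python
from itertools import permutations


def candidate_create_commands(devices, array="/dev/md0", level=5):
    """Build one mdadm --create command per possible device order.

    Nothing is executed; the commands are returned as argument lists.
    --assume-clean tells mdadm not to start an initial resync, which matters
    when the goal is to look at existing data rather than rebuild parity.
    """
    commands = []
    for order in permutations(devices):
        commands.append([
            "mdadm", "--create", array,
            "--assume-clean",
            f"--level={level}",
            f"--raid-devices={len(order)}",
            *order,
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical device names, for illustration only.
    drives = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf"]
    commands = candidate_create_commands(drives)
    print(len(commands), "possible orders; first three:")
    for cmd in commands[:3]:
        print(" ".join(cmd))
```

After each candidate you would try a read-only mount and stop the array again before trying the next order.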
Eventually... I found it. I found the right combination and could reload it! Everything was there, untouched! As quickly as I could, I copied everything to a 10 TB drive I bought and installed into the desktop system. I saved the command, rebooted, and the same thing happened again - so it was definitely a motherboard problem - but this time I knew how to recreate it, and did so.
Since I now had a backup, I partitioned each drive and rebuilt the array using partitions...and I saved every piece of data I could think of about building the array, outputs of mdadm, outputs of /proc/mdstat, partition IDs, etc. Naturally, having that info likely means I'll never need it.
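A minimal Python sketch of that kind of record-keeping, assuming hypothetical names (`/dev/md0`, the member partitions, and the output path are placeholders, not what I actually used). It collects `mdadm --detail`, `mdadm --examine` for each member, an `lsblk` listing with partition UUIDs, and `/proc/mdstat` into one text file.

```python
import subprocess
from pathlib import Path


def snapshot_commands(array, members):
    """List the commands whose output is worth keeping for a later rebuild."""
    commands = [
        ["mdadm", "--detail", array],
        ["mdadm", "--detail", "--scan"],
    ]
    # --examine reads the md superblock on each member device.
    commands += [["mdadm", "--examine", member] for member in members]
    commands.append(["lsblk", "-o", "NAME,SIZE,TYPE,PARTUUID"])
    return commands


def save_snapshot(array, members, out_path):
    """Run each command (needs root) and write all output to out_path."""
    parts = []
    for cmd in snapshot_commands(array, members):
        proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
        parts.append(f"$ {' '.join(cmd)}\n{proc.stdout}{proc.stderr}")
    parts.append("$ cat /proc/mdstat\n" + Path("/proc/mdstat").read_text())
    Path(out_path).write_text("\n".join(parts))


if __name__ == "__main__":
    # Hypothetical names; this only prints the commands, it does not run them.
    for cmd in snapshot_commands("/dev/md0", ["/dev/sdb1", "/dev/sdc1"]):
        print(" ".join(cmd))
```

The resulting file should live somewhere other than the array it describes.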
I was so relieved when I saw that mount command work without error. I spent close to a week worrying about it, and in that moment it was a huge rush.
New setup handles HA and other duties with aplomb and is very reliable, so in the end it was very worth it.
This is less "silly" and more "horrifying". Sorry.
Well written, and I learned a few things from this story. I recently started a cloud of my own with four 20 TiB HDDs in a RAID 5 configuration, so this story felt very relevant to me. Makes me very grateful for the simplicity of Cockpit and LUKS2... my setup felt so trivial to configure!