JohnJeffreyJones

joined 1 year ago
[–] [email protected] 2 points 1 year ago

Thanks for the hint, but no, there is no mismatch, and yes, the files are being pulled (I even checked with tshark that everything comes over properly).

The solution turned out to be much more benign. The stock kernel in the NFSroot where the initrd was built was much smaller than the one from the bootable system. That led me to look into why this was the case: it was missing about 200 non-free drivers, which somehow made the kernel stop right before really starting off.

Adding those to the NFSroot and then to the initrd solved the problem. Sigh.
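
For anyone hitting something similar, one way to spot such a gap is to diff the file lists of the two trees. This is only a minimal sketch, assuming the drivers show up as files in the same relative locations under both roots. The helper names `list_files` and `missing_files` and the example paths are made up for illustration and are not the actual setup:

```python
import os

def list_files(root):
    """Return the set of file paths below root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found

def missing_files(reference_root, candidate_root):
    """Files present under reference_root but absent under candidate_root."""
    return sorted(list_files(reference_root) - list_files(candidate_root))

if __name__ == "__main__":
    # Hypothetical paths: module tree of the bootable system vs. the NFSroot.
    for path in missing_files("/lib/modules/6.1.0-11-amd64",
                              "/srv/nfsroot/lib/modules/6.1.0-11-amd64"):
        print(path)
```

A long list here would have pointed at the missing drivers much earlier than staring at the serial console did.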

Wasted way too much time there and I still have no idea why 5.10 booted ok the whole time.

submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/[email protected]
 

I've got a weird problem and am looking for possible pointers.

On at least one of our servers, kernel 5.10.0-0.deb10.16-amd64 boots without a problem. But as we don't want to rely on an "ancient" kernel build for Debian Buster, we also tried various later ones, and they all fail to start in the same way. Take 6.1.0-11-amd64 from Debian Bookworm as an example: it boots fine from local disk, but the very same kernel loaded via DHCP/PXE/TFTP loads the kernel and initrd seemingly fine and then only prints

early console in setup code
Probing EDD (edd=off to disable)... ok

and then hangs, i.e. the newly loaded kernel does not even start. The kernel command line already includes

debug loglevel=7 ro console=ttyS1,115200n8 earlyprintk=serial,ttyS1,115200n8 console=tty0
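
To double-check which options actually arrive at the kernel (on a running system the effective line can be read from /proc/cmdline), the line can be split into key/value pairs. A rough sketch follows; `parse_cmdline` is a made-up helper and it does not handle quoted values containing spaces:

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {key: [values]}; bare flags get [None]."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Keep every occurrence, since options such as console= may repeat.
        params.setdefault(key, []).append(value if sep else None)
    return params

cmdline = ("debug loglevel=7 ro console=ttyS1,115200n8 "
           "earlyprintk=serial,ttyS1,115200n8 console=tty0")
params = parse_cmdline(cmdline)
print(params["console"])      # ['ttyS1,115200n8', 'tty0']
print(params["earlyprintk"])  # ['serial,ttyS1,115200n8']
```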

and I don't get any more info from the system, neither via the serial port nor on the console.

Anyone with pointers?

Edit: edd=off gives the very same result, except that the corresponding line is missing from the output.

[–] [email protected] 1 points 1 year ago (1 children)

RAID5 is a bad idea even with SSDs, as long as the file system does not write checksums itself. Otherwise a stripe can end up inconsistent because one device returns wrong data without reporting an error, and the system/controller can then no longer tell which block is the broken one.

Hence my tip: RAID6 or ZFS.
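
To illustrate the point about single parity, here is a toy model with plain XOR parity over three data blocks. It is only a sketch, not how any particular RAID implementation lays out its data, and the block contents are made up:

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# A stripe as originally written: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Later, one device silently returns wrong data without any I/O error.
read = [b"AAAA", b"BxBB", b"CCCC"]

# The parity check notices that the stripe no longer fits together ...
stripe_ok = xor_blocks(read) == parity
print(stripe_ok)  # False

# ... but "repairing" any single block from the rest yields a consistent
# stripe, so parity alone cannot say which block was the broken one.
candidates = []
for bad in range(len(read)):
    others = [blk for i, blk in enumerate(read) if i != bad]
    rebuilt = xor_blocks(others + [parity])
    repaired = read[:bad] + [rebuilt] + read[bad + 1:]
    candidates.append(xor_blocks(repaired) == parity)
print(candidates)  # [True, True, True]
```

A per-block checksum kept by the file system breaks this tie, because it identifies the block whose content no longer matches.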