this post was submitted on 08 Mar 2024

I've spent weeks searching for an answer and trying different fixes, but at best I've reduced how often it happens, and I'm dubious even of that, since it seems so random.

  • journalctl has nothing at all from when it happens, except for one time when it managed to log that the kernel lost contact with the GPU in the seconds before the system went down. After undervolting and underclocking the GPU, that message hasn't appeared since.

  • there's no crash log from it either.

  • memtest declared there were no problems with the RAM.

  • I've been watching sysinfo and corectrl like a hawk: CPU, RAM, and VRAM usage are all well within normal levels when it happens, and temperatures are low across the board.

  • the same system has been 100% stable and completely fine running under heavier load for hours at a time in Windows.

  • I've followed AMD's instructions for making sure the GPU drivers are what they should be for this, and the kernel is a version that's supposed to be correct and stable for those drivers as well.

  • specific compatibility settings that other people found to fix literally this exact problem may have, at most, reduced the frequency of the crashes, but again, they're so erratic that it's almost impossible to determine cause and effect here.

  • I've tried disabling the integrated graphics both in the BIOS and through settings, because that can apparently cause instability, but that hasn't helped.

I don't know what else to look at or try at this point.
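
For anyone retracing the journalctl step above, here is a minimal Python sketch that pulls the previous boot's kernel messages and keeps only the lines that mention the GPU. It assumes the journal is stored persistently (otherwise there is no previous boot to read), and the keyword list is a guess, not a list of messages any particular driver is known to emit.

```python
import subprocess

# Assumed keywords; adjust to whatever your hardware actually logs.
GPU_KEYWORDS = ("amdgpu", "drm", "gpu")


def gpu_lines(log_text, keywords=GPU_KEYWORDS):
    """Return the log lines that mention any keyword, case-insensitively."""
    wanted = [k.lower() for k in keywords]
    return [
        line for line in log_text.splitlines()
        if any(k in line.lower() for k in wanted)
    ]


def previous_boot_kernel_log():
    """Read kernel messages (-k) from the previous boot (-b -1)."""
    try:
        result = subprocess.run(
            ["journalctl", "-k", "-b", "-1", "--no-pager"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        # journalctl is not installed on this system.
        return ""
    return result.stdout


if __name__ == "__main__":
    # Show only the last few matches, which are closest to the crash.
    for line in gpu_lines(previous_boot_kernel_log())[-20:]:
        print(line)
```

If this prints nothing after a crash, that matches what the post describes: the system went down before anything reached the journal.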

[–] [email protected] 2 points 8 months ago* (last edited 8 months ago) (1 children)

Try this: https://launchpad.net/~oibaf/+archive/ubuntu/graphics-drivers

It is a repackage of the bleeding-edge Mesa drivers. Mesa is generally more stable than AMD's drivers, and sometimes it performs better as well.

[–] [email protected] 3 points 8 months ago (1 children)

Unfortunately my main use case for Linux is ROCm, and that requires the drivers it installs for itself. After updating to ROCm 6.0 and doing some light testing, it hasn't actively failed yet, but I won't conclusively know if it's fixed until it's gone long enough under regular use without crashing, at which point I'm sure it'll black screen the second I dare to feel relief and believe it to be fixed.

[–] [email protected] 2 points 8 months ago (1 children)

You can install Mesa and ROCm at the same time. There should be a guide to it on the AMD website.

[–] [email protected] 3 points 8 months ago (1 children)

Sorry, I should have been clearer: in Linux I'm only using the GPU for ROCm; I'm not trying to get games running or anything. I just want to get its ROCm performance stable and then never touch anything for fear of breaking it.

[–] [email protected] 3 points 8 months ago* (last edited 8 months ago) (1 children)

ROCm sits on top of the kernel driver in the graphics driver stack. Switching out the driver underneath it (i.e. AMD's packaged AMDGPU drivers for Mesa) is a good place to start. Feel free to use the repository version of Mesa if you're not using it for gaming. Trust me, I've tried to get ROCm working on my own machine before.
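
Before and after swapping anything, it can help to confirm which kernel driver is actually bound to the card. A rough Python sketch, assuming the usual Linux sysfs layout where `/sys/class/drm/cardN/device/driver` is a symlink to the bound driver; `bound_drivers` is a made-up helper name, not part of any tool:

```python
import os
from pathlib import Path


def bound_drivers(sysfs_drm="/sys/class/drm"):
    """Map each DRM card (card0, card1, ...) to its bound kernel driver name."""
    drivers = {}
    root = Path(sysfs_drm)
    if not root.is_dir():
        return drivers
    for card in sorted(root.glob("card[0-9]*")):
        # Skip connector entries such as card0-DP-1.
        if "-" in card.name:
            continue
        link = card / "device" / "driver"
        if link.is_symlink():
            # The symlink target's last path component is the driver name.
            drivers[card.name] = os.path.basename(os.readlink(link))
    return drivers


if __name__ == "__main__":
    for card, driver in bound_drivers().items():
        print(f"{card}: {driver}")
```

This only reports the kernel side; it says nothing about which userspace packages are installed.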

[–] [email protected] 2 points 8 months ago (1 children)

When I looked at it, it was only talking about Vulkan, OpenGL, and something mimicking DirectX 9 for compatibility, but I'll keep it in mind, and if switching from ROCm 5.6 to 6.0 didn't solve the problem I'll try it. I didn't find anything about using ROCm with Mesa when I searched, but between Google being useless and ROCm seemingly being the least talked about and documented thing ever, that's probably not surprising.

[–] [email protected] 3 points 8 months ago

It's one of the most annoying things I've ever dealt with, and that's purely because of how badly it's documented.