this post was submitted on 26 Jul 2024
310 points (98.1% liked)

[–] [email protected] 5 points 3 months ago* (last edited 3 months ago) (1 children)

I don't think the voltage issue is simply heat, unless it's some kind of extremely localized or extremely short-lived issue internal to the chip. I hit the problem with a very hefty water cooler that never let the attached processor get very warm, at least going by the temperatures the processor reported.

Wendell, at Level1Techs, who did an earlier video with Steve Burke talking about this, looked over a dataset of hundreds of machines. They were running with conservative speed settings, in a datacenter where all temperatures were being logged, and he said that the hottest he ever saw on any hotspot on any processor in his dataset was, IIRC, 85 degrees Celsius, and normally they were well below that. He saw about a 50% failure rate.

If the CPU simply getting hot were the problem, then given that we hit it on our well-cooled CPUs, I'd have expected people running them in hotter environments to have slammed into the thing immediately. Ditto for Intel -- I'd guess (I'd hope) that part of their QA cycle involves running the processors in an industrial oven, as a way to simulate more severe conditions. Those things are supposed to be fine up to 100 degrees Celsius, at which point they throttle themselves.

[–] [email protected] 1 points 3 months ago

It's not about the CPU package getting too hot, it's about a specific set of transistors getting too hot. I think I read they're between the processing units and the cache. The combined size of these transistors is probably around a couple of mm square. Unless you etch the package back, you can't measure them precisely. And if you etch it, you can't dissipate their heat, so you can't run the CPU at maximum load.