this post was submitted on 25 Jul 2024
656 points (100.0% liked)
Some apps let you offload to both GPU and CPU while loading only the active part of the model. I have an old SSD set up as swap that gives me 500 GB of "usable" RAM.
It is horrendously slow and pointless, but you can do it. I got about 2 tokens in 10 minutes on a 70B model on a 1080 Ti before I gave up.
Even if they used more powerful hardware than you, the model they ran is still almost 6 times bigger, so if you got two tokens in 10 minutes, one token in 30 minutes for them sounds plausible.
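For anyone who wants to check that back-of-the-envelope math, here is a rough sketch. It assumes time per token scales linearly with model size on similar hardware, which is a simplification and not a measured result; the function names are made up for illustration, and the ratio of 6 rounds up the "almost 6 times" figure.

```python
def minutes_per_token(tokens: float, minutes: float) -> float:
    """Average wall-clock minutes spent per generated token."""
    return minutes / tokens


def scaled_minutes_per_token(base_minutes: float, size_ratio: float) -> float:
    """Estimate for a larger model, ASSUMING time grows linearly with size."""
    return base_minutes * size_ratio


# Figures from the thread: 2 tokens in 10 minutes on the 70B model.
base = minutes_per_token(tokens=2, minutes=10)

# "Almost 6 times bigger", rounded up to 6 for an upper estimate.
estimate = scaled_minutes_per_token(base, size_ratio=6)

print(f"70B model: {base} min/token")
print(f"~6x model: {estimate} min/token")
```

Faster hardware on their side would pull the estimate down, and heavier swapping would push it up, so treat it as an order-of-magnitude guess.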
I would have to use an entire 1 TB drive for swap, but I'm sure I could manage 1 token before the heat death of the universe.
I'd worry less about the heat death of the universe and more about your hardware's heat from all that load.