196

Be sure to follow the rule before you head out.

Rule: You must post before you leave.

The Rule (lemmy.ml)

submitted 4 months ago by [email protected] to c/[email protected]

[–] [email protected] 5 points 4 months ago

I don't have access to Llama 3.1 405B, but I can see that Llama 3 70B takes up ~145 GB, so scaling by parameter count, 405B would probably take about 840 GB just to download the uncompressed fp16 (16 bits per weight) model. With 8-bit quantization it would probably take closer to 420 GB, and with 4-bit closer to 210 GB. 4-bit quantization is really going to start harming the model outputs, and it's still probably not going to fit in your RAM, let alone VRAM.

So yes, it is a crazy model. You'd probably need at least 3 or 4 A100s to have a good experience with it.
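
If anyone wants to redo the arithmetic, here's a rough sketch in Python. It only counts the raw weights, with no KV cache, activations, or file-format overhead. The function names are made up for illustration, and the 80 GB per A100 figure is my assumption (the 40 GB variant would need about twice as many cards).

```python
import math

def weights_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Size of the raw weights in GB (1 GB = 1e9 bytes), from parameter count."""
    return params_billions * bits_per_weight / 8

def scaled_size_gb(ref_size_gb: float, ref_params_billions: float,
                   params_billions: float) -> float:
    """Scale an observed download size linearly by parameter count."""
    return ref_size_gb * params_billions / ref_params_billions

def gpus_needed(size_gb: float, gpu_memory_gb: float = 80.0) -> int:
    """Minimum GPUs just to hold the weights (assumes 80 GB cards by default)."""
    return math.ceil(size_gb / gpu_memory_gb)

# Estimate scaled from the ~145 GB observed for the 70B fp16 download.
fp16_scaled = scaled_size_gb(145, 70, 405)

for bits in (16, 8, 4):
    from_params = weights_size_gb(405, bits)
    from_scaling = fp16_scaled * bits / 16
    print(f"{bits:>2}-bit: {from_params:6.1f} GB from parameter count, "
          f"{from_scaling:6.1f} GB scaled from 70B, "
          f"at least {gpus_needed(from_scaling)} GPUs for weights alone")
```

The two estimates differ by a few percent because the observed 70B download is a bit larger than a bare 2 bytes per parameter; either way the conclusion is the same.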