submitted 3 weeks ago* (last edited 3 weeks ago) by [email protected] to c/[email protected]

I can run the full 131K context with a 3.75bpw quantization, and still a very long context at 4bpw. It should also be just barely fine-tunable in unsloth.
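For anyone wondering why a quarter of a bit per weight matters, here's a rough back-of-the-envelope sketch. The parameter count is a made-up placeholder (not the actual model's), and real quantized files carry some extra overhead this ignores.

```python
GIB = 1024 ** 3

# Hypothetical parameter count, for illustration only (not the real model's).
N_PARAMS = 30_000_000_000


def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes.

    Ignores per-tensor overhead such as scales and unquantized layers.
    """
    return n_params * bits_per_weight / 8


if __name__ == "__main__":
    for bpw in (3.75, 4.0):
        size_gib = weight_bytes(N_PARAMS, bpw) / GIB
        print(f"{bpw} bpw: about {size_gib:.1f} GiB of weights")
```

Whatever VRAM the weights don't take is what's left for the context cache, which is why the lower bpw buys a longer context.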

It's pretty much perfect! Unlike the last iteration, they're using very aggressive GQA, which keeps the KV cache small even at long context, and it feels really smart at long-context work like storytelling, RAG and document analysis (whereas Gemma 27B and Mistral Code 22B are probably better suited to short chats and code).
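To put a number on the GQA point, here's a minimal sketch of the usual KV cache estimate. All the architecture numbers (layers, heads, head size) are hypothetical placeholders rather than this model's real config, and I'm assuming an unquantized 16-bit cache and 131K meaning 131072 tokens.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size in bytes for one sequence.

    The leading factor of 2 covers keys and values, which are stored
    per layer, per KV head, per token.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical architecture, for illustration only.
N_LAYERS = 40
N_QUERY_HEADS = 64   # without GQA, every query head has its own K/V head
N_KV_HEADS = 8       # with GQA, groups of query heads share one K/V head
HEAD_DIM = 128
SEQ_LEN = 131072     # assuming "131K" means 128 * 1024 tokens

if __name__ == "__main__":
    mha = kv_cache_bytes(N_LAYERS, N_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
    gqa = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, SEQ_LEN)
    print(f"no GQA: {mha / GIB:.0f} GiB, GQA: {gqa / GIB:.0f} GiB")
```

The cache shrinks by the ratio of query heads to KV heads, so more aggressive grouping means proportionally more context in the same VRAM. A quantized cache would cut it further.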

this post was submitted on 31 Aug 2024
6 points (100.0% liked)

Free Open-Source Artificial Intelligence
