llama.cpp quantizes the heck out of language models, which lets consumer CPUs run them. My laptop can run most 7B or 13B LLMs with 4-bit quantization (and people are pushing quantization even further, down to 2 or even ~1.5 bits!). The basic trick is sketched below.
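For a feel of what "4-bit quantization" means here, this is a minimal Python sketch of block-wise quantization: weights are split into small blocks, each block stores one fp16 scale plus 4-bit integers. This is the general idea behind llama.cpp's Q4_* formats, not its exact implementation (block size, scale choice, and bit packing all differ in practice).

```python
# Rough sketch of block-wise 4-bit quantization -- the idea behind
# llama.cpp's Q4_* formats, not the exact scheme it uses.
import numpy as np

def quantize_q4(weights: np.ndarray, block_size: int = 32):
    """Quantize fp32 weights to 4-bit ints plus one fp16 scale per block."""
    blocks = weights.reshape(-1, block_size)
    # one scale per block, chosen so values fit the signed 4-bit range [-8, 7]
    scales = np.max(np.abs(blocks), axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0                      # avoid divide-by-zero
    q = np.clip(np.round(blocks / scales), -8, 7).astype(np.int8)
    return q, scales.astype(np.float16)

def dequantize_q4(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct approximate weights from 4-bit ints and per-block scales."""
    return (q.astype(np.float32) * scales.astype(np.float32)).reshape(-1)

# Packed storage: 32 weights -> 32 * 4 bits + one fp16 scale = 18 bytes,
# versus 64 bytes at fp16, so roughly a 3.5x reduction.
w = np.random.randn(4096).astype(np.float32)
q, s = quantize_q4(w)
w_hat = dequantize_q4(q, s)
print("max abs error:", np.abs(w - w_hat).max())
```

The point is that the model gets ~3-4x smaller and memory-bandwidth-bound inference on a CPU gets correspondingly faster, at the cost of a small amount of reconstruction error per weight.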
The same will happen with Stable Diffusion. Most SD models are still distributed at fp16 precision, and they'll soon be going lower. I expect we'll all be running SDXL or larger models at 4-bit on our laptop CPUs without breaking a sweat.
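Quick back-of-the-envelope math on why 4-bit makes that plausible (the parameter counts are rough assumptions: ~0.86B for the SD 1.5 UNet, ~2.6B for the SDXL base UNet):

```python
# Approximate weight storage at different bit widths.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for name, params in [("SD 1.5 UNet (~0.86B)", 0.86), ("SDXL UNet (~2.6B)", 2.6)]:
    for bits in (16, 8, 4):
        print(f"{name} at {bits}-bit: ~{weight_gb(params, bits):.1f} GB")
```

At 4-bit, even the SDXL UNet fits in roughly 1.3 GB of weights, which is comfortably within an ordinary laptop's RAM.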