Stable Diffusion

Welcome to the Stable Diffusion community, dedicated to the exploration and discussion of the open source deep learning model known as Stable Diffusion.

Introduced in 2022, Stable Diffusion uses a latent diffusion model to generate detailed images based on text descriptions and can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by text prompts. The model was developed by the startup Stability AI, in collaboration with a number of academic researchers and non-profit organizations, marking a significant shift from previous proprietary models that were accessible only via cloud services.

Stability AI has just announced the release of StableCode, its very first LLM generative AI product for coding.

I was curious, do you run Stable Diffusion locally? On someone else's server? What kind of computer do you need to run SD locally?
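
For reference, here is a minimal sketch of what running SD locally can look like, assuming the Hugging Face diffusers library, an NVIDIA GPU, and an SD 1.5 checkpoint (the checkpoint name and prompt are just examples); 512x512 in fp16 generally fits in a few GB of VRAM, and attention slicing trades some speed for a lower memory peak:

```python
# Minimal local-generation sketch (assumes the diffusers library and a CUDA GPU;
# the checkpoint name and prompt are stand-ins, not a recommendation).
import torch
from diffusers import StableDiffusionPipeline

if torch.cuda.is_available():
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU VRAM: {vram_gb:.1f} GB")

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.enable_attention_slicing()  # lowers peak VRAM usage at some cost in speed

image = pipe("a lighthouse at dusk, detailed oil painting", num_inference_steps=30).images[0]
image.save("test.png")
```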

She looks cool and funny, but she is a hell of a monster.

Kraken created with ComfyUI and SDXL 0.9

Prompt created locally via WizardLM-7B

Describe an eerie image of a mountain landscape

An eerie image of a mountain landscape could be one where the sun is setting behind the mountains, casting long shadows across the valleys below. The sky may be painted with a palette of oranges, purples, and pinks as the light fades away into darkness. The peaks of the mountains are shrouded in mist, creating a sense of mystery and intrigue. The silence is broken only by the occasional howl of a wolf or the rustling of leaves in the wind. It's a scene that evokes a sense of solitude and isolation, making it feel like you're at the edge of the world.

Image created locally with the following settings:

Steps: 30, Sampler: DPM++ 2M Karras, CFG scale: 4.5, Seed: 2599999378, Size: 512x512, Model hash: 4199bcdd14, Model: revAnimated_v122, Version: v1.3.0

Time taken: 13.70s
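
For anyone wanting to reproduce settings like these outside a web UI, here is a rough sketch assuming the diffusers library; it uses a generic SD 1.5 checkpoint as a stand-in for revAnimated_v122 and maps DPM++ 2M Karras onto the corresponding diffusers scheduler:

```python
# Sketch of re-creating the listed settings: DPM++ 2M Karras, 30 steps, CFG 4.5,
# fixed seed, 512x512. The checkpoint and prompt here are examples, not the originals.
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# DPM++ 2M with Karras sigmas corresponds to this scheduler configuration.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

image = pipe(
    "an eerie image of a mountain landscape, mist, long shadows, sunset",
    num_inference_steps=30,
    guidance_scale=4.5,
    width=512,
    height=512,
    generator=torch.Generator("cuda").manual_seed(2599999378),
).images[0]
image.save("eerie_mountains.png")
```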

This one was fun to work on. Used Inkpunk Diffusion - no img2img or ControlNet - straight prompt editing.

I worked on this one for a family member who owns a blue Alfa Romeo Giulia. No post-processing; I just used the Inkpunk Diffusion model, kept running and tweaking prompts, then upscaled my favorite.

The last post was about a week ago.

I understand that, when we generate images, the prompt is first split into tokens, and those tokens are then used by the model to nudge the image generation in a certain direction. I have the impression that some tokens have a higher impact on the model than others (although I don't know if I can call it a weight). I mean internally, not as part of the prompt, where we can also force a higher weight on a token.

Is it possible to know how much a certain token was 'used' in the generation? I could deduce that empirically by taking a generation, sticking to the same prompt, seed, sampling method, etc., and removing words gradually to see what the impact is, but perhaps there is a way to just ask the model? Or adjust the Python code a bit and retrieve it there?

I'd like to know which parts of my prompt hardly impact the image (or don't at all).
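
One way to automate the empirical approach described above is to keep the seed and settings fixed, drop one word at a time, and measure how much the output changes. Below is a rough sketch assuming the diffusers library and an SD 1.5 checkpoint; pixel MSE is just one crude impact metric, and inspecting the cross-attention maps would be the "ask the model directly" route.

```python
# Per-word ablation sketch: regenerate with the same seed while dropping one word
# at a time and compare against the baseline image. Model, prompt, and settings
# are examples, not the poster's originals.
import numpy as np
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "an eerie mountain landscape at sunset, mist, long shadows"
seed, steps, cfg = 1234, 30, 7.5

def render(p):
    g = torch.Generator("cuda").manual_seed(seed)
    img = pipe(p, num_inference_steps=steps, guidance_scale=cfg, generator=g).images[0]
    return np.asarray(img, dtype=np.float32)

baseline = render(prompt)
words = prompt.replace(",", "").split()
for i, w in enumerate(words):
    ablated = " ".join(words[:i] + words[i + 1:])
    diff = float(np.mean((render(ablated) - baseline) ** 2))
    print(f"{w:>12}: {diff:10.1f}")  # larger = removing this word changed the image more
```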

I really want to set up an instance at home. I know the MI25 can be hacked into doing this well, but I would love to know what other people are running and see if I can find a good starter kit.

THX ~inf

One of my favourite things about Stable Diffusion is that you can get weird, dream-like worlds and architectures. How about a garden of tiny autumn trees?

I don't know if this community is intended for posts like this; if not, I'm sorry and I'll delete this post ASAP...

So, I play TTRPGs (mostly online) and I'm a big fan of visual aids, so I wanted to create some character images for my character in the new campaign I'm playing in. I don't need perfect consistency, as humans usually change a little over time, and I only needed the character to be recognizable across a couple of images that are usually viewed on their own and not side by side; nothing like the consistency you'd need for a comic book or something similar. So I decided to create a Textual Inversion following this tutorial, and it worked way better than expected. After less than 6 epochs I had enough consistency for my use case, and it still hadn't started to overfit when I stopped the training around epoch 50.

[Generated images: the character wearing a black hoodie in a run-down neighborhood at night; the character wearing a black hoodie on a street; the character cosplaying as Iron Man; the character cosplaying as Amos from The Expanse]

Then my SO, who's playing in the same campaign, asked me to do the same for their character. So we went through the motions and created and filtered the images. A first training attempt had the TI starting to overfit halfway through the second epoch, so I lowered the learning rate by a factor of five and started another round. This time the TI started overfitting somewhere around epoch 8 without reaching consistency first. The generated images alternate between a couple of similar yet distinguishable faces. To my eye the training images seem to be of similar or higher quality than the ones I used in the first set. Was I just lucky with my first TI and unlucky with the other two, and should I simply keep trying, or is there something I should change (like the learning rate, which at 0.0002 still seems high to me judging from other machine learning topics)?
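
Not a definitive answer, but for comparison, here is a minimal sketch of the part of Textual Inversion that the learning rate actually acts on, assuming the Hugging Face diffusers/transformers stack and an SD 1.5 base model (the placeholder and initializer tokens are examples). Only the embedding row for the new placeholder token is trained, so the factor-of-five reduction described above (2e-4 to 4e-5) slows how fast that single vector drifts and can delay overfitting:

```python
# Textual Inversion setup sketch (assumes diffusers/transformers and an SD 1.5
# checkpoint; placeholder/initializer tokens are examples). Only the new token's
# embedding row is meant to be optimised, which is where the learning rate applies.
import torch
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "runwayml/stable-diffusion-v1-5"
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")

# Register the placeholder token and initialise it from an existing word.
placeholder, initializer = "<my-character>", "person"
tokenizer.add_tokens(placeholder)
text_encoder.resize_token_embeddings(len(tokenizer))

embeds = text_encoder.get_input_embeddings().weight
placeholder_id = tokenizer.convert_tokens_to_ids(placeholder)
initializer_id = tokenizer.convert_tokens_to_ids(initializer)
with torch.no_grad():
    embeds[placeholder_id] = embeds[initializer_id].clone()

# Freeze everything except the token embedding matrix.
text_encoder.requires_grad_(False)
text_encoder.get_input_embeddings().requires_grad_(True)

# Learning rate lowered by a factor of five compared to 2e-4, as described above.
optimizer = torch.optim.AdamW(
    text_encoder.get_input_embeddings().parameters(), lr=4e-5
)

# In the actual training loop, the usual diffusion noise-prediction loss is computed
# on the training images, optimizer.step() is called, and all embedding rows except
# placeholder_id are reset to their original values after each step.
```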

ControlNet for QR Code (www.reddit.com)