23 min read
Image created by Decrypt using Stable Diffusion /Jose Lanz
Whether you're a digital artist seeking fresh inspiration or just a regular Joe with an insatiable hunger for visuals, Stable Diffusion is set to become your new go-to tool. The best part? It's open-source and completely free, inviting everyone to don their creative hats. But be warned: like any skilled artist, it has the potential to produce NSFW content if that's what your 'recipe' requires.
Stable Diffusion is a Text-to-Image Generative AI tool, which means it translates words into images. The process is akin to mailing a detailed brief to a master painter and awaiting the return of a meticulously created artwork.
Consider Stable Diffusion your personal AI-based creative ally. Primarily engineered for generating images from text prompts, this deep learning model extends beyond a single function. It can also be utilized for inpainting (altering sections of an image), outpainting (expanding an image beyond its existing borders), and translating images based on text prompts. This versatility equates to having a multi-talented artist at your disposal.
Stable Diffusion operates on the basis of a deep learning model that crafts images from text descriptions. Its core is a diffusion process, in which random noise is gradually morphed into a coherent image over a series of steps. The model is trained to steer each step, guiding the entire process from start to finish according to the provided text prompt.
The central idea behind Stable Diffusion is the conversion of noise (randomness) into an image. The model kickstarts the process with a heap of random noise (think of a colorized version of the white noise from an out-of-signal TV) which is then gradually refined, influenced by the text prompt, into a discernible image. This refinement proceeds systematically, steadily decreasing the noise and intensifying the detail until a high-quality image emerges.
The process of creating an image out of random noise. Credit: Jay Alammar
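For the technically curious, the same denoising loop can also be driven from code. Below is a minimal text-to-image sketch using Hugging Face's diffusers library rather than a GUI; the model ID, step count, and guidance scale are illustrative choices, not official recommendations:

```python
# Minimal text-to-image sketch with the diffusers library.
# Assumes a CUDA-capable GPU; model ID and settings are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # Stable Diffusion v1.5 weights
    torch_dtype=torch.float16,
).to("cuda")

# The pipeline starts from pure random noise and denoises it step by
# step, with the text prompt guiding every stage of the refinement.
image = pipe(
    "a serene Japanese garden, cherry blossoms, ukiyo-e style",
    num_inference_steps=30,  # more steps = more gradual refinement
    guidance_scale=7.5,      # how strongly the prompt steers denoising
).images[0]
image.save("garden.png")
```

Here, guidance_scale controls how strongly the prompt steers each denoising step, which maps directly onto the noise-to-image process described above.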
As the diffusion process kicks off, the preliminary stages largely dictate the overall composition of the image, with subsequent keyword alterations affecting only minor portions. This emphasizes the need for careful attention to your keyword weighting and scheduling to realize your desired outcome.
Among its strengths, Stable Diffusion excels at creating detailed, high-quality images, custom-designed to specific prompts. It easily navigates across various art styles, seamlessly blends techniques of different artists, and smoothly transitions between varying keywords.
Unlike its counterparts such as MidJourney, Stable Diffusion comes free of charge, a boon for your budget. It is also open source, which means you can modify it as you want. Whether you aspire to create futuristic landscapes or anime-inspired images, Stable Diffusion has a model for that. We will later delve into how to download and tailor these models to your preference.
You can run it offline, eliminating the need for constant internet connection or server access, making it a valuable tool for privacy-conscious users.
Unlike MidJourney, Stable Diffusion has a steep learning curve. To generate truly remarkable images, you must engage with custom models, plugins, and a sprinkle of prompt engineering. It is a little bit of a Windows vs Linux situation.
Also, the model can occasionally exhibit unforeseen associations, leading to unexpected results. A slight miss in the prompt can lead to significant deviations in the output. For example, specifying eye color in a prompt might unintentionally influence the ethnicity of the generated characters (blue eyes, for instance, are usually associated with Caucasians in the training data). Therefore, a deep understanding of its workings is necessary for optimal results.
Additionally, it necessitates an extensive amount of detail in the prompt to deliver impressive results. Unlike MidJourney, which performs well with prompts like "a beautiful woman walking in the park", Stable Diffusion requires a comprehensive description of everything you wish to (or not to) see in your image. Be prepared for long, detailed prompts.
There are multiple ways to run Stable Diffusion, either via cloud-based platforms or directly on your local machine.
These are some of the online platforms that let you test it in the cloud:
However, if you opt for a local installation, ensure your computer has the necessary capabilities.
To run Stable Diffusion locally, your PC should run Windows 10 or higher and have a discrete Nvidia graphics card (GPU) with at least 4GB of VRAM, 16GB of RAM, and at least 10GB of free disk space.
For an optimal experience, an RTX GPU with 12GB of VRAM, 32GB of RAM, and a high-speed SSD are recommended. Disk space will depend on your specific needs: the more models and add-ons you plan to use, the more space you'll require. Generally, models need between 2GB and 5GB of space.
As you set out on your journey with Stable Diffusion, choosing the right Graphical User Interface (GUI) becomes crucial. For outpainting, Invoke AI leads the pack, while SD.Next champions efficiency. ComfyUI is a node-based super lightweight option that has been gaining a lot of steam lately because of its compatibility with the new SDXL. However, Automatic 1111, with its popularity and user-friendliness, stands as the most preferred. Let's delve into how you can get started with Automatic 1111.
Two different GUIs (A1111 and ComfyUI) running Stable Diffusion
The installation process of Automatic 1111 is uncomplicated, thanks to the one-click installer available on this repository. Proceed to the “assets” section of the GitHub page, download the .exe file, and run it. It may take a moment, so hang in there—remember, patience is key.
Upon successful installation, an 'A1111 WebUI' shortcut will materialize within a newly opened folder. Consider pinning it to your taskbar or creating a desktop shortcut for easier access. Clicking this shortcut will launch Stable Diffusion, ready for your creative commands.
It's a good idea to tick the boxes for Auto-Update WebUI (keeps the program up to date) and Auto-Update Extensions (keeps plugins and third-party tools updated). If your PC is not that powerful, you should also activate the Low VRAM (medvram) option and enable xFormers; under the hood, these correspond to the --medvram and --xformers launch flags that A1111 reads at startup.
Screen that appears before launching A1111
Once you have Stable Diffusion with A1111 installed, this is what you will see when you open it
Automatic 1111 GUI
But don’t be intimidated. Here's your brief tour of the interface when running Stable Diffusion:
That's it—you are all set! Now, let your creativity flow, and see the magic of Stable Diffusion unfold.
A successful venture with Stable Diffusion is largely dependent on your prompt – think of it as a compass steering the AI. The richer the details, the more accurate your image generation will be.
Prompt crafting may sometimes seem daunting, as Stable Diffusion doesn't follow a linear pattern. It's a process steeped in trial and error. Start with a prompt, generate images, select your preferred output, modify elements you cherish or wish to eliminate, and then begin afresh. Rinse and repeat this process until your masterpiece emerges from inpainting tweaks and relentless improvements.
Stable Diffusion's design enables keyword weight adjustment with the syntax (keyword: factor). A factor below 1 downplays its importance, while values above 1 amplify it. To manipulate the weight, select the specific keyword and hit Ctrl+Up for an increase or Ctrl+Down for a decrease. Additionally, you can utilize parentheses – the more you employ, the heavier the keyword weight.
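For example, in the prompt “portrait of a woman, (freckles:1.3), (glasses:0.7)”, freckles are emphasized by 30% while glasses are toned down by 30%. With plain parentheses, each nested pair multiplies the weight by 1.1, so ((freckles)) is roughly equivalent to (freckles:1.21).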
Modifiers add that final flourish to your image, specifying elements like mood, style, or details like "dark, intricate, highly detailed, sharp focus."
Positive prompts outline your desired elements. A reliable strategy for prompt construction is specifying the type of image, subject, medium, style, setting or scenery, artist, tools used, and resolution, in that order. A demonstration from civitai.com could be “photorealistic render, (digital painting),(best quality), serene Japanese garden, cherry blossoms in full bloom, (((koi pond))), footbridge, pagoda, Ukiyo-e art style, Hokusai inspiration, Deviant Art popular, 8k ultra-realistic, pastel color scheme, soft lighting, golden hour, tranquil atmosphere, landscape orientation”
Conversely, negative prompts detail everything you wish to exclude from the image. Examples include: dull colors, ugly, bad hands, too many fingers, NSFW, fused limbs, worst quality, low quality, blurry, watermark, text, low resolution, long neck, out of frame, extra fingers, mutated hands, monochrome, duplicate, morbid, bad anatomy, bad proportions, disfigured, deformed hands, deformed feet, deformed face, deformed body parts, ((same haircut)), etc. Don’t be afraid of describing the same thing with different words.
A good way to think about a prompt is the “What+SVCM” (Subject, Verb, Context, Modifier) structure:
So, an example of a positive prompt could be: Portrait of a cute poodle dog posing for the camera in an expensive hotel, (((black tail))), fall, bokeh, Masterpiece, hard light, film grain, Canon 5d mark 4, F/1.8, Agfacolor, unreal engine.
Negative prompts don’t need a proper structure; just add everything you don’t like, as if they were modifiers. If you generate a picture and see something you don’t like, add it to your negative prompt, rerun the generation, and evaluate the results. That’s how AI image generation works; it’s not a miracle. An example of a negative prompt could be: blurry, poorly drawn, cat, humans, person, sketch, horror, ugly, morbid, deformed, logo, text, bad anatomy, bad proportions
Keyword blending or prompt scheduling employs the syntax [keyword1: keyword2: factor]. The factor, a number between 0 and 1, determines the point in the sampling process at which keyword1 switches to keyword2.
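For example, with 30 sampling steps, [castle: cathedral: 0.4] renders a castle for the first 12 steps (40% of 30) and then switches to a cathedral for the remaining 18, blending the two structures in the final image.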
If you're unsure where to start, consider leveraging ideas from various websites and adapt them to suit your needs. Excellent sources for prompts include:
Alternatively, save an AI-generated image you admire, drag and drop it onto the “PNG Info” tab, and Stable Diffusion provides the prompt and relevant information to recreate it. If the image isn't AI-generated, consider using the CLIP Interrogator add-on to gain a better understanding of its description. Further details on this add-on are provided later in the guide.
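Under the hood, A1111 saves the prompt and generation settings in a text chunk of the PNG file itself, which is what the PNG Info tab reads. If you're curious, you can inspect it yourself with a few lines of Python (a minimal sketch; the filename is hypothetical):

```python
# A1111 embeds the prompt and settings in a PNG text chunk named
# "parameters"; this is the data the PNG Info tab displays.
from PIL import Image

img = Image.open("generated.png")  # hypothetical A1111 output file
print(img.info.get("parameters", "No generation data found"))
```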
Civitai lets people see the prompts used for many images/Jose Lanz/Decrypt
Stable Diffusion is only as good as the prompts it's given. Thriving on detail and accuracy, it's essential to provide clear and specific prompts and favor concepts over explanations. Instead of crafting an elaborate sentence to describe a spacious, naturally lit scene, simply say "spacious, natural light."
Be mindful of unintended associations that certain attributes might bring, such as specific ethnicities when specifying eye color. Staying alert to these potential pitfalls can help you craft more effective prompts.
Remember, the more specific your instructions, the more controlled your outcome. However, be careful if you intend to create long prompts: contradictory keywords (for example, long hair and then short hair, or blurry in the negative prompt and blur in the positive prompt) might lead to unexpected results.
Installing models is a straightforward process. Begin by identifying a model suited to your needs. A great starting point is Civitai, renowned for being the largest repository of Stable Diffusion tools. Unlike other alternatives, Civitai encourages the community to share their experiences, providing visual references to a model’s capabilities.
Visit Civitai, click on the filter icon, and select “Checkpoints” in the “model types” section.
Civitai uses filters to let users personalize their searches/Jose Lanz/Decrypt Media
Then, browse through all the models available on the site. Keep in mind that Stable Diffusion is uncensored, and you may encounter NSFW content. Select your preferred model and click on download. Ensure the model has a .safetensors extension; older models used the .ckpt extension, which is less safe because .ckpt files are pickled Python objects that can execute arbitrary code when loaded, while .safetensors files contain only the model weights.
Example of a page to download a specific custom SD v1.5 model from Civitai. José Lanz
Once downloaded, place it in your local Automatic 1111's models folder. To do this, navigate to the folder where you installed Stable Diffusion with A1111 and follow this route: “stable-diffusion-webui\models\Stable-diffusion”
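For example, a checkpoint downloaded as dreamshaper_8.safetensors (a hypothetical filename) would end up at stable-diffusion-webui\models\Stable-diffusion\dreamshaper_8.safetensors, after which it will appear in the checkpoint dropdown at the top left of the A1111 interface.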
There are hundreds of models to choose from, but for reference, some of our top picks are:
Stable Diffusion also allows you to use AI to edit images you don't like. You may want to change the artistic style of your composition, add birds to the sky, remove artifacts, or modify a hand with too many fingers. For this, there are two techniques: Image to Image and Inpainting.
Image created by Stable Diffusion (right) based on the photo used as reference (left) using Img2img/Jose Lanz
Image to Image essentially lets Stable Diffusion create a new image using another picture as a reference, whether it's a real photo or one you've created. To do this, just click on the Image to Image (Img2Img) tab, place the reference image in the appropriate box, write the prompt you want the machine to follow, and click generate. It's important to note that the higher the denoising strength you apply, the less the new image will resemble the original, because Stable Diffusion will have more creative freedom.
Knowing this, you can do some cool tricks, like scanning those old photos of your grandparents as a reference, running them through Stable Diffusion with low denoising strength and a very general prompt like “RAW, 4k image, highly detailed”, and see how the AI reconstructs your photo.
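If you prefer scripting to the GUI, the same img2img idea can be sketched with the diffusers library; the model ID, filenames, and strength value below are illustrative:

```python
# Img2img sketch with diffusers. The `strength` argument is the
# denoising strength: higher values give the model more freedom.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init = Image.open("grandparents_scan.jpg").convert("RGB").resize((512, 512))

result = pipe(
    prompt="RAW photo, 4k, highly detailed",
    image=init,
    strength=0.3,  # low strength = stay close to the original photo
).images[0]
result.save("restored.png")
```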
Inpainting allows you to paint or edit things within the original image. To do this, from the same Img2Img tab, select the Inpaint option and place your reference image there.
Then, you simply paint the area you want to edit (for example, your character's hair) and add the prompt you want to create (for example, straight long blonde hair), and you're done!
Blue hair edited using inpaint over the reference image of a blonde supergirl. Generated with AI/Jose Lanz
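For reference, here's how the same inpainting operation looks as a diffusers sketch, assuming you've prepared a black-and-white mask where white marks the area to repaint (model ID and filenames are illustrative):

```python
# Inpainting sketch with diffusers: white pixels in the mask are
# repainted according to the prompt, black pixels are preserved.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("portrait.png").convert("RGB").resize((512, 512))
mask = Image.open("hair_mask.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="straight long blonde hair",
    image=image,
    mask_image=mask,
).images[0]
result.save("edited.png")
```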
We recommend generating several batches of images so you can choose the one you like best and modify your prompt. However, in the end, it's always good to have a tool like Photoshop on hand to get perfect results if you're very meticulous.
Now that you're familiar with Stable Diffusion, you might be eager to push your creativity further. Maybe you want to fix a specific hand position, force the model to generate a five-finger hand, specify a certain type of dress, enhance details, use a particular face, or transform your small image into a massive 8K file with minimum detail loss.
Extensions can help you achieve these goals. While there are numerous options available, we've highlighted five must-have extensions:
An image generated without LoRAs vs the same image generated using a LoRA to add more details. Credit: Jose Lanz
LoRAs are files designed to enhance the specificity of your model without downloading an entirely new model. This allows you to refine details, employ a certain face, dress, or style.
Installing a LoRA follows the same steps as installing a model. On Civitai, set the filter to “LoRA” and place the downloaded file into the LoRA folder using this route: \stable-diffusion-webui\models\Lora
Remember, some LoRAs require a specific keyword in your prompt to activate, so make sure to read their description before use.
To use a LoRA, navigate to the text2img tab, click on the icon resembling a small painting (Show/hide extra networks), and the LoRAs will appear beneath your prompt.
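In A1111, a LoRA is invoked directly in the prompt using the syntax <lora:filename:weight>; for example, <lora:add_detail:0.8> would apply a hypothetical detail-enhancing LoRA at 80% strength. Weights between 0.5 and 1 are a common starting point.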
If you're undecided about Stable Diffusion's capabilities, let the ControlNet extension be the definitive answer. Boasting immense versatility and power, ControlNet enables you to extract compositions from reference images, proving itself as a game-changer in image generation.
ControlNet is truly a jack-of-all-trades. Whether you need to replicate a pose, emulate a color scheme, redesign your living space, craft five-finger hands, perform virtually limitless upscaling without overtaxing your GPU, or morph simple doodles into awe-inspiring 3D renders or photorealistic visuals, ControlNet paves the way.
To enable ControlNet, you'll need to download models from this repository: https://huggingface.co/lllyasviel/ControlNet-v1-1/tree/main
Then, copy all the downloaded files into this folder: stable-diffusion-webui\extensions\sd-webui-controlnet\models
Upon restarting Stable Diffusion, you'll notice a new 'ControlNet' section in the text2img tab.
The main options presented to you are: a box to drag and drop your reference image, the control type selection, and the preprocessor.
There are also more advanced options that let you fine-tune the results: the preprocessor (the technique used to prepare the reference image for the ControlNet), the weight (how much importance your reference carries), and the start/end points (when the ControlNet begins and ends its influence).
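Script users can reproduce the core idea with diffusers as well. The sketch below, with illustrative model IDs and filenames, uses a Canny edge map as the preprocessor so the generated image inherits the composition of a reference picture:

```python
# ControlNet sketch with diffusers: a Canny edge map extracted from a
# reference image constrains the composition of the generated picture.
import cv2
import numpy as np
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# The "preprocessor" step: extract edges from the reference image.
ref = np.array(Image.open("reference.png").convert("RGB"))
edges = cv2.Canny(ref, 100, 200)
control = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "a futuristic city at sunset",
    image=control,
    controlnet_conditioning_scale=1.0,  # the "weight" of the reference
).images[0]
image.save("controlled.png")
```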
Here's a quick overview of what each control type accomplishes:
Mastering these options may take time, but the flexibility and customization they offer are worth the effort. Check out various tutorials and instructional videos online to get the most out of ControlNet.
Image edited using Roop to change a face for a provided reference. Credit: José Lanz
Roop provides a hassle-free method to generate realistic deepfakes. Instead of working with complex models or LoRAs, Roop handles the heavy lifting, enabling you to create high-quality deepfakes with a few simple clicks.
To download and activate it, follow the instructions available on the official Roop GitHub repo.
To use it, create a prompt, navigate to the Roop menu, upload a reference face, enable it, and generate your image. For the best results, use a high-res frontal shot of the face you wish to replicate. Remember, different images of the same person can yield varying results—some more lifelike than others.
How the Photopea extension looks inside of A1111
Sometimes, manual adjustments are needed to achieve the perfect result—that's where Photopea comes in. This extension brings Photoshop-like functionalities directly into the Stable Diffusion interface, allowing you to fine-tune your generated images without switching platforms.
You can install Photopea from this repository: https://github.com/yankooliveira/sd-webui-photopea-embed
The CLIP Interrogator is a handy tool for deriving keywords from a specific image, and a great place to start if you don't know what to write in your prompts. By combining OpenAI's CLIP and Salesforce's BLIP, this extension generates a text prompt that matches a given reference image: paste an image into the box, run the interrogator, and it will tell you what words can be associated with the image you provided.
You can install it from this repository: https://github.com/pharmapsychotic/clip-interrogator-ext.git
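The same engine is also available as a standalone Python package if you want to interrogate images outside the GUI; a minimal sketch (the CLIP model name and filename are illustrative):

```python
# Sketch using the standalone clip-interrogator package, the same
# engine behind the A1111 extension.
from PIL import Image
from clip_interrogator import Config, Interrogator

ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
image = Image.open("mystery_artwork.jpg").convert("RGB")
print(ci.interrogate(image))  # prints a prompt-like description
```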
With Stable Diffusion, you become the maestro of your visual orchestra. Be it a "hyperrealistic portrait of Emma Watson as a sorceress" or an "intricate digital painting of a pirate in a fantasy setting," the only limit is your imagination.
Now, armed with your newfound knowledge, go forth and paint your dreams into reality, one text prompt at a time.
Image created by Decrypt using AI/Jose Lanz