- Stable Diffusion is a state-of-the-art text-to-image machine learning model trained on a large dataset of images.
- The algorithm takes a textual description and generates an image based on that description. The generated image will match the description, though it will not be an exact, literal rendering of it.
- We will demonstrate how to fine-tune a Stable Diffusion model with DreamBooth on a set of reference pictures to build AI representations of your own face and generate photos with impressive results.
Guest Post by Tarunabh Dutta.
If 2021 was the year of word-based AI language models, 2022 has taken a leap into Text-to-Image AI models. There are many text-to-image AI models available today that can produce high-quality images. Stable Diffusion is one of the most popular and well-known options. It is a fast and stable model that produces consistent results.
The process of image generation is still somewhat mysterious, but it is clear that Stable Diffusion produces excellent results. It can be used to generate images from text or to alter existing images. The available options and parameters allow for much customization and control over the final image.
While it is relatively easy to generate images of celebrities and public figures, because the model has already seen many pictures of them during training, getting the AI to reproduce your own face is much harder. The obvious answer is to feed the model your own images and let it do its magic, but how exactly can one do that?
In this article, we will try to demonstrate how to fine-tune a Stable Diffusion model with DreamBooth on a set of reference pictures to build AI representations of your own face or any other object, and generate photos with impressive precision and consistency. If it sounds too technical, hang around, and we will try to make it as beginner-friendly as possible.
What is Stable Diffusion?
Let’s get the basics out of the way. Stable Diffusion is a state-of-the-art text-to-image machine learning model trained on a large dataset of images. Training it was expensive, reportedly costing around $660,000, but the finished model can generate art from natural-language descriptions.
Deep learning Text-to-Image AI models are becoming increasingly popular due to their ability to translate text accurately into images. This model is free to use and can be found on Hugging Face Spaces and DreamStudio. The model weights can also be downloaded and used locally.
Stable Diffusion uses a process called “diffusion” to generate images: starting from random noise, the model removes noise step by step, guided by the text prompt, until an image matching the prompt emerges.
In short, the Stable Diffusion algorithm takes a textual description and generates an image based on that description. The generated image will match the description, though it will not be an exact, literal rendering of it. Alternatives to Stable Diffusion include OpenAI’s DALL-E and Google’s Imagen models.
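To make the idea of “diffusion” a little more concrete, here is a toy sketch of the forward (noising) half of the process in Python with NumPy. It assumes the standard DDPM formulation with a simple linear noise schedule chosen for illustration; Stable Diffusion’s real schedule, its learned denoiser, and its text conditioning are not shown. During training, the model learns to undo this noising; generation runs the process in reverse, starting from pure noise.

```python
import numpy as np

def alpha_bar(betas: np.ndarray) -> np.ndarray:
    """Cumulative product of (1 - beta_t): how much of the original signal survives at each step."""
    return np.cumprod(1.0 - betas)

def forward_diffuse(x0: np.ndarray, t: int, betas: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Jump straight from a clean sample x0 to its noised version x_t (closed-form forward process)."""
    ab = alpha_bar(betas)[t]
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise

# Illustrative linear schedule over 1,000 steps (an assumption, not Stable Diffusion's exact schedule)
betas = np.linspace(1e-4, 0.02, 1000)
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 64, 64))  # stand-in for the latent of a 512 x 512 image

for t in (0, 500, 999):
    xt = forward_diffuse(x0, t, betas, rng)
    corr = np.corrcoef(x0.ravel(), xt.ravel())[0, 1]  # similarity to the clean sample
    print(f"t={t:4d}  correlation with original = {corr:.3f}")
```

As t grows, the correlation with the original falls toward zero, which is why a model that can reverse each small step is able to turn pure noise back into an image.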
Guide to Training Stable Diffusion AI on Your Face to Create Images Using DreamBooth
Today, I’ll demonstrate how to train a Stable Diffusion model using my face as an initial reference in order to generate images with a highly consistent and accurate style that is both original and fresh.
For this purpose, we will use a Google Colab notebook that runs DreamBooth to train Stable Diffusion.
Before launching this Google Colab, we must prepare certain content assets.
Stage 1: Google Drive with enough free space
For this, you need a Google Drive account with at least 9 GB of free space.
A free Google Drive account comes with 15 GB of storage, which is enough for this task. If your existing account is short on space, you can create a brand-new (disposable) Gmail account just for this purpose.
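If you want to double-check the free space from inside Colab once your Drive is connected, a quick sketch like the one below can help. It assumes the Drive is mounted at /content/drive/MyDrive (the usual Colab location, but treat it as an assumption) and that the mount reports your Drive quota through the filesystem; the storage figure shown in the Google Drive web interface remains the authoritative number. The 9 GB threshold comes from this guide.

```python
import shutil

GB = 1024 ** 3  # bytes per gigabyte (binary)

def free_gb(path: str) -> float:
    """Return the free space at `path` in gigabytes."""
    return shutil.disk_usage(path).free / GB

def has_room_for_training(path: str, required_gb: float = 9.0) -> bool:
    """True if `path` has at least `required_gb` GB free (9 GB is this guide's recommendation)."""
    return free_gb(path) >= required_gb

# Example (run in Colab after mounting Drive; the path is an assumption):
# print(has_room_for_training("/content/drive/MyDrive"))
```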
Stage 2: Reference Images to train AI
Secondly, you must have at least a dozen portraits of your face or any target object ready for use as references.
- Please ensure that the facial features are visible and adequately illuminated in the captured images. Avoid using harsh shadows, particularly on the face.
- Additionally, the subject should face the camera or be turned slightly to the side, as long as both eyes and all facial features remain clearly visible.
- The camera should be capable of capturing high-quality facial features. The best option is a professional-level DSLR or mirrorless camera. A smartphone camera of excellent quality can also suffice.
- The subject should be centered in the frame with a little headroom.
- As input images, a minimum of twelve close-up photos of the face, five mid-shot photos covering from head to above the waist, and roughly three full-figure photos should be adequate.
- In total, about twenty reference photographs should be sufficient for this purpose.
In my case, I have shot and gathered a collection of approximately 50 self-portraits, which I have cropped to 512 x 512 pixels using the online tool – Birme. You may also use any alternative image editor for this purpose.
Please keep in mind that the cropped images should also be optimized for the web, that is, reduced in file size with minimal loss of quality.
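If you prefer to batch-process your photos locally instead of using Birme, the sketch below does a similar center crop with the Pillow library in Python. The function and folder names are hypothetical, and a JPEG quality of 90 is an assumed starting point for “web-optimized with minimal loss”; adjust it to taste. Because the crop is centered, photos where the face is not near the middle may still need a manual crop.

```python
from pathlib import Path
from PIL import Image, ImageOps

def prepare_reference(src: Path, dst: Path, size: int = 512, quality: int = 90) -> None:
    """Center-crop and resize one photo to size x size, then save it as a compact JPEG."""
    with Image.open(src) as src_im:
        img = ImageOps.exif_transpose(src_im)   # respect camera/phone orientation tags
        img = img.convert("RGB")                # JPEG cannot store alpha or palette modes
        square = ImageOps.fit(img, (size, size), method=Image.LANCZOS, centering=(0.5, 0.5))
        square.save(dst, "JPEG", quality=quality, optimize=True)

def prepare_folder(src_dir: str, dst_dir: str) -> int:
    """Process every .jpg/.jpeg/.png in src_dir; return how many images were written."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for p in sorted(Path(src_dir).iterdir()):
        if p.suffix.lower() in {".jpg", ".jpeg", ".png"}:
            prepare_reference(p, out / f"{p.stem}.jpg")
            count += 1
    return count

# Example (folder names are hypothetical):
# prepare_folder("raw_portraits", "portraits_512")
```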
Stage 3: Google Colab
The Google Colab runtime can now be executed.
There are both free and paid versions of the Google Colab platform. DreamBooth can run on the free version, but training is significantly faster and more consistent on Colab Pro (paid), which prioritizes access to high-speed GPUs and assigns at least 15 GB of VRAM to the task at hand.
If you don’t mind spending a few dollars, a $10 Colab Pro subscription that includes 100 compute units each month is more than adequate for this session.
You will also have access to extra system RAM and to relatively more powerful and faster GPUs.
Let me reiterate this: You DO NOT need to be a technical specialist to run this Colab. You also do not require any prior coding experience.
Once you sign up with Google Colab (free or paid version), sign in with your credentials and head to this link to open DreamBooth Stable Diffusion.
A Google Colab notebook is made up of cells, which this guide calls “runtime” sections. They are arranged from top to bottom, and each has a clickable play button on its left. To run the notebook, click the play buttons one by one, starting from the top; each click executes the code in that cell. When a cell finishes successfully, a green check mark appears to the left of its play button.
Please ensure that you manually execute only one runtime at a time and go to the next “runtime” section only when the current runtime has finished.
The Runtime menu in the top menu bar also has a “Run all” option that executes every cell in one go. However, this is not recommended.
In the same menu is an option labeled “Change runtime type.” If you have a Colab Pro subscription, you can select and save a “premium” GPU and high RAM for your session.
Now you are ready to start the DreamBooth Colab.
10 Steps to Train an AI Model with DreamBooth
STEP 1: Check the GPU and VRAM
The first cell shows which GPU and how much VRAM your session has been assigned. Pro users generally get faster GPUs with more VRAM, and their sessions are more stable.
Once you click the play button, Colab will display a warning that the notebook was not authored by Google, because it is loaded from GitHub, the developer’s source website. Simply click “Run Anyway” to continue.
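For the curious, a GPU check like this one typically comes down to a call to NVIDIA’s nvidia-smi tool. The sketch below shows one way to run that query from Python and parse the result; the helper names are hypothetical, and the exact command used by the notebook may differ. Memory is reported in MiB.

```python
import subprocess

def parse_gpu_info(csv_text: str) -> list[tuple[str, int]]:
    """Parse lines like 'Tesla T4, 15360' into (gpu_name, total_memory_mib) pairs."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, mem_mib = line.rsplit(",", 1)  # split on the last comma, in case a name contains one
        gpus.append((name.strip(), int(mem_mib)))
    return gpus

def query_gpus() -> list[tuple[str, int]]:
    """Ask nvidia-smi for each GPU's name and total memory (requires an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_info(out)

# Example (only works on a machine with an NVIDIA GPU, such as a Colab GPU session):
# print(query_gpus())
```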
STEP 2: Install Requirements
In the next step, you have to install certain requirements and dependencies. You just need to click on the play button and let it run.
STEP 3: Log in to Hugging Face
After clicking the play button, the next step will require you to log in to your Hugging Face account. You can create a free account if you do not already have one. Once logged in, navigate to your Settings page from the upper-right corner.
Then, open the ‘Access Tokens’ section and click the ‘Create New’ button to generate a new access token, giving it any name you like.
Copy the access token, then return to the Colab tab and enter it into the field provided, then click “Login.”
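If you would rather not paste the token into a form field, the huggingface_hub library also offers a login() function that accepts the token directly. The sketch below reads it from an environment variable; the variable name HF_TOKEN and the helper functions are assumptions for illustration, and the “hf_” prefix check is only a cheap sanity test based on the format of current Hugging Face user access tokens, not real validation.

```python
import os

def looks_like_hf_token(token: str) -> bool:
    """Cheap sanity check: current Hugging Face user access tokens start with 'hf_'."""
    return token.startswith("hf_") and len(token) > 3

def login_from_env(var: str = "HF_TOKEN") -> None:
    """Log in to Hugging Face using a token stored in an environment variable."""
    token = os.environ.get(var, "").strip()
    if not looks_like_hf_token(token):
        raise ValueError(f"{var} is missing or does not look like a Hugging Face token")
    from huggingface_hub import login  # imported lazily so the check above works without the package
    login(token=token)

# Example (assumes you have set HF_TOKEN in your environment):
# login_from_env()
```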
STEP 4: Install xformers
Click the play button to install xformers, a library of memory-efficient attention operations that speeds up training and reduces VRAM usage.
STEP 5: Connect Google Drive
After clicking the play button, you will be asked in a new pop-up window for permission to access your Google Drive account. Click on “Allow” when asked for permissions.
After granting permissions, confirm that “save to Google Drive” is selected. You must also set the ‘CLASS NAME’ variable to describe what your reference images show: for a person, simply enter ‘person,’ ‘man,’ or ‘woman’; if your reference images are of a dog, enter ‘dog,’ and so on. You may keep the remaining fields unchanged. Optionally, you can also rename the input directory (‘INSTANCE DIR’) or the output directory (‘OUTPUT DIR’).
STEP 6: Upload reference photos
After clicking the play button in the previous step, you will see the option to upload and add all of your reference photos.
I would recommend a minimum of 6 and a maximum of 20 photographs. Refer to “STAGE 2” above for a concise explanation of how to select the best reference picture based on how the subject is captured.
Once all of your images have been uploaded, you can view them from the left-hand sidebar: click the folder icon to browse the folders and subfolders where your data is currently stored.
Under the data directory, you will find your input directory, which holds all of your uploaded photos. In my case, it is named “sks” (the default name).
Additionally, please note that this content is only temporarily stored in your Google Colab storage and not on Google Drive.
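Before moving on, you can sanity-check the upload with a few lines of Python. This sketch counts the image files in the input directory and compares the total with the 6 to 20 range recommended above; the path /content/data/sks is an assumption based on the default “sks” folder name, so adjust it to whatever your file browser shows.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}

def check_instance_images(instance_dir: str, low: int = 6, high: int = 20) -> tuple[int, str]:
    """Count image files in `instance_dir` and compare the total against the recommended range."""
    n = sum(1 for p in Path(instance_dir).iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
    if n < low:
        return n, f"only {n} images: upload at least {low}"
    if n > high:
        return n, f"{n} images: more than the recommended {high}, consider keeping only the best"
    return n, f"{n} images: within the recommended range"

# Example (path is an assumption):
# print(check_instance_images("/content/data/sks"))
```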
STEP 7: Train AI model with DreamBooth
This is the most crucial step, as you will be training a new AI model based on all your uploaded reference photos using DreamBooth.
You only need to focus on two input fields. The first is the “--instance_prompt” parameter. Here, you must enter a unique name that the model is unlikely to associate with anything else. In my case, I used my first name followed by my initials. The idea is to keep the complete name unique and precise.
The second crucial input field is the ‘--class_prompt’ parameter. Set it to the same class name you used in ‘STEP 5.’ In my case, I used the term “man,” so I retyped it into this field, overwriting any previous entry.
The rest of the fields can be left untouched. I have seen users experiment by changing fields such as ‘--num_class_images’ to 12 and ‘--max_train_steps’ to 1000, 2000, or even higher. However, modifying these fields may cause the Colab to run out of memory and crash, forcing you to restart from the beginning. It is therefore advisable not to edit them on your first attempt; you can experiment with them once you have gained more experience.
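To see how these fields fit together, here is a rough Python sketch that assembles a command line for the train_dreambooth.py example script from Hugging Face’s diffusers repository, whose options use the same names. The paths, the base model placeholder, and the numeric values are illustrative assumptions, not the notebook’s defaults, and the notebook you run may pass additional or different options.

```python
import shlex

def build_dreambooth_args(base_model: str, instance_dir: str, class_dir: str, output_dir: str,
                          instance_prompt: str, class_prompt: str,
                          num_class_images: int = 200, max_train_steps: int = 800) -> list[str]:
    """Assemble CLI arguments for diffusers' train_dreambooth.py (all values are illustrative)."""
    return [
        "--pretrained_model_name_or_path", base_model,
        "--instance_data_dir", instance_dir,    # your uploaded reference photos
        "--class_data_dir", class_dir,          # generic class images used for prior preservation
        "--output_dir", output_dir,
        "--instance_prompt", instance_prompt,   # the unique name, e.g. first name + initials
        "--class_prompt", class_prompt,         # the class name chosen earlier, e.g. "man"
        "--with_prior_preservation", "--prior_loss_weight", "1.0",
        "--num_class_images", str(num_class_images),
        "--max_train_steps", str(max_train_steps),
        "--resolution", "512",
        "--train_batch_size", "1",
        "--learning_rate", "5e-6",
    ]

# Hypothetical paths; "path/to/base-model" stands in for whichever base checkpoint you use.
args = build_dreambooth_args("path/to/base-model", "/content/data/sks", "/content/data/man",
                             "/content/output", "tarunabhtd", "man")
print(shlex.join(["accelerate", "launch", "train_dreambooth.py", *args]))
```

Larger values for --num_class_images or --max_train_steps mean more generation and training work per run, which is why changing them one at a time is the safer way to experiment.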
Once you run this cell by clicking the play button, the Colab will download the necessary model files and then start training on your reference pictures.
Training the model will take anywhere from 15 minutes to over an hour. You must be patient and keep track of the progress until the runtime is completed. If your Google Colab is idle for too long, it might reset. So keep checking on the progress and clicking on the tab occasionally.
STEP 8: Convert AI model to ckpt format
After training is complete, you will have the option to convert the trained model to a file in the ckpt format, which is directly compatible with Stable Diffusion.
The conversion is performed in two cells. The first is “Download script,” and the second is “Run conversion,” where you have the option to reduce the size of the converted model by saving it in half precision (fp16). This roughly halves the file size but lowers the numerical precision of the weights, which can reduce image quality.
Therefore, to keep the model at full precision, leave the ‘fp16’ option unchecked.
At the end of this particular runtime, a file called “model.ckpt” will be saved to your connected Google Drive.
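Under the hood, this kind of conversion is commonly done with the convert_diffusers_to_original_stable_diffusion.py script from the diffusers repository, where an fp16 choice corresponds to its --half option. The sketch below only builds that command; the script location and both paths are assumptions, and the notebook’s own conversion cell may differ.

```python
import shlex

def conversion_command(model_dir: str, ckpt_path: str, fp16: bool = False) -> list[str]:
    """Build the command that turns a diffusers-format model folder into a single .ckpt file."""
    cmd = [
        "python", "convert_diffusers_to_original_stable_diffusion.py",
        "--model_path", model_dir,        # folder written by training
        "--checkpoint_path", ckpt_path,   # where model.ckpt should be saved
    ]
    if fp16:
        cmd.append("--half")  # half-precision weights: roughly half the file size
    return cmd

# Hypothetical paths:
print(shlex.join(conversion_command("/content/output", "/content/drive/MyDrive/model.ckpt")))
```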
Saving this file is important for future use, because everything in the session is deleted as soon as you close the DreamBooth Colab browser tab. When you reopen the DreamBooth Colab later, you will have to start from scratch.
If you keep the trained model file on Google Drive, you can retrieve it later and use it with a locally installed Stable Diffusion GUI, with DreamBooth, or with any Stable Diffusion Colab notebook that loads a “model.ckpt” file. You can also copy it to your local hard disk for later use.
STEP 9: Prepare for Textual Prompt
The next two runtime processes under the “Inference” category prepare the newly trained model for the textual prompt used for image generation. Simply press the play button for each runtime, and it will finish in a matter of minutes.
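For reference, loading a trained model and generating an image with the diffusers library looks roughly like the sketch below. It assumes the trained weights are in diffusers’ folder format (the training output directory, whose path here is hypothetical), that diffusers and PyTorch are installed, and that a CUDA GPU is available; the step count, guidance scale, and seed are common starting values rather than the notebook’s settings.

```python
def generate(model_dir: str, prompt: str, steps: int = 50, guidance: float = 7.5, seed: int = 0):
    """Load fine-tuned Stable Diffusion weights from `model_dir` and render one image."""
    # Imported inside the function so this file can be loaded even where the libraries are absent.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_dir, torch_dtype=torch.float16).to("cuda")
    generator = torch.Generator(device="cuda").manual_seed(seed)  # fixed seed gives repeatable results
    result = pipe(prompt, num_inference_steps=steps, guidance_scale=guidance, generator=generator)
    return result.images[0]  # a PIL image

# Example (requires a GPU; the path is hypothetical):
# generate("/content/output", "a portrait of tarunabhtd man, digital painting").save("out.png")
```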
STEP 10: Generate AI images
This is the final step, where you can type the textual prompts, and the AI images will be generated.
You must include the exact ‘--instance_prompt’ and ‘--class_prompt’ values from STEP 7, together and in that order, near the start of your text prompt. For example, I used “a portrait of tarunabhtd man, digital painting” to generate new AI images resembling myself.
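Because the unique name and the class word always travel together, it can help to build prompts with a tiny helper. This is purely illustrative (the function name and the default framing are assumptions); it reproduces the pattern of the example prompt above and lets you swap styles quickly.

```python
def build_prompt(instance_name: str, class_word: str, style: str,
                 framing: str = "a portrait of") -> str:
    """Combine the trained identifier and class word, then append an artistic style."""
    return f"{framing} {instance_name} {class_word}, {style}"

for style in ["digital painting", "oil painting", "pencil sketch"]:
    print(build_prompt("tarunabhtd", "man", style))
```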
Below you can see some image results generated with the trained model of DreamBooth.
Play Around with Prompts to Get Best Outputs
If you carefully follow the steps outlined above, you will be able to generate AI images that closely resemble the facial features in your reference images. This method only requires the online Google Colab platform to run DreamBooth, which goes further than textual inversion: instead of only learning a new word embedding, it fine-tunes the model itself on your photos.
For better ideas for text prompts, you can check out sites like –
You also need to learn the art of crafting better and more effective text prompts using a variety of artistic styles and various combinations. A good starting place would be the Stable Diffusion SubReddit.
Reddit has a huge community dedicated to Stable Diffusion. There are also a number of Facebook groups and Discord communities actively discussing, sharing, and exploring new avenues of Stable Diffusion.
Below I am also sharing links to a few DreamBooth tutorial videos that you can watch on YouTube –
I hope you find this guide useful. If you have any questions, feel free to comment below, and we will try to help you.