My GPU provider has powerful enough hardware to train the text encoder too. I’m used to setting the variable train_text_encoder to true or false in my various notebooks. However, what I don’t understand is that the source code of this repository says:
Whether to train the text encoder. If set, the text encoder should be float32 precision.
"train_text_encoder": None
What does it mean that it should be float32 precision?
Ah, yeah… I copied and pasted that comment directly from the description in diffusers, but I guess it’s not very clear. In short:
Yes, train_text_encoder is a boolean that can be set to True.
The text_encoder that you’re fine-tuning does need to be in float32 precision, but that’s pretty much a given, since dreambooth training throws an error anyway if you try to train on an fp16 model. And an fp32 “model” is made up of an fp32 text_encoder, an fp32 unet, etc.
So, set { "train_text_encoder": True } and let me know if you hit any errors. I haven’t tried this before, but it should work out of the box.
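If you want to sanity-check the precision yourself before kicking off a run, here is a rough sketch of the idea. To be clear, this is not code from the repository or from diffusers: the helper names `non_fp32_parameters` and `check_text_encoder` are made up for illustration, and I'm assuming the text encoder exposes `named_parameters()` the way a `torch.nn.Module` does. It also treats `None` the same as `False`, which is my reading of the default above.

```python
def non_fp32_parameters(module, fp32_dtype):
    """Return the names of parameters whose dtype differs from fp32_dtype.

    `module` is anything with a named_parameters() method that yields
    (name, parameter) pairs, e.g. a torch.nn.Module. `fp32_dtype` is the
    dtype to compare against (the caller would pass torch.float32).
    """
    return [
        name
        for name, param in module.named_parameters()
        if param.dtype != fp32_dtype
    ]


def check_text_encoder(config, text_encoder, fp32_dtype):
    """Raise ValueError if text-encoder training is requested on a non-fp32 encoder.

    Hypothetical helper: `config` is a plain dict such as
    {"train_text_encoder": True}.
    """
    # None and False both mean "do not train the text encoder" in this sketch.
    if not config.get("train_text_encoder"):
        return

    offending = non_fp32_parameters(text_encoder, fp32_dtype)
    if offending:
        raise ValueError(
            "train_text_encoder is set, but these parameters are not float32: "
            + ", ".join(offending[:5])
        )
```

With a loaded model you would call it along the lines of `check_text_encoder(config, text_encoder, torch.float32)`. If it raises, the checkpoint you loaded is probably an fp16 one.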