My GPU provider has powerful enough hardware to train the text encoder too. I’m used to setting the variable train_text_encoder to true or false in my various notebooks. However, what I don’t understand is that the source code of this repository says:

Whether to train the text encoder. If set, the text encoder should be float32 precision.

    "train_text_encoder": None

What does it mean that it should be float32 precision?

Ah, yeah… I copied and pasted that comment directly from the description in diffusers, but I guess it’s not very clear. In short:

  1. Yes, train_text_encoder is a boolean that can be set to True.
  2. The text_encoder that you’re fine-tuning does need to be in float32 precision, but that’s pretty much a given, since dreambooth training throws an error anyway if you try to train on an fp16 model. And an fp32 “model” is made up of an fp32 text_encoder, an fp32 unet, etc.

So, set { "train_text_encoder": True } and let me know if you hit any errors :slight_smile: I haven’t tried this before but it should work out of the box :crossed_fingers:
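To make the float32 point a bit more concrete, here is a minimal sketch of the kind of precision check involved. It is not the actual code from diffusers or docker-diffusers-api: `ModelPart` and `check_trainable_precision` are hypothetical names made up for illustration, and dtypes are plain strings so the snippet runs without PyTorch.

```python
from dataclasses import dataclass


@dataclass
class ModelPart:
    """Hypothetical stand-in for a model component; it only records a dtype name."""
    name: str
    dtype: str  # e.g. "float32" or "float16"


def check_trainable_precision(parts, train_text_encoder):
    """Raise ValueError if any component that will be trained is not float32.

    Assumption for this sketch: the unet is always trained, and the
    text_encoder is trained only when train_text_encoder is True.
    """
    trainable = {"unet"}
    if train_text_encoder:
        trainable.add("text_encoder")

    for part in parts:
        if part.name in trainable and part.dtype != "float32":
            raise ValueError(
                f"{part.name} is {part.dtype}; trainable weights should be float32"
            )
    return sorted(trainable)


# An fp32 model passes the check with the text encoder included.
fp32_model = [ModelPart("unet", "float32"), ModelPart("text_encoder", "float32")]
print(check_trainable_precision(fp32_model, train_text_encoder=True))
```

With an fp16 text_encoder, the same call would raise a ValueError only when train_text_encoder is True, which is the situation the comment is warning about.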

Hereby reporting back to confirm that training the text encoder using docker-diffusers-api works perfectly.


Fantastic. Thanks for taking the time to report back :pray: