Dreambooth training [first look]

Yeah, I imagined a lot of people were thinking about horizontal scaling and dynamic model downloads; it's great that we're all kind of on the same page.
Maybe I can help out since I'm focused on this. I'm going to do it my own way to work around this just to get things moving, but maybe we can collab; hit me up on Discord whenever you have the chance, if you like.

Thanks again for all your work, btw. I would not be here doing these things if it were not for your template.

Hey… so in the interim I think you could get running pretty quickly… remove download.py COPY/RUN from Dockerfile, copy the S3/extract logic from there into app.py to download/extract the model if it hasn’t been used previously on that run, and remove the MODEL_MISMATCH check. That should be all you need (famous last words).
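
For what it's worth, here's a rough, hypothetical sketch of that interim approach (the function name, cache path, bucket env var and archive format below are my assumptions, not the repo's actual code): fetch and extract the model inside app.py the first time it's requested on a run, instead of at build time.

```python
# Hypothetical sketch only -- names, paths, env vars and archive format are
# assumptions, not docker-diffusers-api's actual code.
import os
import tarfile
import boto3

MODEL_CACHE = "/root/model-cache"                            # assumed local cache dir
BUCKET = os.environ.get("AWS_S3_DEFAULT_BUCKET", "models")   # assumed env var name

def ensure_model(model_id: str) -> str:
    """Download and extract the model archive from S3 on first use, then reuse it."""
    target = os.path.join(MODEL_CACHE, model_id)
    if os.path.isdir(target):
        return target                                        # already fetched on this run
    os.makedirs(MODEL_CACHE, exist_ok=True)
    archive = f"{target}.tar.gz"
    s3 = boto3.client("s3", endpoint_url=os.environ.get("AWS_S3_ENDPOINT_URL"))
    s3.download_file(BUCKET, f"{model_id}.tar.gz", archive)
    with tarfile.open(archive) as tar:                       # assumes a tar.gz archive
        tar.extractall(target)
    os.remove(archive)                                       # keep only the extracted weights
    return target
```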

I have a few other ideas in mind; not sure if they'll work yet, and I'm looking forward to having the time to experiment with them. Once I do, it will actually be quite nice to compare that to anything else you've done and use whatever works best. And for sure, as soon as I have time again, always keen to collaborate!

And my pleasure. Great to hear, and to know how it's helping others! As you gathered, I really love open source collaboration :smiley: So yeah, we'll be in touch properly when things calm down a bit for me. Thanks so much, @grf :raised_hands:

> Hey… so in the interim I think you could get running pretty quickly… remove download.py COPY/RUN from Dockerfile, copy the S3/extract logic from there into app.py to download/extract the model if it hasn’t been used previously on that run, and remove the MODEL_MISMATCH check. That should be all you need (famous last words).

Exactly what I'm working on right now; that really seems like the fastest way to get moving in that direction.

Again, thanks to you, my friend :slight_smile:
Good luck with all the other work-related stuff.

1 Like

Hi! Uploading/downloading the trained model through S3 now works well.
But training seems to take longer than with other Colab examples.
With this repo, 140 steps took 391 seconds and 1400 steps took 2972 seconds on a V100.
But this Colab example takes 23 minutes (1380 seconds) on a T4.

It seems fp16 is crucial for reducing training time.
Also, the trained model quality is quite different. With the Colab it's good, but with docker-diffusers-api it looks bad. (Maybe not setting prior-preservation to True is the problem; I'm testing.)
Basic models work great with this repo, too.
So for better speed, fp16 support might be needed, and for better quality, Enable_text_encoder_training support might be needed.
And another suggestion: if we upload/download 10,000+ trained models through AWS S3, the egress fees will be huge. I am planning to use Cloudflare R2, but haven't used it yet. What do you think of this idea?
Thank you!

1 Like

Hey, well done on getting everything working!! Great news!! And this is really great info & feedback, thanks.

I've been working with the diffusers team to get this working, and we're almost there. Once it's done, we'll need to compare with exactly the same inputs. LastBen's repo is a super-optimized version of the original Stable Diffusion code, whereas docker-diffusers-api wraps the diffusers library. Diffusers tends to be a little behind on performance but usually catches up eventually, with a lot of other benefits. Still, if a T4 is outperforming a V100, that's very worrying… but let's compare first with exactly the same inputs (fp16 and the rest).

Please keep us posted with your testing; this is very important feedback. Training with prior-preservation loss will definitely give better generalized results. We need to compare with exactly the same options.

It may also be useful to search Issues: dreambooth · huggingface/diffusers · GitHub to see if anyone else has noticed quality differences between the repos. But first it's important to train with exactly the same options.

Unfortunately, text-encoder training currently isn't possible at Banana because of the 16GB GPU RAM limit. In the long term, they said they want to offer more RAM to instances at a higher cost per GPU second, but I don't think it's going to happen anytime soon :confused:

I’ve never used it either, but I see it’s S3-compatible storage. That means you can already use it by setting the AWS_S3_ENDPOINT_URL to https://<accountid>.r2.cloudflarestorage.com :tada:
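
For illustration (a minimal sketch assuming boto3 and standard AWS-style credential env vars): the same S3 client calls work against R2 once the endpoint URL is overridden.

```python
# Minimal sketch: point a standard S3 client (boto3 here) at Cloudflare R2 by
# overriding the endpoint URL; bucket/object operations stay unchanged.
import os
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["AWS_S3_ENDPOINT_URL"],  # https://<accountid>.r2.cloudflarestorage.com
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)
print([b["Name"] for b in s3.list_buckets()["Buckets"]])  # quick credentials sanity check
```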

The best solution in my mind for cost and speed is for Banana to host their own on-prem S3-compatible storage, but it's a big ask to have them commit to high availability on cloud storage that isn't their core business. Still, I think it might be inevitable, so I hope it will happen sooner rather than later :sweat_smile:

Thanks again for all this really helpful info and feedback! I think we're on our way to creating the best dreambooth :smiley:

Hey all, the latest commit on dev allows for (and now defaults to) fp16 mixed_precision training. It's still important to use PRECISION="" to download the full weights to continue training from (as per the diffusers devs, training with fp16 weights can produce unstable results[1]).

To create an fp32 fine-tuned model (the previous default behaviour), pass the modelInput { mixed_precision: "no" }.
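
As a hedged sketch, the request body might look roughly like this; the mixed_precision field follows the wording above, while the exact key name and the other fields are placeholders to check against the repo's README.

```python
# Hedged sketch -- mixed_precision follows the post above; the surrounding key
# names are placeholders and should be checked against the repo's README.
payload = {
    "modelInputs": {
        "mixed_precision": "no",  # opt back into the previous fp32 fine-tuning behaviour
        # ...your usual dreambooth training inputs (instance prompt, images, steps, etc.)
    },
}
```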

@hihihi, would love to hear how this update affects your above experiences too.


  1. [Dreambooth Example] Attempting to unscale FP16 gradients. · Issue #1246 · huggingface/diffusers · GitHub ↩︎

3 Likes

Re runtime downloads:

Hey @grf, did you get anywhere on this?

I've added some very early support on the latest dev; sparse details in Runtime downloads (don't download during build). Still working on this though, with more fun stuff to come.

Will check it soon, after I test a few things!
I found that the low image quality also happens with other well-performing pre-trained models.
In the webui, my pre-trained model works really well, but when using this repo, the result is bad.
AUTOMATIC1111's repo doesn't use the pipeline and this repo does; maybe this is the reason?

1 Like

Ah, it's my bad; it works well with the Euler a scheduler, but not with DDIM.

1 Like

Oh that’s really interesting!! Thanks for reporting this. Clearly we’ll need to do some more experimentation to work out the best results. AUTOMATIC1111’s repo uses Euler by default I think, right?

Thanks for all your testing and feedback! :pray:

I think I found why there's a difference between the webui and this repo.

When testing without highlight grammar, the results are the same.

But if there's highlight grammar like (best quality:1.3), (masterpiece:1.3), (ultra-detailed:1.2), the results are different. It works well in the webui, but not in this repo.

For example, with my custom model,

the prompt “best quality” gives the same result,

but with the prompt (best quality):
webui result

this repo result

I think there's a difference in how the text is analyzed. Since many people use the webui as the standard, I think it's better to change this so the same text gives the same result as in the webui. I'm going to look at how I can change it tomorrow! Thank you

1 Like

Thanks for posting this, @hihihi! It’s really helpful to see with pics.

You're right, this type of grammar has no meaning in the standard diffusers pipelines. However, there is a diffusers community pipeline that supports it. I have a flight coming up next week, but let's see if I can try to integrate it before then :sweat_smile: I've been wanting to do the same for https://kiri.art/ for some time now, so it's nice to have a push.
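
For anyone curious, here's a rough, untested sketch of loading that community pipeline (the "long prompt weighting" lpw_stable_diffusion pipeline mentioned later in this thread) via diffusers' custom_pipeline loader; the model id and prompt are placeholders.

```python
# Rough sketch (placeholder model id and prompt): the "lpw_stable_diffusion"
# community pipeline parses webui-style attention weights like (best quality:1.3).
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",          # substitute your fine-tuned model
    custom_pipeline="lpw_stable_diffusion",    # community "long prompt weighting" pipeline
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "(masterpiece:1.3), (best quality:1.3), portrait photo",
    num_inference_steps=30,
).images[0]
image.save("out.png")
```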

@hihihi, something for you to play with tomorrow :wink:

2 Likes

P.S., there's no good way to run the AUTOMATIC1111 webui as a serverless process. However, it would be possible to extract parts of it (and/or other SD projects), and it is indeed a path I've considered numerous times before. But in the end, diffusers always catches up, and there is much wider development happening there. So I've stopped looking into those other solutions and am focusing all my efforts here too, and so far my patience has paid off every time.

2 Likes

The lpw_stable_diffusion pipeline works well!
If there's a chance, I will try other community pipelines too.
It returns slightly different results compared to the webui, but it's not a big deal.
Maybe the webui uses latent diffusion?

This is the message shown when the webui loads:
(webui startup screenshot)

Also, the webui has the DPM++ 2M Karras scheduler, which performs great, but Hugging Face diffusers doesn't have it.
Is there a way to add this scheduler to the repo?
It's not an important thing, because I can use other schedulers.

Thank you!

Does this repo also work with safetensors instead of ckpt?
It seems safetensors is much faster, so I'm going to try it when using a custom model.

Great news!! Thanks for reporting back (and so quickly!). It’s fun to squeeze in a new feature before bed and wake up to usage feedback already :smiley:

Not possible yet without modifying the code (but all you have to do is add the name of the pipeline here in app.py). This is going to change so that instead of initting all pipelines at load time, they’re only initted (and cached) when they’re first used. I’ll also have better error handling for misspelled / unavailable pipelines, but it’s a little further down the line.
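
As an illustration only (not the repo's actual code), the lazy-init idea could look roughly like this: pipelines are constructed and cached the first time a request names them, and a misspelled name fails loudly.

```python
# Illustrative sketch of lazy pipeline init -- not docker-diffusers-api's
# actual implementation.
import diffusers

_pipelines: dict = {}

def get_pipeline(name: str, model_dir: str):
    """Construct the named diffusers pipeline on first use, then reuse the cached copy."""
    if name not in _pipelines:
        try:
            cls = getattr(diffusers, name)   # e.g. "StableDiffusionImg2ImgPipeline"
        except AttributeError:
            raise ValueError(f"Unknown or unavailable pipeline: {name}")
        _pipelines[name] = cls.from_pretrained(model_dir).to("cuda")
    return _pipelines[name]
```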

Looks like it does indeed, but not sure where and for what. I see diffusers has latent diffusion support but it’s not specific to the stable diffusion pipelines. Maybe you can look into this more and report back :sweat_smile:

Unfortunately adding schedulers is quite difficult… but if you manage, I’m sure the whole diffusers community will love you :slight_smile: I don’t really understand the differences between all the schedulers, however, there’s a nice comparison here:

And also, did you see the DPMSolverMultistepScheduler that's been in diffusers for about two weeks (and works just fine in docker-diffusers-api)? I'm not sure exactly how, or if, it's related to DPM++ 2M Karras, but you get excellent results in just 20 steps!! (The same quality as 50 steps on the older schedulers.)
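
For reference, swapping it in is just a couple of lines (a sketch with a placeholder model id):

```python
# Sketch: swap the pipeline's scheduler for DPMSolverMultistepScheduler, which
# gives good results in ~20 steps per the discussion above (placeholder model id).
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
image = pipe("a photo of an astronaut riding a horse", num_inference_steps=20).images[0]
```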

Not yet, but I do indeed have some stuff planned here! I just wish Banana had on-prem S3-compatible storage. I'm looking forward to seeing how this compares to their current optimization stuff… the only thing is, there's no GPU in Banana's build stage (their optimization step transfers the built docker image to different machines to do the optimization), so we'll have to get creative here… but I'm up to the challenge :slight_smile:

Thank you very much! I will test DPMSolverMultistepScheduler!

By the way, I'm building images at Banana which work well on the GPU server, but the optimization hasn't finished after 6 hours. It seems there's some problem on Banana's side right now.

Do you experience the same?

DPMSolverMultistepScheduler also gives cool results in my tests.

2 Likes