Dreambooth training [first look]

Hey, well done on getting everything working!! Great news!! And this is really great info & feedback, thanks.

I've been working with the diffusers team to get this working, and we're almost there. Once it's done, we'll need to compare with exactly the same inputs. LastBen's repo is a super-optimized version of the original Stable Diffusion code, whereas docker-diffusers-api wraps the diffusers library. Diffusers tends to be a little behind on performance but usually catches up eventually, with a lot of other benefits along the way. Still, if a T4 is outperforming an A100, that's very worrying… but let's compare first with exactly the same inputs (fp16 and others).

Please keep us posted on your testing; this is very important feedback. Training with prior-loss preservation will definitely give better generalized results. We need to compare with exactly the same options, as sketched below.
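Purely as an illustration of what "exactly the same options" means here (the names follow the flags of diffusers' examples/dreambooth/train_dreambooth.py script, and the values are placeholders rather than recommendations), this is roughly the set of things that has to match on both sides before output quality can be compared:

```python
# Hypothetical sketch: DreamBooth options that should be identical on both
# sides before comparing output quality. Key names mirror the flags of
# diffusers' examples/dreambooth/train_dreambooth.py; values are illustrative.
training_options = {
    "pretrained_model_name_or_path": "runwayml/stable-diffusion-v1-5",  # assumed base model
    "instance_prompt": "a photo of sks person",  # same instance token/prompt on both sides
    "with_prior_preservation": True,             # prior-loss preservation on or off
    "class_prompt": "a photo of a person",
    "prior_loss_weight": 1.0,
    "num_class_images": 200,
    "resolution": 512,
    "train_batch_size": 1,
    "gradient_accumulation_steps": 1,
    "learning_rate": 5e-6,
    "lr_scheduler": "constant",
    "lr_warmup_steps": 0,
    "max_train_steps": 800,
    "mixed_precision": "fp16",                   # fp16 vs fp32 affects both speed and output
    "seed": 42,                                  # fix the seed so runs are reproducible
}
```

In particular, `mixed_precision` and `with_prior_preservation` are the two mentioned above, so they're the first things to check.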

It may also be useful to search the issue tracker (Issues: dreambooth · huggingface/diffusers · GitHub) to see if anyone else has noticed quality differences between the repos. But first it's important to train with exactly the same options.

Unfortunately this currently isn't possible at Banana because of the 16GB GPU RAM limit. In the long term they've said they want to offer instances with more RAM at a higher cost per GPU-second, but I don't think it's going to happen anytime soon :confused:

I’ve never used it either, but I see it’s S3-compatible storage. That means you can already use it by setting AWS_S3_ENDPOINT_URL to https://<accountid>.r2.cloudflarestorage.com :tada:
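As a minimal sketch of what that looks like from any S3-compatible client (boto3 here; the bucket name and credentials are placeholders, and docker-diffusers-api itself just picks the endpoint up from that same AWS_S3_ENDPOINT_URL variable):

```python
import os
import boto3

# Any S3-compatible client works against R2 once it's pointed at the R2 endpoint.
# <accountid>, the bucket name and the credentials are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["AWS_S3_ENDPOINT_URL"],  # e.g. https://<accountid>.r2.cloudflarestorage.com
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)

# Upload a trained model archive, then list the bucket to confirm it landed.
s3.upload_file("model.tar.zst", "my-dreambooth-bucket", "model.tar.zst")
print(s3.list_objects_v2(Bucket="my-dreambooth-bucket").get("KeyCount"))
```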

The best solution in my mind, for cost and speed, is for Banana to host their own on-prem S3-compatible storage. It's a big ask, though, to have them commit to high availability for cloud storage that isn't their core business. Still, I think it might be inevitable, so I hope it happens sooner rather than later :sweat_smile:

Thanks again for all this really helpful info and feedback! I think we're on our way to creating the best dreambooth :smiley: