Hey @gadicc, this is something that has been on my mind for quite a while. I’m not sure how to start approaching it, but I’m open to suggestions, criticism, etc.
I see this docker-diffusers-api is quite a solution, with support for a lot of different features.
As someone starting out in the AI space, it’s always harder when you have to unpack a lot of new concepts at once, as is the case here. For that reason I was wondering how I could create a solution that basically does the following:
Training step:
- Download the base model (in my case it’s SD 1.5)
- run training on it and upload the model to S3
- I would call this V1; V2 would then support more than one base model (SD2, perhaps), with all of them already downloaded, so that the call inputs specify which one to run the training on
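As a rough sketch of the training step above: the function below picks a pre-downloaded base model from the call inputs, trains on it, and uploads the result. Everything here is hypothetical and not taken from docker-diffusers-api: the `BASE_MODELS` paths, the `base_model` and `model_id` input keys, and the `train` and `upload` callables, which stand in for the real training script and S3 client.

```python
from typing import Callable, Dict

# Hypothetical local paths where the base models are pre-downloaded in the image.
BASE_MODELS: Dict[str, str] = {
    "sd-1.5": "/models/sd-1.5",
    "sd-2": "/models/sd-2",
}


def run_training(
    call_inputs: dict,
    train: Callable[[str], str],
    upload: Callable[[str, str], None],
) -> str:
    """Train on the requested base model and upload the result.

    `train` takes a base model path and returns the output directory.
    `upload` takes (local_dir, s3_key). Both are placeholders for the
    real training script and S3 client.
    """
    # V1 behaviour: fall back to SD 1.5 when no base model is given.
    base = call_inputs.get("base_model", "sd-1.5")
    if base not in BASE_MODELS:
        raise ValueError(f"unknown base model: {base!r}")

    output_dir = train(BASE_MODELS[base])
    s3_key = f"models/{call_inputs['model_id']}"  # assumed key layout
    upload(output_dir, s3_key)
    return s3_key
```

Passing `train` and `upload` in as arguments keeps the flow testable without a GPU or AWS credentials; in V1 the `base_model` key can simply be left out.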
Inference step:
- launch a new replica on banana
- download the model by its ID from my S3 bucket
- run inference for several different prompts on this model and return the images
- shut down the replica (idle timeout)
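The inference step above could look roughly like this. It is only a sketch under assumptions: `download` (fetch a model from S3 by ID and return a local directory) and `load_pipeline` (turn that directory into a prompt-to-image function) are hypothetical placeholders, as are the `model_id` and `prompts` input keys. Launching the replica and the idle-timeout shutdown are left to the hosting platform, so they do not appear in the code.

```python
from typing import Callable, Dict, List

# Pipelines already loaded on this replica, keyed by model ID, so repeated
# calls before the idle timeout skip the download and load steps.
_loaded: Dict[str, Callable[[str], bytes]] = {}


def run_inference(
    call_inputs: dict,
    download: Callable[[str], str],
    load_pipeline: Callable[[str], Callable[[str], bytes]],
) -> List[bytes]:
    """Return one image (as bytes) per prompt for the requested model."""
    model_id = call_inputs["model_id"]

    if model_id not in _loaded:
        local_dir = download(model_id)  # placeholder for the S3 download
        _loaded[model_id] = load_pipeline(local_dir)

    generate = _loaded[model_id]
    return [generate(prompt) for prompt in call_inputs["prompts"]]
```

The module-level cache is a design choice for warm replicas: the first call pays for the download, and later calls with the same model ID reuse the loaded pipeline.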
From my limited knowledge and from the perspective of what I want to achieve, I don’t need/want to worry about:
- pipelines: honestly, I haven’t even stopped to learn what their fundamental differences are or why I’d need one over another; whatever works is just fine
- schedulers: same idea as above, whatever works is just fine
- checkpoint: also not needed at all, I never use the web UI and don’t plan to port anything anywhere
- precision: I don’t need it to be configurable. I want to set it to whatever gives the best results, or maybe to whatever makes things run faster, but it can definitely be simplified/hardcoded
- other features that I’m not sure about: img2img, inpainting and similar pipelines. I don’t see a use for them, so unless they are needed internally I would cut them out entirely
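One way to picture the "hardcoded instead of configurable" idea from the list above: fix the pipeline, scheduler and precision in one frozen object and reject any call input outside a small allow-list. This is a sketch only. The specific values are illustrative assumptions rather than recommendations, and `FixedSettings`, `ALLOWED_INPUTS` and `validate_call_inputs` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedSettings:
    """Choices made once in code instead of per request (values are assumptions)."""
    pipeline: str = "StableDiffusionPipeline"       # plain text-to-image only
    scheduler: str = "DPMSolverMultistepScheduler"  # "whatever works"
    precision: str = "fp16"                         # could be swapped for fp32


SETTINGS = FixedSettings()

# The only keys a caller may send; everything else is decided by SETTINGS.
ALLOWED_INPUTS = {"model_id", "prompts", "base_model"}


def validate_call_inputs(call_inputs: dict) -> dict:
    """Reject inputs that try to configure things this service hardcodes."""
    unknown = set(call_inputs) - ALLOWED_INPUTS
    if unknown:
        raise ValueError(f"unsupported inputs: {sorted(unknown)}")
    return call_inputs
```

If one of these choices later needs to become flexible, its key can be moved from `FixedSettings` into `ALLOWED_INPUTS` without touching the rest.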
From the above, does it sound like I’m gonna make my life easier by going for this simpler, less flexible/modular approach?
I ask that from a few points of view:
- maybe it’s just my limited knowledge that makes me believe I can maintain things by making something like that, maybe I’ll be forced to deal with those things listed above anyway, but I just really want to use SD from 1.5 to 2, and run different prompts but nothing crazy on top of it.
- maybe I have the wrong impression that you can get amazing outputs from SD with a one-size-fits-all solution, and when you need that one special prompt to output a new style, perhaps it would not be possible without touching some of the above.
- finally, it makes more sense to me, as someone who is learning, to start small (like the Japanese philosophy ikigai) and then build new features from there. In the long run it makes sense to support different features if it comes to that, but that’s really not relevant right now, so it does seem to make sense to build a kind of fixed solution first and then make it more robust and flexible as I go.
The golden question is: do you believe that, with a little help from you and maybe others, I could strip this repo down to the minimum version of what I described above, considering that I’ll have little idea of what to cut out or modify? Or would just using this repo as it is and patching it with my own things obviously be faster?