If you don’t have a CUDA-compatible GPU on your dev machine, these providers are really worth considering. Changing code and then waiting for a banana deploy after every change is a total productivity killer! (In theory these instructions could help you run in production on these providers too, but you won’t get banana’s serverless scaling magic).
I used to use runpod.io back when Disco Diffusion was all the rage. Great experience, but after checking a moment ago, it seems their prices have gone up. I swear they used to have the cheapest A100s, but Lambda has them beat now.
I’ve been using LambdaLabs for finetuning Dance Diffusion lately and it’s been a smooth experience.
I’ve heard of brev.dev (I think on this forum, actually) and I’ve been wanting to check them out. Maybe I’ll try them this weekend.
Inference takes about the same time. You can store your HF key on their server; the rest is about the same. They have documentation with a fair number of examples, but it’s still very immature. Their Slack channel is open to the public, so you can get a direct response from the team.
Lately Banana has been having a lot of outage issues, and Runpod’s cold start is around ~16 seconds with inference around ~20 seconds, so despite being cheaper per second, Runpod ends up costing more overall due to the slow speed. The other hourly options aren’t suitable for a production app since traffic is mostly inconsistent.
Replicate seems like a stable option, and their NVIDIA T4 has an excellent price for its speed.
Hey @Raj_Dhakad, welcome to the forums! Sorry for the delay while I’ve been travelling, and thanks for that interesting feedback on the other providers.
Yeah, I think we could adapt docker-diffusers-api to Replicate too. We’d probably need to bypass a lot of the Replicate-specific features: no cog file, and probably no typed arguments (we know all the types we take for callInputs, but the modelInputs are for the most part not touched at all and passed straight through to the appropriate diffusers pipeline). Instead we’d just take one long JSON string with everything, which should be OK.
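For the curious, here’s a rough, untested sketch of what that thin wrapper could look like. To be clear about assumptions: `load_model` and `run_inference` are hypothetical stand-ins for however docker-diffusers-api actually initialises and invokes its pipeline; only the `BasePredictor`/`Input` parts are real Cog API.

```python
# Hypothetical predict.py sketch: a thin Cog wrapper that accepts one
# JSON string and forwards it to docker-diffusers-api-style inference.
import json

from cog import BasePredictor, Input


def load_model():
    # Placeholder: however docker-diffusers-api loads its diffusers pipeline.
    raise NotImplementedError


def run_inference(pipeline, call_inputs, model_inputs):
    # Placeholder: select the right diffusers pipeline from callInputs
    # and run it with the untouched modelInputs.
    raise NotImplementedError


class Predictor(BasePredictor):
    def setup(self):
        # Runs once when the container starts; load weights here.
        self.pipeline = load_model()

    def predict(
        self,
        payload: str = Input(description="JSON with callInputs and modelInputs"),
    ) -> str:
        data = json.loads(payload)
        # callInputs pick the pipeline/scheduler; modelInputs are passed
        # straight through to the diffusers pipeline, as on banana.
        result = run_inference(
            self.pipeline,
            data.get("callInputs", {}),
            data.get("modelInputs", {}),
        )
        return json.dumps(result)
```

The caller would then send everything as a single string, e.g. `{"callInputs": {...}, "modelInputs": {"prompt": "..."}}`, which sidesteps Replicate’s typed-argument system entirely.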
So, the short answer is this is very possible, but unfortunately I’m a bit pressed for time at the moment and can’t say when I’ll have a chance to work on this. I’ve added it to my list, though, and will keep you posted here. If it’s something you’d like to try on your own, I’ll be around for guidance. I’ve opened a new topic, [WIP] Can we get this working on Replicate?, where we can discuss further.
Hi @gadicc, I’m trying to work with Replicate, but the cog file is having issues with docker-diffusers-api. Thanks for the suggestion that we need to bypass those features. I’m trying both Replicate and pipeline.ai and will add details if either works.