Run diffusers-api on RunPod.io

Welcome, @jochemstoel! :grin:

No, that doesn’t sound normal at all. We’re talking about RunPod’s serverless offering, right? Assuming you bundled the model into your build with the docker-diffusers-api-runpod template, the downloading you saw in the logs was probably the download of the complete docker image (which includes the model), not the model on its own.

You would definitely have seen this if you tried testing the API very soon after setting it up, since the worker would have needed to download the docker image for the first time. To my understanding, RunPod also pre-downloads the full image to a number of machines to prime them for future quick starts.

For some more up-to-date results, I created a fresh template / API using the guide above and ran the test script 25 times (occasionally waiting for the container to spin down so I could measure cold starts). Here are some of the outputs:

# Running on a 24GB "RTX A5000 or RTX 3090"
# NB: I'm currently in South Africa, with higher latency!

# Cold starts
Request took 16.4s (inference: 3.1s, init: 3.0s) # best
Request took 19.9s (inference: 3.6s, init: 3.5s) # common

# Warm starts
Request took 6.3s (inference: 2.0s) # best
Request took 11.5s (inference: 1.9s) # common

# "Freezing" start when image wasn't downloaded yet
# Occurred 3 times during 25 tests.
Request took 121.8s (inference: 3.1s, init: 3.1s)
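
In case it’s useful for your own debugging, here’s roughly the kind of loop I used to collect the numbers above. It’s a minimal sketch against RunPod’s synchronous `/runsync` endpoint, not the project’s actual test script; the payload shape and the `$timings` key are assumptions based on how I remember docker-diffusers-api’s responses, so inspect the raw JSON from your own endpoint and adjust the keys to match:

```python
# Rough timing loop for a RunPod serverless endpoint (sketch only).
import os
import time

import requests

ENDPOINT_ID = os.environ["RUNPOD_ENDPOINT_ID"]  # your serverless endpoint ID
API_KEY = os.environ["RUNPOD_API_KEY"]          # your RunPod API key

# /runsync blocks until the job completes, so the wall-clock time measured
# here includes any cold start.
URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"

payload = {
    "input": {
        # Assumed docker-diffusers-api input shape; adjust to your build.
        "modelInputs": {"prompt": "an astronaut riding a horse"},
        "callInputs": {},
    }
}

for i in range(25):
    start = time.time()
    resp = requests.post(
        URL,
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=300,
    )
    total = time.time() - start
    output = resp.json().get("output") or {}
    # "$timings" is an assumption; your container may report timings
    # under a different key (or not at all).
    timings = output.get("$timings", {})
    print(f"Request {i + 1}: took {total:.1f}s, reported timings: {timings}")
```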

I must admit, I did occasionally still get these “freezing” starts even a few minutes after everything was up and running, so that might be a bug. I haven’t been on the RunPod Discord for a while, so I’m not sure if their serverless offering is out of closed beta yet, but it could be worth checking what typical timings look like these days.

If you can let me know what wasn’t working, I’ll be happy to take a look. Logs, errors, etc. all help.