Run diffusers-api on RunPod

This is a sub-topic of Running on other cloud providers. See that post for why you might want to do this, plus info and examples for other providers too.

This guide is a work in progress.

Intro

RunPod (referral link) advertises prices “8 times cheaper” than other cloud providers. They have both a server (“pod”) and a serverless offering. Signing up with the referral link directly supports docker-diffusers-api development.

Sample pricing:

  • Serverless (scale-to-zero)

    • RTX A4000, 16GB, $0.00024 / second
    • RTX A5000 / 3090, 24GB, $0.00032 / second
  • Pods (long running)

    • 1x RTX 3090, 24GB, $0.440/hr
    • 1x A100, 80GB, $2.090/hr
    • and more (secure cloud vs community pricing)
    • Storage: $0.10/GB/month on running pods, $0.020/GB/month on stopped

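To make the pricing concrete, here's a back-of-envelope sketch comparing the two billing models using the sample prices above (these numbers are a snapshot and will drift over time):

```python
# Sample prices from the list above (may be out of date).
SERVERLESS_A5000_PER_SEC = 0.00032   # $/s, RTX A5000 / 3090, 24GB
POD_3090_PER_HOUR = 0.440            # $/hr, 1x RTX 3090, 24GB

def serverless_cost(seconds_of_compute):
    """Serverless bills only for active seconds (scale-to-zero)."""
    return seconds_of_compute * SERVERLESS_A5000_PER_SEC

def pod_cost(hours_running):
    """A pod bills for the whole time it is up, busy or idle."""
    return hours_running * POD_3090_PER_HOUR

# A pod becomes cheaper once your GPU is busy more than this fraction
# of the time (serverless at full utilisation costs $0.00032 * 3600/hr):
break_even = POD_3090_PER_HOUR / (SERVERLESS_A5000_PER_SEC * 3600)

# e.g. 1000 images at ~5s of compute each on serverless:
print(f"1000 images: ${serverless_cost(1000 * 5):.2f}")
print(f"pod cheaper above ~{break_even:.0%} utilisation")
```

In short: serverless wins for bursty, low-volume traffic; a long-running pod wins once the GPU is busy a large fraction of the day.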

RunPod’s “Serverless AI” is currently in closed beta. Some general information is publicly available, but for access you should use the contact button on that page to request it.

1. Create a docker container

  1. Clone
  2. Set any necessary environment variables and run, e.g.
$ ./ -t user/runpod:sd-v2-1-512 \
  --build-arg MODEL_ID="stabilityai/stable-diffusion-2-1-base"

(or just use docker build as you see fit)

Your model will be downloaded at build time and be included in your final container for optimized cold starts.

Make sure HF_AUTH_TOKEN is set and that you’ve accepted the terms (if any) for your model on HuggingFace. Until the upcoming release, S3-compatible storage credentials are REQUIRED, and a safetensors version of your model will be saved there.

  3. Upload to your repository of choice, e.g.
$ docker push user/runpod:sd-v2-1-512

2. Create a Serverless AI Template

  1. Go to
  2. For CONTAINER IMAGE, enter the details of your container above.

3. Create an API

  1. Go to
  2. Click NEW API
  3. For TEMPLATE, choose the template you created above.

4. Using / Testing

You need:

  1. Your MODEL key from the previous section (right below your API name, it looks like "mjxhzmtywlo34j")
  2. Your API key from (create a new one if you haven’t done so already, and store it somewhere safe; you need the full “long” string like "Z2KDP8CSD5ZNNLIHIO0D3PY4NVI0AJCB9106XTNM")
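
As a sketch of what the test script does with these two keys, here's roughly how a serverless request is shaped. The endpoint path (`/v1/<model>/run`) and header format are assumptions based on RunPod's public serverless docs at the time; double-check their current documentation. The `modelInputs` / `callInputs` payload shape matches the test output in this guide.

```python
import json

def build_url(model_key):
    # model_key is the short id right below your API name,
    # e.g. "mjxhzmtywlo34j". The /v1/ path is an assumption.
    return f"https://api.runpod.ai/v1/{model_key}/run"

def build_headers(api_key):
    # api_key is the full "long" key from your RunPod settings.
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

def build_payload(prompt, steps=20):
    # docker-diffusers-api expects modelInputs + callInputs inside "input".
    return {
        "input": {
            "modelInputs": {"prompt": prompt, "num_inference_steps": steps},
            "callInputs": {},
        }
    }

body = json.dumps(build_payload("realistic field of grass"))
```

You would then POST `body` to `build_url(...)` with `build_headers(...)` and poll the returned job id until its status is no longer queued, which is what the test script automates for you.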

Use our built-in test script with the --runpod option and the appropriate environment variables, e.g.

$ python --runpod txt2img
Running test: txt2img
{
    "modelInputs": {
        "prompt": "realistic field of grass",
        "num_inference_steps": 20
    },
    "callInputs": {}
}
<Response [200]>
{'id': 'd08d30b3-5536-4273-b09a-ba03304bd5d7', 'status': 'IN_QUEUE'}
Request took 19.9s (inference: 4.8s, init: 3.1s)
Saved /home/dragon/www/banana/banana-sd-base/tests/output/txt2img.png

{
    "$mem_usage": 0.7631033823971289,
    "$meta": {
        "MODEL_ID": "stabilityai/stable-diffusion-2-1-base",
        "PIPELINE": "StableDiffusionPipeline",
        "SCHEDULER": "DPMSolverMultistepScheduler"
    },
    "$timings": {
        "inference": 4836,
        "init": 3060
    },
    "image_base64": "[512x512 PNG image, 429.8KiB bytes]"
}
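
The real response carries the PNG as a base64 string in `image_base64` (the test script just summarizes it as `[512x512 PNG image, ...]`). A minimal decode sketch, assuming a result dict shaped like the output above:

```python
import base64

def save_image(result, path):
    # Decode the base64-encoded PNG and write it to disk.
    data = base64.b64decode(result["image_base64"])
    with open(path, "wb") as f:
        f.write(data)
    return len(data)  # bytes written
```

Depending on where your API wrapper puts the model output, you may need to reach into a nested key first (e.g. the job result object) before calling `save_image`.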

Pod (Long running)

  1. Go to
  2. Pick SECURE CLOUD or COMMUNITY CLOUD and pick GPU, etc.
  3. For TEMPLATE, type docker-diffusers-api
  4. Container Disk: 5GB; Volume Disk: depends on the number of models.
  5. Finish up, click on the link to “My Pods” and watch the setup complete.

Using / testing

  1. Click CONNECT above, choose Connect via HTTP, and copy the URL, e.g.
$ export TEST_URL=""
$ python txt2img \
  --call-arg MODEL_ID="stabilityai/stable-diffusion-2-1-base" \
  --call-arg MODEL_PRECISION="fp16" \
  --call-arg MODEL_URL="s3://"

Running test: txt2img
{
    "modelInputs": {
        "prompt": "realistic field of grass",
        "num_inference_steps": 20
    },
    "callInputs": {
        "MODEL_ID": "stabilityai/stable-diffusion-2-1-base",
        "MODEL_PRECISION": "fp16",
        "MODEL_URL": "s3://"
    }
}

# Note: this includes downloading the model for the first time
# "Inference" includes loading the model for the first time
Request took 43.6s (init: 0ms, inference: 6.3s)
Saved /home/dragon/www/banana/banana-sd-base/tests/output/txt2img.png

{
    "$meta": {
        "PIPELINE": "StableDiffusionPipeline",
        "SCHEDULER": "DPMSolverMultistepScheduler"
    },
    "image_base64": "[512x512 PNG image, 429.0KiB bytes]",
    "$timings": {
        "init": 0,
        "inference": 6251
    },
    "$mem_usage": 0.7631341443491532
}

# 2nd request onwards (or restart pod after volume download)
Request took 4.6s (inference: 2.4s)
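
If you're scripting against the API, the `$timings` values (in milliseconds) can be interpreted with a small helper like this sketch. The "includes model load" caveat follows from the notes above: when `init` is 0, the first `inference` figure also covers loading (and, on the very first call, downloading) the model.

```python
def summarize_timings(timings):
    # "$timings" values are milliseconds; convert to seconds.
    init_s = timings.get("init", 0) / 1000
    infer_s = timings.get("inference", 0) / 1000
    note = " (includes model load)" if timings.get("init", 0) == 0 else ""
    return f"init: {init_s:.1f}s, inference: {infer_s:.1f}s{note}"

print(summarize_timings({"init": 0, "inference": 6251}))
# -> "init: 0.0s, inference: 6.3s (includes model load)"
print(summarize_timings({"init": 3060, "inference": 4836}))
# -> "init: 3.1s, inference: 4.8s"
```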

TODO: links to how the model cache works, S3-compatible storage, etc.


Excellent work. I didn’t know there was such a great open-source community.


Thanks! That’s exactly what I needed :slight_smile:


Thanks, @Research_Jang and @Martin_Rauscher. Let me know if you have any issues as the guide is still super new and a work-in-progress, and the docker images used aren’t actually official releases yet (everything does seem to work though :)).

The image seems to assume that the precision is also a branch on HF. When I run ./ --build-arg MODEL_ID=wavymulder/portraitplus -t mytag I get

#7 5.649 Revision Not Found for url:
#7 5.649 Invalid rev id: fp16

What branch is the image based on, so I can try to fix this?

Yes, you’re right. This has been the case with all the stable diffusion models on HuggingFace until now. But even HuggingFace is deprecating “revision” now. I’d love to see what they do in the end before we decide how to handle it, but I have been thinking of at least providing a separate MODEL_REVISION arg too.

The current image is based on the split branch. I just need to finish up some new CI code before I can make a few more changes, merge, and continue work in the dev branch.

Hey @Martin_Rauscher, did you have any luck with this?

I just pushed 1.2.0 / latest tag which now clearly separates:

  • MODEL_ID - unchanged
  • HF_MODEL_ID - defaults to MODEL_ID

and have updated the kiri-art/docker-diffusers-api-runpod template on GitHub (there’s an example in the README too, for AnimeEverything v3, which uses MODEL_REVISION="diffusers").
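
The fallback described above could be sketched roughly like this (a hypothetical illustration of the resolution logic, not the actual container code):

```python
def resolve_model(env):
    # HF_MODEL_ID falls back to MODEL_ID when unset; MODEL_REVISION is
    # optional, e.g. "diffusers" for repos that keep diffusers-format
    # weights on a separate branch.
    model_id = env.get("MODEL_ID", "")
    hf_model_id = env.get("HF_MODEL_ID") or model_id
    revision = env.get("MODEL_REVISION")  # None means default branch
    return hf_model_id, revision

print(resolve_model({"MODEL_ID": "wavymulder/portraitplus"}))
# -> ("wavymulder/portraitplus", None)
print(resolve_model({
    "MODEL_ID": "my-model",                 # hypothetical names
    "HF_MODEL_ID": "some-org/some-model",
    "MODEL_REVISION": "diffusers",
}))
# -> ("some-org/some-model", "diffusers")
```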

It’s not well tested :confused: Yet!

I’m away with slow internet so hard to do full tests. I’ve also created an awesome new CI setup, but still have a lot more tests to add to it (but at least the structure is all ready… with automated semver releases).

Let me know how you go though.

Edit: also, S3 is still required - I’ll work on that next - but in anticipation of that fix, you now need to manually specify MODEL_URL="s3://" for everything to work. You’ll have to just remind me if you’re using a pod or serverless. This will all be sorted out by final official release, this is all still very alpha… will merge to dev branch (!) when it’s a bit more stable and then eventually main branch.

The cold start on RunPod is like 3 to 5 minutes. Is that normal? From the log files I can tell it downloads the model, but honestly, is that needed every time? Banana has 10-second cold boots; 3 minutes is not acceptable.

I was also not able to get my own huggingface model to work, only the default SD21.

Welcome, @jochemstoel! :grin:

No, that doesn’t sound normal at all. We’re talking about RunPod’s serverless offering, right? Assuming you bundled the model up in your build with the docker-diffusers-api-runpod template, the downloading you saw in the logs was probably the downloading of the complete docker image (which includes the model), not the model itself.

You would definitely have seen this if you tried testing the API very soon after setting it up, as it would have needed to download the docker image for the first time. To my understanding, RunPod will also pre-download the full image to a number of machines to prime them for future quick starts.

For some more up-to-date results, I created a fresh template / API using the guide above, ran the test script 25 times (occasionally waiting for the container to spin down to measure cold starts), and here are some of the outputs:

# Running on a 24GB "RTX A5000 or RTX 3090"
# NB: I'm currently in South Africa, with higher latency!

# Cold starts
Request took 16.4s (inference: 3.1s, init: 3.0s) # best
Request took 19.9s (inference: 3.6s, init: 3.5s) # common

# Warm starts
Request took 6.3s (inference: 2.0s) # best
Request took 11.5s (inference: 1.9s) # common

# "Freezing" start when image wasn't downloaded yet
# Occurred 3 times during 25 tests.
Request took 121.8s (inference: 3.1s, init: 3.1s)
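
The gap between the total request time and (init + inference) is queueing plus network overhead; a quick sketch using the numbers above:

```python
def overhead(total_s, inference_s, init_s=0.0):
    # Everything not spent in init or inference: queueing, network,
    # and (on "freezing" starts) image download.
    return round(total_s - inference_s - init_s, 1)

print(overhead(16.4, 3.1, 3.0))   # best cold start -> 10.3s of overhead
print(overhead(121.8, 3.1, 3.1))  # "freezing" start -> mostly image pull
```

Note my relatively high network latency (mentioned above) inflates these overhead numbers somewhat.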

I must admit, I did occasionally still get these “freezing” starts even a few minutes after everything was up and running… so that might be a bug. I haven’t been on the RunPod discord for a while, so I’m not sure if their serverless offering is out of closed beta yet, but it could be worth checking what typical timings are like these days.

If you can let me know what wasn’t working I’ll be happy to take a look. Logs, errors, etc.