> The code is working but I do NOT think it is using the optimization since my response time is mostly like this: `Request took 12.1s (inference: 3.0s, init: 1.8s)`
Firstly, welcome to the forums, and thanks for providing all these details; they were really helpful.
To be honest, 12s total response time sounds really good for a cold start!
The model init time is where you really see the difference in optimization. 1.8s is great. I unfortunately don’t recall runpod’s init time for unoptimized models, but on banana at least, unoptimized ~= 90-120s (!) vs optimized ~= 2.5s.
The difference between the total request time and init+inference (i.e. 12.1s − 3.0s − 1.8s = 7.3s in your example above) is the container boot time (plus any network latency). You should be able to see exactly what’s happening in the runpod logs.
Subsequent requests should be super fast: just the inference time. But that depends on your “idle timeout” setting. After this time, the container shuts down again to save you money. You can make the timeout longer (the default is 5s) or set a number of “min workers” to keep containers warm, but you’ll pay more for keeping containers alive when no one is using them.
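To make the warm/cold behavior concrete, here’s a toy simulation. The numbers (boot 7.3s, init 1.8s, inference 3.0s) are just the ones from your example above, and the function is my own sketch, not runpod’s or banana’s actual scheduling logic:

```python
# Toy model of serverless cold starts vs. warm requests.
# All numbers are illustrative (taken from the example above);
# this is NOT the provider's actual billing/scheduling logic.

def response_times(request_times, idle_timeout=5.0,
                   boot=7.3, init=1.8, inference=3.0):
    """Return the total response time for each request.

    A request is "cold" (pays boot + init + inference) if the
    container has been idle longer than idle_timeout; otherwise
    it is "warm" and pays only the inference time.
    """
    results = []
    container_free_at = None  # when the container last finished work
    for t in sorted(request_times):
        cold = container_free_at is None or t - container_free_at > idle_timeout
        total = (boot + init + inference) if cold else inference
        results.append(round(total, 1))
        container_free_at = t + total
    return results

# First request is cold (~12.1s); a quick follow-up is warm (3.0s);
# a request after a long idle gap is cold again.
print(response_times([0, 13, 100]))  # → [12.1, 3.0, 12.1]
```

Bumping `idle_timeout` (or keeping min workers) turns that last cold start into a warm 3.0s request, at the cost of paying for the idle container in between.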
The only other thing I notice is that usually we need a triple slash (///) in S3 URLs, but I think if it hadn’t found the model, you would have gotten an error. Is there anything else in the logs on the runpod side? Also, if you’re using the default bucket and model name - which it seems you are - you should be able to just put MODEL_URL="s3://" and it will figure out the full URL for you.
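For reference, the two forms look something like this (the bucket and file names here are made up for illustration; only the `MODEL_URL` variable and the `s3://` shorthand come from the above):

```shell
# Explicit S3 URL: note the triple slash after "s3:".
# "my-bucket" and "model.tar.zst" are placeholder names.
MODEL_URL="s3:///my-bucket/model.tar.zst"

# Shorthand when using the default bucket and model name:
# the full URL is figured out for you.
MODEL_URL="s3://"
```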
Hope this helps and let me know if anything wasn’t clear or you have any other questions. And thanks for posting your experience on the forums.