Build Step - Optimization; or, in short:

- a file called `app.py` at the root of the repo
- a function called `init()` in `app.py` to load your model into the global `model` variable
- only one model can be optimized per repo.

If your model can be found as the `model` object in `app.py`, the optimizer will be able to find it and recompile it for faster cold boots and faster inference.
Note: only PyTorch models can be optimized.
Optimization is Banana’s “secret sauce”, so details are a bit scarce. However, here are some things we’ve managed to figure out as a community about what can lead to the dreaded `WARNING: Image Optimization Failed - cold boots may be slow` warning.
“It’s not you, it’s us”.
Occasionally Banana’s optimization servers go offline. Unfortunately the above warning still appears regardless, so it’s hard to realise this is the cause.
It’s helpful to keep a second repo on hand that is known to optimize successfully, and redeploy it to check whether optimization is actually working. Clearly this is not ideal; the issue is known, and Banana is working on making it more stable. You should mention this in the #support discord and see if anyone else is having similar problems.
Docker Base Image
Changing the docker base image can break optimization. Try to stick to one of the following base images known to work (and please report here if you find any more).
Keeping the `model =` line in your `app.py` as a single `model = loadModel(...)` statement will always work. If you split it across multiple lines, it may work fine, but it may also result in very hard-to-find errors about broken indentation. Something to be aware of.
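For illustration, here is the safe single-line form next to the risky multi-line form. `loadModel` is a stand-in for your own loader, and the dict it returns is a dummy in place of a real model:

```python
def loadModel(path):
    # Placeholder loader; a real app.py would return a PyTorch model.
    return {"weights": path}


# Safe: the whole assignment sits on one physical line, which the
# optimizer can find and replace cleanly.
model = loadModel("weights.ckpt")

# Risky: the same call split across lines may still work, but has been
# reported to cause hard-to-trace indentation errors after the
# optimizer rewrites the file.
model = loadModel(
    "weights.ckpt"
)
```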
How does it work?
Your image is built according to your Dockerfile, on Banana’s build servers.
This image is then transferred to the optimization servers (separate from the build servers), where your source files are modified to use the optimized model: `init()` is called, the model is optimized and saved, and then the `model =` line is replaced to instead load the optimized model.
Notably: the optimization servers have GPU access, while the build servers are CPU only.
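The rewrite step described above can be sketched as a simple line substitution. This is purely our guess at the mechanism, and `load_optimized` is a made-up name for illustration, not a real Banana API:

```python
# Illustrative only: mimic the source rewrite by swapping the
# `model =` line for one that loads a saved, optimized artifact.
original = 'model = loadModel("weights.ckpt")'
rewritten = 'model = load_optimized("model.opt")'  # hypothetical call

source = [
    "import torch",
    original,
]
# Replace whichever line binds the global `model`, leave the rest alone.
patched = [rewritten if line.startswith("model =") else line for line in source]
```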
Let us know if you figure out anything else!