Build Step - Optimization; or, in short:

- a file called `app.py` at the root of the repo
- a function called `init()` in `app.py` to load your model into the global `model` variable
- only one model can be optimized per repo.

If your model can be found as the `model` object in `app.py`, the optimizer will be able to find it and recompile it for faster cold boots and faster inference.
Note: only PyTorch models can be optimized.
Optimization is Banana’s “secret sauce”, so details are a bit scarce. However, here are some things we’ve managed to figure out as a community about what can lead to the dreaded `WARNING: Image Optimization Failed - cold boots may be slow` warning.
“It’s not you, it’s us”.
Occasionally Banana’s optimization servers go offline. Unfortunately the above warning still appears regardless, so it’s hard to realise this is the cause.
It’s helpful to keep a second repo on hand that is known to optimize successfully, and redeploy it to check whether optimization is actually working. Clearly this is not ideal; the issue is known, and Banana is working on making it more stable. You should mention this in the #support discord and see if anyone else is having similar problems.
Docker Base Image
Changing the docker base image can break optimization. Try to stick to one of the following base images known to work (and please report here if you find any more).
Keeping the `model =` line in your `app.py` as a single `model = loadModel(...)` statement will always work. If you split it across multiple lines, it may work fine, but it may also result in very hard-to-find errors about broken indentation. Something to be aware of.
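For illustration, here is the safe single-line form next to the risky multi-line form. `loadModel` is a stand-in for your own loader, and the dict it returns is a dummy in place of a real model:

```python
def loadModel(path):
    # Placeholder loader; a real app.py would return a PyTorch model.
    return {"weights": path}


# Safe: the whole assignment sits on one physical line, which the
# optimizer can find and replace cleanly.
model = loadModel("weights.ckpt")

# Risky: the same call split across lines may still work, but has been
# reported to cause hard-to-trace indentation errors after the
# optimizer rewrites the file.
model = loadModel(
    "weights.ckpt"
)
```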
How does it work?
Your image is built according to your Dockerfile, on Banana’s build servers.
This image is then transferred to the optimization servers (separate from the build servers), where your source files are modified to use the optimized model: `init()` is called, the model is optimized and saved, and then the `model =` line is replaced to instead load the optimized model.
Notably: the optimization servers have GPU access, while the build servers are CPU only.
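The rewrite step described above can be sketched as a simple line substitution. This is purely our guess at the mechanism, and `load_optimized` is a made-up name for illustration, not a real Banana API:

```python
# Illustrative only: mimic the source rewrite by swapping the
# `model =` line for one that loads a saved, optimized artifact.
original = 'model = loadModel("weights.ckpt")'
rewritten = 'model = load_optimized("model.opt")'  # hypothetical call

source = [
    "import torch",
    original,
]
# Replace whichever line binds the global `model`, leave the rest alone.
patched = [rewritten if line.startswith("model =") else line for line in source]
```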
Let us know if you figure out anything else!