Yup! But after every cold start it will need to download the model again. It’s been a requested feature, especially for dreambooth, where people are training lots of models and there’s no API to auto-deploy (yet; I believe it’s landing very soon). There are some use-cases even after that, but it would be much more useful if we had S3-compatible storage at Banana HQ.
However, once the model has been downloaded (in a particular container, and until the next cold boot), it can of course be re-used. Multiple models can be stored on disk, and reloaded into memory if the requested model changes. This could be very useful for those using minimum-replicas: you can have one “model” deployed that stays up all the time, instead of needing minimum-replicas for many different models.
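To make the idea concrete, here is a minimal sketch of that per-container caching behaviour. It is not the actual code from the repo: the class name `RuntimeModelCache` and the `download` / `load` callables are hypothetical stand-ins for whatever fetches a model from cloud storage and loads it from disk. It only assumes that one model is held in memory at a time and that downloaded models stay on disk until the next cold start.

```python
from pathlib import Path
from typing import Any, Callable, Optional


class RuntimeModelCache:
    """Keeps downloaded models on disk and one loaded model in memory.

    Illustrative only: `download` and `load` are hypothetical callables
    supplied by the caller, not functions from any real library.
    """

    def __init__(
        self,
        cache_dir: Path,
        download: Callable[[str, Path], None],
        load: Callable[[Path], Any],
    ) -> None:
        self.cache_dir = cache_dir
        self.download = download  # fetches `model_id` into the given directory
        self.load = load          # loads a model from the given directory
        self.current_id: Optional[str] = None
        self.current_model: Any = None

    def get(self, model_id: str) -> Any:
        # Same model as the previous request: reuse what is already in memory.
        if model_id == self.current_id:
            return self.current_model

        name = model_id.replace("/", "--")
        model_dir = self.cache_dir / name

        # Download at most once per container. Fetch into a ".partial"
        # directory first so an interrupted download is never mistaken
        # for a complete one.
        if not model_dir.exists():
            partial_dir = self.cache_dir / (name + ".partial")
            partial_dir.mkdir(parents=True, exist_ok=True)
            self.download(model_id, partial_dir)
            partial_dir.rename(model_dir)

        # Release the old model before loading the new one to free memory.
        self.current_model = None
        self.current_model = self.load(model_dir)
        self.current_id = model_id
        return self.current_model
```

With something like this, switching from model A to B and back to A costs two downloads but three loads: the second request for A is served from disk rather than from cloud storage.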
We’ll see what develops. I have further speed improvements planned here too, but “slow” cloud storage is still currently the most limiting aspect.
Thanks to Risko (Discord) for reporting a bug in the above code whereby switching models at runtime did not actually work (they did still download though!).
This has now been fixed. The fix has been backported to the dev branch, but note that there is currently no support for switching model precision at runtime in the dev branch. That feature is, however, available in the cloud-cache branch, which I’m still working on but which seems to work well so far (a few other surprises are coming there too, but it will be a while until it gets merged).
I’m currently away on a family holiday. I’m still working mornings, but development has definitely slowed down. However, runtime downloads and related code are among the features I’ll be prioritizing, so feel free to still share your experiences here.