This guide is a work-in-progress (v1 isn’t officially released yet), but you can help improve it! DO NOT USE IN PRODUCTION (unless you’re kiri.art :)).
“How can I help?” See if you can get up and running using the instructions and guides linked below, and let us know what doesn’t work or isn’t clear, so we can fix it for everyone ahead of the official release (around end of Jan).
New “split” architecture. We now have pre-built docker images that are super easy to deploy and upgrade. See further instructions below; it’s unlikely you’ll want to clone / pull the main repo anymore.
Our own optimization with super fast inits. We no longer support banana’s optimization, but ours is faster anyways, and this frees us to explore a number of other optimization strategies for even faster inference moving forwards.
PRECISION build-arg was renamed MODEL_PRECISION to match call-arg.
MODEL_PRECISION and MODEL_REVISION must now both be explicitly set. This gives a lot more flexibility for Hugging Face repos that don’t follow the convention of them being equal (and, in any case, diffusers is deprecating revision; see next point).
MODEL_VARIANT will replace MODEL_REVISION, which diffusers is deprecating. “Revision” required a separate branch on the repo for the desired revision, which worked great when people did it, but mostly just made a mess. For this reason, the diffusers team is instead suggesting that variants be kept in the same repo branch as the other files, but with a variation in the filename, so that only these files are downloaded. Watch out for more info in diffusers when this is officially announced.
send()) will be renamed to $timings.handler to cover the entire handler (which can do training too, e.g. dreambooth), and a new .inference will be used to track just inference when inference occurs.
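To make the MODEL_VARIANT point above more concrete, here is a minimal sketch of the "same branch, different filename" idea. It assumes the variant name is inserted just before the file extension (e.g. an fp16 variant of diffusion_pytorch_model.safetensors); variant_filename is a hypothetical helper written for illustration, not part of diffusers or of this project, and the final naming may differ once diffusers announces it officially.

```python
from typing import Optional


def variant_filename(weights_name: str, variant: Optional[str] = None) -> str:
    """Return the filename a given variant would use (assumed naming scheme).

    Hypothetical helper: assumes the variant is inserted before the extension.
    """
    if variant is None:
        # No variant requested: the default weights file is used as-is.
        return weights_name
    stem, dot, ext = weights_name.rpartition(".")
    if not dot:
        # No extension present: simply append the variant.
        return f"{weights_name}.{variant}"
    return f"{stem}.{variant}.{ext}"


if __name__ == "__main__":
    print(variant_filename("diffusion_pytorch_model.safetensors", "fp16"))
    print(variant_filename("diffusion_pytorch_model.safetensors"))
```

With this scheme, both the default and the fp16 files can live side by side on one branch, and a loader only needs to fetch the files whose names carry the requested variant.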
Automated testing w/ semver releases
I’ve spent a loooooot of time building out a new development flow. On every merge to main, in addition to unit testing (run on all branches), a suite of integration tests will be run to ensure all features work as expected in a real environment. If all tests pass, the commit history is analyzed to tag the new release with the appropriate semantic version number, and this is pushed to docker hub.
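As a rough illustration of how a commit history can be turned into a version number, here is a minimal sketch. It assumes Conventional Commits style messages (fix: gives a patch bump, feat: a minor bump, a "!" after the type or a BREAKING CHANGE: footer a major bump); the post doesn't say which tool or rules are actually used, so next_version is a hypothetical helper and not the project's real release logic.

```python
import re

# Matches headers such as "fix: ...", "feat(api): ..." or "feat!: ...".
_HEADER = re.compile(r"^(?P<type>\w+)(\([^)]*\))?(?P<bang>!)?:")


def next_version(current: str, messages: list[str]) -> str:
    """Compute the next semantic version from commit messages (assumed rules)."""
    major, minor, patch = (int(part) for part in current.split("."))
    level = None  # None, "patch" or "minor"; a major bump returns immediately
    for msg in messages:
        match = _HEADER.match(msg)
        if match is None:
            continue  # not a conventional commit header: ignored in this sketch
        if match.group("bang") or "BREAKING CHANGE:" in msg:
            return f"{major + 1}.0.0"
        if match.group("type") == "feat":
            level = "minor"
        elif match.group("type") == "fix" and level is None:
            level = "patch"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current  # nothing release-worthy: keep the current version


if __name__ == "__main__":
    print(next_version("1.2.3", ["fix: handle empty prompt", "docs: update guide"]))
```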
There are a number of guides available for various providers (not all are up-to-date at time of writing… if they still mention cloning the original repo, please nudge us). But in general:
Servers: In one line! Call docker run --gpus all -p 8000:8000 -e MYVAR1="value" -e MYVAR2="etc" gadicc/diffusers-api (possible environment variables may include your S3 or compatible credentials, for example). This uses the new “runtime downloads” feature by default, and will download models as needed. Optionally adapt -v ~/root-cache:/root/.cache for persistent storage, if relevant / desired.
Serverless: See the “build-download” repo (banana, others) or “runpod” repo (for runpod.io), which use the above image but download and store the model in your image at build time.
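If you launch the server image from a script rather than by hand, the "Servers" one-liner above can be assembled programmatically. This is only a sketch: docker_run_command is a hypothetical helper, MYVAR1 / MYVAR2 are the placeholder variable names from the command above, and the flags are exactly the ones shown there.

```python
import os
import shlex
from typing import Optional


def docker_run_command(
    env: dict[str, str],
    cache_dir: Optional[str] = None,
    image: str = "gadicc/diffusers-api",
    port: int = 8000,
) -> list[str]:
    """Build the argv for the docker run one-liner (hypothetical helper)."""
    argv = ["docker", "run", "--gpus", "all", "-p", f"{port}:{port}"]
    for name, value in env.items():
        argv += ["-e", f"{name}={value}"]
    if cache_dir is not None:
        # Expand "~" here: it is not expanded when argv bypasses the shell.
        host_dir = os.path.expanduser(cache_dir)
        argv += ["-v", f"{host_dir}:/root/.cache"]
    argv.append(image)
    return argv


if __name__ == "__main__":
    cmd = docker_run_command({"MYVAR1": "value", "MYVAR2": "etc"}, cache_dir="~/root-cache")
    # Print a copy-pasteable command; pass cmd to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building a list and passing it to subprocess.run avoids shell quoting problems when a credential contains spaces or special characters.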