This guide is a work-in-progress (v1 isn’t officially released yet), but you can help improve it! DO NOT USE IN PRODUCTION (unless you’re kiri.art :)).
“How can I help?” See if you can get up and running using the instructions and guides linked below, and let us know what doesn’t work or isn’t clear, so we can fix it for everyone ahead of the official release (around end of Jan).
Breaking Changes
- New “split” architecture. We now have pre-built docker images that are super easy to deploy and upgrade. See further instructions below; it’s unlikely you’ll want to clone / pull the main repo anymore.
- Our own optimization with super fast inits. We no longer support banana’s optimization, but ours is faster anyway, and this frees us to explore a number of other optimization strategies for even faster inference moving forward.
- The `PRECISION` build-arg was renamed to `MODEL_PRECISION` to match the call-arg.
- `MODEL_PRECISION` and `MODEL_REVISION` must now both be explicitly set (see the sketch after this list). This gives a lot more flexibility for Hugging Face repos that don’t follow the convention of them being equal, and, in any case, diffusers is deprecating `revision` (see next point).
- Upcoming: `MODEL_VARIANT` will replace `MODEL_REVISION`, which diffusers is deprecating. “Revision” required a separate branch on the repo for the desired revision, which worked great when people did it, but mostly just made a mess. For this reason, the diffusers team instead suggests keeping variants in the same repo branch as the other files, but with a variation in the filename, so only those files are downloaded. Watch for more info in diffusers when this is officially announced.
- Upcoming: `$timings.inference` (and `send()`) will be renamed to `$timings.handler` to cover the entire handler (which can do training too, e.g. dreambooth), and a new `.inference` will be used to track just inference, when inference occurs.
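As a rough sketch of what explicitly setting both values looks like, assuming you’re building one of the “build-download” style images described under Deployment below (the `fp16` values and image tag are illustrative placeholders, not defaults):

```bash
# Sketch only: MODEL_PRECISION and MODEL_REVISION must now both be set
# explicitly; they just happen to be equal here ("fp16" is illustrative).
docker build -t my-diffusers-api \
  --build-arg MODEL_PRECISION="fp16" \
  --build-arg MODEL_REVISION="fp16" \
  .
```

The same two names are used as call-args at request time, which is what the `PRECISION` → `MODEL_PRECISION` rename brings in line.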
Automated testing w/ semver releases
I’ve spent a loooooot of time building out a new development flow. On every merge to main, in addition to unit testing (run on all branches), a suite of integration tests will be run to ensure all features work as expected in a real environment. If all tests pass, the commit history is analyzed to tag the new release with the appropriate semantic version number, and this is pushed to Docker Hub.
Deployment
There are a number of guides available for various providers (not all are up-to-date at time of writing… if they still mention cloning the original repo, please nudge us). But in general:
- Servers: In one line! Call `docker run --gpus all -p 8000:8000 -e MYVAR1="value" -e MYVAR2="etc" gadicc/diffusers-api` (possible environment variables may include your S3 or compatible credentials, for example). This uses the new “runtime downloads” feature by default, and will download models as needed. Optionally adapt `-v ~/root-cache:/root/.cache` for persistent storage, if relevant / desired; see the sketch after this list.
- Serverless: See the “build-download” repo (banana, others) or the “runpod” repo (for runpod.io), which use the above image but download and store the model into your image at build time.
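Putting the “servers” pieces together, here is a hedged sketch of the one-liner with a persistent cache volume and example credentials; the `AWS_*` variable names are placeholders for whatever your S3-compatible storage actually expects, not documented names:

```bash
# Sketch: runtime downloads (the default), a persistent model/HF cache on the
# host, and placeholder credential variables -- substitute your own.
docker run --gpus all -p 8000:8000 \
  -e AWS_ACCESS_KEY_ID="..." \
  -e AWS_SECRET_ACCESS_KEY="..." \
  -v ~/root-cache:/root/.cache \
  gadicc/diffusers-api
```

With the cache volume mounted, models downloaded at runtime survive container restarts and upgrades of the image.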