LoRA inference (including from A1111 / CivitAI)

Admin edit: working example here.

Hey, is it possible to do LoRA inference already? This is probably a stupid question, but I think I remember a notification where you said it’s possible in the dev branch, yet I can’t find the post anywhere here on kiri.

(or does it not require any changes in the diffusers api repo?)

EDIT: I asked the following on Banana, which is essentially what I want to know:

I did a LoRA training run and now have a .safetensors file. Now what? Can I use this file by providing it as the MODEL_URL environment variable in the docker-diffusers-api repository by gadicc or does LoRA have certain requirements? I want to do inference. I trained it on Replicate but their API for inference has the stupid safety_checker enabled.

serverless-LoRA-inference-pokemon/app.py at b6e70261b44a8e9c25d63eadf9fa171e6287ec6e · lucataco/serverless-LoRA-inference-pokemon · GitHub

No, not yet. This shouldn’t be too difficult to add though (the training will take longer, but the ability to use LoRAs on existing models should be quick).

And looks like lucataco is doing great work there, as usual. Thanks for the exact ref which will be helpful.

I have an international flight today but I’ll take a look at this tomorrow and maybe we can get it out to dev same day.

Thanks :pray::raised_hands:

Ok, thanks. I hope so. This is frustrating. :stuck_out_tongue_winking_eye:

Was a bit delayed by my travels but working on this today :slight_smile:

Ok, here’s a first-look in dev.

$ python test.py txt2img \
   `# Common options, but notably:` \
   `# MODEL_ID should match base model that was fine-tuned with LoRA` \
   --call-arg MODEL_ID="runwayml/stable-diffusion-v1-5" \
   --call-arg MODEL_PRECISION="fp16" \
   --call-arg MODEL_URL="s3://" \
   --model-arg prompt="A picture of a sks dog in a bucket" \
   --model-arg seed=1 \
   `# Specify the LoRA model` \
   --call-arg attn_procs="patrickvonplaten/lora_dreambooth_dog_example" \
   `# Optional; specify interpolation of LoRA with base model; 0.0 to 1.0 (default)` \
   --model-arg cross_attention_kwargs='{"scale": 0.5}'

Outputs (with scale 0.0, 0.5, 1.0, respectively):

[three output images, one per scale]
Current limitations:

  1. Currently only models hosted on HuggingFace are supported. S3 / URL support will come next.
  2. No way to replace the text_encoder yet; that will come later (but custom text_encoders are pretty rare for now… the diffusers team is still working out how best to deal with them, probably with a separate HF repo for just the text_encoder).


On the plus side:

  1. Works with xformers (a fix for this landed in diffusers 4 days ago :tada:)
  2. The LoRA can be switched or disabled without reloading the entire base model.

Feedback welcome :slight_smile: Hope you have a good weekend, I’ll probably continue this early next week based on any feedback.

I will wait until models from URL is supported, to run it on Banana.

Shweet! Can’t promise but very likely this will be out tomorrow sometime.

Is it possible to determine from a .safetensors file which version of Stable Diffusion was used? This could be a useful feature when that information is unknown, or a convenience when users upload their own files.

Unfortunately not, at least not from a plain .safetensors or .bin file. However, when the LoRA is saved with diffusers, you get an adapter_config.json which stores this as base_model_name_or_path (example) - not sure if that’s in the current diffusers release yet or still being worked on.


A few caveats, though:

  • It could contain a directory name instead of a HuggingFace user/repo id.
  • Users might choose to send just the safetensors/bin file without the other data.
  • If accepting files from users, make sure to only accept safetensors (the .bin files are pickles and can contain arbitrary code).
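For what it’s worth, those last two caveats can be sketched in a few lines of Python. This is just an illustrative validator for user uploads (the function name and directory layout are my own assumptions, not part of docker-diffusers-api):

```python
import json
from pathlib import Path

# .bin files are pickle-based and can execute arbitrary code on load,
# so only .safetensors uploads are accepted here.
ALLOWED_SUFFIXES = {".safetensors"}

def inspect_lora_dir(path):
    """Validate an uploaded LoRA directory and try to discover its base model.

    Returns (weights_file, base_model_or_None). Raises ValueError on unsafe
    files. Hypothetical helper -- adapt to however your uploads arrive.
    """
    path = Path(path)
    weights = [p for p in path.iterdir() if p.suffix in {".safetensors", ".bin"}]
    if not weights:
        raise ValueError("no LoRA weights found")
    for p in weights:
        if p.suffix not in ALLOWED_SUFFIXES:
            raise ValueError(f"refusing {p.name}: only .safetensors uploads are accepted")
    base_model = None
    config = path / "adapter_config.json"
    if config.exists():  # only present if the full diffusers layout was uploaded
        base_model = json.loads(config.read_text()).get("base_model_name_or_path")
        # Note: this may be a local directory name rather than a HF user/repo id.
    return weights[0], base_model
```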

File download code is coming along nicely but still needs a bit more work. Couldn’t publish the current dev because there was no capacity on Lambda to run the prerequisite integration tests :sweat_smile: HTTP downloads work for single files (so the S3 code probably does too), but the code to download archives (.tar.zst for diffusers format, etc.) still needs some work. More updates soon.

You might like knowing that Banana is running on 40GB GPUs now. They haven’t officially announced it yet, but I am a special agent who infiltrated their enterprise.
So you might be able to run your tests on Banana.

Iiiinteresting. We are very honoured to have such an esteemed special agent in the forums here! :smiley:

That will open up a good bunch of fun possibilities I think. But for the automated tests, it’s nice to have a full system where we can quickly relaunch the container with different environment variables, host our own S3-compatible storage, etc. On the whole it works pretty well, but currently I manually specify the GPU type and geographical location in the script; I need to make this more flexible in the future for cases where the requested system is not available.

In any event;

  1. Bumped diffusers to latest version
  2. Fixed a bug with model downloads that showed up in the automated tests.
  3. Tested S3 lora downloads locally and it indeed works as expected (since it goes through our “storage” library anyways).
  4. Skipping the “archive” code for now, since I’m not sure that diffusers team has settled on a final format for it yet. However, when I do lora training code, I’ll of course make sure that we can save/load archives of the data without going through HuggingFace.

Automated tests running now (should be done in about 10m but I need to go), and assuming those pass as expected, there’ll be a new :dev release that you can experiment with. Hopefully it will work first time; let me know if there are any issues / feedback. Thanks! :raised_hands:

Could you demonstrate for us how exactly to load a LoRA model and do inference?

Yeah, sure. I gave an example using test.py above but the JSON equivalent would be:

   "callInputs": {
     // Typical, common options, but note:
     // MODEL_ID should match the base model that was fine-tuned with LoRA
     "MODEL_ID": "runwayml/stable-diffusion-v1-5",
     "MODEL_PRECISION": "fp16",
     "MODEL_REVISION": "fp16",
     // Specify the LoRA model
     "attn_procs": "patrickvonplaten/lora_dreambooth_dog_example"
   },
   "modelInputs": {
     "prompt": "A picture of a sks dog in a bucket",
     "seed": 1, // to get the same pictures as above
     // Optional; specify interpolation of LoRA with base model; 0.0 to 1.0 (default)
     "cross_attention_kwargs": { "scale": 0.5 }
   }

The “new” options are the attn_procs callInput, and the ability to (optionally) tell the model how to use those weights with the cross_attention_kwargs modelInput. It will download the LoRA at runtime (from HuggingFace in the above example, but an http or s3 URL to a .bin file can be given too; .safetensors support in diffusers is coming soon).
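For anyone calling a deployed container from Python, here’s a rough sketch of assembling that request body. The helper name is mine and the endpoint is a placeholder for wherever your container is deployed, not part of the project:

```python
import json

def build_payload(lora_repo, prompt, scale=1.0, seed=None):
    """Assemble a docker-diffusers-api request body for LoRA inference.

    `lora_repo` can be a HF repo id, or an http/s3 URL to the weights.
    Hypothetical convenience wrapper around the JSON shown above.
    """
    model_inputs = {
        "prompt": prompt,
        # interpolation of LoRA with the base model; 0.0 to 1.0 (default)
        "cross_attention_kwargs": {"scale": scale},
    }
    if seed is not None:
        model_inputs["seed"] = seed
    return {
        "callInputs": {
            "MODEL_ID": "runwayml/stable-diffusion-v1-5",
            "MODEL_PRECISION": "fp16",
            "MODEL_REVISION": "fp16",
            "attn_procs": lora_repo,
        },
        "modelInputs": model_inputs,
    }

payload = build_payload("patrickvonplaten/lora_dreambooth_dog_example",
                        "A picture of a sks dog in a bucket", scale=0.5, seed=1)
print(json.dumps(payload, indent=2))

# Then POST it to wherever your container is running, e.g. (placeholder URL):
# import urllib.request
# req = urllib.request.Request("http://localhost:8000/",
#                              data=json.dumps(payload).encode(),
#                              headers={"Content-Type": "application/json"})
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```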

Don’t hesitate to ask about anything that’s not clear so we can get nice, super clear docs for everyone :smiley: This is still a little new for diffusers too so things could change, but currently that’s how it all works.

Meh, it has to be a bin file.
What needs to be done to get my self hosted safetensors to work? Can I help in any way?

We just need the safetensors support from diffusers… it shouldn’t need any more changes in docker-diffusers-api (which can already download the necessary files), and it should “just work” as soon as they have the support on their side and I bump the version.

The PR I linked to previously has the code done, has been approved, but I think is still waiting for final feedback from the team before they merge it. But it looks pretty close, I guess we’re a few days away, give or take :tada:

I was wrong about the above :frowning:

Well, actually, it depends.

  1. :dev release has latest diffusers, with the safetensors support, and a workaround for the regression to load non-safetensors files, BUT:

  2. I couldn’t get it to work with LoRAs from CivitAI :confused: It seems there are a few different formats a LoRA can be in (even within safetensors), and diffusers can’t load them all. There are some open issues for this but the direction isn’t clear yet, at least as far as I could see.

  3. However, I recall your LoRA was trained on a colab somewhere… so it’s possible it might work, depending on the format. Don’t get your hopes up, but it’s worth a shot.

Important note: it will load as safetensors only if the filename in the URL includes ".safetensors"; otherwise, you should specify the callInput { "attn_procs_from_safetensors": true } to force safetensors loading for other filenames.
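That rule can be sketched in a few lines of stdlib Python (the function is illustrative only, not the project’s actual code):

```python
from urllib.parse import urlparse

def load_as_safetensors(url, call_inputs=None):
    """Decide whether to load LoRA weights as safetensors.

    Mirrors the rule described above: safetensors loading is used when the
    filename in the URL contains ".safetensors", or when the caller forces
    it with the attn_procs_from_safetensors callInput. Illustrative sketch.
    """
    call_inputs = call_inputs or {}
    if call_inputs.get("attn_procs_from_safetensors"):
        return True
    parsed = urlparse(url)
    # An #fname= fragment, if present, carries the real filename.
    name = parsed.fragment or parsed.path.rsplit("/", 1)[-1]
    return ".safetensors" in name
```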

The latest diffusers added “limited” support for loading LoRAs generated with A1111 (as are commonly found on CivitAI).

This is now possible with docker-diffusers-api too. It’s not well tested but in the latest :dev, you can now do something like (taking note of callInputs.lora_weights and modelInputs.prompt):

  "callInputs": {
    "MODEL_ID": "NED-v1-22",
    // Model page: https://civitai.com/models/10028/neverending-dream-ned?modelVersionId=64094
    "CHECKPOINT_URL": "https://civitai.com/api/download/models/64094#fname=neverendingDreamNED_v122BakedVae.safetensors",
    "MODEL_PRECISION": "fp16",
    "safety_checker": false,
    "PIPELINE": "lpw_stable_diffusion",
    // This is the important line here, "lora_weights":
    // Model page: https://civitai.com/models/5373/makima-chainsaw-man-lora
    "lora_weights": "https://civitai.com/api/download/models/6244#fname=makima_offset.safetensors"
  },
  "modelInputs": {
    // the important part here is "makima (chainsaw man)" which the LoRA introduces.
    "prompt": "masterpiece, (photorealistic:1.4), best quality, beautiful lighting, (ulzzang-6500:0.5), makima \\(chainsaw man\\), (red hair)+(long braided hair)+(bangs), yellow eyes, golden eyes, ((ringed eyes)), (white shirt), (necktie), RAW photo, 8k uhd, film grain",
    "num_inference_steps": 30,
    "negative_prompt": "(painting by bad-artist-anime:0.9), (painting by bad-artist:0.9), watermark, text, error, blurry, jpeg artifacts, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, artist name, (worst quality, low quality:1.4), bad anatomy",
    "width": 864,
    "height": 1034,
    "seed": 2281759351,
    "guidance_scale": 9
  }

This will download and cache the model on first use. That’s why it’s important to specify the #fname=makima_offset.safetensors part: it lets us check in advance whether we’ve previously cached the file (without making an extra request every time to discover the real filename, which differs from the URL and arrives via a Content-Disposition header), and, no less important, it tells us the filename ends with .safetensors. Everything after the hash (“#”) is used by docker-diffusers-api only and is not sent as part of the HTTP request.
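The #fname= convention is easy to illustrate with stdlib Python; since the fragment is never sent over HTTP, it’s free client-side metadata (the helper name is mine, not the project’s):

```python
from urllib.parse import urlparse, parse_qs

def cache_filename(url):
    """Derive a stable local filename from a download URL.

    Prefers the client-side #fname= fragment over the URL's last path
    segment, since sites like CivitAI serve the real filename only via a
    Content-Disposition header at download time. Hypothetical helper.
    """
    parsed = urlparse(url)
    if parsed.fragment:
        params = parse_qs(parsed.fragment)
        if "fname" in params:
            return params["fname"][0]
    return parsed.path.rsplit("/", 1)[-1]
```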


Note: We don’t have the ulzzang-6500 embedding, so it doesn’t look quite as good as the CivitAI example.