Help running docker-diffusers-api on docker locally

I just rented a Google Cloud machine to run the Docker container locally in order to do some quick tests.
I downloaded the repo and built the image, but when running the image with the command provided in the README I got an error that sanic was not found, which is very weird.
I looked at the build logs (I don't have them anymore right now, but I can reproduce them) and it seemed that everything installed: every step of the Dockerfile apparently ran without major errors.

Finding this very strange, I entered the container with bash, and pip list indeed did not show sanic. I installed it manually just because, wtf, and then tried to execute the server file manually in the container's bash shell. To my surprise, now the pipelines module could not be found; it was not in pip list either.

Something quite strange happened at some point, and I cannot figure out what; I hardly ever use Docker.

My main question: does this look like a classic Docker build problem, where only some modules get installed and others don't?

  • After writing the above I did another test. Still in the container's bash shell, I ran the same command the Dockerfile uses to install, pip install -r requirements.txt, and to my surprise, here's the error:
root@0fe72a6ac9e3:/api# pip install -r requirements.txt 
Collecting sanic==22.6.2
  Using cached sanic-22.6.2-py3-none-any.whl (271 kB)
Collecting transformers==4.22.2
  Using cached transformers-4.22.2-py3-none-any.whl (4.9 MB)
ERROR: Ignored the following versions that require a different python version: 1.8.0 Requires-Python >=3.8,<3.11; 1.8.0rc1 Requires-Python >=3.8,<3.11; 1.8.0rc2 Requires-Python >=3.8,<3.11; 1.8.0rc3 Requires-Python >=3.8,<3.11; 1.8.0rc4 Requires-Python >=3.8,<3.11; 1.8.1 Requires-Python >=3.8,<3.11; 1.9.0 Requires-Python >=3.8,<3.12; 1.9.0rc1 Requires-Python >=3.8,<3.12; 1.9.0rc2 Requires-Python >=3.8,<3.12; 1.9.0rc3 Requires-Python >=3.8,<3.12; 1.9.1 Requires-Python >=3.8,<3.12; 1.9.2 Requires-Python >=3.8; 1.9.3 Requires-Python >=3.8
ERROR: Could not find a version that satisfies the requirement scipy==1.9.3 (from versions: 0.8.0, 0.9.0, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.12.1, 0.13.0, 0.13.1, 0.13.2, 0.13.3, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.16.0, 0.16.1, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0, 0.19.1, 1.0.0, 1.0.1, 1.1.0, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.3.0rc1, 1.3.0rc2, 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.4.0rc1, 1.4.0rc2, 1.4.0, 1.4.1, 1.5.0rc1, 1.5.0rc2, 1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.5.4, 1.6.0rc1, 1.6.0rc2, 1.6.0, 1.6.1, 1.6.2, 1.6.3, 1.7.0rc1, 1.7.0rc2, 1.7.0, 1.7.1, 1.7.2, 1.7.3)
ERROR: No matching distribution found for scipy==1.9.3

Then, obviously, I double-checked, and got another surprise:

root@0fe72a6ac9e3:/api# python --version
Python 3.7.13

Looking at the Dockerfile is not intuitive to me, as I see no explicit Python version being installed or configured, at least to my very limited knowledge.

Could someone shed some light on how I can get this rolling?

Hey. That is indeed really surprising, for a number of reasons.

I’m going to respond to a few things in the meantime and then try a clean install on a new VM to compare.

Thanks also for the full build logs, which are exactly what I would have asked for… unfortunately I'm going to need you to run it again with --no-cache, as I'm not seeing the output of the commands that successfully ("successfully"? :)) ran on your first build.

Dockerfiles on big projects can unfortunately get quite unruly due to docker’s architecture and limitations (which overall are a great advantage, but can be quite painful in a few specific areas).

Re versions, we use conda to set up a virtual Python environment and operate within it. You'll see around line 62 that we create an environment based on Python 3.10 (!) and then install the right combination of cudatoolkit and pytorch that xformers has precompiled binaries for.
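For reference, that part of the Dockerfile looks roughly like this (a sketch from memory; the exact versions, channels, and package pins here are assumptions, so check the real file around line 62):

```dockerfile
# Create a conda environment with a pinned Python, independent of the
# (older) system Python that ships with the base image:
RUN conda create -n xformers python=3.10 -y

# Install a cudatoolkit + pytorch combination that xformers ships prebuilt
# binaries for (version numbers here are placeholders, not the actual pins):
RUN conda install -n xformers -y -c pytorch -c conda-forge pytorch cudatoolkit=11.6
```

This is why pip inside the container's default shell sees a different Python (and a different set of packages) than the build did.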

Anyway, I’ll get back to you soon after I’ve done a fresh install, maybe also with full build log there for you to compare (but please do send me yours with --no-cache so I can take a look too).

Lastly, congrats on taking this fairly big step :raised_hands: Docker is definitely a little painful in the beginning, but as soon as you’ve got a bit more experience with it, it’s an absolute gem in the long term! (Aside from a few painful annoyances and workarounds needed in the Dockerfile for a few specific things).

Here’s an example run on LambaLabs, with full output.

Can you confirm the exact google cloud machine config you chose, and I’ll give it a shot there too?

Machine configuration: [screenshot of the Google Cloud machine configuration panel, showing machine type, CPU platform ("Unknown CPU Platform"), vCPUs-to-core ratio, custom visible cores, and display device options]

One thing to point out: when I selected the GPU type instance I saw this [screenshot], so I clicked on "switch" and chose: [screenshot]

About the logs: those were actually from a first run, but I'll add the --no-cache flag and get them again, no problem, in just a while.

Here is a fresh build log: docker build diffusers 2 -nocache -

And the exact same error when trying to run it:

gabriel_rf0@ubuntu-8vcpu-t4-100gb-nvidia-cuda-1:~/dev/docker-diffusers-api$ docker run -it --gpus all -p 8000:8000 banana-sd python3
Traceback (most recent call last):
  File "", line 6, in <module>
    from sanic import Sanic, response
ModuleNotFoundError: No module named 'sanic'

Oh thanks, this is really helpful… I was actually just busy getting set up on google.

If you run just docker run -it --gpus all -p 8000:8000 banana-sd, without the python3 at the end, does it work?

Oh man! Yes, it did!!! Sanic just launched. Thank you so much for all the help once more, Gadi :slight_smile:

This is kind of mind-boggling; how does that make sense? For one, I was trying to understand how Docker makes it work at all, given there is never an explicit command to activate the new conda environment that was created; there is no mention of the conda activate command anywhere, not even in the Dockerfile.
Can I assume that when the image finishes building, it's put into a kind of hibernation state, frozen where the build was when it completed? Because during the build it clearly activated the conda environment.

Ok, so:

  1. Awesome! So happy you’re up and running! :tada:

  2. Super sorry about this… you followed the docs exactly and wasted a lot of hours because of it. Please accept my humble apology :bowing_man: I just updated the README on both the dev and main branches to reflect this, so at least take some solace in the fact that you've helped all future users!

Ok, so, as you guessed, the command there totally used to work, until we started relying on the new conda virtual environment. Here's the issue:

  1. There is no way to "activate" a conda environment inside a Dockerfile, as there's no real, continuously running shell.

  2. The trick we use is to define the "shell" as:

SHELL ["/opt/conda/bin/conda", "run", "--no-capture-output", "-n", "xformers", "/bin/bash", "-c"]

This gets applied to every following RUN command, and to CMD at the end. But as we just learnt, it’s not used when manually specifying a different command to run with docker run (which was a useful hack for a historic issue that is no longer relevant).
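To illustrate the difference, here's a minimal sketch (the RUN and CMD lines are hypothetical examples, not the actual Dockerfile contents):

```dockerfile
SHELL ["/opt/conda/bin/conda", "run", "--no-capture-output", "-n", "xformers", "/bin/bash", "-c"]

# Wrapped by the SHELL above, so it runs inside the xformers environment:
RUN python --version

# Shell-form CMD is also wrapped, so a plain `docker run <image>` starts
# inside the environment:
CMD python server.py

# But `docker run <image> python3` replaces CMD entirely, so the conda
# wrapper never runs and you get the bare system Python instead.
```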

So to start a new session like that, it would actually be like this:

# diffusers-api is the new name for "banana-sd", now in dev branch
$ docker run -it -p 8000:8000 --gpus all diffusers-api /bin/bash
(base) root@53ef5e4968cb:/api# conda activate xformers
(xformers) root@53ef5e4968cb:/api# 

and then you’re back in the environment. On my system at least, you can see it shows the conda environment in brackets at the beginning (e.g. (base), (xformers))… not sure why it doesn’t do that for you, it might have made this easier to spot.

Anyway, there you have it! Thanks for the patience and we both learnt something new today :raised_hands:

For sure, I'm learning a lot every day on this new journey; sometimes it's just overwhelming, as I'm learning everything at the same time.
No need to apologize; if it weren't for you in the first place, I would not be here playing around with these things :star_struck:. But I have to confess I'm a bit sad that I cannot move as fast as when I'm coding things I'm already used to.

Can’t wait to break other things and send my new repo back to run on banana serverless. Will keep you posted. I’m nowhere close to stop asking you for help with other ideas :rofl:

Haha awesome. And yes, totally!

You’re not too behind me, btw! This is my person Python project and first ML project. Learning loads every day too. And yeah, often it’s like “mmm how do I do this thing in Python that I do every day with my eyes closed in JavaScript”. Usually manage to find answers quickly on Google / StackOverflow when you know what to look for, but sometimes Python does things strange! (for a JS developer, at least). So totally get you. (You’re also coming from JS?)

Anyway, let’s have us fullstack developers do a small revolution here :grin: This is actually what I loved about banana and this project so much, been wanting to get into ML for years, and now I can, and still make good use my web and devops background.

Dude, what zone did you set up your instance in? I've tried like four already, and it's only after you configure EVERYTHING that it tells you no n1-standard-8 instances are available in that region, and there's no way to go back, so you have to configure everything from scratch again. Google :man_shrugging:

I had to try different zones and regions for 20 minutes until I found one that was free; a real pain!
The last one I got set up in was us-west4-a.

I've found that with Google Cloud, if you shut down an instance, you should make an image out of it, because it is very likely you will need to spin it up in another region.

I was thinking of creating a script with the gcloud CLI that keeps trying a list of regions to boot the image in. What a pain; the feedback should be instant when you're selecting the region and zone, not when you try to start the instance, right? Weird UX right there…
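A sketch of what such a retry script might look like (the zone list, the instance name, and every gcloud flag here are assumptions to adapt, not tested values):

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a "create instance" command against a list of
# zones until one of them has capacity. The command to run is passed as
# arguments; the zone is appended as --zone=<zone> on each attempt.
create_in_first_free_zone() {
  local zones=("us-west4-a" "us-west4-b" "us-central1-a" "europe-west4-a")
  local zone
  for zone in "${zones[@]}"; do
    echo "Trying zone: $zone"
    if "$@" --zone="$zone"; then
      echo "Created instance in $zone"
      return 0
    fi
  done
  echo "No capacity in any of the zones" >&2
  return 1
}

# Real usage would be something like (flags are assumptions; check the
# gcloud docs for your setup):
#   create_in_first_free_zone gcloud compute instances create my-diffusers-vm \
#     --machine-type=n1-standard-8 \
#     --accelerator=type=nvidia-tesla-t4,count=1 \
#     --maintenance-policy=TERMINATE
```

Since gcloud returns a non-zero exit code when a zone has no capacity, the loop simply moves on to the next zone.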

Wow, what a pain. But I'm not surprised at all; Google is really famous for this kind of stuff. I mean, I love them for some things and hate them for others. This is very typically "Google".