I’m deploying the dev branch of this repo directly to banana. I used the test script as an example of how the call to the model should be made and wrote my own script based on it, with the same model and call inputs, grabbing the images from a folder the same way the test script does.
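For context, here is a minimal sketch of how a payload like mine can be built. The key names and values are the ones echoed back in the terminal output further down; the `build_inputs` helper and its `image_dir` parameter are hypothetical names for illustration, not code from the repo's test script.

```python
import base64
from pathlib import Path


def build_inputs(image_dir, prompt="a photo of sks dog"):
    """Build a modelInputs / callInputs pair for a dreambooth training call.

    Hypothetical helper: the dictionary layout mirrors the payload echoed
    in the terminal output, but this is not the repo's own code.
    """
    # Base64-encode every file in the folder, sorted for a stable order.
    images = [
        base64.b64encode(path.read_bytes()).decode("ascii")
        for path in sorted(Path(image_dir).iterdir())
        if path.is_file()
    ]
    model_inputs = {"instance_prompt": prompt, "instance_images": images}
    call_inputs = {
        "MODEL_ID": "runwayml/stable-diffusion-v1-5",
        "PIPELINE": "StableDiffusionPipeline",
        "SCHEDULER": "DDPMScheduler",
        "train": "dreambooth",
    }
    return {"modelInputs": model_inputs, "callInputs": call_inputs}
```

The returned dictionary has the same two top-level keys that appear in the echoed payload, so it can be compared against that output field by field.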
Here is the log on banana dashboard: 2022-11-28T05:08:58.000Z 2022-11-28 05:09:04.808803 {'type': 'init', 'status': ' - Pastebin.com
Here are the outputs on my terminal:
RUN finished with result:
{'id': '56a8b867-d1e4-4451-aad1-edbe6aba2449', 'message': '', 'created': 1669571311, 'apiVersion': '28 July 2022', 'modelOutputs': [{'test': '{\n "modelInputs": {\n "instance_prompt": "a photo of sks dog",\n "instance_images": [\n "/9j/4A..."\n ]\n },\n "callInputs": {\n "MODEL_ID": "runwayml/stable-diffusion-v1-5",\n "PIPELINE": "StableDiffusionPipeline",\n "SCHEDULER": "DDPMScheduler",\n "train": "dreambooth"\n }\n}'}]}
*not sure if the last run call was the one above or the one below*
RUN finished with result:
{'id': '596b736c-bbf4-4284-bbb9-c99355ff606a', 'message': 'success', 'created': 1669609220, 'apiVersion': '28 July 2022', 'modelOutputs': [{'done': True, '$timings': {'init': 7393, 'inference': 845194, 'training': 832903, 'upload': 0}}]}
START finished with result:
call_5cd8910c-79ac-4daa-996f-b1a9635aff1c
check:
{'id': 'a7ddc116-587b-41da-bf09-c3015c19155a', 'message': 'running', 'created': 1669612220, 'apiVersion': '28 July 2022', 'modelOutputs': None}
check:
{'id': '25c7bb8f-5053-4a55-8251-6053a52fb5ef', 'message': 'running', 'created': 1669612861, 'apiVersion': '28 July 2022', 'modelOutputs': None}
check:
{'id': '2de6e7d0-a119-43ae-9a1a-c89271cdd289', 'message': 'success', 'created': 1669613655, 'apiVersion': '28 July 2022', 'modelOutputs': [{'done': True, '$timings': {'init': 6718, 'inference': 847500, 'training': 831806, 'upload': 0}}]}
I was really just experimenting and not worrying much about the outcome; I only wanted to get the model to run correctly for the first time and check the results.
From all of the above, and judging by the banana dashboard logs in the Pastebin link, I would say the model ran, since the step count started to increment. The logs appear to be truncated, though (or perhaps not; I have no previous run to compare against).
After seeing the steps, I expected to just wait a while and then be able to call the check API. I did that, as you can see in the plain logs above, and the status changed from running to success. Given that, what is 'upload': 0 telling me? Is it the generated images that are about to be uploaded? Or is it the model that just got trained and is supposed to be uploaded to S3?
I left everything as it was and went to bed. This morning I checked the S3 bucket again and nothing had been uploaded.
To my surprise, calling the check API with the same call ID as before now gives me this error. Is that supposed to happen?
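To make the numbers in the result easier to read, here is a small sketch that converts the `$timings` block to seconds. It assumes the values are milliseconds, which I have not confirmed, and `summarize_timings` is a hypothetical helper name.

```python
def summarize_timings(result):
    """Return {stage: seconds} from a result's '$timings' block.

    Assumption: the raw values are milliseconds (not confirmed).
    """
    # 'modelOutputs' is None while the call is still running.
    outputs = result.get("modelOutputs") or []
    timings = outputs[0].get("$timings", {}) if outputs else {}
    return {stage: ms / 1000 for stage, ms in timings.items()}


# The last successful check result from the logs above (id fields omitted).
result = {
    "message": "success",
    "modelOutputs": [
        {
            "done": True,
            "$timings": {
                "init": 6718,
                "inference": 847500,
                "training": 831806,
                "upload": 0,
            },
        }
    ],
}

for stage, seconds in summarize_timings(result).items():
    print(f"{stage}: {seconds:.1f}s")
```

If the millisecond assumption holds, training took roughly 14 minutes and the upload stage recorded no time at all, which is the part I don't know how to interpret.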
check:
Exception: inference server error: taskID does not exist: task_1d298fc1-bb1e-4be9-a2ab-f1457f3678c5. This is a general inference pipeline error, and could be due to:
So it's not clear to me whether things really finished running. Is anyone able to check one of those IDs and give me some info?
A few questions:
- Where am I supposed to get the model-generated images after the inference has run?
- I was under the impression that calling the run API alone does not run everything, so I called the start API too. Coincidentally or not, things only seemed to really start running once I called start. Calling start also finally gave me a call ID, so I could call the check API.
- Assuming my credentials were all set up correctly, should the trained model be uploaded to my S3 bucket? If there was an error there, where would the logs show up? On the dashboard?
Right now I’m confused about whether I should just call the run API and expect everything to run from start to finish based on that single call, or whether I should call both the run and start APIs.
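For reference, the start-then-check flow I ended up using by hand can be sketched as a polling loop. This is a sketch under assumptions: `wait_for_result` is a hypothetical helper, and the `check` argument stands in for whatever client function performs the check call. The only thing it relies on from the logs above is that the result is a dictionary whose `message` is `running` until the call finishes.

```python
import time


def wait_for_result(check, call_id, interval=30.0, timeout=3600.0, sleep=time.sleep):
    """Poll check(call_id) until its 'message' is no longer 'running'.

    `check` is any callable that takes a call ID and returns a result dict
    (an assumption; wire it to your own client). Raises TimeoutError if the
    call is still running after roughly `timeout` seconds of waiting.
    """
    waited = 0.0
    while True:
        result = check(call_id)
        if result.get("message") != "running":
            return result
        if waited >= timeout:
            raise TimeoutError(f"{call_id} still running after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Passing the check function in as an argument keeps the loop independent of any particular SDK, so it can be tried with a stub before pointing it at the real API. It does not answer whether a single run call is enough on its own; it only automates the manual check calls shown in the logs.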
Is there anything else I’m missing?
Again, what about the generated images? Where should they go?
Thanks!