Streaming status / pass info *during* inference

Banana streaming status proposal

Rationale

Requests can take a long time, and it would be nice to know what's going on so we can build a good UX around it.

Proposal

Instead of a single long-poll request, we can use chunked transfer encoding to send multiple JSON objects back until the request finishes. I can think of two good sources of status messages:

From the Banana.dev infrastructure, giving status on the container run:

{ $banana: "COLD_START", timestamp: 1662900847 }
{ $banana: "WARM_START", 1662900942 }
{ $banana: "QUEUED", position: 1, timestamp, 1662900242 }

From our containers, giving status on the inference:

# Server
def init(sendStatus):
    sendStatus({"$model": "INIT", "timestamp": 1662900847})

def inference(model_inputs: dict, sendStatus) -> dict:
    # ...
    sendStatus({"$model": "PROGRESS", "progress": 0.8, "timestamp": 1662900847})
    # ...
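
One way the host framework could back sendStatus (a sketch; the queue-based design is an assumption, not Banana's actual internals): the inference call runs on a worker thread and pushes status dicts onto a queue, which the chunked response generator drains until a sentinel marks the end of the stream.

# Sketch of how the host could implement sendStatus; the queue-based
# design is an assumption, not Banana's actual internals.
import json
import queue
import threading

def run_with_status(model_inputs: dict, inference):
    updates = queue.Queue()  # holds status dicts; None is the end-of-stream sentinel

    def sendStatus(status: dict) -> None:
        updates.put(status)

    def worker() -> None:
        result = inference(model_inputs, sendStatus)
        updates.put({"output": result})  # final result is the last message
        updates.put(None)

    threading.Thread(target=worker, daemon=True).start()

    # A generator the web framework can stream as chunked NDJSON.
    while (item := updates.get()) is not None:
        yield json.dumps(item) + "\n"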

Client library

The client library could remain the same, with no breaking API change: passing an extra statusCallback argument opts into status updates from the server.

// Client
function statusCallback(status) { console.log(status); }
const out = await banana.run(apiKey, modelKey, modelOpts, statusCallback);
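
On the Python client side, consuming the stream could look like the sketch below, using requests; the URL is hypothetical, and the loop stands in for what a banana.run wrapper would do internally: forward intermediate messages to statusCallback and return when the final output arrives.

# Client-side sketch using requests; the URL is hypothetical and the
# iter_lines loop stands in for what banana.run would do internally.
import json
import requests

def run(api_key, model_key, model_inputs, statusCallback=None):
    payload = {"apiKey": api_key, "modelKey": model_key, "modelInputs": model_inputs}
    with requests.post("https://api.banana.dev/run", json=payload, stream=True) as resp:
        for line in resp.iter_lines():
            if not line:
                continue  # skip keep-alive blank lines
            message = json.loads(line)
            if "output" in message:
                return message["output"]  # final result ends the stream
            if statusCallback is not None:
                statusCallback(message)  # intermediate status update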