Wondering about a very minified version of docker-diffusers-api

Hey @gadicc, this is something that has been on my mind for quite a while. I'm not sure how to start approaching it, but I'm open to suggestions, criticism, etc…

I can see docker-diffusers-api is quite a comprehensive solution, with support for a lot of different features.

As someone starting out in the AI space, it's always harder when you have to unpack a lot of new concepts, as is the case here. For that reason I was thinking about how I could create a solution that basically does the following:

Training step:

  • Download the base model (in my case it's SD 1.5)
  • Run training on it and upload the model to S3
  • I would call this V1; V2 would then support more than one model (SD 2, perhaps), with both already downloaded, so the call inputs would specify which one to train
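The training step above could be sketched roughly like this. This is only a sketch under assumptions: `train_dreambooth` is a hypothetical placeholder for whatever training entrypoint you end up using, and the S3 key layout is just one I made up so that V2's multi-model case falls out naturally:

```python
def model_s3_key(base_model: str, model_id: str) -> str:
    """Build a predictable S3 key so the inference step can find the
    model later just from its ID. The layout here is arbitrary; pick
    whatever scheme suits you."""
    safe_base = base_model.replace("/", "--")  # S3-friendly name
    return f"models/{safe_base}/{model_id}.tar.zst"

def upload_model(archive_path: str, bucket: str, key: str) -> None:
    # boto3 is imported lazily so the key helper above works without it;
    # credentials come from the environment or the instance role.
    import boto3
    boto3.client("s3").upload_file(archive_path, bucket, key)

# Hypothetical flow: train_dreambooth() stands in for the real training
# entrypoint, which would write a model archive to disk.
# archive = train_dreambooth(base_model="runwayml/stable-diffusion-v1-5", ...)
# upload_model(archive, "my-bucket",
#              model_s3_key("runwayml/stable-diffusion-v1-5", "v1"))
```

With a key scheme like this, adding SD 2 in V2 is just a different `base_model` argument; nothing else changes.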

Inference step:

  • Launch a new replica on banana
  • Download the model by its ID from my S3 bucket
  • Run inference for different prompts on this model and return the images
  • Shut down (idle timeout) the replica
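A rough shape for that inference step, assuming a banana-style `handler(model_inputs) -> dict` contract; the request field names and the `load_pipeline_from_s3` helper are hypothetical placeholders, only the base64 serialization is concrete:

```python
import base64
import io

def images_to_b64(images) -> list:
    """Serialize PIL-style images to base64 PNG strings for a JSON response."""
    out = []
    for img in images:
        buf = io.BytesIO()
        img.save(buf, format="PNG")
        out.append(base64.b64encode(buf.getvalue()).decode("ascii"))
    return out

def handler(model_inputs: dict) -> dict:
    # Hypothetical request shape: {"model_id": "...", "prompts": [...]}
    model_id = model_inputs["model_id"]
    prompts = model_inputs["prompts"]
    # pipe = load_pipeline_from_s3(model_id)  # placeholder: fetch + unpack
    # images = [pipe(p).images[0] for p in prompts]
    # return {"images": images_to_b64(images)}
    raise NotImplementedError  # fill in with the commented lines above
```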

From my limited knowledge and from the perspective of what I want to achieve, I don’t need/want to worry about:

  • pipelines: honestly, I haven't even stopped to learn their fundamental differences or why I'd need one over another; whatever works is fine
  • schedulers: same idea as above; whatever works is fine
  • checkpoints: also not needed at all; I never use the web UI and don't plan to port anything anywhere
  • precision: it doesn't need to be a config option; I'd set it to whatever gives the best results (or maybe whatever runs fastest), but it can definitely be simplified/hardcoded
  • other features I'm not sure about: the img2img and inpainting pipelines and the like; I don't see a use for them, and unless they're needed internally I would cut them out entirely

From the above, does it sound like I'll make my life easier if I go for this approach of making it simpler and less flexible/modular?
I ask that from a few points of view:

  • maybe it's just my limited knowledge that makes me believe I can maintain something like that, and maybe I'll be forced to deal with the things listed above anyway. But I really just want to use SD 1.5 through 2 and run different prompts, nothing crazy on top of that.
  • maybe I have the wrong impression that SD can create amazing outputs with a one-size-fits-all solution, and when you need that one special prompt to output a new style, perhaps it won't be possible without touching some of the above.
  • finally, it makes more sense to me, as someone who is learning, to start small (like the Japanese philosophy of ikigai) and build new features from there. In the long run it may make sense to support different features, but that's really not relevant right now, so it does seem to make sense to build a kind of fixed solution and then make it more robust and flexible as I go.

The golden question is: do you believe that, with a little help from you and maybe others, I could strip this repo down to the minimum version of what I described above, considering I'll have little idea of what to cut out or modify? Or would just using this repo and patching in my own things obviously be faster?

Hey. Good questions. I don't think it would be a good idea to minimize the repo though, because you're still asking for a lot of things that you need but others won't :sweat_smile: (notably dreambooth, S3 support, and runtime downloads: three very big features that most people will never use).

If you wanted to go that route, which I totally get, I'd say rather than minimizing docker-diffusers-api, maximize banana's starter repo instead. Then it will be much clearer what's going on and you'll only be adding what you directly need. But it's a lot of work.

Don't forget that even banana's super simple starter template is built around the diffusers library, which is huuuuuge and covers everything (docker-diffusers-api is just a fancy wrapper around it).

The truth is though, you really don't need to know or understand all parts of docker-diffusers-api, or even diffusers, for the simple use cases. E.g. all the build-arg defaults are enough to get started pretty quickly, although I admit it would help to have a default scheduler and pipeline, and that's something we could add quite easily.

However, these really are things you'll probably want in the future… schedulers can have quite a big impact on the quality of certain types of images and models (and of course the latest scheduler gets great results in just 20 (!) steps instead of 50, so you're more than halving your image generation time and GPU cost). Pipelines, especially the community pipelines, add amazing features like weighted prompts (if you see everyone writing e.g. "girl with ((big eyes))" everywhere, that's to emphasize the big eyes). But yeah, you shouldn't need to understand this stuff to get started.

Hope that helps; not sure how clear my answer was, it's nap time now haha. But P.S., runtime download and management of models (instead of during the build step) is being actively worked on now (it already works for dreambooth models).

P.S. What’s with the Japanese reference, any personal interest? (I’ve been learning Japanese for years (well, slowly on Duolingo at least) and I’m trying to structure my year to spend more time there).

That's exactly the kind of thing I needed to understand to better evaluate my strategy, thanks a lot for sharing. As I said, I had never looked up what the different schedulers and pipelines do, and your explanation was a great eye-opener in that sense, much appreciated.

So I agree with you: I'll stick to the repo as is and build things around it. For one, weighted prompts are totally something I need, and soon! Halving the steps is an awesome optimization; I didn't even know about it. Lower costs and faster processing are kind of key to what I plan, so I'm actually happy to stick with it now.

About Japan, haha man… I really like Japanese culture. I've also studied the language, but just for a month while I was on vacation, and I trained a bit of Ninjutsu (the traditional ninja martial art). Someone recently recommended me the book Ikigai by Ken Mogi and I'm really loving it; it's the one thing that has given me the most insight into why Japanese culture (and obviously its people) is the way it is, mainly how they view and deal with their traditions and how they see work. Quite unique. I'd love to spend a few months there; it must be awesome.

Hey, awesome response… glad I'm getting better at explaining stuff haha (feedback helps!) :sweat_smile: It's good to follow the news from all the diffusers releases, but I know it's hard; I try, when I can, to mention the most important stuff in the docker-diffusers-api changelog.

Ok awesome, so one day, we’ll have a gaijin (foreigner) ML compound in Tokyo! Google Roppongi Hills, because that’s where we’ll be staying!

Dude, your explanations are great. I especially like how descriptive you are; it really helps me.

I just looked it up; it looks like a very nice place to stay in Tokyo. Hope you enjoy! :slight_smile:

I just checked the last entry you posted on the dev changelogs: Development Releases (`dev` branch) - #8 by gadicc
This last update has some nice things I know I'll need, so I won't have to keep my workarounds anymore; for one, I had a problem with fp16 before and had to disable it. Kudos for the error-handling updates too; I was meaning to do that so I don't have to rely on banana logs, since it seems a few lines sometimes just don't show up.

I'll share feedback with you as always, but it would be nice to know if you see any potential bugs for my use case, or whether you think I should wait a bit until it's more stable.

Thanks for all your kind words :blush:

Yeah, those missing log entries were a pain… I wasted hours helping another user with something this morning, all because banana's logs weren't showing the error; it was immediately obvious once I deployed locally. So, problem solved :raised_hands:

I don't see any issues with your use case. More fun stuff is planned for next week, but nothing you need to wait for to start implementing. I just wasn't sure about your inference plan: there's no need to manually launch a new replica. You'll just make a regular request to banana; it will boot a container if needed, download the model from S3*, run inference, return the images, and auto-shutdown after the regular banana idle timeout.

*Oh, I guess you weren’t aware of the runtime downloads feature yet… so yes, this is already all here! With more stuff coming, but nothing that will change your workflow.
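So the client side really is just one call. A sketch using the banana SDK's classic `banana.run(api_key, model_key, model_inputs)` shape; the payload field names here are hypothetical and must match whatever your handler expects on the server side:

```python
def build_inputs(model_id: str, prompts: list) -> dict:
    """Shape the request payload; the field names are hypothetical
    placeholders, not something the SDK itself prescribes."""
    return {"model_id": model_id, "prompts": list(prompts)}

# import banana_dev as banana
# out = banana.run(API_KEY, MODEL_KEY, build_inputs("v1", ["a red fox"]))
#
# banana boots a container if needed, the handler pulls the model from
# S3, runs inference, returns the images, and the container idles out
# on its own afterwards; no manual replica management.
```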

In short, hop to it soldier :smiley: And we’ll continue to let each other know of any issues that come up.

I’m off for now, chat soon :raised_hands:
