Is there more precise documentation on CLI options? #1635
Replies: 1 comment
-
Running with --help will list everything, but it's pretty much unsorted, so:
You don't need the quotes around anything that wouldn't normally need them on Windows (filenames with spaces, or prompts containing characters that need quoting). You don't need the = between the argument name and its value either, as far as I can tell.
Your get_schedulers issue sounds like diffusers isn't installed, it's the wrong version, or something else is broken; all that function does is pull the one GPU-based scheduler available (defined in shark) and the ones from diffusers into an array. Without knowing the exact error I have no idea what the cause is.
There's no --model_id command line option unless they've renamed something. The precision defaults to fp16, so you don't need to specify that (in fact, specifying fp32 to override it doesn't seem to work in my experience; fp16 should be ~2x as fast on AMD cards, but forcing fp32 didn't change generation speed at all). Likewise, the default device is vulkan, so you don't need that either.
Prompting
The prompt is just --prompt and the negative prompt is --negative_prompt. Note that Shark unhelpfully doesn't warn or error if a command line argument is mistyped, doesn't exist, or its value isn't formatted the way it wants.
Selecting Models
Checkpoint is --ckpt_loc [filename] OR --hf_model_id [huggingface repo ID] (if you don't specify either, it'll load the 512x512 version of SD 2.1).
Warning: User-made SD2.0 768 or SD2.1 768 models won't work right or at all (neither does the default huggingface one, but it'll pretend to let you use it). Shark forces a fallback to the base 512x512 models on huggingface for both of those and optimizes for that. It also seems to ignore model config yaml files. With the user-made models I tried, this resulted in it downloading the gigantic version of CLiP and then failing with a size mismatch. With huggingface models you can get it to download the one you actually asked for using the JSON in the base directory, but it'll apply a tuning file created for the 512x512 model to the higher-resolution version. That results in catastrophic miscompilation and some kind of nearly pure-computation loop on the GPU that runs 200MHz above max boost clock (while only using 30% of the power limit) on my machine, takes over 10 seconds per iteration, and makes the entire Windows UI stutter, and that's if you actually wait for it to complete.
LoRA
Note that it only supports original LoRA, not LyCORIS or any of the other variants, only one at a time, and the LoRA weight cannot be specified. It always uses 0.75, which is a little high for some LoRAs, WAY too high for lots of them, and not high enough for a rare few. The default being that high means you're more likely to run into LoRA / model combos that seem really broken when you'd normally just be able to turn the strength down. Shark does it this way because, the way things are done now, most parameters to the compiled programs are pre-cooked before compilation (except the prompts), so changing the LoRA requires recompiling another 3GB of flatbuffers. Changing its strength would too.
Image Count
--batch_size X controls how many images are generated simultaneously. It usually slowed things down more than expected on my 7900XTX when I tried it in the past, because it pushes the memory boost clock and GPU boost clock to full speed at the same time, the card hits the power limit, everything downclocks sharply, and the frequency hops all over (single image generation doesn't max out the memory clock, things stay just under the power limit, and computation runs at full speed). More recent versions of shark have a longer delay between images when --batch_count is used for some reason, so there might be some advantage to batch_size now, but it's broken in the UI so I haven't tested it.
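For illustration, here's a hedged example of a full txt2img run using the flags above, following the main.py invocation style used elsewhere in this thread (the checkpoint filename and prompts are placeholders, and the huggingface repo ID is just an example pointing at the base 512x512 SD 2.1 model, so adjust to whatever you actually have; quotes and = signs are optional as noted above):
.\apps\stable_diffusion\scripts\main.py --app txt2img --ckpt_loc anythingV3_fp16.ckpt --prompt "planets and stars" --negative_prompt "text"
.\apps\stable_diffusion\scripts\main.py --app txt2img --hf_model_id stabilityai/stable-diffusion-2-1-base --prompt "planets and stars" --negative_prompt "text"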
Low Memory Options
--ondemand only loads the current stage of the model into VRAM and unloads it when done; the default is to keep every stage loaded between images, which is much faster but might not work if you only have 6GB or something.
Recommendations
I'd suggest generating multiple images with --batch_count N if you're running from the command line, since startup takes longer than producing a single image on my machine, and because VMFB files are ZIP64 archives of multiple smaller files (not compressed), which prevents Windows superfetch / standby memory from working correctly (the file has to be extracted in memory again, possibly reloaded from disk depending on how badly Python implemented mmap; they don't seem to be aware of how to use the equivalent on Windows last I looked, and some files are created as uncacheable temporaries but treated as permanent files on the first run anyway for some unknown reason) and then copied to the GPU (again), so you'll incur a huge time penalty between runs. Switching models in the webUI is slow enough even when they're already compiled and should still be in standby memory (I have 512GB of RAM, so for me that's anything from the past couple of weeks in general). I suspect Python's ZIP64 implementation isn't so hot either; uncompressed / STOREd files shouldn't take much time to copy elsewhere in memory... although llvm-iree splits the constants out into one module and each layer of the neural net into another, so it may need to do something silly to all of this before loading it too.
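As a sketch of that recommendation combined with the low-memory flag (same assumptions as the example above; the count of 4 is arbitrary):
.\apps\stable_diffusion\scripts\main.py --app txt2img --prompt "planets and stars" --negative_prompt "text" --batch_count 4 --ondemand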
-
Hello everyone,
I've been trying for some time to generate an image from a prompt (txt2img). I managed to generate a few with a prompt and a negative prompt, but I can't manage to generate one with a custom LoRA or model/checkpoint.
For example, something like:
.\apps\stable_diffusion\scripts\main.py --app="txt2img" --precision="fp16" --prompt="planets and stars" --device="vulkan" --negative_prompt="text" --????="anythingV3_fp16.ckpt"
"--????" should be model_id, I think, but there is some kind of lookup via "get_schedulers" that throws an exception when it's used with a custom argument.
There is a check afterwards to make sure the name ends with ckpt or safetensors, so what am I missing?
For the LoRA or VAE, how could I use custom ones?
Some simple examples in the documentation, like the one for the prompt, would make things so much clearer.
Just a quick recap of some basic features (batch size, count, width/height, VAE, LoRA, model).
If someone has played with this from the CLI, it would be great if they could just give some examples with a quick explanation.
Good night and thanks for the work!!!