Inference takes a long time and the results are not astonishing

#2
by MonsterMMORPG - opened

I have tested the keyword "owl".

It took over 3 minutes to generate output on a Core i7-10700F and RTX 3060 computer.

The output is: owl, by wlop, artgerm

The output is decent but definitely very weak.

Also, if you are interested in how to use Stable Diffusion, I have a playlist with 3 tutorial videos so far:

https://www.youtube.com/playlist?list=PL_pbwdIyffsmclLl0O144nQRnezKlNdx3

[image]

For the owl below, I used the following input:

[image]

Model: Anything v3

owl, bird owl,animal owl, by ARTIST_NAME, fantasy, intricate, smooth, sharp focus, illustration, intricate, cinematic lighting, highly detailed, octane, digital painting, artstation, concept art, smooth, sharp focus, illustration, vibrant colors, 3d render, cinematic, high quality, amazing, masterpiece, featured on deviantart, artstation

Negative prompt:

woman,female,human,bad anatomy, bw, black and white, ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, signature, cut off, draft, amateur, multiple, gross, weird, uneven, furnishing, decorating, decoration, furniture, text, poor, low, basic, worst, juvenile, unprofessional, failure, crayon, oil, nude, sex, label, thousand hands

Microsoft org

Thanks for your feedback. The regular time on CPU is about 15s. It took 3 minutes because there were too many queries waiting in the queue. We recommend loading the model onto a local node with a GPU.
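For anyone who wants to do that, here is a minimal sketch of running it locally with transformers. The `microsoft/Promptist` model id, GPT-2 tokenizer, and the `" Rephrase:"` input suffix come from the model card; treat the generation settings as assumptions mirroring the Space's defaults:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Promptist is a GPT-2 model fine-tuned to rewrite prompts for Stable Diffusion.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("microsoft/Promptist")

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).eval()

def optimize_prompt(plain_text: str) -> str:
    # The demo appends " Rephrase:" to the user's text before generating.
    prompt_text = plain_text.strip() + " Rephrase:"
    input_ids = tokenizer(prompt_text, return_tensors="pt").input_ids.to(device)
    eos_id = tokenizer.eos_token_id
    outputs = model.generate(
        input_ids,
        do_sample=False,
        max_new_tokens=75,
        num_beams=8,  # the original demo's setting; see the edits discussed below
        eos_token_id=eos_id,
        pad_token_id=eos_id,
        length_penalty=-1.0,
    )
    # Original approach: decode everything, then strip the echoed input as a string.
    full_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return full_text.replace(prompt_text, "").strip()

print(optimize_prompt("owl"))
```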

For speed, see my edits here:
https://huggingface.co/spaces/microsoft/Promptist/discussions/1/files

It works pretty fast even on my Xeon E5-1650 v4 (from 2016), much faster than 15 seconds: I get a long completion in ~1.5s. Loading it onto the GPU probably isn't necessary unless you want to run thousands of prompts.

I tested it on my PC; here is the source code:

[image]

So they use 8 beams but display only 1 :D

Looks like it! Did you try the edits? Using the token indices is also important, since encoding & decoding can mutate the text (and its length). I first ran into that issue with GPT-J-6B.

I just did a test with your changes and it took about 1 sec :D

The result is also better.

For the owl keyword: owl, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha, high definition

Glad to know it's working well for you!

My changes shouldn't have any effect on the quality of the result in the vast majority of cases. The second change just ensures that the only tokens decoded are the ones after the prompt; the original code decodes all the tokens and uses the string position to extract the generated text, which is unreliable.
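In code, the difference looks roughly like this (reusing the names from the local-run sketch above, where `prompt_text` is the exact string that was tokenized):

```python
# Unreliable: decode the whole sequence, then cut the prompt back off by
# string position. Encoding and then decoding does not always round-trip the
# text exactly, so the decoded prefix may not match prompt_text.
full_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
result = full_text[len(prompt_text):].strip()

# Robust: slice off the prompt *tokens* first, then decode only the new ones.
new_token_ids = outputs[0][input_ids.shape[-1]:]
result = tokenizer.decode(new_token_ids, skip_special_tokens=True).strip()
```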

Microsoft org

I duplicated the demo at https://huggingface.co/spaces/unilm/Promptist-faster and merged your update.

I also added a note as follows:

Note: This is a version with beam_size=1, while the original demo uses beam_size=8, so there may be a difference in output quality, but this demo is much faster. Many thanks to @HughPH for pointing out this improvement.
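For reference, the generation-side change between the two Spaces boils down to the `num_beams` argument; roughly (a sketch of the shape of the change, not the exact diff, which is in the linked discussion):

```python
# Original demo: beam search keeps 8 candidate sequences, roughly 8x the
# decoding work per prompt.
outputs = model.generate(input_ids, do_sample=False, max_new_tokens=75,
                         num_beams=8, eos_token_id=eos_id, pad_token_id=eos_id,
                         length_penalty=-1.0)

# Faster Space: greedy decoding (beam_size=1). length_penalty only affects
# beam search, so it is dropped here.
outputs = model.generate(input_ids, do_sample=False, max_new_tokens=75,
                         num_beams=1, eos_token_id=eos_id, pad_token_id=eos_id)
```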
