This is an absolute gem! Can't thank you enough.

#4
by pgalko - opened

My OpenAI bill is going to get so much smaller... not sure that the same can be said about my GPU compute bill :-)))

Are you sure this is a good model? Have you tested it? According to the leaderboard results, this is probably the worst Llama model, even worse than the 7B... or is the result so bad because it is a specialized model?

[attached screenshot: bad model.png — leaderboard results]

It's code-specialized. It hasn't been instruction-tuned to be an all-purpose model. We'll have new models coming soon that address these deficiencies.

It is a completion model trained and fine-tuned for Python coding. It absolutely excels on the HumanEval benchmark, where it beats the March version of GPT-4. Low ranks on other benchmarks are to be expected. Based on its HumanEval score of 69.5, it is the best open-source model out there by a large margin. The closest is WizardCoder 15B at 57. I will have some tests completed by the end of today.
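
The scores being traded here (69.5, 57, and later 73.2) are HumanEval pass@1 figures. As a reference point, the unbiased pass@k estimator commonly used for HumanEval can be sketched in a few lines (this is a general formula, not code from this thread):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples passes, given n generations of which c are correct."""
    if n - c < k:
        # Fewer failures than samples drawn: at least one success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 generations and 5 correct, pass@1 is 0.5.
```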

Are you planning to release an instruct model?

It's code-specialized. It hasn't been instruction-tuned to be an all-purpose model. We'll have new models coming soon that address these deficiencies.

Thanks for answering, I'm looking forward to trying it!

HumanEval for WizardCoder-Python-34B is 73.2... so it looks even better. Much better.

Despite the test results above, in practice I couldn't get anything intelligible out of the model. It is worse than some models with fewer parameters. Maybe the next version will be better...
By the way, WizardCoder-Python-34B, built on the same 34B base model, works almost flawlessly.

As a noob, how would I go about downloading, installing and trying this model?
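
A minimal way to try a model like this locally is the standard Hugging Face `transformers` workflow. The sketch below makes assumptions not confirmed in this thread: the repo id, the generation settings, and the idea that a plain task statement followed by a blank line is a reasonable prompt for a completion-style (non-instruct) model. Check the model card before relying on any of it:

```python
# Sketch: trying a code-completion model with transformers.
# ASSUMED: repo id and generation settings are illustrative; a 34B model
# also needs substantial GPU memory (or quantization) to run at all.
from transformers import AutoModelForCausalLM, AutoTokenizer

def format_completion_prompt(task: str) -> str:
    # Completion models continue text rather than follow instructions,
    # so the "prompt" is just the task statement plus a blank line.
    return task.strip() + "\n\n"

if __name__ == "__main__":
    model_id = "Phind/Phind-CodeLlama-34B-v1"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = format_completion_prompt(
        "Write a Python function that reverses a string."
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Install the prerequisites first (`pip install transformers accelerate`); the first run downloads the weights, which for a 34B model is on the order of tens of gigabytes.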

I agree that WizardCoder-Python-34B is, for now, the new benchmark for open-source coding models. Even the 15B version released a while ago was quite impressive. Phind would be more useful as an instruct model; for now it is just a nice experiment. It is kind of OK with shorter prompts, but as soon as you throw something longer at it, it kind of gives up... at least for me.

Thanks, Phind, for the model. Very helpful. What are the hardware requirements?
Right now I am running inference on an A100 in Colab and it takes forever.
How many A100s or H100s would you recommend for fast inference (within 5-10 seconds)?

I have an RTX 3090.

On CPU I get 4 t/s.
With the RTX 3090, 30 t/s.
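
Those throughput numbers translate directly into wait times. A back-of-the-envelope estimate (ignoring prompt processing and model-load time, which this simple division does not capture):

```python
def eta_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Rough wall-clock time to generate num_tokens at a steady decode
    rate; ignores prompt processing and model loading."""
    return num_tokens / tokens_per_sec

# A 500-token completion at the rates reported above:
#   CPU at 4 t/s       -> 125 s
#   RTX 3090 at 30 t/s -> ~16.7 s
```

So at 30 t/s, only fairly short completions fit inside the 5-10 second budget asked about above.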
