What are DS, CL, S-DS and S-CL?
What do these suffixes mean, how do they affect the models, and which model is best suited for function calling, and with what prompt?
According to their paper, I think DS means the model is fine-tuned from the DeepSeek base model, and CL means it is fine-tuned from the CodeLlama base model.
Yeah, CL and DS are quite clear. But what does "S-" stand for? ;)
If I understand this tweet (xeet?) correctly, it's a model additionally trained on the Magicoder-Evol-Instruct-110K dataset?
So:
- Magicoder-CL-7B = CodeLlama 7B fine-tuned on the Magicoder-OSS-Instruct-75K dataset
- Magicoder-S-CL-7B = CodeLlama 7B fine-tuned on the ise-uiuc/Magicoder-OSS-Instruct-75K and Magicoder-Evol-Instruct-110K datasets?
- Magicoder-DS-6.7B = DeepSeek-Coder 6.7B fine-tuned on the Magicoder-OSS-Instruct-75K dataset
- Magicoder-S-DS-6.7B = DeepSeek-Coder 6.7B fine-tuned on the ise-uiuc/Magicoder-OSS-Instruct-75K and Magicoder-Evol-Instruct-110K datasets?
Is that correct? I can add this to the docs if you want.
Can you also confirm the minimum VRAM required to run each model?
I tried to load the Gradio demo but got "CUDA out of memory. Tried to allocate 32.00 MiB."
- so it requires much more than I was expecting :D
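For a rough idea of why a 7B model might not fit, here is a back-of-the-envelope sketch of the memory needed for the weights alone. It assumes the weights are loaded in 16-bit precision (2 bytes per parameter), which is my assumption and not something confirmed for the demo. The function name `weights_gib` is made up for illustration. Treat the result as a lower bound: activations, the KV cache and the CUDA context come on top.

```python
def weights_gib(n_params_billion: float, bytes_per_param: int = 2) -> float:
    """Estimate the memory taken by model weights alone, in GiB.

    bytes_per_param: 4 for fp32, 2 for fp16/bf16, 1 for 8-bit quantization.
    This ignores activations, KV cache and framework overhead.
    """
    total_bytes = n_params_billion * 1e9 * bytes_per_param
    return total_bytes / 2**30


if __name__ == "__main__":
    # Parameter counts are nominal (7B and 6.7B), not exact.
    for name, size in [("Magicoder-S-CL-7B", 7.0), ("Magicoder-S-DS-6.7B", 6.7)]:
        print(f"{name}: ~{weights_gib(size):.1f} GiB in 16-bit, "
              f"~{weights_gib(size, 4):.1f} GiB in 32-bit")
```

So in 16-bit the weights alone are around 13 GiB for the 7B model, and about twice that in 32-bit, which would explain an out-of-memory error on a smaller card.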
Sorry, my bad... It's already in the README... :P
https://github.com/ise-uiuc/magicoder/blob/main/README.md#-dataset
Magicoder-OSS-Instruct-75K: generated through OSS-Instruct using gpt-3.5-turbo-1106 and used to train both Magicoder and Magicoder-S series.
Magicoder-Evol-Instruct-110K: decontaminated and redistributed from theblackcat102/evol-codealpaca-v1, used to further finetune Magicoder series and obtain Magicoder-S models.
But I would still add it to the Models table for total clarity (base model, datasets, and minimum VRAM required).
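To make the naming scheme concrete, here is a small sketch that decodes a model name into its base model and training datasets, following the README text quoted above. The helper `describe` and the dictionary layout are hypothetical, written only for this thread, and the base model labels are informal names rather than exact checkpoint IDs.

```python
OSS = "Magicoder-OSS-Instruct-75K"
EVOL = "Magicoder-Evol-Instruct-110K"

# Informal base-model labels for the two tags discussed in this thread.
BASES = {"CL": "CodeLlama", "DS": "DeepSeek-Coder"}


def describe(model_name: str) -> dict:
    """Decode names like 'Magicoder-CL-7B' or 'Magicoder-S-DS-6.7B'."""
    parts = model_name.split("-")
    if parts[0] != "Magicoder" or len(parts) not in (3, 4):
        raise ValueError(f"unexpected model name: {model_name}")

    is_s_series = len(parts) == 4
    if is_s_series and parts[1] != "S":
        raise ValueError(f"unexpected model name: {model_name}")

    base_tag, size = parts[-2], parts[-1]
    if base_tag not in BASES:
        raise ValueError(f"unknown base tag: {base_tag}")

    # Every Magicoder model is trained on OSS-Instruct;
    # the S series is further fine-tuned on Evol-Instruct.
    datasets = [OSS, EVOL] if is_s_series else [OSS]
    return {"base": f"{BASES[base_tag]} {size}", "datasets": datasets}


if __name__ == "__main__":
    for name in ("Magicoder-CL-7B", "Magicoder-S-CL-7B",
                 "Magicoder-DS-6.7B", "Magicoder-S-DS-6.7B"):
        print(name, "->", describe(name))
```

The same three fields (base model, datasets, plus a VRAM column) are what I would put in the Models table.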