DeepSeek-based Models
4, 5 and 8-bit GGUF models for CPU+GPU inference
Use the following dataset to fine-tune deepseek-ai/deepseek-coder-6.7b in order to improve the model's reasoning and planning abilities.
Context window length: 8192
Sample filter: max_tokens > 128 && max_tokens < 8192 (see the filtering sketch below)
Total: 185,193 samples (426 MB)
Sampling: 50 samples, temperature = 0.2, max tokens = 512, top_p = 0.95
Code: https://github.com/uukuguy/speechless
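As a rough sketch of how the token-count filter above could be applied (the dataset id and column name here are hypothetical placeholders, not taken from this card; the actual data preparation and fine-tuning code lives in the speechless repository linked above):

```python
# Sketch only: dataset id and column name are hypothetical placeholders.
from datasets import load_dataset
from transformers import AutoTokenizer

# The card names deepseek-ai/deepseek-coder-6.7b; the "-base" suffix is an assumption.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base")

ds = load_dataset("your-org/your-reasoning-dataset", split="train")  # hypothetical id

def in_token_window(example):
    # Keep samples whose token count satisfies 128 < tokens < 8192,
    # matching the max_tokens filter stated above.
    n_tokens = len(tokenizer(example["text"]).input_ids)  # "text" column is assumed
    return 128 < n_tokens < 8192

ds = ds.filter(in_token_window)
print(f"{len(ds)} samples after filtering")
```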
This model accepts the Alpaca instruction format.
For example:
You are an intelligent programming assistant.
### Instruction:
Implement a linked list in C++
### Response:
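A minimal usage sketch with `transformers`, assuming the fine-tuned checkpoint is published under a Hub id of your choosing (the id below is a placeholder); the sampling values reuse the T=0.2 / Top_P=0.95 / MaxTokens=512 settings listed earlier:

```python
# Sketch only: the model id is a hypothetical placeholder for the fine-tuned checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/deepseek-coder-6.7b-finetuned"  # hypothetical
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Alpaca-style prompt, exactly as in the example above.
prompt = (
    "You are an intelligent programming assistant.\n\n"
    "### Instruction:\nImplement a linked list in C++\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,  # MaxTokens=512
    temperature=0.2,     # T=0.2
    top_p=0.95,          # Top_P=0.95
    do_sample=True,
)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```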
| Metric | Value |
|---|---|
| humaneval-python | |

For comparison, humaneval-python scores of the CodeLlama family:

CodeLlama-34B-Python: 53.29
CodeLlama-34B-Instruct: 50.79
CodeLlama-13B-Instruct: 50.6
CodeLlama-34B: 45.11
CodeLlama-13B-Python: 42.89
CodeLlama-13B: 35.07
0.314188
0.390111
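For reference, HumanEval numbers like these are usually reported as pass@k, estimated from n sampled completions per task (e.g. the 50 samples noted above). Below is a sketch of the standard unbiased estimator, included only as background rather than as this card's own evaluation code:

```python
# Standard unbiased pass@k estimator (background only; not taken from this card).
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """n: completions generated per task, c: completions that pass the tests, k: budget."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Hypothetical example: 50 completions per task, 20 pass the unit tests.
print(pass_at_k(n=50, c=20, k=1))  # 0.4
```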
| Metric | Value |
|---|---|
| ARC | |
| HellaSwag | |
| MMLU | |
| TruthfulQA | |
| Average | |