
# llama3.1-8b-spaetzle-v51

This is only a quick experiment in merging Llama 3 and Llama 3.1 models, despite a number of differences in their tokenizer setups, among other things. It was also motivated by ongoing problems with Llama 3.1 (BOS handling, looping, etc.), especially in llama.cpp, which at the time of writing still lacked full RoPE scaling support. Performance is, of course, not yet satisfactory, which might have a number of causes.

As a further test, the GGUF quantizations were produced with an old llama.cpp binary (b2750).
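For reference, a conversion along these lines might look as follows. This is a minimal sketch, not the exact commands used: the llama.cpp checkout path, model directory, output file names, and quantization type are assumptions, and script and binary names differ between llama.cpp versions (b2750-era builds still shipped `quantize` rather than `llama-quantize`).

```python
import subprocess

# Assumed local paths, for illustration only.
LLAMA_CPP = "llama.cpp"                  # checkout built at tag b2750
MODEL_DIR = "llama3.1-8b-spaetzle-v51"   # merged HF model directory

# Convert the HF checkpoint to a 16-bit GGUF file.
subprocess.run(
    ["python", f"{LLAMA_CPP}/convert-hf-to-gguf.py", MODEL_DIR,
     "--outtype", "f16", "--outfile", "spaetzle-v51-f16.gguf"],
    check=True,
)

# Quantize; Q4_K_M is chosen here only as an example.
subprocess.run(
    [f"{LLAMA_CPP}/quantize", "spaetzle-v51-f16.gguf",
     "spaetzle-v51-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```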

## Summary Table

| Model | AGIEval | TruthfulQA | Bigbench |
|---|---|---|---|
| llama3.1-8b-spaetzle-v51 | 42.23 | 57.29 | 44.30 |
| llama3-8b-spaetzle-v39 | 43.43 | 60.00 | 45.89 |

## AGIEval Results

| Task | llama3.1-8b-spaetzle-v51 | llama3-8b-spaetzle-v39 |
|---|---|---|
| agieval_aqua_rat | 27.95 | 24.41 |
| agieval_logiqa_en | 38.10 | 37.94 |
| agieval_lsat_ar | 24.78 | 22.17 |
| agieval_lsat_lr | 42.94 | 45.29 |
| agieval_lsat_rc | 59.11 | 62.08 |
| agieval_sat_en | 68.45 | 71.36 |
| agieval_sat_en_without_passage | 38.35 | 44.17 |
| agieval_sat_math | 38.18 | 40.00 |
| Average | 42.23 | 43.43 |

## TruthfulQA Results

| Task | llama3.1-8b-spaetzle-v51 | llama3-8b-spaetzle-v39 |
|---|---|---|
| mc1 | 38.07 | 43.82 |
| mc2 | 57.29 | 60.00 |
| Average | 57.29 | 60.00 |

## Bigbench Results

| Task | llama3.1-8b-spaetzle-v51 | llama3-8b-spaetzle-v39 |
|---|---|---|
| bigbench_causal_judgement | 56.32 | 59.47 |
| bigbench_date_understanding | 69.65 | 70.73 |
| bigbench_disambiguation_qa | 31.40 | 34.88 |
| bigbench_geometric_shapes | 29.81 | 24.23 |
| bigbench_logical_deduction_five_objects | 30.20 | 36.20 |
| bigbench_logical_deduction_seven_objects | 23.00 | 24.00 |
| bigbench_logical_deduction_three_objects | 55.67 | 65.00 |
| bigbench_movie_recommendation | 33.00 | 36.20 |
| bigbench_navigate | 55.10 | 51.70 |
| bigbench_reasoning_about_colored_objects | 66.55 | 68.60 |
| bigbench_ruin_names | 52.23 | 51.12 |
| bigbench_salient_translation_error_detection | 25.55 | 28.96 |
| bigbench_snarks | 61.88 | 62.43 |
| bigbench_sports_understanding | 51.42 | 53.96 |
| bigbench_temporal_sequences | 59.30 | 53.60 |
| bigbench_tracking_shuffled_objects_five_objects | 23.28 | 22.32 |
| bigbench_tracking_shuffled_objects_seven_objects | 17.31 | 17.66 |
| bigbench_tracking_shuffled_objects_three_objects | 55.67 | 65.00 |
| Average | 44.30 | 45.89 |

(The GPT4All benchmark run broke, so those results are missing.)
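The averages in these tables are plain unweighted means over the per-task scores; for example, the Bigbench average for v51 can be reproduced as:

```python
# Per-task Bigbench scores for llama3.1-8b-spaetzle-v51, in table order.
v51_scores = [56.32, 69.65, 31.40, 29.81, 30.20, 23.00, 55.67, 33.00, 55.10,
              66.55, 52.23, 25.55, 61.88, 51.42, 59.30, 23.28, 17.31, 55.67]
print(f"{sum(v51_scores) / len(v51_scores):.2f}")  # 44.30
```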

## 🧩 Configuration

```yaml
models:
  - model: cstr/llama3-8b-spaetzle-v34
    # no parameters necessary for base model
  - model: sparsh35/Meta-Llama-3.1-8B-Instruct
    parameters:
      density: 0.65
      weight: 0.5
merge_method: dare_ties
base_model: cstr/llama3-8b-spaetzle-v34
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
```
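To reproduce the merge, the YAML above can be passed to mergekit. A minimal sketch, assuming mergekit is installed, the config is saved as `config.yaml`, and the output directory name is arbitrary:

```python
import subprocess

# Run the DARE-TIES merge defined in config.yaml (requires `pip install mergekit`).
# Drop --cuda to merge on CPU.
subprocess.run(
    ["mergekit-yaml", "config.yaml", "merged-spaetzle-v51", "--cuda"],
    check=True,
)
```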

## 💻 Usage

```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "cstr/llama3.1-8b-spaetzle-v51"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Build the prompt with the model's chat template.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
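Given the BOS issues mentioned above, it can be worth verifying that the chat template does not prepend a second BOS token. A minimal sketch:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("cstr/llama3.1-8b-spaetzle-v51")
ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hi"}], add_generation_prompt=True
)
# The prompt should start with exactly one BOS token.
print(ids[:3], tokenizer.bos_token_id)
assert ids[:2] != [tokenizer.bos_token_id] * 2, "duplicated BOS token"
```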