Chat2Eco
Chat2Eco is a merge of the following models using Chat2EcoMerger:
- meta-llama/Llama-2-70b-chat-hf
- mistralai/Mixtral-8x7B-Instruct-v0.1
- google/gemma-7b
- tiiuae/falcon-180B
🧩 Configuration
```yaml
slices:
  - sources:
      - model: meta-llama/Llama-2-70b-chat-hf
        layer_range: [0, 32]
      - model: mistralai/Mixtral-8x7B-Instruct-v0.1
        layer_range: [0, 32]
      - model: google/gemma-7b
        layer_range: [0, 64]
      - model: tiiuae/falcon-180B
        layer_range: [0, 64]
merge_method: slerp
base_model: meta-llama/Llama-2-70b-chat-hf
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```
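The config uses `merge_method: slerp`. Assuming Chat2EcoMerger follows mergekit's SLERP semantics, `t = 0` keeps the `base_model` tensor, `t = 1` takes the other model's tensor, and a list such as `[0, 0.5, 0.3, 0.7, 1]` acts as a gradient spread over the layers. The `self_attn` and `mlp` filters receive their own gradients, and all remaining tensors use `0.5`. The sketch below shows spherical linear interpolation between exactly two same-shaped tensors in NumPy. The piecewise-linear mapping of the gradient onto layers in `layer_t` is an assumption, and both function names are hypothetical, not the merger's API.

```python
import numpy as np


def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherically interpolate between two weight tensors of the same shape.

    t=0 returns v0 (the base model's tensor), t=1 returns v1.
    """
    a, b = v0.ravel(), v1.ravel()
    # Cosine of the angle between the two tensors, treated as flat vectors
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    cos_theta = np.clip(cos_theta, -1.0, 1.0)
    if abs(cos_theta) > 0.9995:
        # Nearly parallel tensors: fall back to plain linear interpolation
        return (1.0 - t) * v0 + t * v1
    theta = np.arccos(cos_theta)
    sin_theta = np.sin(theta)
    w0 = np.sin((1.0 - t) * theta) / sin_theta
    w1 = np.sin(t * theta) / sin_theta
    return w0 * v0 + w1 * v1


def layer_t(gradient: list[float], layer_idx: int, num_layers: int) -> float:
    """Hypothetical mapping of a t-gradient onto one layer (piecewise-linear in depth)."""
    if num_layers == 1:
        return float(gradient[0])
    x = layer_idx / (num_layers - 1)  # relative depth in [0, 1]
    anchors = np.linspace(0.0, 1.0, len(gradient))
    return float(np.interp(x, anchors, gradient))


# Usage: merge one toy attention tensor per layer with the self_attn gradient
rng = np.random.default_rng(0)
base, other = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
self_attn_gradient = [0, 0.5, 0.3, 0.7, 1]
for i in range(0, 32, 8):
    t = layer_t(self_attn_gradient, i, 32)
    merged = slerp(t, base, other)
    print(i, round(t, 3), merged.shape)
```

SLERP follows the arc between the two tensors instead of the straight line, which avoids the norm shrinkage that plain averaging causes when the tensors point in different directions.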
💻 Usage
```python
# Install dependencies first (in a notebook: !pip install -qU transformers accelerate)
from transformers import AutoTokenizer
import transformers
import torch

model = "retiredcarboxyl/Chat2Eco"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Format the conversation with the model's chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model across available devices (device_map="auto" requires accelerate)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens; generated_text includes the prompt
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
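The pipeline call samples with `temperature=0.7`, `top_k=50` and `top_p=0.95`. As a simplified NumPy sketch of what those three parameters do to one step's logits (an illustration, not the transformers implementation; `sampling_distribution` is a hypothetical name), temperature rescales the logits, top-k keeps the 50 highest-scoring tokens, and top-p keeps the smallest set of most likely tokens whose probabilities add up to 0.95.

```python
import numpy as np


def sampling_distribution(logits, temperature=0.7, top_k=50, top_p=0.95):
    """Simplified sketch of temperature, top-k and top-p (nucleus) filtering.

    Returns the probability distribution the next token is drawn from.
    """
    scaled = np.asarray(logits, dtype=np.float64) / temperature

    # Top-k: mask every logit below the k-th largest one (ties are kept)
    k = min(top_k, scaled.size)
    kth_largest = np.sort(scaled)[-k]
    scaled = np.where(scaled >= kth_largest, scaled, -np.inf)

    # Softmax over the surviving logits (exp(-inf) == 0)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Top-p: keep the most likely tokens until their cumulative probability reaches top_p
    order = np.argsort(-probs)
    cumulative = np.cumsum(probs[order])
    n_keep = int(np.searchsorted(cumulative, top_p)) + 1
    keep = np.zeros(probs.shape, dtype=bool)
    keep[order[:n_keep]] = True
    probs = np.where(keep, probs, 0.0)
    return probs / probs.sum()


# Usage with random stand-in logits for a 100-token vocabulary
rng = np.random.default_rng(0)
logits = rng.normal(size=100)
probs = sampling_distribution(logits, temperature=0.7, top_k=50, top_p=0.95)
next_token = rng.choice(len(probs), p=probs)
```

A lower temperature sharpens the distribution before either cutoff is applied, so the three settings interact: with `temperature=0.7` the nucleus usually contains fewer tokens than it would at `1.0`.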
Benchmark Results
Chat2Eco is reported to be a major improvement over the base model on all of the benchmarks below, and is claimed to be the first model to beat all of them.
GPT4All:
| Task |Version| Metric |Value | |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge| 0|acc |0.5990|± |0.0143|
| | |acc_norm|0.6425|± |0.0140|
|arc_easy | 0|acc |0.8657|± |0.0070|
| | |acc_norm|0.8636|± |0.0070|
|boolq | 1|acc |0.8783|± |0.0057|
|hellaswag | 0|acc |0.6661|± |0.0047|
| | |acc_norm|0.8489|± |0.0036|
|openbookqa | 0|acc |0.3440|± |0.0213|
| | |acc_norm|0.4660|± |0.0223|
|piqa | 0|acc |0.8324|± |0.0087|
| | |acc_norm|0.8379|± |0.0086|
|winogrande | 0|acc |0.7616|± |0.0120|
Average (acc_norm where reported, otherwise acc): 75.70
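The table layout matches the output of EleutherAI's lm-evaluation-harness, where the column after `±` is a standard error. A minimal sketch, assuming the harness reports the sample standard error of the per-example 0/1 scores for `acc`, which simplifies to sqrt(p(1 - p) / (n - 1)). The split sizes (1172 ARC-Challenge test questions, 3270 BoolQ validation questions) are assumptions not stated in this card, and `accuracy_stderr` is a hypothetical helper.

```python
import math


def accuracy_stderr(p: float, n: int) -> float:
    """Standard error of a mean accuracy p over n examples scored 0 or 1.

    Uses the sample (n - 1) variance of the per-example scores,
    which reduces to sqrt(p * (1 - p) / (n - 1)).
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


# Assumed split sizes: ARC-Challenge test = 1172, BoolQ validation = 3270
print(round(accuracy_stderr(0.5990, 1172), 4))  # arc_challenge acc -> 0.0143
print(round(accuracy_stderr(0.8783, 3270), 4))  # boolq acc -> 0.0057
```

Both values match the Stderr column, which suggests the reported errors are sampling noise of roughly one to two accuracy points on the smaller tasks.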
AGIEval:
| Task |Version| Metric |Value | |Stderr|
|------------------------------|------:|--------|-----:|---|-----:|
|agieval_aqua_rat | 0|acc |0.2402|± |0.0269|
| | |acc_norm|0.2520|± |0.0273|
|agieval_logiqa_en | 0|acc |0.4117|± |0.0193|
| | |acc_norm|0.4055|± |0.0193|
|agieval_lsat_ar | 0|acc |0.2348|± |0.0280|
| | |acc_norm|0.2087|± |0.0269|
|agieval_lsat_lr | 0|acc |0.5549|± |0.0220|
| | |acc_norm|0.5294|± |0.0221|
|agieval_lsat_rc | 0|acc |0.6617|± |0.0289|
| | |acc_norm|0.6357|± |0.0294|
|agieval_sat_en | 0|acc |0.8010|± |0.0279|
| | |acc_norm|0.7913|± |0.0284|
|agieval_sat_en_without_passage| 0|acc |0.4806|± |0.0349|
| | |acc_norm|0.4612|± |0.0348|
|agieval_sat_math | 0|acc |0.4909|± |0.0338|
| | |acc_norm|0.4000|± |0.0331|
Average (acc_norm): 46.05
BigBench:
| Task |Version| Metric |Value | |Stderr|
|------------------------------------------------|------:|---------------------|-----:|---|-----:|
|bigbench_causal_judgement | 0|multiple_choice_grade|0.6105|± |0.0355|
|bigbench_date_understanding | 0|multiple_choice_grade|0.7182|± |0.0235|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|0.5736|± |0.0308|
|bigbench_geometric_shapes | 0|multiple_choice_grade|0.4596|± |0.0263|
| | |exact_str_match |0.0000|± |0.0000|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|0.3500|± |0.0214|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|0.2500|± |0.0164|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|0.5200|± |0.0289|
|bigbench_movie_recommendation | 0|multiple_choice_grade|0.3540|± |0.0214|
|bigbench_navigate | 0|multiple_choice_grade|0.5000|± |0.0158|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|0.6900|± |0.0103|
|bigbench_ruin_names | 0|multiple_choice_grade|0.6317|± |0.0228|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|0.2535|± |0.0138|
|bigbench_snarks | 0|multiple_choice_grade|0.7293|± |0.0331|
|bigbench_sports_understanding | 0|multiple_choice_grade|0.6744|± |0.0149|
|bigbench_temporal_sequences | 0|multiple_choice_grade|0.7400|± |0.0139|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|0.2176|± |0.0117|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|0.1543|± |0.0086|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|0.5200|± |0.0289|
Average (multiple_choice_grade): 49.70
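The suite averages can be recomputed from the per-task rows. A minimal sketch, assuming the common convention of taking `acc_norm` where a task reports it and `acc` otherwise, and `multiple_choice_grade` for every BigBench task; the scores are copied from the tables in this card.

```python
from statistics import mean

# acc_norm where the table reports it, otherwise acc
gpt4all = {
    "arc_challenge": 0.6425, "arc_easy": 0.8636, "boolq": 0.8783,
    "hellaswag": 0.8489, "openbookqa": 0.4660, "piqa": 0.8379, "winogrande": 0.7616,
}
agieval = {
    "aqua_rat": 0.2520, "logiqa_en": 0.4055, "lsat_ar": 0.2087, "lsat_lr": 0.5294,
    "lsat_rc": 0.6357, "sat_en": 0.7913, "sat_en_without_passage": 0.4612, "sat_math": 0.4000,
}
# multiple_choice_grade for each of the 18 BigBench tasks, in table order
bigbench = [
    0.6105, 0.7182, 0.5736, 0.4596, 0.3500, 0.2500, 0.5200, 0.3540, 0.5000,
    0.6900, 0.6317, 0.2535, 0.7293, 0.6744, 0.7400, 0.2176, 0.1543, 0.5200,
]

suites = [
    ("GPT4All", list(gpt4all.values())),
    ("AGIEval", list(agieval.values())),
    ("BigBench", bigbench),
]
for name, scores in suites:
    print(f"{name}: {100 * mean(scores):.2f}")
```

Because every per-task score is a fraction below 1, each suite average (scaled to 0 to 100) is bounded by its highest row, which makes it easy to sanity-check a reported average at a glance.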
[Benchmark comparison charts: GPT4All, AGI-Eval, BigBench Reasoning Test]