
tangled-llama-t-32k-base-v0.1


A pretrained language model based on the Llama architecture with about 25M parameters. It was trained on 22.1B (22,111,299,936) tokens from about 3.6M (3,597,088) dataset rows.
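As a quick sanity check on the figures above, the following sketch derives tokens per dataset row and tokens per parameter. The parameter count of 25,000,000 is only the card's rounded "about 25M" figure, so that ratio is approximate.

```python
# Figures quoted in the paragraph above.
TOTAL_TOKENS = 22_111_299_936
DATASET_ROWS = 3_597_088
APPROX_PARAMS = 25_000_000  # "about 25M": rounded, so ratios using it are approximate

# Integer division shows whether the rows share a fixed size.
tokens_per_row, leftover = divmod(TOTAL_TOKENS, DATASET_ROWS)
tokens_per_param = TOTAL_TOKENS / APPROX_PARAMS

print(f"tokens per row:       {tokens_per_row} (remainder {leftover})")
print(f"tokens per parameter: ~{tokens_per_param:.0f}")
```

The token count divides evenly into 6,147 tokens per row, which hints that the rows may be fixed-size packed chunks rather than raw documents, although the card does not state this.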

This model isn't designed for immediate use but rather for Continued Pretraining and Finetuning on a downstream task. While it can handle a context length of up to 128K (131,072) tokens, it was pretrained with sequences of 2K (2048) tokens.
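For continued pretraining at the same sequence length, a tokenized corpus is typically cut into fixed-size blocks. The sketch below illustrates that idea in plain Python; `pack_into_blocks` and the dummy token stream are hypothetical names for illustration, not part of the author's training pipeline.

```python
from typing import Iterable, Iterator, List

PRETRAIN_SEQ_LEN = 2048    # sequence length used during pretraining (from the card)
MAX_CONTEXT_LEN = 131_072  # maximum supported context length (from the card)


def pack_into_blocks(
    token_ids: Iterable[int], block_size: int = PRETRAIN_SEQ_LEN
) -> Iterator[List[int]]:
    """Yield consecutive fixed-size blocks; an incomplete tail is dropped."""
    block: List[int] = []
    for tok in token_ids:
        block.append(tok)
        if len(block) == block_size:
            yield block
            block = []


# Hypothetical token stream standing in for a tokenized corpus.
stream = range(5000)
blocks = list(pack_into_blocks(stream))

print(len(blocks), len(blocks[0]))          # 2 full blocks of 2048 tokens each
print(MAX_CONTEXT_LEN // PRETRAIN_SEQ_LEN)  # context window relative to pretraining length
```

The supported context window is 64 times the pretraining sequence length, so long-context behavior would likely need further training on longer sequences.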

The objective is to streamline the cognitive or reasoning core, eliminating any redundant knowledge from the model.

[Training plots: loss and val_loss, val_ppl, epoch, learning_rate]

lm-evaluation-harness

litgpt evaluate --tasks 'hellaswag,gsm8k,truthfulqa_mc2,mmlu,winogrande,arc_challenge' --out_dir 'evaluate-quick/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/

| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
|---------------------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|arc_challenge | 1|none | 0|acc |↑ |0.1971|± |0.0116|
| | |none | 0|acc_norm |↑ |0.2423|± |0.0125|
|gsm8k | 3|flexible-extract| 5|exact_match|↑ |0.0099|± |0.0027|
| | |strict-match | 5|exact_match|↑ |0.0000|± |0.0000|
|hellaswag | 1|none | 0|acc |↑ |0.2608|± |0.0044|
| | |none | 0|acc_norm |↑ |0.2665|± |0.0044|
|mmlu | 2|none | |acc |↑ |0.2451|± |0.0036|
| - humanities | 2|none | |acc |↑ |0.2470|± |0.0063|
| - formal_logic | 1|none | 0|acc |↑ |0.3254|± |0.0419|
| - high_school_european_history | 1|none | 0|acc |↑ |0.2545|± |0.0340|
| - high_school_us_history | 1|none | 0|acc |↑ |0.2745|± |0.0313|
| - high_school_world_history | 1|none | 0|acc |↑ |0.2194|± |0.0269|
| - international_law | 1|none | 0|acc |↑ |0.2231|± |0.0380|
| - jurisprudence | 1|none | 0|acc |↑ |0.2685|± |0.0428|
| - logical_fallacies | 1|none | 0|acc |↑ |0.2025|± |0.0316|
| - moral_disputes | 1|none | 0|acc |↑ |0.2457|± |0.0232|
| - moral_scenarios | 1|none | 0|acc |↑ |0.2670|± |0.0148|
| - philosophy | 1|none | 0|acc |↑ |0.1865|± |0.0221|
| - prehistory | 1|none | 0|acc |↑ |0.2500|± |0.0241|
| - professional_law | 1|none | 0|acc |↑ |0.2523|± |0.0111|
| - world_religions | 1|none | 0|acc |↑ |0.1871|± |0.0299|
| - other | 2|none | |acc |↑ |0.2456|± |0.0077|
| - business_ethics | 1|none | 0|acc |↑ |0.3400|± |0.0476|
| - clinical_knowledge | 1|none | 0|acc |↑ |0.2113|± |0.0251|
| - college_medicine | 1|none | 0|acc |↑ |0.2543|± |0.0332|
| - global_facts | 1|none | 0|acc |↑ |0.1800|± |0.0386|
| - human_aging | 1|none | 0|acc |↑ |0.1749|± |0.0255|
| - management | 1|none | 0|acc |↑ |0.3398|± |0.0469|
| - marketing | 1|none | 0|acc |↑ |0.2479|± |0.0283|
| - medical_genetics | 1|none | 0|acc |↑ |0.3100|± |0.0465|
| - miscellaneous | 1|none | 0|acc |↑ |0.2171|± |0.0147|
| - nutrition | 1|none | 0|acc |↑ |0.2647|± |0.0253|
| - professional_accounting | 1|none | 0|acc |↑ |0.2270|± |0.0250|
| - professional_medicine | 1|none | 0|acc |↑ |0.2978|± |0.0278|
| - virology | 1|none | 0|acc |↑ |0.3133|± |0.0361|
| - social sciences | 2|none | |acc |↑ |0.2584|± |0.0079|
| - econometrics | 1|none | 0|acc |↑ |0.2193|± |0.0389|
| - high_school_geography | 1|none | 0|acc |↑ |0.2677|± |0.0315|
| - high_school_government_and_politics| 1|none | 0|acc |↑ |0.2435|± |0.0310|
| - high_school_macroeconomics | 1|none | 0|acc |↑ |0.2538|± |0.0221|
| - high_school_microeconomics | 1|none | 0|acc |↑ |0.2647|± |0.0287|
| - high_school_psychology | 1|none | 0|acc |↑ |0.2679|± |0.0190|
| - human_sexuality | 1|none | 0|acc |↑ |0.3435|± |0.0416|
| - professional_psychology | 1|none | 0|acc |↑ |0.2190|± |0.0167|
| - public_relations | 1|none | 0|acc |↑ |0.2091|± |0.0390|
| - security_studies | 1|none | 0|acc |↑ |0.2980|± |0.0293|
| - sociology | 1|none | 0|acc |↑ |0.2836|± |0.0319|
| - us_foreign_policy | 1|none | 0|acc |↑ |0.3000|± |0.0461|
| - stem | 2|none | |acc |↑ |0.2287|± |0.0075|
| - abstract_algebra | 1|none | 0|acc |↑ |0.2100|± |0.0409|
| - anatomy | 1|none | 0|acc |↑ |0.2000|± |0.0346|
| - astronomy | 1|none | 0|acc |↑ |0.2434|± |0.0349|
| - college_biology | 1|none | 0|acc |↑ |0.3333|± |0.0394|
| - college_chemistry | 1|none | 0|acc |↑ |0.3000|± |0.0461|
| - college_computer_science | 1|none | 0|acc |↑ |0.2600|± |0.0441|
| - college_mathematics | 1|none | 0|acc |↑ |0.3100|± |0.0465|
| - college_physics | 1|none | 0|acc |↑ |0.2353|± |0.0422|
| - computer_security | 1|none | 0|acc |↑ |0.2300|± |0.0423|
| - conceptual_physics | 1|none | 0|acc |↑ |0.2085|± |0.0266|
| - electrical_engineering | 1|none | 0|acc |↑ |0.2621|± |0.0366|
| - elementary_mathematics | 1|none | 0|acc |↑ |0.2011|± |0.0206|
| - high_school_biology | 1|none | 0|acc |↑ |0.2097|± |0.0232|
| - high_school_chemistry | 1|none | 0|acc |↑ |0.2217|± |0.0292|
| - high_school_computer_science | 1|none | 0|acc |↑ |0.2300|± |0.0423|
| - high_school_mathematics | 1|none | 0|acc |↑ |0.1926|± |0.0240|
| - high_school_physics | 1|none | 0|acc |↑ |0.2318|± |0.0345|
| - high_school_statistics | 1|none | 0|acc |↑ |0.1806|± |0.0262|
| - machine_learning | 1|none | 0|acc |↑ |0.2857|± |0.0429|
|truthfulqa_mc2 | 2|none | 0|acc |↑ |0.4880|± |0.0161|
|winogrande | 1|none | 0|acc |↑ |0.5185|± |0.0140|

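| Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
|------------------|------:|------|-----:|------|-----:|---|-----:|
|mmlu | 2|none | |acc |0.2451|± |0.0036|
| - humanities | 2|none | |acc |0.2470|± |0.0063|
| - other | 2|none | |acc |0.2456|± |0.0077|
| - social sciences| 2|none | |acc |0.2584|± |0.0079|
| - stem | 2|none | |acc |0.2287|± |0.0075|
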
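One way to read the Value and Stderr columns is to compare each score with the accuracy of random guessing. The sketch below does this for three tasks from the results above. The chance levels are assumptions based on the usual task formats (four answer options for MMLU and HellaSwag, two for WinoGrande), and `z_score` is a hypothetical helper, not part of lm-evaluation-harness.

```python
# (value, stderr) pairs copied from the results above.
results = {
    "mmlu (acc)": (0.2451, 0.0036),
    "hellaswag (acc)": (0.2608, 0.0044),
    "winogrande (acc)": (0.5185, 0.0140),
}

# Assumed chance accuracy: 4 options for MMLU and HellaSwag, 2 for WinoGrande.
chance = {
    "mmlu (acc)": 0.25,
    "hellaswag (acc)": 0.25,
    "winogrande (acc)": 0.50,
}


def z_score(value: float, stderr: float, baseline: float) -> float:
    """Distance of a score from the baseline, in units of standard error."""
    return (value - baseline) / stderr


for task, (value, stderr) in results.items():
    z = z_score(value, stderr, chance[task])
    print(f"{task:18s} value={value:.4f} chance={chance[task]:.2f} z={z:+.2f}")
```

A score within roughly two standard errors of chance is hard to distinguish from random guessing, which is worth keeping in mind when comparing checkpoints of a model this small.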
litgpt evaluate --tasks 'leaderboard' --out_dir 'evaluate-leaderboard/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/

| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|-----------------------------------------------------------|------:|------|-----:|-----------------------|-----:|---|-----:|
|leaderboard | N/A| | | | | | |
| - leaderboard_bbh | N/A| | | | | | |
| - leaderboard_bbh_boolean_expressions | 1|none | 3|acc_norm |0.4600|± |0.0316|
| - leaderboard_bbh_causal_judgement | 1|none | 3|acc_norm |0.5027|± |0.0367|
| - leaderboard_bbh_date_understanding | 1|none | 3|acc_norm |0.1720|± |0.0239|
| - leaderboard_bbh_disambiguation_qa | 1|none | 3|acc_norm |0.2960|± |0.0289|
| - leaderboard_bbh_formal_fallacies | 1|none | 3|acc_norm |0.4880|± |0.0317|
| - leaderboard_bbh_geometric_shapes | 1|none | 3|acc_norm |0.0000|± |0|
| - leaderboard_bbh_hyperbaton | 1|none | 3|acc_norm |0.5160|± |0.0317|
| - leaderboard_bbh_logical_deduction_five_objects | 1|none | 3|acc_norm |0.2000|± |0.0253|
| - leaderboard_bbh_logical_deduction_seven_objects | 1|none | 3|acc_norm |0.1480|± |0.0225|
| - leaderboard_bbh_logical_deduction_three_objects | 1|none | 3|acc_norm |0.3160|± |0.0295|
| - leaderboard_bbh_movie_recommendation | 1|none | 3|acc_norm |0.2360|± |0.0269|
| - leaderboard_bbh_navigate | 1|none | 3|acc_norm |0.4680|± |0.0316|
| - leaderboard_bbh_object_counting | 1|none | 3|acc_norm |0.0480|± |0.0135|
| - leaderboard_bbh_penguins_in_a_table | 1|none | 3|acc_norm |0.1918|± |0.0327|
| - leaderboard_bbh_reasoning_about_colored_objects | 1|none | 3|acc_norm |0.1440|± |0.0222|
| - leaderboard_bbh_ruin_names | 1|none | 3|acc_norm |0.2360|± |0.0269|
| - leaderboard_bbh_salient_translation_error_detection | 1|none | 3|acc_norm |0.1360|± |0.0217|
| - leaderboard_bbh_snarks | 1|none | 3|acc_norm |0.5225|± |0.0375|
| - leaderboard_bbh_sports_understanding | 1|none | 3|acc_norm |0.4560|± |0.0316|
| - leaderboard_bbh_temporal_sequences | 1|none | 3|acc_norm |0.2960|± |0.0289|
| - leaderboard_bbh_tracking_shuffled_objects_five_objects | 1|none | 3|acc_norm |0.2120|± |0.0259|
| - leaderboard_bbh_tracking_shuffled_objects_seven_objects | 1|none | 3|acc_norm |0.1840|± |0.0246|
| - leaderboard_bbh_tracking_shuffled_objects_three_objects | 1|none | 3|acc_norm |0.3160|± |0.0295|
| - leaderboard_bbh_web_of_lies | 1|none | 3|acc_norm |0.5200|± |0.0317|
| - leaderboard_gpqa | N/A| | | | | | |
| - leaderboard_gpqa_diamond | 1|none | 0|acc_norm |0.2172|± |0.0294|
| - leaderboard_gpqa_extended | 1|none | 0|acc_norm |0.2454|± |0.0184|
| - leaderboard_gpqa_main | 1|none | 0|acc_norm |0.2478|± |0.0204|
| - leaderboard_ifeval | 3|none | 0|inst_level_loose_acc |0.1727|± |N/A|
| | |none | 0|inst_level_strict_acc |0.1559|± |N/A|
| | |none | 0|prompt_level_loose_acc |0.0832|± |0.0119|
| | |none | 0|prompt_level_strict_acc|0.0795|± |0.0116|
| - leaderboard_math_hard | N/A| | | | | | |
| - leaderboard_math_algebra_hard | 1|none | 4|exact_match |0.0000|± |0|
| - leaderboard_math_counting_and_prob_hard | 1|none | 4|exact_match |0.0000|± |0|
| - leaderboard_math_geometry_hard | 1|none | 4|exact_match |0.0000|± |0|
| - leaderboard_math_intermediate_algebra_hard | 1|none | 4|exact_match |0.0000|± |0|
| - leaderboard_math_num_theory_hard | 1|none | 4|exact_match |0.0000|± |0|
| - leaderboard_math_prealgebra_hard | 1|none | 4|exact_match |0.0000|± |0|
| - leaderboard_math_precalculus_hard | 1|none | 4|exact_match |0.0000|± |0|
| - leaderboard_mmlu_pro | 0.1|none | 5|acc |0.1135|± |0.0029|
| - leaderboard_musr | N/A| | | | | | |
| - leaderboard_musr_murder_mysteries | 1|none | 0|acc_norm |0.5240|± |0.0316|
| - leaderboard_musr_object_placements | 1|none | 0|acc_norm |0.2734|± |0.0279|
| - leaderboard_musr_team_allocation | 1|none | 0|acc_norm |0.3000|± |0.0290|

litgpt evaluate --tasks 'bbh_zeroshot,bbh_fewshot,bbh_cot_fewshot,bbh_cot_zeroshot' --out_dir 'evaluate-bigbenchhard/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
litgpt evaluate --tasks 'mmlu,mmlu_pro' --out_dir 'evaluate-mmlu/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
litgpt evaluate --tasks 'arc_challenge,boolq,gpqa,hellaswag,openbookqa,piqa,truthfulqa_mc2,winogrande' --out_dir 'evaluate-reasoning/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
litgpt evaluate --tasks 'mmlu_multilingual,mgsm' --out_dir 'evaluate-multilinguals/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
litgpt evaluate --tasks 'gsm8k,mathqa' --out_dir 'evaluate-math/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/

|Tasks |Version| Filter |n-shot| Metric |Value | |Stderr|
|------|------:|----------------|-----:|-----------|-----:|---|-----:|
|gsm8k | 3|flexible-extract| 5|exact_match|0.0099|± |0.0027|
| | |strict-match | 5|exact_match|0.0000|± |0.0000|
|mathqa| 1|none | 0|acc |0.2121|± |0.0075|
| | |none | 0|acc_norm |0.2114|± |0.0075|

litgpt evaluate --tasks 'wikitext,qasper' --out_dir 'evaluate-long/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
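The evaluation commands above differ only in their task list and output directory, so they could be generated from a small table. The following is a minimal sketch under that assumption. `EVAL_SUITES` and `build_command` are hypothetical names, and the flags are copied verbatim from the commands in this card.

```python
import shlex

CHECKPOINT_DIR = "out/pretrain/final/"

# Output directory -> task list, copied from the commands above.
EVAL_SUITES = {
    "evaluate-quick/": "hellaswag,gsm8k,truthfulqa_mc2,mmlu,winogrande,arc_challenge",
    "evaluate-leaderboard/": "leaderboard",
    "evaluate-bigbenchhard/": "bbh_zeroshot,bbh_fewshot,bbh_cot_fewshot,bbh_cot_zeroshot",
    "evaluate-mmlu/": "mmlu,mmlu_pro",
    "evaluate-reasoning/": "arc_challenge,boolq,gpqa,hellaswag,openbookqa,piqa,truthfulqa_mc2,winogrande",
    "evaluate-multilinguals/": "mmlu_multilingual,mgsm",
    "evaluate-math/": "gsm8k,mathqa",
    "evaluate-long/": "wikitext,qasper",
}


def build_command(out_dir: str, tasks: str) -> list[str]:
    """Return the argument list for one evaluation run."""
    return [
        "litgpt", "evaluate",
        "--tasks", tasks,
        "--out_dir", out_dir,
        "--batch_size", "4",
        "--dtype", "bfloat16",
        CHECKPOINT_DIR,
    ]


for out_dir, tasks in EVAL_SUITES.items():
    # Print only; to execute, pass the list to subprocess.run(..., check=True).
    print(shlex.join(build_command(out_dir, tasks)))
```

Building argument lists rather than shell strings avoids quoting problems if a task name or path ever contains spaces.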

Datasets used to train tangledgroup/tangled-llama-t-128k-base-v0.1