---
license: apache-2.0
datasets:
- argilla/distilabel-intel-orca-dpo-pairs
base_model: sethuiyer/Chikuma_10.7B
library_name: transformers
pipeline_tag: text-generation
tags:
- dpo
---

# Chikuma_10.7B - V2 (Enhanced with DPO)


This model is the **DPO fine-tuned version** of [Chikuma_10.7B](https://huggingface.co/sethuiyer/Chikuma_10.7B), which was a depth-upscaled merge of:

* [sethuiyer/SynthIQ-7b](https://huggingface.co/sethuiyer/SynthIQ-7b)
* [openchat/openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106)

The name "Chikuma" is inspired by the [Chikuma River](https://en.wikipedia.org/wiki/Shinano_River), the longest in Japan, known for its continuous flow and meandering path. This metaphorically represents the model's depth, fluidity, and adaptability in processing and understanding language.

# Dataset used for Fine Tuning

Dataset: `argilla/distilabel-intel-orca-dpo-pairs`

The dataset was roughly 3,000 samples, but they were high quality (according to `chosen_score`).

The following filters were applied to the original dataset:

```python
dataset = dataset.filter(
    lambda r:
        r["status"] != "tie" and
        r["chosen_score"] >= 8 and
        not r["in_gsm8k_train"]
)
```

# Chat Template

The chat template for Chikuma_10.7B - V2 is a modified version of ChatML, optimized for improved interaction and engagement:

```
<|im_start|>GPT4 Correct system: {system} Always use <|end_of_turn|> when you want to end the answer.
<|im_end|>
<|im_start|>GPT4 Correct user: {user}<|im_end|>
<|im_start|>GPT4 Correct Assistant: {assistant}<|im_end|>
```

## Nous Benchmark Evaluation

| Model                         | AGIEval   | GPT4All   | TruthfulQA | Bigbench  | Average   |
|-------------------------------|-----------|-----------|------------|-----------|-----------|
| SynthIQ-7b                    | 42.67     | 73.71     | 56.51      | 44.59     | 54.37     |
| openchat/openchat-3.5-0106    | 44.17     | 73.72     | 52.53      | 44.4      | 53.71     |
| Chikuma_10.7B                 | 42.41     | 73.41     | 56.69      | 43.5      | 54.00     |
| **distilabled_Chikuma_10.7B** | **42.77** | **73.81** | **58.83**  | **44.83** | **55.06** |

# OpenLLM Leaderboard

| Benchmark Name | Performance |
|----------------|-------------|
| ARC            | 66.38       |
| HellaSwag      | 85          |
| MMLU           | 65.27       |
| TruthfulQA     | 58.83       |
| Winogrande     | 78.77       |
| GSM8K          | 63.68       |
| **Average**    | **69.65**   |

### Training Environment

- Hardware: Single A100 80GB GPU on RunPod, used for approximately 1.5 hours.
- Training Script: Accessible via [Google Colab Notebook](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing). Special thanks to [mlabonne](https://huggingface.co/mlabonne) for providing the template.

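When a prompt has to be built without `tokenizer.apply_chat_template`, the layout listed in the Chat Template section can be assembled by hand. The following is a minimal sketch, not the tokenizer's actual template: the helper name `build_prompt` is hypothetical, and the exact whitespace and line breaks between turns are assumptions based on how the template is printed in this card. The template shipped with the tokenizer remains the authoritative version.

```python
def build_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the layout shown in the Chat Template section.

    Hypothetical helper; spacing and newlines between turns are assumptions.
    """
    # The reminder about <|end_of_turn|> is appended to the system message,
    # mirroring the template text in this card.
    system_turn = (
        f"<|im_start|>GPT4 Correct system: {system} "
        "Always use <|end_of_turn|> when you want to end the answer.\n<|im_end|>"
    )
    user_turn = f"<|im_start|>GPT4 Correct user: {user}<|im_end|>"
    # The assistant turn is left open so the model writes the reply.
    assistant_open = "<|im_start|>GPT4 Correct Assistant:"
    return "\n".join([system_turn, user_turn, assistant_open])


prompt = build_prompt("You are a helpful assistant chatbot.", "Who invented LLMs?")
print(prompt)
```

The returned string ends with the open assistant header, so it can be passed to a text-generation call as-is; for multi-turn chats, prefer the tokenizer's own chat template.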
## Usage

```python
import transformers
from transformers import AutoTokenizer

# Placeholder: set this to the Hugging Face repo id or local path of this model.
new_model = "path/or/repo-id-of-this-model"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(new_model)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer,
    device="cuda",
)

# Format prompt
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "Who invented LLMs?"},
]
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Generate text
sequences = pipeline(prompt, max_new_tokens=512)
print(sequences[0]["generated_text"])
```

## Acknowledgements

A heartfelt appreciation goes to the vibrant open-source community, particularly:

* The Intel team for publishing a great open dataset and showing how well it worked in the first place.
* Teknium and NousResearch for their awesome work and models.
* Maxime for sharing such great resources.
* Argilla for publishing argilla/distilabel-intel-orca-dpo-pairs.