---
base_model:
- TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
datasets:
- HuggingFaceH4/ultrachat_200k
library_name: peft
tags:
- ultrachat
pipeline_tag: text-generation
---

# Model Card for TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat

This is a **quantized adapter** trained on the UltraChat 200k dataset for the TinyLlama-1.1B intermediate-step-1431k-3T model.

```python
adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'
```

## Model Details

The base model was quantized with BitsAndBytes:

```python
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                 # Use 4-bit precision model loading
    bnb_4bit_quant_type="nf4",         # Quantization type
    bnb_4bit_compute_dtype="float16",  # Compute data type
    bnb_4bit_use_double_quant=True     # Apply nested quantization
)
```

### Model Description

These are quantized adapters trained on the UltraChat 200k dataset for the TinyLlama-1.1B intermediate-step-1431k-3T model.

- Finetuned from model: [TinyLlama](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T)

### How to use

#### Method 1: Direct loading via AutoPeftModel

```python
from peft import AutoPeftModelForCausalLM
from transformers import pipeline, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'

model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_name,
    device_map="auto"
)
model = model.merge_and_unload()  # fold the LoRA weights into the base model

prompt = """<|user|>
Tell me something about Large Language Models.
<|assistant|>
"""

pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
print(pipe(prompt)[0]["generated_text"])
```

#### Method 2: Direct loading via AutoModelForCausalLM

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'

# With peft installed, transformers detects the adapter config in the repo and
# loads the base model together with the adapter weights.
model = AutoModelForCausalLM.from_pretrained(adapter_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

prompt = """<|user|>
Tell me something about Large Language Models.
<|assistant|>
"""

pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
print(pipe(prompt)[0]["generated_text"])
```

#### Method 3: Loading the quantized base model with PeftModel

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                 # Use 4-bit precision model loading
    bnb_4bit_quant_type="nf4",         # Quantization type
    bnb_4bit_compute_dtype="float16",  # Compute dtype
    bnb_4bit_use_double_quant=True,    # Apply nested quantization
)

model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
)
model = PeftModel.from_pretrained(model, adapter_name)

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

prompt = """<|user|>
Tell me something about Large Language Models.
<|assistant|>
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        do_sample=True,          # Sample so temperature/top_p/top_k take effect
        temperature=0.7,         # Controls randomness: lower = more deterministic
        top_p=0.9,               # Nucleus sampling
        top_k=50,                # Top-K sampling
        num_return_sequences=1,
    )

for i, output in enumerate(outputs):
    generated_text = tokenizer.decode(output, skip_special_tokens=True)
    print(f"--- Generated Sequence {i + 1} ---")
    print(generated_text)
```
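
Instead of writing the `<|user|>`/`<|assistant|>` tags by hand, the prompt can also be built with `tokenizer.apply_chat_template`. This is a minimal sketch and assumes the chat template bundled with TinyLlama-1.1B-Chat-v1.0 matches the format the adapters were trained with:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

messages = [
    {"role": "user", "content": "Tell me something about Large Language Models."},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,              # return a string rather than token ids
    add_generation_prompt=True,  # append the assistant tag so the model answers
)
print(prompt)  # can be passed to any of the pipelines above
```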
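
If you want a standalone checkpoint, the adapters can be merged into the base weights and saved locally. A sketch, assuming enough memory to hold the unquantized base model; the output directory name is arbitrary:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'

model = AutoPeftModelForCausalLM.from_pretrained(adapter_name, device_map="auto")
merged = model.merge_and_unload()  # fold the LoRA weights into the base model

merged.save_pretrained("tinyllama-ultrachat-merged")
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
tokenizer.save_pretrained("tinyllama-ultrachat-merged")
```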