---
base_model:
- TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
datasets:
- HuggingFaceH4/ultrachat_200k
library_name: peft
tags:
- ultrachat
pipeline_tag: text-generation
---

# Model Card for TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat

These adapters were trained on the UltraChat 200k dataset on top of a quantized TinyLlama-1.1B-intermediate-step-1431k-3T base model.

```python
adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'
```
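
If you only need to check which base model and adapter type this repository expects, the adapter configuration can be inspected without downloading the weights. A minimal sketch, assuming the `peft` package is installed:

```python
from peft import PeftConfig

# Reads adapter_config.json from the Hub; no model weights are downloaded.
config = PeftConfig.from_pretrained(
    "iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat"
)
print(config.base_model_name_or_path)  # TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
print(config.peft_type)                # adapter type as recorded in the config
```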

## Model Details

The base model was quantized using bitsandbytes:

```python
from transformers import BitsAndBytesConfig  # BitsAndBytesConfig is exposed by transformers

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                  # Use 4-bit precision model loading
    bnb_4bit_quant_type="nf4",          # Quantization type
    bnb_4bit_compute_dtype="float16",   # Compute data type
    bnb_4bit_use_double_quant=True      # Apply nested quantization
)
```

### Model Description
These adapters were fine-tuned on the UltraChat 200k dataset with the base model loaded in 4-bit using the quantization config above.

- Fine-tuned from model: [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T)

### How to use

#### Method 1: Direct loading via AutoPeftModel
```python
from peft import AutoPeftModelForCausalLM
from transformers import pipeline, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'

# Loads the base model and attaches the adapters in one call.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_name,
    device_map="auto"
)
# Fold the adapter weights into the base model for faster inference.
model = model.merge_and_unload()

prompt = """<|user|>
Tell me something about Large Language Models.</s>
<|assistant|>
"""

pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
print(pipe(prompt)[0]["generated_text"])

```
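
The prompt above hand-writes the `<|user|>` / `<|assistant|>` tags. If the tokenizer you loaded ships a chat template (the TinyLlama-1.1B-Chat-v1.0 tokenizer should), the same prompt can be built with `apply_chat_template`. A minimal sketch, assuming a recent `transformers` release and reusing `tokenizer` and `pipe` from the snippet above:

```python
messages = [
    {"role": "user", "content": "Tell me something about Large Language Models."}
]

# Build the prompt string from the tokenizer's chat template instead of
# writing the special tokens by hand.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(pipe(prompt)[0]["generated_text"])
```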

#### Method 2: Direct loading via AutoModelForCausalLM

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# With the peft package installed, recent transformers versions detect the
# adapter repo and load the base model with the adapters attached.
model = AutoModelForCausalLM.from_pretrained(adapter_name, device_map="auto")

prompt = """<|user|>
Tell me something about Large Language Models.</s>
<|assistant|>
"""

pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
print(pipe(prompt)[0]["generated_text"])
```

#### Method 3: Using PeftModel with a quantized base model

```python

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                  # Use 4-bit precision model loading
    bnb_4bit_quant_type="nf4",          # Quantization type
    bnb_4bit_compute_dtype="float16",   # Compute dtype
    bnb_4bit_use_double_quant=True,     # Apply nested quantization
)

model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'

# Load the base model in 4-bit, then attach the adapters on top.
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config
)
model = PeftModel.from_pretrained(model, adapter_name)

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

prompt = """<|user|>
Tell me something about Large Language Models.</s>
<|assistant|>
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,         # Cap the length of the reply; adjust as needed
        do_sample=True,             # Required for temperature/top_p/top_k to take effect
        temperature=0.7,            # Controls randomness: lower = more deterministic
        top_p=0.9,                  # Nucleus sampling
        top_k=50,                   # Top-K sampling
        num_return_sequences=1,
    )
for i, output in enumerate(outputs):
    generated_text = tokenizer.decode(output, skip_special_tokens=True)
    print(f"--- Generated Sequence {i + 1} ---")
    print(generated_text)

```
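
If you prefer a standalone checkpoint instead of loading the base model plus adapters at inference time, the adapters can be merged into the full-precision base weights and saved. A minimal sketch; the output directory name is a placeholder:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'

# Load base model + adapters in full precision and fold the adapters in.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_name, device_map="auto")
merged = model.merge_and_unload()

# "tinyllama-ultrachat-merged" is a hypothetical local path.
merged.save_pretrained("tinyllama-ultrachat-merged")
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
tokenizer.save_pretrained("tinyllama-ultrachat-merged")
```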