---
language:
- en
license: cc-by-nc-4.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- lora
- finlang
base_model: unsloth/llama-3-8b-bnb-4bit
---

# Uploaded model

- **Developed by:** anamikac2708
- **License:** cc-by-nc-4.0
- **Finetuned from model:** unsloth/llama-3-8b-bnb-4bit

These are LoRA adapters trained on top of the Llama 3 8B model using 2x faster [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library. They were fine-tuned on the open-source finance dataset [FinLang/investopedia-instruction-tuning-dataset](https://huggingface.co/datasets/FinLang/investopedia-instruction-tuning-dataset), developed for finance applications by the FinLang team.

This project is for research purposes only. Third-party datasets may be subject to additional terms and conditions under their associated licenses.

## How to Get Started with the Model

You can run inference with the adapters directly using the PEFT or Unsloth library, or you can merge the adapters into the base model and use the merged model.
The example below uses Unsloth:

```python
import torch
from unsloth import FastLanguageModel
from transformers import pipeline

max_seq_length = 2048

# Load the fine-tuned LoRA adapters from the Hub.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "anamikac2708/Llama3-8b-finetuned-investopedia-Lora-Adapters",
    max_seq_length = max_seq_length,
    dtype = torch.bfloat16,
    load_in_4bit = False,  # set to True to load with bitsandbytes 4-bit quantization
)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Sample conversation: system prompt with context, user question, and reference answer.
example = [{'content': 'You are a financial expert and you can answer any questions related to finance. You will be given a context and a question. Understand the given context and\n        try to answer. Users will ask you questions in English and you will generate answer based on the provided CONTEXT.\n        CONTEXT:\n        D. in Forced Migration from the University of the Witwatersrand (Wits) in Johannesburg, South Africa; A postgraduate diploma in Folklore & Cultural Studies at Indira Gandhi National Open University (IGNOU) in New Delhi, India; A Masters of International Affairs at Columbia University; A BA from Barnard College at Columbia University\n', 'role': 'system'}, {'content': ' In which universities did the individual obtain their academic qualifications?\n', 'role': 'user'}, {'content': ' University of the Witwatersrand (Wits) in Johannesburg, South Africa; Indira Gandhi National Open University (IGNOU) in New Delhi, India; Columbia University; Barnard College at Columbia University.', 'role': 'assistant'}]
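# Build the prompt from the system and user turns only; the assistant turn is kept as the reference answer.
prompt = pipe.tokenizer.apply_chat_template(example[:2], tokenize=False, add_generation_prompt=True)
# Low temperature and top_p keep the sampled output close to deterministic.
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.1, top_k=50, top_p=0.1, eos_token_id=pipe.tokenizer.eos_token_id, pad_token_id=pipe.tokenizer.pad_token_id)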
print(f"Query:\n{example[1]['content']}")
print(f"Context:\n{example[0]['content']}")
print(f"Original Answer:\n{example[2]['content']}")
print(f"Generated Answer:\n{outputs[0]['generated_text'][len(prompt):].strip()}")
```
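
The PEFT route mentioned above could look roughly like the sketch below. It is a minimal sketch rather than a tested recipe for this repository: `load_merged_model` is a hypothetical helper name, and it assumes that the adapter repository also ships tokenizer files and that the base model recorded in the adapter config can be loaded on your hardware. If that base model is a 4-bit checkpoint, merging may be lossy or unsupported depending on the PEFT version, so you may prefer to keep `merge=False`.

```python
ADAPTER_ID = "anamikac2708/Llama3-8b-finetuned-investopedia-Lora-Adapters"


def load_merged_model(adapter_id: str = ADAPTER_ID, merge: bool = True):
    """Hypothetical helper: load the LoRA adapters with PEFT and optionally merge them."""
    # Heavy imports stay inside the function so importing this module is cheap.
    import torch
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    # AutoPeftModelForCausalLM reads the base model name from the adapter config,
    # loads that base model, and attaches the adapters on top of it.
    model = AutoPeftModelForCausalLM.from_pretrained(
        adapter_id,
        torch_dtype=torch.bfloat16,
    )
    # Assumption: the adapter repository contains the tokenizer files.
    tokenizer = AutoTokenizer.from_pretrained(adapter_id)

    if merge:
        # Fold the LoRA weights into the base weights and drop the PEFT wrapper.
        model = model.merge_and_unload()
    return model, tokenizer
```

Calling `model, tokenizer = load_merged_model()` returns objects that can be passed to `pipeline("text-generation", model=model, tokenizer=tokenizer)` in the same way as in the Unsloth example.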

## License

Since non-commercial datasets were used for fine-tuning, we release this model under the cc-by-nc-4.0 license.