---
license: apache-2.0
tags:
- merge
- Korean
- Mistral-7B
- LLM
---

# QI-mistral-7B-slerp

This model is based on Mistral-7B and merges several DPO fine-tuned models with SLERP. It handles Korean relatively well, which makes it useful for building a variety of applications.

QI-mistral-7B-slerp is a merge of the following models using [mergekit](https://github.com/cg123/mergekit):
* [OpenPipe/mistral-ft-optimized-1218](https://huggingface.co/OpenPipe/mistral-ft-optimized-1218)
* [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B)

## 🧩 Configuration

```yaml
slices:
  - sources:
      - model: OpenPipe/mistral-ft-optimized-1218
        layer_range: [0, 32]
      - model: mlabonne/NeuralHermes-2.5-Mistral-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: OpenPipe/mistral-ft-optimized-1218
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```
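A merge like this can be reproduced from the configuration above. The following is only a rough sketch, assuming mergekit is installed (`pip install mergekit`) and the YAML is saved as `config.yaml`; the exact Python API (`MergeConfiguration`, `run_merge`, `MergeOptions`) may differ between mergekit versions, so check your installed version.

```
# Sketch: reproduce the SLERP merge with mergekit's Python API (assumed API, may vary by version).
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load the SLERP configuration shown above (assumed saved as config.yaml).
with open("config.yaml", "r", encoding="utf-8") as f:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

# Run the merge and write the merged checkpoint to a local directory.
run_merge(
    merge_config,
    out_path="./QI-mistral-7B-slerp",
    options=MergeOptions(cuda=True, copy_tokenizer=True),
)
```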
### Basic Usage

```
from transformers import AutoModelForCausalLM, AutoTokenizer
import transformers
import torch

model_id = "QuantumIntelligence/QI-mistral-7B-slerp"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # full precision
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", load_in_8bit=True)  # 8-bit quantization

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
    tokenizer=tokenizer,
)

prompt = """Classify the text into neutral, negative or positive.
Text: This movie is definitely one of my favorite movies of its kind. The interaction between respectable and morally strong characters is an ode to chivalry and the honor code amongst thieves and policemen.
Sentiment:
"""

outputs = pipeline(prompt, max_new_tokens=6)
print(outputs[0]["generated_text"])
```

### Using Korean

- Sentiment

```
# Korean prompt: classify the text below as neutral, negative, or positive.
prompt = """
다음 텍스트를 중립, 부정, 긍정으로 분류해줘.
텍스트: 하늘을 보니 비가 올듯 하다. 우울한 기분이 들어서 술을 한잔 할까 고민중인데 같이 마실 사람이 없다.
감정:
"""

outputs = pipeline(prompt, max_new_tokens=6)
print(outputs[0]["generated_text"])
```

- Summarization

```
# Korean prompt: summarize a long biographical passage about Admiral Yi Sun-sin in under 300 characters.
prompt = """
이순신(한국 한자: 李舜臣, 1545년 4월 28일 (음력 3월 8일) ~ 1598년 12월 16일 (음력 11월 19일))은 조선 중기의 무신이었다. 본관은 덕수(德水), 자는 여해(汝諧), 시호는 충무(忠武)였으며, 한성 출신이었다. 문반 가문 출신으로 1576년(선조 9년) 무과(武科)에 급제[2]하여 그 관직이 동구비보 권관, 훈련원 봉사, 발포진 수군만호, 조산보 만호, 전라남도수사를 거쳐 정헌대부 삼도수군통제사에 이르렀다. 함경도 동구비보권관(董仇非堡權管), 1581년 발포 수군만호(鉢浦水軍萬戶)가 되었다가 전라남수영의 오동나무를 베기를 거절하여 좌수사 성박의 미움을 받기도 했다. 이후 1584년 남병사의 군관과 건원보권관, 훈련원참군, 1586년 사복시주부를 거쳐 조산보만호 겸 녹도둔전사의(造山堡萬戶兼鹿島屯田事宜)로 부임했다. 조산만호 겸 녹둔도사의 재직 중 1587년(선조 20년) 9월의 여진족의 사전 기습공격으로 벌어진 녹둔도전투에서 이겼지만 피해가 커서, 북병사 이일의 탄핵을 받고 백의종군(白衣從軍)하는 위치에 서기도 했다.
그 뒤 두번째 여진족과의 교전에서 승전, 복직하였다. 그 뒤 전라관찰사 이광(李洸)에게 발탁되어 전라도 조방장, 선전관 등을 역임했다. 1589년 정읍현감 재직 중 류성룡의 추천으로 고사리첨사(高沙里僉使)가 되고, 절충장군(折衝將軍), 만포진첨사(滿浦鎭僉使), 진도군수 등을 거쳐 전라좌도수군절도사가 되어 임진왜란을 만나게 되었다. 임진왜란 때 조선의 삼도수군통제사가 되어 부하들을 통솔하는 지도력, 뛰어난 지략, 그리고 탁월한 전략과 능수능란한 전술로 일본 수군과의 해전에서 연전연승해 나라를 구한 성웅(聖雄)으로 추앙받고 있다. 노량 해전에서 전사한 뒤 선무공신 1등관에 추록되고 증 의정부우의정에 추증되고 덕풍군에 추봉되었다가, 광해군 때 다시 증 의정부좌의정에 추증되고 덕풍부원군에 추봉되었고, 정조 때에는 증 의정부영의정으로 가증(加贈)되었다.
고려 때 정5품 중랑장(中郎將)을 지낸 덕수 이씨의 시조 이돈수(李敦守)의 12대손이며, 조선 초 영중추부사(領中樞府事)를 지낸 이변(李邊)[3]의 후손이다. 외가는 초계 변씨(卞氏), 처가는 온양 방씨(方氏, 당시에는 상주 방씨)이다. 그의 묘는 충청남도 아산시에 있다.

위 문장을 300자내로 요약해줘.
요약:
"""

outputs = pipeline(prompt, max_new_tokens=300, do_sample=True, top_k=50, return_full_text=False)
print(outputs[0]["generated_text"])
```

- Question answering

```
# Korean prompt: given the context, answer whether Yi Sun-sin passed the military exam at age 28.
prompt = """
다음 문맥에 대해 아래 질문에 대해 답해줘.

문맥: 1565년 이순신은 방씨(方氏)와 혼인하고 보성군수를 지낸 장인 방진의 후원으로 병학을 배우면서 무과(武科)를 준비하였다. 28살이던 1572년(선조 5년) 훈련원 별과(訓鍊院 別科)에 응시했으나 시험을 보던 중, 말에서 낙마하여 주변 사람들이 기절한 줄 알았으나 옆에 있던 버드나무 껍질을 벗겨 다리를 동여매고 시험을 끝까지 치렀다. 하지만 결국 시험에서는 낙방하고 만다.

질문: 이순신은 28살에 무과에 합격하는가?
대답:
"""

outputs = pipeline(prompt, max_new_tokens=30, do_sample=True, top_k=50, return_full_text=False)
generated_text = outputs[0]["generated_text"]
print(generated_text) # 아니요, 28살에 무과에 합격하지 못하였다. (No, he did not pass the military examination at 28.)
```

- Chatbot style

```
# Korean question: "How can I develop good hobbies?"
messages = [{"role": "user", "content": "좋은 취미를 가지려면 어떻게 하나요?"}]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=512, do_sample=True, temperature=0.7, top_k=50, top_p=0.95, return_full_text=False)
generated_text = outputs[0]["generated_text"]
print(generated_text)
```

### For Development

GPU computing resources are required to develop and implement state-of-the-art models, and any support would be appreciated.

Email: baida21@naver.com
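When only limited GPU memory is available, a 4-bit quantized load is one way to experiment with the model on a single consumer GPU. This is a minimal sketch using the standard transformers/bitsandbytes path, assuming `bitsandbytes` is installed; the quantization settings are illustrative, not tuned recommendations.

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "QuantumIntelligence/QI-mistral-7B-slerp"

# Illustrative 4-bit settings; adjust for your hardware and quality needs.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```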