metadata
language:
  - ko
  - en
license: cc-by-nc-sa-4.0
library_name: transformers

Llama-3-KoEn-8B-xtuner-llava-preview 🌋

Llama-3-KoEn-8B-xtuner-llava-preview 🌋 is a Korean-based multimodal model built on the LLaVA architecture. It was merged with ChatVector methods from two models:

  1. beomi/Llama-3-KoEn-8B-preview
  2. xtuner/llava-llama-3-8b-transformers
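As a rough illustration of the chat-vector idea, the sketch below applies the usual arithmetic, merged = target + (tuned - base), to toy weights. It uses plain Python lists instead of real checkpoints. The helper name `apply_chat_vector`, the dictionary layout, and the mapping of roles in the comments are assumptions made for illustration; the exact merge recipe for this model (which layers were merged, which base checkpoint was subtracted) is not described here.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def apply_chat_vector(base: Weights, tuned: Weights, target: Weights) -> Weights:
    """Return target + (tuned - base) for every parameter the three models share.

    Hypothetical helper for illustration only; real checkpoints would be
    tensors loaded from actual model repositories.
    """
    merged: Weights = {}
    for name, target_values in target.items():
        if name in base and name in tuned:
            # Chat vector: what fine-tuning changed relative to the base model.
            delta = [t - b for t, b in zip(tuned[name], base[name])]
            merged[name] = [w + d for w, d in zip(target_values, delta)]
        else:
            # Parameters without a counterpart are copied unchanged.
            merged[name] = list(target_values)
    return merged


# Toy example with made-up numbers.
base = {"layer.weight": [1.0, 2.0]}
tuned = {"layer.weight": [1.5, 1.0]}  # assumption: plays the role of the LLaVA language model
target = {"layer.weight": [3.0, 4.0], "extra.weight": [0.5]}  # assumption: plays the role of the KoEn model

merged = apply_chat_vector(base, tuned, target)
print(merged)
```

Adding only the difference carries over what the tuned model learned while leaving the target model's own weights as the starting point.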

Model Details

Model Description

Direct Use

Cat walking on the frozen Han River, Seoul

Two versions are recommended, v1 (with its v1-1 and v1-2 variants) and v2:

v1. revision='a38aac3': Basic ChatVector, with the 25B+ trained KoEn ckpt (rev. d4d25a2).

v1-1. revision='0224971': Basic ChatVector, with the 40B+ trained KoEn ckpt (rev. ad39b32).

v1-2. revision='170746c': Basic ChatVector, with the 80B+ trained KoEn ckpt (rev. b4c45ab).

v2. revision='4f04d1e': Model-diff-based merging (ref. https://huggingface.co/blog/maywell/llm-feature-transfer), with the 25B+ trained KoEn ckpt (rev. d4d25a2).
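To avoid copying hashes by hand, the revisions listed above can be kept in a small lookup table. This is only a convenience sketch: the `REVISIONS` dictionary and the `resolve_revision` helper are hypothetical names, not part of the model repository.

```python
# Revision hashes copied from the list above; the dictionary name and helper
# are hypothetical conveniences, not part of the model repository.
REVISIONS = {
    "v1": "a38aac3",
    "v1-1": "0224971",
    "v1-2": "170746c",
    "v2": "4f04d1e",
}


def resolve_revision(version: str) -> str:
    """Map a version label to its commit hash, failing loudly on typos."""
    try:
        return REVISIONS[version]
    except KeyError:
        raise ValueError(
            f"Unknown version {version!r}; choose one of {sorted(REVISIONS)}"
        ) from None


print(resolve_revision("v1-2"))
```

The returned string can then be passed as `revision=resolve_revision("v2")` when calling `from_pretrained`.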

import requests
from PIL import Image

import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "beomi/Llama-3-KoEn-8B-xtuner-llava-preview"

model = LlavaForConditionalGeneration.from_pretrained(
    model_id, 
    torch_dtype='auto', 
    device_map='auto',
    revision='a38aac3', # 'a38aac3' for basic ChatVector, '4f04d1e' for Model diff based merging(ref. https://huggingface.co/blog/maywell/llm-feature-transfer)
)

processor = AutoProcessor.from_pretrained(model_id)

tokenizer = processor.tokenizer
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

# Korean prompt: "Please describe this image."
# The assistant turn is pre-filled with "이 이미지에는" ("In this image, ...").
prompt = ("<|start_header_id|>user<|end_header_id|>\n\n<image>\n이 이미지에 대해서 설명해주세요.<|eot_id|>"
          "<|start_header_id|>assistant<|end_header_id|>\n\n이 이미지에는")
image_file = "https://cdn-uploads.huggingface.co/production/uploads/5e56829137cb5b49818287ea/NWfoArWI4UPAxpEnolkwT.jpeg"

raw_image = Image.open(requests.get(image_file, stream=True).raw)
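# Pass text and image by keyword, since the positional order differs between transformers versions.
inputs = processor(text=prompt, images=raw_image, return_tensors='pt').to(model.device, torch.float16)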

output = model.generate(**inputs, max_new_tokens=400, do_sample=True, eos_token_id=terminators,)
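# Drop the first two tokens (BOS and the opening <|start_header_id|>) so the printout starts at "user".
print(processor.decode(output[0][2:], skip_special_tokens=False))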

# --- Example Output [v1, Chat Vector] ---
user<|end_header_id|>

<image>
이 이미지에 대해서 설명해주세요.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

이 이미지에는 고양이 한 마리가 강물 위를 걸어가는 모습이 보여집니다. 고양이는 강물의 잔물결에 미끄럼을 타고 강 가로를 지나는 데 능숙하게 보입니다. 고양이의 발은 강물로 잘 들어가, 그것을 즐기며 걸어갑니다.

또한 이 이미지도 음성 녹음을 하거나 녹화된 자료로 제작되었으며, 주로 고양이의 모습을 강하게 보여줍니다. 소리 효과도 여러 가지로 추가하여 고양이의 스토리를 다양하게 전달합니다. 강물은 잔물결을 나타내며 강물 위를 걷는 고양이의 모습을 더욱 강렬하게 강조하기 위해 잔물결을 통해 더 디테일한 장면을 보여줍니다.<|eot_id|>

# --- Example Output [v1-1, Chat Vector] ---
user<|end_header_id|>

<image>
이 이미지에 대해서 설명해주세요.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

이 이미지에서는 한 고양이가 서해안에 위치한 바다를 걷고 있는 모습을 볼 수 있습니다. 고양이는 해변에서부터 바다로 걸어들어가는 중이며, 주변에는 잔잔한 파도가 밀려오는 모습을 보여주고 있습니다. 이 고양이는 태어날 때부터 고양이와 강아지와는 다르게 바다를 경험하고, 적응해가고 있습니다. 고양이는 바다를 좋아하고, 이 환경에서 행복을 느끼는 것 같습니다. 이 고양이는 인간이 아닌 자연의 일부로써 이 환경에서 살아가고 있습니다.<|eot_id|>

# --- Example Output [v1-2, Chat Vector] ---
# model.generate(**inputs, max_new_tokens=200, do_sample=True, top_p=0.7, eos_token_id=terminators,)
user<|end_header_id|>

<image>
이 이미지에 대해서 설명해주세요.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

이 이미지는 한 고양이가 물 위를 걷고 있는 모습을 포착한 사진입니다. 고양이는 두 발로 물 위를 걸어 가고 있습니다. 고양이는 4개의 발 중 2개의 발은 물에 빠지지 않고 2개의 발은 물에 빠져 있습니다. 고양이의 발이 빠진 부분은 반영되어 물에 비쳐 있습니다. 물 위를 걷는 고양이의 모습이 참으로 귀엽고 사랑스럽습니다. 이 사진은 KBS 동물의 왕국에서 방영되었습니다. KBS 동물의 왕국은 1985년부터 시작하여 2019년까지 34년 동안 방영된 KBS의 대표적인 자연 다큐멘터리 프로그램입니다. KBS 동물의 왕국은 동물의 생태와 습성, 행동, 그리고 자연 환경을 이해하고 보호하는 데 기여하고자 합니다.

# --- Example Output [v2, Model diff based merging] ---
user<|end_header_id|>

<image>
이 이미지에 대해서 설명해주세요.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

이 이미지에는 한국어 자막과 함께 고양이가 물에 발을 디디고 걷는 모습이 담겨 있습니다. 고양이는 오른쪽 발을 물에 담그고 걷는 중이며, 한국어 자막은 "고양이는 물을 좋아합니다"라는 문장을 포함하고 있습니다. 이 자막은 고양이가 물을 좋아하는 것을 강조하고 있습니다.<|eot_id|>
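The prompt in the usage code is assembled by hand, and the printed output still contains the user turn and special tokens. The sketch below shows one possible way to build the same single-turn prompt and to cut a decoded string down to the assistant's reply. `build_prompt` and `extract_reply` are hypothetical helper names, and the `decoded` string is a made-up stand-in for real model output.

```python
EOT = "<|eot_id|>"
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>\n\n"


def build_prompt(user_text: str, assistant_prefix: str = "") -> str:
    """Build a single-turn prompt with one <image> placeholder.

    assistant_prefix pre-fills the start of the assistant's answer,
    as the usage code does with "이 이미지에는".
    """
    return (
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"<image>\n{user_text}{EOT}"
        f"{ASSISTANT_HEADER}{assistant_prefix}"
    )


def extract_reply(decoded: str) -> str:
    """Return only the assistant text from a decoded generation.

    If the assistant header is missing, the whole string is used as-is.
    """
    reply = decoded.split(ASSISTANT_HEADER, 1)[-1]
    # Drop the end-of-turn token and anything after it.
    return reply.split(EOT, 1)[0].strip()


prompt = build_prompt("이 이미지에 대해서 설명해주세요.", assistant_prefix="이 이미지에는")

# Made-up continuation standing in for a real generation.
decoded = prompt + " 고양이가 있습니다." + EOT
print(extract_reply(decoded))
```

Keeping the special tokens in one place makes it easier to swap the question or the pre-filled prefix without retyping the template.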