library_name: peft
datasets:
  - liuhaotian/LLaVA-Pretrain
  - liuhaotian/LLaVA-Instruct-150K
pipeline_tag: visual-question-answering


Model

llava-internlm-chat-7b-clip-vit-large-p14-336 is a LLaVA model fine-tuned from InternLM-Chat-7B and CLIP-ViT-Large-patch14-336 with LLaVA-Pretrain and LLaVA-Instruct by XTuner.

Quickstart

Installation

pip install -U 'xtuner[deepspeed]'

Chat

xtuner chat internlm/internlm-chat-7b \
  --visual-encoder openai/clip-vit-large-patch14-336 \
  --llava xtuner/llava-internlm-chat-7b-clip-vit-large-p14-336 \
  --prompt-template internlm_chat \
  --image $IMAGE_PATH
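
The chat command reads the image location from `$IMAGE_PATH`, which this README does not set. A minimal sketch of one way to set it, assuming you want a missing file to be caught before the model loads; `set_image_path` is a hypothetical helper name, not part of XTuner:

```shell
# Hypothetical helper (not an XTuner command): export IMAGE_PATH only if
# the given file exists, so a typo is reported before the model is loaded.
set_image_path() {
  local candidate="$1"
  if [ ! -f "$candidate" ]; then
    echo "image not found: $candidate" >&2
    return 1
  fi
  IMAGE_PATH="$candidate"
  export IMAGE_PATH
}
```

For example, `set_image_path ./example.jpg` (the file name is a placeholder), followed by the `xtuner chat` command above.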

Training

  1. Alignment module pretraining (saved by default in ./work_dirs/)
NPROC_PER_NODE=8 xtuner train llava_internlm_chat_7b_clip_vit_large_p14_336_e1_gpu8_pretrain --deepspeed deepspeed_zero2
  2. Instruction following fine-tuning (saved by default in ./work_dirs/)
NPROC_PER_NODE=8 xtuner train llava_internlm_chat_7b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune --deepspeed deepspeed_zero2
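
The two training stages can be chained in one script. The sketch below assumes the fine-tuning stage should start only after the pretraining stage exits successfully. `run_stage`, `train_llava`, and the `DRY_RUN` switch are illustrative names introduced here, not XTuner features; the config names and flags are the ones listed above.

```shell
# Hypothetical wrapper around the two commands above.
# NPROC defaults to 8, matching the gpu8 configs; override via NPROC_PER_NODE.
NPROC="${NPROC_PER_NODE:-8}"

run_stage() {
  local config="$1"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of launching it.
    echo "NPROC_PER_NODE=$NPROC xtuner train $config --deepspeed deepspeed_zero2"
  else
    NPROC_PER_NODE="$NPROC" xtuner train "$config" --deepspeed deepspeed_zero2
  fi
}

train_llava() {
  # && stops the chain if the pretraining stage fails.
  run_stage llava_internlm_chat_7b_clip_vit_large_p14_336_e1_gpu8_pretrain &&
  run_stage llava_internlm_chat_7b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune
}
```

Setting `DRY_RUN=1` before calling `train_llava` prints both commands without launching anything, which is a cheap way to check the configuration names first.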

MMBench Evaluation

XTuner integrates the MMBench evaluation. Run it with the following command:

xtuner mmbench internlm/internlm-chat-7b \
  --visual-encoder openai/clip-vit-large-patch14-336 \
  --llava xtuner/llava-internlm-chat-7b-clip-vit-large-p14-336 \
  --prompt-template internlm_chat \
  --data-path $MMBENCH_DATA_PATH \
  --language en \
  --work-dir $RESULT_PATH

When the evaluation finishes, the results for a development set are printed directly. For a test set, submit mmbench_result.xlsx to the official MMBench for final evaluation to obtain the final results.
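
For a test-set run, the file to submit has to be located first. A minimal sketch, assuming mmbench_result.xlsx is written somewhere beneath the directory passed as `--work-dir` (the exact subdirectory layout is not stated here, so the search is recursive); `find_mmbench_result` is a hypothetical helper name:

```shell
# Hypothetical helper: print the path of the first mmbench_result.xlsx
# found under the given directory, or fail if there is none.
find_mmbench_result() {
  local result_dir="$1"
  local found
  found="$(find "$result_dir" -type f -name 'mmbench_result.xlsx' 2>/dev/null | head -n 1)"
  if [ -z "$found" ]; then
    echo "no mmbench_result.xlsx under $result_dir" >&2
    return 1
  fi
  echo "$found"
}
```

For example, `find_mmbench_result "$RESULT_PATH"` after the evaluation command has completed.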

Citation

@misc{2023xtuner,
    title={XTuner: A Toolkit for Efficiently Fine-tuning LLM},
    author={XTuner Contributors},
    howpublished = {\url{https://github.com/InternLM/xtuner}},
    year={2023}
}