bczhou committed on
Commit
507d5f6
1 Parent(s): 6846de5

Update README.md

  1. README.md +4 -4
README.md CHANGED
@@ -33,14 +33,14 @@ More evaluations are ongoing.
33
 
34
  ## Model Preparations
35
 
36
- ### Transformers Version
37
  Make sure to have `transformers >= 4.35.3`.
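
A minimal sketch of checking this requirement at runtime. The helper names `parse_release` and `meets_minimum` are hypothetical, and the comparison only looks at the numeric release components (suffixes such as `.dev0` are ignored), which is a simplifying assumption rather than full version-specifier handling.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = "4.35.3"  # minimum version stated in the README


def parse_release(v: str) -> tuple:
    """Turn a version string such as '4.36.0.dev0' into (4, 36, 0).

    Parsing stops at the first component that does not start with a digit.
    """
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """Return True if `installed` is at least `minimum`."""
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    try:
        # Look up the installed package version without importing the package.
        print(meets_minimum(version("transformers")))
    except PackageNotFoundError:
        print("transformers is not installed")
```
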
38
 
39
- ### Prompt Template
40
  The model supports multi-image and multi-prompt generation. When using the model, follow the prompt template (`USER: <image>xxx\nASSISTANT:`), where `<image>` is a special placeholder token for the image embeddings and `xxx` stands for the user's text.
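
A minimal sketch of assembling a prompt in this template. The helper `build_prompt` is hypothetical. The newline after the `<image>` token follows the sample output shown in the README, and emitting one `<image>` token per image for multi-image input is an assumption, not something the README spells out.

```python
IMAGE_TOKEN = "<image>"  # placeholder token named in the prompt template


def build_prompt(question: str, num_images: int = 1) -> str:
    """Format a single-turn prompt as `USER: <image>...\\nASSISTANT:`.

    Assumption: each input image gets its own `<image>` token, placed
    before the question text.
    """
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    image_tokens = IMAGE_TOKEN * num_images
    return f"USER: {image_tokens}\n{question}\nASSISTANT:"


if __name__ == "__main__":
    # repr() makes the embedded newlines visible.
    print(repr(build_prompt("What does the label 15 represent?")))
```

The resulting string is what would be passed as the text prompt alongside the image(s).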
41
 
42
  ## Model Inference from `pipeline` and `transformers`
43
- ### Using `pipeline`:
44
  The example below uses the [`"bczhou/tiny-llava-v1-hf"`](https://huggingface.co/bczhou/tiny-llava-v1-hf) checkpoint.
45
 
46
  ```python
@@ -57,7 +57,7 @@ print(outputs[0])
57
  >>> {'generated_text': 'USER: \nWhat does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud\nASSISTANT: The label 15 represents lava, which is a type of volcanic rock.'}
58
  ```
59
 
60
- ### Using pure `transformers`:
61
  Below is an example script to run generation in `float16` precision on a GPU device:
62
 
63
  ```python
 
33
 
34
  ## Model Preparations
35
 
36
+ #### - Transformers Version
37
  Make sure to have `transformers >= 4.35.3`.
38
 
39
+ #### - Prompt Template
40
  The model supports multi-image and multi-prompt generation. When using the model, follow the prompt template (`USER: <image>xxx\nASSISTANT:`), where `<image>` is a special placeholder token for the image embeddings and `xxx` stands for the user's text.
41
 
42
  ## Model Inference from `pipeline` and `transformers`
43
+ #### - Using `pipeline`:
44
  The example below uses the [`"bczhou/tiny-llava-v1-hf"`](https://huggingface.co/bczhou/tiny-llava-v1-hf) checkpoint.
45
 
46
  ```python
 
57
  >>> {'generated_text': 'USER: \nWhat does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud\nASSISTANT: The label 15 represents lava, which is a type of volcanic rock.'}
58
  ```
59
 
60
+ #### - Using pure `transformers`:
61
  Below is an example script to run generation in `float16` precision on a GPU device:
62
 
63
  ```python