chrisc36 committed
Commit bfcc419
1 Parent(s): 23a46a1

Update README.md

Files changed (1):
  1. README.md +21 -0

README.md CHANGED
print(generated_text)

# The puppy is positioned in the center of the frame, looking up at the camera...
```

To make inference more efficient, run with autocast:

```python
with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
    output = model.generate_from_batch(
        inputs,
        GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
        tokenizer=processor.tokenizer
    )
```

We did most of our evaluation in this setting (autocast on, but float32 weights).

To reduce the memory requirements even further, the model can be run with bfloat16 weights:

```python
model.to(dtype=torch.bfloat16)
inputs["images"] = inputs["images"].to(torch.bfloat16)
output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer
)
```

Note that we have observed that this can change the output of the model compared to running with float32 weights.

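As background on why outputs can shift: bfloat16 keeps float32's sign bit and all 8 exponent bits but truncates the mantissa from 23 bits to 7, so most weight values are rounded when cast. A standard-library sketch of that conversion (the helper name here is ours for illustration, not part of this repo):

```python
import struct

def round_to_bfloat16(x: float) -> float:
    # Interpret x as an IEEE-754 float32 and keep only its top 16 bits
    # (sign, 8 exponent bits, 7 mantissa bits) -- the bfloat16 encoding --
    # using round-to-nearest-even on the 16 dropped mantissa bits.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding = 0x7FFF + ((bits >> 16) & 1)  # ties round to even
    bits = (bits + rounding) & 0xFFFF_0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Powers of two survive exactly; most other values are perturbed.
print(round_to_bfloat16(1.0))  # 1.0
print(round_to_bfloat16(0.1))  # 0.10009765625, not 0.1
```

Since bfloat16 stores 2 bytes per weight instead of float32's 4, casting the weights also roughly halves their memory footprint, which is where the savings come from.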
## Evaluations

| Model | Average Score on 11 Academic Benchmarks | Human Preference Elo Rating |