chrisc36 committed
Commit 2ce8d82
1 Parent(s): bfcc419

Update README.md

Files changed (1):
  README.md: +6 -0
README.md CHANGED
@@ -94,16 +94,21 @@ print(generated_text)

To make inference more efficient, run with autocast:

+
+ ```python
with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
  output = model.generate_from_batch(
      inputs,
      GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
      tokenizer=processor.tokenizer
  )
+ ```
+
We did most of our evaluation in this setting (autocast on, but float32 weights)

To even further reduce the memory requirements, the model can be run with bfloat16 weights:

+ ```
model.to(dtype=torch.bfloat16)
inputs["images"] = inputs["images"].to(torch.bfloat16)
output = model.generate_from_batch(
@@ -111,6 +116,7 @@ output = model.generate_from_batch(
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer
)
+ ```
Note that we have observed that this can change the output of the model compared to running with float32 weights.

## Evaluations
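For context, here is a minimal end-to-end sketch of the inference path the two fenced snippets belong to, assuming the quickstart that precedes them in the README (the `print(generated_text)` context line above). The checkpoint name `allenai/Molmo-7B-D-0924` and the example image URL are illustrative assumptions, not part of this commit:

```python
# Sketch only: the checkpoint name and image URL below are assumptions for
# illustration; `generate_from_batch` and `processor.process` come from the
# model's trust_remote_code implementation referenced by this README.
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

repo = "allenai/Molmo-7B-D-0924"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(
    repo, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
model = AutoModelForCausalLM.from_pretrained(
    repo, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

# Build a single-example batch and move it to the model's device.
image = Image.open(requests.get("https://picsum.photos/id/237/536/354", stream=True).raw)
inputs = processor.process(images=[image], text="Describe this image.")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

# Autocast inference (the first snippet the commit fences): weights stay
# float32, activations are computed in bfloat16.
with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
    output = model.generate_from_batch(
        inputs,
        GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
        tokenizer=processor.tokenizer,
    )

# Decode only the newly generated tokens.
generated_tokens = output[0, inputs["input_ids"].size(1):]
print(processor.tokenizer.decode(generated_tokens, skip_special_tokens=True))

# Lower-memory variant (the second snippet): cast weights and image tensors
# to bfloat16; per the README note, outputs can differ from the float32 run.
model.to(dtype=torch.bfloat16)
inputs["images"] = inputs["images"].to(torch.bfloat16)
output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)
```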