rogerxfeng8 committed
Commit e4f4e18
Parent: 97bc412

Change the assert to a warning in __init__


When enabling phi-3-small on non-CUDA devices, the flash_attn package is not available, and the assert on flash_attn in __init__ forces the process to exit. This patch changes the assert into a warning, so that a customized implementation of flash attention can be used instead.
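For context, a flag like is_flash_attention_available is typically set by a guarded import, since flash_attn only builds on CUDA systems. A minimal sketch of that pattern, assuming the standard try/except probe (not copied verbatim from modeling_phi3_small.py):

# Assumed availability probe: flash_attn is a CUDA-only package,
# so the import fails on other devices and the flag stays False.
try:
    from flash_attn import flash_attn_func
    is_flash_attention_available = True
except ImportError:
    is_flash_attention_available = False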

Files changed (1)
  1. modeling_phi3_small.py +2 -1
modeling_phi3_small.py CHANGED
@@ -215,7 +215,8 @@ class Phi3SmallSelfAttention(nn.Module):
                 f"Layer {layer_idx + 1} is using dense attention since it is divisible by "
                 f"{self.config.dense_attention_every_n_layers}"
             )
-            assert is_flash_attention_available, "Flash Attention is not available, but is needed for dense attention"
+            # use a warning to allow the model to use a different flash attention implementation later
+            logger.warning_once("Flash Attention is not available, but is needed for dense attention")
         else:
             # BlockSparse related Parameters
             self.blocksparse_params = BlockSparseParams.from_config(config)
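With the hard assert relaxed to a warning, __init__ now completes on non-CUDA devices, and the dense-attention path can be routed to a substitute kernel at call time. A minimal sketch of such a dispatch, using PyTorch's built-in scaled_dot_product_attention as a stand-in for the customized implementation (the function name and dispatch point below are illustrative, not part of the patch):

import torch.nn.functional as F

def dense_attention(q, k, v):
    # q, k, v: (batch, seqlen, num_heads, head_dim), the flash_attn layout.
    if is_flash_attention_available:
        return flash_attn_func(q, k, v, causal=True)
    # Fallback made reachable by the relaxed check; SDPA expects
    # (batch, num_heads, seqlen, head_dim), so transpose around the call.
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    return F.scaled_dot_product_attention(q, k, v, is_causal=True).transpose(1, 2)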