Is it valid to use CausalLM with zero attention values?
#178 by soumyasanyal08
Hi,
I'm trying to understand what happens when we call a LlamaForCausalLM model with `attention_mask = [1, 1, 1, 1, 0, 0, 0, 0]`. For the position at index 4 (0-indexed), would the model still apply teacher forcing internally? For instance, what are the internal differences between using the above attention mask and `attention_mask_2 = [1, 1, 1, 1, 1, 0, 0, 0]`?
Does it make sense to call causal models with zeros in the attention mask?
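For concreteness, here is a minimal sketch of the kind of call I mean (the checkpoint name and token ids are just placeholders, not from any real run):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint: any LlamaForCausalLM-compatible model works here.
model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

# 8 arbitrary token ids for illustration; the trailing ones stand in for padding.
input_ids = torch.tensor([[1, 306, 763, 29871, 2, 2, 2, 2]])

mask_a = torch.tensor([[1, 1, 1, 1, 0, 0, 0, 0]])  # index 4 onward masked out
mask_b = torch.tensor([[1, 1, 1, 1, 1, 0, 0, 0]])  # index 4 still attended to

with torch.no_grad():
    out_a = model(input_ids=input_ids, attention_mask=mask_a)
    out_b = model(input_ids=input_ids, attention_mask=mask_b)

# Logits are still produced for every position, masked or not; the mask only
# controls which positions others may attend to as keys/values.
print(out_a.logits.shape)  # (1, 8, vocab_size)

# Position 3 only ever attends to positions 0-3 (causal masking), so its
# logits should not change between the two masks.
print(torch.allclose(out_a.logits[0, 3], out_b.logits[0, 3]))
```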