Is it valid to use CausalLM with zero attention values?
#178 by soumyasanyal08
Hi,
I'm trying to understand what happens when we call a LlamaForCausalLM model with `attention_mask = [1, 1, 1, 1, 0, 0, 0, 0]`. For the position at index 4 (0-indexed), would the model still apply teacher forcing internally? For instance, what are the internal differences between using the above attention mask and `attention_mask_2 = [1, 1, 1, 1, 1, 0, 0, 0]`?
Does it make sense to call causal models with zeros in the attention mask?
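For concreteness, here is a minimal sketch of the kind of call I mean (the checkpoint name and token ids are just placeholders, not from any real run):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint: any LlamaForCausalLM-compatible model works here.
model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

# 8 arbitrary token ids for illustration; the trailing ones stand in for padding.
input_ids = torch.tensor([[1, 306, 763, 29871, 2, 2, 2, 2]])

mask_a = torch.tensor([[1, 1, 1, 1, 0, 0, 0, 0]])  # index 4 onward masked out
mask_b = torch.tensor([[1, 1, 1, 1, 1, 0, 0, 0]])  # index 4 still attended to

with torch.no_grad():
    out_a = model(input_ids=input_ids, attention_mask=mask_a)
    out_b = model(input_ids=input_ids, attention_mask=mask_b)

# Logits are still produced for every position, masked or not; the mask only
# controls which positions others may attend to as keys/values.
print(out_a.logits.shape)  # (1, 8, vocab_size)

# Position 3 only ever attends to positions 0-3 (causal masking), so its
# logits should not change between the two masks.
print(torch.allclose(out_a.logits[0, 3], out_b.logits[0, 3]))
```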