Confusing ArcticDecoderLayer::forward() implementation
#11
by
sszymczyk
- opened
I'm a bit confused by the `ArcticDecoderLayer::forward()` method's implementation in the model:
- Does the model work correctly with parallel_attn_mlp_res set to false?
- There is a normalization layer called `post_attention_layernorm`. Do I understand correctly that, if `parallel_attn_mlp_res` is set to true, it actually normalizes the layer input rather than the attention output?
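To make the second question concrete, here is a minimal sketch of the control flow I'm asking about. This is not the actual Arctic code: tensors are replaced by scalars and the sublayers (attention, MLP, MoE) by trivial stand-in functions, and the branch structure reflects my reading of the forward pass, which may be wrong:

```python
def norm(x, scale):
    # Stand-in for a LayerNorm module (real code would call an nn.LayerNorm).
    return x * scale

def forward(x, parallel_attn_mlp_res):
    residual_input = x                       # the layer input
    h = norm(x, 0.5)                         # input_layernorm
    h = h + 1.0                              # stand-in for self-attention
    attn_out = residual_input + h            # first residual add

    if parallel_attn_mlp_res:
        # Residual MLP branch operates on the attention output...
        r = norm(attn_out, 0.5) + 2.0        # residual_layernorm + residual_mlp
        residual_residual = attn_out + r
        # ...but post_attention_layernorm is applied to the LAYER INPUT,
        # so the MoE branch runs in parallel with the attention branch.
        m = norm(residual_input, 0.5) + 3.0  # post_attention_layernorm + MoE
        return residual_residual + m
    else:
        # Sequential path: post_attention_layernorm sees the attention output.
        m = norm(attn_out, 0.5) + 3.0        # post_attention_layernorm + MoE
        return attn_out + m
```

If this reading is right, the name `post_attention_layernorm` is misleading in the parallel case, since its input is `residual_input`, not the attention output.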