Questions on Training and Architecture
I’m exploring this model, particularly its training methods and architectural specifics, and I have a few questions:
How exactly was KaLM trained on top of Qwen?
What loss function or objective was used to train KaLM? Was a specific ranking or contrastive loss applied?
What metric was chosen to optimize embeddings, and how was it used in training?
Was a particular positional-encoding method used, given the multilingual scope and the Qwen base model?
Thank you in advance for any insights or resources on KaLM’s architecture and training processes.
Thank you for your interest in our model. We have trained it using the Qwen2 model without any architectural modifications. For detailed information on the architecture, please refer to the Qwen model documentation.
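To make the "no architectural modifications" point concrete, here is a minimal sketch of producing text embeddings from an unmodified Qwen2 backbone with Hugging Face transformers. The checkpoint name and the last-token pooling strategy are illustrative assumptions on my part, not details confirmed in this thread:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative checkpoint; substitute the released KaLM weights.
model_name = "Qwen/Qwen2-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

texts = ["What is KaLM?", "KaLM is an embedding model built on Qwen2."]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, dim)

# Last-token pooling (an assumption: common for decoder-only embedding models).
last_idx = batch["attention_mask"].sum(dim=1) - 1  # index of last real token
embeddings = hidden[torch.arange(hidden.size(0)), last_idx]  # (batch, dim)
embeddings = torch.nn.functional.normalize(embeddings, dim=-1)
```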
Regarding the loss function, we employ the widely used InfoNCE loss. The training code is currently available in FlagEmbedding.
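For concreteness, below is a minimal PyTorch sketch of the standard InfoNCE objective over (query, positive, negatives) triples. The temperature value and explicit hard-negative handling are assumptions; the actual FlagEmbedding implementation may differ in details such as in-batch negatives:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb, pos_emb, neg_emb, temperature=0.05):
    """InfoNCE: pull each query toward its positive passage and push it
    away from negatives via a softmax over cosine similarities."""
    # Normalize so dot products become cosine similarities.
    q = F.normalize(query_emb, dim=-1)  # (B, D)
    p = F.normalize(pos_emb, dim=-1)    # (B, D)
    n = F.normalize(neg_emb, dim=-1)    # (B, K, D)

    pos_sim = torch.sum(q * p, dim=-1, keepdim=True)  # (B, 1)
    neg_sim = torch.einsum("bd,bkd->bk", q, n)        # (B, K)

    # Positive sits at index 0, so the target label is 0 for every row.
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature  # (B, 1+K)
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```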
We will be releasing more details about the training process and data soon.