# Core implementation of Jina XLM-RoBERTa
This implementation is adapted from [XLM-RoBERTa](https://huggingface.co/docs/transformers/en/model_doc/xlm-roberta). Unlike the original implementation, this model uses rotary positional encodings and supports FlashAttention 2.
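
To illustrate what rotary positional encodings do, here is a minimal pure-Python sketch. It is not the code used in this repository: the function name `apply_rope`, the "rotate-half" pairing of dimensions, and the base of `10000.0` are assumptions chosen for illustration. The point it shows is that, after rotation, the attention score between a query and a key depends only on their relative offset.

```python
import math


def apply_rope(vec, position, base=10000.0):
    """Rotate `vec` (even length) by angles proportional to `position`.

    Hypothetical helper for illustration; dimension i is paired with
    dimension i + len(vec) // 2 (the "rotate-half" convention, assumed here).
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    half = dim // 2
    out = [0.0] * dim
    for i in range(half):
        # Each pair of dimensions rotates at its own frequency.
        theta = position * base ** (-i / half)
        c, s = math.cos(theta), math.sin(theta)
        a, b = vec[i], vec[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Both pairs of positions are 2 apart, so the scores should match
# up to floating-point error.
score_a = dot(apply_rope(q, 3), apply_rope(k, 1))
score_b = dot(apply_rope(q, 12), apply_rope(k, 10))
```

Because each rotation preserves vector length and only the difference of angles survives in the dot product, `score_a` and `score_b` agree even though the absolute positions differ.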
### Models that use this implementation
To be added soon.
### Converting weights
Weights from an [original XLM-RoBERTa model](https://huggingface.co/FacebookAI/xlm-roberta-large) can be converted using the `convert_roberta_weights_to_flash.py` script in the model repository.