---
tags:
- Multilingual
license: mit
language:
- af
- am
- ar
- hy
- as
- ast
- az
- be
- bn
- bs
- bg
- my
- ca
- ceb
- zho
- hr
- cs
- da
- nl
- en
- et
- tl
- fi
- fr
- ff
- gl
- lg
- ka
- de
- el
- gu
- ha
- he
- hi
- hu
- is
- ig
- id
- ga
- it
- ja
- jv
- kea
- kam
- kn
- kk
- km
- ko
- ky
- lo
- lv
- ln
- lt
- luo
- lb
- mk
- ms
- ml
- mt
- mi
- mr
- mn
- ne
- ns
- 'no'
- ny
- oc
- or
- om
- ps
- fa
- pl
- pt
- pa
- ro
- ru
- sr
- sn
- sd
- sk
- sl
- so
- ku
- es
- sw
- sv
- tg
- ta
- te
- th
- tr
- uk
- umb
- ur
- uz
- vi
- cy
- wo
- xh
- yo
- zu
---
Model Sources
- Paper: LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
- Link: https://arxiv.org/pdf/2407.05975
- Repository: https://github.com/CONE-MT/LLaMAX/
Model Description
🔥 LLaMAX-7B-X-NLI is an NLI model with multilingual capability, built by fully fine-tuning the powerful multilingual model LLaMAX-7B on the MultiNLI dataset.
🔥 Compared with fine-tuning Llama-2 in the same setting, LLaMAX-7B-X-NLI improves the average accuracy by 5.6% on the XNLI dataset.
Experiments
| XNLI | Avg. | Sw | Ur | Hi | Th | Ar | Tr | El | Vi | Zh | Ru | Bg | De | Fr | Es | En |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama2-7B-X-XNLI | 70.6 | 44.6 | 55.1 | 62.2 | 58.4 | 64.7 | 64.9 | 65.6 | 75.4 | 75.9 | 78.9 | 78.6 | 80.7 | 81.7 | 83.1 | 89.5 |
| LLaMAX-7B-X-XNLI | 76.2 | 66.7 | 65.3 | 69.1 | 66.2 | 73.6 | 71.8 | 74.3 | 77.4 | 78.3 | 80.3 | 81.6 | 82.2 | 83.0 | 84.1 | 89.7 |
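A per-language accuracy of this kind can be estimated with a loop like the one below. This is a minimal sketch assuming the "Premise: ... Hypothesis: ... Label:" prompt format from the usage example further down and the Hugging Face `xnli` dataset; the `xnli_accuracy` helper and label strings are illustrative, and the paper's exact evaluation protocol may differ.

```python
from datasets import load_dataset

# xnli label ids: 0 = entailment, 1 = neutral, 2 = contradiction
LABELS = ["Entailment", "Neutral", "Contradiction"]

def xnli_accuracy(model, tokenizer, lang="sw", n=100):
    """Rough accuracy estimate on the first n XNLI test examples of one language."""
    data = load_dataset("xnli", lang, split=f"test[:{n}]")
    correct = 0
    for ex in data:
        query = f"Premise: {ex['premise']} Hypothesis: {ex['hypothesis']} Label:"
        inputs = tokenizer(query, return_tensors="pt")
        out = model.generate(inputs.input_ids, max_new_tokens=5)
        # Decode only the newly generated tokens (the predicted label).
        completion = tokenizer.decode(
            out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
        )
        if LABELS[ex["label"]].lower() in completion.lower():
            correct += 1
    return correct / len(data)
```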
Model Usage
Code Example:
```python
from transformers import AutoTokenizer, LlamaForCausalLM

# Replace the placeholders with the local paths (or Hub IDs) of the
# converted weights and tokenizer.
model = LlamaForCausalLM.from_pretrained(PATH_TO_CONVERTED_WEIGHTS)
tokenizer = AutoTokenizer.from_pretrained(PATH_TO_CONVERTED_TOKENIZER)

# The model expects a "Premise: ... Hypothesis: ... Label:" prompt and
# generates the NLI label as text.
query = "Premise: She doesn’t really understand. Hypothesis: Actually, she doesn’t get it. Label:"
inputs = tokenizer(query, return_tensors="pt")
generate_ids = model.generate(inputs.input_ids, max_length=30)
print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
# => Entailment
```
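For classifying arbitrary premise/hypothesis pairs, the generation call can be wrapped in a small helper that strips the prompt from the output. This is an illustrative sketch, not part of the original card: the `classify_nli` name, the `max_new_tokens` budget, and the Spanish example pair are all assumptions.

```python
def classify_nli(premise: str, hypothesis: str) -> str:
    """Return the generated NLI label, e.g. "Entailment"."""
    query = f"Premise: {premise} Hypothesis: {hypothesis} Label:"
    inputs = tokenizer(query, return_tensors="pt")
    generate_ids = model.generate(inputs.input_ids, max_new_tokens=5)
    # Decode only the newly generated tokens, i.e. the label text.
    return tokenizer.decode(
        generate_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
    ).strip()

# The inputs can be in any supported language, e.g. Spanish (illustrative):
print(classify_nli("Ella no entiende realmente.", "En realidad, ella no lo entiende."))
```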
Citation
If our model helps your work, please cite this paper:
```bibtex
@article{lu2024llamax,
  title={LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages},
  author={Lu, Yinquan and Zhu, Wenhao and Li, Lei and Qiao, Yu and Yuan, Fei},
  journal={arXiv preprint arXiv:2407.05975},
  year={2024}
}
```