license: mit
datasets:
  - oscar-corpus/OSCAR-2301
  - allenai/nllb
  - Helsinki-NLP/opus-100
language:
  - hu
  - el
  - cs
  - pl
  - lt
  - lv
base_model:
  - haoranxu/ALMA-13B-Pretrain
  - meta-llama/Llama-2-13b-hf

X-ALMA builds upon ALMA-R by expanding support from 6 to 50 languages. It utilizes a plug-and-play architecture with language-specific modules, complemented by a carefully designed training recipe. This release includes the language-specific X-ALMA LoRA module and a merged model that supports the languages in Group 5: English (en), Hungarian (hu), Greek (el), Czech (cs), Polish (pl), Lithuanian (lt), and Latvian (lv).

X-ALMA checkpoints are released on Hugging Face:

| Models | Base Model Link | Description |
|---|---|---|
| X-ALMA | haoranxu/X-ALMA | X-ALMA model with all its modules |
| X-ALMA-13B-Pretrain | haoranxu/X-ALMA-13B-Pretrain | X-ALMA 13B multilingual pre-trained base model |
| X-ALMA-Group1 | haoranxu/X-ALMA-13B-Group1 | X-ALMA group1 specific module and the merged model |
| X-ALMA-Group2 | haoranxu/X-ALMA-13B-Group2 | X-ALMA group2 specific module and the merged model |
| X-ALMA-Group3 | haoranxu/X-ALMA-13B-Group3 | X-ALMA group3 specific module and the merged model |
| X-ALMA-Group4 | haoranxu/X-ALMA-13B-Group4 | X-ALMA group4 specific module and the merged model |
| X-ALMA-Group5 | haoranxu/X-ALMA-13B-Group5 | X-ALMA group5 specific module and the merged model |
| X-ALMA-Group6 | haoranxu/X-ALMA-13B-Group6 | X-ALMA group6 specific module and the merged model |
| X-ALMA-Group7 | haoranxu/X-ALMA-13B-Group7 | X-ALMA group7 specific module and the merged model |
| X-ALMA-Group8 | haoranxu/X-ALMA-13B-Group8 | X-ALMA group8 specific module and the merged model |

A quick start:

There are three ways to load X-ALMA for translation. The examples below translate "我爱机器翻译。" ("I love machine translation.") into English. X-ALMA should also be able to do multilingual open-ended QA.

The first way: loading the merged model where the language-specific module has been merged into the base model (Recommended):

import torch
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
from peft import PeftModel

GROUP2LANG = {
    1: ["da", "nl", "de", "is", "no", "sv", "af"],
    2: ["ca", "ro", "gl", "it", "pt", "es"],
    3: ["bg", "mk", "sr", "uk", "ru"],
    4: ["id", "ms", "th", "vi", "mg", "fr"],
    5: ["hu", "el", "cs", "pl", "lt", "lv"],
    6: ["ka", "zh", "ja", "ko", "fi", "et"],
    7: ["gu", "hi", "mr", "ne", "ur"],
    8: ["az", "kk", "ky", "tr", "uz", "ar", "he", "fa"],
}
LANG2GROUP = {lang: str(group) for group, langs in GROUP2LANG.items() for lang in langs}
group_id = LANG2GROUP["zh"]

model = AutoModelForCausalLM.from_pretrained(f"haoranxu/X-ALMA-13B-Group{group_id}", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(f"haoranxu/X-ALMA-13B-Group{group_id}", padding_side='left')

# Add the source sentence into the prompt template
prompt = "Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"

# X-ALMA requires the chat template; ALMA and ALMA-R do not.
chat_style_prompt = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(chat_style_prompt, tokenize=False, add_generation_prompt=True)

input_ids = tokenizer(prompt, return_tensors="pt", padding=True, max_length=40, truncation=True).input_ids.cuda()

# Translation
with torch.no_grad():
    generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9)
outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(outputs)
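
The prompt in the example above follows a fixed template, so it can be filled in for other language pairs. The following is a minimal sketch for the Group 5 languages listed in this card. It assumes the same template wording applies to pairs other than Chinese to English, which this card does not show explicitly. `GROUP5_NAMES` and `build_prompt` are hypothetical names introduced here for illustration and are not part of the X-ALMA code.

```python
# Language codes and English names for Group 5, as listed in this card, plus English.
GROUP5_NAMES = {
    "en": "English",
    "hu": "Hungarian",
    "el": "Greek",
    "cs": "Czech",
    "pl": "Polish",
    "lt": "Lithuanian",
    "lv": "Latvian",
}


def build_prompt(src: str, tgt: str, text: str) -> str:
    """Fill the quick-start translation template (hypothetical helper).

    Assumption: the template shown for Chinese-to-English is reused unchanged
    for other language pairs.
    """
    if src == tgt:
        raise ValueError("source and target languages must differ")
    # A KeyError here means the code is not one of the Group 5 languages.
    src_name, tgt_name = GROUP5_NAMES[src], GROUP5_NAMES[tgt]
    return f"Translate this from {src_name} to {tgt_name}:\n{src_name}: {text}\n{tgt_name}:"


prompt = build_prompt("en", "pl", "I love machine translation.")
print(prompt)
```

The resulting string takes the place of the hard-coded `prompt` before the chat template is applied.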

The second way: loading the base model and the language-specific module (Recommended). This reuses the imports and `group_id` from the first example:

model = AutoModelForCausalLM.from_pretrained("haoranxu/X-ALMA-13B-Pretrain", torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, f"haoranxu/X-ALMA-13B-Group{group_id}")
tokenizer = AutoTokenizer.from_pretrained(f"haoranxu/X-ALMA-13B-Group{group_id}", padding_side='left')

The third way: loading the base model with all language-specific modules, similar to an MoE (requires a large amount of GPU memory):

from modeling_xalma import XALMAForCausalLM
model = XALMAForCausalLM.from_pretrained("haoranxu/X-ALMA", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("haoranxu/X-ALMA", padding_side='left')

# With the third loading method, pass `lang="zh"` to generate() so the model knows which language group's module to use.
generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9, lang="zh")
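
With any of the three loading methods, `generate` on a decoder-only model returns the prompt tokens followed by the newly generated tokens, so decoding the full output also reproduces the prompt text. The sketch below shows one way to keep only the new tokens before decoding. It uses plain Python lists with made-up token ids so that it runs without a model, and `strip_prompt` is a hypothetical helper name. With the tensors from the examples above, the equivalent slice would be `generated_ids[:, input_ids.shape[1]:]`.

```python
def strip_prompt(generated, prompt_len):
    """Drop the first prompt_len token ids from every sequence in the batch."""
    return [list(seq[prompt_len:]) for seq in generated]


# Toy token ids (not real vocabulary entries): a 3-token prompt
# followed by two newly generated tokens.
input_ids = [[11, 12, 13]]
generated_ids = [[11, 12, 13, 101, 102]]

new_tokens = strip_prompt(generated_ids, len(input_ids[0]))
print(new_tokens)
```

Because the tokenizer is loaded with `padding_side='left'`, every prompt in a batch has the same padded length, so a single slice position works for the whole batch.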