Finetuning Gemma for a Foreign Language

#78
deleted

I am attempting to fine-tune Gemma for one of the languages on which it has been pretrained. Could you provide any suggestions regarding the optimal size of the dataset to ensure a noticeable improvement in performance? The best format for the training files? Any other recommendations? Thank you.
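
For context on the training-file format question: the replies below assume simple JSON-lines records with a "question" and an "answer" field. The line here is a made-up example of that shape, not taken from anyone's actual dataset.

{"question": "Qual é a capital do Brasil?", "answer": "A capital do Brasil é Brasília."}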

@user1357925

Hello friend. I got a good response from the gemma 2b model using the format below for the dataset. I did the fine-tuning for Brazilian Portuguese. Here it is.
I have 2 datasets: one for mental with 36k rows (gemma 2b) and another with 100k rows for instruct (gemma 7b).

def formatting_func(example):
    # Wrap each question/answer pair in Gemma's chat-turn markers;
    # turns are separated by a newline, and the tokenizer adds <bos> itself.
    instruction = example['question']
    output = example['answer']
    text = (
        f"<start_of_turn>user\n{instruction}<end_of_turn>\n"
        f"<start_of_turn>model\n{output}<end_of_turn>"
    )
    return text
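
As a rough sketch of how a formatting function like this can be wired into TRL's SFTTrainer: the model id, file name, and hyperparameters below are illustrative assumptions rather than the settings from the shared notebook, and the exact trainer arguments vary between trl versions.

# Illustrative sketch only: model id, file name and hyperparameters are
# assumptions, and SFTTrainer arguments differ slightly across trl versions.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# JSON-lines file with "question" and "answer" fields (assumed layout).
dataset = load_dataset("json", data_files="train.jsonl", split="train")
dataset = dataset.map(lambda ex: {"text": formatting_func(ex)})

trainer = SFTTrainer(
    model="google/gemma-2b",        # assumed base checkpoint
    train_dataset=dataset,
    dataset_text_field="text",      # train on the pre-formatted turns
    max_seq_length=1024,
    args=TrainingArguments(
        output_dir="gemma-2b-ft",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
    ),
)
trainer.train()

Pre-formatting the examples into a "text" column keeps the sketch independent of how a given trl version batches examples into formatting_func.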
deleted

Thank you. Would you mind sharing the whole script you used for fine-tuning?

@user1357925
Yeah, sure. Could you send me an email?
[email protected], or message me on LinkedIn, and I will share the notebook with you.

@rhaymison Is it possible to share the script you used?

@Wielebnyd Sure, could you send me an email so I can share the full notebook with you?

@rhaymison Thank you, I just sent you an email.

@rhaymison Hello sir, I'm interested in your work. Could you share some information about the prompt and the LoRA rank you used, please?

I want to fine-tune gemma 2b on 40k rows of English and Darija (Moroccan Arabic).
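
Since the question above asks about LoRA rank, here is a hedged example of a peft LoraConfig that is a common starting point for Gemma 2b; the rank, alpha, and target modules are illustrative assumptions, not the settings actually used in this thread.

# Hedged example only: common starting-point LoRA settings for Gemma-sized
# models, not the values used in the shared notebook.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                        # LoRA rank; 8-64 is a typical range to try
    lora_alpha=32,               # scaling factor, often about 2x the rank
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Gemma attention projections
)

# The config is then passed to the trainer, e.g. SFTTrainer(..., peft_config=lora_config).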

@SaadManzur
I received your email, I will give you more info. Thanks
