Dataset used to fine-tune?
#2 · opened by TeddyB
Hi,
Could you please comment on the dataset that you used to train the model?
- Was it an open-source dataset, or a fully custom dataset?
- How many rows/data points did you need to train on to see good function-calling results?
- Do you have any tips on making a function-calling dataset?
- The dataset I used is fully custom; however, I refined it from this dataset: https://huggingface.co/datasets/hiyouga/glaive-function-calling-v2-sharegpt (see the sketch after this list).
- Usually more is better, but I always start with 1k rows (it depends on the model; someone said 7x samples were enough to teach Llama 3 the pattern) to check the loss, performance, etc., and then decide whether I want to adjust the hyperparameters.
- Have a look at this to cross-check every condition when making the dataset: https://github.com/MeetKai/functionary?tab=readme-ov-file#the-differences-between-related-projects
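
A minimal sketch of that workflow, assuming the dataset's split is named "train" and it uses a ShareGPT-style `conversations` field (check the actual schema on the dataset page before relying on it):

```python
from datasets import load_dataset

# Load the dataset linked above; "train" is assumed to be the split name.
ds = load_dataset("hiyouga/glaive-function-calling-v2-sharegpt", split="train")

# Inspect one example to verify the ShareGPT-style format before refining
# it into your own custom dataset.
print(ds[0])

# Start small, as suggested above: shuffle and keep ~1k rows to watch the
# loss and performance before scaling up.
subset = ds.shuffle(seed=42).select(range(1000))
```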
Hope it helps :D
hiieu changed discussion status to closed