fine-tuning the model with orphan sequences

by rqh - opened Jun 13, 2023

rqh

Jun 13, 2023

Dear Authors,

Thank you for the excellent work on ZymCTRL. I'm trying to fine-tune the model so that it generates homologs of a target sequence. However, the target is an orphan sequence, and only a dozen high-identity sequences are available as a fine-tuning dataset. Is it possible to fine-tune the model with a dataset this small? If so, could you please guide me on how to implement this?

Thank you so much!

nferruz

AI for protein design org Jun 13, 2023

hi rqh,

You can fine-tune the model on your dataset by following the info in the documentation. There's no rule of thumb for the minimum number of sequences, although it would be good to have at least 100. I'd still suggest you give it a try even if you only have, let's say, 20. One thing you can do to augment a small dataset is to fine-tune on each sequence and its reverse.
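
A minimal sketch of the "each sequence and its reverse" idea, in case it helps. The function name `augment_with_reverses` and the toy sequences are hypothetical, and the sketch only doubles the dataset; formatting the sequences for training (control tags, special tokens) should still follow the model documentation and is not shown here.

```python
def augment_with_reverses(sequences):
    """Return each sequence followed by its reverse, skipping duplicates.

    Hypothetical helper for illustration; not part of ZymCTRL.
    """
    seen = set()
    augmented = []
    for seq in sequences:
        seq = seq.strip().upper()
        # Consider the original first, then its reversed copy.
        for candidate in (seq, seq[::-1]):
            # Skip empty strings and repeats (e.g. palindromic sequences).
            if candidate and candidate not in seen:
                seen.add(candidate)
                augmented.append(candidate)
    return augmented


# Toy placeholder sequences (assumption: replace with your own homologs).
homologs = ["MKTAYIAK", "MKTAYLAK"]
training_set = augment_with_reverses(homologs)
print(len(training_set))
```

With a dozen homologs this gives up to 24 training sequences, which is still small, so it is probably worth holding out a few sequences to watch for overfitting.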

Hope this helps!
noelia
