---
base_model: westlake-repl/SaProt_35M_AF2
library_name: peft
---

# Base model: [westlake-repl/SaProt_35M_AF2](https://huggingface.co/westlake-repl/SaProt_35M_AF2)

# Model Card for Model ID

This model is trained on a single-site deep mutational scanning dataset and can be used to predict the fitness score of mutant amino acid sequences of the protein [PTEN_HUMAN](https://www.uniprot.org/uniprotkb/P60484/entry) (Phosphatidylinositol 3,4,5-trisphosphate 3-phosphatase and dual-specificity protein phosphatase).

## Protein Function

Dual-specificity protein phosphatase, dephosphorylating tyrosine-, serine- and threonine-phosphorylated proteins. Also functions as a lipid phosphatase, removing the phosphate in the D3 position of the inositol ring of PtdIns(3,4,5)P3/phosphatidylinositol 3,4,5-trisphosphate, PtdIns(3,4)P2/phosphatidylinositol 3,4-diphosphate and PtdIns3P/phosphatidylinositol 3-phosphate, with a preference for PtdIns(3,4,5)P3. Furthermore, this enzyme can also act as a cytosolic inositol 3-phosphatase acting on Ins(1,3,4,5,6)P5/inositol 1,3,4,5,6-pentakisphosphate and possibly Ins(1,3,4,5)P4/1D-myo-inositol 1,3,4,5-tetrakisphosphate.

### Task type

Protein-level regression

### Dataset description

The dataset is from [Deep generative models of genetic variation capture the effects of mutations](https://www.nature.com/articles/s41592-018-0138-4) and is also available as the [SaprotHub dataset](https://huggingface.co/datasets/SaProtHub/DMS_PTEN_HUMAN).

Each label is the fitness score of one mutant amino acid sequence. Scores are unbounded real values (from negative infinity to positive infinity); a smaller score means a more stable mutant.

### Model input type

Amino acid sequence

### Performance

0.62 Spearman's ρ

### LoRA config

lora_dropout: 0.0

lora_alpha: 16

target_modules: ["query", "key", "value", "intermediate.dense", "output.dense"]

modules_to_save: ["classifier"]

### Training config

class: AdamW

betas: (0.9, 0.98)

weight_decay: 0.01

learning rate: 1e-4

epoch: 50

batch size: 64

precision: 16-mixed
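
The performance figure above is a Spearman's ρ, which is the Pearson correlation between the ranks of the predicted and the measured fitness scores. The sketch below shows one way to compute it with the Python standard library only. The function names (`average_ranks`, `pearson`, `spearman_rho`) and the example scores are hypothetical and used purely for illustration; this is not the evaluation code used to produce the reported value.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the window while the next sorted value ties with the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(predicted, measured):
    """Spearman's rho: Pearson correlation computed on the ranks of both lists."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equally long lists with at least two items")
    return pearson(average_ranks(predicted), average_ranks(measured))


# Hypothetical scores for five mutants (not taken from the PTEN dataset).
measured = [0.1, -1.2, 0.8, -0.3, 1.5]
predicted = [0.0, -0.9, 0.2, 0.4, 1.1]
rho = spearman_rho(predicted, measured)
print(f"Spearman's rho: {rho:.2f}")
```

Because only ranks enter the calculation, the metric is insensitive to the scale of the predicted scores, which suits labels that are unbounded as described in the dataset description.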