M2 - small CNN trained on embeddings
The model is trained on ProtBert-BFD embeddings of the knots_AF dataset to distinguish knotted from unknotted proteins based on their amino acid sequence.
Accuracy, true positive rate (TPR), and true negative rate (TNR) on the test set:
Family | Dataset size | Unknotted set size | Accuracy | TPR | TNR |
---|---|---|---|---|---|
All | 39412 | 19718 | 0.9690 | 0.9569 | 0.9811 |
SPOUT | 7371 | 550 | 0.9712 | 0.9815 | 0.8436 |
TDD | 612 | 24 | 0.9673 | 0.9796 | 0.6667 |
DUF | 716 | 429 | 0.9413 | 0.8955 | 0.9720 |
AdoMet synthase | 1794 | 240 | 0.9727 | 0.9755 | 0.9542 |
Carbonic anhydrase | 1531 | 539 | 0.8870 | 0.8619 | 0.9332 |
UCH | 477 | 125 | 0.8700 | 0.8892 | 0.816 |
ATCase/OTCase | 3799 | 3352 | 0.9932 | 0.9418 | 1.0 |
ribosomal-mitochondrial | 147 | 41 | 0.8163 | 0.8319 | 0.7805 |
membrane | 8309 | 1577 | 0.9740 | 0.9857 | 0.9239 |
VIT | 14347 | 12639 | 0.9742 | 0.8214 | 0.9948 |
biosynthesis of lantibiotics | 392 | 286 | 0.9388 | 0.8019 | 0.9895 |
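
The three metric columns are related through the class sizes. The sketch below is a sanity check, not part of the model's code. It assumes that knotted proteins are the positive class and that the knotted count equals the dataset size minus the unknotted set size. The function name `accuracy_from_rates` is hypothetical and introduced only for illustration. Under those assumptions, accuracy is the size-weighted average of TPR and TNR.

```python
def accuracy_from_rates(n_total: int, n_unknotted: int, tpr: float, tnr: float) -> float:
    """Recover overall accuracy from TPR, TNR and the class sizes.

    Assumption: "knotted" is the positive class, so
    n_knotted = n_total - n_unknotted.
    """
    n_knotted = n_total - n_unknotted
    # Expected number of correct predictions in each class.
    correct = tpr * n_knotted + tnr * n_unknotted
    return correct / n_total


# Two rows copied from the table: (dataset size, unknotted size, TPR, TNR).
rows = {
    "All": (39412, 19718, 0.9569, 0.9811),
    "SPOUT": (7371, 550, 0.9815, 0.8436),
}

for name, (n_total, n_unknotted, tpr, tnr) in rows.items():
    acc = accuracy_from_rates(n_total, n_unknotted, tpr, tnr)
    print(f"{name}: accuracy ~ {acc:.4f}")
```

For these two rows the recomputed values agree with the reported accuracy to within rounding. This also shows why a family with few unknotted examples, such as SPOUT, can have high accuracy despite a noticeably lower TNR.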