---
license: mit
---

This model is provided for comparison with the official ESM-2 35M model. It receives only the residue sequence but shares the same vocabulary as standard SaProt, which means all structure (3Di) tokens are masked with `#` during training.
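As a concrete illustration, a plain amino-acid sequence can be converted into this sequence-only format by pairing every residue with `#`. This is a minimal sketch; the helper name `to_seq_only_format` is illustrative, not part of SaProt:

```
def to_seq_only_format(aa_seq: str) -> str:
    # Append the "#" placeholder (the masked 3Di token) to every residue
    return "".join(f"{aa}#" for aa in aa_seq)

print(to_seq_only_format("MEVQLVQYK"))  # M#E#V#Q#L#V#Q#Y#K#
```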
### Huggingface model

The following code shows how to load the model.
```
from transformers import EsmTokenizer, EsmForMaskedLM

model_path = "/your/path/to/SaProt_35M_AF2_seqOnly"
tokenizer = EsmTokenizer.from_pretrained(model_path)
model = EsmForMaskedLM.from_pretrained(model_path)

#################### Example ####################
device = "cuda"
model.to(device)

# Each residue is paired with "#", the placeholder for masked 3Di structure tokens
seq = "M#E#V#Q#L#V#Q#Y#K#"
tokens = tokenizer.tokenize(seq)
print(tokens)

inputs = tokenizer(seq, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

outputs = model(**inputs)
print(outputs.logits.shape)  # [batch, sequence length (incl. <cls>/<eos>), vocabulary size]

"""
['M#', 'E#', 'V#', 'Q#', 'L#', 'V#', 'Q#', 'Y#', 'K#']
torch.Size([1, 11, 446])
"""
```
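Since the model is a standard `EsmForMaskedLM`, its logits can be mapped back to vocabulary tokens with the usual `transformers` API. A minimal sketch, reusing the `model`, `tokenizer`, and `inputs` from the example above:

```
import torch

with torch.no_grad():
    outputs = model(**inputs)

# Greedy decoding: take the highest-scoring vocabulary token at each position
pred_ids = outputs.logits.argmax(dim=-1)[0]
print(tokenizer.convert_ids_to_tokens(pred_ids.tolist()))
```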