BERT base for SMILES
This is a bidirectional transformer pretrained on SMILES (simplified molecular-input line-entry system) strings.
Example: Amoxicillin
O=C([C@@H](c1ccc(cc1)O)N)N[C@@H]1C(=O)N2[C@@H]1SC([C@@H]2C(=O)O)(C)C
Two training objectives were used:
- masked language modeling
- molecular-formula validity prediction
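The masked language modeling objective can be illustrated with a minimal sketch. This is not the training code used for this model: the character-level split, the 15% masking rate, and the helper name `mask_tokens` are all assumptions made for illustration, and the full BERT recipe (which also substitutes random tokens or leaves some selected tokens unchanged) is simplified to plain `[MASK]` replacement.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Hypothetical helper: replace each token with [MASK] with probability mask_prob.

    Returns (masked_tokens, labels). labels[i] holds the original token at
    masked positions and None elsewhere, so a loss would only be computed
    on the positions the model has to reconstruct.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

# Assumption: one token per character. The model's real tokenizer may split differently.
smiles = "O=C([C@@H](c1ccc(cc1)O)N)N[C@@H]1C(=O)N2[C@@H]1SC([C@@H]2C(=O)O)(C)C"
tokens = list(smiles)
masked, labels = mask_tokens(tokens, mask_prob=0.15, seed=0)

# Show masked positions as "_" for readability
print("".join("_" if t == MASK else t for t in masked))
```

During pretraining, the model sees the masked sequence and is trained to predict the original tokens at the masked positions.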
Intended uses
This model is primarily intended to be fine-tuned for the following tasks:
- molecule classification
- molecule-to-gene-expression mapping
- cell targeting
How to use in your code
from transformers import BertTokenizerFast, BertModel

# Load the pretrained tokenizer and encoder from the Hugging Face Hub
checkpoint = 'unikei/bert-base-smiles'
tokenizer = BertTokenizerFast.from_pretrained(checkpoint)
model = BertModel.from_pretrained(checkpoint)

# Amoxicillin as a SMILES string
example = 'O=C([C@@H](c1ccc(cc1)O)N)N[C@@H]1C(=O)N2[C@@H]1SC([C@@H]2C(=O)O)(C)C'
# Tokenize to PyTorch tensors and run a forward pass
tokens = tokenizer(example, return_tensors='pt')
predictions = model(**tokens)
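The `predictions` object above holds one hidden vector per token. For molecule-level tasks such as classification, a single vector per molecule is often more convenient. One common option, shown here as a sketch and not as the method prescribed for this model, is to average the token vectors while ignoring padding. The helper name `mean_pool` is hypothetical, and the dummy tensors stand in for real model output so the example runs without downloading the checkpoint.

```python
import torch

def mean_pool(last_hidden_state, attention_mask):
    """Hypothetical helper: average token vectors, skipping padded positions.

    last_hidden_state: (batch, seq_len, hidden) float tensor
    attention_mask:    (batch, seq_len) tensor of 0/1
    """
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(dim=1)                   # sum over real tokens only
    counts = mask.sum(dim=1).clamp(min=1.0)                          # avoid division by zero
    return summed / counts

# Dummy stand-ins for model output: 2 sequences, 5 tokens, hidden size 8
hidden = torch.randn(2, 5, 8)
mask = torch.tensor([[1, 1, 1, 0, 0],
                     [1, 1, 1, 1, 1]])
embedding = mean_pool(hidden, mask)
print(embedding.shape)  # torch.Size([2, 8])
```

With the real model, the same helper would be called as `mean_pool(predictions.last_hidden_state, tokens['attention_mask'])`.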