Create README.md
README.md
ADDED
@@ -0,0 +1,86 @@
---
license: other
pipeline_tag: token-classification
tags:
- biology
- RNA
- Torsional
- Angles
---

# `RNA-TorsionBERT`

## Model Description

`RNA-TorsionBERT` is a 331 MB BERT-based language model that predicts RNA torsional and pseudo-torsional angles from the sequence.

`RNA-TorsionBERT` is a DNABERT model that was pre-trained on ~4200 RNA structures and then fine-tuned on 185 non-redundant structures.

It reduces the mean absolute error (MAE) by 6.2° on average compared with the previous state-of-the-art model, SPOT-RNA-1D, on the Test Set (composed of RNA-Puzzles and CASP-RNA).


MAE per angle on the Test Set, in degrees (lower is better):

| Model               | alpha | beta | gamma | delta | epsilon | zeta | chi  | eta  | theta |
|---------------------|-------|------|-------|-------|---------|------|------|------|-------|
| **RNA-TorsionBERT** | 37.3  | 19.6 | 29.4  | 13.6  | 16.6    | 26.6 | 14.7 | 20.1 | 25.4  |
| SPOT-RNA-1D         | 45.7  | 23.0 | 33.6  | 19.0  | 21.1    | 34.4 | 19.3 | 28.9 | 33.9  |

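
The 6.2° figure quoted in the description is consistent with the mean per-angle gap between the two rows of the table. A quick check, assuming the table values are MAEs in degrees and that the headline number is the unweighted mean over the nine angles (an assumption; the source does not state how it was aggregated):

```python
# MAE values copied from the table (degrees); order: alpha ... theta.
ANGLES = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "chi", "eta", "theta"]
torsionbert = [37.3, 19.6, 29.4, 13.6, 16.6, 26.6, 14.7, 20.1, 25.4]
spot_rna_1d = [45.7, 23.0, 33.6, 19.0, 21.1, 34.4, 19.3, 28.9, 33.9]

# Per-angle improvement, then the unweighted mean across the nine angles.
improvement = {a: s - t for a, s, t in zip(ANGLES, spot_rna_1d, torsionbert)}
mean_improvement = sum(improvement.values()) / len(improvement)

print(f"mean improvement: {mean_improvement:.1f} degrees")
```

Under that assumption the mean gap rounds to 6.2°, with the largest per-angle gains on `eta`, `theta`, and `alpha`.
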

**Key Features**
* Prediction of torsional and pseudo-torsional angles
* Supports sequences of up to 512 nucleotides

## Usage

Get started predicting torsional angles with `RNA-TorsionBERT` using the following code snippet:

```python
from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True loads the custom code shipped with the model repository.
tokenizer = AutoTokenizer.from_pretrained("sayby/rna_torsionbert", trust_remote_code=True)
model = AutoModel.from_pretrained("sayby/rna_torsionbert", trust_remote_code=True)

# Space-separated 3-mers, with U already replaced by T.
sequence = "ACG CGG GGT GTT"
params_tokenizer = {
    "return_tensors": "pt",
    "padding": "max_length",
    "max_length": 512,
    "truncation": True,
}
inputs = tokenizer(sequence, **params_tokenizer)
output = model(inputs)["logits"]
```

- The model was fine-tuned from a DNABERT-3 model, so it uses the same tokenizer as DNABERT. Replace nucleotide `U` with `T` in the input sequence.
- The output contains the sine and cosine of each angle. The angles are in the following order: `alpha`, `beta`, `gamma`, `delta`, `epsilon`, `zeta`, `chi`, `eta`, `theta`.

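
The example input in the usage snippet, `"ACG CGG GGT GTT"`, reads as the overlapping 3-mers of `ACGGTT`. A minimal sketch of that preprocessing, assuming the tokenizer expects space-separated overlapping 3-mers as in that example; `to_kmers` is a hypothetical helper and not part of the model repository:

```python
def to_kmers(sequence: str, k: int = 3) -> str:
    """Return space-separated overlapping k-mers, with U replaced by T."""
    # Normalise case and map the RNA nucleotide U to T.
    dna = sequence.strip().upper().replace("U", "T")
    if len(dna) < k:
        raise ValueError(f"sequence must contain at least {k} nucleotides")
    # Slide a window of width k over the sequence, one nucleotide at a time.
    return " ".join(dna[i : i + k] for i in range(len(dna) - k + 1))


print(to_kmers("ACGGUU"))
```

For the RNA sequence `ACGGUU`, this reproduces the string used in the usage snippet.
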
To convert the predictions into angles, you can use the following code snippet:

```python
from typing import Optional

import numpy as np

ANGLES_ORDER = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "chi", "eta", "theta"]


def convert_sin_cos_to_angles(output: np.ndarray, input_ids: Optional[np.ndarray] = None) -> np.ndarray:
    """
    Convert the raw predictions of RNA-TorsionBERT into angles.
    Each angle is recovered from its sine and cosine using:
        alpha = arctan2(sin(alpha), cos(alpha))
    :param output: array with the predicted sine and cosine values of each angle
    :param input_ids: the input_ids given to RNA-TorsionBERT. Used to keep only the
        positions of the sequence and mask out the special tokens.
    :return: a np.ndarray with the angles (in degrees) for the sequence
    """
    if input_ids is not None:
        # Token ids 0 to 4 are treated as special tokens; their predictions become NaN.
        output[np.isin(input_ids, [0, 1, 2, 3, 4])] = np.nan
    # Cosines sit at even indexes of the last axis, sines at odd indexes.
    even_indexes = np.arange(0, output.shape[-1], 2)
    odd_indexes = np.arange(1, output.shape[-1], 2)
    sin, cos = output[:, :, odd_indexes], output[:, :, even_indexes]
    # arctan2 returns radians in [-pi, pi]; convert them to degrees.
    angles = np.degrees(np.arctan2(sin, cos))
    return angles


# `output` and `inputs` come from the usage snippet.
output = output.cpu().detach().numpy()
input_ids = inputs["input_ids"].cpu().detach().numpy()
real_angles = convert_sin_cos_to_angles(output, input_ids)
```

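
The conversion uses `arctan2` rather than a plain `arctan(sin/cos)`, because `arctan2` uses the signs of both arguments to recover the full range from -180° to 180°. A self-contained sketch on synthetic values, assuming the interleaved layout used by the conversion snippet (cosine at even indexes, sine at odd indexes of the last axis); the angle values are made up for illustration:

```python
import numpy as np

ANGLES_ORDER = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "chi", "eta", "theta"]

# Made-up angles (degrees) for one sequence with two positions: shape (1, 2, 9).
true_angles = np.array([[
    [-150.0, 170.0, 60.0, 80.0, -140.0, -70.0, -160.0, 165.0, -145.0],
    [-60.0, -175.0, 55.0, 145.0, -95.0, 75.0, -120.0, 30.0, 120.0],
]])

# Encode as interleaved (cos, sin) pairs along the last axis: shape (1, 2, 18).
radians = np.radians(true_angles)
encoded = np.empty(true_angles.shape[:-1] + (2 * len(ANGLES_ORDER),))
encoded[..., 0::2] = np.cos(radians)
encoded[..., 1::2] = np.sin(radians)

# Decode: arctan2 keeps the quadrant, plain arctan folds angles into (-90, 90).
decoded = np.degrees(np.arctan2(encoded[..., 1::2], encoded[..., 0::2]))
folded = np.degrees(np.arctan(encoded[..., 1::2] / encoded[..., 0::2]))

# Label the angles of the first position by name.
first_position = dict(zip(ANGLES_ORDER, decoded[0, 0]))
print(first_position["alpha"])
```

Here `decoded` matches `true_angles` up to floating-point error, while `folded` does not: an `alpha` of -150° comes back as roughly 30° when the quadrant information is discarded.
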