InstaDeepAI/nucleotide-transformer-500m-1000g

hdallatorre committed on Apr 25, 2023

Commit 42876a8 • 1 Parent(s): 2c07731

Update README.md

Files changed (1)

README.md +7 -1

@@ -11,7 +11,7 @@ tags:
 The Nucleotide Transformers are a collection of foundational language models that were pre-trained on DNA sequences from whole-genomes. Compared to other approaches, our models do not only integrate information from single reference genomes, but leverage DNA sequences from over 3,200 diverse human genomes, as well as 850 genomes from a wide range of species, including model and non-model organisms. Through robust and extensive evaluation, we show that these large models provide extremely accurate molecular phenotype prediction compared to existing methods
-Part of this collection is the **nucleotide-transformer-500m-1000g**, a 500M parameters transformer pre-trained on a collection of 3202 genetically diverse human genomes.
+Part of this collection is the **nucleotide-transformer-500m-1000g**, a 500M parameters transformer pre-trained on a collection of 3202 genetically diverse human genomes. The model is made available both in Tensorflow and Pytorch.
 **Developed by:** InstaDeep, NVIDIA and TUM
@@ -25,6 +25,12 @@ Part of this collection is the **nucleotide-transformer-500m-1000g**, a 500M par
 ### How to use
 <!-- Need to adapt this section to our model. Need to figure out how to load the models from huggingface and do inference on them -->
+Until its next release, the `transformers` library needs to be installed from source with the following command in order to use the models:
+```bash
+pip install --upgrade git+https://github.com/huggingface/transformers.git
+```
+A small snippet of code is given here in order to retrieve both logits and embeddings from a dummy DNA sequence.
 ```python
 from transformers import AutoTokenizer, AutoModelForMaskedLM
 import torch