Please help with the input datasets
Hello, thanks for your great work!
I want to redo the cell_classification.ipynb analysis
But I can't find the input files:
# load train dataset (includes all tissues)
train_dataset=load_from_disk("/path/to/cell_type_train_data.dataset")
# load evaluation dataset (includes all tissues)
eval_dataset=load_from_disk("/path/to/cell_type_test_data.dataset")
Could you help with this problem? Thanks!
I have the same issue - cannot find the cell_type_train_data.dataset and gene_train_data.dataset - please advise where to source or share a link. Cheers. Hannah
[Edited to reflect updated example input files directory in dataset repository]
Thank you for your interest in Geneformer. The example notebooks are meant to be generally applicable examples of fine-tuning for gene or cell classification, so these lines are where you would supply your own tokenized single-cell RNAseq data that you want to fine-tune the model with. No single dataset is meant to be supplied here. Please see the last section of the model card for information about fine-tuning, and the dataset repository for example input files. If you are interested in replicating any of the other gene or cell classification tasks described in the manuscript, please email me to request the dataset of interest and I would be happy to provide it.