Edit model card

MatSciBERT

A Materials Domain Language Model for Text Mining and Information Extraction

This is the pretrained model presented in MatSciBERT: A materials domain language model for text mining and information extraction, which is a BERT model trained on material science research papers.

The training corpus comprises papers related to the broad category of materials: alloys, glasses, metallic glasses, cement and concrete. We have utilised the abstracts and full text of papers(when available). All the research papers have been downloaded from ScienceDirect using the Elsevier API. The detailed methodology is given in the paper.

The codes for pretraining and finetuning on downstream tasks are shared on GitHub.

If you find this useful in your research, please consider citing:

@article{gupta_matscibert_2022,
  title   = "{MatSciBERT}: A Materials Domain Language Model for Text Mining and Information Extraction",
  author  = "Gupta, Tanishq and 
            Zaki, Mohd and 
            Krishnan, N. M. Anoop and 
            Mausam",
  year    = "2022",
  month   = may,
  journal = "npj Computational Materials",
  volume  = "8",
  number  = "1",
  pages   = "102",
  issn    = "2057-3960",
  url     = "https://www.nature.com/articles/s41524-022-00784-w",
  doi     = "10.1038/s41524-022-00784-w"
}
Downloads last month
3,667
Safetensors
Model size
110M params
Tensor type
I64
·
F32
·
Inference Examples
Inference API (serverless) is not available, repository is disabled.

Model tree for m3rg-iitd/matscibert

Finetunes
14 models

Space using m3rg-iitd/matscibert 1