---
license: apache-2.0
datasets:
- cxllin/medinstructv2
language:
- en
library_name: transformers
pipeline_tag: question-answering
tags:
- medical
---

`StableMed` is a 3 billion parameter decoder-only language model, fine-tuned from `stabilityai/stablelm-3b-4e1t` on 18k rows of medical questions for 1 epoch.

## Usage

Get started generating text with `StableMed` by using the following code snippet:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("cxllin/StableMed-3b")
model = AutoModelForCausalLM.from_pretrained(
    "cxllin/StableMed-3b",
    trust_remote_code=True,
    torch_dtype="auto",
)
model.cuda()

inputs = tokenizer("What are the most common symptoms of anemia?", return_tensors="pt").to("cuda")
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.75,
    top_p=0.95,
    do_sample=True,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```

### Model Architecture

The model is a decoder-only transformer similar to the LLaMA ([Touvron et al., 2023](https://arxiv.org/abs/2307.09288)) architecture with the following modifications:

| Parameters    | Hidden Size | Layers | Heads | Sequence Length |
|---------------|-------------|--------|-------|-----------------|
| 2,795,443,200 | 2560        | 32     | 32    | 4096            |

* **Position Embeddings**: Rotary Position Embeddings ([Su et al., 2021](https://arxiv.org/abs/2104.09864)) applied to the first 25% of head embedding dimensions for improved throughput, following [Black et al. (2022)](https://arxiv.org/pdf/2204.06745.pdf) (see the rotary sketch below).
* **Normalization**: LayerNorm ([Ba et al., 2016](https://arxiv.org/abs/1607.06450)) with learned bias terms, as opposed to RMSNorm ([Zhang & Sennrich, 2019](https://arxiv.org/abs/1910.07467)) (see the normalization sketch below).
* **Tokenizer**: GPT-NeoX ([Black et al., 2022](https://arxiv.org/abs/2204.06745)).
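
To make the partial-rotary detail concrete, here is a minimal sketch of NeoX-style rotary embeddings applied to only the first 25% of each head's dimensions. The function names and the `rotary_pct` parameter are illustrative assumptions, not the model's actual implementation:

```python
import torch

def rotate_half(x):
    # swap the two halves of the rotary slice with a sign flip
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_partial_rope(q, rotary_pct=0.25, base=10000):
    # q: (batch, heads, seq_len, head_dim); hypothetical helper, not the model's code
    head_dim = q.shape[-1]
    rotary_ndims = int(head_dim * rotary_pct)  # rotate only the first 25% of dims
    q_rot, q_pass = q[..., :rotary_ndims], q[..., rotary_ndims:]

    # standard RoPE frequencies over the rotated slice
    inv_freq = 1.0 / (base ** (torch.arange(0, rotary_ndims, 2).float() / rotary_ndims))
    freqs = torch.outer(torch.arange(q.shape[-2]).float(), inv_freq)
    emb = torch.cat((freqs, freqs), dim=-1)  # (seq_len, rotary_ndims)
    q_rot = q_rot * emb.cos() + rotate_half(q_rot) * emb.sin()

    # the remaining 75% of head dimensions pass through unrotated
    return torch.cat((q_rot, q_pass), dim=-1)
```

The same transform is applied to the key vectors; leaving most head dimensions unrotated is what yields the throughput gain noted above.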
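
The normalization choice can likewise be summarized in a few lines. This is a rough comparison sketch; the shapes and epsilon are illustrative, not taken from the model's config:

```python
import torch

hidden_size = 2560
x = torch.randn(1, 8, hidden_size)

# LayerNorm (used here): centers by the mean, then rescales, with a learned bias
layer_norm = torch.nn.LayerNorm(hidden_size)  # learnable weight and bias by default
y_ln = layer_norm(x)

# RMSNorm (the LLaMA choice, not used here): rescales by the root mean square only,
# with no mean subtraction and no bias term
def rms_norm(x, weight, eps=1e-6):
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps) * weight

y_rms = rms_norm(x, torch.ones(hidden_size))
```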