legolasyiu committed
Commit 4c18874
1 Parent(s): 342e3db

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -95,7 +95,9 @@ Llama-3.1-Storm-8B is a powerful generalist model useful for diverse application
  4. 🚀 Ollama: `ollama run ajindal/llama3.1-storm:8b`
 
 
- ## 💻 How to Use the Model
+ ---
+
+ ## 💻 How to Use EpistemeAI2's FireStorm-Llama-3.1-8B
  The Hugging Face `transformers` library loads the model in `bfloat16` by default. This is the type used by the [Llama-3.1-Storm-8B](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) checkpoint, so it's the recommended way to run the model for the best results.
 
  ### Installation
@@ -160,7 +162,7 @@ print(response) # Expected Output: '2 + 2 = 4'
  ```python
  from vllm import LLM, SamplingParams
  from transformers import AutoTokenizer
- model_id = "akjindal53244/Llama-3.1-Storm-8B" # FP8 model: "EpistemeAI2/FireStorm-Llama-3.1-8B"
+ model_id = "EpistemeAI2/FireStorm-Llama-3.1-8B" # FP8 model: "EpistemeAI2/FireStorm-Llama-3.1-8B"
  num_gpus = 1
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  llm = LLM(model=model_id, tensor_parallel_size=num_gpus)
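
For context on the first hunk: the section it retitles documents loading the model with `transformers` in `bfloat16`. A minimal sketch of that loading path, using the repo id this commit switches to; the explicit `torch_dtype`, chat-template call, prompt, and generation settings below are illustrative assumptions, not shown in the diff:

```python
# Minimal sketch (assumptions noted above): load the checkpoint in bfloat16,
# the dtype the README says `transformers` uses by default for this model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EpistemeAI2/FireStorm-Llama-3.1-8B"  # repo id from this commit's + line
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # explicit, matching the checkpoint dtype
    device_map="auto",
)

messages = [{"role": "user", "content": "What is 2+2?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=32)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```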
 
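The second hunk ends mid-snippet at `llm = LLM(...)`. Based on the `print(response) # Expected Output: '2 + 2 = 4'` line visible in the hunk header, the snippet plausibly continues along these lines; the sampling values and prompt are assumptions, not part of this diff:

```python
# Continuation sketch for the vLLM snippet above: `llm`, `tokenizer`, and
# SamplingParams come from the diff; the prompt and sampling values are assumed.
sampling_params = SamplingParams(max_tokens=128, temperature=0.01, top_k=100, top_p=0.95)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
]
# Render the chat as a plain prompt string for vLLM.
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

outputs = llm.generate([prompt], sampling_params)
response = outputs[0].outputs[0].text
print(response)  # Expected Output: '2 + 2 = 4'
```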