fsaudm/Meta-Llama-3.1-70B-Instruct-NF4

Tags: Text Generation · text-generation-inference · Inference Endpoints · 4-bit precision


Model Card for Meta-Llama-3.1-70B-Instruct-NF4

This is a 4-bit quantized version of Llama 3.1 70B Instruct, produced with bitsandbytes and accelerate.

Developed by: Farid Saud @ DSRS
License: llama3.1
Base Model: meta-llama/Meta-Llama-3.1-70B-Instruct
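
The card does not state the exact quantization settings. As a rough sketch only, a 4-bit load of the base model with bitsandbytes through transformers might look like the following. The "nf4" quantization type is inferred from the repository name, and `NF4_KWARGS` and `load_nf4` are hypothetical names introduced for illustration; any other options the author may have used are unknown.

```python
# Hypothetical reproduction sketch; the author's exact settings are not stated on the card.
BASE_MODEL = "meta-llama/Meta-Llama-3.1-70B-Instruct"

# Keyword arguments for transformers.BitsAndBytesConfig.
# Assumption: "nf4" is inferred from the "-NF4" suffix of the repository name.
NF4_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
}


def load_nf4(model_id: str = BASE_MODEL):
    """Load `model_id` and quantize its linear layers to 4-bit while loading."""
    # Imported lazily so the settings above can be inspected without the GPU stack.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(**NF4_KWARGS),
        device_map="auto",  # accelerate decides where each layer is placed
    )
    return tokenizer, model
```

Calling `load_nf4()` downloads the full-precision base weights, so it needs access to the gated base repository and considerably more disk space than loading this pre-quantized checkpoint.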

Use this model

Use a pipeline as a high-level helper:

# Requires bitsandbytes and accelerate in addition to transformers
from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline(
    "text-generation",
    model="fsaudm/Meta-Llama-3.1-70B-Instruct-NF4",
    device_map="auto",  # let accelerate place the quantized weights on the available GPU(s)
)
print(pipe(messages))

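Load the model directly:

# Requires bitsandbytes and accelerate in addition to transformers
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("fsaudm/Meta-Llama-3.1-70B-Instruct-NF4")
model = AutoModelForCausalLM.from_pretrained(
    "fsaudm/Meta-Llama-3.1-70B-Instruct-NF4",
    device_map="auto",  # let accelerate place the quantized weights on the available GPU(s)
)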

Information about the base model is available in the original meta-llama/Meta-Llama-3.1-70B-Instruct model card.

Downloads last month: 775

Safetensors
Model size: 37.4B params
Tensor type: F32 · FP16 · U8
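
The reported size of 37.4B parameters is lower than the "70B" in the model name. A likely reason is that bitsandbytes stores two 4-bit codes in each uint8 (U8) element, so packed tensors are counted at half their logical size. The arithmetic below is a back-of-the-envelope sketch; `NOMINAL_PARAMS` is a round figure taken from the model name, not a measured value, and the helper names are hypothetical.

```python
def weight_gigabytes(n_params: int, bits_per_param: int) -> float:
    """Storage for the weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


def packed_uint8_elements(n_params: int) -> int:
    """Number of uint8 elements needed when two 4-bit codes share one byte."""
    return (n_params + 1) // 2


# Assumption: round figure taken from the "70B" in the model name.
NOMINAL_PARAMS = 70_000_000_000

print(weight_gigabytes(NOMINAL_PARAMS, 16))   # weights stored at 16 bits each
print(weight_gigabytes(NOMINAL_PARAMS, 4))    # weights stored at 4 bits each
print(packed_uint8_elements(NOMINAL_PARAMS))  # element count if every weight were packed
```

Under these assumptions the packed count is 35B elements. The gap to the reported 37.4B would be consistent with tensors left unquantized (FP16) and per-block quantization statistics (F32), though the card does not break this down.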


Model tree for fsaudm/Meta-Llama-3.1-70B-Instruct-NF4

Base model: meta-llama/Llama-3.1-70B
Finetuned: meta-llama/Llama-3.1-70B-Instruct
Quantized (85): this model

Collection including fsaudm/Meta-Llama-3.1-70B-Instruct-NF4

Meta-Llama-3.1-Quantized

Collection of quantized Llama 3.1 models (8B and 70B versions for now), using bitsandbytes. 4 items, updated Aug 28.
