togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1

This is the togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1 model, but the model files were sharded to ~2 GB each so that the model can be loaded on low-RAM runtimes (like Colab).
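
For reference, a sharded checkpoint like this can be produced with transformers' save_pretrained and its max_shard_size argument. A minimal sketch, assuming enough RAM to load the full model (the output directory name is illustrative, and this is not necessarily the exact script used for this repo):

import torch
from transformers import AutoModelForCausalLM

# load the original checkpoint in bf16
model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1",
    torch_dtype=torch.bfloat16,
)
# max_shard_size caps each saved weight file at roughly 2 GB
model.save_pretrained(
    "RedPajama-INCITE-Instruct-7B-v0.1-sharded-bf16",
    max_shard_size="2GB",
)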

Please refer to the original model card for all details and known issues w.r.t. this model. Below is an adapted version of the inference code, provided as a reference.

basic inference

See the original model card for more options etc.

install packages

pip install -U transformers accelerate

inference (device_map="auto", enabled by accelerate, will place the model on a GPU if one is available):

import torch
import transformers
from packaging import version  # shipped as a dependency of transformers
from transformers import AutoTokenizer, AutoModelForCausalLM

MIN_TRANSFORMERS_VERSION = "4.25.1"

# check the transformers version (a plain string comparison is lexicographic
# and can wrongly pass on versions that are too old, so parse the versions)
assert version.parse(transformers.__version__) >= version.parse(
    MIN_TRANSFORMERS_VERSION
), f"Please upgrade transformers to version {MIN_TRANSFORMERS_VERSION} or higher."

model_name = "ethzanalytics/RedPajama-INCITE-Instruct-7B-v0.1-sharded-bf16"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# infer
prompt = "Q: The capital of France is?\nA:"
# move the tokenized prompt to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
input_length = inputs.input_ids.shape[1]
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.7,
    top_k=50,
    return_dict_in_generate=True,
)
# decode only the newly generated tokens, skipping the prompt
tokens = outputs.sequences[0, input_length:]
output_str = tokenizer.decode(tokens, skip_special_tokens=True)
print(output_str)
"""
Paris
"""