
danielpark/ko-llama-2-jindo-7b-instruct-4bit-128g-gptq model card

Development Status :: 2 - Pre-Alpha
Developed by MinWoo Park, 2023, Seoul, South Korea. Contact: [email protected].

Prompt Template

### System:
{System}

### User:
{User}

### Assistant:
{Assistant}
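
The template can also be filled programmatically. A minimal sketch (`build_prompt` is an illustrative helper, not part of this repository):

# Illustrative helper (not part of this repository): fill the prompt
# template above. The system section is optional and omitted when empty.
def build_prompt(user_message, system_message=""):
    prompt = ""
    if system_message:
        prompt += f"### System:\n{system_message}\n\n"
    prompt += f"### User:\n{user_message}\n\n### Assistant:\n"
    return prompt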

Inference

An Open In Colab notebook is available. Install AutoGPTQ for generation:

$ pip install auto-gptq
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM

# Set config
MODEL_NAME_OR_PATH = "danielpark/ko-llama-2-jindo-7b-instruct-4bit-128g-gptq"
MODEL_BASENAME = "gptq_model-4bit-128g"
USE_TRITON = False

# Load the quantized model and its tokenizer.
MODEL = AutoGPTQForCausalLM.from_quantized(
    MODEL_NAME_OR_PATH,
    model_basename=MODEL_BASENAME,
    use_safetensors=True,
    trust_remote_code=True,
    device="cuda:0",
    use_triton=USE_TRITON,
    quantize_config=None,
)
TOKENIZER = AutoTokenizer.from_pretrained(MODEL_NAME_OR_PATH, use_fast=True)


def generate_text_with_model(prompt):
    # Format the prompt with the template documented above (system section omitted).
    prompt_template = f"### User:\n{prompt}\n\n### Assistant:\n"
    input_ids = TOKENIZER(prompt_template, return_tensors='pt').input_ids.cuda()
    output = MODEL.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
    generated_text = TOKENIZER.decode(output[0])
    return generated_text

def generate_text_with_pipeline(prompt):
    # Silence verbose transformers logging during generation.
    logging.set_verbosity(logging.CRITICAL)
    pipe = pipeline(
        "text-generation",
        model=MODEL,
        tokenizer=TOKENIZER,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.15
    )
    # Use the same prompt template as above (system section omitted).
    prompt_template = f"### User:\n{prompt}\n\n### Assistant:\n"
    generated_text = pipe(prompt_template)[0]['generated_text']
    return generated_text

# Example
prompt_text = "What is GPTQ?"
generated_text_model = generate_text_with_model(prompt_text)
print(generated_text_model)

generated_text_pipeline = generate_text_with_pipeline(prompt_text)
print(generated_text_pipeline)
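
Note that the decoded output includes the prompt itself. A minimal sketch for extracting only the model's reply, assuming the template above (`extract_reply` is an illustrative helper, not part of this repository):

# Illustrative helper (not part of this repository): keep only the text
# that follows the final "### Assistant:" marker of the prompt template.
def extract_reply(generated_text):
    return generated_text.split("### Assistant:")[-1].strip()

print(extract_reply(generated_text_model))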

Web Demo

I implemented the web demos using several popular tools that make it easy to build web UIs quickly.

| model | web ui | quantized |
|---|---|---|
| danielpark/ko-llama-2-jindo-7b-instruct | gradio on colab | - |
| danielpark/ko-llama-2-jindo-7b-instruct-4bit-128g-gptq | text-generation-webui on colab | gptq |
| danielpark/ko-llama-2-jindo-7b-instruct-ggml | koboldcpp-v1.38 | ggml |
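
For reference, a minimal sketch of the gradio route, assuming the generate_text_with_model function from the Inference section is already defined (this is not the actual demo code):

# Minimal gradio demo sketch, assuming generate_text_with_model from above.
import gradio as gr

demo = gr.Interface(
    fn=generate_text_with_model,
    inputs=gr.Textbox(label="Prompt"),
    outputs=gr.Textbox(label="Generated text"),
    title="ko-llama-2-jindo-7b-instruct-4bit-128g-gptq",
)
demo.launch(share=True)  # share=True exposes a public link on Colab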