Introduction
This model is a quantized version of Universal-NER/UniNER-7B-all.
Quantization
Quantization was performed with LLM Compressor, using 512 random examples from the Universal-NER/Pile-NER-definition dataset as calibration data.
The recipe for quantization:
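```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
]
```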
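`W4A16` in the recipe means 4-bit weights with 16-bit activations. Only the weights are stored as 4-bit integers, each group sharing a floating-point scale, and they are typically expanded back to 16-bit during the matrix multiplication while activations stay in 16-bit precision. The sketch below illustrates that storage format using plain round-to-nearest quantization in pure Python. This is a simpler, swapped-in technique and not GPTQ itself, which additionally uses the calibration data to adjust the not-yet-quantized weights and compensate for each rounding error. The group size of 4 is only there to keep the example small. We believe the LLM Compressor `W4A16` preset uses symmetric groups of 128, but check its documentation before relying on that.

```python
# Minimal sketch of symmetric, per-group 4-bit weight quantization using
# round-to-nearest. GPTQ's error-compensation step is intentionally omitted.
from typing import List, Tuple

INT4_MIN, INT4_MAX = -8, 7


def quantize_group(weights: List[float]) -> Tuple[List[int], float]:
    """Map one group of weights to signed 4-bit integers plus one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
    q = [max(INT4_MIN, min(INT4_MAX, round(w / scale))) for w in weights]
    return q, scale


def dequantize_group(q: List[int], scale: float) -> List[float]:
    """Recover approximate weights; activations themselves are never quantized."""
    return [v * scale for v in q]


def quantize_row(row: List[float], group_size: int) -> List[Tuple[List[int], float]]:
    """Split a weight row into groups and quantize each group independently."""
    return [quantize_group(row[i:i + group_size]) for i in range(0, len(row), group_size)]


row = [0.12, -0.50, 0.33, 0.07, -0.91, 0.44, 0.02, -0.18]
groups = quantize_row(row, group_size=4)
restored = [w for q, s in groups for w in dequantize_group(q, s)]
for original, approx in zip(row, restored):
    print(f"{original:+.3f} -> {approx:+.3f}")
```

Each group stores four 4-bit integers plus a single scale. The rounding error per weight is at most half a quantization step (`scale / 2`), which is why smaller groups generally trade a little extra storage for better accuracy.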
Inference
We added a chat template to the tokenizer, so, unlike the original model, this model can be used directly with vLLM without any additional prompt preprocessing.
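For comparison, the original UniNER-7B-all model expects a hand-built plain-text conversation prompt rather than chat messages. The sketch below builds that prompt manually. It assumes the format described on the original UniNER model card: a one-line preamble followed by `USER:`/`ASSISTANT:` turns, with one prompt per entity type. `build_uniner_prompt` and `build_prompts` are hypothetical helpers, and the exact string rendered by the bundled chat template may differ in details such as whitespace or apostrophe style.

```python
# Hypothetical helpers: build UniNER-style prompts by hand, assuming the
# conversation format from the original UniNER model card. With the chat
# template bundled in this repository, this step is not needed.
from typing import List

SYSTEM_PREAMBLE = (
    "A virtual assistant answers questions from a user based on the provided text."
)


def build_uniner_prompt(text: str, entity_type: str) -> str:
    """Return one prompt asking for all mentions of a single entity type."""
    turns = [
        SYSTEM_PREAMBLE,
        f"USER: Text: {text}",
        "ASSISTANT: I've read this text.",
        f"USER: What describes {entity_type} in the text?",
        "ASSISTANT:",  # the model continues from here with a JSON list
    ]
    return "\n".join(turns)


def build_prompts(text: str, entity_types: List[str]) -> List[str]:
    """Build one prompt per entity type, mirroring the loop in the vLLM example."""
    return [build_uniner_prompt(text, t) for t in entity_types]


prompts = build_prompts("Some long text with multiple entities", ["person", "location"])
print(prompts[0])
```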
Example:
```python
import json

from vllm import LLM, SamplingParams

# Load the model and its tokenizer (which includes the chat template)
llm = LLM(model="daisd-ai/UniNER-W4A16")
tokenizer = llm.get_tokenizer()
sampling_params = SamplingParams(temperature=0, max_tokens=256)

# Define the text and the entity types to extract
text = "Some long text with multiple entities"
entities_types = ["entity type 1", "entity type 2"]

# Build one prompt per entity type with the tokenizer's chat template
prompts = []
for entity_type in entities_types:
    messages = [
        {
            "role": "user",
            "content": f"Text: {text}",
        },
        {"role": "assistant", "content": "I've read this text."},
        {"role": "user", "content": f"What describes {entity_type} in the text?"},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    prompts.append(prompt)

# Run inference
outputs = llm.generate(prompts, sampling_params)
outputs = [output.outputs[0].text for output in outputs]

# Results are returned in JSON format; parse them into Python lists
results = []
for lst in outputs:
    try:
        entities = list(set(json.loads(lst)))
    except (json.JSONDecodeError, TypeError):
        # Malformed output, or JSON that cannot be turned into a set
        entities = []
    results.append(entities)
```
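The parsing loop above keeps results in a plain list, so the link to each entity type is only implicit, and passing through `set` also discards the order in which the model listed the entities. A minimal alternative sketch follows, assuming each output is meant to be a JSON list of strings (as the example's comment states) and that outputs come back in the same order as the prompts. `parse_outputs` is a hypothetical helper: it keys results by entity type, removes duplicates while keeping first-seen order, and treats malformed or unexpected output as an empty list.

```python
# Hypothetical helper: turns raw model outputs into {entity_type: [entities]},
# assuming each output should be a JSON list of strings.
import json
from typing import Dict, List


def parse_outputs(entity_types: List[str], outputs: List[str]) -> Dict[str, List[str]]:
    """Pair each output with its entity type and parse it defensively."""
    results: Dict[str, List[str]] = {}
    for entity_type, raw in zip(entity_types, outputs):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = []  # malformed or truncated output
        if not isinstance(parsed, list):
            parsed = []  # valid JSON, but not the expected list
        strings = [item for item in parsed if isinstance(item, str)]
        # dict.fromkeys drops duplicates but keeps first-seen order
        results[entity_type] = list(dict.fromkeys(strings))
    return results


example = parse_outputs(
    ["person", "location", "date"],
    ['["Alice", "Bob", "Alice"]', '["Paris"]', "not json"],
)
print(example)
```

Keying by entity type makes the result self-describing, and ignoring non-string items avoids the `TypeError` that `set` raises on unhashable values such as nested lists.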