Quantization made by Richard Erkhov.

Llama-3-Patronus-Lynx-8B-Instruct-v1.1 - GGUF

Model creator: https://huggingface.co/PatronusAI/
Original model: https://huggingface.co/PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct-v1.1/

Name	Quant method	Size
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q2_K.gguf	Q2_K	2.96GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.IQ3_XS.gguf	IQ3_XS	3.28GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.IQ3_S.gguf	IQ3_S	3.43GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q3_K_S.gguf	Q3_K_S	3.41GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.IQ3_M.gguf	IQ3_M	3.52GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q3_K.gguf	Q3_K	3.74GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q3_K_M.gguf	Q3_K_M	3.74GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q3_K_L.gguf	Q3_K_L	4.03GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.IQ4_XS.gguf	IQ4_XS	4.18GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q4_0.gguf	Q4_0	4.34GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.IQ4_NL.gguf	IQ4_NL	4.38GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q4_K_S.gguf	Q4_K_S	4.37GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q4_K.gguf	Q4_K	4.58GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q4_K_M.gguf	Q4_K_M	4.58GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q4_1.gguf	Q4_1	4.78GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q5_0.gguf	Q5_0	5.21GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q5_K_S.gguf	Q5_K_S	3.92GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q5_K.gguf	Q5_K	3.82GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q5_K_M.gguf	Q5_K_M	5.34GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q5_1.gguf	Q5_1	5.34GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q6_K.gguf	Q6_K	5.92GB
Llama-3-Patronus-Lynx-8B-Instruct-v1.1.Q8_0.gguf	Q8_0	5.93GB
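
The file names in the table follow one pattern, so a specific quantization can be fetched programmatically. The sketch below assumes the `huggingface_hub` package is installed. The `repo_id` argument is a placeholder for whichever repository hosts these GGUF files (this card does not state it), and `gguf_filename` and `download_quant` are hypothetical helper names.

```python
MODEL_BASENAME = "Llama-3-Patronus-Lynx-8B-Instruct-v1.1"


def gguf_filename(quant: str) -> str:
    """Build the file name used in the table above for a quant method."""
    return f"{MODEL_BASENAME}.{quant}.gguf"


def download_quant(repo_id: str, quant: str) -> str:
    """Download one GGUF file and return its local path.

    repo_id is an assumption: pass the Hugging Face repository that
    actually hosts these files.
    """
    # Imported lazily so the name helper works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=gguf_filename(quant))


print(gguf_filename("Q4_K_M"))
```

Calling `download_quant` with the hosting repository and a quant method such as `"Q4_K_M"` should return the path of the cached file, which can then be passed to a GGUF-compatible runtime.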

Original model description:

library_name: transformers
tags:
- text-generation
- pytorch
- Lynx
- Patronus AI
- evaluation
- hallucination-detection
license: cc-by-nc-4.0
language:
- en

Model Card for Model ID

Lynx is an open-source hallucination evaluation model. Patronus-Lynx-8B-Instruct-v1.1 was trained on a mix of datasets including CovidQA, PubmedQA, DROP, and RAGTruth. The datasets contain both hand-annotated and synthetic data. The maximum sequence length is 128000 tokens.

Model Details

Model Type: Patronus-Lynx-8B-Instruct-v1.1 is a fine-tuned version of the meta-llama/Meta-Llama-3.1-8B-Instruct model.
Language: Primarily English
Developed by: Patronus AI
Paper: https://arxiv.org/abs/2407.08488
License: https://creativecommons.org/licenses/by-nc/4.0/

Model Sources

Repository: https://github.com/patronus-ai/Lynx-hallucination-detection

How to Get Started with the Model

Lynx is trained to detect hallucinations in RAG settings. Given a document, a question, and an answer, the model evaluates whether the answer is faithful to the document.

We recommend the following prompt:

PROMPT = """
Given the following QUESTION, DOCUMENT and ANSWER you must analyze the provided answer and determine whether it is faithful to the contents of the DOCUMENT. The ANSWER must not offer new information beyond the context provided in the DOCUMENT. The ANSWER also must not contradict information provided in the DOCUMENT. Output your final verdict by strictly following this format: "PASS" if the answer is faithful to the DOCUMENT and "FAIL" if the answer is not faithful to the DOCUMENT. Show your reasoning.

--
QUESTION (THIS DOES NOT COUNT AS BACKGROUND INFORMATION):
{question}

--
DOCUMENT:
{context}

--
ANSWER:
{answer}

--

Your output should be in JSON FORMAT with the keys "REASONING" and "SCORE":
{{"REASONING": <your reasoning as bullet points>, "SCORE": <your final score>}}
"""

The model outputs the score as "PASS" if the answer is faithful to the document or "FAIL" if it is not.
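
Since the prompt asks for a JSON object with "REASONING" and "SCORE" keys, the verdict can be extracted from the raw text. The following is a minimal sketch, not part of the original model card: `parse_verdict` is a hypothetical helper, the sample output string is made up for illustration, and the regular-expression fallback is an assumption about how malformed output might look.

```python
import json
import re


def parse_verdict(raw: str) -> str:
    """Return "PASS" or "FAIL" from the model's raw text output."""
    try:
        # Preferred path: the output is the JSON object the prompt asks for.
        score = str(json.loads(raw)["SCORE"])
    except (json.JSONDecodeError, KeyError, TypeError):
        # Fallback: look for the SCORE key anywhere in the text.
        match = re.search(r'"SCORE"\s*:\s*"?(PASS|FAIL)"?', raw, re.IGNORECASE)
        if match is None:
            raise ValueError("no PASS/FAIL score found in model output")
        score = match.group(1)
    score = score.strip().upper()
    if score not in {"PASS", "FAIL"}:
        raise ValueError(f"unexpected score: {score!r}")
    return score


# Made-up sample output, for illustration only.
sample = '{"REASONING": ["The document does not mention the date."], "SCORE": "FAIL"}'
print(parse_verdict(sample))
```

Raising an error on anything other than PASS or FAIL keeps unparseable generations from being silently counted as one verdict or the other.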

Inference

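To run inference, you can use the Hugging Face pipeline:

from transformers import pipeline

model_name = 'PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct-v1.1'
pipe = pipeline(
    "text-generation",
    model=model_name,
    max_new_tokens=600,
    device="cuda",
    return_full_text=False,
)

# Fill the PROMPT template above; question, context, and answer are strings you supply.
prompt = PROMPT.format(question=question, context=context, answer=answer)
messages = [
    {"role": "user", "content": prompt},
]

result = pipe(messages)
print(result[0]['generated_text'])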

Since the model is trained in chat format, ensure that you pass the prompt as a user message.
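
The same rule applies when running one of the GGUF files listed above rather than the original weights. The sketch below assumes the `llama-cpp-python` package and a locally downloaded GGUF file. The `model_path` argument, the `n_ctx` value, and the helper names are illustrative choices, not settings from the model authors.

```python
def as_user_message(prompt: str) -> list[dict]:
    """Wrap a filled-in prompt as a single-turn chat with one user message."""
    return [{"role": "user", "content": prompt}]


def evaluate_with_gguf(model_path: str, prompt: str, max_tokens: int = 600) -> str:
    """Run one chat completion against a local GGUF file and return the text."""
    # Imported lazily: llama-cpp-python is an optional, assumed dependency.
    from llama_cpp import Llama

    # n_ctx=8192 is an arbitrary illustrative context size.
    llm = Llama(model_path=model_path, n_ctx=8192)
    response = llm.create_chat_completion(
        messages=as_user_message(prompt),
        max_tokens=max_tokens,
    )
    return response["choices"][0]["message"]["content"]


print(as_user_message("example prompt"))
```

Using the chat completion call rather than plain text completion lets the runtime apply a chat template around the user message, which is what the chat-format requirement above asks for.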

For training details, refer to our arXiv paper (https://arxiv.org/abs/2407.08488).

Evaluation

The model was evaluated on PatronusAI/HaluBench.

Model	HaluEval	RAGTruth	FinanceBench	DROP	CovidQA	PubmedQA	Overall
GPT-4o	87.9%	84.3%	85.3%	84.3%	95.0%	82.1%	86.5%
GPT-4-Turbo	86.0%	85.0%	82.2%	84.8%	90.6%	83.5%	85.0%
GPT-3.5-Turbo	62.2%	50.7%	60.9%	57.2%	56.7%	62.8%	58.7%
Claude-3.5-Sonnet	84.5%	79.1%	69.3%	69.7%	70.8%	84.8%	83.7%
RAGAS Faithfulness	70.6%	75.8%	59.5%	59.6%	75.0%	67.7%	66.9%
Mistral-Instruct-7B	78.3%	77.7%	56.3%	56.3%	71.7%	77.9%	69.4%
Llama-3-Instruct-8B	83.1%	80.0%	55.0%	58.2%	75.2%	70.7%	70.4%
Llama-3-Instruct-70B	87.0%	83.8%	72.7%	69.4%	85.0%	82.6%	80.1%
Lynx (8B)	85.7%	80.0%	72.5%	77.8%	96.3%	85.2%	82.9%
Lynx v1.1 (8B)	87.3%	79.9%	75.6%	77.5%	96.9%	88.9%	84.3%
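
As a rough illustration of how a per-dataset score like those in the table could be computed, the sketch below measures the fraction of examples where the predicted verdict matches a gold label. It assumes the percentages are accuracy over PASS/FAIL labels, which this card does not state explicitly; the predictions and labels shown are made-up placeholders, not HaluBench data.

```python
def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Fraction of examples where the predicted verdict equals the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")
    correct = sum(pred == gold for pred, gold in zip(predictions, labels))
    return correct / len(labels)


# Placeholder verdicts and labels, for illustration only.
preds = ["PASS", "FAIL", "PASS", "FAIL"]
gold = ["PASS", "FAIL", "FAIL", "FAIL"]
print(f"{accuracy(preds, gold):.1%}")
```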

Citation

If you use the model, please cite:

@article{ravi2024lynx,
  title={Lynx: An Open Source Hallucination Evaluation Model},
  author={Ravi, Selvan Sunitha and Mielczarek, Bartosz and Kannappan, Anand and Kiela, Douwe and Qian, Rebecca},
  journal={arXiv preprint arXiv:2407.08488},
  year={2024}
}

Model Card Contact

@sunitha-ravi @RebeccaQian1 @presidev