
💫 Community Model> Llama 3 ChatQA 1.5 70B by NVIDIA

👾 LM Studio Community models highlights program. Highlighting new & noteworthy models by the community. Join the conversation on Discord.

Model creator: nvidia
Original model: Llama3-ChatQA-1.5-70B
GGUF quantization: provided by bartowski based on llama.cpp release b2777

Model Summary:

ChatQA 1.5 is a series of models trained to excel at RAG (retrieval augmented generation) tasks.
This model may work for general use, but it is primarily meant to be used as a context summarizer or context extractor.
Using the context provided after the system message, the model is able to provide contextual and accurate answers to queries.

Prompt Template:

For now, you'll need to make your own template. Choose the LM Studio Blank Preset in your LM Studio.

Then, set the system prompt to whatever you'd like (check the recommended one below), and set the following values:
System Message Prefix: 'System: '
User Message Prefix: '\n\nUser: '
User Message Suffix: '\n\nAssistant: <|begin_of_text|>'

If you want to provide context, place that in the system message suffix like so:

System Message Suffix: '\n\n{context}'

Under the hood, the model will see a prompt that's formatted like:

System: This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context.

This is some context

User: {Question}

Assistant: 

NVIDIA also seems to recommend starting your query with "Please give a full and complete answer for the question."
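To make the formatting above concrete, here is a minimal Python sketch that assembles the same prompt string by hand. It is only an illustration of the prefixes and suffixes listed in this card, not LM Studio's actual implementation. The names `build_prompt`, `DEFAULT_SYSTEM`, and `FULL_ANSWER_HINT` are hypothetical, and the sketch follows the formatted prompt shown above, so it does not append the `<|begin_of_text|>` token from the User Message Suffix setting.

```python
# Prefixes and suffixes taken from the template settings in this card.
SYSTEM_PREFIX = "System: "
SYSTEM_SUFFIX = "\n\n{context}"
USER_PREFIX = "\n\nUser: "
# Assumption: mirrors the formatted prompt shown above, which ends with
# "Assistant: " and omits the <|begin_of_text|> token from the suffix setting.
ASSISTANT_CUE = "\n\nAssistant: "

# Recommended system prompt, copied from the formatted prompt above.
DEFAULT_SYSTEM = (
    "This is a chat between a user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's "
    "questions based on the context. The assistant should also indicate when "
    "the answer cannot be found in the context."
)

# Optional query prefix mentioned in this card.
FULL_ANSWER_HINT = "Please give a full and complete answer for the question."


def build_prompt(question, context="", system=DEFAULT_SYSTEM, full_answer=False):
    """Assemble a single-turn prompt string (hypothetical helper)."""
    if full_answer:
        question = f"{FULL_ANSWER_HINT} {question}"
    prompt = SYSTEM_PREFIX + system
    if context:
        # The context goes right after the system message, as a suffix.
        prompt += SYSTEM_SUFFIX.format(context=context)
    prompt += USER_PREFIX + question + ASSISTANT_CUE
    return prompt


if __name__ == "__main__":
    print(build_prompt("{Question}", context="This is some context"))
```

Calling `build_prompt("{Question}", context="This is some context")` yields the same layout as the formatted prompt above; passing `full_answer=True` prepends the recommended phrase to the question.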

Technical Details

Llama3-ChatQA-1.5 excels at conversational question answering (QA) and retrieval-augmented generation (RAG). It is developed using an improved training recipe from ChatQA (1.0) and is built on top of the Llama-3 base model.
Specifically, more conversational QA data was added to the training mix to enhance its tabular and arithmetic calculation capability.

Special thanks

πŸ™ Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.

πŸ™ Special thanks to Kalomaze for his dataset (linked here) that was used for calculating the imatrix for the IQ1_M and IQ2_XS quants, which makes them usable even at their tiny size!

Disclaimers

LM Studio is not the creator, originator, or owner of any Model featured in the Community Model Program. Each Community Model is created and provided by third parties. LM Studio does not endorse, support, represent or guarantee the completeness, truthfulness, accuracy, or reliability of any Community Model. You understand that Community Models can produce content that might be offensive, harmful, inaccurate or otherwise inappropriate, or deceptive. Each Community Model is the sole responsibility of the person or entity who originated such Model. LM Studio may not monitor or control the Community Models and cannot, and does not, take responsibility for any such Model. LM Studio disclaims all warranties or guarantees about the accuracy, reliability or benefits of the Community Models. LM Studio further disclaims any warranty that the Community Model will meet your requirements, be secure, uninterrupted or available at any time or location, or error-free, viruses-free, or that any errors will be corrected, or otherwise. You will be solely responsible for any damage resulting from your use of or access to the Community Models, your downloading of any Community Model, or use of any other Community Model provided by or through LM Studio.

GGUF

Model size: 70.6B params
Architecture: llama
Quantizations: 1-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit
