maddes8cht/adept-persimmon-8b-base-gguf

maddes8cht committed on Nov 1, 2023

Commit 81d35d0 • 1 Parent(s): fe2124c

"Update README.md"

Files changed (1)

README.md +7 -19

README.md CHANGED

@@ -9,21 +9,7 @@ I'm constantly enhancing these model descriptions to provide you with the most r
 - Model creator: [adept](https://huggingface.co/adept)
 - Original model: [persimmon-8b-base](https://huggingface.co/adept/persimmon-8b-base)
-# Important Update for Falcon Models in llama.cpp Versions After October 18, 2023
-As noted on the [Llama.cpp GitHub repository](https://github.com/ggerganov/llama.cpp#hot-topics), all new Llama.cpp releases after October 18, 2023, will require a re-quantization due to the new BPE tokenizer.
-**Good news!** I am glad that my re-quantization process for Falcon Models is nearly complete. Download the latest quantized models to ensure compatibility with recent llama.cpp software.
-**Key Points:**
-- **Stay Informed:** Keep an eye on software application release schedules using llama.cpp libraries.
-- **Monitor Upload Times:** Re-quantization is *almost* done. Watch for updates on my Hugging Face Model pages.
-**Important Compatibility Note:** Old software will work with old Falcon models, but expect updated software to exclusively support the new models.
-This change primarily affects **Falcon** and **Starcoder** models, with other models remaining unaffected.
@@ -35,19 +21,21 @@ The core project making use of the ggml library is the [llama.cpp](https://githu
 # Quantization variants
-There is a bunch of quantized files available. How to choose the best for you:
 # Legacy quants
 Q4_0, Q4_1, Q5_0, Q5_1 and Q8 are `legacy` quantization types.
 Nevertheless, they are fully supported, as there are several circumstances that cause certain model not to be compatible with the modern K-quants.
-Falcon 7B models cannot be quantized to K-quants.
 # K-quants
-K-quants are based on the idea that the quantization of certain parts affects the quality in different ways. If you quantize certain parts more and others less, you get a more powerful model with the same file size, or a smaller file size and lower memory load with comparable performance.
 So, if possible, use K-quants.
-With a Q6_K you should find it really hard to find a quality difference to the original model - ask your model two times the same question and you may encounter bigger quality differences.

 - Model creator: [adept](https://huggingface.co/adept)
 - Original model: [persimmon-8b-base](https://huggingface.co/adept/persimmon-8b-base)
+Persimmon is a large language model from Adept AI. It is trained from scratch with a context length of 16k, which is 4 times the context size of LLaMA2 and 8 times that of GPT-3.
 # Quantization variants
+There are a number of quantized files available to cater to your specific needs. Here's how to choose the best option for you:
 # Legacy quants
 Q4_0, Q4_1, Q5_0, Q5_1 and Q8 are `legacy` quantization types.
 Nevertheless, they are fully supported, as there are several circumstances that cause certain model not to be compatible with the modern K-quants.
+## Note:
+Now there's a new option to use K-quants even for previously 'incompatible' models, although this involves some fallback solution that makes them not *real* K-quants. More details can be found in affected model descriptions.
+(This mainly refers to Falcon 7b and Starcoder models)
 # K-quants
+K-quants are designed with the idea that different levels of quantization in specific parts of the model can optimize performance, file size, and memory load.
 So, if possible, use K-quants.
+With a Q6_K, you'll likely find it challenging to discern a quality difference from the original model - ask your model the same question twice and you may encounter bigger quality differences.
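
The K-quants paragraph in the new README text describes mixing quantization levels across parts of a model. As a rough illustration only, the sketch below estimates how such a mix changes the average bits per weight and the resulting file size. The 8B parameter count, the 1B/7B split between tensor groups, and the bit widths are hypothetical values chosen for the arithmetic; they are not taken from llama.cpp or from this repository's files.

```python
# Minimal sketch: average bits per weight and approximate size for a model
# whose tensor groups are stored at different precisions.
# All parameter counts and bit widths below are illustrative assumptions.

def effective_bits_per_weight(groups):
    """groups: list of (parameter_count, bits_per_weight) tuples."""
    total_params = sum(count for count, _ in groups)
    total_bits = sum(count * bits for count, bits in groups)
    return total_bits / total_params


def size_gib(groups):
    """Approximate storage size in GiB, ignoring metadata overhead."""
    total_bits = sum(count * bits for count, bits in groups)
    return total_bits / 8 / 2**30


# Hypothetical: every weight stored at 4.5 bits.
uniform = [(8_000_000_000, 4.5)]

# Hypothetical: 1B "sensitive" weights kept at 6.5 bits, the rest at 4.5 bits.
mixed = [(1_000_000_000, 6.5), (7_000_000_000, 4.5)]

if __name__ == "__main__":
    for name, groups in (("uniform", uniform), ("mixed", mixed)):
        print(f"{name}: {effective_bits_per_weight(groups):.2f} bits/weight, "
              f"{size_gib(groups):.2f} GiB")
```

Under these assumed numbers, keeping one eighth of the weights at a higher precision raises the average by a quarter of a bit per weight. That is the kind of trade-off between quality, file size, and memory load the paragraph refers to.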