Alternative quantizations.
These are my own quantizations (updated almost daily).
The difference from normal quantizations is that I quantize the output and embedding tensors to f16, and the remaining tensors to q5_k, q6_k, or q8_0.
This produces models with little to no quality degradation while keeping the file size small.
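As a rough sketch, a mix like this can be produced with llama.cpp's llama-quantize tool through its --token-embedding-type and --output-tensor-type options (the exact flag and type names depend on your llama.cpp version, and the file names here are only placeholders):
llama-quantize --token-embedding-type f16 --output-tensor-type f16 gemma-2-9b-it.f16.gguf gemma-2-9b-it.q5_k.gguf Q5_K_M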
These models run at about 3-6 tokens/sec on CPU only with llama.cpp, and obviously faster on machines with capable GPUs.
Example usage:
llama-cli -m /content/gemma-2-9b-it.q5_k.gguf -t 2 -ngl 99 -p "User: Hi\nBot:Hi\nUser: Tell me all you know about LLMs in 1000 tokens.\nBot:"
Large Language Models (LLMs) are a type of artificial intelligence (AI) that excel at understanding and generating human-like text. They are trained on massive datasets of text and code, enabling them to learn patterns, grammar, and contextual nuances of language.
Key Characteristics of LLMs:
- Generative: LLMs can create new text, translate languages, write different kinds of creative content, and answer your questions in an informative way.
- Contextual Understanding: They can analyze text and understand the relationships between words and sentences, allowing for more coherent and meaningful responses.
- Scale and Training Data: LLMs are typically trained on vast amounts of data, which is crucial for their performance and ability to generalize to new tasks.
- Transformer Architecture: Many powerful LLMs, like GPT-3 and BERT, are based on the transformer architecture, which allows them to process and understand long-range dependencies in text.
Applications of LLMs:
- Chatbots and Conversational AI: LLMs power chatbots that can engage in natural-sounding conversations with humans.
- Text Generation: They can generate creative content such as stories, poems, articles, and marketing copy.
- Language Translation: LLMs can translate text from one language to another with high accuracy.
- Code Generation: Some LLMs have been trained on code and can assist developers in writing and debugging code.
- Summarization and Information Extraction: LLMs can summarize large amounts of text and extract key information.
Challenges and Considerations:
- Bias and Fairness: LLMs can inherit biases present in the training data, leading to unfair or discriminatory outputs.
- Explainability: It can be difficult to understand how LLMs arrive at their outputs, which can raise concerns about transparency and accountability.
- Misinformation and Malicious Use: LLMs can be used to generate convincing fake news, propaganda, or spam.
Future Directions:
Research in LLMs is rapidly progressing, with ongoing efforts to address the challenges and explore new applications. Some key areas of development include:
- Improving Fairness and Bias Mitigation: Techniques are being developed to identify and mitigate biases in LLMs.
- Enhancing Explainability: Researchers are working on methods to make LLM decision-making more transparent.
- Multimodality: Integrating LLMs with other modalities, such as vision and audio, to enable more comprehensive understanding and generation.
Let me know if you have any more questions about LLMs! [end of text]
Statistics on Colab, CPU only:
llama_print_timings: load time = 51762.26 ms
llama_print_timings: sample time = 226.65 ms / 522 runs ( 0.43 ms per token, 2303.16 tokens per second)
llama_print_timings: prompt eval time = 27039.12 ms / 30 tokens ( 901.30 ms per token, 1.11 tokens per second)
llama_print_timings: eval time = 627527.87 ms / 521 runs ( 1204.47 ms per token, 0.83 tokens per second)
llama_print_timings: total time = 656354.94 ms / 551 tokens
Log end
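In the example command above, -m points at the GGUF file, -t sets the number of CPU threads, -ngl 99 offloads all layers to the GPU when llama.cpp is built with GPU support (it has no effect on a CPU-only build), and -p is the prompt. A CPU-only variant could look like the sketch below; the thread count, token limit (-n), and prompt are only examples:
llama-cli -m /content/gemma-2-9b-it.q5_k.gguf -t 8 -n 512 -p "User: Tell me all you know about LLMs in 1000 tokens.\nBot:"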