SandLogic Technologies - Quantized Nxcode-CQ-7B-orpo Models

Model Description

We have quantized the Nxcode-CQ-7B-orpo model into two variants:

  1. Q5_K_M
  2. Q4_K_M

These quantized models offer improved efficiency while maintaining performance.

Discover our full range of quantized language models by visiting our SandLogic Lexicon GitHub repository. To learn more about our company and services, visit the SandLogic website.

Original Model Information

  • Name: Nxcode-CQ-7B-orpo
  • Base Model: Qwen/CodeQwen1.5-7B
  • Fine-tuning Approach: Monolithic Preference Optimization without Reference Model
  • Fine-tuning Data: 100k samples of high-quality ranking data
  • Model Type: Transformer-based decoder-only language model
  • Parameters: 7 billion
  • Context Length: 64K tokens
  • Supported Languages: 92 coding languages

Model Capabilities

Nxcode-CQ-7B-orpo is designed for code-related tasks, with strong performance in:

  • Code generation
  • Long context understanding and generation
  • Text-to-SQL conversion
  • Bug fixing

Performance

Evalplus benchmark results:

  • HumanEval pass@1: 86.6
  • HumanEval+ pass@1: 83.5
  • MBPP (v0.2.0) pass@1: 82.3
  • MBPP+ (v0.2.0) pass@1: 70.4

Use Cases

  1. Code Generation: Create Python code based on function descriptions or partial implementations
  2. Code Completion: Suggest completions for partially written code
  3. Error Understanding: Help identify and explain coding errors
  4. Programming Education: Provide explanations and examples of coding concepts and patterns

Model Variants

We offer two quantized versions of the Nxcode-CQ-7B-orpo model:

  1. Q5_K_M: 5-bit quantization using the K_M (medium) method
  2. Q4_K_M: 4-bit quantization using the K_M (medium) method

These quantized models aim to reduce model size and improve inference speed while maintaining performance as close to the original model as possible.

Input and Output

  • Input: Text string (e.g., function descriptions, partial code implementations)
  • Output: Generated code, completions, or explanations based on the input
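The high-level chat API in llama-cpp-python returns an OpenAI-style dictionary. A minimal sketch of that output shape and how to extract the generated text (the response content shown here is made up for illustration):

```python
# Illustrative shape of the OpenAI-style dict returned by
# llama-cpp-python's create_chat_completion (content is made up).
response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "def is_prime(n): ..."},
            "finish_reason": "stop",
        }
    ]
}

def extract_reply(resp: dict) -> str:
    """Pull the generated text out of a chat-completion response."""
    return resp["choices"][0]["message"]["content"]

print(extract_reply(response))
```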

Usage

pip install llama-cpp-python 

Please refer to the llama-cpp-python documentation to install with GPU support.
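For example, to build with CUDA support (the `CMAKE_ARGS` flag follows the llama-cpp-python build documentation; adjust it for your backend):

```shell
# Reinstall with CUDA (GPU) support; requires the CUDA toolkit to be present.
CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```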

Basic Text Completion

Here's an example demonstrating how to use the high-level chat completion API:

from llama_cpp import Llama

llm = Llama(
    model_path="./models/7B/Nxcode-CQ-7b.gguf",
    verbose=False,
    # n_gpu_layers=-1, # Uncomment to use GPU acceleration
    # n_ctx=2048, # Uncomment to increase the context window
)

output = llm.create_chat_completion(
    messages = [
        {"role": "system", "content": "You're an AI coding assistant who helps in solving coding questions"},
        {
            "role": "user",
            "content": "Write a Python program to find prime numbers"
        }
    ]
)

print(output["choices"][0]["message"]["content"])
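The same high-level API can be applied to the model's other tasks, such as text-to-SQL. A minimal sketch, assuming the same GGUF file as above; the schema, question, and prompt wording are illustrative, and the model call is skipped when the file is absent:

```python
from pathlib import Path

def build_sql_messages(schema: str, question: str) -> list:
    """Build an OpenAI-style message list for a text-to-SQL request."""
    return [
        {"role": "system",
         "content": "You are an AI assistant that translates natural-language questions into SQL."},
        {"role": "user",
         "content": f"Schema:\n{schema}\n\nQuestion: {question}"},
    ]

MODEL_PATH = "./models/7B/Nxcode-CQ-7b.gguf"  # illustrative path

messages = build_sql_messages(
    "CREATE TABLE employees (id INT, name TEXT, salary INT, dept TEXT);",
    "List the names of employees in the Sales department earning over 50000.",
)

if Path(MODEL_PATH).exists():
    from llama_cpp import Llama
    llm = Llama(model_path=MODEL_PATH, verbose=False)
    output = llm.create_chat_completion(messages=messages, max_tokens=128)
    print(output["choices"][0]["message"]["content"])
```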

Download

You can download the quantized models in GGUF format directly from Hugging Face using the from_pretrained method. This feature requires the huggingface-hub package.

To install it, run: pip install huggingface-hub

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="SandLogicTechnologies/Nxcode-CQ-7B-orpo-GGUF",
    filename="*Nxcode-CQ-7B-orpo-Q5_K_M.gguf",
    verbose=False
)

By default, from_pretrained will download the model to the Hugging Face cache directory. You can manage installed model files using the huggingface-cli tool.
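For instance, to inspect or prune cached models (both commands are part of the huggingface-hub CLI):

```shell
# List everything in the Hugging Face cache, with sizes and paths.
huggingface-cli scan-cache

# Interactively select cached revisions to delete.
huggingface-cli delete-cache
```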

Acknowledgements

We thank the original developers of Nxcode-CQ-7B-orpo and Qwen/CodeQwen1.5-7B for their contributions to the field. Special thanks to Georgi Gerganov and the entire llama.cpp development team for their outstanding contributions.

Contact

For any inquiries or support, please contact us at [email protected] or visit our support page.
