---
inference: false
datasets:
- bigcode/commitpackft
model-index:
- name: patched-coder-34b
  results:
  - task:
      type: text-generation
    dataset:
      type: openai_humaneval
      name: HumanEval
    metrics:
    - name: pass@1
      type: pass@1
      value: 53.567
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFix Python
    metrics:
    - name: pass@1
      type: pass@1
      value: 41.341
      verified: false
  - task:
      type: text-generation
    dataset:
      type: patched-codes/static-analysis-eval
      name: Static Analysis Eval
    metrics:
    - name: pass@1
      type: pass@1
      value: 51.316
      verified: false
license: llama2
---
	# Model Card for patched-coder-34b

This is an instruction fine-tuned model focused on the task of patching code. Patching may include fixing bugs, remediating security vulnerabilities,
performing API migrations, and other kinds of code maintenance.

	## Model Details

	### Model Description

	- Developed by: [codelion](https://huggingface.co/codelion)
	- Model type: Code Llama
	- Finetuned from model: [CodeLlama-34b-Python](https://huggingface.co/codellama/CodeLlama-34b-Python-hf)


	## How to Get Started with the Model

	Make sure to install Transformers from the main git branch:

	```bash
	pip install git+https://github.com/huggingface/transformers.git
	```
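A minimal sketch of loading the model and generating text once Transformers is installed. The repository id `patched-codes/patched-coder-34b` is an assumption (this card does not state it), as are the helper names `load_model` and `generate`, the `float16` dtype, and the `max_new_tokens` default. Note that `device_map="auto"` also requires the `accelerate` package.

```python
# Assumed repository id; replace it with the actual one if it differs.
MODEL_ID = "patched-codes/patched-coder-34b"


def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and model. Heavy imports are deferred to call time."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # assumed dtype; adjust to your hardware
        device_map="auto",          # spreads layers across available devices
    )
    return tokenizer, model


def generate(tokenizer, model, prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a completion and return only the newly produced text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

A 34B-parameter model in half precision needs substantial GPU memory, so loading it in quantized form may be more practical on smaller hardware.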

	## How to Prompt the Model

This model accepts the Alpaca instruction format.

	For example:

	```
	### Instruction:
	{instruction}

	### Input:
	{input}

	### Response:
	...
	```
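A minimal sketch of filling in this template from Python. The names `PROMPT_TEMPLATE` and `build_prompt` are hypothetical, and the blank-line layout is copied from the template above. The card does not say whether the `### Input:` section may be dropped when there is no input, so this sketch always emits it.

```python
# Template copied from the format shown above; the model's answer is
# expected to follow the final "### Response:" line.
PROMPT_TEMPLATE = (
    "### Instruction:\n"
    "{instruction}\n"
    "\n"
    "### Input:\n"
    "{input}\n"
    "\n"
    "### Response:\n"
)


def build_prompt(instruction: str, input_text: str) -> str:
    """Return the prompt with the instruction and input substituted in."""
    return PROMPT_TEMPLATE.format(instruction=instruction, input=input_text)


if __name__ == "__main__":
    print(build_prompt(
        "Fix the off-by-one error in the loop.",
        "for i in range(1, len(xs)):\n    print(xs[i])",
    ))
```

Because `str.format` does not re-parse substituted values, code snippets containing braces can be passed as `input_text` without escaping.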

	## Bias, Risks, and Limitations

	This model has undergone very limited testing. Additional safety testing should be performed before any real-world deployments.

	## Training Details

	- GPU: A100 80 GB
	- Time: ~8 hrs

	### Training Data

	The model was fine-tuned on [commitpackft](https://huggingface.co/datasets/bigcode/commitpackft), an open dataset consisting of commits.
We started with the commits for the `python` language from the dataset and then filtered for the commits that were related to fixing bugs.
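The filtering step could look roughly like the sketch below. The card does not document the actual criteria, so the keyword heuristic, the `BUG_FIX_PATTERN` regex, and the helper names are illustrative assumptions. The `subject` and `message` field names are also assumed to match the dataset's commit records.

```python
import re

# Hypothetical keyword heuristic; the filter actually used is not documented.
BUG_FIX_PATTERN = re.compile(
    r"\b(fix(es|ed)?|bugs?|patch(es|ed)?)\b", re.IGNORECASE
)


def is_bug_fix(commit: dict) -> bool:
    """Return True if the commit subject or message mentions a bug fix."""
    text = f"{commit.get('subject', '')} {commit.get('message', '')}"
    return BUG_FIX_PATTERN.search(text) is not None


def select_bug_fixes(commits):
    """Keep only the commits that look like bug fixes."""
    return [commit for commit in commits if is_bug_fix(commit)]


if __name__ == "__main__":
    sample = [
        {"subject": "Fix crash on empty input", "message": ""},
        {"subject": "Add README badge", "message": ""},
    ]
    print([c["subject"] for c in select_bug_fixes(sample)])
```

The word boundaries keep unrelated words such as "prefix" from matching.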

	### Training Procedure

The model was instruction fine-tuned to follow natural-language instructions related to code. We load the base model quantized to 4 bits
and then use QLoRA for Parameter-Efficient Fine-Tuning (PEFT) with Flash Attention. The model was trained for 2 epochs.

	#### Training Hyperparameters

	Training regime:

	The following `bitsandbytes` quantization config was used during training:
	- quant_method: bitsandbytes
	- load_in_8bit: False
	- load_in_4bit: True
	- llm_int8_threshold: 6.0
	- llm_int8_skip_modules: None
	- llm_int8_enable_fp32_cpu_offload: False
	- llm_int8_has_fp16_weight: False
	- bnb_4bit_quant_type: nf4
	- bnb_4bit_use_double_quant: True
	- bnb_4bit_compute_dtype: bfloat16
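These settings map onto the `BitsAndBytesConfig` class in Transformers roughly as follows. This is a sketch, not the authors' training script: the names `BNB_SETTINGS`, `COMPUTE_DTYPE`, and `make_bnb_config` are hypothetical. `quant_method` is left out on the assumption that the class records it itself rather than taking it as an argument.

```python
# Values copied from the list above. The compute dtype is kept as a string
# so this module can be imported without torch installed.
BNB_SETTINGS = {
    "load_in_8bit": False,
    "load_in_4bit": True,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
}
COMPUTE_DTYPE = "bfloat16"


def make_bnb_config():
    """Build the quantization config. Heavy imports are deferred to call time."""
    import torch
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(
        bnb_4bit_compute_dtype=getattr(torch, COMPUTE_DTYPE),
        **BNB_SETTINGS,
    )
```

The resulting object can be passed as `quantization_config=make_bnb_config()` to `AutoModelForCausalLM.from_pretrained` to load a model in 4-bit NF4 form.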

	## Evaluation

We evaluated the model on the `HumanEval` (code generation) and `HumanEvalFix Python` (bug fixing) benchmarks using the
[Code Generation LM Evaluation Harness](https://github.com/bigcode-project/bigcode-evaluation-harness).

	To evaluate the model for vulnerability remediation we used the `Static Analysis Eval` benchmark available [here](https://huggingface.co/datasets/patched-codes/static-analysis-eval).
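The metadata for this card reports pass@1 for each benchmark. As a reminder of how such a score is commonly computed, here is a sketch of the standard unbiased pass@k estimator, 1 - C(n - c, k) / C(n, k), averaged over problems. The function names and the sample counts in the example are illustrative, and the card does not say how many samples per problem were drawn for the reported numbers.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when fewer than k incorrect samples exist,
    # so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results, k: int = 1) -> float:
    """Mean pass@k over (n, c) pairs, expressed as a percentage."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Four hypothetical problems, one sample each, two of them solved.
    print(benchmark_score([(1, 1), (1, 0), (1, 1), (1, 0)]))
```

For k = 1 the estimator reduces to c / n, the fraction of generated samples that pass the tests.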

	### Results

| Model | HumanEval | HumanEvalFix Python | Static Analysis Eval |
| ----- | --------- | ------------------- | -------------------- |
| patched-coder-34b | 53.57 | 41.34 | 51.32 |
| CodeLlama-34b-Python | 53.29 | 33.14 | 27.63 |
| GPT-4 | 86.6 | 47 | 55.26 |

Based on the results on these benchmarks, patched-coder-34b is the state-of-the-art (SOTA) open code LLM. Other code LLMs (e.g. from WizardCoder and Phind) were either trained on
unknown proprietary datasets or used OpenAI's APIs for training, which makes them unviable for commercial use.