PandasSolver

PandasSolver is a fine-tuned generative text model with 7 billion parameters. It scores 54.98% on the DS-1000 Pandas Completion tasks, about 11 percentage points higher than GPT-4 (43.99%, tested with the August 2023 version).

Model Use

To use this model, please make sure to install transformers from main until the next version is released:

pip install git+https://github.com/huggingface/transformers.git@main accelerate

from transformers import AutoTokenizer
import transformers
import torch

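# Hugging Face Hub ID of the model
model = "Transform72/PandasSolver"

tokenizer = AutoTokenizer.from_pretrained(model)
# Load the weights in bfloat16 and let accelerate place them on the available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)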

sample_prompt = """
PROBLEM:
You have been given a dataset that contains information about students, including their names, ages, grades, and favorite subjects. You need to perform the following tasks using Pandas:

1. Load the dataset into a Pandas DataFrame named "students_df". The dataset is provided as a CSV file named "students.csv".

2. Find the maximum and minimum ages of the students.

3. Create a pivot table that shows the average grades of students for each favorite subject. The pivot table should have the subjects as columns and the average grades as values.

4. Calculate the sum of ages for students who have the same favorite subject.
"""

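# Sample with a low temperature so the output stays close to the most likely completion
sequences = pipeline(
    sample_prompt,
    do_sample=True,
    temperature=0.2,
    top_p=0.95,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=512,  # counts prompt tokens plus generated tokens
)
# generated_text contains the prompt followed by the model's completion
for seq in sequences:
    print(f"Result: {seq['generated_text']}")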
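
For comparison with whatever the model generates, the four tasks in the sample prompt can be solved by hand roughly as follows. This is a hand-written sketch, not model output. The column names (`name`, `age`, `grade`, `favorite_subject`) and the inline CSV rows are assumptions, since the prompt does not specify the schema of students.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for students.csv; the schema and rows are assumed.
csv_text = """name,age,grade,favorite_subject
Ann,14,90,Math
Bob,15,80,Math
Cara,14,70,Art
Dan,16,85,Art
"""

# 1. Load the dataset (for a real file: pd.read_csv("students.csv")).
students_df = pd.read_csv(io.StringIO(csv_text))

# 2. Maximum and minimum ages.
max_age = students_df["age"].max()
min_age = students_df["age"].min()

# 3. Pivot table with subjects as columns and average grades as values.
avg_grade_by_subject = students_df.pivot_table(
    values="grade", columns="favorite_subject", aggfunc="mean"
)

# 4. Sum of ages for students who share a favorite subject.
age_sum_by_subject = students_df.groupby("favorite_subject")["age"].sum()

print(max_age, min_age)
print(avg_grade_by_subject)
print(age_sum_by_subject)
```

A generated answer can be checked against these results on the same small dataset before it is trusted on real data.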

Model Details

Model Developers Transform72

Model Architecture PandasSolver is an auto-regressive language model that uses the Code Llama transformer architecture and is fine-tuned on WizardCoder.

Intended Use

Intended Use Cases Given its relatively small number of parameters, this model may need prompts as detailed as the sample prompt above to perform well.
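
One way to keep prompts consistently detailed is to build them from a description and an explicit task list. The helper below is a minimal sketch that mirrors the layout of the sample prompt (a `PROBLEM:` header followed by numbered tasks). The function name `build_prompt` and its parameters are hypothetical, and whether this exact layout is required by the model is not confirmed here.

```python
def build_prompt(context: str, tasks: list[str]) -> str:
    """Format a problem description and numbered tasks like the sample prompt."""
    # Leading empty string reproduces the newline that opens the sample prompt.
    lines = ["", "PROBLEM:", context.strip(), ""]
    for number, task in enumerate(tasks, start=1):
        lines.append(f"{number}. {task.strip()}")
        lines.append("")  # blank line between tasks
    return "\n".join(lines)


# Example usage with an invented problem statement.
prompt = build_prompt(
    "You have a CSV file named sales.csv with columns region, month and revenue. "
    "Perform the following tasks using Pandas:",
    [
        'Load the file into a DataFrame named "sales_df".',
        "Compute the total revenue for each region.",
    ],
)
print(prompt)
```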

Out-of-Scope Uses Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for [Code Llama](https://huggingface.co/codellama/CodeLlama-7b-Python-hf).

Training Data

About 24 million tokens were used to fine-tune the model, all of them high-quality Pandas question-and-answer pairs.

Evaluation Results

Performance on DS-1000 Completion:

Pandas Avg. Acc: 54.98%
Numpy Avg. Acc: 36.36%
Matplotlib Avg. Acc: 52.90%
Tensorflow Avg. Acc: 28.89%
Scipy Avg. Acc: 29.25%
Sklearn Avg. Acc: 25.22%
Pytorch Avg. Acc: 27.94%
DS-1000 Avg. Acc: 41.40%

Although the model was fine-tuned only on Pandas Q&A pairs, its accuracy also improved on every other library except Tensorflow:

Pandas Avg. Acc: +38.83%
Numpy Avg. Acc: +10.91%
Matplotlib Avg. Acc: +1.29%
Tensorflow Avg. Acc: -2.22%
Scipy Avg. Acc: +11.33%
Sklearn Avg. Acc: +6.09%
Pytorch Avg. Acc: +7.35%
DS-1000 Avg. Acc: +16.2%
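
The reference point for these changes is not named above. Assuming each change is a percentage-point difference against a single baseline model (an assumption, not something this card states), the implied baseline accuracies can be recovered by subtracting the changes from the scores, as in this short sketch:

```python
# Reported DS-1000 Completion accuracies (percent), copied from the list above.
scores = {
    "Pandas": 54.98, "Numpy": 36.36, "Matplotlib": 52.90, "Tensorflow": 28.89,
    "Scipy": 29.25, "Sklearn": 25.22, "Pytorch": 27.94, "DS-1000": 41.40,
}

# Reported changes, assumed here to be percentage-point differences.
deltas = {
    "Pandas": 38.83, "Numpy": 10.91, "Matplotlib": 1.29, "Tensorflow": -2.22,
    "Scipy": 11.33, "Sklearn": 6.09, "Pytorch": 7.35, "DS-1000": 16.2,
}

# Implied baseline = reported score minus reported change.
baseline = {lib: round(scores[lib] - deltas[lib], 2) for lib in scores}

for lib, value in baseline.items():
    print(f"{lib}: {value:.2f}% -> {scores[lib]:.2f}% ({deltas[lib]:+.2f})")
```

Under that assumption, the Pandas baseline works out to 16.15% and the Tensorflow baseline to 31.11%.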

Ethical Considerations and Limitations

PandasSolver is a new technology that carries risks with use. Testing conducted to date has been in English and has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, PandasSolver's potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate or objectionable responses to user prompts. Therefore, before deploying any application of PandasSolver, developers should perform safety testing and tuning tailored to their specific use of the model.