This model looks insanely good for coding ( 73.2 for humanEval )!

#1
by mirek190 - opened

Have to make tests later ....

HumanEval 73.2 is insane ... that is better than older GPT-4 .

Right now I can say that codellama-34b is total crap compared to this wizard model.

My command for wizardcoder (this one works best for me, for no obvious reason):

main --model models\new3\wizardcoder-python-34b-v1.0.Q4_K_M.gguf --mlock --color --threads 16 --keep -1 --batch_size 512 --n_predict -1 --top_k 40 --top_p 0.9 --temp 0.96 --repeat_penalty 1.1 --ctx_size 4096 --interactive --instruct --reverse-prompt "### Human:" -ngl 51 --simple-io --in-prefix " ### Instruction: " --in-suffix " ### Response: " -p "Below is an instruction that describes a task. Write a response that appropriately completes the request. "

Build a webpage in one html file with html, css and javascript code, the webpage should have 10 jokes and the button that changing background colors randomly.

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <title>Joke Page</title>
    <style>
        body {
            background-color: #f5f5f5f5;
        }
        .joke {
            width: 70%;
            margin: auto;
            padding: 2rem;
            border: 1px solid black;
            border-radius: 10px;
            box-shadow: 5px 5px 10px grey;
        }
        h1 {
            text-align: center;
            background-color: #6d73ae;
            color: white;
            padding: 1rem;
            border-radius: 10px;
            margin: 2rem 0;
        }
        button {
            display: block;
            margin: 2rem auto;
            padding: 1rem;
            background-color: #6d73ae;
            color: white;
            border-radius: 10px;
            font-size: 1.5rem;
            border: none;
        }
    </style>
</head>
<body>
    <h1>Joke Page</h1>
    <div class="joke">
        <p id="joke">Click on the button below to see a random joke!</p>
    </div>
    <button onclick="changeBackgroundColor()">Change Color</button>

    <button onclick="getRandomJoke()">Get Random Joke</button>

    <script>
        function getRandomJoke() {
            const jokes = [
                "What did the chemist say when he found two helium molecules? HeHe made a noble gas.",
                "Why are mountains so funny? Because they are hill areas.",
                "How do you make a tissue dance? Put a little boogie in it!",
                "What is the difference between ignorance and apathy? I don't know.",
                "What's the best thing about Switzerland? Not sure, but the flag is a big plus.",
                "Why can't you trust an atom? Because they make up everything!",
                "What does a baby grape say when it gets stuck in a grapefruit juicer? 'Eeeeghh!'",
                "What do you call a guy with a rubber toe? Roberto.",
                "Why did the physics teacher lose her job? She didn't have enough momentum.",
                "How does a dog stop a video? By pressing the pause button.",
                "I'm not a big fan of stairs, they always make me feel like I'm going up in life.",
                "What kind of shoes do ninjas wear? Sneakers."
            ];
            const joke = document.getElementById("joke");
            joke.innerHTML = jokes[Math.floor(Math.random() * jokes.length)];
        }
        function changeBackgroundColor() {
            let colors = ["red", "green", "blue", "yellow", "orange", "purple", "pink"];
            document.body.style.backgroundColor = colors[Math.floor(Math.random() * colors.length)];
        }
    </script>
</body>
</html>

So far it works ... the result above is GOOD.

mirek190 changed discussion title from This model looks insanely good for coding ..... to This model looks insanely good for coding ( 73.2 for humanEval )!

Nah, I tried it and it's not even as good for coding as the current ChatGPT-3.5.
I stay away from anything with the word "Wizard" in it: they have lots of data, but it's a very low-quality dataset.

My hope is jondurbin's 34b model. He has historically been creating the only competitive language models: see my post analyzing the performance of his 70B model: https://huggingface.co/jondurbin/airoboros-l2-70b-gpt4-1.4.1/discussions/4#64c396ed7918ee895ba54f53

You know that model is for coding only?

HumanEval 73.2 <-- indicates how good it is at coding.

Look how good the code generated from my question above is.

Screenshot 2023-08-26 184812.png

I must admit. That's a very good result.
My test was the precise prompt:
Create a realistic looking tree using a popular Python graphics library.

Result from:
koboldcpp_nocuda.exe wizardcoder-python-34b-v1.0.Q8_0.gguf --contextsize 4096
after fixing some Python errors and a logic error (to go up you need to subtract from y, not add).
image.png

ChatGPT-3.5 attempt 2 (attempt 1 was completely wrong):
image.png

As you can see, ChatGPT-3.5 was the winner in my test (especially since I didn't have to edit the Python code to get it working).
Sure, wizardcoder-python-34b is currently better than any other open-source LLM for coding, but I definitely don't buy that MMLU score (MMLU dataset was probably leaked in the training or something).

I'm just waiting until https://huggingface.co/jondurbin/airoboros-c34b-2.1 gets a GGUF.
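
On the logic error mentioned above (subtract from y to go up): this is the usual screen-coordinate gotcha. In libraries where y grows downward, such as Pygame or Tkinter's Canvas, moving up the screen means a smaller y. The library used in that run isn't stated in the post, so this is only a rough sketch, and `branch_end` is a hypothetical helper, not code from the model's output:

```python
import math


def branch_end(x, y, length, angle_deg):
    """Endpoint of a branch in screen coordinates, where y grows downward.

    angle_deg is measured counter-clockwise from the positive x axis,
    so 90 degrees means "straight up" on the screen.
    """
    angle = math.radians(angle_deg)
    end_x = x + length * math.cos(angle)
    # Subtract, not add: a smaller y is higher up on the screen.
    end_y = y - length * math.sin(angle)
    return end_x, end_y


# A trunk starting at (300, 500) and growing straight up by 100 pixels.
x, y = branch_end(300, 500, 100, 90)
print(round(x), round(y))
```

Turtle graphics uses the mathematical convention instead (y grows upward), which is one reason the later Turtle prompts in this thread avoid this particular bug.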

I tried your prompt

"Create a realistic looking tree using a popular Python graphics library."

GPT-3.5 (ChatGPT) - 10 attempts - I never got a proper tree, or the code just did not work.
GPT-4 (Bing AI) - 3 attempts - I got some kind of tree, but very deformed.

So you had to be very lucky to get that prompt working under GPT-3.5.

I also tried that prompt with wizard.

On the second attempt I got something like this (I did not have to fix the code to make it work):

Screenshot 2023-08-26 202122.png

I am using wizardcoder-python-34b-v1.0.Q4_K_M.gguf

@mirek190 I changed the prompt to try to give the best chance to wizardcoder-python-34b-v1.0.Q8_0.gguf (running in koboldcpp in CPU mode).
This is the prompt:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Provide complete working code for a realistic looking tree in Python using the Turtle graphics library and a recursive algorithm.

### Response:

Result:
image.png

GPT-3.5 result:
image.png

It seems that there are many small mistakes that GPT-3.5 just doesn't make. Not to mention the fact that ChatGPT-3.5 (August version) tends to generate code that doesn't crash. Even when the code doesn't produce the desired result, there are rarely any Python errors.

You should know that Bing Chat is stupid compared to ChatGPT-4 ($20 USD per month).
Here is the result of that same prompt given to ChatGPT-4 (August version):

image.png

So no. WizardCoder-Python-34B-V1.0 is not comparable to ChatGPT3.5, nor is it comparable to ChatGPT-4.

You are right ... GPT-3.5 is still better ... not by much, but still. Considering that local AI for coding practically did not exist a few months ago, this one is really impressive ;D 34B vs 175B (GPT-3)

Ok, so the GPT 4 version is darned impressive but with TheBloke_CodeLlama-34B-Python-GPTQ_gptq-4bit-32g-actorder_True using the Debug-deterministic parameter preset in ooga I ran the following:
(The code renders better in HTML but is much less compact so I'm using the crappy markdown render since it fits in one screen):

image.png

Which draws:
image.png

Only to say: These models are insanely impressive for being liberally licensed and running on consumer hardware.

And, I think if there was an instruct training on top of the python model as a base it might be very very good at Python.

So the difference between quantized versions really matters for that type of result. 🤔

What specs do you have to run that?

Sorry, I should have made it clearer that I also wasn't using the wizard coder version, just the Python one from Meta. So I don't think it was strictly the quantization, but rather the different model and the different style of prompt (the non-wizard version of the Python model is a continuation model, so my prompt was a module-level Python docstring and an import statement to get it started).

Edit: you asked what specs I have, I have a 3090 and a 3060 in my desktop but I'm pretty sure this 34B 4 bit GPTQ can fit in just the 24GB card alone.
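
A quick back-of-the-envelope check of the "fits in 24GB" estimate, as a sketch only. It counts the weights alone and ignores the KV cache, activations, and the per-group scale/zero-point overhead that GPTQ adds, so real usage will be higher. `weight_size_gib` is a hypothetical helper, and 34e9 is the nominal parameter count taken from the model name:

```python
def weight_size_gib(n_params, bits_per_weight):
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


N_PARAMS = 34e9  # nominal "34B"; an assumption, not an exact count

for label, bits in [("4-bit", 4.0), ("8-bit", 8.0), ("fp16", 16.0)]:
    print(f"{label}: {weight_size_gib(N_PARAMS, bits):.1f} GiB")
```

At 4 bits per weight this comes out to roughly 16 GiB, which leaves some headroom on a 24GB card for context, while the 8-bit and fp16 sizes do not fit on a single 24GB card.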

@gnomealone You could try running my exact prompt on TheBloke/WizardCoder-Python-34B-V1.0-GGUF and see what kind of results you get.

""" This module generates a graph of a realistic looking tree using the Turtle graphics library and a recursive algorithm.
"""
import turtle

Base code llama 34B at higher precision is much better.

image.png
image.png

For me, base Code Llama 34B is the worst.

As I thought, airoboros-c34b-2.1.Q8_0.gguf is better than any wizard model:
koboldcpp_nocuda.exe airoboros-c34b-2.1.Q8_0.gguf --contextsize 4096
Prompt:

A chat
USER: Provide complete working code for a realistic looking tree in Python using the Turtle graphics library and a recursive algorithm.
ASSISTANT:

Edit: I later discovered that the actual correct prompt template includes a period (that might affect the quality of generation, who knows):

A chat.
USER: Provide complete working code for a realistic looking tree in Python using the Turtle graphics library and a recursive algorithm.
ASSISTANT:

Result (zero-shot):
image.png

Still not as good as GPT-3.5
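
For anyone reproducing these runs, here is a small sketch of how the two prompt templates quoted in this thread could be assembled. The helper names are hypothetical, and the exact newlines and spacing are an assumption based on how the prompts appear in the posts above, not something confirmed by either model card:

```python
ALPACA_HEADER = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def alpaca_prompt(instruction):
    """Instruction/Response style prompt used with WizardCoder in this thread."""
    return f"{ALPACA_HEADER}\n\n### Instruction:\n{instruction}\n\n### Response:\n"


def airoboros_prompt(instruction):
    """Chat style prompt used with airoboros, including the trailing period."""
    return f"A chat.\nUSER: {instruction}\nASSISTANT:"


task = (
    "Provide complete working code for a realistic looking tree in Python "
    "using the Turtle graphics library and a recursive algorithm."
)
print(alpaca_prompt(task))
print(airoboros_prompt(task))
```

Keeping the templates in one place makes it easier to compare models on the same instruction without accidentally dropping details like the period after "A chat".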

You're right. I also tested that model and I'm getting better results than with wizardcoder ... even the text responses are cleaner as sentences.

Stupid question .... Did you use 16k ctx?

Look what I got with 16k ctx ....on first attempt

Screenshot 2023-08-29 005136.png

Instruction: Provide complete working code for a realistic looking tree in Python using the Turtle graphics library and a recursive algorithm.

Response: Here is an implementation of a recursive algorithm to draw a fractal tree using Python's turtle library. It uses the Sierpinski triangle technique to create a realistic-looking tree with random branching, size, and color variations:

import turtle
import random


def draw_branch(branch_length, angle, level):
    if level == 0:
        return

    # Draw the branch
    turtle.pensize(level)
    turtle.forward(branch_length)

    # Randomly set the branch's color
    r = random.random()
    g = random.random()
    b = random.random()
    turtle.color(r, g, b)


    # Turn left and right by some randomly generated angle to create the branching effect
    turtle.left(angle)
    draw_branch(branch_length * 0.8, angle * 1.15, level - 1)  # Recursively draw the left branch

    turtle.right(2*angle)
    draw_branch(branch_length * 0.8, angle * 1.15, level - 1)  # Recursively draw the right branch

    turtle.left(angle)
    turtle.backward(branch_length)

# Set up the turtle and screen
turtle.setup(600, 600)
turtle.speed(0)
turtle.bgcolor('black')

turtle.penup()
turtle.goto(-300, -250)
turtle.pendown()
turtle.left(90)


# Start the recursive drawing process with a branch length of 150 and an angle of 45 degrees at level 7
draw_branch(150, 45, 7)


turtle.done()

In this code snippet, we define the draw_branch function that takes in three arguments: branch_length, angle, and level. The branch length controls how long each branch of the tree is, and the angle determines the size of the branches as they get smaller. The level represents the depth of recursion. At level 0, the algorithm stops drawing new branches and just returns.

Inside the function, we set the turtle's pen color to a random RGB value using the random module. We then move forward by branch_length pixels, turn left by angle, draw the left branch (by calling draw_branch again with 80% of the length and the same angle), turn right by twice angle, and then draw the right branch. Finally, we come back to the original position by moving backwards.

To use this function, we set up the turtle screen and start at a specific location on the canvas with turtle.setup and turtle.goto. We also set the background color to black using turtle.bgcolor and the speed of the drawing process to 0 (fastest) using turtle.speed. Then, we start the recursive drawing process with a branch length of 150 pixels, an angle of 45 degrees, and a level of 7 (which will result in approximately 32,0000 branches). Finally, we call turtle.done() to display the tree on the screen and wait for user input before exiting.

The resulting image should look something like this:

Realistic Tree

You can adjust the initial branch length, angle, and level to get different shapes and sizes of trees. Here's an example with longer branches, a smaller angle, and a lower level:

draw_branch(250, 30, 6)
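
A note on the model's explanation above: the code is an ordinary binary fractal tree, and the branch count it quotes doesn't match what the code does. Each call with level > 0 draws one segment and recurses twice, so the number of segments is 2^level - 1. A minimal sketch that mirrors the recursion of draw_branch without any drawing (`count_branches` is a hypothetical helper, not part of the generated code):

```python
def count_branches(level):
    """Count the forward() strokes draw_branch makes for a given level."""
    if level == 0:
        return 0
    # One segment for this call, plus the left and right subtrees.
    return 1 + 2 * count_branches(level - 1)


print(count_branches(7))  # the draw_branch(150, 45, 7) call
print(count_branches(6))  # the draw_branch(250, 30, 6) variant
```

So level 7 gives 127 segments and level 6 gives 63. Also, the recursive calls multiply the angle by 1.15 rather than keeping it the same, as the explanation claims.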

It is not as good as gpt-4. I had it code a simple tic-tac-toe game using monte carlo for the AI. It mostly worked, but it had an error where it was trying to pass 2 arguments to a method that could only accept 1. Anyway, I copied the code into gpt-4 and asked it what was wrong and to fix it. It found every issue (3 of them) and fixed them all perfectly. The model is quite nice (free!), but let's not kid ourselves about it being better than you-know-who. At least not yet.
