Broken Model

#2
by Nycoorias - opened

Either spits out gibberish or writes one sentence before looping.
The previous version had none of these issues.

Example:
Tell me about Anthros.

34b:
Anthros are a fascinating race of anthropomorphic, bipedal creatures that coexist with humans in our world. They come in various forms, such as wolves, cats, dogs, foxes, birds, and more. Humans and anthros generally see themselves as equals despite their differences in size and strength.

34b-200k:
* * * * * * * * *

Cognitive Computations org
edited Dec 8, 2023

Might be something with your settings.
I downloaded TheBloke's GGUF (dolphin-2.2-yi-34b-200k.Q4_K_M.gguf) and ran it on oobabooga for the past half hour; found no issues with the model at all, so far at least.

I use Koboldcpp, and all I manage to produce is word vomit. No matter what setting I tick or how I change the sliders, the model is broken!

Cognitive Computations org

If the model were broken, it would perform equally poorly on every inference setup, not just your Koboldcpp setup. It sounds more like your Koboldcpp is broken, or the quants you're using are garbled.
Test it again on the latest oobabooga UI, with an already-quantized model by TheBloke, to see if you can reproduce the same behavior, and let us know if the issue persists.

I use the exact same quants you use; I don't know what's going on. I even re-downloaded them today in case the download got corrupted somehow.

I will have a few things to do in the coming days, but I will see what I can do.

Cognitive Computations org
edited Dec 9, 2023

I generally test, at least briefly, almost every Dolphin release, and this is one of the releases that has caused me the fewest issues out of the box; to be more precise, none so far. Because of that, I'm inclined to blame your issues on Koboldcpp, since that seems to be the only difference between your setup and mine.
Are you using ChatML format with this model?
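
For reference, this is the ChatML template Dolphin models are trained with; a sketch, where the system message wording is only illustrative:

```python
# ChatML prompt as a plain string; the system message is just an example.
prompt = (
    "<|im_start|>system\n"
    "You are Dolphin, a helpful AI assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Tell me about Anthros.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
```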

Yes I do; I made sure to have identical settings between 34b and 34b-200k. And I had the opposite experience: none of the Dolphin models had issues except this one.
I'll ask in the Kobold Discord if other people have also had problems using Kcpp.

Edit:
Some people reported issues similar to mine, having to do with the BOS token (it seems to be a common issue with 200k Yi fine-tunes).
I was unable to find out whether the previous dolphin-2.2-yi-34b works for other people (it does for me), though.
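
For anyone who wants to sanity-check what BOS token their GGUF actually carries, here's a minimal sketch using the gguf-py package that ships with llama.cpp (the file name is an assumption; use whatever quant you downloaded):

```python
# Sketch: read the BOS token id baked into a GGUF file; file name assumed.
from gguf import GGUFReader

reader = GGUFReader("dolphin-2.2-yi-34b-200k.Q4_K_M.gguf")
field = reader.fields.get("tokenizer.ggml.bos_token_id")
# Scalar metadata fields store their value in the last part.
print("BOS token id:", field.parts[-1][0] if field is not None else "not set")
```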

I'm running a fresh install of the latest version of Koboldcpp, and I'm not having any of the issues you've described. There were BOS token issues about two weeks ago, but they didn't affect the model in this way, and assuming you have the latest versions of llamacpp or koboldcpp, you shouldn't run into issues related to it. Maybe you accidentally set a custom rope frequency or scale, or there's an issue with your installation.
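
If you want to rule the rope settings out, llama-cpp-python lets you pin them explicitly at load time. A sketch; the file name is assumed, and you should verify the freq base against the model's config.json:

```python
# Sketch: pin rope settings explicitly instead of trusting loader defaults.
from llama_cpp import Llama

llm = Llama(
    model_path="dolphin-2.2-yi-34b-200k.Q4_K_M.gguf",
    n_ctx=4096,
    rope_freq_base=5000000.0,  # rope_theta commonly reported for Yi; verify
    rope_freq_scale=1.0,       # no linear scaling
)
```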

I have tried everything: re-downloading the model, re-downloading Kcpp, different settings, default launcher options, different prompts, every combination of the above, and it. Is. Still. Fucked!
I don't know what you have done to make it work, but I just can't.

Cognitive Computations org

Sorry to hear you are still having issues, but I really don't know what else to recommend, apart from testing it on oobabooga, perhaps.
I still haven't run into any issues with it; it's been following instructions so far, and I haven't seen any excessive repetition or talking to itself, let alone any kind of gibberish.

[Screenshot: 2023-12-10 19-28-49-merge.jpg]

I have a few hypotheses. I use world info a lot, and maybe it doesn't like that? Or I somehow manage to ask exactly the prompts it can't handle? Or it's because I use the nocuda version. I probably should install ooba and test it there; if I ever get to that, I will report back.

P.S.:
This is chat mode, right? Did you try instruct mode?
I mostly used it for creative writing, and maybe that's its Achilles heel?

Cognitive Computations org

It's in "chat-instruct" mode now, recently I've always been using that by default, so I don't even remember if there's any difference in other modes.
But later, when we finish current session (currently in the middle of some R&D) I might test it in other modes.

Cognitive Computations org

...it's been following instructions so far...

Correction: "She made efforts to adhere..."

[Screenshot: 2023-12-10 21-00-13.jpg]

Actually I think there is something to this.

Not only is it seemingly "broken" on the HF leaderboard, it's also broken in Ayumi's IQ metric: http://ayumi.m8geil.de/ayumi_bench_v3_results.html

Its perplexity also seems pretty high.
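
For anyone who wants to measure perplexity themselves, here's a minimal sketch against the unquantized weights, following the standard Hugging Face recipe; the repo id is an assumption, and a 34B model in FP16 needs serious hardware:

```python
# Sketch: single-window perplexity on a short text; repo id assumed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.2-yi-34b-200k"  # assumed repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

text = "Tell me an interesting bedtime story."
enc = tok(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model(**enc, labels=enc.input_ids)  # HF shifts labels internally
print("perplexity:", torch.exp(out.loss).item())
```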

Cognitive Computations org

Actually I think there is something to this.

Not only is it seemingly "broken" on the HF leaderboard, it's also broken in Ayumi's IQ metric: http://ayumi.m8geil.de/ayumi_bench_v3_results.html

Its perplexity also seems pretty high.

Potentially, yes. That thought crossed my mind too when I saw the HF leaderboard score for this one, which seemed surprisingly low. Perhaps my tests so far haven't been extensive enough to find the conditions under which its performance degrades enough to notice.
But with so many downloads of this model, and only one report of issues, it's hard to tell whether the conditions to reproduce the issue are very exotic, or whether 99% of people are only downloading it for hoarding, without actually testing it.

I tested it on novel continuation and retrieval and wasn't too impressed either, but I did not test the long-context performance. I need an exllama quant for that.

Feedback for models isn't super common, so I wouldn't be surprised if few brought this up.

I tried it again with the recent Kcpp update and it is still busted. I am done trying to get this to work, but I am still curious as to what is going on.

Cognitive Computations org

Well, on those words... I busted my own now too, somehow.
I reinstalled/recompiled the latest oobabooga to try the new Mixtral model, which left me unimpressed, so I reloaded this one, and... I just can't get it to stop rambling endlessly.
Same quants, exactly the same model file, same settings (at least they should be the same), but even changing parameters around (temp etc.) isn't doing much good.

I tried it again with the recent Kcpp update and it is still busted. I am done trying to get this to work, but I am still curious as to what is going on.

I am pretty sure Yi is busted in GGML, period. I actually suspect TheBloke messed up the quants by hardcoding rope theta, but I'm not sure yet.

Yi in general works in exllamav2 for me, but it's extremely sensitive to sampling as well.
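
As a starting point for the sampling side, something like this conservative llama-cpp-python setup; the values are illustrative, not tuned:

```python
# Sketch: conservative sampling for a loop-prone model; values illustrative.
from llama_cpp import Llama

llm = Llama(model_path="dolphin-2.2-yi-34b-200k.Q4_K_M.gguf", n_ctx=4096)
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nTell me an interesting bedtime story.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
out = llm.create_completion(
    prompt,
    max_tokens=256,
    temperature=0.7,
    top_p=0.9,
    repeat_penalty=1.1,   # discourage the repetition loops seen in this thread
    stop=["<|im_end|>"],  # stop at the ChatML end-of-turn marker
)
print(out["choices"][0]["text"])
```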

Cognitive Computations org

On the newest version of oobabooga this model didn't want to work, so I went back to the older one, where it works fine, took a screenshot of all the settings, reloaded the newest version again to compare, and now it works fine on that one too, without me changing anything.
There definitely is something, somewhere.

I suspect the quants are messed up.

I tried "tell me an interesting bedtime story" with TheBloke's Q4_K_M and Q5_K_M models on both koboldcpp and llamacpp several times, and I'm having the same problem here. Here are its outputs:

  1. Hello, how are you today? How has been? I'm here to help you. What do you need? Hello, how are you today? How has been? I'm here to help you. What do you need?

  2. They had many stories to tell and they had many stories to tell. They had many stories to tell and they had many stories to tell. They had many stories to tell and they had many stories to tell. They had many stories to tell and they had many stories to tell.

  3. //_/-user//- (endlessly)

Maybe you can raise the issue at https://huggingface.co/TheBloke/dolphin-2.2-yi-34b-200k-GGUF/discussions. There definitely is something, somewhere.

There isn't necessarily a problem with the quantisation. We should always remember that whenever we quantise, we are swapping the model's weights for another set of weights that approximates the original set in a rescaled representation. So I would not expect everything to work perfectly, even though the main objective is to minimize loss while maximizing compression.
Furthermore, the quantisation procedure is pretty much straightforward and standard. If I had to guess, I wouldn't think the quantised files are messed up.
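
As a toy illustration of that approximation (a sketch, not any specific GGUF scheme):

```python
# Toy symmetric 4-bit quantization round-trip, showing the approximation error.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=8).astype(np.float32)  # stand-in weight block

scale = np.abs(w).max() / 7                # int4 symmetric range: -7..7
q = np.clip(np.round(w / scale), -7, 7)    # quantized integers
w_hat = (q * scale).astype(np.float32)     # dequantized approximation

print("original:   ", w)
print("dequantized:", w_hat)
print("max abs err:", np.abs(w - w_hat).max())
```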

Cognitive Computations org

you should test this on the unquantized version

Yeah, the issue was with the 4-bit BNB quantized and exllama versions. TBH I never tested the full FP16.

There is actually a potential separate issue with old Yi GGUFs, where they may have been quantized with the wrong rope_theta, like Mixtral was.
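
A quick way to check which rope_theta a GGUF actually carries, again with the gguf-py package (file name assumed):

```python
# Sketch: dump the rope base frequency baked into a GGUF; file name assumed.
from gguf import GGUFReader

reader = GGUFReader("dolphin-2.2-yi-34b-200k.Q4_K_M.gguf")
for key in ("llama.rope.freq_base", "llama.context_length"):
    field = reader.fields.get(key)
    if field is not None:
        print(key, "=", field.parts[-1][0])  # scalar value lives in the last part
```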
