Good model but some holes in the dataset.

#2
by Autumnlight - opened

This model is so far (per my testing group) the best RP model out there at 70B. It stands on top with New Dawn, though that model is better at novel-style prose, spatial IQ, and object/location tracking. This model is superior in emotional IQ and depth of reaction, and your technique of not mixing too many similar characters actually produces many distinct personalities - a huge issue with many other models. Characters feel very lively and unique rather than being sourced from Central Casting most of the time. So this is really amazing.

Where it could use some improvement:
-Characters seem very compliant with User. Is the data lacking scenarios where User is evil? More distrust may be useful.
-The data seems to lack scenarios where User should not be the focus of Char.
-The data seems to lack characters who are emotionless/anhedonic due to past trauma or similar and open up only very slowly. At the moment, characters with apathy open up immediately, which may be related to the above note about compliance being very high.
-What is the erotic to non-erotic ratio in the RP dataset? The model seems more aggressive about going down that route than other models in the 70B range.

Moving to Reddit.

Arli AI org

Thanks for the detailed feedback! I really appreciate it when users give insights on where it is lacking. Super happy that you said it is one of the best RP models you've tested though, as it validates the methods I've used.

> Where it could use some improvement:
> -Characters seem very compliant with User. Is the data lacking scenarios where User is evil? More distrust may be useful.
> -The data seems to lack scenarios where User should not be the focus of Char.
> -The data seems to lack characters who are emotionless/anhedonic due to past trauma or similar and open up only very slowly. At the moment, characters with apathy open up immediately, which may be related to the above note about compliance being very high.

The dataset definitely has a long way to go, and there's a lot of low-hanging fruit, since it is basically just a "raw" dataset from the repos I found. I haven't altered it much aside from deduplication and curation. If you do have recommendations for good RP datasets, I will take a look and see how to incorporate those as well.
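
For illustration, here's a minimal sketch of the kind of name-based dedup and capping I mean (the `char_name` field, file name, and per-name cap are placeholders, not the actual pipeline):

```python
import json
from collections import defaultdict

# Sketch: cap how many conversations any single character name can
# contribute, so no one persona dominates the training mix.
# "char_name" and "rp_dataset.jsonl" are assumed names, not the real schema.
MAX_PER_NAME = 3

def dedup_by_character(records, max_per_name=MAX_PER_NAME):
    seen = defaultdict(int)
    kept = []
    for rec in records:
        name = rec.get("char_name", "").strip().lower()
        if seen[name] < max_per_name:
            seen[name] += 1
            kept.append(rec)
    return kept

with open("rp_dataset.jsonl") as f:
    records = [json.loads(line) for line in f]

kept = dedup_by_character(records)
print(f"kept {len(kept)} of {len(records)} records")
```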

> -What is the erotic to non-erotic ratio in the RP dataset? The model seems more aggressive about going down that route than other models in the 70B range.

Honestly, I don't know what the ratio is, haha - it is just a mix of it all. With Llama 3.1 8B it turned out to be not so eager to go in that direction, but I guess it is much more willing at 70B.
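
If I wanted to measure it, a crude keyword pass like this would at least give a ballpark (just a sketch - the term list and record schema are placeholders, and a proper classifier would do much better):

```python
import json
import re

# Sketch: estimate the erotic/non-erotic split with a keyword heuristic.
# The term list and the "turns"/"text" schema are placeholders.
NSFW_TERMS = re.compile(r"\b(nsfw|explicit|lewd)\b", re.IGNORECASE)

erotic = total = 0
with open("rp_dataset.jsonl") as f:
    for line in f:
        rec = json.loads(line)
        text = " ".join(turn["text"] for turn in rec.get("turns", []))
        erotic += bool(NSFW_TERMS.search(text))
        total += 1

print(f"erotic: {erotic}/{total} = {erotic / max(total, 1):.1%}")
```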

Hey there! I've been doing testing on this with Autumn - really solid model!

Re: the erotic/non-erotic ratio - I feel like the default for "RP"-style data I've seen is pretty heavy on that end, which is -fine- but does tend to overbias the model. Deduplicating the character names is amazing in terms of how well the model does personalities, but everyone is just a litttttle too eager, if you know what I mean.

From your descriptions of the training path it sounded like this was largely raw (other than the deduplication).

I'd recommend seeking out medical/psychology datasets. In the past, with models like Psyfighter and Psyonic-Cetacean, I've found that kind of information dramatically improves personality and (the surprising element) spatial reasoning and logic. It's plausible that just having more of that in the mix versus the RP data may chill things out to a more flexible baseline in terms of ERP aggressiveness and agreeableness.
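
Concretely, the blending I have in mind is nothing fancier than sampling the psych/med set into the RP data at a fixed proportion, roughly like this sketch (the 20% figure is illustrative, not a tested recipe):

```python
import random

# Sketch: mix a psych/med instruction set into the RP data so it makes
# up a fixed fraction of the final training mix. 0.2 is illustrative.
def mix_datasets(rp_data, psych_data, psych_fraction=0.2, seed=0):
    rng = random.Random(seed)
    # Solve n / (len(rp_data) + n) = psych_fraction for n.
    n = int(len(rp_data) * psych_fraction / (1 - psych_fraction))
    mixed = rp_data + rng.sample(psych_data, min(n, len(psych_data)))
    rng.shuffle(mixed)
    return mixed
```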

That said - great model! I'll let you know when we start doing Unspeakable Frankensteining with NewDawn Ultra. :D

OwenArli changed discussion status to closed
OwenArli changed discussion status to open
Arli AI org

> Hey there! I've been doing testing on this with Autumn - really solid model!
>
> Re: the erotic/non-erotic ratio - I feel like the default for "RP"-style data I've seen is pretty heavy on that end, which is -fine- but does tend to overbias the model. Deduplicating the character names is amazing in terms of how well the model does personalities, but everyone is just a litttttle too eager, if you know what I mean.
>
> From your descriptions of the training path it sounded like this was largely raw (other than the deduplication).
>
> I'd recommend seeking out medical/psychology datasets. In the past, with models like Psyfighter and Psyonic-Cetacean, I've found that kind of information dramatically improves personality and (the surprising element) spatial reasoning and logic. It's plausible that just having more of that in the mix versus the RP data may chill things out to a more flexible baseline in terms of ERP aggressiveness and agreeableness.
>
> That said - great model! I'll let you know when we start doing Unspeakable Frankensteining with NewDawn Ultra. :D

Thanks for the feedback!

Interesting to hear that the 70B really is a bit too eager, lol - the Llama 3.1 8B and Nemo 12B versions sometimes get the opposite feedback. Yes, the dataset is mostly raw, with some minor modifications via mass LLM-generated improvements to the system prompt sections of the datasets. So it can definitely be improved a lot more.
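
For a rough idea of what that rewriting pass looks like, here's a sketch using the OpenAI client (illustrative only - the model name and rewrite instruction are placeholders, not my actual setup):

```python
# Sketch: mass-rewrite system prompts with an LLM. Requires the
# OPENAI_API_KEY env var; model and instruction are placeholders.
from openai import OpenAI

client = OpenAI()

def improve_system_prompt(old_prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                "Rewrite this roleplay system prompt to be clearer and "
                "more descriptive. Return only the rewritten prompt."
            )},
            {"role": "user", "content": old_prompt},
        ],
    )
    return resp.choices[0].message.content
```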

Thank you for recommending the psychology datasets too. I was thinking of adding medical datasets, and now that you mention psychology data, that seems like a really great idea as well.

Do let me know if you merge it and come up with something even greater, haha

70B is generally more cooperative with prompt intent due to the larger intellect, and Mistral is generally kind of dry in that regard, so I can see how the same/similar dataset that gets 8B/12B to where you want it turns around and overcooks 70B.

Psych/med data has a cascading effect on writing quality, I think because it improves the model's internal tracking of the character's thoughts -> more of the generation is topical -> more of the context is topical to feed the next generation.

Re: merging, I've got some ideas - it can be a crude process, but even little 2-3% spritzers of the right mergestock can push things over the line. I'm doing 90%+ of my work with LLMs on novel-style prose writing, so I don't often find models trained towards my goals (and I'm not helping matters, having not completed my pulp dataset yet, whoops) but... :D
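
By "spritzer" I just mean a small-weight linear blend, roughly like this sketch (file names and the 3% weight are placeholders, and in practice a tool like mergekit handles sharding, tokenizer, and config alignment for you):

```python
import torch

# Sketch: linear blend of two checkpoints, keeping the donor's
# contribution tiny. File names and ALPHA are illustrative.
ALPHA = 0.03  # donor contribution ("2-3% spritzer")

base = torch.load("base_model.bin", map_location="cpu")
donor = torch.load("donor_model.bin", map_location="cpu")

merged = {}
for name, w in base.items():
    if name in donor and donor[name].shape == w.shape and w.is_floating_point():
        merged[name] = (1 - ALPHA) * w + ALPHA * donor[name].to(w.dtype)
    else:
        merged[name] = w  # keep the base weight where keys/shapes/dtypes differ

torch.save(merged, "merged_model.bin")
```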
