Context length?

#1
by xxx777xxxASD - opened

It's 32768.

People often make MoE models out of Mistral 7B v0.1, so the sliding window entry in the config becomes null, but in practice they still can't reach a real 32k.
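If you want to see what a given merge actually advertises, the relevant fields are in its config. A minimal sketch with transformers (the repo id below is just a placeholder, not a real model):

```python
from transformers import AutoConfig

# Placeholder repo id; substitute the merge you want to inspect.
cfg = AutoConfig.from_pretrained("some-user/some-mistral-moe-merge")

# Mistral 7B v0.1 ships with sliding_window=4096; many MoE merges null it out
# while keeping max_position_embeddings at 32768.
print(cfg.max_position_embeddings)           # advertised context length, e.g. 32768
print(getattr(cfg, "sliding_window", None))  # None if the merge dropped the sliding window
```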

I couldn't find a model that can reach even 12K. If you find one, please let me know. I want to be able to put a full article of about 2500 words in as input, and I expect the model to handle it properly.

Would YaRN models work for you? They're not as smart as their base models, but there are some YaRN merges that are pretty smart at high context.
https://huggingface.co/NousResearch/Yarn-Solar-10b-64k
They include benchmarks in the model card
There's also Yi 200K in 6B and 34B, but I haven't tried them.
Also, you could try rope scaling. A smart model can handle quite a bit of roping. I've had some Mistral 7B-based models perform well at 32k ctx.
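If you want to try rope scaling from Python rather than a GGUF/exl2 loader, transformers lets you override it via the config for Llama-architecture checkpoints. A rough sketch (the model id is only an example; linear scaling trades some quality for the longer usable context):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example only: any Llama-architecture checkpoint whose config supports rope_scaling.
model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Linear rope scaling stretches the position embeddings by `factor`,
# so a 4k-native model is run as if it had roughly 8k of context.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={"type": "linear", "factor": 2.0},
)
```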

@saishf, I haven't yet tried Yarn-Solar-10b-64k, but I have tested several 32K, 64K, 128K, and even 200K models. I asked the model to elaborate and add details to the provided input, which was about 2500 words. However, the model seemed to struggle with handling such a large amount of information and forgot some details. Have you had any success with such requests? Could you please share the settings you've used for similar purposes?

That is hard for models to do, because they're trained on messages much smaller than what you're trying to feed them. Even when a model is trained on 32k ctx conversations, the individual messages are much shorter.
Also try Mixtral Instruct 8x; it seems to hold together pretty well for long inputs and outputs. You can try it on HuggingChat.

Thank you so much for your help.

So, does anyone know this model's real context length?

Only @Undi95 would know, as we have no idea what the base models are.

Owner

Should be 32k, since it's 2 Mistral models I trained on top of.
