Context length?

#1
by xxx777xxxASD - opened

It's 32768.

People often make MoE models out of Mistral 7B v0.1, so the sliding window entry in the config becomes null, but in practice they still can't reach a real 32k.
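If you want to see what a given merge actually advertises, the relevant fields are in its config. A minimal sketch with transformers (the repo id below is just a placeholder, not a real model):

```python
from transformers import AutoConfig

# Placeholder repo id; substitute the merge you want to inspect.
cfg = AutoConfig.from_pretrained("some-user/some-mistral-moe-merge")

# Mistral 7B v0.1 ships with sliding_window=4096; many MoE merges null it out
# while keeping max_position_embeddings at 32768.
print(cfg.max_position_embeddings)           # advertised context length, e.g. 32768
print(getattr(cfg, "sliding_window", None))  # None if the merge dropped the sliding window
```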

I couldn't find a model that can reach even 12K. If you find one, please let me know. I want to be able to put a full article of about 2500 words in as input, and I expect the model to handle it properly.

Would YaRN models work for you? They're not as smart as their base models, but there are some YaRN merges that are pretty smart at high context.
https://huggingface.co/NousResearch/Yarn-Solar-10b-64k
They include benchmarks in the model card
There's also Yi 200K in 6B and 34B, but I haven't tried them.
Also, you could try rope scaling. A smart model can handle quite a bit of roping. I've had some Mistral 7B-based models perform well at 32k ctx.
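If you want to try rope scaling from Python rather than a GGUF/exl2 loader, transformers lets you override it via the config for Llama-architecture checkpoints. A rough sketch (the model id is only an example; linear scaling trades some quality for the longer usable context):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example only: any Llama-architecture checkpoint whose config supports rope_scaling.
model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Linear rope scaling stretches the position embeddings by `factor`,
# so a 4k-native model is run as if it had roughly 8k of context.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={"type": "linear", "factor": 2.0},
)
```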

@saishf, I haven't yet tried Yarn-Solar-10b-64k, but I have tested several 32K, 64K, 128K, and even 200K models. I asked the model to elaborate and add details to the provided input, which was about 2500 words. However, the model seemed to struggle with handling such a large amount of information and forgot some details. Have you had any success with such requests? Could you please share the settings you've used for similar purposes?

That is hard for models to do, because they're trained on messages much smaller than what you're trying to feed them. Even when a model is trained on 32k ctx conversations, the individual messages are much shorter.
Also try Mixtral Instruct 8x; it seems to hold together pretty well for long inputs and outputs. You can try it on HuggingChat.

Thank you so much for your help.

So, does anyone know this model's real context length?

Only @Undi95 would know, as we have no idea what the base models are.

Owner

Should be 32k, since it's 2 Mistral models I trained on top of.
