Test outputs?
Awesome work! Thanks for making these! Was wondering if you'd had a chance to test any of the SLERPs yet and see what the outputs are like?
Thanks! The models I built with an accumulative slerp method show extremely degraded outputs. They can generate valid English sentences but exhibit infinite loop behavior. I am currently training the expert layers to see if some performance can be recovered. I will put up the relevant code and stats soon.
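For anyone following along who hasn't seen slerp used for merging before, here is a rough sketch of the idea. It is not the code used for these models (that hasn't been posted yet). The `accumulative_slerp` helper and its 1/k weighting are assumptions about what "accumulative" might mean, namely folding each new set of weights into a running merge so that every input ends up with roughly equal influence.

```python
import math

def slerp(a, b, t, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Plain linear interpolation, used as a fallback for degenerate cases.
    lerp = [(1 - t) * x + t * y for x, y in zip(a, b)]
    if norm_a < eps or norm_b < eps:
        return lerp
    # Angle between the two vectors, clamped to guard against rounding.
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        # Vectors are (anti-)parallel, so slerp is undefined; use lerp.
        return lerp
    wa = math.sin((1 - t) * omega) / sin_omega
    wb = math.sin(t * omega) / sin_omega
    return [wa * x + wb * y for x, y in zip(a, b)]

def accumulative_slerp(vectors):
    """Hypothetical helper: fold vectors into a running merge.

    Assumption: the k-th vector is blended in with t = 1/k.
    """
    merged = list(vectors[0])
    for k, v in enumerate(vectors[1:], start=2):
        merged = slerp(merged, v, 1.0 / k)
    return merged

# Merging two orthogonal unit vectors lands halfway along the arc.
print(accumulative_slerp([[1.0, 0.0], [0.0, 1.0]]))
```

In a real merge this would be applied tensor by tensor to the flattened expert weights, most likely with a tensor library rather than Python lists.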
Edit:
Performance is not as degraded as I thought once repetition_penalty is added. Finetuning shows good results and decreases the repetition_penalty required. A chat adapter is available for Jamba-4xMoe_slerp. There is still no evaluation on any benchmarks yet.
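As a side note on why repetition_penalty helps with the looping: the usual formulation makes tokens that have already been generated less likely at each step. Below is a minimal sketch of that rule in plain Python. The function name is made up for illustration, and it assumes the common convention where positive logits are divided by the penalty and negative logits are multiplied by it.

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Return a copy of `logits` with already-generated tokens penalized.

    Assumption: positive scores are divided by `penalty` and negative
    scores are multiplied by it, so a penalty > 1 always lowers the score.
    """
    out = list(logits)
    for tok in set(generated_ids):
        score = out[tok]
        out[tok] = score * penalty if score < 0 else score / penalty
    return out

# Tokens 0 and 1 were already generated; token 2 is left untouched.
print(apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1, 1], 2.0))
# → [1.0, -2.0, 0.5]
```

A penalty of 1.0 leaves the logits unchanged, which is why a model that needs a smaller value after finetuning is a sign that it loops less on its own.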
This is so awesome! Apologies for missing your first reply.
Thanks for experimenting and crafting these with some test outputs. I'm going to download the 4x version, run it through some training, and see what happens.
I'll share the results and hopefully subsequent models! You are a legend for making these 💪