---
language:
- en
- zh
- de
- fr
- es
- pt
- ru
- it
- ja
- ko
- vi
- ar
tags:
- pytorch
- text-generation
- causal-lm
- rwkv
license: apache-2.0
datasets:
- EleutherAI/pile
- togethercomputer/RedPajama-Data-1T
---

# RWKV-4 World

## Model Description

RWKV-4 trained on 100+ world languages (70% English, 15% multilang, 15% code).

How to use:
* Use the latest `rwkv` pip package (0.7.4+).
* Use https://github.com/BlinkDL/ChatRWKV/blob/main/v2/benchmark_world.py to test it.
* Larger models are stronger even though they are not fully trained yet.

The difference between World & Raven:
* Set `pipeline = PIPELINE(model, "rwkv_vocab_v20230424")` instead of `20B_tokenizer.json` (EXACTLY AS WRITTEN HERE. "rwkv_vocab_v20230424" is included in rwkv 0.7.4+).
* Use a Question/Answer, User/AI, or Human/Bot prompt for Q&A. **DO NOT USE Bob/Alice or Q/A**.
* Use **fp32** (fp16 currently overflows; fixable in the future) or bf16 (slight degradation).

NOTE: the new greedy tokenizer (https://github.com/BlinkDL/ChatRWKV/blob/main/tokenizer/rwkv_tokenizer.py) tokenizes '\n\n' as one single token instead of ['\n','\n'].

Instruction prompt:
```
Instruction: xxx

Input: xxx

Response:
```

A good chat prompt:
```
Question: hi

Answer: Hi. I am your assistant and I will provide expert full response in full details. Please feel free to ask any question and I will always answer it.

Question: xxxxxx

Answer:
```
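
The setup notes and prompt formats above can be sketched in Python roughly as follows. This is a minimal sketch, not the official usage script: the helper names (`build_chat_prompt`, `build_instruction_prompt`, `load_world_pipeline`) and the model path are hypothetical, and separating fields and turns with a blank line (`'\n\n'`) is an assumption based on the tokenizer note. Only `RWKV`, `PIPELINE`, `PIPELINE_ARGS`, and `pipeline.generate` come from the `rwkv` package.

```python
def build_chat_prompt(question, history=()):
    """Format a Question/Answer chat prompt (hypothetical helper).

    `history` is a sequence of (question, answer) pairs from earlier turns.
    Assumption: fields and turns are separated by a blank line.
    """
    parts = [f"Question: {q.strip()}\n\nAnswer: {a.strip()}" for q, a in history]
    # The final turn ends with a bare "Answer:" for the model to complete.
    parts.append(f"Question: {question.strip()}\n\nAnswer:")
    return "\n\n".join(parts)


def build_instruction_prompt(instruction, input_text=""):
    """Format an Instruction/Input/Response prompt (hypothetical helper)."""
    parts = [f"Instruction: {instruction.strip()}"]
    if input_text.strip():
        parts.append(f"Input: {input_text.strip()}")
    parts.append("Response:")
    return "\n\n".join(parts)


def load_world_pipeline(model_path, strategy="cuda fp32"):
    """Load a World checkpoint with the World vocabulary (hypothetical helper).

    Imports are done lazily so the prompt helpers above work even when the
    rwkv package is not installed. Use "cuda bf16" to trade a slight quality
    loss for memory; fp16 is avoided because it may overflow.
    """
    from rwkv.model import RWKV
    from rwkv.utils import PIPELINE

    model = RWKV(model=model_path, strategy=strategy)
    # World models use this vocabulary name, not 20B_tokenizer.json.
    return PIPELINE(model, "rwkv_vocab_v20230424")


# Example usage (the checkpoint path is a placeholder, not a real file name):
#
#   from rwkv.utils import PIPELINE_ARGS
#   pipeline = load_world_pipeline("/path/to/rwkv-4-world-checkpoint")
#   prompt = build_chat_prompt("What is RWKV?")
#   args = PIPELINE_ARGS(temperature=1.0, top_p=0.3)
#   print(pipeline.generate(prompt, token_count=200, args=args))
```

Keeping the prompt builders separate from model loading makes it easy to check the exact prompt string before spending time loading a checkpoint.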