---
language:
- en
- zh
- de
- fr
- es
- pt
- ru
- it
- ja
- ko
- vi
- ar
tags:
- pytorch
- text-generation
- causal-lm
- rwkv
license: apache-2.0
datasets:
- EleutherAI/pile
- togethercomputer/RedPajama-Data-1T
---

# RWKV-4 World

## Model Description

RWKV-4 trained on 100+ world languages (70% English, 15% multilingual, 15% code).

How to use:
* Use the latest rwkv pip package (0.7.4+); a setup sketch follows the next list.
* Use the latest ChatRWKV v2/benchmark_world.py to test.
* Larger models are stronger, even though they are not fully trained yet.

The difference between World & Raven (see the setup sketch below):
* Set `pipeline = PIPELINE(model, "rwkv_vocab_v20230424")` instead of `20B_tokenizer.json` (EXACTLY AS WRITTEN HERE; "rwkv_vocab_v20230424" is included in rwkv 0.7.4+).
* Use Question/Answer or User/AI or Human/Bot prompts for Q&A. **DO NOT USE Bob/Alice or Q/A.**
* Use **fp32** (fp16 overflows at the moment; fixable in the future).
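
Putting those points together, here is a minimal setup sketch. The model path is a placeholder for whichever World checkpoint you downloaded, and `cuda fp32` is just one valid strategy (`cpu fp32` also works):

```python
import os
os.environ['RWKV_JIT_ON'] = '1'  # optional: enable JIT for speed (set before import)

from rwkv.model import RWKV      # pip install "rwkv>=0.7.4"
from rwkv.utils import PIPELINE

# Load in fp32: World models currently overflow in fp16.
# Placeholder path; point it at your downloaded weights (".pth" may be omitted).
model = RWKV(model='/path/to/RWKV-4-World-model', strategy='cuda fp32')

# World models use the new vocab, NOT 20B_tokenizer.json.
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")
```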

NOTE: the new greedy tokenizer (https://github.com/BlinkDL/ChatRWKV/blob/main/tokenizer/rwkv_tokenizer.py) tokenizes '\n\n' as a single token instead of ['\n','\n'].
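
Continuing from the setup sketch above, you can verify this with the pipeline's encoder (the exact token id is not asserted here; the point is that the result is one token, not two):

```python
tokens = pipeline.encode('\n\n')
print(tokens)            # a single token id, not two separate '\n' tokens
assert len(tokens) == 1
```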

A good prompt example:
```
Question: hi

Answer: Hi. I am your assistant and I will provide expert full response in full details. Please feel free to ask any question and I will always answer it.

Question: xxxxxx

Answer:
```
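
To run that prompt through the model, here is a sketch reusing `model` and `pipeline` from the setup above (the question and the sampling parameters are illustrative, not tuned):

```python
from rwkv.utils import PIPELINE_ARGS

# Prime with one example turn, then ask the real question, per the prompt above.
ctx = ("Question: hi\n\n"
       "Answer: Hi. I am your assistant and I will provide expert full response "
       "in full details. Please feel free to ask any question and I will always "
       "answer it.\n\n"
       "Question: What is the capital of France?\n\n"
       "Answer:")

args = PIPELINE_ARGS(temperature=1.0, top_p=0.7,
                     alpha_frequency=0.25, alpha_presence=0.25)

# Stream tokens to stdout as they are generated.
pipeline.generate(ctx, token_count=200, args=args,
                  callback=lambda s: print(s, end='', flush=True))
```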