Update README.md
kalpeshk2011 committed • Commit 5d71a07 • 1 Parent(s): 7d73e03

README.md CHANGED

@@ -6,6 +6,36 @@ https://github.com/martiansideofthemoon/rankgen

RankGen is a suite of encoder models (100M-1.2B parameters) that map prefixes and generations from any pretrained English language model to a shared vector space. RankGen can be used to rerank multiple full-length samples from an LM, and it can also be incorporated as a scoring function into beam search to significantly improve generation quality (0.85 vs. 0.77 MAUVE; 75% preference according to human annotators who are English writers). RankGen can also be used as a dense retriever, and it achieves state-of-the-art performance on [literary retrieval](https://relic.cs.umass.edu/leaderboard.html).

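Concretely, reranking with RankGen comes down to encoding a prefix and each candidate continuation into the shared vector space and sorting the candidates by the dot product of their vectors with the prefix vector. The snippet below is a minimal sketch of that scoring step only: the random tensors stand in for RankGen embeddings (produced by the encoder described under "Using RankGen" below), and the variable names and embedding width are illustrative assumptions, not part of the package.

```
import torch

# Stand-ins for RankGen embeddings, random here purely for illustration;
# in practice these come from the RankGen encoder described further below.
d = 768                          # assumed embedding width
prefix_vec = torch.randn(d)      # vector for the prefix
suffix_vecs = torch.randn(4, d)  # vectors for 4 candidate continuations

# RankGen scores a (prefix, continuation) pair by the dot product of their
# vectors; a higher score means the continuation fits the prefix better.
scores = suffix_vecs @ prefix_vec            # shape: (4,)

# Rerank the candidates from best to worst according to these scores.
ranking = torch.argsort(scores, descending=True)
print("best candidate index:", int(ranking[0]))
```
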
## Setup

**Requirements** (`pip` will install these dependencies for you)

Python 3.7+, `torch` (CUDA recommended), `transformers`

**Installation**

```
python3.7 -m virtualenv rankgen-venv
source rankgen-venv/bin/activate
pip install rankgen
```

Get the data [here](https://drive.google.com/drive/folders/1DRG2ess7fK3apfB-6KoHb_azMuHbsIv4?usp=sharing) and place the folder in the root directory. Alternatively, use `gdown` as shown below,

```
gdown --folder https://drive.google.com/drive/folders/1DRG2ess7fK3apfB-6KoHb_azMuHbsIv4
```

Run the test script to make sure the RankGen checkpoint has loaded correctly,

```
python -m rankgen.test_rankgen_encoder --model_path kalpeshk2011/rankgen-t5-base-all

### Expected output
0.0009239262409127233
0.0011521980725477804
```

## Using RankGen

Loading RankGen is simple using the HuggingFace APIs (see Method-2 below), but we suggest using [`RankGenEncoder`](https://github.com/martiansideofthemoon/rankgen/blob/master/rankgen/rankgen_encoder.py), a small wrapper around the HuggingFace APIs that preprocesses the data and handles tokenization automatically. You can either download [our repository](https://github.com/martiansideofthemoon/rankgen) and install the API, or copy the implementation from [below](#rankgenencoder-implementation).
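
As a rough sketch of what using the wrapper looks like (assuming the `rankgen` package from the Setup section above; the class and argument names here follow the linked repository and may differ from the exact API, which is reproduced in full below):

```
# A minimal sketch, not the verbatim API: RankGenEncoder and its encode()
# arguments are taken from the linked repository and may differ by version.
from rankgen import RankGenEncoder

# Same checkpoint as the test script above; other released checkpoints work too.
encoder = RankGenEncoder("kalpeshk2011/rankgen-t5-base-all")

prefixes = ["For many years, the Globe Theatre stood empty on the bank of the Thames."]
suffixes = ["It was eventually rebuilt close to the site of the original playhouse."]

# Prefixes and candidate continuations are encoded separately into the
# shared vector space, then compared with a dot product (see the sketch above).
prefix_vectors = encoder.encode(prefixes, vectors_type="prefix")
suffix_vectors = encoder.encode(suffixes, vectors_type="suffix")
```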