---
license: apache-2.0
datasets:
- gair-prox/RedPajama-pro
language:
- en
base_model:
- gair-prox/RedPJ-ProX-0.3B
pipeline_tag: text-generation
library_name: transformers
tags:
- llama
- code
---

# Web-doc-refining-lm

<p align="center">
  <img src="prox-teaser.png">
</p>

[ArXiv](http://arxiv.org/abs/2409.17115) | [Code](https://github.com/GAIR-NLP/program-every-example)

**Web-doc-refining-lm** is an adapted [0.3B-ProX](https://huggingface.co/gair-prox/RedPJ-ProX-0.3B) model, fine-tuned for document-level refining via program generation.

<p align="center">
  <img src="func_design.png">
</p>
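To make "refining via program generation" concrete, the sketch below shows one way a generated document-level program could be applied to a corpus. It is an illustration under stated assumptions, not the project's actual pipeline: it assumes the model emits a single call, either `keep_doc()` or `drop_doc()` (names taken from the linked paper, not confirmed by this card), and the helpers `parse_doc_program`, `refine_corpus`, and `toy_generator` are hypothetical. The model call itself is replaced by a stand-in function, since the prompt format is not specified here.

```python
import re
from typing import Callable, Iterable, List

# Assumption: a doc-level program is one call, keep_doc() or drop_doc().
_CALL = re.compile(r"\b(keep_doc|drop_doc)\s*\(\s*\)")


def parse_doc_program(program_text: str) -> str:
    """Return 'keep_doc' or 'drop_doc'; fall back to keeping if nothing parses."""
    match = _CALL.search(program_text)
    return match.group(1) if match else "keep_doc"


def refine_corpus(
    docs: Iterable[str],
    generate_program: Callable[[str], str],
) -> List[str]:
    """Keep only the documents whose generated program says keep_doc()."""
    kept = []
    for doc in docs:
        if parse_doc_program(generate_program(doc)) == "keep_doc":
            kept.append(doc)
    return kept


def toy_generator(doc: str) -> str:
    """Hypothetical stand-in for the model: drops very short documents."""
    return "drop_doc()" if len(doc.split()) < 3 else "keep_doc()"


if __name__ == "__main__":
    corpus = ["click here", "A full paragraph of useful text."]
    print(refine_corpus(corpus, toy_generator))
```

In practice, `generate_program` would wrap a text-generation call to this model; defaulting to `keep_doc` on unparseable output is a conservative design choice so that malformed generations never silently delete data.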

### Citation
```
@article{zhou2024programming,
  title={Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale},
  author={Zhou, Fan and Wang, Zengzhi and Liu, Qian and Li, Junlong and Liu, Pengfei},
  journal={arXiv preprint arXiv:2409.17115},
  year={2024}
}
```