arxiv:2405.07863
Wei Xiong
weqweasdas
AI & ML interests
Machine learning, RLHF
Organizations
Models (23)
weqweasdas/zephyr-7b-dpo-full • Text Generation • 17
weqweasdas/zephyr-7b-gemma-dpo
weqweasdas/zephyr-7b-sft-full
weqweasdas/zephyr-7b-dpo-qlora
weqweasdas/gpt2-cpt-dutch • Text Generation • 15
weqweasdas/zephyr-7b-gemma-sft
weqweasdas/raft_baseline_zephyr_packing_model6_1_4_e6_weight085 • Text Generation
weqweasdas/raft_baseline_zephyr_packing_model6_1_4_e6 • Text Generation • 1
weqweasdas/raft_baseline_zephyr_packing_model6 • Text Generation • 1
weqweasdas/raft_baseline_openchat_llama13b_model1 • Text Generation • 2
Datasets (68)
weqweasdas/meta_math
weqweasdas/DS-MATH
weqweasdas/MS-MATH
weqweasdas/hn_mistral_prm_pairwise_only_step2 • 138k • 12
weqweasdas/hn_mistral_prm_pairwise • 447k • 15
weqweasdas/alpaca_in_one • 805 • 52
weqweasdas/filtered_reward_bench • 2.85k • 43
weqweasdas/prm_math_prompt • 858k • 43
weqweasdas/prm_gsm8k_prompt • 574k • 41
weqweasdas/attacked_data • 73.3k • 34