Zhaolin Gao
GitBag
AI & ML interests
Reinforcement Learning from Human Feedback
Organizations
Collections
1
models
250
GitBag/reasoning_rebel_iter_4_1731513485_eta_1e4_lr_3e-7_1731719519 • Text Generation • 2
GitBag/reasoning_rebel_iter_4_1731513485_eta_1e3_lr_3e-7_1731714556 • Text Generation • 2
GitBag/reasoning_rebel_iter_4_1731513485_eta_1e2_lr_3e-7_1731709582 • Text Generation • 2
GitBag/reasoning_rebel_iter_4_1731513485_eta_1e1_lr_3e-7_1731686912 • Text Generation • 2
GitBag/reasoning_rebel_iter_3_1731243878_eta_1e5_lr_3e-7_1731523653 • Text Generation • 8
GitBag/reasoning_rebel_iter_3_1731243878_eta_1e6_lr_3e-7_1731528705 • Text Generation • 2
GitBag/reasoning_rebel_iter_3_1731243878_eta_1e4_lr_3e-7_1731518535 • Text Generation • 4
GitBag/reasoning_rebel_iter_3_1731243878_eta_1e3_lr_3e-7_1731513485 • Text Generation • 45
GitBag/reasoning_rebel_iter_3_1731243878_eta_1e2_lr_3e-7_1731508404 • Text Generation • 4
GitBag/reasoning_rebel_iter_3_1731243878_eta_1e1_lr_3e-7_1731485433 • Text Generation • 4
datasets
247
GitBag/llama3-ultrafeedback-reasoning-iter_4-1731513485-armo-tokenized_harvard • Viewer • 56.3k • 9
GitBag/llama3-ultrafeedback-reasoning-iter_4-1731513485-armo-tokenized • Viewer • 56.3k • 6
GitBag/llama3-ultrafeedback-reasoning-iter_4-1731513485-armo • Viewer • 60.8k • 6
GitBag/llama3-ultrafeedback-reasoning-iter_4-1731513485 • Viewer • 60.8k • 6
GitBag/llama3-ultrafeedback-reasoning-iter_3-1731243878-armo-tokenized_harvard • Viewer • 57.2k • 12
GitBag/llama3-ultrafeedback-reasoning-iter_3-1731243878-armo-tokenized • Viewer • 57.2k • 9
GitBag/llama3-ultrafeedback-reasoning-iter_3-1731243878-armo • Viewer • 60.8k • 11
GitBag/llama3-ultrafeedback-reasoning-iter_3-1731243878 • Viewer • 60.8k • 12
GitBag/llama3-cs3780 • Viewer • 1.6k • 14
GitBag/llama3-ultrafeedback-reasoning-iter_2-1731046941-armo-tokenized_harvard • Viewer • 57.2k • 14