Difference from alpaca-farm-ppo-sim-gpt4-20k-wdiff

#1
by robkirk - opened

Hi, what's the difference between this model and https://huggingface.co/tatsu-lab/alpaca-farm-ppo-sim-gpt4-20k-wdiff ?

Tatsu Lab org

Great question. ppo-sim refers to the PPO model trained on the standard AlpacaFarm simulated preference data (randomizing over prompts/API LLMs and injecting label noise). ppo-sim-gpt4-20k refers to the PPO model trained with a single prompt/API LLM (GPT-4). Mapping to the paper (https://arxiv.org/pdf/2305.14387.pdf): ppo-sim is the PPO model in Table 2 (left column) and also the final row in Table 4; ppo-sim-gpt4-20k is the third row in Table 4.

rtaori changed discussion status to closed
