@m-ric on Hugging Face: "𝐎𝐩𝐞𝐧𝐀𝐈 𝐟𝐢𝐧𝐚𝐥𝐥𝐲 𝐫𝐞𝐯𝐞𝐚𝐥𝐬 “🍓”: 𝐜𝐫𝐚𝐳𝐲…"


𝐎𝐩𝐞𝐧𝐀𝐈 𝐟𝐢𝐧𝐚𝐥𝐥𝐲 𝐫𝐞𝐯𝐞𝐚𝐥𝐬 “🍓”: 𝐜𝐫𝐚𝐳𝐲 𝐜𝐡𝐚𝐢𝐧-𝐨𝐟-𝐭𝐡𝐨𝐮𝐠𝐡𝐭-𝐭𝐮𝐧𝐞𝐝 𝐦𝐨𝐝𝐞𝐥 >> 𝐆𝐏𝐓-𝟒𝐨 💥

OpenAI had hinted at a mysterious “project strawberry” for a long time: 𝘁𝗵𝗲𝘆 𝗽𝘂𝗯𝗹𝗶𝘀𝗵𝗲𝗱 𝘁𝗵𝗶𝘀 𝗻𝗲𝘄 𝗺𝗼𝗱𝗲𝗹 𝗰𝗮𝗹𝗹𝗲𝗱 “𝗼𝟭” 𝟭 𝗵𝗼𝘂𝗿 𝗮𝗴𝗼, 𝗮𝗻𝗱 𝘁𝗵𝗲 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗶𝘀 𝗷𝘂𝘀𝘁 𝗺𝗶𝗻𝗱-𝗯𝗹𝗼𝘄𝗶𝗻𝗴.

🤯 Ranks among the top 500 students in the US in a qualifier for the USA Math Olympiad
🤯 Beats human-PhD-level accuracy by 8 points on GPQA, a benchmark of hard science problems where the previous best was Claude 3.5 Sonnet with 59.4%.
🤯 Scores 78.2% on the MMMU vision benchmark, making it the first model competitive w/ human experts
🤯 GPT-4o on MATH scored 60% ⇒ o1 scores 95%

How did they pull this off? Sadly, OpenAI keeps improving its performance at “making cryptic AF reports that reveal no real info”, so here are some excerpts:

💬 “𝗼𝟭 𝘂𝘀𝗲𝘀 𝗮 𝗰𝗵𝗮𝗶𝗻 𝗼𝗳 𝘁𝗵𝗼𝘂𝗴𝗵𝘁 𝘄𝗵𝗲𝗻 𝗮𝘁𝘁𝗲𝗺𝗽𝘁𝗶𝗻𝗴 𝘁𝗼 𝘀𝗼𝗹𝘃𝗲 𝗮 𝗽𝗿𝗼𝗯𝗹𝗲𝗺. 𝗧𝗵𝗿𝗼𝘂𝗴𝗵 𝗿𝗲𝗶𝗻𝗳𝗼𝗿𝗰𝗲𝗺𝗲𝗻𝘁 𝗹𝗲𝗮𝗿𝗻𝗶𝗻𝗴, 𝗼𝟭 𝗹𝗲𝗮𝗿𝗻𝘀 𝘁𝗼 𝗵𝗼𝗻𝗲 𝗶𝘁𝘀 𝗰𝗵𝗮𝗶𝗻 𝗼𝗳 𝘁𝗵𝗼𝘂𝗴𝗵𝘁 𝗮𝗻𝗱 𝗿𝗲𝗳𝗶𝗻𝗲 𝘁𝗵𝗲 𝘀𝘁𝗿𝗮𝘁𝗲𝗴𝗶𝗲𝘀 𝗶𝘁 𝘂𝘀𝗲𝘀. It learns to recognize and correct its mistakes.”

And of course, they decided to hide the content of this precious Chain-of-Thought. Would it be for maximum profit? Of course not, you awful capitalist, it’s to protect users:

💬 “We also do not want to make an unaligned chain of thought directly visible to users.”

They’re right, it would certainly have hurt my feelings to see the internals of this model tearing apart math problems.

🤔 I suspect it could be not only CoT, but also some agentic behaviour where the model can just call a code executor. The kind of score improvement they show certainly looks like the ones you see with agents.

This model will be released immediately in ChatGPT and to some “trusted API users”.

Let’s start cooking to release the same thing in 6 months! 🚀

They're just afraid to show how it works cos they know they can't keep up with the open-source train
