Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
m-ricย 
posted an update 6 days ago
Post
1038
๐Ž๐ฉ๐ž๐ง๐€๐ˆ ๐Ÿ๐ข๐ง๐š๐ฅ๐ฅ๐ฒ ๐ซ๐ž๐ฏ๐ž๐š๐ฅ๐ฌ โ€œ๐Ÿ“โ€: ๐œ๐ซ๐š๐ณ๐ฒ ๐œ๐ก๐š๐ข๐ง-๐จ๐Ÿ-๐ญ๐ก๐จ๐ฎ๐ ๐ก๐ญ-๐ญ๐ฎ๐ง๐ž๐ ๐ฆ๐จ๐๐ž๐ฅ >> ๐†๐๐“-๐Ÿ’๐จ ๐Ÿ’ฅ

OpenAI had hinted at a mysterious โ€œproject strawberryโ€ for a long time: ๐˜๐—ต๐—ฒ๐˜† ๐—ฝ๐˜‚๐—ฏ๐—น๐—ถ๐˜€๐—ต๐—ฒ๐—ฑ ๐˜๐—ต๐—ถ๐˜€ ๐—ป๐—ฒ๐˜„ ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น ๐—ฐ๐—ฎ๐—น๐—น๐—ฒ๐—ฑ โ€œ๐—ผ๐Ÿญโ€ ๐Ÿญ๐—ต๐—ผ๐˜‚๐—ฟ ๐—ฎ๐—ด๐—ผ, ๐—ฎ๐—ป๐—ฑ ๐˜๐—ต๐—ฒ ๐—ฝ๐—ฒ๐—ฟ๐—ณ๐—ผ๐—ฟ๐—บ๐—ฎ๐—ป๐—ฐ๐—ฒ ๐—ถ๐˜€ ๐—ท๐˜‚๐˜€๐˜ ๐—บ๐—ถ๐—ป๐—ฑ-๐—ฏ๐—น๐—ผ๐˜„๐—ถ๐—ป๐—ด.

๐Ÿคฏ Ranks among the top 500 students in the US in a qualifier for the USA Math Olympiad
๐Ÿคฏ Beats human-PhD-level accuracy by 8% on GPQA, hard science problems benchmark where the previous best was Claude 3.5 Sonnet with 59.4.
๐Ÿคฏ Scores 78.2% on vision benchmark MMMU, making it the first model competitive w/ human experts
๐Ÿคฏ GPT-4o on MATH scored 60% โ‡’ o1 scores 95%

How did they pull this? Sadly OpenAI keeps increasing their performance in โ€œmaking cryptic AF reports to not reveal any real infoโ€, so here are excerpts:

๐Ÿ’ฌ โ€œ๐—ผ๐Ÿญ ๐˜‚๐˜€๐—ฒ๐˜€ ๐—ฎ ๐—ฐ๐—ต๐—ฎ๐—ถ๐—ป ๐—ผ๐—ณ ๐˜๐—ต๐—ผ๐˜‚๐—ด๐—ต๐˜ ๐˜„๐—ต๐—ฒ๐—ป ๐—ฎ๐˜๐˜๐—ฒ๐—บ๐—ฝ๐˜๐—ถ๐—ป๐—ด ๐˜๐—ผ ๐˜€๐—ผ๐—น๐˜ƒ๐—ฒ ๐—ฎ ๐—ฝ๐—ฟ๐—ผ๐—ฏ๐—น๐—ฒ๐—บ. ๐—ง๐—ต๐—ฟ๐—ผ๐˜‚๐—ด๐—ต ๐—ฟ๐—ฒ๐—ถ๐—ป๐—ณ๐—ผ๐—ฟ๐—ฐ๐—ฒ๐—บ๐—ฒ๐—ป๐˜ ๐—น๐—ฒ๐—ฎ๐—ฟ๐—ป๐—ถ๐—ป๐—ด, ๐—ผ๐Ÿญ ๐—น๐—ฒ๐—ฎ๐—ฟ๐—ป๐˜€ ๐˜๐—ผ ๐—ต๐—ผ๐—ป๐—ฒ ๐—ถ๐˜๐˜€ ๐—ฐ๐—ต๐—ฎ๐—ถ๐—ป ๐—ผ๐—ณ ๐˜๐—ต๐—ผ๐˜‚๐—ด๐—ต๐˜ ๐—ฎ๐—ป๐—ฑ ๐—ฟ๐—ฒ๐—ณ๐—ถ๐—ป๐—ฒ ๐˜๐—ต๐—ฒ ๐˜€๐˜๐—ฟ๐—ฎ๐˜๐—ฒ๐—ด๐—ถ๐—ฒ๐˜€ ๐—ถ๐˜ ๐˜‚๐˜€๐—ฒ๐˜€. It learns to recognize and correct its mistakes.โ€

And of course, they decide to hide the content of this precious Chain-of-
Thought. Would it be for maximum profit? Of course not, you awful capitalist, itโ€™s to protect users:

๐Ÿ’ฌ โ€œWe also do not want to make an unaligned chain of thought directly visible to users.โ€

Theyโ€™re right, it would certainly have hurt my feelings to see the internal of this model tearing apart math problems.

๐Ÿค” I suspect it could be not only CoT, but also some agentic behaviour where the model can just call a code executor. The kind of score improvement the show certainly looks like the ones you see with agents.

This model will be immediately released for ChatGPT and some โ€œtrusted API usersโ€.

Letโ€™s start cooking to release the same thing in 6 months! ๐Ÿš€

They just afraid to show how it works cos they know they cant keep up with the open source train

In this post