Aymeric Roucher

m-ric

AI & ML interests

MLE at Hugging Face 🤗 LLMs, Agents, RAG, Multimodal.

๐Ž๐ฉ๐ž๐ง๐€๐ˆ ๐Ÿ๐ข๐ง๐š๐ฅ๐ฅ๐ฒ ๐ซ๐ž๐ฏ๐ž๐š๐ฅ๐ฌ โ€œ๐Ÿ“โ€: ๐œ๐ซ๐š๐ณ๐ฒ ๐œ๐ก๐š๐ข๐ง-๐จ๐Ÿ-๐ญ๐ก๐จ๐ฎ๐ ๐ก๐ญ-๐ญ๐ฎ๐ง๐ž๐ ๐ฆ๐จ๐๐ž๐ฅ >> ๐†๐๐“-๐Ÿ’๐จ ๐Ÿ’ฅ

OpenAI had hinted at a mysterious "project strawberry" for a long time: they published this new model called "o1" 1 hour ago, and the performance is just mind-blowing.

🤯 Ranks among the top 500 students in the US in a qualifier for the USA Math Olympiad
🤯 Beats human-PhD-level accuracy by 8% on GPQA, a benchmark of hard science problems where the previous best was Claude 3.5 Sonnet at 59.4%
🤯 Scores 78.2% on the vision benchmark MMMU, making it the first model competitive w/ human experts
🤯 GPT-4o scored 60% on MATH ⇒ o1 scores 95%

How did they pull this off? Sadly, OpenAI keeps increasing their performance in "making cryptic AF reports to not reveal any real info", so here are excerpts:

💬 "o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes."

And of course, they decided to hide the content of this precious Chain-of-Thought. Would it be for maximum profit? Of course not, you awful capitalist, it's to protect users:

💬 "We also do not want to make an unaligned chain of thought directly visible to users."

They're right: it would certainly have hurt my feelings to see the internals of this model tearing apart math problems.

🤔 I suspect it could be not only CoT, but also some agentic behaviour where the model can just call a code executor. The kind of score improvement they show certainly looks like what you see with agents.
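For illustration only, the agentic pattern suggested here — the model writes code, an executor runs it, and the computed answer comes back — can be sketched like this. The `model_call` function is a hypothetical stand-in that hard-codes a reply, not any real OpenAI API:

```python
import re

def model_call(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: it hard-codes a coded
    answer to the sample problem, just to show the loop's mechanics."""
    return ("Thought: I can compute this with code.\n"
            "<code>result = sum(i * i for i in range(1, 11))</code>")

def solve_with_code_executor(problem: str) -> int:
    """One step of a code-executing agent: query the model, extract
    the code it wrote, run it, and return the computed result."""
    reply = model_call(f"Solve: {problem}")
    code = re.search(r"<code>(.*?)</code>", reply, re.S).group(1)
    namespace = {}
    exec(code, namespace)  # the "code executor" step
    return namespace["result"]

print(solve_with_code_executor("sum of squares from 1 to 10"))  # -> 385
```

Offloading exact arithmetic to an executor like this is one reason agentic setups produce the kind of jump on math benchmarks described above.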

This model will be immediately released for ChatGPT and some "trusted API users".

Let's start cooking to release the same thing in 6 months! 🚀
๐—˜๐˜…๐˜๐—ฟ๐—ฎ๐—ฐ๐˜๐—ถ๐—ป๐—ด ๐˜†๐—ผ๐˜‚๐—ฟ ๐—›๐—ง๐— ๐—Ÿ ๐˜„๐—ฒ๐—ฏ๐—ฝ๐—ฎ๐—ด๐—ฒ๐˜€ ๐˜๐—ผ ๐—บ๐—ฎ๐—ฟ๐—ธ๐—ฑ๐—ผ๐˜„๐—ป ๐—ถ๐˜€ ๐—ป๐—ผ๐˜„ ๐—ฝ๐—ผ๐˜€๐˜€๐—ถ๐—ฏ๐—น๐—ฒ ๐—ฒ๐—ป๐—ฑ-๐˜๐—ผ-๐—ฒ๐—ป๐—ฑ ๐˜„๐—ถ๐˜๐—ต ๐—ฎ ๐˜€๐—ถ๐—บ๐—ฝ๐—น๐—ฒ ๐—Ÿ๐—Ÿ๐— ! ๐Ÿ‘

Jina just released Reader-LM, which handles the whole pipeline of extracting markdown from HTML webpages.

A while ago, Jina had released a completely code-based, deterministic program to do this extraction, based on heuristics: e.g., "if the text is in a <p> tag, keep it, but if it's hidden behind another element, remove it".
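To make that heuristic approach concrete, here is a toy sketch of such an extractor — my own illustration, not Jina's actual code — using Python's stdlib `html.parser`:

```python
from html.parser import HTMLParser

class SimpleMarkdownExtractor(HTMLParser):
    """Toy heuristic extractor: keep text found inside <p> tags,
    drop anything inside tags that usually hide content."""

    KEEP = {"p"}
    HIDE = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self.keep_depth = 0  # how many <p> ancestors we are inside
        self.hide_depth = 0  # how many hiding ancestors we are inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.KEEP:
            self.keep_depth += 1
        elif tag in self.HIDE:
            self.hide_depth += 1

    def handle_endtag(self, tag):
        if tag in self.KEEP and self.keep_depth:
            self.keep_depth -= 1
        elif tag in self.HIDE and self.hide_depth:
            self.hide_depth -= 1

    def handle_data(self, data):
        # keep only visible text that sits inside a <p> element
        if self.keep_depth and not self.hide_depth and data.strip():
            self.chunks.append(data.strip())

    def to_markdown(self):
        return "\n\n".join(self.chunks)

extractor = SimpleMarkdownExtractor()
extractor.feed("<html><p>Keep me.</p><script>x()</script><div>ignored</div></html>")
print(extractor.to_markdown())  # -> Keep me.
```

Rules like these are brittle across the diversity of real pages — which is exactly the complaint described next.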

🤔 But they received complaints from readers: some found the output too detailed, others not enough, depending on the page.

โžก๏ธ So they decided, ๐—บ๐—ฎ๐˜†๐—ฏ๐—ฒ ๐—ต๐—ฒ๐˜‚๐—ฟ๐—ถ๐˜€๐˜๐—ถ๐—ฐ๐˜€ ๐˜„๐—ฒ๐—ฟ๐—ฒ ๐—ป๐—ผ๐˜ ๐—ฒ๐—ป๐—ผ๐˜‚๐—ด๐—ต: ๐—ถ๐—ป๐˜€๐˜๐—ฒ๐—ฎ๐—ฑ, ๐˜๐—ต๐—ฒ๐˜† ๐˜๐—ฟ๐—ถ๐—ฒ๐—ฑ ๐˜๐—ผ ๐˜๐—ฟ๐—ฎ๐—ถ๐—ป ๐—ฎ ๐—Ÿ๐—Ÿ๐—  ๐˜๐—ผ ๐—ฑ๐—ผ ๐˜๐—ต๐—ฒ ๐—ฐ๐—ผ๐—บ๐—ฝ๐—น๐—ฒ๐˜๐—ฒ ๐—ฒ๐˜…๐˜๐—ฟ๐—ฎ๐—ฐ๐˜๐—ถ๐—ผ๐—ป. This LLM does not need to be very strong,but it should handle a very long context: itโ€™s a challenging, โ€œshallow-but-wideโ€ architecture.

๐—ง๐—ฒ๐—ฐ๐—ต๐—ป๐—ถ๐—ฐ๐—ฎ๐—น ๐—ถ๐—ป๐˜€๐—ถ๐—ด๐—ต๐˜๐˜€:
2๏ธโƒฃ models: Reader-LM-0.5B and 1.5B
โš™๏ธ Two stages of training: first, short and simple HTML to get the basics, then ramp up to longer and harder HTML up to 128k tokens
๐Ÿ”Ž Use contrastive search for decoding: this empirically reduces โ€œrepeating outputโ€ issues
โžก๏ธ Their models beat much larger models at HTML extraction ๐Ÿ”ฅ
๐Ÿค— Weights available on HF (sadly cc-by-nc license): jinaai/reader-lm-1.5b
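Here is a sketch of how one might run the checkpoint with `transformers`, using `penalty_alpha` + `top_k` to enable contrastive search decoding. The chat-template call is an assumption based on typical instruction-tuned checkpoints — check the model card for the exact prompt format:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def html_to_markdown(html: str, checkpoint: str = "jinaai/reader-lm-1.5b") -> str:
    """Convert raw HTML to markdown with Reader-LM (sketch).

    Assumes the checkpoint takes the raw HTML as a user chat message;
    verify against the model card before relying on this."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": html}],
        tokenize=True, add_generation_prompt=True, return_tensors="pt",
    )
    # penalty_alpha + top_k switches generate() into contrastive search,
    # the decoding mode credited above with reducing repetition loops
    outputs = model.generate(inputs, max_new_tokens=1024,
                             penalty_alpha=0.6, top_k=5)
    return tokenizer.decode(outputs[0][inputs.shape[1]:],
                            skip_special_tokens=True)

# Example (downloads the model weights on first run):
# print(html_to_markdown("<html><body><h1>Hi</h1><p>Hello world</p></body></html>"))
```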