Aymeric Roucher

m-ric

AI & ML interests

MLE at Hugging Face 🤗 LLMs, Agents, RAG, Multimodal.

๐Ž๐ฉ๐ž๐ง๐€๐ˆ ๐Ÿ๐ข๐ง๐š๐ฅ๐ฅ๐ฒ ๐ซ๐ž๐ฏ๐ž๐š๐ฅ๐ฌ โ€œ๐Ÿ“โ€: ๐œ๐ซ๐š๐ณ๐ฒ ๐œ๐ก๐š๐ข๐ง-๐จ๐Ÿ-๐ญ๐ก๐จ๐ฎ๐ ๐ก๐ญ-๐ญ๐ฎ๐ง๐ž๐ ๐ฆ๐จ๐๐ž๐ฅ >> ๐†๐๐“-๐Ÿ’๐จ ๐Ÿ’ฅ

OpenAI had hinted at a mysterious "project strawberry" for a long time: they published this new model called "o1" 1 hour ago, and the performance is just mind-blowing.

🤯 Ranks among the top 500 students in the US in a qualifier for the USA Math Olympiad
🤯 Beats human-PhD-level accuracy by 8% on GPQA, a benchmark of hard science problems where the previous best was Claude 3.5 Sonnet at 59.4%
🤯 Scores 78.2% on the vision benchmark MMMU, making it the first model competitive w/ human experts
🤯 GPT-4o scored 60% on MATH ⇒ o1 scores 95%

How did they pull this off? Sadly, OpenAI keeps increasing their performance in "making cryptic AF reports to not reveal any real info", so here are excerpts:

💬 "o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes."

And of course, they decided to hide the content of this precious Chain-of-Thought. Would it be for maximum profit? Of course not, you awful capitalist, it's to protect users:

💬 "We also do not want to make an unaligned chain of thought directly visible to users."

They're right: it would certainly have hurt my feelings to see the internals of this model tearing apart math problems.

🤔 I suspect it could be not only CoT, but also some agentic behaviour where the model can just call a code executor. The kind of score improvement they show certainly looks like what you see with agents.
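For illustration only, the agentic pattern suggested here — the model writes code, an executor runs it, and the computed answer comes back — can be sketched like this. The `model_call` function is a hypothetical stand-in that hard-codes a reply, not any real OpenAI API:

```python
import re

def model_call(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: it hard-codes a coded
    answer to the sample problem, just to show the loop's mechanics."""
    return ("Thought: I can compute this with code.\n"
            "<code>result = sum(i * i for i in range(1, 11))</code>")

def solve_with_code_executor(problem: str) -> int:
    """One step of a code-executing agent: query the model, extract
    the code it wrote, run it, and return the computed result."""
    reply = model_call(f"Solve: {problem}")
    code = re.search(r"<code>(.*?)</code>", reply, re.S).group(1)
    namespace = {}
    exec(code, namespace)  # the "code executor" step
    return namespace["result"]

print(solve_with_code_executor("sum of squares from 1 to 10"))  # -> 385
```

Offloading exact arithmetic to an executor like this is one reason agentic setups produce the kind of jump on math benchmarks described above.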

This model will be immediately released for ChatGPT and some "trusted API users".

Let's start cooking to release the same thing in 6 months! 🚀
๐—˜๐˜…๐˜๐—ฟ๐—ฎ๐—ฐ๐˜๐—ถ๐—ป๐—ด ๐˜†๐—ผ๐˜‚๐—ฟ ๐—›๐—ง๐— ๐—Ÿ ๐˜„๐—ฒ๐—ฏ๐—ฝ๐—ฎ๐—ด๐—ฒ๐˜€ ๐˜๐—ผ ๐—บ๐—ฎ๐—ฟ๐—ธ๐—ฑ๐—ผ๐˜„๐—ป ๐—ถ๐˜€ ๐—ป๐—ผ๐˜„ ๐—ฝ๐—ผ๐˜€๐˜€๐—ถ๐—ฏ๐—น๐—ฒ ๐—ฒ๐—ป๐—ฑ-๐˜๐—ผ-๐—ฒ๐—ป๐—ฑ ๐˜„๐—ถ๐˜๐—ต ๐—ฎ ๐˜€๐—ถ๐—บ๐—ฝ๐—น๐—ฒ ๐—Ÿ๐—Ÿ๐— ! ๐Ÿ‘

Jina just released Reader-LM, which handles the whole pipeline of extracting markdown from HTML webpages.

A while ago, Jina had released a completely code-based, deterministic program to do this extraction, based on heuristics: e.g., "if the text is in a <p> tag, keep it, but if it's hidden behind another element, remove it".
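To make that heuristic approach concrete, here is a toy sketch of such an extractor — my own illustration, not Jina's actual code — using Python's stdlib `html.parser`:

```python
from html.parser import HTMLParser

class SimpleMarkdownExtractor(HTMLParser):
    """Toy heuristic extractor: keep text found inside <p> tags,
    drop anything inside tags that usually hide content."""

    KEEP = {"p"}
    HIDE = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self.keep_depth = 0  # how many <p> ancestors we are inside
        self.hide_depth = 0  # how many hiding ancestors we are inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.KEEP:
            self.keep_depth += 1
        elif tag in self.HIDE:
            self.hide_depth += 1

    def handle_endtag(self, tag):
        if tag in self.KEEP and self.keep_depth:
            self.keep_depth -= 1
        elif tag in self.HIDE and self.hide_depth:
            self.hide_depth -= 1

    def handle_data(self, data):
        # keep only visible text that sits inside a <p> element
        if self.keep_depth and not self.hide_depth and data.strip():
            self.chunks.append(data.strip())

    def to_markdown(self):
        return "\n\n".join(self.chunks)

extractor = SimpleMarkdownExtractor()
extractor.feed("<html><p>Keep me.</p><script>x()</script><div>ignored</div></html>")
print(extractor.to_markdown())  # -> Keep me.
```

Rules like these are brittle across the diversity of real pages — which is exactly the complaint described next.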

🤔 But they received complaints from readers: some found the output too detailed, others not enough, depending on the page.

โžก๏ธ So they decided, ๐—บ๐—ฎ๐˜†๐—ฏ๐—ฒ ๐—ต๐—ฒ๐˜‚๐—ฟ๐—ถ๐˜€๐˜๐—ถ๐—ฐ๐˜€ ๐˜„๐—ฒ๐—ฟ๐—ฒ ๐—ป๐—ผ๐˜ ๐—ฒ๐—ป๐—ผ๐˜‚๐—ด๐—ต: ๐—ถ๐—ป๐˜€๐˜๐—ฒ๐—ฎ๐—ฑ, ๐˜๐—ต๐—ฒ๐˜† ๐˜๐—ฟ๐—ถ๐—ฒ๐—ฑ ๐˜๐—ผ ๐˜๐—ฟ๐—ฎ๐—ถ๐—ป ๐—ฎ ๐—Ÿ๐—Ÿ๐—  ๐˜๐—ผ ๐—ฑ๐—ผ ๐˜๐—ต๐—ฒ ๐—ฐ๐—ผ๐—บ๐—ฝ๐—น๐—ฒ๐˜๐—ฒ ๐—ฒ๐˜…๐˜๐—ฟ๐—ฎ๐—ฐ๐˜๐—ถ๐—ผ๐—ป. This LLM does not need to be very strong,but it should handle a very long context: itโ€™s a challenging, โ€œshallow-but-wideโ€ architecture.

๐—ง๐—ฒ๐—ฐ๐—ต๐—ป๐—ถ๐—ฐ๐—ฎ๐—น ๐—ถ๐—ป๐˜€๐—ถ๐—ด๐—ต๐˜๐˜€:
2๏ธโƒฃ models: Reader-LM-0.5B and 1.5B
โš™๏ธ Two stages of training: first, short and simple HTML to get the basics, then ramp up to longer and harder HTML up to 128k tokens
๐Ÿ”Ž Use contrastive search for decoding: this empirically reduces โ€œrepeating outputโ€ issues
โžก๏ธ Their models beat much larger models at HTML extraction ๐Ÿ”ฅ
๐Ÿค— Weights available on HF (sadly cc-by-nc license): jinaai/reader-lm-1.5b
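Here is a sketch of how one might run the checkpoint with `transformers`, using `penalty_alpha` + `top_k` to enable contrastive search decoding. The chat-template call is an assumption based on typical instruction-tuned checkpoints — check the model card for the exact prompt format:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def html_to_markdown(html: str, checkpoint: str = "jinaai/reader-lm-1.5b") -> str:
    """Convert raw HTML to markdown with Reader-LM (sketch).

    Assumes the checkpoint takes the raw HTML as a user chat message;
    verify against the model card before relying on this."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": html}],
        tokenize=True, add_generation_prompt=True, return_tensors="pt",
    )
    # penalty_alpha + top_k switches generate() into contrastive search,
    # the decoding mode credited above with reducing repetition loops
    outputs = model.generate(inputs, max_new_tokens=1024,
                             penalty_alpha=0.6, top_k=5)
    return tokenizer.decode(outputs[0][inputs.shape[1]:],
                            skip_special_tokens=True)

# Example (downloads the model weights on first run):
# print(html_to_markdown("<html><body><h1>Hi</h1><p>Hello world</p></body></html>"))
```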