- Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation (paper, arXiv:2401.08417, published Jan 16, 2024)
- PaliGemma Release (collection of pretrained and mix checkpoints for PaliGemma, 16 items, updated Jul 31, 2024)
- Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF (paper, arXiv:2405.21046, published May 31, 2024)
- ORPO: Monolithic Preference Optimization without Reference Model (paper, arXiv:2403.07691, published Mar 12, 2024)
- Binary Classifier Optimization for Large Language Model Alignment (paper, arXiv:2404.04656, published Apr 6, 2024)
- Building and better understanding vision-language models: insights and future directions (paper, arXiv:2408.12637, published Aug 22, 2024)
- Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs (paper, arXiv:2402.14740, published Feb 22, 2024)
- Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking (paper, arXiv:2312.09244, published Dec 14, 2023)
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment (paper, arXiv:2408.06266, published Aug 12, 2024)
- A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA (paper, arXiv:2312.03732, published Nov 28, 2023)
- Understanding Reference Policies in Direct Preference Optimization (paper, arXiv:2407.13709, published Jul 18, 2024)
- Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform (paper, arXiv:2310.00036, published Sep 29, 2023)