dfuhoiysOHSVFh82934gfjklb

huba-buba

AI & ML interests

None yet

Recent Activity

liked a model 3 days ago

msu-rcc-lair/RuadaptQwen2.5-32B-instruct

liked a model 6 days ago

qq8933/OpenLongCoT-Base-Gemma2-2B

liked a model 8 days ago

Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8

Organizations

None yet

huba-buba's activity

upvoted a paper 17 days ago

AgentInstruct: Toward Generative Teaching with Agentic Flows

Paper • 2407.03502 • Published Jul 3 • 48

upvoted an article 17 days ago

The Rise of Agentic Data Generation

Article • Jul 15 • 78

upvoted 3 papers 21 days ago

Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning

Paper • 2410.22304 • Published 23 days ago • 15

Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning

Paper • 2410.21845 • Published 23 days ago • 11

CLEAR: Character Unlearning in Textual and Visual Modalities

Paper • 2410.18057 • Published 29 days ago • 199

upvoted 3 papers 24 days ago

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 48

Direct Language Model Alignment from Online AI Feedback

Paper • 2402.04792 • Published Feb 7 • 29

Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking

Paper • 2312.09244 • Published Dec 14, 2023 • 8

upvoted 8 papers about 1 month ago

Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

Paper • 2410.13232 • Published Oct 17 • 40

HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks

Paper • 2410.12381 • Published Oct 16 • 42

Baichuan-Omni Technical Report

Paper • 2410.08565 • Published Oct 11 • 84

Not All LLM Reasoners Are Created Equal

Paper • 2410.01748 • Published Oct 2 • 27

upvoted an article about 1 month ago

Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU

Article • Mar 9, 2023 • 34

upvoted a paper about 2 months ago

VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

Paper • 2410.01679 • Published Oct 2 • 22

upvoted an article 3 months ago

Selective fine-tuning of Language Models with Spectrum

Article • Sep 3 • 29

upvoted a collection 3 months ago

LLaVA-OneVision

Collection • a model good at arbitrary types of visual input • 15 items • Updated Oct 5 • 20