Kai Zuberbühler

kaizuberbuehler · k-zubi

AI & ML interests

language models, agents, image generation, music generation

Organizations

None yet

kaizuberbuehler's activity

upvoted a paper about 4 hours ago

MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning

Paper • 2310.16049 • Published Oct 24, 2023 • 4

upvoted a paper 4 days ago

AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments

Paper • 2405.07960 • Published May 13 • 1

upvoted a paper 9 days ago

ART: Automatic multi-step reasoning and tool-use for large language models

Paper • 2303.09014 • Published Mar 16, 2023 • 1

upvoted 3 papers 11 days ago

FuzzCoder: Byte-level Fuzzing Test via Large Language Model

Paper • 2409.01944 • Published 16 days ago • 44

Attention Heads of Large Language Models: A Survey

Paper • 2409.03752 • Published 14 days ago • 83

Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing

Paper • 2409.01322 • Published 17 days ago • 94

upvoted 5 papers 15 days ago

ContextCite: Attributing Model Generation to Context

Paper • 2409.00729 • Published 18 days ago • 13

LinFusion: 1 GPU, 1 Minute, 16K Image

Paper • 2409.02097 • Published 16 days ago • 31

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

Paper • 2409.02095 • Published 16 days ago • 32

VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges

Paper • 2409.01071 • Published 18 days ago • 26

LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

Paper • 2409.00509 • Published 19 days ago • 38

upvoted 2 papers 16 days ago

OLMoE: Open Mixture-of-Experts Language Models

Paper • 2409.02060 • Published 16 days ago • 74

FLUX that Plays Music

Paper • 2409.00587 • Published 19 days ago • 31

upvoted a collection 24 days ago

Leaderboards and benchmarks ✨

Cool leaderboard spaces collection for models across modalities! Text, vision, audio, ... • 67 items • Updated Aug 6 • 83

upvoted a paper 24 days ago

SWE-bench-java: A GitHub Issue Resolving Benchmark for Java

Paper • 2408.14354 • Published 24 days ago • 40

upvoted 4 papers 25 days ago

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published 28 days ago • 50

Controllable Text Generation for Large Language Models: A Survey

Paper • 2408.12599 • Published 28 days ago • 61

Sapiens: Foundation for Human Vision Models

Paper • 2408.12569 • Published 28 days ago • 84

Jamba-1.5: Hybrid Transformer-Mamba Models at Scale

Paper • 2408.12570 • Published 28 days ago • 29

upvoted a paper 26 days ago

Data curation via joint example selection further accelerates multimodal learning

Paper • 2406.17711 • Published Jun 25 • 3

upvoted 11 papers about 1 month ago

ShortCircuit: AlphaZero-Driven Circuit Design

Paper • 2408.09858 • Published Aug 19 • 16

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19 • 51

Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

Paper • 2408.06195 • Published Aug 12 • 55

ControlNeXt: Powerful and Efficient Control for Image and Video Generation

Paper • 2408.06070 • Published Aug 12 • 52

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

Paper • 2408.06292 • Published Aug 12 • 114

Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents

Paper • 2408.07060 • Published Aug 13 • 39

Imagen 3

Paper • 2408.07009 • Published Aug 13 • 60

LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs

Paper • 2408.07055 • Published Aug 13 • 65

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI

Paper • 2408.03361 • Published Aug 6 • 85

Transformer Explainer: Interactive Learning of Text-Generative Models

Paper • 2408.04619 • Published Aug 8 • 152

AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation

Paper • 2408.00764 • Published Aug 1 • 1

upvoted an article about 1 month ago

Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth

Article • Jul 29 • 193

upvoted a collection about 1 month ago

"Physics of Language Models" series

6 items • Updated 20 days ago • 28

upvoted a paper about 1 month ago

Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws

Paper • 2404.05405 • Published Apr 8 • 7

upvoted 14 papers about 2 months ago

FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention

Paper • 2407.19918 • Published Jul 29 • 47

Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification

Paper • 2407.19340 • Published Jul 27 • 55

Matting by Generation

Paper • 2407.21017 • Published Jul 30 • 22

Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

Paper • 2407.20798 • Published Jul 30 • 23

ThinK: Thinner Key Cache by Query-Driven Pruning

Paper • 2407.21018 • Published Jul 30 • 30

MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

Paper • 2407.21770 • Published Jul 31 • 20

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31 • 73

SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement

Paper • 2408.00653 • Published Aug 1 • 27

WebArena: A Realistic Web Environment for Building Autonomous Agents

Paper • 2307.13854 • Published Jul 25, 2023 • 23

OmniParser for Pure Vision Based GUI Agent

Paper • 2408.00203 • Published Aug 1 • 17

Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model

Paper • 2408.00754 • Published Aug 1 • 21

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1 • 103

Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

Paper • 2407.21787 • Published Jul 31 • 3

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31 • 102

upvoted a collection about 2 months ago

Gemma 2 2B Release

The 2.6B parameter version of Gemma 2. • 6 items • Updated Jul 31 • 76

upvoted 4 papers about 2 months ago

SHIC: Shape-Image Correspondences with no Keypoint Supervision

Paper • 2407.18907 • Published Jul 26 • 38

AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

Paper • 2407.18901 • Published Jul 26 • 31

Wolf: Captioning Everything with a World Summarization Framework

Paper • 2407.18908 • Published Jul 26 • 30

MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains

Paper • 2407.18961 • Published Jul 18 • 38

upvoted a collection about 2 months ago

Llama 3.1

This collection hosts the transformers and original repos of the Meta Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Aug 2 • 569

upvoted a paper about 2 months ago

Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation

Paper • 2406.10970 • Published Jun 16 • 1

upvoted 4 papers 2 months ago

Scaling Retrieval-Based Language Models with a Trillion-Token Datastore

Paper • 2407.12854 • Published Jul 9 • 29

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

Paper • 2407.13623 • Published Jul 18 • 52

SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks

Paper • 2305.17390 • Published May 27, 2023 • 2

Mixture of A Million Experts

Paper • 2407.04153 • Published Jul 4 • 4

upvoted a collection 2 months ago

🪐 SmolLM

A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated Aug 18 • 169