taesiri PRO

https://taesiri.ai/

AI & ML interests

AGI

Recent Activity

updated a model about 1 hour ago

taesiri/BugsBunny-LLama-3.2-11B-Vision-Instruct-VGGHeads_Small2

upvoted a paper about 5 hours ago

taesiri's activity

upvoted 2 papers about 5 hours ago

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation

Paper • 2411.13281 • Published 1 day ago • 15

SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration

Paper • 2411.10958 • Published 5 days ago • 32

upvoted 2 papers 1 day ago

SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning

Paper • 2411.10161 • Published 6 days ago • 6

RedPajama: an Open Dataset for Training Large Language Models

Paper • 2411.12372 • Published 2 days ago • 36

upvoted 3 papers 2 days ago

Generative World Explorer

Paper • 2411.11844 • Published 3 days ago • 55

AnimateAnything: Consistent and Controllable Animation for Video Generation

Paper • 2411.10836 • Published 5 days ago • 17

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

Paper • 2411.10640 • Published 6 days ago • 37

upvoted 2 papers 3 days ago

Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement

Paper • 2411.06558 • Published 11 days ago • 29

The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use

Paper • 2411.10323 • Published 6 days ago • 26

upvoted 3 papers 4 days ago

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published 6 days ago • 87

GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation

Paper • 2411.08033 • Published 9 days ago • 21

Thinking LLMs: General Instruction Following with Thought Generation

Paper • 2410.10630 • Published Oct 14 • 16

upvoted 2 papers 6 days ago

LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models

Paper • 2411.09595 • Published 7 days ago • 65

MagicQuill: An Intelligent Interactive Image Editing System

Paper • 2411.09703 • Published 7 days ago • 50

upvoted 2 papers 7 days ago

Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination

Paper • 2411.03823 • Published 15 days ago • 43

Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Paper • 2411.03562 • Published 16 days ago • 60

upvoted 2 papers 8 days ago

Stronger Models are NOT Stronger Teachers for Instruction Tuning

Paper • 2411.07133 • Published 10 days ago • 28

JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

Paper • 2411.07975 • Published 9 days ago • 24

upvoted 2 papers 9 days ago

VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads

Paper • 2407.18245 • Published Jul 25 • 8

Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models

Paper • 2411.07126 • Published 10 days ago • 28