Xin Li PRO

lixin4ever

https://lixin4ever.github.io/

AI & ML interests

Natural Language Processing, Machine Learning

lixin4ever's activity

upvoted a paper about 15 hours ago

TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models

Paper • 2410.23266 • Published 6 days ago • 19

upvoted a paper 12 days ago

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

Paper • 2410.17243 • Published 14 days ago • 86

upvoted 2 papers 19 days ago

Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective

Paper • 2410.12490 • Published 21 days ago • 8

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published 19 days ago • 86

upvoted a paper 20 days ago

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Paper • 2410.12787 • Published 20 days ago • 30

upvoted a paper 28 days ago

Differential Transformer

Paper • 2410.05258 • Published 29 days ago • 165

upvoted a paper about 1 month ago

Video Instruction Tuning With Synthetic Data

Paper • 2410.02713 • Published Oct 3 • 36

upvoted 3 papers about 2 months ago

upvoted a paper 3 months ago

SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

Paper • 2407.19672 • Published Jul 29 • 54

upvoted 4 papers 5 months ago

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

Paper • 2406.05132 • Published Jun 7 • 27

Depth Anything V2

Paper • 2406.09414 • Published Jun 13 • 92

What If We Recaption Billions of Web Images with LLaMA-3?

Paper • 2406.08478 • Published Jun 12 • 39

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Paper • 2406.07476 • Published Jun 11 • 32

upvoted a paper 9 months ago

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

Paper • 2402.03161 • Published Feb 5 • 14

upvoted a paper 10 months ago

A Vision Check-up for Language Models

Paper • 2401.01862 • Published Jan 3 • 9

upvoted 3 papers 11 months ago

VideoPoet: A Large Language Model for Zero-Shot Video Generation

Paper • 2312.14125 • Published Dec 21, 2023 • 44

Reasons to Reject? Aligning Language Models with Judgments

Paper • 2312.14591 • Published Dec 22, 2023 • 17

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 138