-
CameraCtrl: Enabling Camera Control for Text-to-Video Generation
Paper • 2404.02101 • Published • 22 -
Adapting LLaMA Decoder to Vision Transformer
Paper • 2404.06773 • Published • 17 -
Interactive3D: Create What You Want by Interactive 3D Generation
Paper • 2404.16510 • Published • 18 -
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
Paper • 2406.07394 • Published • 22
Collections
Discover the best community collections!
Collections including paper arxiv:2404.06773
-
Condition-Aware Neural Network for Controlled Image Generation
Paper • 2404.01143 • Published • 11 -
FlexiDreamer: Single Image-to-3D Generation with FlexiCubes
Paper • 2404.00987 • Published • 21 -
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 44 -
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
Paper • 2404.02893 • Published • 20
-
Lexicon-Level Contrastive Visual-Grounding Improves Language Modeling
Paper • 2403.14551 • Published • 2 -
Adapting LLaMA Decoder to Vision Transformer
Paper • 2404.06773 • Published • 17 -
Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective
Paper • 2404.07200 • Published • 1 -
An inclusive review on deep learning techniques and their scope in handwriting recognition
Paper • 2404.08011 • Published • 1
-
RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization
Paper • 2403.00483 • Published • 12 -
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
Paper • 2403.01779 • Published • 27 -
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
Paper • 2401.11605 • Published • 21 -
FiT: Flexible Vision Transformer for Diffusion Model
Paper • 2402.12376 • Published • 48
-
World Model on Million-Length Video And Language With RingAttention
Paper • 2402.08268 • Published • 36 -
Improving Text Embeddings with Large Language Models
Paper • 2401.00368 • Published • 79 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 99 -
FiT: Flexible Vision Transformer for Diffusion Model
Paper • 2402.12376 • Published • 48