Collections
Discover the best community collections!
Collections including paper arxiv:2403.02330
-
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper • 2404.07973 • Published • 30 -
RegionGPT: Towards Region Understanding Vision Language Model
Paper • 2403.02330 • Published • 2 -
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Paper • 2404.12803 • Published • 29 -
Pegasus-v1 Technical Report
Paper • 2404.14687 • Published • 30
-
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Paper • 2310.03502 • Published • 77 -
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Paper • 2404.07448 • Published • 11 -
RegionGPT: Towards Region Understanding Vision Language Model
Paper • 2403.02330 • Published • 2 -
GLIGEN: Open-Set Grounded Text-to-Image Generation
Paper • 2301.07093 • Published • 3
-
Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss
Paper • 2404.02731 • Published • 1 -
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Paper • 2309.12284 • Published • 18 -
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper • 2404.03204 • Published • 7 -
Adapting LLaMA Decoder to Vision Transformer
Paper • 2404.06773 • Published • 17
-
LITA: Language Instructed Temporal-Localization Assistant
Paper • 2403.19046 • Published • 18 -
Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces
Paper • 2403.20275 • Published • 8 -
Condition-Aware Neural Network for Controlled Image Generation
Paper • 2404.01143 • Published • 11 -
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
Paper • 2404.03820 • Published • 24
-
Veagle: Advancements in Multimodal Representation Learning
Paper • 2403.08773 • Published • 7 -
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Paper • 2304.14178 • Published • 2 -
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs
Paper • 2403.12596 • Published • 9 -
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Paper • 2403.11703 • Published • 16
-
Demystifying CLIP Data
Paper • 2309.16671 • Published • 20 -
Model Stock: All we need is just a few fine-tuned models
Paper • 2403.19522 • Published • 10 -
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
Paper • 2404.01367 • Published • 20 -
On the Scalability of Diffusion-based Text-to-Image Generation
Paper • 2404.02883 • Published • 17
-
Image Segmentation using U-Net Architecture for Powder X-ray Diffraction Images
Paper • 2310.16186 • Published • 2 -
H-DenseUNet: Hybrid Densely Connected UNet for Liver and Tumor Segmentation from CT Volumes
Paper • 1709.07330 • Published • 2 -
Deep LOGISMOS: Deep Learning Graph-based 3D Segmentation of Pancreatic Tumors on CT scans
Paper • 1801.08599 • Published • 2 -
RTSeg: Real-time Semantic Segmentation Comparative Study
Paper • 1803.02758 • Published • 2