Collections

Collections including paper arxiv:2403.02330

Papers - Image - Captioning

RegionGPT: Towards Region Understanding Vision Language Model

Paper • 2403.02330 • Published Mar 4 • 2
Grounded Language-Image Pre-training

Paper • 2112.03857 • Published Dec 7, 2021 • 3
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 71

Papers - Image - Grounding

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Paper • 2404.07973 • Published Apr 11 • 30
RegionGPT: Towards Region Understanding Vision Language Model

Paper • 2403.02330 • Published Mar 4 • 2

Papers - Image - VQA

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Paper • 2404.07973 • Published Apr 11 • 30
RegionGPT: Towards Region Understanding Vision Language Model

Paper • 2403.02330 • Published Mar 4 • 2
TextSquare: Scaling up Text-Centric Visual Instruction Tuning

Paper • 2404.12803 • Published Apr 19 • 29
Pegasus-v1 Technical Report

Paper • 2404.14687 • Published Apr 23 • 30

Papers - Image - Clip - Coco Testing

Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion

Paper • 2310.03502 • Published Oct 5, 2023 • 77
Transferable and Principled Efficiency for Open-Vocabulary Segmentation

Paper • 2404.07448 • Published Apr 11 • 11
RegionGPT: Towards Region Understanding Vision Language Model

Paper • 2403.02330 • Published Mar 4 • 2
GLIGEN: Open-Set Grounded Text-to-Image Generation

Paper • 2301.07093 • Published Jan 17, 2023 • 3

Papers - University - Hong Kong University of Science and Te

Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss

Paper • 2404.02731 • Published Apr 3 • 1
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Paper • 2309.12284 • Published Sep 21, 2023 • 18
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Paper • 2404.03204 • Published Apr 4 • 7
Adapting LLaMA Decoder to Vision Transformer

Paper • 2404.06773 • Published Apr 10 • 17

Papers - Nvidia

LITA: Language Instructed Temporal-Localization Assistant

Paper • 2403.19046 • Published Mar 27 • 18
Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces

Paper • 2403.20275 • Published Mar 29 • 8
Condition-Aware Neural Network for Controlled Image Generation

Paper • 2404.01143 • Published Apr 1 • 11
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues

Paper • 2404.03820 • Published Apr 4 • 24

Papers - Image - Understanding

Veagle: Advancements in Multimodal Representation Learning

Paper • 2403.08773 • Published Jan 18 • 7
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

Paper • 2304.14178 • Published Apr 27, 2023 • 2
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs

Paper • 2403.12596 • Published Mar 19 • 9
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Paper • 2403.11703 • Published Mar 18 • 16

Papers - Image - Clip

Demystifying CLIP Data

Paper • 2309.16671 • Published Sep 28, 2023 • 20
Model Stock: All we need is just a few fine-tuned models

Paper • 2403.19522 • Published Mar 28 • 10
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models

Paper • 2404.01367 • Published Apr 1 • 20
On the Scalability of Diffusion-based Text-to-Image Generation

Paper • 2404.02883 • Published Apr 3 • 17

Papers - Image - Segmentation

Image Segmentation using U-Net Architecture for Powder X-ray Diffraction Images

Paper • 2310.16186 • Published Oct 24, 2023 • 2
H-DenseUNet: Hybrid Densely Connected UNet for Liver and Tumor Segmentation from CT Volumes

Paper • 1709.07330 • Published Sep 21, 2017 • 2
Deep LOGISMOS: Deep Learning Graph-based 3D Segmentation of Pancreatic Tumors on CT scans

Paper • 1801.08599 • Published Jan 25, 2018 • 2
RTSeg: Real-time Semantic Segmentation Comparative Study

Paper • 1803.02758 • Published Mar 7, 2018 • 2
