Mervyn1937's Collections
My Papers of Interest
Self-Alignment with Instruction Backtranslation
Paper • 2308.06259 • Published • 40
ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation
Paper • 2308.03793 • Published • 10
From Sparse to Soft Mixtures of Experts
Paper • 2308.00951 • Published • 20
Revisiting DETR Pre-training for Object Detection
Paper • 2308.01300 • Published • 9
Unified Model for Image, Video, Audio and Language Tasks
Paper • 2307.16184 • Published • 14
Scaling TransNormer to 175 Billion Parameters
Paper • 2307.14995 • Published • 21
NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection
Paper • 2307.14620 • Published • 13
Less is More: Focus Attention for Efficient DETR
Paper • 2307.12612 • Published • 6
Replacing softmax with ReLU in Vision Transformers
Paper • 2309.08586 • Published • 17
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Paper • 2309.06497 • Published • 4
Multimodal Foundation Models: From Specialists to General-Purpose Assistants
Paper • 2309.10020 • Published • 40
FoleyGen: Visually-Guided Audio Generation
Paper • 2309.10537 • Published • 8
LMDX: Language Model-based Document Information Extraction and Localization
Paper • 2309.10952 • Published • 65
RMT: Retentive Networks Meet Vision Transformers
Paper • 2309.11523 • Published • 33
Aligning Large Multimodal Models with Factually Augmented RLHF
Paper • 2309.14525 • Published • 29
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Paper • 2309.15098 • Published • 7
Vision Transformers Need Registers
Paper • 2309.16588 • Published • 77
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
Paper • 2309.16414 • Published • 19
Enable Language Models to Implicitly Learn Self-Improvement From Data
Paper • 2310.00898 • Published • 23
Lemur: Harmonizing Natural Language and Code for Language Agents
Paper • 2310.06830 • Published • 30
PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Paper • 2310.09199 • Published • 24
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Paper • 2310.09478 • Published • 19
Context-Aware Meta-Learning
Paper • 2310.10971 • Published • 16
An Early Evaluation of GPT-4V(ision)
Paper • 2310.16534 • Published • 21
Segment and Caption Anything
Paper • 2312.00869 • Published • 18
OneLLM: One Framework to Align All Modalities with Language
Paper • 2312.03700 • Published • 20