SHIC: Shape-Image Correspondences with no Keypoint Supervision Paper • 2407.18907 • Published Jul 26 • 38
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains Paper • 2407.18961 • Published Jul 18 • 38
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages Paper • 2407.19672 • Published Jul 29 • 53
Theia: Distilling Diverse Vision Foundation Models for Robot Learning Paper • 2407.20179 • Published Jul 29 • 45
Knesset-DictaBERT: A Hebrew Language Model for Parliamentary Proceedings Paper • 2407.20581 • Published Jul 30 • 23
Futga: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation Paper • 2407.20445 • Published Jul 29 • 20
A Large Encoder-Decoder Family of Foundation Models For Chemical Language Paper • 2407.20267 • Published Jul 24 • 31
Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification Paper • 2407.19340 • Published Jul 27 • 55
ImagiNet: A Multi-Content Dataset for Generalizable Synthetic Image Detection via Contrastive Learning Paper • 2407.20020 • Published Jul 29 • 19
Mixture of Nested Experts: Adaptive Processing of Visual Tokens Paper • 2407.19985 • Published Jul 29 • 33
LKCell: Efficient Cell Nuclei Instance Segmentation with Large Convolution Kernels Paper • 2407.18054 • Published Jul 25 • 10
BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation Paper • 2407.17952 • Published Jul 25 • 27
mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval Paper • 2407.19669 • Published Jul 29 • 17
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents Paper • 2407.18901 • Published Jul 26 • 31
LLM-FP4: 4-Bit Floating-Point Quantized Transformers Paper • 2310.16836 • Published Oct 25, 2023 • 13
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior Paper • 2310.16818 • Published Oct 25, 2023 • 30
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models Paper • 2310.13671 • Published Oct 20, 2023 • 18
Tuna: Instruction Tuning using Feedback from Large Language Models Paper • 2310.13385 • Published Oct 20, 2023 • 10
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models Paper • 2310.13127 • Published Oct 19, 2023 • 11
Contrastive Preference Learning: Learning from Human Feedback without RL Paper • 2310.13639 • Published Oct 20, 2023 • 24
ControlMat: A Controlled Generative Approach to Material Capture Paper • 2309.01700 • Published Sep 4, 2023 • 13
AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections Paper • 2309.02186 • Published Sep 5, 2023 • 21
YaRN: Efficient Context Window Extension of Large Language Models Paper • 2309.00071 • Published Aug 31, 2023 • 65
FACET: Fairness in Computer Vision Evaluation Benchmark Paper • 2309.00035 • Published Aug 31, 2023 • 16
CityDreamer: Compositional Generative Model of Unbounded 3D Cities Paper • 2309.00610 • Published Sep 1, 2023 • 18
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model Paper • 2402.03766 • Published Feb 6 • 12
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks Paper • 2402.04248 • Published Feb 6 • 28
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset Paper • 2402.05937 • Published Feb 8 • 11
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones Paper • 2312.16862 • Published Dec 28, 2023 • 30
Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models Paper • 2311.13141 • Published Nov 22, 2023 • 12
FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline Paper • 2311.13073 • Published Nov 22, 2023 • 56
Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying Paper • 2311.09578 • Published Nov 16, 2023 • 14
ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks Paper • 2311.09835 • Published Nov 16, 2023 • 9
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs Paper • 2311.09257 • Published Nov 14, 2023 • 45
E3 TTS: Easy End-to-End Diffusion-based Text to Speech Paper • 2311.00945 • Published Nov 2, 2023 • 14
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation Paper • 2311.01455 • Published Nov 2, 2023 • 28
FlashDecoding++: Faster Large Language Model Inference on GPUs Paper • 2311.01282 • Published Nov 2, 2023 • 35
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time Paper • 2310.17157 • Published Oct 26, 2023 • 11
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding Paper • 2310.15308 • Published Oct 23, 2023 • 22
Incremental FastPitch: Chunk-based High Quality Text to Speech Paper • 2401.01755 • Published Jan 3 • 8
Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions Paper • 2401.01827 • Published Jan 3 • 15