A Large Encoder-Decoder Family of Foundation Models For Chemical Language Paper • 2407.20267 • Published Jul 24 • 31
TAPTRv2: Attention-based Position Update Improves Tracking Any Point Paper • 2407.16291 • Published Jul 23 • 10
ATHAR: A High-Quality and Diverse Dataset for Classical Arabic to English Translation Paper • 2407.19835 • Published Jul 29 • 20
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models Paper • 2407.19474 • Published Jul 28 • 22
Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification Paper • 2407.19340 • Published Jul 27 • 56
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains Paper • 2407.18961 • Published Jul 18 • 38
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain Paper • 2407.19584 • Published Jul 28 • 61
YaRN: Efficient Context Window Extension of Large Language Models Paper • 2309.00071 • Published Aug 31, 2023 • 65
Contrastive Feature Masking Open-Vocabulary Vision Transformer Paper • 2309.00775 • Published Sep 2, 2023 • 8
AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections Paper • 2309.02186 • Published Sep 5, 2023 • 21
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models Paper • 2404.04478 • Published Apr 6 • 12
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding Paper • 2404.05726 • Published Apr 8 • 20
PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations Paper • 2404.04421 • Published Apr 5 • 16