PlanetMoon (Xingye)

upvoted a paper 3 months ago

Language Model Can Listen While Speaking

Paper • 2408.02622 • Published Aug 5 • 37

upvoted a paper 4 months ago

Stable Audio Open

Paper • 2407.14358 • Published Jul 19 • 23

upvoted a collection 4 months ago

BigVGAN

Collection

BigVGAN is a universal neural vocoder that generates audio waveform using mel spectrogram as input. • 11 items • Updated Oct 1 • 10

upvoted a paper 4 months ago

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

Paper • 2407.04051 • Published Jul 4 • 35

upvoted an article 6 months ago

Article

Let's talk about LLM evaluation

May 23 • 133

upvoted a collection 6 months ago

Standard-format-preference-dataset

Collection

We collect the open-source datasets and process them into the standard format. • 14 items • Updated May 8 • 21

upvoted a paper 7 months ago

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23 • 29

upvoted an article 7 months ago

Article

Introducing the Open Chain of Thought Leaderboard

Apr 23 • 25

upvoted 2 papers 8 months ago

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5 • 34

Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition

Paper • 2402.15504 • Published Feb 23 • 21

upvoted a collection 11 months ago

Seamless Communication

Collection

A significant step towards removing language barriers through expressive, fast and high-quality AI translation. • 16 items • Updated Jan 16 • 150

upvoted 9 papers about 1 year ago

Vision Transformers Need Registers

Paper • 2309.16588 • Published Sep 28, 2023 • 77

Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition

Paper • 2309.15223 • Published Sep 26, 2023 • 19

Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack

Paper • 2309.15807 • Published Sep 27, 2023 • 32

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models

Paper • 2309.15103 • Published Sep 26, 2023 • 42

A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

Paper • 2309.11674 • Published Sep 20, 2023 • 31

LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent

Paper • 2309.12311 • Published Sep 21, 2023 • 17

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

Paper • 2309.11998 • Published Sep 21, 2023 • 24

Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

Paper • 2309.10150 • Published Sep 18, 2023 • 24

Multimodal Foundation Models: From Specialists to General-Purpose Assistants

Paper • 2309.10020 • Published Sep 18, 2023 • 40

