Krinal Joshi

krinal

kjdeveloper8

AI & ML interests

NLP, Speech, Music

Organizations

None yet

krinal's activity

upvoted a collection 3 months ago

Llama 3.1

This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Sep 25 • 609

upvoted a paper 6 months ago

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23 • 29

upvoted an article 7 months ago

Welcome Llama 3 - Meta's new open LLM

Article • Apr 18 • 275

upvoted a paper 9 months ago

Proactive Detection of Voice Cloning with Localized Watermarking

Paper • 2401.17264 • Published Jan 30 • 16

upvoted 3 papers 10 months ago

Masked Audio Generation using a Single Non-Autoregressive Transformer

Paper • 2401.04577 • Published Jan 9 • 41

Pheme: Efficient and Conversational Speech Generation

Paper • 2401.02839 • Published Jan 5 • 16

CoMoSVC: Consistency Model-based Singing Voice Conversion

Paper • 2401.01792 • Published Jan 3 • 8

upvoted 11 papers 11 months ago

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Paper • 2312.11514 • Published Dec 12, 2023 • 258

Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Paper • 2312.09911 • Published Dec 15, 2023 • 53

StemGen: A music generation model that listens

Paper • 2312.08723 • Published Dec 14, 2023 • 47

Order Matters in the Presence of Dataset Imbalance for Multilingual Learning

Paper • 2312.06134 • Published Dec 11, 2023 • 2

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Paper • 2311.04257 • Published Nov 7, 2023 • 20

Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis

Paper • 2312.03491 • Published Dec 6, 2023 • 34

Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models

Paper • 2312.03632 • Published Dec 6, 2023 • 4

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 138

Segment and Caption Anything

Paper • 2312.00869 • Published Dec 1, 2023 • 18

Merlin: Empowering Multimodal LLMs with Foresight Minds

Paper • 2312.00589 • Published Nov 30, 2023 • 24

HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis

Paper • 2311.12454 • Published Nov 21, 2023 • 29

upvoted 2 papers 12 months ago

UniAudio: An Audio Foundation Model Toward Universal Audio Generation

Paper • 2310.00704 • Published Oct 1, 2023 • 19

Music ControlNet: Multiple Time-varying Controls for Music Generation

Paper • 2311.07069 • Published Nov 13, 2023 • 43