RAVE Models

This is a collection of RAVE models trained by the Intelligent Instruments Lab for various projects.

For a full description, see our blog post at https://iil.is/news/ravemodels. For more about RAVE, see the original paper from IRCAM.

Most of these models are encoder-decoder only, with no prior. All use the --causal mode and are exported for streaming inference with nn~, NN.ar or rave-supercollider.

In the checkpoints/ directory are some complete checkpoints which can be used with our fork of RAVE to speed up training by transfer learning.
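The file names in this collection appear to encode each model's settings: `b` for block size, `r` (or `sr` in one case) for sample rate, and `z` for the number of latent dimensions. The following is a minimal sketch for reading those settings from a file name, assuming that pattern holds. `ModelSpec` and `parse_model_filename` are hypothetical names used for illustration and are not part of RAVE or any library.

```python
import re
from typing import NamedTuple, Optional


class ModelSpec(NamedTuple):
    """Settings read from a model file name (hypothetical helper type)."""
    name: str
    block_size: int
    sample_rate: int
    latent_dims: int


# Pattern inferred from the file names in this README, for example
# "guitar_iil_b2048_r48000_z16.ts". "sr" is accepted as well as "r"
# because one file name uses it. This is an assumption, not a documented rule.
_PATTERN = re.compile(
    r"^(?P<name>.+)_b(?P<block>\d+)_s?r(?P<rate>\d+)_z(?P<latent>\d+)\.ts$"
)


def parse_model_filename(filename: str) -> Optional[ModelSpec]:
    """Return the settings encoded in a file name, or None if it does not match."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    return ModelSpec(
        name=match["name"],
        block_size=int(match["block"]),
        sample_rate=int(match["rate"]),
        latent_dims=int(match["latent"]),
    )


spec = parse_model_filename("guitar_iil_b2048_r48000_z16.ts")
print(spec)
```

File names that do not follow the pattern, such as `crozzoli_bigensemblesmusic_18d.ts`, return `None`, so the Model line in each entry remains the authoritative description.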

Citation:

@misc {intelligent_instruments_lab_2023,
    author       = { {Intelligent Instruments Lab} },
    title        = { rave-models (Revision ad15daf) },
    year         = 2023,
    url          = { https://huggingface.co/Intelligent-Instruments-Lab/rave-models },
    doi          = { 10.57967/hf/1235 },
    publisher    = { Hugging Face }
}

Musical Instruments

guitar_iil_b2048_r48000_z16.ts

Dataset: IILGuitarTimbre, a timbre-oriented collection of plucking, strumming, striking, scraping and more recorded dry from an electric guitar.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.
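As a rough guide to what the sample rate and block size mean for streaming, the sketch below computes how much audio one block covers. It assumes that the block size is the number of audio samples per latent frame. The latency you hear in practice also depends on the host's buffer settings, which are not modelled here. The function names are hypothetical.

```python
def block_duration_ms(block_size: int, sample_rate: int) -> float:
    """Duration of one block of audio, in milliseconds."""
    return 1000.0 * block_size / sample_rate


def latent_frame_rate(block_size: int, sample_rate: int) -> float:
    """Latent frames per second, assuming one latent frame per block."""
    return sample_rate / block_size


# The two sample rates used by models in this collection, with block size 2048.
for rate in (48000, 44100):
    duration = block_duration_ms(2048, rate)
    frames = latent_frame_rate(2048, rate)
    print(f"{rate} Hz: {duration:.2f} ms per block, {frames:.2f} latent frames/s")
```

Under that assumption, a block of 2048 samples lasts about 42.67 ms at 48kHz and about 46.44 ms at 44.1kHz.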

sax_soprano_franziskaschroeder_b2048_r48000_z20.ts

Dataset: Soprano sax improvisation by Franziska Schroeder.

Model: modified RAVE v1, 48kHz, block size 2048, 20 latent dimensions.

organ_archive_b2048_r48000_z16.ts

Dataset: various recordings of organ music sourced from archive.org. Small amounts of voice and other instruments were included, and vinyl record noises are prominent.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

organ_bach_b2048_sr48000_z16.ts

Dataset: various recordings of J.S. Bach music for church organ.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

mrp_strengjavera_b2048_r44100_z16.ts

Dataset: magnetic resonator piano controlled by artificial life, as part of the generative installation Strengjavera by Jack Armitage, premiered at AIMC 2023. See paper and Zenodo for citation.

Model: RAVE v3, 44.1kHz, block size 2048, 16 latent dimensions.

Voice

voice_vocalset_b2048_r48000_z16.ts

Dataset: VocalSet singing voice dataset.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

voice_hifitts_b2048_r48000_z16.ts

Dataset: Hi-Fi TTS audiobooks dataset.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

voice_jvs_b2048_r44100_z16.ts

Dataset: Hi-Fi TTS speaker 9017 (John Van Stan).

Model: RAVE v3, 44.1kHz, block size 2048, 16 latent dimensions.

voice_vctk_b2048_r44100_z22.ts

Dataset: CSTR VCTK Corpus multispeaker read speech dataset.

Model: RAVE v3, 44.1kHz, block size 2048, 22 latent dimensions.

voice_multivoice_b2048_r48000_z11.ts

Dataset: combination of speaking and singing voice datasets: CSTR VCTK Corpus, VocalSet, Children's Song Dataset, NUS-48E, attHACK.

Model: RAVE v3 with spectral discriminator, 48kHz, block size 2048, 11 latent dimensions.

Birds

birds_motherbird_b2048_r48000_z16.ts

This model of bird sounds was curated by Manuel Cherep, Jessica Shand and Jack Armitage for their piece Motherbird, performed at TENOR 2023 in Boston, May 2023.

Dataset: bird sounds.

Model: RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

birds_pluma_b2048_r48000_z12.ts

This model of bird sounds was curated by Giacomo Lepri for his instrument Pluma.

Dataset: bird sounds.

Model: modified RAVE v1, 48kHz, block size 2048, 12 latent dimensions.

Pond Brain Marine Sounds

These models of marine sounds were trained for Jenna Sutela's Pond Brain installations at Copenhagen Contemporary and the Helsinki Biennial.

Caution: these decoders sometimes produce a loud chirp on first initialization.

water_pondbrain_b2048_r48000_z16.ts

Dataset: water recordings from freesound.org.

List of freesound users:

inspectorj, inchadney, aesqe, vonfleisch, javetakami, atomediadesign, kolezan, zabuhailo, zaziesound, repdac3, al_sub, lgarrett, uzbazur, lydmakeren, frenkfurth, edo333, boredtoinsanity, owl, kaydinhamby, tliedes, ilmari_freesound, manoslindos, l3ardoc, alexbuk, s-light

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

humpbacks_pondbrain_b2048_r48000_z20.ts

Dataset: humpback whale recordings from the Watkins database, MBARI, and BBC.

Model: modified RAVE v1, 48kHz, block size 2048, 20 latent dimensions.

marinemammals_pondbrain_b2048_r48000_z20.ts

Dataset: various marine mammal sounds from NOAA, the Watkins database, freesound users felixblume and geraldfiebig, and sound effects databases.

Model: modified RAVE v1, 48kHz, block size 2048, 20 latent dimensions.

Thales

magnets_b2048_r48000_z8.ts

Dataset: One-hour recording of magnets of different dimensions hitting each other or scratching wooden and metallic surfaces. Used for Thales, a musical instrument based on magnets.

Model: RAVE v1, 48kHz, block size 2048, 8 latent dimensions.

Crozzoli's Music

crozzoli_bigensemblesmusic_18d.ts

Dataset: Six recordings of long contemporary compositions for electronic and acoustic big ensembles.

Model: RAVE v3, 48kHz, block size 2048, 18 latent dimensions.

Aulus-les-Bains Dawn Chorus @ CAMP

birds_dawnchorus_b2048_r48000_z8.ts

Dataset: ~230 minutes of dawn chorus recorded by Gregory White at Aulus-les-Bains as part of a residency at CAMPfr.com.

Model: RAVE v3, 48kHz, block size 2048, 8 latent dimensions.