---
datasets:
- ffurfaro/PixelBytes-Pokemon
language: en
library_name: pytorch
license: mit
pipeline_tag: text-to-image
tags:
- image-generation
- text-generation
- multimodal
---

# PixelBytes: Unified Multimodal Generation

Welcome to the **PixelBytes** repository! This project features models that generate text and images jointly, pixel by pixel, from a single unified embedding. (Note: these checkpoints are test weights only.)

## Overview

### Key Concepts

- **Image Transformer**: generates images pixel by pixel.
- **Bi-Mamba+**: a bidirectional state-space model for time-series prediction.
- **MambaByte**: a token-free selective state-space model operating directly on raw bytes.

The PixelBytes model generates mixed sequences of text and images, using line breaks to mark transitions between modalities and to keep image dimensions consistent (a minimal generation sketch appears at the end of this card).

## Dataset

We use the **PixelBytes-Pokemon** dataset, available on Hugging Face: [PixelBytes-Pokemon](https://huggingface.co/datasets/ffurfaro/PixelBytes-Pokemon). It contains paired text and image sequences of Pokémon used to train our models.

## Models Trained

- **10 LSTM models**: unidirectional and bidirectional, with 1, 2, or 3 layers (including a special configuration: p_embed with 3× hidden state and 3× embedding dimension)
- **3 Mamba models**: bidirectional with 1 or 2 layers; unidirectional with 2 layers
- **2 Transformer models**: 1 or 2 layers

## Citation

Furfaro, F. (2024). *PixelBytes: A Unified Multimodal Representation Learning Project.* https://github.com/fabienfrfr/PixelBytes

---

Thank you for exploring **PixelBytes**! We hope this model aids your multimodal generation projects.
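## Generation Sketch

The inference API is not documented in this card, so the snippet below is only a minimal sketch of the unified-embedding idea: one autoregressive model over a single vocabulary that mixes text bytes and pixel values. The class `TinyPixelBytesLM`, the vocabulary size, and the start token are illustrative assumptions, not the project's actual code; see the GitHub repository for the real implementation.

```python
import torch
import torch.nn as nn

# Unified vocabulary covering text bytes and pixel/palette values.
# The size (512) is an illustrative assumption, not the actual
# PixelBytes configuration.
VOCAB_SIZE = 512

class TinyPixelBytesLM(nn.Module):
    """Stand-in autoregressive model over the unified byte/pixel vocabulary.

    A real run would instead load one of the trained LSTM/Mamba/Transformer
    checkpoints from this repository.
    """

    def __init__(self, vocab_size: int = VOCAB_SIZE, dim: int = 64):
        super().__init__()
        # One embedding table for both modalities: the core PixelBytes idea.
        self.embed = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.lstm(self.embed(tokens))
        return self.head(hidden)  # (batch, seq_len, vocab_size)

model = TinyPixelBytesLM().eval()

# Greedy autoregressive generation: each step emits the next token, which may
# be a text byte, a pixel value, or a line-break token marking a transition.
seq = torch.zeros(1, 1, dtype=torch.long)  # illustrative start token
with torch.no_grad():
    for _ in range(64):
        logits = model(seq)
        next_token = logits[:, -1].argmax(dim=-1, keepdim=True)
        seq = torch.cat([seq, next_token], dim=1)

print(seq.shape)  # torch.Size([1, 65])
```

Because text and pixels share one embedding table and one output head, the same decoding loop produces both modalities; as described above, line-break tokens act as transition markers and help keep generated image rows a consistent width.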