---
datasets:
- ffurfaro/PixelBytes-Pokemon
language: en
library_name: pytorch
license: mit
pipeline_tag: text-to-image
tags:
- image-generation
- text-generation
- multimodal
---

# PixelBytes: Unified Multimodal Generation

Welcome to the **PixelBytes** repository! This project features models that generate text and images together, pixel by pixel, using a unified embedding. Note that the weights provided here are for testing only.

## Overview

### Key Concepts

- **Image Transformer**: Generates images pixel by pixel.
- **Bi-Mamba+**: A bidirectional model for time series prediction.
- **MambaByte**: A token-free selective state-space model that operates directly on bytes.

The PixelBytes model generates mixed sequences of text and images. It handles transitions between the two with line breaks and keeps image dimensions consistent.
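
To make the idea of a mixed sequence more concrete, here is a minimal sketch in plain Python. It is not the repository's implementation: the `("txt", ...)` / `("px", ...)` token tuples, the function names, and the use of palette indices for pixels are all assumptions made for illustration. It shows one plausible way to interleave a caption with an image flattened row by row, using line breaks as row separators, and to check that every recovered row has the same width.

```python
# Hypothetical sketch: token layout and names are illustrative assumptions,
# not the PixelBytes code.

def flatten_image(pixels):
    """Flatten a 2D grid of palette indices into tokens, ending each row with a line break."""
    width = len(pixels[0])
    tokens = []
    for row in pixels:
        if len(row) != width:
            raise ValueError("all image rows must have the same width")
        tokens.extend(("px", value) for value in row)
        tokens.append(("txt", "\n"))  # a line break closes each pixel row
    return tokens


def build_sequence(caption, pixels):
    """Concatenate caption characters, a line break, and the flattened image."""
    text_tokens = [("txt", ch) for ch in caption]
    return text_tokens + [("txt", "\n")] + flatten_image(pixels)


def image_rows(tokens):
    """Recover pixel rows from a token sequence by splitting on line breaks."""
    rows, current = [], []
    for kind, value in tokens:
        if kind == "px":
            current.append(value)
        elif value == "\n" and current:
            rows.append(current)
            current = []
    if current:  # a trailing row that was not closed by a line break
        rows.append(current)
    return rows


def has_consistent_width(tokens):
    """Return True when every recovered pixel row has the same length."""
    return len({len(row) for row in image_rows(tokens)}) <= 1


sequence = build_sequence("Pikachu", [[1, 2, 3], [4, 5, 6]])
print(image_rows(sequence))
print(has_consistent_width(sequence))
```

A check like `has_consistent_width` could be applied to generated sequences to detect images whose rows drift in length, which is one way to read "image dimension consistency" in the description above.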

## Dataset

We use the **PixelBytes-Pokemon** dataset, available on Hugging Face: [PixelBytes-Pokemon](https://huggingface.co/datasets/ffurfaro/PixelBytes-Pokemon). It contains text and image sequences of Pokémon, which we use to train our models.

## Models Trained

- **10 LSTM Models**: Unidirectional and bidirectional, with 1, 2, or 3 layers (including a special configuration: p_embed + 3×hidden_state + 3×embedding_dim)
- **3 Mamba Models**: Bidirectional with 1 or 2 layers; unidirectional with 2 layers
- **2 Transformer Models**: 1 or 2 layers
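
As a rough illustration of how the direction and layer-count variants in the list above might be expressed in PyTorch, the sketch below defines a small next-token predictor over a unified vocabulary. The class name `UnifiedLSTM`, the hyperparameter values, and the choice to predict from the final hidden states are assumptions for illustration only. They do not describe the released checkpoints or the special configuration mentioned above.

```python
# Hypothetical sketch: class name and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class UnifiedLSTM(nn.Module):
    """Predict the next token of a mixed text/pixel sequence from a shared embedding."""

    def __init__(self, vocab_size, embedding_dim, hidden_size, num_layers, bidirectional):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(
            embedding_dim,
            hidden_size,
            num_layers=num_layers,
            bidirectional=bidirectional,
            batch_first=True,
        )
        directions = 2 if bidirectional else 1
        self.head = nn.Linear(hidden_size * directions, vocab_size)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer ids from the unified vocabulary
        embedded = self.embed(token_ids)
        _, (h_n, _) = self.lstm(embedded)
        if self.lstm.bidirectional:
            # Last layer's forward and backward final states, concatenated
            summary = torch.cat([h_n[-2], h_n[-1]], dim=-1)
        else:
            summary = h_n[-1]
        return self.head(summary)  # (batch, vocab_size) logits for the next token


# Example: a 2-layer bidirectional variant on a random batch of token ids
model = UnifiedLSTM(vocab_size=64, embedding_dim=16, hidden_size=32,
                    num_layers=2, bidirectional=True)
logits = model(torch.randint(0, 64, (4, 10)))
print(logits.shape)
```

Switching `bidirectional` and `num_layers` covers the unidirectional and bidirectional 1, 2, and 3 layer combinations under these assumptions.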

## Citation

Furfaro, F. (2024). PixelBytes: A Unified Multimodal Representation Learning Project.

---

Thank you for exploring **PixelBytes**! We hope this model aids your multimodal generation projects.