ffurfaro/PixelBytes-Pokemon

image-generation

text-generation


ffurfaro committed on Aug 26

Commit 3fa2b36 (1 parent: 67a4fa5)

Update README.md

Files changed (1)

README.md +34 -3


@@ -1,3 +1,34 @@
----
-license: mit
----

+# PixelBytes: Unified Multimodal Generation
+Welcome to the **PixelBytes** repository! This project features models designed to generate text and images simultaneously, pixel by pixel, using a unified embedding.
+## Overview
+### Key Concepts
+- **Image Transformer**: Generates images pixel by pixel.
+- **Bi-Mamba+**: A bidirectional model for time series prediction.
+- **MambaByte**: A selective state-space model without tokens.
+The PixelBytes model generates mixed sequences of text and images, handling transitions with line breaks and maintaining image dimension consistency.
+## Dataset
+We use the **PixelBytes-Pokemon** dataset, available on Hugging Face: [PixelBytes-Pokemon](https://huggingface.co/datasets/ffurfaro/PixelBytes-Pokemon). It contains text and image sequences of Pokémon for training our model.
+## Models Trained
+- **8 LSTM Models**: Bidirectional + 1, 2, 3 layers (including p_embed + bi-2 layers)
+- **6 Mamba Models**: Bidirectional + 1, 2, 3 layers
+- **3 Transformer Models**: 1, 2, 3 layers
+## Pre-test
+Before training the LSTMs, we will test the pembed-bi-2 LSTM for generation. The model generates the next central element, reconstructing a 2D structure.
+---
+Thank you for exploring **PixelBytes**! We hope this model aids your multimodal generation projects.
+---
+license: mit
+---
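
The README added in this commit says the model produces mixed text and image sequences, uses line breaks at transitions, and keeps image dimensions consistent. The sketch below is one possible reading of that idea, not the repository's actual encoding. The token layout (`TEXT_VOCAB`, `PIXEL_OFFSET`), the use of the newline byte as a row terminator, and the helper names `encode_text`, `encode_image`, and `decode` are all assumptions made for illustration.

```python
from typing import List, Tuple

# All constants below are hypothetical; the README does not specify a vocabulary.
TEXT_VOCAB = 256            # assumption: one token per UTF-8 byte
PIXEL_OFFSET = TEXT_VOCAB   # assumption: palette index p maps to token 256 + p
NEWLINE = ord("\n")         # assumption: a line break closes each image row


def encode_text(text: str) -> List[int]:
    """Turn text into byte-level tokens."""
    return list(text.encode("utf-8"))


def encode_image(rows: List[List[int]]) -> List[int]:
    """Flatten a 2D grid of palette indices, ending every row with a line break."""
    width = len(rows[0])
    tokens: List[int] = []
    for row in rows:
        if len(row) != width:
            # Enforce the dimension consistency the README mentions.
            raise ValueError("all image rows must have the same width")
        tokens.extend(PIXEL_OFFSET + p for p in row)
        tokens.append(NEWLINE)
    return tokens


def decode(tokens: List[int]) -> Tuple[str, List[List[int]]]:
    """Split a mixed sequence back into its text and its image rows."""
    text_bytes = bytearray()
    rows: List[List[int]] = []
    current: List[int] = []
    for t in tokens:
        if t >= PIXEL_OFFSET:
            current.append(t - PIXEL_OFFSET)
        elif t == NEWLINE and current:
            # A line break after pixels ends the current image row.
            rows.append(current)
            current = []
        else:
            text_bytes.append(t)
    if current:
        rows.append(current)
    return text_bytes.decode("utf-8"), rows


sequence = encode_text("Pikachu\n") + encode_image([[1, 2, 3], [4, 5, 6]])
caption, image = decode(sequence)
```

Under these assumptions, a caption followed by a 2x3 image becomes a single flat token list, and the row terminators are enough to recover the 2D structure on decoding.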