ffurfaro committed on
Commit
3fa2b36
1 Parent(s): 67a4fa5

Update README.md

  1. README.md +34 -3
README.md CHANGED
@@ -1,3 +1,34 @@
- ---
- license: mit
- ---
+ # PixelBytes: Unified Multimodal Generation
+
+ Welcome to the **PixelBytes** repository! This project features models designed to generate text and images simultaneously, pixel by pixel, using a unified embedding.
+
+ ## Overview
+
+ ### Key Concepts
+ - **Image Transformer**: Generates images pixel by pixel.
+ - **Bi-Mamba+**: A bidirectional model for time series prediction.
+ - **MambaByte**: A token-free selective state-space model that operates directly on bytes.
+
+ The PixelBytes model generates mixed sequences of text and images. It handles transitions with line breaks and keeps image dimensions consistent.
+
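As a rough illustration of the mixed-sequence idea above, the sketch below flattens a caption and a small image into one sequence and checks that every pixel row has the same width. It is not the repository's code: the names `flatten_mixed`, `image_width_is_consistent`, and `NEWLINE`, the use of palette indices for pixels, and the choice to end every image row with a line break are all assumptions made for this example.

```python
from typing import List, Union

# Assumption: an element is either one text character or one pixel,
# where a pixel is represented by an integer palette index.
Element = Union[str, int]

# Assumption: a line break separates text from image and ends each image row.
NEWLINE = "\n"


def flatten_mixed(text: str, image: List[List[int]]) -> List[Element]:
    """Flatten a caption and a 2D image into a single mixed sequence."""
    if any(len(row) != len(image[0]) for row in image):
        raise ValueError("all image rows must have the same width")
    seq: List[Element] = list(text)
    seq.append(NEWLINE)  # transition from text to image
    for row in image:
        seq.extend(row)
        seq.append(NEWLINE)  # end of one image row
    return seq


def image_width_is_consistent(seq: List[Element]) -> bool:
    """Return True if every run of pixels between line breaks has the same length."""
    widths = []
    run = 0
    for el in seq:
        if isinstance(el, int):
            run += 1
        elif el == NEWLINE and run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)  # trailing row without a closing line break
    return len(set(widths)) <= 1
```

A generated sequence can be passed to `image_width_is_consistent` to detect rows of unequal length before reshaping the pixels back into a 2D image.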
+ ## Dataset
+
+ We use the **PixelBytes-Pokemon** dataset, available on Hugging Face: [PixelBytes-Pokemon](https://huggingface.co/datasets/ffurfaro/PixelBytes-Pokemon). It contains text and image sequences of Pokémon for training our model.
+
+ ## Models Trained
+
+ - **8 LSTM Models**: Bidirectional + 1, 2, 3 layers (including p_embed + bi-2 layers)
+ - **6 Mamba Models**: Bidirectional + 1, 2, 3 layers
+ - **3 Transformer Models**: 1, 2, 3 layers
+
+ ## Pre-test
+
+ Before training the LSTMs, we will test the pembed-bi-2 LSTM for generation. At each step the model generates the next central element, so the 2D structure is reconstructed one element at a time.
+
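One way to picture generating the next central element of a 2D structure is a raster-order loop in which each position is predicted from the neighbours that were already generated. The sketch below is a toy under stated assumptions, not the pembed-bi-2 LSTM itself: the four-neighbour context (up-left, up, up-right, left), the padding id `PAD`, and the `predict` callable standing in for a trained model are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical padding id for positions outside the grid;
# real element ids are assumed to start at 1.
PAD = 0

# Offsets (row, col) of the neighbours that precede the centre in raster order:
# up-left, up, up-right, left.
CAUSAL_OFFSETS: Tuple[Tuple[int, int], ...] = ((-1, -1), (-1, 0), (-1, 1), (0, -1))


def causal_context(grid: Sequence[Sequence[int]], r: int, c: int) -> List[int]:
    """Collect the already-generated neighbours of position (r, c)."""
    ctx = []
    for dr, dc in CAUSAL_OFFSETS:
        rr, cc = r + dr, c + dc
        inside = 0 <= rr < len(grid) and 0 <= cc < len(grid[rr])
        ctx.append(grid[rr][cc] if inside else PAD)
    return ctx


def reconstruct(height: int, width: int,
                predict: Callable[[List[int]], int]) -> List[List[int]]:
    """Fill a height x width grid in raster order, one central element at a time."""
    grid = [[PAD] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # `predict` stands in for a trained model's next-element prediction.
            grid[r][c] = predict(causal_context(grid, r, c))
    return grid
```

For example, `reconstruct(2, 3, lambda ctx: ctx[3] + 1)` uses a toy rule (left neighbour plus one) in place of a model and fills each row with increasing values.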
+ ---
+
+ Thank you for exploring **PixelBytes**! We hope this model aids your multimodal generation projects.
+
+ ---
+ license: mit
+ ---