metadata
datasets:
  - ffurfaro/PixelBytes-Pokemon
language: en
library_name: pytorch
license: mit
pipeline_tag: text-to-image
tags:
  - image-generation
  - text-generation
  - multimodal

PixelBytes: Unified Multimodal Generation

Welcome to the PixelBytes repository! This project provides models that generate text and images together as a single sequence, with images produced pixel by pixel, using a unified embedding.

Overview

Key Concepts

  • Image Transformer: Generates images pixel by pixel.
  • Bi-Mamba+: A bidirectional model for time series prediction.
  • MambaByte: A token-free selective state-space model that operates directly on bytes.

The PixelBytes model generates mixed sequences of text and images. It uses line breaks to handle transitions and to keep image dimensions consistent.
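
One way to picture such a mixed sequence is the sketch below. It is an illustration under assumptions, not the project's actual tokenizer: the names `NEWLINE`, `PIXEL_OFFSET`, `encode_text`, `encode_image`, and `check_image_rows` are hypothetical, text is mapped to raw UTF-8 bytes, pixels are palette indices shifted past the byte range, and each image row ends with a line break token so the row width can be checked.

```python
# Hypothetical unified vocabulary (an assumption, not the project's real one):
# ids 0-255 are text bytes, ids from PIXEL_OFFSET upward are palette indices.
NEWLINE = ord("\n")   # line break token, shared by text and image rows
PIXEL_OFFSET = 256    # assumed start of the pixel id range


def encode_text(text):
    """Map text to a list of UTF-8 byte ids."""
    return list(text.encode("utf-8"))


def encode_image(rows):
    """Flatten a 2D grid of palette indices row by row, ending each row with NEWLINE."""
    tokens = []
    for row in rows:
        tokens.extend(PIXEL_OFFSET + p for p in row)
        tokens.append(NEWLINE)
    return tokens


def check_image_rows(tokens):
    """Return True if every pixel row in the token list has the same width."""
    widths = []
    current = 0
    for t in tokens:
        if t >= PIXEL_OFFSET:
            current += 1
        elif t == NEWLINE and current > 0:
            widths.append(current)
            current = 0
    # A trailing row without a closing NEWLINE still counts as a row
    if current > 0:
        widths.append(current)
    return len(set(widths)) <= 1


caption = encode_text("Pikachu\n")
image = encode_image([[0, 1, 1], [2, 2, 0]])
sequence = caption + image  # text followed by a 2x3 image in one token list
```

Because text and pixels share one id space, a single embedding table can cover both, and the line break token marks the end of a caption as well as the end of each pixel row.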

Dataset

We use the PixelBytes-Pokemon dataset, available on Hugging Face as ffurfaro/PixelBytes-Pokemon. It contains text and image sequences of Pokémon for training our models.
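
A minimal sketch of fetching the dataset with the Hugging Face `datasets` library follows. The helper name `load_pixelbytes` is hypothetical, and the split and column names are not documented here, so inspect the returned object before relying on any of them.

```python
DATASET_ID = "ffurfaro/PixelBytes-Pokemon"  # repository id from this card's metadata


def load_pixelbytes():
    """Download the dataset from the Hugging Face Hub (requires network access)."""
    # Imported lazily so this module loads even if `datasets` is not installed
    from datasets import load_dataset

    return load_dataset(DATASET_ID)
```

Calling `load_pixelbytes()` and printing the result lists the available splits and their columns.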

Models Trained

  • 8 LSTM Models: Bidirectional + 1, 2, 3 layers (including p_embed + bi-2 layers)
  • 6 Mamba Models: Bidirectional + 1, 2, 3 layers
  • 3 Transformer Models: 1, 2, 3 layers
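
To make the "bidirectional + 1, 2, 3 layers" description concrete, here is a hedged PyTorch sketch of what one LSTM variant might look like. It is not the author's architecture: the class name `BiLSTMSketch`, the vocabulary size of 512, and the embedding and hidden sizes are all assumptions chosen for illustration.

```python
import torch
from torch import nn


class BiLSTMSketch(nn.Module):
    """Embedding -> bidirectional LSTM -> per-position logits over the vocabulary."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=64, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            batch_first=True, bidirectional=True)
        # Forward and backward states are concatenated, hence 2 * hidden_dim
        self.head = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, token_ids):
        x = self.embed(token_ids)   # (batch, seq, embed_dim)
        out, _ = self.lstm(x)       # (batch, seq, 2 * hidden_dim)
        return self.head(out)       # (batch, seq, vocab_size)


# One model per depth listed above; sizes are illustrative assumptions
models = {n: BiLSTMSketch(vocab_size=512, num_layers=n) for n in (1, 2, 3)}
logits = models[2](torch.randint(0, 512, (4, 16)))
```

The only thing that changes between the three depths is `num_layers`. The Mamba and Transformer variants would replace the `nn.LSTM` block while keeping the shared embedding and output head.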

Thank you for exploring PixelBytes! We hope these models aid your multimodal generation projects.