Architext GPT-J 162M

Model Description

Architext GPT-J-162M is a transformer model trained using Ben Wang's Mesh Transformer JAX on the Pile and finetuned specifically on a synthetically generated dataset of architectural layouts of apartments. It is capable of generating a large diversity of designs, in a convenient geometric representation that can be used downstream in different design workflows, using just a natural language prompt.

The model consists of 12 layers with a model dimension of 768, and a feedforward dimension of 2048. The model dimension is split into 16 heads, each with a dimension of 256. Rotary Position Embedding (RoPE) is applied to 64 dimensions of each head. The model is trained with a tokenization vocabulary of 50257, using the same set of BPEs as GPT-2/GPT-3.
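A minimal sketch of reading these hyperparameters from a loaded configuration object. It assumes the checkpoint exposes the attribute names used by the `GPTJConfig` class in `transformers` (`n_layer`, `n_embd`, `n_head`, `rotary_dim`, `vocab_size`); `describe_architecture` is a hypothetical helper, not part of any library.

```python
def describe_architecture(config):
    """Summarize the main architecture hyperparameters of a GPT-J style config.

    `config` is any object with GPTJConfig-style attributes (an assumption
    about this checkpoint, not something the model card states).
    """
    return {
        "layers": config.n_layer,
        "model_dim": config.n_embd,
        "heads": config.n_head,
        # Per-head width is the model dimension divided evenly across heads.
        "head_dim": config.n_embd // config.n_head,
        "rotary_dim": config.rotary_dim,
        "vocab_size": config.vocab_size,
    }

# Usage (downloads the configuration from the Hugging Face Hub):
#   from transformers import AutoConfig
#   config = AutoConfig.from_pretrained("architext/gptj-162M")
#   print(describe_architecture(config))
```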

Training data

GPT-J 162M was pre-trained on the Pile, a large-scale curated dataset created by EleutherAI. It was then finetuned on synthetic data generated procedurally with the Rhinoceros/Grasshopper software suite. The model was finetuned for 1.25 billion tokens over 11,500 steps on TPU v3-8. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.
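The objective described above can be illustrated with a small, framework-free sketch. It is not the training code (which used Mesh Transformer JAX); `next_token_cross_entropy` is a hypothetical helper that computes the mean negative log-likelihood of the correct next token from raw scores. The last lines derive the average number of tokens per step, assuming the stated totals are exact.

```python
import math

def next_token_cross_entropy(logits, targets):
    """Mean cross-entropy between predicted next-token scores and true tokens.

    logits:  one list of unnormalized vocabulary scores per position
    targets: the id of the actual next token at each position
    """
    total = 0.0
    for scores, target in zip(logits, targets):
        # Numerically stable log-sum-exp over the vocabulary.
        peak = max(scores)
        log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
        # Negative log-probability assigned to the correct token.
        total += log_z - scores[target]
    return total / len(targets)

# Average tokens consumed per optimization step, from the figures above.
finetune_tokens = 1.25e9
finetune_steps = 11_500
tokens_per_step = finetune_tokens / finetune_steps  # roughly 108,700
```

Minimizing this quantity is equivalent to maximizing the likelihood of the correct next token.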

Intended Use and Limitations

Architext models learn an inner representation of architectural design that can be used to generate a wide variety of geometric designs and can be useful for many downstream design workflows and tasks. While it could be adapted to many different design outputs, the model is best at generating residential floor plans from a natural language prompt.

How to use

This model can be easily loaded using the AutoModelForCausalLM functionality:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("architext/gptj-162M")
model = AutoModelForCausalLM.from_pretrained("architext/gptj-162M")
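
Once loaded, the model can be sampled with the standard `generate` method. The sketch below is illustrative only: `generate_layout` is a hypothetical helper, the sampling settings (`top_p`, `max_new_tokens`) are assumptions rather than recommended values, and passing the bare description as the prompt is also an assumption, since the exact prompt format the model expects is not documented here.

```python
def generate_layout(prompt, max_new_tokens=256):
    """Sample one continuation for a natural language design prompt."""
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("architext/gptj-162M")
    model = AutoModelForCausalLM.from_pretrained("architext/gptj-162M")

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        do_sample=True,               # sample instead of greedy decoding
        top_p=0.95,                   # assumed nucleus sampling threshold
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Usage (downloads the model weights on first call):
#   print(generate_layout("a house with two bedrooms and three bathrooms"))
```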

Limitations and Biases

The core functionality of Architext is taking a string of text and generating a design output by repeatedly predicting the next token. While language models are widely used for tasks other than this, there are many unknowns with this work, especially in the design context. Architext will often generate a design that is not semantically correct with respect to the prompt it was given, although it almost always generates designs that are valid (non-intersecting spaces, no orphan rooms). It also supports only a narrow range of natural language prompts, specifically prompts that describe:

typology: "a house with two bedrooms and three bathrooms" or "a house with six rooms"
adjacency: "the bedroom is adjacent to the living room" or "the kitchen is not adjacent to the bathroom"
location: "the bedroom is in the north side of the house" or "a bedroom is in the south east side of the house"
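
The three prompt families above can be assembled programmatically. The helpers below are hypothetical and simply reproduce the phrasing of the examples; they are a sketch for building consistent prompts, not part of Architext itself.

```python
NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five", 6: "six"}

def _count(n, noun):
    """Spell out a small count and pluralize the noun when needed."""
    word = NUMBER_WORDS[n]
    return f"{word} {noun}" if n == 1 else f"{word} {noun}s"

def typology_prompt(bedrooms, bathrooms):
    """Describe the house by how many rooms of each type it has."""
    return f"a house with {_count(bedrooms, 'bedroom')} and {_count(bathrooms, 'bathroom')}"

def adjacency_prompt(room_a, room_b, adjacent=True):
    """State whether two rooms should be next to each other."""
    relation = "is adjacent to" if adjacent else "is not adjacent to"
    return f"the {room_a} {relation} the {room_b}"

def location_prompt(room, side):
    """Place a room on a given side of the house, e.g. 'north' or 'south east'."""
    return f"the {room} is in the {side} side of the house"
```

For example, `adjacency_prompt("kitchen", "bathroom", adjacent=False)` reproduces the second adjacency example above.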

Of course, the designs that are generated are conceptual designs and one should never depend on Architext to directly generate accurate construction documentation.

Citation and Related Information

BibTeX entry

To cite this model:

@article{galanos2023architext,
  title={Architext: Language-Driven Generative Architecture Design},
  author={Galanos, Theodoros and Liapis, Antonios and Yannakakis, Georgios N},
  journal={arXiv preprint arXiv:2303.07519},
  year={2023}
}

To cite the codebase that trained this model:

@misc{mesh-transformer-jax,
  author = {Wang, Ben},
  title = {{Mesh-Transformer-JAX: Model-Parallel Implementation of Transformer Language Model with JAX}},
  howpublished = {\url{https://github.com/kingoflolz/mesh-transformer-jax}},
  year = 2021,
  month = May
}

Acknowledgements

This project would not have been possible without compute generously provided by Google through the TPU Research Cloud, which gave access to the Cloud TPU VMs used to finetune this model.
