	---
	library_name: diffusers
	---

	# MGIE

	This repository contains the UNet and LLaVA model checkpoints from [Guiding Instruction-based Image Editing via Multimodal Large Language Models](https://arxiv.org/abs/2309.17102).

	For a detailed usage example, refer to [this notebook](https://github.com/apple/ml-mgie/blob/main/demo.ipynb) and the [official repository](https://github.com/apple/ml-mgie). Additionally, this notebook is a memory-optimized version of the original one. It decouples the MGIE inference pipeline into the following steps:

	1. Compute all the embeddings in a batched manner with the LLaVA model and the edit head.
	2. Remove the LLaVA model and the edit head from memory to free up VRAM.
	3. Load the InstructPix2Pix pipeline and perform the editing.
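The staged flow above can be sketched in a framework-agnostic way. This is only a minimal illustration, not the notebook's actual code: `run_in_stages`, `release_memory`, `load_embedder`, and `load_editor` are hypothetical names, and the loaders are assumed to return plain callables. In a real run, the embeddings would typically be moved off the GPU or kept small before the first model is released.

```python
import gc


def release_memory():
    """Run garbage collection and, when PyTorch with CUDA is present, empty its cache."""
    gc.collect()
    try:
        import torch  # optional: the sketch still runs without PyTorch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def run_in_stages(load_embedder, load_editor, inputs):
    # Stage 1: load the embedding model and compute every embedding up front.
    embedder = load_embedder()
    embeddings = [embedder(item) for item in inputs]

    # Drop the only reference to the embedding model, then reclaim its memory.
    del embedder
    release_memory()

    # Stage 2: only now load the editing pipeline and consume the embeddings.
    editor = load_editor()
    return [editor(embedding) for embedding in embeddings]
```

The point of this ordering is that the two large models never have to reside in VRAM at the same time.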

	💡 MGIE needs additional setup steps that must be completed before running inference. Please refer to the
	official repository for those instructions. In particular, you need to merge the LLaVA weight deltas with
	the original LLaMA parameters. More details are in the repository.
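To give a rough idea of what merging weight deltas means, here is a toy sketch that assumes the deltas are additive, so each merged parameter is the base value plus the delta. It uses plain Python lists in place of tensors, and `apply_delta` is a hypothetical helper written for this illustration. Real checkpoints may need extra handling, so follow the repository's instructions for the actual procedure.

```python
def apply_delta(base, delta):
    """Return merged weights: for every parameter name, merged = base + delta."""
    if base.keys() != delta.keys():
        raise KeyError("base and delta must contain the same parameter names")

    merged = {}
    for name, base_values in base.items():
        delta_values = delta[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for {name!r}")
        # Element-wise sum of the base weights and the released delta.
        merged[name] = [b + d for b, d in zip(base_values, delta_values)]
    return merged


# Toy example with a single two-element "parameter".
print(apply_delta({"w": [1.0, 2.0]}, {"w": [0.5, -1.0]}))  # {'w': [1.5, 1.0]}
```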


	## Processing ultra high-resolution images

	Since the [InstructPix2Pix pipeline](https://huggingface.co/docs/diffusers/main/en/api/pipelines/pix2pix) doesn't do any internal processing
	to resize the input images, you might run out of memory (OOM) when processing ultra high-resolution images
	like [this one](https://i.imgur.com/CiAbKbS.jpg).

	So, it's recommended to resize them while preserving their aspect ratio. Here's a utility function for that:

	```python
	from diffusers.utils import load_image
	from PIL import Image

	def resize_image_aspect_ratio(img_url, base_width=None, base_height=None):
	    # Load the image
	    img = load_image(img_url).convert("RGB")

	    # Get the current width and height of the image
	    width, height = img.size

	    # Calculate the new dimensions based on the aspect ratio
	    if base_width is not None:
	        # Calculate new height based on the base_width to maintain aspect ratio
	        w_percent = base_width / float(width)
	        h_size = int(float(height) * w_percent)
	        new_size = (base_width, h_size)
	    elif base_height is not None:
	        # Calculate new width based on the base_height to maintain aspect ratio
	        h_percent = base_height / float(height)
	        w_size = int(float(width) * h_percent)
	        new_size = (w_size, base_height)
	    else:
	        raise ValueError("Either base_width or base_height must be provided")

	    # Resize the image (Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter)
	    resized_img = img.resize(new_size, Image.LANCZOS)
	    return resized_img
	```
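As a variation on the same idea, you may prefer to cap only the longer side so that images that are already small are left untouched. The sketch below computes just the target size and does not depend on Pillow. `fit_within` and its `max_side` parameter are hypothetical names, and 1024 is an arbitrary example value rather than a recommended setting.

```python
def fit_within(width, height, max_side):
    """Return (new_width, new_height) with the longer side capped at max_side.

    The aspect ratio is preserved, and sizes already within the limit are
    returned unchanged.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height and max_side must be positive")

    longer = max(width, height)
    if longer <= max_side:
        return width, height

    scale = max_side / longer
    # max(1, ...) guards against a zero-sized side for extreme aspect ratios.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 3000, 1024))  # (1024, 768)
```

The returned tuple can be passed directly to `img.resize(new_size, Image.LANCZOS)`.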

	## Citation

	```
	@inproceedings{fu2024mgie,
	author = {Tsu-Jui Fu and Wenze Hu and Xianzhi Du and William Yang Wang and Yinfei Yang and Zhe Gan},
	title = {{Guiding Instruction-based Image Editing via Multimodal Large Language Models}},
	booktitle = {International Conference on Learning Representations (ICLR)},
	year = {2024}
	}
	```