---
license: mit
datasets:
- EDGEwww25/EDGE-Dataset
- liuhaotian/LLaVA-Instruct-150K
- echo840/Monkey_Data
language:
- en
base_model:
- echo840/Monkey-Chat
---
|
This is the model repository for the paper *EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data*.
|
|
|
|
|
The model is fine-tuned from [*Monkey*](https://github.com/Yuliang-Liu/Monkey). To speed up training, we made some minor modifications:
|
1. Instead of using the LoRA adapters in *Monkey*, the five patches of the raw image are stacked along an extra batch dimension and sent through the image encoder in a single forward pass (see the sketch after this list).
|
2. Inside the image encoder, we use [*FlashAttention*](https://github.com/Dao-AILab/flash-attention) instead of the manually implemented attention.
|
3. Image reading is separated from the forward propagation and moved into dataset preprocessing, so that PyTorch's `DataLoader` workers can parallelize image loading (see the data-loading sketch below).
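
A minimal sketch of modifications 1 and 2, with hypothetical function and variable names (the actual encoder code lives in the *Monkey* repository): the five patch crops are folded into the batch dimension before encoding, and the attention inside the encoder calls `flash_attn_func` from the flash-attn package.

```python
import torch
from flash_attn import flash_attn_func  # requires the flash-attn package and a CUDA GPU


def encode_patches(vision_encoder, patches):
    """Encode all five patch crops of an image in one forward pass.

    patches: list of 5 tensors, each of shape (B, 3, H, W).
    Returns: tensor of shape (5, B, num_tokens, dim).
    """
    B = patches[0].shape[0]
    stacked = torch.cat(patches, dim=0)        # (5*B, 3, H, W): fold patches into the batch dim
    feats = vision_encoder(stacked)            # (5*B, num_tokens, dim)
    return feats.view(5, B, *feats.shape[1:])  # split the patch dimension back out


def flash_self_attention(q, k, v):
    """Drop-in replacement for a hand-written attention inside the encoder.

    q, k, v: (batch, seqlen, num_heads, head_dim), in fp16 or bf16.
    """
    return flash_attn_func(q, k, v, causal=False)
```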
|
|
|
|
|
The training dataset (i.e., all training QAs in `.jsonl` format, excluding images) is published in the repository [*EDGE-Dataset*](https://huggingface.co/datasets/EDGEwww25/EDGE-Dataset).
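
For modification 3 above, a minimal sketch of how the `.jsonl` QAs might be consumed (the field names `image` and `conversations` and the `image_processor` callable are assumptions, not the actual schema): images are opened and preprocessed in `__getitem__`, so `DataLoader` worker processes overlap image I/O with training instead of reading images inside the forward pass.

```python
import json
from PIL import Image
from torch.utils.data import Dataset, DataLoader


class EdgeQADataset(Dataset):
    # Hypothetical sketch: images are read and preprocessed in dataset code,
    # not inside the model's forward pass.
    def __init__(self, jsonl_path, image_processor):
        with open(jsonl_path, "r", encoding="utf-8") as f:
            self.samples = [json.loads(line) for line in f]
        self.image_processor = image_processor

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        sample = self.samples[idx]
        image = Image.open(sample["image"]).convert("RGB")   # assumed field name
        pixel_values = self.image_processor(image)           # resize + normalize, etc.
        return {"pixel_values": pixel_values,
                "conversations": sample["conversations"]}    # assumed field name


# num_workers > 0 lets several worker processes read and preprocess images in parallel:
# loader = DataLoader(EdgeQADataset("train.jsonl", image_processor),
#                     batch_size=8, num_workers=8, shuffle=True)
```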
|
|
|
The model training and inference scripts are published in the anonymous repository [*EDGE*](https://anonymous.4open.science/r/EDGE-1CDB).