---
license: mit
datasets:
- EDGEwww25/EDGE-Dataset
- liuhaotian/LLaVA-Instruct-150K
- echo840/Monkey_Data
language:
- en
base_model:
- echo840/Monkey-Chat
---
This is the model repository of paper *EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data*.


The model is fine-tuned from [*Monkey*](https://github.com/Yuliang-Liu/Monkey). To speed up training, we also made some minor modifications:
1. Instead of using the LoRA adapters in *Monkey*, the five patches of the raw image are stacked along an extra batch dimension and sent through the image encoder simultaneously.
2. Inside the image encoder, we use [*flash attention*](https://github.com/Dao-AILab/flash-attention) instead of the manually implemented attention.
3. Image reading is separated from the forward pass and moved into dataset preprocessing, so that image loading can be parallelized by PyTorch's `DataLoader`.
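
The first modification can be illustrated with a minimal sketch (not the authors' actual code; the encoder and tensor sizes are placeholders): folding the per-image patches into the batch dimension lets a single forward pass encode all patches at once instead of looping over them.

```python
import torch
import torch.nn as nn

# Stand-in for the ViT image encoder (real encoder differs).
encoder = nn.Sequential(
    nn.Flatten(1),
    nn.Linear(3 * 224 * 224, 64),
)

batch = 2
# 5 crops/patches per raw image, as in Monkey's sliding-window scheme.
patches = torch.randn(batch, 5, 3, 224, 224)

# Fold the patch axis into the batch axis: (B, 5, C, H, W) -> (B*5, C, H, W)
stacked = patches.view(batch * 5, 3, 224, 224)
features = encoder(stacked)            # one forward pass for all patches
features = features.view(batch, 5, -1)  # restore per-image grouping
print(features.shape)  # torch.Size([2, 5, 64])
```

Because the patches share one encoder call, the GPU processes them as a larger batch, which is typically faster than five sequential passes.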


The training dataset (i.e., all training QA pairs in `.jsonl` format, excluding images) is published in the repository [*EDGE-Dataset*](https://huggingface.co/datasets/EDGEwww25/EDGE-Dataset).

The model training and inference scripts are published in the anonymous repository [*EDGE*](https://anonymous.4open.science/r/EDGE-1CDB).