---
license: mit
language:
- en
---

# Grounded-VideoLLM Model Card

Grounded-VideoLLM is a Video-LLM adept at fine-grained temporal grounding. It not only excels in grounding tasks such as temporal sentence grounding, dense video captioning, and grounded VideoQA, but also shows great potential as a versatile video assistant for general video understanding.

## Model details

**Model date:**

Grounded-VideoLLM-Phi3.5-Vision-Instruct was trained in Oct. 2024.

**Paper or resources for more information:**

[Paper](https://arxiv.org/abs/2410.03290), [Code](https://github.com/WHB139426/Grounded-Video-LLM)

## Citation

If you find our project useful, please star our repo and cite our paper as follows:

```
@article{wang2024grounded,
  title={Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models},
  author={Wang, Haibo and Xu, Zhiyang and Cheng, Yu and Diao, Shizhe and Zhou, Yufan and Cao, Yixin and Wang, Qifan and Ge, Weifeng and Huang, Lifu},
  journal={arXiv preprint arXiv:2410.03290},
  year={2024}
}
```