---
license: apache-2.0
datasets:
- Reself/AuroraCap-trainset
base_model:
- lmsys/vicuna-7b-v1.5-16k
tags:
- caption
model-index:
- name: AuroraCap-7B
  results:
  - task:
      type: video detailed caption
    dataset:
      type: VDC
      name: VDC
    metrics:
    - type: Acc
      value: 38.21
      name: VDCScore
    - type: Acc
      value: 48.33
      name: VDD
    - type: cider
      value: 9.51
    - type: bleu
      value: 30.90
      name: bleu@1
    - type: bleu
      value: 4.06
      name: bleu@4
    - type: meteor
      value: 19.09
    - type: rouge
      value: 21.58
      name: rouge-l
  - task:
      type: video caption
    dataset:
      type: MSR-VTT
      name: MSR-VTT
    metrics:
    - type: cider
      value: 33.1
    - type: bleu
      value: 58.6
      name: bleu@1
    - type: bleu
      value: 21.0
      name: bleu@4
    - type: meteor
      value: 23.9
    - type: rouge
      value: 49.5
      name: rouge-l
  - task:
      type: video caption
    dataset:
      type: VATEX
      name: VATEX
    metrics:
    - type: cider
      value: 33.8
    - type: bleu
      value: 57.1
      name: bleu@1
    - type: bleu
      value: 18.4
      name: bleu@4
    - type: meteor
      value: 19.0
    - type: rouge
      value: 40.8
      name: rouge-l
  - task:
      type: video question answering
    dataset:
      type: ActivityNet
      name: ActivityNet
    metrics:
    - type: Acc
      value: 61.8
  - task:
      type: video question answering
    dataset:
      type: MSVD
      name: MSVD
    metrics:
    - type: Acc
      value: 62.6
  - task:
      type: video question answering
    dataset:
      type: MSR-VTT
      name: MSR-VTT
    metrics:
    - type: Acc
      value: 43.5
  - task:
      type: video question answering
    dataset:
      type: iVQA
      name: iVQA
    metrics:
    - type: Acc
      value: 55.2
---

## Resources

- [Website](https://rese1f.github.io/aurora-web/)
- [arXiv: Paper]()
- [GitHub: Code](https://github.com/rese1f/aurora)
- [Huggingface: AuroraCap Model](https://huggingface.co/collections/Reself/auroracap-66d117ffe13bedda96702013)
- [Huggingface: VDC Benchmark](https://huggingface.co/datasets/Reself/Video-Detailed-Caption)
- [Huggingface: Trainset](https://huggingface.co/datasets/Reself/AuroraCap-trainset)

## Features

AuroraCap is a multimodal large language model for image and video captioning.

## Quick Start

See [Docs](https://github.com/rese1f/aurora/blob/main/docs/auroracap/README.md).

## Citation