---
license: apache-2.0
datasets:
- Reself/AuroraCap-trainset
base_model:
- lmsys/vicuna-7b-v1.5-16k
tags:
- caption
model-index:
- name: AuroraCap-7B
  results:
  - task:
      type: video detailed caption
    dataset:
      type: VDC
      name: VDC
    metrics:
      - type: Acc
        value: 38.21
        name: VDCScore
      - type: Acc
        value: 48.33
        name: VDD
      - type: cider
        value: 9.51
      - type: bleu
        value: 30.90
        name: bleu@1
      - type: bleu
        value: 4.06
        name: bleu@4
      - type: meteor
        value: 19.09
      - type: rouge
        value: 21.58
        name: rouge-l
  - task:
      type: video caption
    dataset:
      type: MSR-VTT
      name: MSR-VTT
    metrics:
      - type: cider
        value: 33.1
      - type: bleu
        value: 58.6
        name: bleu@1
      - type: bleu
        value: 21.0
        name: bleu@4
      - type: meteor
        value: 23.9
      - type: rouge
        value: 49.5
        name: rouge-l
  - task:
      type: video caption
    dataset:
      type: VATEX
      name: VATEX
    metrics:
      - type: cider
        value: 33.8
      - type: bleu
        value: 57.1
        name: bleu@1
      - type: bleu
        value: 18.4
        name: bleu@4
      - type: meteor
        value: 19.0
      - type: rouge
        value: 40.8
        name: rouge-l
  - task:
      type: video question answering
    dataset:
      type: ActivityNet
      name: ActivityNet
    metrics:
      - type: Acc
        value: 61.8
  - task:
      type: video question answering
    dataset:
      type: MSVD
      name: MSVD
    metrics:
      - type: Acc
        value: 62.6
  - task:
      type: video question answering
    dataset:
      type: MSR-VTT
      name: MSR-VTT
    metrics:
      - type: Acc
        value: 43.5
  - task:
      type: video question answering
    dataset:
      type: iVQA
      name: iVQA
    metrics:
      - type: Acc
        value: 55.2
---

<img src="assets/teaser.png" align="center">

## Resources

- [Website](https://rese1f.github.io/aurora-web/)
- [arXiv: Paper]()
- [GitHub: Code](https://github.com/rese1f/aurora)
- [Hugging Face: AuroraCap Model](https://huggingface.co/collections/Reself/auroracap-66d117ffe13bedda96702013)
- [Hugging Face: VDC Benchmark](https://huggingface.co/datasets/Reself/Video-Detailed-Caption)
- [Hugging Face: Trainset](https://huggingface.co/datasets/Reself/AuroraCap-trainset)
  
## Features

<img src="assets/vdc_baseline.png" align="center">

AuroraCap is a multimodal large language model for image and video captioning. 

## Quick Start

See [Docs](https://github.com/rese1f/aurora/blob/main/docs/auroracap/README.md).

## Citation