RhapsodyAI/minicpm-visual-embedding-v0

Feature Extraction

information retrieval

embedding model

visual information retrieval


bokesyo committed on Jul 13

Commit c26facf • 1 Parent(s): d917af7

Update README.md

Files changed (1)

README.md +9 -2


@@ -19,7 +19,7 @@ The model only takes images as document-side inputs and produce vectors represen
 # News
-- 2024-07-14: We released a Gradio demo of `miniCPM-visual-embedding-v0`, take a look at [pipeline_gradio.py](https://huggingface.co/RhapsodyAI/minicpm-visual-embedding-v0/blob/main/pipeline_gradio.py). We consider hosting a huggingface space to deploy this.
+- 2024-07-14: We released a Gradio demo of `miniCPM-visual-embedding-v0`, take a look at [pipeline_gradio.py](https://huggingface.co/RhapsodyAI/minicpm-visual-embedding-v0/blob/main/pipeline_gradio.py). You can run `pipeline_gradio.py` to build a demo on your PC.
 - 2024-07-13: We released a command-line based demo of `miniCPM-visual-embedding-v0` for users to retireve most relavant pages from a given PDF file (could be very long), take a look at [pipeline.py](https://huggingface.co/RhapsodyAI/minicpm-visual-embedding-v0/blob/main/pipeline.py).
@@ -92,6 +92,14 @@ print(scores)
 # tensor([[-0.0112,  0.3316,  0.2376]], device='cuda:0')
 ```
+# Todos
+- Release huggingface space demo.
+- Release the evaluation results.
+- Release technical report.
 # Limitations
 - This checkpoint is an alpha version, and may not be strong in your tasks, for bad case, please create an issue to let us know, many thanks!
@@ -100,7 +108,6 @@ print(scores)
 - The inference speed is low, because vision encoder uses `timm`, which does not yet support `flash-attn`.
 # Citation
 If you find our work useful, please consider cite us: