Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM
Abstract
In this technical report, we propose ChemVLM, the first open-source multimodal large language model dedicated to the fields of chemistry, designed to address the incompatibility between chemical image understanding and text analysis. Built upon the VIT-MLP-LLM architecture, we leverage ChemLLM-20B as the foundational large model, endowing our model with robust capabilities in understanding and utilizing chemical text knowledge. Additionally, we employ InternVIT-6B as a powerful image encoder. We have curated high-quality data from the chemical domain, including molecules, reaction formulas, and chemistry examination data, and compiled these into a bilingual multimodal question-answering dataset. We test the performance of our model on multiple open-source benchmarks and three custom evaluation sets. Experimental results demonstrate that our model achieves excellent performance, securing state-of-the-art results in five out of six involved tasks. Our model can be found at https://huggingface.co/AI4Chem/ChemVLM-26B.
Community
🚀 Introducing ChemVLM, the first open-source multimodal large language model dedicated to chemistry!
🌟Comparable performances with commercial models or specific OCR model but with dialogue capabilities!
✨2B/26B Models Here! https://huggingface.co/AI4Chem/ChemVLM-26B
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models (2024)
- Text-Augmented Multimodal LLMs for Chemical Reaction Condition Recommendation (2024)
- PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes (2024)
- On Pre-training of Multimodal Language Models Customized for Chart Understanding (2024)
- MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning (2024)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 2
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper