metadata

library_name: transformers
tags:
  - multimodal
  - vision-language
license: apache-2.0
language:
  - en

Model Card for HPT

Hyper-Pretrained Transformers (HPT) is a novel multimodal LLM framework from HyperGAI for training vision-language models that understand both textual and visual inputs. Here we release HPT 1.5 Edge, our best open-source multimodal LLM. Built on Microsoft Phi-3-mini, HPT 1.5 Edge performs strongly on real-world understanding and complex reasoning. This repository contains the open-source weights needed to reproduce the evaluation results of HPT 1.5 Edge on different benchmarks.

For full details of this model, please read our technical blog post.

Run the model

Please use the scripts in our GitHub repository to run the model.

Troubleshooting

Please report any issues on our GitHub repository.

Pretrained models used

Pretrained LLM: Phi-3-mini-4k-instruct
Pretrained Visual Encoder: siglip-so400m-patch14-384
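
As a rough illustration of what these two components imply, the sketch below reads the patch size and input resolution from the visual encoder's name and the context-length tag from the LLM's name. It is a minimal sketch, not part of the HPT codebase: the helper names `vision_grid` and `context_window` are hypothetical, and it assumes a non-overlapping patch grid and that `4k` means 4 × 1024 tokens. How many visual tokens HPT 1.5 Edge actually passes to the LLM depends on its projector, which this model card does not describe.

```python
import re
from typing import Tuple

# Component names exactly as listed in this model card.
LLM_NAME = "Phi-3-mini-4k-instruct"
VISION_ENCODER_NAME = "siglip-so400m-patch14-384"


def vision_grid(encoder_name: str) -> Tuple[int, int, int]:
    """Parse a '...patch<P>-<R>' name into (patch_size, resolution, num_patches).

    Assumption: patches are non-overlapping, so any remainder of
    resolution / patch_size is dropped.
    """
    match = re.search(r"patch(\d+)-(\d+)$", encoder_name)
    if match is None:
        raise ValueError(f"no patch/resolution suffix in {encoder_name!r}")
    patch_size, resolution = int(match.group(1)), int(match.group(2))
    side = resolution // patch_size
    return patch_size, resolution, side * side


def context_window(llm_name: str) -> int:
    """Parse the '<N>k' tag in an LLM name, reading k as 1024 tokens (assumption)."""
    match = re.search(r"-(\d+)k(?:-|$)", llm_name)
    if match is None:
        raise ValueError(f"no context-length tag in {llm_name!r}")
    return int(match.group(1)) * 1024


patch_size, resolution, num_patches = vision_grid(VISION_ENCODER_NAME)
window = context_window(LLM_NAME)
print(f"{VISION_ENCODER_NAME}: {num_patches} patches per {resolution}x{resolution} image")
print(f"{LLM_NAME}: {window}-token context window")
```

Under these assumptions, the encoder yields a 27 × 27 grid of 729 patch embeddings per 384 × 384 image, set against a 4096-token context window.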

Disclaimer and Responsible Use

Note that HPT Edge is a quick open release of our models, intended to facilitate open, responsible AI research and community development. It does not have any moderation mechanism, and we provide no guarantees on its results. We hope to engage with the community to make the model respect guardrails more precisely, so that it can be adopted in practical applications that require moderated outputs.

Contact Us

Contact: [email protected]
Follow us on Twitter.
Follow us on LinkedIn.
Visit our website to learn more about us.

License

This project is released under the Apache 2.0 license. Parts of this project contain code and models from other sources, which are subject to their respective licenses; you must comply with those licenses if you want to use them for commercial purposes.