
	---
	license: apache-2.0
	---
	# Model Card for cerebras/Cerebras-LLaVA-7B

	This checkpoint contains the language model and projector weights of a multimodal LLaVA-7B model trained with our Cerebras implementation and training recipe.
	The vision encoder checkpoint for this model can be found at [cerebras/Cerebras-ViT-L-336-patch14-llava7b-ShareGPT4V](https://huggingface.co/cerebras/Cerebras-ViT-L-336-patch14-llava7b-ShareGPT4V).

	Note: _ShareGPT4V_ is added to the vision model name so that the checkpoint loads correctly in the [LLaVA source repo](https://github.com/haotian-liu/LLaVA/blob/main/llava/model/multimodal_encoder/builder.py#L8).

	For full model and training details, please read our upcoming blog post.

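	## License
	This model is released under the Apache 2.0 license, as declared in the metadata at the top of this card.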

	## Model Architecture
	Cerebras-LLaVA-7B is a transformer model with the following architecture:
	* Vision encoder: [CLIP-VisionModel-Large](https://huggingface.co/cerebras/Cerebras-ViT-L-336-patch14-llava7b-ShareGPT4V). It takes 336 x 336 images and uses a patch size of 14.
	* Large Language Model: Initialized from Vicuna-7B checkpoints and instruction-finetuned on various datasets.
	* Projector: The module that connects the vision encoder to the LLM consists of two linear layers with a GELU activation (mlp2x-gelu).
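
To make the sizes above concrete, here is a small back-of-the-envelope sketch. It computes how many image patches the vision encoder produces for a 336 x 336 input with patch size 14, and roughly how many parameters a two-layer projector would hold. The hidden widths (1024 for the vision encoder, 4096 for the LLM) and the layer layout `vision_hidden -> llm_hidden -> llm_hidden` are assumptions based on the usual CLIP ViT-L and Vicuna-7B configurations; they are not stated in this card, and the helper names are hypothetical.

```python
# Illustrative arithmetic only. Values marked "assumed" are not from this card.
IMAGE_SIZE = 336      # from the model card
PATCH_SIZE = 14       # from the model card
VISION_HIDDEN = 1024  # assumed: typical CLIP ViT-L hidden width
LLM_HIDDEN = 4096     # assumed: typical Vicuna-7B hidden width


def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    side = image_size // patch_size
    return side * side


def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Parameter count of one dense layer (weights plus optional bias)."""
    return in_features * out_features + (out_features if bias else 0)


def projector_params(vision_hidden: int, llm_hidden: int) -> int:
    """Two linear layers with a parameter-free GELU in between (assumed layout)."""
    first = linear_params(vision_hidden, llm_hidden)
    second = linear_params(llm_hidden, llm_hidden)
    return first + second


print(num_patches(IMAGE_SIZE, PATCH_SIZE))          # 576
print(projector_params(VISION_HIDDEN, LLM_HIDDEN))  # 20979712
```

Under these assumptions each image contributes 24 x 24 = 576 patch positions, and the projector is small (about 21M parameters) compared with the 7B-parameter language model.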

	## Loading the model

	This model can be loaded directly with the [LLaVA source code repository](https://github.com/haotian-liu/LLaVA). For installation, please refer to the [instructions in the source code repository](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file#install).
	We perform all our evaluations using the LLaVA source code repository scripts.

	```
	from llava.model.builder import load_pretrained_model
	from llava.mm_utils import get_model_name_from_path
	from llava.eval.run_llava import eval_model

	model_path = "cerebras/Cerebras-LLaVA-7B"

	# Returns the tokenizer, the model, the image processor and the context length.
	tokenizer, model, image_processor, context_len = load_pretrained_model(
	    model_path=model_path,
	    model_base=None,
	    model_name=get_model_name_from_path(model_path),
	)
	```
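
The snippet above imports `eval_model` but does not call it. As a rough sketch of how a single query might be run, the example below builds the argument object that `eval_model` reads and passes it in. The field names mirror the quick-start example in the LLaVA repository README as we understand it and should be treated as assumptions that may change between LLaVA versions. `build_eval_args` and `run_demo` are hypothetical helpers, and the prompt and image path are placeholders.

```python
from types import SimpleNamespace


def build_eval_args(model_path, model_name, query, image_file):
    """Bundle the settings eval_model is assumed to read (names follow the LLaVA README)."""
    return SimpleNamespace(
        model_path=model_path,
        model_base=None,
        model_name=model_name,
        query=query,
        conv_mode=None,      # let LLaVA infer the conversation template
        image_file=image_file,
        sep=",",             # separator when several image files are given
        temperature=0,       # greedy decoding
        top_p=None,
        num_beams=1,
        max_new_tokens=512,
    )


def run_demo():
    """Run one query; requires the LLaVA package and downloads the checkpoints."""
    # Imported lazily so this file can be imported without LLaVA installed.
    from llava.mm_utils import get_model_name_from_path
    from llava.eval.run_llava import eval_model

    model_path = "cerebras/Cerebras-LLaVA-7B"
    args = build_eval_args(
        model_path=model_path,
        model_name=get_model_name_from_path(model_path),
        query="Describe this image.",  # placeholder prompt
        image_file="example.jpg",      # placeholder path to a local image
    )
    eval_model(args)
```

Calling `run_demo()` in an environment where LLaVA is installed should print the model's answer for the given image, assuming the field names above still match the installed LLaVA version.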

	## Intended Use
	Primary intended uses: The primary use of LLaVA is research on large multimodal models and chatbots.

	Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.


	## Acknowledgements
	We are thankful to all the Cerebras engineers who made this work possible.