	---
	license: apache-2.0
	---
	# FeynModel V 0.1



	![image/png](https://cdn-uploads.huggingface.co/production/uploads/645364cbf666f76551f93111/ZviQjj2NvCvl0R7IZiRai.png)

	#### Welcome to the FeynModel repository. FeynModel is a vision-language model with the reasoning capabilities of an LLM (Large Language Model). It aims to explore the combined power of vision and language for scientific reasoning tasks. The model is fine-tuned with LoRA (Low-Rank Adaptation) for a range of vision and language tasks.

	#### Version 0.1 uses pretrained layers from the DaViT vision tower of Florence-2-base (Microsoft) and from Gemma2-2B (Google), and was fine-tuned on the M3IT, COCO, and ScienceQA datasets. It employs an S6 block to integrate context memory for Q*TS (experimental).
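
For readers new to LoRA, the idea behind the fine-tuning method mentioned above can be sketched in a few lines of plain Python. This is only an illustration of the general technique (a frozen weight `W` plus a scaled low-rank update `B @ A`), not FeynModel's training code; the function names, shapes, and values are made up for the example.

```python
# Illustrative LoRA sketch in plain Python; all names and values are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the LoRA rank.

    w: frozen base weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    r = len(a)                 # rank = number of rows of A
    scale = alpha / r
    delta = matmul(b, a)       # low-rank update, shape (d_out, d_in)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Frozen 2x3 base weight and a rank-1 update
w = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
a = [[1.0, 2.0, 3.0]]          # shape (r=1, d_in=3)
b = [[0.5], [0.0]]             # shape (d_out=2, r=1)
w_eff = lora_effective_weight(w, a, b, alpha=2.0)
print(w_eff)
```

Only `a` and `b` would be trained in such a setup, which is why the number of trainable parameters stays small compared with updating `w` directly.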

	# How to use

	```python
	# Make sure the torch, transformers, pillow, einops and timm libraries are installed
	import torch
	from transformers import AutoProcessor, AutoModelForCausalLM

	model_id = 'Imagroune/feynmodel'
	processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
	if torch.cuda.is_available():
	    # If you have a CUDA device
	    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
	    model.to('cuda')
	else:
	    # Otherwise, load the model on the CPU
	    model = AutoModelForCausalLM.from_pretrained(
	        model_id,
	        trust_remote_code=True,
	        device_map='cpu',  # Ensure the model is loaded on the CPU
	        torch_dtype=torch.bfloat16,  # Load the model in half precision
	    )
	```
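
Before loading the model, it can help to check that the libraries listed above are actually importable. The snippet below is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the import names are an assumption based on the list above (pillow is imported as `PIL`).

```python
import importlib.util

# Import names of the libraries listed above (pillow is imported as "PIL")
REQUIRED = ["torch", "transformers", "PIL", "einops", "timm"]

def missing_packages(names):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

This only checks that each package can be found, not that its version is compatible with the model's remote code.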

	# LLM Inference

	```python

	input_text = "<start_of_turn>user\nHow many helicopters can an adult human eat in a single meal?<end_of_turn>\n<start_of_turn>model\n"
	input_ids = processor.tokenizer(input_text, return_tensors="pt").to(model.device)

	# Text generation
	max_length = input_ids.input_ids.shape[1] + 1024  # Maximum total length (prompt + new tokens)
	stream_output = []  # List to store the decoded output

	# generate() returns a batch of complete sequences; decode and print each one
	for output in model.generate(input_ids=input_ids.input_ids, max_length=max_length, do_sample=True, temperature=0.7):
	    decoded_output = processor.tokenizer.decode(output, skip_special_tokens=True)
	    stream_output.append(decoded_output)
	    print(decoded_output, end="", flush=True)

	```

	#### It will output something like:

	```
	This is a trick question! Here's why:

	* Helicopters don't have food to eat. Helicopters are machines that fly. They don't have mouths or stomachs!
	* Humans don't fly through food. We eat food to give our bodies energy. But we don't eat food that we can fly through!

	Let me know if you'd like to learn about how people eat different foods.

	```
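
The prompts in this README wrap the user message in Gemma-style turn markers. A small helper can build that string so the markers are not retyped by hand. This is a sketch: `build_prompt` is a hypothetical name, and the exact template (newline after `<end_of_turn>`) is an assumption based on the Gemma chat format rather than something this repository documents.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a single user message in Gemma-style turn markers.

    The result ends with the opening of the model turn, so generation
    continues from there.
    """
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = build_prompt("What is the boiling point of water at sea level?")
print(prompt)
```

The returned string can be passed to `processor.tokenizer(...)` in place of a hand-written `input_text`.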

	# Vision Inference

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForCausalLM, StoppingCriteria, StoppingCriteriaList

	class PrintTokensStoppingCriteria(StoppingCriteria):
	    """Prints each token as it is generated; it never stops generation itself."""
	    def __init__(self, tokenizer):
	        self.tokenizer = tokenizer

	    def __call__(self, input_ids, scores, **kwargs):
	        # Decode the last generated token and print it
	        last_token_id = input_ids[0, -1].item()
	        token = self.tokenizer.decode([last_token_id], skip_special_tokens=True)
	        print(token, end='', flush=True)

	        # Return True to stop, False to continue
	        return False

	stopping_criteria = PrintTokensStoppingCriteria(processor.tokenizer)

	from PIL import Image
	import requests

	url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
	image = Image.open(requests.get(url, stream=True).raw)

	input_text="""<start_of_turn>user
	Create a concise caption that accurately describes the main elements in the image provided
	<end_of_turn>
	<start_of_turn>model

	"""
	inputs = processor(text=input_text, images=image, return_tensors="pt")
	# Move the inputs to the same device as the model
	inputs = {key: value.to(model.device) for key, value in inputs.items()}
	# NB: if the model is loaded in bfloat16, cast the float32 inputs to the model dtype
	inputs = {key: value.to(dtype=model.dtype) if value.dtype == torch.float32 else value for key, value in inputs.items()}

	image  # displays the image when run in a notebook
	```


	![image/png](https://cdn-uploads.huggingface.co/production/uploads/645364cbf666f76551f93111/XVxraj69m26HtkfaWQhve.png)


	```python

	max_length = inputs['input_ids'].shape[1] + 1024  # Maximum total length (prompt + new tokens)
	# Generate; tokens are printed as they are produced by the stopping criteria defined above
	ret = model.generate(inputs['input_ids'], pixel_values=inputs['pixel_values'], stopping_criteria=StoppingCriteriaList([stopping_criteria]), max_length=max_length, do_sample=True, temperature=0.7)

	# Example output:
	# An older, green car sits parked on the curb in front of a building.

	```