---
license: apache-2.0
---

# FeynModel V 0.1

![image/png](https://cdn-uploads.huggingface.co/production/uploads/645364cbf666f76551f93111/ZviQjj2NvCvl0R7IZiRai.png)

Welcome to the FeynModel repository. FeynModel is a vision-language model that combines visual understanding with the reasoning capabilities of an LLM (Large Language Model). Its goal is to explore how vision and language together can support scientific reasoning tasks. The model is fine-tuned with LoRA (Low-Rank Adaptation), which trains small low-rank adapter matrices while the pretrained weights stay frozen, to adapt it to a variety of vision and language tasks.

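To make the LoRA idea concrete, here is a minimal NumPy sketch for a single linear layer. The pretrained weight `W` stays frozen and only the low-rank factors `A` and `B` would be trained. The `LoRALinear` class, the rank `r`, the scale `alpha` and the initialization are illustrative assumptions. They are not FeynModel's actual training code or hyperparameters.

```python
import numpy as np


class LoRALinear:
    """A frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                      # frozen pretrained weight, (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable down-projection, small random init
        self.B = np.zeros((d_out, r))                   # trainable up-projection, zero init
        self.scale = alpha / r

    def __call__(self, x):
        # x: (batch, d_in) -> (batch, d_out); frozen path plus low-rank path
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        # After training, the update can be folded into W so inference costs nothing extra
        return self.W + self.scale * self.B @ self.A


rng = np.random.default_rng(1)
W = rng.normal(size=(4, 6))
layer = LoRALinear(W, r=2, alpha=4)
x = rng.normal(size=(3, 6))
print(np.allclose(layer(x), x @ W.T))  # True: B starts at zero, so the layer initially equals W
```

Because `B` starts at zero, fine-tuning begins exactly from the pretrained behavior. The layer adds only `r * (d_in + d_out)` trainable parameters instead of `d_in * d_out`. For example, a 2048 x 2048 layer with `r = 8` trains 32,768 parameters instead of 4,194,304.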
Version 0.1 uses pretrained layers from the DaViT vision tower of Florence-2-base (Microsoft) and from Gemma2-2B (Google). It was fine-tuned on the M3IT, COCO, and ScienceQA datasets. It also includes an S6 block that integrates context memory for Q*TS (experimental).

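The card does not describe how the S6 block is wired into FeynModel or what Q*TS does. For orientation only, the NumPy sketch below shows the generic selective state-space recurrence, which is the "S6" layer of the Mamba architecture. In this recurrence, the step size `delta` and the projections `B` and `C` are computed from each input token, which lets the state decide what to keep or forget. The function name and the shapes are simplifying assumptions, as is the use of a full `W_delta` matrix (Mamba uses a low-rank projection). This is not FeynModel's implementation.

```python
import numpy as np


def selective_scan(x, A, W_delta, b_delta, W_B, W_C, D_skip):
    """Sequential S6-style selective scan (illustrative, not optimized).

    x: (L, D) input sequence; A: (D, N) with negative entries;
    W_delta: (D, D); b_delta: (D,); W_B, W_C: (D, N); D_skip: (D,).
    Returns y: (L, D).
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))  # hidden state carried across time steps
    ys = []
    for t in range(L):
        xt = x[t]
        delta = np.logaddexp(0.0, xt @ W_delta + b_delta)  # (D,) softplus: positive, input-dependent step size
        Bt = xt @ W_B                                      # (N,) input-dependent input projection
        Ct = xt @ W_C                                      # (N,) input-dependent output projection
        A_bar = np.exp(delta[:, None] * A)                 # (D, N) discretized state transition
        h = A_bar * h + (delta[:, None] * Bt[None, :]) * xt[:, None]  # state update
        ys.append(h @ Ct + D_skip * xt)                    # (D,) readout plus skip connection
    return np.stack(ys)


rng = np.random.default_rng(0)
L, D, N = 5, 3, 4
x = rng.normal(size=(L, D))
A = -np.exp(rng.normal(size=(D, N)))  # negative entries keep the state decaying
params = dict(
    W_delta=0.1 * rng.normal(size=(D, D)),
    b_delta=np.zeros(D),
    W_B=rng.normal(size=(D, N)),
    W_C=rng.normal(size=(D, N)),
    D_skip=np.ones(D),
)
y = selective_scan(x, A, **params)
print(y.shape)  # (5, 3)
```

Like the reference Mamba implementation, the sketch discretizes `A` as `exp(delta * A)` and approximates the input term as `delta * B * x`. Production implementations replace the Python loop with a parallel scan kernel.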
# How to use

```python
# Make sure the torch, transformers, pillow, einops and timm libraries are installed, e.g.:
#   pip install torch transformers pillow einops timm
import torch
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = 'Imagroune/feynmodel'

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

if torch.cuda.is_available():
    # If you have a CUDA device, load the model and move it to the GPU
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    model.to('cuda')
else:
    # Otherwise keep the model on the CPU (from_pretrained loads weights on the CPU by default)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,  # load the weights in 16-bit bfloat16 to save memory
    )
```

# LLM Inference

```python
# Gemma-style prompt: a user turn followed by an open model turn
input_text = "<start_of_turn>user\nHow many helicopters can an adult human eat in a single meal?<end_of_turn>\n<start_of_turn>model\n"

inputs = processor.tokenizer(input_text, return_tensors="pt").to(model.device)

# Total length: prompt tokens + up to 1024 new tokens
max_length = inputs.input_ids.shape[1] + 1024

# Sample a completion
outputs = model.generate(input_ids=inputs.input_ids, max_length=max_length, do_sample=True, temperature=0.7)

# Decode the generated sequence (batch size 1) and print it
print(processor.tokenizer.decode(outputs[0], skip_special_tokens=True))
```

#### It will output something like:

```
This is a trick question! Here's why:

* **Helicopters don't have food to eat.** Helicopters are machines that fly. They don't have mouths or stomachs!
* **Humans don't fly through food.** We eat food to give our bodies energy. But we don't eat food that we can fly through!

Let me know if you'd like to learn about how people eat different foods.
```
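
The inference examples build their prompts by hand with Gemma-style turn markers: a `<start_of_turn>user ... <end_of_turn>` block followed by an open `<start_of_turn>model` turn. A small helper can keep that format consistent. The `build_prompt` function below is a hypothetical plain-Python helper and is not part of the FeynModel repository. Its multi-turn support is an assumption based on Gemma's chat format, not something this card documents.

```python
def build_prompt(turns):
    """Format (role, text) turns with Gemma-style markers and open a model turn.

    Each role must be "user" or "model". The returned string ends with an
    open model turn, so generation continues as the model's reply.
    """
    parts = []
    for role, text in turns:
        if role not in ("user", "model"):
            raise ValueError(f"unknown role: {role!r}")
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


prompt = build_prompt([("user", "What is this?")])
print(repr(prompt))
# '<start_of_turn>user\nWhat is this?<end_of_turn>\n<start_of_turn>model\n'
```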

# Vision Inference

```python
import requests
import torch
from PIL import Image
from transformers import StoppingCriteria, StoppingCriteriaList


class PrintTokensStoppingCriteria(StoppingCriteria):
    """Prints each newly generated token; never stops generation by itself."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def __call__(self, input_ids, scores, **kwargs):
        # Decode the last generated token and print it
        last_token_id = input_ids[0, -1].item()
        token = self.tokenizer.decode([last_token_id], skip_special_tokens=True)
        print(token, end='', flush=True)

        # Return False (do not stop) for every sequence in the batch;
        # generation ends at max_length or at the end-of-sequence token
        return torch.zeros(input_ids.shape[0], dtype=torch.bool, device=input_ids.device)


stopping_criteria = PrintTokensStoppingCriteria(processor.tokenizer)

# Load an example image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)

input_text = """<start_of_turn>user
Create a concise caption that accurately describes the main elements in the image provided
<end_of_turn>
<start_of_turn>model

"""

inputs = processor(text=input_text, images=image, return_tensors="pt")
inputs = {key: value.to(model.device) for key, value in inputs.items()}

# NB: if the model is loaded in bfloat16, cast the float32 inputs (e.g. pixel_values) to the model's dtype
inputs = {key: value.to(dtype=model.dtype) if value.dtype == torch.float32 else value for key, value in inputs.items()}

image  # display the image (in a notebook)
```

![image/png](https://cdn-uploads.huggingface.co/production/uploads/645364cbf666f76551f93111/XVxraj69m26HtkfaWQhve.png)

```python
# Generate a caption; the stopping criterion prints each token as it is produced
ret = model.generate(
    inputs['input_ids'],
    pixel_values=inputs['pixel_values'],
    stopping_criteria=StoppingCriteriaList([stopping_criteria]),
    max_length=2048,
    do_sample=True,
    temperature=0.7,
)

# Printed output (sampled, so it will vary), e.g.:
# An older, green car sits parked on the curb in front of a building.
```