---
license: apache-2.0
---

# FeynModel V 0.1

![image/png](https://cdn-uploads.huggingface.co/production/uploads/645364cbf666f76551f93111/ZviQjj2NvCvl0R7IZiRai.png)

#### Welcome to the FeynModel repository. FeynModel is a Vision-Language Model with the reasoning capabilities of an LLM (Large Language Model), built to explore the combined power of vision and language for scientific reasoning tasks. The model is fine-tuned with the LoRA (Low-Rank Adaptation) method, optimizing it for enhanced performance across a variety of vision and language tasks.

#### Version 0.1 uses pretrained layers from the DaViT vision tower of Florence2-base (Microsoft) and from Gemma2-2B (Google), and was fine-tuned on the M3IT, COCO, and ScienceQA datasets. It employs an S6 block to integrate context memory for Q*TS (experimental).

# How to use

```python
# Make sure the torch, transformers, pillow, einops and timm libraries are installed
import torch
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = 'Imagroune/feynmodel'
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# If you have a CUDA device
model.to('cuda')

# Otherwise, to run on CPU:
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map='cpu',            # make sure the model is loaded on the CPU
    torch_dtype=torch.bfloat16,  # load the model in half precision
)
```

# LLM Inference

```python
# Prompt (in French): "How many helicopters can an adult human eat in a single meal?"
input_text = "user\nCombien d'helicoptère un humain adulte peut manger en un seul repas?. model\n"
input_ids = processor.tokenizer(input_text, return_tensors="pt").to("cuda")

max_length = input_ids.input_ids.shape[1] + 1024  # maximum total length
stream_output = []  # collects the generated text

# Generate and print the output
for output in model.generate(input_ids=input_ids.input_ids, max_length=max_length, do_sample=True, temperature=0.7):
    decoded_output = processor.tokenizer.decode(output, skip_special_tokens=True)
    stream_output.append(decoded_output)
    print(decoded_output, end="", flush=True)
```

#### It will output something like:

```
This is a trick question! Here's why:

* **Helicopters don't have food to eat.** Helicopters are machines that fly. They don't have mouths or stomachs!
* **Humans don't fly through food.** We eat food to give our bodies energy. But we don't eat food that we can fly through!

Let me know if you'd like to learn about how people eat different foods.
```
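As an alternative to the manual decode loop above, recent versions of transformers ship a `TextStreamer` that prints tokens as they are generated. Below is a minimal sketch assuming the `model`, `processor`, `input_ids`, and `max_length` objects defined in the previous snippet.

```python
from transformers import TextStreamer

# Stream tokens to stdout as they are generated (skipping the prompt and special tokens)
streamer = TextStreamer(processor.tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    input_ids=input_ids.input_ids,
    max_length=max_length,
    do_sample=True,
    temperature=0.7,
    streamer=streamer,
)
```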
# Vision Inference

```python
from transformers import StoppingCriteria, StoppingCriteriaList

class PrintTokensStoppingCriteria(StoppingCriteria):
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def __call__(self, input_ids, scores, **kwargs):
        # Decode the last generated token and print it
        last_token_id = input_ids[0, -1].item()
        token = self.tokenizer.decode([last_token_id], skip_special_tokens=True)
        print(token, end='', flush=True)
        # Return True to stop generation, False to continue
        return False

stopping_criteria = PrintTokensStoppingCriteria(processor.tokenizer)

from PIL import Image
import requests

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)

input_text = "user\n what is this ?\nmodel"  # simple prompt, replaced by the captioning prompt below
input_text = """user
Create a concise caption that accurately describes the main elements in the image provided
model
"""

inputs = processor(text=input_text, images=image, return_tensors="pt")
inputs = {key: value.cuda() for key, value in inputs.items()}
# NB: if you are using bfloat16, also cast the float inputs:
# inputs = {key: value.to(dtype=model.dtype) if value.dtype == torch.float32 else value for key, value in inputs.items()}
image  # display the image (in a notebook)
```

![image/png](https://cdn-uploads.huggingface.co/production/uploads/645364cbf666f76551f93111/XVxraj69m26HtkfaWQhve.png)

```python
max_length = inputs['input_ids'].shape[1] + 1024  # maximum total length
stream_output = []  # collects the generated text

# Generate; the stopping criteria prints each token as it is produced
ret = model.generate(
    inputs['input_ids'],
    pixel_values=inputs['pixel_values'],
    stopping_criteria=StoppingCriteriaList([stopping_criteria]),
    max_length=2048,
    do_sample=True,
    temperature=0.7,
)
# An older, green car sits parked on the curb in front of a building.
```
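The tokens are printed by the stopping criteria as they are generated. To also recover the caption as a plain string once generation finishes, a short decoding step like the one below can be used; this is a minimal sketch assuming the `ret`, `inputs`, and `processor` objects from the block above, and that `generate` returns the prompt tokens followed by the completion.

```python
# Decode only the newly generated tokens
# (assumes the prompt tokens are returned unchanged at the start of `ret`)
prompt_len = inputs['input_ids'].shape[1]
caption = processor.tokenizer.decode(ret[0, prompt_len:], skip_special_tokens=True)
print(caption)
# e.g. "An older, green car sits parked on the curb in front of a building."
```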