Model Card: Falconsai/florence-2-invoice
- Developed by: Michael Stattelman for Falcons.ai
- Funded by: Falcons.ai
Model Overview
Falconsai/florence-2-invoice is a fine-tuned version of the microsoft/Florence-2-base-ft model, trained specifically to identify and extract key fields from invoice images. Fine-tuning used a curated dataset of invoices annotated with the following fields:
- Billing address
- Discount percentage
- Due date
- Email client
- Header
- Invoice date
- Invoice number
- Name client
- Products
- Remise (French for "discount")
- Shipping address
- Subtotal
- Tax
- Tax percentage
- Tel client
- Total
Base Model
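The base model used for fine-tuning is microsoft/Florence-2-base-ft, a vision-language foundation model developed by Microsoft that performs tasks such as captioning, object detection, OCR, and phrase grounding from text prompts.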
Fine-tuning Configuration
Fine-tuning used a Low-Rank Adaptation (LoRA) configuration with the following parameters:
from peft import LoraConfig

config = LoraConfig(
    r=8,
    lora_alpha=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "linear", "Conv2d", "lm_head", "fc2"],
    task_type="CAUSAL_LM",
    lora_dropout=0.05,
    bias="none",
    inference_mode=False,
    use_rslora=True,
    init_lora_weights="gaussian",
    revision=REVISION,  # base-model revision; the value is not given in this card
)
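To make the effect of r, lora_alpha, and use_rslora concrete, here is a minimal pure-Python sketch of the LoRA update on a single linear layer. It assumes the standard formulation h = W x + s * B(A x), with PEFT's rsLoRA scaling s = lora_alpha / sqrt(r) (versus lora_alpha / r without rsLoRA). The Gaussian initialisation (A drawn with std 1/r, B set to zeros) mirrors what we understand PEFT's init_lora_weights="gaussian" to do; the layer size is hypothetical and dropout is omitted.

```python
import math
import random


def lora_scaling(r: int, lora_alpha: float, use_rslora: bool) -> float:
    """Scaling factor applied to the low-rank update B @ A."""
    # rsLoRA divides by sqrt(r) instead of r, which keeps the update's
    # magnitude more stable as the rank grows.
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, scaling):
    """h = W x + scaling * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    r, alpha = 8, 8          # values from the config above
    d_in, d_out = 16, 16     # hypothetical layer size, for illustration only
    s = lora_scaling(r, alpha, use_rslora=True)
    print(f"rsLoRA scaling: {s:.4f}")

    random.seed(0)
    W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
    # Gaussian A and zero B: the adapter starts out as a no-op.
    A = [[random.gauss(0, 1 / r) for _ in range(d_in)] for _ in range(r)]
    B = [[0.0] * r for _ in range(d_out)]
    x = [random.gauss(0, 1) for _ in range(d_in)]
    assert lora_forward(W, A, B, x, s) == matvec(W, x)
    print("trainable params per layer:", lora_param_count(d_in, d_out, r))
```

With r = lora_alpha = 8, plain LoRA would scale the update by 1.0, while rsLoRA scales it by 8 / sqrt(8), roughly 2.83.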
Hardware Used
Fine-tuning was conducted on an Alienware system.
Dataset
The model was trained on a curated dataset of invoice images, each annotated with the fields listed above. The dataset covers various invoice formats so that the model learns to detect and extract key information across different layouts.
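The card does not describe the annotation format. Florence-2 represents box coordinates as location tokens (<loc_0> through <loc_999>), i.e. coordinates quantized into 1000 bins per axis, so one plausible way to turn an annotated field into an <OD> training target is sketched below. The annotation dict layout and the encode_od_target helper are hypothetical; the floor-based quantization mirrors what we understand the Florence-2 processor's box quantizer to do.

```python
import math
from typing import Dict, List

NUM_BINS = 1000  # Florence-2 uses 1000 location bins per axis


def quantize(coord: float, size: int, bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate to a location-bin index in [0, bins - 1]."""
    bin_size = size / bins
    return max(0, min(bins - 1, math.floor(coord / bin_size)))


def encode_od_target(annotations: List[Dict], width: int, height: int) -> str:
    """Build an <OD>-style target string: label<loc_x1><loc_y1><loc_x2><loc_y2>...

    Each annotation is assumed (hypothetically) to look like
    {"label": "total", "bbox": [x1, y1, x2, y2]} in pixel coordinates.
    """
    parts = []
    for ann in annotations:
        x1, y1, x2, y2 = ann["bbox"]
        locs = [
            quantize(x1, width), quantize(y1, height),
            quantize(x2, width), quantize(y2, height),
        ]
        parts.append(ann["label"] + "".join(f"<loc_{q}>" for q in locs))
    return "".join(parts)


if __name__ == "__main__":
    anns = [{"label": "invoice number", "bbox": [100, 50, 300, 80]}]
    print(encode_od_target(anns, width=1000, height=1000))
```

Clamping to bins - 1 keeps a box edge that touches the right or bottom border of the image inside the valid token range.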
Usage
Inference
To run inference, load the model with the Hugging Face Transformers library. trust_remote_code=True is required because Florence-2 uses custom modeling code:
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "Falconsai/florence-2-invoice"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model and processor once and reuse them across calls
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True).eval().to(device)
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

def run_florence_invoice(img, task_prompt, text_input=None):
    image = Image.open(img)
    # Ensure the image is in RGB format
    if image.mode != "RGB":
        image = image.convert("RGB")
    # Tasks such as phrase grounding take extra text appended to the task token
    prompt = task_prompt if text_input is None else task_prompt + text_input
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=1024,
            num_beams=3,
        )
    # Keep special tokens: post_process_generation needs the location tokens
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        generated_text, task=task_prompt, image_size=(image.width, image.height)
    )

# Call the function as follows:
# Return all fields identified:
img = "./invoice.png"
all_fields = run_florence_invoice(img, "<OD>")
# Return a specific field:
results = run_florence_invoice(img, "<CAPTION_TO_PHRASE_GROUNDING>", text_input="invoice date")
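For downstream use it is often convenient to turn the parsed result into a mapping from field name to boxes. The sketch below assumes the dictionary layout that Florence-2's post_process_generation returns for <OD> and <CAPTION_TO_PHRASE_GROUNDING> (a "bboxes" list aligned with a "labels" list under the task key). The exact label strings this fine-tuned model emits are not documented, so group_by_label, missing_fields, and the example values are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_label(parsed: Dict, task: str = "<OD>") -> Dict[str, List[List[float]]]:
    """Collect the boxes predicted for each (normalized) label under `task`."""
    result = parsed[task]
    fields = defaultdict(list)
    # bboxes[i] is the [x1, y1, x2, y2] box for labels[i]
    for label, box in zip(result["labels"], result["bboxes"]):
        fields[label.strip().lower()].append(box)
    return dict(fields)


def missing_fields(fields: Dict[str, list], expected: Iterable[str]) -> List[str]:
    """Return the expected field names that were not detected."""
    return [name for name in expected if name.lower() not in fields]


if __name__ == "__main__":
    # Hypothetical parsed output; real label strings may differ.
    parsed = {
        "<OD>": {
            "bboxes": [[40.0, 30.0, 210.0, 60.0], [400.0, 900.0, 520.0, 930.0]],
            "labels": ["Invoice number", "Total"],
        }
    }
    fields = group_by_label(parsed)
    print(fields)
    print(missing_fields(fields, ["invoice number", "total", "due date"]))
```

Normalizing labels to lowercase makes the lookup tolerant of capitalization differences between the model output and your own field list.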
Applications
This model is intended for automating the extraction of key information from invoices in business and financial applications. It can reduce the manual effort required for data entry and validation in accounting and bookkeeping workflows.
Evaluation
The model was evaluated on a held-out set of annotated invoice images, using precision, recall, and F1-score for each of the identified fields. Detailed evaluation results and visualizations are available in the results directory of the repository.
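The card does not state how predictions were matched to ground truth. A common choice for detection-style outputs is to count a prediction as correct when its label matches an unmatched ground-truth box and their IoU is at least 0.5. The sketch below computes per-field precision, recall, and F1 under that assumption; the 0.5 threshold and the greedy matching are our choices, not the authors' documented protocol.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def per_field_prf(preds: List[Tuple[str, Box]], gts: List[Tuple[str, Box]],
                  thr: float = 0.5) -> Dict[str, Tuple[float, float, float]]:
    """Greedy one-to-one matching per label; returns {label: (P, R, F1)}."""
    tp, n_pred, n_gt = defaultdict(int), defaultdict(int), defaultdict(int)
    for label, _ in gts:
        n_gt[label] += 1
    used = set()  # indices of ground-truth boxes already matched
    # The parsed output carries no confidence scores, so predictions
    # are matched in the order given.
    for label, pbox in preds:
        n_pred[label] += 1
        best_j, best_iou = None, thr
        for j, (glabel, gbox) in enumerate(gts):
            if j in used or glabel != label:
                continue
            score = iou(pbox, gbox)
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            used.add(best_j)
            tp[label] += 1
    scores = {}
    for label in set(n_pred) | set(n_gt):
        p = tp[label] / n_pred[label] if n_pred[label] else 0.0
        r = tp[label] / n_gt[label] if n_gt[label] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = (p, r, f1)
    return scores
```

A field the model never predicts gets precision 0 by convention here; some evaluations report it as undefined instead.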
Limitations
- The model's performance depends on the quality and variability of the training dataset. It may perform worse on invoices that differ significantly from those seen during training.
- Fine-tuning used the specific LoRA configuration above, which may need to be adjusted for other use cases or datasets.
Contact
For more information or questions about this model, please contact the developers at [[email protected]].
License
This model is licensed under the MIT License. See the LICENSE file for more details.
Acknowledgments
We would like to thank Microsoft for developing the Florence-2 vision model, and the broader machine learning community for their contributions and support.