Fixed a minor bug where OCR with region always returned the first label with an erroneous </s> prefix.

The raw output of Florence-2 in this case is:
tensor([[ 2, 0, 8108, 500, 50528, 50486, 50736, 50479, 50739, 50592,
50532, 50600, 2]], device='cuda:0')

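Token ID 2 is </s>, and the generated sequence both starts and ends with it. Without this change, Florence2PostProcesser strips <s> but not </s> in the OCR-with-region case, so the leading </s> ends up in the text captured for the first label, which therefore always carries an extra </s>, e.g.: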
'labels': ['</s>SSR']}}
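The effect can be reproduced without the model. The sketch below is a simplified, hypothetical stand-in for the post-processor: it assumes the OCR-with-region pattern is a lazily matched label followed by eight <loc_N> tokens (a quad box), and the location values in the decoded string are made up for illustration. Only the </s> handling mirrors the change in this PR.

```python
import re

# Hypothetical decoded string mirroring the token layout above: a leading
# </s> (id 2), <s> (id 0), the text "SSR", eight location tokens, and a
# trailing </s>. The location values are invented for illustration.
decoded = "</s><s>SSR<loc_10><loc_20><loc_30><loc_20><loc_30><loc_40><loc_10><loc_40></s>"

# Assumed pattern: a label (lazy match) followed by eight <loc_N> tokens.
PATTERN = r"(.+?)" + r"<loc_(\d+)>" * 8


def parse_ocr_with_region(text, strip_eos):
    """Simplified parser; strip_eos toggles the one-line fix from this PR."""
    text = text.replace("<s>", "")
    if strip_eos:
        text = text.replace("</s>", "")  # the added line
    labels = []
    quad_boxes = []
    for match in re.findall(PATTERN, text):
        labels.append(match[0])  # first group is the label text
        quad_boxes.append([int(v) for v in match[1:]])  # eight coordinates
    return {"labels": labels, "quad_boxes": quad_boxes}


print(parse_ocr_with_region(decoded, strip_eos=False)["labels"])
print(parse_ocr_with_region(decoded, strip_eos=True)["labels"])
```

Without the fix the leading </s> is swallowed by the lazy label group, giving ['</s>SSR']; the trailing </s> is harmless because nothing after it matches the pattern. With the fix the label is 'SSR'.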

Files changed (1)

processing_florence2.py +1 -0

@@ -722,6 +722,7 @@ class Florence2PostProcesser(object):
         bboxes = []
         labels = []
         text = text.replace('<s>', '')
+        text = text.replace('</s>', '')
         # ocr with regions
         parsed = re.findall(pattern, text)
         instances = []