Spaces:
Runtime error
Runtime error
add
Browse filesThis view is limited to 50 files because it contains too many changes.
See raw diff
- LICENSE.md +159 -0
- app.py +39 -0
- configs/mixtrain/seg_rec_poly_fuse_feature.yaml +97 -0
- configs/pretrain/seg_rec_poly_fuse_feature.yaml +94 -0
- evaluation/icdar2015/e2e/prepare_results.py +263 -0
- evaluation/icdar2015/e2e/rrc_evaluation_funcs.py +369 -0
- evaluation/icdar2015/e2e/script.py +461 -0
- evaluation/icdar2015/gt.zip +0 -0
- evaluation/rotated_icdar2013/e2e/prepare_results.py +267 -0
- evaluation/rotated_icdar2013/e2e/rrc_evaluation_funcs.py +369 -0
- evaluation/rotated_icdar2013/e2e/script.py +460 -0
- evaluation/rotated_icdar2013/gt/gt.zip +0 -0
- evaluation/rotated_icdar2013/gt/gt_-15.zip +0 -0
- evaluation/rotated_icdar2013/gt/gt_-30.zip +0 -0
- evaluation/rotated_icdar2013/gt/gt_-45.zip +0 -0
- evaluation/rotated_icdar2013/gt/gt_-60.zip +0 -0
- evaluation/rotated_icdar2013/gt/gt_-75.zip +0 -0
- evaluation/rotated_icdar2013/gt/gt_-90.zip +0 -0
- evaluation/rotated_icdar2013/gt/gt_0.zip +0 -0
- evaluation/rotated_icdar2013/gt/gt_15.zip +0 -0
- evaluation/rotated_icdar2013/gt/gt_30.zip +0 -0
- evaluation/rotated_icdar2013/gt/gt_45.zip +0 -0
- evaluation/rotated_icdar2013/gt/gt_60.zip +0 -0
- evaluation/rotated_icdar2013/gt/gt_75.zip +0 -0
- evaluation/rotated_icdar2013/gt/gt_85.zip +0 -0
- evaluation/rotated_icdar2013/gt/gt_90.zip +0 -0
- evaluation/totaltext/e2e/prepare_results.py +234 -0
- evaluation/totaltext/e2e/rrc_evaluation_funcs.py +369 -0
- evaluation/totaltext/e2e/rrc_evaluation_funcs_total_text.py +363 -0
- evaluation/totaltext/e2e/script.py +452 -0
- evaluation/totaltext/gt.zip +0 -0
- evaluation/weighted_editdistance.py +55 -0
- example1.jpg +0 -0
- example2.jpg +0 -0
- example3.jpg +0 -0
- maskrcnn_benchmark/config/__init__.py +2 -0
- maskrcnn_benchmark/config/defaults.py +373 -0
- maskrcnn_benchmark/config/paths_catalog.py +237 -0
- maskrcnn_benchmark/csrc/ROIAlign.h +46 -0
- maskrcnn_benchmark/csrc/ROIPool.h +48 -0
- maskrcnn_benchmark/csrc/SigmoidFocalLoss.h +41 -0
- maskrcnn_benchmark/csrc/cpu/ROIAlign_cpu.cpp +257 -0
- maskrcnn_benchmark/csrc/cpu/nms_cpu.cpp +75 -0
- maskrcnn_benchmark/csrc/cpu/vision.h +16 -0
- maskrcnn_benchmark/csrc/cuda/ROIAlign_cuda.cu +346 -0
- maskrcnn_benchmark/csrc/cuda/ROIPool_cuda.cu +202 -0
- maskrcnn_benchmark/csrc/cuda/SigmoidFocalLoss_cuda.cu +189 -0
- maskrcnn_benchmark/csrc/cuda/deform_conv_cuda.cu +691 -0
- maskrcnn_benchmark/csrc/cuda/deform_conv_kernel_cuda.cu +874 -0
- maskrcnn_benchmark/csrc/cuda/deform_pool_cuda.cu +87 -0
LICENSE.md
ADDED
@@ -0,0 +1,159 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
# Creative Commons Attribution-NonCommercial 4.0 International
|
2 |
+
|
3 |
+
Creative Commons Corporation (“Creative Commons”) is not a law firm and does not provide legal services or legal advice. Distribution of Creative Commons public licenses does not create a lawyer-client or other relationship. Creative Commons makes its licenses and related information available on an “as-is” basis. Creative Commons gives no warranties regarding its licenses, any material licensed under their terms and conditions, or any related information. Creative Commons disclaims all liability for damages resulting from their use to the fullest extent possible.
|
4 |
+
|
5 |
+
### Using Creative Commons Public Licenses
|
6 |
+
|
7 |
+
Creative Commons public licenses provide a standard set of terms and conditions that creators and other rights holders may use to share original works of authorship and other material subject to copyright and certain other rights specified in the public license below. The following considerations are for informational purposes only, are not exhaustive, and do not form part of our licenses.
|
8 |
+
|
9 |
+
* __Considerations for licensors:__ Our public licenses are intended for use by those authorized to give the public permission to use material in ways otherwise restricted by copyright and certain other rights. Our licenses are irrevocable. Licensors should read and understand the terms and conditions of the license they choose before applying it. Licensors should also secure all rights necessary before applying our licenses so that the public can reuse the material as expected. Licensors should clearly mark any material not subject to the license. This includes other CC-licensed material, or material used under an exception or limitation to copyright. [More considerations for licensors](http://wiki.creativecommons.org/Considerations_for_licensors_and_licensees#Considerations_for_licensors).
|
10 |
+
|
11 |
+
* __Considerations for the public:__ By using one of our public licenses, a licensor grants the public permission to use the licensed material under specified terms and conditions. If the licensor’s permission is not necessary for any reason–for example, because of any applicable exception or limitation to copyright–then that use is not regulated by the license. Our licenses grant only permissions under copyright and certain other rights that a licensor has authority to grant. Use of the licensed material may still be restricted for other reasons, including because others have copyright or other rights in the material. A licensor may make special requests, such as asking that all changes be marked or described. Although not required by our licenses, you are encouraged to respect those requests where reasonable. [More considerations for the public](http://wiki.creativecommons.org/Considerations_for_licensors_and_licensees#Considerations_for_licensees).
|
12 |
+
|
13 |
+
## Creative Commons Attribution-NonCommercial 4.0 International Public License
|
14 |
+
|
15 |
+
By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution-NonCommercial 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions.
|
16 |
+
|
17 |
+
### Section 1 – Definitions.
|
18 |
+
|
19 |
+
a. __Adapted Material__ means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image.
|
20 |
+
|
21 |
+
b. __Adapter's License__ means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License.
|
22 |
+
|
23 |
+
c. __Copyright and Similar Rights__ means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright and Similar Rights.
|
24 |
+
|
25 |
+
d. __Effective Technological Measures__ means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements.
|
26 |
+
|
27 |
+
e. __Exceptions and Limitations__ means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material.
|
28 |
+
|
29 |
+
f. __Licensed Material__ means the artistic or literary work, database, or other material to which the Licensor applied this Public License.
|
30 |
+
|
31 |
+
g. __Licensed Rights__ means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license.
|
32 |
+
|
33 |
+
h. __Licensor__ means the individual(s) or entity(ies) granting rights under this Public License.
|
34 |
+
|
35 |
+
i. __NonCommercial__ means not primarily intended for or directed towards commercial advantage or monetary compensation. For purposes of this Public License, the exchange of the Licensed Material for other material subject to Copyright and Similar Rights by digital file-sharing or similar means is NonCommercial provided there is no payment of monetary compensation in connection with the exchange.
|
36 |
+
|
37 |
+
j. __Share__ means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them.
|
38 |
+
|
39 |
+
k. __Sui Generis Database Rights__ means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world.
|
40 |
+
|
41 |
+
l. __You__ means the individual or entity exercising the Licensed Rights under this Public License. __Your__ has a corresponding meaning.
|
42 |
+
|
43 |
+
### Section 2 – Scope.
|
44 |
+
|
45 |
+
a. ___License grant.___
|
46 |
+
|
47 |
+
1. Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to:
|
48 |
+
|
49 |
+
A. reproduce and Share the Licensed Material, in whole or in part, for NonCommercial purposes only; and
|
50 |
+
|
51 |
+
B. produce, reproduce, and Share Adapted Material for NonCommercial purposes only.
|
52 |
+
|
53 |
+
2. __Exceptions and Limitations.__ For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions.
|
54 |
+
|
55 |
+
3. __Term.__ The term of this Public License is specified in Section 6(a).
|
56 |
+
|
57 |
+
4. __Media and formats; technical modifications allowed.__ The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section 2(a)(4) never produces Adapted Material.
|
58 |
+
|
59 |
+
5. __Downstream recipients.__
|
60 |
+
|
61 |
+
A. __Offer from the Licensor – Licensed Material.__ Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License.
|
62 |
+
|
63 |
+
B. __No downstream restrictions.__ You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material.
|
64 |
+
|
65 |
+
6. __No endorsement.__ Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section 3(a)(1)(A)(i).
|
66 |
+
|
67 |
+
b. ___Other rights.___
|
68 |
+
|
69 |
+
1. Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise.
|
70 |
+
|
71 |
+
2. Patent and trademark rights are not licensed under this Public License.
|
72 |
+
|
73 |
+
3. To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties, including when the Licensed Material is used other than for NonCommercial purposes.
|
74 |
+
|
75 |
+
### Section 3 – License Conditions.
|
76 |
+
|
77 |
+
Your exercise of the Licensed Rights is expressly made subject to the following conditions.
|
78 |
+
|
79 |
+
a. ___Attribution.___
|
80 |
+
|
81 |
+
1. If You Share the Licensed Material (including in modified form), You must:
|
82 |
+
|
83 |
+
A. retain the following if it is supplied by the Licensor with the Licensed Material:
|
84 |
+
|
85 |
+
i. identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated);
|
86 |
+
|
87 |
+
ii. a copyright notice;
|
88 |
+
|
89 |
+
iii. a notice that refers to this Public License;
|
90 |
+
|
91 |
+
iv. a notice that refers to the disclaimer of warranties;
|
92 |
+
|
93 |
+
v. a URI or hyperlink to the Licensed Material to the extent reasonably practicable;
|
94 |
+
|
95 |
+
B. indicate if You modified the Licensed Material and retain an indication of any previous modifications; and
|
96 |
+
|
97 |
+
C. indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License.
|
98 |
+
|
99 |
+
2. You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information.
|
100 |
+
|
101 |
+
3. If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable.
|
102 |
+
|
103 |
+
4. If You Share Adapted Material You produce, the Adapter's License You apply must not prevent recipients of the Adapted Material from complying with this Public License.
|
104 |
+
|
105 |
+
### Section 4 – Sui Generis Database Rights.
|
106 |
+
|
107 |
+
Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material:
|
108 |
+
|
109 |
+
a. for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database for NonCommercial purposes only;
|
110 |
+
|
111 |
+
b. if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material; and
|
112 |
+
|
113 |
+
c. You must comply with the conditions in Section 3(a) if You Share all or a substantial portion of the contents of the database.
|
114 |
+
|
115 |
+
For the avoidance of doubt, this Section 4 supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights.
|
116 |
+
|
117 |
+
### Section 5 – Disclaimer of Warranties and Limitation of Liability.
|
118 |
+
|
119 |
+
a. __Unless otherwise separately undertaken by the Licensor, to the extent possible, the Licensor offers the Licensed Material as-is and as-available, and makes no representations or warranties of any kind concerning the Licensed Material, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable. Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not apply to You.__
|
120 |
+
|
121 |
+
b. __To the extent possible, in no event will the Licensor be liable to You on any legal theory (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of this Public License or use of the Licensed Material, even if the Licensor has been advised of the possibility of such losses, costs, expenses, or damages. Where a limitation of liability is not allowed in full or in part, this limitation may not apply to You.__
|
122 |
+
|
123 |
+
c. The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability.
|
124 |
+
|
125 |
+
### Section 6 – Term and Termination.
|
126 |
+
|
127 |
+
a. This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically.
|
128 |
+
|
129 |
+
b. Where Your right to use the Licensed Material has terminated under Section 6(a), it reinstates:
|
130 |
+
|
131 |
+
1. automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or
|
132 |
+
|
133 |
+
2. upon express reinstatement by the Licensor.
|
134 |
+
|
135 |
+
For the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License.
|
136 |
+
|
137 |
+
c. For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License.
|
138 |
+
|
139 |
+
d. Sections 1, 5, 6, 7, and 8 survive termination of this Public License.
|
140 |
+
|
141 |
+
### Section 7 – Other Terms and Conditions.
|
142 |
+
|
143 |
+
a. The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed.
|
144 |
+
|
145 |
+
b. Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License.
|
146 |
+
|
147 |
+
### Section 8 – Interpretation.
|
148 |
+
|
149 |
+
a. For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License.
|
150 |
+
|
151 |
+
b. To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions.
|
152 |
+
|
153 |
+
c. No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor.
|
154 |
+
|
155 |
+
d. Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority.
|
156 |
+
|
157 |
+
> Creative Commons is not a party to its public licenses. Notwithstanding, Creative Commons may elect to apply one of its public licenses to material it publishes and in those instances will be considered the “Licensor.” Except for the limited purpose of indicating that material is shared under a Creative Commons public license or as otherwise permitted by the Creative Commons policies published at [creativecommons.org/policies](http://creativecommons.org/policies), Creative Commons does not authorize the use of the trademark “Creative Commons” or any other trademark or logo of Creative Commons without its prior written consent including, without limitation, in connection with any unauthorized modifications to any of its public licenses or any other arrangements, understandings, or agreements concerning use of licensed material. For the avoidance of doubt, this paragraph does not form part of the public licenses.
|
158 |
+
>
|
159 |
+
> Creative Commons may be contacted at creativecommons.org
|
app.py
ADDED
@@ -0,0 +1,39 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
import os
|
2 |
+
os.system('python setup.py build develop')
|
3 |
+
os.system('pip install --upgrade --no-cache-dir gdown')
|
4 |
+
os.system('gdown -O output/mixtrain/ 1XQsikiNY7ILgZvmvOeUf9oPDG4fTp0zs')
|
5 |
+
|
6 |
+
import cv2
|
7 |
+
import pandas as pd
|
8 |
+
import gradio as gr
|
9 |
+
from tools.demo import TextDemo
|
10 |
+
from maskrcnn_benchmark.config import cfg
|
11 |
+
|
12 |
+
|
13 |
+
def infer(filepath):
|
14 |
+
cfg.merge_from_file('configs/mixtrain/seg_rec_poly_fuse_feature.yaml')
|
15 |
+
# manual override some options
|
16 |
+
cfg.merge_from_list(["MODEL.DEVICE", "cpu"])
|
17 |
+
|
18 |
+
text_demo = TextDemo(
|
19 |
+
cfg,
|
20 |
+
min_image_size=800,
|
21 |
+
confidence_threshold=0.7,
|
22 |
+
output_polygon=True
|
23 |
+
)
|
24 |
+
image = cv2.imread(filepath)
|
25 |
+
result_polygons, result_words = text_demo.run_on_opencv_image(image)
|
26 |
+
text_demo.visualization(image, result_polygons, result_words)
|
27 |
+
cv2.imwrite('result.jpg', image)
|
28 |
+
return 'result.jpg', pd.DataFrame(result_words)
|
29 |
+
|
30 |
+
|
31 |
+
iface = gr.Interface(
|
32 |
+
fn=infer,
|
33 |
+
title="Mask TextSpotter v3",
|
34 |
+
description="Mask TextSpotter v3 is an end-to-end trainable scene text spotter that adopts a Segmentation Proposal Network (SPN) instead of an RPN. Mask TextSpotter v3 significantly improves robustness to rotations, aspect ratios, and shapes.",
|
35 |
+
inputs=[gr.inputs.Image(label="image", type="filepath")],
|
36 |
+
outputs=[gr.outputs.Image(), gr.outputs.Dataframe(headers=['word'])],
|
37 |
+
examples=['example1.jpg', 'example2.jpg', 'example3.jpg'],
|
38 |
+
article="<a href=\"https://github.com/MhLiao/MaskTextSpotterV3\">GitHub Repo</a>",
|
39 |
+
).launch(enable_queue=True, cache_examples=True)
|
configs/mixtrain/seg_rec_poly_fuse_feature.yaml
ADDED
@@ -0,0 +1,97 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
MODEL:
|
2 |
+
META_ARCHITECTURE: "GeneralizedRCNN"
|
3 |
+
# WEIGHT: './output/path-to-pretrain-model' # for training
|
4 |
+
WEIGHT: './output/mixtrain/trained_model.pth' # for testing
|
5 |
+
BACKBONE:
|
6 |
+
CONV_BODY: "R-50-FPN"
|
7 |
+
OUT_CHANNELS: 256
|
8 |
+
RESNETS:
|
9 |
+
BACKBONE_OUT_CHANNELS: 256
|
10 |
+
RPN:
|
11 |
+
USE_FPN: True
|
12 |
+
ANCHOR_STRIDE: (4, 8, 16, 32, 64)
|
13 |
+
PRE_NMS_TOP_N_TRAIN: 2000
|
14 |
+
PRE_NMS_TOP_N_TEST: 1000
|
15 |
+
POST_NMS_TOP_N_TEST: 1000
|
16 |
+
FPN_POST_NMS_TOP_N_TEST: 1000
|
17 |
+
SEG:
|
18 |
+
USE_FPN: True
|
19 |
+
USE_FUSE_FEATURE: True
|
20 |
+
TOP_N_TRAIN: 1000
|
21 |
+
TOP_N_TEST: 1000
|
22 |
+
BINARY_THRESH: 0.1
|
23 |
+
BOX_THRESH: 0.1
|
24 |
+
MIN_SIZE: 5
|
25 |
+
SHRINK_RATIO: 0.4
|
26 |
+
EXPAND_RATIO: 3.0
|
27 |
+
ROI_HEADS:
|
28 |
+
USE_FPN: True
|
29 |
+
BATCH_SIZE_PER_IMAGE: 512
|
30 |
+
ROI_BOX_HEAD:
|
31 |
+
POOLER_RESOLUTION: 7
|
32 |
+
POOLER_SCALES: (0.25,)
|
33 |
+
POOLER_SAMPLING_RATIO: 2
|
34 |
+
FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
|
35 |
+
PREDICTOR: "FPNPredictor"
|
36 |
+
NUM_CLASSES: 2
|
37 |
+
USE_MASKED_FEATURE: True
|
38 |
+
ROI_MASK_HEAD:
|
39 |
+
POOLER_SCALES: (0.25,)
|
40 |
+
FEATURE_EXTRACTOR: "MaskRCNNFPNFeatureExtractor"
|
41 |
+
PREDICTOR: "SeqCharMaskRCNNC4Predictor"
|
42 |
+
POOLER_RESOLUTION: 14
|
43 |
+
POOLER_RESOLUTION_H: 32
|
44 |
+
POOLER_RESOLUTION_W: 32
|
45 |
+
POOLER_SAMPLING_RATIO: 2
|
46 |
+
RESOLUTION: 28
|
47 |
+
RESOLUTION_H: 64
|
48 |
+
RESOLUTION_W: 64
|
49 |
+
SHARE_BOX_FEATURE_EXTRACTOR: False
|
50 |
+
CHAR_NUM_CLASSES: 37
|
51 |
+
USE_WEIGHTED_CHAR_MASK: True
|
52 |
+
MASK_BATCH_SIZE_PER_IM: 64
|
53 |
+
USE_MASKED_FEATURE: True
|
54 |
+
MASK_ON: True
|
55 |
+
CHAR_MASK_ON: True
|
56 |
+
SEG_ON: True
|
57 |
+
# TRAIN_DETECTION_ONLY: True
|
58 |
+
SEQUENCE:
|
59 |
+
SEQ_ON: True
|
60 |
+
NUM_CHAR: 38
|
61 |
+
BOS_TOKEN: 0
|
62 |
+
MAX_LENGTH: 32
|
63 |
+
TEACHER_FORCE_RATIO: 1.0
|
64 |
+
DATASETS:
|
65 |
+
# TRAIN: ("synthtext_train",)
|
66 |
+
TRAIN: ("synthtext_train","icdar_2013_train","icdar_2015_train","scut-eng-char_train","total_text_train")
|
67 |
+
RATIOS: [0.25,0.25,0.25,0.125,0.125]
|
68 |
+
# TEST: ("icdar_2015_test",)
|
69 |
+
TEST: ("total_text_test",)
|
70 |
+
# TEST: ("rotated_ic13_test_45",)
|
71 |
+
AUG: True
|
72 |
+
IGNORE_DIFFICULT: True
|
73 |
+
MAX_ROTATE_THETA: 90
|
74 |
+
DATALOADER:
|
75 |
+
SIZE_DIVISIBILITY: 32
|
76 |
+
NUM_WORKERS: 4
|
77 |
+
ASPECT_RATIO_GROUPING: False
|
78 |
+
SOLVER:
|
79 |
+
BASE_LR: 0.002 #0.02
|
80 |
+
WARMUP_FACTOR: 0.1
|
81 |
+
WEIGHT_DECAY: 0.0001
|
82 |
+
STEPS: (100000, 160000)
|
83 |
+
MAX_ITER: 300000
|
84 |
+
IMS_PER_BATCH: 8
|
85 |
+
RESUME: False
|
86 |
+
DISPLAY_FREQ: 20
|
87 |
+
OUTPUT_DIR: "./output/mixtrain"
|
88 |
+
TEST:
|
89 |
+
VIS: True
|
90 |
+
CHAR_THRESH: 192
|
91 |
+
IMS_PER_BATCH: 1
|
92 |
+
INPUT:
|
93 |
+
MIN_SIZE_TRAIN: (800, 1000, 1200, 1400)
|
94 |
+
MAX_SIZE_TRAIN: 2333
|
95 |
+
MIN_SIZE_TEST: 1000
|
96 |
+
# MIN_SIZE_TEST: 1440
|
97 |
+
MAX_SIZE_TEST: 4000
|
configs/pretrain/seg_rec_poly_fuse_feature.yaml
ADDED
@@ -0,0 +1,94 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
MODEL:
|
2 |
+
META_ARCHITECTURE: "GeneralizedRCNN"
|
3 |
+
WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
|
4 |
+
BACKBONE:
|
5 |
+
CONV_BODY: "R-50-FPN"
|
6 |
+
OUT_CHANNELS: 256
|
7 |
+
RESNETS:
|
8 |
+
BACKBONE_OUT_CHANNELS: 256
|
9 |
+
RPN:
|
10 |
+
USE_FPN: True
|
11 |
+
ANCHOR_STRIDE: (4, 8, 16, 32, 64)
|
12 |
+
PRE_NMS_TOP_N_TRAIN: 2000
|
13 |
+
PRE_NMS_TOP_N_TEST: 1000
|
14 |
+
POST_NMS_TOP_N_TEST: 1000
|
15 |
+
FPN_POST_NMS_TOP_N_TEST: 1000
|
16 |
+
SEG:
|
17 |
+
USE_FPN: True
|
18 |
+
USE_FUSE_FEATURE: True
|
19 |
+
TOP_N_TRAIN: 1000
|
20 |
+
TOP_N_TEST: 1000
|
21 |
+
BINARY_THRESH: 0.1
|
22 |
+
BOX_THRESH: 0.1
|
23 |
+
MIN_SIZE: 5
|
24 |
+
SHRINK_RATIO: 0.4
|
25 |
+
EXPAND_RATIO: 3.0
|
26 |
+
ROI_HEADS:
|
27 |
+
USE_FPN: True
|
28 |
+
BATCH_SIZE_PER_IMAGE: 512
|
29 |
+
ROI_BOX_HEAD:
|
30 |
+
POOLER_RESOLUTION: 7
|
31 |
+
POOLER_SCALES: (0.25,)
|
32 |
+
POOLER_SAMPLING_RATIO: 2
|
33 |
+
FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
|
34 |
+
PREDICTOR: "FPNPredictor"
|
35 |
+
NUM_CLASSES: 2
|
36 |
+
USE_MASKED_FEATURE: True
|
37 |
+
ROI_MASK_HEAD:
|
38 |
+
POOLER_SCALES: (0.25,)
|
39 |
+
FEATURE_EXTRACTOR: "MaskRCNNFPNFeatureExtractor"
|
40 |
+
PREDICTOR: "SeqCharMaskRCNNC4Predictor"
|
41 |
+
POOLER_RESOLUTION: 14
|
42 |
+
POOLER_RESOLUTION_H: 32
|
43 |
+
POOLER_RESOLUTION_W: 32
|
44 |
+
POOLER_SAMPLING_RATIO: 2
|
45 |
+
RESOLUTION: 28
|
46 |
+
RESOLUTION_H: 64
|
47 |
+
RESOLUTION_W: 64
|
48 |
+
SHARE_BOX_FEATURE_EXTRACTOR: False
|
49 |
+
CHAR_NUM_CLASSES: 37
|
50 |
+
USE_WEIGHTED_CHAR_MASK: True
|
51 |
+
MASK_BATCH_SIZE_PER_IM: 64
|
52 |
+
USE_MASKED_FEATURE: True
|
53 |
+
MASK_ON: True
|
54 |
+
CHAR_MASK_ON: True
|
55 |
+
SEG_ON: True
|
56 |
+
SEQUENCE:
|
57 |
+
SEQ_ON: True
|
58 |
+
NUM_CHAR: 38
|
59 |
+
BOS_TOKEN: 0
|
60 |
+
MAX_LENGTH: 32
|
61 |
+
TEACHER_FORCE_RATIO: 1.0
|
62 |
+
DATASETS:
|
63 |
+
TRAIN: ("synthtext_train",)
|
64 |
+
# TRAIN: ("synthtext_train","icdar_2013_train","icdar_2015_train","scut-eng-char_train","total_text_train")
|
65 |
+
# RATIOS: [0.25,0.25,0.25,0.125,0.125]
|
66 |
+
TEST: ("icdar_2015_test",)
|
67 |
+
# TEST: ("total_text_test",)
|
68 |
+
AUG: True
|
69 |
+
IGNORE_DIFFICULT: True
|
70 |
+
MAX_ROTATE_THETA: 90
|
71 |
+
DATALOADER:
|
72 |
+
SIZE_DIVISIBILITY: 32
|
73 |
+
NUM_WORKERS: 4
|
74 |
+
ASPECT_RATIO_GROUPING: False
|
75 |
+
SOLVER:
|
76 |
+
BASE_LR: 0.02 #0.02
|
77 |
+
WARMUP_FACTOR: 0.1
|
78 |
+
WEIGHT_DECAY: 0.0001
|
79 |
+
STEPS: (100000, 200000)
|
80 |
+
MAX_ITER: 300000
|
81 |
+
IMS_PER_BATCH: 8
|
82 |
+
RESUME: True
|
83 |
+
DISPLAY_FREQ: 20
|
84 |
+
OUTPUT_DIR: "./output/pretrain"
|
85 |
+
TEST:
|
86 |
+
VIS: False
|
87 |
+
CHAR_THRESH: 192
|
88 |
+
IMS_PER_BATCH: 1
|
89 |
+
INPUT:
|
90 |
+
MIN_SIZE_TRAIN: (600, 800)
|
91 |
+
# MIN_SIZE_TRAIN: (800, 1000, 1200, 1400)
|
92 |
+
MAX_SIZE_TRAIN: 2333
|
93 |
+
MIN_SIZE_TEST: 1440
|
94 |
+
MAX_SIZE_TEST: 4000
|
evaluation/icdar2015/e2e/prepare_results.py
ADDED
@@ -0,0 +1,263 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
#!/usr/bin/env python3
|
2 |
+
# -*- coding: utf-8 -*-
|
3 |
+
import sys
|
4 |
+
import os
|
5 |
+
sys.path.append('./')
|
6 |
+
import shapely
|
7 |
+
from shapely.geometry import Polygon,MultiPoint
|
8 |
+
import numpy as np
|
9 |
+
import editdistance
|
10 |
+
sys.path.append('../../')
|
11 |
+
from weighted_editdistance import weighted_edit_distance
|
12 |
+
from tqdm import tqdm
|
13 |
+
try:
|
14 |
+
import pickle
|
15 |
+
except ImportError:
|
16 |
+
import cPickle as pickle
|
17 |
+
|
18 |
+
def list_from_str(st):
    """Parse one comma-separated result line.

    Input layout: box[0:4], polygon[4:12], word, seq_word,
    detection_score, rec_score, seq_score, char_score_path.
    Returns [polygon(8 floats), det_score, seq_word, word,
    rec_score, seq_score, char_score_path].
    """
    fields = st.split(',')
    parsed = [float(v) for v in fields[4:12]]
    parsed.append(float(fields[-4]))   # detection score
    parsed.append(fields[-5])          # sequence-branch word
    parsed.append(fields[-6])          # mask-branch word
    parsed.append(float(fields[-3]))   # mask recognition score
    parsed.append(float(fields[-2]))   # sequence recognition score
    parsed.append(fields[-1])          # path to pickled char scores
    return parsed
|
23 |
+
|
24 |
+
def polygon_from_list(line):
    """
    Build a shapely Polygon (via convex hull) from a gt or dt line of
    8 coordinates (x1,y1,...,x4,y4).
    """
    pts = np.array(line).reshape(4, 2)
    return Polygon(pts).convex_hull
|
31 |
+
|
32 |
+
def polygon_iou(list1, list2):
    """
    Intersection over union between two shapely polygons.
    """
    # Both inputs are flat lists of 8 coords (x1,y1,...,x4,y4);
    # convex_hull normalises the point ordering before intersecting.
    polygon_points1 = np.array(list1).reshape(4, 2)
    poly1 = Polygon(polygon_points1).convex_hull
    polygon_points2 = np.array(list2).reshape(4, 2)
    poly2 = Polygon(polygon_points2).convex_hull
    union_poly = np.concatenate((polygon_points1,polygon_points2))
    if not poly1.intersects(poly2): # this test is fast and can accelerate calculation
        iou = 0
    else:
        try:
            inter_area = poly1.intersection(poly2).area
            #union_area = poly1.area + poly2.area - inter_area
            # NOTE: "union" here is the convex hull of all 8 points, not the
            # true polygon union — a deliberate approximation kept from the
            # original evaluation protocol.
            union_area = MultiPoint(union_poly).convex_hull.area
            # epsilon guards against division by zero for degenerate hulls
            iou = float(inter_area) / (union_area+1e-6)
        except shapely.geos.TopologicalError:
            # Degenerate/self-intersecting geometry: treat as no overlap.
            print('shapely.geos.TopologicalError occured, iou set to 0')
            iou = 0
    return iou
|
53 |
+
|
54 |
+
def nms(boxes,overlap):
    """Greedy non-maximum suppression over quadrilateral detections.

    boxes:   parsed result lines (coords in [0:8], sequence recognition
             score at index -2 — see list_from_str).
    overlap: IoU threshold above which the lower-scored box is suppressed.

    Returns a list of booleans, one per box; True means the box is kept.

    Fix: removed unused locals (str1, str2, box_i, box_j) that computed
    values never read by the algorithm.
    """
    rec_scores = [b[-2] for b in boxes]
    # Visit boxes from highest to lowest sequence-recognition score.
    indices = sorted(range(len(rec_scores)), key=lambda k: -rec_scores[k])
    box_num = len(boxes)
    nms_flag = [True]*box_num
    for i in range(box_num):
        ii = indices[i]
        if not nms_flag[ii]:
            continue
        for j in range(box_num):
            jj = indices[j]
            if j == i:
                continue
            if not nms_flag[jj]:
                continue
            box1 = boxes[ii]
            box2 = boxes[jj]
            box1_score = rec_scores[ii]
            box2_score = rec_scores[jj]
            poly1 = polygon_from_list(box1[0:8])
            poly2 = polygon_from_list(box2[0:8])
            iou = polygon_iou(box1[0:8],box2[0:8])
            thresh = overlap

            if iou > thresh:
                if box1_score > box2_score:
                    nms_flag[jj] = False
                # On an exact score tie, keep the larger polygon.
                if box1_score == box2_score and poly1.area > poly2.area:
                    nms_flag[jj] = False
                if box1_score == box2_score and poly1.area<=poly2.area:
                    nms_flag[ii] = False
                    break

    return nms_flag
|
92 |
+
|
93 |
+
def packing(save_dir, cache_dir, pack_name):
    # Zip every file in save_dir (flattened with -j) into
    # cache_dir/<pack_name>.zip for submission to the evaluator.
    files = os.listdir(save_dir)
    if not os.path.exists(cache_dir):
        os.mkdir(cache_dir)
    # NOTE(review): paths are interpolated into a shell command unquoted —
    # this breaks (or allows injection) on paths containing spaces or shell
    # metacharacters; verify inputs are trusted, or move to zipfile/subprocess.
    os.system('zip -r -q -j '+os.path.join(cache_dir, pack_name+'.zip')+' '+save_dir+'/*')
|
98 |
+
|
99 |
+
def test_single(results_dir,lexicon_type=3,cache_dir='./cache_dir',score_det=0.5,score_rec=0.5,score_rec_seq=0.5,overlap=0.2, use_lexicon=True, weighted_ed=True, use_seq=False, use_char=False, mix=False):
    '''
    Filter, NMS and lexicon-correct raw detection results for ICDAR2015,
    then pack the per-image result files into a submission zip.

    results_dir: result directory
    score_det: score of detection bounding box
    score_rec: score of the mask recognition branch
    score_rec_seq: score of the sequence recognition branch
    overlap: overlap threshold used for nms
    lexicon_type: 1 for generic; 2 for weak; 3 for strong (per-image)
    use_lexicon: correct words via edit distance against the lexicon
    weighted_ed: use character-score-weighted edit distance
    use_seq: use the recognition result of sequence branch
    use_char: use the recognition result of the mask (char) branch
    mix: use both branches, selected by score

    Returns the path of the packed submission zip file.
    '''
    print('score_det:', 'score_det:', score_det, 'score_rec:', score_rec, 'score_rec_seq:', score_rec_seq, 'lexicon_type:', lexicon_type, 'weighted_ed:', weighted_ed, 'use_seq:', use_seq, 'use_char:', use_char, 'mix:', mix)
    if not os.path.exists(cache_dir):
        os.mkdir(cache_dir)
    # Post-NMS per-image result files are written here before zipping.
    nms_dir = os.path.join(cache_dir,str(score_det)+'_'+str(score_rec)+'_'+str(score_rec_seq))
    if not os.path.exists(nms_dir):
        os.mkdir(nms_dir)
    if lexicon_type==1:
        # generic lexicon (shared across all images)
        lexicon_path = '../../lexicons/ic15/GenericVocabulary_new.txt'
        lexicon_fid=open(lexicon_path, 'r')
        # pair list maps an upper-cased word to its original-case form
        pair_list = open('../../lexicons/ic15/GenericVocabulary_pair_list.txt', 'r')
        pairs = dict()
        for line in pair_list.readlines():
            line=line.strip()
            word = line.split(' ')[0].upper()
            word_gt = line[len(word)+1:]
            pairs[word] = word_gt
        lexicon_fid=open(lexicon_path, 'r')
        lexicon=[]
        for line in lexicon_fid.readlines():
            line=line.strip()
            lexicon.append(line)
    if lexicon_type==2:
        # weak lexicon (dataset-level vocabulary)
        lexicon_path = '../../lexicons/ic15/ch4_test_vocabulary_new.txt'
        lexicon_fid=open(lexicon_path, 'r')
        pair_list = open('../../lexicons/ic15/ch4_test_vocabulary_pair_list.txt', 'r')
        pairs = dict()
        for line in pair_list.readlines():
            line=line.strip()
            word = line.split(' ')[0].upper()
            word_gt = line[len(word)+1:]
            pairs[word] = word_gt
        lexicon_fid=open(lexicon_path, 'r')
        lexicon=[]
        for line in lexicon_fid.readlines():
            line=line.strip()
            lexicon.append(line)

    # ICDAR2015 test set is images 1..500.
    for i in tqdm(range(1,501)):
        img = 'img_'+str(i)+'.jpg'
        gt_img = 'gt_img_'+str(i)+'.txt'
        if lexicon_type==3:
            # strong (per-image) lexicon, reloaded for every image
            lexicon_path = '../../lexicons/ic15/new_strong_lexicon/new_voc_img_' + str(i) + '.txt'
            lexicon_fid=open(lexicon_path, 'r')
            pair_list = open('../../lexicons/ic15/new_strong_lexicon/pair_voc_img_' + str(i) + '.txt', 'r')
            pairs = dict()
            for line in pair_list.readlines():
                line=line.strip()
                word = line.split(' ')[0].upper()
                word_gt = line[len(word)+1:]
                pairs[word] = word_gt
            lexicon_fid=open(lexicon_path, 'r')
            lexicon=[]
            for line in lexicon_fid.readlines():
                line=line.strip()
                lexicon.append(line)
        result_path = os.path.join(results_dir,'res_img_'+str(i)+'.txt')
        if os.path.isfile(result_path):
            with open(result_path,'r') as f:
                dt_lines = [a.strip() for a in f.readlines()]
            dt_lines = [list_from_str(dt) for dt in dt_lines]
        else:
            # Missing result file: the image simply has no detections.
            dt_lines = []
        # Threshold on sequence score (-2), mask score (-3), det score (-6).
        dt_lines = [dt for dt in dt_lines if dt[-2]>score_rec_seq and dt[-3]>score_rec and dt[-6]>score_det]
        nms_flag = nms(dt_lines,overlap)
        boxes = []
        for k in range(len(dt_lines)):
            dt = dt_lines[k]
            if nms_flag[k]:
                if dt not in boxes:
                    boxes.append(dt)

        with open(os.path.join(nms_dir,'res_img_'+str(i)+'.txt'),'w') as f:
            for g in boxes:
                gt_coors = [int(b) for b in g[0:8]]
                # g[-1] is a path (relative to repo root) to pickled
                # per-character score arrays dumped at inference time.
                with open('../../../' + g[-1], "rb") as input_file:
                # with open(g[-1], "rb") as input_file:
                    dict_scores = pickle.load(input_file)
                if use_char and use_seq:
                    # mix: pick the branch with the higher score
                    if g[-2]>g[-3]:
                        word = g[-5]
                        scores = dict_scores['seq_char_scores'][:,1:-1].swapaxes(0,1)
                    else:
                        word = g[-4]
                        scores = dict_scores['seg_char_scores']
                elif use_seq:
                    word = g[-5]
                    scores = dict_scores['seq_char_scores'][:,1:-1].swapaxes(0,1)
                else:
                    word = g[-4]
                    scores = dict_scores['seg_char_scores']
                match_word, match_dist = find_match_word(word, lexicon, pairs, scores, use_lexicon, weighted_ed)
                # Keep only confident lexicon matches, except with the
                # generic lexicon where every match is written out.
                if match_dist<1.5 or lexicon_type==1:
                    gt_coor_strs = [str(a) for a in gt_coors]+ [match_word]
                    f.write(','.join(gt_coor_strs)+'\r\n')

    pack_name = str(score_det)+'_'+str(score_rec)+'_over'+str(overlap)

    packing(nms_dir,cache_dir,pack_name)
    submit_file_path = os.path.join(cache_dir, pack_name+'.zip')
    return submit_file_path
|
213 |
+
|
214 |
+
def find_match_word(rec_str, lexicon, pairs, scores_numpy, use_ed = True, weighted_ed = False):
    """Match a recognised string against a lexicon by (weighted) edit distance.

    rec_str:      raw recognised word.
    lexicon:      candidate words.
    pairs:        upper-cased word -> original-case ground-truth spelling.
    scores_numpy: per-character score matrix used by the weighted distance.
    use_ed:       if False, return the raw string unmodified.
    weighted_ed:  use character-score-weighted edit distance.

    Returns (match_word, match_dist).

    Fix: the use_ed=False branch returned a bare string while every caller
    unpacks two values (word, dist), which raised at runtime; it now
    returns (rec_str, 0).
    """
    if not use_ed:
        return rec_str, 0
    rec_str = rec_str.upper()
    dist_min = 100
    dist_min_pre = 100
    match_word = ''
    match_dist = 100
    if not weighted_ed:
        # Plain edit distance over the whole lexicon.
        for word in lexicon:
            word = word.upper()
            ed = editdistance.eval(rec_str, word)
            length_dist = abs(len(word) - len(rec_str))
            # dist = ed + length_dist
            dist = ed
            if dist<dist_min:
                dist_min = dist
                match_word = pairs[word]
                match_dist = dist
        return match_word, match_dist
    else:
        # First pass: cheap unweighted distance to shortlist candidates.
        small_lexicon_dict = dict()
        for word in lexicon:
            word = word.upper()
            ed = editdistance.eval(rec_str, word)
            small_lexicon_dict[word] = ed
            dist = ed
            if dist<dist_min_pre:
                dist_min_pre = dist
        small_lexicon = []
        for word in small_lexicon_dict:
            if small_lexicon_dict[word]<=dist_min_pre+2:
                small_lexicon.append(word)

        # Second pass: expensive weighted distance only on the shortlist.
        for word in small_lexicon:
            word = word.upper()
            ed = weighted_edit_distance(rec_str, word, scores_numpy)
            dist = ed
            if dist<dist_min:
                dist_min = dist
                match_word = pairs[word]
                match_dist = dist
        return match_word, match_dist
|
257 |
+
|
258 |
+
|
259 |
+
def prepare_results_for_evaluation(results_dir, lexicon_type, cache_dir, score_det, score_rec, score_rec_seq):
    """Run the full post-processing pipeline (filter, NMS, lexicon
    correction, packing) and return the path of the submission zip."""
    if not os.path.isdir(cache_dir):
        os.mkdir(cache_dir)
    return test_single(
        results_dir,
        score_det=score_det,
        score_rec=score_rec,
        score_rec_seq=score_rec_seq,
        overlap=0.2,
        cache_dir=cache_dir,
        lexicon_type=lexicon_type,
        use_lexicon=True,
        weighted_ed=True,
        use_seq=True,
        use_char=True,
        mix=True,
    )
|
evaluation/icdar2015/e2e/rrc_evaluation_funcs.py
ADDED
@@ -0,0 +1,369 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
#!/usr/bin/env python2
|
2 |
+
#encoding: UTF-8
|
3 |
+
import json
|
4 |
+
import sys;sys.path.append('./')
|
5 |
+
import zipfile
|
6 |
+
import re
|
7 |
+
import sys
|
8 |
+
import os
|
9 |
+
import codecs
|
10 |
+
import importlib
|
11 |
+
try:
|
12 |
+
from StringIO import StringIO
|
13 |
+
except ImportError:
|
14 |
+
from io import StringIO
|
15 |
+
|
16 |
+
def print_help():
    # Print CLI usage and abort with exit status 2.
    usage = 'Usage: python %s.py -g=<gtFile> -s=<submFile> [-o=<outputFolder> -p=<jsonParams>]' % sys.argv[0]
    sys.stdout.write(usage)
    sys.exit(2)
|
19 |
+
|
20 |
+
|
21 |
+
def load_zip_file_keys(file,fileNameRegExp=''):
    """
    Return the keys of the ZIP entries that match fileNameRegExp.
    A key is the entry name itself, or — when the expression defines a
    capturing group — the first captured group.
    """
    try:
        archive = zipfile.ZipFile(file, mode='r', allowZip64=True)
    except :
        raise Exception('Error loading the ZIP archive.')

    keys = []
    for entry in archive.namelist():
        if fileNameRegExp == "":
            # No filter: every entry is taken verbatim.
            keys.append(entry)
            continue
        match = re.match(fileNameRegExp, entry)
        if match is None:
            continue
        keys.append(match.group(1) if match.groups() else entry)

    return keys
|
48 |
+
|
49 |
+
|
50 |
+
def load_zip_file(file,fileNameRegExp='',allEntries=False):
    """
    Return a dict {key: raw bytes} for the ZIP entries matching
    fileNameRegExp. The key is the entry name or the first captured
    group of the expression. With allEntries=True, any entry that fails
    the filter raises instead of being skipped.
    """
    try:
        archive = zipfile.ZipFile(file, mode='r', allowZip64=True)
    except :
        raise Exception('Error loading the ZIP archive')

    contents = dict()
    for entry in archive.namelist():
        key = entry
        matched = True
        if fileNameRegExp != "":
            m = re.match(fileNameRegExp, entry)
            if m is None:
                matched = False
            elif m.groups():
                key = m.group(1)
        if matched:
            contents[key] = archive.read(entry)
        elif allEntries:
            raise Exception('ZIP entry not valid: %s' %entry)

    return contents
|
80 |
+
|
81 |
+
def decode_utf8(raw):
    """
    Decode raw bytes as UTF-8, stripping a leading BOM if present.
    Returns a str on success, or None on failure.
    """
    try:
        raw = codecs.decode(raw,'utf-8', 'replace')
        #extracts BOM if exists
        raw = raw.encode('utf8')
        if raw.startswith(codecs.BOM_UTF8):
            # BUGFIX: the replacement must be bytes; replacing bytes with a
            # str ('') raised TypeError under Python 3, which the bare
            # except swallowed — so every BOM-prefixed file came back None.
            raw = raw.replace(codecs.BOM_UTF8, b'', 1)
        return raw.decode('utf-8')
    except:
        return None
|
94 |
+
|
95 |
+
def validate_lines_in_file(fileName,file_contents,CRLF=True,LTRB=True,withTranscription=False,withConfidence=False,imWidth=0,imHeight=0):
    """
    This function validates that all lines of the file calling the Line validation function for each line
    """
    utf8File = decode_utf8(file_contents)
    if (utf8File is None) :
        raise Exception("The file %s is not UTF-8" %fileName)

    # Split on the expected line ending, then strip any stray CR/LF chars.
    lines = utf8File.split( "\r\n" if CRLF else "\n" )
    for line in lines:
        line = line.replace("\r","").replace("\n","")
        if(line != ""):
            try:
                validate_tl_line(line,LTRB,withTranscription,withConfidence,imWidth,imHeight)
            except Exception as e:
                # Re-raise with file and line context for the caller's report.
                raise Exception(("Line in sample not valid. Sample: %s Line: %s Error: %s" %(fileName,line,str(e))).encode('utf-8', 'replace'))
|
111 |
+
|
112 |
+
|
113 |
+
|
114 |
+
def validate_tl_line(line,LTRB=True,withTranscription=True,withConfidence=True,imWidth=0,imHeight=0):
    """
    Validate the format of the line. If the line is not valid an exception will be raised.
    If maxWidth and maxHeight are specified, all points must be inside the imgage bounds.
    Posible values are:
    LTRB=True: xmin,ymin,xmax,ymax[,confidence][,transcription]
    LTRB=False: x1,y1,x2,y2,x3,y3,x4,y4[,confidence][,transcription]
    """
    # Parsing doubles as validation; the parsed values are discarded here.
    get_tl_line_values(line,LTRB,withTranscription,withConfidence,imWidth,imHeight)
|
123 |
+
|
124 |
+
|
125 |
+
def get_tl_line_values(line,LTRB=True,withTranscription=False,withConfidence=False,imWidth=0,imHeight=0):
    """
    Validate the format of the line. If the line is not valid an exception will be raised.
    If maxWidth and maxHeight are specified, all points must be inside the imgage bounds.
    Posible values are:
    LTRB=True: xmin,ymin,xmax,ymax[,confidence][,transcription]
    LTRB=False: x1,y1,x2,y2,x3,y3,x4,y4[,confidence][,transcription]
    Returns values from a textline. Points , [Confidences], [Transcriptions]
    """
    confidence = 0.0
    transcription = "";
    points = []

    numPoints = 4;

    if LTRB:
        # Axis-aligned rectangle: 4 values (xmin,ymin,xmax,ymax).
        numPoints = 4;

        if withTranscription and withConfidence:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-1].?[0-9]*)\s*,(.*)$',line)
            if m == None :
                # NOTE(review): the re-match below is discarded — this branch
                # always raises; looks like leftover debug code.
                m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-1].?[0-9]*)\s*,(.*)$',line)
                raise Exception("Format incorrect. Should be: xmin,ymin,xmax,ymax,confidence,transcription")
        elif withConfidence:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-1].?[0-9]*)\s*$',line)
            if m == None :
                raise Exception("Format incorrect. Should be: xmin,ymin,xmax,ymax,confidence")
        elif withTranscription:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*,(.*)$',line)
            if m == None :
                raise Exception("Format incorrect. Should be: xmin,ymin,xmax,ymax,transcription")
        else:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*,?\s*$',line)
            if m == None :
                raise Exception("Format incorrect. Should be: xmin,ymin,xmax,ymax")

        xmin = int(m.group(1))
        ymin = int(m.group(2))
        xmax = int(m.group(3))
        ymax = int(m.group(4))
        if(xmax<xmin):
            raise Exception("Xmax value (%s) not valid (Xmax < Xmin)." %(xmax))
        if(ymax<ymin):
            raise Exception("Ymax value (%s) not valid (Ymax < Ymin)." %(ymax))

        points = [ float(m.group(i)) for i in range(1, (numPoints+1) ) ]

        if (imWidth>0 and imHeight>0):
            validate_point_inside_bounds(xmin,ymin,imWidth,imHeight);
            validate_point_inside_bounds(xmax,ymax,imWidth,imHeight);

    else:
        # Quadrilateral: 8 values (x1,y1,...,x4,y4), clockwise.
        numPoints = 8;

        if withTranscription and withConfidence:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-1].?[0-9]*)\s*,(.*)$',line)
            if m == None :
                raise Exception("Format incorrect. Should be: x1,y1,x2,y2,x3,y3,x4,y4,confidence,transcription")
        elif withConfidence:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-1].?[0-9]*)\s*$',line)
            if m == None :
                raise Exception("Format incorrect. Should be: x1,y1,x2,y2,x3,y3,x4,y4,confidence")
        elif withTranscription:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,(.*)$',line)
            if m == None :
                raise Exception("Format incorrect. Should be: x1,y1,x2,y2,x3,y3,x4,y4,transcription")
        else:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*$',line)
            if m == None :
                raise Exception("Format incorrect. Should be: x1,y1,x2,y2,x3,y3,x4,y4")

        points = [ float(m.group(i)) for i in range(1, (numPoints+1) ) ]

        validate_clockwise_points(points)

        if (imWidth>0 and imHeight>0):
            validate_point_inside_bounds(points[0],points[1],imWidth,imHeight);
            validate_point_inside_bounds(points[2],points[3],imWidth,imHeight);
            validate_point_inside_bounds(points[4],points[5],imWidth,imHeight);
            validate_point_inside_bounds(points[6],points[7],imWidth,imHeight);


    if withConfidence:
        try:
            confidence = float(m.group(numPoints+1))
        except ValueError:
            raise Exception("Confidence value must be a float")

    if withTranscription:
        # Transcription is the group after points (and confidence, if any).
        posTranscription = numPoints + (2 if withConfidence else 1)
        transcription = m.group(posTranscription)
        m2 = re.match(r'^\s*\"(.*)\"\s*$',transcription)
        if m2 != None : #Transcription with double quotes, we extract the value and replace escaped characters
            transcription = m2.group(1).replace("\\\\", "\\").replace("\\\"", "\"")

    return points,confidence,transcription
|
223 |
+
|
224 |
+
|
225 |
+
def validate_point_inside_bounds(x,y,imWidth,imHeight):
    """Raise if (x, y) lies outside the [0, imWidth] x [0, imHeight] box.

    Fix: the original messages referenced undefined names (xmin/ymin),
    raising NameError instead of the intended Exception, and the Y-branch
    format string had more placeholders than arguments.
    """
    if(x<0 or x>imWidth):
        raise Exception("X value (%s) not valid. Image dimensions: (%s,%s)" %(x,imWidth,imHeight))
    if(y<0 or y>imHeight):
        raise Exception("Y value (%s) not valid. Image dimensions: (%s,%s)" %(y,imWidth,imHeight))
|
230 |
+
|
231 |
+
def validate_clockwise_points(points):
    """
    Raise unless the 4 quadrilateral points are given in clockwise order
    (image coordinates: origin top-left, Y axis growing downwards).
    """
    if len(points) != 8:
        raise Exception("Points list not valid." + str(len(points)))

    xs = [int(points[k]) for k in (0, 2, 4, 6)]
    ys = [int(points[k]) for k in (1, 3, 5, 7)]

    # Shoelace-style signed sum over the edges: a positive total means the
    # polygon is counter-clockwise in image coordinates.
    total = 0
    for k in range(4):
        nxt = (k + 1) % 4
        total += (xs[nxt] - xs[k]) * (ys[nxt] + ys[k])

    if total > 0:
        raise Exception("Points are not clockwise. The coordinates of bounding quadrilaterals have to be given in clockwise order. Regarding the correct interpretation of 'clockwise' remember that the image coordinate system used is the standard one, with the image origin at the upper left, the X axis extending to the right and Y axis extending downwards.")
|
255 |
+
|
256 |
+
def get_tl_line_values_from_file_contents(content,CRLF=True,LTRB=True,withTranscription=False,withConfidence=False,imWidth=0,imHeight=0,sort_by_confidences=True):
    """
    Parse every non-empty line of a results file. Valid line formats:
    xmin,ymin,xmax,ymax,[confidence],[transcription]
    x1,y1,x2,y2,x3,y3,x4,y4,[confidence],[transcription]
    Returns (pointsList, confidencesList, transcriptionsList); when
    confidences are present and sort_by_confidences is set, all three
    lists are ordered by descending confidence.
    """
    pointsList = []
    transcriptionsList = []
    confidencesList = []

    for raw_line in content.split("\r\n" if CRLF else "\n"):
        cleaned = raw_line.replace("\r", "").replace("\n", "")
        if cleaned == "":
            continue
        points, confidence, transcription = get_tl_line_values(cleaned,LTRB,withTranscription,withConfidence,imWidth,imHeight)
        pointsList.append(points)
        transcriptionsList.append(transcription)
        confidencesList.append(confidence)

    if withConfidence and confidencesList and sort_by_confidences:
        import numpy as np
        order = np.argsort(-np.array(confidencesList))
        confidencesList = [confidencesList[k] for k in order]
        pointsList = [pointsList[k] for k in order]
        transcriptionsList = [transcriptionsList[k] for k in order]

    return pointsList,confidencesList,transcriptionsList
|
283 |
+
|
284 |
+
def main_evaluation(p,default_evaluation_params_fn,validate_data_fn,evaluate_method_fn,show_result=True,per_sample=True):
    """
    This process validates a method, evaluates it and if it succed generates a ZIP file with a JSON entry for each sample.
    Params:
    p: Dictionary of parmeters with the GT/submission locations. If None is passed, the parameters send by the system are used.
    default_evaluation_params_fn: points to a function that returns a dictionary with the default parameters used for the evaluation
    validate_data_fn: points to a method that validates the corrct format of the submission
    evaluate_method_fn: points to a function that evaluated the submission and return a Dictionary with the results
    """

    if (p == None):
        # Fall back to CLI args of the form -g=path, parsed into {'g': path, ...}.
        p = dict([s[1:].split('=') for s in sys.argv[1:]])
        if(len(sys.argv)<3):
            print_help()

    evalParams = default_evaluation_params_fn()
    if 'p' in p.keys():
        # Optional overrides: either a dict, or a JSON string wrapped in
        # one extra character on each side (hence the [1:-1] slice).
        evalParams.update( p['p'] if isinstance(p['p'], dict) else json.loads(p['p'][1:-1]) )

    resDict={'calculated':True,'Message':'','method':'{}','per_sample':'{}'}
    try:
        validate_data_fn(p['g'], p['s'], evalParams)
        evalData = evaluate_method_fn(p['g'], p['s'], evalParams)
        resDict.update(evalData)

    except Exception as e:
        # Failures are recorded in the result dict, not propagated.
        resDict['Message']= str(e)
        resDict['calculated']=False

    if 'o' in p:
        # Persist the aggregate results to <output>/results.zip.
        if not os.path.exists(p['o']):
            os.makedirs(p['o'])

        resultsOutputname = p['o'] + '/results.zip'
        outZip = zipfile.ZipFile(resultsOutputname, mode='w', allowZip64=True)

        # Per-sample payloads are written separately below, not in method.json.
        del resDict['per_sample']
        if 'output_items' in resDict.keys():
            del resDict['output_items']

        outZip.writestr('method.json',json.dumps(resDict))

    if not resDict['calculated']:
        if show_result:
            sys.stderr.write('Error!\n'+ resDict['Message']+'\n\n')
        if 'o' in p:
            outZip.close()
        return resDict

    if 'o' in p:
        if per_sample == True:
            # One JSON file per evaluated sample.
            for k,v in evalData['per_sample'].items():
                outZip.writestr( k + '.json',json.dumps(v))

            if 'output_items' in evalData.keys():
                for k, v in evalData['output_items'].items():
                    outZip.writestr( k,v)

        outZip.close()

    if show_result:
        sys.stdout.write("Calculated!")
        sys.stdout.write(json.dumps(resDict['method']))

    return resDict
|
349 |
+
|
350 |
+
|
351 |
+
def main_validation(default_evaluation_params_fn,validate_data_fn):
    """
    This process validates a method
    Params:
    default_evaluation_params_fn: points to a function that returns a dictionary with the default parameters used for the evaluation
    validate_data_fn: points to a method that validates the corrct format of the submission
    """
    try:
        # CLI args of the form -g=path are parsed into {'g': path, ...}.
        p = dict([s[1:].split('=') for s in sys.argv[1:]])
        evalParams = default_evaluation_params_fn()
        if 'p' in p.keys():
            # Optional overrides: a dict, or a JSON string wrapped in one
            # extra character on each side (hence the [1:-1] slice).
            evalParams.update( p['p'] if isinstance(p['p'], dict) else json.loads(p['p'][1:-1]) )

        validate_data_fn(p['g'], p['s'], evalParams)
        print('SUCCESS')
        sys.exit(0)
    except Exception as e:
        # Any failure: print the reason and exit with the RRC error code.
        print(str(e))
        sys.exit(101)
|
evaluation/icdar2015/e2e/script.py
ADDED
@@ -0,0 +1,461 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
#!/usr/bin/env python
|
2 |
+
# -*- coding: utf-8 -*-
|
3 |
+
# encoding=utf8
|
4 |
+
from collections import namedtuple
|
5 |
+
import rrc_evaluation_funcs
|
6 |
+
import importlib
|
7 |
+
from prepare_results import prepare_results_for_evaluation
|
8 |
+
|
9 |
+
def evaluation_imports():
    """
    evaluation_imports: Dictionary ( key = module name , value = alias ) with python modules used in the evaluation.
    """
    modules = dict()
    modules['Polygon'] = 'plg'
    modules['numpy'] = 'np'
    return modules
|
17 |
+
|
18 |
+
def default_evaluation_params():
    """
    default_evaluation_params: Default parameters to use for the validation and evaluation.
    """
    params = {}
    params['IOU_CONSTRAINT'] = 0.5
    params['AREA_PRECISION_CONSTRAINT'] = 0.5
    params['WORD_SPOTTING'] = False
    params['MIN_LENGTH_CARE_WORD'] = 3
    params['GT_SAMPLE_NAME_2_ID'] = 'gt_img_([0-9]+).txt'
    params['DET_SAMPLE_NAME_2_ID'] = 'res_img_([0-9]+).txt'
    # LTRB: 2 points (left,top,right,bottom) vs 4 points (x1,y1,...,x4,y4)
    params['LTRB'] = False
    # Lines delimited by Windows CRLF format
    params['CRLF'] = False
    # Detections must include a confidence value; MAP and MAR will be calculated
    params['CONFIDENCES'] = False
    params['SPECIAL_CHARACTERS'] = '!?.:,*"()·[]/\''
    params['ONLY_REMOVE_FIRST_LAST_CHARACTER'] = True
    return params
|
35 |
+
|
36 |
+
def validate_data(gtFilePath, submFilePath, evaluationParams):
    """
    Validates that all files in the results folder have the correct name and
    contents, and that no submitted sample is missing from the ground truth.
    Raises an exception as soon as a problem is detected.
    """
    gt = rrc_evaluation_funcs.load_zip_file(gtFilePath, evaluationParams['GT_SAMPLE_NAME_2_ID'])
    subm = rrc_evaluation_funcs.load_zip_file(submFilePath, evaluationParams['DET_SAMPLE_NAME_2_ID'], True)

    # Every ground-truth file must parse as transcription-annotated lines.
    for sample_id, contents in gt.items():
        rrc_evaluation_funcs.validate_lines_in_file(sample_id, contents, evaluationParams['CRLF'], evaluationParams['LTRB'], True)

    # Every submitted file must correspond to a GT sample and parse correctly.
    for sample_id, contents in subm.items():
        if sample_id not in gt:
            raise Exception("The sample %s not present in GT" % sample_id)
        rrc_evaluation_funcs.validate_lines_in_file(sample_id, contents, evaluationParams['CRLF'], evaluationParams['LTRB'], True, evaluationParams['CONFIDENCES'])
|
56 |
+
|
57 |
+
|
58 |
+
def evaluate_method(gtFilePath, submFilePath, evaluationParams):
    """
    Method evaluate_method: evaluate method and returns the results
        Results. Dictionary with the following values:
        - method (required)  Global method metrics. Ex: { 'Precision':0.8,'Recall':0.9 }
        - samples (optional) Per sample metrics. Ex: {'sample1' : { 'Precision':0.8,'Recall':0.9 } , 'sample2' : { 'Precision':0.8,'Recall':0.9 }

    A detection counts as correct when its polygon overlaps a "care" GT
    polygon above IOU_CONSTRAINT AND the transcription matches (exactly in
    word-spotting mode, or up to leading/trailing special characters
    otherwise).
    """
    # Bind the modules declared in evaluation_imports() to their aliases
    # (plg = Polygon, np = numpy) in the module's global namespace.
    for module,alias in evaluation_imports().items():
        globals()[alias] = importlib.import_module(module)

    def polygon_from_points(points,correctOffset=False):
        """
        Returns a Polygon object to use with the Polygon2 class from a list of 8 points: x1,y1,x2,y2,x3,y3,x4,y4
        """
        if correctOffset: #this will substract 1 from the coordinates that correspond to the xmax and ymax
            points[2] -= 1
            points[4] -= 1
            points[5] -= 1
            points[7] -= 1

        # Polygon2 expects a 4x2 matrix with all x's first, then all y's,
        # hence the interleaved assignment below.
        resBoxes=np.empty([1,8],dtype='int32')
        resBoxes[0,0]=int(points[0])
        resBoxes[0,4]=int(points[1])
        resBoxes[0,1]=int(points[2])
        resBoxes[0,5]=int(points[3])
        resBoxes[0,2]=int(points[4])
        resBoxes[0,6]=int(points[5])
        resBoxes[0,3]=int(points[6])
        resBoxes[0,7]=int(points[7])
        pointMat = resBoxes[0].reshape([2,4]).T
        return plg.Polygon( pointMat)

    def rectangle_to_polygon(rect):
        # Convert an axis-aligned Rectangle namedtuple to a 4-corner polygon.
        resBoxes=np.empty([1,8],dtype='int32')
        resBoxes[0,0]=int(rect.xmin)
        resBoxes[0,4]=int(rect.ymax)
        resBoxes[0,1]=int(rect.xmin)
        resBoxes[0,5]=int(rect.ymin)
        resBoxes[0,2]=int(rect.xmax)
        resBoxes[0,6]=int(rect.ymin)
        resBoxes[0,3]=int(rect.xmax)
        resBoxes[0,7]=int(rect.ymax)

        pointMat = resBoxes[0].reshape([2,4]).T

        return plg.Polygon( pointMat)

    def rectangle_to_points(rect):
        # Corner list (x,y pairs) for an axis-aligned rectangle.
        points = [int(rect.xmin), int(rect.ymax), int(rect.xmax), int(rect.ymax), int(rect.xmax), int(rect.ymin), int(rect.xmin), int(rect.ymin)]
        return points

    def get_union(pD,pG):
        areaA = pD.area();
        areaB = pG.area();
        return areaA + areaB - get_intersection(pD, pG);

    def get_intersection_over_union(pD,pG):
        try:
            return get_intersection(pD, pG) / get_union(pD, pG);
        except:
            # e.g. division by zero for degenerate (zero-area) polygons
            return 0

    def get_intersection(pD,pG):
        # Polygon2's & operator returns the clipped intersection polygon.
        pInt = pD & pG
        if len(pInt) == 0:
            return 0
        return pInt.area()

    def compute_ap(confList, matchList,numGtCare):
        # Average precision over detections sorted by descending confidence.
        correct = 0
        AP = 0
        if len(confList)>0:
            confList = np.array(confList)
            matchList = np.array(matchList)
            sorted_ind = np.argsort(-confList)
            confList = confList[sorted_ind]
            matchList = matchList[sorted_ind]
            for n in range(len(confList)):
                match = matchList[n]
                if match:
                    correct += 1
                    AP += float(correct)/(n + 1)

            if numGtCare>0:
                AP /= numGtCare

        return AP

    def transcription_match(transGt,transDet,specialCharacters='!?.:,*"()·[]/\'',onlyRemoveFirstLastCharacterGT=True):
        # Compare transcriptions, tolerating special characters at the
        # borders of the GT (and, in the non-default mode, of the detection).
        if onlyRemoveFirstLastCharacterGT:
            #special characters in GT are allowed only at initial or final position
            if (transGt==transDet):
                return True

            if specialCharacters.find(transGt[0])>-1:
                if transGt[1:]==transDet:
                    return True

            if specialCharacters.find(transGt[-1])>-1:
                if transGt[0:len(transGt)-1]==transDet:
                    return True

            if specialCharacters.find(transGt[0])>-1 and specialCharacters.find(transGt[-1])>-1:
                if transGt[1:len(transGt)-1]==transDet:
                    return True
            return False
        else:
            #Special characters are removed from the begining and the end of both Detection and GroundTruth
            while len(transGt)>0 and specialCharacters.find(transGt[0])>-1:
                transGt = transGt[1:]

            while len(transDet)>0 and specialCharacters.find(transDet[0])>-1:
                transDet = transDet[1:]

            while len(transGt)>0 and specialCharacters.find(transGt[-1])>-1 :
                transGt = transGt[0:len(transGt)-1]

            while len(transDet)>0 and specialCharacters.find(transDet[-1])>-1:
                transDet = transDet[0:len(transDet)-1]

            return transGt == transDet

    def include_in_dictionary(transcription):
        """
        Function used in Word Spotting that finds if the Ground Truth transcription meets the rules to enter into the dictionary. If not, the transcription will be cared as don't care
        """
        #special case 's at final
        if transcription[len(transcription)-2:]=="'s" or transcription[len(transcription)-2:]=="'S":
            transcription = transcription[0:len(transcription)-2]

        #hypens at init or final of the word
        transcription = transcription.strip('-');

        specialCharacters = "'!?.:,*\"()·[]/";
        for character in specialCharacters:
            transcription = transcription.replace(character,' ')

        transcription = transcription.strip()

        # Reject words that still contain inner whitespace after stripping.
        if len(transcription) != len(transcription.replace(" ","")) :
            return False;

        if len(transcription) < evaluationParams['MIN_LENGTH_CARE_WORD']:
            return False;

        notAllowed = "×÷·";

        # Accepted character ranges: latin lower/upper, latin extended,
        # some extended phonetic/greek ranges, and the hyphen.
        range1 = [ ord(u'a'), ord(u'z') ]
        range2 = [ ord(u'A'), ord(u'Z') ]
        range3 = [ ord(u'À'), ord(u'ƿ') ]
        range4 = [ ord(u'DŽ'), ord(u'ɿ') ]
        range5 = [ ord(u'Ά'), ord(u'Ͽ') ]
        range6 = [ ord(u'-'), ord(u'-') ]

        for char in transcription :
            charCode = ord(char)
            if(notAllowed.find(char) != -1):
                return False

            valid = ( charCode>=range1[0] and charCode<=range1[1] ) or ( charCode>=range2[0] and charCode<=range2[1] ) or ( charCode>=range3[0] and charCode<=range3[1] ) or ( charCode>=range4[0] and charCode<=range4[1] ) or ( charCode>=range5[0] and charCode<=range5[1] ) or ( charCode>=range6[0] and charCode<=range6[1] )
            if valid == False:
                return False

        return True

    def include_in_dictionary_transcription(transcription):
        """
        Function applied to the Ground Truth transcriptions used in Word Spotting. It removes special characters or terminations
        """
        #special case 's at final
        if transcription[len(transcription)-2:]=="'s" or transcription[len(transcription)-2:]=="'S":
            transcription = transcription[0:len(transcription)-2]

        #hypens at init or final of the word
        transcription = transcription.strip('-');

        specialCharacters = "'!?.:,*\"()·[]/";
        for character in specialCharacters:
            transcription = transcription.replace(character,' ')

        transcription = transcription.strip()

        return transcription

    perSampleMetrics = {}

    matchedSum = 0

    Rectangle = namedtuple('Rectangle', 'xmin ymin xmax ymax')

    gt = rrc_evaluation_funcs.load_zip_file(gtFilePath,evaluationParams['GT_SAMPLE_NAME_2_ID'])
    subm = rrc_evaluation_funcs.load_zip_file(submFilePath,evaluationParams['DET_SAMPLE_NAME_2_ID'],True)

    numGlobalCareGt = 0;
    numGlobalCareDet = 0;

    arrGlobalConfidences = [];
    arrGlobalMatches = [];

    # Evaluate each ground-truth sample independently, accumulating the
    # global counters used for the dataset-level metrics.
    for resFile in gt:

        gtFile = rrc_evaluation_funcs.decode_utf8(gt[resFile])
        if (gtFile is None) :
            raise Exception("The file %s is not UTF-8" %resFile)

        recall = 0
        precision = 0
        hmean = 0
        detCorrect = 0
        iouMat = np.empty([1,1])
        gtPols = []
        detPols = []
        gtTrans = []
        detTrans = []
        gtPolPoints = []
        detPolPoints = []
        gtDontCarePolsNum = [] #Array of Ground Truth Polygons' keys marked as don't Care
        detDontCarePolsNum = [] #Array of Detected Polygons' matched with a don't Care GT
        detMatchedNums = []
        pairs = []

        arrSampleConfidences = [];
        arrSampleMatch = [];
        sampleAP = 0;

        evaluationLog = ""

        # ---- parse ground truth for this sample ----
        pointsList,_,transcriptionsList = rrc_evaluation_funcs.get_tl_line_values_from_file_contents(gtFile,evaluationParams['CRLF'],evaluationParams['LTRB'],True,False)
        for n in range(len(pointsList)):
            points = pointsList[n]
            transcription = transcriptionsList[n]
            dontCare = transcription == "###"
            if evaluationParams['LTRB']:
                gtRect = Rectangle(*points)
                gtPol = rectangle_to_polygon(gtRect)
            else:
                gtPol = polygon_from_points(points)
            gtPols.append(gtPol)
            gtPolPoints.append(points)

            #On word spotting we will filter some transcriptions with special characters
            if evaluationParams['WORD_SPOTTING'] :
                if dontCare == False :
                    if include_in_dictionary(transcription) == False :
                        dontCare = True
                    else:
                        transcription = include_in_dictionary_transcription(transcription)

            gtTrans.append(transcription)
            if dontCare:
                gtDontCarePolsNum.append( len(gtPols)-1 )

        evaluationLog += "GT polygons: " + str(len(gtPols)) + (" (" + str(len(gtDontCarePolsNum)) + " don't care)\n" if len(gtDontCarePolsNum)>0 else "\n")

        if resFile in subm:

            detFile = rrc_evaluation_funcs.decode_utf8(subm[resFile])

            # ---- parse the detections for this sample ----
            pointsList,confidencesList,transcriptionsList = rrc_evaluation_funcs.get_tl_line_values_from_file_contents(detFile,evaluationParams['CRLF'],evaluationParams['LTRB'],True,evaluationParams['CONFIDENCES'])

            for n in range(len(pointsList)):
                points = pointsList[n]
                transcription = transcriptionsList[n]

                if evaluationParams['LTRB']:
                    detRect = Rectangle(*points)
                    detPol = rectangle_to_polygon(detRect)
                else:
                    detPol = polygon_from_points(points)
                detPols.append(detPol)
                detPolPoints.append(points)
                detTrans.append(transcription)

                # A detection mostly covered by a don't-care GT region is
                # itself marked don't-care and excluded from the metrics.
                if len(gtDontCarePolsNum)>0 :
                    for dontCarePol in gtDontCarePolsNum:
                        dontCarePol = gtPols[dontCarePol]
                        intersected_area = get_intersection(dontCarePol,detPol)
                        pdDimensions = detPol.area()
                        precision = 0 if pdDimensions == 0 else intersected_area / pdDimensions
                        if (precision > evaluationParams['AREA_PRECISION_CONSTRAINT'] ):
                            detDontCarePolsNum.append( len(detPols)-1 )
                            break

            evaluationLog += "DET polygons: " + str(len(detPols)) + (" (" + str(len(detDontCarePolsNum)) + " don't care)\n" if len(detDontCarePolsNum)>0 else "\n")

            if len(gtPols)>0 and len(detPols)>0:
                #Calculate IoU and precision matrixs
                outputShape=[len(gtPols),len(detPols)]
                iouMat = np.empty(outputShape)
                gtRectMat = np.zeros(len(gtPols),np.int8)
                detRectMat = np.zeros(len(detPols),np.int8)
                for gtNum in range(len(gtPols)):
                    for detNum in range(len(detPols)):
                        pG = gtPols[gtNum]
                        pD = detPols[detNum]
                        iouMat[gtNum,detNum] = get_intersection_over_union(pD,pG)

                # Greedy one-to-one matching: each GT/detection is used at
                # most once (gtRectMat/detRectMat flag consumed entries).
                for gtNum in range(len(gtPols)):
                    for detNum in range(len(detPols)):
                        if gtRectMat[gtNum] == 0 and detRectMat[detNum] == 0 and gtNum not in gtDontCarePolsNum and detNum not in detDontCarePolsNum :
                            if iouMat[gtNum,detNum]>evaluationParams['IOU_CONSTRAINT']:
                                gtRectMat[gtNum] = 1
                                detRectMat[detNum] = 1
                                #detection matched only if transcription is equal
                                if evaluationParams['WORD_SPOTTING']:
                                    correct = gtTrans[gtNum].upper() == detTrans[detNum].upper()
                                else:
                                    correct = transcription_match(gtTrans[gtNum].upper(),detTrans[detNum].upper(),evaluationParams['SPECIAL_CHARACTERS'],evaluationParams['ONLY_REMOVE_FIRST_LAST_CHARACTER'])==True
                                detCorrect += (1 if correct else 0)
                                if correct:
                                    detMatchedNums.append(detNum)
                                pairs.append({'gt':gtNum,'det':detNum,'correct':correct})
                                evaluationLog += "Match GT #" + str(gtNum) + " with Det #" + str(detNum) + " trans. correct: " + str(correct) + "\n"

            if evaluationParams['CONFIDENCES']:
                for detNum in range(len(detPols)):
                    if detNum not in detDontCarePolsNum :
                        #we exclude the don't care detections
                        match = detNum in detMatchedNums

                        arrSampleConfidences.append(confidencesList[detNum])
                        arrSampleMatch.append(match)

                        arrGlobalConfidences.append(confidencesList[detNum]);
                        arrGlobalMatches.append(match);

        # ---- per-sample precision / recall / hmean ----
        numGtCare = (len(gtPols) - len(gtDontCarePolsNum))
        numDetCare = (len(detPols) - len(detDontCarePolsNum))
        if numGtCare == 0:
            recall = float(1)
            precision = float(0) if numDetCare >0 else float(1)
            sampleAP = precision
        else:
            recall = float(detCorrect) / numGtCare
            precision = 0 if numDetCare==0 else float(detCorrect) / numDetCare
            if evaluationParams['CONFIDENCES']:
                sampleAP = compute_ap(arrSampleConfidences, arrSampleMatch, numGtCare )

        hmean = 0 if (precision + recall)==0 else 2.0 * precision * recall / (precision + recall)

        matchedSum += detCorrect
        numGlobalCareGt += numGtCare
        numGlobalCareDet += numDetCare

        perSampleMetrics[resFile] = {
            'precision':precision,
            'recall':recall,
            'hmean':hmean,
            'pairs':pairs,
            'AP':sampleAP,
            # iouMat can get huge; omit it for samples with many detections
            'iouMat':[] if len(detPols)>100 else iouMat.tolist(),
            'gtPolPoints':gtPolPoints,
            'detPolPoints':detPolPoints,
            'gtTrans':gtTrans,
            'detTrans':detTrans,
            'gtDontCare':gtDontCarePolsNum,
            'detDontCare':detDontCarePolsNum,
            'evaluationParams': evaluationParams,
            'evaluationLog': evaluationLog
        }

    # Compute AP
    AP = 0
    if evaluationParams['CONFIDENCES']:
        AP = compute_ap(arrGlobalConfidences, arrGlobalMatches, numGlobalCareGt)

    # ---- dataset-level metrics ----
    methodRecall = 0 if numGlobalCareGt == 0 else float(matchedSum)/numGlobalCareGt
    methodPrecision = 0 if numGlobalCareDet == 0 else float(matchedSum)/numGlobalCareDet
    methodHmean = 0 if methodRecall + methodPrecision==0 else 2* methodRecall * methodPrecision / (methodRecall + methodPrecision)

    methodMetrics = {'precision':methodPrecision, 'recall':methodRecall,'hmean': methodHmean, 'AP': AP }

    resDict = {'calculated':True,'Message':'','method': methodMetrics,'per_sample': perSampleMetrics}


    return resDict;
|
437 |
+
|
438 |
+
|
439 |
+
|
440 |
+
if __name__=='__main__':
    '''
    results_dir: result directory
    score_det: score of detection bounding box
    score_rec: score of the mask recognition branch
    score_rec_seq: score of the sequence recognition branch
    lexicon_type: 1 for generic; 2 for weak; 3 for strong
    '''
    # Post-process the raw inference output, then run the official RRC
    # evaluation against the ground-truth ZIP.
    results_dir = '../../../output/mixtrain/inference/icdar_2015_test/model_0250000_1440_results/'
    lexicon_type = 3
    score_det = 0.01
    score_rec = 0.4
    # score_rec_seq set to 0.7 for lexicon_type 3 or 2; 0.8 for lexicon_type 1
    score_rec_seq = 0.7
    evaluate_result_path = prepare_results_for_evaluation(results_dir,
        lexicon_type=lexicon_type, cache_dir='./cache_files',
        score_det=score_det, score_rec=score_rec, score_rec_seq=score_rec_seq)
    # 'g' = ground-truth ZIP, 's' = submission ZIP produced above
    p = {
        'g': "../gt.zip",
        's': evaluate_result_path
    }
    rrc_evaluation_funcs.main_evaluation(p,default_evaluation_params,validate_data,evaluate_method)
|
evaluation/icdar2015/gt.zip
ADDED
Binary file (250 kB). View file
|
|
evaluation/rotated_icdar2013/e2e/prepare_results.py
ADDED
@@ -0,0 +1,267 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
#!/usr/bin/env python3
|
2 |
+
# -*- coding: utf-8 -*-
|
3 |
+
import sys
|
4 |
+
import os
|
5 |
+
sys.path.append('./')
|
6 |
+
import shapely
|
7 |
+
from shapely.geometry import Polygon,MultiPoint
|
8 |
+
import numpy as np
|
9 |
+
import editdistance
|
10 |
+
sys.path.append('../../')
|
11 |
+
from weighted_editdistance import weighted_edit_distance
|
12 |
+
from tqdm import tqdm
|
13 |
+
try:
|
14 |
+
import pickle
|
15 |
+
except ImportError:
|
16 |
+
import cPickle as pickle
|
17 |
+
|
18 |
+
def list_from_str(st):
    """
    Parse one comma-separated result line into the internal list layout:
    8 polygon coords, detection score, seq word, mask word, rec score,
    seq score, char-score pickle path.
    """
    fields = st.split(',')
    # input layout: box[0:4], polygon[4:12], word, seq_word,
    #               detection_score, rec_score, seq_score, char_score_path
    polygon = [float(v) for v in fields[4:12]]
    det_score = float(fields[-4])
    seq_word = fields[-5]
    mask_word = fields[-6]
    rec_score = float(fields[-3])
    seq_score = float(fields[-2])
    char_score_path = fields[-1]
    return polygon + [det_score, seq_word, mask_word, rec_score, seq_score, char_score_path]
|
23 |
+
|
24 |
+
def polygon_from_list(line):
    """
    Create a shapely polygon (convex hull) from a flat list of 8 gt/dt
    coordinates (x1,y1,...,x4,y4).
    """
    corners = np.array(line).reshape(4, 2)
    return Polygon(corners).convex_hull
|
31 |
+
|
32 |
+
def polygon_iou(list1, list2):
    """
    Intersection over union between two quadrilaterals given as flat
    8-coordinate lists. The union area is taken as the convex hull of all
    eight corner points (not area1 + area2 - intersection).
    """
    pts1 = np.array(list1).reshape(4, 2)
    pts2 = np.array(list2).reshape(4, 2)
    hull1 = Polygon(pts1).convex_hull
    hull2 = Polygon(pts2).convex_hull
    all_corners = np.concatenate((pts1, pts2))
    # this test is fast and can accelerate calculation
    if not hull1.intersects(hull2):
        return 0
    try:
        inter_area = hull1.intersection(hull2).area
        #union_area = poly1.area + poly2.area - inter_area
        union_area = MultiPoint(all_corners).convex_hull.area
        return float(inter_area) / (union_area + 1e-6)
    except shapely.geos.TopologicalError:
        print('shapely.geos.TopologicalError occured, iou set to 0')
        return 0
|
53 |
+
|
54 |
+
def nms(boxes,overlap):
    # Non-maximum suppression over recognition score (boxes[..][-2]).
    # Returns a keep-flag per input box, in the original box order.
    rec_scores = [b[-2] for b in boxes]
    # indices of boxes sorted by descending recognition score
    indices = sorted(range(len(rec_scores)), key=lambda k: -rec_scores[k])
    box_num = len(boxes)
    nms_flag = [True]*box_num
    for i in range(box_num):
        ii = indices[i]
        if not nms_flag[ii]:
            continue
        for j in range(box_num):
            jj = indices[j]
            if j == i:
                continue
            if not nms_flag[jj]:
                continue
            box1 = boxes[ii]
            box2 = boxes[jj]
            box1_score = rec_scores[ii]
            box2_score = rec_scores[jj]
            str1 = box1[9]
            str2 = box2[9]
            # opposite-corner pairs (unused below; kept as in original)
            box_i = [box1[0],box1[1],box1[4],box1[5]]
            box_j = [box2[0],box2[1],box2[4],box2[5]]
            poly1 = polygon_from_list(box1[0:8])
            poly2 = polygon_from_list(box2[0:8])
            iou = polygon_iou(box1[0:8],box2[0:8])
            thresh = overlap
            if iou > thresh:
                # Suppress the lower-scoring box; ties are broken by
                # polygon area (the smaller polygon is suppressed).
                if box1_score > box2_score:
                    nms_flag[jj] = False
                if box1_score == box2_score and poly1.area > poly2.area:
                    nms_flag[jj] = False
                if box1_score == box2_score and poly1.area<=poly2.area:
                    nms_flag[ii] = False
                    # box ii is gone; stop comparing it against others
                    break

    return nms_flag
|
92 |
+
|
93 |
+
def packing(save_dir, cache_dir, pack_name):
    """
    Zip every file in `save_dir` (flattened with -j, no directory entries)
    into `<cache_dir>/<pack_name>.zip`.

    Relies on the external `zip` binary being available on PATH; the shell
    expands the `save_dir/*` glob.
    """
    # makedirs(exist_ok=True) replaces the racy exists()/mkdir() pair and
    # also creates missing parent directories. (Dropped the unused
    # `files = os.listdir(save_dir)` of the original.)
    os.makedirs(cache_dir, exist_ok=True)
    os.system('zip -r -q -j '+os.path.join(cache_dir, pack_name+'.zip')+' '+save_dir+'/*')
|
98 |
+
|
99 |
+
def test_single(results_dir,lexicon_type=3,cache_dir='./cache_dir',score_det=0.5,score_rec=0.5,score_rec_seq=0.5,overlap=0.2, use_lexicon=True, weighted_ed=True, use_seq=False, use_char=False, mix=False):
    '''
    results_dir: result directory
    score_det: score of detection bounding box
    score_rec: score of the mask recognition branch
    socre_rec_seq: score of the sequence recognition branch
    overlap: overlap threshold used for nms
    lexicon_type: 1 for generic; 2 for weak; 3 for strong
    use_seq: use the recognition result of sequence branch
    use_mix: use both the recognition result of the mask and sequence branches, selected by score
    '''
    # Pipeline: score-filter raw results -> NMS -> (optional) lexicon-based
    # word correction -> write per-image files -> zip for submission.
    print('score_det:', 'score_det:', score_det, 'score_rec:', score_rec, 'score_rec_seq:', score_rec_seq, 'lexicon_type:', lexicon_type, 'weighted_ed:', weighted_ed, 'use_seq:', use_seq, 'use_char:', use_char, 'mix:', mix)
    if not os.path.exists(cache_dir):
        os.mkdir(cache_dir)
    # directory holding the per-image post-NMS result files
    nms_dir = os.path.join(cache_dir,str(score_det)+'_'+str(score_rec)+'_'+str(score_rec_seq))
    if not os.path.exists(nms_dir):
        os.mkdir(nms_dir)
    if lexicon_type==1:
        # generic lexicon
        lexicon_path = '../../lexicons/ic13/GenericVocabulary_new.txt'
        lexicon_fid=open(lexicon_path, 'r')
        # pairs maps the upper-cased lexicon word to its output form
        pair_list = open('../../lexicons/ic13/GenericVocabulary_pair_list.txt', 'r')
        pairs = dict()
        for line in pair_list.readlines():
            line=line.strip()
            word = line.split(' ')[0].upper()
            word_gt = line[len(word)+1:]
            pairs[word] = word_gt
        lexicon_fid=open(lexicon_path, 'r')
        lexicon=[]
        for line in lexicon_fid.readlines():
            line=line.strip()
            lexicon.append(line)
    if lexicon_type==2:
        # weak lexicon
        lexicon_path = '../../lexicons/ic13/ch4_test_vocabulary_new.txt'
        lexicon_fid=open(lexicon_path, 'r')
        pair_list = open('../../lexicons/ic13/ch4_test_vocabulary_pair_list.txt', 'r')
        pairs = dict()
        for line in pair_list.readlines():
            line=line.strip()
            word = line.split(' ')[0].upper()
            word_gt = line[len(word)+1:]
            pairs[word] = word_gt
        lexicon_fid=open(lexicon_path, 'r')
        lexicon=[]
        for line in lexicon_fid.readlines():
            line=line.strip()
            lexicon.append(line)

    # IC13 test set: images img_1 .. img_233
    for i in tqdm(range(1,234)):
        img = 'img_'+str(i)+'.jpg'
        gt_img = 'gt_img_'+str(i)+'.txt'
        if lexicon_type==3:
            # weak
            # (strong, per-image lexicon: one vocabulary file per test image)
            lexicon_path = '../../lexicons/ic13/new_strong_lexicon/new_voc_img_' + str(i) + '.txt'
            lexicon_fid=open(lexicon_path, 'r')
            pair_list = open('../../lexicons/ic13/new_strong_lexicon/pair_voc_img_' + str(i) + '.txt', 'r')
            pairs = dict()
            for line in pair_list.readlines():
                line=line.strip()
                word = line.split(' ')[0].upper()
                word_gt = line[len(word)+1:]
                pairs[word] = word_gt
            lexicon_fid=open(lexicon_path, 'r')
            lexicon=[]
            for line in lexicon_fid.readlines():
                line=line.strip()
                lexicon.append(line)
        result_path = os.path.join(results_dir,'res_img_'+str(i)+'.txt')
        if os.path.isfile(result_path):
            with open(result_path,'r') as f:
                dt_lines = [a.strip() for a in f.readlines()]
            dt_lines = [list_from_str(dt) for dt in dt_lines]
        else:
            dt_lines = []
        # keep detections passing all three score thresholds
        # (dt[-2]=seq score, dt[-3]=mask rec score, dt[-6]... see list_from_str)
        dt_lines = [dt for dt in dt_lines if dt[-2]>score_rec_seq and dt[-3]>score_rec and dt[-6]>score_det]
        nms_flag = nms(dt_lines,overlap)
        boxes = []
        for k in range(len(dt_lines)):
            dt = dt_lines[k]
            if nms_flag[k]:
                if dt not in boxes:
                    boxes.append(dt)

        with open(os.path.join(nms_dir,'res_img_'+str(i)+'.txt'),'w') as f:
            for g in boxes:
                gt_coors = [int(b) for b in g[0:8]]
                # per-character score pickle produced at inference time
                with open('../../../' + g[-1], "rb") as input_file:
                # with open(g[-1], "rb") as input_file:
                    dict_scores = pickle.load(input_file)
                # choose which branch's word/char-scores to use
                if use_char and use_seq:
                    if g[-2]>g[-3]:
                        word = g[-5]
                        scores = dict_scores['seq_char_scores'][:,1:-1].swapaxes(0,1)
                    else:
                        word = g[-4]
                        scores = dict_scores['seg_char_scores']
                elif use_seq:
                    word = g[-5]
                    scores = dict_scores['seq_char_scores'][:,1:-1].swapaxes(0,1)
                else:
                    word = g[-4]
                    scores = dict_scores['seg_char_scores']
                if not use_lexicon:
                    match_word = word
                    match_dist = 0.
                else:
                    match_word, match_dist = find_match_word(word, lexicon, pairs, scores, use_lexicon, weighted_ed)
                # drop weak lexicon matches unless using the generic lexicon
                if match_dist<1.5 or lexicon_type==1:
                    gt_coor_strs = [str(a) for a in gt_coors]+ [match_word]
                    f.write(','.join(gt_coor_strs)+'\r\n')

    pack_name = str(score_det)+'_'+str(score_rec)+'_over'+str(overlap)

    packing(nms_dir,cache_dir,pack_name)
    submit_file_path = os.path.join(cache_dir, pack_name+'.zip')
    return submit_file_path
|
217 |
+
|
218 |
+
def find_match_word(rec_str, lexicon, pairs, scores_numpy, use_ed = True, weighted_ed = False):
    """
    Find the lexicon word closest to `rec_str` by (optionally weighted)
    edit distance.

    rec_str:      recognised word to be corrected
    lexicon:      candidate vocabulary words
    pairs:        maps an upper-cased lexicon word to the string to output
    scores_numpy: per-character score matrix used by the weighted distance
    use_ed:       when False, skip matching and return the input unchanged
    weighted_ed:  use weighted_edit_distance instead of plain edit distance

    Returns a (match_word, match_dist) tuple.
    """
    if not use_ed:
        # Bug fix: the original returned only `rec_str` here, while every
        # other path — and callers that unpack the result — expect a
        # (word, distance) pair.
        return rec_str, 0.
    rec_str = rec_str.upper()
    dist_min = 100
    dist_min_pre = 100
    match_word = ''
    match_dist = 100
    if not weighted_ed:
        for word in lexicon:
            word = word.upper()
            ed = editdistance.eval(rec_str, word)
            length_dist = abs(len(word) - len(rec_str))
            # dist = ed + length_dist
            dist = ed
            if dist<dist_min:
                dist_min = dist
                match_word = pairs[word]
                match_dist = dist
        return match_word, match_dist
    else:
        # First pass: plain edit distance pre-selects a shortlist (anything
        # within dist_min_pre + 2), since the weighted distance is much more
        # expensive to compute.
        small_lexicon_dict = dict()
        for word in lexicon:
            word = word.upper()
            ed = editdistance.eval(rec_str, word)
            small_lexicon_dict[word] = ed
            dist = ed
            if dist<dist_min_pre:
                dist_min_pre = dist
        small_lexicon = []
        for word in small_lexicon_dict:
            if small_lexicon_dict[word]<=dist_min_pre+2:
                small_lexicon.append(word)

        # Second pass: weighted edit distance on the shortlist only.
        for word in small_lexicon:
            word = word.upper()
            ed = weighted_edit_distance(rec_str, word, scores_numpy)
            dist = ed
            if dist<dist_min:
                dist_min = dist
                match_word = pairs[word]
                match_dist = dist
        return match_word, match_dist
|
261 |
+
|
262 |
+
|
263 |
+
def prepare_results_for_evaluation(results_dir, use_lexicon, cache_dir, score_det, score_rec, score_rec_seq):
    """
    Run the full post-processing pipeline (score filtering, NMS, lexicon
    matching, packing) and return the path of the submission ZIP.
    """
    if not os.path.isdir(cache_dir):
        os.mkdir(cache_dir)
    return test_single(results_dir,
                       score_det=score_det,
                       score_rec=score_rec,
                       score_rec_seq=score_rec_seq,
                       overlap=0.2,
                       cache_dir=cache_dir,
                       lexicon_type=3,
                       use_lexicon=use_lexicon,
                       weighted_ed=True,
                       use_seq=True,
                       use_char=True,
                       mix=True)
|
evaluation/rotated_icdar2013/e2e/rrc_evaluation_funcs.py
ADDED
@@ -0,0 +1,369 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
#!/usr/bin/env python2
|
2 |
+
#encoding: UTF-8
|
3 |
+
import json
|
4 |
+
import sys;sys.path.append('./')
|
5 |
+
import zipfile
|
6 |
+
import re
|
7 |
+
import sys
|
8 |
+
import os
|
9 |
+
import codecs
|
10 |
+
import importlib
|
11 |
+
try:
|
12 |
+
from StringIO import StringIO
|
13 |
+
except ImportError:
|
14 |
+
from io import StringIO
|
15 |
+
|
16 |
+
def print_help():
    """Print command-line usage for the evaluation script and exit with status 2."""
    usage = 'Usage: python %s.py -g=<gtFile> -s=<submFile> [-o=<outputFolder> -p=<jsonParams>]' % sys.argv[0]
    sys.stdout.write(usage)
    sys.exit(2)
|
19 |
+
|
20 |
+
|
21 |
+
def load_zip_file_keys(file, fileNameRegExp=''):
    """
    Return the list of ZIP entry names that match ``fileNameRegExp``.

    Each key is the entry name itself or, when the pattern defines capturing
    groups, the first captured group.
    """
    try:
        archive = zipfile.ZipFile(file, mode='r', allowZip64=True)
    except:
        raise Exception('Error loading the ZIP archive.')

    keys = []
    for entry in archive.namelist():
        if fileNameRegExp == "":
            keys.append(entry)
            continue
        match = re.match(fileNameRegExp, entry)
        if match is None:
            continue
        keys.append(match.group(1) if len(match.groups()) > 0 else entry)

    return keys
|
48 |
+
|
49 |
+
|
50 |
+
def load_zip_file(file, fileNameRegExp='', allEntries=False):
    """
    Return a dict {key: entry bytes} for ZIP entries matching ``fileNameRegExp``.

    The key is the entry name or, when the pattern has capturing groups, the
    first captured group. With ``allEntries=True`` every entry in the archive
    must match the pattern, otherwise an exception is raised.
    """
    try:
        archive = zipfile.ZipFile(file, mode='r', allowZip64=True)
    except:
        raise Exception('Error loading the ZIP archive')

    contents = {}
    for entry in archive.namelist():
        matched = True
        key = entry
        if fileNameRegExp != "":
            m = re.match(fileNameRegExp, entry)
            if m is None:
                matched = False
            elif len(m.groups()) > 0:
                key = m.group(1)

        if matched:
            contents[key] = archive.read(entry)
        elif allEntries:
            raise Exception('ZIP entry not valid: %s' % entry)

    return contents
|
80 |
+
|
81 |
+
def decode_utf8(raw):
    """
    Decode ``raw`` bytes as UTF-8, stripping a leading BOM if present.

    Returns a Unicode string on success, or None on failure.
    """
    try:
        raw = codecs.decode(raw, 'utf-8', 'replace')
        # Re-encode so a UTF-8 BOM (if any) can be detected on the byte level.
        raw = raw.encode('utf8')
        if raw.startswith(codecs.BOM_UTF8):
            # BUGFIX: the replacement must be bytes (b''), not str ''. Under
            # Python 3 the original str argument raised TypeError, which the
            # bare except turned into `None` for every BOM-prefixed input.
            raw = raw.replace(codecs.BOM_UTF8, b'', 1)
        return raw.decode('utf-8')
    except:
        return None
|
94 |
+
|
95 |
+
def validate_lines_in_file(fileName, file_contents, CRLF=True, LTRB=True, withTranscription=False, withConfidence=False, imWidth=0, imHeight=0):
    """
    Validate every non-empty line of a GT/submission file by delegating each
    line to validate_tl_line. Raises if the file is not UTF-8 or if any line
    is malformed.
    """
    decoded = decode_utf8(file_contents)
    if decoded is None:
        raise Exception("The file %s is not UTF-8" % fileName)

    separator = "\r\n" if CRLF else "\n"
    for raw_line in decoded.split(separator):
        line = raw_line.replace("\r", "").replace("\n", "")
        if line == "":
            continue
        try:
            validate_tl_line(line, LTRB, withTranscription, withConfidence, imWidth, imHeight)
        except Exception as e:
            raise Exception(("Line in sample not valid. Sample: %s Line: %s Error: %s" % (fileName, line, str(e))).encode('utf-8', 'replace'))
|
111 |
+
|
112 |
+
|
113 |
+
|
114 |
+
def validate_tl_line(line, LTRB=True, withTranscription=True, withConfidence=True, imWidth=0, imHeight=0):
    """
    Validate the format of one text line; an exception is raised if invalid.
    When imWidth/imHeight are given, all points must lie inside the image.
    Accepted formats:
        LTRB=True:  xmin,ymin,xmax,ymax[,confidence][,transcription]
        LTRB=False: x1,y1,x2,y2,x3,y3,x4,y4[,confidence][,transcription]
    """
    # Parsing performs all the validation; the parsed values are discarded.
    get_tl_line_values(line, LTRB, withTranscription, withConfidence, imWidth, imHeight)
|
123 |
+
|
124 |
+
|
125 |
+
def get_tl_line_values(line, LTRB=True, withTranscription=False, withConfidence=False, imWidth=0, imHeight=0):
    """
    Parse one text line of a GT/submission file and validate its format.

    Accepted formats (an exception is raised if the line does not match):
        LTRB=True:  xmin,ymin,xmax,ymax[,confidence][,transcription]
        LTRB=False: x1,y1,x2,y2,x3,y3,x4,y4[,confidence][,transcription]

    When imWidth/imHeight are > 0 the points must lie inside the image.
    Returns (points, confidence, transcription).
    """
    confidence = 0.0
    transcription = ""
    points = []

    if LTRB:
        numPoints = 4

        if withTranscription and withConfidence:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-1].?[0-9]*)\s*,(.*)$', line)
            if m == None:
                # NOTE: the original re-ran the same regex here and discarded
                # the result before raising; that dead statement was removed.
                raise Exception("Format incorrect. Should be: xmin,ymin,xmax,ymax,confidence,transcription")
        elif withConfidence:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-1].?[0-9]*)\s*$', line)
            if m == None:
                raise Exception("Format incorrect. Should be: xmin,ymin,xmax,ymax,confidence")
        elif withTranscription:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*,(.*)$', line)
            if m == None:
                raise Exception("Format incorrect. Should be: xmin,ymin,xmax,ymax,transcription")
        else:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*,?\s*$', line)
            if m == None:
                raise Exception("Format incorrect. Should be: xmin,ymin,xmax,ymax")

        xmin = int(m.group(1))
        ymin = int(m.group(2))
        xmax = int(m.group(3))
        ymax = int(m.group(4))
        if xmax < xmin:
            raise Exception("Xmax value (%s) not valid (Xmax < Xmin)." % (xmax))
        if ymax < ymin:
            raise Exception("Ymax value (%s) not valid (Ymax < Ymin)." % (ymax))

        points = [float(m.group(i)) for i in range(1, numPoints + 1)]

        if imWidth > 0 and imHeight > 0:
            validate_point_inside_bounds(xmin, ymin, imWidth, imHeight)
            validate_point_inside_bounds(xmax, ymax, imWidth, imHeight)

    else:
        numPoints = 8

        if withTranscription and withConfidence:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-1].?[0-9]*)\s*,(.*)$', line)
            if m == None:
                raise Exception("Format incorrect. Should be: x1,y1,x2,y2,x3,y3,x4,y4,confidence,transcription")
        elif withConfidence:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-1].?[0-9]*)\s*$', line)
            if m == None:
                raise Exception("Format incorrect. Should be: x1,y1,x2,y2,x3,y3,x4,y4,confidence")
        elif withTranscription:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,(.*)$', line)
            if m == None:
                raise Exception("Format incorrect. Should be: x1,y1,x2,y2,x3,y3,x4,y4,transcription")
        else:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*$', line)
            if m == None:
                raise Exception("Format incorrect. Should be: x1,y1,x2,y2,x3,y3,x4,y4")

        points = [float(m.group(i)) for i in range(1, numPoints + 1)]

        validate_clockwise_points(points)

        if imWidth > 0 and imHeight > 0:
            validate_point_inside_bounds(points[0], points[1], imWidth, imHeight)
            validate_point_inside_bounds(points[2], points[3], imWidth, imHeight)
            validate_point_inside_bounds(points[4], points[5], imWidth, imHeight)
            validate_point_inside_bounds(points[6], points[7], imWidth, imHeight)

    if withConfidence:
        try:
            confidence = float(m.group(numPoints + 1))
        except ValueError:
            raise Exception("Confidence value must be a float")

    if withTranscription:
        posTranscription = numPoints + (2 if withConfidence else 1)
        transcription = m.group(posTranscription)
        m2 = re.match(r'^\s*\"(.*)\"\s*$', transcription)
        if m2 != None:  # transcription quoted: unescape \\ and \" sequences
            transcription = m2.group(1).replace("\\\\", "\\").replace("\\\"", "\"")

    return points, confidence, transcription
|
223 |
+
|
224 |
+
|
225 |
+
def validate_point_inside_bounds(x, y, imWidth, imHeight):
    """
    Raise an exception if point (x, y) lies outside the image bounds.

    BUGFIX: the original messages referenced the undefined names ``xmin`` /
    ``ymin`` (a NameError at raise time), and the Y message contained more
    %s placeholders than arguments. Both messages now use the actual (x, y).
    """
    if x < 0 or x > imWidth:
        raise Exception("X value (%s) not valid. Image dimensions: (%s,%s)" % (x, imWidth, imHeight))
    if y < 0 or y > imHeight:
        raise Exception("Y value (%s) not valid. Image dimensions: (%s,%s)" % (y, imWidth, imHeight))
|
230 |
+
|
231 |
+
def validate_clockwise_points(points):
    """
    Raise unless the 4 points (8 coordinates) of a quadrilateral are given in
    clockwise order. Image coordinates: origin at the upper left, X to the
    right, Y downwards.
    """
    if len(points) != 8:
        raise Exception("Points list not valid." + str(len(points)))

    corners = [
        [int(points[0]), int(points[1])],
        [int(points[2]), int(points[3])],
        [int(points[4]), int(points[5])],
        [int(points[6]), int(points[7])]
    ]

    # Shoelace-style edge sum: a positive total means counter-clockwise order
    # in image coordinates.
    total = 0
    for idx in range(4):
        nxt = (idx + 1) % 4
        total += (corners[nxt][0] - corners[idx][0]) * (corners[nxt][1] + corners[idx][1])

    if total > 0:
        raise Exception("Points are not clockwise. The coordinates of bounding quadrilaterals have to be given in clockwise order. Regarding the correct interpretation of 'clockwise' remember that the image coordinate system used is the standard one, with the image origin at the upper left, the X axis extending to the right and Y axis extending downwards.")
|
255 |
+
|
256 |
+
def get_tl_line_values_from_file_contents(content, CRLF=True, LTRB=True, withTranscription=False, withConfidence=False, imWidth=0, imHeight=0, sort_by_confidences=True):
    """
    Parse every non-empty line of ``content`` and collect points, confidences
    and transcriptions into three parallel lists. Valid line formats:
        xmin,ymin,xmax,ymax,[confidence],[transcription]
        x1,y1,x2,y2,x3,y3,x4,y4,[confidence],[transcription]
    When confidences are parsed and sorting is requested, all three lists are
    reordered by decreasing confidence.
    """
    pointsList = []
    transcriptionsList = []
    confidencesList = []

    for raw in content.split("\r\n" if CRLF else "\n"):
        stripped = raw.replace("\r", "").replace("\n", "")
        if stripped == "":
            continue
        points, confidence, transcription = get_tl_line_values(stripped, LTRB, withTranscription, withConfidence, imWidth, imHeight)
        pointsList.append(points)
        transcriptionsList.append(transcription)
        confidencesList.append(confidence)

    if withConfidence and len(confidencesList) > 0 and sort_by_confidences:
        import numpy as np
        order = np.argsort(-np.array(confidencesList))
        confidencesList = [confidencesList[i] for i in order]
        pointsList = [pointsList[i] for i in order]
        transcriptionsList = [transcriptionsList[i] for i in order]

    return pointsList, confidencesList, transcriptionsList
|
283 |
+
|
284 |
+
def main_evaluation(p, default_evaluation_params_fn, validate_data_fn, evaluate_method_fn, show_result=True, per_sample=True):
    """
    Validate and evaluate a method submission; optionally dump the results
    (plus per-sample JSON files) into <outputFolder>/results.zip.

    Params:
    p: dict with GT/submission locations ('g'/'s'), optional output folder
       ('o') and optional JSON parameter overrides ('p'). If None, the
       parameters are read from sys.argv.
    default_evaluation_params_fn: returns the default evaluation parameters
    validate_data_fn: validates the submission format (raises on error)
    evaluate_method_fn: evaluates the submission and returns a results dict
    """
    if p == None:
        p = dict([arg[1:].split('=') for arg in sys.argv[1:]])
        if len(sys.argv) < 3:
            print_help()

    params = default_evaluation_params_fn()
    if 'p' in p.keys():
        overrides = p['p'] if isinstance(p['p'], dict) else json.loads(p['p'][1:-1])
        params.update(overrides)

    results = {'calculated': True, 'Message': '', 'method': '{}', 'per_sample': '{}'}
    try:
        validate_data_fn(p['g'], p['s'], params)
        evalData = evaluate_method_fn(p['g'], p['s'], params)
        results.update(evalData)
    except Exception as e:
        results['Message'] = str(e)
        results['calculated'] = False

    writing_output = 'o' in p
    if writing_output:
        if not os.path.exists(p['o']):
            os.makedirs(p['o'])

        archive = zipfile.ZipFile(p['o'] + '/results.zip', mode='w', allowZip64=True)

        # method.json holds only the summary; bulky entries go in separately.
        del results['per_sample']
        if 'output_items' in results.keys():
            del results['output_items']

        archive.writestr('method.json', json.dumps(results))

    if not results['calculated']:
        if show_result:
            sys.stderr.write('Error!\n' + results['Message'] + '\n\n')
        if writing_output:
            archive.close()
        return results

    if writing_output:
        if per_sample == True:
            for name, sample in evalData['per_sample'].items():
                archive.writestr(name + '.json', json.dumps(sample))

        if 'output_items' in evalData.keys():
            for name, payload in evalData['output_items'].items():
                archive.writestr(name, payload)

        archive.close()

    if show_result:
        sys.stdout.write("Calculated!")
        sys.stdout.write(json.dumps(results['method']))

    return results
|
349 |
+
|
350 |
+
|
351 |
+
def main_validation(default_evaluation_params_fn, validate_data_fn):
    """
    Validate a method submission using parameters taken from sys.argv.

    Params:
    default_evaluation_params_fn: returns the default evaluation parameters
    validate_data_fn: validates the submission format (raises on error)

    Exits with status 0 (printing 'SUCCESS') on success, 101 on failure.
    """
    try:
        cli = dict([arg[1:].split('=') for arg in sys.argv[1:]])
        params = default_evaluation_params_fn()
        if 'p' in cli.keys():
            params.update(cli['p'] if isinstance(cli['p'], dict) else json.loads(cli['p'][1:-1]))

        validate_data_fn(cli['g'], cli['s'], params)
        print('SUCCESS')
        sys.exit(0)  # SystemExit is not an Exception, so it escapes the handler
    except Exception as e:
        print(str(e))
        sys.exit(101)
|
evaluation/rotated_icdar2013/e2e/script.py
ADDED
@@ -0,0 +1,460 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
#!/usr/bin/env python
|
2 |
+
# -*- coding: utf-8 -*-
|
3 |
+
# encoding=utf8
|
4 |
+
from collections import namedtuple
|
5 |
+
import rrc_evaluation_funcs
|
6 |
+
import importlib
|
7 |
+
from prepare_results import prepare_results_for_evaluation
|
8 |
+
|
9 |
+
def evaluation_imports():
    """
    evaluation_imports: Dictionary (key = module name, value = alias) with the
    python modules required by the evaluation.
    """
    modules = {}
    modules['Polygon'] = 'plg'
    modules['numpy'] = 'np'
    return modules
|
17 |
+
|
18 |
+
def default_evaluation_params():
    """
    default_evaluation_params: Default parameters to use for the validation
    and evaluation.
    """
    params = {}
    params['IOU_CONSTRAINT'] = 0.5
    params['AREA_PRECISION_CONSTRAINT'] = 0.5
    params['WORD_SPOTTING'] = False
    params['MIN_LENGTH_CARE_WORD'] = 3
    params['GT_SAMPLE_NAME_2_ID'] = 'gt_img_([0-9]+).txt'
    params['DET_SAMPLE_NAME_2_ID'] = 'res_img_([0-9]+).txt'
    # LTRB: 2 points (left,top,right,bottom) or 4 points (x1,y1,...,x4,y4)
    params['LTRB'] = False
    # Lines delimited by Windows CRLF format
    params['CRLF'] = False
    # Detections must include a confidence value; MAP and MAR will be calculated
    params['CONFIDENCES'] = False
    params['SPECIAL_CHARACTERS'] = '!?.:,*"()·[]/\''
    params['ONLY_REMOVE_FIRST_LAST_CHARACTER'] = True
    return params
|
35 |
+
|
36 |
+
def validate_data(gtFilePath, submFilePath, evaluationParams):
    """
    Method validate_data: checks that every file in the results folder has
    the correct name and contents, and that no submitted sample is missing
    from the ground truth. Raises on the first error detected.
    """
    gt = rrc_evaluation_funcs.load_zip_file(gtFilePath, evaluationParams['GT_SAMPLE_NAME_2_ID'])

    subm = rrc_evaluation_funcs.load_zip_file(submFilePath, evaluationParams['DET_SAMPLE_NAME_2_ID'], True)

    # Ground truth: each entry must parse (with transcriptions).
    for sample_id in gt:
        rrc_evaluation_funcs.validate_lines_in_file(sample_id, gt[sample_id], evaluationParams['CRLF'], evaluationParams['LTRB'], True)

    # Results: each entry must match a GT sample and parse correctly.
    for sample_id in subm:
        if (sample_id in gt) == False:
            raise Exception("The sample %s not present in GT" % sample_id)

        rrc_evaluation_funcs.validate_lines_in_file(sample_id, subm[sample_id], evaluationParams['CRLF'], evaluationParams['LTRB'], True, evaluationParams['CONFIDENCES'])
|
56 |
+
|
57 |
+
|
58 |
+
def evaluate_method(gtFilePath, submFilePath, evaluationParams):
|
59 |
+
"""
|
60 |
+
Method evaluate_method: evaluate method and returns the results
|
61 |
+
Results. Dictionary with the following values:
|
62 |
+
- method (required) Global method metrics. Ex: { 'Precision':0.8,'Recall':0.9 }
|
63 |
+
- samples (optional) Per sample metrics. Ex: {'sample1' : { 'Precision':0.8,'Recall':0.9 } , 'sample2' : { 'Precision':0.8,'Recall':0.9 }
|
64 |
+
"""
|
65 |
+
for module,alias in evaluation_imports().items():
|
66 |
+
globals()[alias] = importlib.import_module(module)
|
67 |
+
|
68 |
+
def polygon_from_points(points,correctOffset=False):
|
69 |
+
"""
|
70 |
+
Returns a Polygon object to use with the Polygon2 class from a list of 8 points: x1,y1,x2,y2,x3,y3,x4,y4
|
71 |
+
"""
|
72 |
+
|
73 |
+
if correctOffset: #this will substract 1 from the coordinates that correspond to the xmax and ymax
|
74 |
+
points[2] -= 1
|
75 |
+
points[4] -= 1
|
76 |
+
points[5] -= 1
|
77 |
+
points[7] -= 1
|
78 |
+
|
79 |
+
resBoxes=np.empty([1,8],dtype='int32')
|
80 |
+
resBoxes[0,0]=int(points[0])
|
81 |
+
resBoxes[0,4]=int(points[1])
|
82 |
+
resBoxes[0,1]=int(points[2])
|
83 |
+
resBoxes[0,5]=int(points[3])
|
84 |
+
resBoxes[0,2]=int(points[4])
|
85 |
+
resBoxes[0,6]=int(points[5])
|
86 |
+
resBoxes[0,3]=int(points[6])
|
87 |
+
resBoxes[0,7]=int(points[7])
|
88 |
+
pointMat = resBoxes[0].reshape([2,4]).T
|
89 |
+
return plg.Polygon( pointMat)
|
90 |
+
|
91 |
+
def rectangle_to_polygon(rect):
|
92 |
+
resBoxes=np.empty([1,8],dtype='int32')
|
93 |
+
resBoxes[0,0]=int(rect.xmin)
|
94 |
+
resBoxes[0,4]=int(rect.ymax)
|
95 |
+
resBoxes[0,1]=int(rect.xmin)
|
96 |
+
resBoxes[0,5]=int(rect.ymin)
|
97 |
+
resBoxes[0,2]=int(rect.xmax)
|
98 |
+
resBoxes[0,6]=int(rect.ymin)
|
99 |
+
resBoxes[0,3]=int(rect.xmax)
|
100 |
+
resBoxes[0,7]=int(rect.ymax)
|
101 |
+
|
102 |
+
pointMat = resBoxes[0].reshape([2,4]).T
|
103 |
+
|
104 |
+
return plg.Polygon( pointMat)
|
105 |
+
|
106 |
+
def rectangle_to_points(rect):
|
107 |
+
points = [int(rect.xmin), int(rect.ymax), int(rect.xmax), int(rect.ymax), int(rect.xmax), int(rect.ymin), int(rect.xmin), int(rect.ymin)]
|
108 |
+
return points
|
109 |
+
|
110 |
+
def get_union(pD,pG):
|
111 |
+
areaA = pD.area();
|
112 |
+
areaB = pG.area();
|
113 |
+
return areaA + areaB - get_intersection(pD, pG);
|
114 |
+
|
115 |
+
def get_intersection_over_union(pD,pG):
|
116 |
+
try:
|
117 |
+
return get_intersection(pD, pG) / get_union(pD, pG);
|
118 |
+
except:
|
119 |
+
return 0
|
120 |
+
|
121 |
+
def get_intersection(pD,pG):
|
122 |
+
pInt = pD & pG
|
123 |
+
if len(pInt) == 0:
|
124 |
+
return 0
|
125 |
+
return pInt.area()
|
126 |
+
|
127 |
+
def compute_ap(confList, matchList,numGtCare):
|
128 |
+
correct = 0
|
129 |
+
AP = 0
|
130 |
+
if len(confList)>0:
|
131 |
+
confList = np.array(confList)
|
132 |
+
matchList = np.array(matchList)
|
133 |
+
sorted_ind = np.argsort(-confList)
|
134 |
+
confList = confList[sorted_ind]
|
135 |
+
matchList = matchList[sorted_ind]
|
136 |
+
for n in range(len(confList)):
|
137 |
+
match = matchList[n]
|
138 |
+
if match:
|
139 |
+
correct += 1
|
140 |
+
AP += float(correct)/(n + 1)
|
141 |
+
|
142 |
+
if numGtCare>0:
|
143 |
+
AP /= numGtCare
|
144 |
+
|
145 |
+
return AP
|
146 |
+
|
147 |
+
def transcription_match(transGt,transDet,specialCharacters='!?.:,*"()·[]/\'',onlyRemoveFirstLastCharacterGT=True):
|
148 |
+
|
149 |
+
if onlyRemoveFirstLastCharacterGT:
|
150 |
+
#special characters in GT are allowed only at initial or final position
|
151 |
+
if (transGt==transDet):
|
152 |
+
return True
|
153 |
+
|
154 |
+
if specialCharacters.find(transGt[0])>-1:
|
155 |
+
if transGt[1:]==transDet:
|
156 |
+
return True
|
157 |
+
|
158 |
+
if specialCharacters.find(transGt[-1])>-1:
|
159 |
+
if transGt[0:len(transGt)-1]==transDet:
|
160 |
+
return True
|
161 |
+
|
162 |
+
if specialCharacters.find(transGt[0])>-1 and specialCharacters.find(transGt[-1])>-1:
|
163 |
+
if transGt[1:len(transGt)-1]==transDet:
|
164 |
+
return True
|
165 |
+
return False
|
166 |
+
else:
|
167 |
+
#Special characters are removed from the begining and the end of both Detection and GroundTruth
|
168 |
+
while len(transGt)>0 and specialCharacters.find(transGt[0])>-1:
|
169 |
+
transGt = transGt[1:]
|
170 |
+
|
171 |
+
while len(transDet)>0 and specialCharacters.find(transDet[0])>-1:
|
172 |
+
transDet = transDet[1:]
|
173 |
+
|
174 |
+
while len(transGt)>0 and specialCharacters.find(transGt[-1])>-1 :
|
175 |
+
transGt = transGt[0:len(transGt)-1]
|
176 |
+
|
177 |
+
while len(transDet)>0 and specialCharacters.find(transDet[-1])>-1:
|
178 |
+
transDet = transDet[0:len(transDet)-1]
|
179 |
+
|
180 |
+
return transGt == transDet
|
181 |
+
|
182 |
+
|
183 |
+
def include_in_dictionary(transcription):
|
184 |
+
"""
|
185 |
+
Function used in Word Spotting that finds if the Ground Truth transcription meets the rules to enter into the dictionary. If not, the transcription will be cared as don't care
|
186 |
+
"""
|
187 |
+
#special case 's at final
|
188 |
+
if transcription[len(transcription)-2:]=="'s" or transcription[len(transcription)-2:]=="'S":
|
189 |
+
transcription = transcription[0:len(transcription)-2]
|
190 |
+
|
191 |
+
#hypens at init or final of the word
|
192 |
+
transcription = transcription.strip('-');
|
193 |
+
|
194 |
+
specialCharacters = "'!?.:,*\"()·[]/";
|
195 |
+
for character in specialCharacters:
|
196 |
+
transcription = transcription.replace(character,' ')
|
197 |
+
|
198 |
+
transcription = transcription.strip()
|
199 |
+
|
200 |
+
if len(transcription) != len(transcription.replace(" ","")) :
|
201 |
+
return False;
|
202 |
+
|
203 |
+
if len(transcription) < evaluationParams['MIN_LENGTH_CARE_WORD']:
|
204 |
+
return False;
|
205 |
+
|
206 |
+
notAllowed = "×÷·";
|
207 |
+
|
208 |
+
range1 = [ ord(u'a'), ord(u'z') ]
|
209 |
+
range2 = [ ord(u'A'), ord(u'Z') ]
|
210 |
+
range3 = [ ord(u'À'), ord(u'ƿ') ]
|
211 |
+
range4 = [ ord(u'DŽ'), ord(u'ɿ') ]
|
212 |
+
range5 = [ ord(u'Ά'), ord(u'Ͽ') ]
|
213 |
+
range6 = [ ord(u'-'), ord(u'-') ]
|
214 |
+
|
215 |
+
for char in transcription :
|
216 |
+
charCode = ord(char)
|
217 |
+
if(notAllowed.find(char) != -1):
|
218 |
+
return False
|
219 |
+
|
220 |
+
valid = ( charCode>=range1[0] and charCode<=range1[1] ) or ( charCode>=range2[0] and charCode<=range2[1] ) or ( charCode>=range3[0] and charCode<=range3[1] ) or ( charCode>=range4[0] and charCode<=range4[1] ) or ( charCode>=range5[0] and charCode<=range5[1] ) or ( charCode>=range6[0] and charCode<=range6[1] )
|
221 |
+
if valid == False:
|
222 |
+
return False
|
223 |
+
|
224 |
+
return True
|
225 |
+
|
226 |
+
def include_in_dictionary_transcription(transcription):
|
227 |
+
"""
|
228 |
+
Function applied to the Ground Truth transcriptions used in Word Spotting. It removes special characters or terminations
|
229 |
+
"""
|
230 |
+
#special case 's at final
|
231 |
+
if transcription[len(transcription)-2:]=="'s" or transcription[len(transcription)-2:]=="'S":
|
232 |
+
transcription = transcription[0:len(transcription)-2]
|
233 |
+
|
234 |
+
#hypens at init or final of the word
|
235 |
+
transcription = transcription.strip('-');
|
236 |
+
|
237 |
+
specialCharacters = "'!?.:,*\"()·[]/";
|
238 |
+
for character in specialCharacters:
|
239 |
+
transcription = transcription.replace(character,' ')
|
240 |
+
|
241 |
+
transcription = transcription.strip()
|
242 |
+
|
243 |
+
return transcription
|
244 |
+
|
245 |
+
perSampleMetrics = {}
|
246 |
+
|
247 |
+
matchedSum = 0
|
248 |
+
|
249 |
+
Rectangle = namedtuple('Rectangle', 'xmin ymin xmax ymax')
|
250 |
+
|
251 |
+
gt = rrc_evaluation_funcs.load_zip_file(gtFilePath,evaluationParams['GT_SAMPLE_NAME_2_ID'])
|
252 |
+
subm = rrc_evaluation_funcs.load_zip_file(submFilePath,evaluationParams['DET_SAMPLE_NAME_2_ID'],True)
|
253 |
+
|
254 |
+
numGlobalCareGt = 0;
|
255 |
+
numGlobalCareDet = 0;
|
256 |
+
|
257 |
+
arrGlobalConfidences = [];
|
258 |
+
arrGlobalMatches = [];
|
259 |
+
|
260 |
+
for resFile in gt:
|
261 |
+
|
262 |
+
gtFile = rrc_evaluation_funcs.decode_utf8(gt[resFile])
|
263 |
+
if (gtFile is None) :
|
264 |
+
raise Exception("The file %s is not UTF-8" %resFile)
|
265 |
+
|
266 |
+
recall = 0
|
267 |
+
precision = 0
|
268 |
+
hmean = 0
|
269 |
+
detCorrect = 0
|
270 |
+
iouMat = np.empty([1,1])
|
271 |
+
gtPols = []
|
272 |
+
detPols = []
|
273 |
+
gtTrans = []
|
274 |
+
detTrans = []
|
275 |
+
gtPolPoints = []
|
276 |
+
detPolPoints = []
|
277 |
+
gtDontCarePolsNum = [] #Array of Ground Truth Polygons' keys marked as don't Care
|
278 |
+
detDontCarePolsNum = [] #Array of Detected Polygons' matched with a don't Care GT
|
279 |
+
detMatchedNums = []
|
280 |
+
pairs = []
|
281 |
+
|
282 |
+
arrSampleConfidences = [];
|
283 |
+
arrSampleMatch = [];
|
284 |
+
sampleAP = 0;
|
285 |
+
|
286 |
+
evaluationLog = ""
|
287 |
+
|
288 |
+
pointsList,_,transcriptionsList = rrc_evaluation_funcs.get_tl_line_values_from_file_contents(gtFile,evaluationParams['CRLF'],evaluationParams['LTRB'],True,False)
|
289 |
+
for n in range(len(pointsList)):
|
290 |
+
points = pointsList[n]
|
291 |
+
transcription = transcriptionsList[n]
|
292 |
+
dontCare = transcription == "###"
|
293 |
+
if evaluationParams['LTRB']:
|
294 |
+
gtRect = Rectangle(*points)
|
295 |
+
gtPol = rectangle_to_polygon(gtRect)
|
296 |
+
else:
|
297 |
+
gtPol = polygon_from_points(points)
|
298 |
+
gtPols.append(gtPol)
|
299 |
+
gtPolPoints.append(points)
|
300 |
+
|
301 |
+
#On word spotting we will filter some transcriptions with special characters
|
302 |
+
if evaluationParams['WORD_SPOTTING'] :
|
303 |
+
if dontCare == False :
|
304 |
+
if include_in_dictionary(transcription) == False :
|
305 |
+
dontCare = True
|
306 |
+
else:
|
307 |
+
transcription = include_in_dictionary_transcription(transcription)
|
308 |
+
|
309 |
+
gtTrans.append(transcription)
|
310 |
+
if dontCare:
|
311 |
+
gtDontCarePolsNum.append( len(gtPols)-1 )
|
312 |
+
|
313 |
+
evaluationLog += "GT polygons: " + str(len(gtPols)) + (" (" + str(len(gtDontCarePolsNum)) + " don't care)\n" if len(gtDontCarePolsNum)>0 else "\n")
|
314 |
+
|
315 |
+
if resFile in subm:
|
316 |
+
|
317 |
+
detFile = rrc_evaluation_funcs.decode_utf8(subm[resFile])
|
318 |
+
|
319 |
+
pointsList,confidencesList,transcriptionsList = rrc_evaluation_funcs.get_tl_line_values_from_file_contents(detFile,evaluationParams['CRLF'],evaluationParams['LTRB'],True,evaluationParams['CONFIDENCES'])
|
320 |
+
|
321 |
+
for n in range(len(pointsList)):
|
322 |
+
points = pointsList[n]
|
323 |
+
transcription = transcriptionsList[n]
|
324 |
+
|
325 |
+
if evaluationParams['LTRB']:
|
326 |
+
detRect = Rectangle(*points)
|
327 |
+
detPol = rectangle_to_polygon(detRect)
|
328 |
+
else:
|
329 |
+
detPol = polygon_from_points(points)
|
330 |
+
detPols.append(detPol)
|
331 |
+
detPolPoints.append(points)
|
332 |
+
detTrans.append(transcription)
|
333 |
+
|
334 |
+
if len(gtDontCarePolsNum)>0 :
|
335 |
+
for dontCarePol in gtDontCarePolsNum:
|
336 |
+
dontCarePol = gtPols[dontCarePol]
|
337 |
+
intersected_area = get_intersection(dontCarePol,detPol)
|
338 |
+
pdDimensions = detPol.area()
|
339 |
+
precision = 0 if pdDimensions == 0 else intersected_area / pdDimensions
|
340 |
+
if (precision > evaluationParams['AREA_PRECISION_CONSTRAINT'] ):
|
341 |
+
detDontCarePolsNum.append( len(detPols)-1 )
|
342 |
+
break
|
343 |
+
|
344 |
+
evaluationLog += "DET polygons: " + str(len(detPols)) + (" (" + str(len(detDontCarePolsNum)) + " don't care)\n" if len(detDontCarePolsNum)>0 else "\n")
|
345 |
+
|
346 |
+
if len(gtPols)>0 and len(detPols)>0:
|
347 |
+
#Calculate IoU and precision matrixs
|
348 |
+
outputShape=[len(gtPols),len(detPols)]
|
349 |
+
iouMat = np.empty(outputShape)
|
350 |
+
gtRectMat = np.zeros(len(gtPols),np.int8)
|
351 |
+
detRectMat = np.zeros(len(detPols),np.int8)
|
352 |
+
for gtNum in range(len(gtPols)):
|
353 |
+
for detNum in range(len(detPols)):
|
354 |
+
pG = gtPols[gtNum]
|
355 |
+
pD = detPols[detNum]
|
356 |
+
iouMat[gtNum,detNum] = get_intersection_over_union(pD,pG)
|
357 |
+
|
358 |
+
for gtNum in range(len(gtPols)):
|
359 |
+
for detNum in range(len(detPols)):
|
360 |
+
if gtRectMat[gtNum] == 0 and detRectMat[detNum] == 0 and gtNum not in gtDontCarePolsNum and detNum not in detDontCarePolsNum :
|
361 |
+
if iouMat[gtNum,detNum]>evaluationParams['IOU_CONSTRAINT']:
|
362 |
+
gtRectMat[gtNum] = 1
|
363 |
+
detRectMat[detNum] = 1
|
364 |
+
#detection matched only if transcription is equal
|
365 |
+
if evaluationParams['WORD_SPOTTING']:
|
366 |
+
correct = gtTrans[gtNum].upper() == detTrans[detNum].upper()
|
367 |
+
else:
|
368 |
+
correct = transcription_match(gtTrans[gtNum].upper(),detTrans[detNum].upper(),evaluationParams['SPECIAL_CHARACTERS'],evaluationParams['ONLY_REMOVE_FIRST_LAST_CHARACTER'])==True
|
369 |
+
detCorrect += (1 if correct else 0)
|
370 |
+
if correct:
|
371 |
+
detMatchedNums.append(detNum)
|
372 |
+
pairs.append({'gt':gtNum,'det':detNum,'correct':correct})
|
373 |
+
evaluationLog += "Match GT #" + str(gtNum) + " with Det #" + str(detNum) + " trans. correct: " + str(correct) + "\n"
|
374 |
+
|
375 |
+
if evaluationParams['CONFIDENCES']:
|
376 |
+
for detNum in range(len(detPols)):
|
377 |
+
if detNum not in detDontCarePolsNum :
|
378 |
+
#we exclude the don't care detections
|
379 |
+
match = detNum in detMatchedNums
|
380 |
+
|
381 |
+
arrSampleConfidences.append(confidencesList[detNum])
|
382 |
+
arrSampleMatch.append(match)
|
383 |
+
|
384 |
+
arrGlobalConfidences.append(confidencesList[detNum]);
|
385 |
+
arrGlobalMatches.append(match);
|
386 |
+
|
387 |
+
numGtCare = (len(gtPols) - len(gtDontCarePolsNum))
|
388 |
+
numDetCare = (len(detPols) - len(detDontCarePolsNum))
|
389 |
+
if numGtCare == 0:
|
390 |
+
recall = float(1)
|
391 |
+
precision = float(0) if numDetCare >0 else float(1)
|
392 |
+
sampleAP = precision
|
393 |
+
else:
|
394 |
+
recall = float(detCorrect) / numGtCare
|
395 |
+
precision = 0 if numDetCare==0 else float(detCorrect) / numDetCare
|
396 |
+
if evaluationParams['CONFIDENCES']:
|
397 |
+
sampleAP = compute_ap(arrSampleConfidences, arrSampleMatch, numGtCare )
|
398 |
+
|
399 |
+
hmean = 0 if (precision + recall)==0 else 2.0 * precision * recall / (precision + recall)
|
400 |
+
|
401 |
+
matchedSum += detCorrect
|
402 |
+
numGlobalCareGt += numGtCare
|
403 |
+
numGlobalCareDet += numDetCare
|
404 |
+
|
405 |
+
perSampleMetrics[resFile] = {
|
406 |
+
'precision':precision,
|
407 |
+
'recall':recall,
|
408 |
+
'hmean':hmean,
|
409 |
+
'pairs':pairs,
|
410 |
+
'AP':sampleAP,
|
411 |
+
'iouMat':[] if len(detPols)>100 else iouMat.tolist(),
|
412 |
+
'gtPolPoints':gtPolPoints,
|
413 |
+
'detPolPoints':detPolPoints,
|
414 |
+
'gtTrans':gtTrans,
|
415 |
+
'detTrans':detTrans,
|
416 |
+
'gtDontCare':gtDontCarePolsNum,
|
417 |
+
'detDontCare':detDontCarePolsNum,
|
418 |
+
'evaluationParams': evaluationParams,
|
419 |
+
'evaluationLog': evaluationLog
|
420 |
+
}
|
421 |
+
|
422 |
+
# Compute AP
|
423 |
+
AP = 0
|
424 |
+
if evaluationParams['CONFIDENCES']:
|
425 |
+
AP = compute_ap(arrGlobalConfidences, arrGlobalMatches, numGlobalCareGt)
|
426 |
+
|
427 |
+
methodRecall = 0 if numGlobalCareGt == 0 else float(matchedSum)/numGlobalCareGt
|
428 |
+
methodPrecision = 0 if numGlobalCareDet == 0 else float(matchedSum)/numGlobalCareDet
|
429 |
+
methodHmean = 0 if methodRecall + methodPrecision==0 else 2* methodRecall * methodPrecision / (methodRecall + methodPrecision)
|
430 |
+
|
431 |
+
methodMetrics = {'precision':methodPrecision, 'recall':methodRecall,'hmean': methodHmean, 'AP': AP }
|
432 |
+
|
433 |
+
resDict = {'calculated':True,'Message':'','method': methodMetrics,'per_sample': perSampleMetrics}
|
434 |
+
|
435 |
+
|
436 |
+
return resDict;
|
437 |
+
|
438 |
+
|
439 |
+
|
440 |
+
if __name__=='__main__':
    '''
    results_dir: result directory
    score_det: score of detection bounding box
    score_rec: score of the mask recognition branch
    score_rec_seq: score of the sequence recognition branch
    lexicon_type: 1 for generic; 2 for weak; 3 for strong
    '''
    # Rotation angle of the rotated ICDAR2013 test split being evaluated;
    # it selects both the inference output folder and the matching GT zip.
    angle = 45
    results_dir = '../../../output/mixtrain/inference/rotated_ic13_test_' + str(angle) + '/model_0250000_1000_results/'
    # Score thresholds applied when filtering raw detections
    # (sequence head, mask-recognition head, detection head).
    score_rec_seq = 0.9
    score_rec = 0.4
    score_det = 0.1
    # Filter, NMS and zip the raw results into a submission file for the evaluator.
    evaluate_result_path = prepare_results_for_evaluation(results_dir,
        use_lexicon=False, cache_dir='./cache_files',
        score_det=score_det, score_rec=score_rec, score_rec_seq=score_rec_seq)
    # 'g' = ground-truth zip for this angle, 's' = submission zip produced above.
    p = {
        'g': '../gt/gt_'+str(angle)+'.zip',
        's': evaluate_result_path
    }
    rrc_evaluation_funcs.main_evaluation(p,default_evaluation_params,validate_data,evaluate_method)
|
evaluation/rotated_icdar2013/gt/gt.zip
ADDED
Binary file (65.2 kB). View file
|
|
evaluation/rotated_icdar2013/gt/gt_-15.zip
ADDED
Binary file (64.9 kB). View file
|
|
evaluation/rotated_icdar2013/gt/gt_-30.zip
ADDED
Binary file (65.2 kB). View file
|
|
evaluation/rotated_icdar2013/gt/gt_-45.zip
ADDED
Binary file (65.2 kB). View file
|
|
evaluation/rotated_icdar2013/gt/gt_-60.zip
ADDED
Binary file (65.2 kB). View file
|
|
evaluation/rotated_icdar2013/gt/gt_-75.zip
ADDED
Binary file (64.9 kB). View file
|
|
evaluation/rotated_icdar2013/gt/gt_-90.zip
ADDED
Binary file (59.9 kB). View file
|
|
evaluation/rotated_icdar2013/gt/gt_0.zip
ADDED
Binary file (59.6 kB). View file
|
|
evaluation/rotated_icdar2013/gt/gt_15.zip
ADDED
Binary file (65 kB). View file
|
|
evaluation/rotated_icdar2013/gt/gt_30.zip
ADDED
Binary file (65.2 kB). View file
|
|
evaluation/rotated_icdar2013/gt/gt_45.zip
ADDED
Binary file (65.2 kB). View file
|
|
evaluation/rotated_icdar2013/gt/gt_60.zip
ADDED
Binary file (65.2 kB). View file
|
|
evaluation/rotated_icdar2013/gt/gt_75.zip
ADDED
Binary file (64.9 kB). View file
|
|
evaluation/rotated_icdar2013/gt/gt_85.zip
ADDED
Binary file (64.4 kB). View file
|
|
evaluation/rotated_icdar2013/gt/gt_90.zip
ADDED
Binary file (59.7 kB). View file
|
|
evaluation/totaltext/e2e/prepare_results.py
ADDED
@@ -0,0 +1,234 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
#!/usr/bin/env python3
|
2 |
+
# -*- coding: utf-8 -*-
|
3 |
+
import sys
|
4 |
+
import os
|
5 |
+
import glob
|
6 |
+
sys.path.append('./')
|
7 |
+
import shapely
|
8 |
+
from shapely.geometry import Polygon,MultiPoint
|
9 |
+
import numpy as np
|
10 |
+
import editdistance
|
11 |
+
sys.path.append('../../')
|
12 |
+
from weighted_editdistance import weighted_edit_distance
|
13 |
+
from tqdm import tqdm
|
14 |
+
try:
|
15 |
+
import pickle
|
16 |
+
except ImportError:
|
17 |
+
import cPickle as pickle
|
18 |
+
|
19 |
+
def list_from_str(st):
    # Parse one semicolon-separated detection line ("name;coords;scores") into a
    # flat list: the polygon coordinates as floats followed by six trailing score
    # and word fields pulled from the end of the scores section.
    fields = st.split(';')
    segms = fields[1].split(',')
    scores = fields[2].split(',')
    parsed = [float(a) for a in segms]
    parsed.append(float(scores[-4]))
    parsed.append(scores[-5])
    parsed.append(scores[-6])
    parsed.append(float(scores[-3]))
    parsed.append(float(scores[-2]))
    parsed.append(scores[-1])
    return parsed
|
25 |
+
|
26 |
+
def polygon_from_list(line):
    """
    Create a shapely polygon object (convex hull) from a flat
    [x1, y1, x2, y2, ...] gt or dt coordinate list.
    """
    pts = np.array(line).reshape(-1, 2)
    return Polygon(pts).convex_hull
|
33 |
+
|
34 |
+
def polygon_iou(list1, list2):
    """
    Intersection over union between the convex hulls of two flat coordinate
    lists. The union area is taken as the convex hull of all points combined.
    """
    pts1 = np.array(list1).reshape(-1, 2)
    pts2 = np.array(list2).reshape(-1, 2)
    poly1 = Polygon(pts1).convex_hull
    poly2 = Polygon(pts2).convex_hull
    union_poly = np.concatenate((pts1, pts2))
    # Cheap rejection: disjoint hulls have zero intersection.
    if not poly1.intersects(poly2):
        return 0
    try:
        inter_area = poly1.intersection(poly2).area
        #union_area = poly1.area + poly2.area - inter_area
        union_area = MultiPoint(union_poly).convex_hull.area
        return float(inter_area) / (union_area+1e-6)
    except shapely.geos.TopologicalError:
        print('shapely.geos.TopologicalError occured, iou set to 0')
        return 0
|
55 |
+
|
56 |
+
def nms(boxes,overlap):
    """
    Greedy polygon non-maximum suppression.

    boxes: detection lists as produced by list_from_str; boxes[i][-6] is the
           score used for ranking, boxes[i][0:-6] the polygon coordinates.
    overlap: IoU threshold above which two boxes are considered duplicates.
    Returns a list of booleans, one per box; True means the box is kept.
    """
    rec_scores = [b[-6] for b in boxes]
    # Visit boxes from highest to lowest score.
    indices = sorted(range(len(rec_scores)), key=lambda k: -rec_scores[k])
    box_num = len(boxes)
    nms_flag = [True]*box_num
    for i in range(box_num):
        ii = indices[i]
        if not nms_flag[ii]:
            continue
        for j in range(box_num):
            jj = indices[j]
            if j == i:
                continue
            if not nms_flag[jj]:
                continue
            box1 = boxes[ii]
            box2 = boxes[jj]
            box1_score = rec_scores[ii]
            box2_score = rec_scores[jj]
            poly1 = polygon_from_list(box1[0:-6])
            poly2 = polygon_from_list(box2[0:-6])
            iou = polygon_iou(box1[0:-6],box2[0:-6])
            if iou > overlap:
                # Suppress the lower-scored box; on a score tie keep the larger
                # polygon (and stop scanning once the current box itself loses).
                if box1_score > box2_score:
                    nms_flag[jj] = False
                if box1_score == box2_score and poly1.area > poly2.area:
                    nms_flag[jj] = False
                if box1_score == box2_score and poly1.area<=poly2.area:
                    nms_flag[ii] = False
                    break
    return nms_flag
|
94 |
+
|
95 |
+
def packing(save_dir, cache_dir, pack_name):
    """
    Zip every file in save_dir into cache_dir/<pack_name>.zip.

    The -j flag junks directory paths so the archive is flat; -q keeps the
    external `zip` tool quiet. Creates cache_dir if missing.
    NOTE(review): paths are interpolated into a shell command unquoted —
    paths containing spaces or shell metacharacters will break; confirm
    callers only pass safe paths.
    """
    if not os.path.exists(cache_dir):
        os.mkdir(cache_dir)
    os.system('zip -r -q -j '+os.path.join(cache_dir, pack_name+'.zip')+' '+save_dir+'/*')
|
100 |
+
|
101 |
+
def test_single(results_dir,lexicon_type=3,cache_dir='./cache_dir',score_det=0.5,score_rec=0.5,score_rec_seq=0.5,overlap=0.2, use_lexicon=True, weighted_ed=True, use_seq=False, use_char=False, mix=False):
    '''
    Filter raw detection files by score, apply polygon NMS, optionally correct
    each word with a lexicon, write per-image gt_*.txt files, and zip them into
    a submission archive whose path is returned.

    results_dir: result directory
    score_det: score of detection bounding box
    score_rec: score of the mask recognition branch
    score_rec_seq: score of the sequence recognition branch
    overlap: overlap threshold used for nms
    lexicon_type: 1 for generic; 2 for weak; 3 for strong
    use_seq: use the recognition result of sequence branch
    use_char: use the recognition result of the mask (character) branch
    mix: use both the recognition result of the mask and sequence branches, selected by score
         NOTE(review): `mix` is never read in this body — selection is driven by
         use_char/use_seq; confirm whether it was meant to gate the g[-2]>g[-3] test.
    '''
    print('score_det:', 'score_det:', score_det, 'score_rec:', score_rec, 'score_rec_seq:', score_rec_seq, 'overlap:', overlap,'lexicon_type:', lexicon_type, 'weighted_ed:', weighted_ed, 'use_seq:', use_seq, 'use_char:', use_char, 'mix:', mix)
    if not os.path.exists(cache_dir):
        os.mkdir(cache_dir)
    # Per-threshold-combination output directory for the NMS'd text files.
    nms_dir = os.path.join(cache_dir,str(score_det)+'_'+str(score_rec)+'_'+str(score_rec_seq))
    if not os.path.exists(nms_dir):
        os.mkdir(nms_dir)
    if use_lexicon and lexicon_type==2:
        # weak lexicon
        # pairs maps UPPERCASED lexicon word -> ground-truth spelling;
        # lexicon is the plain word list used for edit-distance matching.
        # NOTE(review): `pairs`/`lexicon` are only bound in this branch but are
        # used below whenever use_lexicon is True — other lexicon_type values
        # would raise NameError; confirm callers only use lexicon_type==2.
        lexicon_path = '../../lexicons/totaltext/weak_voc_new.txt'
        lexicon_fid=open(lexicon_path, 'r')
        pair_list = open('../../lexicons/totaltext/weak_voc_pair_list.txt', 'r')
        pairs = dict()
        for line in pair_list.readlines():
            line=line.strip()
            word = line.split(' ')[0].upper()
            word_gt = line[len(word)+1:]
            pairs[word] = word_gt
        lexicon_fid=open(lexicon_path, 'r')
        lexicon=[]
        for line in lexicon_fid.readlines():
            line=line.strip()
            lexicon.append(line)

    # NOTE(review): this globs *.txt in the current working directory, not in
    # results_dir (the join below only builds the read path) — looks like it
    # should be glob.glob(os.path.join(results_dir, "*.txt")); confirm.
    for res_file in glob.glob("*.txt"):
        result_path = os.path.join(results_dir,res_file)
        if os.path.isfile(result_path):
            with open(result_path,'r') as f:
                dt_lines = [a.strip() for a in f.readlines()]
            dt_lines = [list_from_str(dt) for dt in dt_lines]
        else:
            dt_lines = []
        # Keep detections whose sequence, mask-recognition and detection scores
        # all clear their thresholds (field layout set by list_from_str).
        dt_lines = [dt for dt in dt_lines if dt[-2]>score_rec_seq and dt[-3]>score_rec and dt[-6]>score_det]
        nms_flag = nms(dt_lines,overlap)
        boxes = []
        for k in range(len(dt_lines)):
            dt = dt_lines[k]
            if nms_flag[k]:
                if dt not in boxes:
                    boxes.append(dt)

        with open(os.path.join(nms_dir,'gt_'+res_file.split('.')[0].split('_')[1]+'.txt'),'w') as f:
            for g in boxes:
                gt_coors = [int(b) for b in g[0:-6]]
                # g[-1] is a path (relative to the repo root) to a pickle of
                # per-character score matrices for this detection.
                with open('../../../' + g[-1], "rb") as input_file:
                    dict_scores = pickle.load(input_file)
                # Choose which branch's word/char-scores to use; with both
                # enabled, pick the branch with the higher score.
                if use_char and use_seq:
                    if g[-2]>g[-3]:
                        word = g[-5]
                        scores = dict_scores['seq_char_scores'][:,1:-1].swapaxes(0,1)
                    else:
                        word = g[-4]
                        scores = dict_scores['seg_char_scores']
                elif use_seq:
                    word = g[-5]
                    scores = dict_scores['seq_char_scores'][:,1:-1].swapaxes(0,1)
                else:
                    word = g[-4]
                    scores = dict_scores['seg_char_scores']
                if not use_lexicon:
                    match_word = word
                    match_dist = 0.
                else:
                    match_word, match_dist = find_match_word(word, pairs, scores, use_lexicon, weighted_ed, lexicon)
                # Emit the box only if the lexicon match is close enough
                # (or a generic lexicon is in use).
                if match_dist<1.5 or lexicon_type==1:
                    gt_coor_strs = [str(a) for a in gt_coors]+ [match_word]
                    f.write(','.join(gt_coor_strs)+'\r\n')

    pack_name = str(score_det)+'_'+str(score_rec)+'_over'+str(overlap)

    packing(nms_dir,cache_dir,pack_name)
    submit_file_path = os.path.join(cache_dir, pack_name+'.zip')
    return submit_file_path
|
184 |
+
|
185 |
+
def find_match_word(rec_str, pairs, scores_numpy, use_ed=True, weighted_ed=False, lexicon=None):
    """
    Find the closest lexicon word to a recognized string.

    rec_str: recognized word (any case; compared uppercased).
    pairs: dict mapping UPPERCASED lexicon word -> ground-truth spelling.
    scores_numpy: per-character score matrix, used only when weighted_ed is True.
    use_ed: when False, return the input unchanged with distance 0.
    weighted_ed: use character-score-weighted edit distance instead of plain.
    lexicon: iterable of candidate words.
    Returns (matched_word, distance).
    """
    if not use_ed:
        # BUGFIX: callers unpack two values; the original returned only rec_str here.
        return rec_str, 0.
    rec_str = rec_str.upper()
    dist_min = 100
    dist_min_pre = 100
    match_word = ''
    match_dist = 100
    if not weighted_ed:
        # Plain edit distance over the whole lexicon.
        for word in lexicon:
            word = word.upper()
            dist = editdistance.eval(rec_str, word)
            if dist<dist_min:
                dist_min = dist
                match_word = pairs[word]
                match_dist = dist
        return match_word, match_dist
    else:
        # First pass: prune the lexicon with cheap plain edit distance, keeping
        # words within 2 of the best, then rescore that shortlist with the
        # expensive weighted edit distance.
        small_lexicon_dict = dict()
        for word in lexicon:
            word = word.upper()
            ed = editdistance.eval(rec_str, word)
            small_lexicon_dict[word] = ed
            if ed<dist_min_pre:
                dist_min_pre = ed
        small_lexicon = []
        for word in small_lexicon_dict:
            if small_lexicon_dict[word]<=dist_min_pre+2:
                small_lexicon.append(word)

        for word in small_lexicon:
            word = word.upper()
            dist = weighted_edit_distance(rec_str, word, scores_numpy)
            if dist<dist_min:
                dist_min = dist
                match_word = pairs[word]
                match_dist = dist
        return match_word, match_dist
|
228 |
+
|
229 |
+
|
230 |
+
def prepare_results_for_evaluation(results_dir, use_lexicon, cache_dir, score_det, score_rec, score_rec_seq):
    # Thin wrapper: make sure the cache directory exists, then run the
    # single-model filtering/NMS/packing pipeline with the TotalText defaults
    # (weak lexicon, weighted edit distance, both recognition branches).
    if not os.path.isdir(cache_dir):
        os.mkdir(cache_dir)
    return test_single(results_dir,score_det=score_det,score_rec=score_rec,score_rec_seq=score_rec_seq,overlap=0.2,cache_dir=cache_dir,lexicon_type=2, use_lexicon=use_lexicon, weighted_ed=True, use_seq=True, use_char=True, mix=True)
|
evaluation/totaltext/e2e/rrc_evaluation_funcs.py
ADDED
@@ -0,0 +1,369 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
#!/usr/bin/env python2
|
2 |
+
#encoding: UTF-8
|
3 |
+
import json
|
4 |
+
import sys;sys.path.append('./')
|
5 |
+
import zipfile
|
6 |
+
import re
|
7 |
+
import sys
|
8 |
+
import os
|
9 |
+
import codecs
|
10 |
+
import importlib
|
11 |
+
try:
|
12 |
+
from StringIO import StringIO
|
13 |
+
except ImportError:
|
14 |
+
from io import StringIO
|
15 |
+
|
16 |
+
def print_help():
    # Emit the command-line usage string and terminate with exit status 2.
    usage = 'Usage: python %s.py -g=<gtFile> -s=<submFile> [-o=<outputFolder> -p=<jsonParams>]' % sys.argv[0]
    sys.stdout.write(usage)
    sys.exit(2)
|
19 |
+
|
20 |
+
|
21 |
+
def load_zip_file_keys(file,fileNameRegExp=''):
    """
    Return the keys of the ZIP entries that match fileNameRegExp.
    A key is the entry name itself, or the first capturing group of the
    regular expression when one is defined. An empty pattern matches all.
    """
    try:
        archive = zipfile.ZipFile(file, mode='r', allowZip64=True)
    except :
        raise Exception('Error loading the ZIP archive.')

    keys = []
    for entry in archive.namelist():
        if fileNameRegExp == "":
            keys.append(entry)
            continue
        match = re.match(fileNameRegExp, entry)
        if match is None:
            continue
        keys.append(match.group(1) if match.groups() else entry)

    return keys
|
48 |
+
|
49 |
+
|
50 |
+
def load_zip_file(file,fileNameRegExp='',allEntries=False):
    """
    Return a dict of {key: raw bytes} for the ZIP entries matching
    fileNameRegExp. A key is the entry name, or the first capturing group of
    the regular expression when one is defined. With allEntries=True, any
    entry that fails the pattern raises instead of being skipped.
    """
    try:
        archive = zipfile.ZipFile(file, mode='r', allowZip64=True)
    except :
        raise Exception('Error loading the ZIP archive')

    contents = {}
    for entry in archive.namelist():
        keyName = entry
        matched = True
        if fileNameRegExp != "":
            m = re.match(fileNameRegExp, entry)
            if m is None:
                matched = False
            elif len(m.groups()) > 0:
                keyName = m.group(1)
        if matched:
            contents[keyName] = archive.read(entry)
        elif allEntries:
            raise Exception('ZIP entry not valid: %s' %entry)

    return contents
|
80 |
+
|
81 |
+
def decode_utf8(raw):
    """
    Returns a Unicode object on success, or None on failure.
    Strips a leading UTF-8 BOM if present.
    """
    try:
        raw = codecs.decode(raw,'utf-8', 'replace')
        #extracts BOM if exists
        raw = raw.encode('utf8')
        if raw.startswith(codecs.BOM_UTF8):
            # BUGFIX: the replacement must be bytes; passing str '' raised a
            # TypeError on Python 3 that the bare except turned into None.
            raw = raw.replace(codecs.BOM_UTF8, b'', 1)
        return raw.decode('utf-8')
    except:
        return None
|
94 |
+
|
95 |
+
def validate_lines_in_file(fileName,file_contents,CRLF=True,LTRB=True,withTranscription=False,withConfidence=False,imWidth=0,imHeight=0):
    """
    Validate every non-empty line of a submission/GT file by running the
    single-line validator on each; wraps any failure with the file name and
    offending line.
    """
    decoded = decode_utf8(file_contents)
    if decoded is None:
        raise Exception("The file %s is not UTF-8" %fileName)

    separator = "\r\n" if CRLF else "\n"
    for line in decoded.split(separator):
        line = line.replace("\r","").replace("\n","")
        if line == "":
            continue
        try:
            validate_tl_line(line,LTRB,withTranscription,withConfidence,imWidth,imHeight)
        except Exception as e:
            raise Exception(("Line in sample not valid. Sample: %s Line: %s Error: %s" %(fileName,line,str(e))).encode('utf-8', 'replace'))
|
111 |
+
|
112 |
+
|
113 |
+
|
114 |
+
def validate_tl_line(line,LTRB=True,withTranscription=True,withConfidence=True,imWidth=0,imHeight=0):
    """
    Check one text line for format validity, raising on any problem.
    When imWidth/imHeight are non-zero every point must lie inside the image.
    Accepted shapes:
    LTRB=True: xmin,ymin,xmax,ymax[,confidence][,transcription]
    LTRB=False: x1,y1,x2,y2,x3,y3,x4,y4[,confidence][,transcription]
    The parsed values are discarded; only validation side effects remain.
    """
    get_tl_line_values(line,LTRB,withTranscription,withConfidence,imWidth,imHeight)
|
123 |
+
|
124 |
+
|
125 |
+
def get_tl_line_values(line,LTRB=True,withTranscription=False,withConfidence=False,imWidth=0,imHeight=0):
    """
    Validate the format of the line. If the line is not valid an exception will be raised.
    If maxWidth and maxHeight are specified, all points must be inside the imgage bounds.
    Posible values are:
    LTRB=True: xmin,ymin,xmax,ymax[,confidence][,transcription]
    LTRB=False: x1,y1,x2,y2,x3,y3,x4,y4[,confidence][,transcription]
    Returns values from a textline. Points , [Confidences], [Transcriptions]
    """
    confidence = 0.0
    transcription = "";
    points = []

    numPoints = 4;

    if LTRB:
        # Axis-aligned rectangle: 4 coordinate values.
        numPoints = 4;

        if withTranscription and withConfidence:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-1].?[0-9]*)\s*,(.*)$',line)
            if m == None :
                # NOTE(review): this second re.match is dead code — the result
                # is discarded because the next line raises unconditionally.
                m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-1].?[0-9]*)\s*,(.*)$',line)
                raise Exception("Format incorrect. Should be: xmin,ymin,xmax,ymax,confidence,transcription")
        elif withConfidence:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-1].?[0-9]*)\s*$',line)
            if m == None :
                raise Exception("Format incorrect. Should be: xmin,ymin,xmax,ymax,confidence")
        elif withTranscription:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*,(.*)$',line)
            if m == None :
                raise Exception("Format incorrect. Should be: xmin,ymin,xmax,ymax,transcription")
        else:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*,?\s*$',line)
            if m == None :
                raise Exception("Format incorrect. Should be: xmin,ymin,xmax,ymax")

        xmin = int(m.group(1))
        ymin = int(m.group(2))
        xmax = int(m.group(3))
        ymax = int(m.group(4))
        if(xmax<xmin):
            raise Exception("Xmax value (%s) not valid (Xmax < Xmin)." %(xmax))
        if(ymax<ymin):
            raise Exception("Ymax value (%s) not valid (Ymax < Ymin)." %(ymax))

        points = [ float(m.group(i)) for i in range(1, (numPoints+1) ) ]

        # Only the two rectangle corners need bounds-checking.
        if (imWidth>0 and imHeight>0):
            validate_point_inside_bounds(xmin,ymin,imWidth,imHeight);
            validate_point_inside_bounds(xmax,ymax,imWidth,imHeight);

    else:
        # General quadrilateral: 8 coordinate values, must be clockwise.
        numPoints = 8;

        if withTranscription and withConfidence:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-1].?[0-9]*)\s*,(.*)$',line)
            if m == None :
                raise Exception("Format incorrect. Should be: x1,y1,x2,y2,x3,y3,x4,y4,confidence,transcription")
        elif withConfidence:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-1].?[0-9]*)\s*$',line)
            if m == None :
                raise Exception("Format incorrect. Should be: x1,y1,x2,y2,x3,y3,x4,y4,confidence")
        elif withTranscription:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,(.*)$',line)
            if m == None :
                raise Exception("Format incorrect. Should be: x1,y1,x2,y2,x3,y3,x4,y4,transcription")
        else:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*$',line)
            if m == None :
                raise Exception("Format incorrect. Should be: x1,y1,x2,y2,x3,y3,x4,y4")

        points = [ float(m.group(i)) for i in range(1, (numPoints+1) ) ]

        validate_clockwise_points(points)

        if (imWidth>0 and imHeight>0):
            validate_point_inside_bounds(points[0],points[1],imWidth,imHeight);
            validate_point_inside_bounds(points[2],points[3],imWidth,imHeight);
            validate_point_inside_bounds(points[4],points[5],imWidth,imHeight);
            validate_point_inside_bounds(points[6],points[7],imWidth,imHeight);


    if withConfidence:
        try:
            confidence = float(m.group(numPoints+1))
        except ValueError:
            raise Exception("Confidence value must be a float")

    if withTranscription:
        # Transcription is the group after the points (and after confidence if present).
        posTranscription = numPoints + (2 if withConfidence else 1)
        transcription = m.group(posTranscription)
        m2 = re.match(r'^\s*\"(.*)\"\s*$',transcription)
        if m2 != None : #Transcription with double quotes, we extract the value and replace escaped characters
            transcription = m2.group(1).replace("\\\\", "\\").replace("\\\"", "\"")

    return points,confidence,transcription
|
223 |
+
|
224 |
+
|
225 |
+
def validate_point_inside_bounds(x,y,imWidth,imHeight):
    """Raise if point (x, y) lies outside an imWidth x imHeight image."""
    if(x<0 or x>imWidth):
        # BUGFIX: message referenced undefined name 'xmin' (NameError on the error path).
        raise Exception("X value (%s) not valid. Image dimensions: (%s,%s)" %(x,imWidth,imHeight))
    if(y<0 or y>imHeight):
        # BUGFIX: message referenced undefined 'ymin' and had five placeholders
        # for three arguments; report the offending y value instead.
        raise Exception("Y value (%s) not valid. Image dimensions: (%s,%s)" %(y,imWidth,imHeight))
|
230 |
+
|
231 |
+
def validate_clockwise_points(points):
    """
    Check that the 4 points delimiting a quadrilateral are given in clockwise
    order (image coordinates: origin top-left, Y axis pointing down).
    Raises on a wrong point count or counter-clockwise ordering.
    """
    if len(points) != 8:
        raise Exception("Points list not valid." + str(len(points)))

    xs = [int(points[i]) for i in range(0, 8, 2)]
    ys = [int(points[i]) for i in range(1, 8, 2)]

    # Shoelace-style edge sum: positive means counter-clockwise in image coords.
    summatory = 0
    for i in range(4):
        nxt = (i + 1) % 4
        summatory += (xs[nxt] - xs[i]) * (ys[nxt] + ys[i])

    if summatory>0:
        raise Exception("Points are not clockwise. The coordinates of bounding quadrilaterals have to be given in clockwise order. Regarding the correct interpretation of 'clockwise' remember that the image coordinate system used is the standard one, with the image origin at the upper left, the X axis extending to the right and Y axis extending downwards.")
|
255 |
+
|
256 |
+
def get_tl_line_values_from_file_contents(content,CRLF=True,LTRB=True,withTranscription=False,withConfidence=False,imWidth=0,imHeight=0,sort_by_confidences=True):
    """
    Parse every non-empty line of a file into parallel lists of points,
    confidences and transcriptions. Valid line formats:
    xmin,ymin,xmax,ymax,[confidence],[transcription]
    x1,y1,x2,y2,x3,y3,x4,y4,[confidence],[transcription]
    When confidences are present, results can be returned sorted by
    descending confidence.
    """
    pointsList = []
    transcriptionsList = []
    confidencesList = []

    separator = "\r\n" if CRLF else "\n"
    for line in content.split(separator):
        line = line.replace("\r","").replace("\n","")
        if line == "":
            continue
        points, confidence, transcription = get_tl_line_values(line,LTRB,withTranscription,withConfidence,imWidth,imHeight)
        pointsList.append(points)
        transcriptionsList.append(transcription)
        confidencesList.append(confidence)

    if withConfidence and len(confidencesList)>0 and sort_by_confidences:
        import numpy as np
        order = np.argsort(-np.array(confidencesList))
        confidencesList = [confidencesList[i] for i in order]
        pointsList = [pointsList[i] for i in order]
        transcriptionsList = [transcriptionsList[i] for i in order]

    return pointsList,confidencesList,transcriptionsList
|
283 |
+
|
284 |
+
def main_evaluation(p,default_evaluation_params_fn,validate_data_fn,evaluate_method_fn,show_result=True,per_sample=True):
    """
    Validate and evaluate a submission; optionally write a results ZIP.

    Params:
    p: dict of parameters with the GT/submission locations ('g' ground truth,
       's' submission, optional 'o' output folder, optional 'p' parameter
       overrides). If None, parameters are parsed from sys.argv ("-k=value").
    default_evaluation_params_fn: returns the default evaluation-params dict
    validate_data_fn: validates the submission format (raises on error)
    validate_data_fn / evaluate_method_fn are called as fn(gt, subm, params).
    evaluate_method_fn: evaluates the submission and returns a results dict
    show_result: when True, report the outcome on stdout/stderr
    per_sample: when True, write one <sample>.json entry per sample to the ZIP

    Returns resDict; resDict['calculated'] is False and resDict['Message']
    holds the error text when validation/evaluation failed.
    """

    if (p == None):
        # No params supplied: parse "-k=value" pairs from the command line.
        p = dict([s[1:].split('=') for s in sys.argv[1:]])
        if(len(sys.argv)<3):
            print_help()

    evalParams = default_evaluation_params_fn()
    if 'p' in p.keys():
        # 'p' may already be a dict, or a JSON string whose outer wrapper
        # characters are stripped ([1:-1]) before parsing.
        evalParams.update( p['p'] if isinstance(p['p'], dict) else json.loads(p['p'][1:-1]) )

    resDict={'calculated':True,'Message':'','method':'{}','per_sample':'{}'}
    try:
        validate_data_fn(p['g'], p['s'], evalParams)
        evalData = evaluate_method_fn(p['g'], p['s'], evalParams)
        resDict.update(evalData)

    except Exception as e:
        # Failures are reported through the result dict, not propagated.
        resDict['Message']= str(e)
        resDict['calculated']=False

    if 'o' in p:
        if not os.path.exists(p['o']):
            os.makedirs(p['o'])

        resultsOutputname = p['o'] + '/results.zip'
        outZip = zipfile.ZipFile(resultsOutputname, mode='w', allowZip64=True)

        # Per-sample data and raw output items are written as separate ZIP
        # entries below, so they are stripped from the summary method.json.
        del resDict['per_sample']
        if 'output_items' in resDict.keys():
            del resDict['output_items']

        outZip.writestr('method.json',json.dumps(resDict))

    if not resDict['calculated']:
        if show_result:
            sys.stderr.write('Error!\n'+ resDict['Message']+'\n\n')
        if 'o' in p:
            outZip.close()
        return resDict

    if 'o' in p:
        if per_sample == True:
            for k,v in evalData['per_sample'].items():
                outZip.writestr( k + '.json',json.dumps(v))

        if 'output_items' in evalData.keys():
            for k, v in evalData['output_items'].items():
                outZip.writestr( k,v)

        outZip.close()

    if show_result:
        sys.stdout.write("Calculated!")
        sys.stdout.write(json.dumps(resDict['method']))

    return resDict
|
349 |
+
|
350 |
+
|
351 |
+
def main_validation(default_evaluation_params_fn,validate_data_fn):
    """
    Validate a submission from the command line.

    Params:
    default_evaluation_params_fn: returns the default evaluation-params dict
    validate_data_fn: validates the submission format (raises on error)

    Prints 'SUCCESS' and exits 0 on success; prints the error and exits 101
    on failure. Command-line arguments use the "-k=value" convention.
    """
    try:
        cliArgs = dict(arg[1:].split('=') for arg in sys.argv[1:])
        params = default_evaluation_params_fn()
        if 'p' in cliArgs.keys():
            override = cliArgs['p']
            # 'p' may be a dict or a wrapped JSON string ([1:-1] strips the wrapper).
            params.update(override if isinstance(override, dict) else json.loads(override[1:-1]))

        validate_data_fn(cliArgs['g'], cliArgs['s'], params)
        print('SUCCESS')
        sys.exit(0)
    except Exception as e:
        print(str(e))
        sys.exit(101)
|
evaluation/totaltext/e2e/rrc_evaluation_funcs_total_text.py
ADDED
@@ -0,0 +1,363 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
#!/usr/bin/env python2
|
2 |
+
#encoding: UTF-8
|
3 |
+
import json
|
4 |
+
import sys;sys.path.append('./')
|
5 |
+
import zipfile
|
6 |
+
import re
|
7 |
+
import sys
|
8 |
+
import os
|
9 |
+
import codecs
|
10 |
+
import importlib
|
11 |
+
from io import StringIO
|
12 |
+
|
13 |
+
def print_help():
    """Print command-line usage and terminate with exit status 2."""
    usage = 'Usage: python %s.py -g=<gtFile> -s=<submFile> -o=<outputFolder> [-i=<gtImagesFile> -p=<jsonParams>]' %sys.argv[0]
    sys.stdout.write(usage)
    sys.exit(2)
|
16 |
+
|
17 |
+
|
18 |
+
def load_zip_file_keys(file,fileNameRegExp=''):
    """
    Return the list of entry names contained in the ZIP archive `file`.

    The fileNameRegExp filter is intentionally disabled in this variant:
    every entry name is returned as-is. Raises Exception if the archive
    cannot be opened.
    """
    try:
        archive = zipfile.ZipFile(file, mode='r', allowZip64=True)
    except :
        raise Exception('Error loading the ZIP archive.')

    # Regex filtering disabled: keep every entry under its original name.
    return [entryName for entryName in archive.namelist()]
|
45 |
+
|
46 |
+
|
47 |
+
def load_zip_file(file,fileNameRegExp='',allEntries=False):
    """
    Return a dict {entry name: raw bytes} with the contents of a ZIP file.

    The fileNameRegExp filter is intentionally disabled in this variant, so
    every entry is accepted (and `allEntries` never triggers a rejection).
    Raises Exception if the archive cannot be opened.
    """
    try:
        archive = zipfile.ZipFile(file, mode='r', allowZip64=True)
    except :
        raise Exception('Error loading the ZIP archive')

    contents = {}
    for entryName in archive.namelist():
        # Regex filtering disabled: every entry is read under its own name.
        contents[entryName] = archive.read(entryName)

    return contents
|
77 |
+
|
78 |
+
def decode_utf8(raw):
    """
    Decode `raw` (bytes) as UTF-8, stripping a leading BOM if present.

    Invalid byte sequences are replaced with U+FFFD rather than failing.
    Returns the decoded str on success, or None on failure.
    """
    try:
        # Bug fix: the original round-tripped str<->bytes and then called
        # raw.replace(codecs.BOM_UTF8, '', 1) with a bytes pattern and a str
        # replacement, which raised TypeError (swallowed by the bare except),
        # so every BOM-prefixed file was reported as "not UTF-8".
        if raw.startswith(codecs.BOM_UTF8):
            raw = raw[len(codecs.BOM_UTF8):]
        return codecs.decode(raw, 'utf-8', 'replace')
    except:
        return None
|
91 |
+
|
92 |
+
def validate_lines_in_file(fileName,file_contents,CRLF=True,LTRB=True,withTranscription=False,withConfidence=False,imWidth=0,imHeight=0):
    """
    Validate every non-empty line of a sample file.

    Decodes `file_contents` as UTF-8, then runs validate_tl_line on each
    line, re-raising with the sample name and offending line on failure.
    """
    utf8File = decode_utf8(file_contents)
    if utf8File is None:
        raise Exception("The file %s is not UTF-8" %fileName)

    separator = "\r\n" if CRLF else "\n"
    for rawLine in utf8File.split(separator):
        cleanLine = rawLine.replace("\r", "").replace("\n", "")
        if cleanLine == "":
            continue
        try:
            validate_tl_line(cleanLine, LTRB, withTranscription, withConfidence, imWidth, imHeight)
        except Exception as e:
            # Wrap the per-line error with enough context to locate it.
            raise Exception(("Line in sample not valid. Sample: %s Line: %s Error: %s" %(fileName, cleanLine, str(e))).encode('utf-8', 'replace'))
|
108 |
+
|
109 |
+
|
110 |
+
|
111 |
+
def validate_tl_line(line,LTRB=True,withTranscription=True,withConfidence=True,imWidth=0,imHeight=0):
    """
    Validate the format of the line. If the line is not valid an exception will be raised.
    If maxWidth and maxHeight are specified, all points must be inside the imgage bounds.
    Posible values are:
    LTRB=True: xmin,ymin,xmax,ymax[,confidence][,transcription]
    LTRB=False: x1,y1,x2,y2,x3,y3,x4,y4[,confidence][,transcription]
    """
    # Parsing doubles as validation; the parsed values are discarded here.
    get_tl_line_values(line,LTRB,withTranscription,withConfidence,imWidth,imHeight)
|
120 |
+
|
121 |
+
|
122 |
+
def get_tl_line_values(line,LTRB=True,withTranscription=False,withConfidence=False,imWidth=0,imHeight=0):
    """
    Parse (and validate) a single ground-truth / detection line.

    LTRB=True  : xmin,ymin,xmax,ymax[,confidence][,transcription]
    LTRB=False : x1,y1,...,xN,yN,transcription  (Total-Text style polygon;
                 the last comma-separated field is the transcription)

    If imWidth/imHeight are given (>0), the LTRB corner points must lie
    inside the image bounds.

    Returns (points, confidence, transcription). Raises Exception when the
    line does not match the expected format.
    """
    confidence = 0.0
    transcription = ""
    points = []

    if LTRB:
        numPoints = 4

        if withTranscription and withConfidence:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-1].?[0-9]*)\s*,(.*)$',line)
            if m == None :
                # Bug fix: the original re-ran the same re.match here (dead code)
                # before raising.
                raise Exception("Format incorrect. Should be: xmin,ymin,xmax,ymax,confidence,transcription")
        elif withConfidence:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-1].?[0-9]*)\s*$',line)
            if m == None :
                raise Exception("Format incorrect. Should be: xmin,ymin,xmax,ymax,confidence")
        elif withTranscription:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*,(.*)$',line)
            if m == None :
                raise Exception("Format incorrect. Should be: xmin,ymin,xmax,ymax,transcription")
        else:
            m = re.match(r'^\s*(-?[0-9]+)\s*,\s*(-?[0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*,?\s*$',line)
            if m == None :
                raise Exception("Format incorrect. Should be: xmin,ymin,xmax,ymax")

        xmin = int(m.group(1))
        ymin = int(m.group(2))
        xmax = int(m.group(3))
        ymax = int(m.group(4))
        if xmax < xmin:
            raise Exception("Xmax value (%s) not valid (Xmax < Xmin)." %(xmax))
        if ymax < ymin:
            raise Exception("Ymax value (%s) not valid (Ymax < Ymin)." %(ymax))

        points = [ float(m.group(i)) for i in range(1, (numPoints+1) ) ]

        if imWidth > 0 and imHeight > 0:
            validate_point_inside_bounds(xmin, ymin, imWidth, imHeight)
            validate_point_inside_bounds(xmax, ymax, imWidth, imHeight)

        # Bug fix: confidence and transcription were captured by the regexes
        # above but never extracted, so confidence was always 0.0 and the
        # transcription always "" in LTRB mode.
        if withConfidence:
            try:
                confidence = float(m.group(5))
            except ValueError:
                raise Exception("Confidence value must be a float")
        if withTranscription:
            transcription = m.group(6 if withConfidence else 5)

    else:
        # Total-Text polygon line: an even number of coordinates followed by
        # the transcription as the final comma-separated field.
        line_split = line.split(',')
        numPoints = int((len(line_split) - 1) / 2)
        points = [ float(line_split[i]) for i in range(2 * numPoints) ]
        transcription = line_split[-1]

    return points, confidence, transcription
|
224 |
+
|
225 |
+
|
226 |
+
def validate_point_inside_bounds(x,y,imWidth,imHeight):
    """
    Raise Exception if (x, y) lies outside the image rectangle
    [0, imWidth] x [0, imHeight]; returns None otherwise.
    """
    # Bug fix: the original messages referenced undefined names (xmin/ymin),
    # and the Y message had four %s placeholders for three arguments, so any
    # out-of-bounds point crashed with NameError/TypeError instead of raising
    # the intended validation error.
    if x < 0 or x > imWidth:
        raise Exception("X value (%s) not valid. Image dimensions: (%s,%s)" %(x, imWidth, imHeight))
    if y < 0 or y > imHeight:
        raise Exception("Y value (%s) not valid. Image dimensions: (%s,%s)" %(y, imWidth, imHeight))
|
231 |
+
|
232 |
+
def validate_clockwise_points(points):
    """
    Ensure the 4 corner points (x1,y1,...,x4,y4) are given in clockwise order.

    Uses the sign of the shoelace edge sum: in image coordinates (origin at
    the top-left, y growing downwards) a clockwise quadrilateral yields a
    non-positive sum. Raises Exception otherwise, or if the list does not
    contain exactly 8 values.
    """
    if len(points) != 8:
        raise Exception("Points list not valid." + str(len(points)))

    xs = [int(points[i]) for i in range(0, 8, 2)]
    ys = [int(points[i]) for i in range(1, 8, 2)]

    edgeSum = 0
    for i in range(4):
        j = (i + 1) % 4
        edgeSum += (xs[j] - xs[i]) * (ys[j] + ys[i])

    if edgeSum > 0:
        raise Exception("Points are not clockwise. The coordinates of bounding quadrilaterals have to be given in clockwise order. Regarding the correct interpretation of 'clockwise' remember that the image coordinate system used is the standard one, with the image origin at the upper left, the X axis extending to the right and Y axis extending downwards.")
|
256 |
+
|
257 |
+
def get_tl_line_values_from_file_contents(content,CRLF=True,LTRB=True,withTranscription=False,withConfidence=False,imWidth=0,imHeight=0,sort_by_confidences=True):
    """
    Parse every non-empty line of a GT/results file and collect its values.

    Valid line formats:
        xmin,ymin,xmax,ymax,[confidence],[transcription]
        x1,y1,x2,y2,x3,y3,x4,y4,[confidence],[transcription]

    Returns three parallel lists: (pointsList, confidencesList,
    transcriptionsList), jointly sorted by descending confidence when
    confidences are present and sort_by_confidences is True.
    """
    pointsList = []
    transcriptionsList = []
    confidencesList = []

    separator = "\r\n" if CRLF else "\n"
    for rawLine in content.split(separator):
        cleanLine = rawLine.replace("\r", "").replace("\n", "")
        if cleanLine == "":
            continue
        points, confidence, transcription = get_tl_line_values(cleanLine, LTRB, withTranscription, withConfidence, imWidth, imHeight)
        pointsList.append(points)
        transcriptionsList.append(transcription)
        confidencesList.append(confidence)

    if withConfidence and len(confidencesList) > 0 and sort_by_confidences:
        # Tuple-sort the zipped triples so the three lists stay aligned.
        confidencesList, pointsList, transcriptionsList = (list(t) for t in zip(*sorted(zip(confidencesList, pointsList, transcriptionsList), reverse=True)))

    return pointsList, confidencesList, transcriptionsList
|
280 |
+
|
281 |
+
def main_evaluation(p,default_evaluation_params_fn,validate_data_fn,evaluate_method_fn,show_result=True,per_sample=True):
    """
    Validate and evaluate a submission, writing a results ZIP.

    Params:
    p: dict of parameters with the GT/submission locations ('g' ground truth,
       's' submission, 'o' output folder — required in this variant — and
       optional 'p' parameter overrides). If None, parameters are parsed
       from sys.argv ("-k=value").
    default_evaluation_params_fn: returns the default evaluation-params dict
    validate_data_fn: validates the submission format (raises on error)
    evaluate_method_fn: evaluates the submission and returns a results dict
    show_result: when True, report the outcome on stdout/stderr
    per_sample: when True, write one <sample>.json entry per sample to the ZIP

    Returns resDict; resDict['calculated'] is False and resDict['Message']
    holds the error text when validation/evaluation failed.
    """

    if (p == None):
        # No params supplied: parse "-k=value" pairs from the command line.
        p = dict([s[1:].split('=') for s in sys.argv[1:]])
        if(len(sys.argv)<2):
            print_help()

    evalParams = default_evaluation_params_fn()
    if 'p' in list(p.keys()):
        # 'p' may already be a dict, or a JSON string whose outer wrapper
        # characters are stripped ([1:-1]) before parsing.
        evalParams.update( p['p'] if isinstance(p['p'], dict) else json.loads(p['p'][1:-1]) )

    resDict={'calculated':True,'Message':'','method':'{}','per_sample':'{}'}
    try:
        validate_data_fn(p['g'], p['s'], evalParams)
        evalData = evaluate_method_fn(p['g'], p['s'], evalParams)
        resDict.update(evalData)

    except Exception as e:
        # Failures are reported through the result dict, not propagated.
        resDict['Message']= str(e)
        resDict['calculated']=False

    # Note: unlike the generic variant, 'o' is assumed present here.
    if not os.path.exists(p['o']):
        os.makedirs(p['o'])

    resultsOutputname = p['o'] + '/results.zip'
    outZip = zipfile.ZipFile(resultsOutputname, mode='w', allowZip64=True)

    # Per-sample data and raw output items are written as separate ZIP
    # entries below, so they are stripped from the summary method.json.
    del resDict['per_sample']
    if 'output_items' in list(resDict.keys()):
        del resDict['output_items']

    outZip.writestr('method.json',json.dumps(resDict))

    if not resDict['calculated']:
        if show_result:
            sys.stderr.write('Error!\n'+ resDict['Message']+'\n\n')
        outZip.close()
        return resDict

    if per_sample == True:
        for k,v in evalData['per_sample'].items():
            outZip.writestr( k + '.json',json.dumps(v))

    if 'output_items' in list(evalData.keys()):
        for k, v in evalData['output_items'].items():
            outZip.writestr( k,v)

    outZip.close()

    if show_result:
        sys.stdout.write("Calculated!")
        sys.stdout.write(json.dumps(resDict['method']))

    return resDict
|
343 |
+
|
344 |
+
|
345 |
+
def main_validation(default_evaluation_params_fn,validate_data_fn):
    """
    Validate a submission from the command line.

    Params:
    default_evaluation_params_fn: returns the default evaluation-params dict
    validate_data_fn: validates the submission format (raises on error)

    Prints 'SUCCESS' and exits 0 on success; prints the error and exits 101
    on failure. Command-line arguments use the "-k=value" convention.
    """
    try:
        cliArgs = dict(arg[1:].split('=') for arg in sys.argv[1:])
        params = default_evaluation_params_fn()
        if 'p' in list(cliArgs.keys()):
            override = cliArgs['p']
            # 'p' may be a dict or a wrapped JSON string ([1:-1] strips the wrapper).
            params.update(override if isinstance(override, dict) else json.loads(override[1:-1]))

        validate_data_fn(cliArgs['g'], cliArgs['s'], params)
        print('SUCCESS')
        sys.exit(0)
    except Exception as e:
        print(str(e))
        sys.exit(101)
|
evaluation/totaltext/e2e/script.py
ADDED
@@ -0,0 +1,452 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
#!/usr/bin/env python
|
2 |
+
# -*- coding: utf-8 -*-
|
3 |
+
# encoding=utf8
|
4 |
+
from collections import namedtuple
|
5 |
+
import rrc_evaluation_funcs_total_text as rrc_evaluation_funcs
|
6 |
+
import importlib
|
7 |
+
from prepare_results import prepare_results_for_evaluation
|
8 |
+
|
9 |
+
def evaluation_imports():
    """
    Map of python module name -> alias that evaluate_method injects into
    globals() via importlib before running the evaluation.
    """
    required = {}
    required['Polygon'] = 'plg'
    required['numpy'] = 'np'
    return required
|
17 |
+
|
18 |
+
def default_evaluation_params():
    """
    Default parameters used for validation and evaluation of the
    Total-Text end-to-end protocol.
    """
    params = {}
    params['IOU_CONSTRAINT'] = 0.5
    params['AREA_PRECISION_CONSTRAINT'] = 0.5
    params['WORD_SPOTTING'] = False
    params['MIN_LENGTH_CARE_WORD'] = 3
    params['GT_SAMPLE_NAME_2_ID'] = 'gt_img_([0-9]+).txt'
    params['DET_SAMPLE_NAME_2_ID'] = 'res_img_([0-9]+).txt'
    # LTRB False -> boxes are 4-point polygons (x1,y1,...,x4,y4), not 2-point rects.
    params['LTRB'] = False
    # Lines are NOT delimited by Windows CRLF.
    params['CRLF'] = False
    # Detections carry no confidence value, so MAP/MAR are not computed.
    params['CONFIDENCES'] = False
    params['SPECIAL_CHARACTERS'] = '!?.:,*"()·[]/\''
    params['ONLY_REMOVE_FIRST_LAST_CHARACTER'] = True
    return params
|
35 |
+
|
36 |
+
def validate_data(gtFilePath, submFilePath, evaluationParams):
    """
    Validate that every file in the results folder is correctly named and
    formatted, and that no submitted sample is missing from the ground truth.
    Raises on the first error encountered.
    """
    gt = rrc_evaluation_funcs.load_zip_file(gtFilePath, evaluationParams['GT_SAMPLE_NAME_2_ID'])
    subm = rrc_evaluation_funcs.load_zip_file(submFilePath, evaluationParams['DET_SAMPLE_NAME_2_ID'], True)

    # Ground truth lines must parse (with transcriptions).
    for sampleKey in gt:
        rrc_evaluation_funcs.validate_lines_in_file(sampleKey, gt[sampleKey], evaluationParams['CRLF'], evaluationParams['LTRB'], True)

    # Every submitted sample must exist in the GT and parse correctly.
    for sampleKey in subm:
        if not (sampleKey in gt):
            raise Exception("The sample %s not present in GT" %sampleKey)

        rrc_evaluation_funcs.validate_lines_in_file(sampleKey, subm[sampleKey], evaluationParams['CRLF'], evaluationParams['LTRB'], True, evaluationParams['CONFIDENCES'])
|
56 |
+
|
57 |
+
|
58 |
+
def evaluate_method(gtFilePath, submFilePath, evaluationParams):
|
59 |
+
"""
|
60 |
+
Method evaluate_method: evaluate method and returns the results
|
61 |
+
Results. Dictionary with the following values:
|
62 |
+
- method (required) Global method metrics. Ex: { 'Precision':0.8,'Recall':0.9 }
|
63 |
+
- samples (optional) Per sample metrics. Ex: {'sample1' : { 'Precision':0.8,'Recall':0.9 } , 'sample2' : { 'Precision':0.8,'Recall':0.9 }
|
64 |
+
"""
|
65 |
+
for module,alias in evaluation_imports().items():
|
66 |
+
globals()[alias] = importlib.import_module(module)
|
67 |
+
|
68 |
+
def polygon_from_points(points,correctOffset=False):
|
69 |
+
"""
|
70 |
+
Returns a Polygon object to use with the Polygon2 class from a list of 8 points: x1,y1,x2,y2,x3,y3,x4,y4
|
71 |
+
"""
|
72 |
+
resBoxes=np.empty([1,len(points)],dtype='int32')
|
73 |
+
for i in range(int(len(points) / 2)):
|
74 |
+
resBoxes[0, i] = int(points[2*i])
|
75 |
+
resBoxes[0, int(len(points) / 2) + i] = int(points[2*i+1])
|
76 |
+
|
77 |
+
pointMat = resBoxes[0].reshape([2,-1]).T
|
78 |
+
return plg.Polygon( pointMat)
|
79 |
+
|
80 |
+
def rectangle_to_polygon(rect):
|
81 |
+
resBoxes=np.empty([1,8],dtype='int32')
|
82 |
+
resBoxes[0,0]=int(rect.xmin)
|
83 |
+
resBoxes[0,4]=int(rect.ymax)
|
84 |
+
resBoxes[0,1]=int(rect.xmin)
|
85 |
+
resBoxes[0,5]=int(rect.ymin)
|
86 |
+
resBoxes[0,2]=int(rect.xmax)
|
87 |
+
resBoxes[0,6]=int(rect.ymin)
|
88 |
+
resBoxes[0,3]=int(rect.xmax)
|
89 |
+
resBoxes[0,7]=int(rect.ymax)
|
90 |
+
|
91 |
+
pointMat = resBoxes[0].reshape([2,4]).T
|
92 |
+
|
93 |
+
return plg.Polygon( pointMat)
|
94 |
+
|
95 |
+
def rectangle_to_points(rect):
|
96 |
+
points = [int(rect.xmin), int(rect.ymax), int(rect.xmax), int(rect.ymax), int(rect.xmax), int(rect.ymin), int(rect.xmin), int(rect.ymin)]
|
97 |
+
return points
|
98 |
+
|
99 |
+
def get_union(pD,pG):
|
100 |
+
areaA = pD.area();
|
101 |
+
areaB = pG.area();
|
102 |
+
return areaA + areaB - get_intersection(pD, pG);
|
103 |
+
|
104 |
+
def get_intersection_over_union(pD,pG):
|
105 |
+
try:
|
106 |
+
return get_intersection(pD, pG) / get_union(pD, pG);
|
107 |
+
except:
|
108 |
+
return 0
|
109 |
+
|
110 |
+
def get_intersection(pD,pG):
|
111 |
+
pInt = pD & pG
|
112 |
+
if len(pInt) == 0:
|
113 |
+
return 0
|
114 |
+
return pInt.area()
|
115 |
+
|
116 |
+
def compute_ap(confList, matchList,numGtCare):
|
117 |
+
correct = 0
|
118 |
+
AP = 0
|
119 |
+
if len(confList)>0:
|
120 |
+
confList = np.array(confList)
|
121 |
+
matchList = np.array(matchList)
|
122 |
+
sorted_ind = np.argsort(-confList)
|
123 |
+
confList = confList[sorted_ind]
|
124 |
+
matchList = matchList[sorted_ind]
|
125 |
+
for n in range(len(confList)):
|
126 |
+
match = matchList[n]
|
127 |
+
if match:
|
128 |
+
correct += 1
|
129 |
+
AP += float(correct)/(n + 1)
|
130 |
+
|
131 |
+
if numGtCare>0:
|
132 |
+
AP /= numGtCare
|
133 |
+
|
134 |
+
return AP
|
135 |
+
|
136 |
+
def transcription_match(transGt,transDet,specialCharacters='!?.:,*"()·[]/\'',onlyRemoveFirstLastCharacterGT=True):
|
137 |
+
|
138 |
+
if onlyRemoveFirstLastCharacterGT:
|
139 |
+
#special characters in GT are allowed only at initial or final position
|
140 |
+
if (transGt==transDet):
|
141 |
+
return True
|
142 |
+
|
143 |
+
if specialCharacters.find(transGt[0])>-1:
|
144 |
+
if transGt[1:]==transDet:
|
145 |
+
return True
|
146 |
+
|
147 |
+
if specialCharacters.find(transGt[-1])>-1:
|
148 |
+
if transGt[0:len(transGt)-1]==transDet:
|
149 |
+
return True
|
150 |
+
|
151 |
+
if specialCharacters.find(transGt[0])>-1 and specialCharacters.find(transGt[-1])>-1:
|
152 |
+
if transGt[1:len(transGt)-1]==transDet:
|
153 |
+
return True
|
154 |
+
return False
|
155 |
+
else:
|
156 |
+
#Special characters are removed from the begining and the end of both Detection and GroundTruth
|
157 |
+
while len(transGt)>0 and specialCharacters.find(transGt[0])>-1:
|
158 |
+
transGt = transGt[1:]
|
159 |
+
|
160 |
+
while len(transDet)>0 and specialCharacters.find(transDet[0])>-1:
|
161 |
+
transDet = transDet[1:]
|
162 |
+
|
163 |
+
while len(transGt)>0 and specialCharacters.find(transGt[-1])>-1 :
|
164 |
+
transGt = transGt[0:len(transGt)-1]
|
165 |
+
|
166 |
+
while len(transDet)>0 and specialCharacters.find(transDet[-1])>-1:
|
167 |
+
transDet = transDet[0:len(transDet)-1]
|
168 |
+
|
169 |
+
return transGt == transDet
|
170 |
+
|
171 |
+
|
172 |
+
def include_in_dictionary(transcription):
|
173 |
+
"""
|
174 |
+
Function used in Word Spotting that finds if the Ground Truth transcription meets the rules to enter into the dictionary. If not, the transcription will be cared as don't care
|
175 |
+
"""
|
176 |
+
#special case 's at final
|
177 |
+
if transcription[len(transcription)-2:]=="'s" or transcription[len(transcription)-2:]=="'S":
|
178 |
+
transcription = transcription[0:len(transcription)-2]
|
179 |
+
|
180 |
+
#hypens at init or final of the word
|
181 |
+
transcription = transcription.strip('-');
|
182 |
+
|
183 |
+
specialCharacters = "'!?.:,*\"()·[]/";
|
184 |
+
for character in specialCharacters:
|
185 |
+
transcription = transcription.replace(character,' ')
|
186 |
+
|
187 |
+
transcription = transcription.strip()
|
188 |
+
|
189 |
+
if len(transcription) != len(transcription.replace(" ","")) :
|
190 |
+
return False;
|
191 |
+
|
192 |
+
if len(transcription) < evaluationParams['MIN_LENGTH_CARE_WORD']:
|
193 |
+
return False;
|
194 |
+
|
195 |
+
notAllowed = "×÷·";
|
196 |
+
|
197 |
+
range1 = [ ord(u'a'), ord(u'z') ]
|
198 |
+
range2 = [ ord(u'A'), ord(u'Z') ]
|
199 |
+
range3 = [ ord(u'À'), ord(u'ƿ') ]
|
200 |
+
range4 = [ ord(u'DŽ'), ord(u'ɿ') ]
|
201 |
+
range5 = [ ord(u'Ά'), ord(u'Ͽ') ]
|
202 |
+
range6 = [ ord(u'-'), ord(u'-') ]
|
203 |
+
|
204 |
+
for char in transcription :
|
205 |
+
charCode = ord(char)
|
206 |
+
if(notAllowed.find(char) != -1):
|
207 |
+
return False
|
208 |
+
|
209 |
+
valid = ( charCode>=range1[0] and charCode<=range1[1] ) or ( charCode>=range2[0] and charCode<=range2[1] ) or ( charCode>=range3[0] and charCode<=range3[1] ) or ( charCode>=range4[0] and charCode<=range4[1] ) or ( charCode>=range5[0] and charCode<=range5[1] ) or ( charCode>=range6[0] and charCode<=range6[1] )
|
210 |
+
if valid == False:
|
211 |
+
return False
|
212 |
+
|
213 |
+
return True
|
214 |
+
|
215 |
+
def include_in_dictionary_transcription(transcription):
    """
    Normalize a Ground Truth transcription for Word Spotting: remove the
    possessive ending, surrounding hyphens, and punctuation.
    """
    # Possessive ending ('s / 'S) is removed first.
    if transcription[-2:] == "'s" or transcription[-2:] == "'S":
        transcription = transcription[:-2]

    # Hyphens at the beginning or end of the word are dropped.
    transcription = transcription.strip('-')

    # Punctuation becomes whitespace, then outer whitespace is trimmed.
    for punct in "'!?.:,*\"()·[]/":
        transcription = transcription.replace(punct, ' ')

    return transcription.strip()
|
233 |
+
|
234 |
+
perSampleMetrics = {}
|
235 |
+
|
236 |
+
matchedSum = 0
|
237 |
+
|
238 |
+
Rectangle = namedtuple('Rectangle', 'xmin ymin xmax ymax')
|
239 |
+
|
240 |
+
gt = rrc_evaluation_funcs.load_zip_file(gtFilePath,evaluationParams['GT_SAMPLE_NAME_2_ID'])
|
241 |
+
subm = rrc_evaluation_funcs.load_zip_file(submFilePath,evaluationParams['DET_SAMPLE_NAME_2_ID'],True)
|
242 |
+
|
243 |
+
numGlobalCareGt = 0;
|
244 |
+
numGlobalCareDet = 0;
|
245 |
+
|
246 |
+
arrGlobalConfidences = [];
|
247 |
+
arrGlobalMatches = [];
|
248 |
+
|
249 |
+
for resFile in gt:
|
250 |
+
|
251 |
+
gtFile = rrc_evaluation_funcs.decode_utf8(gt[resFile])
|
252 |
+
if (gtFile is None) :
|
253 |
+
raise Exception("The file %s is not UTF-8" %resFile)
|
254 |
+
|
255 |
+
recall = 0
|
256 |
+
precision = 0
|
257 |
+
hmean = 0
|
258 |
+
detCorrect = 0
|
259 |
+
iouMat = np.empty([1,1])
|
260 |
+
gtPols = []
|
261 |
+
detPols = []
|
262 |
+
gtTrans = []
|
263 |
+
detTrans = []
|
264 |
+
gtPolPoints = []
|
265 |
+
detPolPoints = []
|
266 |
+
gtDontCarePolsNum = [] #Array of Ground Truth Polygons' keys marked as don't Care
|
267 |
+
detDontCarePolsNum = [] #Array of Detected Polygons' matched with a don't Care GT
|
268 |
+
detMatchedNums = []
|
269 |
+
pairs = []
|
270 |
+
|
271 |
+
arrSampleConfidences = [];
|
272 |
+
arrSampleMatch = [];
|
273 |
+
sampleAP = 0;
|
274 |
+
|
275 |
+
evaluationLog = ""
|
276 |
+
|
277 |
+
pointsList,_,transcriptionsList = rrc_evaluation_funcs.get_tl_line_values_from_file_contents(gtFile,evaluationParams['CRLF'],evaluationParams['LTRB'],True,False)
|
278 |
+
for n in range(len(pointsList)):
|
279 |
+
points = pointsList[n]
|
280 |
+
transcription = transcriptionsList[n]
|
281 |
+
dontCare = transcription == "###"
|
282 |
+
if evaluationParams['LTRB']:
|
283 |
+
gtRect = Rectangle(*points)
|
284 |
+
gtPol = rectangle_to_polygon(gtRect)
|
285 |
+
else:
|
286 |
+
gtPol = polygon_from_points(points)
|
287 |
+
gtPols.append(gtPol)
|
288 |
+
gtPolPoints.append(points)
|
289 |
+
|
290 |
+
#On word spotting we will filter some transcriptions with special characters
|
291 |
+
if evaluationParams['WORD_SPOTTING'] :
|
292 |
+
if dontCare == False :
|
293 |
+
if include_in_dictionary(transcription) == False :
|
294 |
+
dontCare = True
|
295 |
+
else:
|
296 |
+
transcription = include_in_dictionary_transcription(transcription)
|
297 |
+
|
298 |
+
gtTrans.append(transcription)
|
299 |
+
if dontCare:
|
300 |
+
gtDontCarePolsNum.append( len(gtPols)-1 )
|
301 |
+
|
302 |
+
evaluationLog += "GT polygons: " + str(len(gtPols)) + (" (" + str(len(gtDontCarePolsNum)) + " don't care)\n" if len(gtDontCarePolsNum)>0 else "\n")
|
303 |
+
|
304 |
+
if resFile in subm:
|
305 |
+
|
306 |
+
detFile = rrc_evaluation_funcs.decode_utf8(subm[resFile])
|
307 |
+
|
308 |
+
pointsList,confidencesList,transcriptionsList = rrc_evaluation_funcs.get_tl_line_values_from_file_contents(detFile,evaluationParams['CRLF'],evaluationParams['LTRB'],True,evaluationParams['CONFIDENCES'])
|
309 |
+
|
310 |
+
for n in range(len(pointsList)):
|
311 |
+
points = pointsList[n]
|
312 |
+
transcription = transcriptionsList[n]
|
313 |
+
|
314 |
+
if evaluationParams['LTRB']:
|
315 |
+
detRect = Rectangle(*points)
|
316 |
+
detPol = rectangle_to_polygon(detRect)
|
317 |
+
else:
|
318 |
+
detPol = polygon_from_points(points)
|
319 |
+
detPols.append(detPol)
|
320 |
+
detPolPoints.append(points)
|
321 |
+
detTrans.append(transcription)
|
322 |
+
|
323 |
+
if len(gtDontCarePolsNum)>0 :
|
324 |
+
for dontCarePol in gtDontCarePolsNum:
|
325 |
+
dontCarePol = gtPols[dontCarePol]
|
326 |
+
intersected_area = get_intersection(dontCarePol,detPol)
|
327 |
+
pdDimensions = detPol.area()
|
328 |
+
precision = 0 if pdDimensions == 0 else intersected_area / pdDimensions
|
329 |
+
if (precision > evaluationParams['AREA_PRECISION_CONSTRAINT'] ):
|
330 |
+
detDontCarePolsNum.append( len(detPols)-1 )
|
331 |
+
break
|
332 |
+
|
333 |
+
evaluationLog += "DET polygons: " + str(len(detPols)) + (" (" + str(len(detDontCarePolsNum)) + " don't care)\n" if len(detDontCarePolsNum)>0 else "\n")
|
334 |
+
|
335 |
+
if len(gtPols)>0 and len(detPols)>0:
|
336 |
+
#Calculate IoU and precision matrixs
|
337 |
+
outputShape=[len(gtPols),len(detPols)]
|
338 |
+
iouMat = np.empty(outputShape)
|
339 |
+
gtRectMat = np.zeros(len(gtPols),np.int8)
|
340 |
+
detRectMat = np.zeros(len(detPols),np.int8)
|
341 |
+
for gtNum in range(len(gtPols)):
|
342 |
+
for detNum in range(len(detPols)):
|
343 |
+
pG = gtPols[gtNum]
|
344 |
+
pD = detPols[detNum]
|
345 |
+
iouMat[gtNum,detNum] = get_intersection_over_union(pD,pG)
|
346 |
+
|
347 |
+
for gtNum in range(len(gtPols)):
|
348 |
+
for detNum in range(len(detPols)):
|
349 |
+
if gtRectMat[gtNum] == 0 and detRectMat[detNum] == 0 and gtNum not in gtDontCarePolsNum and detNum not in detDontCarePolsNum :
|
350 |
+
if iouMat[gtNum,detNum]>evaluationParams['IOU_CONSTRAINT']:
|
351 |
+
gtRectMat[gtNum] = 1
|
352 |
+
detRectMat[detNum] = 1
|
353 |
+
#detection matched only if transcription is equal
|
354 |
+
if evaluationParams['WORD_SPOTTING']:
|
355 |
+
correct = gtTrans[gtNum].upper() == detTrans[detNum].upper()
|
356 |
+
else:
|
357 |
+
correct = transcription_match(gtTrans[gtNum].upper(),detTrans[detNum].upper(),evaluationParams['SPECIAL_CHARACTERS'],evaluationParams['ONLY_REMOVE_FIRST_LAST_CHARACTER'])==True
|
358 |
+
detCorrect += (1 if correct else 0)
|
359 |
+
if correct:
|
360 |
+
detMatchedNums.append(detNum)
|
361 |
+
pairs.append({'gt':gtNum,'det':detNum,'correct':correct})
|
362 |
+
evaluationLog += "Match GT #" + str(gtNum) + " with Det #" + str(detNum) + " trans. correct: " + str(correct) + "\n"
|
363 |
+
|
364 |
+
if evaluationParams['CONFIDENCES']:
|
365 |
+
for detNum in range(len(detPols)):
|
366 |
+
if detNum not in detDontCarePolsNum :
|
367 |
+
#we exclude the don't care detections
|
368 |
+
match = detNum in detMatchedNums
|
369 |
+
|
370 |
+
arrSampleConfidences.append(confidencesList[detNum])
|
371 |
+
arrSampleMatch.append(match)
|
372 |
+
|
373 |
+
arrGlobalConfidences.append(confidencesList[detNum]);
|
374 |
+
arrGlobalMatches.append(match);
|
375 |
+
|
376 |
+
numGtCare = (len(gtPols) - len(gtDontCarePolsNum))
|
377 |
+
numDetCare = (len(detPols) - len(detDontCarePolsNum))
|
378 |
+
if numGtCare == 0:
|
379 |
+
recall = float(1)
|
380 |
+
precision = float(0) if numDetCare >0 else float(1)
|
381 |
+
sampleAP = precision
|
382 |
+
else:
|
383 |
+
recall = float(detCorrect) / numGtCare
|
384 |
+
precision = 0 if numDetCare==0 else float(detCorrect) / numDetCare
|
385 |
+
if evaluationParams['CONFIDENCES']:
|
386 |
+
sampleAP = compute_ap(arrSampleConfidences, arrSampleMatch, numGtCare )
|
387 |
+
|
388 |
+
hmean = 0 if (precision + recall)==0 else 2.0 * precision * recall / (precision + recall)
|
389 |
+
|
390 |
+
matchedSum += detCorrect
|
391 |
+
numGlobalCareGt += numGtCare
|
392 |
+
numGlobalCareDet += numDetCare
|
393 |
+
|
394 |
+
perSampleMetrics[resFile] = {
|
395 |
+
'precision':precision,
|
396 |
+
'recall':recall,
|
397 |
+
'hmean':hmean,
|
398 |
+
'pairs':pairs,
|
399 |
+
'AP':sampleAP,
|
400 |
+
'iouMat':[] if len(detPols)>100 else iouMat.tolist(),
|
401 |
+
'gtPolPoints':gtPolPoints,
|
402 |
+
'detPolPoints':detPolPoints,
|
403 |
+
'gtTrans':gtTrans,
|
404 |
+
'detTrans':detTrans,
|
405 |
+
'gtDontCare':gtDontCarePolsNum,
|
406 |
+
'detDontCare':detDontCarePolsNum,
|
407 |
+
'evaluationParams': evaluationParams,
|
408 |
+
'evaluationLog': evaluationLog
|
409 |
+
}
|
410 |
+
|
411 |
+
# Compute AP
|
412 |
+
AP = 0
|
413 |
+
if evaluationParams['CONFIDENCES']:
|
414 |
+
AP = compute_ap(arrGlobalConfidences, arrGlobalMatches, numGlobalCareGt)
|
415 |
+
|
416 |
+
methodRecall = 0 if numGlobalCareGt == 0 else float(matchedSum)/numGlobalCareGt
|
417 |
+
methodPrecision = 0 if numGlobalCareDet == 0 else float(matchedSum)/numGlobalCareDet
|
418 |
+
methodHmean = 0 if methodRecall + methodPrecision==0 else 2* methodRecall * methodPrecision / (methodRecall + methodPrecision)
|
419 |
+
|
420 |
+
methodMetrics = {'precision':methodPrecision, 'recall':methodRecall,'hmean': methodHmean, 'AP': AP }
|
421 |
+
|
422 |
+
resDict = {'calculated':True,'Message':'','method': methodMetrics,'per_sample': perSampleMetrics}
|
423 |
+
|
424 |
+
|
425 |
+
return resDict;
|
426 |
+
|
427 |
+
|
428 |
+
|
429 |
+
if __name__=='__main__':
    '''
    results_dir: result directory
    score_det: score of detection bounding box
    score_rec: score of the mask recognition branch
    score_rec_seq: score of the sequence recognition branch
    lexicon_type: 1 for generic; 2 for weak; 3 for strong
    '''
    # Thresholds for filtering raw model outputs before evaluation.
    results_dir = '../../../output/mixtrain/inference/total_text_test/model_0250000_1000_results/'
    score_det = 0.05
    score_rec = 0.5
    use_lexicon = False
    score_rec_seq = 0.9
    # Lexicon-based variant (uncomment to enable):
    # use_lexicon = True
    # score_rec_seq = 0.8

    # Convert raw results into the zip format expected by the evaluator.
    evaluate_result_path = prepare_results_for_evaluation(
        results_dir,
        use_lexicon=use_lexicon,
        cache_dir='./cache_files',
        score_det=score_det,
        score_rec=score_rec,
        score_rec_seq=score_rec_seq,
    )

    # g: ground-truth zip, o: output directory, s: submission zip.
    rrc_evaluation_funcs.main_evaluation(
        {
            'g': "../gt.zip",
            'o': "./cache_files",
            's': evaluate_result_path,
        },
        default_evaluation_params,
        validate_data,
        evaluate_method,
    )
|
evaluation/totaltext/gt.zip
ADDED
Binary file (106 kB). View file
|
|
evaluation/weighted_editdistance.py
ADDED
@@ -0,0 +1,55 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
def weighted_edit_distance(word1, word2, scores):
    """Edit distance between word1 and word2 where each edit operation is
    weighted by the per-character recognition scores of word1.

    ``scores`` is indexed as scores[char_class][position] — assumes one
    score column per position of word1 (TODO confirm against caller).
    """
    len1, len2 = len(word1), len(word2)
    # dp[i][j]: cost of matching word2[:i] against word1[:j].
    dp = [[0] * (len1 + 1) for _ in range(len2 + 1)]
    for col in range(len1 + 1):
        dp[0][col] = col
    for row in range(len2 + 1):
        dp[row][0] = row

    for row in range(1, len2 + 1):          # positions in word2
        for col in range(1, len1 + 1):      # positions in word1
            cost_del = ed_delect_cost(col - 1, row - 1, word1, word2, scores)
            cost_ins = ed_insert_cost(col - 1, row - 1, word1, word2, scores)
            if word1[col - 1] == word2[row - 1]:
                cost_rep = 0
            else:
                cost_rep = ed_replace_cost(col - 1, row - 1, word1, word2, scores)
            dp[row][col] = min(
                dp[row - 1][col] + cost_ins,
                dp[row][col - 1] + cost_del,
                dp[row - 1][col - 1] + cost_rep,
            )

    return dp[len2][len1]
|
20 |
+
|
21 |
+
def ed_delect_cost(j, i, word1, word2, scores):
    """Cost of deleting word1[j]: the recognition score of that character
    at position j. (word2 and i are unused but kept for a uniform API.)"""
    return scores[char2num(word1[j])][j]
|
25 |
+
|
26 |
+
|
27 |
+
def ed_insert_cost(i, j, word1, word2, scores):
    """Cost of inserting a character at position i of word1: the mean of
    the scores of the two neighbouring word1 characters, or the single
    score when i is the last position."""
    left = char2num(word1[i])
    if i < len(word1) - 1:
        right = char2num(word1[i + 1])
        return (scores[left][i] + scores[right][i + 1]) / 2
    return scores[left][i]
|
36 |
+
|
37 |
+
|
38 |
+
def ed_replace_cost(i, j, word1, word2, scores):
    """Cost of replacing word1[i] with word2[j]: low when the recognizer
    also gave word2[j] a high score at position i, clamped at zero."""
    have = char2num(word1[i])
    want = char2num(word2[j])
    ratio = scores[want][i] / scores[have][i]
    return max(1 - ratio * 5, 0)
|
46 |
+
|
47 |
+
def char2num(char):
    """Map a character to a 0-based class index: digits '0'-'9' -> 0-9,
    ASCII letters (case-folded) -> 10-35. Any other symbol prints an
    error and terminates the process (behavior kept from the original)."""
    if char in '0123456789':
        return ord(char) - ord('0')
    if char in 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ':
        return ord(char.lower()) - ord('a') + 10
    print('error symbol', char)
    exit()
|
example1.jpg
ADDED
example2.jpg
ADDED
example3.jpg
ADDED
maskrcnn_benchmark/config/__init__.py
ADDED
@@ -0,0 +1,2 @@
|
|
|
|
|
|
|
1 |
+
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
|
2 |
+
from .defaults import _C as cfg
|
maskrcnn_benchmark/config/defaults.py
ADDED
@@ -0,0 +1,373 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
#!/usr/bin/env python3
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
import os

from yacs.config import CfgNode as CN

# Default configuration tree. These values are the baseline; they are
# presumably overridden per-experiment by the YAML files under configs/
# (via yacs merge) — TODO confirm against the training entry point.

# -----------------------------------------------------------------------------
# Convention about Training / Test specific parameters
# -----------------------------------------------------------------------------
# Whenever an argument can be either used for training or for testing, the
# corresponding name will be post-fixed by a _TRAIN for a training parameter,
# or _TEST for a test-specific parameter.
# For example, the number of images during training will be
# IMAGES_PER_BATCH_TRAIN, while the number of images for testing will be
# IMAGES_PER_BATCH_TEST

# -----------------------------------------------------------------------------
# Config definition
# -----------------------------------------------------------------------------

_C = CN()

_C.MODEL = CN()
# On/off switches for the model's branches; consumed elsewhere (model
# builder) — exact semantics not visible in this file.
_C.MODEL.RPN_ONLY = False
_C.MODEL.MASK_ON = False
_C.MODEL.SEG_ON = False
_C.MODEL.CHAR_MASK_ON = False
_C.MODEL.DEVICE = "cuda"
_C.MODEL.META_ARCHITECTURE = "GeneralizedRCNN"
_C.MODEL.TRAIN_DETECTION_ONLY = False
_C.MODEL.RESNET34 = False

# If the WEIGHT starts with a catalog://, like :R-50, the code will look for
# the path in paths_catalog. Else, it will use it as the specified absolute
# path
_C.MODEL.WEIGHT = ""

# Options for the sequence-recognition branch.
_C.SEQUENCE = CN()
_C.SEQUENCE.SEQ_ON = False
# NOTE(review): 38 matches MODEL.ROI_MASK_HEAD.CHAR_NUM_CLASSES below.
_C.SEQUENCE.NUM_CHAR = 38
_C.SEQUENCE.BOS_TOKEN = 0
_C.SEQUENCE.MAX_LENGTH = 32
_C.SEQUENCE.TEACHER_FORCE_RATIO = 1.0
_C.SEQUENCE.TWO_CONV = False
_C.SEQUENCE.MEAN_SCORE = False
_C.SEQUENCE.RESIZE_HEIGHT = 16
_C.SEQUENCE.RESIZE_WIDTH = 64


# -----------------------------------------------------------------------------
# INPUT
# -----------------------------------------------------------------------------
_C.INPUT = CN()
# Size of the smallest side of the image during training
_C.INPUT.MIN_SIZE_TRAIN = (800,)  # (800,)
# Maximum size of the side of the image during training
_C.INPUT.MAX_SIZE_TRAIN = 1333
# Size of the smallest side of the image during testing
_C.INPUT.MIN_SIZE_TEST = 800
# Maximum size of the side of the image during testing
_C.INPUT.MAX_SIZE_TEST = 1333
# Values to be used for image normalization
_C.INPUT.PIXEL_MEAN = [102.9801, 115.9465, 122.7717]
# Values to be used for image normalization
_C.INPUT.PIXEL_STD = [1.0, 1.0, 1.0]
# Convert image to BGR format (for Caffe2 models), in range 0-255
_C.INPUT.TO_BGR255 = True
_C.INPUT.STRICT_RESIZE = False


# -----------------------------------------------------------------------------
# Dataset
# -----------------------------------------------------------------------------
_C.DATASETS = CN()
# List of the dataset names for training, as present in paths_catalog.py
_C.DATASETS.TRAIN = ()
# List of the dataset names for testing, as present in paths_catalog.py
_C.DATASETS.TEST = ()

# Per-dataset sampling ratios when mixing several training sets — TODO confirm.
_C.DATASETS.RATIOS = []

# Data-augmentation options (crop/rotation); consumed by the dataset code.
_C.DATASETS.AUG = False
_C.DATASETS.RANDOM_CROP_PROB = 0.0
_C.DATASETS.IGNORE_DIFFICULT = False
_C.DATASETS.FIX_CROP = False
_C.DATASETS.CROP_SIZE = (512, 512)
_C.DATASETS.MAX_ROTATE_THETA = 30
_C.DATASETS.FIX_ROTATE = False

# -----------------------------------------------------------------------------
# DataLoader
# -----------------------------------------------------------------------------
_C.DATALOADER = CN()
# Number of data loading threads
_C.DATALOADER.NUM_WORKERS = 4
# If > 0, this enforces that each collated batch should have a size divisible
# by SIZE_DIVISIBILITY
_C.DATALOADER.SIZE_DIVISIBILITY = 0
# If True, each batch should contain only images for which the aspect ratio
# is compatible. This groups portrait images together, and landscape images
# are not batched with portrait images.
_C.DATALOADER.ASPECT_RATIO_GROUPING = True

# ---------------------------------------------------------------------------- #
# Backbone options
# ---------------------------------------------------------------------------- #
_C.MODEL.BACKBONE = CN()

# The backbone conv body to use
# The string must match a function that is imported in modeling.model_builder
# (e.g., 'FPN.add_fpn_ResNet101_conv5_body' to specify a ResNet-101-FPN
# backbone)
_C.MODEL.BACKBONE.CONV_BODY = "R-50-C4"

# Add StopGrad at a specified stage so the bottom layers are frozen
_C.MODEL.BACKBONE.FREEZE_CONV_BODY_AT = 2
_C.MODEL.BACKBONE.OUT_CHANNELS = 256 * 4

# ---------------------------------------------------------------------------- #
# ResNe[X]t options (ResNets = {ResNet, ResNeXt}
# Note that parts of a resnet may be used for both the backbone and the head
# These options apply to both
# ---------------------------------------------------------------------------- #
_C.MODEL.RESNETS = CN()

# Number of groups to use; 1 ==> ResNet; > 1 ==> ResNeXt
_C.MODEL.RESNETS.NUM_GROUPS = 1

# Baseline width of each group
_C.MODEL.RESNETS.WIDTH_PER_GROUP = 64

# Place the stride 2 conv on the 1x1 filter
# Use True only for the original MSRA ResNet; use False for C2 and Torch models
_C.MODEL.RESNETS.STRIDE_IN_1X1 = True

# Residual transformation function
_C.MODEL.RESNETS.TRANS_FUNC = "BottleneckWithFixedBatchNorm"
# ResNet's stem function (conv1 and pool1)
_C.MODEL.RESNETS.STEM_FUNC = "StemWithFixedBatchNorm"

# Apply dilation in stage "res5"
_C.MODEL.RESNETS.RES5_DILATION = 1

_C.MODEL.RESNETS.BACKBONE_OUT_CHANNELS = 256 * 4
_C.MODEL.RESNETS.RES2_OUT_CHANNELS = 256
_C.MODEL.RESNETS.STEM_OUT_CHANNELS = 64

# Deformable-convolution options, one flag per ResNet stage.
_C.MODEL.RESNETS.STAGE_WITH_DCN = (False, False, False, False)
_C.MODEL.RESNETS.WITH_MODULATED_DCN = False
_C.MODEL.RESNETS.DEFORMABLE_GROUPS = 1
# Blocks per stage; (3, 4, 6, 3) is the ResNet-50 layout.
_C.MODEL.RESNETS.LAYERS = (3, 4, 6, 3)

# ---------------------------------------------------------------------------- #
# FPN options
# ---------------------------------------------------------------------------- #
_C.MODEL.FPN = CN()
_C.MODEL.FPN.USE_GN = False
_C.MODEL.FPN.USE_RELU = False

# ---------------------------------------------------------------------------- #
# RPN options
# ---------------------------------------------------------------------------- #
_C.MODEL.RPN = CN()
_C.MODEL.RPN.USE_FPN = False
# Base RPN anchor sizes given in absolute pixels w.r.t. the scaled network input
_C.MODEL.RPN.ANCHOR_SIZES = (32, 64, 128, 256, 512)
# Stride of the feature map that RPN is attached.
# For FPN, number of strides should match number of scales
_C.MODEL.RPN.ANCHOR_STRIDE = (16,)
# RPN anchor aspect ratios
_C.MODEL.RPN.ASPECT_RATIOS = (0.5, 1.0, 2.0)
# Remove RPN anchors that go outside the image by RPN_STRADDLE_THRESH pixels
# Set to -1 or a large value, e.g. 100000, to disable pruning anchors
_C.MODEL.RPN.STRADDLE_THRESH = 0
# Minimum overlap required between an anchor and ground-truth box for the
# (anchor, gt box) pair to be a positive example (IoU >= FG_IOU_THRESHOLD
# ==> positive RPN example)
_C.MODEL.RPN.FG_IOU_THRESHOLD = 0.7
# Maximum overlap allowed between an anchor and ground-truth box for the
# (anchor, gt box) pair to be a negative examples (IoU < BG_IOU_THRESHOLD
# ==> negative RPN example)
_C.MODEL.RPN.BG_IOU_THRESHOLD = 0.3
# Total number of RPN examples per image
_C.MODEL.RPN.BATCH_SIZE_PER_IMAGE = 256
# Target fraction of foreground (positive) examples per RPN minibatch
_C.MODEL.RPN.POSITIVE_FRACTION = 0.5
# Number of top scoring RPN proposals to keep before applying NMS
# When FPN is used, this is *per FPN level* (not total)
_C.MODEL.RPN.PRE_NMS_TOP_N_TRAIN = 12000
_C.MODEL.RPN.PRE_NMS_TOP_N_TEST = 6000
# Number of top scoring RPN proposals to keep after applying NMS
_C.MODEL.RPN.POST_NMS_TOP_N_TRAIN = 2000
_C.MODEL.RPN.POST_NMS_TOP_N_TEST = 1000
# NMS threshold used on RPN proposals
_C.MODEL.RPN.NMS_THRESH = 0.7
# Proposal height and width both need to be greater than RPN_MIN_SIZE
# (a the scale used during training or inference)
_C.MODEL.RPN.MIN_SIZE = 0
# Number of top scoring RPN proposals to keep after combining proposals from
# all FPN levels
_C.MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN = 2000
_C.MODEL.RPN.FPN_POST_NMS_TOP_N_TEST = 2000

# Segmentation-based proposal branch (active when MODEL.SEG_ON is True —
# TODO confirm against the model builder).
_C.MODEL.SEG = CN()
_C.MODEL.SEG.USE_FPN = False
_C.MODEL.SEG.USE_FUSE_FEATURE = False
# Total number of SEG examples per image
_C.MODEL.SEG.BATCH_SIZE_PER_IMAGE = 256
# Target fraction of foreground (positive) examples per SEG minibatch
_C.MODEL.SEG.POSITIVE_FRACTION = 0.5
# NMS threshold used on SEG proposals
_C.MODEL.SEG.BINARY_THRESH = 0.5
_C.MODEL.SEG.USE_MULTIPLE_THRESH = False
_C.MODEL.SEG.MULTIPLE_THRESH = (0.2, 0.3, 0.5, 0.7)
_C.MODEL.SEG.BOX_THRESH = 0.7
# Proposal height and width both need to be greater than RPN_MIN_SIZE
# (a the scale used during training or inference)
_C.MODEL.SEG.MIN_SIZE = 0
_C.MODEL.SEG.SHRINK_RATIO = 0.5
# Number of top scoring RPN proposals to keep after combining proposals from
# all FPN levels
_C.MODEL.SEG.TOP_N_TRAIN = 1000
_C.MODEL.SEG.TOP_N_TEST = 1000
_C.MODEL.SEG.AUG_PROPOSALS = False
_C.MODEL.SEG.IGNORE_DIFFICULT = True
_C.MODEL.SEG.EXPAND_RATIO = 1.6
_C.MODEL.SEG.BOX_EXPAND_RATIO = 1.5
_C.MODEL.SEG.USE_SEG_POLY = False
_C.MODEL.SEG.USE_PPM = False


# ---------------------------------------------------------------------------- #
# ROI HEADS options
# ---------------------------------------------------------------------------- #
_C.MODEL.ROI_HEADS = CN()
_C.MODEL.ROI_HEADS.USE_FPN = False
# Overlap threshold for an RoI to be considered foreground (if >= FG_IOU_THRESHOLD)
_C.MODEL.ROI_HEADS.FG_IOU_THRESHOLD = 0.5
# Overlap threshold for an RoI to be considered background
# (class = 0 if overlap in [0, BG_IOU_THRESHOLD))
_C.MODEL.ROI_HEADS.BG_IOU_THRESHOLD = 0.5
# Default weights on (dx, dy, dw, dh) for normalizing bbox regression targets
# These are empirically chosen to approximately lead to unit variance targets
_C.MODEL.ROI_HEADS.BBOX_REG_WEIGHTS = (10.0, 10.0, 5.0, 5.0)
# RoI minibatch size *per image* (number of regions of interest [ROIs])
# Total number of RoIs per training minibatch =
# TRAIN.BATCH_SIZE_PER_IM * TRAIN.IMS_PER_BATCH * NUM_GPUS
# E.g., a common configuration is: 512 * 2 * 8 = 8192
_C.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 512
# Target fraction of RoI minibatch that is labeled foreground (i.e. class > 0)
_C.MODEL.ROI_HEADS.POSITIVE_FRACTION = 0.25

# Only used on test mode

# Minimum score threshold (assuming scores in a [0, 1] range); a value chosen to
# balance obtaining high recall with not having too many low precision
# detections that will slow down inference post processing steps (like NMS)
# _C.MODEL.ROI_HEADS.SCORE_THRESH = 0.05
_C.MODEL.ROI_HEADS.SCORE_THRESH = 0.0
# Overlap threshold used for non-maximum suppression (suppress boxes with
# IoU >= this threshold)
_C.MODEL.ROI_HEADS.NMS = 0.5
# Maximum number of detections to return per image (100 is based on the limit
# established for the COCO dataset)
_C.MODEL.ROI_HEADS.DETECTIONS_PER_IMG = 100


_C.MODEL.ROI_BOX_HEAD = CN()
_C.MODEL.ROI_BOX_HEAD.FEATURE_EXTRACTOR = "ResNet50Conv5ROIFeatureExtractor"
_C.MODEL.ROI_BOX_HEAD.PREDICTOR = "FastRCNNPredictor"
_C.MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION = 14
_C.MODEL.ROI_BOX_HEAD.POOLER_SAMPLING_RATIO = 0
_C.MODEL.ROI_BOX_HEAD.POOLER_SCALES = (1.0 / 16,)
_C.MODEL.ROI_BOX_HEAD.NUM_CLASSES = 81
# Hidden layer dimension when using an MLP for the RoI box head
_C.MODEL.ROI_BOX_HEAD.MLP_HEAD_DIM = 1024
_C.MODEL.ROI_BOX_HEAD.USE_REGRESSION = True
_C.MODEL.ROI_BOX_HEAD.INFERENCE_USE_BOX = True
_C.MODEL.ROI_BOX_HEAD.USE_MASKED_FEATURE = False
_C.MODEL.ROI_BOX_HEAD.SOFT_MASKED_FEATURE_RATIO = 0.
_C.MODEL.ROI_BOX_HEAD.MIX_OPTION = ""


_C.MODEL.ROI_MASK_HEAD = CN()
_C.MODEL.ROI_MASK_HEAD.FEATURE_EXTRACTOR = "ResNet50Conv5ROIFeatureExtractor"
_C.MODEL.ROI_MASK_HEAD.PREDICTOR = "MaskRCNNC4Predictor"
_C.MODEL.ROI_MASK_HEAD.POOLER_RESOLUTION = 14
_C.MODEL.ROI_MASK_HEAD.POOLER_RESOLUTION_H = 32
_C.MODEL.ROI_MASK_HEAD.POOLER_RESOLUTION_W = 128
_C.MODEL.ROI_MASK_HEAD.POOLER_SAMPLING_RATIO = 0
_C.MODEL.ROI_MASK_HEAD.POOLER_SCALES = (1.0 / 16,)
_C.MODEL.ROI_MASK_HEAD.MLP_HEAD_DIM = 1024
_C.MODEL.ROI_MASK_HEAD.CONV_LAYERS = (256, 256, 256, 256)
_C.MODEL.ROI_MASK_HEAD.RESOLUTION = 14
_C.MODEL.ROI_MASK_HEAD.RESOLUTION_H = 32
_C.MODEL.ROI_MASK_HEAD.RESOLUTION_W = 128
_C.MODEL.ROI_MASK_HEAD.SHARE_BOX_FEATURE_EXTRACTOR = True
# Character classes for the character-mask branch; matches SEQUENCE.NUM_CHAR.
_C.MODEL.ROI_MASK_HEAD.CHAR_NUM_CLASSES = 38
_C.MODEL.ROI_MASK_HEAD.USE_WEIGHTED_CHAR_MASK = False
_C.MODEL.ROI_MASK_HEAD.MASK_BATCH_SIZE_PER_IM = 64
_C.MODEL.ROI_MASK_HEAD.USE_MASKED_FEATURE = False
_C.MODEL.ROI_MASK_HEAD.SOFT_MASKED_FEATURE_RATIO = 0.
_C.MODEL.ROI_MASK_HEAD.MIX_OPTION = ""

# ---------------------------------------------------------------------------- #
# Solver
# ---------------------------------------------------------------------------- #
_C.SOLVER = CN()
_C.SOLVER.MAX_ITER = 40000

_C.SOLVER.BASE_LR = 0.001
_C.SOLVER.BIAS_LR_FACTOR = 2

_C.SOLVER.MOMENTUM = 0.9

_C.SOLVER.WEIGHT_DECAY = 0.0005
_C.SOLVER.WEIGHT_DECAY_BIAS = 0

# Step LR schedule: multiply LR by GAMMA at each iteration in STEPS.
_C.SOLVER.GAMMA = 0.1
_C.SOLVER.STEPS = (30000,)

_C.SOLVER.WARMUP_FACTOR = 1.0 / 3
_C.SOLVER.WARMUP_ITERS = 500
_C.SOLVER.WARMUP_METHOD = "linear"

_C.SOLVER.CHECKPOINT_PERIOD = 5000

# Number of images per batch
# This is global, so if we have 8 GPUs and IMS_PER_BATCH = 16, each GPU will
# see 2 images per batch
_C.SOLVER.IMS_PER_BATCH = 16

_C.SOLVER.RESUME = True

_C.SOLVER.USE_ADAM = False

_C.SOLVER.POW_SCHEDULE = False

_C.SOLVER.DISPLAY_FREQ = 20

# ---------------------------------------------------------------------------- #
# Specific test options
# ---------------------------------------------------------------------------- #
_C.TEST = CN()
_C.TEST.EXPECTED_RESULTS = []
_C.TEST.EXPECTED_RESULTS_SIGMA_TOL = 4
# Number of images per batch
# This is global, so if we have 8 GPUs and IMS_PER_BATCH = 16, each GPU will
# see 2 images per batch
_C.TEST.IMS_PER_BATCH = 8
_C.TEST.VIS = False
# from 0 to 255
_C.TEST.CHAR_THRESH = 128


# ---------------------------------------------------------------------------- #
# Misc options
# ---------------------------------------------------------------------------- #
_C.OUTPUT_DIR = "."

_C.PATHS_CATALOG = os.path.join(os.path.dirname(__file__), "paths_catalog.py")


# ---------------------------------------------------------------------------- #
# Precision options
# ---------------------------------------------------------------------------- #

# Precision of input, allowable: (float32, float16)
_C.DTYPE = "float32"

# Enable verbosity in apex.amp
_C.AMP_VERBOSE = False
|
maskrcnn_benchmark/config/paths_catalog.py
ADDED
@@ -0,0 +1,237 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
|
2 |
+
"""Centralized catalog of paths."""
|
3 |
+
|
4 |
+
import os
|
5 |
+
|
6 |
+
|
7 |
+
class DatasetCatalog(object):
    """Maps dataset names to on-disk locations and dataset-factory names.

    ``get(name)`` returns ``dict(factory=<class name>, args=<ctor kwargs>)``;
    the caller is expected to look up ``factory`` in the datasets package and
    instantiate it with ``args``.
    """

    DATA_DIR = "datasets"
    # DATA_DIR = "/share/mhliao/MaskTextSpotterV3/datasets/"

    # name -> (images subdir, [optional] ground-truth subdir), relative to
    # DATA_DIR. A 1-tuple means the split has no ground truth available.
    DATASETS = {
        "coco_2014_train": (
            "coco/train2014",
            "coco/annotations/instances_train2014.json",
        ),
        "coco_2014_val": ("coco/val2014", "coco/annotations/instances_val2014.json"),
        "coco_2014_minival": (
            "coco/val2014",
            "coco/annotations/instances_minival2014.json",
        ),
        "coco_2014_valminusminival": (
            "coco/val2014",
            "coco/annotations/instances_valminusminival2014.json",
        ),
        "icdar_2013_train": ("icdar2013/train_images", "icdar2013/train_gts"),
        "icdar_2013_test": ("icdar2013/test_images", "icdar2013/test_gts"),
        "rotated_ic13_test_0": ("icdar2013/rotated_test_images_0", "icdar2013/rotated_test_gts_0"),
        "rotated_ic13_test_15": ("icdar2013/rotated_test_images_15", "icdar2013/rotated_test_gts_15"),
        "rotated_ic13_test_30": ("icdar2013/rotated_test_images_30", "icdar2013/rotated_test_gts_30"),
        "rotated_ic13_test_45": ("icdar2013/rotated_test_images_45", "icdar2013/rotated_test_gts_45"),
        "rotated_ic13_test_60": ("icdar2013/rotated_test_images_60", "icdar2013/rotated_test_gts_60"),
        "rotated_ic13_test_75": ("icdar2013/rotated_test_images_75", "icdar2013/rotated_test_gts_75"),
        "rotated_ic13_test_85": ("icdar2013/rotated_test_images_85", "icdar2013/rotated_test_gts_85"),
        "rotated_ic13_test_90": ("icdar2013/rotated_test_images_90", "icdar2013/rotated_test_gts_90"),
        "rotated_ic13_test_-15": ("icdar2013/rotated_test_images_-15", "icdar2013/rotated_test_gts_-15"),
        "rotated_ic13_test_-30": ("icdar2013/rotated_test_images_-30", "icdar2013/rotated_test_gts_-30"),
        "rotated_ic13_test_-45": ("icdar2013/rotated_test_images_-45", "icdar2013/rotated_test_gts_-45"),
        "rotated_ic13_test_-60": ("icdar2013/rotated_test_images_-60", "icdar2013/rotated_test_gts_-60"),
        "rotated_ic13_test_-75": ("icdar2013/rotated_test_images_-75", "icdar2013/rotated_test_gts_-75"),
        "rotated_ic13_test_-90": ("icdar2013/rotated_test_images_-90", "icdar2013/rotated_test_gts_-90"),
        "icdar_2015_train": ("icdar2015/train_images", "icdar2015/train_gts"),
        "icdar_2015_test": (
            "icdar2015/test_images",
            # "icdar2015/test_gts",
        ),
        "synthtext_train": ("synthtext/train_images", "synthtext/train_gts"),
        "synthtext_test": ("synthtext/test_images", "synthtext/test_gts"),
        "total_text_train": ("total_text/train_images", "total_text/train_gts"),
        "td500_train": ("TD_TR/TD500/train_images", "TD500/train_gts"),
        "td500_test": ("TD_TR/TD500/test_images", ),
        "tr400_train": ("TD_TR/TR400/train_images", "TR400/train_gts"),
        "total_text_test": (
            "total_text/test_images",
            # "total_text/test_gts",
        ),
        "scut-eng-char_train": (
            "scut-eng-char/train_images",
            "scut-eng-char/train_gts",
        ),
    }

    @staticmethod
    def _ocr_args(name, use_charann, extra=None):
        """Build the common OCR-dataset kwargs (imgs_dir/gts_dir/use_charann).

        ``gts_dir`` is ``None`` when the catalog entry has no ground-truth
        directory (1-tuple entries). ``extra`` merges additional kwargs in
        (e.g. synthtext's ``list_file_path``).
        """
        data_dir = DatasetCatalog.DATA_DIR
        attrs = DatasetCatalog.DATASETS[name]
        gts_dir = os.path.join(data_dir, attrs[1]) if len(attrs) > 1 else None
        args = dict(
            use_charann=use_charann,
            imgs_dir=os.path.join(data_dir, attrs[0]),
            gts_dir=gts_dir,
        )
        if extra:
            args.update(extra)
        return args

    @staticmethod
    def get(name):
        """Resolve a dataset name to its factory class name and ctor kwargs.

        Raises:
            RuntimeError: if ``name`` matches no known dataset family.
        """
        if "coco" in name:
            data_dir = DatasetCatalog.DATA_DIR
            attrs = DatasetCatalog.DATASETS[name]
            args = dict(
                root=os.path.join(data_dir, attrs[0]),
                ann_file=os.path.join(data_dir, attrs[1]),
            )
            return dict(factory="COCODataset", args=args)
        # ICDAR 2013 (and its rotated variants) ship character-level
        # annotations; ICDAR 2015 only has word-level ones.
        if "icdar_2013" in name or "rotated_ic13" in name:
            return dict(
                args=DatasetCatalog._ocr_args(name, use_charann=True),
                factory="IcdarDataset",
            )
        if "icdar_2015" in name:
            return dict(
                args=DatasetCatalog._ocr_args(name, use_charann=False),
                factory="IcdarDataset",
            )
        if "synthtext" in name:
            extra = dict(
                list_file_path=os.path.join(
                    DatasetCatalog.DATA_DIR, "synthtext/train_list.txt"
                )
            )
            return dict(
                args=DatasetCatalog._ocr_args(name, use_charann=True, extra=extra),
                factory="SynthtextDataset",
            )
        # TD500/TR400 reuse the Total-Text dataset class (polygon annotations,
        # no character-level labels).
        if "total_text" in name or "td500" in name or "tr400" in name:
            return dict(
                args=DatasetCatalog._ocr_args(name, use_charann=False),
                factory="TotaltextDataset",
            )
        if "scut-eng-char" in name:
            return dict(
                args=DatasetCatalog._ocr_args(name, use_charann=True),
                factory="ScutDataset",
            )
        raise RuntimeError("Dataset not available: {}".format(name))
|
176 |
+
class ModelCatalog(object):
    """Resolves catalog identifiers to downloadable pretrained-model URLs."""

    S3_C2_DETECTRON_URL = "https://dl.fbaipublicfiles.com/detectron"
    C2_IMAGENET_MODELS = {
        'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
        'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
        "MSRA/R-50": "ImageNetPretrained/MSRA/R-50.pkl",
        "MSRA/R-50-GN": "ImageNetPretrained/47261647/R-50-GN.pkl",
        "MSRA/R-101": "ImageNetPretrained/MSRA/R-101.pkl",
        "MSRA/R-101-GN": "ImageNetPretrained/47592356/R-101-GN.pkl",
        "FAIR/20171220/X-101-32x8d": "ImageNetPretrained/20171220/X-101-32x8d.pkl",
    }

    C2_DETECTRON_SUFFIX = "output/train/{}coco_2014_train%3A{}coco_2014_valminusminival/generalized_rcnn/model_final.pkl"
    C2_DETECTRON_MODELS = {
        "35857197/e2e_faster_rcnn_R-50-C4_1x": "01_33_49.iAX0mXvW",
        "35857345/e2e_faster_rcnn_R-50-FPN_1x": "01_36_30.cUF7QR7I",
        "35857890/e2e_faster_rcnn_R-101-FPN_1x": "01_38_50.sNxI7sX7",
        "36761737/e2e_faster_rcnn_X-101-32x8d-FPN_1x": "06_31_39.5MIHi1fZ",
        "35858791/e2e_mask_rcnn_R-50-C4_1x": "01_45_57.ZgkA7hPB",
        "35858933/e2e_mask_rcnn_R-50-FPN_1x": "01_48_14.DzEQe4wC",
        "35861795/e2e_mask_rcnn_R-101-FPN_1x": "02_31_37.KqyEK4tT",
        "36761843/e2e_mask_rcnn_X-101-32x8d-FPN_1x": "06_35_59.RZotkLKI",
        "37129812/e2e_mask_rcnn_X-152-32x8d-FPN-IN5k_1.44x": "09_35_36.8pzTQKYK",
        # keypoints
        "37697547/e2e_keypoint_rcnn_R-50-FPN_1x": "08_42_54.kdzV35ao"
    }

    @staticmethod
    def get(name):
        """Dispatch on the identifier prefix; raise for unknown catalogs."""
        handlers = (
            ("Caffe2Detectron/COCO", ModelCatalog.get_c2_detectron_12_2017_baselines),
            ("ImageNetPretrained", ModelCatalog.get_c2_imagenet_pretrained),
        )
        for prefix, handler in handlers:
            if name.startswith(prefix):
                return handler(name)
        raise RuntimeError("model not present in the catalog {}".format(name))

    @staticmethod
    def get_c2_imagenet_pretrained(name):
        """Resolve an ImageNet-pretrained backbone identifier to a URL.

        torchvision resnet18/34 entries are already full URLs and are
        returned verbatim; everything else is an S3 object key joined onto
        the Detectron bucket prefix.
        """
        key = name[len("ImageNetPretrained/") :]
        resolved = ModelCatalog.C2_IMAGENET_MODELS[key]
        if 'resnet34' in resolved or 'resnet18' in resolved:
            return resolved
        return "/".join([ModelCatalog.S3_C2_DETECTRON_URL, resolved])

    @staticmethod
    def get_c2_detectron_12_2017_baselines(name):
        # Detectron C2 models are stored following the structure
        # prefix/<model_id>/2012_2017_baselines/<model_name>.yaml.<signature>/suffix
        # we use as identifiers in the catalog Caffe2Detectron/COCO/<model_id>/<model_name>
        catalog_key = name[len("Caffe2Detectron/COCO/") :]
        model_id, model_name = catalog_key.split("/")
        signature = ModelCatalog.C2_DETECTRON_MODELS[catalog_key]
        unique_name = "{}.yaml.{}".format(model_name, signature)
        return "/".join([
            ModelCatalog.S3_C2_DETECTRON_URL,
            model_id,
            "12_2017_baselines",
            unique_name,
            ModelCatalog.C2_DETECTRON_SUFFIX,
        ])
|
maskrcnn_benchmark/csrc/ROIAlign.h
ADDED
@@ -0,0 +1,46 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
|
2 |
+
#pragma once
|
3 |
+
|
4 |
+
#include "cpu/vision.h"
|
5 |
+
|
6 |
+
#ifdef WITH_CUDA
|
7 |
+
#include "cuda/vision.h"
|
8 |
+
#endif
|
9 |
+
|
10 |
+
// Interface for Python
|
11 |
+
at::Tensor ROIAlign_forward(const at::Tensor& input,
|
12 |
+
const at::Tensor& rois,
|
13 |
+
const float spatial_scale,
|
14 |
+
const int pooled_height,
|
15 |
+
const int pooled_width,
|
16 |
+
const int sampling_ratio) {
|
17 |
+
if (input.type().is_cuda()) {
|
18 |
+
#ifdef WITH_CUDA
|
19 |
+
return ROIAlign_forward_cuda(input, rois, spatial_scale, pooled_height, pooled_width, sampling_ratio);
|
20 |
+
#else
|
21 |
+
AT_ERROR("Not compiled with GPU support");
|
22 |
+
#endif
|
23 |
+
}
|
24 |
+
return ROIAlign_forward_cpu(input, rois, spatial_scale, pooled_height, pooled_width, sampling_ratio);
|
25 |
+
}
|
26 |
+
|
27 |
+
// Backward pass for ROIAlign. Only a CUDA implementation exists; calling
// this with CPU tensors raises an error. `batch_size`/`channels`/`height`/
// `width` describe the shape of the original input so the gradient tensor
// can be reconstructed.
at::Tensor ROIAlign_backward(const at::Tensor& grad,
                             const at::Tensor& rois,
                             const float spatial_scale,
                             const int pooled_height,
                             const int pooled_width,
                             const int batch_size,
                             const int channels,
                             const int height,
                             const int width,
                             const int sampling_ratio) {
  if (grad.type().is_cuda()) {
#ifdef WITH_CUDA
    return ROIAlign_backward_cuda(grad, rois, spatial_scale, pooled_height, pooled_width, batch_size, channels, height, width, sampling_ratio);
#else
    AT_ERROR("Not compiled with GPU support");
#endif
  }
  AT_ERROR("Not implemented on the CPU");
}
|
46 |
+
|
maskrcnn_benchmark/csrc/ROIPool.h
ADDED
@@ -0,0 +1,48 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
|
2 |
+
#pragma once
|
3 |
+
|
4 |
+
#include "cpu/vision.h"
|
5 |
+
|
6 |
+
#ifdef WITH_CUDA
|
7 |
+
#include "cuda/vision.h"
|
8 |
+
#endif
|
9 |
+
|
10 |
+
|
11 |
+
// Forward pass for ROIPool. Returns the pooled output plus a second tensor
// that is fed back into ROIPool_backward as `argmax`. CUDA-only: CPU
// tensors raise an error.
std::tuple<at::Tensor, at::Tensor> ROIPool_forward(const at::Tensor& input,
                                                   const at::Tensor& rois,
                                                   const float spatial_scale,
                                                   const int pooled_height,
                                                   const int pooled_width) {
  if (input.type().is_cuda()) {
#ifdef WITH_CUDA
    return ROIPool_forward_cuda(input, rois, spatial_scale, pooled_height, pooled_width);
#else
    AT_ERROR("Not compiled with GPU support");
#endif
  }
  AT_ERROR("Not implemented on the CPU");
}
|
25 |
+
|
26 |
+
// Backward pass for ROIPool, using the `argmax` indices produced by the
// forward pass. CUDA-only: CPU tensors raise an error. The four trailing
// ints describe the original input shape for the gradient tensor.
at::Tensor ROIPool_backward(const at::Tensor& grad,
                            const at::Tensor& input,
                            const at::Tensor& rois,
                            const at::Tensor& argmax,
                            const float spatial_scale,
                            const int pooled_height,
                            const int pooled_width,
                            const int batch_size,
                            const int channels,
                            const int height,
                            const int width) {
  if (grad.type().is_cuda()) {
#ifdef WITH_CUDA
    return ROIPool_backward_cuda(grad, input, rois, argmax, spatial_scale, pooled_height, pooled_width, batch_size, channels, height, width);
#else
    AT_ERROR("Not compiled with GPU support");
#endif
  }
  AT_ERROR("Not implemented on the CPU");
}
|
46 |
+
|
47 |
+
|
48 |
+
|
maskrcnn_benchmark/csrc/SigmoidFocalLoss.h
ADDED
@@ -0,0 +1,41 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
#pragma once
|
2 |
+
|
3 |
+
#include "cpu/vision.h"
|
4 |
+
|
5 |
+
#ifdef WITH_CUDA
|
6 |
+
#include "cuda/vision.h"
|
7 |
+
#endif
|
8 |
+
|
9 |
+
// Interface for Python
// Forward pass of the sigmoid focal loss (focusing parameter `gamma`,
// class-balancing weight `alpha`). CUDA-only: CPU tensors raise an error.
at::Tensor SigmoidFocalLoss_forward(
		const at::Tensor& logits,
                const at::Tensor& targets,
		const int num_classes,
		const float gamma,
		const float alpha) {
  if (logits.type().is_cuda()) {
#ifdef WITH_CUDA
    return SigmoidFocalLoss_forward_cuda(logits, targets, num_classes, gamma, alpha);
#else
    AT_ERROR("Not compiled with GPU support");
#endif
  }
  AT_ERROR("Not implemented on the CPU");
}
|
25 |
+
|
26 |
+
// Backward pass of the sigmoid focal loss; `d_losses` carries the upstream
// gradients. CUDA-only: CPU tensors raise an error.
at::Tensor SigmoidFocalLoss_backward(
			     const at::Tensor& logits,
                const at::Tensor& targets,
			     const at::Tensor& d_losses,
			     const int num_classes,
			     const float gamma,
			     const float alpha) {
  if (logits.type().is_cuda()) {
#ifdef WITH_CUDA
    return SigmoidFocalLoss_backward_cuda(logits, targets, d_losses, num_classes, gamma, alpha);
#else
    AT_ERROR("Not compiled with GPU support");
#endif
  }
  AT_ERROR("Not implemented on the CPU");
}
|
maskrcnn_benchmark/csrc/cpu/ROIAlign_cpu.cpp
ADDED
@@ -0,0 +1,257 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
|
2 |
+
#include "cpu/vision.h"
|
3 |
+
|
4 |
+
// implementation taken from Caffe2
// Precomputed bilinear-interpolation data for a single sampling point:
// the four flattened neighbour offsets within one (height x width) channel
// plane, and their interpolation weights.
template <typename T>
struct PreCalc {
  int pos1;
  int pos2;
  int pos3;
  int pos4;
  T w1;
  T w2;
  T w3;
  T w4;
};
|
16 |
+
|
17 |
+
// Precomputes, for every (ph, pw, iy, ix) sampling point of one ROI, the
// four neighbour offsets and bilinear weights, appending them in that loop
// order into `pre_calc`. Out-of-bounds points get all-zero entries so the
// accumulation loop can skip no indices. Shared by all channels of the ROI.
template <typename T>
void pre_calc_for_bilinear_interpolate(
    const int height,
    const int width,
    const int pooled_height,
    const int pooled_width,
    const int iy_upper,
    const int ix_upper,
    T roi_start_h,
    T roi_start_w,
    T bin_size_h,
    T bin_size_w,
    int roi_bin_grid_h,
    int roi_bin_grid_w,
    std::vector<PreCalc<T>>& pre_calc) {
  int pre_calc_index = 0;
  for (int ph = 0; ph < pooled_height; ph++) {
    for (int pw = 0; pw < pooled_width; pw++) {
      for (int iy = 0; iy < iy_upper; iy++) {
        const T yy = roi_start_h + ph * bin_size_h +
            static_cast<T>(iy + .5f) * bin_size_h /
                static_cast<T>(roi_bin_grid_h); // e.g., 0.5, 1.5
        for (int ix = 0; ix < ix_upper; ix++) {
          const T xx = roi_start_w + pw * bin_size_w +
              static_cast<T>(ix + .5f) * bin_size_w /
                  static_cast<T>(roi_bin_grid_w);

          T x = xx;
          T y = yy;
          // deal with: inverse elements are out of feature map boundary
          if (y < -1.0 || y > height || x < -1.0 || x > width) {
            // empty: record a zero-weight entry so indexing stays aligned
            PreCalc<T> pc;
            pc.pos1 = 0;
            pc.pos2 = 0;
            pc.pos3 = 0;
            pc.pos4 = 0;
            pc.w1 = 0;
            pc.w2 = 0;
            pc.w3 = 0;
            pc.w4 = 0;
            pre_calc[pre_calc_index] = pc;
            pre_calc_index += 1;
            continue;
          }

          // clamp slightly-negative coordinates onto the map
          if (y <= 0) {
            y = 0;
          }
          if (x <= 0) {
            x = 0;
          }

          int y_low = (int)y;
          int x_low = (int)x;
          int y_high;
          int x_high;

          if (y_low >= height - 1) {
            y_high = y_low = height - 1;
            y = (T)y_low;
          } else {
            y_high = y_low + 1;
          }

          if (x_low >= width - 1) {
            x_high = x_low = width - 1;
            x = (T)x_low;
          } else {
            x_high = x_low + 1;
          }

          // fractional parts -> bilinear weights
          T ly = y - y_low;
          T lx = x - x_low;
          T hy = 1. - ly, hx = 1. - lx;
          T w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx;

          // save weights and indices
          PreCalc<T> pc;
          pc.pos1 = y_low * width + x_low;
          pc.pos2 = y_low * width + x_high;
          pc.pos3 = y_high * width + x_low;
          pc.pos4 = y_high * width + x_high;
          pc.w1 = w1;
          pc.w2 = w2;
          pc.w3 = w3;
          pc.w4 = w4;
          pre_calc[pre_calc_index] = pc;

          pre_calc_index += 1;
        }
      }
    }
  }
}
|
112 |
+
|
113 |
+
// CPU ROIAlign forward kernel. For each ROI row in `bottom_rois`
// (5 columns: batch index + x1,y1,x2,y2 in input coordinates), averages
// bilinearly-interpolated samples over a grid inside each output bin and
// writes the result into `top_data` (NCHW-flattened pooled output).
template <typename T>
void ROIAlignForward_cpu_kernel(
    const int nthreads,
    const T* bottom_data,
    const T& spatial_scale,
    const int channels,
    const int height,
    const int width,
    const int pooled_height,
    const int pooled_width,
    const int sampling_ratio,
    const T* bottom_rois,
    //int roi_cols,
    T* top_data) {
  //AT_ASSERT(roi_cols == 4 || roi_cols == 5);
  // roi_cols is fixed at 5 here (batch index + box); the 4-column variant
  // from Caffe2 is not used by the caller.
  int roi_cols = 5;

  int n_rois = nthreads / channels / pooled_width / pooled_height;
  // (n, c, ph, pw) is an element in the pooled output
  // can be parallelized using omp
  // #pragma omp parallel for num_threads(32)
  for (int n = 0; n < n_rois; n++) {
    int index_n = n * channels * pooled_width * pooled_height;

    // roi could have 4 or 5 columns
    const T* offset_bottom_rois = bottom_rois + n * roi_cols;
    int roi_batch_ind = 0;
    if (roi_cols == 5) {
      roi_batch_ind = offset_bottom_rois[0];
      offset_bottom_rois++;
    }

    // Do not using rounding; this implementation detail is critical
    T roi_start_w = offset_bottom_rois[0] * spatial_scale;
    T roi_start_h = offset_bottom_rois[1] * spatial_scale;
    T roi_end_w = offset_bottom_rois[2] * spatial_scale;
    T roi_end_h = offset_bottom_rois[3] * spatial_scale;
    // T roi_start_w = round(offset_bottom_rois[0] * spatial_scale);
    // T roi_start_h = round(offset_bottom_rois[1] * spatial_scale);
    // T roi_end_w = round(offset_bottom_rois[2] * spatial_scale);
    // T roi_end_h = round(offset_bottom_rois[3] * spatial_scale);

    // Force malformed ROIs to be 1x1
    T roi_width = std::max(roi_end_w - roi_start_w, (T)1.);
    T roi_height = std::max(roi_end_h - roi_start_h, (T)1.);
    T bin_size_h = static_cast<T>(roi_height) / static_cast<T>(pooled_height);
    T bin_size_w = static_cast<T>(roi_width) / static_cast<T>(pooled_width);

    // We use roi_bin_grid to sample the grid and mimic integral
    int roi_bin_grid_h = (sampling_ratio > 0)
        ? sampling_ratio
        : ceil(roi_height / pooled_height); // e.g., = 2
    int roi_bin_grid_w =
        (sampling_ratio > 0) ? sampling_ratio : ceil(roi_width / pooled_width);

    // We do average (integral) pooling inside a bin
    const T count = roi_bin_grid_h * roi_bin_grid_w; // e.g. = 4

    // we want to precalculate indices and weights shared by all channels,
    // this is the key point of optimization
    std::vector<PreCalc<T>> pre_calc(
        roi_bin_grid_h * roi_bin_grid_w * pooled_width * pooled_height);
    pre_calc_for_bilinear_interpolate(
        height,
        width,
        pooled_height,
        pooled_width,
        roi_bin_grid_h,
        roi_bin_grid_w,
        roi_start_h,
        roi_start_w,
        bin_size_h,
        bin_size_w,
        roi_bin_grid_h,
        roi_bin_grid_w,
        pre_calc);

    for (int c = 0; c < channels; c++) {
      int index_n_c = index_n + c * pooled_width * pooled_height;
      const T* offset_bottom_data =
          bottom_data + (roi_batch_ind * channels + c) * height * width;
      // pre_calc entries are consumed in the exact order they were written
      int pre_calc_index = 0;

      for (int ph = 0; ph < pooled_height; ph++) {
        for (int pw = 0; pw < pooled_width; pw++) {
          int index = index_n_c + ph * pooled_width + pw;

          T output_val = 0.;
          for (int iy = 0; iy < roi_bin_grid_h; iy++) {
            for (int ix = 0; ix < roi_bin_grid_w; ix++) {
              PreCalc<T> pc = pre_calc[pre_calc_index];
              output_val += pc.w1 * offset_bottom_data[pc.pos1] +
                  pc.w2 * offset_bottom_data[pc.pos2] +
                  pc.w3 * offset_bottom_data[pc.pos3] +
                  pc.w4 * offset_bottom_data[pc.pos4];

              pre_calc_index += 1;
            }
          }
          output_val /= count;

          top_data[index] = output_val;
        } // for pw
      } // for ph
    } // for c
  } // for n
}
|
220 |
+
|
221 |
+
// Entry point for the CPU ROIAlign forward pass: validates devices,
// allocates the (num_rois, C, PH, PW) output, and dispatches the templated
// kernel over the input's floating-point dtype.
at::Tensor ROIAlign_forward_cpu(const at::Tensor& input,
                                const at::Tensor& rois,
                                const float spatial_scale,
                                const int pooled_height,
                                const int pooled_width,
                                const int sampling_ratio) {
  AT_ASSERTM(!input.type().is_cuda(), "input must be a CPU tensor");
  AT_ASSERTM(!rois.type().is_cuda(), "rois must be a CPU tensor");

  auto num_rois = rois.size(0);
  auto channels = input.size(1);
  auto height = input.size(2);
  auto width = input.size(3);

  auto output = at::empty({num_rois, channels, pooled_height, pooled_width}, input.options());
  auto output_size = num_rois * pooled_height * pooled_width * channels;

  // no ROIs (or empty input): nothing to compute
  if (output.numel() == 0) {
    return output;
  }

  AT_DISPATCH_FLOATING_TYPES(input.type(), "ROIAlign_forward", [&] {
    ROIAlignForward_cpu_kernel<scalar_t>(
        output_size,
        input.data<scalar_t>(),
        spatial_scale,
        channels,
        height,
        width,
        pooled_height,
        pooled_width,
        sampling_ratio,
        rois.data<scalar_t>(),
        output.data<scalar_t>());
  });
  return output;
}
|
maskrcnn_benchmark/csrc/cpu/nms_cpu.cpp
ADDED
@@ -0,0 +1,75 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
|
2 |
+
#include "cpu/vision.h"
|
3 |
+
|
4 |
+
|
5 |
+
// Greedy NMS over axis-aligned boxes on CPU. `dets` columns are
// (x1, y1, x2, y2); boxes are visited in descending score order and any
// later box whose IoU with a kept box reaches `threshold` is suppressed.
// Areas use the +1 "inclusive pixel" convention. Returns kept indices.
template <typename scalar_t>
at::Tensor nms_cpu_kernel(const at::Tensor& dets,
                          const at::Tensor& scores,
                          const float threshold) {
  AT_ASSERTM(!dets.type().is_cuda(), "dets must be a CPU tensor");
  AT_ASSERTM(!scores.type().is_cuda(), "scores must be a CPU tensor");
  AT_ASSERTM(dets.type() == scores.type(), "dets should have the same type as scores");

  if (dets.numel() == 0) {
    return at::empty({0}, dets.options().dtype(at::kLong).device(at::kCPU));
  }

  auto x1_t = dets.select(1, 0).contiguous();
  auto y1_t = dets.select(1, 1).contiguous();
  auto x2_t = dets.select(1, 2).contiguous();
  auto y2_t = dets.select(1, 3).contiguous();

  at::Tensor areas_t = (x2_t - x1_t + 1) * (y2_t - y1_t + 1);

  // indices sorted by descending score
  auto order_t = std::get<1>(scores.sort(0, /* descending=*/true));

  auto ndets = dets.size(0);
  at::Tensor suppressed_t = at::zeros({ndets}, dets.options().dtype(at::kByte).device(at::kCPU));

  auto suppressed = suppressed_t.data<uint8_t>();
  auto order = order_t.data<int64_t>();
  auto x1 = x1_t.data<scalar_t>();
  auto y1 = y1_t.data<scalar_t>();
  auto x2 = x2_t.data<scalar_t>();
  auto y2 = y2_t.data<scalar_t>();
  auto areas = areas_t.data<scalar_t>();

  for (int64_t _i = 0; _i < ndets; _i++) {
    auto i = order[_i];
    if (suppressed[i] == 1)
      continue;
    auto ix1 = x1[i];
    auto iy1 = y1[i];
    auto ix2 = x2[i];
    auto iy2 = y2[i];
    auto iarea = areas[i];

    // suppress all lower-scored boxes overlapping box i too much
    for (int64_t _j = _i + 1; _j < ndets; _j++) {
      auto j = order[_j];
      if (suppressed[j] == 1)
        continue;
      auto xx1 = std::max(ix1, x1[j]);
      auto yy1 = std::max(iy1, y1[j]);
      auto xx2 = std::min(ix2, x2[j]);
      auto yy2 = std::min(iy2, y2[j]);

      auto w = std::max(static_cast<scalar_t>(0), xx2 - xx1 + 1);
      auto h = std::max(static_cast<scalar_t>(0), yy2 - yy1 + 1);
      auto inter = w * h;
      auto ovr = inter / (iarea + areas[j] - inter);
      if (ovr >= threshold)
        suppressed[j] = 1;
    }
  }
  return at::nonzero(suppressed_t == 0).squeeze(1);
}
|
66 |
+
|
67 |
+
// Public CPU NMS entry point: dispatches the templated kernel over the
// detections' floating-point dtype and returns the kept indices.
at::Tensor nms_cpu(const at::Tensor& dets,
                   const at::Tensor& scores,
                   const float threshold) {
  at::Tensor result;
  AT_DISPATCH_FLOATING_TYPES(dets.type(), "nms", [&] {
    result = nms_cpu_kernel<scalar_t>(dets, scores, threshold);
  });
  return result;
}
|
maskrcnn_benchmark/csrc/cpu/vision.h
ADDED
@@ -0,0 +1,16 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
|
2 |
+
#pragma once
|
3 |
+
#include <torch/extension.h>
|
4 |
+
|
5 |
+
|
6 |
+
// CPU ROIAlign forward pass (defined in ROIAlign_cpu.cpp).
at::Tensor ROIAlign_forward_cpu(const at::Tensor& input,
                                const at::Tensor& rois,
                                const float spatial_scale,
                                const int pooled_height,
                                const int pooled_width,
                                const int sampling_ratio);


// CPU non-maximum suppression (defined in nms_cpu.cpp); returns the
// indices of the detections that survive suppression.
at::Tensor nms_cpu(const at::Tensor& dets,
                   const at::Tensor& scores,
                   const float threshold);
|
maskrcnn_benchmark/csrc/cuda/ROIAlign_cuda.cu
ADDED
@@ -0,0 +1,346 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
|
2 |
+
#include <ATen/ATen.h>
|
3 |
+
#include <ATen/cuda/CUDAContext.h>
|
4 |
+
|
5 |
+
#include <THC/THC.h>
|
6 |
+
#include <THC/THCAtomics.cuh>
|
7 |
+
#include <THC/THCDeviceUtils.cuh>
|
8 |
+
|
9 |
+
// TODO make it in a common file
|
10 |
+
#define CUDA_1D_KERNEL_LOOP(i, n) \
|
11 |
+
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \
|
12 |
+
i += blockDim.x * gridDim.x)
|
13 |
+
|
14 |
+
|
15 |
+
template <typename T>
|
16 |
+
__device__ T bilinear_interpolate(const T* bottom_data,
|
17 |
+
const int height, const int width,
|
18 |
+
T y, T x,
|
19 |
+
const int index /* index for debug only*/) {
|
20 |
+
|
21 |
+
// deal with cases that inverse elements are out of feature map boundary
|
22 |
+
if (y < -1.0 || y > height || x < -1.0 || x > width) {
|
23 |
+
//empty
|
24 |
+
return 0;
|
25 |
+
}
|
26 |
+
|
27 |
+
if (y <= 0) y = 0;
|
28 |
+
if (x <= 0) x = 0;
|
29 |
+
|
30 |
+
int y_low = (int) y;
|
31 |
+
int x_low = (int) x;
|
32 |
+
int y_high;
|
33 |
+
int x_high;
|
34 |
+
|
35 |
+
if (y_low >= height - 1) {
|
36 |
+
y_high = y_low = height - 1;
|
37 |
+
y = (T) y_low;
|
38 |
+
} else {
|
39 |
+
y_high = y_low + 1;
|
40 |
+
}
|
41 |
+
|
42 |
+
if (x_low >= width - 1) {
|
43 |
+
x_high = x_low = width - 1;
|
44 |
+
x = (T) x_low;
|
45 |
+
} else {
|
46 |
+
x_high = x_low + 1;
|
47 |
+
}
|
48 |
+
|
49 |
+
T ly = y - y_low;
|
50 |
+
T lx = x - x_low;
|
51 |
+
T hy = 1. - ly, hx = 1. - lx;
|
52 |
+
// do bilinear interpolation
|
53 |
+
T v1 = bottom_data[y_low * width + x_low];
|
54 |
+
T v2 = bottom_data[y_low * width + x_high];
|
55 |
+
T v3 = bottom_data[y_high * width + x_low];
|
56 |
+
T v4 = bottom_data[y_high * width + x_high];
|
57 |
+
T w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx;
|
58 |
+
|
59 |
+
T val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4);
|
60 |
+
|
61 |
+
return val;
|
62 |
+
}
|
63 |
+
|
64 |
+
template <typename T>
|
65 |
+
__global__ void RoIAlignForward(const int nthreads, const T* bottom_data,
|
66 |
+
const T spatial_scale, const int channels,
|
67 |
+
const int height, const int width,
|
68 |
+
const int pooled_height, const int pooled_width,
|
69 |
+
const int sampling_ratio,
|
70 |
+
const T* bottom_rois, T* top_data) {
|
71 |
+
CUDA_1D_KERNEL_LOOP(index, nthreads) {
|
72 |
+
// (n, c, ph, pw) is an element in the pooled output
|
73 |
+
int pw = index % pooled_width;
|
74 |
+
int ph = (index / pooled_width) % pooled_height;
|
75 |
+
int c = (index / pooled_width / pooled_height) % channels;
|
76 |
+
int n = index / pooled_width / pooled_height / channels;
|
77 |
+
|
78 |
+
const T* offset_bottom_rois = bottom_rois + n * 5;
|
79 |
+
int roi_batch_ind = offset_bottom_rois[0];
|
80 |
+
|
81 |
+
// Do not using rounding; this implementation detail is critical
|
82 |
+
T roi_start_w = offset_bottom_rois[1] * spatial_scale;
|
83 |
+
T roi_start_h = offset_bottom_rois[2] * spatial_scale;
|
84 |
+
T roi_end_w = offset_bottom_rois[3] * spatial_scale;
|
85 |
+
T roi_end_h = offset_bottom_rois[4] * spatial_scale;
|
86 |
+
// T roi_start_w = round(offset_bottom_rois[1] * spatial_scale);
|
87 |
+
// T roi_start_h = round(offset_bottom_rois[2] * spatial_scale);
|
88 |
+
// T roi_end_w = round(offset_bottom_rois[3] * spatial_scale);
|
89 |
+
// T roi_end_h = round(offset_bottom_rois[4] * spatial_scale);
|
90 |
+
|
91 |
+
// Force malformed ROIs to be 1x1
|
92 |
+
T roi_width = max(roi_end_w - roi_start_w, (T)1.);
|
93 |
+
T roi_height = max(roi_end_h - roi_start_h, (T)1.);
|
94 |
+
T bin_size_h = static_cast<T>(roi_height) / static_cast<T>(pooled_height);
|
95 |
+
T bin_size_w = static_cast<T>(roi_width) / static_cast<T>(pooled_width);
|
96 |
+
|
97 |
+
const T* offset_bottom_data = bottom_data + (roi_batch_ind * channels + c) * height * width;
|
98 |
+
|
99 |
+
// We use roi_bin_grid to sample the grid and mimic integral
|
100 |
+
int roi_bin_grid_h = (sampling_ratio > 0) ? sampling_ratio : ceil(roi_height / pooled_height); // e.g., = 2
|
101 |
+
int roi_bin_grid_w = (sampling_ratio > 0) ? sampling_ratio : ceil(roi_width / pooled_width);
|
102 |
+
|
103 |
+
// We do average (integral) pooling inside a bin
|
104 |
+
const T count = roi_bin_grid_h * roi_bin_grid_w; // e.g. = 4
|
105 |
+
|
106 |
+
T output_val = 0.;
|
107 |
+
for (int iy = 0; iy < roi_bin_grid_h; iy ++) // e.g., iy = 0, 1
|
108 |
+
{
|
109 |
+
const T y = roi_start_h + ph * bin_size_h + static_cast<T>(iy + .5f) * bin_size_h / static_cast<T>(roi_bin_grid_h); // e.g., 0.5, 1.5
|
110 |
+
for (int ix = 0; ix < roi_bin_grid_w; ix ++)
|
111 |
+
{
|
112 |
+
const T x = roi_start_w + pw * bin_size_w + static_cast<T>(ix + .5f) * bin_size_w / static_cast<T>(roi_bin_grid_w);
|
113 |
+
|
114 |
+
T val = bilinear_interpolate(offset_bottom_data, height, width, y, x, index);
|
115 |
+
output_val += val;
|
116 |
+
}
|
117 |
+
}
|
118 |
+
output_val /= count;
|
119 |
+
|
120 |
+
top_data[index] = output_val;
|
121 |
+
}
|
122 |
+
}
|
123 |
+
|
124 |
+
|
125 |
+
template <typename T>
|
126 |
+
__device__ void bilinear_interpolate_gradient(
|
127 |
+
const int height, const int width,
|
128 |
+
T y, T x,
|
129 |
+
T & w1, T & w2, T & w3, T & w4,
|
130 |
+
int & x_low, int & x_high, int & y_low, int & y_high,
|
131 |
+
const int index /* index for debug only*/) {
|
132 |
+
|
133 |
+
// deal with cases that inverse elements are out of feature map boundary
|
134 |
+
if (y < -1.0 || y > height || x < -1.0 || x > width) {
|
135 |
+
//empty
|
136 |
+
w1 = w2 = w3 = w4 = 0.;
|
137 |
+
x_low = x_high = y_low = y_high = -1;
|
138 |
+
return;
|
139 |
+
}
|
140 |
+
|
141 |
+
if (y <= 0) y = 0;
|
142 |
+
if (x <= 0) x = 0;
|
143 |
+
|
144 |
+
y_low = (int) y;
|
145 |
+
x_low = (int) x;
|
146 |
+
|
147 |
+
if (y_low >= height - 1) {
|
148 |
+
y_high = y_low = height - 1;
|
149 |
+
y = (T) y_low;
|
150 |
+
} else {
|
151 |
+
y_high = y_low + 1;
|
152 |
+
}
|
153 |
+
|
154 |
+
if (x_low >= width - 1) {
|
155 |
+
x_high = x_low = width - 1;
|
156 |
+
x = (T) x_low;
|
157 |
+
} else {
|
158 |
+
x_high = x_low + 1;
|
159 |
+
}
|
160 |
+
|
161 |
+
T ly = y - y_low;
|
162 |
+
T lx = x - x_low;
|
163 |
+
T hy = 1. - ly, hx = 1. - lx;
|
164 |
+
|
165 |
+
// reference in forward
|
166 |
+
// T v1 = bottom_data[y_low * width + x_low];
|
167 |
+
// T v2 = bottom_data[y_low * width + x_high];
|
168 |
+
// T v3 = bottom_data[y_high * width + x_low];
|
169 |
+
// T v4 = bottom_data[y_high * width + x_high];
|
170 |
+
// T val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4);
|
171 |
+
|
172 |
+
w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx;
|
173 |
+
|
174 |
+
return;
|
175 |
+
}
|
176 |
+
|
177 |
+
template <typename T>
|
178 |
+
__global__ void RoIAlignBackwardFeature(const int nthreads, const T* top_diff,
|
179 |
+
const int num_rois, const T spatial_scale,
|
180 |
+
const int channels, const int height, const int width,
|
181 |
+
const int pooled_height, const int pooled_width,
|
182 |
+
const int sampling_ratio,
|
183 |
+
T* bottom_diff,
|
184 |
+
const T* bottom_rois) {
|
185 |
+
CUDA_1D_KERNEL_LOOP(index, nthreads) {
|
186 |
+
// (n, c, ph, pw) is an element in the pooled output
|
187 |
+
int pw = index % pooled_width;
|
188 |
+
int ph = (index / pooled_width) % pooled_height;
|
189 |
+
int c = (index / pooled_width / pooled_height) % channels;
|
190 |
+
int n = index / pooled_width / pooled_height / channels;
|
191 |
+
|
192 |
+
const T* offset_bottom_rois = bottom_rois + n * 5;
|
193 |
+
int roi_batch_ind = offset_bottom_rois[0];
|
194 |
+
|
195 |
+
// Do not using rounding; this implementation detail is critical
|
196 |
+
T roi_start_w = offset_bottom_rois[1] * spatial_scale;
|
197 |
+
T roi_start_h = offset_bottom_rois[2] * spatial_scale;
|
198 |
+
T roi_end_w = offset_bottom_rois[3] * spatial_scale;
|
199 |
+
T roi_end_h = offset_bottom_rois[4] * spatial_scale;
|
200 |
+
// T roi_start_w = round(offset_bottom_rois[1] * spatial_scale);
|
201 |
+
// T roi_start_h = round(offset_bottom_rois[2] * spatial_scale);
|
202 |
+
// T roi_end_w = round(offset_bottom_rois[3] * spatial_scale);
|
203 |
+
// T roi_end_h = round(offset_bottom_rois[4] * spatial_scale);
|
204 |
+
|
205 |
+
// Force malformed ROIs to be 1x1
|
206 |
+
T roi_width = max(roi_end_w - roi_start_w, (T)1.);
|
207 |
+
T roi_height = max(roi_end_h - roi_start_h, (T)1.);
|
208 |
+
T bin_size_h = static_cast<T>(roi_height) / static_cast<T>(pooled_height);
|
209 |
+
T bin_size_w = static_cast<T>(roi_width) / static_cast<T>(pooled_width);
|
210 |
+
|
211 |
+
T* offset_bottom_diff = bottom_diff + (roi_batch_ind * channels + c) * height * width;
|
212 |
+
|
213 |
+
int top_offset = (n * channels + c) * pooled_height * pooled_width;
|
214 |
+
const T* offset_top_diff = top_diff + top_offset;
|
215 |
+
const T top_diff_this_bin = offset_top_diff[ph * pooled_width + pw];
|
216 |
+
|
217 |
+
// We use roi_bin_grid to sample the grid and mimic integral
|
218 |
+
int roi_bin_grid_h = (sampling_ratio > 0) ? sampling_ratio : ceil(roi_height / pooled_height); // e.g., = 2
|
219 |
+
int roi_bin_grid_w = (sampling_ratio > 0) ? sampling_ratio : ceil(roi_width / pooled_width);
|
220 |
+
|
221 |
+
// We do average (integral) pooling inside a bin
|
222 |
+
const T count = roi_bin_grid_h * roi_bin_grid_w; // e.g. = 4
|
223 |
+
|
224 |
+
for (int iy = 0; iy < roi_bin_grid_h; iy ++) // e.g., iy = 0, 1
|
225 |
+
{
|
226 |
+
const T y = roi_start_h + ph * bin_size_h + static_cast<T>(iy + .5f) * bin_size_h / static_cast<T>(roi_bin_grid_h); // e.g., 0.5, 1.5
|
227 |
+
for (int ix = 0; ix < roi_bin_grid_w; ix ++)
|
228 |
+
{
|
229 |
+
const T x = roi_start_w + pw * bin_size_w + static_cast<T>(ix + .5f) * bin_size_w / static_cast<T>(roi_bin_grid_w);
|
230 |
+
|
231 |
+
T w1, w2, w3, w4;
|
232 |
+
int x_low, x_high, y_low, y_high;
|
233 |
+
|
234 |
+
bilinear_interpolate_gradient(height, width, y, x,
|
235 |
+
w1, w2, w3, w4,
|
236 |
+
x_low, x_high, y_low, y_high,
|
237 |
+
index);
|
238 |
+
|
239 |
+
T g1 = top_diff_this_bin * w1 / count;
|
240 |
+
T g2 = top_diff_this_bin * w2 / count;
|
241 |
+
T g3 = top_diff_this_bin * w3 / count;
|
242 |
+
T g4 = top_diff_this_bin * w4 / count;
|
243 |
+
|
244 |
+
if (x_low >= 0 && x_high >= 0 && y_low >= 0 && y_high >= 0)
|
245 |
+
{
|
246 |
+
atomicAdd(offset_bottom_diff + y_low * width + x_low, static_cast<T>(g1));
|
247 |
+
atomicAdd(offset_bottom_diff + y_low * width + x_high, static_cast<T>(g2));
|
248 |
+
atomicAdd(offset_bottom_diff + y_high * width + x_low, static_cast<T>(g3));
|
249 |
+
atomicAdd(offset_bottom_diff + y_high * width + x_high, static_cast<T>(g4));
|
250 |
+
} // if
|
251 |
+
} // ix
|
252 |
+
} // iy
|
253 |
+
} // CUDA_1D_KERNEL_LOOP
|
254 |
+
} // RoIAlignBackward
|
255 |
+
|
256 |
+
|
257 |
+
at::Tensor ROIAlign_forward_cuda(const at::Tensor& input,
|
258 |
+
const at::Tensor& rois,
|
259 |
+
const float spatial_scale,
|
260 |
+
const int pooled_height,
|
261 |
+
const int pooled_width,
|
262 |
+
const int sampling_ratio) {
|
263 |
+
AT_ASSERTM(input.type().is_cuda(), "input must be a CUDA tensor");
|
264 |
+
AT_ASSERTM(rois.type().is_cuda(), "rois must be a CUDA tensor");
|
265 |
+
|
266 |
+
auto num_rois = rois.size(0);
|
267 |
+
auto channels = input.size(1);
|
268 |
+
auto height = input.size(2);
|
269 |
+
auto width = input.size(3);
|
270 |
+
|
271 |
+
auto output = at::empty({num_rois, channels, pooled_height, pooled_width}, input.options());
|
272 |
+
auto output_size = num_rois * pooled_height * pooled_width * channels;
|
273 |
+
cudaStream_t stream = at::cuda::getCurrentCUDAStream();
|
274 |
+
|
275 |
+
dim3 grid(std::min(THCCeilDiv((long)output_size, 512L), 4096L));
|
276 |
+
dim3 block(512);
|
277 |
+
|
278 |
+
if (output.numel() == 0) {
|
279 |
+
THCudaCheck(cudaGetLastError());
|
280 |
+
return output;
|
281 |
+
}
|
282 |
+
|
283 |
+
AT_DISPATCH_FLOATING_TYPES(input.type(), "ROIAlign_forward", [&] {
|
284 |
+
RoIAlignForward<scalar_t><<<grid, block, 0, stream>>>(
|
285 |
+
output_size,
|
286 |
+
input.contiguous().data<scalar_t>(),
|
287 |
+
spatial_scale,
|
288 |
+
channels,
|
289 |
+
height,
|
290 |
+
width,
|
291 |
+
pooled_height,
|
292 |
+
pooled_width,
|
293 |
+
sampling_ratio,
|
294 |
+
rois.contiguous().data<scalar_t>(),
|
295 |
+
output.data<scalar_t>());
|
296 |
+
});
|
297 |
+
THCudaCheck(cudaGetLastError());
|
298 |
+
return output;
|
299 |
+
}
|
300 |
+
|
301 |
+
// TODO remove the dependency on input and use instead its sizes -> save memory
|
302 |
+
at::Tensor ROIAlign_backward_cuda(const at::Tensor& grad,
|
303 |
+
const at::Tensor& rois,
|
304 |
+
const float spatial_scale,
|
305 |
+
const int pooled_height,
|
306 |
+
const int pooled_width,
|
307 |
+
const int batch_size,
|
308 |
+
const int channels,
|
309 |
+
const int height,
|
310 |
+
const int width,
|
311 |
+
const int sampling_ratio) {
|
312 |
+
AT_ASSERTM(grad.type().is_cuda(), "grad must be a CUDA tensor");
|
313 |
+
AT_ASSERTM(rois.type().is_cuda(), "rois must be a CUDA tensor");
|
314 |
+
|
315 |
+
auto num_rois = rois.size(0);
|
316 |
+
auto grad_input = at::zeros({batch_size, channels, height, width}, grad.options());
|
317 |
+
|
318 |
+
cudaStream_t stream = at::cuda::getCurrentCUDAStream();
|
319 |
+
|
320 |
+
dim3 grid(std::min(THCCeilDiv((long)grad.numel(), 512L), 4096L));
|
321 |
+
dim3 block(512);
|
322 |
+
|
323 |
+
// handle possibly empty gradients
|
324 |
+
if (grad.numel() == 0) {
|
325 |
+
THCudaCheck(cudaGetLastError());
|
326 |
+
return grad_input;
|
327 |
+
}
|
328 |
+
|
329 |
+
AT_DISPATCH_FLOATING_TYPES(grad.type(), "ROIAlign_backward", [&] {
|
330 |
+
RoIAlignBackwardFeature<scalar_t><<<grid, block, 0, stream>>>(
|
331 |
+
grad.numel(),
|
332 |
+
grad.contiguous().data<scalar_t>(),
|
333 |
+
num_rois,
|
334 |
+
spatial_scale,
|
335 |
+
channels,
|
336 |
+
height,
|
337 |
+
width,
|
338 |
+
pooled_height,
|
339 |
+
pooled_width,
|
340 |
+
sampling_ratio,
|
341 |
+
grad_input.data<scalar_t>(),
|
342 |
+
rois.contiguous().data<scalar_t>());
|
343 |
+
});
|
344 |
+
THCudaCheck(cudaGetLastError());
|
345 |
+
return grad_input;
|
346 |
+
}
|
maskrcnn_benchmark/csrc/cuda/ROIPool_cuda.cu
ADDED
@@ -0,0 +1,202 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
|
2 |
+
#include <ATen/ATen.h>
|
3 |
+
#include <ATen/cuda/CUDAContext.h>
|
4 |
+
|
5 |
+
#include <THC/THC.h>
|
6 |
+
#include <THC/THCAtomics.cuh>
|
7 |
+
#include <THC/THCDeviceUtils.cuh>
|
8 |
+
|
9 |
+
|
10 |
+
// TODO make it in a common file
|
11 |
+
#define CUDA_1D_KERNEL_LOOP(i, n) \
|
12 |
+
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \
|
13 |
+
i += blockDim.x * gridDim.x)
|
14 |
+
|
15 |
+
|
16 |
+
template <typename T>
|
17 |
+
__global__ void RoIPoolFForward(const int nthreads, const T* bottom_data,
|
18 |
+
const T spatial_scale, const int channels, const int height,
|
19 |
+
const int width, const int pooled_height, const int pooled_width,
|
20 |
+
const T* bottom_rois, T* top_data, int* argmax_data) {
|
21 |
+
CUDA_1D_KERNEL_LOOP(index, nthreads) {
|
22 |
+
// (n, c, ph, pw) is an element in the pooled output
|
23 |
+
int pw = index % pooled_width;
|
24 |
+
int ph = (index / pooled_width) % pooled_height;
|
25 |
+
int c = (index / pooled_width / pooled_height) % channels;
|
26 |
+
int n = index / pooled_width / pooled_height / channels;
|
27 |
+
|
28 |
+
const T* offset_bottom_rois = bottom_rois + n * 5;
|
29 |
+
int roi_batch_ind = offset_bottom_rois[0];
|
30 |
+
int roi_start_w = round(offset_bottom_rois[1] * spatial_scale);
|
31 |
+
int roi_start_h = round(offset_bottom_rois[2] * spatial_scale);
|
32 |
+
int roi_end_w = round(offset_bottom_rois[3] * spatial_scale);
|
33 |
+
int roi_end_h = round(offset_bottom_rois[4] * spatial_scale);
|
34 |
+
|
35 |
+
// Force malformed ROIs to be 1x1
|
36 |
+
int roi_width = max(roi_end_w - roi_start_w + 1, 1);
|
37 |
+
int roi_height = max(roi_end_h - roi_start_h + 1, 1);
|
38 |
+
T bin_size_h = static_cast<T>(roi_height)
|
39 |
+
/ static_cast<T>(pooled_height);
|
40 |
+
T bin_size_w = static_cast<T>(roi_width)
|
41 |
+
/ static_cast<T>(pooled_width);
|
42 |
+
|
43 |
+
int hstart = static_cast<int>(floor(static_cast<T>(ph)
|
44 |
+
* bin_size_h));
|
45 |
+
int wstart = static_cast<int>(floor(static_cast<T>(pw)
|
46 |
+
* bin_size_w));
|
47 |
+
int hend = static_cast<int>(ceil(static_cast<T>(ph + 1)
|
48 |
+
* bin_size_h));
|
49 |
+
int wend = static_cast<int>(ceil(static_cast<T>(pw + 1)
|
50 |
+
* bin_size_w));
|
51 |
+
|
52 |
+
// Add roi offsets and clip to input boundaries
|
53 |
+
hstart = min(max(hstart + roi_start_h, 0), height);
|
54 |
+
hend = min(max(hend + roi_start_h, 0), height);
|
55 |
+
wstart = min(max(wstart + roi_start_w, 0), width);
|
56 |
+
wend = min(max(wend + roi_start_w, 0), width);
|
57 |
+
bool is_empty = (hend <= hstart) || (wend <= wstart);
|
58 |
+
|
59 |
+
// Define an empty pooling region to be zero
|
60 |
+
T maxval = is_empty ? 0 : -FLT_MAX;
|
61 |
+
// If nothing is pooled, argmax = -1 causes nothing to be backprop'd
|
62 |
+
int maxidx = -1;
|
63 |
+
const T* offset_bottom_data =
|
64 |
+
bottom_data + (roi_batch_ind * channels + c) * height * width;
|
65 |
+
for (int h = hstart; h < hend; ++h) {
|
66 |
+
for (int w = wstart; w < wend; ++w) {
|
67 |
+
int bottom_index = h * width + w;
|
68 |
+
if (offset_bottom_data[bottom_index] > maxval) {
|
69 |
+
maxval = offset_bottom_data[bottom_index];
|
70 |
+
maxidx = bottom_index;
|
71 |
+
}
|
72 |
+
}
|
73 |
+
}
|
74 |
+
top_data[index] = maxval;
|
75 |
+
argmax_data[index] = maxidx;
|
76 |
+
}
|
77 |
+
}
|
78 |
+
|
79 |
+
template <typename T>
|
80 |
+
__global__ void RoIPoolFBackward(const int nthreads, const T* top_diff,
|
81 |
+
const int* argmax_data, const int num_rois, const T spatial_scale,
|
82 |
+
const int channels, const int height, const int width,
|
83 |
+
const int pooled_height, const int pooled_width, T* bottom_diff,
|
84 |
+
const T* bottom_rois) {
|
85 |
+
CUDA_1D_KERNEL_LOOP(index, nthreads) {
|
86 |
+
// (n, c, ph, pw) is an element in the pooled output
|
87 |
+
int pw = index % pooled_width;
|
88 |
+
int ph = (index / pooled_width) % pooled_height;
|
89 |
+
int c = (index / pooled_width / pooled_height) % channels;
|
90 |
+
int n = index / pooled_width / pooled_height / channels;
|
91 |
+
|
92 |
+
const T* offset_bottom_rois = bottom_rois + n * 5;
|
93 |
+
int roi_batch_ind = offset_bottom_rois[0];
|
94 |
+
int bottom_offset = (roi_batch_ind * channels + c) * height * width;
|
95 |
+
int top_offset = (n * channels + c) * pooled_height * pooled_width;
|
96 |
+
const T* offset_top_diff = top_diff + top_offset;
|
97 |
+
T* offset_bottom_diff = bottom_diff + bottom_offset;
|
98 |
+
const int* offset_argmax_data = argmax_data + top_offset;
|
99 |
+
|
100 |
+
int argmax = offset_argmax_data[ph * pooled_width + pw];
|
101 |
+
if (argmax != -1) {
|
102 |
+
atomicAdd(
|
103 |
+
offset_bottom_diff + argmax,
|
104 |
+
static_cast<T>(offset_top_diff[ph * pooled_width + pw]));
|
105 |
+
|
106 |
+
}
|
107 |
+
}
|
108 |
+
}
|
109 |
+
|
110 |
+
std::tuple<at::Tensor, at::Tensor> ROIPool_forward_cuda(const at::Tensor& input,
|
111 |
+
const at::Tensor& rois,
|
112 |
+
const float spatial_scale,
|
113 |
+
const int pooled_height,
|
114 |
+
const int pooled_width) {
|
115 |
+
AT_ASSERTM(input.type().is_cuda(), "input must be a CUDA tensor");
|
116 |
+
AT_ASSERTM(rois.type().is_cuda(), "rois must be a CUDA tensor");
|
117 |
+
|
118 |
+
auto num_rois = rois.size(0);
|
119 |
+
auto channels = input.size(1);
|
120 |
+
auto height = input.size(2);
|
121 |
+
auto width = input.size(3);
|
122 |
+
|
123 |
+
auto output = at::empty({num_rois, channels, pooled_height, pooled_width}, input.options());
|
124 |
+
auto output_size = num_rois * pooled_height * pooled_width * channels;
|
125 |
+
auto argmax = at::zeros({num_rois, channels, pooled_height, pooled_width}, input.options().dtype(at::kInt));
|
126 |
+
|
127 |
+
cudaStream_t stream = at::cuda::getCurrentCUDAStream();
|
128 |
+
|
129 |
+
dim3 grid(std::min(THCCeilDiv((long)output_size, 512L), 4096L));
|
130 |
+
dim3 block(512);
|
131 |
+
|
132 |
+
if (output.numel() == 0) {
|
133 |
+
THCudaCheck(cudaGetLastError());
|
134 |
+
return std::make_tuple(output, argmax);
|
135 |
+
}
|
136 |
+
|
137 |
+
AT_DISPATCH_FLOATING_TYPES(input.type(), "ROIPool_forward", [&] {
|
138 |
+
RoIPoolFForward<scalar_t><<<grid, block, 0, stream>>>(
|
139 |
+
output_size,
|
140 |
+
input.contiguous().data<scalar_t>(),
|
141 |
+
spatial_scale,
|
142 |
+
channels,
|
143 |
+
height,
|
144 |
+
width,
|
145 |
+
pooled_height,
|
146 |
+
pooled_width,
|
147 |
+
rois.contiguous().data<scalar_t>(),
|
148 |
+
output.data<scalar_t>(),
|
149 |
+
argmax.data<int>());
|
150 |
+
});
|
151 |
+
THCudaCheck(cudaGetLastError());
|
152 |
+
return std::make_tuple(output, argmax);
|
153 |
+
}
|
154 |
+
|
155 |
+
// TODO remove the dependency on input and use instead its sizes -> save memory
|
156 |
+
at::Tensor ROIPool_backward_cuda(const at::Tensor& grad,
|
157 |
+
const at::Tensor& input,
|
158 |
+
const at::Tensor& rois,
|
159 |
+
const at::Tensor& argmax,
|
160 |
+
const float spatial_scale,
|
161 |
+
const int pooled_height,
|
162 |
+
const int pooled_width,
|
163 |
+
const int batch_size,
|
164 |
+
const int channels,
|
165 |
+
const int height,
|
166 |
+
const int width) {
|
167 |
+
AT_ASSERTM(grad.type().is_cuda(), "grad must be a CUDA tensor");
|
168 |
+
AT_ASSERTM(rois.type().is_cuda(), "rois must be a CUDA tensor");
|
169 |
+
// TODO add more checks
|
170 |
+
|
171 |
+
auto num_rois = rois.size(0);
|
172 |
+
auto grad_input = at::zeros({batch_size, channels, height, width}, grad.options());
|
173 |
+
|
174 |
+
cudaStream_t stream = at::cuda::getCurrentCUDAStream();
|
175 |
+
|
176 |
+
dim3 grid(std::min(THCCeilDiv((long)grad.numel(), 512L), 4096L));
|
177 |
+
dim3 block(512);
|
178 |
+
|
179 |
+
// handle possibly empty gradients
|
180 |
+
if (grad.numel() == 0) {
|
181 |
+
THCudaCheck(cudaGetLastError());
|
182 |
+
return grad_input;
|
183 |
+
}
|
184 |
+
|
185 |
+
AT_DISPATCH_FLOATING_TYPES(grad.type(), "ROIPool_backward", [&] {
|
186 |
+
RoIPoolFBackward<scalar_t><<<grid, block, 0, stream>>>(
|
187 |
+
grad.numel(),
|
188 |
+
grad.contiguous().data<scalar_t>(),
|
189 |
+
argmax.data<int>(),
|
190 |
+
num_rois,
|
191 |
+
spatial_scale,
|
192 |
+
channels,
|
193 |
+
height,
|
194 |
+
width,
|
195 |
+
pooled_height,
|
196 |
+
pooled_width,
|
197 |
+
grad_input.data<scalar_t>(),
|
198 |
+
rois.contiguous().data<scalar_t>());
|
199 |
+
});
|
200 |
+
THCudaCheck(cudaGetLastError());
|
201 |
+
return grad_input;
|
202 |
+
}
|
maskrcnn_benchmark/csrc/cuda/SigmoidFocalLoss_cuda.cu
ADDED
@@ -0,0 +1,189 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
|
2 |
+
// This file is modified from https://github.com/pytorch/pytorch/blob/master/modules/detectron/sigmoid_focal_loss_op.cu
|
3 |
+
// Cheng-Yang Fu
|
4 |
+
// [email protected]
|
5 |
+
#include <ATen/ATen.h>
|
6 |
+
#include <ATen/cuda/CUDAContext.h>
|
7 |
+
|
8 |
+
#include <THC/THC.h>
|
9 |
+
#include <THC/THCAtomics.cuh>
|
10 |
+
#include <THC/THCDeviceUtils.cuh>
|
11 |
+
|
12 |
+
#include <cfloat>
|
13 |
+
|
14 |
+
// TODO make it in a common file
|
15 |
+
#define CUDA_1D_KERNEL_LOOP(i, n) \
|
16 |
+
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \
|
17 |
+
i += blockDim.x * gridDim.x)
|
18 |
+
|
19 |
+
|
20 |
+
template <typename T>
|
21 |
+
__global__ void SigmoidFocalLossForward(const int nthreads,
|
22 |
+
const T* logits,
|
23 |
+
const int* targets,
|
24 |
+
const int num_classes,
|
25 |
+
const float gamma,
|
26 |
+
const float alpha,
|
27 |
+
const int num,
|
28 |
+
T* losses) {
|
29 |
+
CUDA_1D_KERNEL_LOOP(i, nthreads) {
|
30 |
+
|
31 |
+
int n = i / num_classes;
|
32 |
+
int d = i % num_classes; // current class[0~79];
|
33 |
+
int t = targets[n]; // target class [1~80];
|
34 |
+
|
35 |
+
// Decide it is positive or negative case.
|
36 |
+
T c1 = (t == (d+1));
|
37 |
+
T c2 = (t>=0 & t != (d+1));
|
38 |
+
|
39 |
+
T zn = (1.0 - alpha);
|
40 |
+
T zp = (alpha);
|
41 |
+
|
42 |
+
// p = 1. / 1. + expf(-x); p = sigmoid(x)
|
43 |
+
T p = 1. / (1. + expf(-logits[i]));
|
44 |
+
|
45 |
+
// (1-p)**gamma * log(p) where
|
46 |
+
T term1 = powf((1. - p), gamma) * logf(max(p, FLT_MIN));
|
47 |
+
|
48 |
+
// p**gamma * log(1-p)
|
49 |
+
T term2 = powf(p, gamma) *
|
50 |
+
(-1. * logits[i] * (logits[i] >= 0) -
|
51 |
+
logf(1. + expf(logits[i] - 2. * logits[i] * (logits[i] >= 0))));
|
52 |
+
|
53 |
+
losses[i] = 0.0;
|
54 |
+
losses[i] += -c1 * term1 * zp;
|
55 |
+
losses[i] += -c2 * term2 * zn;
|
56 |
+
|
57 |
+
} // CUDA_1D_KERNEL_LOOP
|
58 |
+
} // SigmoidFocalLossForward
|
59 |
+
|
60 |
+
|
61 |
+
template <typename T>
|
62 |
+
__global__ void SigmoidFocalLossBackward(const int nthreads,
|
63 |
+
const T* logits,
|
64 |
+
const int* targets,
|
65 |
+
const T* d_losses,
|
66 |
+
const int num_classes,
|
67 |
+
const float gamma,
|
68 |
+
const float alpha,
|
69 |
+
const int num,
|
70 |
+
T* d_logits) {
|
71 |
+
CUDA_1D_KERNEL_LOOP(i, nthreads) {
|
72 |
+
|
73 |
+
int n = i / num_classes;
|
74 |
+
int d = i % num_classes; // current class[0~79];
|
75 |
+
int t = targets[n]; // target class [1~80], 0 is background;
|
76 |
+
|
77 |
+
// Decide it is positive or negative case.
|
78 |
+
T c1 = (t == (d+1));
|
79 |
+
T c2 = (t>=0 & t != (d+1));
|
80 |
+
|
81 |
+
T zn = (1.0 - alpha);
|
82 |
+
T zp = (alpha);
|
83 |
+
// p = 1. / 1. + expf(-x); p = sigmoid(x)
|
84 |
+
T p = 1. / (1. + expf(-logits[i]));
|
85 |
+
|
86 |
+
// (1-p)**g * (1 - p - g*p*log(p)
|
87 |
+
T term1 = powf((1. - p), gamma) *
|
88 |
+
(1. - p - (p * gamma * logf(max(p, FLT_MIN))));
|
89 |
+
|
90 |
+
// (p**g) * (g*(1-p)*log(1-p) - p)
|
91 |
+
T term2 = powf(p, gamma) *
|
92 |
+
((-1. * logits[i] * (logits[i] >= 0) -
|
93 |
+
logf(1. + expf(logits[i] - 2. * logits[i] * (logits[i] >= 0)))) *
|
94 |
+
(1. - p) * gamma - p);
|
95 |
+
d_logits[i] = 0.0;
|
96 |
+
d_logits[i] += -c1 * term1 * zp;
|
97 |
+
d_logits[i] += -c2 * term2 * zn;
|
98 |
+
d_logits[i] = d_logits[i] * d_losses[i];
|
99 |
+
|
100 |
+
} // CUDA_1D_KERNEL_LOOP
|
101 |
+
} // SigmoidFocalLossBackward
|
102 |
+
|
103 |
+
|
104 |
+
at::Tensor SigmoidFocalLoss_forward_cuda(
|
105 |
+
const at::Tensor& logits,
|
106 |
+
const at::Tensor& targets,
|
107 |
+
const int num_classes,
|
108 |
+
const float gamma,
|
109 |
+
const float alpha) {
|
110 |
+
AT_ASSERTM(logits.type().is_cuda(), "logits must be a CUDA tensor");
|
111 |
+
AT_ASSERTM(targets.type().is_cuda(), "targets must be a CUDA tensor");
|
112 |
+
AT_ASSERTM(logits.dim() == 2, "logits should be NxClass");
|
113 |
+
|
114 |
+
const int num_samples = logits.size(0);
|
115 |
+
|
116 |
+
auto losses = at::empty({num_samples, logits.size(1)}, logits.options());
|
117 |
+
auto losses_size = num_samples * logits.size(1);
|
118 |
+
cudaStream_t stream = at::cuda::getCurrentCUDAStream();
|
119 |
+
|
120 |
+
dim3 grid(std::min(THCCeilDiv((long)losses_size, 512L), 4096L));
|
121 |
+
|
122 |
+
dim3 block(512);
|
123 |
+
|
124 |
+
if (losses.numel() == 0) {
|
125 |
+
THCudaCheck(cudaGetLastError());
|
126 |
+
return losses;
|
127 |
+
}
|
128 |
+
|
129 |
+
AT_DISPATCH_FLOATING_TYPES(logits.type(), "SigmoidFocalLoss_forward", [&] {
|
130 |
+
SigmoidFocalLossForward<scalar_t><<<grid, block, 0, stream>>>(
|
131 |
+
losses_size,
|
132 |
+
logits.contiguous().data<scalar_t>(),
|
133 |
+
targets.contiguous().data<int>(),
|
134 |
+
num_classes,
|
135 |
+
gamma,
|
136 |
+
alpha,
|
137 |
+
num_samples,
|
138 |
+
losses.data<scalar_t>());
|
139 |
+
});
|
140 |
+
THCudaCheck(cudaGetLastError());
|
141 |
+
return losses;
|
142 |
+
}
|
143 |
+
|
144 |
+
|
145 |
+
at::Tensor SigmoidFocalLoss_backward_cuda(
|
146 |
+
const at::Tensor& logits,
|
147 |
+
const at::Tensor& targets,
|
148 |
+
const at::Tensor& d_losses,
|
149 |
+
const int num_classes,
|
150 |
+
const float gamma,
|
151 |
+
const float alpha) {
|
152 |
+
AT_ASSERTM(logits.type().is_cuda(), "logits must be a CUDA tensor");
|
153 |
+
AT_ASSERTM(targets.type().is_cuda(), "targets must be a CUDA tensor");
|
154 |
+
AT_ASSERTM(d_losses.type().is_cuda(), "d_losses must be a CUDA tensor");
|
155 |
+
|
156 |
+
AT_ASSERTM(logits.dim() == 2, "logits should be NxClass");
|
157 |
+
|
158 |
+
const int num_samples = logits.size(0);
|
159 |
+
AT_ASSERTM(logits.size(1) == num_classes, "logits.size(1) should be num_classes");
|
160 |
+
|
161 |
+
auto d_logits = at::zeros({num_samples, num_classes}, logits.options());
|
162 |
+
auto d_logits_size = num_samples * logits.size(1);
|
163 |
+
cudaStream_t stream = at::cuda::getCurrentCUDAStream();
|
164 |
+
|
165 |
+
dim3 grid(std::min(THCCeilDiv((long)d_logits_size, 512L), 4096L));
|
166 |
+
dim3 block(512);
|
167 |
+
|
168 |
+
if (d_logits.numel() == 0) {
|
169 |
+
THCudaCheck(cudaGetLastError());
|
170 |
+
return d_logits;
|
171 |
+
}
|
172 |
+
|
173 |
+
AT_DISPATCH_FLOATING_TYPES(logits.type(), "SigmoidFocalLoss_backward", [&] {
|
174 |
+
SigmoidFocalLossBackward<scalar_t><<<grid, block, 0, stream>>>(
|
175 |
+
d_logits_size,
|
176 |
+
logits.contiguous().data<scalar_t>(),
|
177 |
+
targets.contiguous().data<int>(),
|
178 |
+
d_losses.contiguous().data<scalar_t>(),
|
179 |
+
num_classes,
|
180 |
+
gamma,
|
181 |
+
alpha,
|
182 |
+
num_samples,
|
183 |
+
d_logits.data<scalar_t>());
|
184 |
+
});
|
185 |
+
|
186 |
+
THCudaCheck(cudaGetLastError());
|
187 |
+
return d_logits;
|
188 |
+
}
|
189 |
+
|
maskrcnn_benchmark/csrc/cuda/deform_conv_cuda.cu
ADDED
@@ -0,0 +1,691 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
// modify from
|
2 |
+
// https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/blob/mmdetection/mmdet/ops/dcn/src/deform_conv_cuda.c
|
3 |
+
|
4 |
+
#include <ATen/ATen.h>
|
5 |
+
#include <ATen/cuda/CUDAContext.h>
|
6 |
+
|
7 |
+
#include <THC/THC.h>
|
8 |
+
#include <THC/THCDeviceUtils.cuh>
|
9 |
+
|
10 |
+
#include <vector>
|
11 |
+
#include <iostream>
|
12 |
+
#include <cmath>
|
13 |
+
|
14 |
+
|
15 |
+
void deformable_im2col(const at::Tensor data_im, const at::Tensor data_offset,
|
16 |
+
const int channels, const int height, const int width,
|
17 |
+
const int ksize_h, const int ksize_w, const int pad_h,
|
18 |
+
const int pad_w, const int stride_h, const int stride_w,
|
19 |
+
const int dilation_h, const int dilation_w,
|
20 |
+
const int parallel_imgs, const int deformable_group,
|
21 |
+
at::Tensor data_col);
|
22 |
+
|
23 |
+
void deformable_col2im(const at::Tensor data_col, const at::Tensor data_offset,
|
24 |
+
const int channels, const int height, const int width,
|
25 |
+
const int ksize_h, const int ksize_w, const int pad_h,
|
26 |
+
const int pad_w, const int stride_h, const int stride_w,
|
27 |
+
const int dilation_h, const int dilation_w,
|
28 |
+
const int parallel_imgs, const int deformable_group,
|
29 |
+
at::Tensor grad_im);
|
30 |
+
|
31 |
+
void deformable_col2im_coord(
|
32 |
+
const at::Tensor data_col, const at::Tensor data_im,
|
33 |
+
const at::Tensor data_offset, const int channels, const int height,
|
34 |
+
const int width, const int ksize_h, const int ksize_w, const int pad_h,
|
35 |
+
const int pad_w, const int stride_h, const int stride_w,
|
36 |
+
const int dilation_h, const int dilation_w, const int parallel_imgs,
|
37 |
+
const int deformable_group, at::Tensor grad_offset);
|
38 |
+
|
39 |
+
void modulated_deformable_im2col_cuda(
|
40 |
+
const at::Tensor data_im, const at::Tensor data_offset,
|
41 |
+
const at::Tensor data_mask, const int batch_size, const int channels,
|
42 |
+
const int height_im, const int width_im, const int height_col,
|
43 |
+
const int width_col, const int kernel_h, const int kenerl_w,
|
44 |
+
const int pad_h, const int pad_w, const int stride_h, const int stride_w,
|
45 |
+
const int dilation_h, const int dilation_w, const int deformable_group,
|
46 |
+
at::Tensor data_col);
|
47 |
+
|
48 |
+
void modulated_deformable_col2im_cuda(
|
49 |
+
const at::Tensor data_col, const at::Tensor data_offset,
|
50 |
+
const at::Tensor data_mask, const int batch_size, const int channels,
|
51 |
+
const int height_im, const int width_im, const int height_col,
|
52 |
+
const int width_col, const int kernel_h, const int kenerl_w,
|
53 |
+
const int pad_h, const int pad_w, const int stride_h, const int stride_w,
|
54 |
+
const int dilation_h, const int dilation_w, const int deformable_group,
|
55 |
+
at::Tensor grad_im);
|
56 |
+
|
57 |
+
void modulated_deformable_col2im_coord_cuda(
|
58 |
+
const at::Tensor data_col, const at::Tensor data_im,
|
59 |
+
const at::Tensor data_offset, const at::Tensor data_mask,
|
60 |
+
const int batch_size, const int channels, const int height_im,
|
61 |
+
const int width_im, const int height_col, const int width_col,
|
62 |
+
const int kernel_h, const int kenerl_w, const int pad_h, const int pad_w,
|
63 |
+
const int stride_h, const int stride_w, const int dilation_h,
|
64 |
+
const int dilation_w, const int deformable_group, at::Tensor grad_offset,
|
65 |
+
at::Tensor grad_mask);
|
66 |
+
|
67 |
+
void shape_check(at::Tensor input, at::Tensor offset, at::Tensor *gradOutput,
                 at::Tensor weight, int kH, int kW, int dH, int dW, int padH,
                 int padW, int dilationH, int dilationW, int group,
                 int deformable_group)
{
  // Validates every tensor shape involved in a deformable convolution before
  // any kernel launch.  `gradOutput` may be NULL (forward pass); when non-NULL
  // its spatial/channel dimensions are checked against the computed output.
  AT_CHECK(weight.ndimension() == 4,
           "4D weight tensor (nOutputPlane,nInputPlane,kH,kW) expected, "
           "but got: %s",
           weight.ndimension());

  AT_CHECK(weight.is_contiguous(), "weight tensor has to be contiguous");

  AT_CHECK(kW > 0 && kH > 0,
           "kernel size should be greater than zero, but got kH: %d kW: %d", kH,
           kW);

  AT_CHECK((weight.size(2) == kH && weight.size(3) == kW),
           "kernel size should be consistent with weight, ",
           "but got kH: %d kW: %d weight.size(2): %d, weight.size(3): %d", kH,
           kW, weight.size(2), weight.size(3));

  AT_CHECK(dW > 0 && dH > 0,
           "stride should be greater than zero, but got dH: %d dW: %d", dH, dW);

  AT_CHECK(
      dilationW > 0 && dilationH > 0,
      "dilation should be greater than 0, but got dilationH: %d dilationW: %d",
      dilationH, dilationW);

  // Dimension indices for an unbatched (3D) tensor; shifted by one when the
  // input carries a leading batch dimension.
  int ndim = input.ndimension();
  int dimf = 0;
  int dimh = 1;
  int dimw = 2;

  if (ndim == 4) {
    dimf++;
    dimh++;
    dimw++;
  }

  AT_CHECK(ndim == 3 || ndim == 4, "3D or 4D input tensor expected but got: %s",
           ndim);

  long nInputPlane = weight.size(1) * group;
  long inputHeight = input.size(dimh);
  long inputWidth = input.size(dimw);
  long nOutputPlane = weight.size(0);
  long outputHeight =
      (inputHeight + 2 * padH - (dilationH * (kH - 1) + 1)) / dH + 1;
  long outputWidth =
      (inputWidth + 2 * padW - (dilationW * (kW - 1) + 1)) / dW + 1;

  AT_CHECK(nInputPlane % deformable_group == 0,
           "input channels must divide deformable group size");

  if (outputWidth < 1 || outputHeight < 1)
    AT_ERROR(
        "Given input size: (%ld x %ld x %ld). "
        "Calculated output size: (%ld x %ld x %ld). Output size is too small",
        nInputPlane, inputHeight, inputWidth, nOutputPlane, outputHeight,
        outputWidth);

  // NOTE(review): this reads the channel count from dim 1 even for a 3D
  // input, where the channel dimension is `dimf` (== 0).  It appears the
  // callers always reach this point with batched (4D) semantics in mind --
  // confirm, otherwise input.size(dimf) would be the correct index here.
  AT_CHECK(input.size(1) == nInputPlane,
           "invalid number of input planes, expected: %d, but got: %d",
           nInputPlane, input.size(1));

  AT_CHECK((inputHeight >= kH && inputWidth >= kW),
           "input image is smaller than kernel");

  // The offset tensor must match the output plane spatially and carry
  // 2 (x,y) * kH * kW channels per deformable group.
  AT_CHECK((offset.size(2) == outputHeight && offset.size(3) == outputWidth),
           "invalid spatial size of offset, expected height: %d width: %d, but "
           "got height: %d width: %d",
           outputHeight, outputWidth, offset.size(2), offset.size(3));

  AT_CHECK((offset.size(1) == deformable_group * 2 * kH * kW),
           "invalid number of channels of offset");

  if (gradOutput != NULL) {
    AT_CHECK(gradOutput->size(dimf) == nOutputPlane,
             "invalid number of gradOutput planes, expected: %d, but got: %d",
             nOutputPlane, gradOutput->size(dimf));

    AT_CHECK((gradOutput->size(dimh) == outputHeight &&
              gradOutput->size(dimw) == outputWidth),
             "invalid size of gradOutput, expected height: %d width: %d , but "
             "got height: %d width: %d",
             outputHeight, outputWidth, gradOutput->size(dimh),
             gradOutput->size(dimw));
  }
}
|
157 |
+
|
158 |
+
int deform_conv_forward_cuda(at::Tensor input, at::Tensor weight,
                             at::Tensor offset, at::Tensor output,
                             at::Tensor columns, at::Tensor ones, int kW,
                             int kH, int dW, int dH, int padW, int padH,
                             int dilationW, int dilationH, int group,
                             int deformable_group, int im2col_step)
{
  // Deformable convolution forward pass.  The batch is processed in slices of
  // `im2col_step` images: each slice is lowered to a column matrix with
  // deformable_im2col and multiplied against the (grouped) weight matrix.
  // Returns 1 on success (TH-style convention kept for callers).

  shape_check(input, offset, NULL, weight, kH, kW, dH, dW, padH, padW,
              dilationH, dilationW, group, deformable_group);

  input = input.contiguous();
  offset = offset.contiguous();
  weight = weight.contiguous();

  int batch = 1;
  if (input.ndimension() == 3) {
    // Force batch: promote unbatched input/offset to batch size 1.
    batch = 0;
    input.unsqueeze_(0);
    offset.unsqueeze_(0);
  }

  // todo: assert batchsize dividable by im2col_step

  long batchSize = input.size(0);
  long nInputPlane = input.size(1);
  long inputHeight = input.size(2);
  long inputWidth = input.size(3);

  long nOutputPlane = weight.size(0);

  long outputWidth =
      (inputWidth + 2 * padW - (dilationW * (kW - 1) + 1)) / dW + 1;
  long outputHeight =
      (inputHeight + 2 * padH - (dilationH * (kH - 1) + 1)) / dH + 1;

  AT_CHECK((offset.size(0) == batchSize), "invalid batch size of offset");

  output = output.view({batchSize / im2col_step, im2col_step, nOutputPlane,
                        outputHeight, outputWidth});
  columns = at::zeros(
      {nInputPlane * kW * kH, im2col_step * outputHeight * outputWidth},
      input.options());

  if (ones.ndimension() != 2 ||
      ones.size(0) * ones.size(1) < outputHeight * outputWidth) {
    ones = at::ones({outputHeight, outputWidth}, input.options());
  }

  input = input.view({batchSize / im2col_step, im2col_step, nInputPlane,
                      inputHeight, inputWidth});
  offset =
      offset.view({batchSize / im2col_step, im2col_step,
                   deformable_group * 2 * kH * kW, outputHeight, outputWidth});

  at::Tensor output_buffer =
      at::zeros({batchSize / im2col_step, nOutputPlane,
                 im2col_step * outputHeight, outputWidth},
                output.options());

  output_buffer = output_buffer.view(
      {output_buffer.size(0), group, output_buffer.size(1) / group,
       output_buffer.size(2), output_buffer.size(3)});

  for (int elt = 0; elt < batchSize / im2col_step; elt++) {
    deformable_im2col(input[elt], offset[elt], nInputPlane, inputHeight,
                      inputWidth, kH, kW, padH, padW, dH, dW, dilationH,
                      dilationW, im2col_step, deformable_group, columns);

    // Split columns and weight per convolution group for the matmul below.
    columns = columns.view({group, columns.size(0) / group, columns.size(1)});
    weight = weight.view({group, weight.size(0) / group, weight.size(1),
                          weight.size(2), weight.size(3)});

    for (int g = 0; g < group; g++) {
      output_buffer[elt][g] = output_buffer[elt][g]
                                  .flatten(1)
                                  .addmm_(weight[g].flatten(1), columns[g])
                                  .view_as(output_buffer[elt][g]);
    }

    // BUGFIX: restore the flat shapes before the next batch slice.  Without
    // this, the group-split view above is re-applied to an already-split
    // tensor on the second iteration (batchSize > im2col_step) and fails.
    columns =
        columns.view({columns.size(0) * columns.size(1), columns.size(2)});
    weight = weight.view({weight.size(0) * weight.size(1), weight.size(2),
                          weight.size(3), weight.size(4)});
  }

  output_buffer = output_buffer.view(
      {output_buffer.size(0), output_buffer.size(1) * output_buffer.size(2),
       output_buffer.size(3), output_buffer.size(4)});

  // Undo the (slice, channel) interleaving so the copy lands in NCHW order.
  output_buffer = output_buffer.view({batchSize / im2col_step, nOutputPlane,
                                      im2col_step, outputHeight, outputWidth});
  output_buffer.transpose_(1, 2);
  output.copy_(output_buffer);
  output = output.view({batchSize, nOutputPlane, outputHeight, outputWidth});

  input = input.view({batchSize, nInputPlane, inputHeight, inputWidth});
  offset = offset.view(
      {batchSize, deformable_group * 2 * kH * kW, outputHeight, outputWidth});

  if (batch == 0) {
    // Input arrived unbatched; hand the result back in the same shape.
    output = output.view({nOutputPlane, outputHeight, outputWidth});
    input = input.view({nInputPlane, inputHeight, inputWidth});
    offset = offset.view({offset.size(1), offset.size(2), offset.size(3)});
  }

  return 1;
}
|
267 |
+
|
268 |
+
int deform_conv_backward_input_cuda(at::Tensor input, at::Tensor offset,
                                    at::Tensor gradOutput, at::Tensor gradInput,
                                    at::Tensor gradOffset, at::Tensor weight,
                                    at::Tensor columns, int kW, int kH, int dW,
                                    int dH, int padW, int padH, int dilationW,
                                    int dilationH, int group,
                                    int deformable_group, int im2col_step)
{
  // Backward pass w.r.t. the input and the offset field: lowers gradOutput
  // back to column space per batch slice, then scatters into gradOffset
  // (deformable_col2im_coord) and gradInput (deformable_col2im).
  // Returns 1 on success (TH-style convention kept for callers).
  shape_check(input, offset, &gradOutput, weight, kH, kW, dH, dW, padH, padW,
              dilationH, dilationW, group, deformable_group);

  input = input.contiguous();
  offset = offset.contiguous();
  gradOutput = gradOutput.contiguous();
  weight = weight.contiguous();

  int batch = 1;

  if (input.ndimension() == 3) {
    // Force batch: promote unbatched tensors to batch size 1.
    batch = 0;
    input = input.view({1, input.size(0), input.size(1), input.size(2)});
    offset = offset.view({1, offset.size(0), offset.size(1), offset.size(2)});
    gradOutput = gradOutput.view(
        {1, gradOutput.size(0), gradOutput.size(1), gradOutput.size(2)});
  }

  long batchSize = input.size(0);
  long nInputPlane = input.size(1);
  long inputHeight = input.size(2);
  long inputWidth = input.size(3);

  long nOutputPlane = weight.size(0);

  long outputWidth =
      (inputWidth + 2 * padW - (dilationW * (kW - 1) + 1)) / dW + 1;
  long outputHeight =
      (inputHeight + 2 * padH - (dilationH * (kH - 1) + 1)) / dH + 1;

  // BUGFIX: dropped the stray `3` that used to sit between the condition and
  // the message -- a leftover THArgCheck argument-index that AT_CHECK would
  // just concatenate into the error text, unlike the sibling functions.
  AT_CHECK((offset.size(0) == batchSize), "invalid batch size of offset");
  gradInput = gradInput.view({batchSize, nInputPlane, inputHeight, inputWidth});
  columns = at::zeros(
      {nInputPlane * kW * kH, im2col_step * outputHeight * outputWidth},
      input.options());

  // change order of grad output
  gradOutput = gradOutput.view({batchSize / im2col_step, im2col_step,
                                nOutputPlane, outputHeight, outputWidth});
  gradOutput.transpose_(1, 2);

  gradInput = gradInput.view({batchSize / im2col_step, im2col_step, nInputPlane,
                              inputHeight, inputWidth});
  input = input.view({batchSize / im2col_step, im2col_step, nInputPlane,
                      inputHeight, inputWidth});
  gradOffset = gradOffset.view({batchSize / im2col_step, im2col_step,
                                deformable_group * 2 * kH * kW, outputHeight,
                                outputWidth});
  offset =
      offset.view({batchSize / im2col_step, im2col_step,
                   deformable_group * 2 * kH * kW, outputHeight, outputWidth});

  for (int elt = 0; elt < batchSize / im2col_step; elt++) {
    // divide into groups
    columns = columns.view({group, columns.size(0) / group, columns.size(1)});
    weight = weight.view({group, weight.size(0) / group, weight.size(1),
                          weight.size(2), weight.size(3)});
    gradOutput = gradOutput.view(
        {gradOutput.size(0), group, gradOutput.size(1) / group,
         gradOutput.size(2), gradOutput.size(3), gradOutput.size(4)});

    for (int g = 0; g < group; g++) {
      // columns[g] = W[g]^T * dY[g]  (beta 0 overwrites, alpha 1)
      columns[g] = columns[g].addmm_(weight[g].flatten(1).transpose(0, 1),
                                     gradOutput[elt][g].flatten(1), 0.0f, 1.0f);
    }

    columns =
        columns.view({columns.size(0) * columns.size(1), columns.size(2)});
    gradOutput = gradOutput.view(
        {gradOutput.size(0), gradOutput.size(1) * gradOutput.size(2),
         gradOutput.size(3), gradOutput.size(4), gradOutput.size(5)});
    // BUGFIX: also restore weight to its flat 4D shape; the original left it
    // group-split, so the 5D view above failed on the second batch slice.
    weight = weight.view({weight.size(0) * weight.size(1), weight.size(2),
                          weight.size(3), weight.size(4)});

    deformable_col2im_coord(columns, input[elt], offset[elt], nInputPlane,
                            inputHeight, inputWidth, kH, kW, padH, padW, dH, dW,
                            dilationH, dilationW, im2col_step, deformable_group,
                            gradOffset[elt]);

    deformable_col2im(columns, offset[elt], nInputPlane, inputHeight,
                      inputWidth, kH, kW, padH, padW, dH, dW, dilationH,
                      dilationW, im2col_step, deformable_group, gradInput[elt]);
  }

  gradOutput.transpose_(1, 2);
  gradOutput =
      gradOutput.view({batchSize, nOutputPlane, outputHeight, outputWidth});

  gradInput = gradInput.view({batchSize, nInputPlane, inputHeight, inputWidth});
  input = input.view({batchSize, nInputPlane, inputHeight, inputWidth});
  gradOffset = gradOffset.view(
      {batchSize, deformable_group * 2 * kH * kW, outputHeight, outputWidth});
  offset = offset.view(
      {batchSize, deformable_group * 2 * kH * kW, outputHeight, outputWidth});

  if (batch == 0) {
    gradOutput = gradOutput.view({nOutputPlane, outputHeight, outputWidth});
    input = input.view({nInputPlane, inputHeight, inputWidth});
    gradInput = gradInput.view({nInputPlane, inputHeight, inputWidth});
    offset = offset.view({offset.size(1), offset.size(2), offset.size(3)});
    gradOffset =
        gradOffset.view({offset.size(1), offset.size(2), offset.size(3)});
  }

  return 1;
}
|
381 |
+
|
382 |
+
int deform_conv_backward_parameters_cuda(
    at::Tensor input, at::Tensor offset, at::Tensor gradOutput,
    at::Tensor gradWeight,  // at::Tensor gradBias,
    at::Tensor columns, at::Tensor ones, int kW, int kH, int dW, int dH,
    int padW, int padH, int dilationW, int dilationH, int group,
    int deformable_group, float scale, int im2col_step)
{
  // Backward pass w.r.t. the convolution weights: re-runs deformable_im2col
  // per batch slice and accumulates dY * columns^T (scaled by `scale`) into
  // gradWeight.  Returns 1 on success (TH-style convention kept for callers).

  // todo: transpose and reshape outGrad
  // todo: reshape columns
  // todo: add im2col_step as input

  shape_check(input, offset, &gradOutput, gradWeight, kH, kW, dH, dW, padH,
              padW, dilationH, dilationW, group, deformable_group);

  input = input.contiguous();
  offset = offset.contiguous();
  gradOutput = gradOutput.contiguous();

  int batch = 1;

  if (input.ndimension() == 3) {
    // Force batch: promote unbatched tensors to batch size 1.
    batch = 0;
    input = input.view(
        at::IntList({1, input.size(0), input.size(1), input.size(2)}));
    gradOutput = gradOutput.view(
        {1, gradOutput.size(0), gradOutput.size(1), gradOutput.size(2)});
  }

  long batchSize = input.size(0);
  long nInputPlane = input.size(1);
  long inputHeight = input.size(2);
  long inputWidth = input.size(3);

  long nOutputPlane = gradWeight.size(0);

  long outputWidth =
      (inputWidth + 2 * padW - (dilationW * (kW - 1) + 1)) / dW + 1;
  long outputHeight =
      (inputHeight + 2 * padH - (dilationH * (kH - 1) + 1)) / dH + 1;

  AT_CHECK((offset.size(0) == batchSize), "invalid batch size of offset");

  columns = at::zeros(
      {nInputPlane * kW * kH, im2col_step * outputHeight * outputWidth},
      input.options());

  gradOutput = gradOutput.view({batchSize / im2col_step, im2col_step,
                                nOutputPlane, outputHeight, outputWidth});
  gradOutput.transpose_(1, 2);

  // Materialize a contiguous, slice-major copy of gradOutput so each
  // per-slice matmul sees a flat (nOutputPlane, step*H, W) layout.
  at::Tensor gradOutputBuffer = at::zeros_like(gradOutput);
  gradOutputBuffer =
      gradOutputBuffer.view({batchSize / im2col_step, nOutputPlane, im2col_step,
                             outputHeight, outputWidth});
  gradOutputBuffer.copy_(gradOutput);
  gradOutputBuffer =
      gradOutputBuffer.view({batchSize / im2col_step, nOutputPlane,
                             im2col_step * outputHeight, outputWidth});

  gradOutput.transpose_(1, 2);
  gradOutput =
      gradOutput.view({batchSize, nOutputPlane, outputHeight, outputWidth});

  input = input.view({batchSize / im2col_step, im2col_step, nInputPlane,
                      inputHeight, inputWidth});
  offset =
      offset.view({batchSize / im2col_step, im2col_step,
                   deformable_group * 2 * kH * kW, outputHeight, outputWidth});

  for (int elt = 0; elt < batchSize / im2col_step; elt++) {
    deformable_im2col(input[elt], offset[elt], nInputPlane, inputHeight,
                      inputWidth, kH, kW, padH, padW, dH, dW, dilationH,
                      dilationW, im2col_step, deformable_group, columns);

    // divide into group
    gradOutputBuffer = gradOutputBuffer.view(
        {gradOutputBuffer.size(0), group, gradOutputBuffer.size(1) / group,
         gradOutputBuffer.size(2), gradOutputBuffer.size(3)});
    columns = columns.view({group, columns.size(0) / group, columns.size(1)});
    gradWeight =
        gradWeight.view({group, gradWeight.size(0) / group, gradWeight.size(1),
                         gradWeight.size(2), gradWeight.size(3)});

    for (int g = 0; g < group; g++) {
      // gradWeight[g] += scale * dY[g] * columns[g]^T  (beta 1 accumulates)
      gradWeight[g] = gradWeight[g]
                          .flatten(1)
                          .addmm_(gradOutputBuffer[elt][g].flatten(1),
                                  columns[g].transpose(1, 0), 1.0, scale)
                          .view_as(gradWeight[g]);
    }
    // Restore the flat shapes for the next batch slice.
    gradOutputBuffer = gradOutputBuffer.view(
        {gradOutputBuffer.size(0),
         gradOutputBuffer.size(1) * gradOutputBuffer.size(2),
         gradOutputBuffer.size(3), gradOutputBuffer.size(4)});
    columns =
        columns.view({columns.size(0) * columns.size(1), columns.size(2)});
    gradWeight = gradWeight.view({gradWeight.size(0) * gradWeight.size(1),
                                  gradWeight.size(2), gradWeight.size(3),
                                  gradWeight.size(4)});
  }

  input = input.view({batchSize, nInputPlane, inputHeight, inputWidth});
  offset = offset.view(
      {batchSize, deformable_group * 2 * kH * kW, outputHeight, outputWidth});

  if (batch == 0) {
    gradOutput = gradOutput.view({nOutputPlane, outputHeight, outputWidth});
    input = input.view({nInputPlane, inputHeight, inputWidth});
  }

  return 1;
}
|
495 |
+
|
496 |
+
void modulated_deform_conv_cuda_forward(
    at::Tensor input, at::Tensor weight, at::Tensor bias, at::Tensor ones,
    at::Tensor offset, at::Tensor mask, at::Tensor output, at::Tensor columns,
    int kernel_h, int kernel_w, const int stride_h, const int stride_w,
    const int pad_h, const int pad_w, const int dilation_h,
    const int dilation_w, const int group, const int deformable_group,
    const bool with_bias)
{
  // Modulated deformable convolution (DCNv2) forward pass: per image, lowers
  // the input with modulated_deformable_im2col_cuda and multiplies against the
  // (grouped) weight matrix, then adds the bias if requested.
  AT_CHECK(input.is_contiguous(), "input tensor has to be contiguous");
  AT_CHECK(weight.is_contiguous(), "weight tensor has to be contiguous");

  const int batch = input.size(0);
  const int channels = input.size(1);
  const int height = input.size(2);
  const int width = input.size(3);

  const int channels_out = weight.size(0);
  const int channels_kernel = weight.size(1);
  const int kernel_h_ = weight.size(2);
  const int kernel_w_ = weight.size(3);

  if (kernel_h_ != kernel_h || kernel_w_ != kernel_w)
    // BUGFIX: the first pair of arguments reported the weight's kernel size
    // (kernel_h_, kernel_w_) instead of the requested one, so the message
    // always printed two identical sizes.
    AT_ERROR("Input shape and kernel shape wont match: (%d x %d vs %d x %d).",
             kernel_h, kernel_w, kernel_h_, kernel_w_);
  if (channels != channels_kernel * group)
    AT_ERROR("Input shape and kernel channels wont match: (%d vs %d).",
             channels, channels_kernel * group);

  const int height_out =
      (height + 2 * pad_h - (dilation_h * (kernel_h - 1) + 1)) / stride_h + 1;
  const int width_out =
      (width + 2 * pad_w - (dilation_w * (kernel_w - 1) + 1)) / stride_w + 1;

  if (ones.ndimension() != 2 ||
      ones.size(0) * ones.size(1) < height_out * width_out) {
    // Resize plane and fill with ones...
    ones = at::ones({height_out, width_out}, input.options());
  }

  // resize output
  output = output.view({batch, channels_out, height_out, width_out}).zero_();
  // resize temporary columns
  columns =
      at::zeros({channels * kernel_h * kernel_w, 1 * height_out * width_out},
                input.options());

  output = output.view({output.size(0), group, output.size(1) / group,
                        output.size(2), output.size(3)});

  for (int b = 0; b < batch; b++) {
    modulated_deformable_im2col_cuda(
        input[b], offset[b], mask[b], 1, channels, height, width, height_out,
        width_out, kernel_h, kernel_w, pad_h, pad_w, stride_h, stride_w,
        dilation_h, dilation_w, deformable_group, columns);

    // divide into group
    weight = weight.view({group, weight.size(0) / group, weight.size(1),
                          weight.size(2), weight.size(3)});
    columns = columns.view({group, columns.size(0) / group, columns.size(1)});

    for (int g = 0; g < group; g++) {
      output[b][g] = output[b][g]
                         .flatten(1)
                         .addmm_(weight[g].flatten(1), columns[g])
                         .view_as(output[b][g]);
    }

    // Restore the flat shapes for the next image.
    weight = weight.view({weight.size(0) * weight.size(1), weight.size(2),
                          weight.size(3), weight.size(4)});
    columns =
        columns.view({columns.size(0) * columns.size(1), columns.size(2)});
  }

  output = output.view({output.size(0), output.size(1) * output.size(2),
                        output.size(3), output.size(4)});

  if (with_bias) {
    output += bias.view({1, bias.size(0), 1, 1});
  }
}
|
576 |
+
|
577 |
+
void modulated_deform_conv_cuda_backward(
    at::Tensor input, at::Tensor weight, at::Tensor bias, at::Tensor ones,
    at::Tensor offset, at::Tensor mask, at::Tensor columns,
    at::Tensor grad_input, at::Tensor grad_weight, at::Tensor grad_bias,
    at::Tensor grad_offset, at::Tensor grad_mask, at::Tensor grad_output,
    int kernel_h, int kernel_w, int stride_h, int stride_w, int pad_h,
    int pad_w, int dilation_h, int dilation_w, int group, int deformable_group,
    const bool with_bias)
{
  // Modulated deformable convolution (DCNv2) backward pass: per image,
  // produces gradients w.r.t. the offset/mask fields, the input, the weights
  // and (optionally) the bias.
  AT_CHECK(input.is_contiguous(), "input tensor has to be contiguous");
  AT_CHECK(weight.is_contiguous(), "weight tensor has to be contiguous");

  const int batch = input.size(0);
  const int channels = input.size(1);
  const int height = input.size(2);
  const int width = input.size(3);

  const int channels_kernel = weight.size(1);
  const int kernel_h_ = weight.size(2);
  const int kernel_w_ = weight.size(3);
  if (kernel_h_ != kernel_h || kernel_w_ != kernel_w)
    // BUGFIX: the first pair of arguments reported the weight's kernel size
    // (kernel_h_, kernel_w_) instead of the requested one, so the message
    // always printed two identical sizes.
    AT_ERROR("Input shape and kernel shape wont match: (%d x %d vs %d x %d).",
             kernel_h, kernel_w, kernel_h_, kernel_w_);
  if (channels != channels_kernel * group)
    AT_ERROR("Input shape and kernel channels wont match: (%d vs %d).",
             channels, channels_kernel * group);

  const int height_out =
      (height + 2 * pad_h - (dilation_h * (kernel_h - 1) + 1)) / stride_h + 1;
  const int width_out =
      (width + 2 * pad_w - (dilation_w * (kernel_w - 1) + 1)) / stride_w + 1;

  if (ones.ndimension() != 2 ||
      ones.size(0) * ones.size(1) < height_out * width_out) {
    // Resize plane and fill with ones...
    ones = at::ones({height_out, width_out}, input.options());
  }

  grad_input = grad_input.view({batch, channels, height, width});
  columns = at::zeros({channels * kernel_h * kernel_w, height_out * width_out},
                      input.options());

  grad_output =
      grad_output.view({grad_output.size(0), group, grad_output.size(1) / group,
                        grad_output.size(2), grad_output.size(3)});

  for (int b = 0; b < batch; b++) {
    // divide int group
    columns = columns.view({group, columns.size(0) / group, columns.size(1)});
    weight = weight.view({group, weight.size(0) / group, weight.size(1),
                          weight.size(2), weight.size(3)});

    for (int g = 0; g < group; g++) {
      // columns[g] = W[g]^T * dY[g]  (beta 0 overwrites, alpha 1)
      columns[g].addmm_(weight[g].flatten(1).transpose(0, 1),
                        grad_output[b][g].flatten(1), 0.0f, 1.0f);
    }

    columns =
        columns.view({columns.size(0) * columns.size(1), columns.size(2)});
    weight = weight.view({weight.size(0) * weight.size(1), weight.size(2),
                          weight.size(3), weight.size(4)});

    // gradient w.r.t. input coordinate data
    modulated_deformable_col2im_coord_cuda(
        columns, input[b], offset[b], mask[b], 1, channels, height, width,
        height_out, width_out, kernel_h, kernel_w, pad_h, pad_w, stride_h,
        stride_w, dilation_h, dilation_w, deformable_group, grad_offset[b],
        grad_mask[b]);
    // gradient w.r.t. input data
    modulated_deformable_col2im_cuda(
        columns, offset[b], mask[b], 1, channels, height, width, height_out,
        width_out, kernel_h, kernel_w, pad_h, pad_w, stride_h, stride_w,
        dilation_h, dilation_w, deformable_group, grad_input[b]);

    // gradient w.r.t. weight, dWeight should accumulate across the batch and
    // group
    modulated_deformable_im2col_cuda(
        input[b], offset[b], mask[b], 1, channels, height, width, height_out,
        width_out, kernel_h, kernel_w, pad_h, pad_w, stride_h, stride_w,
        dilation_h, dilation_w, deformable_group, columns);

    columns = columns.view({group, columns.size(0) / group, columns.size(1)});
    grad_weight = grad_weight.view({group, grad_weight.size(0) / group,
                                    grad_weight.size(1), grad_weight.size(2),
                                    grad_weight.size(3)});
    if (with_bias)
      grad_bias = grad_bias.view({group, grad_bias.size(0) / group});

    for (int g = 0; g < group; g++) {
      grad_weight[g] =
          grad_weight[g]
              .flatten(1)
              .addmm_(grad_output[b][g].flatten(1), columns[g].transpose(0, 1))
              .view_as(grad_weight[g]);
      if (with_bias) {
        // Bias gradient is the spatial sum of dY, computed as dY * ones.
        grad_bias[g] =
            grad_bias[g]
                .view({-1, 1})
                .addmm_(grad_output[b][g].flatten(1), ones.view({-1, 1}))
                .view(-1);
      }
    }

    // Restore the flat shapes for the next image.
    columns =
        columns.view({columns.size(0) * columns.size(1), columns.size(2)});
    grad_weight = grad_weight.view({grad_weight.size(0) * grad_weight.size(1),
                                    grad_weight.size(2), grad_weight.size(3),
                                    grad_weight.size(4)});
    if (with_bias)
      grad_bias = grad_bias.view({grad_bias.size(0) * grad_bias.size(1)});
  }
  grad_output = grad_output.view({grad_output.size(0) * grad_output.size(1),
                                  grad_output.size(2), grad_output.size(3),
                                  grad_output.size(4)});
}
|
maskrcnn_benchmark/csrc/cuda/deform_conv_kernel_cuda.cu
ADDED
@@ -0,0 +1,874 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
/*!
|
2 |
+
******************* BEGIN Caffe Copyright Notice and Disclaimer ****************
|
3 |
+
*
|
4 |
+
* COPYRIGHT
|
5 |
+
*
|
6 |
+
* All contributions by the University of California:
|
7 |
+
* Copyright (c) 2014-2017 The Regents of the University of California (Regents)
|
8 |
+
* All rights reserved.
|
9 |
+
*
|
10 |
+
* All other contributions:
|
11 |
+
* Copyright (c) 2014-2017, the respective contributors
|
12 |
+
* All rights reserved.
|
13 |
+
*
|
14 |
+
* Caffe uses a shared copyright model: each contributor holds copyright over
|
15 |
+
* their contributions to Caffe. The project versioning records all such
|
16 |
+
* contribution and copyright details. If a contributor wants to further mark
|
17 |
+
* their specific copyright on a particular contribution, they should indicate
|
18 |
+
* their copyright solely in the commit message of the change when it is
|
19 |
+
* committed.
|
20 |
+
*
|
21 |
+
* LICENSE
|
22 |
+
*
|
23 |
+
* Redistribution and use in source and binary forms, with or without
|
24 |
+
* modification, are permitted provided that the following conditions are met:
|
25 |
+
*
|
26 |
+
* 1. Redistributions of source code must retain the above copyright notice, this
|
27 |
+
* list of conditions and the following disclaimer.
|
28 |
+
* 2. Redistributions in binary form must reproduce the above copyright notice,
|
29 |
+
* this list of conditions and the following disclaimer in the documentation
|
30 |
+
* and/or other materials provided with the distribution.
|
31 |
+
*
|
32 |
+
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
|
33 |
+
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
34 |
+
* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
35 |
+
* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
|
36 |
+
* ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
|
37 |
+
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
|
38 |
+
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
39 |
+
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
40 |
+
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
41 |
+
* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
42 |
+
*
|
43 |
+
* CONTRIBUTION AGREEMENT
|
44 |
+
*
|
45 |
+
* By contributing to the BVLC/caffe repository through pull-request, comment,
|
46 |
+
* or otherwise, the contributor releases their content to the
|
47 |
+
* license and copyright terms herein.
|
48 |
+
*
|
49 |
+
***************** END Caffe Copyright Notice and Disclaimer ********************
|
50 |
+
*
|
51 |
+
* Copyright (c) 2018 Microsoft
|
52 |
+
* Licensed under The MIT License [see LICENSE for details]
|
53 |
+
* \file modulated_deformable_im2col.cuh
|
54 |
+
* \brief Function definitions of converting an image to
|
55 |
+
* column matrix based on kernel, padding, dilation, and offset.
|
56 |
+
* These functions are mainly used in deformable convolution operators.
|
57 |
+
* \ref: https://arxiv.org/abs/1703.06211
|
58 |
+
* \author Yuwen Xiong, Haozhi Qi, Jifeng Dai, Xizhou Zhu, Han Hu, Dazhi Cheng
|
59 |
+
*/
|
60 |
+
|
61 |
+
// modify from https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/blob/mmdetection/mmdet/ops/dcn/src/deform_conv_cuda_kernel.cu
|
62 |
+
|
63 |
+
|
64 |
+
#include <ATen/ATen.h>
|
65 |
+
#include <THC/THCAtomics.cuh>
|
66 |
+
#include <stdio.h>
|
67 |
+
#include <math.h>
|
68 |
+
#include <float.h>
|
69 |
+
|
70 |
+
using namespace at;
|
71 |
+
|
72 |
+
// Grid-stride loop: iterates `i` over [0, n) across all launched threads so a
// fixed-size grid can cover an arbitrarily large workload.
#define CUDA_KERNEL_LOOP(i, n)                                 \
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < (n); \
       i += blockDim.x * gridDim.x)

// Threads per block used by every kernel launch in this file.
const int CUDA_NUM_THREADS = 1024;
// Cap on the grid x-dimension; CUDA_KERNEL_LOOP compensates when the grid is
// capped below the ideal block count.
const int kMaxGridNum = 65535;
// Number of blocks needed to cover N work items, clamped to kMaxGridNum.
inline int GET_BLOCKS(const int N)
{
  return std::min(kMaxGridNum, (N + CUDA_NUM_THREADS - 1) / CUDA_NUM_THREADS);
}

/*
const int CUDA_NUM_THREADS = 1024;

inline int GET_BLOCKS(const int N)
{
  return (N + CUDA_NUM_THREADS - 1) / CUDA_NUM_THREADS;
}*/
|
90 |
+
|
91 |
+
template <typename scalar_t>
__device__ scalar_t deformable_im2col_bilinear(const scalar_t *bottom_data, const int data_width,
                                               const int height, const int width, scalar_t h, scalar_t w)
{
  // Bilinearly sample a (height x width) plane at the fractional position
  // (h, w); consecutive rows are data_width elements apart. Neighbouring
  // pixels that fall outside the plane contribute zero.
  const int top = floor(h);
  const int left = floor(w);
  const int bot = top + 1;
  const int right = left + 1;

  // Fractional offsets inside the sampling cell, and their complements.
  const scalar_t frac_h = h - top;
  const scalar_t frac_w = w - left;
  const scalar_t inv_h = 1 - frac_h, inv_w = 1 - frac_w;

  // Fetch the four corner samples, zero-padding out-of-range neighbours.
  scalar_t v_tl = 0;
  if (top >= 0 && left >= 0)
    v_tl = bottom_data[top * data_width + left];
  scalar_t v_tr = 0;
  if (top >= 0 && right <= width - 1)
    v_tr = bottom_data[top * data_width + right];
  scalar_t v_bl = 0;
  if (bot <= height - 1 && left >= 0)
    v_bl = bottom_data[bot * data_width + left];
  scalar_t v_br = 0;
  if (bot <= height - 1 && right <= width - 1)
    v_br = bottom_data[bot * data_width + right];

  // Standard bilinear weights.
  const scalar_t w_tl = inv_h * inv_w, w_tr = inv_h * frac_w, w_bl = frac_h * inv_w, w_br = frac_h * frac_w;

  const scalar_t val = (w_tl * v_tl + w_tr * v_tr + w_bl * v_bl + w_br * v_br);
  return val;
}
|
123 |
+
|
124 |
+
template <typename scalar_t>
// Weight of the integer input pixel (h, w) within the bilinear sample taken
// at the fractional location (argmax_h, argmax_w). Used to scatter the column
// gradient back onto the input image.
__device__ scalar_t get_gradient_weight(scalar_t argmax_h, scalar_t argmax_w,
                                        const int h, const int w, const int height, const int width)
{

  if (argmax_h <= -1 || argmax_h >= height || argmax_w <= -1 || argmax_w >= width)
  {
    //empty: the sample fell entirely outside the image, no gradient flows
    return 0;
  }

  int argmax_h_low = floor(argmax_h);
  int argmax_w_low = floor(argmax_w);
  int argmax_h_high = argmax_h_low + 1;
  int argmax_w_high = argmax_w_low + 1;

  // (h, w) can match at most one of the four corners of the sampling cell;
  // the corresponding bilinear coefficient is returned (0 if no match).
  scalar_t weight = 0;
  if (h == argmax_h_low && w == argmax_w_low)
    weight = (h + 1 - argmax_h) * (w + 1 - argmax_w);
  if (h == argmax_h_low && w == argmax_w_high)
    weight = (h + 1 - argmax_h) * (argmax_w + 1 - w);
  if (h == argmax_h_high && w == argmax_w_low)
    weight = (argmax_h + 1 - h) * (w + 1 - argmax_w);
  if (h == argmax_h_high && w == argmax_w_high)
    weight = (argmax_h + 1 - h) * (argmax_w + 1 - w);
  return weight;
}
|
151 |
+
|
152 |
+
template <typename scalar_t>
// Derivative of the bilinear sample at (argmax_h, argmax_w) with respect to
// the sampling coordinate itself: bp_dir == 0 differentiates w.r.t. the h
// coordinate, bp_dir == 1 w.r.t. the w coordinate. im_data is one
// (height x width) plane with rows data_width elements apart.
__device__ scalar_t get_coordinate_weight(scalar_t argmax_h, scalar_t argmax_w,
                                          const int height, const int width, const scalar_t *im_data,
                                          const int data_width, const int bp_dir)
{

  if (argmax_h <= -1 || argmax_h >= height || argmax_w <= -1 || argmax_w >= width)
  {
    //empty: sample entirely outside the image, zero derivative
    return 0;
  }

  int argmax_h_low = floor(argmax_h);
  int argmax_w_low = floor(argmax_w);
  int argmax_h_high = argmax_h_low + 1;
  int argmax_w_high = argmax_w_low + 1;

  scalar_t weight = 0;

  if (bp_dir == 0)
  {
    // d(sample)/d(argmax_h): top-row corners enter with negative sign,
    // bottom-row corners with positive sign, each scaled by its w-weight.
    if (argmax_h_low >= 0 && argmax_w_low >= 0)
      weight += -1 * (argmax_w_low + 1 - argmax_w) * im_data[argmax_h_low * data_width + argmax_w_low];
    if (argmax_h_low >= 0 && argmax_w_high <= width - 1)
      weight += -1 * (argmax_w - argmax_w_low) * im_data[argmax_h_low * data_width + argmax_w_high];
    if (argmax_h_high <= height - 1 && argmax_w_low >= 0)
      weight += (argmax_w_low + 1 - argmax_w) * im_data[argmax_h_high * data_width + argmax_w_low];
    if (argmax_h_high <= height - 1 && argmax_w_high <= width - 1)
      weight += (argmax_w - argmax_w_low) * im_data[argmax_h_high * data_width + argmax_w_high];
  }
  else if (bp_dir == 1)
  {
    // d(sample)/d(argmax_w): left-column corners negative, right-column
    // corners positive, each scaled by its h-weight.
    if (argmax_h_low >= 0 && argmax_w_low >= 0)
      weight += -1 * (argmax_h_low + 1 - argmax_h) * im_data[argmax_h_low * data_width + argmax_w_low];
    if (argmax_h_low >= 0 && argmax_w_high <= width - 1)
      weight += (argmax_h_low + 1 - argmax_h) * im_data[argmax_h_low * data_width + argmax_w_high];
    if (argmax_h_high <= height - 1 && argmax_w_low >= 0)
      weight += -1 * (argmax_h - argmax_h_low) * im_data[argmax_h_high * data_width + argmax_w_low];
    if (argmax_h_high <= height - 1 && argmax_w_high <= width - 1)
      weight += (argmax_h - argmax_h_low) * im_data[argmax_h_high * data_width + argmax_w_high];
  }

  return weight;
}
|
196 |
+
|
197 |
+
template <typename scalar_t>
// im2col for deformable convolution: one thread per output-column element
// (c_im, b_col, h_col, w_col). Each thread walks the kernel_h x kernel_w
// window, perturbs every tap by its learned (offset_h, offset_w) and writes
// the bilinearly sampled value into data_col.
__global__ void deformable_im2col_gpu_kernel(const int n, const scalar_t *data_im, const scalar_t *data_offset,
                                             const int height, const int width, const int kernel_h, const int kernel_w,
                                             const int pad_h, const int pad_w, const int stride_h, const int stride_w,
                                             const int dilation_h, const int dilation_w, const int channel_per_deformable_group,
                                             const int batch_size, const int num_channels, const int deformable_group,
                                             const int height_col, const int width_col,
                                             scalar_t *data_col)
{
  CUDA_KERNEL_LOOP(index, n)
  {
    // index index of output matrix: decompose the flat index into
    // (c_im, b_col, h_col, w_col); width varies fastest.
    const int w_col = index % width_col;
    const int h_col = (index / width_col) % height_col;
    const int b_col = (index / width_col / height_col) % batch_size;
    const int c_im = (index / width_col / height_col) / batch_size;
    // Each input channel expands into kernel_h*kernel_w output rows.
    const int c_col = c_im * kernel_h * kernel_w;

    // compute deformable group index
    const int deformable_group_index = c_im / channel_per_deformable_group;

    // Top-left corner of the (unperturbed) receptive field in input coords.
    const int h_in = h_col * stride_h - pad_h;
    const int w_in = w_col * stride_w - pad_w;
    scalar_t *data_col_ptr = data_col + ((c_col * batch_size + b_col) * height_col + h_col) * width_col + w_col;
    //const scalar_t* data_im_ptr = data_im + ((b_col * num_channels + c_im) * height + h_in) * width + w_in;
    const scalar_t *data_im_ptr = data_im + (b_col * num_channels + c_im) * height * width;
    // Offsets are stored as 2*kernel_h*kernel_w planes (h then w interleaved)
    // per deformable group, each plane of size height_col x width_col.
    const scalar_t *data_offset_ptr = data_offset + (b_col * deformable_group + deformable_group_index) * 2 * kernel_h * kernel_w * height_col * width_col;

    for (int i = 0; i < kernel_h; ++i)
    {
      for (int j = 0; j < kernel_w; ++j)
      {
        const int data_offset_h_ptr = ((2 * (i * kernel_w + j)) * height_col + h_col) * width_col + w_col;
        const int data_offset_w_ptr = ((2 * (i * kernel_w + j) + 1) * height_col + h_col) * width_col + w_col;
        const scalar_t offset_h = data_offset_ptr[data_offset_h_ptr];
        const scalar_t offset_w = data_offset_ptr[data_offset_w_ptr];
        scalar_t val = static_cast<scalar_t>(0);
        // Fractional sampling location = regular grid tap + learned offset.
        const scalar_t h_im = h_in + i * dilation_h + offset_h;
        const scalar_t w_im = w_in + j * dilation_w + offset_w;
        if (h_im > -1 && w_im > -1 && h_im < height && w_im < width)
        {
          //const scalar_t map_h = i * dilation_h + offset_h;
          //const scalar_t map_w = j * dilation_w + offset_w;
          //const int cur_height = height - h_in;
          //const int cur_width = width - w_in;
          //val = deformable_im2col_bilinear(data_im_ptr, width, cur_height, cur_width, map_h, map_w);
          val = deformable_im2col_bilinear(data_im_ptr, width, height, width, h_im, w_im);
        }
        *data_col_ptr = val;
        // Successive kernel taps are batch_size*height_col*width_col apart in
        // the column buffer (row-major over [c_col, batch, h_col, w_col]).
        data_col_ptr += batch_size * height_col * width_col;
      }
    }
  }
}
|
251 |
+
|
252 |
+
// Host launcher for deformable_im2col_gpu_kernel: converts `parallel_imgs`
// images (data_im) plus their sampling offsets (data_offset) into the column
// matrix data_col. One thread per (channel, image, output pixel).
void deformable_im2col(
    const at::Tensor data_im, const at::Tensor data_offset, const int channels,
    const int height, const int width, const int ksize_h, const int ksize_w,
    const int pad_h, const int pad_w, const int stride_h, const int stride_w,
    const int dilation_h, const int dilation_w, const int parallel_imgs,
    const int deformable_group, at::Tensor data_col)
{
  // num_axes should be smaller than block size
  // todo: check parallel_imgs is correctly passed in
  // Standard convolution output-size formula.
  int height_col = (height + 2 * pad_h - (dilation_h * (ksize_h - 1) + 1)) / stride_h + 1;
  int width_col = (width + 2 * pad_w - (dilation_w * (ksize_w - 1) + 1)) / stride_w + 1;
  int num_kernels = channels * height_col * width_col * parallel_imgs;
  int channel_per_deformable_group = channels / deformable_group;

  AT_DISPATCH_FLOATING_TYPES_AND_HALF(
      data_im.type(), "deformable_im2col_gpu", ([&] {
        const scalar_t *data_im_ = data_im.data<scalar_t>();
        const scalar_t *data_offset_ = data_offset.data<scalar_t>();
        scalar_t *data_col_ = data_col.data<scalar_t>();

        deformable_im2col_gpu_kernel<<<GET_BLOCKS(num_kernels), CUDA_NUM_THREADS>>>(
            num_kernels, data_im_, data_offset_, height, width, ksize_h, ksize_w,
            pad_h, pad_w, stride_h, stride_w, dilation_h, dilation_w,
            channel_per_deformable_group, parallel_imgs, channels, deformable_group,
            height_col, width_col, data_col_);
      }));

  // Report (but do not throw on) kernel-launch failures.
  cudaError_t err = cudaGetLastError();
  if (err != cudaSuccess)
  {
    printf("error in deformable_im2col: %s\n", cudaGetErrorString(err));
  }
}
|
285 |
+
|
286 |
+
template <typename scalar_t>
// col2im for deformable convolution: scatters the column gradient (data_col)
// back onto the input-image gradient (grad_im). One thread per column
// element; contributions are accumulated with atomicAdd because several
// fractional samples can touch the same input pixel.
__global__ void deformable_col2im_gpu_kernel(
    const int n, const scalar_t *data_col, const scalar_t *data_offset,
    const int channels, const int height, const int width,
    const int kernel_h, const int kernel_w,
    const int pad_h, const int pad_w,
    const int stride_h, const int stride_w,
    const int dilation_h, const int dilation_w,
    const int channel_per_deformable_group,
    const int batch_size, const int deformable_group,
    const int height_col, const int width_col,
    scalar_t *grad_im)
{
  CUDA_KERNEL_LOOP(index, n)
  {
    // Decompose the flat column index into (c, i, j, b, h_out, w_out);
    // width varies fastest, channel slowest.
    const int j = (index / width_col / height_col / batch_size) % kernel_w;
    const int i = (index / width_col / height_col / batch_size / kernel_w) % kernel_h;
    const int c = index / width_col / height_col / batch_size / kernel_w / kernel_h;
    // compute the start and end of the output

    const int deformable_group_index = c / channel_per_deformable_group;

    int w_out = index % width_col;
    int h_out = (index / width_col) % height_col;
    int b = (index / width_col / height_col) % batch_size;
    int w_in = w_out * stride_w - pad_w;
    int h_in = h_out * stride_h - pad_h;

    // Offset layout matches the im2col kernel: 2*kernel_h*kernel_w planes
    // (h then w interleaved) per deformable group.
    const scalar_t *data_offset_ptr = data_offset + (b * deformable_group + deformable_group_index) *
                                                        2 * kernel_h * kernel_w * height_col * width_col;
    const int data_offset_h_ptr = ((2 * (i * kernel_w + j)) * height_col + h_out) * width_col + w_out;
    const int data_offset_w_ptr = ((2 * (i * kernel_w + j) + 1) * height_col + h_out) * width_col + w_out;
    const scalar_t offset_h = data_offset_ptr[data_offset_h_ptr];
    const scalar_t offset_w = data_offset_ptr[data_offset_w_ptr];
    // Fractional input location this column element was sampled from.
    const scalar_t cur_inv_h_data = h_in + i * dilation_h + offset_h;
    const scalar_t cur_inv_w_data = w_in + j * dilation_w + offset_w;

    const scalar_t cur_top_grad = data_col[index];
    const int cur_h = (int)cur_inv_h_data;
    const int cur_w = (int)cur_inv_w_data;
    // Scan a 5x5 neighbourhood, but the |distance| < 1 test keeps only the
    // (at most four) pixels with a non-zero bilinear weight.
    for (int dy = -2; dy <= 2; dy++)
    {
      for (int dx = -2; dx <= 2; dx++)
      {
        if (cur_h + dy >= 0 && cur_h + dy < height &&
            cur_w + dx >= 0 && cur_w + dx < width &&
            abs(cur_inv_h_data - (cur_h + dy)) < 1 &&
            abs(cur_inv_w_data - (cur_w + dx)) < 1)
        {
          int cur_bottom_grad_pos = ((b * channels + c) * height + cur_h + dy) * width + cur_w + dx;
          scalar_t weight = get_gradient_weight(cur_inv_h_data, cur_inv_w_data, cur_h + dy, cur_w + dx, height, width);
          // Atomic: multiple threads may scatter into the same input pixel.
          atomicAdd(grad_im + cur_bottom_grad_pos, weight * cur_top_grad);
        }
      }
    }
  }
}
|
343 |
+
|
344 |
+
// Host launcher for deformable_col2im_gpu_kernel: scatters the column
// gradient data_col back into the input-image gradient grad_im for
// `parallel_imgs` images. One thread per column element.
void deformable_col2im(
    const at::Tensor data_col, const at::Tensor data_offset, const int channels,
    const int height, const int width, const int ksize_h,
    const int ksize_w, const int pad_h, const int pad_w,
    const int stride_h, const int stride_w,
    const int dilation_h, const int dilation_w,
    const int parallel_imgs, const int deformable_group,
    at::Tensor grad_im)
{

  // todo: make sure parallel_imgs is passed in correctly
  // Standard convolution output-size formula.
  int height_col = (height + 2 * pad_h - (dilation_h * (ksize_h - 1) + 1)) / stride_h + 1;
  int width_col = (width + 2 * pad_w - (dilation_w * (ksize_w - 1) + 1)) / stride_w + 1;
  int num_kernels = channels * ksize_h * ksize_w * height_col * width_col * parallel_imgs;
  int channel_per_deformable_group = channels / deformable_group;

  AT_DISPATCH_FLOATING_TYPES_AND_HALF(
      data_col.type(), "deformable_col2im_gpu", ([&] {
        const scalar_t *data_col_ = data_col.data<scalar_t>();
        const scalar_t *data_offset_ = data_offset.data<scalar_t>();
        scalar_t *grad_im_ = grad_im.data<scalar_t>();

        deformable_col2im_gpu_kernel<<<GET_BLOCKS(num_kernels), CUDA_NUM_THREADS>>>(
            num_kernels, data_col_, data_offset_, channels, height, width, ksize_h,
            ksize_w, pad_h, pad_w, stride_h, stride_w,
            dilation_h, dilation_w, channel_per_deformable_group,
            parallel_imgs, deformable_group, height_col, width_col, grad_im_);
      }));

  // Report (but do not throw on) kernel-launch failures.
  cudaError_t err = cudaGetLastError();
  if (err != cudaSuccess)
  {
    printf("error in deformable_col2im: %s\n", cudaGetErrorString(err));
  }
}
|
379 |
+
|
380 |
+
template <typename scalar_t>
// Gradient w.r.t. the sampling offsets: one thread per offset element
// (b, offset-channel, h, w). Each thread sums, over the input channels of its
// deformable group, the column gradient weighted by the derivative of the
// bilinear sample w.r.t. the h- or w-coordinate (bp_dir from offset parity).
__global__ void deformable_col2im_coord_gpu_kernel(const int n, const scalar_t *data_col,
                                                   const scalar_t *data_im, const scalar_t *data_offset,
                                                   const int channels, const int height, const int width,
                                                   const int kernel_h, const int kernel_w,
                                                   const int pad_h, const int pad_w,
                                                   const int stride_h, const int stride_w,
                                                   const int dilation_h, const int dilation_w,
                                                   const int channel_per_deformable_group,
                                                   const int batch_size, const int offset_channels, const int deformable_group,
                                                   const int height_col, const int width_col, scalar_t *grad_offset)
{
  CUDA_KERNEL_LOOP(index, n)
  {
    scalar_t val = 0;
    // Decompose the flat index into (b, c, h, w) over the offset tensor.
    int w = index % width_col;
    int h = (index / width_col) % height_col;
    int c = (index / width_col / height_col) % offset_channels;
    int b = (index / width_col / height_col) / offset_channels;
    // compute the start and end of the output

    // Each deformable group owns 2*kernel_h*kernel_w offset channels.
    const int deformable_group_index = c / (2 * kernel_h * kernel_w);
    const int col_step = kernel_h * kernel_w;
    int cnt = 0;
    const scalar_t *data_col_ptr = data_col + deformable_group_index * channel_per_deformable_group *
                                                  batch_size * width_col * height_col;
    const scalar_t *data_im_ptr = data_im + (b * deformable_group + deformable_group_index) *
                                                channel_per_deformable_group / kernel_h / kernel_w * height * width;
    const scalar_t *data_offset_ptr = data_offset + (b * deformable_group + deformable_group_index) * 2 *
                                                        kernel_h * kernel_w * height_col * width_col;

    // Offset channel relative to this group; even = h-offset, odd = w-offset.
    const int offset_c = c - deformable_group_index * 2 * kernel_h * kernel_w;

    // Accumulate over every input channel of the group that uses this tap.
    for (int col_c = (offset_c / 2); col_c < channel_per_deformable_group; col_c += col_step)
    {
      const int col_pos = (((col_c * batch_size + b) * height_col) + h) * width_col + w;
      const int bp_dir = offset_c % 2;

      // Recover the kernel tap (i, j) and output pixel behind col_pos.
      int j = (col_pos / width_col / height_col / batch_size) % kernel_w;
      int i = (col_pos / width_col / height_col / batch_size / kernel_w) % kernel_h;
      int w_out = col_pos % width_col;
      int h_out = (col_pos / width_col) % height_col;
      int w_in = w_out * stride_w - pad_w;
      int h_in = h_out * stride_h - pad_h;
      const int data_offset_h_ptr = (((2 * (i * kernel_w + j)) * height_col + h_out) * width_col + w_out);
      const int data_offset_w_ptr = (((2 * (i * kernel_w + j) + 1) * height_col + h_out) * width_col + w_out);
      const scalar_t offset_h = data_offset_ptr[data_offset_h_ptr];
      const scalar_t offset_w = data_offset_ptr[data_offset_w_ptr];
      scalar_t inv_h = h_in + i * dilation_h + offset_h;
      scalar_t inv_w = w_in + j * dilation_w + offset_w;
      if (inv_h <= -1 || inv_w <= -1 || inv_h >= height || inv_w >= width)
      {
        // Sentinel outside the valid range makes the weight helper return 0.
        inv_h = inv_w = -2;
      }
      const scalar_t weight = get_coordinate_weight(
          inv_h, inv_w,
          height, width, data_im_ptr + cnt * height * width, width, bp_dir);
      val += weight * data_col_ptr[col_pos];
      cnt += 1;
    }

    grad_offset[index] = val;
  }
}
|
444 |
+
|
445 |
+
// Host launcher for deformable_col2im_coord_gpu_kernel: accumulates the
// gradient w.r.t. the sampling offsets (grad_offset) from the column gradient
// (data_col), the input images (data_im) and the current offsets
// (data_offset). One thread per offset element across `parallel_imgs` images.
void deformable_col2im_coord(
    const at::Tensor data_col, const at::Tensor data_im, const at::Tensor data_offset,
    const int channels, const int height, const int width, const int ksize_h,
    const int ksize_w, const int pad_h, const int pad_w, const int stride_h,
    const int stride_w, const int dilation_h, const int dilation_w,
    const int parallel_imgs, const int deformable_group, at::Tensor grad_offset)
{

  // Standard convolution output-size formula.
  int height_col = (height + 2 * pad_h - (dilation_h * (ksize_h - 1) + 1)) / stride_h + 1;
  int width_col = (width + 2 * pad_w - (dilation_w * (ksize_w - 1) + 1)) / stride_w + 1;
  int num_kernels = height_col * width_col * 2 * ksize_h * ksize_w * deformable_group * parallel_imgs;
  int channel_per_deformable_group = channels * ksize_h * ksize_w / deformable_group;

  AT_DISPATCH_FLOATING_TYPES_AND_HALF(
      data_col.type(), "deformable_col2im_coord_gpu", ([&] {
        const scalar_t *data_col_ = data_col.data<scalar_t>();
        const scalar_t *data_im_ = data_im.data<scalar_t>();
        const scalar_t *data_offset_ = data_offset.data<scalar_t>();
        scalar_t *grad_offset_ = grad_offset.data<scalar_t>();

        deformable_col2im_coord_gpu_kernel<<<GET_BLOCKS(num_kernels), CUDA_NUM_THREADS>>>(
            num_kernels, data_col_, data_im_, data_offset_, channels, height, width,
            ksize_h, ksize_w, pad_h, pad_w, stride_h, stride_w,
            dilation_h, dilation_w, channel_per_deformable_group,
            parallel_imgs, 2 * ksize_h * ksize_w * deformable_group, deformable_group,
            height_col, width_col, grad_offset_);
      }));

  // Fix: the sibling launchers (deformable_im2col / deformable_col2im) report
  // kernel-launch failures via cudaGetLastError, but this launcher silently
  // dropped them. Check and report for consistency.
  cudaError_t err = cudaGetLastError();
  if (err != cudaSuccess)
  {
    printf("error in deformable_col2im_coord: %s\n", cudaGetErrorString(err));
  }
}
|
473 |
+
|
474 |
+
template <typename scalar_t>
__device__ scalar_t dmcn_im2col_bilinear(const scalar_t *bottom_data, const int data_width,
                                         const int height, const int width, scalar_t h, scalar_t w)
{
  // Bilinearly sample a (height x width) plane at the fractional position
  // (h, w); rows are data_width elements apart. Out-of-range neighbours are
  // treated as zero. (Modulated/DCNv2 counterpart of deformable_im2col_bilinear.)
  const int top = floor(h);
  const int left = floor(w);
  const int bot = top + 1;
  const int right = left + 1;

  // Fractional offsets inside the sampling cell, and their complements.
  const scalar_t frac_h = h - top;
  const scalar_t frac_w = w - left;
  const scalar_t inv_h = 1 - frac_h, inv_w = 1 - frac_w;

  // Fetch the four corner samples, zero-padding out-of-range neighbours.
  scalar_t v_tl = 0;
  if (top >= 0 && left >= 0)
    v_tl = bottom_data[top * data_width + left];
  scalar_t v_tr = 0;
  if (top >= 0 && right <= width - 1)
    v_tr = bottom_data[top * data_width + right];
  scalar_t v_bl = 0;
  if (bot <= height - 1 && left >= 0)
    v_bl = bottom_data[bot * data_width + left];
  scalar_t v_br = 0;
  if (bot <= height - 1 && right <= width - 1)
    v_br = bottom_data[bot * data_width + right];

  // Standard bilinear weights.
  const scalar_t w_tl = inv_h * inv_w, w_tr = inv_h * frac_w, w_bl = frac_h * inv_w, w_br = frac_h * frac_w;

  const scalar_t val = (w_tl * v_tl + w_tr * v_tr + w_bl * v_bl + w_br * v_br);
  return val;
}
|
505 |
+
|
506 |
+
template <typename scalar_t>
// Weight of the integer input pixel (h, w) within the bilinear sample taken
// at (argmax_h, argmax_w). Modulated/DCNv2 counterpart of
// get_gradient_weight; used to scatter the column gradient onto the input.
__device__ scalar_t dmcn_get_gradient_weight(scalar_t argmax_h, scalar_t argmax_w,
                                             const int h, const int w, const int height, const int width)
{
  if (argmax_h <= -1 || argmax_h >= height || argmax_w <= -1 || argmax_w >= width)
  {
    //empty: the sample fell entirely outside the image, no gradient flows
    return 0;
  }

  int argmax_h_low = floor(argmax_h);
  int argmax_w_low = floor(argmax_w);
  int argmax_h_high = argmax_h_low + 1;
  int argmax_w_high = argmax_w_low + 1;

  // (h, w) can match at most one of the four corners of the sampling cell;
  // the corresponding bilinear coefficient is returned (0 if no match).
  scalar_t weight = 0;
  if (h == argmax_h_low && w == argmax_w_low)
    weight = (h + 1 - argmax_h) * (w + 1 - argmax_w);
  if (h == argmax_h_low && w == argmax_w_high)
    weight = (h + 1 - argmax_h) * (argmax_w + 1 - w);
  if (h == argmax_h_high && w == argmax_w_low)
    weight = (argmax_h + 1 - h) * (w + 1 - argmax_w);
  if (h == argmax_h_high && w == argmax_w_high)
    weight = (argmax_h + 1 - h) * (argmax_w + 1 - w);
  return weight;
}
|
532 |
+
|
533 |
+
template <typename scalar_t>
// Derivative of the bilinear sample at (argmax_h, argmax_w) with respect to
// the sampling coordinate: bp_dir == 0 differentiates w.r.t. h, bp_dir == 1
// w.r.t. w. Modulated/DCNv2 counterpart of get_coordinate_weight.
__device__ scalar_t dmcn_get_coordinate_weight(scalar_t argmax_h, scalar_t argmax_w,
                                               const int height, const int width, const scalar_t *im_data,
                                               const int data_width, const int bp_dir)
{
  if (argmax_h <= -1 || argmax_h >= height || argmax_w <= -1 || argmax_w >= width)
  {
    //empty: sample entirely outside the image, zero derivative
    return 0;
  }

  int argmax_h_low = floor(argmax_h);
  int argmax_w_low = floor(argmax_w);
  int argmax_h_high = argmax_h_low + 1;
  int argmax_w_high = argmax_w_low + 1;

  scalar_t weight = 0;

  if (bp_dir == 0)
  {
    // d(sample)/d(argmax_h): top-row corners negative, bottom-row positive,
    // each scaled by its w-weight.
    if (argmax_h_low >= 0 && argmax_w_low >= 0)
      weight += -1 * (argmax_w_low + 1 - argmax_w) * im_data[argmax_h_low * data_width + argmax_w_low];
    if (argmax_h_low >= 0 && argmax_w_high <= width - 1)
      weight += -1 * (argmax_w - argmax_w_low) * im_data[argmax_h_low * data_width + argmax_w_high];
    if (argmax_h_high <= height - 1 && argmax_w_low >= 0)
      weight += (argmax_w_low + 1 - argmax_w) * im_data[argmax_h_high * data_width + argmax_w_low];
    if (argmax_h_high <= height - 1 && argmax_w_high <= width - 1)
      weight += (argmax_w - argmax_w_low) * im_data[argmax_h_high * data_width + argmax_w_high];
  }
  else if (bp_dir == 1)
  {
    // d(sample)/d(argmax_w): left-column corners negative, right-column
    // positive, each scaled by its h-weight.
    if (argmax_h_low >= 0 && argmax_w_low >= 0)
      weight += -1 * (argmax_h_low + 1 - argmax_h) * im_data[argmax_h_low * data_width + argmax_w_low];
    if (argmax_h_low >= 0 && argmax_w_high <= width - 1)
      weight += (argmax_h_low + 1 - argmax_h) * im_data[argmax_h_low * data_width + argmax_w_high];
    if (argmax_h_high <= height - 1 && argmax_w_low >= 0)
      weight += -1 * (argmax_h - argmax_h_low) * im_data[argmax_h_high * data_width + argmax_w_low];
    if (argmax_h_high <= height - 1 && argmax_w_high <= width - 1)
      weight += (argmax_h - argmax_h_low) * im_data[argmax_h_high * data_width + argmax_w_high];
  }

  return weight;
}
|
576 |
+
|
577 |
+
// Modulated (DCNv2) im2col kernel: one thread per element of the output
// column buffer. Each thread reads the learned per-tap offset
// (offset_h, offset_w) and modulation mask, bilinearly samples the input at
// the shifted location, and writes val * mask into data_col.
// data_col is laid out as (c_im * kernel_h * kernel_w, batch, h_col, w_col).
template <typename scalar_t>
__global__ void modulated_deformable_im2col_gpu_kernel(const int n,
    const scalar_t *data_im, const scalar_t *data_offset, const scalar_t *data_mask,
    const int height, const int width, const int kernel_h, const int kernel_w,
    const int pad_h, const int pad_w,
    const int stride_h, const int stride_w,
    const int dilation_h, const int dilation_w,
    const int channel_per_deformable_group,
    const int batch_size, const int num_channels, const int deformable_group,
    const int height_col, const int width_col,
    scalar_t *data_col)
{
  CUDA_KERNEL_LOOP(index, n)
  {
    // index index of output matrix
    // Decode (w_col, h_col, b_col, c_im) from the flat thread index.
    const int w_col = index % width_col;
    const int h_col = (index / width_col) % height_col;
    const int b_col = (index / width_col / height_col) % batch_size;
    const int c_im = (index / width_col / height_col) / batch_size;
    const int c_col = c_im * kernel_h * kernel_w;

    // compute deformable group index
    const int deformable_group_index = c_im / channel_per_deformable_group;

    // Top-left corner of this output position's receptive field, in input
    // coordinates (may be negative because of padding).
    const int h_in = h_col * stride_h - pad_h;
    const int w_in = w_col * stride_w - pad_w;

    scalar_t *data_col_ptr = data_col + ((c_col * batch_size + b_col) * height_col + h_col) * width_col + w_col;
    //const float* data_im_ptr = data_im + ((b_col * num_channels + c_im) * height + h_in) * width + w_in;
    const scalar_t *data_im_ptr = data_im + (b_col * num_channels + c_im) * height * width;
    // Offsets store an (h, w) pair per kernel tap, hence the factor of 2.
    const scalar_t *data_offset_ptr = data_offset + (b_col * deformable_group + deformable_group_index) * 2 * kernel_h * kernel_w * height_col * width_col;

    const scalar_t *data_mask_ptr = data_mask + (b_col * deformable_group + deformable_group_index) * kernel_h * kernel_w * height_col * width_col;

    for (int i = 0; i < kernel_h; ++i)
    {
      for (int j = 0; j < kernel_w; ++j)
      {
        const int data_offset_h_ptr = ((2 * (i * kernel_w + j)) * height_col + h_col) * width_col + w_col;
        const int data_offset_w_ptr = ((2 * (i * kernel_w + j) + 1) * height_col + h_col) * width_col + w_col;
        const int data_mask_hw_ptr = ((i * kernel_w + j) * height_col + h_col) * width_col + w_col;
        const scalar_t offset_h = data_offset_ptr[data_offset_h_ptr];
        const scalar_t offset_w = data_offset_ptr[data_offset_w_ptr];
        const scalar_t mask = data_mask_ptr[data_mask_hw_ptr];
        scalar_t val = static_cast<scalar_t>(0);
        // Deformed (fractional) sampling location for this kernel tap.
        const scalar_t h_im = h_in + i * dilation_h + offset_h;
        const scalar_t w_im = w_in + j * dilation_w + offset_w;
        //if (h_im >= 0 && w_im >= 0 && h_im < height && w_im < width) {
        // > -1 (not >= 0) so samples just outside the border still get a
        // partial bilinear contribution from in-bounds neighbours.
        if (h_im > -1 && w_im > -1 && h_im < height && w_im < width)
        {
          //const float map_h = i * dilation_h + offset_h;
          //const float map_w = j * dilation_w + offset_w;
          //const int cur_height = height - h_in;
          //const int cur_width = width - w_in;
          //val = dmcn_im2col_bilinear(data_im_ptr, width, cur_height, cur_width, map_h, map_w);
          val = dmcn_im2col_bilinear(data_im_ptr, width, height, width, h_im, w_im);
        }
        *data_col_ptr = val * mask;
        // Advance to the next kernel tap's plane of the column buffer.
        data_col_ptr += batch_size * height_col * width_col;
        //data_col_ptr += height_col * width_col;
      }
    }
  }
}
|
641 |
+
|
642 |
+
// col2im kernel: back-propagates the column-buffer gradient into grad_im.
// One thread per column element; each recomputes its fractional sampling
// location from the stored offsets and atomically adds the mask-scaled
// gradient to the neighbouring input pixels, weighted by the bilinear
// backward coefficients from dmcn_get_gradient_weight.
template <typename scalar_t>
__global__ void modulated_deformable_col2im_gpu_kernel(const int n,
    const scalar_t *data_col, const scalar_t *data_offset, const scalar_t *data_mask,
    const int channels, const int height, const int width,
    const int kernel_h, const int kernel_w,
    const int pad_h, const int pad_w,
    const int stride_h, const int stride_w,
    const int dilation_h, const int dilation_w,
    const int channel_per_deformable_group,
    const int batch_size, const int deformable_group,
    const int height_col, const int width_col,
    scalar_t *grad_im)
{
  CUDA_KERNEL_LOOP(index, n)
  {
    // Decode kernel tap (i, j) and input channel c from the flat index.
    const int j = (index / width_col / height_col / batch_size) % kernel_w;
    const int i = (index / width_col / height_col / batch_size / kernel_w) % kernel_h;
    const int c = index / width_col / height_col / batch_size / kernel_w / kernel_h;
    // compute the start and end of the output

    const int deformable_group_index = c / channel_per_deformable_group;

    int w_out = index % width_col;
    int h_out = (index / width_col) % height_col;
    int b = (index / width_col / height_col) % batch_size;
    int w_in = w_out * stride_w - pad_w;
    int h_in = h_out * stride_h - pad_h;

    const scalar_t *data_offset_ptr = data_offset + (b * deformable_group + deformable_group_index) * 2 * kernel_h * kernel_w * height_col * width_col;
    const scalar_t *data_mask_ptr = data_mask + (b * deformable_group + deformable_group_index) * kernel_h * kernel_w * height_col * width_col;
    const int data_offset_h_ptr = ((2 * (i * kernel_w + j)) * height_col + h_out) * width_col + w_out;
    const int data_offset_w_ptr = ((2 * (i * kernel_w + j) + 1) * height_col + h_out) * width_col + w_out;
    const int data_mask_hw_ptr = ((i * kernel_w + j) * height_col + h_out) * width_col + w_out;
    const scalar_t offset_h = data_offset_ptr[data_offset_h_ptr];
    const scalar_t offset_w = data_offset_ptr[data_offset_w_ptr];
    const scalar_t mask = data_mask_ptr[data_mask_hw_ptr];
    // The fractional input location this column element was sampled from
    // in the forward pass.
    const scalar_t cur_inv_h_data = h_in + i * dilation_h + offset_h;
    const scalar_t cur_inv_w_data = w_in + j * dilation_w + offset_w;

    // Incoming gradient, already scaled by the modulation mask.
    const scalar_t cur_top_grad = data_col[index] * mask;
    const int cur_h = (int)cur_inv_h_data;
    const int cur_w = (int)cur_inv_w_data;
    // Scan a small window around the truncated location; the |diff| < 1
    // checks keep exactly the pixels inside the bilinear footprint, so the
    // extra window entries contribute nothing.
    for (int dy = -2; dy <= 2; dy++)
    {
      for (int dx = -2; dx <= 2; dx++)
      {
        if (cur_h + dy >= 0 && cur_h + dy < height &&
            cur_w + dx >= 0 && cur_w + dx < width &&
            abs(cur_inv_h_data - (cur_h + dy)) < 1 &&
            abs(cur_inv_w_data - (cur_w + dx)) < 1)
        {
          int cur_bottom_grad_pos = ((b * channels + c) * height + cur_h + dy) * width + cur_w + dx;
          scalar_t weight = dmcn_get_gradient_weight(cur_inv_h_data, cur_inv_w_data, cur_h + dy, cur_w + dx, height, width);
          // Many threads may target the same input pixel -> atomicAdd.
          atomicAdd(grad_im + cur_bottom_grad_pos, weight * cur_top_grad);
        }
      }
    }
  }
}
|
701 |
+
|
702 |
+
// Gradient kernel for the learned offsets and modulation mask. One thread
// per offset-channel element; it walks the input channels of its deformable
// group (the col_c loop), accumulating the chain-rule products into val
// (offset gradient) and mval (mask gradient). The mask gradient is written
// once per kernel tap, from the even (h) offset channel.
template <typename scalar_t>
__global__ void modulated_deformable_col2im_coord_gpu_kernel(const int n,
    const scalar_t *data_col, const scalar_t *data_im,
    const scalar_t *data_offset, const scalar_t *data_mask,
    const int channels, const int height, const int width,
    const int kernel_h, const int kernel_w,
    const int pad_h, const int pad_w,
    const int stride_h, const int stride_w,
    const int dilation_h, const int dilation_w,
    const int channel_per_deformable_group,
    const int batch_size, const int offset_channels, const int deformable_group,
    const int height_col, const int width_col,
    scalar_t *grad_offset, scalar_t *grad_mask)
{
  CUDA_KERNEL_LOOP(index, n)
  {
    scalar_t val = 0, mval = 0;
    int w = index % width_col;
    int h = (index / width_col) % height_col;
    int c = (index / width_col / height_col) % offset_channels;
    int b = (index / width_col / height_col) / offset_channels;
    // compute the start and end of the output

    // Each deformable group owns 2 * kh * kw offset channels (h/w pairs).
    const int deformable_group_index = c / (2 * kernel_h * kernel_w);
    const int col_step = kernel_h * kernel_w;
    int cnt = 0;
    const scalar_t *data_col_ptr = data_col + deformable_group_index * channel_per_deformable_group * batch_size * width_col * height_col;
    const scalar_t *data_im_ptr = data_im + (b * deformable_group + deformable_group_index) * channel_per_deformable_group / kernel_h / kernel_w * height * width;
    const scalar_t *data_offset_ptr = data_offset + (b * deformable_group + deformable_group_index) * 2 * kernel_h * kernel_w * height_col * width_col;
    const scalar_t *data_mask_ptr = data_mask + (b * deformable_group + deformable_group_index) * kernel_h * kernel_w * height_col * width_col;

    // Offset channel within this deformable group; even -> h, odd -> w.
    const int offset_c = c - deformable_group_index * 2 * kernel_h * kernel_w;

    // Sum contributions over every column channel that used this offset.
    for (int col_c = (offset_c / 2); col_c < channel_per_deformable_group; col_c += col_step)
    {
      const int col_pos = (((col_c * batch_size + b) * height_col) + h) * width_col + w;
      const int bp_dir = offset_c % 2;

      int j = (col_pos / width_col / height_col / batch_size) % kernel_w;
      int i = (col_pos / width_col / height_col / batch_size / kernel_w) % kernel_h;
      int w_out = col_pos % width_col;
      int h_out = (col_pos / width_col) % height_col;
      int w_in = w_out * stride_w - pad_w;
      int h_in = h_out * stride_h - pad_h;
      const int data_offset_h_ptr = (((2 * (i * kernel_w + j)) * height_col + h_out) * width_col + w_out);
      const int data_offset_w_ptr = (((2 * (i * kernel_w + j) + 1) * height_col + h_out) * width_col + w_out);
      const int data_mask_hw_ptr = (((i * kernel_w + j) * height_col + h_out) * width_col + w_out);
      const scalar_t offset_h = data_offset_ptr[data_offset_h_ptr];
      const scalar_t offset_w = data_offset_ptr[data_offset_w_ptr];
      const scalar_t mask = data_mask_ptr[data_mask_hw_ptr];
      scalar_t inv_h = h_in + i * dilation_h + offset_h;
      scalar_t inv_w = w_in + j * dilation_w + offset_w;
      if (inv_h <= -1 || inv_w <= -1 || inv_h >= height || inv_w >= width)
      {
        // Out-of-bounds sample: sentinel makes the coordinate weight 0.
        inv_h = inv_w = -2;
      }
      else
      {
        // Mask gradient: d(output)/d(mask) is the bilinear sample itself.
        mval += data_col_ptr[col_pos] * dmcn_im2col_bilinear(data_im_ptr + cnt * height * width, width, height, width, inv_h, inv_w);
      }
      const scalar_t weight = dmcn_get_coordinate_weight(
          inv_h, inv_w,
          height, width, data_im_ptr + cnt * height * width, width, bp_dir);
      val += weight * data_col_ptr[col_pos] * mask;
      cnt += 1;
    }
    // KERNEL_ASSIGN(grad_offset[index], offset_req, val);
    grad_offset[index] = val;
    // Write the mask gradient once per (i, j) tap, from the h channel.
    if (offset_c % 2 == 0)
      // KERNEL_ASSIGN(grad_mask[(((b * deformable_group + deformable_group_index) * kernel_h * kernel_w + offset_c / 2) * height_col + h) * width_col + w], mask_req, mval);
      grad_mask[(((b * deformable_group + deformable_group_index) * kernel_h * kernel_w + offset_c / 2) * height_col + h) * width_col + w] = mval;
  }
}
|
775 |
+
|
776 |
+
void modulated_deformable_im2col_cuda(
|
777 |
+
const at::Tensor data_im, const at::Tensor data_offset, const at::Tensor data_mask,
|
778 |
+
const int batch_size, const int channels, const int height_im, const int width_im,
|
779 |
+
const int height_col, const int width_col, const int kernel_h, const int kenerl_w,
|
780 |
+
const int pad_h, const int pad_w, const int stride_h, const int stride_w,
|
781 |
+
const int dilation_h, const int dilation_w,
|
782 |
+
const int deformable_group, at::Tensor data_col)
|
783 |
+
{
|
784 |
+
// num_axes should be smaller than block size
|
785 |
+
const int channel_per_deformable_group = channels / deformable_group;
|
786 |
+
const int num_kernels = channels * batch_size * height_col * width_col;
|
787 |
+
|
788 |
+
AT_DISPATCH_FLOATING_TYPES_AND_HALF(
|
789 |
+
data_im.type(), "modulated_deformable_im2col_gpu", ([&] {
|
790 |
+
const scalar_t *data_im_ = data_im.data<scalar_t>();
|
791 |
+
const scalar_t *data_offset_ = data_offset.data<scalar_t>();
|
792 |
+
const scalar_t *data_mask_ = data_mask.data<scalar_t>();
|
793 |
+
scalar_t *data_col_ = data_col.data<scalar_t>();
|
794 |
+
|
795 |
+
modulated_deformable_im2col_gpu_kernel<<<GET_BLOCKS(num_kernels), CUDA_NUM_THREADS>>>(
|
796 |
+
num_kernels, data_im_, data_offset_, data_mask_, height_im, width_im, kernel_h, kenerl_w,
|
797 |
+
pad_h, pad_w, stride_h, stride_w, dilation_h, dilation_w, channel_per_deformable_group,
|
798 |
+
batch_size, channels, deformable_group, height_col, width_col, data_col_);
|
799 |
+
}));
|
800 |
+
|
801 |
+
cudaError_t err = cudaGetLastError();
|
802 |
+
if (err != cudaSuccess)
|
803 |
+
{
|
804 |
+
printf("error in modulated_deformable_im2col_cuda: %s\n", cudaGetErrorString(err));
|
805 |
+
}
|
806 |
+
}
|
807 |
+
|
808 |
+
void modulated_deformable_col2im_cuda(
|
809 |
+
const at::Tensor data_col, const at::Tensor data_offset, const at::Tensor data_mask,
|
810 |
+
const int batch_size, const int channels, const int height_im, const int width_im,
|
811 |
+
const int height_col, const int width_col, const int kernel_h, const int kernel_w,
|
812 |
+
const int pad_h, const int pad_w, const int stride_h, const int stride_w,
|
813 |
+
const int dilation_h, const int dilation_w,
|
814 |
+
const int deformable_group, at::Tensor grad_im)
|
815 |
+
{
|
816 |
+
|
817 |
+
const int channel_per_deformable_group = channels / deformable_group;
|
818 |
+
const int num_kernels = channels * kernel_h * kernel_w * batch_size * height_col * width_col;
|
819 |
+
|
820 |
+
AT_DISPATCH_FLOATING_TYPES_AND_HALF(
|
821 |
+
data_col.type(), "modulated_deformable_col2im_gpu", ([&] {
|
822 |
+
const scalar_t *data_col_ = data_col.data<scalar_t>();
|
823 |
+
const scalar_t *data_offset_ = data_offset.data<scalar_t>();
|
824 |
+
const scalar_t *data_mask_ = data_mask.data<scalar_t>();
|
825 |
+
scalar_t *grad_im_ = grad_im.data<scalar_t>();
|
826 |
+
|
827 |
+
modulated_deformable_col2im_gpu_kernel<<<GET_BLOCKS(num_kernels), CUDA_NUM_THREADS>>>(
|
828 |
+
num_kernels, data_col_, data_offset_, data_mask_, channels, height_im, width_im,
|
829 |
+
kernel_h, kernel_w, pad_h, pad_h, stride_h, stride_w,
|
830 |
+
dilation_h, dilation_w, channel_per_deformable_group,
|
831 |
+
batch_size, deformable_group, height_col, width_col, grad_im_);
|
832 |
+
}));
|
833 |
+
|
834 |
+
cudaError_t err = cudaGetLastError();
|
835 |
+
if (err != cudaSuccess)
|
836 |
+
{
|
837 |
+
printf("error in modulated_deformable_col2im_cuda: %s\n", cudaGetErrorString(err));
|
838 |
+
}
|
839 |
+
}
|
840 |
+
|
841 |
+
// Host launcher for the offset/mask gradient kernel. num_kernels is one
// thread per offset element: each deformable group stores 2 * kernel_h *
// kernel_w offset values (an h/w pair per tap) at every output position.
// The kernel's offset_channels argument is that same 2 * kh * kw *
// deformable_group product.
void modulated_deformable_col2im_coord_cuda(
    const at::Tensor data_col, const at::Tensor data_im, const at::Tensor data_offset, const at::Tensor data_mask,
    const int batch_size, const int channels, const int height_im, const int width_im,
    const int height_col, const int width_col, const int kernel_h, const int kernel_w,
    const int pad_h, const int pad_w, const int stride_h, const int stride_w,
    const int dilation_h, const int dilation_w,
    const int deformable_group,
    at::Tensor grad_offset, at::Tensor grad_mask)
{
  const int num_kernels = batch_size * height_col * width_col * 2 * kernel_h * kernel_w * deformable_group;
  // Column channels handled by each deformable group.
  const int channel_per_deformable_group = channels * kernel_h * kernel_w / deformable_group;

  AT_DISPATCH_FLOATING_TYPES_AND_HALF(
      data_col.type(), "modulated_deformable_col2im_coord_gpu", ([&] {
        const scalar_t *data_col_ = data_col.data<scalar_t>();
        const scalar_t *data_im_ = data_im.data<scalar_t>();
        const scalar_t *data_offset_ = data_offset.data<scalar_t>();
        const scalar_t *data_mask_ = data_mask.data<scalar_t>();
        scalar_t *grad_offset_ = grad_offset.data<scalar_t>();
        scalar_t *grad_mask_ = grad_mask.data<scalar_t>();

        modulated_deformable_col2im_coord_gpu_kernel<<<GET_BLOCKS(num_kernels), CUDA_NUM_THREADS>>>(
            num_kernels, data_col_, data_im_, data_offset_, data_mask_, channels, height_im, width_im,
            kernel_h, kernel_w, pad_h, pad_w, stride_h, stride_w,
            dilation_h, dilation_w, channel_per_deformable_group,
            batch_size, 2 * kernel_h * kernel_w * deformable_group, deformable_group, height_col, width_col,
            grad_offset_, grad_mask_);
      }));
  // Kernel launches are asynchronous; surface launch errors immediately.
  cudaError_t err = cudaGetLastError();
  if (err != cudaSuccess)
  {
    printf("error in modulated_deformable_col2im_coord_cuda: %s\n", cudaGetErrorString(err));
  }
}
|
maskrcnn_benchmark/csrc/cuda/deform_pool_cuda.cu
ADDED
@@ -0,0 +1,87 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
// modify from
|
2 |
+
// https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/blob/mmdetection/mmdet/ops/dcn/src/modulated_dcn_cuda.c
|
3 |
+
|
4 |
+
// based on
|
5 |
+
// author: Charles Shang
|
6 |
+
// https://github.com/torch/cunn/blob/master/lib/THCUNN/generic/SpatialConvolutionMM.cu
|
7 |
+
|
8 |
+
#include <ATen/ATen.h>
|
9 |
+
#include <ATen/cuda/CUDAContext.h>
|
10 |
+
|
11 |
+
#include <THC/THC.h>
|
12 |
+
#include <THC/THCDeviceUtils.cuh>
|
13 |
+
|
14 |
+
#include <vector>
|
15 |
+
#include <iostream>
|
16 |
+
#include <cmath>
|
17 |
+
|
18 |
+
|
19 |
+
void DeformablePSROIPoolForward(
|
20 |
+
const at::Tensor data, const at::Tensor bbox, const at::Tensor trans,
|
21 |
+
at::Tensor out, at::Tensor top_count, const int batch, const int channels,
|
22 |
+
const int height, const int width, const int num_bbox,
|
23 |
+
const int channels_trans, const int no_trans, const float spatial_scale,
|
24 |
+
const int output_dim, const int group_size, const int pooled_size,
|
25 |
+
const int part_size, const int sample_per_part, const float trans_std);
|
26 |
+
|
27 |
+
void DeformablePSROIPoolBackwardAcc(
|
28 |
+
const at::Tensor out_grad, const at::Tensor data, const at::Tensor bbox,
|
29 |
+
const at::Tensor trans, const at::Tensor top_count, at::Tensor in_grad,
|
30 |
+
at::Tensor trans_grad, const int batch, const int channels,
|
31 |
+
const int height, const int width, const int num_bbox,
|
32 |
+
const int channels_trans, const int no_trans, const float spatial_scale,
|
33 |
+
const int output_dim, const int group_size, const int pooled_size,
|
34 |
+
const int part_size, const int sample_per_part, const float trans_std);
|
35 |
+
|
36 |
+
// Host-side entry point for deformable PS-ROI pooling, forward pass.
// Checks contiguity, derives the problem dimensions from the NCHW input
// and the bbox tensor, validates the output shape, then delegates to the
// CUDA implementation declared above.
void deform_psroi_pooling_cuda_forward(
    at::Tensor input, at::Tensor bbox, at::Tensor trans, at::Tensor out,
    at::Tensor top_count, const int no_trans, const float spatial_scale,
    const int output_dim, const int group_size, const int pooled_size,
    const int part_size, const int sample_per_part, const float trans_std)
{
  AT_CHECK(input.is_contiguous(), "input tensor has to be contiguous");

  // Dimensions come straight from the NCHW input tensor.
  const int nbatch = input.size(0);
  const int nchannels = input.size(1);
  const int in_height = input.size(2);
  const int in_width = input.size(3);
  // A placeholder channel count of 2 is passed when the trans branch is off.
  const int nchannels_trans = no_trans ? 2 : trans.size(1);
  const int roi_count = bbox.size(0);

  // The output must carry one entry per ROI.
  if (roi_count != out.size(0))
    AT_ERROR("Output shape and bbox number wont match: (%d vs %d).",
             out.size(0), roi_count);

  DeformablePSROIPoolForward(
      input, bbox, trans, out, top_count, nbatch, nchannels, in_height,
      in_width, roi_count, nchannels_trans, no_trans, spatial_scale,
      output_dim, group_size, pooled_size, part_size, sample_per_part,
      trans_std);
}
|
60 |
+
|
61 |
+
// Host-side entry point for deformable PS-ROI pooling, backward pass.
// Checks contiguity of the gradient and input, recovers the problem
// dimensions from the tensors, validates the gradient's ROI count, then
// delegates to the CUDA implementation that accumulates gradients for the
// input and the offset ("trans") branch.
void deform_psroi_pooling_cuda_backward(
    at::Tensor out_grad, at::Tensor input, at::Tensor bbox, at::Tensor trans,
    at::Tensor top_count, at::Tensor input_grad, at::Tensor trans_grad,
    const int no_trans, const float spatial_scale, const int output_dim,
    const int group_size, const int pooled_size, const int part_size,
    const int sample_per_part, const float trans_std)
{
  AT_CHECK(out_grad.is_contiguous(), "out_grad tensor has to be contiguous");
  AT_CHECK(input.is_contiguous(), "input tensor has to be contiguous");

  // Dimensions come straight from the NCHW input tensor.
  const int nbatch = input.size(0);
  const int nchannels = input.size(1);
  const int in_height = input.size(2);
  const int in_width = input.size(3);
  // A placeholder channel count of 2 is passed when the trans branch is off.
  const int nchannels_trans = no_trans ? 2 : trans.size(1);
  const int roi_count = bbox.size(0);

  // The incoming gradient must carry one entry per ROI.
  if (roi_count != out_grad.size(0))
    AT_ERROR("Output shape and bbox number wont match: (%d vs %d).",
             out_grad.size(0), roi_count);

  DeformablePSROIPoolBackwardAcc(
      out_grad, input, bbox, trans, top_count, input_grad, trans_grad, nbatch,
      nchannels, in_height, in_width, roi_count, nchannels_trans, no_trans,
      spatial_scale, output_dim, group_size, pooled_size, part_size,
      sample_per_part, trans_std);
}
|