Spaces:
Running
on
Zero
Running
on
Zero
File size: 8,152 Bytes
a9a0ec2 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 |
# Data Augmentation
Augmentation is an important part of training.
Detectron2's data augmentation system aims at addressing the following goals:
1. Allow augmenting multiple data types together
(e.g., images together with their bounding boxes and masks)
2. Allow applying a sequence of statically-declared augmentation
3. Allow adding custom new data types to augment (rotated bounding boxes, video clips, etc.)
4. Process and manipulate the __operations__ that are applied by augmentations
The first two features cover most of the common use cases, and is also
available in other libraries such as [albumentations](https://medium.com/pytorch/multi-target-in-albumentations-16a777e9006e).
Supporting other features adds some overhead to detectron2's augmentation API,
which we'll explain in this tutorial.
This tutorial focuses on how to use augmentations when writing new data loaders,
and how to write new augmentations.
If you use the default data loader in detectron2, it already supports taking a user-provided list of custom augmentations,
as explained in the [Dataloader tutorial](data_loading).
## Basic Usage
The basic usage of feature (1) and (2) is like the following:
```python
from detectron2.data import transforms as T
# Define a sequence of augmentations:
augs = T.AugmentationList([
T.RandomBrightness(0.9, 1.1),
T.RandomFlip(prob=0.5),
T.RandomCrop("absolute", (640, 640))
]) # type: T.Augmentation
# Define the augmentation input ("image" required, others optional):
input = T.AugInput(image, boxes=boxes, sem_seg=sem_seg)
# Apply the augmentation:
transform = augs(input) # type: T.Transform
image_transformed = input.image # new image
sem_seg_transformed = input.sem_seg # new semantic segmentation
# For any extra data that needs to be augmented together, use transform, e.g.:
image2_transformed = transform.apply_image(image2)
polygons_transformed = transform.apply_polygons(polygons)
```
Three basic concepts are involved here. They are:
* [T.Augmentation](../modules/data_transforms.html#detectron2.data.transforms.Augmentation) defines the __"policy"__ to modify inputs.
* its `__call__(AugInput) -> Transform` method augments the inputs in-place, and returns the operation that is applied
* [T.Transform](../modules/data_transforms.html#detectron2.data.transforms.Transform)
implements the actual __operations__ to transform data
* it has methods such as `apply_image`, `apply_coords` that define how to transform each data type
* [T.AugInput](../modules/data_transforms.html#detectron2.data.transforms.AugInput)
stores inputs needed by `T.Augmentation` and how they should be transformed.
This concept is needed for some advanced usage.
Using this class directly should be sufficient for all common use cases,
since extra data not in `T.AugInput` can be augmented using the returned
`transform`, as shown in the above example.
## Write New Augmentations
Most 2D augmentations only need to know about the input image. Such augmentation can be implemented easily like this:
```python
class MyColorAugmentation(T.Augmentation):
def get_transform(self, image):
r = np.random.rand(2)
return T.ColorTransform(lambda x: x * r[0] + r[1] * 10)
class MyCustomResize(T.Augmentation):
def get_transform(self, image):
old_h, old_w = image.shape[:2]
new_h, new_w = int(old_h * np.random.rand()), int(old_w * 1.5)
return T.ResizeTransform(old_h, old_w, new_h, new_w)
augs = MyCustomResize()
transform = augs(input)
```
In addition to image, any attributes of the given `AugInput` can be used as long
as they are part of the function signature, e.g.:
```python
class MyCustomCrop(T.Augmentation):
def get_transform(self, image, sem_seg):
# decide where to crop using both image and sem_seg
return T.CropTransform(...)
augs = MyCustomCrop()
assert hasattr(input, "image") and hasattr(input, "sem_seg")
transform = augs(input)
```
New transform operation can also be added by subclassing
[T.Transform](../modules/data_transforms.html#detectron2.data.transforms.Transform).
## Advanced Usage
We give a few examples of advanced usages that
are enabled by our system.
These options can be interesting to new research,
although changing them is often not needed
for standard use cases.
### Custom transform strategy
Instead of only returning the augmented data, detectron2's `Augmentation` returns the __operations__ as `T.Transform`.
This allows users to apply custom transform strategy on their data.
We use keypoints data as an example.
Keypoints are (x, y) coordinates, but they are not so trivial to augment due to the semantic meaning they carry.
Such meaning is only known to the users, therefore users may want to augment them manually
by looking at the returned `transform`.
For example, when an image is horizontally flipped, we'd like to swap the keypoint annotations for "left eye" and "right eye".
This can be done like this (included by default in detectron2's default data loader):
```python
# augs, input are defined as in previous examples
transform = augs(input) # type: T.Transform
keypoints_xy = transform.apply_coords(keypoints_xy) # transform the coordinates
# get a list of all transforms that were applied
transforms = T.TransformList([transform]).transforms
# check if it is flipped for odd number of times
do_hflip = sum(isinstance(t, T.HFlipTransform) for t in transforms) % 2 == 1
if do_hflip:
keypoints_xy = keypoints_xy[flip_indices_mapping]
```
As another example, keypoints annotations often have a "visibility" field.
A sequence of augmentations might augment a visible keypoint out of the image boundary (e.g. with cropping),
but then bring it back within the boundary afterwards (e.g. with image padding).
If users decide to label such keypoints "invisible",
then the visibility check has to happen after every transform step.
This can be achieved by:
```python
transform = augs(input) # type: T.TransformList
assert isinstance(transform, T.TransformList)
for t in transform.transforms:
keypoints_xy = t.apply_coords(keypoints_xy)
visibility &= (keypoints_xy >= [0, 0] & keypoints_xy <= [W, H]).all(axis=1)
# btw, detectron2's `transform_keypoint_annotations` function chooses to label such keypoints "visible":
# keypoints_xy = transform.apply_coords(keypoints_xy)
# visibility &= (keypoints_xy >= [0, 0] & keypoints_xy <= [W, H]).all(axis=1)
```
### Geometrically invert the transform
If images are pre-processed by augmentations before inference, the predicted results
such as segmentation masks are localized on the augmented image.
We'd like to invert the applied augmentation with the [inverse()](../modules/data_transforms.html#detectron2.data.transforms.Transform.inverse)
API, to obtain results on the original image:
```python
transform = augs(input)
pred_mask = make_prediction(input.image)
inv_transform = transform.inverse()
pred_mask_orig = inv_transform.apply_segmentation(pred_mask)
```
### Add new data types
[T.Transform](../modules/data_transforms.html#detectron2.data.transforms.Transform)
supports a few common data types to transform, including images, coordinates, masks, boxes, polygons.
It allows registering new data types, e.g.:
```python
@T.HFlipTransform.register_type("rotated_boxes")
def func(flip_transform: T.HFlipTransform, rotated_boxes: Any):
# do the work
return flipped_rotated_boxes
t = HFlipTransform(width=800)
transformed_rotated_boxes = t.apply_rotated_boxes(rotated_boxes) # func will be called
```
### Extend T.AugInput
An augmentation can only access attributes available in the given input.
[T.AugInput](../modules/data_transforms.html#detectron2.data.transforms.StandardAugInput) defines "image", "boxes", "sem_seg",
which are sufficient for common augmentation strategies to decide how to augment.
If not, a custom implementation is needed.
By re-implement the "transform()" method in AugInput, it is also possible to
augment different fields in ways that are dependent on each other.
Such use case is uncommon (e.g. post-process bounding box based on augmented masks), but allowed by the system.
|