# Data Augmentation

Augmentation is an important part of training.
Detectron2's data augmentation system aims at addressing the following goals:

1. Allow augmenting multiple data types together
   (e.g., images together with their bounding boxes and masks)
2. Allow applying a sequence of statically-declared augmentations
3. Allow adding custom new data types to augment (rotated bounding boxes, video clips, etc.)
4. Process and manipulate the __operations__ that are applied by augmentations

The first two features cover most of the common use cases, and are also
available in other libraries such as [albumentations](https://medium.com/pytorch/multi-target-in-albumentations-16a777e9006e).
Supporting the other features adds some overhead to detectron2's augmentation API,
which we'll explain in this tutorial.

This tutorial focuses on how to use augmentations when writing new data loaders,
and how to write new augmentations.
If you use the default data loader in detectron2, it already supports taking a user-provided list of custom augmentations,
as explained in the [Dataloader tutorial](data_loading).
## Basic Usage

The basic usage of features (1) and (2) is as follows:

```python
from detectron2.data import transforms as T

# Define a sequence of augmentations:
augs = T.AugmentationList([
    T.RandomBrightness(0.9, 1.1),
    T.RandomFlip(prob=0.5),
    T.RandomCrop("absolute", (640, 640))
])  # type: T.Augmentation

# Define the augmentation input ("image" required, others optional):
input = T.AugInput(image, boxes=boxes, sem_seg=sem_seg)

# Apply the augmentation:
transform = augs(input)  # type: T.Transform
image_transformed = input.image  # new image
sem_seg_transformed = input.sem_seg  # new semantic segmentation

# For any extra data that needs to be augmented together, use transform, e.g.:
image2_transformed = transform.apply_image(image2)
polygons_transformed = transform.apply_polygons(polygons)
```
Three basic concepts are involved here:

* [T.Augmentation](../modules/data_transforms.html#detectron2.data.transforms.Augmentation) defines the __"policy"__ to modify inputs.
  * its `__call__(AugInput) -> Transform` method augments the inputs in-place, and returns the operation that is applied
* [T.Transform](../modules/data_transforms.html#detectron2.data.transforms.Transform)
  implements the actual __operations__ to transform data
  * it has methods such as `apply_image`, `apply_coords` that define how to transform each data type
* [T.AugInput](../modules/data_transforms.html#detectron2.data.transforms.AugInput)
  stores the inputs needed by `T.Augmentation` and how they should be transformed.
  This concept is needed for some advanced usage.
  Using this class directly should be sufficient for all common use cases,
  since extra data not in `T.AugInput` can be augmented using the returned
  `transform`, as shown in the above example.
## Write New Augmentations

Most 2D augmentations only need to know about the input image. Such augmentations can be implemented easily like this:

```python
class MyColorAugmentation(T.Augmentation):
    def get_transform(self, image):
        r = np.random.rand(2)
        return T.ColorTransform(lambda x: x * r[0] + r[1] * 10)

class MyCustomResize(T.Augmentation):
    def get_transform(self, image):
        old_h, old_w = image.shape[:2]
        new_h, new_w = int(old_h * np.random.rand()), int(old_w * 1.5)
        return T.ResizeTransform(old_h, old_w, new_h, new_w)

augs = MyCustomResize()
transform = augs(input)
```
In addition to the image, any attribute of the given `AugInput` can be used, as long
as it is part of the function signature, e.g.:
```python
class MyCustomCrop(T.Augmentation):
    def get_transform(self, image, sem_seg):
        # decide where to crop using both image and sem_seg
        return T.CropTransform(...)

augs = MyCustomCrop()
assert hasattr(input, "image") and hasattr(input, "sem_seg")
transform = augs(input)
```
New transform operations can also be added by subclassing
[T.Transform](../modules/data_transforms.html#detectron2.data.transforms.Transform).
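As a sketch of what such a subclass involves, here is a simple shift transform. `SimpleTransform` below is a hypothetical stand-in for detectron2's actual `T.Transform` base class (used so the example runs standalone); the key idea is implementing `apply_image` and `apply_coords` consistently:

```python
import numpy as np

class SimpleTransform:
    """Hypothetical stand-in for detectron2's T.Transform base class."""
    def apply_image(self, img):
        raise NotImplementedError

    def apply_coords(self, coords):
        raise NotImplementedError

class ShiftTransform(SimpleTransform):
    """Shift pixel content and coordinates by (dx, dy)."""
    def __init__(self, dx, dy):
        self.dx, self.dy = dx, dy

    def apply_image(self, img):
        # roll the pixel content; a real transform would pad/crop instead
        return np.roll(img, shift=(self.dy, self.dx), axis=(0, 1))

    def apply_coords(self, coords):
        # coords is an Nx2 float array of (x, y) points
        return coords + np.array([self.dx, self.dy], dtype=coords.dtype)

t = ShiftTransform(dx=3, dy=5)
pts = np.array([[10.0, 20.0]])
print(t.apply_coords(pts))  # [[13. 25.]]
```

The important property is that image and coordinate transforms agree geometrically, so all registered data types stay aligned after augmentation.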
## Advanced Usage

We give a few examples of advanced usages that are enabled by our system.
These options can be interesting for new research,
although changing them is often not needed for standard use cases.
### Custom transform strategy

Instead of only returning the augmented data, detectron2's `Augmentation` returns the __operations__ as `T.Transform`.
This allows users to apply a custom transform strategy on their data.
We use keypoint data as an example.

Keypoints are (x, y) coordinates, but they are not trivial to augment due to the semantic meaning they carry.
Such meaning is only known to the users, therefore users may want to augment them manually
by looking at the returned `transform`.
For example, when an image is horizontally flipped, we'd like to swap the keypoint annotations for "left eye" and "right eye".
This can be done like this (included by default in detectron2's default data loader):
```python
# augs, input are defined as in previous examples
transform = augs(input)  # type: T.Transform
keypoints_xy = transform.apply_coords(keypoints_xy)  # transform the coordinates

# get a list of all transforms that were applied
transforms = T.TransformList([transform]).transforms
# check if it is flipped an odd number of times
do_hflip = sum(isinstance(t, T.HFlipTransform) for t in transforms) % 2 == 1
if do_hflip:
    keypoints_xy = keypoints_xy[flip_indices_mapping]
```
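The `flip_indices_mapping` above is user-defined and depends on the keypoint layout. For a hypothetical four-keypoint skeleton, it simply swaps each left/right pair via numpy fancy indexing:

```python
import numpy as np

# Hypothetical keypoint order: [left_eye, right_eye, left_ear, right_ear].
# Entry i names which old row should become row i after a horizontal flip:
flip_indices_mapping = [1, 0, 3, 2]

# (x, y) coordinates *after* transform.apply_coords has already been applied:
keypoints_xy = np.array([[30.0, 20.0], [10.0, 20.0], [35.0, 25.0], [5.0, 25.0]])
keypoints_xy = keypoints_xy[flip_indices_mapping]
print(keypoints_xy[0])  # the row now labeled "left_eye": [10. 20.]
```

Note that the coordinates themselves were already flipped by `apply_coords`; the reindexing only fixes the semantic labels.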
As another example, keypoint annotations often have a "visibility" field.
A sequence of augmentations might augment a visible keypoint out of the image boundary (e.g. with cropping),
but then bring it back within the boundary afterwards (e.g. with image padding).
If users decide to label such keypoints "invisible",
then the visibility check has to happen after every transform step.
This can be achieved by:
```python
transform = augs(input)  # type: T.TransformList
assert isinstance(transform, T.TransformList)
for t in transform.transforms:
    keypoints_xy = t.apply_coords(keypoints_xy)
    # parenthesize the comparisons: `&` binds tighter than `>=`/`<=` in Python
    visibility &= ((keypoints_xy >= [0, 0]) & (keypoints_xy <= [W, H])).all(axis=1)

# btw, detectron2's `transform_keypoint_annotations` function chooses to label such keypoints "visible":
# keypoints_xy = transform.apply_coords(keypoints_xy)
# visibility &= ((keypoints_xy >= [0, 0]) & (keypoints_xy <= [W, H])).all(axis=1)
```
### Geometrically invert the transform

If images are pre-processed by augmentations before inference, the predicted results
such as segmentation masks are localized on the augmented image.
We'd like to invert the applied augmentation with the [inverse()](../modules/data_transforms.html#detectron2.data.transforms.Transform.inverse)
API, to obtain results on the original image:
```python
transform = augs(input)
pred_mask = make_prediction(input.image)
inv_transform = transform.inverse()
pred_mask_orig = inv_transform.apply_segmentation(pred_mask)
```
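To illustrate why `inverse()` recovers results on the original image, here is a simplified stand-in for a resize transform (hypothetical, not detectron2's actual `ResizeTransform`): its inverse just swaps the old and new sizes, so coordinates round-trip exactly:

```python
import numpy as np

class SimpleResize:
    """Hypothetical stand-in for a resize transform (illustration only)."""
    def __init__(self, old_hw, new_hw):
        self.old_hw, self.new_hw = old_hw, new_hw

    def apply_coords(self, coords):
        # scale (x, y) points by (new_w / old_w, new_h / old_h)
        sx = self.new_hw[1] / self.old_hw[1]
        sy = self.new_hw[0] / self.old_hw[0]
        return coords * np.array([sx, sy])

    def inverse(self):
        # undoing a resize is just resizing back to the original shape
        return SimpleResize(self.new_hw, self.old_hw)

t = SimpleResize(old_hw=(480, 640), new_hw=(240, 320))
pt = np.array([[100.0, 60.0]])
roundtrip = t.inverse().apply_coords(t.apply_coords(pt))
print(roundtrip)  # [[100. 60.]]
```

The same composition works for a whole `TransformList`: applying the inverse of each step in reverse order maps predictions back into the original image frame.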
### Add new data types

[T.Transform](../modules/data_transforms.html#detectron2.data.transforms.Transform)
supports a few common data types to transform, including images, coordinates, masks, boxes, and polygons.
It allows registering new data types, e.g.:
```python
@T.HFlipTransform.register_type("rotated_boxes")
def func(flip_transform: T.HFlipTransform, rotated_boxes: Any):
    # do the work
    return flipped_rotated_boxes

t = T.HFlipTransform(width=800)
transformed_rotated_boxes = t.apply_rotated_boxes(rotated_boxes)  # func will be called
```
### Extend T.AugInput

An augmentation can only access attributes available in the given input.
[T.AugInput](../modules/data_transforms.html#detectron2.data.transforms.StandardAugInput) defines "image", "boxes", and "sem_seg",
which are sufficient for common augmentation strategies to decide how to augment.
If they are not, a custom implementation is needed.

By re-implementing the "transform()" method in AugInput, it is also possible to
augment different fields in ways that depend on one another.
Such use cases (e.g. post-processing a bounding box based on augmented masks) are uncommon, but allowed by the system.
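As a sketch of this idea, the example below uses hypothetical stand-in classes (not detectron2's actual `T.AugInput` or transform types, so it runs standalone). Instead of transforming the box directly, the custom input re-derives a tight box from the augmented mask inside `transform()`:

```python
import numpy as np

class HFlipTfm:
    """Hypothetical stand-in transform: horizontal flip."""
    def apply_image(self, img):
        return img[:, ::-1]

    def apply_segmentation(self, seg):
        return seg[:, ::-1]

class MaskBoxAugInput:
    """Hypothetical custom AugInput: the box is not transformed directly,
    but post-processed from the augmented mask after each step."""
    def __init__(self, image, mask):
        self.image = image
        self.mask = mask

    def transform(self, tfm):
        self.image = tfm.apply_image(self.image)
        self.mask = tfm.apply_segmentation(self.mask)
        # dependent field: fit a tight (x0, y0, x1, y1) box around the new mask
        ys, xs = np.nonzero(self.mask)
        self.box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])

mask = np.zeros((4, 6), dtype=bool)
mask[1:3, 0:2] = True  # a 2x2 blob in the top-left
inp = MaskBoxAugInput(np.zeros((4, 6)), mask)
inp.transform(HFlipTfm())
print(inp.box)  # the box follows the flipped mask: [4 1 5 2]
```

Because `transform()` controls the order of operations, the box can be computed from the already-augmented mask, which is exactly the kind of inter-field dependency a plain per-field transform cannot express.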