Split Data
Split Data splits the input data into datasets based on the selected condition.
If the input project or dataset contains nested datasets, their data will be distributed across the splits, and the final project will not include nested datasets.
There are four ways to split the data:
- percent: data will be spread across datasets whose size is set as a percentage of the total image count.
- number: each dataset will contain the selected number of images. The last split will contain fewer images if the total number of images isn't evenly divisible by the selected number of images per split.
- classes: data will be split based on the presence of objects of each annotation class. Note that with this method the final project will likely include more images than the input project, because an image is duplicated into several datasets when it has more than one unique annotation class. Images with no annotations are placed in the "unlabeled" dataset.
- tags: data will be split based on the presence of image and object tags on the image. Note that with this method the final project will likely include more images than the input project, because an image is duplicated into several datasets when it has more than one unique image or object tag. Images with no tags are placed in the "unlabeled" dataset.
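The number and percent methods can be pictured as simple chunking of the image list. The following is a minimal sketch, not the tool's actual implementation: the function names are hypothetical, and it assumes that for the percent method each split holds the selected percentage of the total image count.

```python
from typing import List, Sequence


def split_by_number(items: Sequence[str], per_split: int) -> List[List[str]]:
    """Chunk items into consecutive splits of at most per_split items."""
    if per_split <= 0:
        raise ValueError("per_split must be a positive integer")
    return [list(items[i:i + per_split]) for i in range(0, len(items), per_split)]


def split_by_percent(items: Sequence[str], percent: int) -> List[List[str]]:
    """Chunk items so that each split holds about `percent` percent of the total.

    Assumption: the percentage sets the size of every split.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    # Integer arithmetic; always keep at least one image per split.
    per_split = max(1, len(items) * percent // 100)
    return split_by_number(items, per_split)


images = [f"img_{i}.jpg" for i in range(10)]
print([len(s) for s in split_by_number(images, 4)])    # [4, 4, 2]
print([len(s) for s in split_by_percent(images, 50)])  # [5, 5]
```

With 10 images and 4 images per split, the last split holds the 2 leftover images, which matches the behaviour described for the number method.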
Original datasets | Result: new datasets structure |
In this instance, the number of images more than doubled: the input project contained 953 images and the output project contains 2213 images, because on average each image contained more than two unique annotation classes.
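The growth in image count follows from the duplication rule: 2213 / 953 is roughly 2.3 output images per input image. A minimal sketch of class-based splitting, assuming a hypothetical mapping from image name to the class names of its objects (this is an illustration, not the tool's actual code), shows how the total grows. The same logic would apply to the tags method with tags in place of classes.

```python
from collections import defaultdict
from typing import Dict, List


def split_by_classes(annotations: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Place each image into one dataset per unique class found on it."""
    splits: Dict[str, List[str]] = defaultdict(list)
    for image, classes in annotations.items():
        unique_classes = sorted(set(classes))
        if not unique_classes:
            # Images without annotations go to a dedicated dataset.
            splits["unlabeled"].append(image)
        for class_name in unique_classes:
            # An image with several unique classes is duplicated.
            splits[class_name].append(image)
    return dict(splits)


# Hypothetical input: three images, one of them without annotations.
annotations = {
    "a.jpg": ["cat", "dog"],
    "b.jpg": ["cat", "cat"],
    "c.jpg": [],
}
splits = split_by_classes(annotations)
print(splits)
# {'cat': ['a.jpg', 'b.jpg'], 'dog': ['a.jpg'], 'unlabeled': ['c.jpg']}
print(sum(len(images) for images in splits.values()))  # 4 images from 3 inputs
```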
JSON view
"action": "split_data", "src": { "source": [ "$images_project_1" ] }, "dst": "$split_data_2", "settings": { "split_method": "classes", "split_ratio": 50, "split_num": 50 }