Split Data

Split Data splits the input data into new datasets based on a selected condition.

If the input project or dataset contains nested datasets, their data will be distributed across the splits, and the final project will not include nested datasets.
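The flattening of nested datasets can be sketched in Python. This is a minimal illustration under assumptions: the `Dataset` class and the `collect_images` function are hypothetical stand-ins, not part of Split Data or the Supervisely SDK, and images are represented by file names only.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical in-memory dataset: a name, its images, and nested child datasets."""
    name: str
    images: list[str] = field(default_factory=list)
    children: list["Dataset"] = field(default_factory=list)


def collect_images(dataset: Dataset) -> list[str]:
    """Return the images of a dataset and all of its nested datasets as one flat list.

    The nesting itself is discarded; only the images are kept so they can be
    redistributed across the new splits.
    """
    images = list(dataset.images)
    for child in dataset.children:
        images.extend(collect_images(child))
    return images


root = Dataset("ds0", ["a.jpg", "b.jpg"], [Dataset("nested", ["c.jpg"], [Dataset("deep", ["d.jpg"])])])
print(collect_images(root))  # ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg']
```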

Settings

There are 4 ways of splitting the data:

  • percent: images will be divided into datasets, each containing the selected percentage of the total image count.

  • number: each dataset will contain the selected number of images. The last split will contain fewer images if the total number of images isn't evenly divisible by the selected number of images per split.

  • classes: data will be split based on the presence of objects of each annotation class. Please note that when using this method, the final project will likely include more images than the input project, because an image is duplicated into several datasets when it contains objects of more than one unique annotation class. Images with no annotations will be placed in the "unlabeled" dataset.

  • tags: data will be split based on the image tags and object tags present on each image. Please note that when using this method, the final project will likely include more images than the input project, because an image is duplicated into several datasets when it carries more than one unique image or object tag. Images with no tags will be placed in the "unlabeled" dataset.
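The percent and number methods can be sketched as chunking a flat list of images. The functions `split_by_number` and `split_by_percent` are hypothetical, and the rounding used for percentages (split size rounded up, remainder left in a smaller last split) is an assumption, not the documented behaviour of Split Data.

```python
import math


def split_by_number(images: list[str], per_split: int) -> list[list[str]]:
    """Chunk images into splits of `per_split` images; the last split may be smaller."""
    if per_split <= 0:
        raise ValueError("per_split must be positive")
    return [images[i:i + per_split] for i in range(0, len(images), per_split)]


def split_by_percent(images: list[str], percent: float) -> list[list[str]]:
    """Chunk images into splits that each hold `percent` % of the total image count.

    Assumption: the split size is rounded up, and any remainder ends up in a
    smaller last split, exactly as with the number method.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    per_split = max(1, math.ceil(len(images) * percent / 100))
    return split_by_number(images, per_split)


images = [f"img_{i}.jpg" for i in range(10)]
print([len(s) for s in split_by_number(images, 4)])    # [4, 4, 2]
print([len(s) for s in split_by_percent(images, 50)])  # [5, 5]
print([len(s) for s in split_by_percent(images, 30)])  # [3, 3, 3, 1]
```

Reusing `split_by_number` inside `split_by_percent` keeps both methods consistent: a percentage is simply converted into a per-split image count first.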

Example 1. Split Data by percentage

[Image: original datasets and the resulting new datasets structure]

Example 2. Split Data by classes

[Image: original datasets and the resulting new datasets structure]

In this instance, the number of images more than doubled: the input project contained 953 images and the output project contains 2213 images, because on average each image contained objects of more than 2 unique annotation classes.
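The duplication behind these numbers can be sketched as grouping images by label. `split_by_labels` is a hypothetical helper that works the same way for classes or tags: each image lands in one dataset per unique label, or in "unlabeled" if it has none, so the output size is the sum of max(1, number of labels) over all images.

```python
from collections import defaultdict


def split_by_labels(image_labels: dict[str, set[str]]) -> dict[str, list[str]]:
    """Group images by class (or tag) name.

    An image with several labels is copied into several datasets; an image
    with no labels goes to the "unlabeled" dataset.
    """
    splits: dict[str, list[str]] = defaultdict(list)
    for image, labels in image_labels.items():
        if not labels:
            splits["unlabeled"].append(image)
        for label in sorted(labels):
            splits[label].append(image)
    return dict(splits)


images = {
    "1.jpg": {"car", "person"},
    "2.jpg": {"car"},
    "3.jpg": set(),
}
result = split_by_labels(images)
print(result)  # {'car': ['1.jpg', '2.jpg'], 'person': ['1.jpg'], 'unlabeled': ['3.jpg']}

total = sum(len(v) for v in result.values())
print(total)  # 4 output images for 3 input images

# Average number of datasets each image was copied into in the example above.
print(round(2213 / 953, 2))  # 2.32
```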

JSON view
{
  "action": "split_data",
  "src": {
    "source": [
      "$images_project_1"
    ]
  },
  "dst": "$split_data_2",
  "settings": {
    "split_method": "classes",
    "split_ratio": 50,
    "split_num": 50
  }
}
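The configuration can be loaded and inspected with Python's standard `json` module. The mapping from `split_method` to the parameter it reads (`split_ratio` for percent, `split_num` for number, neither for classes or tags) is an assumption inferred from the method descriptions, and the helper `relevant_setting` is hypothetical.

```python
import json

# The layer config from the JSON view above, as a complete JSON object.
CONFIG = """
{
  "action": "split_data",
  "src": {"source": ["$images_project_1"]},
  "dst": "$split_data_2",
  "settings": {"split_method": "classes", "split_ratio": 50, "split_num": 50}
}
"""

# Assumption: which setting each split method reads is inferred, not confirmed.
PARAM_FOR_METHOD = {"percent": "split_ratio", "number": "split_num", "classes": None, "tags": None}


def relevant_setting(config: dict) -> tuple[str, object]:
    """Return the split method and the value of the setting it is assumed to use."""
    settings = config["settings"]
    method = settings["split_method"]
    if method not in PARAM_FOR_METHOD:
        raise ValueError(f"unknown split_method: {method!r}")
    key = PARAM_FOR_METHOD[method]
    return method, (settings[key] if key else None)


config = json.loads(CONFIG)
print(relevant_setting(config))  # ('classes', None)
```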