This section explains how to download the datasets and describes each of the file formats that exist in the datasets.
ARKitScenes includes 3 datasets:

3dod
- The dataset used to train 3D object detection. It includes 3 assets: low-resolution RGB images, low-resolution depth images and the labels (the total size is 623.4 GB for 5047 3dod scans).

upsampling
- The dataset used to train depth upsampling. It includes 3 assets: high-resolution RGB images, low-resolution depth images and high-resolution depth images.

raw
- This dataset includes all data available in ARKitScenes; the 3dod and depth upsampling datasets are a subset of it. It includes many more assets that are not part of 3DOD or depth upsampling.
Each dataset has a CSV file that includes all the visit_id, video_id and fold values available in the dataset.
3DOD CSV path: ARKitScenes/threedod/3dod_train_val_splits.csv
Upsampling CSV path: ARKitScenes/depth_upsampling/upsampling_train_val_splits.csv
Raw CSV path: ARKitScenes/raw/raw_train_val_splits.csv
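If you want to inspect a split CSV before downloading, here is a minimal sketch using pandas, assuming the columns are named exactly visit_id, video_id and fold as described above:

```python
# Minimal sketch: inspect a split CSV (columns per the docs: visit_id, video_id, fold).
import pandas as pd

splits = pd.read_csv("threedod/3dod_train_val_splits.csv")
print(splits.columns.tolist())                     # expected to contain visit_id, video_id, fold
print(splits["fold"].value_counts())               # how many Training vs. Validation videos
train_ids = splits.loc[splits["fold"] == "Training", "video_id"].tolist()
print(len(train_ids), "training video_ids, e.g.", train_ids[:3])
```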
To download each of the datasets, we added a python script - download_data.py.
To download a specific video_id or a list of video_ids, download_data.py
expects the first argument to be the dataset name (i.e. 3dod/upsampling/raw),
the second argument to be the fold (i.e. Training/Validation), followed by one or more video_ids.
python3 download_data.py [3dod/upsampling/raw] --split [Training/Validation] --video_id video_id1 video_id2 \
--download_dir YOUR_DATA_FOLDER
for example
python3 download_data.py raw --split Training --video_id 47333462 \
--download_dir /tmp/ARKitScenes/
or
python3 download_data.py raw --split Training --video_id 47333462 \
--download_dir /tmp/ARKitScenes/ --download_laser_scanner_point_cloud
to download the laser scanner point-clouds (available only for the raw dataset)
To download with a CSV, download_data.py
expects the first argument to be a dataset name (i.e. 3dod/upsampling/raw);
there is no need to specify the fold, because the fold information already exists in the CSV file.
python3 download_data.py [3dod/upsampling/raw] --video_id_csv CSV_PATH \
--download_dir YOUR_DATA_FOLDER
for example
python3 download_data.py 3dod --video_id_csv threedod/3dod_train_val_splits.csv \
--download_dir /tmp/raw_ARKitScenes/
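If you only need a subset of the videos, one option is to filter the split CSV and pass the filtered file via --video_id_csv. The sketch below assumes the column names described above (visit_id, video_id, fold); the video_id used is only illustrative:

```python
# Minimal sketch: download a subset of videos by filtering the split CSV and
# invoking download_data.py with --video_id_csv.
import subprocess
import pandas as pd

splits = pd.read_csv("threedod/3dod_train_val_splits.csv")
wanted = splits[splits["video_id"].isin([47333462])]   # pick the video_ids you need
wanted.to_csv("/tmp/my_subset.csv", index=False)

subprocess.run(
    ["python3", "download_data.py", "3dod",
     "--video_id_csv", "/tmp/my_subset.csv",
     "--download_dir", "/tmp/ARKitScenes/"],
    check=True,
)
```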
Please note that for raw data, you will need to specify the type(s) of data you would like to download. The choices are
mov annotation mesh confidence highres_depth lowres_depth lowres_wide.traj lowres_wide lowres_wide_intrinsics ultrawide
ultrawide_intrinsics vga_wide vga_wide_intrinsics
for example
python3 download_data.py raw --video_id_csv raw/raw_train_val_splits.csv --download_dir /tmp/ar_raw_all/ \
--raw_dataset_assets mov annotation mesh confidence highres_depth lowres_depth lowres_wide.traj \
lowres_wide lowres_wide_intrinsics ultrawide ultrawide_intrinsics vga_wide vga_wide_intrinsics
The data folder (i.e. YOUR_DATA_FOLDER) will include two directories, Training
and Validation, which contain all the assets belonging to the training and validation bins respectively.
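A quick way to check what was downloaded is to list the contents of the two fold directories. This sketch assumes the layout used by the commands above, YOUR_DATA_FOLDER/<fold>/<video_id>/..., with one subdirectory per video_id:

```python
# Minimal sketch: list downloaded video_ids per fold, assuming one subdirectory
# per video_id under Training/ and Validation/.
from pathlib import Path

data_dir = Path("/tmp/ARKitScenes")          # same --download_dir used above
for fold in ("Training", "Validation"):
    fold_dir = data_dir / fold
    if fold_dir.is_dir():
        video_ids = sorted(p.name for p in fold_dir.iterdir() if p.is_dir())
        print(fold, len(video_ids), "videos:", video_ids[:5])
```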
The datasets include the following file formats:
.png
- stores RGB images, depth images and confidence images
  - RGB images - regular uint8, 3-channel images
  - depth images - uint16 png format, in millimeters
  - confidence - uint8 png format
    - 0 - low confidence
    - 2 - high confidence
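A minimal sketch for reading a depth png and its confidence map, assuming the uint16-millimeter depth and uint8 confidence encodings described above and that both images share the same resolution; file names are only illustrative:

```python
# Minimal sketch: load a depth .png (uint16, millimeters) and a confidence .png
# (uint8, 0 = low, 2 = high), then convert depth to meters and mask low-confidence pixels.
import numpy as np
from PIL import Image

depth_mm = np.asarray(Image.open("some_frame_depth.png"))         # values in millimeters
confidence = np.asarray(Image.open("some_frame_confidence.png"))  # 0 = low, 2 = high

depth_m = depth_mm.astype(np.float32) / 1000.0                    # millimeters -> meters
depth_m[confidence < 2] = 0.0                                     # keep only high-confidence pixels
```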
.pincam
- stores the intrinsic matrix for each RGB image
- is a single-line text file, space-delimited, with the following fields:
  width height focal_length_x focal_length_y principal_point_x principal_point_y
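A minimal sketch for parsing a .pincam file into a 3x3 pinhole intrinsic matrix, assuming exactly the six space-delimited fields listed above:

```python
# Minimal sketch: parse a .pincam file (width height fx fy cx cy on one line)
# into a 3x3 intrinsic matrix plus the image size.
import numpy as np

def read_pincam(path):
    width, height, fx, fy, cx, cy = map(float, open(path).read().split())
    K = np.array([[fx, 0.0, cx],
                  [0.0, fy, cy],
                  [0.0, 0.0, 1.0]])
    return K, int(width), int(height)
```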
.json
- stores the object annotations

.traj
- is a space-delimited file where each line represents a camera position at a particular timestamp
  - Column 1: timestamp
  - Columns 2-4: rotation (axis-angle representation in radians)
  - Columns 5-7: translation (in meters)
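A minimal sketch for parsing a single .traj line, assuming the column layout above and using SciPy to convert the axis-angle rotation into a rotation matrix:

```python
# Minimal sketch: parse one .traj line (timestamp, axis-angle rotation in radians,
# translation in meters) into a timestamp, 3x3 rotation matrix and translation vector.
import numpy as np
from scipy.spatial.transform import Rotation

def parse_traj_line(line):
    values = np.array(line.split(), dtype=np.float64)
    timestamp = values[0]
    R = Rotation.from_rotvec(values[1:4]).as_matrix()  # axis-angle -> rotation matrix
    t = values[4:7]
    return timestamp, R, t
```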
.ply
- stores the mesh generated by ARKit or the point clouds generated by the Faro laser scanner

.mov
- video captured with ARKit (raw dataset only)

_pose.txt
- transformation matrix to align/register multiple FARO scans
  - Lines 0-2 contain the rotation matrix and line 3 the translation vector
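A minimal sketch for reading a _pose.txt into a 4x4 transform, assuming each of the first three lines holds one row of the rotation matrix and the fourth line holds the translation vector, as described above:

```python
# Minimal sketch: assemble a 4x4 rigid transform from a _pose.txt file
# (lines 0-2: rotation matrix rows, line 3: translation vector).
import numpy as np

def read_faro_pose(path):
    rows = [list(map(float, line.split())) for line in open(path) if line.strip()]
    T = np.eye(4)
    T[:3, :3] = np.array(rows[0:3])   # rotation matrix from lines 0-2
    T[:3, 3] = np.array(rows[3])      # translation vector from line 3
    return T
```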
To dive deeper into the structure of each dataset, please refer to the documentation of the individual datasets.