
Build and Run

The necessary dependencies, as well as instructions on how to build and run our library and the examples, are described here.

References vs. Rescans: What's the difference?

The scans in 3RScan are organized by scene (an indoor space, e.g. a kitchen or a bedroom). We provide multiple scans of each indoor scene; one scan has been selected as the reference / initial scan (usually the most complete or the first scan). Each scene is identified by a scan identifier (hash value), where scene["reference"] is the scan id of the initial scan, as shown in 3RScan.json:

[
  {
    // id of the initial scan (reference)
    "reference": "531cff08-0021-28f6-8e08-ba2eeb945e09", 
     ... // data of the reference scan
    "scans": [ // rescans
      { "reference": "531cff10-0021-28f6-8f94-80db8fdbbbee", ... }, // data of 1st rescan
      { "reference": "19eda6f4-55aa-29a0-8893-8eac3a4d8193", ... }, // data of 2nd rescan
      { ... }
    ]
  }
]

All remaining scans are the rescans, scene["scans"]; their annotations are given with respect to the reference scan. This way, object or scene alignments can easily be recovered between one rescan and another (by transforming from one rescan to the reference and then, via the inverse, back to the other rescan). We provide the functions IsRescan() and IsReference() in our library to easily determine whether a scan id belongs to a reference or a rescan, and GetReference() to obtain the reference id of a rescan.

  RIO::RIOConfig config(data_path);
  RIO::RIOLib rio(config);
  rio.IsRescan(scan_id); // returns true if scan_id is a rescan
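
To recover the alignment between two rescans, you can chain their rescan-to-reference transformations as described above. Below is a minimal numpy sketch (independent of the C++ library); T_a_to_ref and T_b_to_ref stand for the 4x4 matrices aligning rescans A and B with the reference (see the alignment question below for where to find them):

import numpy as np

def rescan_to_rescan(T_a_to_ref, T_b_to_ref):
    # Map rescan A into the reference frame, then via the inverse of
    # B's alignment back into rescan B's frame.
    return np.linalg.inv(T_b_to_ref) @ T_a_to_ref

# usage: p_b = rescan_to_rescan(T_a_to_ref, T_b_to_ref) @ p_a  (homogeneous 4-vector)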

How to determine if a scan belongs to the train, val or test split?

Each scene, i.e. the reference and all corresponding rescans in 3RScan.json, is assigned a split; see the type field in the json file. We also provide lists of ids here: splits. Please note that these files only include references. Full lists of scan ids including the rescans can be found here: train, val, test.

How to align the rescan 3D models and camera poses with the reference scan?

Scene-to-scene transformations to align the 3D maps are available in the json metafile 3RScan.json. Please note that the translation part of the transformation in the json file is in millimeters, while the 3D models are stored in meters. We provide the function Transform2Reference() in our rio_example project as an example of how to use these transformations. The same procedure applies when aligning the ground truth trajectories; see the function GetCameraPose() and align_poses.
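
As a rough illustration (not the library's Transform2Reference()), the following Python sketch reads the alignment of a rescan from 3RScan.json and converts the translation to meters. The field name "transform" and the row-major 4x4 layout are assumptions here; please verify them against your copy of 3RScan.json:

import json
import numpy as np

def load_rescan_alignment(meta_path, rescan_id, transform_key="transform"):
    # transform_key and the row-major 4x4 layout are assumptions --
    # check the actual field name / layout in your 3RScan.json
    with open(meta_path) as f:
        scenes = json.load(f)
    for scene in scenes:
        for rescan in scene["scans"]:
            if rescan["reference"] == rescan_id:
                T = np.asarray(rescan[transform_key], dtype=np.float64).reshape(4, 4)
                T[:3, 3] *= 0.001  # translation is in millimeters, the 3D models are in meters
                return T
    raise ValueError(rescan_id + " is not a rescan in " + meta_path)

# align the vertices (N x 3, in meters) of a rescan mesh with its reference:
# v_ref = (T[:3, :3] @ v_rescan.T).T + T[:3, 3]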

Dataformat: How is the data stored?

The README gives a rough overview of the available files in 3RScan. Please note that our download script allows you to download a subset of the dataset (by filtering either by file name or by scan id).

What is the format of the depth map?

The depth map is a 16 bit image that stores depth values as unsigned shorts in millimeters. Please note that the camera poses are in meters, so you might need to multiply by 0.001 or 1000.

If the quality of the Tango depth image is not sufficient for your task, I recommend using the intrinsics and the 3D mesh to simply render the corresponding depth map (see the render function example).
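
Loading a depth frame and converting it to meters could look like the following sketch, assuming the 16 bit depth images can be read with a common imaging library such as Pillow:

import numpy as np
from PIL import Image

def load_depth_in_meters(path):
    # 16 bit depth image, values are unsigned shorts in millimeters
    depth_mm = np.asarray(Image.open(path), dtype=np.uint16)
    return depth_mm.astype(np.float32) * 0.001  # camera poses are in meters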

What is the coordinate system of the camera poses?

The camera pose stored in the .pose.txt files gives the pose of the RGB sensor as a transformation from the RGB camera to the world coordinate system.
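
Concretely, if a .pose.txt stores the 4x4 camera-to-world matrix as four whitespace-separated rows (a minimal sketch; please verify the layout against your data), points in the RGB camera frame can be moved to world coordinates like this:

import numpy as np

def load_camera_to_world(pose_path):
    # 4x4 camera-to-world transformation of the RGB sensor (translation in meters)
    return np.loadtxt(pose_path).reshape(4, 4)

def camera_to_world(points_cam, T_cw):
    # points_cam: N x 3 array in the RGB camera frame
    return (T_cw[:3, :3] @ points_cam.T).T + T_cw[:3, 3]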

Are the color and depth images calibrated?

The color and depth images are calibrated (you do not need to do this manually); the depth and RGB images only differ in their size (resizing would do the job). Since *.pose.txt is the camera pose of the RGB sensor and the depth is aligned with it, backprojection is straightforward: you need the intrinsics stored in _info.txt to backproject the depth image or, likewise, to render the 3D mesh / segmentation in the RGB frame. See backprojection and rendering for more details.

Semantics, semantics, semantics

There are quite a few files you could parse to get the semantic labels; you have many options depending on what you want to do in your code. Let's answer some of the questions you might have:

The ply files seem to be binary - how do I read them?

The ply files of the first version of our dataset are indeed stored as binary. Our code provides a function ReSavePLYASCII() that uses tinyply to load the binary ply and save it in ASCII format. In v2 of the dataset the files are stored as ASCII.

How do I get the semantic labels?

There are two ways to extract the semantic class of each 3D point: either via the semseg.v2.json or via objects.json. The function PrintSemanticLabels() shows simple code to print these labels for a specific scene. A semseg.v2.json is provided per scan as follows:

{
  "scan_id": "8eabc405-5af7-2f32-86d2-d757fa7b019d",
  ...
  "segGroups": [
    {
      "id": 15,
      "objectId": 15,
      "label": "window",
      "segments": [ 21, 175, ... ],
      "obb": {
        "centroid": [ 2.15, 2.17, -1.18 ],
        "axesLengths": [ 4.57, 0.62, 4.14 ],
        ...
      }, 
      ...
    }, {
      "id": 29,
      "objectId": 29,
      "label": "plant",
      ...
    }, {
      "id": 14,
      "objectId": 14,
      "label": "windowsill",
      ...
    }
  ]
}

In semseg.v2.json, json["segGroups"] is the list of instances in this particular scene, with objectId (or id) being the instance id of a specific object instance in the 3D scene and label being the assigned semantic label. The 3D geometry of the objects is stored in labels.instances.annotated.v2.ply; the objectId of each 3D vertex is stored in a custom vertex property, see the header:

ply
format ascii 1.0
element vertex 83385
property float x
property float y
property float z
property uchar red
property uchar green
property uchar blue
property ushort objectId // that is the objectId that maps to the objectId in semseg.v2.json
property ushort globalId
property uchar NYU40
property uchar Eigen13
property uchar RIO27
element face 27795
property list uchar int vertex_indices
end_header
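
If you do not want to go through the C++ code, the per-vertex objectId can also be read directly from the ASCII ply, for example with the third-party plyfile package (a minimal sketch, assuming plyfile is installed):

import numpy as np
from plyfile import PlyData  # third-party package: pip install plyfile

ply = PlyData.read("labels.instances.annotated.v2.ply")
vertices = ply["vertex"]
xyz = np.stack([vertices["x"], vertices["y"], vertices["z"]], axis=1)
object_ids = np.asarray(vertices["objectId"])  # per-vertex instance id, see semseg.v2.json
print(np.unique(object_ids))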

On the other hand, objects.json provides the labels for all scenes in a single file (and additionally gives information such as object attributes).

{
  "scans": [
    {
      "scan": "00d42bed-778d-2ac6-86a7-0e0e5f5f5660", // unique scan identifier (either a reference or a rescan)
      "objects": [
        {
          "global_id": "6", // global Id of the semantic label (unique for each semantic class e.g. chairs, bed, etc.)
          "label": "floor", // semantic label of the object (same as "label" in semseg.json)
          "ply_color": "#aec7e8", // color of the object in the labels.instances.annotated.ply
          "id": "1", // instance Id of the object (same as "objectId" or "Id" in semseg.json)
          "attributes": {
            "color": [ "dark", "gray" ], ...
          },
          ...
        },
        {
          "global_id": "36",
          "label": "bed",
          "ply_color": "#ff7f0e",
          "id": "4",
          "attributes": {
            "state": [ "messy" ],
            "shape": [ "square" ]
          }
        }, ...
      ]
    }, {
      "objects": [ ... ],
      "scan": "4acaebba-6c10-2a2a-8650-34c2f160db99"
    }
  ]
}
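
As a minimal Python sketch (an illustration of the two file formats, not the library's PrintSemanticLabels()), the objectId / label pairs can be extracted from both sources as follows:

import json

def labels_from_semseg(semseg_path):
    # semseg.v2.json: one file per scan, labels under "segGroups"
    with open(semseg_path) as f:
        semseg = json.load(f)
    return {group["objectId"]: group["label"] for group in semseg["segGroups"]}

def labels_from_objects(objects_path, scan_id):
    # objects.json: one file for all scans, labels under "objects" per scan
    with open(objects_path) as f:
        scans = json.load(f)["scans"]
    scan = next(s for s in scans if s["scan"] == scan_id)
    return {int(obj["id"]): obj["label"] for obj in scan["objects"]}

print(labels_from_semseg("semseg.v2.json"))
print(labels_from_objects("objects.json", "00d42bed-778d-2ac6-86a7-0e0e5f5f5660"))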

How to map this to a global instance Id?

Please note that the objectId is scene specific (1 might be the floor in one scene but could be a sofa in another). To get a 3D semantic segmentation with a consistent, global instance ID, check out the function RemapLabelsPly(). The global instance ID written to the *.ply (check the header) is the same as the global_id in objects.json. A sketch of the underlying mapping is given below.
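
As a sketch of the underlying mapping (independent of RemapLabelsPly()), the scene-specific instance ids can be remapped via objects.json; unmapped ids fall back to 0 here, which is an arbitrary choice for illustration:

import json
import numpy as np

def object_to_global_id(objects_path, scan_id):
    # maps the scene-specific instance id ("id" / objectId) to the global id ("global_id")
    with open(objects_path) as f:
        scans = json.load(f)["scans"]
    scan = next(s for s in scans if s["scan"] == scan_id)
    return {int(obj["id"]): int(obj["global_id"]) for obj in scan["objects"]}

def remap_to_global(object_ids, id_to_global):
    # object_ids: per-vertex instance ids read from the *.ply (see the header above)
    return np.array([id_to_global.get(int(i), 0) for i in object_ids], dtype=np.uint16)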

Advanced 2D or 3D Processing

You might want to do some more complex processing; check out the following FAQ about 2D / 3D processing with 3RScan:

How to get 3D bounding boxes?

It's easy: our semseg.v2.json provides oriented bounding boxes (you simply need to read the obb field of each entry in json["segGroups"], see the sketch below). Detailed instructions on how to get the 2D bounding boxes can be found below.
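
As a small sketch, the corners of such a box can be derived from the centroid and axesLengths; the orientation field of the obb is omitted in the excerpt above, so the rotation matrix R below is a placeholder that you need to fill in from your semseg.v2.json:

import itertools
import numpy as np

def obb_corners(centroid, axes_lengths, R=np.eye(3)):
    # R is a placeholder for the box orientation stored in the obb entry
    centroid = np.asarray(centroid, dtype=np.float64)
    half = 0.5 * np.asarray(axes_lengths, dtype=np.float64)
    signs = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))  # 8 sign combinations
    return centroid + (signs * half) @ R.T

corners = obb_corners([2.15, 2.17, -1.18], [4.57, 0.62, 4.14])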

Backproject: How to get a 3D point cloud from a given depth image?

As described above (see Dataformat), the color and depth images are calibrated. The intrinsics of the RGB and depth images can be found in _info.txt. An example of how to read them and backproject a depth image to get a colored 3D point cloud is shown in our example code, using the function Backproject(scan, frame_id). The produced colored 3D point cloud aligns with the 3D reconstructions *.obj and *.ply.
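
A minimal numpy sketch of the backprojection itself (not the library's Backproject(); fx, fy, cx, cy are assumed to come from the intrinsics in _info.txt, and depth_m is a depth image already converted to meters, see above):

import numpy as np

def backproject(depth_m, fx, fy, cx, cy):
    # depth_m: H x W depth image in meters (0 where invalid)
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    x = (u - cx) / fx * z
    y = (v - cy) / fy * z
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[depth_m.reshape(-1) > 0]  # drop invalid depth values

# transform the resulting camera-frame points with the camera-to-world pose
# (see the coordinate system question above) to align them with the *.obj / *.ply reconstructions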

How to render scans in 3RScan?

In this repository we provide the project rio_renderer. The project provides an OpenGL renderer RIO::Renderer and a corresponding rendering function render() to produce 2D semantic images (color and 16 bit instance images), 16 bit depth images or RGB color renderings. The rendering function is designed to work with 3RScan and uses the intrinsics m_calibrationColorIntrinsic in _info.txt and the 3D models mesh.refined.v2.obj and labels.instances.annotated.v2.ply to render the corresponding 2D images. The code generates the following files:

(figure: overview of the generated renderer outputs)

How to render all camera poses at once for one scan in 3RScan?

The project rio_renderer provides a predefined executable called rio_renderer_render_all that renders all camera poses for a given scan in 3RScan. It produces all the data that the rio_renderer creates, for every camera pose, i.e. 2D semantic images (color and 16 bit instance images), 16 bit depth images or RGB color renderings.

How to get the 2D bounding boxes?

Computing 2D bounding boxes first requires rendering the semantic instance image. Based on the 2D instance map, 2D bounding boxes can be derived. We provide example code in rio_renderer, see Get2DBoundingBoxes(). The code also saves a file *.bb.txt with the 2D bounding boxes of the current view, where each line starts with the instance id followed by the min / max corners of the bounding box.
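
Assuming one box per line in that order (instance id, then the min / max corners), a simple parser could look like the sketch below; the exact column order is an assumption, so please verify it against a generated *.bb.txt:

def load_2d_boxes(bb_path):
    # each line: <instance_id> <x_min> <y_min> <x_max> <y_max>  (order assumed, please verify)
    boxes = {}
    with open(bb_path) as f:
        for line in f:
            values = line.split()
            if not values:
                continue
            instance_id = int(values[0])
            boxes[instance_id] = [float(v) for v in values[1:5]]
    return boxes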

How to filter instances that are barely visible in an image (i.e. occluded)?

The project rio_renderer_render_all also computes two visibility metrics: occlusion and truncation. For each instance in each rendered image, they contain the number of pixels that are visible, occluded by other instances, or truncated at the image borders. This allows filtering instances based on their degree of occlusion and truncation, e.g. only using instances that are at most 30% occluded and 20% truncated. In the following we describe the visibility metrics in more detail; a small filtering sketch follows the example file below.

The executable writes a file frame-xxxxxx.visibility.txt that stores the computed truncation and occlusion metrics. Its content consists of rows with the following syntax: <instance_id> <truncation_number_pixels_original_image> <truncation_number_pixels_larger_fov_image> <truncation_metric> <occlusion_number_pixels_original_image> <occlusion_number_pixels_only_with_that_instance> <occlusion_metric> where

  • instance_id: as defined in objects.json

  • truncation_number_pixels_original_image: how many pixels of the instance are visible in the original part of the image when rendering with a larger FOV, but only looking at the original crop.

  • truncation_number_pixels_larger_fov_image: how many pixels of the instance are visible in the larger-FOV image.

  • truncation: percentage of how much the object is cut off at the image edges w.r.t. the rendering with the larger FOV. 1 means the whole object is visible in the original image and is not cut off at the image edges. 0 < x < 1 means the object is visible to x% in the original image; the rest is cut off at the image edges but visible with the larger FOV.

  • occlusion_number_pixels_original_image: how many pixels of the instance are visible in the original image with the original FOV.

  • occlusion_number_pixels_only_with_that_instance: how many pixels of the instance are visible in the original image with the original FOV when rendering only that single instance.

  • occlusion: percentage of how much the object is occluded by other objects in the original image (without the larger FOV). 1 means the whole object is visible in the original image and is not occluded by other objects. 0 < x < 1 means the object is visible to x% in the original image; the rest is occluded by other objects but would be visible in the original FOV when rendering only that single object.

Caveat: Note that truncation_number_pixels_original_image and occlusion_number_pixels_original_image are not the same number. This is because the truncation numbers are computed from an image rendered with a larger FOV, so the original crop covers a smaller width/height of that rendering and the absolute number of pixels is lower. When using the number of pixels as an object mask, you should therefore use occlusion_number_pixels_original_image.

One example frame-xxxxxx.visibility.txt file can look like this:

1 39326 76215 0.515988 158780 268173 0.59208
2 10764 108274 0.0994144 41363 67629 0.611616
17 2663 6377 0.417594 10974 18437 0.595216
18 3190 3190 1 12769 12770 0.999922
19 36676 37854 0.96888 146999 146999 1
22 12373 187178 0.0661029 56761 56761 1
24 4345 20030 0.216925 14900 14900 1
25 13322 13325 0.999775 53334 56449 0.944817
28 2171 8714 0.249139 7958 8392 0.948284
33 2149 8319 0.258324 8267 8297 0.996384
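
A minimal sketch for parsing such a file and filtering instances by the thresholds mentioned above (at most 30% occluded and 20% truncated, i.e. occlusion >= 0.7 and truncation >= 0.8):

def load_visibility(path):
    # rows: <instance_id> <trunc_px_orig> <trunc_px_larger_fov> <truncation>
    #       <occ_px_orig> <occ_px_only_instance> <occlusion>
    entries = {}
    with open(path) as f:
        for line in f:
            values = line.split()
            if len(values) != 7:
                continue
            entries[int(values[0])] = {"truncation": float(values[3]), "occlusion": float(values[6])}
    return entries

def visible_instances(path, min_truncation=0.8, min_occlusion=0.7):
    # keep instances that are at most 20% truncated and at most 30% occluded
    return [i for i, m in load_visibility(path).items()
            if m["truncation"] >= min_truncation and m["occlusion"] >= min_occlusion]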

These files get computed when calling the executable rio_renderer_render_all this way:

./rio_renderer_render_all <data_path> <scan_id> <output_folder> <render_mode>

e.g.

./rio_renderer_render_all /path/to/3RScan/ 5630cfc9-12bf-2860-84ed-5bb189f0e94e sequence 0

where render mode 0 saves the images, bounding boxes and occlusion / truncation files, 1 only the images, 2 only the depth maps, 3 additionally the bounding boxes, and 4 only the bounding boxes.

  • In mode 0, frame-xxxxxx.visibility.txt files are created. It also renders the images with the default FOV setting (i.e. 2D semantic images (color and 16 bit instance images), 16 bit depth images or RGB color renderings). If you just want to render the 2D images, option 1 is best for you.

Why do the number of vertices in the .obj and .ply differ?

The reason for this is the annotation interface / segmentation code that we used to get the labels. However, the number of faces is the same; see TransformInstance() for how to extract the 3D model of a specific instance from the 3D mesh. In v2 of the dataset, mesh.refined.v2.obj and labels.instances.annotated.v2.ply have the same number of vertices.

Some scenes in 3RScan seem to be quite small / partial, why's that?

Some scans (mostly rescans) are indeed quite small due to issues when processing the data on the phone or on our servers. While this only affects a small number of scans, we still decided to provide them since they might be useful for some tasks. Feel free to skip them if they are not useful for you; a list of rather small scans is available here.

What is the difference between v1 and v2 of the dataset?

In v2 the labels and annotations are cleaned: we removed typos in the semantic labels and made minimal changes to the 3D annotations. We additionally adjusted the 3D models by excluding vertices, faces or segments that were annotated with the class remove from the mesh, the semantic segmentation, the corresponding over-segmentation and the labeled 3D model. The labels.instances.annotated.v2.ply files are saved as ASCII and the number of vertices is now the same as in the *.obj file. Please also note the mapping and the vertex properties of the *.ply file: objectId (instance id), globalId (global 3RScan mapping) and its semantic mappings to NYU40, Eigen13 and RIO27.

Are there example projects that use this dataset?

Check out google-research/tf3d for 3D semantic / instance segmentation code with tensorflow. The corresponding *.tfrecords can be obtained via our download script (using the --type=tfrecords flag).

You can also have a look at the following project, which uses the 3RScan dataset and its rio_renderer project to train a triplet network in PyTorch. It makes use of many of the rendered artifacts (i.e. 2D semantic images (color and 16 bit instance images), RGB color renderings, bounding box and visibility files) and provides an example pipeline that works with this type of data.

https://github.com/lukasHoel/3rscan-triplet-dataset-toolkit

Something is still unclear ...

If this didn't answer your question or you still have a problem that goes beyond this FAQ, please don't hesitate to contact me via johanna.wald@tum.de.

Chances are high that I'm able to help :)