nutonomy · holger-motional · Jul 13, 2020 · Jul 11, 2020
diff --git a/docs/instructions_nuimages.md b/docs/instructions_nuimages.md
@@ -0,0 +1 @@
+TODO: Coming soon!
diff --git a/docs/schema_nuimages.md b/docs/schema_nuimages.md
@@ -1,6 +1,164 @@
 nuImages schema
 ==========
+This document describes the database schema used in nuImages.
+All annotations and metadata (including calibration, vehicle coordinates, etc.) are stored in a relational database.
+The database tables are listed below.
+Every row can be identified by its unique primary key `token`.
+Foreign keys such as `sample_token` may be used to link to the `token` of the table `sample`.
+Please refer to the [tutorial](https://www.nuscenes.org/nuimages#tutorial) for an introduction to the most important database tables.
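The snippet below is a minimal sketch of how primary and foreign keys fit together. It indexes each table by `token` and then follows a foreign key. It assumes the tables are already loaded as lists of dicts (for example, parsed from JSON). The helper names `index_by_token` and `get` and the toy records are hypothetical and not part of the devkit.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def index_by_token(records: List[Record]) -> Dict[str, Record]:
    """Build a lookup from the primary key `token` to the full record."""
    return {rec["token"]: rec for rec in records}


def get(tables: Dict[str, Dict[str, Record]], table: str, token: str) -> Record:
    """Resolve a token in the named table; raises KeyError if it does not exist."""
    return tables[table][token]


# Toy records with hypothetical tokens.
raw = {
    "log": [{"token": "log1", "location": "singapore-onenorth"}],
    "sample": [{"token": "s1", "log_token": "log1"}],
}
tables = {name: index_by_token(rows) for name, rows in raw.items()}

# Follow the foreign key `log_token` from a sample to its log.
sample = get(tables, "sample", "s1")
log = get(tables, "log", sample["log_token"])
print(log["location"])  # singapore-onenorth
```

Indexing each table once by `token` turns every foreign-key hop into a constant-time dict lookup.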
 
-TODO: Coming soon!
+![](https://www.nuscenes.org/public/images/nuimages-schema.svg)
 
-![](https://www.nuscenes.org/public/images/nuimages-schema.svg)
+attribute
+---------
+An attribute is a property of an instance that can change while the category remains the same.
+Example: a vehicle being parked/stopped/moving, and whether or not a bicycle has a rider.
+The attributes in nuImages are a superset of those in nuScenes.
+```
+attribute {
+   "token":                   <str> -- Unique record identifier.
+   "name":                    <str> -- Attribute name.
+   "description":             <str> -- Attribute description.
+}
+```
+
+calibrated_sensor
+---------
+Definition of a particular sensor (lidar or camera; nuImages has no radar) as calibrated on a particular vehicle.
+All extrinsic parameters are given with respect to the ego vehicle body frame.
+Contrary to nuScenes, all camera images come distorted and unrectified.
+```
+calibrated_sensor {
+   "token":                   <str> -- Unique record identifier.
+   "sensor_token":            <str> -- Foreign key pointing to the sensor type.
+   "translation":             <float> [3] -- Coordinate system origin in meters: x, y, z.
+   "rotation":                <float> [4] -- Coordinate system orientation as quaternion: w, x, y, z.
+   "camera_intrinsic":        <float> [3, 3] -- Intrinsic camera calibration. Empty for sensors that are not cameras.
+   "camera_distortion":       <float> [5 or 6] -- Camera distortion parameters. We use the 5-parameter camera convention of the Caltech Camera Calibration Toolbox, which is also used in OpenCV. Only the fish-eye lens of CAM_BACK uses the 6th parameter.
+}
+```
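The following sketch shows how `translation`, `rotation`, `camera_intrinsic` and `camera_distortion` could be combined to project a point from the ego vehicle frame into a camera image. It makes several assumptions. The extrinsics are taken to describe the camera pose in the ego frame, so the code inverts them. The camera is assumed to look along its +z axis. The five coefficients are assumed to follow the OpenCV order (k1, k2, p1, p2, k3). The 6-parameter fish-eye model of CAM_BACK is not handled, and the function names are illustrative only.

```python
from typing import List, Sequence, Tuple


def quat_to_matrix(q: Sequence[float]) -> List[List[float]]:
    """Rotation matrix of a unit quaternion given in (w, x, y, z) order."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def ego_to_pixel(point: Sequence[float], translation: Sequence[float],
                 rotation: Sequence[float], intrinsic: Sequence[Sequence[float]],
                 distortion: Sequence[float]) -> Tuple[float, float]:
    """Project a 3D point in the ego frame to pixel coordinates (u, v)."""
    R = quat_to_matrix(rotation)
    # The extrinsics place the camera in the ego frame, so invert them:
    # p_cam = R^T (p_ego - t).
    d = [point[i] - translation[i] for i in range(3)]
    p = [sum(R[r][c] * d[r] for r in range(3)) for c in range(3)]
    if p[2] <= 0:
        raise ValueError("point is not in front of the camera")
    xn, yn = p[0] / p[2], p[1] / p[2]  # normalized image coordinates
    # Radial (k1, k2, k3) and tangential (p1, p2) distortion, OpenCV order.
    k1, k2, p1, p2, k3 = distortion[:5]
    r2 = xn * xn + yn * yn
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = xn * radial + 2 * p1 * xn * yn + p2 * (r2 + 2 * xn * xn)
    yd = yn * radial + p1 * (r2 + 2 * yn * yn) + 2 * p2 * xn * yn
    # Apply the 3x3 intrinsic matrix (last row assumed to be [0, 0, 1]).
    u = intrinsic[0][0] * xd + intrinsic[0][1] * yd + intrinsic[0][2]
    v = intrinsic[1][0] * xd + intrinsic[1][1] * yd + intrinsic[1][2]
    return u, v


K = [[1000.0, 0.0, 800.0], [0.0, 1000.0, 450.0], [0.0, 0.0, 1.0]]
print(ego_to_pixel([1.0, 0.0, 10.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], K, [0.0] * 5))  # (900.0, 450.0)
```

With all distortion coefficients at zero, this reduces to a plain pinhole projection. Nonzero coefficients bend the normalized coordinates before the intrinsics are applied.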
+
+category
+---------
+Taxonomy of object categories (e.g. vehicle, human). 
+Subcategories are delineated by a period (e.g. `human.pedestrian.adult`).
+The categories in nuImages are the same as in nuScenes (without lidarseg), plus `flat.driveable_surface`.
+```
+category {
+   "token":                   <str> -- Unique record identifier.
+   "name":                    <str> -- Category name. Subcategories indicated by period.
+   "description":             <str> -- Category description.
+}
+```
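The hierarchy is encoded in the name, so a parent/child test should compare whole period-delimited components rather than raw string prefixes. Here is a minimal sketch with hypothetical helper names:

```python
def is_subcategory(name: str, parent: str) -> bool:
    """True if `name` equals `parent` or lies below it in the hierarchy."""
    return name == parent or name.startswith(parent + ".")


def top_level(name: str) -> str:
    """Return the first component of a period-delimited category name."""
    return name.split(".", 1)[0]


print(is_subcategory("human.pedestrian.adult", "human"))      # True
print(is_subcategory("human.pedestrian.adult", "human.ped"))  # False
print(top_level("flat.driveable_surface"))                    # flat
```

A plain `startswith("human.ped")` check would wrongly match `human.pedestrian.adult`. Appending the period avoids that.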
+
+ego_pose
+---------
+Ego vehicle pose at a particular timestamp. Given with respect to the global coordinate system of the log's map.
+The ego_pose is the output of a lidar map-based localization algorithm described in our paper.
+The localization is 2-dimensional in the x-y plane.
+Warning: nuImages is collected from almost 500 logs with different map versions.
+Therefore, the coordinates **should not be compared across logs** or rendered on the semantic maps of nuScenes.
+```
+ego_pose {
+   "token":                   <str> -- Unique record identifier.
+   "translation":             <float> [3] -- Coordinate system origin in meters: x, y, z. Note that z is always 0.
+   "rotation":                <float> [4] -- Coordinate system orientation as quaternion: w, x, y, z.
+   "timestamp":               <int> -- Unix time stamp.
+   "rotation_rate":           <float> [3] -- The angular velocity vector (x, y, z) of the vehicle in rad/s. This is expressed in the ego vehicle frame.
+   "acceleration":            <float> [3] -- Acceleration vector (x, y, z) in the ego vehicle frame in m/s/s. The z value is close to the gravitational acceleration `g = 9.81 m/s/s`.
+   "speed":                   <float> -- The speed of the ego vehicle in the driving direction in m/s.
+}
+```
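Because the localization is 2D (z is always 0), the vehicle heading can be summarized by the yaw angle of the `rotation` quaternion. The sketch below assumes the (w, x, y, z) order stated above and uses the standard yaw formula. The function name is hypothetical.

```python
import math
from typing import Sequence


def yaw_from_quaternion(q: Sequence[float]) -> float:
    """Rotation about the z axis (radians) of a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))


# A pure yaw rotation of 0.5 rad has w = cos(0.25), z = sin(0.25).
print(round(yaw_from_quaternion([math.cos(0.25), 0.0, 0.0, math.sin(0.25)]), 6))  # 0.5
```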
+
+log
+---------
+Information about the log from which the data was extracted.
+```
+log {
+   "token":                   <str> -- Unique record identifier.
+   "logfile":                 <str> -- Log file name.
+   "vehicle":                 <str> -- Vehicle name.
+   "date_captured":           <str> -- Date (YYYY-MM-DD).
+   "location":                <str> -- Area where log was captured, e.g. singapore-onenorth.
+}
+```
+
+object_ann
+---------
+The annotation of a foreground object (car, bike, pedestrian) in an image.
+Each foreground object is annotated with a 2d box, a 2d instance mask and category-specific attributes.
+```
+object_ann {
+   "token":                   <str> -- Unique record identifier.
+   "sample_data_token":       <str> -- Foreign key pointing to the sample data, which must be a keyframe image.
+   "category_token":          <str> -- Foreign key pointing to the object category.
+   "attribute_tokens":        <str> [n] -- Foreign keys. List of attributes for this annotation.
+   "bbox":                    <int> [4] -- Annotated amodal bounding box. Given as [xmin, ymin, xmax, ymax].
+   "mask":                    <RLE> -- Run-length encoding (RLE) of the instance mask, as produced by the pycocotools package.
+}
+```
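In practice, the `mask` field would be decoded with pycocotools, for example with `pycocotools.mask.decode`, which takes an RLE dict with `size` ([height, width]) and `counts`. To illustrate what run-length encoding means, here is a pure-Python sketch that decodes the *uncompressed* form, in which `counts` is a plain list of run lengths. It assumes COCO's conventions: runs alternate between background and foreground, start with background, and fill the mask in column-major order. It does not handle compressed `counts` strings, and the function name is hypothetical.

```python
from typing import List


def decode_rle_counts(counts: List[int], height: int, width: int) -> List[List[int]]:
    """Decode uncompressed COCO-style RLE counts into a height x width binary mask."""
    flat: List[int] = []
    value = 0  # runs alternate 0/1 and always start with background (0)
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("run lengths do not cover the whole mask")
    # Pixels are stored column by column: flat index i is row i % height, column i // height.
    return [[flat[c * height + r] for c in range(width)] for r in range(height)]


print(decode_rle_counts([1, 2, 3], height=2, width=3))  # [[0, 1, 0], [1, 0, 0]]
```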
+
+sample_data
+---------
+Sensor data, e.g. an image or a lidar pointcloud. Note that nuImages does not contain radar data.
+Sample_data covers all sensor data, regardless of whether it is a keyframe or not.
+For every keyframe image or lidar pointcloud, we also include up to 6 past and 6 future sweeps at 2 Hz.
+We can navigate between consecutive lidar or camera sample_datas using the `prev` and `next` pointers.
+Only keyframe (sample) images are annotated.
+The sample timestamp is inherited from the keyframe camera sample_data timestamp.
+```
+sample_data {
+   "token":                   <str> -- Unique record identifier.
+   "sample_token":            <str> -- Foreign key. Sample to which this sample_data is associated.
+   "ego_pose_token":          <str> -- Foreign key.
+   "calibrated_sensor_token": <str> -- Foreign key.
+   "filename":                <str> -- Relative path to data-blob on disk.
+   "fileformat":              <str> -- Data file format.
+   "width":                   <int> -- If the sample data is an image, this is the image width in pixels.
+   "height":                  <int> -- If the sample data is an image, this is the image height in pixels.
+   "timestamp":               <int> -- Unix time stamp.
+   "is_key_frame":            <bool> -- True if this sample_data is a keyframe, else False.
+   "next":                    <str> -- Foreign key. Sample data from the same sensor that follows this in time. Empty if end of scene.
+   "prev":                    <str> -- Foreign key. Sample data from the same sensor that precedes this in time. Empty if start of scene.
+}
+```
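The surrounding sweeps of a keyframe are reached by following `prev` and `next`. Here is a minimal sketch that gathers up to 6 sweeps on each side of a keyframe. It assumes `sample_data` is a dict from token to record and that an empty string terminates the chain. The function name and toy tokens are hypothetical.

```python
from typing import Dict, List


def collect_sweeps(sample_data: Dict[str, dict], key_token: str, max_steps: int = 6) -> List[str]:
    """Return up to `max_steps` past sweeps, the keyframe and up to `max_steps` future sweeps, in time order."""
    past: List[str] = []
    token = sample_data[key_token]["prev"]
    while token and len(past) < max_steps:
        past.append(token)
        token = sample_data[token]["prev"]
    future: List[str] = []
    token = sample_data[key_token]["next"]
    while token and len(future) < max_steps:
        future.append(token)
        token = sample_data[token]["next"]
    # `past` was collected backwards in time, so reverse it.
    return past[::-1] + [key_token] + future


chain = {
    "a": {"prev": "", "next": "b"},
    "b": {"prev": "a", "next": "c"},
    "c": {"prev": "b", "next": "d"},
    "d": {"prev": "c", "next": ""},
}
print(collect_sweeps(chain, "c"))  # ['a', 'b', 'c', 'd']
```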
+
+sample
+---------
+A sample is an annotated keyframe selected from a large pool of images in a log.
+Every sample has up to 13 lidar sample_datas and 13 camera sample_datas corresponding to it.
+These include the actual lidar and camera keyframe sample_datas, which can be accessed via the `key_*_token` fields.
+```
+sample {
+   "token":                   <str> -- Unique record identifier.
+   "timestamp":               <int> -- Unix time stamp.
+   "log_token":               <str> -- Foreign key pointing to the log.
+   "key_camera_token":        <str> -- Foreign key of the sample_data corresponding to the camera keyframe.
+   "key_lidar_token":         <str> -- Foreign key of the sample_data corresponding to the lidar keyframe.
+}
+```
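The sketch below resolves a sample's keyframe image to its camera channel by chaining foreign keys through `sample_data`, `calibrated_sensor` and `sensor`. It assumes the tables are dicts keyed by `token`. The function name and toy records are hypothetical.

```python
from typing import Dict


def key_camera_channel(sample: dict, sample_data: Dict[str, dict],
                       calibrated_sensor: Dict[str, dict], sensor: Dict[str, dict]) -> str:
    """Follow sample -> sample_data -> calibrated_sensor -> sensor and return the channel."""
    sd = sample_data[sample["key_camera_token"]]
    cs = calibrated_sensor[sd["calibrated_sensor_token"]]
    return sensor[cs["sensor_token"]]["channel"]


# Toy records with hypothetical tokens.
sample = {"token": "s1", "key_camera_token": "sd1", "key_lidar_token": "sd2"}
sample_data = {"sd1": {"token": "sd1", "calibrated_sensor_token": "cs1"}}
calibrated_sensor = {"cs1": {"token": "cs1", "sensor_token": "sen1"}}
sensor = {"sen1": {"token": "sen1", "channel": "CAM_FRONT", "modality": "camera"}}
print(key_camera_channel(sample, sample_data, calibrated_sensor, sensor))  # CAM_FRONT
```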
+
+sensor
+---------
+A specific sensor type.
+```
+sensor {
+   "token":                   <str> -- Unique record identifier.
+   "channel":                 <str> -- Sensor channel name.
+   "modality":                <str> {camera, lidar} -- Sensor modality. Supports category(ies) in brackets.
+}
+```
+
+surface_ann
+---------
+The annotation of a background object (driveable surface) in an image.
+Each background object is annotated with a 2d semantic segmentation mask.
+```
+surface_ann {
+   "token":                   <str> -- Unique record identifier.
+   "sample_data_token":       <str> -- Foreign key pointing to the sample data, which must be a keyframe image.
+   "category_token":          <str> -- Foreign key pointing to the surface category.
+   "mask":                    <RLE> -- Run-length encoding (RLE) of the segmentation mask, as produced by the pycocotools package.
+}
+```
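The annotation tables store only the forward key `sample_data_token`, so finding all annotations of one image requires a reverse index. Here is a minimal sketch that works for both `object_ann` and `surface_ann` records. The helper name and toy tokens are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_image(annotations: List[dict]) -> Dict[str, List[dict]]:
    """Map each keyframe image token to the annotations that point to it."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for ann in annotations:
        groups[ann["sample_data_token"]].append(ann)
    return dict(groups)


anns = [
    {"token": "a1", "sample_data_token": "img1"},
    {"token": "a2", "sample_data_token": "img2"},
    {"token": "a3", "sample_data_token": "img1"},
]
print([a["token"] for a in group_by_image(anns)["img1"]])  # ['a1', 'a3']
```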
diff --git a/docs/schema_nuscenes.md b/docs/schema_nuscenes.md
@@ -5,25 +5,24 @@ All annotations and meta data (including calibration, maps, vehicle coordinates
 The database tables are listed below.
 Every row can be identified by its unique primary key `token`.
 Foreign keys such as `sample_token` may be used to link to the `token` of the table `sample`.
 Please refer to the [tutorial](https://www.nuscenes.org/tutorial) for an introduction to the most important database tables.
 
 ![](https://www.nuscenes.org/public/images/nuscenes-schema.svg)
 
 attribute
 ---------
-
 An attribute is a property of an instance that can change while the category remains the same.
- Example: a vehicle being parked/stopped/moving, and whether or not a bicycle has a rider.
+Example: a vehicle being parked/stopped/moving, and whether or not a bicycle has a rider.
 ```
 attribute {
    "token":                   <str> -- Unique record identifier.
    "name":                    <str> -- Attribute name.
    "description":             <str> -- Attribute description.
 }
 ```
+
 calibrated_sensor
 ---------
-
 Definition of a particular sensor (lidar/radar/camera) as calibrated on a particular vehicle.
 All extrinsic parameters are given with respect to the ego vehicle body frame.
 All camera images come undistorted and rectified.
@@ -36,11 +35,11 @@ calibrated_sensor {
    "camera_intrinsic":        <float> [3, 3] -- Intrinsic camera calibration. Empty for sensors that are not cameras.
 }
 ```
+
 category
 ---------
-
 Taxonomy of object categories (e.g. vehicle, human). 
-Subcategories are delineated by a period (e.g. human.pedestrian.adult).
+Subcategories are delineated by a period (e.g. `human.pedestrian.adult`).
 ```
 category {
    "token":                   <str> -- Unique record identifier.
@@ -49,9 +48,9 @@ category {
    "index":                   <int> -- The index of the label used for efficiency reasons in the .bin label files of nuScenes-lidarseg. This field did not exist previously.
 }
 ```
+
 ego_pose
 ---------
-
 Ego vehicle pose at a particular timestamp. Given with respect to global coordinate system of the log's map.
 The ego_pose is the output of a lidar map-based localization algorithm described in our paper.
 The localization is 2-dimensional in the x-y plane.
@@ -63,24 +62,24 @@ ego_pose {
    "timestamp":               <int> -- Unix time stamp.
 }
 ```
+
 instance
 ---------
-
 An object instance, e.g. particular vehicle.
 This table is an enumeration of all object instances we observed.
 Note that instances are not tracked across scenes.
 ```
 instance {
    "token":                   <str> -- Unique record identifier.
-   "category_token":          <str> -- Foreign key. Object instance category.
+   "category_token":          <str> -- Foreign key pointing to the object category.
    "nbr_annotations":         <int> -- Number of annotations of this instance.
    "first_annotation_token":  <str> -- Foreign key. Points to the first annotation of this instance.
    "last_annotation_token":   <str> -- Foreign key. Points to the last annotation of this instance.
 }
 ```
+
 lidarseg
 ---------
-
 Mapping between nuScenes-lidarseg annotations and sample_datas corresponding to the lidar pointcloud associated with a keyframe. 
 ```
 lidarseg {
@@ -89,9 +88,9 @@ lidarseg {
    "sample_data_token":       <str> -- Foreign key. Sample_data corresponding to the annotated lidar pointcloud with is_key_frame=True. 
 }
 ```
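The sketch below shows how the `index` field of `category` can translate a nuScenes-lidarseg `.bin` label buffer into category names. It assumes that each point's label is stored as one unsigned 8-bit value, so the raw bytes (for example, from `open(path, "rb").read()`) can be iterated directly. The function name and the indices in the toy records are hypothetical.

```python
from typing import Dict, List


def decode_lidarseg(raw: bytes, categories: List[dict]) -> List[str]:
    """Translate per-point uint8 labels into category names via category.index."""
    index_to_name: Dict[int, str] = {c["index"]: c["name"] for c in categories}
    # Iterating over a bytes object yields ints in the range 0-255.
    return [index_to_name[label] for label in raw]


# Toy category records; the indices here are illustrative only.
categories = [{"index": 0, "name": "noise"}, {"index": 1, "name": "flat.driveable_surface"}]
print(decode_lidarseg(bytes([0, 1, 0]), categories))  # ['noise', 'flat.driveable_surface', 'noise']
```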
+
 log
 ---------
-
 Information about the log from which the data was extracted.
 ```
 log {
@@ -102,9 +101,9 @@ log {
    "location":                <str> -- Area where log was captured, e.g. singapore-onenorth.
 }
 ```
+
 map
 ---------
-
 Map data that is stored as binary semantic masks from a top-down view.
 ```
 map {
@@ -114,10 +113,11 @@ map {
    "filename":                <str> -- Relative path to the file with the map mask.
 }
 ```
+
 sample
 ---------
-
-A sample is data collected at (approximately) the same timestamp as part of a single LIDAR sweep.
+A sample is an annotated keyframe at 2 Hz.
+The data is collected at (approximately) the same timestamp as part of a single LIDAR sweep.
 ```
 sample {
    "token":                   <str> -- Unique record identifier.
@@ -127,17 +127,17 @@ sample {
    "prev":                    <str> -- Foreign key. Sample that precedes this in time. Empty if start of scene.
 }
 ```
+
 sample_annotation
 ---------
-
 A bounding box defining the position of an object seen in a sample.
 All location data is given with respect to the global coordinate system.
 ```
 sample_annotation {
    "token":                   <str> -- Unique record identifier.
    "sample_token":            <str> -- Foreign key. NOTE: this points to a sample NOT a sample_data since annotations are done on the sample level taking all relevant sample_data into account.
    "instance_token":          <str> -- Foreign key. Which object instance is this annotating. An instance can have multiple annotations over time.
-   "attribute_tokens":        <str> [n] -- Foreign keys. List of attributes for this annotation. Attributes can change over time, so they belong here, not in the object table.
+   "attribute_tokens":        <str> [n] -- Foreign keys. List of attributes for this annotation. Attributes can change over time, so they belong here, not in the instance table.
    "visibility_token":        <str> -- Foreign key. Visibility may also change over time. If no visibility is annotated, the token is an empty string.
    "translation":             <float> [3] -- Bounding box location in meters as center_x, center_y, center_z.
    "size":                    <float> [3] -- Bounding box size in meters as width, length, height.
@@ -148,9 +148,9 @@ sample_annotation {
    "prev":                    <str> -- Foreign key. Sample annotation from the same object instance that precedes this in time. Empty if this is the first annotation for this object.
 }
 ```
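Each `sample_annotation` links to the next annotation of the same instance, so an object's full track can be rebuilt from the `instance` table. Here is a minimal sketch. It assumes the tables are dicts keyed by `token` and that `next` is an empty string on the last annotation. It also cross-checks `nbr_annotations` and `last_annotation_token`. The function name and toy tokens are hypothetical.

```python
from typing import Dict, List


def instance_track(instance: dict, sample_annotation: Dict[str, dict]) -> List[str]:
    """Return the annotation tokens of one instance in time order."""
    tokens: List[str] = []
    token = instance["first_annotation_token"]
    while token:
        tokens.append(token)
        token = sample_annotation[token]["next"]
    # Sanity checks against the fields stored in the instance table.
    if len(tokens) != instance["nbr_annotations"]:
        raise ValueError("annotation count does not match nbr_annotations")
    if tokens and tokens[-1] != instance["last_annotation_token"]:
        raise ValueError("chain does not end at last_annotation_token")
    return tokens


anns = {"t1": {"next": "t2"}, "t2": {"next": "t3"}, "t3": {"next": ""}}
inst = {"first_annotation_token": "t1", "last_annotation_token": "t3", "nbr_annotations": 3}
print(instance_track(inst, anns))  # ['t1', 't2', 't3']
```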
+
 sample_data
 ---------
-
 A sensor data e.g. image, point cloud or radar return. 
 For sample_data with is_key_frame=True, the time-stamps should be very close to the sample it points to.
 For non key-frames the sample_data points to the sample that follows closest in time.
@@ -170,9 +170,9 @@ sample_data {
    "prev":                    <str> -- Foreign key. Sample data from the same sensor that precedes this in time. Empty if start of scene.
 }
 ```
+
 scene
 ---------
-
 A scene is a 20s long sequence of consecutive frames extracted from a log. 
 Multiple scenes can come from the same log. 
 Note that object identities (instance tokens) are not preserved across scenes.
@@ -187,9 +187,9 @@ scene {
    "last_sample_token":       <str> -- Foreign key. Points to the last sample in scene.
 }
 ```
+
 sensor
 ---------
-
 A specific sensor type.
 ```
 sensor {
@@ -198,9 +198,9 @@ sensor {
    "modality":                <str> {camera, lidar, radar} -- Sensor modality. Supports category(ies) in brackets.
 }
 ```
+
 visibility
 ---------
-
 The visibility of an instance is the fraction of annotation visible in all 6 images. Binned into 4 bins 0-40%, 40-60%, 60-80% and 80-100%.
 ```
 visibility {