nuImages videos (#432)
* Fix automatic test discover and disable test outputs

* Add nuImages schema

* Add function to render videos

* Unified image and video rendering

* Further improvements to render script

* Fix bug around unchanged mode

* Check number of sample_datas

* Add new function to tutorial

* Add sort order argument to list_categories

* Fix sorting

* Garbage collection to avoid memory overflow

* Format

* Reorganize render_images, add render_rare_classes

* Rename has_rider attribute in test

* Minor fix for tests on test set

* Replace val with mini

* Address review comments
holger-motional authored Jul 13, 2020
1 parent ac4bbb7 commit 71c0f2b
Showing 15 changed files with 559 additions and 134 deletions.
1 change: 1 addition & 0 deletions docs/instructions_nuimages.md
@@ -0,0 +1 @@
TODO: Coming soon!
162 changes: 160 additions & 2 deletions docs/schema_nuimages.md
@@ -1,6 +1,164 @@
nuImages schema
==========
This document describes the database schema used in nuImages.
All annotations and metadata (including calibration, maps, vehicle coordinates etc.) are stored in a relational database.
The database tables are listed below.
Every row can be identified by its unique primary key `token`.
Foreign keys such as `sample_token` may be used to link to the `token` of the table `sample`.
Please refer to the [tutorial](https://www.nuscenes.org/nuimages#tutorial) for an introduction to the most important database tables.
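
To make the token and foreign-key convention concrete, here is a minimal sketch in plain Python. It does not use the devkit: the two tiny tables, their token values and the helper names `build_index` and `get_sample_for` are hypothetical and only illustrate how a foreign key such as `sample_token` resolves to a row of the table `sample`.

```python
# Hypothetical miniature tables. Real tables would be loaded from the dataset files.
sample = [
    {"token": "sample_a", "timestamp": 100},
    {"token": "sample_b", "timestamp": 200},
]
sample_data = [
    {"token": "sd_1", "sample_token": "sample_a"},
    {"token": "sd_2", "sample_token": "sample_b"},
    {"token": "sd_3", "sample_token": "sample_b"},
]


def build_index(table):
    """Map each row's primary key `token` to the row itself."""
    return {row["token"]: row for row in table}


# One index per table turns every foreign-key lookup into a dict access.
sample_index = build_index(sample)


def get_sample_for(sample_data_row):
    """Follow the `sample_token` foreign key into the `sample` table."""
    return sample_index[sample_data_row["sample_token"]]


linked = get_sample_for(sample_data[2])
print(linked["token"], linked["timestamp"])
```

Building the index once per table keeps repeated lookups cheap, which matters when a table holds many rows.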

![](https://www.nuscenes.org/public/images/nuimages-schema.svg)

attribute
---------
An attribute is a property of an instance that can change while the category remains the same.
Example: a vehicle being parked/stopped/moving, and whether or not a bicycle has a rider.
The attributes in nuImages are a superset of those in nuScenes.
```
attribute {
"token": <str> -- Unique record identifier.
"name": <str> -- Attribute name.
"description": <str> -- Attribute description.
}
```

calibrated_sensor
---------
Definition of a particular sensor (lidar/camera, but no radar) as calibrated on a particular vehicle.
All extrinsic parameters are given with respect to the ego vehicle body frame.
All camera images come undistorted and rectified.
```
calibrated_sensor {
"token": <str> -- Unique record identifier.
"sensor_token": <str> -- Foreign key pointing to the sensor type.
"translation": <float> [3] -- Coordinate system origin in meters: x, y, z.
"rotation": <float> [4] -- Coordinate system orientation as quaternion: w, x, y, z.
"camera_intrinsic": <float> [3, 3] -- Intrinsic camera calibration. Empty for sensors that are not cameras.
"camera_distortion": <float> [5 or 6] -- Camera calibration parameters. We use the 5 parameter camera convention of the CalTech camera calibration toolbox, that is also used in OpenCV. Only for fish-eye lenses in CAM_BACK do we use the 6th parameter.
}
```
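
As an illustration of how `translation` and `rotation` might be used together, the sketch below converts the quaternion (w, x, y, z) into a rotation matrix and maps a point from the sensor frame into the ego vehicle body frame. It assumes the convention `p_ego = R * p_sensor + t`; the function names and the example record are hypothetical, and only the standard library is used.

```python
import math


def quaternion_to_rotation_matrix(q):
    """Convert a quaternion given as (w, x, y, z) into a 3x3 rotation matrix."""
    w, x, y, z = q
    # Normalize first so that slightly off-unit quaternions still give a rotation.
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def sensor_to_ego(point, calibrated_sensor):
    """Rotate, then translate a 3D point (assumed convention: p_ego = R p + t)."""
    rot = quaternion_to_rotation_matrix(calibrated_sensor["rotation"])
    trans = calibrated_sensor["translation"]
    return [sum(rot[i][j] * point[j] for j in range(3)) + trans[i] for i in range(3)]


# Hypothetical record: sensor yawed by 90 degrees and offset from the ego origin.
half = math.sqrt(0.5)
record = {"translation": [1.0, 2.0, 3.0], "rotation": [half, 0.0, 0.0, half]}
print(sensor_to_ego([1.0, 0.0, 0.0], record))
```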

category
---------
Taxonomy of object categories (e.g. vehicle, human).
Subcategories are delineated by a period (e.g. `human.pedestrian.adult`).
The categories in nuImages are the same as in nuScenes (without lidarseg), plus `flat.driveable_surface`.
```
category {
"token": <str> -- Unique record identifier.
"name": <str> -- Category name. Subcategories indicated by period.
"description": <str> -- Category description.
}
```
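
Because subcategories are encoded in the `name` string, the hierarchy can be recovered by splitting on the period. The following is a small sketch with hypothetical helper names (`category_levels`, `is_within`), not part of the devkit.

```python
def category_levels(name):
    """Split a category name into its hierarchy levels."""
    return name.split(".")


def is_within(name, ancestor):
    """True if `name` equals `ancestor` or is one of its subcategories."""
    # Appending the period avoids matching unrelated names that share a prefix.
    return name == ancestor or name.startswith(ancestor + ".")


print(category_levels("human.pedestrian.adult"))
print(is_within("human.pedestrian.adult", "human.pedestrian"))
print(is_within("flat.driveable_surface", "human"))
```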

ego_pose
---------
Ego vehicle pose at a particular timestamp. Given with respect to the global coordinate system of the log's map.
The ego_pose is the output of a lidar map-based localization algorithm described in our paper.
The localization is 2-dimensional in the x-y plane.
Warning: nuImages is collected from almost 500 logs with different map versions.
Therefore the coordinates **should not be compared across logs** or rendered on the semantic maps of nuScenes.
```
ego_pose {
"token": <str> -- Unique record identifier.
"translation": <float> [3] -- Coordinate system origin in meters: x, y, z. Note that z is always 0.
"rotation": <float> [4] -- Coordinate system orientation as quaternion: w, x, y, z.
"timestamp": <int> -- Unix time stamp.
"rotation_rate": <float> [3] -- The angular velocity vector (x, y, z) of the vehicle in rad/s. This is expressed in the ego vehicle frame.
"acceleration": <float> [3] -- Acceleration vector (x, y, z) in the ego vehicle frame in m/s/s. The z value is close to the gravitational acceleration `g = 9.81 m/s/s`.
"speed": <float> -- The speed of the ego vehicle in the driving direction in m/s.
}
```
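
Since the localization is 2-dimensional in the x-y plane, the heading (yaw) is often the only part of `rotation` that is needed. The sketch below extracts it from the quaternion (w, x, y, z) with the standard yaw formula; the function name and the example record are hypothetical.

```python
import math


def yaw_from_quaternion(q):
    """Return the rotation about the z axis in radians for q = (w, x, y, z)."""
    w, x, y, z = q
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))


# Hypothetical ego_pose record: vehicle rotated by 90 degrees about z.
half = math.sqrt(0.5)
ego_pose = {"translation": [10.0, 20.0, 0.0], "rotation": [half, 0.0, 0.0, half]}

yaw = yaw_from_quaternion(ego_pose["rotation"])
print(math.degrees(yaw))
```

As noted in the warning above, a pose obtained this way is only meaningful relative to other poses from the same log.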

log
---------
Information about the log from which the data was extracted.
```
log {
"token": <str> -- Unique record identifier.
"logfile": <str> -- Log file name.
"vehicle": <str> -- Vehicle name.
"date_captured": <str> -- Date (YYYY-MM-DD).
"location": <str> -- Area where log was captured, e.g. singapore-onenorth.
}
```

object_ann
---------
The annotation of a foreground object (car, bike, pedestrian) in an image.
Each foreground object is annotated with a 2d box, a 2d instance mask and category-specific attributes.
```
object_ann {
"token": <str> -- Unique record identifier.
"sample_data_token": <str> -- Foreign key pointing to the sample data, which must be a keyframe image.
"category_token": <str> -- Foreign key pointing to the object category.
"attribute_tokens": <str> [n] -- Foreign keys. List of attributes for this annotation.
"bbox": <int> [4] -- Annotated amodal bounding box. Given as [xmin, ymin, xmax, ymax].
"mask": <RLE> -- Run length encoding of instance mask using the pycocotools package.
}
```
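
The `bbox` field uses corner coordinates, whereas some tools expect `[x, y, width, height]`. A minimal conversion sketch follows; the helper names and the example box are hypothetical, and the sketch assumes that width and height are simply the coordinate differences, which the schema does not state explicitly.

```python
def bbox_to_xywh(bbox):
    """Convert [xmin, ymin, xmax, ymax] into [x, y, width, height]."""
    xmin, ymin, xmax, ymax = bbox
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def bbox_area(bbox):
    """Area in square pixels, assuming extents are coordinate differences."""
    _, _, width, height = bbox_to_xywh(bbox)
    return width * height


# Hypothetical annotation record with only the field used here.
object_ann = {"bbox": [100, 50, 180, 90]}
print(bbox_to_xywh(object_ann["bbox"]))
print(bbox_area(object_ann["bbox"]))
```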

sample_data
---------
A sensor data record, e.g. an image or a lidar pointcloud. Note that we don't have radar in nuImages.
Sample_data covers all sensor data, regardless of whether it is a keyframe or not.
For every keyframe image or lidar, we also include up to 6 past and 6 future sweeps at 2 Hz.
We can navigate between consecutive lidar or camera sample_datas using the `prev` and `next` pointers.
Only keyframe (sample) images are annotated.
The sample timestamp is inherited from the keyframe camera sample_data timestamp.
```
sample_data {
"token": <str> -- Unique record identifier.
"sample_token": <str> -- Foreign key. Sample to which this sample_data is associated.
"ego_pose_token": <str> -- Foreign key.
"calibrated_sensor_token": <str> -- Foreign key.
"filename": <str> -- Relative path to data-blob on disk.
"fileformat": <str> -- Data file format.
"width": <int> -- If the sample data is an image, this is the image width in pixels.
"height": <int> -- If the sample data is an image, this is the image height in pixels.
"timestamp": <int> -- Unix time stamp.
"is_key_frame": <bool> -- True if sample_data is part of a keyframe, else False.
"next": <str> -- Foreign key. Sample data from the same sensor that follows this in time. Empty if end of scene.
"prev": <str> -- Foreign key. Sample data from the same sensor that precedes this in time. Empty if start of scene.
}
```
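
Navigating with the `prev` and `next` pointers can be sketched as a linked-list walk. The example below is not devkit code: the table contents and the function name `collect_sequence` are hypothetical, and it assumes that an "empty" pointer is stored as an empty string.

```python
def collect_sequence(sample_data_index, token):
    """Return all tokens of the sequence containing `token`, in temporal order."""
    row = sample_data_index[token]
    # Rewind to the first sample_data of the sequence.
    while row["prev"]:
        row = sample_data_index[row["prev"]]
    # Walk forward until the `next` pointer is empty.
    ordered = [row["token"]]
    while row["next"]:
        row = sample_data_index[row["next"]]
        ordered.append(row["token"])
    return ordered


# Hypothetical three-frame camera sequence with the keyframe in the middle.
sample_data_index = {
    "sd_1": {"token": "sd_1", "prev": "", "next": "sd_2", "is_key_frame": False},
    "sd_2": {"token": "sd_2", "prev": "sd_1", "next": "sd_3", "is_key_frame": True},
    "sd_3": {"token": "sd_3", "prev": "sd_2", "next": "", "is_key_frame": False},
}

print(collect_sequence(sample_data_index, "sd_2"))
```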

sample
---------
A sample is an annotated keyframe selected from a large pool of images in a log.
Every sample has up to 13 lidar sample_datas and 13 camera sample_datas corresponding to it.
These include the actual lidar and camera keyframe sample_datas, which can be accessed via the `key_*_token` fields.
```
sample {
"token": <str> -- Unique record identifier.
"timestamp": <int> -- Unix time stamp.
"log_token": <str> -- Foreign key pointing to the log.
"key_camera_token": <str> -- Foreign key of the sample_data corresponding to the camera keyframe.
"key_lidar_token": <str> -- Foreign key of the sample_data corresponding to the lidar keyframe.
}
```

sensor
---------
A specific sensor type.
```
sensor {
"token": <str> -- Unique record identifier.
"channel": <str> -- Sensor channel name.
"modality": <str> {camera, lidar} -- Sensor modality. Supported values are listed in braces.
}
```

surface_ann
---------
The annotation of a background object (driveable surface) in an image.
Each background object is annotated with a 2d semantic segmentation mask.
```
surface_ann {
"token": <str> -- Unique record identifier.
"sample_data_token": <str> -- Foreign key pointing to the sample data, which must be a keyframe image.
"category_token": <str> -- Foreign key pointing to the surface category.
"mask": <RLE> -- Run length encoding of segmentation mask using the pycocotools package.
}
```
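
To illustrate the idea behind run length encoding, the sketch below decodes an uncompressed list of run lengths into a binary mask. It assumes the COCO convention of alternating runs of 0s and 1s, starting with 0s, laid out in column-major order. It does not parse the compact encoded form stored in the dataset, for which the pycocotools package is intended; the function name and the example values are hypothetical.

```python
def decode_uncompressed_rle(counts, height, width):
    """Expand run lengths into a height x width mask stored as a list of rows."""
    flat = []
    value = 0  # Runs alternate between 0 and 1, beginning with 0.
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("run lengths do not add up to height * width")
    # Pixels are ordered column by column, so index as column * height + row.
    return [[flat[col * height + row] for col in range(width)] for row in range(height)]


mask = decode_uncompressed_rle([1, 2, 3], height=2, width=3)
for row in mask:
    print(row)
```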
40 changes: 20 additions & 20 deletions docs/schema_nuscenes.md
@@ -5,25 +5,24 @@ All annotations and meta data (including calibration, maps, vehicle coordinates
The database tables are listed below.
Every row can be identified by its unique primary key `token`.
Foreign keys such as `sample_token` may be used to link to the `token` of the table `sample`.
Please refer to the [tutorial](https://www.nuscenes.org/tutorial) for an introduction to the most important database tables.
Please refer to the [tutorial](https://www.nuscenes.org/nuimages#tutorial) for an introduction to the most important database tables.

![](https://www.nuscenes.org/public/images/nuscenes-schema.svg)

attribute
---------

An attribute is a property of an instance that can change while the category remains the same.
Example: a vehicle being parked/stopped/moving, and whether or not a bicycle has a rider.
Example: a vehicle being parked/stopped/moving, and whether or not a bicycle has a rider.
```
attribute {
"token": <str> -- Unique record identifier.
"name": <str> -- Attribute name.
"description": <str> -- Attribute description.
}
```

calibrated_sensor
---------

Definition of a particular sensor (lidar/radar/camera) as calibrated on a particular vehicle.
All extrinsic parameters are given with respect to the ego vehicle body frame.
All camera images come undistorted and rectified.
@@ -36,11 +35,11 @@ calibrated_sensor {
"camera_intrinsic": <float> [3, 3] -- Intrinsic camera calibration. Empty for sensors that are not cameras.
}
```

category
---------

Taxonomy of object categories (e.g. vehicle, human).
Subcategories are delineated by a period (e.g. human.pedestrian.adult).
Subcategories are delineated by a period (e.g. `human.pedestrian.adult`).
```
category {
"token": <str> -- Unique record identifier.
@@ -49,9 +48,9 @@ category {
"index": <int> -- The index of the label used for efficiency reasons in the .bin label files of nuScenes-lidarseg. This field did not exist previously.
}
```

ego_pose
---------

Ego vehicle pose at a particular timestamp. Given with respect to global coordinate system of the log's map.
The ego_pose is the output of a lidar map-based localization algorithm described in our paper.
The localization is 2-dimensional in the x-y plane.
@@ -63,24 +62,24 @@ ego_pose {
"timestamp": <int> -- Unix time stamp.
}
```

instance
---------

An object instance, e.g. particular vehicle.
This table is an enumeration of all object instances we observed.
Note that instances are not tracked across scenes.
```
instance {
"token": <str> -- Unique record identifier.
"category_token": <str> -- Foreign key. Object instance category.
"category_token": <str> -- Foreign key pointing to the object category.
"nbr_annotations": <int> -- Number of annotations of this instance.
"first_annotation_token": <str> -- Foreign key. Points to the first annotation of this instance.
"last_annotation_token": <str> -- Foreign key. Points to the last annotation of this instance.
}
```

lidarseg
---------

Mapping between nuScenes-lidarseg annotations and sample_datas corresponding to the lidar pointcloud associated with a keyframe.
```
lidarseg {
@@ -89,9 +88,9 @@ lidarseg {
"sample_data_token": <str> -- Foreign key. Sample_data corresponding to the annotated lidar pointcloud with is_key_frame=True.
}
```

log
---------

Information about the log from which the data was extracted.
```
log {
@@ -102,9 +101,9 @@ log {
"location": <str> -- Area where log was captured, e.g. singapore-onenorth.
}
```

map
---------

Map data that is stored as binary semantic masks from a top-down view.
```
map {
@@ -114,10 +113,11 @@ map {
"filename": <str> -- Relative path to the file with the map mask.
}
```

sample
---------

A sample is data collected at (approximately) the same timestamp as part of a single LIDAR sweep.
A sample is an annotated keyframe at 2 Hz.
The data is collected at (approximately) the same timestamp as part of a single LIDAR sweep.
```
sample {
"token": <str> -- Unique record identifier.
@@ -127,17 +127,17 @@ sample {
"prev": <str> -- Foreign key. Sample that precedes this in time. Empty if start of scene.
}
```

sample_annotation
---------

A bounding box defining the position of an object seen in a sample.
All location data is given with respect to the global coordinate system.
```
sample_annotation {
"token": <str> -- Unique record identifier.
"sample_token": <str> -- Foreign key. NOTE: this points to a sample NOT a sample_data since annotations are done on the sample level taking all relevant sample_data into account.
"instance_token": <str> -- Foreign key. Which object instance is this annotating. An instance can have multiple annotations over time.
"attribute_tokens": <str> [n] -- Foreign keys. List of attributes for this annotation. Attributes can change over time, so they belong here, not in the object table.
"attribute_tokens": <str> [n] -- Foreign keys. List of attributes for this annotation. Attributes can change over time, so they belong here, not in the instance table.
"visibility_token": <str> -- Foreign key. Visibility may also change over time. If no visibility is annotated, the token is an empty string.
"translation": <float> [3] -- Bounding box location in meters as center_x, center_y, center_z.
"size": <float> [3] -- Bounding box size in meters as width, length, height.
@@ -148,9 +148,9 @@ sample_annotation {
"prev": <str> -- Foreign key. Sample annotation from the same object instance that precedes this in time. Empty if this is the first annotation for this object.
}
```

sample_data
---------

A sensor data e.g. image, point cloud or radar return.
For sample_data with is_key_frame=True, the time-stamps should be very close to the sample it points to.
For non key-frames the sample_data points to the sample that follows closest in time.
@@ -170,9 +170,9 @@ sample_data {
"prev": <str> -- Foreign key. Sample data from the same sensor that precedes this in time. Empty if start of scene.
}
```

scene
---------

A scene is a 20s long sequence of consecutive frames extracted from a log.
Multiple scenes can come from the same log.
Note that object identities (instance tokens) are not preserved across scenes.
@@ -187,9 +187,9 @@ scene {
"last_sample_token": <str> -- Foreign key. Points to the last sample in scene.
}
```

sensor
---------

A specific sensor type.
```
sensor {
@@ -198,9 +198,9 @@ sensor {
"modality": <str> {camera, lidar, radar} -- Sensor modality. Supports category(ies) in brackets.
}
```

visibility
---------

The visibility of an instance is the fraction of annotation visible in all 6 images. Binned into 4 bins 0-40%, 40-60%, 60-80% and 80-100%.
```
visibility {