article about rosbags in ROS2 #160

Open · wants to merge 4 commits into base branch gh-pages
Conversation

Karsten1987
Contributor

This PR is the place to gather feedback for the design doc about rosbags in ROS 2.0.

The document is currently structured so that it offers a couple of "Alternatives", both for the requirements for rosbags and for the underlying data format to be used.
Eventually, once the discussion comes to a consensus, these alternatives will become part of the fixed requirements and the proposed data format.

@Karsten1987 added the website, more-information-needed, and help wanted labels on Jan 10, 2018
@Karsten1987 self-assigned this on Jan 10, 2018
@allenh1

allenh1 commented Jan 10, 2018

I was thinking about this recently myself, and since you started writing this, I'll tag it on here.

I thought it might be cool to introduce a layer of abstraction such that the user can provide functions to read from their own format; ros2 bag would then simply use those functions when publishing its messages.

from ros2bag.message_layer import MessageLayer
import ros2bag

class WeirdFormatPlayer:
    def __init__(self):
        self.convert_layer = MessageLayer()
        # override with a custom way to get the next message
        # (bind the method itself rather than calling it)
        self.convert_layer.get_next = self.get_next_message

    def get_next_message(self):
        # reads the next message from our weird format and returns it
        return read_next_message_from_weird_format()

# somewhere in main
WeirdFormatPlayer()
ros2bag.spin()

Given that this is hideous pseudocode, this abstract silliness might help someone rephrase this to something useful (or me when I have a few minutes).

Commits: sqlite · more pros and cons · restructure
#### Parallel I/O
Processing time increases with slow file I/O.
In order to provide efficient data processing, a parallel read and write to the file from multiple processes should be available.
This would allow multiple processes (e.g. one per sensor) directly write to a commonly shared bag file without having a single recoding instance subscribing to all topics..


Small typo: two periods at the end of the sentence.

@dirk-thomas dirk-thomas added the in progress Actively being worked on (Kanban column) label Jan 18, 2018
A third alternative which provides capabilities for data logging, and is used as such, is [SQLite](https://www.sqlite.org/about.html).
Unlike other relational database systems, it doesn't require any external SQL server, but is self-contained.
It is also open source, [extensively tested](https://www.sqlite.org/testing.html) and known to have a large [community](https://www.sqlite.org/support.html).
The Gazebo team created a nicely composed [comparison](https://osrfoundation.atlassian.net/wiki/spaces/GAZ/pages/99844178/SQLite3+Proposal) between SQLite and the existing rosbag format.


RTI's recording tools are also based on SQLite, and seem to work pretty well. You might want to experiment with them as part of your research.
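For concreteness, here is a minimal sketch of what topic-keyed message logging on top of SQLite could look like; the schema and table names are invented for illustration and are not part of any proposal:

import sqlite3

conn = sqlite3.connect("example_bag.db3")
conn.executescript("""
CREATE TABLE IF NOT EXISTS topics (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    type TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY,
    topic_id INTEGER NOT NULL REFERENCES topics(id),
    time_recv_utc INTEGER NOT NULL,
    message BLOB NOT NULL);
CREATE INDEX IF NOT EXISTS idx_messages_time ON messages(time_recv_utc);
""")

def record(topic, msg_type, stamp_ns, serialized_bytes):
    # Insert the topic lazily, then append the serialized message blob.
    conn.execute("INSERT OR IGNORE INTO topics(name, type) VALUES (?, ?)", (topic, msg_type))
    topic_id = conn.execute("SELECT id FROM topics WHERE name = ?", (topic,)).fetchone()[0]
    conn.execute("INSERT INTO messages(topic_id, time_recv_utc, message) VALUES (?, ?, ?)",
                 (topic_id, stamp_ns, serialized_bytes))
    conn.commit()

record("/imu", "sensor_msgs/Imu", 1515580800000000000, b"\x00\x01\x02")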

@jack-oquin

@katiewasnothere: some of these ideas may be relevant to your project.

@katiewasnothere

The Building Wide Intelligence Lab at UT Austin is currently looking into data loss issues with rosbag in ROS 1.0. Our goal is to be able to maintain the original topic publishing frequency within a rosbag so that there’s little to no data loss (minimal messages dropped and minimal to no compression).

For example, using the current implementation in ROS 1.0, our empirical data shows that raw messages from a Kinect camera topic in a simulated environment typically produce a publishing frequency of about 19.942 Hz. However, when all topics in the simulation environment are recorded to a rosbag (some 24 ROS topics ranging in frequency from about 200 Hz to 1 Hz), a significant number of messages are dropped, reducing the publishing rate when the rosbag is played back to roughly 6.375 Hz. The ability to save all topic messages for a given scenario can greatly help with training learning algorithms as well as with debugging failures. It would be nice to have functionality in ROS 2.0's future rosbag implementation to prevent such extreme data loss.

@ruffsl
Member

ruffsl commented Apr 17, 2018

This might be a lofty goal, but I think it would be useful for auditing purposes to have an optional storage format whose recording behaves like an append-only file system when writing to disk. Think of rosbag recording in autonomous cars or other commercial applications.

I'm currently making an audit framework for creating existence proofs for appended log files, thus rendering them immutable or at least cryptographically verifiable on the fly. I'm not yet sure whether checkpointing log events (or in the case of rosbag: messages or chunks of messages) via SQLite is feasible (I haven't yet looked into it deeply), but doing so with the ROS 1 v2 bagfile binary format is not so practical, as the file's index changes bitwise on disk in an out-of-order fashion with respect to when bytes are first written. For example, key=value pairs in the bagfile's index header near the start of the file are overwritten whenever a message with a novel connection ID is received, since the indexing is connection-oriented.

This header structure in the ROS 1 v2 bagfile of course speeds up the traversal of large binary data with meta information, but its use of random write operations makes it hard to protect growing recording files from malicious mutation or accidental corruption, since a digest of the file's written bytes cannot be chained to provide a time-series proof of existence, e.g. compounding HMAC computations on log writes.
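To make the chained-digest idea concrete, a small sketch (my own illustration, not tied to any particular bag format) of compounding an HMAC over appended chunks, so that any later mutation of earlier bytes breaks the chain:

import hashlib
import hmac

class ChainedLog:
    """Append-only log where each chunk's tag is chained to the previous tag."""

    def __init__(self, key):
        self._key = key
        self._prev_tag = b"\x00" * 32   # genesis value
        self.entries = []               # stand-in for what gets written to disk

    def append(self, chunk):
        # The tag covers the previous tag plus the new chunk, so the sequence of
        # tags forms a verifiable chain over everything written so far.
        tag = hmac.new(self._key, self._prev_tag + chunk, hashlib.sha256).digest()
        self.entries.append((chunk, tag))
        self._prev_tag = tag
        return tag

    def verify(self, key):
        prev = b"\x00" * 32
        for chunk, tag in self.entries:
            expected = hmac.new(key, prev + chunk, hashlib.sha256).digest()
            if not hmac.compare_digest(expected, tag):
                return False
            prev = tag
        return True

log = ChainedLog(b"recorder-secret")
log.append(b"message chunk 1")
log.append(b"message chunk 2")
assert log.verify(b"recorder-secret")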

Perhaps this is a discussion for a specific storage plugin for immutable data stores:
http://usblogs.pwc.com/emerging-technology/the-rise-of-immutable-data-stores/

but I just wanted to mention this as data storage formats are touched upon in this PR.

@gbiggs
Member

gbiggs commented Apr 18, 2018

I think it would be useful for auditing purposes to have an optional storage format whose recording behaves like an append-only file system when writing to disk.

Big vote of support for this idea.

@gbiggs
Member

gbiggs commented Apr 18, 2018

The Building Wide Intelligence Lab at UT Austin is currently looking into data loss issues with rosbag in ROS 1.0. Our goal is to be able to maintain the original topic publishing frequency within a rosbag so that there’s little to no data loss (minimal messages dropped and minimal to no compression).

This is as much about the tool implementation as it is about the format. A lossless rosbag would need to provide timing (possibly hard real-time) and bandwidth guarantees about how much data it can save. It may need to use strategic buffering, it may need to leverage OS support for buffers, it would need to understand the storage medium and that medium's capabilities, it would probably need to make configurable decisions about what gets dropped if there is not enough storage bandwidth available, and probably a whole pile of things I haven't thought of because I'm not a storage expert.
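As a rough illustration of the buffering point only (a toy sketch, not a proposal for the actual recorder), a recorder could decouple subscription callbacks from disk writes with a bounded queue and an explicit, configurable drop policy:

import queue
import threading

class BufferedWriter:
    def __init__(self, write_fn, max_buffered=1024):
        self._queue = queue.Queue(maxsize=max_buffered)
        self._write_fn = write_fn
        self.dropped = 0
        threading.Thread(target=self._drain, daemon=True).start()

    def on_message(self, msg):
        # Called from the subscription callback; must never block the callback.
        try:
            self._queue.put_nowait(msg)
        except queue.Full:
            self.dropped += 1   # policy decision: count and drop, warn, block, ...

    def _drain(self):
        while True:
            self._write_fn(self._queue.get())   # slow disk writes happen here

writer = BufferedWriter(write_fn=lambda msg: None)   # write_fn would hit the storage backend
writer.on_message(b"serialized message")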

In short, it would make a pretty interesting research project and the result would be massively useful to people like @ruffsl and the wider ROS community.

It would be worth looking at the logging tools used in the financial domain. They have to log absolutely everything exactly as it happened and have really tight timing requirements.

@paulbovbel

paulbovbel commented May 5, 2018

A notable issue with ROS1 bagging as a data recording approach is that it creates a parallel communication channel for the monitored topics, and is not always an accurate representation of data flow in the system being monitored.

In ROS1, where rosbag record subscribes to topics via the standard pub/sub API, two bag recorders running on the same graph may end up with different sets of data. However, most traffic in ROS1 graphs is 'reliable'. I imagine this issue could be much more common in ROS2 due to the first-class support for 'best-effort' data transmission.

It might be worthwhile to consider either implementing something like a pcap-based ros2 bag record, or at least leaving the door open for it via a plugin model, where rosbag data is collected directly from the wire (or from an existing packet capture).

@LucidOne

LucidOne commented May 19, 2018

ROS1 Notes

Here are some of my notes from after we logged over a petabyte of data with ROS1 rosbag in production with multiple GigE Vision cameras, 3D/2D LIDARS, GPS, etc.

Multithreaded compression

We were able to record uncompressed bags at speeds up to about 6Gb/s (sensor limited) with ROS1. Storage was to a striped pair of SATA3 4TB drives.

Compression issues caused silent failures, as rosbag was limited to a single compression thread per bag and was unable to provide sufficient throughput. With an unlimited queue size (--buffsize) configured, rosbag grew to fill system memory. The OOM killer then terminated rosbag processes when the single compression thread was unable to keep up; the alternative was dropping messages.
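As a sketch of the kind of per-chunk, multi-threaded compression we wanted (illustrative only; the chunk size and the use of bz2 are assumptions, not how rosbag actually works):

import bz2
from concurrent.futures import ThreadPoolExecutor

def compress_chunks(chunks, workers=4):
    # bz2 releases the GIL while compressing, so independent chunks really do
    # compress in parallel; map() preserves the chunk order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(bz2.compress, chunks))

chunks = [bytes(768 * 1024) for _ in range(8)]   # eight 768KB chunks of dummy data
compressed = compress_chunks(chunks)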

Better defaults / errors

  • I'm not sure the current defaults are the best defaults for most users, much less novice users.
  • It might be worth having a rosbag benchmark to determine the optimal --chunksize based on actual publishing rates / compression performance.
  • rosbag should throw warnings when the message queue (--buffsize) is full.
  • rosbag needs to log errors before the OOM killer strikes.

Split/concatenate bags

We somewhat arbitrarily chose --split to be 2GB to avoid running into trouble. This worked well when we were post-processing data in the cloud, but it would have been nice to be able to concatenate multiple bags into a single bag for smoother playback in RViz.

I'd also argue that I should be able to split an existing bag and have metadata copied to each new bag.

When playing a sequence of bags there was some latency as the current bag was closed and the next one opened; depending on compression and system load this could cause timing issues. Being able to concatenate bags would help with this.

Spatial Indexing

For a roughly 12 hour operational period we could record up to 4TB of data split into 2GB bags (~20 seconds per bag). To locate which bag held data associated with a given GPS position, we recorded a separate, unsplit, low-frequency bag indexing GPS position information that spanned the entire period. Our cloud processing system ingested the GPS index bag and would then process the bags with the full sensor data as needed.
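A toy version of that kind of out-of-band index (file names, fields, and the distance math are invented for illustration): map each split bag's time span and GPS fix to its file name, then query by position before pulling the full-rate data.

import math

# (start_time, end_time, lat, lon, bag_file) rows, e.g. harvested from the low-frequency GPS bag.
gps_index = [
    (0.0, 20.0, 30.2849, -97.7341, "full_2018-01-01-00-00-00_0000.bag"),
    (20.0, 40.0, 30.2851, -97.7344, "full_2018-01-01-00-00-00_0001.bag"),
]

def bags_near(lat, lon, radius_m=50.0):
    # Rough equirectangular distance; good enough for picking candidate bags.
    hits = []
    for _, _, blat, blon, bag in gps_index:
        dx = math.radians(lon - blon) * math.cos(math.radians(lat)) * 6371000.0
        dy = math.radians(lat - blat) * 6371000.0
        if math.hypot(dx, dy) <= radius_m:
            hits.append(bag)
    return hits

print(bags_near(30.2850, -97.7342))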

Caching

We used rosbag info to validate data uploads to the cloud. However, it can take considerable time to run when processing vast quantities of data. I think it is worth considering caching the output in a metadata record to help with data validation.

Param/TF Recording

Recording and playback of TFs never quite works as expected. The main rosbag issue we had is that if a static TF was published at 1 Hz and we split bags, the first second of data might be discarded because the associated TF was in the previous bag.

One issue commonly encountered during development was trying to debug a node that published a TF. Bag playback would broadcast the same TF, occasionally causing confusion among developers. While this can be done with rosbag filter, I'd like to propose adding something like --no-tf to rosbag play to avoid the duplication.

From a usability standpoint, maybe it makes sense to store static TF data separately at the beginning of each bag and enable rosbag play to automatically launch a static TF publisher via plugins.

Service calls / Parameter Server

Is it reasonable to publish service call information in a manner similar to diagnostic aggregator for recording/playback? Does this provide a reasonable way of recording parameter updates?

ROS1 Reliability

It is my experience that rosbag is extremely reliable, when everything else is working. I do not believe we found any real bugs in rosbag. If asked to quantify the reliability I would validate rosbag against something like pcap or pf_ring from actual hardware data to isolate network issues.

During development many of our issues turned out to be network related. Collisions caused retransmissions which consumed bandwidth, which caused more collisions, and so on. The worst of these was caused by 8 GigE Vision cameras that were hardware synchronized and crashed the network every 1/30th of a second, while iftop claimed there was bandwidth to spare due to its sampling window. While fixing this, we found that many quad-port gigabit Ethernet cards actually have an internal switch instead of separate independent physical interfaces and cannot sustain more than 1 Gb/s.

I also recall we had an issue similar to what @paulbovbel commented on where the publishing rate measured by diagnostic aggregators did not match the logged frequency.

It is also worth noting that not all hardware drivers have the same sense of timestamps, and the time when an image is retrieved from hardware may not be the time it was captured, especially when capture is hardware triggered. This caused issues that we initially blamed on rosbag.

@LucidOne

LucidOne commented May 20, 2018

Article Feedback

In general this looks like a good start.

SQLite

I think SQLite storage is a reasonable option; however, I'm unconvinced that it should be the default storage format. SQLite optimizes for select performance, not insert performance:

  1. In my experience with rosbag, onboard write performance during operation is generally more resource constrained than playback during debugging on a development workstation.
  2. System resources used by bagging are not available for other tasks; many embedded-ish systems (Atom, ARM) have memory & I/O bandwidth constraints.
  3. SQLite B-trees do not take advantage of the temporal adjacency of message recording and playback, whereas with Bag Format 2.0 the messages are recorded next to each other on disk.
  4. Offline conversion from Bag Format 2.0 to Postgres has previously worked well for me, and improving CLI conversion/export tools for SQLite/Postgres could provide a solution for random access use cases:
rosbag export example.bag sqlite://example.db3
rosbag export example.bag postgresql://ros@localhost/control
  5. My opinion is that the defaults should optimize for production instead of simple demos. IMHO high-datarate debugging falls into the 80% (of my personal use cases, obvs).
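On the insert-performance concern specifically, SQLite can be tuned considerably for append-heavy workloads; a small sketch of the usual knobs (WAL journaling, batched transactions), as an illustration rather than a claim about what rosbag2 would do:

import sqlite3

conn = sqlite3.connect("bench.db3")
conn.execute("PRAGMA journal_mode=WAL")      # append-friendly write-ahead log
conn.execute("PRAGMA synchronous=NORMAL")    # trade a little durability for throughput
conn.execute("CREATE TABLE IF NOT EXISTS messages (time INTEGER, data BLOB)")

rows = [(i, bytes(1024)) for i in range(10000)]
with conn:                                   # one transaction for the whole batch
    conn.executemany("INSERT INTO messages VALUES (?, ?)", rows)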

HDF5

😒 👎
Edit: I'll add more on this soon.

ROS1

"No random access"
While random access is not available by default, using a seek operation to access the correct chunk isn't that inefficient/difficult and usually most applications also require data that is adjacent anyway. It may be that some developers are more comfortable with having random access to data, even if they end up randomly accessing sequential data sequentially.

From the Gazebo comments

"Rosbag 2.0 format is analogous to a singly-linked list and requires reading from the beginning of the file."
I would like to note that it is not a singly-linked list of messages, but a singly-linked list of chunks that have a default size of 768KB. This makes it relatively efficient to seek to the next index record to find a particular timestamp in a bag. It looks like there would be an upper bound of 2604 seek operations to locate a timestamp. One way to improve performance may be to store a meta-index record with the offset positions of all previous index records at the end of the bag, to make it easier to implement a binary search over timestamp positions. This meta-index could be generated offline via rosbag reindex to simplify implementation. A fixed-size metadata record at the beginning could contain the position of the meta-index. This should bring things down to something on the order of 9 seek operations (for a 2GB bag file) to find the chunk associated with a given timestamp.
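The lookup against such a meta-index is then just a binary search over (timestamp, chunk offset) pairs; a sketch with an invented in-memory structure:

import bisect

# Sorted (start_timestamp, byte_offset) pairs for every chunk, as they might appear
# in a meta-index record written at the end of the bag (or rebuilt by rosbag reindex).
meta_index = [(0, 0), (10, 786432), (20, 1572864), (30, 2359296)]
timestamps = [t for t, _ in meta_index]

def chunk_offset_for(stamp):
    # Last chunk whose start timestamp is <= the requested stamp.
    i = bisect.bisect_right(timestamps, stamp) - 1
    return meta_index[max(i, 0)][1]

print(chunk_offset_for(17))   # -> 786432, i.e. seek straight to the second chunk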

SELECT messages.message FROM messages JOIN topics ON topics.id = messages.topic_id WHERE time_recv_utc > 12345678 AND time_recv_utc < 23456789 AND topics.name LIKE "/some/topic/name";

As far as I can tell, this would be more efficient with format 2.0 + meta-index than with SQL.

SELECT topics.name, message_types.name FROM topics JOIN message_types ON topics.message_type_id = message_types.id;

This functionality could be efficient with format 2.0 by caching the output of rosbag info.

Alternate storage container

While I like the idea of full support for storing messages in a relational database, I believe the default should be a stream-oriented, append-only format 3.0.

Even if SQLite is the default, I would prefer ROS2 rosbag use a directory as the base storage container. This will provide a place to store bagging configuration information, signatures, hashes and metadata.
rosbag2 play yetanothertest_2018-01-01-00-00-01

yetanothertest_2018-01-01-00-00-01/rosbag.config
yetanothertest_2018-01-01-00-00-01/metadata.yaml
yetanothertest_2018-01-01-00-00-01/wind_speed.rrd
yetanothertest_2018-01-01-00-00-01/gps.db3
yetanothertest_2018-01-01-00-00-01/imu.db3
yetanothertest_2018-01-01-00-00-01/left_camera.bag
yetanothertest_2018-01-01-00-00-01/right_camera.bag
yetanothertest_2018-01-01-00-00-01/control.bag
yetanothertest_2018-01-01-00-00-01/control.bag.sha1
yetanothertest_2018-01-01-00-00-01/control.bag.sign

This optimizes for implementation simplicity and multithreaded performance but requires additional files (1 Petabyte of 2 GB Bag Format 2.0 already requires 2M files and things tend to get odd after 32k files in a single directory)

One option for reducing the number of files might be to use a loopback image mount

Features

Format 3.0

Given that it was able to push 6Gb/s, I'm a proponent of the ROS bag format 2.0 and would like to see it updated to support ROS2 natively instead of being bridged. Maybe CDR, protobuf, etc. could be implemented as record types in format 3.0.

Metadata

There have been several use cases where projects have needed some sort of metadata storage

Previously, we used markdown files to make sure field notes on hardware changes and operational information (client name, test/production, experiment #, etc.) were passed from the field to the cloud.

On another project, due to limited engineering resources available at the time, we published camera serial numbers to separate topics to track hardware changes. This worked well enough, but it required scanning the bag to grab the serial numbers, or discarding messages published before the serial numbers were published. I think the current "standard" for this is to output serial numbers via a diagnostic aggregator, which isn't much better. Reimplementing this, I would have preferred storing the serial numbers in metadata so they are stored once at the beginning.

For cloud uploads we needed some metadata to help validate that the data sent was received. In this case we wrote the output of rosbag info to a file, uploaded the data, and then ran rosbag info in the cloud to check that they matched. It would have been nice to cache this at the beginning of the same file:

./Storage/2017-01-25-20-03-14.bag
./Storage/2017-01-25-20-03-14.md
./Storage/2017-01-25-20-03-14.info
./Storage/2017-01-25-20-03-14.sha1

Fixed size time series data

I think it is worth considering how to support something like RRDTool for data that decreases in resolution over time.

From a computer vision perspective lossy compression can be problematic; however, something like RRDTool that drops frames but keeps each remaining frame uncompressed for older data may be useful for dash-cam and black-box applications: 30 fps for the previous hour, 10 fps for the previous day, 1 fps for the previous week, etc.
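A toy model of that retention policy (entirely illustrative, with the ring sizes shrunk for readability): keep several fixed-size rings at decreasing frame rates and let old frames fall off each ring.

from collections import deque

class MultiRateRing:
    """Keep recent frames at full rate and older history at lower rates."""

    def __init__(self):
        # (stride, ring) pairs: every frame, every 3rd frame, every 30th frame.
        self.rings = [
            (1, deque(maxlen=300)),
            (3, deque(maxlen=600)),
            (30, deque(maxlen=600)),
        ]
        self._count = 0

    def push(self, frame):
        for stride, ring in self.rings:
            if self._count % stride == 0:
                ring.append(frame)   # the oldest frame falls off automatically
        self._count += 1

ring = MultiRateRing()
for i in range(10000):
    ring.push(i)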

@LucidOne

LucidOne commented May 20, 2018

@gbiggs I'm not sure about the performance, but being written in Java makes this look like the financial industry's solution:
https://cassandra.apache.org/doc/latest/operating/compaction.html#time-window-compactionstrategy
https://www.datastax.com/customers

@gavanderhoorn
Contributor

@LucidOne: your comments appear very rational and to be based on experience with the current (ROS 1) rosbag in production environments that (as you phrase it yourself) "are not simple demos".

Quite a few people (in data science) seem to at least imply that HDF5 is a format well suited to storing (very) large datasets coming in at high data rates, is able to deal with partial data, supports hierarchical access, and has some other characteristics that make it seem like it would at least deserve consideration as a potential rosbag format 3 storage format.

Your comment on HDF5 was:

😒 👎

In contrast to your other comments, this one seemed to lack some rationale :)

Could you perhaps expand a little on why you feel that HDF5 would not be up to the task (or at least, that is what I believe these two emoticons are intended to convey)?

@LucidOne

LucidOne commented May 21, 2018

HDF5

Unlike SQLite, I have not actually used HDF5, so this is a bit more gut feeling and a bit less hands-on experience; however, I have put the word out to one of my colleagues who does use it to get more information about their experiences, and will add that if/when they respond.

First, I would like to say that in theory HDF5 might be a great choice; in practice I have some concerns. The blog post in the article and the HN comments cover a lot.

Complexity

It seems to me that HDF5 is a complex Domain Specific Format, one like GeoTIFF that just happens to support multiple domains. I personally don't have the fondest memories of GeoTIFF or even TIFF.

"When TIFF was introduced, its extensibility provoked compatibility problems." -Wikipedia
"Like the TIFF standard itself, GeoTIFF is conceptually simple, but the exact specification is complex and technical." -USGS
"The algorithms and data structures stored in an HDF5 file can be complex and difficult to understand well enough to parse correctly." -HDF Group

Storage

HDF5 supports both contiguous and chunked data with B-tree indexes, which is pretty much ideal for what we want; however, the single base C implementation means everything has the same bugs.

I understand how this happens, but HDF5's extensible compression seems excessive to me.

As far as I can tell it uses ASCII by default.

Concurrent Read Access is supported but thread safety may be a work in progress.

Development process

According to this, the HDF Group has switched to Git and will accept patches if you sign a contributor agreement so they can close-source the patch for enterprise customers. :real talk:

Tests have been written but I am unable to find where Continuous Integration for HDF5 is hosted.

The website needs improvement/refactoring, with multiple copies of the specification that may or may not be the same.

The Git repo exists, but I was unable to find an issue tracker in the release notes, in 2018.
It looks like the HDF issue tracker may only be accessible to members of the HDF Group. 😕

Summary

I think HDF5 might be reasonable in theory, and I agree with much of the reasoning behind switching to a 3rd-party file format maintained by a dedicated group, but in practice my gut feeling is that a lot of time will be spent working through the complexity and cultural differences.

On the other hand, the HDF Group just announced "Enterprise" support and hired a new Director of Engineering 6 months ago, so if their website improved, their development processes modernized, benchmarks were reasonable, and engineering resources were available to deal with the complexity, I could be talked into it.

@gbiggs
Member

gbiggs commented May 30, 2018

Many, many years ago Ingo Lütkebohle (@iluetkeb, in case GitHub can ping him) and I did some work on improving the rosbag format. As I recall, Ingo and his group moved in the direction of an improved version of the rosbag format as it was at that time; I don't recall their results, but I do remember that they changed stuff.

For myself, I spent quite a bit of time looking at using a modified version of the Matroska media container as a rosbag format and also as a format for recording point clouds. The reasoning behind this is that 1) rosbag is recording streams of time-indexed data, which media containers are explicitly designed to hold, and 2) Matroska uses an extremely flexible format called EBML (Extensible Binary Meta Language; think of it as XML but with binary tags and for binary data). The format that resulted is specified here. I also had one prototype implementation and a half-complete redo based on what I learned, but given that part of that work's purpose was to teach me C++11, it's a bit of a mess in places. The work never went anywhere, unfortunately, so I don't know how performant the format is for recording.
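Since EBML may be unfamiliar: an element is just an ID, a variable-length-encoded size, and a payload, so writing one is cheap. A rough sketch of that encoding (the one-byte element ID is arbitrary, and this ignores the rest of the Matroska structure):

def encode_vint(value):
    # EBML variable-length integer: the position of the leading 1 bit in the
    # first byte says how many bytes the number occupies.
    for length in range(1, 9):
        if value < (1 << (7 * length)) - 1:
            marker = 1 << (7 * length)
            return (value | marker).to_bytes(length, "big")
    raise ValueError("value too large for an 8-byte VINT")

def encode_element(element_id, payload):
    # element = ID bytes + VINT-encoded payload size + payload bytes
    return element_id + encode_vint(len(payload)) + payload

# An arbitrary one-byte element ID wrapping one serialized message as its payload.
blob = encode_element(b"\xa3", b"serialized message bytes")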

The reason I'm mentioning this is that the modified format Ingo and I came up with, the Matroska format, and media containers such as MP4 in general support most of the features @LucidOne is asking for. In no particular order:

  • Metadata can be stored anywhere in the file and is instantly locatable.
  • The format provides a time-based index into the data at any resolution desired (even individual "frames" if you're willing to have a massive index). The index is stored at a place in the file recorded in the metadata at the start of the file so it can be quickly located. The index can easily be overwritten afterwards if it needs to be changed. Adding one after recording (rather than building it during recording) is also simple.
  • How big a chunk is used for data streams can be changed even while recording and on a stream-by-stream basis.
  • "Attachments" can be added to the file, allowing things like hardware serial numbers to be included, viewed, and even added or edited later if necessary without needing to have a dedicated data stream to store them.
  • Native support for segmenting a set of data into multiple files, with the metadata duplicated in each segment or not as desired and segment ordering defined in the files or controlled at playback as desired. Segmenting is commonly used by media players (the DVD and blu-ray formats are built on it) and they must do it with zero latency so if playing back multiple files without gaps is important to you, look into how media players achieve gapless playback (hint: playback buffers).
  • Data could be recorded as one stream per file, then a single file (or set of files covering a single set of data) can be built by muxing those streams together in the desired structure later on, without having to actually process the data itself - simply copy the segments into a single file and split them based on time.
  • Has optimisations in the format to reduce metadata, enabling recording with minimal overhead (when combined with buffering) and reduced disk space.
  • Files can be recovered so long as you have the SeekHead element, and you can probably recover the file even if you don't. Reorganising a file is also possible if you have the disc space, so this enables recording data as it comes in then reorganising it afterwards for efficient playback or querying or to put each data stream's data all together or whatever. There are places in the format where information can be optionally included to make recovery from corrupted streams easier and more robust.
  • Supports chapters, so you could make chapters with periodic GPS coordinates for the title so you can quickly find the place in your 12 hours of data that corresponds to a particular position.
  • Supports tagging, which is good for cloud storage services.
  • Fixed parameters could be done using attachments. I think it would be easy to add an additional element used for static topics, or add a flag to the streams so you can say to the player "this stream is static, it only has one frame, publish that and latch it".

Like I said, I never got as far as testing performance. My gut feeling is that you would need to record a file with the bare minimum of metadata, then reprocess it afterwards to add additional metadata like the cue index. However this would not be difficult and the format enables much of it to be done in-place, particularly if you are planning for this when the recording is made (which a tool can do automatically). I do think that the format's flexibility means it would be relatively easy to optimise how data is recorded to achieve high performance recording.

Ultimately, the work stopped due to other commitments coming down from on high, and a lack of motivation due to it being unlikely to be adopted by rosbag. If the ROS 2 version of rosbag follows through on the goal of allowing multiple storage formats, then I might be interested in picking it up again, perhaps as a tool to teach me another new language. 😄

I think that media formats should be investigated for their usefulness in the ROS 2 rosbag. If recording performance meets the needs of high-load systems, then the features of these formats are likely to be very useful.

In the same work I also looked at HDF5, at the urging of Herman Bruyninckx. I only vaguely recall my results, but I think my impression was that HDF5 was a bit over-structured for recording ROS data and would need work done to generate HDF5-equivalent data structures for the ROS messages so that the data could be stored natively in HDF5, in order to leverage the format properly. It's not really designed for storing binary blobs, which is what rosbag does.

@LucidOne

I emailed back and forth with a few earth science people who regularly work with HDF5. Here is a summary of my notes.

HDF5 Notes Continued

  • They disagree with me that HDF5 is domain specific, as all of the domain-specific bits are in the naming and layout. I still think it may be difficult to build generic tools without a pre-determined layout or naming, but perhaps there is enough introspection in the API to make it work.

  • They explained that much of the complexity in HDF5 is for data (de)serialization, endian conversion, etc. If the plan was to store ROS messages as CDR in HDF5, then I don't see how the complexity is worth it. Does storing message elements as native types in HDF5 require an extra pair of deserialize/serialize operations? As an example, to store point clouds in HDF5 and maximize usability with existing tools, should we store them in LAS in HDF5, or as "Sensor Independent Point Cloud", which is apparently (cannot find a link to the standard) a standard point cloud HDF5 layout?

  • Do we need better endianness support? Has anyone run into problems in practice? This is one example of how ROS handles endianness issues.

  • Multi-write can be done with multiple files and a post merge operation, if I understand correctly.

  • It is worth noting that everyone I communicated with who used HDF5 liked it and is using it in professional environments for important projects.

  • I have been assured that The HDF Group is working on making the issue tracker publicly available and they are probably running CI internally.

Links

http://davis.lbl.gov/Manuals/HDF5-1.8.7/UG/11_Datatypes.html
https://www.hdfgroup.org/2015/09/python-hdf5-a-vision/
http://www.opengeospatial.org/projects/groups/pointclouddwg
http://docs.opengeospatial.org/per/16-034.html
https://wiki.osgeo.org/wiki/LIDAR_Format_Letter

@wjwwood
Member

wjwwood commented May 31, 2018

Thanks for all the feedback so far. I've found it very interesting and useful. We're still reading and considering.


Another thing to consider (because there were not enough):

http://asdf-standard.readthedocs.io/en/latest/

That's the replacement for FITS (a format used to store astronomical data) and ASDF will be used with the James Webb Space Telescope project.

@dejanpan

@Karsten1987 thanks for this PR. I will just add a couple of requirements for large data logging in the automotive industry, in particular for the self-driving part of it.

Most of the data throughput comes from the sensors. A typical sensor setup consists of:

  1. 5 lidar sensors (1 on the roof, 1 front, 1 rear, 1 next to each side mirror) => 4x4MB/s + 1x16MB/s = 32MB/s
  2. 6 cameras (3 under the windshield, 2 in each side mirror, one rear looking) => 6x90MB/s = 540MB/s
  3. 5 radars (1 long-range in the front, 4 mid-range mounted on every corner of the car) => KB/s
  4. 1 gps => KB/s
  5. 1 imu => KB/s
  6. 12 ultra sound sensors => KB/s

So we are talking about the data throughput under 1GB/s in total for a fully self-driving car.

In the development phase we normally want to record all of this data for debugging purposes. Recording in our case normally happens on an NVMe drive or a system like the one from Quantum: https://autonomoustuff.com/product/quantum-storage-solution-kits/. In the development phase we do not like to compress the data, since this binds additional resources, and we would like to at least know if data was lost before it was written to the drive. We would also like services to be recorded as well.

In the production stage (that is a self-driving car as a product) we will also record all the data but only over a short time period. Probably about a minute of it. In this setting all the data should be recorded in-memory into the ring buffer that rolls over every minute. A separate application running in a non-safety critical part of the system (separate recording device, process in a different hypervisor) should then fetch this data and store it into a drive. For the part of the rosbag that writes the data into memory, it would be important that it is designed to be real-time, as for instance introduced in this article: https://design.ros2.org/articles/realtime_background.html. So no memory allocation during runtime, no blocking sync primitives, and no disk I/O operations (as mentioned above, a separate application stores the data to the drive).
There should be absolutely no message loss in this case.
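A minimal sketch of the rolling one-minute in-memory buffer described above (pure illustration; it ignores the no-allocation and lock-free constraints a real-time implementation would need):

import time
from collections import deque

class RollingBuffer:
    """Keep only the most recent window_s seconds of messages in memory."""

    def __init__(self, window_s=60.0):
        self._window_s = window_s
        self._buf = deque()              # (timestamp, serialized message)

    def push(self, msg, stamp=None):
        now = stamp if stamp is not None else time.monotonic()
        self._buf.append((now, msg))
        while self._buf and now - self._buf[0][0] > self._window_s:
            self._buf.popleft()          # roll over: drop anything older than the window

    def snapshot(self):
        # What the separate, non-safety-critical process would fetch and persist.
        return list(self._buf)

buf = RollingBuffer(window_s=60.0)
buf.push(b"lidar scan")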

In terms of using the ROS 1 rosbag tool, our experience is very, very similar to this one, so I won't repeat it.

@dirk-thomas mentioned this pull request on Jul 12, 2018
@Karsten1987 added the in review label and removed the in progress label on Jul 16, 2018
@Karsten1987
Contributor Author

Thank you all for being patient and giving feedback on this.
We finally picked up the development of rosbags and therefore I am putting this officially in review.

TL;DR Given the feedback, we decided that there is no single best storage format which suits all needs and use cases. We will therefore focus on a flexible plugin API which supports multiple storage formats. We start off with SQLite as our default storage format because of its simplicity and community support. We will simultaneously work on a plugin which reads legacy ROS 1 bags.
The idea is that the API is powerful enough to easily provide further storage formats, which might be faster than SQLite and can be provided by the ROS community.

We will start working on rosbags according to this design doc, but we are happy to incorporate any further feedback.

@vik748

vik748 commented Aug 14, 2018

Folks, I am hoping this is the right place to make a suggestion about rosbag record. We have been working with high frame rate image capture and have come to the realization that the standard ROS 1.0 subscription model, which uses serialization / deserialization over the network interface, might not be the most efficient. For a lot of imaging / lidar type sensors, the ROS community came up with a clever idea of using nodelets to avoid that step. However, to my knowledge rosbag record was not able to take advantage of this shared-pointer-to-disk technique.
My understanding is that ROS 2.0 is supposed to natively and transparently support shared pointers. Accordingly, I'd suggest that the rosbag record feature also use shared pointers to improve throughput and latency and reduce computational overhead.
Thanks for listening.

@gavanderhoorn
Contributor

@vik748 wrote:

For a lot of imaging / lidar type sensors, the ROS community came up with a clever idea of using nodelets to avoid that step. However, to my knowledge rosbag record was not able to take advantage of this shared-pointer-to-disk technique.

off-topic, but see ros/ros_comm#103.

@dirk-thomas
Member

Has there been any consideration of how to handle the feature of "migration rules" from ROS 1?

@wjwwood
Member

wjwwood commented Oct 8, 2018

@vik748

Folks, I am hoping this is the right place to make a suggestion about rosbag record. We have been working with high frame rate image capture and have come to the realization that the standard ROS 1.0 subscription model, which uses serialization / deserialization over the network interface, might not be the most efficient. For a lot of imaging / lidar type sensors, the ROS community came up with a clever idea of using nodelets to avoid that step. However, to my knowledge rosbag record was not able to take advantage of this shared-pointer-to-disk technique.

So, rosbag requires the data to be serialized in order to write it, because our messages are not POD (plain old data) and therefore are not laid out contiguously in memory. Serialization solves this by packing all the data in the message into a single contiguous buffer. So you won't avoid serialization for writing and deserialization for reading bag files.

In ROS 1 there's already a nodelet for rosbag (https://github.com/osrf/nodelet_rosbag) and the same advantage can already be gained in ROS 2, but in both cases the data must be serialized before writing to the disk even if you avoid it during transport.
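To illustrate the point about non-POD layout with a generic example (not the actual CDR serialization ROS 2 uses): a message holding a variable-length array lives in several allocations, and writing it means copying everything into one contiguous buffer first.

import struct

def serialize_scan(stamp_ns, ranges):
    # Pack a timestamp, the element count, and the float array into one buffer.
    header = struct.pack("<qI", stamp_ns, len(ranges))
    body = struct.pack("<%df" % len(ranges), *ranges)
    return header + body

def deserialize_scan(buf):
    stamp_ns, count = struct.unpack_from("<qI", buf, 0)
    ranges = struct.unpack_from("<%df" % count, buf, struct.calcsize("<qI"))
    return stamp_ns, list(ranges)

blob = serialize_scan(1515580800000000000, [1.0, 2.5, 3.25])
assert deserialize_scan(blob)[1] == [1.0, 2.5, 3.25]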

@Karsten1987
Contributor Author

Has there been any consideration of how to handle the feature of "migration rules" from ROS 1?

@dirk-thomas, we don't have concrete details on how to implement these rules as of now. But the current design considers versioning and thus has room for migration rules in the respective convert functions.
The idea here is that when converting ROS messages from one serialization format to another, additional conversion policies/rules can be passed to these functions. One of these rules can then be migration. Does that make sense?

@paulbovbel

It looks like rosbag2 development is in full swing, so a question about something that made me curious in the action design PR (#193 (comment)) - will rosbag2 bag services as well as topics?

@Karsten1987
Contributor Author

@paulbovbel, currently we can only grab topics in a serialized form and thus bag them nicely. That is not to say that it's impossible to get service callbacks in a similar serialized way; it's just not yet implemented.

However, I am unsure at this point whether we can guarantee that service requests can always be listened to by rosbag (without responding). Also, when doing so, I will have to look into how to fetch the service responses.

@gbiggs
Member

gbiggs commented Oct 30, 2018

Even if services can't be played back, there is value for debugging tools and inspection tools in at least logging them (assuming the response can also be captured).

@iluetkeb

btw, I mentioned this to @Karsten1987 a while ago: if you choose LTTng as a tracer, instrumenting services or any other points you want to get data from would be trivial. It's a whole different set of tools, independent of the middleware -- which could be both a pro and a con -- so not a decision to take lightly.

@paulbovbel

paulbovbel commented Oct 31, 2018

My takeaway is that, with the current implementation using DDS-RPC for services, I imagine services can't be bagged via rmw transport mechanisms, correct? This means that services and actions (as currently designed) are not going to make it into ROS2 bags.

Not being able to bag services was a huge limitation in ROS1 (if bagging was your primary approach for grabbing debug/telemetry/etc. data, as I believe it is for many users), and a common reason for 'just using actions'. Now that will not be an option either.

Is anything in the first paragraph up for reconsideration in the near future? I recall @wjwwood mentioning implementing services-over-DDS-topics was considered, rather than using DDS-RPC, which sounds like it would be possible given the introduction of keyed topics.

@Karsten1987
Contributor Author

I am not sure if we could generalize that all services are necessarily implemented via topics. That might hold for DDS implementations, but not necessarily for other middlewares.

Similarly to topics, rosbag could open a client/server for each service available on startup. This would depend though on whether the rosbag client would receive the answer from the original service server.

The first step however is to be able to receive service data in binary form, which the rmw interface doesn't currently allow.

Acronyms should always be expanded upon first use
We chose [SQLite](https://www.sqlite.org/about.html) as the default solution for this implementation.
Contrary to other relational database systems, it doesn't require any external SQL Server, but is self-contained.
It is also open source, [extensively tested](https://www.sqlite.org/testing.html) and known to have a large [community](https://www.sqlite.org/support.html).
The Gazebo team created a nicely composed [comparison](https://osrfoundation.atlassian.net/wiki/spaces/GAZ/pages/99844178/SQLite3+Proposal) between SQLite and the existing rosbag format.
@vmanchal1

vmanchal1 commented Sep 1, 2020

In the production stage (that is a self-driving car as a product) we will also record all the data but only over a short time period. Probably about a minute of it. In this setting all the data should be recorded in-memory into the ring buffer that rolls over every minute. A separate application running in a non-safety critical part of the system (separate recording device, process in a different hypervisor) should then fetch this data and store it into a drive.

@dejanpan Do you know if this is going to be available in ROS2? Or is Apex working on a solution to this? Do you have any pointers on how one can store recordings, in ROS2, to memory?

I am interested, precisely, in this in-memory ring-buffer recording, or anything close to it. Thank you.
