article about rosbags in ROS2 #160
I was thinking about this recently myself, and, since you started writing this, I'll tag this on here. I thought it might be cool to introduce a layer of abstraction such that the user can provide functions to read from their format; ros2 bag would then simply use those functions when publishing its messages.

```python
# Pseudocode: ros2bag.message_layer and ros2bag.spin are proposed names,
# not an existing API.
import ros2bag
from ros2bag.message_layer import MessageLayer


class WeirdFormatPlayer:
    def __init__(self):
        convert_layer = MessageLayer()
        # override with a custom way to get the next message
        # (assign the method itself rather than calling it)
        convert_layer.get_next = self.get_next_message

    def get_next_message(self):
        # reads the next message from our weird thing
        next_message_to_publish = ...  # placeholder for the format-specific read
        return next_message_to_publish


# somewhere in main
ros2bag.spin()
```

Given that this is hideous pseudocode, this abstract silliness might help someone rephrase this to something useful (or me when I have a few minutes).
sqlite more pros and cons restructure
articles/rosbags.md
#### Parallel I/O
Processing time increases with slow file I/O.
In order to provide efficient data processing, a parallel read and write to the file from multiple processes should be available.
This would allow multiple processes (e.g. one per sensor) directly write to a commonly shared bag file without having a single recoding instance subscribing to all topics..
Small typo: two periods at the end of the sentence.
A third alternative which provides capabilities for data logging, and is used as such, is [SQLite](https://www.sqlite.org/about.html)
Despite other relational database systems, it doesn't require any external SQL Server, but is self-contained.
It is also open source, [extensively tested](https://www.sqlite.org/testing.html) and known to have a large [community](https://www.sqlite.org/support.html).
The Gazebo team created a nicely composed [comparison](https://osrfoundation.atlassian.net/wiki/spaces/GAZ/pages/99844178/SQLite3+Proposal) between SQLite and the existing rosbag format.
RTI's recording tools are also based on SQLite, and seem to work pretty well. You might want to experiment with them as part of your research.
@katiewasnothere: some of these ideas may be relevant to your project. |
The Building Wide Intelligence Lab at UT Austin is currently looking into data loss issues with rosbag in ROS 1.0. Our goal is to be able to maintain the original topic publishing frequency within a rosbag so that there’s little to no data loss (minimal messages dropped and minimal to no compression). For example, using the current implementation in ROS 1.0, our empirical data shows that raw messages from a Kinect camera topic in a simulated environment typically produce a publishing frequency of about 19.942 Hz. However, when all topics in the simulation environment are recorded to a rosbag (some 24 ROS topics ranging in frequency from about 200 Hz to 1 Hz), a significant number of messages are dropped, reducing the publishing rate when the rosbag is played back to roughly 6.375 Hz. The ability to save all topic messages for a given scenario can greatly help with training learning algorithms as well as in debugging failures. It would be nice to have functionality in ROS 2.0’s future rosbag implementation to prevent such extreme data loss.
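To put the two rates quoted above in perspective, the share of lost messages can be estimated directly from them. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes the playback rate reflects what was actually written to the bag.

```python
def dropped_fraction(published_hz: float, recorded_hz: float) -> float:
    """Estimate the share of messages missing from the bag."""
    if published_hz <= 0:
        raise ValueError("published_hz must be positive")
    return 1.0 - recorded_hz / published_hz


# Rates quoted in the comment above (Kinect topic, live vs. played back).
loss = dropped_fraction(19.942, 6.375)
print(f"{loss:.1%} of messages dropped")
```

With the quoted figures this comes out to roughly two thirds of the Kinect messages being lost.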
This might be a lofty goal, but I think it would be useful for auditing purposes to have an optional storage format that supports append-only recording, like an append-only file system or write-to-disk behavior. Think of rosbag recording in autonomous cars or other commercial applications. I'm currently making an audit framework for creating existence proofs for appending log files, thus rendering them immutable or at least cryptographically verifiable on the fly. I'm not yet sure if checkpointing log events (or in the case of rosbag: messages or chunks of messages) via SQLite is feasible (I haven't yet looked into it deeply), but doing so on the ros1 v2 bagfile binary format is not so practical, as the file includes content that changes bitwise on disk out of order with respect to when the bytes were first written. E.g. key=value pairs are overwritten in the ros2 bagfile's index header near the start of the file whenever a new message with a novel connection id is received, given that its indexing is connection-oriented. This header structure in the ros1 v2 bagfile of course speeds up the traversal of large binary data with meta information, but its implementation of random write operations makes it hard to protect growing recording files from malicious mutation or accidental corruption, given that a digest of the written bytes of the file cannot be linked to provide a time series proof of existence, e.g. compounding HMAC computations on log writes. Perhaps this is a discussion for a specific storage plugin for immutable data stores, but I just wanted to mention this as data storage formats are touched upon in this PR.
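The "compounding HMAC computations on log writes" idea can be sketched as a hash chain in which each record's tag also covers the previous tag. This is a minimal illustration, not the audit framework mentioned above: the class and function names, the all-zero genesis tag, and the in-memory record list are all assumptions made for the example.

```python
import hashlib
import hmac

GENESIS = b"\x00" * 32  # assumed starting tag for the chain


class ChainedLog:
    """Append-only log where each record's tag covers the previous tag."""

    def __init__(self, key: bytes):
        self._key = key
        self._tag = GENESIS
        self.records = []  # (payload, tag) pairs in write order

    def append(self, payload: bytes) -> bytes:
        # The new tag depends on the previous one, so altering an earlier
        # record makes every later stored tag fail verification.
        self._tag = hmac.new(self._key, self._tag + payload, hashlib.sha256).digest()
        self.records.append((payload, self._tag))
        return self._tag


def verify(key: bytes, records) -> bool:
    """Recompute the chain from the start and compare against stored tags."""
    tag = GENESIS
    for payload, stored in records:
        tag = hmac.new(key, tag + payload, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, stored):
            return False
    return True
```

A scheme like this only works if bytes are never rewritten after they are tagged, which is why a header that is updated in place is a poor fit for it.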
Big vote of support for this idea. |
This is as much about the tool implementation as it is about the format. A lossless rosbag would need to provide timing (possibly hard real-time) and bandwidth guarantees about how much data it can save. It may need to use strategic buffering, it may need to leverage OS support for buffers, it would need to understand the storage medium and that medium's capabilities, it would probably need to make configurable decisions about what gets dropped if there is not enough storage bandwidth available, and probably a whole pile of things I haven't thought of because I'm not a storage expert. In short, it would make a pretty interesting research project and the result would be massively useful to people like @ruffsl and the wider ROS community. It would be worth looking at the logging tools used in the financial domain. They have to log absolutely everything exactly as it happened and have really tight timing requirements. |
A notable issue with ROS1 bagging as a data recording approach is that it creates a parallel communication channel for the monitored topics, and is not always an accurate representation of data flow in the system being monitored. In ROS1, where the It might be worthwhile to consider either implementing, or leaving the door open via a plugin-model, to something like a pcap-based |
ROS1 Notes

Here are some of my notes from after we logged over a petabyte of data with ROS1

Multithreaded compression

We were able to record uncompressed bags at speeds up to about 6Gb/s (sensor limited) with ROS1. Storage was to a striped pair of SATA3 4TB drives. Compression issues caused silent failures as

Better defaults / errors

I'm not sure the current defaults are the best defaults for most users, much less novice users.

Split/concatenate bags

We somewhat arbitrarily chose
I'd also argue that I should be able to split an existing bag and have metadata copied to each new bag. When playing a sequence of bags there was some latency as the current bag was closed and the next one opened; depending on compression and system load this can cause timing issues. Being able to concatenate bags would help with this.

Spatial Indexing

For a roughly 12 hour operational period we could record up to 4TB of data split into 2GB bags (~20 seconds per bag). To locate which bag held data associated with a given GPS position we recorded a separate unsplit low frequency bag to index GPS position information that spanned the entire period. Our cloud processing system ingested the GPS index bag and would then process bags with the full sensor data as needed.

Caching

We used

Param/TF Recording

Recording and playback of TFs never quite works as expected. The main rosbag issue we had is that if a static TF was published at 1Hz and we split bags, the first second of data might be discarded as the associated TF was in the previous bag. One issue commonly encountered during development was trying to debug a node that published a TF. Bag playback would broadcast the same TF, occasionally causing confusion with developers.
While it can be done with
From a usability standpoint, maybe it makes sense to store static TF data separately at the beginning of each bag and enable

Service calls / Parameter Server

Is it reasonable to publish service call information in a manner similar to diagnostic aggregator for recording/playback? Does this provide a reasonable way of recording parameter updates?

ROS1 Reliability

It is my experience that
During development many of our issues turned out to be network related. Collisions caused retransmissions which consumed bandwidth, which caused collisions, etc. The worst of which was caused by 8 GigE Vision cameras that were hardware synchronized and crashed the network every 30th of a second while iftop claimed bandwidth to spare due to the sampling window. Fixing this, it was found that many quad-port gigabit Ethernet cards actually have an internal switch instead of separate independent physical interfaces and cannot sustain more than 1 Gb/s. I also recall we had an issue similar to what @paulbovbel commented on, where the publishing rate measured by diagnostic aggregators did not match the logged frequency. It is also worth noting that not all hardware drivers have the same sense of timestamps, and when an image is retrieved from hardware may not be the time it was captured, especially when capture is hardware triggered. This caused issues that we initially blamed on
Article Feedback

In general this looks like a good start

SQLite

I think SQLite storage is a reasonable option, however I'm unconvinced that it should be the default storage format. SQLite optimizes for select performance not insert performance

HDF5

😒 👎

ROS1

"No random access"
From the Gazebo comments: "Rosbag 2.0 format is analogous to a singly-linked list and requires reading from the beginning of the file."
As far as I can tell, this would be more efficient with format 2.0 + metaindex than with SQL.
This functionality could be efficient with format 2.0 by caching the output of

Alternate storage container

While I like the idea of full support for storing messages in a relational database, I believe the default should be a stream oriented append only format 3.0. Even if SQLite is the default, I would prefer ROS2
This optimizes for implementation simplicity and multithreaded performance but requires additional files (1 Petabyte of 2 GB Bag Format 2.0 already requires 2M files and things tend to get odd after 32k files in a single directory). One option for reducing the number of files might be to use a loopback image mount

Features

Format 3.0

Given that it was able to push 6Gb/s, I'm a proponent of the ROS bag format 2.0 and would like to see it updated to support ROS2 natively instead of being bridged. Maybe CDR, protobuf, etc. can be implemented as record types in format 3.0

Metadata

There have been several use cases where projects have needed some sort of metadata storage. Previously, we used markdown files to make sure field notes on hardware changes and operational information (client name, test/production, experiment #, etc) was passed from field to the cloud. On another project, due to limited engineering resources available at the time, we published camera serial numbers to separate topics to track hardware changes. This worked well enough, but it required scanning the bag to grab the serial numbers or discarding messages published before the serial numbers were published. I think the current "standard" for this is to output serial numbers via a diagnostic aggregator, which isn't much better. Reimplementing this, I would have preferred storing the serial numbers in metadata so it is stored once at the beginning. For cloud uploads we needed some metadata to help validate that the data sent was received. In this case we wrote the output of

Fixed size time series data

I think it is worth considering how to support something like RRDTool for data that decreases in resolution over time. From a computer vision perspective lossy compression can be problematic, however something like RRDTool that supports dropping frames but keeping each frame uncompressed for older data may be useful for dash cam and blackbox applications. 30fps for the previous hour, 10fps for the previous day, 1 fps for the previous week, etc.
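The tiered retention described above (30 fps for the last hour, 10 fps for the last day, 1 fps for the last week) could be sketched as a thinning pass over frame timestamps. This is not how RRDTool itself works internally; the tier table, function names, and bucket-based selection are assumptions chosen to illustrate the idea of dropping whole frames while leaving the kept ones untouched.

```python
# (maximum age in seconds, frames per second to keep) - assumed tiers
TIERS = [
    (3600, 30),        # previous hour
    (86400, 10),       # previous day
    (7 * 86400, 1),    # previous week
]


def target_fps(age_s: float) -> int:
    """Return the frame rate to retain for data of the given age."""
    for max_age, fps in TIERS:
        if age_s <= max_age:
            return fps
    return 0  # older than the last tier: drop entirely


def thin(timestamps, now):
    """Keep at most one frame per 1/fps time bucket, based on frame age."""
    kept, seen = [], set()
    for t in timestamps:
        fps = target_fps(now - t)
        if fps == 0:
            continue
        bucket = (fps, int(t * fps))
        if bucket not in seen:
            seen.add(bucket)
            kept.append(t)
    return kept
```

For example, one second of 30 fps video that is two hours old would be reduced to ten frames by `thin`, each of them stored exactly as recorded.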
@gbiggs I'm not sure about performance, but the Java makes this look like the financial industry solution |
@LucidOne: your comments appear very rational and to be based on experience with the current (ROS 1)
Quite a few people (in data science) seem to at least imply that HDF5 is a format that is well suited to store (very) large datasets coming in at high data rates, is able to deal with partial data, supports hierarchical access and has some other characteristics that make it seem like it would at least deserve consideration as a potential rosbag format 3 storage format. Your comment on HDF5 was:
In contrast to your other comments, this one seemed to lack some rationale :) Could you perhaps expand a little on why you feel that HDF5 would not be up to the task (or at least, that is what I believe these two emoticons are intended to convey)? |
HDF5

Unlike SQLite, I have not actually used HDF5 so this has a bit more gut feelings and a bit less hands on experience, however I have put the word out to one of my colleagues who does use it to get more information about their experiences and will add that if/when they respond. First, I would like to say that in theory HDF5 might be a great choice; in practice I have some concerns. The blog post in the article and the HN comments cover a lot.

Complexity

It seems to me that HDF5 is a complex Domain Specific Format, one like GeoTIFF that just happens to support multiple domains. I personally don't have the fondest memories of GeoTIFF or even TIFF. "When TIFF was introduced, its extensibility provoked compatibility problems." -Wikipedia

Storage

HDF5 supports both contiguous and chunked data with B-tree indexes, which is pretty much ideal for what we want; however, the single base C implementation means everything has the same bugs. I understand how this happens, but HDF5's extensible compression seems excessive to me. As far as I can tell it uses ASCII by default. Concurrent Read Access is supported but thread safety may be a work in progress.

Development process

According to this, HDF Group has switched to Git and will accept patches if you sign a Contributor agreement so they can close source the patch for enterprise customers. :real talk: Tests have been written but I am unable to find where Continuous Integration for HDF5 is hosted. The website needs improvement/refactoring with multiple copies of the Specification, that may or may not be the same.
The Git repo exists, but I was unable to find an issue tracker in the release notes, in 2018.

Summary

I think HDF5 might be reasonable in theory and I agree with much of the reasoning behind switching to a 3rd party file format maintained by a dedicated group, but in practice, my gut feeling is that a lot of time will be spent working through the complexity and cultural differences. On the other hand, the HDF Group just announced "Enterprise" support, and hired a new Director of Engineering 6 months ago, so if their website improved, development processes modernized, benchmarks were reasonable, and engineering resources were available to deal with the complexity I could be talked into it.
This was suggested to me as a possible option. https://arrow.apache.org/ |
Many, many years ago myself and Ingo Lütkebohle (@iluetkeb in case Github can ping him) did some work on improving the rosbag format. As I recall, Ingo and his group moved in the direction of an improved version of the rosbag format as it was at that time; I don't recall their results but I do remember that they changed stuff. For myself, I spent quite a bit of time looking at using a modified version of the Matroska media container as a rosbag format and also as a format for recording point clouds. The reasoning behind this is that 1) rosbag is recording streams of time-indexed data, which media containers are explicitly designed to hold, and 2) Matroska uses an extremely flexible format called EBML (embedded binary markup language; think of it as being XML but with binary tags and for binary data). The format that resulted is specified here. I also had one prototype implementation and a half-complete redo based on what I learned, but given that part of that work's purpose was to teach me C++11 it's a bit of a mess in places. The work never went anywhere, unfortunately, so I don't know how performant the format is for recording. The reason I'm mentioning this is that the modified format Ingo and I came up with, the Matroska format, and media containers such as MP4 in general support most of the features @LucidOne is asking for. In no particular order:
Like I said, I never got as far as testing performance. My gut feeling is that you would need to record a file with the bare minimum of metadata, then reprocess it afterwards to add additional metadata like the cue index. However this would not be difficult and the format enables much of it to be done in-place, particularly if you are planning for this when the recording is made (which a tool can do automatically). I do think that the format's flexibility means it would be relatively easy to optimise how data is recorded to achieve high performance recording. Ultimately, the work stopped due to other commitments coming down from on high, and a lack of motivation due to it being unlikely to be adopted by rosbag. If the ROS 2 version of rosbag follows through on the goal of allowing multiple storage formats, then I might be interested in picking it up again, perhaps as a tool to teach me another new language. 😄 I think that media formats should be investigated for their usefulness in the ROS 2 rosbag. If recording performance meets the needs of high-load systems, then the features of these formats are likely to be very useful. In the same work I also looked at HDF5, at the urging of Herman Bruyninckx. I only vaguely recall my results, but I think that my impression was that HDF5 was a bit over-structured for recording ROS data and would need work done to generate HDF5-equivalent data structures for the ROS messages so that the data could be stored natively in HDF5, in order to leverage the format properly. It's not really designed for storing binary blobs, which is what rosbag does.
I emailed back and forth with a few earth science people who regularly work with HDF5. Here is a summary of my notes.

HDF5 Notes Continued

Links
http://davis.lbl.gov/Manuals/HDF5-1.8.7/UG/11_Datatypes.html
Thanks for all the feedback so far. I've found it very interesting and useful. We're still reading and considering. Another thing to consider (because there were not enough): http://asdf-standard.readthedocs.io/en/latest/ That's the replacement for FITS (a format used to store astronomical data) and ASDF will be used with the James Webb Space Telescope project. |
@Karsten1987 thanks for this PR. I will just add a couple of requirements for large data logging in the automotive industry, in particular for the self-driving part of it. Most of the data throughput comes from the sensors. A typical sensor setup consists of:
So we are talking about a data throughput under 1GB/s in total for a fully self-driving car. In the development phase we normally want to record all of this data for debugging purposes. Recording in our case normally happens on an NVMe drive or a system like the one from Quantum: https://autonomoustuff.com/product/quantum-storage-solution-kits/. In the development phase we do not like to compress the data since this binds additional resources, and we like to at least know if the data was lost before it was flashed into the drive. We would also like to have services recorded as well. In the production stage (that is, a self-driving car as a product) we will also record all the data but only over a short time period, probably about a minute of it. In this setting all the data should be recorded in-memory into a ring buffer that rolls over every minute. A separate application running in a non-safety critical part of the system (separate recording device, process in a different hypervisor) should then fetch this data and store it to a drive. For the part of rosbag that writes the data into memory it would be important that it is designed to be real-time capable, as for instance introduced in this article: https://design.ros2.org/articles/realtime_background.html. So no memory allocation during runtime, no blocking sync primitives and none of the disk I/O operations mentioned above. In terms of using the ROS 1 rosbag tool, our experience is very, very similar to this one so I won't repeat it.
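The one-minute rollover described above can be illustrated with a time-windowed buffer. The sketch below only shows the eviction logic; the class name and default window are assumptions, and because a Python `deque` allocates memory as it grows, this does not meet the real-time constraints listed in the comment. A production version would presumably preallocate fixed-size storage and avoid locks.

```python
from collections import deque


class TimeWindowBuffer:
    """Keeps only the messages from the most recent window (default 60 s)."""

    def __init__(self, window_s: float = 60.0):
        self._window_s = window_s
        self._items = deque()  # (stamp in seconds, serialized message bytes)

    def push(self, stamp: float, data: bytes) -> None:
        self._items.append((stamp, data))
        # Evict entries older than the window, relative to the newest stamp.
        while self._items and stamp - self._items[0][0] > self._window_s:
            self._items.popleft()

    def snapshot(self):
        """Copy the current window, e.g. for a separate process to persist."""
        return list(self._items)
```

A separate, non-critical recorder would call `snapshot()` (or an equivalent shared-memory read) when an event of interest occurs and write the result to disk.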
Thank you all for being patient and giving feedback on this. TL;DR Given the feedback we decided that there is no single best storage format, which suits all needs and use-cases. We therefore focus on a flexible plugin API, which supports multiple storage formats. We start off with SQLite as our default storage format because of its simplicity and community support. We simultaneously will work on a plugin which reads legacy ros1 bags. We will start working on rosbags according to this design doc, but we are happy to incorporate any further feedback. |
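To make the "plugin API with SQLite as the default" decision concrete, here is a minimal sketch of what such a split could look like. It is not the rosbag2 plugin interface or its on-disk schema: the class names, method names, and table layout are all assumptions for illustration, using only Python's standard `sqlite3` module.

```python
import sqlite3
from abc import ABC, abstractmethod


class StoragePlugin(ABC):
    """Hypothetical storage interface; other formats would subclass this."""

    @abstractmethod
    def write(self, topic: str, stamp_ns: int, data: bytes) -> None: ...

    @abstractmethod
    def read_all(self): ...


class SqliteStorage(StoragePlugin):
    def __init__(self, path: str = ":memory:"):
        self._db = sqlite3.connect(path)
        # Assumed schema: one row per serialized message.
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY, topic TEXT NOT NULL, "
            "stamp_ns INTEGER NOT NULL, data BLOB NOT NULL)"
        )

    def write(self, topic: str, stamp_ns: int, data: bytes) -> None:
        self._db.execute(
            "INSERT INTO messages (topic, stamp_ns, data) VALUES (?, ?, ?)",
            (topic, stamp_ns, data),
        )
        self._db.commit()

    def read_all(self):
        # Return messages in timestamp order, as a player would need them.
        cur = self._db.execute(
            "SELECT topic, stamp_ns, data FROM messages ORDER BY stamp_ns, id"
        )
        return cur.fetchall()
```

Committing after every insert keeps the sketch simple but would likely be too slow for high-rate recording; batching writes into transactions is the usual remedy.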
Folks, I am hoping this is the right place to make a suggestion about rosbag record. We have been working with high frame rate image capture and have come to the realization that the standard ROS 1.0 subscription model, which uses serialization / deserialization over the network interface, might not be the most efficient. For a lot of imaging / lidar type sensors, the ROS community came up with the clever idea of using nodelets to avoid that step. However, to my knowledge rosbag record was not able to take advantage of this shared-pointer-to-disk technique.
@vik748 wrote:
off-topic, but see ros/ros_comm#103. |
Has there been any consideration how to handle the feature of "migration rules" from ROS 1? |
So, rosbag requires that the data needs to be serialized in order to write it, because our messages are not POD (plain old data) and therefore they are not laid out consecutively in memory. Serialization solves this by packing all data in the message into a single contiguous buffer. So you won't avoid serialization for writing and deserialization for reading bag files. In ROS 1 there's already a nodelet for rosbag (https://github.com/osrf/nodelet_rosbag) and the same advantage can already be gained in ROS 2, but in both cases the data must be serialized before writing to the disk even if you avoid it during transport. |
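The point about non-POD messages can be illustrated with a toy message that has variable-length fields. The sketch below packs it into one contiguous buffer and unpacks it again using Python's `struct` module. The wire layout (little-endian length prefixes) and the function names are invented for the example; this is neither the ROS 1 serialization format nor CDR.

```python
import struct


def serialize(frame_id: str, ranges: list) -> bytes:
    """Pack a string and a float array into a single contiguous buffer."""
    name = frame_id.encode("utf-8")
    return (
        struct.pack("<I", len(name)) + name
        + struct.pack("<I", len(ranges))
        + struct.pack(f"<{len(ranges)}f", *ranges)
    )


def deserialize(buf: bytes):
    """Inverse of serialize: walk the buffer using the length prefixes."""
    (n,) = struct.unpack_from("<I", buf, 0)
    name = buf[4:4 + n].decode("utf-8")
    (m,) = struct.unpack_from("<I", buf, 4 + n)
    ranges = list(struct.unpack_from(f"<{m}f", buf, 8 + n))
    return name, ranges
```

In memory the string and the array live in separate allocations, so they cannot be written to disk with a single copy; the serialized buffer can.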
@dirk-thomas, we don't have concrete details on how to implement these rules as of now. But the current design considers versioning and thus has room for migration rules in their respective convert functions.
It looks like rosbag2 development is in full swing, so a question about something that made me curious in the action design PR (#193 (comment)) - will rosbag2 bag services as well as topics? |
@paulbovbel, currently we can only grab topics in a serialized way and thus bag nicely. That is not to say that it's impossible to get service callbacks in a similar serialized way. It's just not yet implemented. However, I am unsure at this point whether we can guarantee that service requests can always be listened to by rosbag (without responding). Also, when doing so I have to look into how to fetch the service responses. |
Even if services can't be played back, there is value for debugging tools and inspection tools in at least logging them (assuming the response can also be captured). |
btw, I mentioned this to @Karsten1987 a while ago: If you choose lttng as a tracer, instrumenting services or any other points you want to get data from would be trivial. It's a whole different set of tools, independent of the middleware -- which could be both a pro and a con -- so not a decision to take lightly.
My takeaway is that, with the current implementation using DDS-RPC for service implementation, services can't be bagged via rmw transport mechanisms, correct? This means that services and actions (as currently designed) are not going to make it into ROS2 bags. Not being able to bag services was a huge limitation in ROS1 (if bagging was your primary approach for grabbing debug/telemetry/etc. data, as I believe it is for many users), and a common reason for 'just using actions'. Now that will not be an option either. Is anything in the first paragraph up for reconsideration in the near future? I recall @wjwwood mentioning that implementing services-over-DDS-topics was considered, rather than using DDS-RPC, which sounds like it would be possible given the introduction of keyed topics.
I am not sure if we could generalize that all services are necessarily implemented via topics. That might hold for DDS implementations, but not necessarily for other middlewares. Similarly to topics, rosbag could open a client/server for each service available on startup. This would depend though on whether the rosbag client would receive the answer from the original service server. The first step however is to be able to receive service data in binary form, which the rmw interface doesn't currently allow. |
Acronyms should always be expanded upon first use
We chose [SQLite](https://www.sqlite.org/about.html) as the default solution for this implementation.
Contrary to other relational database systems, it doesn't require any external SQL Server, but is self-contained.
It is also open source, [extensively tested](https://www.sqlite.org/testing.html) and known to have a large [community](https://www.sqlite.org/support.html).
The Gazebo team created a nicely composed [comparison](https://osrfoundation.atlassian.net/wiki/spaces/GAZ/pages/99844178/SQLite3+Proposal) between SQLite and the existing rosbag format.
The link https://osrfoundation.atlassian.net/wiki/spaces/GAZ/pages/99844178/SQLite3+Proposal doesn't appear to be working :-(
Here is a backup of this document (I think): https://drive.google.com/file/d/1mmxPSv6doMSnF0iu6jCvCi_SOY6Su1AD/view?usp=sharing
@dejanpan Do you know if this is going to be available in ROS2? Or is Apex working on a solution to this? Do you have any pointers on how one can store recordings, in ROS2, to memory? I am interested, precisely, in this or anything close "In the production stage (that is a self-driving car as a product) we will also record all the data but only over a short time period. Probably about a minute of it. In this setting all the data should be recorded in-memory into the ring buffer that rolls over every minute. A separate application running in a non-safety critical part of the system (separate recording device, process in a different hypervisor) should then fetch this data and store it into a drive.". Thank you. |
This PR is the place to gather feedback for the design doc about rosbags in ROS 2.0.
The section so far is structured such that it provides a couple of "Alternatives" for both the requirements for rosbags and the underlying data format to be used.
Eventually, these alternatives become part of the fixed requirements and proposed data formats once the discussion comes to a consensus.