Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added seek interface #836

Merged
merged 2 commits into from
Aug 13, 2021
Merged

added seek interface #836

merged 2 commits into from
Aug 13, 2021

Conversation

lihui815
Copy link
Contributor

@lihui815 lihui815 commented Aug 3, 2021

Per #824, created seek(timestamp) method in rosbag2_storage::storage_interfaces::ReadOnlyInterface API.

  • added seek(timestamp)
  • default sqlite implementation:
    • seek implementation keeps the storage filter, but will re-run the query with a new seek_time
    • updated set_filter/reset_filter to be able to be called in between reads in place. This means that if the message topics were in the order of:
      1 AAA
      2 BBB
      3 CCC
      4 AAA
      5 BBB
      6 AAA
      
      and we set filter on topic AAA after reading two messages (AAA & BBB), the next message read would be the AAA on the 4th row.
  • added seek implementation to sequential reader (assumes multi-file is in time order)
    • needs to reset the current file to the first file in order to be able to skip backwards
    • read_next() calls has_next() to ensure that reader rolls over to next file at end of current file
    • has_next() calls itself recursively until either there is a next row or there are no more files (doesn't assume that the next file will definitely contain valid row)
  • added tests for seeking/filtering both single and multi-file bags

NOTE

Because of continued confusion in the comments, I'm highlighting here the difference between

  • sqlite implementation of storage_interfaces::ReadOnlyInterface->seek(t)
  • SequentialReader->seek(t)

The differences are as follows:

  • reading in time order:
    • SqliteStorage DOES guarantee time-ordered reads regardless of write order. This means that calling seek(t) will also guarantee subsequent read_next() calls to return messages in time order until the end of the file.
    • SequentialReader DOES NOT guarantee time-ordered reads if individual files within a bag are not in non-overlapping sequential time order ("Time normalize" bag written from API so that splits are in order #842). This means that, with or without calling seek(t), there may be messages that occur at an earlier timestamp than previously-read messages when one file rolls over to the next file. SequentialReader->seek(t) will only guarantee that all messages and only messages with timestamp >= t will be read (satisfying any topic filters).
  • In-place setting/resetting of topic filters:
    • in SqliteStorage, messages are queried in time order. The implicit tiebreaker between messages of the same timestamp is on the row id. This PR makes this second-degree ordering explicit. When we modify topic filters "in place", we actually rerun the query of the db file. In order to prevent re-reading previously read messages, we filter the timestamp to prevent reading "older" messages, AND we filter row-id to prevent reading already-read messages at the current timestamp. Note that if we don't assume write order (consistent with existing SqliteStorage implementation), we need to do both. For example:
       # write order
        row0 t1 msgA
        row1 t0 msgB
        row2 t0 msgC
        row3 t2 msgD
        row4 t1 msgE
      
        # read order
        row1 t0 msgB
        row2 t0 msgC  <-- same timestamp as msgB but loses row_id tiebreak
        row0 t1 msgA  <-- later timestamp than msgB & msgC, but lower row-id
        row4 t1 msgE  <-- same timestamp as msgA but loses row_id tiebreak
        row3 t2 msgD
      
      From this example, if we set topic filter between reading msgA and msgE, then we need to check against the last read message time t1 to prevent previously read messages msgB and msgC to be re-read, and to check against the row id row0 to make sure that msgA doesn't get re-read. This is why in the SqliteStorage implementation we update seek_time_ and seek_row_id_ with every read, to make sure that we can make the correct in-place update query. Thus the filter query is necessarily (timestamp > last_t) or (timestamp = last_t and row_id > last_rid).
    • in SequentialReader, this is not applicable. When setting/resetting topic filters the "in place pointer" is kept by reference to the current file. We can assume that all files previous to the current file have already been read, so the setting/resetting of topic filters just occurs on the current & subsequent files. Within the current file, the "in place" message pointer is already handled by the single-file storage implementation.

@lihui815 lihui815 requested a review from a team as a code owner August 3, 2021 20:27
@lihui815 lihui815 requested review from emersonknapp and david-prody and removed request for a team August 3, 2021 20:27
@lihui815 lihui815 force-pushed the sonia-seek branch 7 times, most recently from fb6caf3 to 04d27ab Compare August 6, 2021 04:20
Copy link
Collaborator

@emersonknapp emersonknapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@emersonknapp
Copy link
Collaborator

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

Copy link
Contributor

@MichaelOrlov MichaelOrlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am requesting for changes in sqlite_storage.cpp to cleanup logic since in current stage it's misleading further in sequential_reader

@lihui815 lihui815 dismissed MichaelOrlov’s stale review August 13, 2021 15:50

see updated description for why query can't be simplified.

@emersonknapp
Copy link
Collaborator

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

- added seek(time) interface to read_only_interface

- default sqlite implementation

- added seek implementation to sequential reader (assumes multi-file is in time order)

- updated set_filter/reset_filter to be able to update in between reads without resetting to beginning

- added tests for seeking/filtering both single and multi-file bags

fix mock for seek_time

remove extra line

moved setting up filters and seek_time for new file into `load_current_file`

split up seek tests

added docs to read_only_interface for seek, set_filter & reset_filter

Signed-off-by: Sonia Jin <diegothemuich@gmail.com>
@lihui815 lihui815 force-pushed the sonia-seek branch 2 times, most recently from 5e85787 to fde201b Compare August 13, 2021 21:31
…and doesnt enforce weakly monotonic reads in time

Signed-off-by: Sonia Jin <diegothemuich@gmail.com>
@emersonknapp emersonknapp merged commit 44e6422 into ros2:master Aug 13, 2021
@lihui815 lihui815 deleted the sonia-seek branch August 13, 2021 22:38
WJaworskiRobotec pushed a commit to RobotecAI/rosbag2 that referenced this pull request Sep 12, 2021
- added seek(time) interface to read_only_interface
- default sqlite implementation
- added seek implementation to sequential reader (assumes multi-file is in time order)
- updated set_filter/reset_filter to be able to update in between reads without resetting to beginning
- added tests for seeking/filtering both single and multi-file bags

Signed-off-by: Sonia Jin <diegothemuich@gmail.com>
Signed-off-by: Wojciech Jaworski <wojciech.jaworski@robotec.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants