clp-s: Add support for decompression in ascending timestamp order. #440

gibber9809 · 2024-06-13T19:24:28Z

Description

This PR adds support for decompressing a clp-s archive in ascending timestamp order via the --ordered command-line option. This is implemented by loading all tables into memory, putting them into a min-heap ordered by the timestamp of the next record in each table, and popping from the heap until all records have been decompressed. The timestamp is pulled from the authoritative timestamp column specified at compression time; if a table does not contain such a column, its timestamp is treated as 0 for ordering purposes.

Note: we don't sort by timestamp within each table, so the logs can end up slightly out of timestamp order (this is probably preferred anyway, since it brings us closer to log order).
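The heap-based merge described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Record`, `TableCursor`, and `merge_in_order` are hypothetical names, and only `get_next_timestamp()` and `done()` mirror methods that appear in the diff for `SchemaReader`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

using epochtime_t = std::int64_t;

// Hypothetical stand-in for one decompressed record.
struct Record {
    epochtime_t timestamp;  // 0 when the table has no timestamp column
    std::string body;
};

// Hypothetical stand-in for a table loaded into memory; records stay in stored order.
class TableCursor {
public:
    explicit TableCursor(std::vector<Record> records) : m_records(std::move(records)) {}

    bool done() const { return m_cur >= m_records.size(); }

    epochtime_t get_next_timestamp() const {
        assert(!done());
        return m_records[m_cur].timestamp;
    }

    // Returns the current record and advances the cursor.
    Record const& pop() {
        assert(!done());
        return m_records[m_cur++];
    }

private:
    std::vector<Record> m_records;
    std::size_t m_cur{0};
};

// Repeatedly emits the record from the table whose next record has the smallest timestamp.
std::vector<std::string> merge_in_order(std::vector<TableCursor>& tables) {
    // std::priority_queue is a max-heap, so invert the comparison to get a min-heap.
    auto cmp = [](TableCursor const* a, TableCursor const* b) {
        return a->get_next_timestamp() > b->get_next_timestamp();
    };
    std::priority_queue<TableCursor*, std::vector<TableCursor*>, decltype(cmp)> heap(cmp);

    for (auto& table : tables) {
        if (false == table.done()) {
            heap.push(&table);
        }
    }

    std::vector<std::string> out;
    while (false == heap.empty()) {
        TableCursor* table = heap.top();
        heap.pop();
        out.push_back(table->pop().body);
        // Re-insert the table keyed on its new next timestamp, if it has records left.
        if (false == table->done()) {
            heap.push(table);
        }
    }
    return out;
}
```

Because each table is consumed in stored order, a table whose records are stored as timestamps 5 then 1 is keyed at 5 until its first record is popped, so the record with timestamp 1 is emitted later than records from other tables with timestamps between 1 and 5. That is the "slightly out of timestamp order" behaviour the note describes.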

This seems to add minimal decompression speed overhead (about 2.5%) for fairly typical archives, though it would likely have higher overhead for large archives or archives with many tables (due to extra memory allocations compared to decompressing one table at a time). Similarly, there can be significant memory overhead, particularly for large archives.

Validation performed

Validated that records end up sorted in timestamp order after decompression
Validated that in-order decompression results in an identical set of records compared to out-of-order decompression
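The PR does not say how these checks were run. One way they could be expressed, assuming the timestamps and the serialized records of each output have already been read into memory (`is_ascending` and `same_records` are hypothetical helper names):

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// True if the timestamps never decrease from one record to the next.
bool is_ascending(std::vector<std::int64_t> const& timestamps) {
    return std::is_sorted(timestamps.begin(), timestamps.end());
}

// True if both outputs contain the same records with the same multiplicities,
// regardless of the order in which they were written.
bool same_records(std::vector<std::string> a, std::vector<std::string> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return a == b;
}
```

Sorting copies of both outputs compares them as multisets, so a duplicated or dropped record is detected even when the ordering differs.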

wraymo · 2024-06-13T20:15:01Z

components/core/src/clp_s/CommandLineArguments.cpp

+            decompression_options.add_options()(
+                    "ordered",
+                    po::bool_switch(&m_ordered_decompression),
+                    "Enable in-order decompression for this archive"


Do we want to make it clear that "order" is based on timestamp?

Yeah, I had some vague idea that we'd transparently use timestamp or MOT depending on what was available in the archive, but being explicit is best for now.

wraymo · 2024-06-13T20:16:55Z

components/core/src/clp_s/SchemaReader.hpp

+     */
+    epochtime_t get_next_timestamp() const { return m_get_timestamp(); }
+
+    bool done() const { return m_cur_message >= m_num_messages; }


Do we want to add a description for this method?

wraymo · 2024-06-13T20:44:04Z

components/core/src/clp_s/ArchiveReader.cpp

-    auto& schema_reader
-            = create_schema_reader(schema_id, should_extract_timestamp, should_marshal_records);
+    create_schema_reader(
+            m_schema_reader,


Do you want to rename it to initialize_schema_reader?

wraymo · 2024-06-13T20:48:28Z

components/core/src/clp_s/ArchiveReader.cpp

+    return m_schema_reader;
+}
+
+std::vector<std::shared_ptr<SchemaReader>> ArchiveReader::load_all_tables() {


Do you want to rename it to read_all_tables (we have read_table above)?

wraymo · 2024-06-13T21:05:23Z

components/core/src/clp_s/JsonConstructor.hpp

@@ -49,8 +51,11 @@ class JsonConstructor {
    void store();

 private:
+    void construct_in_order(FileWriter& writer);


Can you add a description for this method?

wraymo

Great work! Since we only support ascending order for timestamps, should we explicitly state this in the command-line description and the commit message?

wraymo

For the commit message, what about "clp-s: Add support for decompression in ascending timestamp order."?

gibber9809 added 2 commits June 13, 2024 14:19

Implement decompression in timestamp order

76f3735

Fix bug causing extra fake record to be decompressed from each table

881fb52

gibber9809 requested a review from wraymo June 13, 2024 19:24

wraymo reviewed Jun 13, 2024

gibber9809 and others added 4 commits June 13, 2024 19:43

Update components/core/src/clp_s/ArchiveReader.cpp

bf4981f

Co-authored-by: wraymo <37269683+wraymo@users.noreply.github.com>

lint fix

a405f6e

Merge remote-tracking branch 'upstream' into timestamp-order-decompression

f6527ef

Address review comments

3c83b1a

gibber9809 requested a review from wraymo June 13, 2024 23:54

wraymo reviewed Jun 14, 2024

Merge branch 'main' into timestamp-order-decompression

0a8285a

gibber9809 changed the title ~~clp-s: Implement decompression in timestamp order.~~ clp-s: Implement decompression in ascending timestamp order. Jun 14, 2024

Address review comment

4cf5019

gibber9809 requested a review from wraymo June 14, 2024 15:16

wraymo approved these changes Jun 14, 2024

gibber9809 changed the title ~~clp-s: Implement decompression in ascending timestamp order.~~ clp-s: Add support for decompression in ascending timestamp order. Jun 14, 2024

gibber9809 merged commit 8cfa96c into y-scope:main Jun 14, 2024
11 checks passed

jackluo923 pushed a commit to jackluo923/clp that referenced this pull request Dec 4, 2024

clp-s: Add support for decompression in ascending timestamp order. (y-scope#440)

4ab24c4

Co-authored-by: wraymo <37269683+wraymo@users.noreply.github.com>
