Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cloud_storage: add timestamps to remote segment index #11802

Merged
merged 2 commits into from
Jul 11, 2023

Conversation

jcsp
Copy link
Contributor

@jcsp jcsp commented Jun 30, 2023

This is a prerequisite to make timequeries seek
to the proper chunk, instead of reading from the
start of segments.

The timestamp we care about is the batch max timestamp, because timequeries will want to find a batch that is definitely before the target timestamp, to start their scan from there.

Related: #11801

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

  • none

@jcsp jcsp added kind/enhance New feature or request area/cloud-storage Shadow indexing subsystem labels Jun 30, 2023
@jcsp jcsp requested review from Lazin and abhijat June 30, 2023 12:41
@jcsp jcsp force-pushed the remote-index-timestamps branch 4 times, most recently from 54fe19a to c3d1676 Compare July 4, 2023 08:28
@jcsp jcsp marked this pull request as ready for review July 10, 2023 11:11
@jcsp
Copy link
Contributor Author

jcsp commented Jul 10, 2023

/cdt tests/rptest/scale_tests/tiered_storage*

@jcsp jcsp requested a review from andijcr July 10, 2023 11:12
int64_t base_time;
int64_t last_time;
std::vector<int64_t> time_write_buf;
iobuf time_index;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to care for version<1> ->version<2> compat?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using serde's default read/write impls, & I believe serde::envelope's default behavior is that if it just runs out of data on decode, it will not populate those later fields (so these values remain at their defaults).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I have also updated the base_time and last_time to have explicit initialization -- previously one would have had to use the vector or iobuf to check for whether timestamp fields are set, because the ints would have had undefined contents)

auto actual = iobuf_to_bytes(ix.to_iobuf());
auto ix_buf = ix.to_iobuf();
iobuf expected_buf;
expected_buf.append((const uint8_t*)(expected.c_str()), expected.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: .c_str() .data()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a bit neater. Changed.

@jcsp jcsp force-pushed the remote-index-timestamps branch from c3d1676 to 95e3ec7 Compare July 10, 2023 13:56
Lazin
Lazin previously approved these changes Jul 11, 2023
This is a prerequisite to make timequeries seek
to the proper chunk, instead of reading from the
start of segments.

The timestamp we care about is the batch max timestamp,
because timequeries will want to find a batch that is
definitely before the target timestamp, to start their
scan from there.

Related: redpanda-data#11801
@jcsp jcsp force-pushed the remote-index-timestamps branch from 95e3ec7 to 955e374 Compare July 11, 2023 09:49
@jcsp
Copy link
Contributor Author

jcsp commented Jul 11, 2023

Force push: rebase for merge conflict.

@jcsp jcsp force-pushed the remote-index-timestamps branch from 955e374 to b3e5b48 Compare July 11, 2023 09:55
@jcsp jcsp merged commit 17a9c84 into redpanda-data:dev Jul 11, 2023
@jcsp jcsp deleted the remote-index-timestamps branch July 11, 2023 13:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cloud-storage Shadow indexing subsystem area/redpanda kind/enhance New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants