Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read the enitre message in one go without copies #108

Merged
merged 4 commits into from
Feb 27, 2024

Conversation

MaxVerevkin
Copy link
Contributor

No description provided.

@MaxVerevkin
Copy link
Contributor Author

I've run cargo clippy --fix and it fixed more stuff than I expected, probably because I'm on nightly.


const BUFSIZE: usize = 512;
let mut tmpbuf = [0u8; BUFSIZE];
if self.msg_buf_in.len() < max_buffer_size {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be if self.msg_buf_in.len() != max_buffer_size { so that msg_buf_in also gets resized to be smaller if the next message is smaller than the previous one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? I specifically didn't do this to avoid unnecessary memsets.

Copy link
Owner

@KillingSpark KillingSpark Feb 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the msg_buf_in should only ever contain at most as many bytes as the message that is currently being read.

I think the msg_buf_in should also be resized after the read to match how many bytes actually where received

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This information is stored in msg_buf_filled. Imagine you receive a sequence of messages with the following sizes: 1K, 128B, 512B, 1K. The way this PR works right now, resizes would write 1K of zeroes. With your suggestion it would write at least 2K-128B of zeroes (best case, if each message is read in one recvmsg).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true but I would accept that cost for the sake of an easier invariant and making the msg_buf_filled field unnecessary. The few memsets are unlikely to be a bottleneck in the library.

Another thing I don't really like is that the code currently "reuses" the msg_buf_in by cloning (part of) it when a full message has been received. I cases where there are typically small messages received one big message will forever bloat the memory usage.

Copy link
Owner

@KillingSpark KillingSpark Feb 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you point where the cloning is?

rustbus/src/wire/unmarshal.rs:175

I think it would be better to do a swap here with a newly allocated vec of a sensible default size instead of doing it like this. Or maybe there is an even better way?

Right. Exactly as it was. .clear() does not release any memory

Yes, I am aware. I wanted to say that it's currently unsatisfying and that, while we are working on this anyways we could improve this aspect too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you point where the cloning is?

rustbus/src/wire/unmarshal.rs:175

I think it would be better to do a swap here with a newly allocated vec of a sensible default size instead of doing it like this. Or maybe there is an even better way?

Thanks! I completely missed this. A swap, even with an empty vector, sounds reasonable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, should I .drain(..offset) bytes from the buffer, or keep the offset to avoid shifting?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think draining is fine as a first step towards improvement. Ideally the initial offset would be passed through to the unmarshalling code though. But that could be a separate PR, or you could leave it for me :)

Copy link
Contributor Author

@MaxVerevkin MaxVerevkin Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented the take+drain part, but left the msg_buf_filled for now. The annoying thing is that without it the message must be resized back in case of any error, not just partial reads.

@KillingSpark KillingSpark merged commit af9be8e into KillingSpark:master Feb 27, 2024
4 checks passed
@MaxVerevkin MaxVerevkin deleted the msgread branch February 27, 2024 13:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants