
Make it (somehow) possible to avoid needless allocations when serialising requests (needs GAT?) #711

Open · psychon opened this issue Jun 4, 2022 · 2 comments

psychon (Owner) commented Jun 4, 2022

I was just looking at some code and wondered why we really need vec! here:

impl GetInputFocusRequest {
    /// Serialize this request into bytes for the provided connection
    pub fn serialize(self) -> BufWithFds<PiecewiseBuf<'static>> {
        let length_so_far = 0;
        let mut request0 = vec![
            GET_INPUT_FOCUS_REQUEST,
            0,
            0,
            0,
        ];
        let length_so_far = length_so_far + request0.len();
        assert_eq!(length_so_far % 4, 0);
        let length = u16::try_from(length_so_far / 4).unwrap_or(0);
        request0[2..4].copy_from_slice(&length.to_ne_bytes());
        (vec![request0.into()], vec![])
    }
}

All of this is static. It would e.g. be possible to do something like this (however, I have no idea how to set the length field in a way that works for both big endian and little endian):

fn serialize(self) -> &'static [u8] {
    &[GET_INPUT_FOCUS_REQUEST, 0, 1, 0]
}

Of course, this return type is not compatible with everything else. Perhaps a Cow<'static, [u8]> could work?
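
For illustration, a rough sketch of what a Cow-based version might look like (hypothetical code, not the actual x11rb API). Note that u16::to_ne_bytes is a const fn, which would sidestep the endianness problem at compile time:

use std::borrow::Cow;

// Opcode for GetInputFocus (43 in the core protocol).
const GET_INPUT_FOCUS_REQUEST: u8 = 43;

fn serialize() -> Cow<'static, [u8]> {
    // Length field in 4-byte units, computed in native byte order at compile time.
    const LENGTH: [u8; 2] = 1u16.to_ne_bytes();
    const BYTES: [u8; 4] = [GET_INPUT_FOCUS_REQUEST, 0, LENGTH[0], LENGTH[1]];
    // Fixed requests borrow static bytes; dynamic requests would return Cow::Owned.
    Cow::Borrowed(&BYTES)
}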

Also: Of course, "just because" is not good enough, so: Would there be any actual benefit to doing this?

(The original code has two allocations. The one I am talking about above is for the "actual on-the-wire bytes". There are also the two vec!s in the last line. The first of those could also be replaced with something static, but Cow<'static, [Cow<'static, [u8]>]> seems a bit ugly. The second is actually free since there are no FDs and, AFAIK, an empty Vec does not allocate anything.)
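
(The last point is easy to double-check, by the way: Vec::new() is documented not to allocate until elements are pushed.)

// An empty Vec has capacity 0, i.e. no heap allocation has happened yet.
let fds: Vec<std::os::unix::io::RawFd> = Vec::new();
assert_eq!(fds.capacity(), 0);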

notgull (Collaborator) commented Jun 4, 2022

My solution to this (which I discussed earlier in #687) is to rewrite the Serialize trait so that, rather than returning a buffer of data, it fills a provided buffer with the data. The design I was experimenting with (which also takes advantage of vectored I/O) would look something like this:

use std::io::IoSlice;

pub trait Serialize {
    fn serialize<'this>(&'this self, target: &mut [IoSlice<'this>]);

    /// Get the number of slices that need to be available for `serialize`.
    fn num_slices(&self) -> usize;

    /// This function can be overridden by downstream implementors to use a
    /// fixed-size array instead of a vector.
    fn to_slices(&self, callback: impl FnOnce(&mut [IoSlice<'_>])) {
        // empty_io_slice() stands in for IoSlice::new(&[]).
        let mut target = vec![empty_io_slice(); self.num_slices()];
        self.serialize(&mut target);
        callback(&mut target);
    }
}

impl Serialize for GetInputFocusRequest {
    fn serialize<'this>(&'this self, target: &mut [IoSlice<'this>]) {
        // The header is all constants, so the array is promoted to 'static
        // (length field handling elided, as discussed above).
        target[0] = IoSlice::new(&[OPCODE, 0, 0, 0]);
    }

    fn num_slices(&self) -> usize {
        1
    }

    fn to_slices(&self, callback: impl FnOnce(&mut [IoSlice<'_>])) {
        // A single slice fits in a stack array; no heap allocation at all.
        let mut target = [empty_io_slice()];
        self.serialize(&mut target);
        callback(&mut target);
    }
}

With smaller requests, this keeps serialization off the heap entirely. With larger requests that do need the heap, it ensures that only a single level of allocation is needed.
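
For illustration, a hypothetical helper showing how a caller might consume this, assuming the trait sketched above; the slices go straight into a vectored write, so a small request is serialized and sent without touching the heap:

use std::io::{Result, Write};

fn send_request<W: Write>(conn: &mut W, request: &impl Serialize) -> Result<usize> {
    let mut written = Ok(0);
    request.to_slices(|slices| written = conn.write_vectored(slices));
    written
}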

Is it worth it? Maybe. The bottleneck here is definitely going to be server communication. It's worth a flamegraph. That being said, it should be worth it in certain cases, such as with larger requests (e.g. Trapezoids, which requires a copy under the current implementation).

notgull (Collaborator) commented Jun 5, 2022

Some things I noticed from profiling the simple_window example:

  • Just as expected, overhead from polling the server takes up most of the time, although not by as large a margin as I expected. When you factor out the time spent waiting for events to arrive, I'd say about twice as much time is spent on I/O as on allocations.
  • A surprising amount of time is spent in try_read_packets, more than pretty much anything else that isn't I/O. Most of it goes to allocations and memmove, so you could probably squeeze some important performance wins out of there.
  • PolyLineRequest::serialize() spends essentially all of its time growing and allocating the vectors it returns, so reducing the allocations here would help a lot (see the sketch after this list).
  • Other notable time sinks include Event::parse() and mutex locking/unlocking.
  • That being said, all of the above goes pretty fast in the grand scheme of things: about as much time is spent in the fmt module and writing to standard output as in everything listed above combined.
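
As a minimal illustration of the PolyLine point (hypothetical code, not the actual generated serializer): computing the exact size up front removes all of the regrowth, even without any API change.

// One exact-size allocation instead of repeated vector growth.
fn serialize_points(points: &[(i16, i16)]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(4 + points.len() * 4);
    buf.extend_from_slice(&[0u8; 4]); // request header (contents elided)
    for &(x, y) in points {
        buf.extend_from_slice(&x.to_ne_bytes());
        buf.extend_from_slice(&y.to_ne_bytes());
    }
    buf
}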
