Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement request parsing #452

Merged
merged 21 commits into from
May 28, 2020
Merged

Implement request parsing #452

merged 21 commits into from
May 28, 2020

Conversation

khuey
Copy link
Contributor

@khuey khuey commented May 27, 2020

I'd suggest reviewing the individual commits rather than the full diff that Github wants you to look at. The division is fairly logical, although the "Implement parsing for request fields" changeset ends up being a giant blob. There are tests for a few simple requests showing that this works at least some times.

@psychon
Copy link
Owner

psychon commented May 27, 2020

So far I only looked at half the first commit and I feel like something will be wrong. Could you add a test for xproto's ChangeSaveSet (or alternatively just tell me that it will work). This request has two fields: mode: u8 and window: Window. On the wire, this ends up as:

typedef struct xcb_change_save_set_request_t {
    uint8_t      major_opcode;
    uint8_t      mode;
    uint16_t     length;
    xcb_window_t window;
} xcb_change_save_set_request_t;

I feel like your code might mis-handle this, because with your abstraction, the mode ends up as part of the header.

Am I wrong?

(Also, clippy is complaining about one thing)

Copy link
Owner

@psychon psychon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my many complaints.

"Big ones" should be: Can we remove RequestHeader and the checking that the input slice is exhausted at the end of parsing?

src/x11_utils.rs Outdated Show resolved Hide resolved
src/x11_utils.rs Outdated Show resolved Hide resolved
generator/src/generator/namespace.rs Outdated Show resolved Hide resolved
generator/src/generator/namespace.rs Outdated Show resolved Hide resolved
} else {
outln!(out, "let {} = remaining.to_vec();", rust_field_name);
outln!(out, "let remaining = &remaining[remaining.len()..];");
// Only force taking ownership for non-request types.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(unrelated to this PR) What would it take to make "normal struct parsing" also be able to borrow stuff from the input? The hand-written API moved away from this being possible, but how close is the code generator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have to look at some of the generated code but I don't think it would be particularly difficult at this point.

tests/request_parsing_tests.rs Show resolved Hide resolved
tests/request_parsing_tests.rs Show resolved Hide resolved
tests/request_parsing_tests.rs Outdated Show resolved Hide resolved
#[test]
fn test_randr_get_output_property() {
use x11rb::protocol::randr::GetOutputPropertyRequest;
let header = RequestHeader {
Copy link
Owner

@psychon psychon May 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Comment is at a random position) Why do we need RequestHeader?

I think that it might be possible to do without. The parsing code could just parse the request "normally" without skipping any fields. Opcodes can then be compared against expected values. Things that are not needed can be sent to let _ =.

RequestHeader does not bring anything to the parsing code, does it?

Edit: Well... it hides BigRequests support somewhat. How bad would it be to put BigRequests support into the generated code "everywhere"? We already do that for request serialisation... kind of but not really (it uses a request size of zero and expects something else to fix things up and add a second request size).
For parsing, this would however be relatively straight forward:

let (short_length, mut remaining) = u16::try_parse(remaining)?
let length = if short_length == 0 {
   // BigRequests
  let (long_length, new_remaining) = u32::try_parse(remaining)?;
  remaining = new_remaining;
  long_length
} else {
  short_length.into()
}

Well... okay, that is surprisingly long. Perhaps a helper function in x11_utils for parsing the request length field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason I did it this way is that parsing requests is not stateless. You need to know the state of the connection with the server (in particular whether or not BigRequests is enabled and what extensions are enabled at what major opcodes) in order to correctly parse requests. We could choose to decide that we don't care about tracking whether BigRequests is enabled and simply always accept that format. However, even if we did that, someone outside of a FooRequest::try_parse implementation is always going to have to look at the first two bytes of the request to determine which parsing implementation to feed the buffer to (and for any opcodes above 127 they need to have been parsing the connection all along to know which extension opcodes exist). So RequestHeader is currently all the stuff that requires external state to process.

I'm not necessarily opposed to changing it but that is the motivation for the current design.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of similar to x11rb::protocol::Event and Error. That code also needs to look up extensions based on error code / response type / major opcode. I guess we should also do something similar for request parsing (but that can be done in another PR).

This code uses some ad-hoc functions at the end of the file to extract the necessary info from the &[u8].

A possible API for x11rb::protocol::Request would then be fn parse(request: &[u8], ext_info_provider: &dyn ExtInfoProvider) -> Result<Self, ParseError> (this is just the existing API for Event/Error).
Your version would be -> Result<(RequestHeader, Self), ParseError>, right? However... does the request header provide anything useful here? The request is already identified by the variant of the enum Self that is being returned. The request length is known to the caller as .len() of the provided slice. The difference between major/minor opcode does not really matter, I think.

What kind of API do you have in mind for this? Or did you plan to hand-write the necessary bits outside of x11rb?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in particular whether or not BigRequests is enabled

Well... a big request is an illegal request when BigRequests is disabled, since a length of 0 would mean that there is not even enough space for the length field... So, personally I don't mind to assume that BigRequests is always enabled. Technically, this could also check the maximum request length, but this seems to be the wrong code to do that (since the request was already received at this point and is in-memory, such handling should be done before everything is read).

I also now looked at the X11 server and actually... there are some requests where it requires an exact length and sometimes where "trailing garbage" is okay, There are several macros for this in include/dix.h:

  • REQUEST_SIZE_MATCH, 860
  • REQUEST_AT_LEAST_SIZE, 346
  • REQUEST_AT_LEAST_EXTRA_SIZE, 3
  • REQUEST_FIXED_SIZE, 107

For some reason, ProcNoOperation uses REQUEST_AT_LEAST_SIZE and does not do an exact match. The number after the macro in the list above is the result of git grep MACRO | wc -l. Seems like: It's complicated

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shorter way to parse length fields:

let (length, remaining) = match u16::try_parse(remaining)? {
  // BigRequests?
  (0, remaining) => u32::try_parse(remaining),
  (n, remaining) => (n.into(), remaining),
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something else I realized is that regardless of our decision around BigRequests whoever is driving parsing has to parse the length themselves anyways to know if they have received enough bytes before calling the parsing method. Since ParseError doesn't distinguish between various types of errors "insufficient bytes" is otherwise indistinguishable from other types of parsing errors.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. That's also how the existing code in e.g. RustConnection parses errors/events/replies.

Once upon a time, the only possible parse error was "not enough data". Since then, a lot has happened. I wouldn't mind improving ParseError, but so far I do not really see a good use case for that. Our parsing errors most likely would cause memory bugs in anything with XCB, so apparently all of this "never happens" anyway...

src/protocol/composite.rs Outdated Show resolved Hide resolved
Comment on lines +21 to +25
impl From<std::convert::Infallible> for ParseError {
fn from(_: std::convert::Infallible) -> Self {
unreachable!()
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is it actually used?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. A quick test build says "this is unused".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used in xinput. It could probably be eliminated with some more special casing in the code generator.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to actually enable the necessary features in my test build. 🤦
New try says:

error[E0271]: type mismatch resolving `<u8 as std::convert::TryFrom<u8>>::Error == errors::ParseError`
    --> src/protocol/xinput.rs:8798:29
     |
8798 |         let format = format.try_into()?;
     |                             ^^^^^^^^ expected enum `std::convert::Infallible`, found enum `errors::ParseError`

error[E0271]: type mismatch resolving `<u8 as std::convert::TryFrom<u8>>::Error == errors::ParseError`
     --> src/protocol/xinput.rs:13546:29
      |
13546 |         let format = format.try_into()?;
      |                             ^^^^^^^^ expected enum `std::convert::Infallible`, found enum `errors::ParseError`

error: aborting due to 2 previous errors

This is inside ChangeDevicePropertyRequest and XIChangePropertyRequest In both cases, a format u8 is converted to u8.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we could probably try harder to avoid that u8 -> u8 conversion if we wanted to.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this try_into comes from emit_field_post_parse, which would normally convert it into an enum. But here it is "converted" into an u8 because it is a deducible field.

In emit_struct_type, this is avoided with:

if !field
    .name()
    .map(|field_name| deducible_fields.contains_key(field_name))
    .unwrap_or(false)
{
    self.emit_field_post_parse(field, out);
}

@khuey
Copy link
Contributor Author

khuey commented May 27, 2020

Ok, I think the biggest remaining piece is around request length and validation. I'm not familiar with the edge cases in X here. The X spec says

The length field defines the total length of the request, including the header. The length field in a request must equal the minimum length required to contain the request. If the specified length is smaller or larger than the required length, an error is generated

Obviously if you provide fewer bytes than necessary parsing will fair when we run out of bytes. If you provide more bytes than necessary what happens? e.g. if I send a GetInputFocus (which has no body) with length == 26 and I send 100 bytes of padding with it will the X server accept that? Per the language of the spec it shouldn't, but I haven't actually tested it. Do you know the answer?

add_ne!(body, 0x00000000u32);
body.push(0x01);
body.extend(&[0x00, 0x00, 0x00]); // Final padding.
let r = CreateWindowRequest::try_parse_request(header, &body).unwrap();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anyone have access to a big endian machine or a good idea on how we could test this on CI? The best idea that I have so far would be qemu and I bet that would be quite slow.

Alternatively, I am fine with just assuming this works. That approach also worked so far. :-)

@psychon
Copy link
Owner

psychon commented May 27, 2020

e.g. if I send a GetInputFocus (which has no body) with length == 26 and I send 100 bytes of padding with it will the X server accept that? Per the language of the spec it shouldn't, but I haven't actually tested it. Do you know the answer?

TL;DR: It's complicated and we cannot know in general how individual requests behave.

GetInputFocus is one of the requests where the Xorg server uses REQUEST_SIZE_MATCH to check that sizeof(some struct) matches the size of the request.

https://gitlab.freedesktop.org/xorg/xserver/-/blob/4195e8035645007be313ade79032b8d561ceec6c/dix/events.c#L4847

The macro checks that the request size matches exactly the expected value and if not, it will return a BadLength error:

https://gitlab.freedesktop.org/xorg/xserver/-/blob/4195e8035645007be313ade79032b8d561ceec6c/include/dix.h#L69-71

The calling code will however still consume the full 26*4 bytes that you claim your request has. I do not have a code place to point at in here, but its this function:

https://gitlab.freedesktop.org/xorg/xserver/-/blob/4195e8035645007be313ade79032b8d561ceec6c/os/io.c#L227

This functions gets the request length out of the buffer of data that was already read.
The caller of this one then follows some function pointer to actually handle the request:

https://gitlab.freedesktop.org/xorg/xserver/-/blob/master/dix/dispatch.c#L465-500

As an example of a request that allows trailing data (as already mentioned above): ProcNoOperation uses REQUEST_AT_LEAST_SIZE, so allows extra data:

https://gitlab.freedesktop.org/xorg/xserver/-/blob/master/dix/dispatch.c#L3405-3412

@khuey
Copy link
Contributor Author

khuey commented May 27, 2020

Ok, that's rather unfortunate that X is not consistent here.

@khuey
Copy link
Contributor Author

khuey commented May 27, 2020

Ok I think this is ready for another pass.

The main issue you brought up that I haven't changed at all is RequestHeader. The intended usage of this code is (pseudo code)

while (have_some_bytes) {
    let header = parse_request_header(bytes);
    if length_indicated_by_header < number_of_bytes_we_have {
        wait_for_more_bytes();
    }
    let request = match (major_opcode_from_header, minor_opcode_from_header) {
      ... => invoke the relevant parsers
    };
    do_something(request);
}

The opcodes and the request length all have to be extracted before calling a given FooRequest::try_parse_request. The major and minor opcodes are needed to choose the correct FooRequest, and the request length is needed to make sure that we have received enough bytes from the socket or whatever before calling into try_parse_request. Because "not enough data" is indistinguishable from other parsing problems with the current error reporting setup, we have to preclude that before attempting to parse to know that any errors we receive are "real". Given that somebody outside has to look at these fields anyways, I think the current design where they're parsed externally into the RequestHeader struct and the try_parse_request function received a copy of the struct plus the remaining input bytes makes sense.

I do plan to generate a Request enum that contains all the different request variants (matching those for Error and Event) and a function on that that takes the input and an ExtInfoProvider and parses out the header, selects the appropriate request variant, and invokes its try_parse_request function. But not in this PR.

@khuey khuey requested a review from psychon May 27, 2020 22:52
@psychon psychon requested a review from eduardosm May 28, 2020 07:10
Copy link
Owner

@psychon psychon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 from me

@mergify mergify bot merged commit bd46246 into psychon:master May 28, 2020
@khuey
Copy link
Contributor Author

khuey commented May 28, 2020

Excellent, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants