Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement host lookup support #71

Merged
merged 10 commits into from
Nov 5, 2023

Conversation

flokli
Copy link
Contributor

@flokli flokli commented Oct 25, 2023

This adds support for the GETAI, GETHOSTBYADDR, GETHOSTBYADDRv6, GETHOSTBYNAME, GETHOSTBYNAMEv6 request types.

For the more complex GETAI lookup, we use the dns_lookup crate.

In previous iterations of this change, we also used the same underlying getaddrinfo call to respond to the gethostbyname/gethostbyaddr operations.

Even though gethostbyname/gethostbyaddr officially are deprecated, there's a lot of tools still using it, and relying on them behaving differently.

So it's important to still implement it, with exactly the same behaviour, to prevent some tools from breaking (like hostname --fqdn, see nix-community#4).

We FFI the right Glibc gethostbyname_2r/gethostbyaddr_2r (now deprecated) functions and use it to back the GETHOSTBYNAME, GETHOSTBYNAME6, GETHOSTBYADDR, and GETHOSTBYADDR6 Nscd interfaces.

Took us three tries to get this right. This is actually the third full rewrite.

The Nscd behaviour for these two legacy functions is really confusing. We're supposed to ignore the herrno (herrno != errno!!) and set it to 0 if gethostbyaddr/name returns a non-null hostent. If we end up with a null hostent, we return the herrno together with a dummy hostent header.

I tried to keep things as safe as possible by extracting the glibc hostent to a proper rust structure. This structure mirrors the libc hostent in a Rust idiomatic way. We should probably try to upstream the FFI part of this commit to the Nix crate at some point.

Fixes nix-community#4

@codecov-commenter
Copy link

codecov-commenter commented Oct 25, 2023

Codecov Report

Attention: 43 lines in your changes are missing coverage. Please review.

Comparison is base (bc843bd) 41.32% compared to head (644e82c) 56.57%.

Additional details and impacted files
@@             Coverage Diff             @@
##             main      #71       +/-   ##
===========================================
+ Coverage   41.32%   56.57%   +15.25%     
===========================================
  Files           6        6               
  Lines         271      456      +185     
===========================================
+ Hits          112      258      +146     
- Misses        159      198       +39     
Files Coverage Δ
src/protocol.rs 53.57% <100.00%> (+12.66%) ⬆️
src/ffi.rs 82.81% <81.96%> (+82.81%) ⬆️
src/handlers.rs 55.55% <73.10%> (+21.88%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@flokli
Copy link
Contributor Author

flokli commented Oct 25, 2023

We tested these changes extensively using sockburp on the host-lookups branch on the nix-community fork. This is a cherry-pick of the latest iteration.

We'll give it some more testing on our side, but wanted to send this out already, so you can take a look.

@flokli flokli marked this pull request as ready for review October 26, 2023 11:02
@flokli
Copy link
Contributor Author

flokli commented Oct 26, 2023

Marking as ready for review, PTAL :-)

@flokli flokli marked this pull request as draft October 26, 2023 14:36
@picnoir
Copy link
Contributor

picnoir commented Oct 26, 2023

There's something still off with the getaddrinfo canonical name.

@flokli
Copy link
Contributor Author

flokli commented Oct 26, 2023

The remaining issue is fixed too, and cherry-picked here - see nix-community#10 for details.

@flokli flokli marked this pull request as ready for review October 26, 2023 21:45
Copy link
Collaborator

@leifwalsh leifwalsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple minor typos and some questions. Overall I think it's fairly straightforward (or as straightforward as code dealing with getaddrinfo and friends can be). Haven't tried running it at all but the code feels like it would be debuggable if something crashed?

src/ffi.rs Outdated Show resolved Hide resolved
src/ffi.rs Outdated Show resolved Hide resolved
src/ffi.rs Outdated Show resolved Hide resolved
src/ffi.rs Show resolved Hide resolved
src/ffi.rs Show resolved Hide resolved
src/handlers.rs Outdated
Comment on lines 198 to 212
let hostent = match gethostbyaddr_r(LibcIp::V4(address_bytes)) {
Ok(hostent) => hostent,
Err(e) =>
// We shouldn't end up in that branch. Something
// got very very wrong on the glibc client side if
// we do. It's okay to bail, there's nothing much
// we can do.
{
bail!("unexpected gethostbyaddr error: {}", e)
}
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let hostent = match gethostbyaddr_r(LibcIp::V4(address_bytes)) {
Ok(hostent) => hostent,
Err(e) =>
// We shouldn't end up in that branch. Something
// got very very wrong on the glibc client side if
// we do. It's okay to bail, there's nothing much
// we can do.
{
bail!("unexpected gethostbyaddr error: {}", e)
}
};
let hostent = gethostbyaddr_r(LibcIp::V4(address_bytes))?;

I probably would have written this, and just let the original error surface and get logged. But I don't have strong (or generally good) opinions about error handling. I'll defer to @blinsay if he wants to have a stronger or better opinion.

(same idea below too)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not address this one yet. Let me know what you prefer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole thing with the sentinel error-case hostent getting returned from gethostbyaddr_r is very odd to me. We don't have to reproduce libc's strangeness exactly, even if we do have to serialize that error value and send it back to the caller.

What if we did something like:

enum HostentError {
  NotFound,
  TryAgain,
  NoRecovery,
  NoData,
  Other(anyhow::Error),
}

pub fn gethostbyname2_r(name: String, af: libc::c_int) -> Result<Hostent, HostentError> {
  ...
}

and then in handlers we could do something like this:

            let hostent = match gethostbyaddr_r(LibcIp::V4(address_bytes)) {
                Ok(hostent) => hostent,
                Err(Other(e)) => bail!("oh no, something very bad happened: {e}"),
                Err(e) => {
                  log::debug!("gethostbyaddr_r(..) = {e:?}");
                  Hostent::error_value()
                },
            };

src/handlers.rs Outdated Show resolved Hide resolved
src/handlers.rs Outdated Show resolved Hide resolved
src/handlers.rs Outdated Show resolved Hide resolved
src/handlers.rs Outdated Show resolved Hide resolved
@picnoir
Copy link
Contributor

picnoir commented Oct 30, 2023

I squashed the nits/small modifications in the original commits.

Separated the getaddrinfo serialization routine refactoring (asked in #71 (comment)) in a separate commit: it's a lot, it'll likely be simpler to review it from a separate commit.

@picnoir picnoir force-pushed the host-lookup-upstream branch 2 times, most recently from c739e9b to e49382f Compare October 30, 2023 15:41
@picnoir
Copy link
Contributor

picnoir commented Oct 30, 2023

(fixed the clippy error)

src/ffi.rs Outdated Show resolved Hide resolved
@leifwalsh
Copy link
Collaborator

leifwalsh commented Oct 31, 2023

@flokli and @picnoir have you sent us a CLA yet? I'm trying to track this down internally too but people are on vacation/doing end-of-year activities so I figured I'd just ask you while I wait to hear back. If you haven't, please do so (emailing a scanned PDF is fine) and let me know.

(I think we accepted one of @flokli's earlier PRs as de minimis, so we probably don't have one from either of you yet)

@leifwalsh
Copy link
Collaborator

leifwalsh commented Oct 31, 2023

Otherwise LGTM, @geofft @blinsay let me know if you want to take a pass, otherwise I'll merge once we have CLAs or on Wednesday, whichever comes later.

picnoir and others added 3 commits October 31, 2023 09:24
This adds support for the GETAI, GETHOSTBYADDR, GETHOSTBYADDRv6,
GETHOSTBYNAME, GETHOSTBYNAMEv6 request types.

For the more complex GETAI lookup, we use the dns_lookup crate.

In previous iterations of this change, we also used the same underlying
getaddrinfo call to respond to the  gethostbyname/gethostbyaddr
operations.

Even though gethostbyname/gethostbyaddr officially are deprecated,
there's a lot of tools still using it, and relying on them behaving
differently.

So it's important to still implement it, with exactly the same
behaviour, to prevent some tools from breaking (like `hostname --fqdn`,
see #4).

We FFI the right Glibc gethostbyname_2r/gethostbyaddr_2r (now
deprecated) functions and use it to back the GETHOSTBYNAME,
GETHOSTBYNAME6, GETHOSTBYADDR, and GETHOSTBYADDR6 Nscd interfaces.

Took me three try to get this right. This is actually the third
full rewrite.

The Nscd behaviour for these two legacy functions is *really*
confusing. We're supposed to ignore the herrno (herrno != errno!!) and
set it to 0 if gethostbyaddr/name returns a non-null hostent. If we
end up with a null hostent, we return the herrno together with a dummy
hostent header.

I tried to keep things as safe as possible by extracting the glibc
hostent to a proper rust structure. This structure mirrors the libc
hostent in a Rust idiomatic way. We should probably try to upstream
the FFI part of this commit to the Nix crate at some point.

Fixes #4

Co-Authored-By: Florian Klink <flokli@flokli.de>
Nscd always sets the AI_CANONNAME flag for a getaddrinfo request. When
this flag is on, the canonical name (~ FQDN) for the requested address is
retrieved.

We found this issue through the nixosTests.hostname.explicitDomain
NixOS VM test. It went unnoticed in the wild probably because
the nscd client tend to fill canonical name in the request itself
once it retrieved it once.

While investigating this issue, I realized that setting the
SOCK_STREAM flag gets rid of the duplicate addrs. Meaning that we do
not need to filter them ourselves with a HashSet anymore.
We simplify the getaddrinfo logic by refactoring all the logic in two
branches:

- a happy path, if we get >0 addrs back
- an error path, mirroring glibc's behaviour, if we do not get any
  addr back.

Co-Authored-By: Florian Klink <flokli@flokli.de>
@flokli
Copy link
Contributor Author

flokli commented Oct 31, 2023

@flokli and @picnoir have you sent us a CLA yet? I'm trying to track this down internally too but people are on vacation/doing end-of-year activities so I figured I'd just ask you while I wait to hear back. If you haven't, please do so (emailing a scanned PDF is fine) and let me know.

(I think we accepted one of @flokli's earlier PRs as de minimis, so we probably don't have one from either of you yet)

I'm very sure I sent one out Nov 14 2022 (to tsos at twosigma.com` just found the email), and am somewhat convinced @picnoir did so as well at around the same time.

@picnoir
Copy link
Contributor

picnoir commented Oct 31, 2023

Yup, according to #44 (reply in thread) I did.

src/ffi.rs Outdated Show resolved Hide resolved
src/handlers.rs Outdated Show resolved Hide resolved
src/handlers.rs Outdated Show resolved Hide resolved
The other serialize functions are not embedded in their relevant
structs. Let's be consistent and remove serialize_hostent from the
Hostent struct.
Saves us a clone() for the canonical name.
Copy link
Collaborator

@blinsay blinsay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking three(!) passes at this, it's clear you all did a ton of work in here.

src/ffi.rs Outdated Show resolved Hide resolved
src/ffi.rs Outdated Show resolved Hide resolved
src/ffi.rs Outdated Show resolved Hide resolved
src/ffi.rs Outdated Show resolved Hide resolved
src/ffi.rs Outdated Show resolved Hide resolved
src/ffi.rs Show resolved Hide resolved
src/handlers.rs Show resolved Hide resolved
src/handlers.rs Outdated Show resolved Hide resolved
src/handlers.rs Outdated Show resolved Hide resolved
src/handlers.rs Outdated Show resolved Hide resolved
@flokli
Copy link
Contributor Author

flokli commented Nov 2, 2023

The whole thing with the sentinel error-case hostent getting returned from gethostbyaddr_r is very odd to me. We don't have to reproduce libc's strangeness exactly, even if we do have to serialize that error value and send it back to the caller.

We're just forwarding herrno, or setting it to 0 (in the case we documented in the last push). We don't get much from trying to interpret it and carving out specific enums.

Some iteration of PR tried to do this, but turned out to be very error-prone, for little added value. We opted to stick more to how it's done by glibc here, especially considering how much userspace relies on this to be identical.

@blinsay
Copy link
Collaborator

blinsay commented Nov 2, 2023

Some iteration of PR tried to do this, but turned out to be very error-prone, for little added value. We opted to stick more to how it's done by glibc here, especially considering how much userspace relies on this to be identical

What parts were more error prone? Is one of those iterations a closed PR I can go look at?

@picnoir
Copy link
Contributor

picnoir commented Nov 2, 2023

Is one of those iterations a closed PR I can go look at?

Sure. https://github.com/nix-community/nsncd/blob/15133dd5bfbbc1111afb929a7005e8f672cdce68/src/ffi.rs#L56

Don't use that, it's not a sound approach.

What parts were more error prone?

Trying to interpret herrno ourselves and return an error hostent if herrno != 0. The correct thing to do here is:

  1. if hostent != null, set herrno to 0 and return the received hostent as it is.
  2. if hostent == null, return the error header, forward herrno.

We're not logging anything about herrno, we're not using it for any logic, there's no reason trying to parse it. It can only yield to more errors.

In the grand scheme of things, Nscd is a glorified (and lossy) glibc proxy. We just forward what glibc throws at us. I think that architecturally-wise, we should try to parse as few things as possible. Just the bare minimum to perform the Nscd error logic.

To be franc, I even came to believe we'd be better off not even trying to parse the string bytes into a Rust string. There are valid cases where a valid glibc string could be an invalid Rust string.

Don't try to do any parsing on the bytes - there's no guarantees they're
valid UTF-8 strings. We get C Strings from glibc, we serialize CStrings
back out.
@flokli
Copy link
Contributor Author

flokli commented Nov 3, 2023

@picnoir good call, I pushed another commit on top that now keeps these CStrings CStrings, allowing to remove some of the unnecessary back-and-forth parsing and serializing.

@blinsay
Copy link
Collaborator

blinsay commented Nov 4, 2023

I pushed another commit on top that now keeps these CStrings CStrings, allowing to remove some of the unnecessary back-and-forth parsing and serializing.

Nice :D

We're not logging anything about herrno, we're not using it for any logic, there's no reason trying to parse it. It can only yield to more errors.

In the grand scheme of things, Nscd is a glorified (and lossy) glibc proxy. We just forward what glibc throws at us. I think that architecturally-wise, we should try to parse as few things as possible. Just the bare minimum to perform the Nscd error logic.

This is the right position to take imo, but it was our choice to bundle herrno in with libc::hostent fields. It seems likeHostent contains an herror field because that was easy. Point taken on not parsing values we don't have to, but I think that means the right thing to do is just not interpret the error code, rather than pretend there's no difference between a libc::hostent and and error.

Given that I think I I'd suggest we do something like this:

enum HostentError {
  HError(i32),
  Other(anyhow::Error),
}

pub fn gethostbyname2_r(name: String, af: libc::c_int) -> Result<Hostent, HostentError> {
  ...
}

or even like this:

enum Hostent {
  Ok{ name: CString, ... },
  Error(i32),
}

pub fn gethostbyname2_r(name: String, af: libc::c_int) -> Result<Hostent, anyhow::Error> {
  ...
}

There's conceptually 3 kinds of response for a gethostbyxxx request:

1. Success: in that case we proxy the resulting hostent.
2. Lookup Error: in that case, we proxy the herrno together with a
                 "dummy" hostent error.
3. Internal Error: we bail out, don't send back anything and log a
                   error message.

Hopefully, we'll never use branch 3.

Reflecting this to the return type of these functions.
@picnoir
Copy link
Contributor

picnoir commented Nov 4, 2023

done.

At first, I disliked the verbose error handling on the FFI side. But it makes the error path more explicit on the handler side. I like it, thanks for the suggestion :)

@blinsay
Copy link
Collaborator

blinsay commented Nov 4, 2023

Awesome, looks good. I'm good to merge if @leifwalsh is.

@leifwalsh leifwalsh merged commit 9a63314 into twosigma:main Nov 5, 2023
8 checks passed
@flokli flokli deleted the host-lookup-upstream branch November 5, 2023 14:16
@flokli
Copy link
Contributor Author

flokli commented Nov 5, 2023

🎉 thanks!

@leifwalsh
Copy link
Collaborator

thank you both!

@leifwalsh leifwalsh mentioned this pull request Nov 5, 2023
flokli added a commit to nix-community/nsncd that referenced this pull request Jan 25, 2024
twosigma#71 landed upstream, and nixpkgs doesn't point to this fork anymore. Update the readme to reflect this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Imcomplete gethostbyname response
5 participants