getaddrinfo throws UVError "unknown node or service" #14972

Open
samoconnor opened this issue Feb 7, 2016 · 20 comments
Labels
missing data Base.missing and related functionality

Comments

@samoconnor
Contributor

The manual says only: "getaddrinfo(host) Gets the IP address of the host (may have to do a DNS lookup)".

The current implementation of getaddrinfo throws UVError if the host can't be found (including in cases where the cause is a temporary network outage).

@StefanKarpinski wrote:

Unlike Python, catching exceptions in Julia is not considered a valid way to do control flow. Julia's philosophy here is closer to Go's than to Python's – if an exception gets thrown it should only ever be because the caller screwed up and the program may reasonably panic. You can use try/catch to handle such a situation and recover, but any Julia API that requires you to do this is a broken API.

In this case there is no precondition violation, either I'm looking up an address that doesn't exist (yet/anymore) or I'm looking up a valid address but can't contact the DNS at the moment.

julia> getaddrinfo("notreallyarealaddress.com")
ERROR: getaddrinfo callback: unknown node or service (EAI_NONAME)

# With network turned off:
julia> getaddrinfo("google.com")
ERROR: getaddrinfo callback: unknown node or service (EAI_NONAME)

What pattern is preferred by the core Julia devs for an API that can fail even when the precondition is met? Is there an API that can be held up as a "good example" of this pattern?

Should getaddrinfo return Bool and fill in an IPv4 object passed as an arg?

Should the return be nothing if the lookup fails?

Should getaddrinfo return some kind of IPv4Request object that has fields ok::Bool, error_info::String and result::Nullable{IPv4}?

Background: I'm trying to figure out what is the best way to deal with exceptions in the AWSCore.jl API. (where "best" == most consistent with the philosophy and style of Julia). So I'm looking at Base for examples of how error conditions should be handled.

@nalimilan
Member

Returning a Nullable sounds like the best solution. That pattern has already been retained for tryparse (#9487). But do we really need getaddrinfo and trygetaddrinfo? This is going to get unwieldy really fast...
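
A minimal sketch of the tryparse-style pattern mentioned here, assuming a current Julia (1.x), where Nullable has been replaced by Union{T, Nothing} and getaddrinfo lives in the Sockets standard library. The name trygetaddrinfo is a hypothetical helper used for illustration only; it is not an existing API.

```julia
using Sockets

# Hypothetical helper (not part of Base or Sockets): returns the address,
# or `nothing` when the lookup fails with a DNS error.
function trygetaddrinfo(host::AbstractString)
    try
        return getaddrinfo(host)
    catch err
        # Only swallow DNS failures; anything else is unexpected and propagates.
        err isa Sockets.DNSError || rethrow()
        return nothing
    end
end

# Same shape as tryparse, which returns `nothing` on failure in Julia 1.x.
n = tryparse(Int, "not a number")
println(n === nothing ? "parse failed" : "parsed $n")

addr = trygetaddrinfo("localhost")
println(addr === nothing ? "lookup failed" : "resolved to $addr")
```

Note that the `nothing` return carries no reason for the failure, which is the limitation raised later in the thread.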

@samoconnor
Contributor Author

"do we really need getaddrinfo and trygetaddrinfo?"

No. There is no difference between getaddrinfo and trygetaddrinfo.

parse is deterministic and reliable but getaddrinfo is not. It is always only an attempt.

That is why this issue cites getaddrinfo. It is a good example of an API that breaks the rule: "any Julia API that requires you to [use try/catch] is a broken API".

@samoconnor
Contributor Author

@vtjnash, @malmaud, @amitmurthy, @hayd, @wavexx, @tknopp ?
Do you have any thoughts on the "right way" to do this? (somewhat related to #7026 discussion)

@malmaud
Contributor

malmaud commented Feb 7, 2016

I agree about Nullable - that's generally what Requests.jl does for this situation.

@samoconnor
Contributor Author

How should the reason for the failure be handled? e.g. EAI_AGAIN vs EAI_NONAME.

@vtjnash
Member

vtjnash commented Feb 8, 2016

Nullable is wrong. We would need to add an ErrorOr type, but I'm slightly inclined to stick with using throw. In particular, an AssertionError signals that the caller did something completely flawed. An ArgumentError would signal that the caller messed up a precondition. However, a simple script that expects the network to be available may reasonably throw a fatal exception when the DNS query fails (UVError).

@samoconnor
Contributor Author

@vtjnash, I'm not opposed to sticking with throw.

I agree that there are APIs like this where, for some (or most?) users, this is a fatal error and they don't want to get bogged down in error handling.
Of course, if the user is trying to write a robust high level API that accesses the network, they need to properly handle the error and retry (or fail-over to another node, or whatever is appropriate).

The error should not be a UVError (that is an implementation detail, see #7841). It should at least be something like DNSError, or preferably something finer-grained like Union{TemporaryNetworkError,UnknownNodeError} or Union{EAI_NONAME_Error, EAI_AGAIN_Error}.
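
A minimal sketch of what such finer-grained error types could look like. The type names come from the suggestion above and do not exist in Base; DNSLookupError, isretryable and fake_lookup are likewise hypothetical names introduced only for this illustration.

```julia
# Hypothetical type names, taken from the suggestion above; none of these
# exist in Base.
abstract type DNSLookupError <: Exception end

struct TemporaryNetworkError <: DNSLookupError   # would correspond to EAI_AGAIN
    host::String
end

struct UnknownNodeError <: DNSLookupError        # would correspond to EAI_NONAME
    host::String
end

# Callers can decide per type whether a retry makes sense.
isretryable(::TemporaryNetworkError) = true
isretryable(::UnknownNodeError) = false

# Stand-in for a real lookup so the example runs without a network.
fake_lookup(host) = throw(UnknownNodeError(host))

try
    fake_lookup("notreallyarealaddress.com")
catch err
    # Handle only the documented error family; rethrow anything else.
    err isa DNSLookupError || rethrow()
    println(typeof(err), " for ", err.host, ", retryable: ", isretryable(err))
end
```

With a shared abstract supertype, a simple script can ignore the distinction, while a robust caller can dispatch on the concrete type.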

If error types are added and documented for EAI_AGAIN and EAI_NONAME, this would still have to be reconciled with "any Julia API that requires you to [use try/catch] is a broken API".

Perhaps that rule could be refined to take into account the idea of a "simple script that expects the network to be available". The same issue is present in the file access API. e.g. this will work every time for a lot of users: [open(readbytes, f) for f in readdir(".")], but for other users dealing with the No such file or directory error is critical (in the case where the file is deleted between readdir and open).

Maybe the rule could be: "any Julia API that requires you to [use try/catch] when dealing with simple local resources is a broken API. APIs that deal with remote or shared resources may use exceptions to model retryable situations that occur infrequently".
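
A minimal sketch of the retryable case, assuming a current Julia (1.x) where Base provides retry. The function flaky_lookup is a hypothetical stand-in that simulates a temporary outage, and the returned address is a placeholder from the documentation range, not a real lookup result.

```julia
# Simulated flaky lookup (hypothetical): fails twice, then succeeds.
attempts = Ref(0)
function flaky_lookup(host)
    attempts[] += 1
    attempts[] < 3 && error("temporary failure resolving $host")
    return "203.0.113.7"   # documentation-range address, not a real result
end

# Base.retry wraps a function and calls it again after each delay, as long as
# `check` returns true for the exception that was thrown.
robust_lookup = retry(flaky_lookup;
                      delays = fill(0.01, 5),
                      check = (state, err) -> err isa ErrorException)

println(robust_lookup("example.com"), " after ", attempts[], " attempts")
```

A caller that treats the failure as fatal simply calls the unwrapped function and lets the exception propagate.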

@nalimilan
Member

Indeed, returning an exception allows giving an error code, which Nullable doesn't.

But regarding rules about exceptions, the local vs. remote distinction isn't the real underlying reason, though it's a good approximation. I would say what could justify that a function raises an exception in the absence of programming error is the unpredictability of the failure. Indeed, the particularity of I/O and network calls is that something might fail even if you checked that it worked right before doing the call: the network might have been disconnected, or a file removed in the meantime.

@StefanKarpinski Would you be fine with adapting the rules about exceptions and mentioning them in the manual? The quote from you that @samoconnor gives above doesn't appear to reflect the current state of the Julia APIs, as can be seen from the fact that e.g. connect("nonexistent",80) raises a "no address" exception, while in Go an error code is returned as the second member of a tuple.

@tknopp
Contributor

tknopp commented Feb 8, 2016

I don't think there is agreement that APIs should not throw exceptions. Further, I don't understand why this should be linked to whether it's network code or local code.

@hayd
Member

hayd commented Feb 8, 2016

We'd have to get Stefan's intention when he wrote that, but to me "control flow" means in Python you might do something like this:

def collatz(n):
    while True:
        yield n
        if n == 1:
            # Granted a break will work just fine here.
            # (Since Python 3.7, PEP 479 turns this into a RuntimeError.)
            raise StopIteration()
        elif n % 2:
            n = 3 * n + 1
        else:
            n //= 2

which can be a valid approach (though this particular example isn't really).

In Julia we wouldn't use exceptions to do that kind of thing.

...that said, exceptions are part of the language. They have performance issues, but when you're doing network IO that's not really an issue... so what is?


"I'm looking up a valid address but can't contact the DNS at the moment."

I would argue this IS an exceptional case, not part of "control flow"... and it is something the caller is going to have to handle (there's not just one way to handle it which the package can just implement).

If you conceal it in a Nullable you lose the context of the error, the stacktrace etc.
The possible errors/causes should be clearly documented.


Another approach (that the Scala codebase at my current work uses) is to return Eithers or (Future) Success/Failure throughout... but I think this is a tough sell in a non-functional language, and you also lose the context as discussed above (in the cases where there were "unexpected" Failures i.e. debugging is a PITA).


TLDR: Exceptions are just really useful and well-understood.

@StefanKarpinski
Member

@hayd: I'd say that's a pretty reasonable interpretation of what I meant. Also keep in mind that we're figuring out how we do things here. At this point it's pretty clear that "the Julian way" does not include using exceptions for basic control flow, but DNS errors are a much tougher call. Of course, exceptions have predictability problems that give me pause (see #7026), but if we had a solution to those, this would be fine, and even without it, it's largely unproblematic.

@samoconnor
Contributor Author

@StefanKarpinski I am very interested to know your opinion of the Midori examples, see: #7026 (comment). (Midori makes an explicit distinction between dealing with bugs (abandonment) vs recoverable exceptions and has a mandatory call site keyword similar to the "chain of custody" idea in #7026.)

While stuff is in the process of being figured out, could some general rules be agreed on? e.g.

  • An API must not require the use of try/catch for basic control flow.
  • Any API that requires the use of try/catch for unusual cases must throw an explicit and unambiguous Exception type and must document its meaning, possible causes, and recoverability.
  • An API must not allow low level "implementation layer" exception types to "leak out". Exceptions should be translated into something that makes sense at the level of abstraction that the rest of the API presents to the caller.
  • Others? ...

@StefanKarpinski
Member

@samoconnor: I'm reading through this. So far very interesting – thanks for bringing it to my attention.

@samoconnor
Contributor Author

See related: #15514

@samoconnor
Contributor Author

samoconnor@0a716ba adds a DNSError type that shows the hostname in addition to the underlying UVError information. e.g.

julia> getaddrinfo("google.com")
ERROR: DNSError: google.com, getaddrinfo callback: unknown node or service (EAI_NONAME)

@vtjnash would you support a PR along these lines?
... or since it's a fairly tiny change, if you prefer, take the patch, tweak it to your liking and commit it directly?

@vtjnash
Member

vtjnash commented Apr 15, 2016

i would go with it. i'm still undecided on whether to change it to use direct return of a nullable error type. in the meantime, providing clearer error messages i think is still a worthwhile investment.

@samoconnor
Contributor Author

samoconnor commented Apr 15, 2016

@vtjnash see #15879

@samoconnor
Contributor Author

@vtjnash #15879 is now passing Travis and AV.

It now also emits SystemError and OutOfMemoryError in uv_error which should benefit a whole bunch of places that call uv_error.

Is there anything you'd like tweaked before merging? (or feel free to merge and tweak to your taste yourself if you think that'll save time).

@nalimilan
Member

FWIW, it looks like a good solution here would be to return something similar to Rust's Result type: a wrapper similar to Nullable which either contains a value, or an error code (which could be an unthrown exception in Julia). Of course this wouldn't make sense unless we find a lot more places where such a type would improve the control flow.
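
A minimal sketch of a Result-like wrapper holding either a value or an unthrown exception. Ok, Err, lookup and describe are hypothetical names introduced for this illustration; nothing like them is assumed to exist in Base, and the lookup is a stand-in that never touches the network.

```julia
# Hypothetical Result-like wrappers; not part of Base.
struct Ok{T}
    value::T
end

struct Err{E<:Exception}
    error::E
end

# Stand-in lookup that returns the failure instead of throwing it.
function lookup(host::AbstractString)
    host == "localhost" && return Ok("127.0.0.1")
    return Err(ErrorException("unknown node or service: $host"))
end

# The caller branches on the wrapper type; the error detail stays available.
describe(r::Ok) = "resolved to $(r.value)"
describe(r::Err) = "failed: $(r.error.msg)"

println(describe(lookup("localhost")))
println(describe(lookup("notreallyarealaddress.com")))
```

Unlike a bare nothing, the Err branch keeps the error value, although not the backtrace of a thrown exception.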

@nalimilan added the "missing data" label (Base.missing and related functionality) on Sep 6, 2016
@Nosferican
Contributor

I believe I am hitting this when trying to use HTTP.jl to make calls in a containerized setting.

8 participants