This repository has been archived by the owner on Nov 28, 2020. It is now read-only.

thread 'main' panicked at 'invalid option' #125

Closed
awaited-hare opened this issue Dec 30, 2019 · 12 comments · Fixed by #126
Labels
bug Something isn't working

Comments

@awaited-hare

awaited-hare commented Dec 30, 2019

Describe the bug

The following program intermittently panics with thread 'main' panicked at 'invalid option' when run with cargo run --release. When it does not panic, it hangs forever, which is also unexpected. It works fine with a plain cargo run.

use std::io::Write;
use libzmq::TcpAddr;
use libzmq::prelude::{TryInto, BuildSocket, BuildRecv, RecvMsg};
use std::time::Duration;

fn main() {
    let input_addr: TcpAddr = "127.0.0.1:9999".try_into().expect("cannot parse zmq addr");
    let mut socket_builder = libzmq::GatherBuilder::new();
    let socket = socket_builder.connect(&[input_addr])
        .recv_hwm(10)
        .recv_timeout(Duration::from_millis(1000))
        .build()
        .expect("cannot build zmq socket");
    loop {
        socket.recv_msg().unwrap();
    }
}

Platform:

  • OS: Arch Linux (5.4.6-arch3-1)
active toolchain
----------------

nightly-x86_64-unknown-linux-gnu (default)
rustc 1.42.0-nightly (0de96d37f 2019-12-19)
awaited-hare added the bug label on Dec 30, 2019
@jean-airoldie

This comment has been minimized.

@jean-airoldie
Owner

jean-airoldie commented Dec 30, 2019

I hid my previous comment because my assumption was wrong. Found UB here:

let value_ptr = match maybe {
    Some(value) => &value as *const T as *const c_void,
    None => &none_value as *const T as *const c_void,
};
setsockopt(mut_sock_ptr, option, value_ptr, size)

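The cause is a raw pointer taken to a temporary: in the Some arm, value only lives inside the match arm, so the pointer dangles by the time setsockopt is called. I'll fix it ASAP and check whether this mistake appears elsewhere.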
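For readers unfamiliar with this class of bug, here is a minimal sketch of one way to avoid the dangling pointer: borrow from the Option instead of moving the value into the match arm, so the pointer refers to storage that outlives the call. This is not the actual patch from #126. fake_setsockopt and set_opt are hypothetical stand-ins, and i32 replaces the generic T to keep the sketch short.

```rust
use std::ffi::c_void;
use std::mem::size_of;

/// Hypothetical stand-in for the C `setsockopt`: it reads an `i32` back
/// through the pointer so the result can be checked.
///
/// # Safety
/// `value_ptr` must point to a live, aligned `i32` for the whole call.
unsafe fn fake_setsockopt(value_ptr: *const c_void, size: usize) -> i32 {
    assert_eq!(size, size_of::<i32>());
    unsafe { *(value_ptr as *const i32) }
}

/// Hypothetical wrapper mirroring the shape of the snippet above.
fn set_opt(maybe: Option<i32>, none_value: i32) -> i32 {
    // Matching on `&maybe` binds `value` as a `&i32` pointing into `maybe`,
    // which lives until the end of this function. Nothing is moved into
    // the match arm, so no arm-local value is dropped before the call.
    let value_ref: &i32 = match &maybe {
        Some(value) => value,
        None => &none_value,
    };
    let value_ptr = value_ref as *const i32 as *const c_void;
    // SAFETY: `value_ptr` points into `maybe` or `none_value`, both of
    // which are still alive here.
    unsafe { fake_setsockopt(value_ptr, size_of::<i32>()) }
}
```

With this shape, set_opt(Some(10), -1) reads back 10 and set_opt(None, -1) reads back -1, regardless of optimization level.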

@jean-airoldie
Owner

I'll publish a new release.

@awaited-hare
Author

> I'll publish a new release.

Sounds great. Thanks for your help!

@jean-airoldie
Owner

A general word of advice: I would recommend against using ZeroMQ unless you are already committed to it. The codebase is unmaintainable and there are tons of API, documentation and implementation issues. There are tons of interesting ideas in it, however, if that's what you're looking for. I can't recommend an alternative, as I'm using my own closed-source stack for the moment. Good luck!

@awaited-hare
Author

> A general word of advice: I would recommend against using ZeroMQ unless you are already committed to it. The codebase is unmaintainable and there are tons of API, documentation and implementation issues. There are tons of interesting ideas in it, however, if that's what you're looking for. I can't recommend an alternative, as I'm using my own closed-source stack for the moment. Good luck!

@jean-airoldie BTW, are you familiar with NNG (https://nng.nanomsg.org/index.html)? Is it any better than ZeroMQ with respect to the issues you mentioned? Thanks!

@jean-airoldie
Owner

To give some context, I do distributed messaging at a high frequency in Rust. Moreover, I care about reliable messaging because my messages are not idempotent (i.e. I can't simply resend the same message if I think it was lost). I used ZeroMQ in production for a while and came to several conclusions (which also apply to nanomsg and NNG, since they share a similar design).

Problem

TCP is not a reliable transport, and neither are HTTP or HTTPS. There are two reasons:

  • TCP uses a 16-bit checksum, which is too weak for modern high-throughput applications. In my case I think I would get about 1 corrupted packet that the checksum fails to detect for every 400gb of packets (I can't remember the specifics, so take it with a grain of salt). In practice, since you are using some sort of encryption over TCP, this shows up as a decryption error, which is fatal. This can be a big deal in some cases.
  • TCP is only really reliable as long as the connection is alive. The problem is that the connection can die at any time, and you will only know when you time out the peer. Messages sent after the connection died, but before you timed out the peer, are lost. To fix that you have to either:
    • Wait for a reply after each request (i.e. ACK at the application level), which is really damn slow.
    • ACK every message in a batched, asynchronous manner. This would give the best performance, but it turns out it's really damn nasty and complex to pull off. Moreover, you need to keep a local copy of every message, which increases memory usage.
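As a rough illustration of the batched ACK option and why it costs memory, here is a minimal sketch of a sender-side buffer with cumulative acknowledgements. The Unacked type and its methods are hypothetical names made up for this example, not part of ZeroMQ, libzmq-rs or any other library, and the sketch leaves out timers, the wire format and the receiver.

```rust
use std::collections::VecDeque;

/// Hypothetical sender-side buffer holding every message the peer has not
/// acknowledged yet. This is the "local copy" that costs memory.
struct Unacked {
    next_seq: u64,
    pending: VecDeque<(u64, Vec<u8>)>,
}

impl Unacked {
    fn new() -> Self {
        Unacked { next_seq: 0, pending: VecDeque::new() }
    }

    /// Store a message before it goes on the wire; returns its sequence number.
    fn push(&mut self, payload: Vec<u8>) -> u64 {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.pending.push_back((seq, payload));
        seq
    }

    /// Cumulative ACK: the peer confirms everything up to and including `seq`,
    /// so one ACK can release a whole batch of stored messages.
    fn ack_up_to(&mut self, seq: u64) {
        while let Some((s, _)) = self.pending.front() {
            if *s > seq {
                break;
            }
            self.pending.pop_front();
        }
    }

    /// Messages that would have to be resent after a reconnect, oldest first.
    fn to_resend(&self) -> impl Iterator<Item = &(u64, Vec<u8>)> {
        self.pending.iter()
    }
}
```

Because the messages are not idempotent, a real implementation would presumably also need the receiver to drop sequence numbers it has already processed, which is part of what makes this approach complex.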

Solution

I would say that the QUIC protocol solves all of these issues; however, it's still fairly young and unoptimized. For instance, I don't even get 1Gb/sec using quinn where I would get 10Gb/sec using TCP. I tried basically every other viable transport that I'm aware of and they all either suck or are really damn slow.

Problem

ZeroMQ, nanomsg and NNG sockets are stateless by design. This means that changes to the socket state are not propagated to the user. It turns out this can be a major problem:

  • You are not notified when the socket connects or reconnects to the peer. This means that if there is an initial application-level handshake that you need to do every time a connection is created, you are out of luck.
  • You are not notified when the peer disconnects. This means that you can't easily implement a dead man's switch.
  • You are not notified when the ZeroMQ authentication mechanism fails, for instance when you misconfigured your credentials. Good luck debugging that.
  • And many more!

Solution

Propagate the socket state updates as notifications that are forwarded to the user when they call recv(). That's an integral part of the networking libraries I write, and it works well. I don't really understand why ZeroMQ, nanomsg and NNG didn't do this in the first place, other than maybe ideology.
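A minimal sketch of what such in-band notifications could look like, assuming a single recv() that yields either a state change or a message. Event, NotifyingSocket and count_handshakes are hypothetical names invented for this example. They do not describe the author's closed-source libraries or any ZeroMQ API, and the I/O layer that would produce the events is not shown.

```rust
use std::collections::VecDeque;

/// Hypothetical event type: state changes travel in the same stream as messages.
#[derive(Debug, PartialEq)]
enum Event {
    Connected,
    Disconnected,
    AuthFailed(String),
    Message(Vec<u8>),
}

/// Hypothetical socket that queues state changes alongside incoming messages.
struct NotifyingSocket {
    queue: VecDeque<Event>,
}

impl NotifyingSocket {
    fn new() -> Self {
        NotifyingSocket { queue: VecDeque::new() }
    }

    /// Would be called by the I/O layer (not shown) when something happens.
    fn push_event(&mut self, event: Event) {
        self.queue.push_back(event);
    }

    /// Returns the next event in arrival order, or `None` if nothing is queued.
    fn recv(&mut self) -> Option<Event> {
        self.queue.pop_front()
    }
}

/// Example consumer: drains the queue and counts how many times the
/// application-level handshake would have to be redone.
fn count_handshakes(sock: &mut NotifyingSocket) -> usize {
    let mut handshakes = 0;
    while let Some(event) = sock.recv() {
        match event {
            Event::Connected => handshakes += 1,
            Event::Disconnected => { /* a dead man's switch would trip here */ }
            Event::AuthFailed(_reason) => { /* surface the bad credentials */ }
            Event::Message(_bytes) => { /* normal message processing */ }
        }
    }
    handshakes
}
```

Since every state change passes through the same recv() call as the data, the caller handles connects, disconnects and authentication failures in one match instead of polling a separate monitoring channel.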

Conclusion

I didn't find anything out there that fits my specific needs, so I would need to write my own lib on top of QUIC. However, I don't strictly need it at the moment, and I don't have the time and resources to spare for now.

@awaited-hare
Author

Thanks for your very detailed explanation!

@Wonko7

Wonko7 commented Feb 20, 2020

> I'll publish a new release.

Hi all!
Did this happen? I'm also running into this bug with 0.2.1 which is expected as it's older than this bug report. Are there beta releases or something that you're using in the meantime? I know about putting { git=... } in Cargo.toml but that won't let me publish.

I'm new to Rust and its ecosystem, sorry if I'm missing something.

@jean-airoldie
Owner

> I'm also running into this bug with 0.2.1 which is expected as it's older than this bug report.

My bad, seems like I didn't actually publish a release. I'll do that right now.

@jean-airoldie
Owner

> know about putting { git=... } in Cargo.toml but that won't let me publish.

If you put libzmq = { git = "https://github.com/jean-airoldie/libzmq-rs" } in your Cargo.toml, this will use the latest master branch commit as a dependency. Then you can run cargo update, which will fetch the dependencies, etc.

@Wonko7

Wonko7 commented Feb 20, 2020

Thanks!
