
fix(stdlib): properly detect tokio runtime in dns_lookup #882

Merged (8 commits) on Jun 17, 2024

Conversation

@esensar (Contributor) commented Jun 7, 2024

No description provided.

@esensar (Contributor Author) commented Jun 7, 2024

When testing dns_lookup in Vector in a real environment, we got this error:

thread 'vector-worker' panicked at
/cargo/registry/src/index.crates.io-6f17d22bba15001f/domain-0.10.0/src/resolv/stub/mod.rs:287:17:
Cannot start a runtime from within a runtime. This happens because a
function (like `block_on`) attempted to block the current thread while
the thread is being used to drive asynchronous tasks.
note: run with `RUST_BACKTRACE=1` environment variable to display a
backtrace

This happens because VRL is called on a tokio-managed thread. The solution here was to move the call to another thread, because internally the domain crate implements its blocking calls by relying on the async implementation.

Let me know if you have a better solution for this, since spawning a thread for each lookup might be too much.
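
A minimal sketch of the detection-plus-thread approach described above (blocking_lookup is a hypothetical stand-in for the call into the domain crate's stub resolver):

use std::thread;

use tokio::runtime::Handle;

// Hypothetical stand-in for the blocking resolver call into the `domain` crate.
fn blocking_lookup(host: String) -> Result<String, String> {
    Ok(format!("resolved {host}"))
}

fn lookup(host: String) -> Result<String, String> {
    match Handle::try_current() {
        // Already on a tokio-managed thread: run the blocking call on a
        // separate OS thread so the resolver's internal runtime is not
        // nested inside the current one.
        Ok(_) => thread::spawn(move || blocking_lookup(host))
            .join()
            .expect("lookup thread panicked"),
        // No runtime on this thread: it is safe to block directly.
        Err(_) => blocking_lookup(host),
    }
}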

@pront (Member) commented Jun 7, 2024

Hi @esensar, thanks for suggesting a fix.

spawning a thread for each lookup might be too much.

This might be problematic; could we have a pool or a dedicated thread for this?

stub.query((host, qtype, qclass)).await
})
let answer = match Handle::try_current() {
Ok(_) => thread::spawn(move || {
@jszwedko (Member) commented Jun 7, 2024

I think if it returns Ok you can use something like:

futures::executor::block_on(async {
    handle
        .spawn(async {
            ...
        })
})

to just execute in the current Tokio runtime instead of spawning a separate thread per https://stackoverflow.com/a/62536772

@esensar (Contributor Author):

I still have to await on the return value of block_on, which brings us back to the same issue. I think we will have to go with a pool or a dedicated thread until proper async support is added to VRL (if it ever gets added, since it may clash with the original goals).

Member:

Ah gotcha. Then yeah, a pool seems like a good approach. If/when VRL functions become async we can revisit this. I was going to suggest we thread the runtime through from Vector into VRL, but I think I'd prefer to just make the functions async rather than do that.

@jszwedko (Member) left a comment:

I'm realizing this might be the first example of an "async VRL function". Per vectordotdev/vector#20495 we've tried to avoid that until now. I think we can just hack it as you currently have, with sync lookups, but if we gather more examples like this it might motivate adding async VRL function variants (internally, not exposing the async nature to users).

@jszwedko (Member):

@esensar just a heads up that we'll be releasing Vector next week, in case you want to try to fix this before then.

@esensar (Contributor Author) commented Jun 12, 2024

I have implemented just a single dedicated thread for this. I know that is probably not the best idea, but this close to the release I didn't want to risk breaking things by adding a thread pool (which would probably mean pulling in a crate, since it would be better to rely on something that already exists).

Comment on lines 24 to 79
static WORKER: Lazy<Worker> = Lazy::new(|| Worker::new());

type Job<T> = Box<dyn FnOnce() -> T + Send + 'static>;
struct JobHandle<T> {
    job: Job<T>,
    result: Arc<mpsc::Sender<T>>,
}

struct Worker {
    thread: Option<thread::JoinHandle<()>>,
    queue: Option<mpsc::Sender<JobHandle<Result<Answer, Error>>>>,
}

impl Worker {
    fn new() -> Self {
        let (sender, receiver) = mpsc::channel::<JobHandle<Result<Answer, Error>>>();
        let receiver = Arc::new(Mutex::new(receiver));
        Self {
            thread: Some(thread::spawn(move || loop {
                match receiver.lock().unwrap().recv() {
                    Ok(handle) => {
                        let result = (handle.job)();
                        handle.result.as_ref().send(result).unwrap();
                    }
                    Err(_) => todo!(),
                }
            })),
            queue: Some(sender),
        }
    }

    fn execute<F>(&self, f: F) -> Result<Answer, Error>
    where
        F: FnOnce() -> Result<Answer, Error> + Send + 'static,
    {
        let job = Box::new(f);
        let (sender, receiver) = mpsc::channel();
        let receiver = Arc::new(Mutex::new(receiver));
        let handle = JobHandle {
            job,
            result: Arc::new(sender),
        };

        self.queue.as_ref().unwrap().send(handle).unwrap();
        return receiver.lock().unwrap().recv().unwrap();
    }
}

impl Drop for Worker {
    fn drop(&mut self) {
        drop(self.queue.take());
        if let Some(thread) = self.thread.take() {
            thread.join().unwrap();
        }
    }
}
@pront (Member) commented Jun 12, 2024

Hi @esensar, thank you for this patch. I think it's an acceptable workaround for an undocumented function. It would be good to add some comments here and also a module-level doc (//!) to explain what this function does and the limitations of a single-threaded solution (see the sketch after this list), especially when it comes to:

  • Blocking
  • Worker Thread Saturation
  • Unbounded Job Queue
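
A rough sketch of the kind of module-level doc comment being asked for, with placeholder wording (the exact text is up to the author):

//! Dedicated worker thread for blocking DNS lookups used by `dns_lookup`.
//!
//! Queries are sent to a single dedicated OS thread so that the blocking
//! resolver is never run directly on a tokio-managed thread. Limitations:
//!
//! * Blocking: each call blocks the caller until its result comes back.
//! * Worker thread saturation: all lookups are serialized on one thread,
//!   so a slow query delays every other caller.
//! * Job queue: jobs wait in a channel while the worker is busy; an
//!   unbounded channel would let that queue grow without limit.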

Member:

Another reasonable improvement here is to use a bounded channel.
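
For reference, a minimal illustration of the std API difference between the two channel flavors (illustrative u32 payloads, not the PR's job type):

use std::sync::mpsc;

fn main() {
    // Unbounded: `send` never blocks and the queue can grow without limit.
    let (tx, rx) = mpsc::channel::<u32>();
    tx.send(1).expect("receiver dropped");
    assert_eq!(rx.recv().unwrap(), 1);

    // Bounded: `send` blocks once CAPACITY messages are queued, which
    // applies backpressure to callers instead of growing the queue.
    const CAPACITY: usize = 16;
    let (tx, rx) = mpsc::sync_channel::<u32>(CAPACITY);
    tx.send(1).expect("receiver dropped");
    assert_eq!(rx.recv().unwrap(), 1);
}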

@esensar (Contributor Author):

Alright, that makes sense. I will try to add these as soon as possible, but I will not be available for the next 2-3 days. When is the release planned? I would love to get this ready for that release.

Member:

The release is planned for the 17th, which doesn't leave us too much time 😓 We do minor releases every 6 weeks, though, so it wouldn't have to wait too long if it didn't make it.

@jszwedko (Member) left a comment:

Thanks @esensar. I think we are OK including a single-threaded implementation to unblock use of this function for the release while we think about a better model. I left some inline comments below, but we'd also like to see a warning added as a rustdoc, just to make people aware of the potential bottleneck. Do you think you could add that?

match receiver.lock().unwrap().recv() {
    Ok(handle) => {
        let result = (handle.job)();
        handle.result.as_ref().send(result).unwrap();
@jszwedko (Member) commented Jun 12, 2024

Could we change all of these unwraps to expects?
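
For illustration, the send in the snippet above with expect instead of unwrap might read like this (the message wording is just a suggestion):

handle
    .result
    .as_ref()
    .send(result)
    .expect("result receiver was dropped before the job finished");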

        let result = (handle.job)();
        handle.result.as_ref().send(result).unwrap();
    }
    Err(_) => todo!(),
Member:

What should we do here? Panic?

    queue: Option<mpsc::Sender<JobHandle<Result<Answer, Error>>>>,
}

impl Worker {
Member:

Did you borrow this implementation from somewhere? It's slightly more complicated than I might have expected.

@esensar (Contributor Author):

It is roughly based on https://doc.rust-lang.org/book/ch20-02-multithreaded.html
I tried to simplify it as much as I could, but I thought I had to have some kind of channel to take in the function and to send back the result.

Member:

Ah, gotcha, yeah I see. It seems like that example just creates one channel that is reused, whereas this implementation creates a channel per call to execute. Is there a reason we need to create one channel per execute call?

@esensar (Contributor Author):

Oh, right. My bad, one should be enough.

Member:

Right, a single mpsc::sync_channel(CHANNEL_CAPACITY) should be enough.
Both the sender and the receiver should be lazily instantiated instances.

@esensar (Contributor Author):

It is all contained in the Worker instance now, so both of them are lazily instantiated (since Worker is lazily instantiated). There are two bounded channels though, one for jobs and one for results. I hope that is alright.

I have made the capacity 0 for testing (meaning it always blocks), but on the other hand, I'm not sure it makes sense to make it any bigger, considering there is only one thread handling jobs. I might be missing something, though.
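
As a rough sketch of the shape being described here, with a placeholder String result type instead of the real DNS answer and once_cell's Lazy as already used in the PR (illustrative only, not the exact merged code):

use std::sync::{mpsc, Mutex};
use std::thread;

use once_cell::sync::Lazy;

// Capacity 0 makes both channels rendezvous channels: every send blocks
// until the matching recv happens (see the capacity discussion below).
const CHANNEL_CAPACITY: usize = 0;

// Placeholder job type; the real worker moves `FnOnce() -> Result<Answer, Error>` jobs.
type Job = Box<dyn FnOnce() -> String + Send + 'static>;

static WORKER: Lazy<Worker> = Lazy::new(Worker::new);

struct Worker {
    // Job sender and result receiver, guarded together so that concurrent
    // callers cannot receive each other's results.
    channels: Mutex<(mpsc::SyncSender<Job>, mpsc::Receiver<String>)>,
}

impl Worker {
    fn new() -> Self {
        let (job_tx, job_rx) = mpsc::sync_channel::<Job>(CHANNEL_CAPACITY);
        let (result_tx, result_rx) = mpsc::sync_channel::<String>(CHANNEL_CAPACITY);
        // Single dedicated thread: runs jobs one at a time until the job
        // sender is dropped.
        thread::spawn(move || {
            while let Ok(job) = job_rx.recv() {
                let _ = result_tx.send(job());
            }
        });
        Self {
            channels: Mutex::new((job_tx, result_rx)),
        }
    }

    fn execute<F>(&self, f: F) -> String
    where
        F: FnOnce() -> String + Send + 'static,
    {
        // Blocks the calling thread until the dedicated thread returns a result.
        let channels = self.channels.lock().expect("worker mutex poisoned");
        channels.0.send(Box::new(f)).expect("worker thread exited");
        channels.1.recv().expect("worker thread exited")
    }
}

A call like WORKER.execute(|| "example".to_string()) then serializes all lookups through the one thread, which is the bottleneck the requested rustdoc warning is meant to call out.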

Comment on lines 59 to 65
let job = Box::new(f);
let (sender, receiver) = mpsc::channel();
let receiver = Arc::new(Mutex::new(receiver));
let handle = JobHandle {
    job,
    result: Arc::new(sender),
};
Member:

Similar to my question above, did you borrow this from somewhere? It seems like a lot of "work" to do for each execute call, enough that I'm wondering what the performance looks like. Maybe we could add a benchmark for it in benches/stdlib.rs to see?

@esensar (Contributor Author):

There are already some benchmarks there for the dns_lookup function. Did you mean we should add some benchmarks for the Worker itself?

@jszwedko (Member) left a comment:

Thanks for the updates @esensar! I think this looks good to me now.

It seems possible that having a higher channel capacity could improve performance by letting the worker thread process more messages before it switches out, but we'd need to do some benchmarking to prove that out.

@jszwedko requested a review from pront June 17, 2024 13:55
@jszwedko (Member):

I'd like to get @pront's review again too.

@pront added the no-changelog label (Changes in this PR do not need user-facing explanations in the release changelog) Jun 17, 2024
// Currently blocks on each request until result is received
// It should be avoided unless absolutely needed
static WORKER: Lazy<Worker> = Lazy::new(Worker::new);
const CHANNEL_CAPACITY: usize = 0;
Member:

Let's set a sufficiently large number here?

@esensar (Contributor Author):

I had trouble figuring out the right number for this, and then I thought it would make sense to keep it 0, considering that it is just one thread, but that is probably wrong.

Do you have a suggestion for a number?

Member:

What you have works, but we could probably go a bit further by allowing some buffering (without worrying about memory explosion). From the docs:

This channel has an internal buffer on which messages will be queued. bound specifies the buffer size. When the internal buffer becomes full, future sends will block waiting for the buffer to open up.
Note that a buffer size of 0 is valid, in which case this becomes a "rendezvous channel" where each send will not return until a recv is paired with it.
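
A small, self-contained illustration of the rendezvous behaviour described in that quote (not code from the PR):

use std::sync::mpsc;
use std::thread;

fn main() {
    // Bound of 0: each `send` blocks until a `recv` is paired with it.
    let (tx, rx) = mpsc::sync_channel::<u32>(0);
    let receiver = thread::spawn(move || rx.recv().expect("sender dropped"));
    tx.send(42).expect("receiver dropped"); // blocks until the thread above receives
    assert_eq!(receiver.join().unwrap(), 42);
}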

Member:

Tying it to the number of concurrent executions of the VRL runtime could make sense. For example, the remap transform runs one transform per available thread: https://github.com/vectordotdev/vector/blob/b3276b4cc73dee6d3854469562f1b1fcf15a419c/src/topology/builder.rs#L68-L73. To do that "right", though, it should probably be configurable on the VRL runtime.

@esensar (Contributor Author):

I guess any number higher than that would be alright, so for now we can hardcode something arbitrary, like 100, since it won't really add more jobs than there are threads, due to the way the function blocks the current thread while waiting for the result.

@esensar (Contributor Author):

Tying it to the number of concurrent executions of the VRL runtime could make sense. For example, the remap transform runs one transform per available thread: https://github.com/vectordotdev/vector/blob/b3276b4cc73dee6d3854469562f1b1fcf15a419c/src/topology/builder.rs#L68-L73. To do that "right", though, it should probably be configurable on the VRL runtime.

Sorry for just going with the hard-coded number, but I thought we could go with the configurable approach when we update this to use multiple threads (or, hopefully, sometime in the future make it properly async).

@pront (Member) commented Jun 17, 2024

I'd like to get @pront's review again too.

This is significantly improved, thanks @esensar. I had a question about the channel capacity which I believe is important to resolve before we merge this.

@pront (Member) left a comment:

Thank you @esensar!

@jszwedko added this pull request to the merge queue Jun 17, 2024
Merged via the queue into vectordotdev:main with commit f5719be Jun 17, 2024
13 checks passed
@esensar deleted the fix/dns-lookup-tokio-runtime branch June 17, 2024 18:26