Roadmap? #128
Comments
This is a good question, and I don't have a good answer at the moment. I do have a list of things to do, though all the things at the top of my mind at the moment are fairly delicate re-wirings of things. If you don't mind, let me round up some folks on Monday and have a chat about what things we might be able to break out as starter projects. The docs and doc_tests are totally good options, though I think you've gotten enough of a feel for the system at this point that perhaps their purpose (onboarding people to timely) isn't as well served for you. There is a todo list at the bottom of readme.md, but it is a bit stale at this point (two of the items got pushed up to user level code in differential, and two of the items are a bit open-ended and speculative). I think refreshing that makes sense, and we can try and break out some easier or more independent parts!
Ok, we just pushed a 0.5 crates version out, which I think means it is time to start to break things again. Now we will actually try to sort out what some solid todo items are. We have a bit of a list already (of things we need done, and will probably just do), but ideally the thinking stirs up some other ideas. Are there broad areas that motivate you? There are a few levels here, from the low level communication gunk (tracking down copies and removing them; tracking down alloc/deallocs that shouldn't exist) to higher level "ergonomic issues in operator definitions". I'm up for mentoring some of these, but it would be best if it lined up with your interests. If there are no specific areas right now, no worries, and I'll try and whip up a list, some elements of which might be more tempting.
Networking and low level programming are areas I want to learn more about, so low level communication gunk sounds awesome!
I've pushed a few concrete options as part of the There is another unlisted outstanding issue that the |
Thanks so much for listing out these options! I'm going to poke around in the codebase more and see if I can find an easy issue to get my feet wet. At first glance, using the It seems like many areas of work involve sorting out various performance issues -- do you think adding benchmarking is worthwhile? It feels weird to work on performance without first establishing a baseline. We could even track it over time.
Benchmarking is totally a good thing (and something that they are trying to get going at ETH Zürich), but it has some complications. The main issue here is that timely can pretty easily saturate weak network connections (1Gb, and we've saturated 10Gb), and this becomes more true the more worker cores you have in place. This means that it can be a bit tricky to tease out communication performance issues without the right set-up. The "right set-up" could mean either i. a fancy network rig, or perhaps ii. a computer with enough cores that one could do loopback TCP (no network involved). It seems reasonable that one could do this, but it is more than just We could also put more effort into mocking for the components, so that perhaps we could extract the internals of the communication infrastructure and benchmark it without actually having data coming over the wire. That seems sane, but I don't have anything on the tip of my brain about how easy that would be. Happy to think about it though (either out loud or privately). Regarding the A second "issue", perhaps not to worry about yet, is that there are currently some mutterings about re-thinking the communication architecture to be a bit more data-driven. Right now the communication threads peel off bytes from the incoming network connections and drop them into queues for operators, which the operators are then expected to find when they next run. There is an interest (mine, maybe others too) in promoting information about the communication to the root of the worker, so that
I suspect that whatever happens here, the candidate Maybe as step 1 we could try and get an example where we can see timely behaving badly due to copies. There is an Edit: also, the workload where we initially saw overheads due to copies (and got improvements by removing them) was the pagerank project. It moves a bunch of
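For the loopback TCP option mentioned above, a rough starting point might look like the sketch below. It is not timely code: `loopback_transfer` is a hypothetical helper that uses only the standard library to push bytes through a 127.0.0.1 socket and report how long that took, which could serve as a baseline for what the machine itself can sustain with no network involved.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: sends `total` bytes over a loopback TCP connection in
/// writes of at most `chunk` bytes. Returns the number of bytes the receiver
/// saw and the elapsed wall-clock time.
pub fn loopback_transfer(total: usize, chunk: usize) -> std::io::Result<(usize, Duration)> {
    assert!(chunk > 0, "chunk size must be positive");

    // Port 0 asks the OS for any free port on the loopback interface.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // Receiver thread: accept one connection and drain it until EOF.
    let receiver = thread::spawn(move || -> std::io::Result<usize> {
        let (mut stream, _) = listener.accept()?;
        let mut buffer = vec![0u8; 1 << 16];
        let mut received = 0;
        loop {
            let read = stream.read(&mut buffer)?;
            if read == 0 {
                break;
            }
            received += read;
        }
        Ok(received)
    });

    let payload = vec![0u8; chunk];
    let start = Instant::now();
    let mut stream = TcpStream::connect(addr)?;
    let mut sent = 0;
    while sent < total {
        let len = chunk.min(total - sent);
        stream.write_all(&payload[..len])?;
        sent += len;
    }
    // Closing the sending socket signals EOF to the receiver.
    drop(stream);

    let received = receiver.join().expect("receiver thread panicked")?;
    Ok((received, start.elapsed()))
}
```

Calling `loopback_transfer(1 << 30, 1 << 16)` and dividing the byte count by the elapsed time gives a throughput figure; varying `chunk` hints at how sensitive the path is to write size. It says nothing about serialization or copies inside timely itself.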
I was pondering the proposed Right now Ideally, the networking threads would maintain references to these large-ish allocations and recover them when the number of outstanding This would also mean that the channels we set up would move fn from_bytes(&mut Vec<u8>) -> Self; probably wants to take an owned I'm happy to help out with the
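To make the recovery idea above a bit more concrete, here is a minimal sketch, assuming a plain `Arc<Vec<u8>>` as the backing store. `SharedBuffer`, `Slice`, and `try_recover` are hypothetical names for illustration; they are not timely's API, nor the bytes crate's. The networking thread holds the `SharedBuffer`, hands out read-only views without copying, and gets the allocation back once every view has been dropped.

```rust
use std::ops::Deref;
use std::sync::Arc;

/// Hypothetical: a large allocation owned by a networking thread.
pub struct SharedBuffer {
    bytes: Arc<Vec<u8>>,
}

/// Hypothetical: a read-only view into part of a `SharedBuffer`.
pub struct Slice {
    bytes: Arc<Vec<u8>>,
    start: usize,
    end: usize,
}

impl SharedBuffer {
    pub fn new(bytes: Vec<u8>) -> Self {
        SharedBuffer { bytes: Arc::new(bytes) }
    }

    /// Hands out a view of `start..end` without copying the bytes.
    pub fn slice(&self, start: usize, end: usize) -> Slice {
        assert!(start <= end && end <= self.bytes.len());
        Slice { bytes: Arc::clone(&self.bytes), start, end }
    }

    /// Returns the allocation if no views remain; otherwise hands the
    /// buffer back so the caller can try again later.
    pub fn try_recover(self) -> Result<Vec<u8>, SharedBuffer> {
        Arc::try_unwrap(self.bytes).map_err(|bytes| SharedBuffer { bytes })
    }
}

impl Deref for Slice {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        &self.bytes[self.start..self.end]
    }
}
```

Because a `Slice` owns a reference count rather than borrowing, it can be sent to another thread, which is roughly why a `from_bytes` that takes owned bytes fits this shape better than one taking `&mut Vec<u8>`. Views here are read-only; in-place mutation would need a different design.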
Reading on the Also (potential footgun) small slices are stored in-line, so stashing a length zero |
I threw up a lightweight replacement for the bytes crate as #133. It is blocked on me having no idea if the code is in fact safe, nor any clue how to test it. But, it does what I think is the minimal set of things we would need to do, and allows resource recovery (as well as some generality in the backing store of memory, in case that ends up being useful, e.g. communicating between processes with shared memory).
Wow, between #133 and #135 it seems like a lot of the stepping stones are in place. I'm happy to take this conversation over to issue #111, but it sounds like the next step is to, as you suggested, use exchange.rs to see how much additional allocation occurs due to inter-process communication. I could use #135 for this. I thought I recognized collectl plots in some of your blog posts -- I could use that to track memory usage over time. Hmm... perhaps the ideal metric is fraction of runtime spent allocating memory. What do you think? I'll poke around with collectl and maybe eBPF and see if I can make a helpful graph or two.
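As a cheap in-process complement to collectl or eBPF, one option is to wrap the system allocator and count calls. The sketch below is an assumption about how that could be done, not something from timely: `CountingAlloc`, `ALLOCATIONS`, `ALLOCATED_BYTES`, and `count_allocations` are hypothetical names. It counts allocations and requested bytes, not time spent allocating, so it answers "how much additional allocation occurs" rather than "what fraction of runtime".

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper around the system allocator that counts calls and bytes.
pub struct CountingAlloc;

/// Process-wide counters; every thread's allocations land here.
pub static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);
pub static ALLOCATED_BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Atomic increments do not allocate, so this cannot recurse.
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        ALLOCATED_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `work` and returns (allocation calls, bytes requested) observed while
/// it ran. Allocations made concurrently by other threads are included.
pub fn count_allocations<F: FnOnce()>(work: F) -> (usize, usize) {
    let calls_before = ALLOCATIONS.load(Ordering::Relaxed);
    let bytes_before = ALLOCATED_BYTES.load(Ordering::Relaxed);
    work();
    (
        ALLOCATIONS.load(Ordering::Relaxed) - calls_before,
        ALLOCATED_BYTES.load(Ordering::Relaxed) - bytes_before,
    )
}
```

Wrapping a single-process and a multi-process run of the same computation in `count_allocations` and comparing the two pairs would give a first rough number for the allocation overhead of communication.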
Hi,
I was wondering if there was a list of improvements/features/fixes that you were considering implementing. I'm interested in contributing but I don't have a good sense of all the possible areas of work.
I did see 48 and 49, and also it seems like there's logging infrastructure work underway.