Replace redis with multiprocessing.Queue for SIL interface #195

Conversation
Hi, thanks a lot for the PR and sorry for getting back late! The multiprocessing queue implementation looks great!
I'll merge this directly, feel free to issue new PRs for the other topics! Thanks :)
Do you think this problem would persist if the Microgrid were smaller in size? The queue should be performant enough to handle the number of requests. We should also think about using pipes, which are a fair bit faster than queues.
We could simply empty the queue each time we read the most recent values, right?
I would like to provide an API that exposes a time series of e.g. battery state, local energy generation, etc. It does not need a very large time window for retaining these values, but continuously polling the API for new values of e.g. the battery does not seem sensible to me.
Sounds reasonable, I will make a proposal!
Well, using a Thread it's probably just 3-6 lines of code, and threads/processes are used in the code already anyway. But I will think about ways that seem easier.
To be fair, I have not tested this by running a load test to see whether it actually is a bottleneck; it is just a theoretical concern I had. I'm guessing that for the use cases one would use vessim for, it is performant enough, even with the current size of the pickled Microgrid.
I had not considered this, but in theory it would work, since we only have 2 communicating processes. Under modern Python versions, a pipe is still faster, but apparently not by a lot.
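(For reference, a rough micro-benchmark sketch one could run to compare the two; the message count and payload size are arbitrary assumptions, not vessim's actual workload:)

```python
# Round-trip comparison of multiprocessing.Pipe vs. multiprocessing.Queue.
# Results vary by platform and Python version; illustrative only.
import time
from multiprocessing import Pipe, Process, Queue

N = 10_000
PAYLOAD = b"x" * 1024  # ~1 KiB message (arbitrary)

def pipe_echo(conn):
    for _ in range(N):
        conn.send_bytes(conn.recv_bytes())

def queue_echo(inq, outq):
    for _ in range(N):
        outq.put(inq.get())

if __name__ == "__main__":
    parent, child = Pipe()
    p = Process(target=pipe_echo, args=(child,))
    p.start()
    t0 = time.perf_counter()
    for _ in range(N):
        parent.send_bytes(PAYLOAD)
        parent.recv_bytes()
    print(f"pipe:  {time.perf_counter() - t0:.2f}s")
    p.join()

    inq, outq = Queue(), Queue()
    p = Process(target=queue_echo, args=(inq, outq))
    p.start()
    t0 = time.perf_counter()
    for _ in range(N):
        inq.put(PAYLOAD)
        outq.get()
    print(f"queue: {time.perf_counter() - t0:.2f}s")
    p.join()
```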
Right, but I think I misunderstand the question? The queue is already fully emptied each time any of the broker's getter methods is called.
You're right, how about we empty the queue every time before we write to it?
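A minimal sketch of that drain-before-write pattern, assuming a single producer and a single consumer (names here are illustrative, not the PR's actual code):

```python
import queue  # only for the queue.Empty exception
from multiprocessing import Queue

def put_latest(q, item):
    """Discard stale entries, then enqueue the newest item, so the
    reader always sees (roughly) only the most recent state."""
    while True:
        try:
            q.get_nowait()  # drain a stale value
        except queue.Empty:
            break
    q.put(item)
```

With a reader running concurrently this is only best-effort: the reader may grab an item between the drain and the put, but at worst it then gets the previous value, which is acceptable here.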
As discussed with @birnbaum, I tried replacing the redis-container based implementation in `sil.py` with another option that does not require external installations (e.g. of docker). I have two prototypes:

- this one, which is based upon `multiprocessing.Queue`
- a prototype which is based upon `sqlite3`, which would have allowed more complex queries upon past values, too. Unfortunately it has some problems.

Either way, let's probably stick with this approach :) I've run the tests workflow in my forked repo; it is already a functional replacement for the previous implementation.
Here are some open discussion points (these are all addressed by #199):

I would like to add the ability for the `Broker` to access previous values of e.g. `p_delta`, otherwise the SIL interface would not be very useful in my case. One approach would be to use an sqlite database for this (just locally for the API process, not for communicating the data across the processes). Alternatively, one could just use a regular Python list to store the data and the builtin `bisect` module for fast lookup of dates (see the sketch below). Those are the two approaches I came up with and have code for that I can reuse to quickly implement this. Open to any other suggestions on how to do this, or to discuss whether this is needed in the main repository at all. (Please see the next discussion point first, as this one could become redundant.)
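A minimal sketch of the list-plus-`bisect` option (class and method names are illustrative, not the PR's actual code; assumes timestamps arrive in increasing order):

```python
import bisect
from datetime import datetime

class TimeSeriesStore:
    """Keep (timestamp, value) pairs in insertion order and look up
    the most recent value at or before a given time via binary search."""

    def __init__(self):
        self._times: list[datetime] = []
        self._values: list[float] = []

    def append(self, t: datetime, value: float) -> None:
        # Assumes monotonically increasing timestamps.
        self._times.append(t)
        self._values.append(value)

    def value_at(self, t: datetime) -> float:
        # Index of the rightmost entry with timestamp <= t.
        i = bisect.bisect_right(self._times, t) - 1
        if i < 0:
            raise KeyError(f"no value at or before {t}")
        return self._values[i]
```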
As you can see in the smaller commit, I would like to make `Microgrid`'s representation for pickling implicit using `__getstate__`. I wasted quite some hours wondering why this queue-based implementation did not work, noticing way too late that it was because I called `pickle.dumps(microgrid)` instead of `microgrid.pickle()`. This even led me to give up on this attempt and try out sqlite for this purpose instead. When I ran into the same issue with sqlite, I noticed that the `multiprocessing.Queue` approach actually worked perfectly, except for me accidentally trying to pickle the queues themselves, because they were included in the pickled representation of the microgrid. I do not really see the point in having a separate public method `pickle()` instead of using the `__getstate__` mechanism. Is there a reasoning behind that?
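For context, a hedged sketch of what the implicit approach could look like (attribute names are assumptions, not vessim's actual ones): `__getstate__` lets a plain `pickle.dumps(microgrid)` work while excluding unpicklable members such as queues.

```python
class Microgrid:
    def __init__(self, step_size, storage, queue=None):
        self.step_size = step_size
        self.storage = storage
        self._queue = queue  # e.g. a multiprocessing.Queue; not picklable

    def __getstate__(self):
        # Called implicitly by pickle.dumps(); drop members that must
        # not travel across the process boundary.
        state = self.__dict__.copy()
        del state["_queue"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._queue = None  # re-attach on the receiving side if needed
```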
A typical example of a `Microgrid` consumes a lot of memory for its pickled representation. This becomes a problem if one wants to provide access to past values. It looks like this is because a Generator's `HistoricalSignal` is also included in the pickled representation. It seems like `actors` should also be removed from the pickled representation. But actually, the question becomes whether we need to pass the whole microgrid to the `Broker` anyway. The API cannot modify it directly, and needs to use the `set_event` mechanism for modification regardless. The only meaningful data that can be queried from the `Microgrid` object directly is the trivial `step_size`, and the `Storage` + `StoragePolicy` objects. Both of these already have `state` methods akin to `Actors`. I would actually be in favour of removing the pickling ability of `Microgrid` and extracting this relevant information manually to be added to the queue. Memory footprint example here:
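(The original example is not reproduced above; a minimal sketch of how such a measurement might look, with a hypothetical construction helper and assumed `state()` accessor:)

```python
import pickle

# Compare the full pickled size against the size of only the data the
# Broker actually needs (step_size plus the Storage state).
microgrid = make_example_microgrid()  # hypothetical helper

full = len(pickle.dumps(microgrid))
slim = len(pickle.dumps({
    "step_size": microgrid.step_size,
    "storage": microgrid.storage.state(),  # assumed accessor
}))
print(f"full microgrid: {full} bytes, extracted state: {slim} bytes")
```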
`_process_incoming_data` needs to run periodically from within the `fastapi` process. Right now it is only called on demand, when the broker is asked for data through e.g. `get_p_delta`. I am unsure how to make the `fastapi` process call this regularly. I guess one could use a `Thread` spawned by that process, but perhaps there are some better ideas? Maybe `uvicorn`/`fastapi` even provide mechanisms for this. I found this, but it is in an extra library. Not calling this periodically can lead to the following problems:

- Values accumulated in the `_incoming_data_queue` have to be processed by the broker at once, slowing down the API response.
- The queue can run full (a `multiprocessing.Queue` has a max capacity that is limited through the operating system).
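A minimal sketch of the `Thread` variant mentioned above (the polling interval and the `broker` wiring are assumptions; FastAPI's startup hook is used to spawn a daemon worker):

```python
import threading
import time

from fastapi import FastAPI

app = FastAPI()
POLL_INTERVAL_S = 1.0  # assumed; tune to the simulation's step size

broker = ...  # the Broker instance owned by the fastapi process

def _poll_forever():
    # Daemon thread: exits together with the fastapi process.
    while True:
        broker._process_incoming_data()
        time.sleep(POLL_INTERVAL_S)

@app.on_event("startup")
def start_background_polling():
    threading.Thread(target=_poll_forever, daemon=True).start()
```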