Being Able to Get Updates on Non-Primary Columns #7
Replies: 10 comments 6 replies
-
@cheerfulstoic Thanks for writing that up. I guess given all these trade-offs it's better not to make any changes until we have a public API that you feel good about. There's no rush to dive into an implementation yet! I think it helps to think about the ideal API: if we could snap a finger and have it, what would it be? I will mull over this for a bit. Ideally I don't want to have to configure anything. I want to be able to write something like:

I think my naming earlier was a bit confusing, but mapping it more to something that feels like a "live query" feels more intuitive. What is my filter (where) criteria? What am I selecting? What I don't care about is:
If those can be done almost transparently, even better. Once we figure out what the best public API is, we can work backwards from there.
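To make the discussion concrete, here is one hypothetical shape such a "live query" style call could take. None of these names exist in EctoWatch; `live_query`, the option names, and the `post` variable are all speculative, just illustrating "declare the filter and the selection, configure nothing up front":

```elixir
# Hypothetical sketch only - not a real EctoWatch API.
EctoWatch.live_query(MyApp.Blog.Comment,
  where: [post_id: post.id],       # the filter (where) criteria
  select: [:id, :body, :post_id]   # what ends up in the message
)
```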
-
Great write-up, some of my thoughts:
would cause the respective events to fire
-
Yeah, agreed about not needing to configure anything. Though I might add that it's great to be able to not configure things until you start to need more complexity. I like the idea of
I get the idea to make it more like a "live query", but I picture people being confused about what's going on and sending in feature requests for things which aren't possible 😅 I definitely don't think that people should have to understand what
Agreed on all of these 👍
-
I somewhat like this idea because it fits with the way that
I don't think that this is even possible. Because PubSub messages are being sent, the publisher needs to be able to know both what fields to send in the message, but also what fields you can subscribe to. If I want to subscribe to
Yeah, I definitely don't want to do that 😅 But also changing the API (the topics and/or the messages that are being sent) would break existing code, so I want to get a path forward so that I don't need to break people's apps a second time (at least in cases where I'm nearly 100% sure that those future features will be requested/needed)
Yeah, totally 👍 I thought about this previously but punted in the perfect-is-the-enemy of good spirit 😉 I was thinking it might be
I guess that might be covered by the
-
The glory of events-as-config is you don't have to decide as the library designer :) Consider:

```elixir
watchers: [
  {:inserted, MyApp.Blog.Comment, %{id: :id, post_id: :post_id}},
  {:inserted, MyApp.Blog.Comment, {:id, :post_id}}
]
```

I personally would rather not have a map that I'm matching on. It just leads to a lot of verbosity that I don't really need - it forces my handlers to span multiple lines, and in my handler, I'm not really listening for "456". The code I'm writing (and I imagine what someone would write anywhere outside of the repl) would probably end up using variable names for most of these things. The lines associated with it would be:

```elixir
comment = Repo.(...)
...
EctoWatch.subscribe({:inserted, MyApp.Post.Comment, {comment.id, comment.post_id}})
...
def handle_info({:inserted, MyApp.Post.Comment, {comment_id, post_id}})
```

Compare those with forced maps:

```elixir
comment = Repo.(...)
...
EctoWatch.subscribe({:inserted, MyApp.Post.Comment, %{id: comment.id, post_id: comment.post_id}})
...
def handle_info({:inserted, MyApp.Post.Comment, %{id: comment_id, post_id: post_id}})
```
-
WRT the second two points:
I think that's being a little overly defensive? In order for there to be a problem, the user would have to mistakenly assume that there is a postgres verb called `changed`, not read the documentation, and hallucinate whatever functionality that implies (not necessarily tied to
did not see that! awesome :)
-
I want to take a moment to say that I'm enjoying the conversation. I'm going to be pushing back on things here because I want to tease out ideas, and I hope that you're down with that 😄

**Message format**

Yeah, I guess that's fair. I still feel something deep in my brain that's telling me that it's a bad idea, probably because I got bitten multiple times 😅 Two things that occur to me at the moment:

The idea of being able to specify your own messages does appeal to a part of me, though. But I'm imagining implementing it, and I think it would require recursively parsing through all of the different sorts of structures that you could define (lists, tuples, maps, keyword lists, etc...), looking for any atoms and verifying that they are fields (association fields specifically?) in the schema. There should probably be errors raised defining some sort of "path" to indicate to the dev where the bad atom was. And then each time a message was sent this structure would need to be built (your examples are a simple tuple and a map, but I imagine people could go crazy).

But fundamentally we're never going to be able to have anything more than a single record's worth of data being sent in a message, so we shouldn't overcomplicate things because we can just have a simple representation of the object. A map feels very much like the expected and idiomatic Elixir choice. Also, I haven't thought through all of the possible uses of:

```elixir
def handle_info({:inserted, MyApp.Post.Comment, %{id: comment_id}})
def handle_info({:inserted, MyApp.Post.Comment, %{post_id: post_id}})
```

**Tracked fields**

That was a lot, but honestly the bigger thing that feels like it needs to be figured out is how to define which fields can be subscribed to. But maybe that's solved if we just automagically create topics for all association field columns for the fields which are sent? On that topic I want to look at one of your examples:

```elixir
EctoWatch.subscribe({:inserted, MyApp.Post.Comment, %{id: comment.id, post_id: comment.post_id}})
```

Firstly, I don't think somebody would ever subscribe to `id` and `post_id` together. But even more so, I really want to figure out if anybody would ever want to subscribe to two fields at once like this.
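The recursive validation described above could be sketched roughly as follows. This is not EctoWatch code; the module name and error format are made up, but `__schema__(:fields)` is Ecto's real reflection function for listing a schema's fields:

```elixir
# Sketch: walk a user-defined message shape (tuples, lists, maps, keyword
# lists) and raise if an atom isn't a schema field, reporting a "path" so
# the dev can find the bad atom.
defmodule MessageShapeValidator do
  def validate!(shape, schema), do: walk(shape, schema, [])

  defp walk(atom, schema, path) when is_atom(atom) do
    unless atom in schema.__schema__(:fields) do
      raise ArgumentError,
            "#{inspect(atom)} at #{inspect(Enum.reverse(path))} is not a field of #{inspect(schema)}"
    end
  end

  defp walk(tuple, schema, path) when is_tuple(tuple) do
    tuple
    |> Tuple.to_list()
    |> Enum.with_index()
    |> Enum.each(fn {el, i} -> walk(el, schema, [i | path]) end)
  end

  defp walk(list, schema, path) when is_list(list) do
    list
    |> Enum.with_index()
    |> Enum.each(fn {el, i} -> walk(el, schema, [i | path]) end)
  end

  defp walk(%{} = map, schema, path) do
    Enum.each(map, fn {k, v} ->
      walk(k, schema, [k | path])
      walk(v, schema, [k | path])
    end)
  end

  defp walk(_other, _schema, _path), do: :ok
end
```

And, as the comment notes, this same structure would then have to be rebuilt on every message send, which is where the cost really shows up.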
-
Sounds great! I have no problems making my opinions known, and recognize that they're just opinions 😂

It's true that having less information can lead to more confusion. That being said, I've worked in Clojure for years, where most of the function declarations and usage rely on order of arguments. Verbosity is not better or worse, it's just more verbose. To be hyperbolic, I'm very glad we don't use

I think it would be reasonable to implement it with maps. I would probably fork it and add support for tuples. If you wanted to accept it, great; if not, I have a whole desktop full of forks I use in production 😂

PS: I don't think you need to have recursive support, I think a conditional for

Yeah that was a terrible example 😂 I can't think of any use cases on my own project where I would need two filters. Definitely agree that starting with one is a good decision. I'm sure I could think of contrived cases where multiple would be necessary, but I think it's a good thing to punt. Architecturally, it would be pretty niche to require that specific of pubsub channels over just setting the channel to the 'quieter' value and filtering manually on the other. Although the atom/map/list/tuple debate still applies to the fourth parameter :D
-
You know, I might have a solution that could address this (without needing a fork 😉) but also fix something else I was thinking about...

One of the things that I figured would be needed at some point for scaling was being able to collect messages and send them in batches up to a certain timeout. Both to reduce the number of messages but also to make sure that the update comes within a certain amount of time (if the product requirements fit that criteria). So, for example, with a 500ms timeout, if you got only one message from the database for a watcher within that timeout then you'd get a message with a list of just one record. If you got three messages from the db, then you'd still just get one message with a list of three records. I didn't want to over-optimize, but it's been on my mind.

One way that you could solve this, of course, is to create a separate GenServer which subscribes to messages from `EctoWatch`. But I could imagine allowing a function for a `handle_record` option:

```elixir
{[:inserted, :updated, :deleted], MyApp.Blog.Comment, extra_columns: [:post_id], handle_record: fn record, state ->
  if ...condition... do
    {:noreply, [record | state]}
  else
    {:reply, [record | state], []}
  end
end},
```

That example isn't complete because you'd want to send the current state when a timeout was hit. Expanded out into a module, that might look like:

```elixir
defmodule MyApp.Blog.CommentHandler do
  use EctoWatch.Handler

  def init(...) do
    state = []
    config = {[:inserted, :updated, :deleted], MyApp.Blog.Comment, extra_columns: [:post_id]}
    {:ok, {config, state}}
  end

  def handle_record(record, state) do
    # Implementation including timeout logic including `send_after(self(), :handle_timeout, 500)`
  end

  def handle_info(:handle_timeout, state) do
    # ...
  end
end
```

That feels kind of nice and similar to how you can often in Elixir (like with Supervisors/Tasks) either do a default configuration-based thing, or expand it out into a module if it gets complex.

All that said... I feel like the "right"/idiomatic BEAM-like solution would still be to make a separate GenServer that handles whatever messages.

Pros for GenServer:

Cons for GenServer:

Pros for allowing custom EctoWatch behavior:

Cons for allowing custom EctoWatch behavior:

So, that was a lot (again)... Starting tomorrow I'm going to have a bit of time this week before going offline for a bit, so I want to try implementing something that will hopefully address most of the concerns and maybe still be changeable later if custom behavior is implemented (if having a separate translation GenServer isn't reasonable for some reason). I'm going to work on:

I also just realized that I forgot to respond over here. I'd also like to think on that (I reached out to my colleague to see if he could remember why we chose four-element tuples in the first place 😅). Not a final decision, I just want to move forward with something that I think will work 🤔
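For reference, the "separate GenServer" approach mentioned above could be sketched like this. The `EctoWatch.subscribe` call and the `{:inserted, schema, record}` message shape are assumed from the examples in this thread, not taken from the library's documented API:

```elixir
# Sketch: buffer incoming EctoWatch messages and forward them as one
# batched message after a 500ms timeout, so N db events within the window
# become a single message containing a list of N records.
defmodule MyApp.CommentBatcher do
  use GenServer

  @flush_after 500

  def start_link(subscriber), do: GenServer.start_link(__MODULE__, subscriber)

  @impl true
  def init(subscriber) do
    # Assumed subscribe shape, following the examples in this thread
    EctoWatch.subscribe({:inserted, MyApp.Blog.Comment})
    {:ok, %{subscriber: subscriber, buffer: [], timer: nil}}
  end

  @impl true
  def handle_info({:inserted, MyApp.Blog.Comment, record}, state) do
    # Start the flush timer on the first record of a batch
    timer = state.timer || Process.send_after(self(), :flush, @flush_after)
    {:noreply, %{state | buffer: [record | state.buffer], timer: timer}}
  end

  def handle_info(:flush, state) do
    send(state.subscriber, {:inserted_batch, Enum.reverse(state.buffer)})
    {:noreply, %{state | buffer: [], timer: nil}}
  end
end
```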
-
I'm always for configuration, and I like the idea of being able to configure an interim GenServer. That being said, if that's the solution to supporting multiple input types, I would still either just use maps or fork it. If I just want to subscribe to all updates on a certain field, this complexity is not warranted (for me).

```elixir
def handle_info({:inserted, MyApp.Blog.Post, id, opts} = w, socket) do
  EctoWatch.Timing.throttle(w, 500, fn [latest | throttled] ->
    ...
  end)
end
```

EDIT: yeah, did not take me long to think of several reasons this would not work 😂

EDIT: PS enjoy your time offline!!
-
We would want a user to be able to subscribe to values on columns other than the primary key. That might look something like:
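A sketch of the kind of call being discussed, with tuple shapes assumed from the examples elsewhere in this thread (the `post` variable and the `{:post_id, value}` form are illustrative, not a documented API):

```elixir
# Hypothetical: subscribe to Comment updates by post_id rather than by primary key
EctoWatch.subscribe({:updated, MyApp.Blog.Comment, {:post_id, post.id}})

def handle_info({:updated, MyApp.Blog.Comment, %{id: id, post_id: post_id}}, socket) do
  # ...
end
```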
In that example there would be a Phoenix PubSub topic set up for every column in `extra_columns` to be able to support this. Looking at the implementation of PubSub (specifically `:pg`, as I don't think this would apply for the redis adaptor), if you `broadcast` to a topic without any subscribers there is a minimal impact on performance, because no PIDs exist in the `:pg` group and so there is nothing to loop through.

However, it might be good to require that people are more deliberate about which columns they want to allow people to subscribe to. So perhaps there should be something like a `subscription_columns` option.

I can imagine that in some cases developers might not actually care about getting the `profile_id` in the message as long as they can be sure that they'll only be getting messages for the `profile_id`(s) which they subscribed to. Of course, if they want the `profile_id` then it could be put into `extra_columns`. But in order to broadcast to a topic that contains the `subscription_columns` columns, the data would need to be sent from `pg_notify` anyway, so perhaps `subscription_columns` and `extra_columns` should be linked.

Also, it's worth addressing: should it be possible to subscribe to multiple columns at once?

I imagine that would require thinking about what topics are created. If a watcher was set up with two subscribable columns, do we create a topic for `profile_id`, a topic for `location_id`, and then a topic for the combination of the two? With more than one column the number of topics would start to grow, and most of them would probably be unused.

So perhaps there should be a watcher for everything you might want to subscribe to, and then each watcher would only send out messages on one PubSub topic dedicated to that set of columns. Probably the set of columns should be alphabetized to make sure that there's a consistent match.

Also, if a column is something other than a foreign key (for example a large text field), then the topic would become quite long. Topics seem to just map down to Elixir `Registry` keys, so I don't know that there's a limit, but it will probably have some sort of cost. So perhaps either the `subscription_columns` should be limited to foreign keys, or perhaps the value should be hashed in some way when making the topic (which probably introduces a very, very, very small chance of conflicts?).

A situation that is not a foreign key where I could imagine you might want to be able to subscribe to specific values would be identifier fields like `email` or `social_security_number`, but I wonder if it isn't best anyway to require the developer to subscribe to the primary key instead of the `email` (if the value of the email changes, for just one example, the subscription would be "dangling"). All of this makes me lean toward limiting `subscription_columns` to just association columns.

Another question: what about the global / primary key subscriptions? If a schema sets up both a "default" watcher and a `subscription_columns` watcher, then perhaps if the `subscription_columns` option is specified the global / primary key topics shouldn't be broadcast to.

It could be said that maybe developers should subscribe to the global / primary keys explicitly, basically requiring the `subscription_columns` option. This also might seem to make sense because then sending of global vs column-specific messages could be handled by different watcher processes in `EctoWatch`, which could potentially scale better. But a few arguments against this:

- … (the `@primary_key` module attribute, separately from `field` definitions)
- … `pg_notify` notifications (or have two triggers which just do the same thing)
- … what `subscription_columns` means. Being able to just specify `{MyApp.Accounts.User, :updated}` is a very, very nice user experience.

I dunno 😅 That's a lot and it's hurting my head a bit to think of it all, which is part of why I wanted to write this....

After writing all of that, I'm wondering if it doesn't just make sense to have the columns in `extra_columns` which are also associations be broadcast automagically, without a `subscription_columns` option (because, like I mentioned, if we're allowing subscriptions on a column we need to get that value from the table through `pg_notify` anyway).

I'd love to have thoughts to help me get out of my mental rut!
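The `extra_columns`-driven setup described in that last paragraph could be sketched like this. The watcher shape and the subscribe variants are assumptions for illustration (`user`, `profile`, and the third tuple element are hypothetical):

```elixir
# Watcher config (shape assumed): association columns listed in
# extra_columns get their own subscribable topics automagically
{MyApp.Accounts.User, :updated, extra_columns: [:profile_id]}

# Subscriptions that would then be possible (all hypothetical):
EctoWatch.subscribe({MyApp.Accounts.User, :updated})                             # global
EctoWatch.subscribe({MyApp.Accounts.User, :updated, user.id})                    # by primary key
EctoWatch.subscribe({MyApp.Accounts.User, :updated, {:profile_id, profile.id}})  # by association column
```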