🔍 Starting contacts into flow via ElasticSearch #144

Merged
merged 19 commits into from
Aug 8, 2019

nicpottier
Member

@nicpottier nicpottier commented Jul 2, 2019

Allows starting a set of contacts in a flow by query: our start job just needs a query parameter.

Will require Ansible updates so that the mailroom environment knows about Elastic.

@codecov

codecov bot commented Jul 3, 2019

Codecov Report

Merging #144 into master will increase coverage by 1.11%.
The diff coverage is 64.7%.

@@            Coverage Diff             @@
##           master     #144      +/-   ##
==========================================
+ Coverage   42.98%   44.09%   +1.11%     
==========================================
  Files          68       70       +2     
  Lines        6449     6715     +266     
==========================================
+ Hits         2772     2961     +189     
- Misses       3181     3248      +67     
- Partials      496      506      +10
Impacted Files               Coverage Δ
models/fields.go             75% <ø> (ø)                ⬆️
models/starts.go             1.65% <0%> (-0.26%)        ⬇️
runner/runner.go             47.81% <0%> (-0.13%)       ⬇️
models/assets.go             51.24% <0%> (-0.52%)       ⬇️
search/mock.go               0% <0%> (ø)
hooks/session_triggered.go   81.96% <100%> (ø)          ⬆️
starts/worker.go             40.81% <54.54%> (+1.45%)   ⬆️
models/contacts.go           49.62% <59.37%> (+0.62%)   ⬆️
search/search.go             93.82% <93.82%> (ø)
... and 1 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d66a527...1838d31.

@nicpottier
Member Author

@dodobas would appreciate a close review of this!

@nicpottier nicpottier changed the title [wip] parsing into elastic queries from contactql parsing into elastic queries from contactql Jul 12, 2019
name += "_keyword"

if c.Comparator() == "=" {
query = q.NewTermQuery(name, value)
Contributor

I think we are missing fieldQuery for the = and != operators; the JSON test cases should also be updated.

Member Author

I think I am including the field, but more importantly I wasn't building a nested query. Updating.

@nicpottier
Member Author

@dodobas updated, can you take another pass?

@nicpottier nicpottier requested a review from dodobas July 16, 2019 13:49
@nicpottier nicpottier changed the title parsing into elastic queries from contactql Starting contacts into flow via ElasticSearch Jul 16, 2019
@nicpottier nicpottier changed the title Starting contacts into flow via ElasticSearch [WIP] Starting contacts into flow via ElasticSearch Jul 16, 2019
@nicpottier nicpottier changed the title [WIP] Starting contacts into flow via ElasticSearch 🚧 Starting contacts into flow via ElasticSearch Jul 25, 2019
@nicpottier nicpottier changed the title 🚧 Starting contacts into flow via ElasticSearch 🔍 Starting contacts into flow via ElasticSearch Jul 26, 2019
@nicpottier nicpottier requested review from rowanseymour and removed request for dodobas July 26, 2019 18:51
ivr/ivr_test.go Outdated
@@ -32,10 +32,10 @@ func TestIVR(t *testing.T) {
db.MustExec(`UPDATE channels_channel SET channel_type = 'ZZ', config = '{"max_concurrent_events": 1}' WHERE id = $1`, models.TwilioChannelID)

// create a flow start for cathy
-start := models.NewFlowStart(models.Org1, models.IVRFlow, models.IVRFlowID, nil, []models.ContactID{models.CathyID}, nil, false, true, true, nil, nil)
+start := models.NewFlowStart(models.Org1, models.IVRFlow, models.IVRFlowID, nil, []models.ContactID{models.CathyID}, nil, "", false, true, true, nil, nil)
Member Author

This constructor is getting a bit out of hand; I'm going to tweak it to use With() methods for the recipients.

WithContactIDs(contactIDs).
WithURNs(event.URNs).
WithCreateContact(event.CreateContact).
WithParentSummary(event.RunSummary)
Member Author

Go doesn't support wrapping a block like this in () the way Python does. Ending each line with a trailing . is the only way I could figure out to keep this from being reformatted. Slightly weird, but I bet we get used to it.
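A minimal, self-contained sketch of the builder style under discussion, using a hypothetical trimmed-down `FlowStart` rather than the real `models.FlowStart`. It also shows why the trailing `.` is needed at all: Go's lexer inserts a semicolon after a line ending in `)`, so a chain cannot begin its continuation lines with `.` the way it can inside Python parentheses.

```go
package main

import "fmt"

// FlowStart is a hypothetical, trimmed-down stand-in for models.FlowStart.
type FlowStart struct {
	OrgID         int
	FlowID        int
	ContactIDs    []int
	URNs          []string
	CreateContact bool
}

// NewFlowStart takes only the required arguments.
func NewFlowStart(orgID, flowID int) *FlowStart {
	return &FlowStart{OrgID: orgID, FlowID: flowID}
}

// Each With* method sets one optional property and returns the receiver for chaining.
func (s *FlowStart) WithContactIDs(ids []int) *FlowStart {
	s.ContactIDs = ids
	return s
}

func (s *FlowStart) WithURNs(urns []string) *FlowStart {
	s.URNs = urns
	return s
}

func (s *FlowStart) WithCreateContact(create bool) *FlowStart {
	s.CreateContact = create
	return s
}

func main() {
	// Every continued line ends with ".": a line ending in ")" gets an
	// automatic semicolon, so ".WithURNs(...)" on the next line would not parse.
	start := NewFlowStart(1, 42).
		WithContactIDs([]int{10, 11}).
		WithURNs([]string{"tel:+250788123123"}).
		WithCreateContact(true)

	fmt.Println(start.FlowID, len(start.ContactIDs), start.CreateContact) // 42 2 true
}
```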

Contributor

We've been doing this in goflow for a while with its builder classes, except with NewFlowStart(..) on the same line as models. and only the With lines indented, which seems marginally less weird, maybe?

Member Author

Ya, I started that way too, but the first list of required args is kinda long and this seemed more readable to me. 🤷‍♂

Contributor

IMHO it looks a bit weird to split a package and a symbol. What if DoRestartParticipants was also WithRestartParticipants(bool)? Then you also wouldn't have to add new bool types.

Member Author

Went back and updated those.

type IncludeActive bool

const DoIncludeActive = IncludeActive(true)
const DontIncludeActive = IncludeActive(false)
Member Author

Death to bool parameters!

@nicpottier
Member Author

Some random thoughts on allowing starts via query:

One thing we have heard is that it would be nice to be able to start a flow for a group of contacts while excluding some contacts, specifically contacts that have a WhatsApp URN (or, more precisely, those that would be sent to using a WhatsApp URN, but let's ignore that distinction for this discussion).

query, as I've added it here, is used as a union with the other fields, the same way that you can have both contact_ids and group_ids on a FlowStart and we will start the union of those contacts (deduplicated by contact_id).

I think that's right, but it does not allow for the scenario above.

Two ways we could accomplish that:

  1. Have an exclude_query field that basically subtracts contacts from the set of contacts that will be started in a flow. Seems a bit weird, but it would be simple enough to add.
  2. Change all our operations to turn into a query, i.e. if you say you want to start Bob and all Reporters in a flow, then we turn that into id=134 OR group=134-1234-1234-1234-1234 and just set the query field. That would basically mean that all starts go through Elastic and are treated the same.
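Option 2 can be sketched as folding the explicit recipients into a single query string, following the `id=134 OR group=...` example above. The function name is hypothetical, and whether contactql accepts `id` and `group` attributes in exactly this form (including the quoted group value) is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// recipientsToQuery folds explicit contact IDs and group UUIDs into a single
// contactql-style query by OR-ing one condition per recipient.
func recipientsToQuery(contactIDs []int, groupUUIDs []string) string {
	var conds []string
	for _, id := range contactIDs {
		conds = append(conds, "id = "+strconv.Itoa(id))
	}
	for _, uuid := range groupUUIDs {
		conds = append(conds, fmt.Sprintf("group = %q", uuid))
	}
	return strings.Join(conds, " OR ")
}

func main() {
	fmt.Println(recipientsToQuery([]int{134}, []string{"134-1234-1234-1234-1234"}))
	// id = 134 OR group = "134-1234-1234-1234-1234"
}
```

Note the edge case: with no recipients the sketch returns an empty string, which a caller would have to treat as "nobody" rather than passing it on as a query that might match everyone.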

Option 2 is appealing from a number-of-code-paths perspective (though honestly the code path is not very complicated on the mailroom side either way). But we would probably want to keep the contact_id and group_id relations on FlowStart anyway, just so that we can represent things logically to the user (or maybe not, and we say we are educating them by showing only a query instead of the contact id and group they entered).

Option 2 also makes us doubly dependent on our indexes being fresh and correct. This PR already increases that dependency, and honestly Elastic seems to have treated us pretty well in that respect, but making it a dependency for every single start feels kinda scary.

Thoughts? @rowanseymour @ericnewcomer @norkans7 ?

@ericnewcomer
Member

I don't think I hate a query being the canonical form for flow starts. In a way it might actually be clearer to the user when we ultimately surface flow starts, i.e. that the selection of contacts was a point-in-time set of contacts matching a query instead of just a group list. I mean, it is the same thing, but I can see that feeling like a more honest visualization of what happened.

So, in the WhatsApp case, is the thinking that they would create a query for contacts in this group AND not in this other group? As an aside, I kind of like the idea of surfacing in-group queries, but having them be UUIDs isn't particularly exciting.

@nicpottier
Member Author

Ya, we'd have to come up with a syntax for group queries, and that could use the group name:
age=10 AND group = "U-Reporters"
Or
age = 10 AND group != "U-Reporters"

Seems mostly clear.

But ya, the WhatsApp case above would then be something like:

group = "U-Reporters" AND whatsapp = ""

@ericnewcomer
Member

Then of course they change their group name, doh. Maybe it's okay for that link to be tenuous in the flow start case, assuming we wouldn't allow in-group querying for general contact search (or would at least disable dynamic group creation from those queries).

@nicpottier
Member Author

Ya, maybe we rewrite group names to UUIDs and store that version in our own queries?
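A minimal sketch of that rewrite, assuming a hypothetical flattened `Condition` type and an exact-match name-to-UUID map. A real version would work on the parsed contactql tree and would need to decide how case-sensitive name matching should be.

```go
package main

import "fmt"

// Condition is a hypothetical, flattened contactql condition.
type Condition struct {
	Key, Comparator, Value string
}

// rewriteGroupNames returns a copy of conds with group names replaced by group
// UUIDs, so a stored query keeps working if a group is later renamed.
func rewriteGroupNames(conds []Condition, uuidsByName map[string]string) ([]Condition, error) {
	out := make([]Condition, len(conds))
	for i, c := range conds {
		if c.Key == "group" {
			uuid, ok := uuidsByName[c.Value]
			if !ok {
				return nil, fmt.Errorf("unknown group %q", c.Value)
			}
			c.Value = uuid // c is a copy, so the input slice is untouched
		}
		out[i] = c
	}
	return out, nil
}

func main() {
	conds := []Condition{{"age", "=", "10"}, {"group", "=", "U-Reporters"}}
	rewritten, err := rewriteGroupNames(conds, map[string]string{"U-Reporters": "5a1c-uuid"})
	if err != nil {
		panic(err)
	}
	fmt.Println(rewritten) // [{age = 10} {group = 5a1c-uuid}]
}
```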

@rowanseymour
Contributor

👿 Devil's advocate: you don't need to query on groups. Every group should be dynamic anyway, backed by a query on fields. Everything is fields.

@nicpottier
Member Author

I don't necessarily hate that; I believe Vumi did that. It doesn't feel super friendly for users though, so I feel it may require hiding this from them in the simplest case. But ya, then suddenly you get to do boolean operations on all your groups, which is powerful.

models/assets.go Outdated
}

// is this an implicit field?
if key == contactql.ImplicitKey {
Contributor

@rowanseymour rowanseymour Jul 26, 2019

We don't actually allow implicit conditions (Bob instead of name = Bob) in dynamic group queries: you can search like that, but you can't save that raw query as a group. Thus goflow's evaluation of groups errors if there's an implicit condition. The reason is that the RP UI should rewrite any implicit conditions anyway, so searching for Bob gets rewritten as the explicit condition name ~ "Bob". It seems reasonable to do the same with these kinds of queries, and then you don't have to worry about implementing implicit queries.

Member Author

Sure, that's true for dynamic group queries, but I didn't see any reason why this part of the library shouldn't support it.

Contributor

Why wouldn't we rewrite these queries same as we do for contact search?

Contributor

Also, to be clear, it's not just implicit conditions we rewrite away, it's also implicit ANDs, e.g. Bob 078245353 becomes name ~ "Bob" AND tel ~ 078245353.
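The rewrite described above (implicit conditions plus implicit ANDs) can be sketched roughly as below. This is a simplified assumption, not the Python or goflow parser logic: terms made of digits with an optional leading `+` become `tel ~` conditions, and everything else becomes a `name ~` condition. The real parsers also consider other cases, such as id.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// telLike is a simplifying assumption: digits with an optional leading "+".
var telLike = regexp.MustCompile(`^\+?\d+$`)

// rewriteImplicit turns raw search text into explicit conditions joined by AND:
// tel-like terms become `tel ~ <term>`, anything else `name ~ "<term>"`.
func rewriteImplicit(raw string) string {
	var conds []string
	for _, term := range strings.Fields(raw) {
		if telLike.MatchString(term) {
			conds = append(conds, "tel ~ "+term)
		} else {
			conds = append(conds, fmt.Sprintf("name ~ %q", term))
		}
	}
	return strings.Join(conds, " AND ")
}

func main() {
	fmt.Println(rewriteImplicit("Bob 078245353")) // name ~ "Bob" AND tel ~ 078245353
}
```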

Member Author

Oh, I think we will totally pass in preprocessed ones. But this module only concerns itself with taking a contactql query and turning it into something Elastic supports. Implicit queries are a thing in contactql, so it seemed correct to support them.

Yes, it is a superset of what we strictly need, but it is implemented and tested, so I don't see the benefit of ripping it out.

Contributor

Though, actually looking at the Python contactql parser, it always swaps out an implicit condition for an explicit id, name or tel one in the visitor. So if we want to be consistent, goflow's visitor should do that too, and then there is no contactql.ImplicitKey and this code can't be reached.

The only reason goflow's implementation doesn't already do that is we've been assuming it would never encounter an implicit condition.

Member Author

I guess the question is what is ContactQL? Does ContactQL support implicit queries? If so, then I would posit everything that takes contactql should support implicit queries. If not, then that should be an error when parsing contactql and it is the job of RapidPro to turn implicit looking things into valid ContactQL.

I'm fine with either, though to me it seems useful to have implicit queries be part of ContactQL.

Contributor

Implicit conditions are certainly a part of contactql, but our Python implementation swaps them out for explicit ones at the visitor level, so the evaluator has no concept of an implicit condition. No reason for goflow not to do likewise.

Also noticed that goflow's parser is wrong for implicit keys: it sets the operator to = instead of ~. It's never been used 🤷‍♂

I've made nyaruka/goflow#742, which I think replicates the behavior of the Python parser.

Member Author

Excellent, so then every place that takes ContactQL can take any ContactQL, which I think is the right thing.

Contributor

OK, just made goflow v0.44.3, which should in theory match the Python parser!

search/search.go Outdated

if field.Key == NameTel {
if number != 0 {
return elastic.NewNestedQuery("urns", elastic.NewMatchPhraseQuery("urns.path", number)), nil
Contributor

See the comment above re implicit conditions, but doesn't this need to be restricted to tel URNs?

Member Author

Oops, yes indeed, will update.

search/search.go Outdated
var query elastic.Query

if field.Category == Implicit {
number, _ := strconv.Atoi(c.Value())
Contributor

Should this match the logic of https://github.com/nyaruka/rapidpro/blob/5814c169e031656c1f65e97348e382d90c496c54/temba/contacts/search/parser.py#L1053, i.e. match a phone number regex and then do a bit of normalization?
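A hedged sketch of the regex-then-normalize approach suggested in the review. The pattern, the accepted separators and the 5 to 15 digit bounds are illustrative assumptions and are not taken from the linked RapidPro parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// phoneLike is an illustrative pattern: an optional leading "+", then digits
// mixed with common separators (spaces, parentheses, dots, dashes).
var phoneLike = regexp.MustCompile(`^\+?[\d\s().-]+$`)

// normalizePhone reports whether value looks like a phone number and, if so,
// returns just its digits (keeping a leading "+").
// The 5 to 15 digit bounds are assumptions for this sketch.
func normalizePhone(value string) (string, bool) {
	value = strings.TrimSpace(value)
	if !phoneLike.MatchString(value) {
		return "", false
	}

	var b strings.Builder
	for i, r := range value {
		if (i == 0 && r == '+') || (r >= '0' && r <= '9') {
			b.WriteRune(r)
		}
	}
	normalized := b.String()

	digits := strings.TrimPrefix(normalized, "+")
	if len(digits) < 5 || len(digits) > 15 {
		return "", false
	}
	return normalized, true
}

func main() {
	fmt.Println(normalizePhone("078 245-353"))      // 078245353 true
	fmt.Println(normalizePhone("+250 788 123 123")) // +250788123123 true
	fmt.Println(normalizePhone("Bob"))
}
```

Keeping the result as a string (rather than an int) also preserves leading zeros for the later URN path match.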

// FieldRegistry provides an interface for looking up queryable fields
type FieldRegistry interface {
LookupSearchField(key string) Field
}
Member Author

So this and the interface above are only needed because assets.Field doesn't expose a UUID (and we don't have a FieldUUID in goflow either).

How do you feel about adding that? It would reduce all of this to just one field resolver func.

Contributor

Took me a moment to remember what the field UUID is used for, but yeah, we could add it to assets.Field.
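With a UUID on the field asset, the two registry interfaces could collapse into a single resolver func, roughly as sketched below. The `Field` interface, `FieldResolverFunc` and `resolverFromFields` are hypothetical names for illustration, not goflow's actual `assets.Field`.

```go
package main

import "fmt"

// Field is a hypothetical, trimmed-down field asset that, as proposed above,
// also exposes its UUID.
type Field interface {
	Key() string
	UUID() string
	Type() string
}

// FieldResolverFunc looks up a field by key, returning nil if it is unknown.
type FieldResolverFunc func(key string) Field

// field is a minimal concrete Field for this sketch.
type field struct{ key, uuid, typ string }

func (f *field) Key() string  { return f.key }
func (f *field) UUID() string { return f.uuid }
func (f *field) Type() string { return f.typ }

// resolverFromFields builds a resolver over a fixed set of fields.
func resolverFromFields(fields ...Field) FieldResolverFunc {
	byKey := make(map[string]Field, len(fields))
	for _, f := range fields {
		byKey[f.Key()] = f
	}
	return func(key string) Field {
		return byKey[key] // nil interface when the key is missing
	}
}

func main() {
	resolve := resolverFromFields(&field{key: "age", uuid: "f-1", typ: "number"})
	if f := resolve("age"); f != nil {
		fmt.Println(f.Key(), f.UUID(), f.Type()) // age f-1 number
	}
}
```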

models/fields.go Outdated
@@ -15,7 +15,7 @@ import (
type FieldID int

// FieldUUID is our type for the UUID of a field
-type FieldUUID uuids.UUID
+type FieldUUID = uuids.UUID
Member Author

This is a bit of a hack to get around the circular dependency between models and search. If we had a FieldUUID in assets, we wouldn't need it.

(I spent a few hours trying to refactor all the IDs and UUIDs in models into an ids package, but it just got out of control. When do we get decent Go refactoring tools again? Fuck modules.)

@nicpottier nicpottier requested a review from rowanseymour August 7, 2019 18:52
}

fieldQuery := elastic.NewTermQuery("fields.field", field.UUID())
fieldType := field.Type()
Contributor

So there's no need to look up anything to do with fields if we add UUID, right? FieldType is also on contactql.Condition.

Member Author

Right, ya, then we'd have everything we need to build these queries in assets.Field.

@nicpottier
Member Author

@rowanseymour updated, oh god please no more changes. :)

key := c.PropertyKey()

if c.PropertyType() == contactql.PropertyTypeField {
field := resolver(key)
Contributor

Is this needed? Haven't we already verified that the fields exist during parsing?

Contributor

Oh sorry, I get it now: there's still no field UUID on the condition.

Contributor

@rowanseymour rowanseymour left a comment

I can't say I'm familiar enough with the ES API to know if that's all right, but the rest looks good.

Any way to verify that this works the same as the ES stuff in RP?

@nicpottier
Member Author

I've run these queries against Elastic and they are pretty much a translation of the RapidPro queries, so I think they should be good. Won't know for sure until we try!

@nicpottier nicpottier merged commit a72877a into master Aug 8, 2019
@nicpottier nicpottier deleted the elastic branch August 8, 2019 15:48
rasoro pushed a commit to Ilhasoft/mailroom that referenced this pull request Dec 5, 2024
4 participants