feat(cohorts): Backwards compatibility of groups and properties #9462

neilkakkar · 2022-04-20T10:26:45Z

Two things in this PR:

1. Proof of concept of the properties refactor. Instead of having a new property type for each kind of behavioural filter, I want to fold them all into one. Previously, any place in the app that wanted to discard a property type could simply check something like: prop.type != 'person'. With one type per behavioural filter, that check becomes prop.type not in (all-possible-behavioural-filters-list), which is annoying and hard to maintain. So I'm changing this to prop.type != 'behavioural', which will always cover every behavioural type we add.
2. Retire groups for cohorts and use property groups instead. This needs to interlink with the new query to be complete, but the current slice can go in independently. Groups will keep working as before, but our backend should start using cohort.properties for all computations. We can check that everything is okay by confirming that removing cohort.groups doesn't break anything (we won't actually remove it, though, to support the frontend migration).
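To make the first point concrete, here is a minimal sketch of the two styles of check. The `Prop` class and the individual behavioural type names in `BEHAVIOURAL_TYPES` are hypothetical stand-ins for illustration, not the actual model in posthog/models/property.py.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prop:
    # Hypothetical stand-in for a property filter; only `type` matters here.
    key: str
    type: str


# One type per behavioural filter: every call site must know the full list.
# These type names are made up for illustration.
BEHAVIOURAL_TYPES = {
    "performed_event",
    "performed_event_multiple",
    "stopped_performing_event",
}


def non_behavioural_before(props: List[Prop]) -> List[Prop]:
    # Breaks silently whenever a new behavioural type is added but not listed.
    return [p for p in props if p.type not in BEHAVIOURAL_TYPES]


def non_behavioural_after(props: List[Prop]) -> List[Prop]:
    # A single umbrella type keeps the check stable as new filters are added.
    return [p for p in props if p.type != "behavioural"]
```

With the umbrella type, adding a new behavioural filter only changes the value stored on the property, not every place that filters by type.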

Changes

How did you test this code?

ee/clickhouse/queries/cohort_query.py

ee/clickhouse/queries/test/test_cohort_query.py

posthog/models/property.py

EDsCODE

A few questions to address. Otherwise lgtm

neilkakkar · 2022-04-26T16:09:52Z

Suspiciously flakey, this test: ee/clickhouse/models/test/test_cohort.py::TestCohort::test_cohortpeople_prop_changed -- https://github.com/PostHog/posthog/runs/6179098641?check_suite_focus=true

I thought it was an inter-dependency issue, but it failed again once on newer CH. Hmmm, want to get to the bottom of this before I merge this PR

rcmarron

Looking good! A few thoughts in the comments

rcmarron · 2022-04-26T19:21:50Z

posthog/models/cohort.py

+                                    time_value=group.get("days"),
+                                    operator=group.get("count_operator"),
+                                    operator_value=group.get("count"),
+                                    negation=group.get("count") == 0 and group.get("count_operator") in ["lte", "eq"],


Because we aren't fully backward compatible, we should think about how we should fail here. Right now, there's validation in the query class that raises on the following cases:

The negation field is only allowed when the property is in an AND group and is not the first item

The "count" value is never allowed to be 0. This applies to all operators (=, >=, and <=)

Do we want to make a best effort at recreating the old groups with properties, and let the query fail and front-end validation show the error? If so, I'd lean toward just setting the operators + values and not touching the negation. But if we want to touch the negation, do we also need to reverse the operators (e.g. =0 -> !performed_event)?
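A minimal sketch of the "set the operators + values, leave negation alone" option, using plain dicts. The `days`, `count_operator`, and `count` keys come from the diff above; the `event_id` / `action_id` lookup, the `key` field, and the function name are assumptions for illustration only.

```python
from typing import Any, Dict


def legacy_group_to_property(group: Dict[str, Any]) -> Dict[str, Any]:
    """Best-effort mapping of an old cohort group to a behavioural property.

    Hypothetical helper: it copies values over as-is and leaves any
    rejection of unsupported combinations to later validation.
    """
    return {
        # Assumption: old groups identify their event or action by one of these keys.
        "key": group.get("event_id") or group.get("action_id"),
        "type": "behavioural",
        "time_value": group.get("days"),
        "operator": group.get("count_operator"),
        "operator_value": group.get("count"),
        # `negation` is deliberately not set here.
    }
```

An old group with a count of 0 would pass through unchanged under this approach and only fail once the query validation runs.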

Sure thing, I won't touch negation here then: as far as I can tell, given the new constraints, there's no way it will work for any existing old group.

This does raise a question for me, though: why do we have these new restrictions on negations?

The main reason is that unrestricted negation allows users to do confusing things that they probably don't intend, and the result is a giant cohort (which isn't great).

For example, if you say "users who never performed insight analyzed" on our account, you're going to get a result that includes every user who ever visited the marketing page. While this is technically correct, it probably isn't what the user wants.

The user can still get to the same result by saying "users who did $pageview AND who never performed insight analyzed", but they have to take the one further step that shows they understand what they're asking for.
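The two rules from this thread can be sketched as a small check over one property group. This is not the validation in the query class; the function, the exception, and the dict field names (`negation`, `operator_value`) are assumptions for illustration.

```python
from typing import Any, Dict, List


class CohortValidationError(ValueError):
    """Hypothetical error type for rejected cohort definitions."""


def validate_group(group_type: str, values: List[Dict[str, Any]]) -> None:
    """Raise if a property group breaks either restriction discussed above."""
    for index, prop in enumerate(values):
        # Rule 1: negation only inside an AND group, and never as the first item.
        if prop.get("negation") and (group_type != "AND" or index == 0):
            raise CohortValidationError(
                "negation needs an AND group and a preceding positive condition"
            )
        # Rule 2: a count of 0 is rejected for every operator.
        if prop.get("operator_value") == 0:
            raise CohortValidationError("count must not be 0")


# "users who never performed insight analyzed" on its own is rejected ...
never_only = [{"key": "insight analyzed", "negation": True}]
# ... while anchoring it to a positive condition is accepted.
anchored = [{"key": "$pageview"}, {"key": "insight analyzed", "negation": True}]
```

Calling `validate_group("AND", never_only)` raises, while `validate_group("AND", anchored)` returns without error, which mirrors the "one further step" the user has to take.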

posthog/models/cohort.py

posthog/models/property.py

posthog/test/test_cohort_model.py

rcmarron · 2022-04-26T19:50:44Z

posthog/test/test_cohort_model.py

@@ -34,14 +34,16 @@ def test_insert_by_distinct_id_or_email(self):
    @pytest.mark.ee
    def test_calculating_cohort_clickhouse(self):
        person1 = Person.objects.create(
-            distinct_ids=["person1"], team_id=self.team.pk, properties={"$some_prop": "something"}
+            distinct_ids=["person1"], team_id=self.team.pk, properties={"$some_propX": "something"}


🤔 What's going on here? Does the test pass if the prop is $some_prop?

posthog/test/test_feature_flag.py

ee/clickhouse/queries/cohort_query.py

EDsCODE · 2022-04-26T20:31:32Z

Suspiciously flakey, this test: ee/clickhouse/models/test/test_cohort.py::TestCohort::test_cohortpeople_prop_changed -- https://github.com/PostHog/posthog/runs/6179098641?check_suite_focus=true

I thought it was an inter-dependency issue, but it failed again once on newer CH. Hmmm, want to get to the bottom of this before I merge this PR

I added a freeze time clause with relative times to try to address this, because that was the main difference from the test on master. It was explicitly a day apart before.
I also noticed something that could be noteworthy. If you look at the generated query before and after these changes, we're missing a few joins that might have been helping ensure the result is correct. For example, the spaghetti of JOINs previously included distinct_id2 joins and a "person_max" join.

I need to think more about this, as I'm not yet sure how either point above could cause the flakiness.

Addressed

* master: (137 commits)
  feat(cohorts): add cohort filter grammars (#9540)
  feat(cohorts): Backwards compatibility of groups and properties (#9462)
  perf(ingestion): unsubscribe from buffer topic while no events are produced to it (#9556)
  fix: Fix `Loading` positioning and `LemonButton` disabled state (#9554)
  test: Speed up backend tests (#9289)
  fix: LemonSpacer -> LemonDivider (#9549)
  feat(funnels): Highlight significant deviations in new funnel viz (#9536)
  docs(storybook): Lemon UI (#9426)
  feat: add support for list of teams to enable the conversion buffer for (#9542)
  chore(onboarding): cleanup framework grid experiment (#9527)
  fix(signup): domain provisioning on cloud (#9515)
  chore: split out async migrations ci (#9539)
  feat(ingestion): enable json ingestion for self-hosted by default (#9448)
  feat(cohort): add all cohort filter selectors to Storybook (#9492)
  feat(ingestion): conversion events buffer consumer (#9432)
  ci(run-backend-tests): remove CH version default (#9532)
  feat: Add person info to events (#9404)
  feat(ingestion): produce to buffer partitioned by team_id:distinct_id (#9518)
  fix: bring latest_migrations.manifest up to date (#9525)
  chore: removes unused feature flag (#9529)
  ...

EDsCODE and others added 30 commits April 14, 2022 13:28

add fields to property

a2cfcc6

add validatoin

551fad8

fix naming'

b13f25e

fix errors

c1a5884

example

75d70aa

example implementations

e1069b4

remove none

2f976f8

more typing

59892a6

change condition

593c14e

add terrible draft of lifecycle query

693e25a

move around date query

5e76c2c

one random test to satisfy

7383efa

add funnel persons subquery

6534f00

basic func

2a43740

use key as event

fe85668

use key as event

33a1d07

merge base branch

bf4b76b

change condition

2dc7eca

change to countif

db1c6ff

add base query conditions

cd44a29

add comments

8dc1e59

condition building

c98cd41

person props

f245c61

param cleanup

10c5d4d

basic test

27b492e

stub tests

37fe717

remove unnecessary funcs

f91ddba

merge new

2cf01b5

adjust typing

fe3b652

wip

fe685cd

EDsCODE reviewed Apr 26, 2022

ee/clickhouse/queries/cohort_query.py (outdated, resolved)

ee/clickhouse/queries/test/test_cohort_query.py (resolved)

address comments

7d2dee4

EDsCODE reviewed Apr 26, 2022

posthog/models/property.py (resolved)

remove properties select in cohort removal query

97ae9d5

EDsCODE reviewed Apr 26, 2022

address comment

4ea6ac0

EDsCODE added 2 commits April 26, 2022 15:03

update snapshots

0d835fa

add freeze time

a019949

rcmarron previously requested changes Apr 26, 2022

neilkakkar added 13 commits April 27, 2022 12:16

fix some tests

80c2e63

remove the Xs

00807bb

more test fixes and clean up

3337d9c

raise on cyclic dependencies instead

a6f6a9c

fixes

3775859

test waters with parallel execution

1b3052f

merge master resolve conflicts

b5310c1

fixes

18feee6

update tests

d25a85a

final test

68bacf2

clean up

452ba39

more test fixes

3937274

gahhhhh

0f23eb4

rcmarron self-requested a review April 27, 2022 19:59

rcmarron approved these changes Apr 27, 2022

EDsCODE merged commit e531d0d into master Apr 27, 2022

EDsCODE deleted the cohorts-new-model branch April 27, 2022 20:29
