Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Paths Query #5646

Merged
merged 19 commits into from
Aug 20, 2021
Merged

New Paths Query #5646

merged 19 commits into from
Aug 20, 2021

Conversation

EDsCODE
Copy link
Member

@EDsCODE EDsCODE commented Aug 18, 2021

Changes

Please describe.

  • use arrays to create paths
    If this affects the frontend, include screenshots.

Checklist

  • All querysets/queries filter by Organization, by Team, and by User
  • Django backend tests
  • Jest frontend tests
  • Cypress end-to-end tests
  • Migrations are safe to run at scale (e.g. PostHog Cloud) – present proof if not obvious
  • New/changed UI is decent on smartphones (viewport width around 360px)

@timgl timgl temporarily deployed to posthog-pr-5646 August 18, 2021 21:50 Inactive
@EDsCODE EDsCODE temporarily deployed to posthog-pr-5646 August 19, 2021 01:27 Inactive
def _determine_should_join_distinct_ids(self) -> None:
self._should_join_distinct_ids = True

def _determine_should_join_persons(self) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Why override this? The implementation is identical (with entity props not accounted for but these would be blank) to main events query.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

entities don't exist on the PathFilter class at all. This inheritance / deciding how filters should be created needs a bit more thought - feels too messy right now (and mypy screams a lot too lol)

@timgl timgl temporarily deployed to posthog-pr-5646 August 19, 2021 12:24 Inactive
@@ -14,15 +15,15 @@ class ClickhouseEventQuery(metaclass=ABCMeta):
EVENT_TABLE_ALIAS = "e"

_PERSON_PROPERTIES_ALIAS = "person_props"
_filter: Filter
_filter: Union[Filter, PathFilter]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty much the wrong way to go about this, judging by how lots of things outside (like ColumnOptimizer, EventQuery, breakdown props, etc.) depend on Filter existing.

Maybe some makes sense to have PathFilter inherit from Filter instead of BaseFilter. This has problems as well, since mixins specific to PathFilter can't do anything in ColumnOptimizer / EventQuery

A better way out can be to only have this gigantic Filter class that has Funnel and Path specific mixins. This way, everything else just checks if the property that matters to itself exists, and does what needs to be done.
Downside is that possible parameters for Filter object for Paths become a bit unclear. Too many, which one to use?

But, I guess that's okay for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to think on this more. The typing is really weird right now. I actually might need to check again if there's some conflicts/unintended bugs

@neilkakkar
Copy link
Collaborator

Another open question: I don't yet see multiple Paths classes existing, since most functionality is just an extra operation here and there on the basic path query. So, worth just having one ClickhousePath class? (vs ClickhousePathBase only) - can add it later if we reach that stage?

@EDsCODE EDsCODE temporarily deployed to posthog-pr-5646 August 19, 2021 16:58 Inactive
@EDsCODE EDsCODE temporarily deployed to posthog-pr-5646 August 19, 2021 19:20 Inactive
@EDsCODE EDsCODE temporarily deployed to posthog-pr-5646 August 19, 2021 19:40 Inactive
@EDsCODE EDsCODE temporarily deployed to posthog-pr-5646 August 19, 2021 19:49 Inactive
@EDsCODE EDsCODE temporarily deployed to posthog-pr-5646 August 19, 2021 19:57 Inactive
@EDsCODE EDsCODE temporarily deployed to posthog-pr-5646 August 19, 2021 21:19 Inactive
@EDsCODE EDsCODE temporarily deployed to posthog-pr-5646 August 20, 2021 03:21 Inactive
@EDsCODE EDsCODE temporarily deployed to posthog-pr-5646 August 20, 2021 03:29 Inactive
@EDsCODE EDsCODE marked this pull request as ready for review August 20, 2021 04:39
@EDsCODE
Copy link
Member Author

EDsCODE commented Aug 20, 2021

Ready for another look. @neilkakkar feel free to change the paths class/file naming

@EDsCODE EDsCODE requested a review from neilkakkar August 20, 2021 04:40
@timgl timgl temporarily deployed to posthog-pr-5646 August 20, 2021 12:35 Inactive
self.assertNotIn("json", query.lower())

self.test_current_url_paths_and_logic()

def test_paths_start(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is pretty suspect, trying to figure out why we have the off by one error in the new query. Judging from how the data is setup,

{"source": "1_/pricing", "target": "2_/about", "value": 1}

is incorrect, the value should be 2, like in the postgres test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paolodamico @kpthatsme @marcushyett-ph When we talk about setting a start point for paths, should the start point define the beginning of the session that a person is in or should it include paths after the start point even if it's in the middle of the session? For example, if one user has path A -> B -> C and another has B -> C and out start point is B. Would we count both of them or just the second person because the second person starts at B

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading your question right–

I think the path we show should start from whichever event the user is saying is their starting point.

For example, I may want to know everything that happens after a user ingests data:

So the path should only show the journey starting with the "user ingested data" event.

In your example case, I think we'd include both of these users - both starting from B onwards

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah makes sense. We wanted to make sure there's no a significant need/possible confusion for the other method where start points are defined as exact starts for a person session. I think the use case you described is much more useful though and it's how our current paths functionality works so we'll stick with that

@timgl timgl temporarily deployed to posthog-pr-5646 August 20, 2021 13:06 Inactive
@neilkakkar
Copy link
Collaborator

I updated how start point filtering works: ec3a8bb

It's... not the prettiest, and I didn't spend much time thinking about how to extend this, but I feel that can wait for a separate refactor. Right now, aiming to get this merged in, where we've exactly replaced old functionality. (all same old tests pass as expected)

Waiting for you to review this bit^, but the rest looks good to me 👍

self.assertTrue(response[2].items() >= {"source": "2_/", "target": "3_/about", "value": 1}.items())
self.assertTrue(response[3].items() >= {"source": "2_/about", "target": "3_/pricing", "value": 1}.items())
self.assertTrue(response[4].items() >= {"source": "3_/pricing", "target": "4_/help", "value": 1}.items())
self.assertTrue(response[0].items() == {"source": "1_/pricing", "target": "2_/", "value": 1}.items())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertEquals? :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that reminds me, I have no idea why we had the >= instead of ==. Will fix!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postgres version has extra elements, hence >= and assertTrue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertDictContainsSubset?

@timgl timgl temporarily deployed to posthog-pr-5646 August 20, 2021 13:50 Inactive
@timgl timgl temporarily deployed to posthog-pr-5646 August 20, 2021 14:02 Inactive
@EDsCODE EDsCODE changed the title WIP: new paths query New Paths Query Aug 20, 2021
@EDsCODE EDsCODE merged commit 06c5c6f into master Aug 20, 2021
@EDsCODE EDsCODE deleted the new-paths-query branch August 20, 2021 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants