Ideas to tackle ~Universal Taxonomy~ Automated Insights #8261
Thanks for this @neilkakkar, really comprehensive!
Off the wall idea
Problem 1 + Business model segmentation

We should stay open to cutting across the behavior on a different axis. Looking at business types risks over-generalizing what could be very different companies: a fintech for small-business back office could be very different from a fintech for consumers. However, a fintech for small-business back office would likely have similarities to healthtech EHR management software. We would then develop workflows for effectively tracking a category of feature patterns: onboarding flows or CRM-like interfaces. (This would also address the problem in "Text matching", where companies would bucket themselves not into an industry but into feature patterns.)

Problem 3

If we develop an internal taxonomy for events, collaboration data (comments, names, descriptions) could be good passive data to eventually train models on that would produce useful insights for new users.
General thoughts
Not quite sure I follow - what would you like me to elaborate more on?
Sorry, ignore that! I erased the rest of that thought
Here's a more extensive description of point 4 from above.

Problem layers

This got me thinking about how we use PostHog right now and what would be an immediately useful insight to know. I found the problem space to have two layers.
Useful insights: Deep dive

While suggesting novel insights would be a way to discover new, possibly useful insights, suggesting insights related to the one being viewed could be a way to surface surefire, useful ones. For example, if I'm analyzing event X, it almost always benefits me to know whether the retention/churn of this event's usage is above or below the average of the other events I'm tracking. Performing more correlation analysis would be helpful too: when looking at event X, I'd like to know if there's a weirdly higher rate of stickiness for people performing the event on day A vs day B. We could also refine some of the flows to be more impactful: when I'm analyzing a three-step funnel A->B->C, we could preemptively search for a different step B that results in a really high (or really low) conversion to C. We could still apply all the dimensions mentioned above by @marcushyett-ph to decide what to surface.

Benefits
Drawbacks
Was expanding scope to cover whatever we can think of, but yeah, makes sense to rule things out now & prioritise.
Actually, the way I was thinking of this (which gives it moderately-high probability) was: there's a fixed taxonomy (2.1), and we use word embeddings to solve (2.2), i.e. the text mapping bit. I think this will have higher precision than naive string matching, because words with similar meanings map to the same fixed taxonomy word in this space. Example: `User enrolled` & `User login` can both map to our taxonomy's `USER_LOGIN`, while with text matching, we'd only get the exact match. If we use these embeddings to do everything, then agreed, precision goes down.
I have no personal experience, but does sound hard to get right.
Agree, definitely better for understanding. I'd say the best way to solve problem 3, given you've solved problem 2, is to manually generate insights based on the taxonomy, which then creates the set of possible insights. What I suggested + your ideas are definitely possible, but I'd say doing things manually first would give us a better grasp of what features are important. For MVP, I propose solving (3) manually.
Interesting, how do you imagine this will work? Or, what useful data do we get out of this? I imagine it being useful for us manually testing out ideas "oh, this team does this, which is cool, let's put it as a possible insight into our taxonomy", but don't see (yet) how this would work for training data?
Agreed! And also on the direction you're proposing.
Also, it sounds like we have slightly different ideas in our heads about what 'taxonomy' is, so I suggest we taboo the word and replace it with "what we mean" when we chat next xD
Can you share the new word / definition you come up with here please :)?
A summary of options from above conversation:
My stance here is (2) is the lowest risk as it's an extension of some of the diagnosing-causes work we've been doing. (3) is an add-on. (1) is what we should continue to validate right now: is taxonomizing a vertical/feature possible? If yes, then we build this direction and table (2). If not, we proceed with (2).
What I mean by taxonomy

There's the alphabet, and the words.

The alphabet = categories of events. The words = insights we can possibly generate using these events.

Together, they make our taxonomy: the events, and the meaningful insights we can generate from these events.
Problem Analysis

So, I chose the quickest (to me) path through problems (1), (2), and (3), to see if we can validate them quickly. Here are my findings (feel free to challenge method/conclusion). Thank you, OpenAI: it made using word embeddings and validating things very easy.

I started with clustering on all events to come up with natural categories we could use. Half of them were useable, and I cherry-picked those (the taxonomy alphabet examples above came out of this). As you'll note, these classifications are pretty generic. To categorise is to throw away information & make things legible. It's a trade-off. But, as far as taxonomies go, I think the above is reasonably okay.

This answered problem (1). Yes, categorisation is possible, and for most things you don't even need to go to a vertical: the above taxonomy is generic enough to work well with all businesses. Of course, a more specific one would be nicer for specific industries, but I wanted to quickly get something that works as a problem validation solution.

I implicitly solved problem (2.1) by cherry-picking and building "words" out of them, as mentioned in the taxonomy example above. For problem (2.2), we turn again to word embeddings: given my taxonomy definition, which events are closest to this definition (by cosine similarity of word embeddings)? This worked extraordinarily well, gathering most useful events. So, a few more heuristics here to reduce the error rate, and we're golden.

The Big Problem

The big issue comes when going from problems (1) and (2) to (3). There's no way I could figure out how the "words" in the taxonomy could lead to very meaningful insights that people hadn't thought of before. The issue here is that: a. the taxonomy is too generic.
Disregarding all the other problems with word embeddings (how to make them work with self-hosted / using an external API / aggregating data across PostHog instances), some of which are solvable I think, creating a taxonomy to solve (3) doesn't seem like the right approach. Next up, I draw conclusions from this. Lmk if you disagree. @marcushyett-ph maybe there's a better selection of "words" you can come up with that helps solve this? Sounds impossible to me, given my hypothesis, but would love to be proven wrong :D

The direction we should head in

Fleshing out this problem a bit more, what seems key is that most insights useful to a project will be generated from the data specific to the project. Put another way, any suggestions coming out of a taxonomy (no matter how specific to the industry) would necessarily be worse than analysing a team's data using specific algorithms (TBD), which tell them interesting things about their data. Coupled with the idea that we want to make this work well for self-hosted instances, we should limit our data universe to the project itself.

This implies that we want to solve (3) directly, AND that the ML approach based not on the data (i.e. insight results) but on generating filters / new insights to test would be terrible. Any ML approach that takes into consideration the results of insights seems almost impossible to compute: it basically implies you need a LOT of generated insights, which generate results, so you can then extract these results into features and run the model on those features. Might be possible, but this sounds insane: the slowest feature generation I've ever heard of, where drift is very pronounced, since metrics can easily change week over week.

This constrains the problem space well, and discards most of our initial solutions. Something like what Eric mentioned, "Automated Diagnosing Causes and Interconnectedness", is a valid approach.
It also gives us interesting new levers to pull, which also tie pretty well into collaboration (cc: @paolodamico). What if we define useful insights as "insights other members of your team have found useful, but you haven't seen"? We can recommend things powered by the analyses other users have been running, and tell people that others found a given insight useful. Then, the more powerful ones are the automated diagnosing-causes insights (for which Eric made that nice concept above). I'm pretty sure we could also do more sigma-analysis-like stuff for all insights: there's at least some low-hanging fruit here, where users are looking at almost the right thing but may gloss over it, and just nudging them in the right direction can make a hell of a difference.
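The sigma-analysis nudge mentioned above could be as simple as flagging points that sit several standard deviations away from a trailing mean. A minimal sketch; the window, threshold, and data are illustrative assumptions, not product code:

```python
# Flag days where a metric deviates by more than `threshold` standard
# deviations from its trailing mean.
from statistics import mean, stdev

def sigma_anomalies(series, window=7, threshold=3.0):
    """Return (index, value, z_score) for points outside +/- threshold
    sigma of the trailing `window` points."""
    anomalies = []
    for i in range(window, len(series)):
        trailing = series[i - window:i]
        mu, sigma = mean(trailing), stdev(trailing)
        if sigma == 0:
            continue  # flat history: no meaningful z-score
        z = (series[i] - mu) / sigma
        if abs(z) > threshold:
            anomalies.append((i, series[i], z))
    return anomalies

# Example: a mostly-flat daily conversion count with one spike at the end.
daily_conversions = [100, 98, 102, 101, 99, 100, 97, 180]
print(sigma_anomalies(daily_conversions))
```

The same loop could run over any insight's time series to power the "nudge users toward the thing they almost saw" idea.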
@neilkakkar Can you share the examples of the generic insights you're talking about (I couldn't really find them in the notebook)? So I have three opinions on generic insights.
The direction sounds reasonable to me (we should stay really close to the collaboration folks if we take it). One concern: does this approach preclude us from solving the search problem (e.g. I can search for any insight in plain English)? Or do you think there's another way of approaching that, given what you've learned so far from clustering etc?
(Will respond to (1) and (2) separately, interesting tradeoffs here, after some quick tests on Monday. (1) is interesting, but sounds like a different problem which we can solve differently, something like a bootstrap for success.)

About (3): isn't this better solved by something like sigma-analysis / anomaly detection / correlation analysis (a.k.a. the direction proposed)? Would you rather have one of our generic suggested insights get lucky with showing you a remarkable change, vs us pointedly searching for anomalies and surfacing those, like in Eric's concept above?

About the search problem: word embeddings in and of themselves are closer to the state of the art for solving text search / document retrieval (and take a lot fewer resources, given a huge-ass pretrained model). I'm pretty sure we could use these to solve "searching for insights in plain English", and arguably better than having a taxonomy, since a taxonomy brings back problem (2.2): mapping the natural language to the taxonomy, AND mapping existing insights to the taxonomy as well.

Edit: we also haven't yet experimented with Elasticsearch/Lucene/Solr/existing open source search solutions, which have lots of sophisticated ranking algos we can experiment with. Last time I worked on search (previous company), there were several plans of attack I had in mind & tested out to make search work nicely.
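For the plain-English search idea, the rough shape of embedding-based retrieval: embed the query and each saved insight's name, then rank by cosine similarity. Tiny hand-written vectors stand in here for a real pretrained model; all names and numbers are made up:

```python
# Rank saved insights against a plain-English query by vector similarity.
import math

# Toy stand-ins for real word embeddings (which would be 200-300d,
# from a pretrained model).
TOY_EMBEDDINGS = {
    "signup":   [0.9, 0.1, 0.0],
    "register": [0.85, 0.15, 0.05],
    "churn":    [0.0, 0.9, 0.2],
    "cancel":   [0.05, 0.85, 0.25],
    "funnel":   [0.1, 0.2, 0.9],
}

def embed(text):
    """Average the vectors of known words; zero vector if none match."""
    vecs = [TOY_EMBEDDINGS[w] for w in text.lower().split() if w in TOY_EMBEDDINGS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def search(query, saved_insights):
    q = embed(query)
    return sorted(saved_insights, key=lambda s: cosine(q, embed(s)), reverse=True)

insights = ["signup funnel", "churn by cancel reason", "register conversion"]
print(search("cancel churn", insights))
```

With real embeddings, "cancelled subscription" and "churn" would land near each other even without shared words, which is the whole advantage over string matching.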
Search: sounds like a fair conclusion (but we can keep it out of scope for now). Would it be fair to say that we now have more confidence this is solvable, given the work we've done so far? Btw, I'm trying to get our session with the ex-CTO of a company in this space booked in (they're not free until the week after next); we should be able to validate some of our conclusions with them then, hopefully.
100% yes. For any of the above, I'm not saying we were wrong (at all) to consider all these approaches. I could only reach this step after experimenting a bit and solidifying my guesses via actual code. |
So, imagine we have such a taxonomy. Let's say, for the sake of discussion, that every non-stale event in PostHog for PostHog is the taxonomy: it's a taxonomy for product analytics companies that also support FFs, session recordings, and correlation. Now, this is a very specific taxonomy, and let's say there are 100 other companies which are in this space, have the same events, and use PostHog for their own product analytics because PostHog is obviously the best. How do we generate valuable insights from this taxonomy? This is an easier problem than above, since there's a perfect taxonomy mapping.
I think the discussion and findings have been very useful. Summary comment:
Great summary - what's our next step from here?
Just jotting down some ideas as I come up with them / as I read things on the internet / clarify my own thinking around this. Feel free to add your own, we'd probably need to try & mix and match a few approaches to get to something usable!
Adapting from #8094, there are 3 problems to solve:
Some things to consider:
We don't really care about (1) and (2). The goal is simply (3). It's possible to reach (3) without doing (1) and (2), by say, using a more fluid approach than a hard taxonomical classification. (Don't know how this would work yet, just something to keep in mind)
I think to effectively do (3), not only would we want to map events to a model, but also event properties. For example, a `subscribed` event would likely have a `price` or `amount` property, and showing users they can "track daily revenue" vs just the number of people who subscribed is where the magic happens. The latter is easy to figure out; the former, not so much!
Problem 1 solutions
My gut feel here is yes: most companies with the same business model look the same, do the same things, and earn money the same way. Thus, the events they track should be similar.
What's interesting to me here is that these companies can be in different industries: you can have a health subscription service, or a SaaS, both of which would have very similar events, like `subscription (started | cancelled)` with `amount` props. By contrast, a health insurance company might have things like `bought product` with `product type: A` as a property (spitballing here).

So, I propose we divide verticals by business models instead of industries. (Before going this route, actually check our data to see if we can confirm this hypothesis.)
I may be oversimplifying, and there may be other variables that are also important, but I feel figuring these out would make things a lot clearer.
Choosing the right division here is important, because it can make the next problem impossibly hard to easy.
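One cheap way to check our data for this hypothesis: compare the overlap of tracked event names across companies when grouped by business model vs by industry. A sketch with made-up companies and event sets (Jaccard similarity is just one possible overlap measure):

```python
# Compare event-name overlap under two groupings. If the business-model
# hypothesis holds, same-model pairs should overlap more than
# same-industry pairs. All companies and events below are placeholders.

def jaccard(a, b):
    """Jaccard similarity of two sets of event names."""
    return len(a & b) / len(a | b) if a | b else 0.0

companies = {
    "health_subscription": {"subscription started", "subscription cancelled", "payment"},
    "saas_subscription":   {"subscription started", "subscription cancelled", "trial started"},
    "health_insurance":    {"bought product", "filed claim", "payment"},
}

# Same business model (subscriptions), different industries:
print(jaccard(companies["health_subscription"], companies["saas_subscription"]))
# Same industry (health), different business models:
print(jaccard(companies["health_subscription"], companies["health_insurance"]))
```

Running this over real project event tables, per grouping, would give a concrete number to accept or reject the division before building on it.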
Problem 2 solutions
There are two parts to this problem. (2.1): What does our internal model for this vertical look like? And (2.2): How do we map user events to this internal model?
We need both to be distinct, since we use (2.1) as a generator for solving (3).
Generic Word Embeddings
We can represent every word by a 200-300 dimension vector. Lots of generic trained models exist. Any two events whose distance (by some measure, like Euclidean distance in this vector space) is less than epsilon map to the same thing.
So, given a representation for (2.1) (perhaps manually choosing words), we should be able to solve (2.2) using these word embeddings.
We shouldn't train our own embeddings, as (I think) that's a losing battle, hard to get right, and not worth it for the MVP.
It's easier to find generic word embeddings vs embeddings specific to a field, but I expect results to be better when we use specific embeddings for a specific field: they map domain words better.
We should try testing both kinds, to see what works.
Probability I think this will work: Moderately high
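A sketch of how this epsilon-threshold mapping for (2.2) could look, with toy 2-d vectors standing in for real 200-300 dimension embeddings (the labels, vectors, and epsilon value are all illustrative assumptions):

```python
# Map each incoming event name to the nearest taxonomy word, but only
# if it falls within `epsilon` Euclidean distance; otherwise leave it
# unmapped rather than force a bad match.
import math

TAXONOMY = {
    "USER_LOGIN": [1.0, 0.0],
    "PURCHASE":   [0.0, 1.0],
}

# Pretend these came from a pretrained embedding model.
EVENT_VECTORS = {
    "user enrolled": [0.95, 0.05],
    "user login":    [0.98, 0.02],
    "bought item":   [0.10, 0.90],
    "page scrolled": [0.50, 0.50],  # ambiguous: far from every label
}

def map_event(event, epsilon=0.3):
    """Return the taxonomy label within `epsilon` of the event's vector,
    or None if nothing is close enough."""
    vec = EVENT_VECTORS[event]
    best_label, best_dist = None, float("inf")
    for label, tvec in TAXONOMY.items():
        dist = math.dist(vec, tvec)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist < epsilon else None

for name in EVENT_VECTORS:
    print(name, "->", map_event(name))
```

The `None` branch matters: events that don't fit anywhere should stay unmapped, which is where the precision advantage over forcing every event into the taxonomy comes from.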
Automatic taxonomy creation
There's lots of interesting methods to generate taxonomies. Why not use these to generate a model (2.1), and use it to predict which custom property goes where? (2.2).
This definitely scales better than manually doing (2.1), but runs into a new problem: How do we map this model to smart insights? For example, the taxonomy created might focus instead on different disease classification, vs. events coming into PostHog.
Similar arguments can be made for ontology creation.
However, I think we can take inspiration from these techniques, and figure out something that works for us.
Probability I think this will work: Low
Text matching
There's no reason we have to solve all the hard parts via code. We could manually build a taxonomy of what events should look like for a vertical (assuming we've solved problem (1) well). And encourage companies to adhere to these guidelines: call your events like we tell you to.
This makes (2.2) very easy: we know a priori what's coming in!
(2.1) is hard though. Do we know enough about industries to do this manually?
Further, how do we tell Oura to not go with the health-industry taxonomy, but with the SaaS taxonomy?
And, mucho friction, as industries change / businesses grow / their business models change, and this feature goes to trash. Maybe.
But anyway, I think we should definitely attempt this once, just to understand the edge cases better: When/why would businesses not want to track events like so, etc. etc.
Probability I think this will work: Moderate
Text matching without training
It's like the above, but what if we assume, given we select the verticals properly, most users will call their events similarly?
This removes all the icky bits from the above method, and just keeps the easy bits.
Probability I think this will work: Moderate, if (1) is solved well. Low otherwise.
Problem 3 solutions
Given we have a model (2.1), we should be able to create all important insights manually (and thanks to ideas from companies in the same model vertical).
Not sure about the effort this will take, and whether we'll surface interesting things. But I suspect this will at least level the playing field: here are the basic things every company in this vertical looks at, which can be valuable enough.
Some really out there solutions:
Random Insights
What if, instead of doing the hard work of creating a taxonomy, we randomly suggest insights based on events & properties data coming in? Of course, there needs to be some structure, AND, we can do some pruning based on prelim results, like a chess engine / A* search algorithm (need to define the problem better for search, but you get the idea)
So, you generate random insights, and discard any for which the result is 0. Then we have heuristics to prune certain combinations, like, say, "if conversion rate below 1%, probs not useful". We'll need to play around a lot to figure these out, but idk, might do better.
(I mean, if this does better than solving (1) and (2), we know our models are pretty shitty, a.k.a the problem is very hard 😂 )
Probability I think this will work: Moderate-low
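The generate-and-prune loop could be sketched like this; `run_insight` is a made-up stand-in for actually executing an insight query, and the heuristics are placeholders:

```python
# Enumerate candidate insight configs from observed events, run them,
# and discard by cheap heuristics (chess-engine-style pruning).
import itertools

EVENTS = ["signup", "activate", "subscribe", "churn"]

def run_insight(insight):
    """Stand-in for executing the insight query; returns a fake
    conversion rate derived deterministically from the config."""
    return (sum(map(len, insight)) % 100) / 100.0

def candidate_funnels(events, max_steps=3):
    """Enumerate funnel configs of 2..max_steps ordered steps."""
    for n in range(2, max_steps + 1):
        for steps in itertools.permutations(events, n):
            yield ("funnel",) + steps

def prune(insights, min_rate=0.01, max_rate=0.99):
    """Heuristic pruning: drop dead funnels (~0% conversion) and
    trivial ones (~100%), per the "below 1%, probs not useful" idea."""
    kept = []
    for insight in insights:
        rate = run_insight(insight)
        if min_rate < rate < max_rate:
            kept.append((insight, rate))
    return kept

survivors = prune(candidate_funnels(EVENTS))
print(len(survivors), "candidate insights survived pruning")
```

The combinatorics blow up fast (permutations over all events and properties), so the real work would be in the pruning heuristics, not the generation.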
Neural Net all the things!
This is a surprisingly well-defined problem to attack via machine learning: you have a set of events with properties, persons with properties, and the output is a list of tuples: the insight type, and the events/actions in the list.
Actually, we could possibly use GPT-3 here! If it can generate code, it can generate filter objects! We just need to prompt several good examples of meaningful filter objects, given events & properties. (every filter object uniquely maps to an insight)
I think GPT-3 would definitely work better than training our own neural nets. (because training is hardddd, needs lots of data, etc. etc.)
Hmm, now that I think about it, this might be the most promising approach, barring concerns with using an external API.
Probability I think this will work: High
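A sketch of the few-shot prompting idea: assemble worked (events → filter object) examples into a prompt and ask the model to complete the next one. The filter shapes here are illustrative, not PostHog's actual filter schema, and the actual API call is omitted (it needs a key and a network round-trip):

```python
# Build a GPT-3-style few-shot prompt mapping a team's events to a
# suggested insight filter object.
import json

# Hypothetical worked examples; real ones would be curated by hand.
FEW_SHOT_EXAMPLES = [
    {
        "events": ["signup", "subscribed"],
        "filter": {"insight": "FUNNELS", "events": ["signup", "subscribed"]},
    },
    {
        "events": ["app opened"],
        "filter": {"insight": "RETENTION", "target_event": "app opened"},
    },
]

def build_prompt(new_events):
    """Assemble worked examples, then the new case, leaving the model
    to complete the final 'Filter:' line."""
    parts = ["Given a team's events, suggest a useful insight filter object.\n"]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(f"Events: {json.dumps(ex['events'])}")
        parts.append(f"Filter: {json.dumps(ex['filter'])}\n")
    parts.append(f"Events: {json.dumps(new_events)}")
    parts.append("Filter:")
    return "\n".join(parts)

prompt = build_prompt(["purchase", "refund requested"])
print(prompt)
```

Since every filter object uniquely maps to an insight, the model's completion (once parsed as JSON and validated against the schema) can be rendered directly as a suggested insight.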
cc: @marcushyett-ph @EDsCODE