User Behavior Insights implementation for Apache Solr #2452

Draft
wants to merge 35 commits into
base: main

Conversation

Contributor

@epugh epugh commented May 9, 2024

Description

I am working with other folks, especially Stavros Macrakis (macrakis@gmail.com), to come up with a solution for understanding what users are doing in response to search results. We have great visibility into an incoming query, what we do with it, and which docs are sent back. We do NOT have a way of tying that search to what the user does next, or of telling whether the following query is connected to the original one.

Many teams lean on GA, Snowplow, or custom code to track click-through, add-to-cart, etc. as signals, but there is nothing that is drop-dead simple to use and open source.

Solution

User Behavior Insights (UBI) is a shared schema for tracking search-related activities. There is a basic implementation for OpenSearch, and this is a version for Apache Solr.
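
To make the idea of tying a search to what the user does next a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not the UBI schema itself or the code in this PR: the class names, field names, and the `events_for_query` helper are all hypothetical. The only point is that each event carries the ID of the query it followed, so the two can be joined later.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryRecord:
    """One search request and the docs sent back (hypothetical fields)."""
    query_id: str
    user_query: str
    returned_doc_ids: List[str]


@dataclass
class EventRecord:
    """One user action that happened after a search (hypothetical fields)."""
    query_id: str      # links the action back to the originating query
    action_name: str   # e.g. "click" or "add_to_cart"
    object_id: str     # the doc the user acted on


def events_for_query(query: QueryRecord, events: List[EventRecord]) -> List[EventRecord]:
    """Return the events that share this query's ID, in their original order."""
    return [e for e in events if e.query_id == query.query_id]
```

With records shaped like this, answering "what did the user do after this search?" becomes a simple join on the shared query ID, which is the gap described above.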

Tests

A Bats test demonstrates the end-to-end use of UBI.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

@HoustonPutman
Contributor

I'd love to review here, but I think I need some more starting information either in a ref guide page or a JIRA, I'm kind of lost right now...

@epugh
Contributor Author

epugh commented May 10, 2024

> I'd love to review here, but I think I need some more starting information either in a ref guide page or a JIRA, I'm kind of lost right now...

Yeah... I'll go ahead and write up some ref guide docs! And finish the demo .bats script ;-)

@chatman
Contributor

chatman commented May 20, 2024

Usually, features like these are discussed on the dev@ list, in JIRA, or in a SIP.
The most important question I have in mind is whether this needs to be in the core search engine. If not, could it be a plugin/package, shipped outside of solr-core?

@epugh
Contributor Author

epugh commented May 20, 2024

This is definitely draft-mode code... I opened it as a PR just to be able to track the work, and once it gets a bit further, I plan on opening a proper discussion about it. Module? Solr Sandbox? A Component? A full-blown package? So many fun options...

@epugh
Contributor Author

epugh commented May 22, 2024

I figured out how to parse and run a streaming expression that is used to write the query analytics data to, well, anywhere we want ;-). The next area to look at is actually integrating the streaming expression INTO the component as more than just a one-off... I gotta figure out how to take the data and pass it into the streaming expression... input() maybe??? Also need to think about how to avoid rebuilding and destroying the streaming expression for every query.

Then more Ref Guide docs, a BATS integration test maybe, and then a discussion about who wants to use it first! Plus of course the ever critical, "where does the code live" conversation.

@epugh
Contributor Author

epugh commented May 22, 2024

Oh, and of course, we now have a machine-readable schema via JSON Schema, available here: https://github.com/o19s/ubi

@epugh
Contributor Author

epugh commented Aug 16, 2024

I am having some second thoughts about the idea of logging UBI queries to disk... Why? In any real use case you want them to go somewhere. Plus log4j is a pain to touch... So... I may just rip that part out. You want to log to disk? Just write a streaming expression ;-)

3 participants