Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to sort by different seeds of hourly/daily randoms #8966

Merged
merged 2 commits into from
Mar 28, 2024

Conversation

cdrini
Copy link
Collaborator

@cdrini cdrini commented Mar 26, 2024

Closes #8965 .

Technical

Testing

Note: Solr has a caveat that the order might change more frequently if the documents are edited.

Screenshot

Stakeholders

@RayBB

@cdrini cdrini force-pushed the feature/random-hourly-seed branch 4 times, most recently from e4c1ddf to 92222bc Compare March 26, 2024 20:54
@cdrini cdrini force-pushed the feature/random-hourly-seed branch from 92222bc to 4d47451 Compare March 26, 2024 20:57
@cdrini cdrini marked this pull request as ready for review March 26, 2024 21:00
@cdrini cdrini added the On testing.openlibrary.org This PR has been deployed to testing.openlibrary.org for testing label Mar 26, 2024
Copy link
Collaborator

@RayBB RayBB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @cdrini thanks for finding a fix to this so fast. It would have been a difficult one for me to dig into.

The code works as I'd expect on testing and I generally feel I have a decent understanding of it. If we like this broad approach then I'd say lets merge it.

However, I have a couple of questions that could help us refine the feature further:

  1. Documentation Strategy: How should we document this feature to ensure it's accessible and understandable for librarians? (Please read the second question before providing an answer.)

  2. Default Random Seed Behavior: Should all carousels on a page share the same random seed by default? It isn't intuitive for librarians to manually adjust the random seed for each carousel. I'm not sure how many cases this issue matters to outside of the K-12. I can think of another example like a Collections for NYT bestsellers by year on list where books are on multiple years this would lead to a similar situation as K-12. It would be more intuitive if each carousel had a unique seed by default, changing every hour or as needed.

    One suggestion is to modify the process_user_sort function to automatically derive a seed from the query parameters if it is random and no custom seed is provided. This could be achieved by passing the param dictionary to process_user_sort and setting the random_seed to a sorted JSON string of the param dictionary if no custom seed is specified. if not random_seed then random_seed=json.dumps(param_dict, sort_keys=True)

    This approach would ensure that each carousel has its own unique seed, enhancing the feature's intuitiveness and reducing the need for manual adjustments.

    If we do that by default, then do we even need the option for users to specify custom seeds for random? I can't think of a case but I'm sure you or someone can :)

cc: @seabelis I'd love to hear your thoughts on this!

openlibrary/plugins/worksearch/schemes/__init__.py Outdated Show resolved Hide resolved
Co-authored-by: Raymond Berger <RayBB@users.noreply.github.com>
@cdrini
Copy link
Collaborator Author

cdrini commented Mar 27, 2024

  1. Whatever docs you used for collections might be a good spot? Or maybe here perhaps https://openlibrary.org/dev/docs/api/search ?
  2. Good idea! Haha I was debating the same thing :P I opted for this approach because it was simpler and required a bit less thinking ahead to try to predict potential use cases, while also giving the API user more control. E.g. what if someone wants to display a few randoms for a larger query set? They'd need this parameter anyways.

@cdrini
Copy link
Collaborator Author

cdrini commented Mar 27, 2024

It's also a bit of a niche use case, so I'm ok with the experience being a little manual/clunky in return for keeping our service/code simpler!

@RayBB
Copy link
Collaborator

RayBB commented Mar 27, 2024

  1. Docs probably here: https://openlibrary.org/help/faq/collections
  2. I think it's small things like this that make the patron experience better and lead to less frustration by librarians. It should just work as expected. So I feel it is worth it for a little (not too much) added code complexity.

Are (5383917) 10 lines are more or less what we'd need to do this? (other than a small refactor to update types)

I feel like a solid 7/10 that we should get the default behavior right. However, I know that this isn't a super high impact thing right now. If you agree that we should get the default behavior maybe the way is to merge your change now and then open an issue to change the default. The sorts of editions, subjects, works, and authors all have some overlap so it could be a good small refactor task for someone to pull out the common ones (mostly the random) and put them in a shared spot.

@cdrini
Copy link
Collaborator Author

cdrini commented Mar 27, 2024

That's close! There are a few caveats though: param can be a veeery large dict ; it will likely need to be eg md5-hashed and possibly truncated. Also, custom random seeds have a performance overhead; and this will make a new random seed for every request with random. I'm not sure of the implications of this; that would need some testing.

100% agree with you on the design principles here of making the default behaviour "just work"! But making things "just work" is actually one of the more difficult things to get right! And things that "just work" tend to be best for more novice users. As you move from novice users to expert users, you tend to find users will want to break your "just works" assumptions in exchange for more control. Changing this would change the default behaviour for librarians as well as for anyone using our APIs ; it would make them pay a potential performance hit for a default behaviour they might not care about. I'm not entirely sure if all these concerns are valid to be honest, but I just don't have the time to investigate them right now. And regardless of what we come up with, having an API to specify eg random.hourly seeds is still a useful feature! So the work in this PR is kind of going to always be a step in the right direction, whatever we might decide that right direction to be in the future.

Copy link
Collaborator

@RayBB RayBB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdrini I agree with you!

Lets merge this in and I can make an issue later for investigate refactoring and possibly changing the default 👍

Maybe this can be the default just for carousels and not the default for all search :)

@cdrini cdrini merged commit 4821b25 into internetarchive:master Mar 28, 2024
5 checks passed
@jimchamp jimchamp removed the On testing.openlibrary.org This PR has been deployed to testing.openlibrary.org for testing label Apr 2, 2024
@RayBB
Copy link
Collaborator

RayBB commented Apr 2, 2024

Just updated the page with this new feature and it is looking great! https://openlibrary.org/collections/k-12

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support different seeds for random.hourly sort
3 participants