Add ability to sort by different seeds of hourly/daily randoms #8966

cdrini · 2024-03-26T20:37:28Z

Closes #8965.

Technical

Testing

Viewing https://testing.openlibrary.org/search?q=subject%3A("Reading+Level-Grade+11"+OR+"Reading+Level-Grade+12")+first_publish_year%3A[2000+TO+*]&sort=random.hourly returns the same results on every refresh until the hour changes.
https://testing.openlibrary.org/search?q=subject%3A(%22Reading+Level-Grade+11%22+OR+%22Reading+Level-Grade+12%22)+first_publish_year%3A[2000+TO+*]&sort=random.hourly_1 behaves the same way, but displays a different random order from the one above.

Note: due to a Solr caveat, the order may change more often than that if the matching documents are edited.
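To make the testing notes above concrete, here is a minimal sketch of how a key like random.hourly_1 could be turned into a Solr sort clause. It assumes the key format random.<period>[_<seed>] seen in the testing URLs, and that the index has a dynamic field such as random_* backed by Solr's RandomSortField, which seeds its ordering from the field name (and the index version, which is why edits can reshuffle results). The function name random_sort_clause and the exact field naming are hypothetical, not the code merged in this PR.

```python
from datetime import datetime
from typing import Optional

# Hypothetical mapping from time-bucketed sort keys to a seed format.
_PERIOD_FORMATS = {
    'random.hourly': '%Y%m%dT%H',  # seed changes once per hour
    'random.daily': '%Y%m%d',      # seed changes once per day
}


def random_sort_clause(sort: str, now: Optional[datetime] = None) -> str:
    """Translate e.g. 'random.hourly_1' into a Solr sort clause (sketch only)."""
    now = now or datetime.now()
    period, _, custom_seed = sort.partition('_')
    if period not in _PERIOD_FORMATS:
        raise ValueError(f'unsupported random sort: {sort!r}')
    seed = now.strftime(_PERIOD_FORMATS[period])
    if custom_seed:
        # A custom seed yields a different field name, hence a different order.
        seed = f'{seed}_{custom_seed}'
    # Assumes a RandomSortField dynamic field (e.g. random_*) in the schema.
    return f'random_{seed} asc'
```

Within one hour, random.hourly and random.hourly_1 map to two different, but individually stable, field names, which matches the behaviour described for the two testing URLs.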

Screenshot

Stakeholders

@RayBB

RayBB

Hey @cdrini, thanks for finding a fix to this so fast. It would have been a difficult one for me to dig into.

The code works as I'd expect on testing, and I feel I have a decent understanding of it. If we like this broad approach, I'd say let's merge it.

However, I have a couple of questions that could help us refine the feature further:

Documentation Strategy: How should we document this feature so that it's accessible and understandable for librarians? (Please read the second question before answering.)
Default Random Seed Behavior: Should all carousels on a page share the same random seed by default? It isn't intuitive for librarians to adjust the random seed manually for each carousel. I'm not sure how many cases outside of K-12 this issue matters for. One other example is a collection of NYT bestsellers by year, where a book appears on the list in multiple years; this would lead to a similar situation as K-12. It would be more intuitive if each carousel had a unique seed by default, changing every hour or as needed.
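Why a shared seed matters for overlapping carousels can be shown with a small simulation. This is a simplified model, not Solr's actual hashing: it assumes, as with RandomSortField, that for a fixed seed a document gets the same random value in every query. The carousel contents, work IDs, and helper names below are made up for illustration.

```python
import hashlib
from typing import List


def random_key(seed: str, doc_id: str) -> int:
    """Deterministic pseudo-random key for one document under one seed.

    Simplified stand-in for Solr's RandomSortField: for a fixed seed,
    a document gets the same value no matter which query matched it.
    """
    digest = hashlib.md5(f'{seed}:{doc_id}'.encode('utf-8')).hexdigest()
    return int(digest, 16)


def random_order(doc_ids: List[str], seed: str) -> List[str]:
    """Order a result set by each document's random key."""
    return sorted(doc_ids, key=lambda doc_id: random_key(seed, doc_id))


# Two hypothetical carousels that overlap on 20 works,
# like "Grade 10 OR 11" next to "Grade 11 OR 12".
carousel_a = [f'OL{i}W' for i in range(0, 60)]
carousel_b = [f'OL{i}W' for i in range(40, 100)]
shared = set(carousel_a) & set(carousel_b)


def shared_order(doc_ids: List[str], seed: str) -> List[str]:
    """The shared works, in the order this carousel would show them."""
    return [d for d in random_order(doc_ids, seed) if d in shared]
```

With one seed for both carousels, the shared works always appear in the same relative order, so if the lowest-keyed work overall happens to be a shared one, both carousels lead with it. Giving each carousel its own seed breaks that correlation.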

One suggestion is to modify the process_user_sort function to derive a seed automatically from the query parameters when the sort is random and no custom seed is provided. This could be done by passing the param dictionary to process_user_sort and, if no custom seed is specified, setting random_seed to a sorted JSON string of the param dictionary: if not random_seed: random_seed = json.dumps(param_dict, sort_keys=True)

This approach would ensure that each carousel has its own unique seed, enhancing the feature's intuitiveness and reducing the need for manual adjustments.
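The suggestion above can be sketched as a small standalone helper. The name derive_random_seed and its signature are hypothetical; the actual signature of process_user_sort is not shown here, so this only illustrates the fallback logic.

```python
import json
from typing import Optional


def derive_random_seed(param: dict, random_seed: Optional[str] = None) -> str:
    """Hypothetical helper: fall back to a seed derived from the query.

    If no custom seed was supplied, use a canonical JSON encoding of the
    request parameters, so each distinct query gets its own seed.
    """
    if not random_seed:
        # sort_keys=True makes the seed independent of parameter order.
        random_seed = json.dumps(param, sort_keys=True)
    return random_seed
```

Note that the raw JSON can be long and contains spaces and quotes, so it is likely unsuitable for direct use inside a Solr sort field name without further processing.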

If we do that by default, do we even need the option for users to specify custom random seeds? I can't think of a case, but I'm sure you or someone else can :)

cc: @seabelis I'd love to hear your thoughts on this!

openlibrary/plugins/worksearch/schemes/__init__.py

Co-authored-by: Raymond Berger <RayBB@users.noreply.github.com>

cdrini · 2024-03-27T15:22:27Z

Re docs: whatever docs you used for collections might be a good spot. Or perhaps here: https://openlibrary.org/dev/docs/api/search
Re default seeds: good idea! Haha, I was debating the same thing :P I opted for this approach because it was simpler and required less thinking ahead to predict potential use cases, while also giving the API user more control. For example, what if someone wants to display a few different random samples from a larger result set? They'd need this parameter anyway.
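As an illustration of the "a few randoms for a larger query set" use case, an API user could request the same query several times with different seeds. This sketch assumes the JSON search endpoint documented at /dev/docs/api/search (search.json) accepts the same sort values as the testing URLs, and that daily seeds work like hourly ones; the helper name seeded_random_urls is hypothetical.

```python
from typing import List
from urllib.parse import urlencode

# Assumed endpoint: the JSON search API described at /dev/docs/api/search.
SEARCH_URL = 'https://openlibrary.org/search.json'


def seeded_random_urls(query: str, count: int, period: str = 'hourly',
                       limit: int = 10) -> List[str]:
    """Build `count` search URLs for one query, each with its own seed."""
    urls = []
    for seed in range(1, count + 1):
        params = {'q': query, 'sort': f'random.{period}_{seed}', 'limit': limit}
        urls.append(f'{SEARCH_URL}?{urlencode(params)}')
    return urls
```

Each URL stays stable for the current hour but shows a different slice of the result set, which a caller could not achieve with a single shared seed.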

cdrini · 2024-03-27T15:26:04Z

It's also a bit of a niche use case, so I'm ok with the experience being a little manual/clunky in return for keeping our service/code simpler!

RayBB · 2024-03-27T16:34:15Z

Docs would probably go here: https://openlibrary.org/help/faq/collections
I think it's small things like this that make the patron experience better and lead to less frustration for librarians. It should just work as expected. So I feel it's worth a little (not too much) added code complexity.

Are these 10 lines (5383917) more or less what we'd need to do this, other than a small refactor to update types?

I'm about 7/10 sure that we should get the default behavior right. However, I know this isn't a super high-impact thing right now. If you agree that we should change the default behavior, maybe the way forward is to merge your change now and then open an issue to change the default. The sort options for editions, subjects, works, and authors all overlap somewhat, so it could be a good small refactoring task for someone to pull out the common ones (mostly the random sorts) into a shared spot.

cdrini · 2024-03-27T17:55:22Z

That's close! There are a few caveats, though: param can be a veeery large dict, so it will likely need to be, e.g., md5-hashed and possibly truncated. Also, custom random seeds have a performance overhead, and this would create a new random seed for every request that sorts randomly. I'm not sure of the implications of this; it would need some testing.
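The hashing caveat can be sketched as follows: canonicalize the params, md5 them, and keep a short hex prefix so the seed is compact and safe to embed in a field name. The name hashed_random_seed and the default length of 8 are illustrative assumptions, not what Open Library uses.

```python
import hashlib
import json


def hashed_random_seed(param: dict, length: int = 8) -> str:
    """Hypothetical: turn arbitrarily large query params into a short seed.

    md5 serves only as a compact fingerprint here, not for security.
    """
    canonical = json.dumps(param, sort_keys=True)
    return hashlib.md5(canonical.encode('utf-8')).hexdigest()[:length]
```

Eight hex characters leave 32 bits, so two distinct queries could in principle collide, though that is unlikely for the handful of carousels on one page. This does not address the performance concern: every distinct query still gets its own seed.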

100% agree with you on the design principle of making the default behaviour "just work"! But making things "just work" is actually one of the harder things to get right, and things that "just work" tend to suit novice users best. As you move from novice to expert users, you tend to find that users want to break your "just works" assumptions in exchange for more control.

Changing this would change the default behaviour for librarians as well as for anyone using our APIs; it would make them pay a potential performance cost for a default behaviour they might not care about. To be honest, I'm not entirely sure all these concerns are valid, but I don't have the time to investigate them right now. And regardless of what we come up with, having an API to specify, e.g., random.hourly seeds is still a useful feature! So the work in this PR will always be a step in the right direction, whatever we decide that direction should be in the future.

RayBB

@cdrini I agree with you!

Let's merge this in, and I can make an issue later for investigating refactoring and possibly changing the default 👍

Maybe this could be the default just for carousels, not for all searches :)

RayBB · 2024-04-02T12:37:06Z

Just updated the page with this new feature and it is looking great! https://openlibrary.org/collections/k-12


cdrini force-pushed the feature/random-hourly-seed branch 4 times, most recently from e4c1ddf to 92222bc on March 26, 2024 20:54

Add ability to sort by different seeds of hourly/daily randoms

4d47451

cdrini force-pushed the feature/random-hourly-seed branch from 92222bc to 4d47451 on March 26, 2024 20:57

cdrini marked this pull request as ready for review March 26, 2024 21:00

cdrini assigned RayBB Mar 26, 2024

cdrini added the "On testing.openlibrary.org" label Mar 26, 2024

RayBB requested changes Mar 27, 2024


openlibrary/plugins/worksearch/schemes/__init__.py (outdated, resolved)

openlibrary/plugins/worksearch/schemes/__init__.py (resolved)

Update openlibrary/plugins/worksearch/schemes/__init__.py

36a527e

Co-authored-by: Raymond Berger <RayBB@users.noreply.github.com>

RayBB approved these changes Mar 27, 2024


cdrini merged commit 4821b25 into internetarchive:master Mar 28, 2024
5 checks passed

RayBB mentioned this pull request Mar 28, 2024

Enhance Default Carousel Randomness for Unique Display Across Multiple Carousels #8986 (Open, 2 tasks)

jimchamp removed the "On testing.openlibrary.org" label Apr 2, 2024

Achorn pushed a commit to Achorn/openlibrary that referenced this pull request Apr 12, 2024

Add ability to sort by different seeds of hourly/daily randoms (internetarchive#8966)

49cdd72

Jiatonglii22 mentioned this pull request Apr 26, 2024

Hash Function Introduced to Produce Random Seeds to Enhance Default Carousel Randomness #9160 (Draft)
