Classifier: fix duplicate subjects (with ordered subject selection.) #2819
Test subjects.queue after modifying the subject queue, and also after appending duplicate subjects that are already in the queue.
PR Review
While the changes to specs make sense, I'm still able to replicate the duplicate subject bug as described in #2818 with the same behavior as described here: https://www.zooniverse.org/projects/msalmon/hms-nhs-the-nautical-health-service/talk/4936/2331577?comment=3847847
To test this PR, I ran HMS NHS locally in demo mode, made 10 classifications in the 1891 'Where From' subject set starting from subject 4/276. After clicking 'Done' on subject 13/276, I was shown subject 4/276 with an Already Seen banner. Refreshing the page did not serve a new subject and neither did navigating to the homepage and back to the same workflow.
In demo mode, Designator is always going to show you subject 4, because you haven't submitted a classification for it. I'm wondering if there's a way to test this without submitting real classifications on a live project. @snblickhan has a test project with ordered subjects. |
This test project does not have ordered subject selection banners like HMS NHS. It looks like I can make more than 10 real classifications on the test project without seeing a duplicate subject, but I have no way of knowing for sure without the SubjectSetProgressBanner. Does this test workflow follow the same flow of data as an Engaging Crowds project? |
That workflow does have sequential subject selection. I wonder if we need to enable both those flags in order to reproduce this bug? I've classified around 35 subjects on Sam's test project, on staging, without seeing a repeat. |
I think @snblickhan already pointed this out on the Talk thread, but refreshing the page will only give you a new subject when a workflow uses random selection. With prioritised selection, Designator will always send you the first ordered subject that you haven't classified. So seeing the same subject when you refresh the page is expected. You'll only get a new subject when your first unclassified subject retires, or when Panoptes registers the current subject as seen by you. |
Good to hear that duplicate subjects don't appear on Sam's test project, and I think I see the same. Because this bug fix is specific to a Zooniverse project with subject set selection, I think it's important to confirm we don't see duplicate subjects with that flag enabled. Is there a staging project with engaging crowd features used to test for bugs? If not, I can make real classifications on HMS NHS's easier workflows, but I think being able to quickly test a staging project that's a replica of HMS NHS workflows should be a requirement for reviewing its bug fixes going forward. |
@snblickhan can you help out Delilah with this? |
By the way, I'm currently working on something that will take the pain out of setting up grouped and prioritised workflows for testing etc. |
Sure thing. I'll create a new workflow with both of these admin flags enabled and we'll see whether that triggers the bug. |
OK @eatyourgreens & @goplayoutside3 -- have created new workflow #20917 which has 2 subject sets and includes prioritized subject delivery and set selection. I'll do some test classifications now to see if the bug shows up. |
Immediately triggered it -- using subject set 102314, got to 10 classifications in and was immediately sent back to #8/50 with an Already Seen flag. |
Thanks @snblickhan! I ran the project locally with this branch to try out workflow 20917. While signed out in an incognito browser window:
While signed in:
Testing a different user account
|
We can't track your classification history if you aren't logged in. @snblickhan am I right in saying that we don't expect these prioritised workflows to work for anonymous volunteers? |
That’s weird. Do you mind opening an issue for this? It could be a bug in the classifier, preserving the wrong subject across workflows, or a bug in the interaction with Designator. Changing workflow should reset the queue and request a new batch of ten subjects from Designator, starting from the last one that you classified but skipping any subjects that have been retired by other volunteers. I don’t expect the latter to be relevant here, because you’re the only person who has classified on the workflow. |
I’m hoping that the extra console logs in #2793 will help to debug changes in the active subject when you switch workflow. |
After reading @goplayoutside3's reply, I just tried the same thing (incognito window, not signed in) for workflow 20917 and it triggered the bug after 10 classifications on the DNP set (expected subject 11/50, it instead sent me all the way back to image 1/50 with an Already Seen banner). |
I think that is expected if you're testing at https://frontend.preview.zooniverse.org/projects/blicksam/fem-bug-squashing/classify/workflow/20917 because frontend.preview doesn't include this PR's bug fix. Do we expect prioritised workflows to work for anonymous volunteers who are not signed in? This PR does allow signed-in users to move past 10 classifications, and it could be approved if we're going to track the other unexpected non-sequential behavior I mentioned above in a different issue. |
If you aren't logged in, won't you get the first ten subjects on a loop, at least until they retire? Sequential selection relies on you being logged in, so that Panoptes can track which subjects you've seen. EDIT: @snblickhan how are projects like Davy Notebooks, which use sequential subjects, handling anonymous volunteers? They will all be getting the same ten subjects from Designator, which will be the first ten unretired pages. For Operation War Diary, we showed subjects in order to volunteers who were logged in, but served random pages to anonymous volunteers. I don't know if Panoptes/Designator would support that, but it's one possible strategy for spreading anonymous effort evenly across subjects. I think this conversation is straying away from the original bug and into discussing how the backend should work for sequential subject selection. |
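The Operation War Diary strategy mentioned above (ordered subjects for signed-in volunteers, random subjects for anonymous ones) could be sketched roughly as follows. This is only an illustration under stated assumptions: `selectSubjects`, `shuffle`, and the batch size parameter are hypothetical names, and nothing here reflects how Panoptes or Designator actually select subjects.

```javascript
// Hypothetical sketch; not Panoptes or Designator code.

// Fisher-Yates shuffle on a copy, so the caller's array is left untouched.
function shuffle(items, random = Math.random) {
  const copy = [...items]
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1))
    const temp = copy[i]
    copy[i] = copy[j]
    copy[j] = temp
  }
  return copy
}

// unretiredIDs: subject IDs in their set order.
// seenIDs: a Set of IDs the signed-in volunteer has classified,
//          or null for an anonymous volunteer (no history available).
function selectSubjects(unretiredIDs, seenIDs, batchSize = 10, random = Math.random) {
  if (seenIDs) {
    // Signed in: the first unseen subjects, in order.
    return unretiredIDs.filter(id => !seenIDs.has(id)).slice(0, batchSize)
  }
  // Anonymous: a random batch, to spread effort across the whole set.
  return shuffle(unretiredIDs, random).slice(0, batchSize)
}

const pages = Array.from({ length: 50 }, (_, i) => i + 1)
console.log(selectSubjects(pages, new Set([1, 2, 3])).join(' ')) // 4 5 6 7 8 9 10 11 12 13
console.log(selectSubjects(pages, null).length) // 10
```

Without the random branch, every anonymous volunteer would receive the same first ten unretired subjects, which is the loop described in this comment.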
There's a selection bias in the bug reports, in that they're coming from Talk and anonymous volunteers can't comment on Talk. |
Reading back, it seems there's a few different things going on here, particularly if you change workflows on a project like HMS NHS. This PR is specifically focussed on fixing #2818: when you're logged in, the last three subjects in every batch of ten, from the API, are repeated in your queue because the minimum queue size is three subjects.
front-end-monorepo/packages/lib-classifier/src/store/SubjectStore/SubjectStore.js Line 15 in def60aa
I'd like to get that fix merged and deployed. It seems like the scope of the issue is starting to creep to include other bugs in subject selection. |
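One way to picture the repeat described in #2818 is the small simulation below. It is a sketch, not the lib-classifier code: `selectBatch`, `simulate`, and both constants are hypothetical names, and it assumes the API recomputes the first unclassified subjects on every request (so it ignores any server-side caching).

```javascript
// Hypothetical names throughout; this is not the lib-classifier implementation.
const MINIMUM_QUEUE_SIZE = 3
const BATCH_SIZE = 10

// Stand-in for prioritised selection: the first BATCH_SIZE subjects, in order,
// that this volunteer has not classified yet.
function selectBatch(orderedIDs, classifiedIDs) {
  return orderedIDs.filter(id => !classifiedIDs.has(id)).slice(0, BATCH_SIZE)
}

// Classify `count` subjects, refilling the queue whenever it drops to the minimum.
// Returns the subject IDs in the order they were shown.
function simulate(orderedIDs, count, dedupe) {
  const classified = new Set()
  const shown = []
  let queue = selectBatch(orderedIDs, classified)
  while (shown.length < count && queue.length > 0) {
    const active = queue.shift()
    shown.push(active)
    classified.add(active)
    if (queue.length <= MINIMUM_QUEUE_SIZE) {
      // The subjects still waiting in the queue are unclassified,
      // so the new batch starts with those same subjects.
      const batch = selectBatch(orderedIDs, classified)
      const fresh = dedupe ? batch.filter(id => !queue.includes(id)) : batch
      queue = queue.concat(fresh)
    }
  }
  return shown
}

const subjectIDs = Array.from({ length: 50 }, (_, i) => i + 1)
console.log(simulate(subjectIDs, 12, false).join(' ')) // 1 2 3 4 5 6 7 8 9 10 8 9
console.log(simulate(subjectIDs, 12, true).join(' '))  // 1 2 3 4 5 6 7 8 9 10 11 12
```

Without de-duplication, the volunteer is sent back to subject 8 after ten classifications, which matches the behaviour reported on workflow 20917. Skipping IDs that are already queued removes the repeat.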
As part of the IIIF work (zooniverse/Panoptes-Front-End#6095), I've set up a grouped, prioritised workflow with two volumes from the British Library. I'm not seeing the duplicate subject bug on these subject sets, which is odd. I'd at least expect the last two subjects to repeat in each batch of ten from the API. |
I've just found out from @camallen, on Slack, that Designator has a five minute cache, during which time it will always send you new subject IDs. So, to reproduce this bug, you have to wait at least 5 minutes between requests for fresh subjects from Panoptes. That explains why we didn't see it on Sam's quick Yes/No workflow, but volunteers have seen it on HMS NHS. That cache also explains why we're seeing the first subject change if you leave a workflow, or set, then come back to it within 5 minutes. |
Super interesting. This is a good lesson for me to always create test workflows that match the projects where bugs are reported. I'd sell my soul for a Copy Workflow button that works across projects... |
Reading back, it seems there's a few different things going on here, particularly if you change workflows on a project like HMS NHS.
This PR is specifically focussed on fixing #2818: when you're logged in.
In that scope, this PR does fix the bug. @eatyourgreens I'll let you open an issue for the Designator-related subject sequences. If you'd like me to document the sequence of subjects I'm shown on projects/blicksam/fem-bug-squashing, please let me know!
Let's get this out and see if it fixes the problem for HMS NHS volunteers. |
@goplayoutside3 I've opened an issue on Panoptes for the subject queue advancing when you leave a workflow and come back. |
In #2392 I added subjects.queue, an array of subject IDs that allows for appending or prepending subjects to the subject queue, and also allows for navigating both forward and backwards through the queue. I didn't update the tests to test the new array. As a result, it's possible to append duplicate subjects to the queue without any tests failing.
This PR updates the tests to test subjects.queue explicitly, adds a test that appends duplicate subjects and updates the subjects store to pass the new tests.
Package: lib-classifier
Closes #2818.
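As a rough illustration of the behaviour the new tests check, here is a minimal stand-in for a queue of subject IDs that ignores duplicates on append and prepend. `SubjectQueue` and its methods are hypothetical; the real subjects store in lib-classifier is structured differently, and this sketch only shows the invariant that the queue never holds the same ID twice.

```javascript
// Hypothetical stand-in for the subjects store; names are illustrative only.
class SubjectQueue {
  constructor() {
    this.queue = []
  }

  // Add IDs to the end of the queue, skipping any that are already queued.
  append(ids) {
    ids.forEach(id => {
      if (!this.queue.includes(id)) {
        this.queue.push(id)
      }
    })
  }

  // Add IDs to the front of the queue, skipping any that are already queued.
  prepend(ids) {
    const fresh = [...new Set(ids)].filter(id => !this.queue.includes(id))
    this.queue = [...fresh, ...this.queue]
  }
}

const subjects = new SubjectQueue()
subjects.append(['1', '2', '3'])
subjects.append(['2', '3', '4']) // '2' and '3' are already queued
subjects.prepend(['0', '1'])     // '1' is already queued
console.log(subjects.queue.join(',')) // 0,1,2,3,4
```

A test for the duplicate case can then assert that the queue length equals the number of distinct IDs after appending a batch that overlaps with the existing queue.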
Review Checklist
General
Components
Apps
yarn panic && yarn bootstrap or docker-compose up --build and app works as expected?
Publishing
Post-merging