[Bug] NetworkJob::Gallery Tab Network jobs stall in pending or working state after relaunch/crash #1627

Open
bbappserver opened this issue Nov 18, 2024 · 4 comments
@bbappserver
Contributor

Hydrus version

598

Qt major version

Qt 6

Operating system

Linux (specify distro and version in comments)

Install method

Running from source

Install and OS comments

Ubuntu 23 x86

Bug description and reproduction

Gallery importer network jobs become stuck in the pending or working state but never make any progress after hydrus is relaunched.

My best guess for why this happens is that after the tab is deserialized the jobs have become orphaned instead of being reinserted into the job queue.
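To make the guess above concrete, here is a minimal sketch of how a deserialized job could end up orphaned. Every name in it (`Job`, `Page`, `load`, the `requeue` flag) is hypothetical and invented for illustration; this is not Hydrus code, and the real cause may be something else entirely.

```python
import json
from collections import deque


class Job:
    """Hypothetical network job; not a Hydrus class."""

    def __init__(self, url, status="pending"):
        self.url = url
        self.status = status

    def to_dict(self):
        return {"url": self.url, "status": self.status}


class Page:
    """Hypothetical downloader page that owns jobs and shares a work queue."""

    def __init__(self, queue):
        self.queue = queue
        self.jobs = []

    def add_job(self, job):
        self.jobs.append(job)
        self.queue.append(job)  # in this sketch, scheduling only happens here

    def dump(self):
        return json.dumps([job.to_dict() for job in self.jobs])

    @classmethod
    def load(cls, text, queue, requeue):
        page = cls(queue)
        for data in json.loads(text):
            job = Job(data["url"], data["status"])
            page.jobs.append(job)
            # Without this step the job is visible on the page but nothing
            # will ever pick it up: it is orphaned.
            if requeue and job.status in ("pending", "working"):
                job.status = "pending"
                queue.append(job)
        return page


# Session saved before the crash.
page = Page(deque())
page.add_job(Job("https://example.com/gallery?page=1"))
saved = page.dump()

# Relaunch: one loader forgets to requeue, the other does not.
buggy_queue, fixed_queue = deque(), deque()
buggy = Page.load(saved, buggy_queue, requeue=False)
fixed = Page.load(saved, fixed_queue, requeue=True)

print(len(buggy.jobs), len(buggy_queue))
print(len(fixed.jobs), len(fixed_queue))
```

In the `requeue=False` case the page still lists the job as pending, which would match the symptoms below, but the queue is empty, so no worker ever runs it.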

Symptoms

  • The gallery url which initiated the job in the search log reads as successful, or there is a gallery url for the next page which has a blank status.
  • File/Post urls in the file log which have not yet been processed have a blank status but are never processed.

Reproduction Steps

  1. Start several gallery downloaders.
  2. Reboot and kill(-9) hydrus at various stages of the network job. In my case a crash caused the issue.

Workaround

Copy the query from the faulted job and run it again. Delete the faulted job from the tab.

Log output

[There is no specific log or dump] The program is running without crash or error, however these specific jobs seem hung at the model level.
@hydrusnetwork
Owner

Thank you for this report. This is pretty odd--obviously the program is supposed to resume if it boots with a non-complete downloader, and even if there is a crash it should normally be fine--since the downloader page is basically rewound in time a bit, it'll usually just blit through the first few results as it realises they are 'already in db' since they were imported after the crash (and so saved to the database, which did sync, but not the GUI session, which was pre-empted by the crash), and then it'll continue as always.

I agree that this is probably some odd scheduling thing. I did change some file log stuff recently, which could screw with some duplicate URLs or paths in file logs, but it doesn't sound like you have this unless you have some very exotic URL class rules.

Unfortunately I cannot reproduce this--if I force a crash and restart, things get back to normal as I would expect. There are some forced limits in the number of downloaders that can run at once--it is something like 5 gallery downloaders and 10 file downloaders--so sometimes the pending/working situation can stall when things are busy, for instance right after boot, but you will see things move forward unless the client is really suffering under hundreds of competing watchers or whatever.
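For anyone reading along, the kind of limit described above can be sketched with a bounded semaphore. This is only an illustration under stated assumptions: the value 5 is taken from "something like 5" in this comment, and `gallery_job`, `slots`, and the counters are hypothetical names, not the client's actual scheduler.

```python
import threading
import time

MAX_GALLERY = 5  # assumption: "something like 5 gallery downloaders"

slots = threading.BoundedSemaphore(MAX_GALLERY)
lock = threading.Lock()
running = 0
peak = 0
finished = []


def gallery_job(name):
    """Hypothetical job: waits ('pending') for a slot, then works."""
    global running, peak
    with slots:  # blocks while MAX_GALLERY jobs are already working
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for the network request
        with lock:
            running -= 1
            finished.append(name)


threads = [threading.Thread(target=gallery_job, args=(i,)) for i in range(12)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrent:", peak)
print("finished:", len(finished))
```

The point of the sketch is that a job held back by a limit like this still finishes once a slot frees up, so a stall that never clears is a different problem from ordinary queueing.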

Can we gather a bit more information?

  • First off, if your client closes cleanly, no crash or kill(-9), and then you boot up again, does a downloader reload the exit session state and get back to work ok in that case? Is it only crashes that throw it off, or is it any 'boot on incomplete downloader'?
  • If you get this infinite pending/waiting jobs state, hit network->data->review network jobs. Do you see the respective downloads in there? It should be the 'working' guys, but not the pending. Do they have any interesting or useful info? Is anything else working there, and if you hit 'refresh snapshot', is it moving along or all stalling? If you do your workaround and copy the query, do you see the network stuff working now, and if/how is it different?
  • Advanced, and only if you have the time to do it: Hit network->pause->always boot with paused network traffic, and then restart the program cleanly. Now open a new downloader and set up our crash test. Unpause the network traffic for a bit, and then initiate the crash. Now boot up, and all the network traffic should be paused again. Hit up help->debug->report modes->file import report mode and network report mode. Now unpause network traffic again and our guys should do their thing and you should get a whole bunch of popup spam, typically like forty popups per successful file import. Is there anything interesting in there? Does any of the network stuff get going, or is there actually basically nothing going on? Does it always stop after x step?

@bbappserver
Contributor Author

@hydrusnetwork

  • I have paused all but the stalled jobs; on the stalled jobs, search and files are not paused.
  • Subscriptions are paused

Paused all new network traffic, network report mode on, no messages from the network engine.
Reviewing network jobs shows empty table, no jobs waiting for bandwidth.

On closer inspection another symptom is that search is not ◼️ on the offending jobs, but is ◼️ on files. However the search never seems to be proceeding to populate more files.

@hydrusnetwork
Owner

Thanks. I think you are correct, as you said on discord, that somehow these old (1 year+) import jobs have some busted variable and serialisation will help us debug. I did recently update some of the 'how do we identify ourselves' vs 'how do we do our job' variables inside file import jobs, and I think gallery search jobs, so perhaps this is what has happened here.

I will figure out some 'export this to json' job for the file and search logs and we'll examine what URLs etc. they think they really have.

@hydrusnetwork
Owner

Ok, slightly stupid location for the menu item, but both the file log and search log will in v600 support JSON export of the current selection to clipboard. Please DM me some examples here, and maybe the equivalent on a newly created downloader that we know work, and we'll see what the differences are.

image
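Once exports from a stalled and a working downloader are in hand, a small script can list the fields that differ. The sketch below is hedged heavily: the structure of the v600 JSON export is not described in this thread, so the sample objects and field names (`url`, `status`, `meta`, `source_time`) are made up purely for illustration, and `diff_keys` is a hypothetical helper.

```python
import json


def diff_keys(stalled, working, prefix=""):
    """Return (path, stalled_value, working_value) for every differing field."""
    diffs = []
    for key in sorted(set(stalled) | set(working)):
        path = f"{prefix}{key}"
        a, b = stalled.get(key), working.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            diffs.extend(diff_keys(a, b, path + "."))  # recurse into nested objects
        elif a != b:
            diffs.append((path, a, b))
    return diffs


# Placeholder data: paste the real clipboard exports here instead.
stalled = json.loads(
    '{"url": "https://example.com/a", "status": "", "meta": {"source_time": null}}'
)
working = json.loads(
    '{"url": "https://example.com/a", "status": "", "meta": {"source_time": 1731888000}}'
)

result = diff_keys(stalled, working)
for path, a, b in result:
    print(f"{path}: stalled={a!r} working={b!r}")
```

This only compares JSON objects key by key; if the real export is a list of entries, each pair of entries would need to be compared in turn.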
