Export speed issue with many attachments #1389
🤖 this is your friendly neighborhood build bot announcing test build 5.2.7.5586 ("limit workers, more caching") Install in Zotero by downloading test build 5.2.7.5586, opening the Zotero "Tools" menu, selecting "Add-ons", open the gear menu in the top right, and select "Install Add-on From File...". |
I first need to unpack a few things -- with large libraries things come to the surface about BBT that don't affect most people. Sorry for the wall of text that follows. It's not essential that you read all that follows, it's merely to provide backing context for the (very relevant) questions you asked. The TL;DR is there's still something wrong with BBT for you; a new debug log from an auto-export with 5586 will tell me more. Still here? Alright then.

First -- 13k entries is a lot, but I have more in my test suite, largest being 24k items, and with a cold cache (which is the worst-case scenario), that takes 80-100 seconds on my MBPro. So unless your machine differs significantly, performance-wise, from mine, something is absolutely wrong.

In pre-5.2, only foreground exports were possible. Zotero imposes a limitation that exports, once started, must run to completion, and while doing so, block the UI completely.

The first speedup I have in place is that for individual entries, when exported for a particular combination of preferences and export options, the output is remembered; when that same item is exported again, for the same combination of preferences and export options, it is not computed again, I just take the cached output.

If you have a lot of uncached entries (and changing any preferences drops the entire cache) for an export (again, for a particular combination of preferences), and I've detected earlier that an auto-export took more than 10 seconds, on the next auto-export, I split what you're about to export into batches of 10, export those and do nothing with the results -- but the bare fact I've exported them will put them in the cache, and after each small batch, the UI gets a breather. I then start the actual export again, and that will run to completion, but getting everything from the cache. Best-case scenario is that you change one entry, only that entry needs to be generated again, and everything but that comes from the cache.
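To make the caching and pre-warming described above concrete, here is a minimal TypeScript sketch. It is not BBT's actual code: `ExportCache`, `contextKey`, `batches`, `prewarm` and the `breather` callback are hypothetical names made up for illustration, and the real cache will differ in how it stores and invalidates entries.

```typescript
// Hypothetical types for illustration; not BBT's real data model.
type Item = { id: number; title: string }
type Settings = Record<string, string | number | boolean>

// Build a stable key for one combination of preferences and export options.
// Keys are sorted so the same settings always produce the same string.
function contextKey(preferences: Settings, options: Settings): string {
  const sorted = (s: Settings) => Object.keys(s).sort().map(k => [k, s[k]])
  return JSON.stringify([sorted(preferences), sorted(options)])
}

class ExportCache {
  private entries = new Map<string, string>()
  misses = 0 // how many times an entry actually had to be generated

  // Return the remembered output for (item, context), generating it only on a miss.
  get(item: Item, context: string, render: (item: Item) => string): string {
    const key = `${item.id}|${context}`
    const cached = this.entries.get(key)
    if (cached !== undefined) return cached
    this.misses++
    const output = render(item)
    this.entries.set(key, output)
    return output
  }

  // Changing any preference drops the entire cache.
  clear(): void {
    this.entries.clear()
  }
}

// Split the items into small batches (the text above uses batches of 10).
function batches<T>(items: T[], size: number): T[][] {
  const result: T[][] = []
  for (let i = 0; i < items.length; i += size) result.push(items.slice(i, i + size))
  return result
}

// Pre-warm the cache: export each batch, throw the output away, then let the
// UI breathe. `breather` is an assumed callback that yields to the UI.
async function prewarm(
  items: Item[],
  context: string,
  cache: ExportCache,
  render: (item: Item) => string,
  breather: () => Promise<void>,
): Promise<void> {
  for (const batch of batches(items, 10)) {
    for (const item of batch) cache.get(item, context, render) // result discarded on purpose
    await breather()
  }
}
```

After `prewarm` finishes, the real export can run to completion in one go, because every `cache.get` call is then a hit.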
In 5.2.3, that was changed. All BBT exports (except drag-and-dropped) now happen on a separate thread, so the export does not interfere with the UI, database, Zotero writing files, anything. It runs entirely isolated from the rest of Zotero, and I figured, it might not need the cache, so I didn't put it in, because it was a little complicated to access the cache from the now-isolated export. The plumbing was still there because for drag and drop it was still useful. Turns out I had forgotten just how heavyweight BBT exports really are, so even if they happen in a separate process, it's still desirable to have them cached (for 24k items it is the difference between 80-100 seconds and 10-15 seconds for me). So 5.2.7 adds cache access for the workers.

So that brings me to your situation. The automatic exports in pre-5.2 and post-5.2.5 are incremental in the sense that if you have exported an entry before (and have not changed any preference between) it will take the cached output from memory (not from the file on disk) instead of regenerating it. In 5.2.3 - 5.2.5, the exports would behave as if the cache was always empty, and would not contribute back to the cache during export. 5.2.7 does read from the cache, and does contribute to the cache, so we have the pre-5.2 caching behavior, but with exports that should not block the UI even on a (largely) empty cache. So you are correct -- subsequent exports should get faster and faster until you're effectively always just generating the one entry you just changed and nothing else, plus some overhead for reading the cache and writing files to disk.

You are right that the problem with
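As a rough illustration of "reads from the cache and contributes back to it" for an isolated export, here is a TypeScript sketch. It only models the data flow with plain functions; it does not use real workers, and `runWorkerExport`, `mergeIntoCache`, `cacheKey` and the job/result shapes are hypothetical, not BBT's actual interfaces.

```typescript
// Hypothetical shapes for illustration only.
type CacheEntries = Record<string, string>
type WorkerItem = { id: number; title: string }

interface WorkerJob {
  items: WorkerItem[]
  contextKey: string   // identifies the preferences + export options in use
  cache: CacheEntries  // snapshot of the cache handed to the isolated export
}

interface WorkerResult {
  output: string
  generated: CacheEntries // entries the export had to generate itself
}

function cacheKey(itemID: number, contextKey: string): string {
  return `${itemID}|${contextKey}`
}

// The isolated export: take cached output where available, generate the rest,
// and report what was generated so it can be contributed back.
function runWorkerExport(job: WorkerJob): WorkerResult {
  const generated: CacheEntries = {}
  const parts = job.items.map(item => {
    const key = cacheKey(item.id, job.contextKey)
    const cached = job.cache[key]
    if (cached !== undefined) return cached
    const fresh = `@misc{item${item.id}, title = {${item.title}}}`
    generated[key] = fresh
    return fresh
  })
  return { output: parts.join('\n'), generated }
}

// Back on the main side: merge the newly generated entries into the cache.
// Returns how many entries were contributed.
function mergeIntoCache(cache: CacheEntries, result: WorkerResult): number {
  Object.assign(cache, result.generated)
  return Object.keys(result.generated).length
}
```

In this model, the 5.2.3 - 5.2.5 behavior corresponds to always passing an empty `cache` and never calling `mergeIntoCache`; with both in place, a repeat export of unchanged items generates nothing.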
Those are the - only - kind of walls I want folks to build. So yes. Let's build THOSE walls. |
Latest MBP, 2.7 GHz Quad-Core Intel Core i7, 16 GB 2133 MHz LPDDR3. It should run fast.
Haven't really changed anything. Recently I have been using Zotero Betas (with plenty of updates). Would that make a difference?
Nope. Unfortunately not. We are talking answering-emails or coffee-break kind of lengths here: 10+ minutes. Updated to test build 5.2.7.5586. Then added three new entries. Took two verses, two choruses, a bridge and half a sax solo from a Billy Ocean song (about 160 sec). Definitely incremental. Then added two more entries. That took 10 sec. So the question is: how do I not change the cache? No BBT updates? (That seems to trigger them.) No Zotero updates?
Thanks for all the help! |
You have the post-Jony-Ive MBPro. I should have waited for Apple to come to its senses 😢. Screw that stupid butterfly keyboard and the touchbar escape key. So the absolutely strange thing here is that even in the worst-case scenario (cold cache), this should take about one minute, maybe even a little less because your system is faster than mine. And I can pretty consistently do 24k items in 90 seconds on average. With a hot cache, 10-15 seconds.
Yes! yes it would!
That is at least one of the issues. There used to be a setting where you could choose to keep the cache, if you accept the risk that fixes in BBT don't show up for you because the cache is not cleared. That setting is no longer there, but it's easy to reintroduce, and in fact I could prompt for it if an export took long. That still doesn't explain why on your system exports take an order of magnitude longer than mine. Hopefully the log will give me some clues.
I'll need a new debug log from 5586 to see if you're using the cache. |
So I went back to the stable Zotero (v. 5.0.81) and installed the test build. This sort of behavior has happened before, however, in exactly the same manner. It's been a long time since I was able to do a cold-cache export in under a minute; that's why I stopped using the "automatic export" feature. I have become used to the fact that exports from scratch just take 10+ min. After starting up the newly installed Zotero and hitting "export now", nothing happened for about 10 sec or so, then I got the spinning wheel. Then, after 6 min or so, Zotero unfroze and the dialog box worked again, telling me there was a background process in the works and giving the percentages. Unfortunately, I had not set up the logging mode. I then started a new export set-up again. That took 7:30 min. Then I added some new entries and used the export now button as usual. Report-ID: TNCR35UK-euc
Never mind, got it, hold on for the next build |
🤖 this is your friendly neighborhood build bot announcing test build 5.2.7.5591 ("add option to retain cache") Install in Zotero by downloading test build 5.2.7.5591, opening the Zotero "Tools" menu, selecting "Add-ons", open the gear menu in the top right, and select "Install Add-on From File...". |
Right. No need to fiddle in the hidden prefs -- this build will reset the key to the proper value, and also introduces an option to retain the cache in the BBT advanced pane. You can turn it on. I'll need a log from 5591. |
It should be OK to test on Zotero beta BTW. My test suite runs both the latest Zotero release and the latest beta. |
🤖 this is your friendly neighborhood build bot announcing test build 5.2.7.5592 ("always full test suite for #1389") Install in Zotero by downloading test build 5.2.7.5592, opening the Zotero "Tools" menu, selecting "Add-ons", open the gear menu in the top right, and select "Install Add-on From File...". |
Installed BBT 5.2.7.5592. Then ran export with no changes. Then added two entries. Then ran again. Behaved as expected. You are running Better BibTeX version 5.2.7.5592 |
Then exported again w no changes. Again, as expected. Only a few seconds. |
So what you're saying is that the hot-cache behavior is as it's supposed to be? It's now just the cold-cache behavior that's the problem. That would at least be an improvement. |
Yep. At the moment, that seems to be the case. I experienced similar improvements - though not this good - in the past with some versions of BBT. Then came a new version and it was slow again (even with hot caches). But it's never been this good. Again, thanks so much for diving into this! If this problem with the cold cache could be addressed, then the citr function would really become super useful. Even after I have (re)connected the Zotero database with citr (which takes a very long time, as reported above), I have to wait several seconds when using it. Every time. This is not the case when I only connect to the local .bib file that citr creates with my added references (which is awesome for producing reproducible documents).
🤖 this is your friendly neighborhood build bot announcing test build 5.2.11.5766 ("Merge branch 'gh-1389' into gh-1389-itoe") Install in Zotero by downloading test build 5.2.11.5766, opening the Zotero "Tools" menu, selecting "Add-ons", open the gear menu in the top right, and select "Install Add-on From File...". |
I think I've found the problem. There may be a way to fix this if there is a structurally faster way to check whether a file exists, which there may well be -- there are two libraries for file handling in Firefox, and the newer one seems to be preferred by Mozilla over the slightly older one Zotero uses for most things, but whether that gets us a performance benefit is as of yet unknown. If there isn't such a faster method, it would require a breaking change to a function used by some translators. Not many, just 6 AFAICT, but Zotero isn't big on breaking changes for translators (for fairly good reasons). 5766 has the private native serializer. Give that a shot. From my measurements it seems so fast that using caching might actually cost performance. |
Works def faster. Will test it with Report ID: REYC9USB-euc |
I'd say. On my own tests with your profile:
REYC9USB-euc:
Which still does not explain the difference between our systems but jeez louise this is a keeper. 15s is still too slow for citr I'd say, but combined with the endpoint I proposed so citr can set up an auto-export I'd say we've struck gold. I'm going to bake another version with serialization caching to see if that improves matters further.
I cannot say it often enough: Danke, grazie, thank you, merci! |
You're welcome 👍 Hoooly crap ..... I've tested with the serialization cache reenabled, and it's now:
that seems a tradeoff worth making! |
To think 3 weeks ago we were struggling with 17-minute exports... thanks for sticking this one out, could not have done it without your persistence. |
Downside of this is that the serialization cache now only works for worker exports, not "regular" exports (with "background exporters" set to 0). To which I'd say "don't do those then". |
🤖 this is your friendly neighborhood build bot announcing test build 5.2.11.5778 ("Merge branch 'master' into gh-1389") Install in Zotero by downloading test build 5.2.11.5778, opening the Zotero "Tools" menu, selecting "Add-ons", open the gear menu in the top right, and select "Install Add-on From File...". |
5778 will let you play along. There's a few open reports which I wish the posters would just respond to so I can fix what's there to fix, but I'm rolling this into a new release between now and a few days. Thanks so much for reporting this! |
🤖 this is your friendly neighborhood build bot announcing test build 5.2.11.5787 ("saveFile only if attachment exists like Zotero") Install in Zotero by downloading test build 5.2.11.5787, opening the Zotero "Tools" menu, selecting "Add-ons", open the gear menu in the top right, and select "Install Add-on From File...". |
5.2.12 is building for release. I needed to get a separate fix out, but #1389 is included in the release. |
You’re awesome. |
🙇 |
I understand that exporting (using Better BibLaTeX) a very large Zotero database simply takes time and creates rather large bibtex files (in my case, 13845 entries create a bibtex file that is 7.3 MB). So no squabbles here. My understanding of the BBT "automatic update" feature is that updates should happen incrementally? (Not sure if that's the right description.) So, after adding, say, 10 or so more entries, the export shouldn't take as long as an export from scratch. This does not seem to be the case in my set-up.
Report ID: RRPZJI6J-euc
Exporter used: BBT 5.2.7
Expected behavior: Incremental update
Actual behavior: Full update (very slow)
Also, the interaction with citr is veeeery slow. Not sure if those two things are related? The instructions on citr say that BBT is needed for this to work.
As suggested over at citr, I omit the following fields in the BBT "export" settings:
abstract, note, file, tag, attachment
So I thought I just might ask if there's a connection before starting a new issue.