Remove obsolete/empty databases from mongo db #1431
Comments
I can think of no reason for one of us not to remove these 5 right now. If they are removed from Warwick, what happens with replication? I don't know how to connect to the cloud replicas. |
I can take care of the cloud. If you delete it on Warwick I believe On 2016-05-21 10:14, John Cremona wrote:
|
@JohnCremona, @edgarcosta The five empty databases listed above have been removed from the cloud mongo db (running on ms.lmfdb.xyz) and no longer appear on www.lmfdb.org/api/ |
You beat me to it, I have removed the first one....and now the others. |
@JohnCremona, @davidfarmer Do we need the databases knowledge_5, knowledge_6, knowledge_7, knowledge_8, knowledge_9, and knowledge_tmp? It looks to me like the knowledge database contains everything that is in these. |
@sehlen Do we still need the databases modularforms and modularforms_raw? It looks like the only place where modularforms is referenced is in test_root.py (which should presumably be changed to modularforms2). |
@AndrewVSutherland The knowledge_* stuff is my fault. Cleaning it now. edit: Done |
@JohnCremona I presume it makes sense to remove the "limbo" database from the cloud server (and possibly Warwick as well?) |
Yes. It was the original Artin representation database, but has been superseded by artin. So yes, it can be deleted. |
OK then I'll do the honours at the Warwick end...done (limbo dropped). |
What about "MaassWaveForm" (without the "s")? There are 2 files that mention it:
|
limbo also dropped at ms.lmfdb.xyz |
mongo makes it much too easy to create a new database by mistake after a typo. At least, that was true before we added authentication. |
The collection Fredrik
|
Good catch! Done at Warwick
and ms:
|
I also took a snapshot before going on this dropDatabase spree... |
@edgarcosta Good idea. I was thinking that we also have the weekly dumps, but they are not kept for that long (40 days according to the backup script). Though I do have some other copies. |
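As a rough illustration of the retention window mentioned above, the following sketch lists which weekly dumps would fall outside a 40-day window. The function name, the dump dates, and the idea that dumps are identified by date are all assumptions made for the example; the actual backup script is not shown in this thread and may work differently.

```python
from datetime import date, timedelta

# 40 days is the figure quoted in the thread; the real script may differ.
RETENTION_DAYS = 40


def expired_dumps(dump_dates, today, retention_days=RETENTION_DAYS):
    """Return the dump dates older than the retention window, oldest first."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in dump_dates if d < cutoff)


# Hypothetical weekly dumps (made-up dates, one per week).
dumps = [date(2016, 4, 2) + timedelta(weeks=i) for i in range(8)]
old = expired_dumps(dumps, today=date(2016, 5, 21))
```

With these made-up dates, only the two oldest dumps fall before the cutoff, so a snapshot taken before dropping databases would remain the only copy once the weekly dumps age out.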
The db just got much slimmer:
|
@JohnCremona we should discuss how frequently, and for how long we should keep backups in the cloud |
@edgarcosta Your previous comment reminded me of a question you asked me (no idea which thread or issue or what) which I did not understand. If it's about filesystems used on the Warwick server please include Bober and Schilly (perhaps you did). |
Does anyone know anything about ap_statistics? It is not referenced by any code in LMFDB/lmfdb. |
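One way to check whether a database name is referenced anywhere in the code is a whole-word search over the source files. The sketch below is only an illustration: the function name and the sample file are hypothetical, and it works on file contents already loaded into a dict rather than walking a real checkout. Whole-word matching matters here because `modularforms` is a prefix of `modularforms2`.

```python
import re


def unreferenced_names(names, sources):
    """Return the names that appear in none of the source texts.

    `sources` maps a file path to that file's contents. Matching is on
    whole words, so "modularforms" does not match "modularforms2".
    """
    def is_referenced(name):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b")
        return any(pattern.search(text) for text in sources.values())

    return sorted(n for n in names if not is_referenced(n))


# Hypothetical file contents, for illustration only.
sources = {"example.py": "db = connection['modularforms2']"}
candidates = ["modularforms", "modularforms2", "ap_statistics"]
unused = unreferenced_names(candidates, sources)
```

A name that turns up as unreferenced is only a candidate for deletion; as the discussion shows, the data may still be wanted for future use.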
It contains data about a(p)’s for newforms of weight 2, trivial character, and where the degree of the coefficient field is 2. I just mongodumped the database, so you can safely delete it if you like. At some point in the future we would probably like to display statistics for modular forms, but we have to think more about exactly what should be (pre-)computed and what should be stored where. The reason why there are so many collections in modularforms2 is that the mongo server in Warwick (previously Washington, I think) was basically the Basically every collection in modularforms2 which is not referenced somewhere in the lmfdb can probably be safely deleted (@sehlen?).
|
@fredstro, @sehlen, So when I updated the modularforms2 data on www.lmfdb.org I copied over the following 11 collections: dimension_table webmodformspace webnewforms webeigenvalues.chunks I see the code also references "webmodformspace_dimension" in emf_utils.py Are there any other collections that are needed? |
@fredstro OK, it sounds like I can definitely remove ap_statistics from the cloud, I'll let @JohnCremona decide what to do on the Warwick machines. |
@davidfarmer is it safe to remove the database Lfunction from the cloud (and possibly Warwick also)? It looks like the code uses the Lfunctions database (both for new and old formats) |
As luck would have it one of the tests in test_root checks for MaassWaveForm, so now fails. Rather than fix this one, we should adapt the test_db function there to check for a complete list of the databases which we actually need. |
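A minimal sketch of the kind of check suggested above: compare the databases present on the server against an explicit list of the ones the code needs. The set below is only an illustrative subset of names mentioned in this thread, not the complete list, and the function name is hypothetical. With pymongo, the list of available names could be obtained from `MongoClient.list_database_names()` (an assumption about the client version in use; older releases expose `database_names()` instead).

```python
# Illustrative subset taken from names mentioned in this thread;
# the real list of required databases would be longer.
REQUIRED_DATABASES = {"knowledge", "artin", "modularforms2", "Lfunctions"}


def missing_databases(available, required=REQUIRED_DATABASES):
    """Return the required database names that are not available, sorted."""
    return sorted(set(required) - set(available))


# Extra databases on the server do not cause a failure.
on_server = ["knowledge", "artin", "modularforms2", "Lfunctions", "admin"]
assert missing_databases(on_server) == []
```

Testing for a fixed list of required names, rather than for one particular database, means that dropping an obsolete database does not break the test.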
I saw that, and I was going to edit that, but then I noticed that in the master branch it is an empty file: |
Wrong file: lmfdb/test_root.py is the one. That empty file can probably be deleted! |
@jwj61 the test of the link to http://hobbes.la.asu.edu/lmfdb-14/ now fails. Is that just temporary? There's a test for it in test_acknowledgements.py |
I added a new /stats option to the api interface that generates a table of mongo db collections ordered by size and/or number of objects; you can check it out at: http://www.lmfdb.org/api/stats One thing that jumps out is that on beta the single largest collection (about 300 GB) is ap.chunks in modularforms2 and the third largest is vector_on_basis.chunks (at 135 GB). Neither of these collections is present on www.lmfdb.org. Is there any reason to keep them on beta? |
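The ordering described above can be sketched as a sort over per-collection statistics. This is not the code behind /api/stats, which is not shown in this thread; the function name and record layout are assumptions. The `size` and `count` field names follow MongoDB's `collStats` output. The two sizes for ap.chunks and vector_on_basis.chunks are the approximate figures quoted above (in GB); every other number and the middle collection are placeholders.

```python
def rank_collections(stats, key="size"):
    """Return the collection records sorted by `key`, largest first."""
    return sorted(stats, key=lambda record: record[key], reverse=True)


# Sizes in GB. Object counts and the placeholder entry are made up.
sample = [
    {"db": "modularforms2", "collection": "vector_on_basis.chunks", "size": 135, "count": 3},
    {"db": "placeholder_db", "collection": "placeholder.collection", "size": 200, "count": 9},
    {"db": "modularforms2", "collection": "ap.chunks", "size": 300, "count": 5},
]
by_size = [r["collection"] for r in rank_collections(sample)]
by_count = [r["collection"] for r in rank_collections(sample, key="count")]
```

Sorting by `count` instead of `size` gives the ordering by number of objects.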
As I said around May when this question came up in another thread or email discussion about modular forms (they are all so full of information now that I can't find which one it was)
Fredrik, this is about Warwick now. I don't have time to look at it right On Sat, Jul 16, 2016, 10:55 Fredrik Strömberg notifications@github.com
|
@fredstro Thanks for the confirmation! |
Given that we have not heard anything further from @sehlen or @fredstro I would propose that we go ahead and delete all the collections listed in #1431 (comment) except for modularforms2/ModularSymbols.* and plan to proceed with #1294 (perhaps this weekend @edgarcosta, @JohnCremona ?). |
Sounds good to me. Perhaps do it to finish before the weekly backup starts at midnight Saturday GMT (which is 2300 BST so I guess 1800 Saturday on the East coast). |
FYI, I will be flying back to the US over this weekend. |
Ok, when I wrote my answer I didn't see (I was reading on my phone) that this was concerning the mongodb on lmfdb in Warwick, since I didn't see any obvious reason for deleting any of the mentioned collections from that server. After Stephan's latest rewriting I wasn't sure exactly which collections contain what and which ones we would like to keep. I had hoped we had time for Stephan to finish his move to Germany and get settled before the storage issues became critical. There are too many messages from github at the moment, so my gmail just dumps everything into spam even though I try to mark them as non-spam. It would be good if there was a setting so that some of these messages only got sent to the release managers, so I could actually follow the relevant discussions (as I don't want to sync my massive spam folder to my phone). |
@fredstro I will delete the last 3, but are you sure you really need all the others? For example, are the collections webnewforms.chunks and webnewforms2.chunks and WebNewForms.chunks all necessary? Perhaps just 2 (rather than 3) variants of each of these would suffice? BTW, you can browse the full list of collections on the Warwick mongo db modularforms2 database at http://beta.lmfdb.org/api/ (scroll down to see modularforms2). |
@fredstro One thing you can do to reduce the number of github e-mails you get is to go to https://github.com/settings/notifications and turn off e-mails for repositories you are watching (you will still get e-mails for conversations you are participating in (like this one), and for comments that @mention you explicitly). You can also unclick the notifications for Pull Request Reviews and Pull Request Pushes (I think you will still get notifications regarding pull requests you initiate). |
It is not that we are running out of space on the Warwick server, but we still want to delete things which are not needed, since not only do they use up space but so do their copies on the weekly backups. We do have more disks which can be brought online but I am delaying that until a lot of other things have finished since last time I did that it was quite disruptive. |
Ok. I can only tell you which collections were used before Stephan's May refactoring (I am not 100% certain he did not change the names or introduce anything new). Namely the following:
I don't recognise a lot of the collections in /api/ so I wouldn't remove any of them without checking that Stephan is done with the testing/checking/recomputations which began in May. Oh, and thanks for the hint about the notifications! |
@fredstro It would be great to put some of the information above into https://github.com/LMFDB/lmfdb-inventory/blob/master/db-modularforms2.md -- even a short description without the detail (which may not even make sense for these files/chunks things). |
@fredstro Thank you for the clarification, this is very helpful! Among the larger collections (those > 1GB, there are also many smaller ones that I expect can be deleted but there is no rush to deal with those), the only remaining candidates for deletion are: http://beta.lmfdb.org/api/modularforms2/WebNewforms.chunks/ which look to me like old versions of webnewforms.* (and there is also webnewforms2.*). @sehlen Can you confirm that it is OK to delete these two collections (or alternatively, confirm that they are needed)? Once that is done I think we can close this issue. |
Ok, I will try to fill in some of the documentation there. I thought it was just for the collections which are used on the website (for those I can write what I think they contain, but it might not be correct since Stephan might have changed something). |
Thanks -- obviously this is most important to have the details for collections in use on the website, but for anything which is not strictly temporary (say, you expect it to still be useful in a few months' time) then having a small note about it would stop others from asking you if it can be deleted! |
I don’t think they are used. In fact, I did not add new collections (I think).
I think so. But, as I said, I cannot really start looking at these things now as there are many things to finish before my family and I move back to Germany on Saturday.
|
Indeed https://github.com/sehlen/lmfdb-inventory says that it is 5 commits ahead of the master, and all those commits are called "Update db-modularforms2.md". Pulling them into the master would be a good place to start. |
ah, damn it, I started with the same collections as you did but with the format similar to the L-functions. |
Sorry, but I mentioned this at least three times now.
|
That's alright, I just thought that anything you added would have been pushed to master. |
|
After checking I dropped the following collections:
|
@fredstro Thank you! I am now happy to close this issue. |
The following databases are either empty or have only empty collections in them:
mwf_dbname
modularforms_2010
quadratic_twists
modforms
WebNewForms
Several others are never referenced by any of the code in LMFDB/lmfdb and have no inventory information in LMFDB/inventory (I will post a list later).
These should definitely be removed from the cloud server (so that they do not appear on http://www.lmfdb.org/api/, for example). Presumably the empty ones (and possibly others) can/should also be removed from the mongo db on Atkin.
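For reference, a database that is "empty or has only empty collections" can be identified from per-collection document counts. The sketch below assumes the counts have already been gathered into a dict; with pymongo they could be collected using `Database.list_collection_names()` and `Collection.count_documents({})` (an assumption about the client version in use). The function name, the collection names, and the counts are hypothetical.

```python
def empty_databases(db_counts):
    """Return databases with no collections or only empty collections.

    `db_counts` maps a database name to a dict of
    {collection name: number of documents}.
    """
    return sorted(
        name
        for name, collections in db_counts.items()
        # all() over an empty dict is True, so a database with no
        # collections at all is reported as well.
        if all(count == 0 for count in collections.values())
    )


# Hypothetical counts, for illustration only.
counts = {
    "mwf_dbname": {},
    "quadratic_twists": {"example_collection": 0},
    "knowledge": {"example_collection": 10},
}
candidates = empty_databases(counts)
```

The result is a list of candidates to review, not a list to drop automatically; taking a dump or snapshot first, as was done in this thread, keeps the operation reversible.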