Remove obsolete/empty databases from mongo db #1431

Closed
AndrewVSutherland opened this issue May 21, 2016 · 115 comments
Labels
backend Backend server/database pending Use to flag pending issues during a workshop

Comments

@AndrewVSutherland
Member

The following databases are either empty or have only empty collections in them:

mwf_dbname
modularforms_2010
quadratic_twists
modforms
WebNewForms

Several others are never referenced by any of the code in LMFDB/lmfdb and have no inventory information in LMFDB/inventory (I will post a list later).

These should definitely be removed from the cloud server (so that they do not appear on http://www.lmfdb.org/api/, for example). Presumably the empty ones (and possibly others) can/should also be removed from the mongo db on Atkin.
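
A minimal sketch of the check described above, assuming the per-collection document counts have already been gathered into a plain dictionary. The function name `find_empty_databases`, the collection names, and the counts are hypothetical and not taken from the LMFDB code:

```python
def find_empty_databases(doc_counts):
    """Return the names of databases that have no collections, or only empty ones.

    doc_counts maps database name -> {collection name -> number of documents}.
    """
    return sorted(
        db for db, collections in doc_counts.items()
        if all(count == 0 for count in collections.values())
    )


# Hypothetical counts, for illustration only.
sample = {
    "mwf_dbname": {},                   # no collections at all
    "quadratic_twists": {"twists": 0},  # only an empty collection
    "knowledge": {"knowls": 12},        # has data, so it is not reported
}
print(find_empty_databases(sample))
```

A database with no collections counts as empty here because `all()` over an empty sequence is true.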

@AndrewVSutherland AndrewVSutherland added the backend Backend server/database label May 21, 2016
@JohnCremona
Member

I can think of no reason for one of us not to remove these 5 right now. If they are removed from Warwick what happens with replication? I don't know how to connect to the cloud replicas.

@AndrewVSutherland
Member Author

I can take care of the cloud. If you delete it on Warwick I believe
this should get propagated automatically to the replicas (but
@edgarcosta can confirm).


@AndrewVSutherland
Member Author

@JohnCremona, @edgarcosta The five empty databases listed above have been removed from the cloud mongo db (running on ms.lmfdb.xyz) and no longer appear on www.lmfdb.org/api/

@JohnCremona
Member

JohnCremona commented May 21, 2016

You beat me to it, I have removed the first one....and now the others.
There is no such thing as "the mongo server at atkin". The Warwick mongo server is on lmfdb.warwick.ac.uk, and by looking at beta.lmfdb.org/api you can verify that they are gone from there.

@AndrewVSutherland
Member Author

@JohnCremona, @davidfarmer Do we need the databases knowledge_5, knowledge_6, knowledge_7, knowledge_8, knowledge_9, and knowledge_tmp? It looks to me like the knowledge database contains everything that is in these.

@AndrewVSutherland
Member Author

@sehlen Do we still need the databases modularforms and modularforms_raw? It looks like the only place where modularforms is referenced is in test_root.py (which should presumably be changed to modularforms2).

@edgarcosta
Member

edgarcosta commented May 21, 2016

@AndrewVSutherland The knowledge_* stuff is my fault. Cleaning it now.

edit: Done

@AndrewVSutherland
Member Author

@JohnCremona I presume it makes sense to remove the "limbo" database from the cloud server (and possibly Warwick as well?)

@jwj61
Member

jwj61 commented May 21, 2016

Yes. It was the original Artin representation database, but has been superseded by artin. So yes, it can be deleted.

@JohnCremona
Member

JohnCremona commented May 21, 2016

OK then I'll do the honours at the Warwick end...done (limbo dropped).

@edgarcosta
Member

What about "MaassWaveForm" (without the "s")?

There are 2 files that mention it:

@edgarcosta
Member

limbo also dropped at ms.lmfdb.xyz

@JohnCremona
Member

mongo makes it much too easy to create a new database by mistake after a typo. At least, that was true before we added authentication.
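
One way to guard against this kind of typo is to validate a database name against a known list before using it. The following is only a sketch: the helper `check_database_name` and the `KNOWN` list are hypothetical, and the real list of databases would have to come from the inventory.

```python
import difflib


def check_database_name(name, known_names):
    """Return name if it is a known database; otherwise raise ValueError.

    The error message suggests the closest known name, if there is one.
    """
    if name in known_names:
        return name
    close = difflib.get_close_matches(name, known_names, n=1)
    hint = f" (did you mean {close[0]!r}?)" if close else ""
    raise ValueError(f"unknown database {name!r}{hint}")


# Illustrative list only.
KNOWN = ["MaassWaveForms", "modularforms2", "knowledge"]
print(check_database_name("modularforms2", KNOWN))
try:
    check_database_name("MaassWaveForm", KNOWN)
except ValueError as err:
    print(err)
```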

@fredstro
Contributor

The collection ‘MaassWaveForm’ can be dropped. Similarly, the collections ‘modularforms_raw’ (this is not really used and was something I was experimenting with) and ‘modularforms’ (this was the predecessor to ‘modularforms2’ and hasn’t been used since 2012) can both be dropped.

Fredrik

On 21 May 2016, at 16:58, edgarcosta notifications@github.com wrote:

What about "MaassWaveForm" (without the "s")?

There are 2 files that mention it:

never ends up using it: https://github.com/LMFDB/lmfdb/blob/master/lmfdb/modular_forms/maass_forms/maass_waveforms/backend/mwf_utils.py
Wants to check for "MaassWaveForms" instead?: https://github.com/LMFDB/lmfdb/blob/master/lmfdb/test_root.py


@edgarcosta
Member

Catching!

Done at warwick

lmfdb0:PRIMARY> use MaassWaveForm
switched to db MaassWaveForm
lmfdb0:PRIMARY> db.dropDatabase()
{ "dropped" : "MaassWaveForm", "ok" : 1 }
lmfdb0:PRIMARY> use modularforms_raw
switched to db modularforms_raw
lmfdb0:PRIMARY> db.dropDatabase()
{ "dropped" : "modularforms_raw", "ok" : 1 }
lmfdb0:PRIMARY> use modularforms
switched to db modularforms
lmfdb0:PRIMARY> db.dropDatabase()
{ "dropped" : "modularforms", "ok" : 1 }

and ms:

> use limbo
switched to db limbo
> db.dropDatabase()
{ "dropped" : "limbo", "ok" : 1 }
> use MaassWaveForm
switched to db MaassWaveForm
> db.dropDatabase()
{ "dropped" : "MaassWaveForm", "ok" : 1 }
> use modularforms_raw
switched to db modularforms_raw
> db.dropDatabase()
{ "dropped" : "modularforms_raw", "ok" : 1 }
> use modularforms
switched to db modularforms
> db.dropDatabase()
{ "dropped" : "modularforms", "ok" : 1 }
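
The same drops could be scripted from Python instead of typed into the shell. This is a sketch only: it assumes a pymongo-style client object exposing `list_database_names()` and `drop_database()`, and the helper name `drop_databases` and its `dry_run` flag are hypothetical.

```python
def drop_databases(client, names, dry_run=True):
    """Drop each named database that exists on the server.

    Returns the names that were (or, in a dry run, would be) dropped.
    Names not present on the server are skipped rather than created.
    """
    existing = set(client.list_database_names())
    targets = [name for name in names if name in existing]
    if not dry_run:
        for name in targets:
            client.drop_database(name)
    return targets


# Usage against a real server (assumes pymongo is installed and the
# account is allowed to drop databases):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   drop_databases(client, ["MaassWaveForm", "modularforms_raw", "modularforms"])
```

Defaulting to a dry run and skipping unknown names makes it harder to drop, or accidentally create, a database through a mistyped name.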

@edgarcosta
Member

I also took a snapshot before going on this dropDatabase spree...

@JohnCremona
Member

@edgarcosta Good idea. I was thinking that we also have the weekly dumps, but they are not kept for that long (40 days according to the backup script). Though I do have some other copies.
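
For illustration, a 40-day retention rule like the one mentioned above could be expressed as follows. The backup script itself is not shown in this thread, so the function `expired_dumps` and the dump dates are assumptions, not a description of what the script actually does.

```python
from datetime import date, timedelta


def expired_dumps(dump_dates, today, keep_days=40):
    """Return the dump dates older than keep_days, oldest first."""
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in dump_dates if d < cutoff)


# Hypothetical weekly dump dates around the time of this thread.
dumps = [date(2016, 4, 2), date(2016, 4, 9), date(2016, 5, 14), date(2016, 5, 21)]
print(expired_dumps(dumps, today=date(2016, 5, 21)))
```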

@edgarcosta
Member

The db just got much slimmer:

  • 1TB using MMAPv1 (e.g. warwick and ms)
  • 620GB using wiredTiger+zlib

@edgarcosta
Member

edgarcosta commented May 21, 2016

@JohnCremona we should discuss how frequently we should take backups in the cloud, and for how long we should keep them

@JohnCremona
Member

@edgarcosta Your previous comment reminded me of a question you asked me (no idea which thread or issue or what) which I did not understand. If it's about filesystems used on the Warwick server please include Bober and Schilly (perhaps you did).

@AndrewVSutherland
Member Author

Does anyone know anything about ap_statistics? It is not referenced by any code in LMFDB/lmfdb.

@fredstro
Contributor

It contains data about a(p)’s for newforms of weight 2, trivial character and where the degree of the coefficient field is 2.
It is data which David F. and I were/are planning to use to analyse a variant of Maeda’s conjecture…
(The collection ‘maeda’ in ‘modularforms2’ contains a subset of the collection ‘ap_data’ in the database ‘ap_statistics’.)

I just mongodumped the database so you can safely delete it if you like.

At some point in the future we would probably like to display statistics for modular forms but we have to think more about exactly what should be (pre-)computed and what should be stored where.

The reason why there are so many collections in modularforms2 is that the mongo server in Warwick (previously Washington I think) was basically the only database we all had access to, so it was used for a lot of testing, experimenting, development and debugging…

Basically every collection in modularforms2 which is not referenced somewhere in the lmfdb can probably be safely deleted (@sehlen?).


@AndrewVSutherland
Member Author

@fredstro, @sehlen, So when I updated the modularforms2 data on www.lmfdb.org I copied over the following 11 collections:

dimension_table
dimension_table.chunks
dimension_table.files

webmodformspace
webmodformspace.chunks
webmodformspace.files

webnewforms
webnewforms.chunks
webnewforms.files

webeigenvalues.chunks
webeigenvalues.files

I see the code also references "webmodformspace_dimension" in emf_utils.py

Are there any other collections that are needed?

@AndrewVSutherland
Member Author

@fredstro OK, it sounds like I can definitely remove ap_statistics from the cloud, I'll let @JohnCremona decide what to do on the Warwick machines.

@AndrewVSutherland
Member Author

@davidfarmer is it safe to remove the database Lfunction from the cloud (and possibly Warwick also)? It looks like the code uses the Lfunctions database (both for new and old formats)

@JohnCremona
Member

As luck would have it one of the tests in test_root checks for MaassWaveForm, so now fails. Rather than fix this one, we should adapt the test_db function there to check for a complete list of the databases which we actually need.
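
A sketch of what such a check might compute, assuming the list of required databases is maintained alongside the code. This is not the existing `test_db` in `lmfdb/test_root.py`; the function name `missing_databases` and the names in `REQUIRED` are illustrative only.

```python
def missing_databases(required, available):
    """Return the required database names that the server does not have."""
    return sorted(set(required) - set(available))


# Illustrative names only; the real list would be maintained with the code.
REQUIRED = ["knowledge", "modularforms2", "Lfunctions"]
available = ["knowledge", "Lfunctions", "limbo"]
print(missing_databases(REQUIRED, available))
```

A test built on this would assert that the returned list is empty, so dropping an obsolete database that is not in the required list would no longer break the test suite.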

@edgarcosta
Member

I saw that, and I was going to edit that, but then I noticed that in the master branch it is an empty file:
https://github.com/LMFDB/lmfdb/blob/master/test_root.py

@JohnCremona
Member

Wrong file: lmfdb/test_root.py is the one. That empty file can probably be deleted!

@JohnCremona
Member

@jwj61 the test of the link to http://hobbes.la.asu.edu/lmfdb-14/ now fails. Is that just temporary? There's a test for it in test_acknowledgements.py

@AndrewVSutherland
Member Author

I added a new /stats option to the api interface that generates a table of mongo db collections ordered by size and/or number of objects; you can check it out at:

http://www.lmfdb.org/api/stats
http://beta.lmfdb.org/api/stats

One thing that jumps out is that on beta the single largest collection (about 300 GB) is ap.chunks in modularforms2 and the third largest is vector_on_basis.chunks (at 135 GB). Neither of these collections is present on www.lmfdb.org. Is there any reason to keep them on beta?
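
The ordering behind such a table can be sketched as below. This is not the code of the /stats page: the function `largest_collections`, the row layout, and the object counts are assumptions, and only the two large sizes come from the figures quoted above.

```python
def largest_collections(stats, key="size", limit=None):
    """Sort collection statistics in descending order of the given key.

    stats is a list of dicts with keys "db", "collection", "size" (bytes)
    and "objects" (document count).
    """
    ordered = sorted(stats, key=lambda row: row[key], reverse=True)
    return ordered if limit is None else ordered[:limit]


GB = 10**9
# The 300 GB and 135 GB sizes are the approximate figures quoted above;
# the third row and all object counts are made up for the example.
stats = [
    {"db": "modularforms2", "collection": "vector_on_basis.chunks", "size": 135 * GB, "objects": 500},
    {"db": "modularforms2", "collection": "ap.chunks", "size": 300 * GB, "objects": 1000},
    {"db": "knowledge", "collection": "knowls", "size": 1 * GB, "objects": 4000},
]
for row in largest_collections(stats, limit=2):
    print(row["db"], row["collection"], row["size"] // GB, "GB")
```

Passing `key="objects"` orders the same rows by number of objects instead of size.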

@AndrewVSutherland AndrewVSutherland added the pending Use to flag pending issues during a workshop label Jul 11, 2016
@AndrewVSutherland
Member Author

Following the discussion on #1294 and a review of the outstanding issues on LMFDB/inventory, I propose that we delete the following collections from the mongo db on lmfdb.warwick.ac.uk and then close this issue:

http://beta.lmfdb.org/api/modularforms2/ap.chunks/
http://beta.lmfdb.org/api/modularforms2/ap.files/
http://beta.lmfdb.org/api/modularforms2/Atkin_Lehner.chunks/
http://beta.lmfdb.org/api/modularforms2/Atkin_Lehner.files/
http://beta.lmfdb.org/api/modularforms2/vector_on_basis.chunks/
http://beta.lmfdb.org/api/modularforms2/vector_on_basis.files/
http://beta.lmfdb.org/api/modularforms2/Newform_factors.chunks/
http://beta.lmfdb.org/api/modularforms2/Newform_factors.files/
http://beta.lmfdb.org/api/modularforms2/Modular_symbols.chunks/
http://beta.lmfdb.org/api/modularforms2/Modular_symbols.files/
http://beta.lmfdb.org/api/modularforms2/WebNewforms.chunks/
http://beta.lmfdb.org/api/modularforms2/WebNewforms.files/
http://beta.lmfdb.org/api/modularforms2/webmodformspace_dimension/
http://beta.lmfdb.org/api/modularforms2/maeda/
http://beta.lmfdb.org/api/modularforms2/converted_E/
http://beta.lmfdb.org/api/ap_statistics/ap_data/

None of these collections are present on www.lmfdb.org, and none of them are referenced anywhere in the LMFDB code base; @sehlen, @fredstro does this list look OK to you? If there are any collections listed above that you think should be kept, please say so now (they will still be available on backups).
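
The selection criterion stated above (on beta, not on www, not referenced in the code) amounts to a set difference. A minimal sketch, with a hypothetical function name and small illustrative subsets rather than the full collection lists:

```python
def deletion_candidates(beta, www, referenced):
    """Collections on beta that are neither on www nor referenced in the code."""
    return sorted(set(beta) - set(www) - set(referenced))


# Small illustrative subsets, not the full collection lists.
beta = ["ap.chunks", "ap.files", "maeda", "webnewforms.files", "dimension_table"]
www = ["webnewforms.files", "dimension_table"]
referenced = ["webnewforms.files", "dimension_table"]
print(deletion_candidates(beta, www, referenced))
```

Such a list is only a starting point for review, since a collection may hold research data that is needed even though the website never reads it.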

We also plan to eventually remove the database Lfunction (as opposed to Lfunctions), but this is covered under a separate issue #1456 and will be done only after references to it in the code are removed.

Once this is done we can proceed with #1294 and close that issue as well.

@fredstro
Contributor

As I said around May when this question came up in another thread or email discussion about modular forms (they are all so full of information now that I can't find which one it was):
As far as I know you can delete all of them. I actually thought they were deleted after I said this the last time, but I didn't bother to check.

@sehlen
Contributor

sehlen commented Jul 16, 2016

Fredrik, this is about Warwick now. I don't have time to look at it right
now but we don't want the modular symbols collections etc to be deleted
there without moving/copying them to another mongodb first, I guess...


@AndrewVSutherland
Member Author

@fredstro Thanks for the confirmation!

@AndrewVSutherland
Member Author

@sehlen, @fredstro If you still need Modular_symbols.chunks and Modular_symbols.files, that's fine, I can remove them from the list of collections to be removed (they aren't all that big), I just need to know.

@AndrewVSutherland
Member Author

Given that we have not heard anything further from @sehlen or @fredstro I would propose that we go ahead and delete all the collections listed in #1431 (comment) except for modularforms2/Modular_symbols.* and plan to proceed with #1294 (perhaps this weekend @edgarcosta, @JohnCremona ?).

@JohnCremona
Member

Sounds good to me. Perhaps do it to finish before the weekly backup starts at midnight Saturday GMT (which is 2300 BST so I guess 1800 Saturday on the East coast).

@edgarcosta
Member

FYI, I will be flying back to the US over this weekend.

@AndrewVSutherland AndrewVSutherland modified the milestone: v1.1 Jul 20, 2016
@fredstro
Contributor

OK, when I wrote my answer I was reading on my phone and didn't see that this concerned the mongodb on lmfdb in Warwick, since I saw no obvious reason to delete any of the mentioned collections from that server.
After the previous discussions in May (around issue #1376) I believed that we had sufficient space to store these collections there, and that the conclusion was that we would decide which collections are needed for the website and copy those to the cloud, while the lmfdb.warwick server could continue to hold collections with "primitive" research data not necessarily presented on the website.
All of the mentioned collections except the last 3 (maeda, ap_data and converted_E) are used to construct the WebNewForms, but they are not accessed from the website, so if the new rule is that lmfdb.warwick.ac.uk should only contain data for the website (exactly as the cloud servers do) then they can safely be removed.
Since it seems that you are running out of space, we (Stephan and I) can try to find somewhere else to store the modular forms data. Since my last personal backup was in May, I am now making another dump so we don't lose anything computed by Stephan after that point.

After Stephan's latest rewriting I wasn't sure exactly which collections contain what and which ones we would like to keep. I had hoped we had time for Stephan to finish his move to Germany and get settled before the storage issues became critical.

There are too many messages from github at the moment so my gmail just dumps everything into spam even though I try to mark them as non-spam. It would be good if there was a setting so that some of these messages only got sent to the release managers so I could actually follow the relevant discussions (as I don't want to sync my massive spam folder to my phone).

@AndrewVSutherland
Member Author

@fredstro I will delete the last 3, but are you sure you really need all the others?

For example, are the collections

webnewforms.chunks and webnewforms2.chunks and WebNewForms.chunks
webnewforms.files and webnewforms2.files and WebNewForms.files
webmodformspace.chunks and webmodformspace2.chunks and WebModformspace.chunks
webmodformspace.files and webmodformspace2.files and WebModformspace.files

all necessary? Perhaps just 2 (rather than 3) variants of each of these would suffice?

BTW, you can browse the full list of collections on the Warwick mongo db modularforms2 database at http://beta.lmfdb.org/api/ (scroll down to see modularforms2).
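
Near-duplicate names like the variants listed above can be spotted mechanically by normalising letter case and trailing digits. This is a hypothetical helper for illustration, not something in the LMFDB code, and the normalisation rule is an assumption about which names count as variants of one another.

```python
from collections import defaultdict


def group_name_variants(names):
    """Group names that differ only by letter case or a trailing digit suffix.

    Returns {normalised key: sorted list of names} for keys with 2+ names.
    """
    groups = defaultdict(list)
    for name in names:
        # Normalise only the part before the first dot, so that the
        # ".chunks" and ".files" suffixes are kept apart.
        base, dot, suffix = name.partition(".")
        key = base.lower().rstrip("0123456789") + dot + suffix
        groups[key].append(name)
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}


names = [
    "webnewforms.chunks", "webnewforms2.chunks", "WebNewForms.chunks",
    "webmodformspace.files", "webmodformspace2.files", "WebModformspace.files",
    "dimension_table",
]
for key, found in group_name_variants(names).items():
    print(key, found)
```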

@AndrewVSutherland
Member Author

@fredstro One thing you can do to reduce the number of github e-mails you get is to go to https://github.com/settings/notifications and turn off e-mails for repositories you are watching (you will still get e-mails for conversations you are participating in (like this one), and for comments that @mention you explicitly). You can also unclick the notifications for Pull Request Reviews and Pull Request Pushes (I think you will still get notifications regarding pull requests you initiate).

@JohnCremona
Member

It is not that we are running out of space on the Warwick server, but we still want to delete things which are not needed, since not only do they use up space but so do their copies on the weekly backups. We do have more disks which can be brought online but I am delaying that until a lot of other things have finished since last time I did that it was quite disruptive.

@fredstro
Contributor

fredstro commented Jul 20, 2016

Ok.
I was responding to the collections you (Drew) listed above which did not include the lower case versions.
@JohnCremona I see, I can understand that, although the largest collection is that of aps, which is quite important (since there might be a lot of them which are not converted to the web format yet).
Most of the 'random' collections are just used for testing and could probably be removed but I wouldn't do it before making sure that Stephan doesn't need them.

I can only tell you which collections were used before Stephan's May refactoring (I am not 100% certain he did not change the names or introduce anything new). Namely the following:

I don't recognise a lot of the collections in /api/ so I wouldn't remove any of them without checking that Stephan is done with the testing/checking/recomputations which began in May.
@sehlen : from my point of view, I don't need to keep any of the collections other than the ones mentioned above together with the ones used for the webobjects (whatever names they may have)

Oh, and thanks for the hint about the notifications!

@JohnCremona
Member

@fredstro It would be great to put some of the information above into https://github.com/LMFDB/lmfdb-inventory/blob/master/db-modularforms2.md -- even a short description without the detail (which may not even make sense for these files/chunks things).

@AndrewVSutherland
Member Author

@fredstro Thank you for the clarification, this is very helpful!

Among the larger collections (those > 1GB; there are also many smaller ones that I expect can be deleted, but there is no rush to deal with those), the only remaining candidates for deletion are:

http://beta.lmfdb.org/api/modularforms2/WebNewforms.chunks/
http://beta.lmfdb.org/api/modularforms2/WebNewforms.files/

which look to me like old versions of webnewforms.* (and there is also webnewforms2.*).

@sehlen Can you confirm that it is OK to delete these two collections (or alternatively, confirm that they are needed)? Once that is done I think we can close this issue.

@fredstro
Contributor

Ok, I will try to fill in some of the documentation there. I thought it was just for the collections which are used on the website (for those I can write what I think they contain, but it might not be correct since Stephan might have changed something).
I also thought it only applied to collections which could be obtained in json format; I didn't think about all the gridfs collections.
I am not sure what we really intend with the api, but if we really want users to interface with it we should probably give an openapi/swagger spec of all endpoints and only allow access to fully described endpoints.

@JohnCremona
Member

Thanks -- obviously this is most important to have the details for collections in use on the website, but for anything which is not strictly temporary (say, you expect it to still be useful in a few months' time) then having a small note about it would stop others from asking you if it can be deleted!

@sehlen
Contributor

sehlen commented Jul 20, 2016

On Jul 20, 2016, at 09:50, Andrew Sutherland notifications@github.com wrote:

@fredstro Thank you for the clarification, this is very helpful!

Among the larger collections (those > 1GB, there are also many smaller ones that I expect can be deleted but there is no rush to deal with those), the only remaining candidates for deletion are:

http://beta.lmfdb.org/api/modularforms2/WebNewforms.chunks/
http://beta.lmfdb.org/api/modularforms2/WebNewforms.files/

I don’t think they are used. In fact, I did not add new collections (I think).
Maybe @fredstro could take a quick look at the data. If you have a backup, Fredrik, then that’s also fine, I guess.

which look to me like old versions of webnewforms.* (and there is also webnewforms2.*).

I think @fredstro used webnewforms2 when he started cleaning up data last year before the release.

@sehlen Can you confirm that it is OK to delete these two collections (or alternatively, confirm that they are needed)? Once that is done I think we can close this issue.

I think so. But, as I said, I cannot really start looking at these things now as there are many things to finish before my family and I move back to Germany on Saturday.
Also, @fredstro, if you start documenting, please have a look at my fork of the inventory repo, where I started the documentation to the best of my knowledge.


@JohnCremona
Member

Indeed https://github.com/sehlen/lmfdb-inventory says that it is 5 commits ahead of the master, and all those commits are called "Update db-modularforms2.md". Pulling them into the master would be a good place to start.

@fredstro
Contributor

ah, damn it, I started with the same collections as you did but with the format similar to the L-functions.

@sehlen
Contributor

sehlen commented Jul 20, 2016

Sorry, but I mentioned this at least three times now.


@fredstro
Contributor

That's alright, I just thought that anything you added would have been pushed to master.

@sehlen
Contributor

sehlen commented Jul 20, 2016

On Jul 20, 2016, at 12:14, Fredrik Strömberg notifications@github.com wrote:

That's alright, I just thought that anything you added would have been pushed to master.

No, I didn’t issue a pull request because I was not nearly done yet and the information is partly “wrong” because of that.


@fredstro
Contributor

After checking I dropped the following collections:

  • test
  • dimension_table_old
  • webnewforms2 + .files + .chunks
  • webmodformspace2 + .files + .chunks
  • WebModformspace.files + .chunks
  • WebNewforms.files + .chunks
  • file_checked
  • ambient_mongo_checked
  • checked_stuff
  • webmodformspace_errors
  • character_orbits.files + .chunks
  • aps_mongo_checked
  • dimensions (this was used by the now obsolete (I think) class DimensionTable in emf_classes.py)

The remaining collections are used (for the moment) and should be kept, and they are all documented (if I missed some let me know).

@AndrewVSutherland
Member Author

@fredstro Thank you! I am now happy to close this issue.
