scholarpedia.org #32
Comments
SGTM! We can do this once #20 is resolved. |
There is a newer version now available at https://archive.org/details/wiki-scholarpediaorg-20151102 |
The articles are here. @DataWraith Feel like converting these to HTML? :) It's quite a bit smaller than wikipedia, so should hopefully be less problematic. |
Heh. Eventually I'd like to write a program that converts a MediaWiki dump to HTML (probably by running it through pandoc), but right now I'm fairly busy, sorry. I could only do the Wikipedia dump because a third party provided a dump in the OpenZIM format, and an easy-to-use library was available for reading and converting that. With a raw XML dump, I'd have to roll my own solution, which would take more time than I currently have. |
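The pandoc route mentioned above could be sketched roughly like this. Pandoc does ship a `mediawiki` reader, so the command itself is real, but the function names and error handling here are my own assumptions, not an existing tool:

```python
import shutil
import subprocess

def pandoc_cmd(src: str, dest: str) -> list:
    """Build a pandoc invocation that converts MediaWiki markup to HTML."""
    return ["pandoc", "--from", "mediawiki", "--to", "html",
            "--output", dest, src]

def convert(src: str, dest: str) -> None:
    """Run pandoc on a single extracted article, if pandoc is installed."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    subprocess.run(pandoc_cmd(src, dest), check=True)
```

This only covers plain markup, of course; templates, extension tags like `<math>`, and interwiki links would still need the full MediaWiki stack discussed below.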
(@vitzli thanks for updating the archive in archive.org) |
@DataWraith No worries. I might have a go at getting it to render with https://github.com/davidar/markup.rocks @vitzli didn't realise you were the one who pushed the updated copy - thanks :) |
I took another look at this and wanted to share what I found, in case it is useful to the next person.

Extracting the article markup from the XML dump is pretty easy, actually. But just having the article markup doesn't really gain you much. Simple articles can be rendered through

I think our best bet is for someone to actually set up a MediaWiki instance, use MWDumper to load the dump, and then export to HTML with mwoffliner. From what I can tell, this is the workflow that was used to create the HTML content for the ZIM files I used to dump Wikipedia. The entire process is pretty convoluted though (database, MediaWiki, Redis, Node...), so I'm currently not willing to tackle it. If I were to do it, I'd probably try to set up everything in Docker containers with Docker Compose, so that it is repeatable and applicable to other wiki dumps.

Edit: Okay, so I couldn't resist fiddling around with this, despite my earlier words. It took much less time than I estimated too, because I could draw on pre-made Docker images. The hard part (MWDumper) is yet to come, but I'm confident I'll have this figured out soonish, maybe even this weekend. |
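For the record, the "extracting the article markup is pretty easy" part can be sketched in a few lines of Python. The export namespace varies with the MediaWiki version, so this sniffs it from the root element rather than hard-coding a version; everything else (names, structure) is my own illustration, not code from this thread:

```python
import xml.etree.ElementTree as ET

def extract_articles(xml_text: str):
    """Yield (title, wikitext) pairs from a MediaWiki XML dump."""
    root = ET.fromstring(xml_text)
    # The export namespace is declared on the root <mediawiki> element,
    # e.g. {http://www.mediawiki.org/xml/export-0.10/}mediawiki.
    ns = root.tag.split("}")[0].strip("{")
    for page in root.iter(f"{{{ns}}}page"):
        title = page.findtext(f"{{{ns}}}title")
        text = page.findtext(f"{{{ns}}}revision/{{{ns}}}text")
        yield title, text or ""
```

For a dump of any real size you would want `ET.iterparse` instead of `fromstring` to avoid loading the whole file into memory, but the idea is the same.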
sigh This is much harder than it looked in the beginning. I realize I'm flip-flopping on this a lot -- I should've kept my mouth shut from the beginning. Anyway, this post is as much for venting as for information's sake, so feel free to ignore it.

I wanted the process of creating HTML dumps from XML dumps to be repeatable, so I set up everything in Docker containers. It turns out that the pre-made Docker containers I could find for the necessary software are mostly outdated, so after running into version incompatibilities I had to make them from scratch.

I managed to set up a local MediaWiki instance with a MySQL database and import the Scholarpedia dump using MWDumper in an automated and repeatable fashion. Getting MediaWiki to render mathematical equations, however, took the better part of the weekend (TeX didn't work at all, no matter what I tried, so I had to switch to Mathoid, which meant getting yet another web service up and running...), and it's still not working to my satisfaction (it occasionally returns HTTP 400 -- Bad Request). It doesn't help that the documentation on any of this is extremely sparse.

The entire process looks like this:
Remaining work
|
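For anyone retracing this, the stack described above (MySQL, MediaWiki, Mathoid) could be wired together in a Compose file roughly like the following. Image names, versions, and ports are my guesses, not the configuration actually used in this thread:

```yaml
# Hypothetical docker-compose sketch -- service and image names are
# assumptions, not the author's actual setup.
services:
  db:
    image: mysql:5.6
    environment:
      MYSQL_ROOT_PASSWORD: wiki
      MYSQL_DATABASE: wikidb
  mediawiki:
    image: mediawiki        # official MediaWiki image
    ports:
      - "8080:80"
    depends_on:
      - db
  mathoid:
    build: ./mathoid        # built from the Mathoid repo; no official image assumed
```

MWDumper would then be run against `db` to load the XML dump, and mwoffliner pointed at `mediawiki` to export the rendered HTML.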
(sounds more doable, as in, less headache than latex->html) |
Parsoid is intended to be able to convert from MediaWiki markup to HTML and back in a lossless fashion (they do 'round trip testing'). I haven't noticed any mistakes with the conversion, but from what I gather from the limited documentation available, the conversion process isn't 100% perfect yet. The fact that they need to be able to make round trips also bloats the generated HTML somewhat. The files use absolute links too, so the additional step of using mwoffliner is necessary to produce an IPFS-suitable folder of files. I'll try to get that working next weekend (so that I have something to show even if the equations don't work quite right yet), but given my over-optimism so far, I don't want to promise anything. |
Hrm, it's unfortunate that MediaWiki is such a beast. I've also converted it to a GitHub Wiki (example). It's somewhat passable, but definitely not perfect. |
LICENSE: CC BY-NC-SA 3.0 [1]
Like SEP but for science, e.g. http://www.scholarpedia.org/article/Faddeev-Popov_ghosts by Faddeev himself.
There is an outdated archive at https://archive.org/details/wiki-scholarpediaorg_w.
[1] http://www.scholarpedia.org/article/Scholarpedia:Terms_of_Use#Scholarpedia.27s_Licenses_to_You.2C_and_Your_license_to_parties_other_than_Scholarpedia