Parsoid? #1

Closed
legoktm opened this issue Dec 15, 2014 · 2 comments

Comments

legoktm commented Dec 15, 2014

Have you looked into the output that Parsoid produces? It converts wikitext into HTML/RDFa (and back!) and is expected to eventually replace the current MediaWiki parser.

An example is: http://parsoid-lb.eqiad.wikimedia.org/enwiki/Main_Page?oldid=598252063 (API docs at https://www.mediawiki.org/wiki/Parsoid/API).

@spencermountain
Owner

Hi Kunal, yes, thanks. Parsoid is certainly profoundly cool, and it does give me less angst. ;)
My hope is to get structured, queryable data from a Wikipedia dump. It's definitely easier to pick out data from HTML than from markup, especially plaintext, but that requires some kind of headless DOM across 4 million articles, and honestly there's more formatting to strip out of the HTML than out of the markup itself.
Short answer: I thought about it, then drank too much coffee.

Do you know if there's some kind of intermediate step before Parsoid makes its HTML?
Like, some kind of representation that is more hackable?
I would be very excited if so.
Thanks for the heads-up, too.
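
For anyone weighing the HTML route described above, here is a rough sketch of pulling plaintext out of rendered HTML with a streaming parser instead of a headless DOM. It uses only Python's standard `html.parser`. The sample markup, the `SKIP_TAGS` list, and the helper names are all made up for illustration; this is not how this project or Parsoid does it, and real article HTML would need a much longer skip list.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collect text nodes, ignoring subtrees treated as formatting noise."""

    # Hypothetical skip list, chosen only for this example.
    SKIP_TAGS = {"table", "sup", "style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped subtree
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the kept text and collapse runs of whitespace.
        return " ".join("".join(self._chunks).split())


def html_to_plaintext(html):
    """Return the visible text of an HTML fragment, minus skipped subtrees."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Invented sample, loosely shaped like rendered article HTML.
SAMPLE_HTML = (
    '<p><b>Toronto</b> is a city in '
    '<a rel="mw:WikiLink" href="./Canada">Canada</a>.'
    '<sup class="reference">[1]</sup></p>'
    '<table class="infobox"><tr><td>Population</td></tr></table>'
)

print(html_to_plaintext(SAMPLE_HTML))  # Toronto is a city in Canada.
```

Even in this toy case the skip list is doing most of the work, which is the "more formatting stuff to remove" problem in miniature.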

@spencermountain
Owner

added a section to the readme, thanks

spencermountain pushed a commit that referenced this issue Jun 5, 2015
removed unwanted "{{Gallery" and "{{Taxobox" from
spencermountain pushed a commit that referenced this issue Apr 21, 2017
Hit callback with null for searches that don't lead to anything useful
spencermountain pushed a commit that referenced this issue Jan 9, 2018
Interlanguage links interfere with wikibook/cookbook links
spencermountain pushed a commit that referenced this issue May 3, 2018
spencermountain pushed a commit that referenced this issue Jun 28, 2019