Automatically extract information from http://minecraft.gamepedia.com/ #8

Closed
rom1504 opened this issue Mar 27, 2015 · 19 comments

Comments

@rom1504
Member

rom1504 commented Mar 27, 2015

There are many ways to extract the data that should go into minecraft-data, as discussed in PrismarineJS/mineflayer#229.
This issue tracks progress on extracting information from the wiki at http://minecraft.gamepedia.com/.

@rom1504
Member Author

rom1504 commented Mar 27, 2015

The current recipe scripts in the bin/ folder don't produce data in the new recipe format.
To do that, we need the name -> [id, metadata] mapping (#7), so these scripts can only be updated once we can extract the item/block data.
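
To make the dependency concrete, here is a minimal sketch of what a name -> [id, metadata] lookup could look like once the item/block data is extracted. The three entries and the `resolveIngredient` helper are illustrative assumptions, not the actual minecraft-data format.

```javascript
// Illustrative name -> [id, metadata] table (hand-written sample, not extracted data).
const nameToIdMeta = new Map([
  ['Stone', [1, 0]],
  ['Granite', [1, 1]],
  ['Gravel', [13, 0]],
]);

// Hypothetical helper: turn a wiki display name into an { id, metadata } pair.
function resolveIngredient(name) {
  const entry = nameToIdMeta.get(name);
  if (!entry) {
    throw new Error(`Unknown item or block name: ${name}`);
  }
  const [id, metadata] = entry;
  return { id, metadata };
}

console.log(resolveIngredient('Granite'));
```

A recipe script would call something like this for every ingredient name it reads from the wiki, which is why it has to wait for the item/block extraction.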

@rom1504
Member Author

rom1504 commented Mar 27, 2015

For this extraction, I don't want to use HTML anymore.
Wikitext is much easier to parse.

There are a few ways to get that wikitext:

  • getting the wikitext from the edit page
  • using the API (need to check whether it's actually possible to use it)
  • asking the team in charge of the wiki whether they'd be OK with providing database dumps (see https://dumps.wikimedia.org/ for an example)

The problem with dumps is that even if they agree to export them, I don't know how regularly they would do so, and the wiki content changes regularly.

@rom1504
Member Author

rom1504 commented Mar 27, 2015

Using the API is indeed possible (example).

Instead of calling it manually, let's use https://github.com/macbre/nodemw.
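
For reference, this is roughly the request a manual call would make. It is a sketch using the standard MediaWiki `action=query` / `prop=revisions` parameters; the `api.php` path on minecraft.gamepedia.com and the `buildWikitextUrl` helper name are assumptions, and no network call is made here.

```javascript
// Build a MediaWiki API URL that asks for the raw wikitext of one page.
// Assumption: the wiki exposes its API at <host>/api.php.
function buildWikitextUrl(apiBase, title) {
  const url = new URL(apiBase);
  url.search = new URLSearchParams({
    action: 'query',      // standard MediaWiki query module
    prop: 'revisions',    // ask for page revisions
    rvprop: 'content',    // include the revision text (wikitext)
    format: 'json',
    titles: title,
  }).toString();
  return url.toString();
}

const exampleUrl = buildWikitextUrl('http://minecraft.gamepedia.com/api.php', 'Gravel');
console.log(exampleUrl);
```

A client library such as nodemw wraps this kind of request, so the extractor scripts don't have to build URLs or unpack the JSON response themselves.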

@rom1504
Member Author

rom1504 commented Mar 27, 2015

There's also a wikitext parser written in node.js (https://github.com/spencermountain/wtf_wikipedia).
Parsoid seems like a more advanced parser, but its purpose is generating HTML, so it doesn't seem suitable (spencermountain/wtf_wikipedia#1).

https://github.com/spencermountain/wtf_wikipedia doesn't work on the Minecraft wiki: tested on Blocks, it can't find the table, and on Gravel, it can't read the infobox.
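
Since the existing parser can't read the infobox, a small hand-written reader may be enough. The sketch below is an assumption-heavy illustration: the sample wikitext, the `Block` template name and the parameter names are made up for the example, and the parser does not handle nested templates or `|` inside values.

```javascript
// Read "key = value" parameters from the first occurrence of a template.
// Limitation: stops at the first "}}", so nested templates are not supported.
function parseTemplateParams(wikitext, templateName) {
  const start = wikitext.indexOf('{{' + templateName);
  if (start === -1) return null;
  const end = wikitext.indexOf('}}', start);
  if (end === -1) return null;

  const body = wikitext.slice(start + 2 + templateName.length, end);
  const params = {};
  for (const part of body.split('|')) {
    const eq = part.indexOf('=');
    if (eq === -1) continue; // skip empty or positional parts
    params[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
  }
  return params;
}

// Hypothetical infobox, written by hand for this example.
const sampleWikitext = [
  '{{Block',
  '| title = Gravel',
  '| stackable = Yes (64)',
  '| tool = shovel',
  '}}',
  'Gravel is a block.',
].join('\n');

console.log(parseTemplateParams(sampleWikitext, 'Block'));
```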

rom1504 added a commit that referenced this issue Mar 28, 2015
@rom1504
Member Author

rom1504 commented Apr 19, 2015

http://minecraft.gamepedia.com/Data_values is important.

The current names in blocks.json and items.json don't correspond to anything. Wouldn't it be better to replace them with the "nameid", for example swordDiamond -> diamond_sword (or even minecraft:diamond_sword)?

@rom1504
Member Author

rom1504 commented Apr 19, 2015

http://minecraft.gamepedia.com/Data_values/Block_IDs and http://minecraft.gamepedia.com/Data_values/Item_IDs should be used for the list of blocks and items (they even say whether each block or item can have metadata). The parsing would be similar to https://github.com/PrismarineJS/minecraft-data/blob/master/bin/wiki_extractor/entities_extractor.js. More data can then be found on the page of each block/item.
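
As a rough idea of what parsing such a list involves, here is a sketch that reads rows from a plain MediaWiki table (`{|`, `|-`, `||`, `|}`). The sample table is hand-written and simplified; the real Block_IDs page may use templates or extra columns, and this is not the code in entities_extractor.js.

```javascript
// Parse a simple wikitext table into an array of rows (arrays of cell strings).
// Header lines starting with "!" are skipped.
function parseWikiTable(wikitext) {
  const rows = [];
  let current = [];
  for (const rawLine of wikitext.split('\n')) {
    const line = rawLine.trim();
    if (line.startsWith('{|')) continue;           // table start
    if (line.startsWith('|}')) {                   // table end
      if (current.length > 0) rows.push(current);
      break;
    }
    if (line.startsWith('|-')) {                   // row separator
      if (current.length > 0) rows.push(current);
      current = [];
      continue;
    }
    if (line.startsWith('!')) continue;            // header cells
    if (line.startsWith('|')) {                    // data cells on one line
      current.push(...line.slice(1).split('||').map((cell) => cell.trim()));
    }
  }
  return rows;
}

// Hypothetical, simplified table for illustration only.
const sampleTable = [
  '{| class="wikitable"',
  '! Dec !! Name',
  '|-',
  '| 1 || Stone',
  '|-',
  '| 13 || Gravel',
  '|}',
].join('\n');

const blockList = parseWikiTable(sampleTable).map(([id, displayName]) => ({
  id: Number(id),
  displayName,
}));
console.log(blockList);
```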

@rom1504
Member Author

rom1504 commented Apr 23, 2015

@rom1504
Member Author

rom1504 commented Apr 25, 2015

Item extraction is done.

Now working on block extraction:

  • id
  • name
  • displayName
  • hardness
  • stackSize
  • diggable
  • boundingBox
  • material
  • harvestTools

material goes along with materials.json. The problem is that materials.json seems to have been written manually and doesn't correspond to anything specific in the wiki. The most closely related page is http://minecraft.gamepedia.com/Breaking#Best_tools, but I don't really know whether it's possible to write materials.json from it.
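
To keep track of which of the fields listed above an extracted block still lacks, a small check like the following could help. It is only a sketch: the `missingBlockFields` helper is hypothetical and the sample values are illustrative, not taken from blocks.json.

```javascript
// The block fields listed in this comment.
const BLOCK_FIELDS = [
  'id', 'name', 'displayName', 'hardness', 'stackSize',
  'diggable', 'boundingBox', 'material', 'harvestTools',
];

// Hypothetical helper: report which expected fields an extracted block lacks.
function missingBlockFields(block) {
  return BLOCK_FIELDS.filter((field) => !(field in block));
}

// Illustrative sample entry (values are assumptions for the example).
const exampleBlock = {
  id: 13,
  name: 'gravel',
  displayName: 'Gravel',
  hardness: 0.6,
  stackSize: 64,
  diggable: true,
  boundingBox: 'block',
  material: 'dirt',
};

console.log(missingBlockFields(exampleBlock));
```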

@rom1504
Member Author

rom1504 commented Apr 25, 2015

material: done. materials.json will probably stay manual.

Only harvestTools is missing.

@rom1504
Member Author

rom1504 commented Apr 27, 2015

blocks.json done !

@rom1504
Member Author

rom1504 commented Apr 27, 2015

Total progress :

  • entities
  • items
  • blocks
  • materials : manual file (very simple + some edge cases only present in the text on the wiki)
  • biomes
  • instruments : manual from http://wiki.vg/Block_Actions
  • recipes

@rom1504
Member Author

rom1504 commented Apr 27, 2015

@rom1504
Member Author

rom1504 commented May 1, 2015

Shapeless means the recipe has multiple possible shapes. So recipes with only one item, or with the same item 9 times, are shaped recipes (see http://minecraft.gamepedia.com/Module_talk:Crafting#Shapeless_recipes_marked_as_shaped_recipes and http://minecraft.gamepedia.com/Template_talk:Crafting#remove_shapeless_indicator_when_unambiguous).
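
The rule above can be sketched as a small predicate. This is an illustration under assumptions: the `isEffectivelyShaped` name and the representation of a recipe as a flat array of ingredient names are mine, not the format used by the recipe extractor.

```javascript
// Decide whether a recipe marked as shapeless should be treated as shaped.
// A single ingredient, or 9 copies of the same ingredient, has only one shape.
function isEffectivelyShaped(ingredients) {
  if (ingredients.length === 1) return true;
  return (
    ingredients.length === 9 &&
    ingredients.every((item) => item === ingredients[0])
  );
}

console.log(isEffectivelyShaped(['log']));                       // one item
console.log(isEffectivelyShaped(Array(9).fill('iron_ingot')));   // 9 identical items
console.log(isEffectivelyShaped(['flint', 'iron_ingot']));       // two different items
```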

@rom1504
Member Author

rom1504 commented May 2, 2015

recipes done.

@rom1504
Member Author

rom1504 commented May 2, 2015

only biomes missing.

@rom1504
Member Author

rom1504 commented May 2, 2015

For biomes : see PrismarineJS/mineflayer#197

@rom1504
Member Author

rom1504 commented May 2, 2015

The current biome values cannot really be extracted automatically: I added a line in the wiki about how to extract them semi-automatically.

@rom1504
Member Author

rom1504 commented May 2, 2015

All the .json files now have an extraction procedure! Closing.
The next step is metadata extraction, but that will be handled in other issues.

rom1504 closed this as completed May 2, 2015