Contributing
If you are sure that Infoboxer parses some page incorrectly, please create an issue with a link to the page (or the raw wikitext) and a description of the problem.
Look at the en.wikipedia.org template definitions. They can be extended. Also, similar definitions can (and should) be created for other-language Wikipedias and other popular wikis.
You can submit pull requests with your own definitions, or create an issue describing which template definitions should be added to Infoboxer.
If you want to fix a bug or implement a feature, please follow the standard GitHub open source process: fork, fix, push, make a pull request.
Some (admittedly scant) pointers are below.
- Infoboxer is split into several modules (clearly visible in the API docs and the folder structure).
- Most "easy features" can be added to the Navigation module and its submodules: enhancing the navigational experience and implementing clever shortcuts (like "converting a table to a dataframe/list of hashes", for example).
- Most potential bugs are likely to sit in the Parser class and its modules; MediaWiki markup IS tricky, tightly coupled and ambiguous; there are also some unimplemented features, like <source> tag parsing and template definition pages (which, possibly, are not a target of Infoboxer anyway).
- The most underfeatured area is MediaWiki: for information extraction purposes it seems reasonable to have more features from the MediaWiki API, like "page list generators", search, "what links here" and similar functionality.
- The Templates module, the still underloved heart of Infoboxer, needs the most clarification and documentation.
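To make the "table to list of hashes" shortcut mentioned in the list more concrete, here is a minimal sketch of the idea in plain Ruby. It does not use Infoboxer's actual API: the `table_to_hashes` helper and the array-of-rows input are assumptions made for illustration only, and a real shortcut in the Navigation module would work on parsed table nodes instead.

```ruby
# Hypothetical helper, NOT part of Infoboxer's API.
# Assumes the table is already reduced to an array of rows,
# where the first row holds the column headers.
def table_to_hashes(rows)
  header, *body = rows
  return [] if header.nil?

  # Normalize header cells into string keys.
  keys = header.map { |cell| cell.to_s.strip }

  # Pair each data row with the keys; cells missing from a short row become nil.
  body.map { |row| keys.zip(row).to_h }
end

rows = [
  ['Country', 'Capital'],
  ['Argentina', 'Buenos Aires'],
  ['Chile', 'Santiago']
]

capitals = table_to_hashes(rows)
puts capitals.first['Capital']
```

A list of hashes like this is easy to feed into whatever dataframe-like library a user prefers, which is why it makes a reasonable target for a navigation shortcut.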
If you want to put your hands on the Parser, please remember that
it's hand-crafted and thoroughly optimized. Your first thought may be
that it needs more OO decomposition, a class for each case; or more
idiomatic Ruby, or ... Trust me, I've tried it all. But when you are
dealing with hundreds of thousands of parsing operations and tens of
thousands of resulting nodes, it turns out that even the simplest
things, like Object#tap, have a performance penalty over a large
number of calls.
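If you want to check this kind of claim on your own machine before proposing a Parser refactoring, a rough micro-benchmark with Ruby's standard `Benchmark` module is enough. The sketch below is not taken from Infoboxer: the two `build_*` methods and the hash standing in for a node are hypothetical, and the size of the difference (if any) depends on the Ruby version and implementation.

```ruby
require 'benchmark'

# Number of calls to simulate; roughly the "hundreds of thousands" scale.
iterations = 200_000

# Hypothetical stand-in for node construction: plain local variable.
def build_plain(index)
  node = { index: index }
  node[:text] = 'x'
  node
end

# Same result, but routed through Object#tap and a block call.
def build_with_tap(index)
  { index: index }.tap { |node| node[:text] = 'x' }
end

Benchmark.bm(8) do |bm|
  bm.report('plain:') { iterations.times { |i| build_plain(i) } }
  bm.report('tap:')   { iterations.times { |i| build_with_tap(i) } }
end
```

Both methods return the same value, so any difference in the reported times comes from the extra method and block call, not from the work itself. Measure on realistic input before and after any change to the Parser.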