Statistical Network Analysis of the Field of Statistics on Wikipedia In English
The Python script that extracts the edges of our graphs is inspired by brianckeegan's great Wikipedia-Network-Analysis Python notebook, available on GitHub under the MIT License.
A statistical analysis in French is available in the report folder.
We build a directed graph from the links between Wikipedia articles related to the field of Statistics (you can quite easily change the field if you want).
We have two solutions to get pages related to Statistics:
- Using Category:Statistics
- Using lists of articles about statistics (List_of_statistics_articles and Outline_of_statistics) featured in the Portal:Statistics
See Extract_links_from_API.py for more details. We strongly recommend using the second solution.
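As a rough illustration of the two solutions, the sketch below builds the MediaWiki API query parameters one could use for each: `list=categorymembers` to enumerate the pages of a category, and `prop=links` to enumerate the outgoing links of a list page. The helper names (`category_members_params`, `page_links_params`, `build_url`) are hypothetical and are not taken from Extract_links_from_API.py. The sketch only builds URLs and sends no request; it uses the Python 3 standard library rather than wikitools.

```python
from urllib.parse import urlencode

# Public endpoint of the English Wikipedia MediaWiki API.
API_URL = "https://en.wikipedia.org/w/api.php"


def category_members_params(category, limit=500):
    """Parameters listing the pages of a category (first solution)."""
    return {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": limit,
        "format": "json",
    }


def page_links_params(title, limit=500):
    """Parameters listing the outgoing links of a page (second solution)."""
    return {
        "action": "query",
        "prop": "links",
        "titles": title,
        "plnamespace": 0,  # keep only links to the article namespace
        "pllimit": limit,
        "format": "json",
    }


def build_url(params):
    """Hypothetical helper: encode the parameters into a full request URL."""
    return API_URL + "?" + urlencode(sorted(params.items()))


url_category = build_url(category_members_params("Category:Statistics"))
url_list = build_url(page_links_params("List_of_statistics_articles"))
```

Both queries return results in batches, so a real extraction would also have to follow the continuation values returned by the API until every page has been listed.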
- The data in the edges1.csv and vertex1.csv files was extracted on 30/12/2014 using the first solution.
- The data in the edges2.csv and vertex2.csv files was extracted on 27/12/2014 using the second solution.
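A minimal sketch of how an edge list can be turned into a directed graph in Python is given below. It assumes that each row holds a source article in the first column and a target article in the second; the actual column layout of the edges files is not described here, so check it before reusing this. The function names and the three sample rows are made up for illustration.

```python
import csv
from collections import defaultdict


def read_directed_edges(rows):
    """Build a mapping source -> set of targets from (source, target) rows."""
    graph = defaultdict(set)
    for row in rows:
        if len(row) < 2:
            continue  # skip empty or malformed rows
        source, target = row[0].strip(), row[1].strip()
        graph[source].add(target)
        graph.setdefault(target, set())  # keep nodes with no outgoing link
    return dict(graph)


def in_degrees(graph):
    """Count, for every node, how many articles link to it."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


# Illustrative rows only, not taken from the real data files.
sample = ["Variance,Mean", "Median,Mean", "Mean,Variance"]
graph = read_directed_edges(csv.reader(sample))
degrees = in_degrees(graph)
```

To read a real file, pass `csv.reader(open_file)` instead of the sample list, skipping a header row first if the file has one.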
If you want to update this data, please donate to Wikimedia, because this operation is quite resource-intensive for the MediaWiki API.
Please consider the following problem pointed out by brianckeegan:
Wikipedia articles also contain templates (https://en.wikipedia.org/wiki/Help:Template) which create lots of "redundant" links between articles that share templates even though these links don't appear in the body of the article itself. You'll need to do much more advanced text parsing of wiki-markup to actually get links in the body of an article.
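One possible starting point for such parsing is sketched below. It assumes you already have the raw wikitext of an article as a string, removes `{{...}}` template calls (including nested ones), and then collects the targets of the remaining `[[...]]` links. This is a simplified illustration, not what the project scripts do: it ignores tables, references, comments and other markup, and the function names are hypothetical.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section]]; group 1 is the target.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def strip_templates(wikitext):
    """Remove {{...}} template calls, tracking nesting depth."""
    out = []
    depth = 0
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(wikitext[i])
            i += 1
    return "".join(out)


def body_links(wikitext):
    """Return link targets found outside templates, in order, without duplicates."""
    links = []
    for match in WIKILINK.finditer(strip_templates(wikitext)):
        target = match.group(1).strip()
        if ":" in target:
            # Crude filter for File:, Category:, etc.; it also drops
            # article titles that happen to contain a colon.
            continue
        if target not in links:
            links.append(target)
    return links
```

Calling `body_links` on a page's wikitext then gives candidate targets for the edges of that page, which can be intersected with the set of Statistics articles.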
- Python 2.7
- Anaconda
- wikitools, which you can install with pip: pip install wikitools
- R and its package igraph
The Python and R scripts are released under the MIT License.