Finding and analysing the least viewed articles on English Wikipedia. See my blog post *In search of the least viewed article on Wikipedia* for a writeup of this investigation.
In the course of this investigation, I looked at a few different sets of articles. In each case, the steps for processing them were basically the same.
The first step is to use Quarry to run a SQL query that generates a CSV file of page metadata (a sketch of such a query follows the list). The main datasets and corresponding queries were:
- A sample of around 32k articles having contiguous `page_random` values ranging from 0.5 to 0.505
  - NB: This was before I figured out the trick of calculating random gaps directly as part of the SQL query, so this dataset required calculating the gaps as a postprocessing step, using `gaps.py` (see the Python sketch after this list)
- Pages in the "Phaegopterina stubs" category
- The 600k articles having the smallest random gaps
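For illustration, here is a minimal sketch of the kind of Quarry query involved, including the window-function trick for computing random gaps directly in SQL. The `page` table and its columns are from the MediaWiki schema, but the exact thresholds and output columns are assumptions, not the precise queries I ran:

```sql
-- Sketch of a Quarry query against the enwiki replica (not the exact query used).
-- Selects non-redirect articles in the main namespace and computes, for each
-- page, the gap between its page_random value and the previous one.
SELECT
  page_id,
  page_title,
  page_random,
  page_random - LAG(page_random) OVER (ORDER BY page_random) AS random_gap
FROM page
WHERE page_namespace = 0       -- main (article) namespace
  AND page_is_redirect = 0
  AND page_random BETWEEN 0.5 AND 0.505;  -- the ~32k contiguous sample
-- For the smallest-gaps dataset, drop the BETWEEN filter and instead
-- ORDER BY random_gap LIMIT 600000 over the gap expression.
```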
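And a sketch of the gap postprocessing step. This is a hypothetical reconstruction of what `gaps.py` does, assuming the Quarry CSV has a `page_random` column; the column names and file names are assumptions:

```python
import csv

# Hypothetical reconstruction of the gaps.py postprocessing step:
# read the Quarry CSV, sort by page_random, and compute each row's gap
# from the previous page_random value. Column names are assumptions.
def add_gaps(in_path, out_path):
    with open(in_path, newline="") as f:
        rows = list(csv.DictReader(f))
    rows.sort(key=lambda r: float(r["page_random"]))

    prev = None
    for row in rows:
        value = float(row["page_random"])
        row["random_gap"] = "" if prev is None else value - prev
        prev = value

    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

add_gaps("quarry_sample.csv", "quarry_sample_with_gaps.csv")
```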
The next step is to run `get_views.py`, passing in the filename of the CSV downloaded from Quarry. This creates a CSV with a column for the article name, 12 columns of monthly page views in 2021 for that article, and a final convenience column with the total for the year.
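The script presumably gets its numbers from the Wikimedia Pageviews REST API. The following sketch shows one way to fetch the 2021 monthly counts for a single article; the endpoint is the real per-article pageviews API, but the function name, User-Agent string, and zero-fill behaviour are illustrative assumptions rather than what `get_views.py` actually does:

```python
from urllib.parse import quote
import requests

# Illustrative sketch (not the actual get_views.py): fetch monthly 2021
# pageview counts for one article from the Wikimedia Pageviews REST API.
API = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
       "en.wikipedia/all-access/user/{title}/monthly/20210101/20211231")

def monthly_views_2021(title):
    # The REST API expects underscores and percent-encoding in titles.
    encoded = quote(title.replace(" ", "_"), safe="")
    resp = requests.get(API.format(title=encoded),
                        headers={"User-Agent": "least-viewed-analysis script"})
    resp.raise_for_status()
    # One item per month that has data; months with no views are absent,
    # so fill them in with zero to get the full 12 columns.
    by_month = {item["timestamp"][:6]: item["views"]
                for item in resp.json()["items"]}
    monthly = [by_month.get(f"2021{m:02d}", 0) for m in range(1, 13)]
    return monthly + [sum(monthly)]  # 12 monthly columns plus the year total
```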
`merge.py` merges the CSVs from steps 1 and 2.
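A plausible sketch of that merge, assuming both CSVs share a page title column; the column name, join type, and file names here are assumptions:

```python
import pandas as pd

# Hypothetical sketch of merge.py: join the Quarry metadata with the
# per-article view counts on the page title column (name assumed).
metadata = pd.read_csv("quarry_sample_with_gaps.csv")
views = pd.read_csv("views_2021.csv")
merged = metadata.merge(views, on="page_title", how="inner")
merged.to_csv("merged.csv", index=False)
```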
The subsequent analysis and visualization of the merged data are done in the included IPython notebooks.