consume-rBeer

A spider/scraper for ratebeer.com

Goals:

Spider all known breweries, beers, styles, and users (public profiles and histories).
Parse each of these four page types and output the parsed data in various formats, probably an SQLite db to start.
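
The SQLite output goal could look roughly like the following. This is a minimal sketch, not the project's actual storage code: the `ratings` table, its columns, and the `save_ratings` helper are all hypothetical. The only thing taken from this README is that a beer's ratings are `(name, userid, rating)` tuples.

```python
import sqlite3


def save_ratings(conn, beer_id, ratings):
    """Store (name, userid, rating) tuples for one beer; return rows written.

    Hypothetical helper and schema, for illustration only.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS ratings (
               beer_id   INTEGER NOT NULL,
               user_id   INTEGER NOT NULL,
               user_name TEXT,
               rating    REAL,
               PRIMARY KEY (beer_id, user_id)
           )"""
    )
    rows = [(beer_id, userid, name, rating) for (name, userid, rating) in ratings]
    # The connection as a context manager commits on success, rolls back on error.
    with conn:
        # OR REPLACE lets a re-scrape overwrite an earlier rating by the same user.
        conn.executemany("INSERT OR REPLACE INTO ratings VALUES (?, ?, ?, ?)", rows)
    return len(rows)


# Made-up sample data standing in for a scraped ratings list.
conn = sqlite3.connect(":memory:")
save_ratings(conn, 14709, [("alice", 1, 4.1), ("bob", 2, 3.8)])
```

Keying on `(beer_id, user_id)` is a design guess that makes repeated scrapes idempotent; a real schema would likely also need tables for breweries, styles, and users.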

Simple Use:

To grab all the ratings for a beer (Stone Ruination IPA) and then make a histogram (using matplotlib's pylab):

> import Beer
> import pylab
> ruin = Beer.Beer(14709)
> ruin.scrape_user_rating_list()
> pylab.hist([rating for (name, userid, rating) in ruin.ratings], bins=20)

And you should see a nice 20-bin histogram of these ratings, which, as of 22 Jan 2012, peaked just above 4.0 with a Gaussian shape.

You can make a similar histogram but this time for every beer for a brewery (Redemption):

> import Brewery
> import pylab
> redemption = Brewery.Brewery(rb_id=11318)
> redemption.parse()
> for beer in redemption.beers:
>     beer.scrape_user_rating_list()
> pylab.hist([rating for beer in redemption.beers for (name, uid, rating) in beer.ratings], bins=20)

This one produces a few clusters of bin magnitudes, with the peak at around 3.5.
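
Pooling ratings across a brewery's beers needs the outer loop (over beers) written first in the comprehension, then the inner loop over each beer's ratings. The sketch below shows that ordering without touching the network: `pooled_ratings` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for scraped beers that only assume a `ratings` list of `(name, uid, rating)` tuples.

```python
from types import SimpleNamespace


def pooled_ratings(beers):
    """Flatten every beer's (name, uid, rating) tuples into one list of ratings."""
    # Loops read left to right: outer loop over beers, inner loop over ratings.
    return [rating for beer in beers for (name, uid, rating) in beer.ratings]


# Stand-in objects with made-up data; real code would pass redemption.beers.
fake_beers = [
    SimpleNamespace(ratings=[("alice", 1, 3.5), ("bob", 2, 3.7)]),
    SimpleNamespace(ratings=[]),  # a beer with no ratings contributes nothing
    SimpleNamespace(ratings=[("carol", 3, 4.0)]),
]
all_ratings = pooled_ratings(fake_beers)
```

The resulting flat list can be passed straight to `pylab.hist(all_ratings, bins=20)`.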

Neat!

Further sample usage of data grabbed with the scraper can be seen in notebooks/.
