Calculate 3rd down conversion rate #36

Closed
ochawkeye opened this issue Sep 9, 2013 · 6 comments

@ochawkeye
Contributor

@poppers112 asked how one would calculate 3rd down conversion rates for the two teams involved in a game. I was interested in that as well, so I attempted it. This isn't an issue, and it probably belongs in a cookbook that may one day exist, but I couldn't think of a better place to put it.

My attempt as follows:

import nflgame

year, week, season_type = 2013, 1, 'REG'

def third_down2(teamname):
    games = nflgame.games_gen(
        year, week, teamname, teamname, kind=season_type)
    plays = nflgame.combine_plays(games)
    attempts = 0
    conversions = 0
    for p in plays.filter(team=teamname, third_down_att=True):
        attempts += 1
        conversions += p.third_down_conv
    percentage = float(conversions)/float(attempts)*100
    return '%s: %s of %s (%.2f%%)' % (
        teamname, conversions, attempts, percentage)


games = nflgame.games(year, week, kind=season_type)
for game in games:
    print third_down2(game.home)
    print third_down2(game.away)

This gets the job done.

DEN: 8 of 15 (53.33%)
BAL: 8 of 22 (36.36%)
BUF: 4 of 13 (30.77%)
NE: 11 of 20 (55.00%)
CAR: 5 of 11 (45.45%)
SEA: 6 of 13 (46.15%)
CHI: 6 of 14 (42.86%)
CIN: 7 of 11 (63.64%)
CLE: 1 of 14 (7.14%)
MIA: 8 of 16 (50.00%)
DET: 5 of 13 (38.46%)
MIN: 2 of 10 (20.00%)
IND: 6 of 10 (60.00%)
OAK: 7 of 13 (53.85%)
JAC: 5 of 19 (26.32%)
KC: 5 of 15 (33.33%)
NYJ: 7 of 18 (38.89%)
TB: 6 of 16 (37.50%)
NO: 6 of 13 (46.15%)
ATL: 3 of 11 (27.27%)
PIT: 4 of 13 (30.77%)
TEN: 6 of 15 (40.00%)
STL: 4 of 11 (36.36%)
ARI: 7 of 14 (50.00%)
SF: 9 of 18 (50.00%)
GB: 4 of 10 (40.00%)
DAL: 5 of 15 (33.33%)
NYG: 6 of 11 (54.55%)

How can I improve this method?

@BurntSushi
Owner

I like the approach here. The only thing I would change is to make it a bit faster and have your third_down2 function rely less on global state. The key here is to run functions like combine_plays and games as few times as possible. combine_plays aggregates all play data, while games (and games_gen) reads and parses the JSON data from disk (or worse, fetches it from NFL.com if it's an active game). Therefore, we should avoid those calls as much as possible.

Incidentally, this results in a nice clean contract for your third_down2 function: given a play generator and a team, compute the third down conversion rate. This results in a 30% faster program on my machine. Here's the modified code:

import nflgame

year, week, season_type = 2013, 1, 'REG'

def third_down2(teamname, play_gen):
    attempts = 0
    conversions = 0
    for p in play_gen.filter(team=teamname, third_down_att=1):
        attempts += 1
        conversions += p.third_down_conv
    percentage = float(conversions)/float(attempts)*100
    return '%s: %s of %s (%.2f%%)' % (
        teamname, conversions, attempts, percentage)


games = nflgame.games(year, week, kind=season_type)
for game in games:
    print third_down2(game.home, game.drives.plays())
    print third_down2(game.away, game.drives.plays())

This approach also allows for computing third down conversion rates over multiple games by passing the result of combine_plays.
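A minimal sketch of the multi-game case, runnable without nflgame. The `Play` tuple and the `third_down_rate` helper are hypothetical stand-ins introduced for illustration. They model only the three attributes used in this thread (`team`, `third_down_att`, `third_down_conv`) and assume real play objects expose them the same way. The sketch also guards against a team with zero attempts, which would otherwise raise a ZeroDivisionError.

```python
from collections import namedtuple

# Hypothetical stand-in for nflgame play objects; only the attributes
# used in this thread are modeled. Real plays come from nflgame.
Play = namedtuple('Play', ['team', 'third_down_att', 'third_down_conv'])


def third_down_rate(teamname, plays):
    """Return (conversions, attempts, percentage) for one team."""
    attempts = 0
    conversions = 0
    for p in plays:
        if p.team == teamname and p.third_down_att == 1:
            attempts += 1
            conversions += p.third_down_conv
    # Guard against a team with no third-down attempts in the sample.
    percentage = 100.0 * conversions / attempts if attempts else 0.0
    return conversions, attempts, percentage


# Plays from two made-up games, already merged into one sequence. With
# real data, the result of combine_plays(games) would fill this role.
plays = [
    Play('MIN', 1, 1), Play('MIN', 1, 0), Play('DET', 1, 1),
    Play('MIN', 0, 0), Play('MIN', 1, 0), Play('DET', 1, 0),
]

min_result = third_down_rate('MIN', plays)
det_result = third_down_rate('DET', plays)
```

With real data, the equivalent call would pass the combined plays to the function from the thread, e.g. third_down2(teamname, nflgame.combine_plays(games)).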

Note that I also changed third_down_att=True to third_down_att=1. It doesn't affect the correctness of your program, but there are no boolean statistical categories, so it's misleading to pretend that there are. :-)
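The two spellings behave the same because Python's bool is a subclass of int, so True compares equal to 1. A quick check of that language fact, independent of nflgame (it assumes filter compares plain values by equality, which the thread implies but does not show):

```python
# bool is a subclass of int in Python, so True and 1 compare equal.
true_equals_one = (True == 1)
bool_is_int = isinstance(True, int)

# For the same reason, summing booleans counts the True values.
conversions = sum([True, False, True])
```

Writing third_down_att=1 simply matches how the statistic is stored: as a count, not a flag.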

@ochawkeye
Contributor Author

I like it. If there is one thing I am very guilty of, it is reliance upon global variables. I also do a poor job of scaling from simple tests to larger tests.

Up until I posted it here, the body of the code had been simply print third_down2('MIN') or print third_down2('DET'). Thought process goes "ok, that worked, now how do I go through games without explicitly specifying the team?...How about generating a list of games for the week and plucking out the home and away teams?...Hey! That worked. I better post it on github". The other side of my brain doesn't know enough yet to objectively look at the code and say "Yo dummy, you are duplicating some effort there".

@BurntSushi Thanks as always for your patient explanations.

@3ny

3ny commented Sep 9, 2013

Thank you ochawkeye and BurntSushi. My problem is that I didn't even know that third_down_conv existed. I looked through the nflgame API and couldn't find anything. I found third_down_conv in the nfldb API in the Play class. Should I have known to look in the nfldb API? Sorry if this is a dumb question.

@BurntSushi
Owner

@poppers112 That's not a dumb question at all! In fact, that's exactly why issues #11 and #12 exist. There's just no easy-to-consume documentation yet.

nfldb's API is not a bad place to look. There is a ton of overlap, although there may be a couple of small renamings. A more definitive but less convenient place to look is nflgame/statmap.py, which includes a description of each statistic.

@BurntSushi
Owner

There is also an ER diagram for nfldb that might be a good way to get a bird's-eye view of the data available.

@BurntSushi
Owner

@ochawkeye Just keep doing what you're doing. Your code is definitely improving. :-)
