
pybaseball.pitching_stats() returns batting stats, not pitching stats #267

Closed
armstjc opened this issue May 14, 2022 · 2 comments

Comments

@armstjc

armstjc commented May 14, 2022

When running the following code:

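import os

import pybaseball


def getFanGraphsBattingStats(start=2020, end=2021):
    # Download FanGraphs season batting stats for each year and save them to CSV.
    os.makedirs('Data/FanGraphs/Batting', exist_ok=True)
    for i in range(start, end + 1):
        data = pybaseball.batting_stats(i)
        data.to_csv(f'Data/FanGraphs/Batting/{i}_fangraphs_batting.csv', index=False)
        print(data)


def getFanGraphsPitchingStats(start=2020, end=2021):
    # Same as above, but for season pitching stats.
    os.makedirs('Data/FanGraphs/Pitching', exist_ok=True)
    for i in range(start, end + 1):
        data = pybaseball.pitching_stats(i)
        data.to_csv(f'Data/FanGraphs/Pitching/{i}_fangraphs_pitching.csv', index=False)
        print(data)


if __name__ == "__main__":
    # Enable pybaseball's local cache before making any requests.
    pybaseball.cache.enable()
    getFanGraphsBattingStats()
    getFanGraphsPitchingStats()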

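It does not return pitching stats as intended. The two pitching_stats() calls print exactly the same tables as the two batting_stats() calls (the same hitters, 142 and 132 rows, 319 columns), as shown below: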


      IDfg  Season                Name   Team  ...   CSW%    xBA   xSLG  xwOBA
18   19709    2020  Fernando Tatis Jr.    SDP  ...  0.269  0.297  0.614  0.419
1     5361    2020     Freddie Freeman    ATL  ...  0.191  0.341  0.660  0.464
4    13510    2020        Jose Ramirez    CLE  ...  0.229  0.263  0.505  0.371
9    15676    2020          Jose Abreu    CHW  ...  0.293  0.299  0.587  0.398
20   13611    2020        Mookie Betts    LAD  ...  0.272  0.281  0.481  0.359
..     ...     ...                 ...    ...  ...    ...    ...    ...    ...
130  13145    2020           Josh Bell    PIT  ...  0.284  0.228  0.381  0.297
139   6153    2020     Eduardo Escobar    ARI  ...  0.260  0.261  0.394  0.305
117   3892    2020        Josh Reddick    HOU  ...  0.260  0.245  0.358  0.300
128   6184    2020       J.D. Martinez    BOS  ...  0.263  0.229  0.444  0.316
136  10071    2020     Jonathan Villar  - - -  ...  0.266  0.211  0.281  0.256

[142 rows x 319 columns]
      IDfg  Season                Name   Team  ...   CSW%    xBA   xSLG  xwOBA
3    19709    2021  Fernando Tatis Jr.    SDP  ...  0.270  0.279  0.618  0.406
1    20123    2021           Juan Soto    WSN  ...  0.263  0.304  0.544  0.430
8    16252    2021         Trea Turner  - - -  ...  0.262  0.303  0.484  0.362
0    11579    2021        Bryce Harper    PHI  ...  0.263  0.301  0.610  0.430
20   13510    2021        Jose Ramirez    CLE  ...  0.233  0.281  0.505  0.374
..     ...     ...                 ...    ...  ...    ...    ...    ...    ...
123  10243    2021      Randal Grichuk    TOR  ...  0.249  0.233  0.402  0.294
95   14221    2021         Jorge Soler  - - -  ...  0.269  0.249  0.493  0.354
125   2396    2021      Carlos Santana    KCR  ...  0.242  0.244  0.421  0.334
118   1744    2021      Miguel Cabrera    DET  ...  0.274  0.231  0.415  0.313
126  15117    2021       Hunter Dozier    KCR  ...  0.302  0.224  0.388  0.299

[132 rows x 319 columns]
      IDfg  Season                Name   Team  ...   CSW%    xBA   xSLG  xwOBA
18   19709    2020  Fernando Tatis Jr.    SDP  ...  0.269  0.297  0.614  0.419
1     5361    2020     Freddie Freeman    ATL  ...  0.191  0.341  0.660  0.464
4    13510    2020        Jose Ramirez    CLE  ...  0.229  0.263  0.505  0.371
9    15676    2020          Jose Abreu    CHW  ...  0.293  0.299  0.587  0.398
20   13611    2020        Mookie Betts    LAD  ...  0.272  0.281  0.481  0.359
..     ...     ...                 ...    ...  ...    ...    ...    ...    ...
130  13145    2020           Josh Bell    PIT  ...  0.284  0.228  0.381  0.297
139   6153    2020     Eduardo Escobar    ARI  ...  0.260  0.261  0.394  0.305
117   3892    2020        Josh Reddick    HOU  ...  0.260  0.245  0.358  0.300
128   6184    2020       J.D. Martinez    BOS  ...  0.263  0.229  0.444  0.316
136  10071    2020     Jonathan Villar  - - -  ...  0.266  0.211  0.281  0.256

[142 rows x 319 columns]
      IDfg  Season                Name   Team  ...   CSW%    xBA   xSLG  xwOBA
3    19709    2021  Fernando Tatis Jr.    SDP  ...  0.270  0.279  0.618  0.406
1    20123    2021           Juan Soto    WSN  ...  0.263  0.304  0.544  0.430
8    16252    2021         Trea Turner  - - -  ...  0.262  0.303  0.484  0.362
0    11579    2021        Bryce Harper    PHI  ...  0.263  0.301  0.610  0.430
20   13510    2021        Jose Ramirez    CLE  ...  0.233  0.281  0.505  0.374
..     ...     ...                 ...    ...  ...    ...    ...    ...    ...
123  10243    2021      Randal Grichuk    TOR  ...  0.249  0.233  0.402  0.294
95   14221    2021         Jorge Soler  - - -  ...  0.269  0.249  0.493  0.354
125   2396    2021      Carlos Santana    KCR  ...  0.242  0.244  0.421  0.334
118   1744    2021      Miguel Cabrera    DET  ...  0.274  0.231  0.415  0.313
126  15117    2021       Hunter Dozier    KCR  ...  0.302  0.224  0.388  0.299

[132 rows x 319 columns]
The thread 'MainThread' (0x1) has exited with code 0 (0x0).
The program 'python.exe' has exited with code 4294967295 (0xffffffff).


@tjburch
Collaborator

tjburch commented May 20, 2022

This is effectively a duplicate of #221. I was unable to reproduce it, but it seems like others are running into it as well. @schorrm, any thoughts here?

@bdilday
Contributor

bdilday commented May 20, 2022

I took a look and was able to reproduce it (using pybaseball.cache.enable(), as in the example above).

It seems to me that the cache is unable to distinguish between the two calls. I suspect this is because both calls go through the same fetch method
https://github.com/jldbc/pybaseball/blob/master/pybaseball/datasources/fangraphs.py#L224-L226

of classes derived from the abstract FangraphsDataTable class
https://github.com/jldbc/pybaseball/blob/master/pybaseball/datasources/fangraphs.py#L76-L81
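A toy model of the suspected collision may help; this is not pybaseball's actual cache code. It assumes a hypothetical `cached_method` decorator that builds its key from the method's module, `__qualname__`, and arguments, but leaves `self` out of the key. Whether pybaseball's cache builds its key this way is an assumption. `DataTable`, `BattingTable`, and `PitchingTable` are hypothetical stand-ins for FangraphsDataTable and its subclasses. Because `fetch` is defined only on the base class, both subclasses share the qualified name `DataTable.fetch`, so the pitching call is served the cached batting result.

```python
import functools
import hashlib
import json
from typing import Any, Callable, Dict, List

# Hypothetical in-memory stand-in for an on-disk cache.
_CACHE: Dict[str, Any] = {}
CALLS: List[str] = []  # records which class actually ran fetch (cache misses)


def cached_method(func: Callable) -> Callable:
    """Toy cache decorator: key = module + qualified name + arguments.

    `self` is deliberately left out of the key; this is the assumed flaw.
    """
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        payload = json.dumps(
            [func.__module__, func.__qualname__, args, kwargs],
            sort_keys=True, default=str,
        )
        key = hashlib.sha256(payload.encode()).hexdigest()
        if key not in _CACHE:
            _CACHE[key] = func(self, *args, **kwargs)
        return _CACHE[key]
    return wrapper


class DataTable:  # plays the role of the abstract FangraphsDataTable
    STAT_TYPE = "base"

    @cached_method
    def fetch(self, season: int) -> str:
        CALLS.append(type(self).__name__)
        return f"{self.STAT_TYPE} stats for {season}"


class BattingTable(DataTable):
    STAT_TYPE = "batting"


class PitchingTable(DataTable):
    STAT_TYPE = "pitching"


batting = BattingTable().fetch(2020)
pitching = PitchingTable().fetch(2020)  # same key as the batting call
print(batting)   # batting stats for 2020
print(pitching)  # batting stats for 2020 (stale hit from the batting call)
```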

I have two guesses about what might fix it:

  • add a fetch method to each of the derived classes so the cache can recognize them as different calls
  • update the way the cache hashes a function call

but honestly, I don't have a strong understanding of the caching, so I don't really know.
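Under the same kind of toy model, the second guess might look like the sketch below. The hypothetical `cached_method` decorator folds `type(self).__qualname__` (the concrete subclass being called) into the key, so an inherited `fetch` gets a separate cache entry per subclass. This is an illustrative assumption, not a patch against pybaseball's actual cache module.

```python
import functools
import hashlib
import json
from typing import Any, Callable, Dict, List

# Hypothetical in-memory stand-in for an on-disk cache.
_CACHE: Dict[str, Any] = {}
CALLS: List[str] = []  # records which class actually ran fetch (cache misses)


def cached_method(func: Callable) -> Callable:
    """Toy cache decorator for methods.

    The key includes type(self).__qualname__, so the same base-class
    method called on different subclasses produces different keys.
    """
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        payload = json.dumps(
            [func.__module__, func.__qualname__, type(self).__qualname__,
             args, kwargs],
            sort_keys=True, default=str,
        )
        key = hashlib.sha256(payload.encode()).hexdigest()
        if key not in _CACHE:
            _CACHE[key] = func(self, *args, **kwargs)
        return _CACHE[key]
    return wrapper


class DataTable:  # plays the role of the abstract FangraphsDataTable
    STAT_TYPE = "base"

    @cached_method
    def fetch(self, season: int) -> str:
        CALLS.append(type(self).__name__)
        return f"{self.STAT_TYPE} stats for {season}"


class BattingTable(DataTable):
    STAT_TYPE = "batting"


class PitchingTable(DataTable):
    STAT_TYPE = "pitching"


b1 = BattingTable().fetch(2020)
p1 = PitchingTable().fetch(2020)
b2 = BattingTable().fetch(2020)  # served from the cache
print(b1)     # batting stats for 2020
print(p1)     # pitching stats for 2020
print(CALLS)  # ['BattingTable', 'PitchingTable']
```

The first guess would also avoid the collision in this toy model, provided each subclass override is itself decorated, because each override then has its own `__qualname__` (e.g. `PitchingTable.fetch`). The trade-off is that the override has to be repeated for every table class, whereas keying on the concrete class fixes it in one place.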
