PSA - Nfl data feed is down #130
Comments
just so no one is panicking... There is a light at the end of the tunnel and there is a way out of this for us. There are some potentially huge breaking changes... There is a very real possibility of removing all http requests to nfl.com and relying only on data already downloaded. |
Unlocking to invite conversation on this subject. I've also released version 3.0.0 which will work with historic data. |
Hey, First off: really sorry to hear that the NFL api is essentially dead, and I think I speak for everyone who's used the library to say thanks for your work on fixes to get the library to work with historic data as a stopgap. As for moving forward, I'd be happy to help with measures to scrape the NFL website. I've got a decent amount of time to work on them given the COVID situation, and am especially invested in the live play-by-play data as I use that for a Twitter bot I run (although am happy to help on roster data or anything else too). |
I am also looking to work on this. I rely heavily on the live stats for my fantasy football league app and would really like to get this back up before the start of the season. I am mostly JS/PHP code wise but have experience web scraping and should have no problem adapting and collaborating. I will probably start digging around sometime soon. Is there somewhere anyone is willing to work together or should we just keep it all here? |
The real value of the NFL feeds was the statids included with each play. It would be difficult to derive that info from the play by play text. |
I haven't used the new API yet. But if I understand correctly, it looks like the playStatType enum takes the place of |
I ended up here in search of info when I realized the old NFL API was removed. Since I haven't used nflgame (yet) I'm not sure if this will be useful to you, but today I did figure out how to use their new API to retrieve schedules, scores, and I think at least some stats. It looks like they're using a revised version of the API they documented in ~2015 at https://api.nfl.com/docs. Since I'm not sure that I'll have the time to familiarize myself with nflgame and create a proper pull request, I thought I'd share what I found here. This gist will connect to the new NFL API, retrieve an access token, and then use the access token to pull down the 2020 week 1 schedule. I was also able to successfully retrieve the equivalent of the 2019 week 1 score strip. |
This new API does look promising...I’m not that familiar with the nflgame source code either and wouldn’t know where to begin on implementing, but it looks like this could replace the existing functionality in every way? (correct me if I’m wrong) I’d probably defer to Derek or somebody else to be in charge of this effort, but still happy to help in whatever way I can. |
@justin-haight Was wondering what you used as a reference. |
I watched the network traffic when I loaded their scores page and then tried it out with postman. Wish I could say it was something cooler but at least I did have to work for it with all the ads and analytics traffic they have. |
When we extract the token from nfl.com and force it into headers, we are explicitly acting against the NFL.com ToS, leaving us open to legal action. I would like to come up with a long-term solution for this project that is 100% legal and within their Terms of Service. Personally, I do not use this library for much any longer, so it's unlikely that I will be contributing code to something that will leave me open to a lawsuit. Fail Safe Option |
There is no clear way to legally access this api. This api documentation page has been public for at least 3 years and I've yet to find a way to get an actual token. I believe it is reserved for actual NFL partners. |
Also I got a gig, so I'll be indisposed this week. Really sorry but gotta pay them bills. 100% open to ideas on how to legally bring this back up and running. Again, the fail safe here is to just convert the nflscrapr data into nflgame data after a game completes. I'll be linking the actual data repository, kinda busy right now and need to get back to it. |
I've started writing a Scala/Spark program to concatenate all the files from nflscrapR and push them to a Postgres database. It's currently only creating parquet files, but here's the link if anyone's interested. |
The link for your gist gives a 404 error. Is it still available? |
@BrianT71 Sorry I removed the public gist after Derek pointed out I was using some IDs in my request which are required to make it work. They probably didn't intend to make those public and I wasn't sure what else it might expose. |
@justin-haight were you doing something other than calling the re****e endpoint with specific headers and grant type in the body? (redacted endpoint name for same reason you removed the gist) |
I've started deconstructing the new API and found all of the game data (game information, play by play and player stats), but I have not been able to find anything on pulling rosters (or even all players) to get their team info and any IDs. I previously scraped this from the roster page (www.nfl.com/teams/arizona-cardinals/roster), but the updated version no longer has ID numbers in the player links. It just uses the player's name as the link. If anyone has found any way to get player info and new IDs, I would appreciate some guidance. |
@toddrob99 nope that's all I was doing. |
@BrianT71 Assuming you are referring to api.nfl.com when you say "new API," you can get person ids from the teams endpoint by including roster{id} in your field selector. The below URL will give you the roster for the 2019 Cardinals. The teams endpoint does not seem to be working when 2020 team ids are specified, instead throwing a 500 error stating |
@toddrob99 Sorry, I was sloppy in my wording. I was referring to the v3 API which is what the current website (nfl.com) is using. Since there is no documentation for v3 that I can find, I am stuck using educated guesses for the endpoint names and fields. I was hoping someone else may have already figured this part out. I'll be banging away at it over the weekend. |
@BrianT71, sorry I didn't make the connection between v3 and new; I've just been referring to /v3/shield as shield queries. Decoded query param:
Note: as I look deeper into this, it appears that query is only including players with status=ACT. That doesn't help my use case of listing inactive players, but maybe it will help you. I've gone back and forth a few times about mentioning this at all, but I've created a python nflapi wrapper that has some of the queries I'm using (will most likely add this one since my teamById method is no longer working for 2020 team ids). The reason why I've been hesitant to mention it is that it does not facilitate retrieval of a token and I do not want to deal with a ton of people asking about that. Also because I haven't created any documentation for it. |
This thread looks like it died down a couple months ago. Any updates on nflgame, or thoughts about if there is going to be something working by the time the season starts? Appreciate everyone is busy. |
FWIW, looks like the nfl xml is back up using a |
I just tried a random game from 2019 and the PBP is there! Not sure if this will continue into 2020 or if the format is the same. I guess we'll know later this week. http://static.nfl.com/liveupdate/game-center/2019110700/2019110700_gtd.json |
@mjsz does that mean we'll be able to pull NFL game data in the existing state of the library? |
@JimHewitt Great find! Have you looked into whether scraping data from this feed abides by the NFL ToS? EDIT: I've read through the ToS and I've not yet noticed anything that would be a cause for alarm for scraping of this game-center URL. Would not mind a second set of eyes however. |
This repo will need to be updated at least here: nflgame/nflgame/update_sched.py Line 40 in bafd5fb
and here: Line 36 in bafd5fb
to reflect the new data URLs from nfl.com. There may be other places, but that should be a start. |
@JimHewitt Reading comprehension is difficult for me at near midnight, apologies :) |
Well, doesn't look like the gamecenter live-update page is displaying play by play data. That's a bummer. |
Doesn't seem to be working for yesterday's Chiefs game. Getting "File not found." |
Yup, Looks like we are SOL. Only hope is that they may not update the site until later, but I'm not holding my breath. |
nflscrapr has been replaced by nflfastR (https://mrcaseb.github.io/nflfastR/) and they were able to get the data for the HOU-KC Thursday night game (https://twitter.com/benbbaldwin/status/1304475824566013953) so there must be a way to do it |
Yeah that's interesting. Trying to reverse-engineer how they did it. So far, looking through their code, it seems like this would be what they're trying to hit: http://nflcdns.nfl.com/liveupdate/game-center/2020_01_HOU_KC/2020_01_HOU_KC_gtd.json But that doesn't seem to work, either. Looks like I may need to get R installed so I can actually try and run it and verify. |
The json is here. Format is different, but has PBP info. |
Actually, their code, as I see it, seems to be trying to parse: url <- glue::glue("http://nflcdns.nfl.com/liveupdate/game-center/{gameId}/{gameId}_gtd.json")
Where gameId is, for example: # gameId = '2018090905' Which would give the URL of: http://nflcdns.nfl.com/liveupdate/game-center/2020091000/2020091000_gtd.json However, this doesn't work either. Curious that they got the JSON somehow, but it doesn't appear to be using this code to get it. The code DOES work for previous years' games, just not the current year. |
Yeah, I tried it both ways. I wonder if the data is only active while the game is being played. |
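For anyone who would rather poke at the gamecenter URL pattern discussed above from Python than install R, here is a minimal sketch. The names `gamecenter_url` and `fetch_gamecenter` are made up for illustration and are not part of nflgame or nflfastR. The hosts and path are copied from the URLs posted in this thread, and as reported above they may return "File not found" for 2020 games.

```python
import json
import urllib.error
import urllib.request

# Path pattern copied from the URLs quoted in this thread.
GAMECENTER_TEMPLATE = (
    "http://{host}/liveupdate/game-center/{game_id}/{game_id}_gtd.json"
)


def gamecenter_url(game_id, host="nflcdns.nfl.com"):
    """Build the gamecenter JSON URL for a game id such as '2019110700'."""
    return GAMECENTER_TEMPLATE.format(host=host, game_id=game_id)


def fetch_gamecenter(game_id, host="nflcdns.nfl.com", timeout=10):
    """Return the parsed JSON for a game, or None on an HTTP error."""
    url = gamecenter_url(game_id, host)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError:
        # For example a 404 for games the feed does not serve.
        return None
```

Passing `host="static.nfl.com"` reproduces the 2019 URL posted earlier in the thread, so both hosts can be tried with the same helper.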
Couple things I’ve noticed: if you go here https://nflcdns.nfl.com/liveupdate/gamecenter/58167/KC_Gamebook.pdf the play by play is there (you’d have to parse the PDF, and this isn’t live, obviously), but the id 58167 isn’t the game id for the Super Bowl. So possibly a different id format is needed for the gtd json data? Secondly, it looks like he’s pulling the play-by-play from his own .rds files. Wherever in the code he’s saving/pushing those, we might be able to see where he’s pulling it from. I can’t do too much digging at the moment as I’m looking at all of this on my phone, but those are 2 things that could be of interest. |
@chitown88 The id 58167 is called "gameKey" in the api under gameDetail (maybe other places too). It appears to be a sequential number as the KC-HOU gameKey from the Thur night game is 58168. |
If you're interested in the schedule for 2020, here's a quick script you can use to convert the nflfastR schedule into nflgame format:
|
The file 2020_01_HOU_KC.json.gz that nflfastR is using is the output from the NFL v3 Shield API. You can download this directly from the website when you view a page like this one (just filter the network calls by api.nfl.com): https://www.nfl.com/games/texans-at-chiefs-2020-reg-1 My guess is someone is using an NFL API key or just manually downloading the json from the website and uploading it to GitHub. It doesn't help us with live data, but for anyone interested in having historical data, it wouldn't be hard to write an interpreter to convert these files and then add them in the old gamecenter format to this repo as well. |
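As a starting point for such an interpreter, here is a minimal sketch of just the loading step for a downloaded file like 2020_01_HOU_KC.json.gz. The function name `load_raw_game` is hypothetical, and the sketch deliberately assumes nothing about the keys inside the JSON, since the layout is not documented in this thread.

```python
import gzip
import json


def load_raw_game(path):
    """Load one gzip-compressed JSON game file and return the parsed object."""
    # "rt" opens the gzip stream in text mode so json.load can read it directly.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)
```

Printing the top-level keys of the returned object is a quick way to start mapping the new layout onto the old gamecenter format.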
Digging into the data format a bit more, it looks like (almost) everything we need is there to convert the new json format into the old json format, but there are two annoying problems:
|
I think this may map between the two id types: nflverse/nflverse-pbp#13 (comment) |
Watch out for ids that change with each season... |
Also, looks like the old id is contained in the new one as ASCII. 32013030-2d30-3032-3334-35395dc60da5 |
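A minimal sketch of that decoding, assuming the layout seen in the single example above holds in general: drop the hyphens, read the hex as bytes, and take bytes 2 through 11 as ASCII. The function name `legacy_id_from_new` and the byte offsets are inferred from this one id only, so treat them as an assumption, especially given the warning about ids that change with each season.

```python
def legacy_id_from_new(new_id):
    """Extract the old-style id embedded as ASCII in a new-style id.

    Assumption: the old id occupies bytes 2..11 of the 16-byte value,
    as in the one example posted in this thread.
    """
    raw = bytes.fromhex(new_id.replace("-", ""))
    return raw[2:12].decode("ascii")
```

For the id above, `legacy_id_from_new("32013030-2d30-3032-3334-35395dc60da5")` returns `"00-0023459"`.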
Good catch! With that in mind, here's a (very) rough start. I'm not proud of how this code looks :) I don't have much time this week so if anyone can help out it'd be very appreciated. It needs a lot of cleaning up on the drives and some logic to aggregate the statistics. It takes as input the json files found here: https://github.com/guga31bb/nflfastR-raw/tree/master/raw/2020
|
This works well enough now that the files are importable into nflgame. You can get the full play-by-play for each game but not the game stats or the player list.
|
The "downloading the json from the website and uploading it to github" part is accurate. Basically we have a headless browser that loads the NFL.com scores page and captures the json since you can view it there. We do not use the API. |
This renders nflgame useless at the moment. I will be looking into how we can make historic data usable in the worst case scenarios like this.