Getting index out of range issues on get_schedule #103
Comments
I've been unable to reproduce this. Can you post the full code chunk where you've hit this? My initial guess is that it's a Cloudflare challenge due to many requests in a short time frame.
Here is minimal code to reproduce. It doesn't happen every time, but it happened the first time I ran this code. Should I sleep after each loop?
I'd try a short sleep and see if that resolves it, yeah. We had to do that
for some of the tests at some point.
On Fri, Nov 15, 2024, 8:16 AM jgpayne wrote:
`teams = ['Kansas', 'Connecticut', 'Kentucky', 'LSU', 'Oklahoma', 'Texas',
         'SMU', 'Baylor', 'UCLA', 'Wake Forrest']
for i in teams:
    df = kpt.get_schedule(browser=browser, team=i)`
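For reference, the sleep suggestion could look roughly like the sketch below. `fetch_with_pause` is a hypothetical helper, not part of kenpompy, and the 2-second default is an arbitrary guess rather than a known rate limit. The commented usage assumes `kpt`, `browser`, and `teams` are set up as in the snippet above.

```python
import time


def fetch_with_pause(fetch, items, pause_seconds=2.0, sleep=time.sleep):
    """Call fetch(item) for each item, pausing between consecutive calls.

    fetch: callable taking one item and returning a result
    sleep: injectable so the pause can be stubbed out in tests
    """
    results = {}
    for index, item in enumerate(items):
        if index > 0:
            # Pause before every request except the first one.
            sleep(pause_seconds)
        results[item] = fetch(item)
    return results


# Hypothetical usage with the loop above:
# schedules = fetch_with_pause(
#     lambda team: kpt.get_schedule(browser=browser, team=team), teams
# )
```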
I've still been unable to reproduce this, even with much longer lists. Has the sleep suggestion resolved your issues?
Nope, still happens with a sleep. It doesn't happen every time and it eventually works with enough retries, so it's not a huge deal if I'm the only one encountering it.
I think it has something to do with logging in. Using the actual source code and looking at the schedule variable, I get:

The page you are requesting is available to subscribers only.
Purchase a 12-month subscription for $24.95
You'll get unrestricted ad-free access to the most insightful college basketball data on the web, including...
» All of the data that many of the nation's most successful coaches use.
» Detailed statistical breakdowns of every team and player in Division I.
» Predictions and box scores for every Division I game this season, along with a forecast of a team's final conference and overall record.
» Still undecided? Check out the guided tour for a preview of additional content available.
Note:
» If you are trying to renew (even if the subscription has lapsed) you must log-in above and use the renew link that will appear.

So somehow the login isn't persisting through to the next page. I have noticed this on the actual website as well: I will log in, click on a team, and then subsequently be logged out.
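One way to confirm this before parsing would be to check the returned HTML for the paywall text quoted in this comment. This is only a minimal sketch: `looks_logged_out` is a hypothetical helper, not kenpompy API, and the marker string is copied from the page text above, so it would stop working if the site changes its wording.

```python
# Marker taken from the subscriber-only page quoted in this thread.
SUBSCRIBER_ONLY_MARKER = "available to subscribers only"


def looks_logged_out(html):
    """Return True if the HTML looks like the subscriber-only page."""
    return SUBSCRIBER_ONLY_MARKER in html.lower()
```

A caller could run this on the fetched page and log in again instead of letting the table lookup fail.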
Hm, I have not run into this and thus it's difficult to troubleshoot. We'll
see if it's a common issue. I can think of some kind of stupid solutions,
but I'm hesitant to implement them due to the overhead.
Just wanted to add that I've been experiencing this same issue. Login works ~80% of the time, but this code will throw this error when it fails. Fwiw, I have also noticed similar issues to @jgpayne when navigating kenpom.com.
code:
error:
Thanks for the report. I'm not sure this is something we'd want to address if it's just occurring during normal navigation of the site. My hacky solution would be to allow the user to store their credentials as environment variables, catch the error and log in again, and then re-run the query without the user having to do anything. Which would probably work, but it's not very elegant and shouldn't really be necessary. Again, will see if this becomes a larger issue.
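To make the hacky solution concrete, here is a rough sketch of the idea, not an implementation that exists in kenpompy. The environment variable names `KENPOM_EMAIL` and `KENPOM_PASSWORD`, and the helpers `credentials_from_env` and `query_with_relogin`, are all hypothetical. The login step is passed in as a callable so the sketch makes no assumptions about how the session is created.

```python
import os
import time


def credentials_from_env(email_var="KENPOM_EMAIL", password_var="KENPOM_PASSWORD"):
    """Read credentials from environment variables (names are hypothetical)."""
    return os.environ[email_var], os.environ[password_var]


def query_with_relogin(query, login, browser, retries=2, delay=0.0):
    """Run query(browser); on IndexError, log in again and retry.

    query: callable taking a browser and returning a result
    login: zero-argument callable returning a fresh, logged-in browser
    Returns (result, browser) so the caller can keep the fresh session.
    """
    for attempt in range(retries + 1):
        try:
            return query(browser), browser
        except IndexError:
            if attempt == retries:
                raise  # out of retries: surface the original error
            time.sleep(delay)
            browser = login()  # replace the stale session before retrying
```

The caller would pass something like `lambda b: kpt.get_schedule(browser=b, team='Kansas')` as `query`, and a `login` callable that builds a new session from `credentials_from_env()`. The overhead concern still applies: every failure costs an extra login round trip.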
Running `df = kpt.get_schedule(browser=browser, team='Kansas')` (or any other team), I am getting the error:
url = url + "&y=" + str(season)
schedule = BeautifulSoup(get_html(browser, url), "html.parser")
table = schedule.find_all('table')[1]
schedule_df = pd.read_html(StringIO(str(table)))
IndexError: list index out of range
I am only getting this error on occasion, but when running this for multiple teams, it could happen at any moment.
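The `IndexError` most likely comes from `schedule.find_all('table')[1]` when the fetched page contains fewer than two tables. A sketch of a guard that fails with a clearer message is below; `pick_table` and `ScheduleTableMissing` are hypothetical names, not kenpompy API, and the hint in the message is a guess at the cause rather than a confirmed diagnosis.

```python
class ScheduleTableMissing(RuntimeError):
    """Raised when the page does not contain the expected table."""


def pick_table(tables, index=1):
    """Return tables[index], or raise a descriptive error if it is absent."""
    if len(tables) <= index:
        raise ScheduleTableMissing(
            f"expected at least {index + 1} tables, found {len(tables)}; "
            "the page may not be the schedule page (for example, a logged-out session)"
        )
    return tables[index]
```

Here `tables` stands for the list returned by `schedule.find_all('table')`, so the call site would become `table = pick_table(schedule.find_all('table'))`.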