Getting index out of range issues on get_schedule #103

Open
jgpayne opened this issue Nov 14, 2024 · 9 comments


@jgpayne

jgpayne commented Nov 14, 2024

Running

df = kpt.get_schedule(browser=browser, team='Kansas')  # or any other team

I am getting the error:
url = url + "&y=" + str(season)
schedule = BeautifulSoup(get_html(browser, url), "html.parser")
table = schedule.find_all('table')[1]
schedule_df = pd.read_html(StringIO(str(table)))
IndexError: list index out of range

The error is intermittent, but when I run this for multiple teams it can occur on any call.

@j-andrews7
Owner

I've been unable to reproduce this. Can you post the full code chunk where you've hit this?

My initial guess is that it's a Cloudflare challenge due to many requests in a short time frame.

@jgpayne
Author

jgpayne commented Nov 15, 2024

teams = ['Kansas', 'Connecticut', 'Kentucky', 'LSU', 'Oklahoma', 'Texas', 'SMU', 'Baylor', 'UCLA', 'Wake Forest']
for i in teams:
    df = kpt.get_schedule(browser=browser, team=i)

Here is minimal code to reproduce. It doesn't happen every time, but it happened the first time I ran this code. Should I sleep after each iteration of the loop?

@j-andrews7
Owner

j-andrews7 commented Nov 15, 2024 via email

@j-andrews7
Owner

I've still been unable to reproduce this, even with much longer lists. Has the sleep suggestion resolved your issues?

@jgpayne
Author

jgpayne commented Nov 18, 2024

Nope, it still happens with a sleep. It doesn't happen every time, and it eventually works with enough retries, so it's not a huge deal if I'm the only one encountering it.

@jgpayne
Author

jgpayne commented Nov 18, 2024

I think it has something to do with logging in. Running the actual source code and inspecting the schedule variable, I get:

The page you are requesting is available to subscribers only.

Purchase a 12-month subscription for $24.95

You'll get unrestricted ad-free access to the most insightful college basketball data on the web, including...

» All of the data that many of the nation's most successful coaches use.

» Detailed statistical breakdowns of every team and player in Division I.

» Predictions and box scores for every Division I game this season, along with a forecast of a team's final conference and overall record.

» Still undecided? Check out the guided tour for a preview of additional content available.

Note:
» To purchase a subscription as a gift for someone go here.

» If you are trying to renew (even if the subscription has lapsed) you must log-in above and use the renew link that will appear.

So somehow the login isn't persisting into the next page request. I have noticed this on the actual website as well: I will log in, click on a team, and then find myself logged out.
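For anyone who wants to confirm this case in their own scripts, here is a minimal sketch of a check for the subscriber-only page. It assumes the notice text quoted above is stable; `SUBSCRIBER_NOTICE`, `LoggedOutError`, and `ensure_logged_in` are hypothetical names for illustration and are not part of kenpompy.

```python
# Phrase taken from the page text quoted above; assumed to be stable.
SUBSCRIBER_NOTICE = "available to subscribers only"


class LoggedOutError(RuntimeError):
    """Raised when the fetched page is the subscription notice, not data."""


def ensure_logged_in(html: str) -> str:
    """Return html unchanged, or raise if it looks like the subscriber-only page."""
    if SUBSCRIBER_NOTICE in html.lower():
        raise LoggedOutError(
            "got the subscriber-only notice; the session was probably dropped"
        )
    return html
```

Running the fetched HTML through a check like this before indexing into `find_all('table')` would make the failure show up as a clear "logged out" error rather than an IndexError.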

@j-andrews7
Owner

j-andrews7 commented Nov 18, 2024 via email

@ethanario

Just wanted to add that I've been experiencing this same issue. Login works ~80% of the time, but when it fails the code below throws this error. FWIW, I have also noticed issues similar to @jgpayne's when navigating kenpom.com.

code:
url = f"https://kenpom.com/player-expanded.php?team={modified_team_name}&y={year}"
team_page = BeautifulSoup(browser.get(url).content, "html.parser")
table = team_page.find_all('table')[2]

error:
IndexError: list index out of range

@j-andrews7
Owner

Thanks for the report. I'm not sure this is something we'd want to address if it's just occurring during normal navigation of the site.

My hacky solution would be to allow the user to store their credentials as environment variables, catch the error and log in again, and then re-run the query without the user having to do anything. Which would probably work, but it's not very elegant and shouldn't really be necessary.

Again, will see if this becomes a larger issue.
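A rough sketch of that workaround, for anyone who wants to try it in their own script in the meantime. This is only an illustration under stated assumptions, not something in the package: the environment variable names `KENPOM_EMAIL` and `KENPOM_PASSWORD` and the helpers `credentials_from_env` and `query_with_relogin` are hypothetical, and `relogin` stands in for whatever login call you already use.

```python
import os
import time


def credentials_from_env():
    """Read credentials from the environment (variable names are hypothetical)."""
    return os.environ["KENPOM_EMAIL"], os.environ["KENPOM_PASSWORD"]


def query_with_relogin(query, relogin, browser, retries=2, delay=1.0):
    """Call query(browser); on IndexError, log in again and retry.

    query:   callable taking a browser and returning a result
    relogin: callable taking no arguments and returning a fresh browser
    """
    for attempt in range(retries + 1):
        try:
            return query(browser)
        except IndexError:
            if attempt == retries:
                raise  # out of retries; surface the original error
            time.sleep(delay)    # brief pause before trying again
            browser = relogin()  # replace the stale session
```

A call might look like `query_with_relogin(lambda b: kpt.get_schedule(browser=b, team='Kansas'), relogin, browser)`, where `relogin` builds a new browser from `credentials_from_env()`. Catching IndexError is a blunt signal, since any missing table would trigger a re-login, which is part of why this feels hacky.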


3 participants