batting_stats_range breaks when parsing through the date '2021-06-25' #218
The issue also occurs for the same date in 2019.
It looks like, in the cases where the data gets truncated, Beautiful Soup can't use UTF-8 and so falls back on a different encoding, e.g.,
This seems to happen when the page header or footer includes a link to https://fbref.com/es or https://fbref.de, because those links contain the characters ú and ß (in Fútbol and Fußball). In short, it looks like inconsistent encoding between the header/footer and the main part of the page. Because it doesn't depend on the data, the date where it happens isn't reproducible either.
Closed by #223
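The mixed-encoding explanation can be reproduced with plain Python bytes. This is a minimal sketch, assuming (not confirmed in the thread) that the footer link text is Latin-1 encoded while the stats table is UTF-8. naive_decode is a hypothetical stand-in for an encoding detector that tries UTF-8 and falls back to cp1252; decode_prefer_utf8 is one hypothetical way to keep the table intact. Neither is necessarily what #223 does.

```python
# Hypothetical reconstruction of a mixed-encoding page (not real fbref bytes).
body = "<td>José Abreu</td>".encode("utf-8")  # main stats table: UTF-8
footer = "<a href='https://fbref.com/es'>Fútbol</a>".encode("latin-1")  # footer: Latin-1 (assumed)
page = body + footer


def naive_decode(raw: bytes) -> str:
    """Mimic a detector that tries UTF-8 and falls back to cp1252 on failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # The lone Latin-1 byte for "ú" (0xFA) is invalid UTF-8, so we land here
        # and the UTF-8 body gets misread as cp1252.
        return raw.decode("cp1252")


def decode_prefer_utf8(raw: bytes) -> str:
    """Keep the UTF-8 body intact; replace undecodable footer bytes with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


print(naive_decode(page))
print(decode_prefer_utf8(page))
```

The first call yields the name as "JosÃ© Abreu" while the footer survives as "Fútbol", matching the symptom in the report; the second keeps "José Abreu" and only loses the footer's ú.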
When using the batting_stats_range function, there is an issue with any date range that passes through 6/25/2021.
When it breaks, the returned data is corrupt: for example, the player José Abreu appears as "JosÃ© Abreu", and only a couple of rows come back (as opposed to several hundred for a typical day of data).
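Names that were already mis-decoded this way can often be repaired by reversing the wrong step: re-encode as cp1252 and decode as UTF-8. fix_mojibake is a hypothetical helper, assuming the wrong fallback was cp1252; it cannot recover rows that were lost to truncation.

```python
def fix_mojibake(name: str) -> str:
    """Undo UTF-8 text mis-decoded as cp1252 (hypothetical helper).

    Returns the input unchanged when the round trip does not apply,
    e.g. for names that were never mangled.
    """
    try:
        return name.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name


print(fix_mojibake("JosÃ© Abreu"))
print(fix_mojibake("José Abreu"))
```

A correctly decoded name such as "José Abreu" fails the UTF-8 step of the round trip, so it is returned as-is; plain ASCII names round-trip to themselves.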
Below are some calls that work and some that do not.
Works:
data = batting_stats_range("2021-06-25", "2021-06-27")
data = batting_stats_range("2021-06-25", "2021-06-25")
Does NOT Work:
data = batting_stats_range("2021-06-24", "2021-06-27")
data = batting_stats_range("2021-06-24", "2021-06-25")
For some reason, starting a range on 6/25 works fine, but any range that spans 6/25 or ends on it returns corrupt data.
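Since the maintainer notes the failure isn't reproducible for a given date, one defensive option on the caller's side is to validate each result and retry. This sketch is hypothetical: fetch_with_check, looks_corrupt, the min_rows=100 threshold, and the "Ã" marker heuristic are illustrative assumptions, not part of pybaseball. fetch is any callable that takes start and end date strings and returns player names.

```python
from typing import Callable, Sequence

# Typical residue when UTF-8 text is decoded as cp1252 (heuristic, assumed).
MOJIBAKE_MARKERS = ("Ã",)


def looks_corrupt(names: Sequence[str], min_rows: int = 100) -> bool:
    """Flag results with too few rows or mojibake in player names."""
    if len(names) < min_rows:
        return True
    return any(marker in name for name in names for marker in MOJIBAKE_MARKERS)


def fetch_with_check(
    fetch: Callable[[str, str], Sequence[str]],
    start: str,
    end: str,
    retries: int = 3,
) -> Sequence[str]:
    """Call fetch up to `retries` times until the result passes looks_corrupt."""
    for _ in range(retries):
        names = fetch(start, end)
        if not looks_corrupt(names):
            return names
    raise RuntimeError(f"data for {start}..{end} still looks corrupt after {retries} tries")
```

With pybaseball, fetch could be something like lambda s, e: list(batting_stats_range(s, e)["Name"]), assuming the returned DataFrame has a Name column.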