read_csv includes the BOM of an utf8 file into the first column label #13497
Attached is a UTF-8 or ISO-8859-1 encoded CSV file. It is part of IMF financial data from data.imf.org. The following code shows that the first column label starts with the file's BOM and includes the '"', even though all other labels rightly do not contain '"', as it is the separator.
import pandas as PD
df = PD.read_csv('sample.csv',
The BOM is in fact prepended to the first column label even if utf8 is specified.
I should add that this bug occurs on Win7 x64, Python 3.5.1 (32-bit), pandas 0.18.1.
When I use utf-8-sig, the BOM is gone, but the first column label is So I'll open a new issue. Or is there a workaround? On 23.06.2016 at 01:49, Tom Augspurger wrote:
This is the exact same issue as #4793.
Right. Tom has spotted this as well. I had only searched the open issues. Using utf-8-sig uncovers another issue: Separators are not removed
Leo
On 23/06/2016, Jeff Reback notifications@github.com wrote:
Consider the following script:
import pandas as PD
df = PD.read_csv('sample.csv',
encoding='utf8')
print(df.columns[:5])
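
The symptom can be illustrated without the attached file. The following is only a sketch using Python's standard `csv` module rather than pandas' own parser, so it shows the general mechanism, not necessarily what pandas 0.18.1 does internally. The header names and the data row are hypothetical stand-ins for the IMF file. Decoding with `utf-8` leaves the BOM (U+FEFF) in front of the first field, so the opening quote is no longer the first character of that field and is kept literally; decoding with `utf-8-sig` consumes the BOM before parsing.

```python
import codecs
import csv
import io

# Hypothetical sample data: a quoted header row preceded by a UTF-8 BOM.
raw = codecs.BOM_UTF8 + '"Country Name","Country Code"\r\n"Example","1"\r\n'.encode("utf-8")


def header(data, encoding):
    """Decode the bytes with the given codec and return the first CSV row."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text)))


# 'utf-8' keeps the BOM, so the first label carries U+FEFF and its quotes.
plain = header(raw, "utf-8")

# 'utf-8-sig' strips the BOM, so the quotes are parsed as quoting again.
stripped = header(raw, "utf-8-sig")

print([ascii(label) for label in plain])
print([ascii(label) for label in stripped])
```

If a label still carries a stray BOM after loading, stripping `'\ufeff'` from the column names by hand is another possible workaround, though that treats the symptom rather than the cause.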