-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
read_excel losing first few rows of .xls files. #611
Comments
I will have to see what happens with these files when I use libxls on them w/o R or readxl. But the chances are high that this is a libxls matter. If/when I can prove that, I'll report it upstream. |
I've read these files now and am even more suspicious that it is a libxls issue. Related to encoding. Content that is not being read out of cell A1 of failure.xls: ELECTORAL DIVISION OF SOMERSET EAST ; FIELD-CÓRNËTCY No. 5, ZWAGERSHÓÊN. |
This value appears to be stored in an |
Record type 0xD6 was used in BIFF5 to represent rich-text strings. The rich-text information is stored after the text itself, so we can treat it like a LABEL record. See tidyverse/readxl#611
Changes: - Allow truncated XLS files #55 #60 #76 tidyverse/readxl#619 - Fix long-standing "extra column" bug #73 - Support for RSTRING records (rich-text cells in older BIFF5 files) tidyverse/readxl#611 Minimum version updated in bl3 due to header changes.
Hi @ddekadt I've just incorporated the current version of libxls into the dev version of readxl. I have high hopes that resolves this issue. Your sample files are no longer on DropBox. If you're still listening, can you share again so I can verify this is fixed? |
Hi @jennybc -- here you go: https://www.dropbox.com/s/x81wwr35po4r1l6/sample_data.zip?dl=0 Thank you! |
Yes, as I had hoped, the updated libxls has fixed this. This is only available in the dev version of readxl at this point, but it's good to know this is fixed. library(readxl)
dat <- read_xls("investigations/sample_data/success.xls")
#> New names:
#> * `` -> ...2
#> * `` -> ...3
#> * `` -> ...4
#> * `` -> ...5
#> * `` -> ...6
#> * ...
dat
#> # A tibble: 20 x 8
#> `ELECTORAL DIVISION … ...2 ...3 ...4 ...5 ...6 ...7 ...8
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 2 Vo. 1 I Su… Christi… Profess… Qual… Situation… <NA> Pace…
#> 3 50 Abson William Proprie… Hous… Hartman-r… Tenant E
#> 4 51 Achard… Gamiet Boatman Sala… Rudolph-s… Fishe… M
#> 5 52 Achard… Mahomet Clerk 55 55 <NA> M
#> 6 Ó3 Acker Henry <NA> Hous… Sherlock-… Tenant E
#> 7 54 Ackerd… Isaac Mason 55 Armstrong… <NA> M
#> 8 55 Ackerl… F ranci… Clerk Sala… Britannia… AV. R… E
#> 9 56 Ackman j J ames Sub-gan… 55 North End Cape … E
#> 10 57 Acurrow Francis Guard 55 Palmersto… Tramw… E
#> 11 58 Adam John Coffee … 55 Strangers… Foste… 0
#> 12 59 Adams 'Louis Labourer 55 Kirkwood-… Harbo… M
#> 13 60 (Adami Mickie Merchant Hous… Evatt-str… Tenant E
#> 14 61 Adami Nanbry 55 55 55 55 E
#> 15 62 Adami 'Darwick 55 55 <NA> <NA> E
#> 16 63 Adams lAchmat Boatman Sala… W aimer-r… Fishe… M
#> 17 64 (Adams Aden <NA> <NA> <NA> <NA> M
#> 18 65 Adams (Charles <NA> <NA> Coode-str… W. Jo… 0
#> 19 66 (Adams Charles… Clerk 55 Donkin-st… Cape … E
#> 20 67 ?Adams Ernest Boatman 55 W aimer-r… Harbo… 0
dat <- read_xls("investigations/sample_data/failure.xls")
#> New names:
#> * `` -> ...7
#> * `` -> ...8
dat
#> # A tibble: 13 x 8
#> No. `Surname in full… `Christian Names … `Profession, Trade… Qualification.
#> <dbl> <chr> <chr> <chr> <chr>
#> 1 1788 McNally Henry Cape Police Salary
#> 2 1789 Nel Jacobus Petrus Farmer Householder
#> 3 1790 Nel Louis Johannes <NA> <NA>
#> 4 1791 Paddock Charles <NA> <NA>
#> 5 1792 Paddock Ernest Harding <NA> J,
#> 6 1793 Parkman Charles Stewart <NA> M
#> 7 1794 Peover John <NA> M
#> 8 1795 Read George Cape Police Salary
#> 9 1796 Schoeman, jun, Christoff el Andr… Farmer Householder
#> 10 1797 Schoeman Johannes Cornells <NA> <NA>
#> 11 1798 Strydom Jan Abram <NA> 0.5
#> 12 1799 Strydom Johannes Hendrick <NA> <NA>
#> 13 1800 Snyman Jacobus <NA> Occupier
#> # … with 3 more variables:
#> # Situation of Property in respect of which qualification exists, or pi esent address if qualification be not in respect of Property. <chr>,
#> # ...7 <chr>, ...8 <chr> Created on 2021-07-29 by the reprex package (v2.0.0.9000) |
Hmmm, maybe I spoke too soon. It reads without error, but it still seems to be missing data. |
Reading these files with xls2csv appears to work, which suggests the problem now lies in readxl itself.
|
Hello,
I've been using readxl::read_excel for a project. Great package and function -- thank you so much!
I have one issue that I would love help with -- I don't think it has been address before (I have done my due diligince, I believe).
I include two sample .xls spreadsheets here. For one (success.xls) the function successfully reads in all the data, for the other (failure.xls) it loses the first two rows. This lost information is critical for me.
Code that reproduces this problem for these two files is as follows:
I use this (roughly) code on 1500 .xls files. For some large % of the .xls files, read_excel fails to retrieve the first two rows of data. I imagine there is some issue related to the fact that these are .xls not .xlsx files -- fwiw, these are automatically produced by Abbyy Finereader.
Any assistance would be appreciated,
Daniel
The text was updated successfully, but these errors were encountered: