PNAD domicílios import dictionary is wrong #88
Comments
@lucasmation which dictionary are you comparing against? Here it all seems right, except for V0101, which is indeed missing and should be included.
get_import_dictionary('PNAD', 2002, 'domicilios'): I think the third entry is wrong.
V0102 or V0103
Run old_dic <- get_import_dictionary('PNAD', 2002, 'domicilios')
Then import from the SAS dictionary again and compare.
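The comparison suggested above could be sketched roughly as follows. This is a minimal base-R sketch, not code from the package: the helper compare_dics() and the column names var_name, int_pos and fin_pos are assumptions for illustration, and the two toy data frames (with made-up positions) stand in for the stored dictionary and the one freshly imported from the SAS file.

```r
# Hypothetical helper: list variables whose positions differ between two
# dictionaries, or that exist in only one of them.
compare_dics <- function(old_dic, new_dic) {
  both <- merge(old_dic, new_dic, by = "var_name", all = TRUE,
                suffixes = c("_old", "_new"))
  differs <- is.na(both$int_pos_old) | is.na(both$int_pos_new) |
    both$int_pos_old != both$int_pos_new |
    both$fin_pos_old != both$fin_pos_new
  differs[is.na(differs)] <- TRUE  # treat any remaining NA as a difference
  both[differs, , drop = FALSE]
}

# Toy dictionaries with made-up positions (assumed column names).
old_dic <- data.frame(var_name = c("V0101", "UF", "V0102"),
                      int_pos = c(1, 5, 7), fin_pos = c(4, 6, 12),
                      stringsAsFactors = FALSE)
new_dic <- data.frame(var_name = c("V0101", "UF", "V0102", "V0103"),
                      int_pos = c(1, 5, 5, 13), fin_pos = c(4, 6, 12, 15),
                      stringsAsFactors = FALSE)

dic_diff <- compare_dics(old_dic, new_dic)
print(dic_diff)
```

An empty result would suggest the stored dictionary matches the re-imported one, at least for the start and end positions.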
Just download the data and check the Excel dictionary, you will see.
Please also check for similar problems with the pessoas data.
@lucasmation this is the dictionary stored in
Here is the IBGE Excel file downloaded today: It seems to be correct, aside from the missing variables. Anyway, I am downloading all the data and will run our script again to get the dictionaries for all years and see if there is anything strange or if I am missing something.
This should be a reproducible example:
So the dictionary seems to be correct (except for not having the ANO variable). However, once I import the data, column V0102 seems to start at the wrong position:
The first observation:
But look at how the data looks after being imported:
@lucasmation the problem seems to be within
You are correct about the error being in read_fwf (see the comment at the end).
Let's see if the readr developers fix that. For now, let's just delete UF from the PNAD import dictionaries (the data.frames for each dataset in microdadosBrasil).
@nicolassoarespinto, rather than degrading the dictionaries, please just change the wrapper function read_PNAD() to something like:
(please adapt to the variable names there)
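One way to sidestep overlapping columns entirely, sketched here only as an illustration and not as the change proposed in this thread, is to slice each line with base R's substring(), which does not care whether column ranges overlap. The helper read_fwf_overlap(), the dictionary column names (var_name, int_pos, fin_pos) and the positions in the toy file are all assumptions.

```r
# Hypothetical helper: read a fixed-width file by slicing each line with
# substring(), which is indifferent to overlapping column ranges.
# Every column comes back as character; type conversion is left to the caller.
read_fwf_overlap <- function(file, dic) {
  lines <- readLines(file)
  cols <- lapply(seq_len(nrow(dic)), function(i) {
    trimws(substring(lines, dic$int_pos[i], dic$fin_pos[i]))
  })
  names(cols) <- dic$var_name
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Toy example with made-up positions: UF (5-6) overlaps V0102 (5-12).
dic <- data.frame(var_name = c("V0101", "UF", "V0102"),
                  int_pos = c(1, 5, 5), fin_pos = c(4, 6, 12),
                  stringsAsFactors = FALSE)
tmp <- tempfile(fileext = ".txt")
writeLines(c("200211000015", "200235000027"), tmp)

dom <- read_fwf_overlap(tmp, dic)
print(dom)
```

This reads the whole file into memory and is likely slower than read_fwf on large files, so it is better suited to checking a few rows against the dictionary than to a full import.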
@nicolassoarespinto, also please check whether the same problem (overlapping columns) happens in any other dataset. Something like (pseudo-code...)
If there are overlapping columns in any other dataset-filetype, we will need to make similar adjustments to the wrapper functions.
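The overlap check described above could look roughly like this. It is a minimal base-R sketch under stated assumptions: the helper find_overlaps() is hypothetical, and the dictionary is assumed to be a data.frame with columns var_name, int_pos (start) and fin_pos (end); the toy positions are made up.

```r
# Hypothetical helper: return the pairs of variables whose column ranges overlap.
find_overlaps <- function(dic) {
  dic <- dic[order(dic$int_pos, dic$fin_pos), ]
  n <- nrow(dic)
  var_a <- character(0)
  var_b <- character(0)
  if (n >= 2) {
    for (i in seq_len(n - 1)) {
      for (j in (i + 1):n) {
        # Rows are sorted by start position, so once a later row starts
        # after row i ends, no further row can overlap row i.
        if (dic$int_pos[j] > dic$fin_pos[i]) break
        var_a <- c(var_a, dic$var_name[i])
        var_b <- c(var_b, dic$var_name[j])
      }
    }
  }
  data.frame(var_a = var_a, var_b = var_b, stringsAsFactors = FALSE)
}

# Toy dictionary with made-up positions: UF (5-6) sits inside V0102 (5-12).
dic <- data.frame(var_name = c("V0101", "UF", "V0102", "V0103"),
                  int_pos = c(1, 5, 5, 13), fin_pos = c(4, 6, 12, 15),
                  stringsAsFactors = FALSE)

overlaps <- find_overlaps(dic)
print(overlaps)
```

Running the same function over the dictionary of every dataset, file type and year would flag each place where the wrapper or the dictionary needs adjusting.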
Are you on the master or RA branch? On Thu, Oct 20, 2016 at 12:04 PM, nicolassoarespinto <
@lucasmation I am on the master branch. We cannot adjust this in the wrapper functions because, since #65, we import dictionaries inside
@lucasmation I am working right now on a temporary fix by changing the dictionaries.
An option would be to do something like:
I mean, don't use read_data, use read_fwf directly.
If you have already changed the dictionaries, I am fine with that.
I found the same error in other dictionaries with the test you proposed:
Complete code for reproducibility:
Censo Escolar
We have overlapping in the
Here is the
This does not seem to be a problem for the other variables, as shown in the example below:
library(microdadosBrasil)
library(data.table)
download_sourceData("PNAD", 2002, unzip = TRUE)
dom <- as.data.table(read_PNAD("domicilios", 2002))
After running the above, the positions of the variables beyond UF are wrong when compared with the import dictionary and a visual inspection of the txt file.
This is confirmed when I inspect the import dictionary for PNAD-2002-dom:
get_import_dictionary('PNAD',2002,'domicilios')
All of this may have been caused by us updating the source files to a newer version from IBGE but forgetting to update the import dictionaries. @nicolassoarespinto, can you re-import all the PNAD import dictionaries into R from the SAS import dictionary?