Extract fails for EGIR01DT #81

aftollefsen · 2019-03-07T09:55:11Z

Great package. I run into an error when extracting a batch of IR files with the option add_geo = TRUE. The goal of the script is to download all IR surveys for Africa and merge them.

Here is my code:

# Extract list of DHS countries
countries <- dhs_countries()

# Identify the subregions that are in Africa (case-insensitive match)
uniqueregions <- unique(countries$SubregionName)
africaregions <- uniqueregions[grep("africa", uniqueregions, ignore.case = TRUE)]

# Subset countries data frame with only africa
subcountries <- countries[which(countries$SubregionName %in% africaregions),]

# Identify Individual Recode (IR) datasets for the selected countries
hrdatasets <- dhs_datasets(fileType = "IR",countryIds = subcountries$DHS_CountryCode,fileFormat = "STATA")

# Download datasets matching the above query
hrdownloads <- get_datasets(dataset_filenames = hrdatasets$FileName)

# Search within downloaded surveys if variables exist
vars <- search_variables(names(hrdownloads), variables = c("v190","v012","v106","v151"))

# Extract DHS surveys with only variables selected in the previous step
extract <- extract_dhs(questions = vars, add_geo = TRUE)

This starts, but then fails:
Starting Survey 1 out of 182 surveys:AOIR51dt
Starting Survey 2 out of 182 surveys:AOIR62DT
.
.
.
Starting Survey 31 out of 182 surveys:CDIR61DT
Starting Survey 32 out of 182 surveys:CIIR35DT
Starting Survey 33 out of 182 surveys:CIIR3ADT
Starting Survey 34 out of 182 surveys:CIIR50DT
Starting Survey 35 out of 182 surveys:CIIR62DT
Starting Survey 36 out of 182 surveys:EGIR01DT
Error in vapply(r, attr, character(1), "label", exact = TRUE) :
values must be length 1,
but FUN(X[[1]]) result is length 0

# Convert list of data.frames into one single data.frame using dplyr
library(dplyr)
dhsdf <- bind_rows(extract, .id = "survey")


aftollefsen · 2019-03-07T10:00:25Z

Same for
Starting Survey 44 out of 170 surveys:GHIR02DT
Error in vapply(r, attr, character(1), "label", exact = TRUE) :
values must be length 1,
but FUN(X[[70]]) result is length 0

OJWatson · 2019-03-14T20:50:04Z

Hi there,

Thanks for flagging this up, and sorry for the slow reply. This issue is mainly due to the earliest recodes having slightly odd Stata files, so when we read the data in we don't correctly pick up the labels attached to the variables. @jeffeaton and I have been chatting about this and will look into getting it working for the Stata files.
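
To illustrate why a missing label produces exactly this error, here is a minimal base R sketch. It does not use rdhs; the toy columns and the `get_label` helper are assumptions made up for the example, not the package's actual parser or fix.

```r
# Two toy columns: one carries a "label" attribute, the other does not.
cols <- list(
  v012 = structure(c(23, 31), label = "Current age"),
  v190 = c(1, 5)  # no "label" attribute, as in the problematic files
)

# Strict extraction: attr() returns NULL (length 0) for v190,
# so vapply() stops because it was promised a character(1) result.
strict <- tryCatch(
  vapply(cols, attr, character(1), "label", exact = TRUE),
  error = function(e) conditionMessage(e)
)

# Tolerant extraction (hypothetical helper): fall back to NA when no label exists.
get_label <- function(x) {
  lab <- attr(x, "label", exact = TRUE)
  if (is.null(lab)) NA_character_ else as.character(lab)
}
labels <- vapply(cols, get_label, character(1))
```

Here `strict` holds the captured error message rather than a vector of labels, while `labels` is a named character vector with `NA` for the unlabelled column.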

However, if you download those datasets in the flat file format, then those two survey datasets work. We generally recommend always downloading the flat files, as they are more reliable. I think there are only 6 surveys that don't work as flat downloads (see #5), and we are working on a workaround for those.

So just adapt the code you provided above with the following, and hopefully you should be all okay (:crossed_fingers:):

# Identify Individual Recode (IR) datasets for the selected countries, as flat files
hrdatasets <- dhs_datasets(fileType = "IR",countryIds = subcountries$DHS_CountryCode,fileFormat = "flat")
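
If some files still fail, one general way to keep a long batch from aborting on a single survey is to wrap each per-survey step in `tryCatch()` and record which ones failed. The sketch below is base R only; `extract_one` and the toy `surveys` list are hypothetical stand-ins, not rdhs functions or real DHS data.

```r
# Hypothetical stand-in for a per-survey extraction step:
# it errors when any column lacks a "label" attribute.
extract_one <- function(survey) {
  vapply(survey, attr, character(1), "label", exact = TRUE)
}

# Toy input: the second survey has an unlabelled column.
surveys <- list(
  AOIR51DT = list(v012 = structure(1:2, label = "Current age")),
  EGIR01DT = list(v012 = 1:2)
)

# Run each survey independently; a failure yields NULL instead of stopping the loop.
results <- lapply(surveys, function(s) {
  tryCatch(extract_one(s), error = function(e) NULL)
})

# Separate the surveys that failed from those that succeeded.
failed <- names(results)[vapply(results, is.null, logical(1))]
ok <- results[!names(results) %in% failed]
```

The names in `failed` can then be re-downloaded in another format or reported, while `ok` carries on through the rest of the script.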

OJWatson added the "parser exceptions" label (odd files, and parser errors) on Aug 19, 2019
