Update README.md
randrescastaneda authored Jun 24, 2019
1 parent a3d3f20 commit 701b7dd
Showing 1 changed file with 13 additions and 2 deletions.
15 changes: 13 additions & 2 deletions README.md
@@ -52,7 +52,7 @@ The first column contains the name of the variable to be checked. It may be the

The second column, “Warning,” lets the user specify the level of urgency. This column is purely cosmetic: it makes it easier to organize or filter the results, either in the Tableau dashboard or in the user's own analyses.

The third and fourth columns are the checking code, but each column has a particular function. The fourth column (iff) contains the checks properly speaking. That is, this column contains the logical statement that checks the consistency of the variable. For instance, if you wanted to test that the variable that corresponds to the age of the person does not have negative values, positive values above 100, or missing values, you may type something like this: age < 0 | age > 100. As you see, the logical test flags those observations that meet the criterion as inconsistent.[^age]
The third and fourth columns are the checking code, but each column has a particular function. The fourth column (iff) contains the checks properly speaking. That is, this column contains the logical statement that checks the consistency of the variable. For instance, if you wanted to test that the variable that corresponds to the age of the person does not have negative values, positive values above 100, or missing values, you may type something like this: age < 0 | age > 100. As you see, the logical test flags those observations that meet the criterion as inconsistent.
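
As a minimal sketch of how such a check behaves, the Stata snippet below runs the age test on a small made-up dataset. The data values are assumptions for illustration; only the expression age < 0 | age > 100 comes from the text above. The test flags the problematic observations, not the valid ones, and Stata treats numeric missing values as larger than any number, which is why age > 100 also catches missing ages.

```stata
* Toy data (made up for illustration); "age" follows the README example
clear
input age
25
-3
140
.
67
end

* Count the observations flagged as inconsistent.
* Numeric missing (.) sorts above every nonmissing number in Stata,
* so the condition age > 100 is also true for missing values.
count if age < 0 | age > 100
display r(N)
```

With these data, three observations are flagged: -3, 140, and the missing value.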

The third column (temporalvars) is for lines of code that need to be executed before the logical statement in column “iff.” Sometimes you need to create a temporary variable with certain characteristics in order to check for some inconsistencies. For instance, in the GMD collection you may need to test that the combination of household id and person id is unique across the dataset. To do so, you can do the following:

@@ -69,6 +69,17 @@ The first four lines of the code above create a temporal macro that counts the n
* In the example above, the logical statement that goes in the corresponding cell of column “iff” is r(N) != `n', rather than count if r(N) != `n'. Because by design every consistency check counts the number of observations with problems, it would be inefficient to make the user type “count if” in each cell. Only the logical statement itself needs to be typed.
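
The split between the two columns can be sketched as follows. This is a simpler variant of a uniqueness check, not the example referenced above. It assumes hypothetical variable names hhid and pid, and assumes that qcheck prepends count if to the content of the “iff” cell, as the note above describes.

```stata
* Toy data with hypothetical identifiers; the pair (2, 1) appears twice
clear
input hhid pid
1 1
1 2
2 1
2 1
end

* What would go in column "temporalvars": preparatory code.
* duplicates tag stores, for each observation, how many other
* observations share the same hhid-pid combination (0 if unique).
duplicates tag hhid pid, generate(dup)

* What would go in column "iff": only the logical statement
*     dup > 0
* The full command that ends up being run is then assumed to be:
count if dup > 0
```

In this toy dataset the count is 2, since both copies of the repeated pair are flagged.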

See the summary table below:
<img src="./images/qcheck_summary.png">

### 3. Modify Excel file as needed (Spreadsheet "Variables")
The dynamic assessment of qcheck performs different analyses depending on the type of variable: welfare, categorical, or basic. Variables classified as ‘welfare’ are assumed to be continuous, and estimations of poverty and inequality are performed only with these variables. Categorical variables are numeric, but their values refer to a classification or characteristic of the observation rather than to an ordinal relationship between its members. For instance, the variable ‘lstatus’ in the GMD collection records labor force participation. It contains three numeric values, 1, 2, and 3, where 1 means ‘employed’, 2 means ‘unemployed’, and 3 means ‘out of labor force.’ Finally, the ‘basic’ classification covers variables that are neither categorical nor welfare aggregates.
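
To illustrate what “categorical” means here, the sketch below uses made-up data in which the values of lstatus are numeric codes and value labels carry the meaning. The label name lstatus_lbl is an assumption for this example; the actual GMD data may define its own labels.

```stata
* Toy data: numeric codes for labor force status (made up for illustration)
clear
input lstatus
1
2
3
1
end

* Attach value labels so the codes display with their meaning.
* lstatus_lbl is a hypothetical label name.
label define lstatus_lbl 1 "employed" 2 "unemployed" 3 "out of labor force"
label values lstatus lstatus_lbl

* A frequency table is meaningful for a categorical variable,
* whereas a mean of the codes would not be.
tabulate lstatus
```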

[^age]:Notice that the test identifies those observations with problems, and not those that are fine. That is, the test should not be inrange(age, 0, 100).
By default, the first two columns, “raw_varname” and “test_varname”, contain the same information, namely the name of the variable. The distinction between the two matters when the user wants to apply the checks of one collection to a different collection that has different variable names but similar concepts. In that case, the user does not have to retype all the tests when assessing the second collection. For instance, assume you want to apply the checks of the GMD collection to the SEDLAC collection. Concepts like household id, sampling weights, and welfare aggregate, among others, are denoted with different variable names in the two datasets. The user only needs to type the name of each SEDLAC variable in the “raw_varname” column, next to the corresponding GMD variable name in the “test_varname” column. In this way, qcheck will ‘rename’ all SEDLAC variables to their GMD equivalents and apply the checks already written in GMD terms.

See the example below for how to fill in the Excel file correctly.
<img src="./images/qcheck_variables.png">
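
The idea behind the two name columns can be pictured with a short Stata sketch. All variable names below are placeholders chosen for illustration, and the snippet only mimics the renaming idea; it is not how qcheck is implemented internally.

```stata
* Toy data using hypothetical "raw" variable names from another collection
clear
input raw_hhid raw_weight
1 10.5
2 8.2
3 .
end

* Map raw names (column "raw_varname") to test names (column "test_varname").
* The two parenthesized lists are matched position by position.
rename (raw_hhid raw_weight) (hhid weight)

* A check written once in test-name terms can now run unchanged,
* for example flagging missing or non-positive weights.
count if missing(weight) | weight <= 0
```

Because the check refers only to the test names, the same “iff” cell can be reused for any collection whose raw names are mapped in the spreadsheet.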

### 4. Load checks into qcheck
Before using qcheck in Stata, you need to ‘load’ the checks into the system. To do so, specify the function ‘create’ in the qcheck command. Depending on where you saved the Excel file “qcheck_NNN.xlsx”, specify the directory path as indicated in the image below. Repeat this procedure for each “qcheck_NNN.xlsx” input file you have.

<img src="./images/qcheck_load.png">
