Skip to content

Commit

Permalink
Merge pull request #8 from worldbank/develop
Browse files Browse the repository at this point in the history
GLAD Update: June, 2022.
  • Loading branch information
jpazvd authored Feb 17, 2023
2 parents 6b92996 + 19bd5e8 commit 590d7a5
Show file tree
Hide file tree
Showing 99 changed files with 18,642 additions and 5,096 deletions.
116 changes: 66 additions & 50 deletions 00_documentation/Harmonized_Variables_in_GLAD.md
Original file line number Diff line number Diff line change
@@ -1,50 +1,66 @@
# Harmonized variable names
<sup>back to the [README](https://github.com/worldbank/GLAD/blob/master/README.md) :leftwards_arrow_with_hook:</sup>

This page list the harmonized variable names that will be used in all GLADs regardless of which assessment, year or country the data comes from. The first table has the variables that are included in all data sets. The second table includes variable names we have harmonized but exist only in some data sets.

varname | varclass | varlabel | vartype | note
-- | -- | -- | -- | -- |
surveyid | key | SurveyID (Region_Year_Assessment) | String | |
countrycode | key | WB country code (3 letters) | String | (a) |
national_level | key | Idcntry_raw is a national level | Indicator (1=National) | (a) |
idcntry_raw | id | Country ID, as coded in rawdata | Numerical or String | (b) |  
idschool | id | School ID | Numerical | | 
idgrade | id | Grade ID | Numerical | | 
idclass | id | Class ID | Numerical | (c) |
idlearner | id | Learner ID | Numerical | | 
score_[*assessment*]\_[*subject*]\_[*pv*] | value | [Plausible value *pv*:] *assessment* score for *subject* | Numerical ||  
level_[*assessment*]\_[*subject*]\_[*pv*] | value | [Plausible value *pv*:] *assessment* level for *subject* | Categorical | | 
age | trait | Learner age at time of assessment | Numerical |  |
urban | trait | School is located in urban/rural area | Indicator (1=Urban) | |  
male | trait | Learner gender is male/female | Indicator (1=Male) |  |
escs | trait | Learner socio-economic status _(Purposefully not labeled yet)_ | Numerical | |
learner_weight | sample | Total learner weight | Numerical |  |

**Notes:**

For all assessment-years, the id variables (*idcntry_raw, idschool, idgrade, idclass, idlearner*) compose a unique id.

(a) The full correspondence of *countrycode*, *national_level* and *idcntry_raw* is found in the [master countrycode list](https://github.com/worldbank/GLAD/blob/master/01_harmonization/011_rawdata/master_countrycode_list.csv). Some examples:
* in LLECE 1997 the *countrycode* MEX is linked to both the sample from the country Mexico (idcntry_raw = 21) and for the sample from the subnational unit of Nueva Leon (idntry_raw = 11). However, the first is considered *national_level* of 1, while the later is *national_level* of 0. That means that both samples are found in the GLAD module ALL, but the module CLO for Mexico is calculated using only the first sample, discarding the later.
* in PIRLS 2001 the *countrycode* GBR is linked to both the samples from England (idcntry_raw = 926) and Scotland (idcntry_raw = 927) and both are considered *national_level* of 1. That means that both samples are found in the GLAD module ALL and the module CLO for United Kingdom is calculated pooling both samples without distinction.

(b) The variable *idcntry_raw* is preserved as found in the raw data. Most assesment-years have it as a numerical variable. The only exception so far is PASEC 1996, for which this variable is a string.

(c) Some assessment-years may not have the variable _idclass_.

---

## Variables specific to a single assessment or year

Though the variable _learner_weight_ exist in all assessments, other sample-related variables vary across assessments.

varname | value | varlabel| vartype | note
-- | -- | -- | -- | --
year | key | Year of assessment | Date | PASEC, EGRA only (when multi-year bundles)
urban_o* | trait | Original variable of urban | Categorical | PIRLS, TIMSS, SACMEQ only (whenever available)
learner_weight_subject* | sample | Total learner weight for specific subject | Numerical | LLECE only
strata* | sample | Strata | Numerical | LLECE, PASEC only
jkzone | sample | Jackknife zone | Numerical | PIRLS, TIMSS, PASEC 2014 only
jkrep | sample | Jackknife replicate code | Numerical | PIRLS, TIMSS, PASEC 2014 only
weight_replicate* | sample | Replicate weight # | Numerical | PASEC 2014 only
# Harmonized variable names
<sup>back to the [README](https://github.com/worldbank/GLAD/blob/master/README.md) :leftwards_arrow_with_hook:</sup>

This page list the harmonized variable names that will be used in all GLADs regardless of which assessment, year or country the data comes from. The first table has the variables that are included in all data sets. The second table includes variable names we have harmonized but exist only in some data sets.

varname | varclass | varlabel | vartype | note
-- | -- | -- | -- | -- |
surveyid | key | SurveyID (Region_Year_Assessment) | String | |
countrycode | key | WB country code (3 letters) | String | (a) |
national_level | key | Idcntry_raw is a national level | Indicator (1=National) | (a) |
idcntry_raw | id | Country ID, as coded in rawdata | Numerical or String | (b) |  
idschool | id | School ID | Numerical | | 
idgrade | id | Grade ID | Numerical | | 
idclass | id | Class ID | Numerical | (c) |
idlearner | id | Learner ID | Numerical | | 
score_[*assessment*]\_[*subject*]\_[*pv*] | value | [Plausible value *pv*:] *assessment* score for *subject* | Numerical ||  
level_[*assessment*]\_[*subject*]\_[*pv*] | value | [Plausible value *pv*:] *assessment* level for *subject* | Categorical | | 
age | trait | Learner age at time of assessment | Numerical |  |
urban | trait | School is located in urban/rural area | Indicator (1=Urban) | |  
male | trait | Learner gender is male/female | Indicator (1=Male) |  |
escs | trait | Learner socio-economic status _(Purposefully not labeled yet)_ | Numerical | |
learner_weight | sample | Total learner weight | Numerical |  |

**Notes:**

For all assessment-years, the id variables (*idcntry_raw, idschool, idgrade, idclass, idlearner*) compose a unique id.

(a) The full correspondence of *countrycode*, *national_level* and *idcntry_raw* is found in the [master countrycode list](https://github.com/worldbank/GLAD/blob/master/01_harmonization/011_rawdata/master_countrycode_list.csv). Some examples:
* in LLECE 1997 the *countrycode* MEX is linked to both the sample from the country Mexico (idcntry_raw = 21) and for the sample from the subnational unit of Nueva Leon (idntry_raw = 11). However, the first is considered *national_level* of 1, while the later is *national_level* of 0. That means that both samples are found in the GLAD module ALL, but the module CLO for Mexico is calculated using only the first sample, discarding the later.
* in PIRLS 2001 the *countrycode* GBR is linked to both the samples from England (idcntry_raw = 926) and Scotland (idcntry_raw = 927) and both are considered *national_level* of 1. That means that both samples are found in the GLAD module ALL and the module CLO for United Kingdom is calculated pooling both samples without distinction.

(b) The variable *idcntry_raw* is preserved as found in the raw data. Most assesment-years have it as a numerical variable. The only exception so far is PASEC 1996, for which this variable is a string.

(c) Some assessment-years may not have the variable _idclass_.

---

## Variables specific to a single assessment or year

Though the variable _learner_weight_ exist in all assessments, other sample-related variables vary across assessments.

varname | value | varlabel| vartype | note
-- | -- | -- | -- | --
year | key | Year of assessment | Date | PASEC, EGRA only (when multi-year bundles)
urban_o* | trait | Original variable of urban | Categorical | PIRLS, TIMSS, SACMEQ only (whenever available)
learner_weight_subject* | sample | Total learner weight for specific subject | Numerical | LLECE only
strata* | sample | Strata | Numerical | LLECE, PASEC only
jkzone | sample | Jackknife zone | Numerical | PIRLS, TIMSS, PASEC 2014 only
jkrep | sample | Jackknife replicate code | Numerical | PIRLS, TIMSS, PASEC 2014 only
weight_replicate* | sample | Replicate weight # | Numerical | PASEC 2014 only

---

## Variables specific to a single assessment or year

Though the variable _learner_weight_ exist in all assessments, other sample-related variables vary across assessments.

varname | value | varlabel| vartype | note
-- | -- | -- | -- | --
year | key | Year of assessment | Date | PASEC, EGRA only (when multi-year bundles)
urban_o* | trait | Original variable of urban | Categorical | PIRLS, TIMSS, SACMEQ only (whenever available)
learner_weight_subject* | sample | Total learner weight for specific subject | Numerical | LLECE only
strata* | sample | Strata | Numerical | LLECE, PASEC only
jkzone | sample | Jackknife zone | Numerical | PIRLS, TIMSS, PASEC 2014 only
jkrep | sample | Jackknife replicate code | Numerical | PIRLS, TIMSS, PASEC 2014 only
weight_replicate* | sample | Replicate weight # | Numerical | PASEC 2014 only
13 changes: 11 additions & 2 deletions 00_documentation/dyntext_GLAD.txt
Original file line number Diff line number Diff line change
Expand Up @@ -133,7 +133,13 @@ tab countrycode idgrade if national_level==0
<<dd_do: nocommands>>
foreach valuevar of local onthefly_valuevars {
label var `valuevar' ""
tab `valuevar', m
qui unique `valuevar'
if (`r(unique)'<=15) {
tab `valuevar', m
}
else {
sum `valuevar'
}
}
<</dd_do>>

Expand All @@ -155,7 +161,10 @@ if "`onthefly_traitvars'" != "" {
tostring threshold_val, replace

* Concatenate the condition to be human readable
gen str harmonization_condition = threshold_res + " = (" + threshold_var + " >= " + threshold_val + ")"
cap: gen str harmonization_condition = threshold_res + " = (" + threshold_var + " >= " + string(threshold_val) + ")"
if _rc != 0 {
gen str harmonization_condition = threshold_res + " = (" + threshold_var + " >= " + threshold_val + ")"
}
replace harmonization_condition = "no harmonization" if harmonization_condition==" = ( >= .)"

* Drop original variables
Expand Down
Loading

0 comments on commit 590d7a5

Please sign in to comment.