selecting age focused sample #5

Open
andkov opened this issue Feb 26, 2018 · 1 comment

andkov commented Feb 26, 2018

Here's the solution Ardo prepared for selecting those individuals who have a valid observation of gait between ages 68 and 72 (inclusive):

library(dplyr)
library(elect)

dta <- ds_valid %>% dplyr::select(-firstobs)
subjects <- unique(ds_valid$id)
# dta <- ELECTData
# subjects <- unique(dta$id)

# Print ids of subjects with two or fewer records
for(i in seq_along(subjects)){
  dta.i <- dta[dta$id==subjects[i],]
  if(nrow(dta.i)<=2){print(subjects[i])}
}

dta %>% glimpse()
# Subselect:
bound <- c(68,72)
count <- 0
for(i in seq_along(subjects)){
  select <- 0
  dta.i <- dta[dta$id==subjects[i],]
  # consider only subjects first observed before the upper bound
  if(dta.i$age[1]< bound[2]){
    # keep records above the lower bound
    ddta.i <- dta.i[dta.i$age>bound[1],]
    if(nrow(ddta.i)>1){
      select <- 1; print(i)
      firstobs <- rep(0, nrow(ddta.i))
      firstobs[1] <- 1
      ddta.i <- cbind(ddta.i, firstobs = firstobs)
    }
  }
  if(select==1 && count==0){
    ddta <- ddta.i
    count <- count +1
  } else if(select==1 && count>0){
    # else-if: otherwise the first selected subject is appended twice
    ddta <- rbind(ddta,ddta.i)
    count <- count+1
  }
}

hist(ddta$age[ddta$firstobs == 1])
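
For comparison, here is a minimal sketch of the same subselection without the select/count bookkeeping: split the data by subject, process each piece, and bind the results once at the end. The data frame `dta_toy` and its values are invented for illustration and are not the study data; the selection rules are copied from the loop above.

```r
# Toy stand-in for `dta`; the values are invented for illustration only
dta_toy <- data.frame(
  id  = c(1, 1, 1, 2, 2, 3, 3, 3),
  age = c(66, 69, 71, 74, 75, 67, 70, 73)
)
bound <- c(68, 72)

# One data frame per subject, processed independently
pieces <- lapply(split(dta_toy, dta_toy$id), function(d) {
  if (d$age[1] >= bound[2]) return(NULL)            # first observed too late
  d <- d[d$age > bound[1], , drop = FALSE]          # keep records above the lower bound
  if (nrow(d) <= 1) return(NULL)                    # need more than one record
  d$firstobs <- as.numeric(seq_len(nrow(d)) == 1)   # flag the first retained record
  d
})

# Drop the subjects that were not selected, then bind once
pieces   <- Filter(Negate(is.null), pieces)
ddta_toy <- do.call(rbind, pieces)
```

Binding once at the end avoids growing `ddta` inside the loop and removes the need to treat the first selected subject as a special case.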

andkov commented Feb 26, 2018

Here's a reformulation of the same logic using dplyr phraseology:

> ds <- ds_valid %>% 
+   dplyr::select(id, age, wave, firstobs, gait) %>% 
+   dplyr::filter( id %in% c("617643","709354","228190"))
> 
> ds %>% glimpse()
Observations: 22
Variables: 5
$ id       <int> 228190, 228190, 617643, 617643, 617643, 617643, 617643, 617643...
$ age      <dbl> 74, 74, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 78, 69, 70...
$ wave     <int> 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 0, 1, 2, 3, 4, 5, ...
$ firstobs <dbl> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0
$ gait     <dbl> 0.57, NA, 0.87, 0.76, 0.94, 0.83, 0.80, 0.70, 0.92, 0.75, NA, ...
> ids_gait_70 <- ds %>% 
+   dplyr::filter( age > 67 & age <  73) %>%
+   dplyr::group_by(id) %>% 
+   dplyr::arrange(age) %>% 
+   dplyr::mutate(gait70 = dplyr::first(gait)) %>% 
+   dplyr::distinct(id, gait70)
> ids_gait_70 %>% glimpse()
Observations: 2
Variables: 2
$ id     <int> 617643, 709354
$ gait70 <dbl> 0.94, 1.08
> 
> d2 <- ds %>% 
+   dplyr::left_join(ids_gait_70, by = "id") %>% 
+   # keep only people who have a non-missing gait70
+   dplyr::filter(!is.na(gait70)) %>% 
+   dplyr::filter(age > 68 )
> 
> d2 %>% head(40)
       id age wave firstobs gait gait70
1  617643  68    3        0 0.83   0.94
2  617643  69    4        0 0.80   0.94
3  617643  70    5        0 0.70   0.94
4  617643  71    6        0 0.92   0.94
5  617643  72    7        0 0.75   0.94
6  617643  73    8        0   NA   0.94
7  617643  74    9        0 0.81   0.94
8  617643  75   10        0 0.70   0.94
9  617643  78   13        0   NA   0.94
10 709354  69    0        1 1.08   1.08
11 709354  70    1        0 0.96   1.08
12 709354  71    2        0 0.61   1.08
13 709354  72    3        0 0.44   1.08
14 709354  73    4        0 0.54   1.08
15 709354  75    5        0 0.69   1.08
16 709354  75    6        0 0.59   1.08
17 709354  76    7        0   NA   1.08
> 
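
If the goal is strictly "a valid (non-missing) gait observation between ages 68 and 72, inclusive", one possible variant is to drop missing gait values before taking the first value in the window. This is only a sketch under that assumption, not the code used above: `ds_toy`, `ids_gait_70_toy`, `d2_toy` and the column `age70` are hypothetical names, and the data are invented for illustration.

```r
library(dplyr)

# Toy data; ids and values are invented for illustration only
ds_toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3),
  age  = c(67, 68, 70, 74, 75, 69, 71),
  gait = c(0.94, NA, 0.80, 0.57, NA, NA, NA)
)

# First non-missing gait observed at ages 68 to 72 (inclusive), per person
ids_gait_70_toy <- ds_toy %>%
  dplyr::filter(dplyr::between(age, 68, 72), !is.na(gait)) %>%
  dplyr::arrange(id, age) %>%
  dplyr::group_by(id) %>%
  dplyr::summarize(gait70 = dplyr::first(gait), age70 = dplyr::first(age)) %>%
  dplyr::ungroup()

# Keep only people found above, and only their records from age 68 onward
d2_toy <- ds_toy %>%
  dplyr::inner_join(ids_gait_70_toy, by = "id") %>%
  dplyr::filter(age >= 68)
```

Here `inner_join()` replaces the `left_join()` plus `filter(!is.na(gait70))` pair, since people without a valid gait in the window never appear in `ids_gait_70_toy`.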
