-
Notifications
You must be signed in to change notification settings - Fork 2
/
STAT660-01_f18-team-2_project1_data_analysis_by_WH.sas
197 lines (146 loc) · 4.69 KB
/
STAT660-01_f18-team-2_project1_data_analysis_by_WH.sas
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
*******************************************************************************;
**************** 80-character banner for column width reference ***************;
* (set window width to banner width to calibrate line length to 80 characters *;
*******************************************************************************;
*
This file uses the following analytic dataset to address several research
questions regarding the absenteeism at work from July 2007 to July 2010 at a
counrier company in Brazil.
Dataset Name: Absenteeism_at_work created in external file
STAT660_f18-team-2_project1_data_preparation.sas, which is assumed to be
in the same directory as this file.
See included file for dataset properties
;
* environmental setup;
* set relative file import path to current directory (using standard SAS trick);
X "cd ""%substr(%sysget(SAS_EXECFILEPATH),1,%eval(%length(%sysget(SAS_EXECFILEPATH))-%length(%sysget(SAS_EXECFILENAME))))""";
* load external file that generates analytic dataset FRPM1516_analytic_file;
%include '.\STAT660-01_f18-team-2_project1_data_preparation.sas'
;
title1
'Research Question: What was the most common reason for the absenteeism?'
;
title2
'Rationale: This would help to find out whether most of the workers were having the same reason for absenteeism. '
;
footnote1
'Based on the summary table, the most common reason was medical consultation(23).'
;
footnote2
'After grouping by ID, we could also find the most common reason for absenteeism for each employee.For example, employee 28 used reason 23 for 32 times.'
;
*
Methodology: Used proc freq to find the frequency of each reason,
using the result to find out the most common one.
Limitations: Since each employee had multiple reasons,it would
be better give same weight to each employee for more accurate results.
Follow-up Steps: Add weight to the frequency count.
;
proc freq
data =Absenteeism_analytic_file
;
table
Reason_for_absence
/nocum
;
run;
proc freq
data =Absenteeism_analytic_file
;
table
Reason_for_absence*ID
/nopercent
;
run;
title;
footnote;
title1
'Research Question: Which reason was related to the longest absenteeism hours?'
;
title2
'Rationale: This should help to have a general understanding of how many absenteeism hours related with each reason.'
;
footnote1
'Based on the summary table, diseases of the circulatory system had the longest average absenteeism hours which was 42 hours. '
;
footnote2
'The the average of absenteeism time in hours was only 6.99 hours.'
;
footnote3
'This could tell us that there would be a high chance if a employee who takes absence for the reason of circulatory system would have a longer absenttesim time.'
;
*
Methodology: Use PROC MEANS to find out the average absenteeism
hours for each reason.
Limitations: There may be duplicates in the dataset.
Follow-up Steps: Add a week_number as a new column to the dataset
if possible.Then use id, absenteeism_time_in_hours, Day_of_the_week,
week_number,month_of_absence,reason_for_absence as a composite key.
;
proc means
data =Absenteeism_analytic_file
mean median maxdec=2
;
class
reason_for_absence
;
var
Absenteeism_time_in_hours
;
run;
proc means
data =Absenteeism_analytic_file
mean median maxdec=2
;
var
Absenteeism_time_in_hours
;
run;
title;
footnote;
title1
'Research Question: Was there a special day in a week that had higher frequence that employees was absent? '
;
title2
'Rationale: In reality, it would help with the weekly work schedule.'
;
footnote1
'Based on the summary table, the frequency of each work day in a week was from 19%-22.34% which are very close to 20%.'
;
footnote2
'Although day 2 of the week had a bit higher frequency of 22.34%,hypothesis testing will be needed for significance difference test if we are interested in it.'
;
footnote3
'Genreally speaking,employees take absence with no preferrence on days. But there could be a special patterns if analyzed by months or seasons.'
;
*
Methodology: Use proc freq to study the weekly pattern for each season. Count the
Absenteeism_time_in hours where value is not 0.Crosstable with ID
is also used to find out the week pattern of each employee.
Limitations: In special month or season, it may have special daily distribution.
Follow-up Steps:Check this pattern in a special month or season when needed.
;
proc freq
data =Absenteeism_analytic_file
;
table
Day_of_the_week
/nocum
;
where
Absenteeism_time_in_hours NE 0
;
run;
proc freq
data =Absenteeism_analytic_file
;
table
Day_of_the_week*ID
/nocum
;
where
Absenteeism_time_in_hours NE 0
;
run;
title;
footnote;