-
Notifications
You must be signed in to change notification settings - Fork 13
/
10-Malmquist_Index.Rmd
550 lines (409 loc) · 26.2 KB
/
10-Malmquist_Index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
```{r, include=FALSE, eval=FALSE}
library(bookdown); library(rmarkdown); rmarkdown::render("10-Malmquist_Index.Rmd", "pdf_book")
```
# Changing Performance over Time
## Introduction[^10-malmquist_index-1]
[^10-malmquist_index-1]: This chapter benefited from significant contributions by Aurobindh Kalathil Puthanpura, Nina Chaichi, Dong-Joon Lim, and Kevin van Blommestein.
```{r Ch_10_setup, include=FALSE}
library (kableExtra)
library (knitr)
knitr::opts_chunk$set(echo = TRUE)
library (DiagrammeR)
library (DiagrammeRsvg, quietly=TRUE)
library (rsvg, quietly=TRUE)
library (htmltools, quietly=TRUE)
library ("Lahman", quietly=TRUE)
library ("dplyr", quietly=TRUE)
library(TRA) # Not on CRAN, install from github if needed
# devtools::install_github("prof-anderson/TRA")
library (Benchmarking)
library (DJL)
library (deaR)
```
Often it is important to assess the changing performance over time rather than just at a single point in time. We did some explorations of the changing distribution of scores in the chapter 8 with baseball but we didn't really explore how these scores change on a year by year basis. A unit's changing DEA efficiency might be due to a number of reasons:
- General operating conditions have changed
- Changing size of operations
- Incorporated best practices by single unit
In the case of a store being benchmarked against similar stores in the same chain, changes due to general operating conditions might be due to inflation raising wages (or materials, end products, etc.), changing laws or regulations, etc. In this case, a change in the general operating conditions is like "...a rising tide raising all boats." It is good to identify and quantify these impacts, but it isn't something that an individual store manager would be either blamed for performance loss due to this or praised for changes helping. Stock traders highlight the importance of this issue - making a 20% return when the market is also up 20% is not particularly impressive and getting a 15% return might be considered poor.
Performance may be improved by simply getting bigger and achieving economies of scale. Perhaps a store expansion allowed for better space utilization with a 10% increase in space and staff generating a 20% increase in sales. In this case, there might be gains from just simply getting bigger. Other returns to scale effects such as getting too big and decreasing returns to scale or impacts of shrinking in size may also occur. Again, finding ways to separate and assess these impacts can be very useful.
Another way of changing performance might be by adopting new best practices. A store manager might glean a new innovation such as a better way to staff employees, run checkout operations, or manage inventory. This innovation would be a change that affects only this particular store and we would like to identify and reward such innovation. Identifying a new best practice early could help us spread this best practice more widely and quickly.
Simply looking at efficiency scores on their own in a single year does not always give us sufficient richness for separating out these situations. The \index{Malmquist productivity index|(} \index{MPI|see{Malmquist productivity index}} Malmquist Productivity Index was developed for decomposing this changing performance over time.
## A Two-Dimensional Example of MPI
Let's use a straightforward single input, single output example of the Malmquist Productivity Index to illustrate conceptually what we are doing with the Malmquist Productivity Index. This is then followed with a numerical version of that example. This example is inspired by Fare, Lindren, and Roos [@fare_productivity_1995].
We will then follow with calculations using various packages.
```{r displaymalm1, out.width="45%", fig.show='hold', fig.align='center', fig.cap="Malmquist Graphical Example"}
knitr::include_graphics("images/ch10_malm1.PNG")
```
The lower (orange) line corresponds to the \index{Efficiency frontier} efficiency frontier at a particular time period *t*. The higher (red) line is the efficiency frontier at the next time period, *t+1*. The blue circle with a *F* is a particular unit, *k* of interest at time period *t*, perhaps denoted as $k^t$. Similarly, in time period *t+1*, its performance has changed to the higher blue circle (*B*) or $k^{t+1}$.
Clearly the unit's performance has changed significantly. Our goal with the Malmquist Productivity Index is to determine how much of the performance change can be attributed to improved operational performance (perhaps better managerial practices?) and how much can be attributed to an improved operating environment (the rising tide raising all boats.)
We will use an output-oriented perspective on this analysis but an input-oriented approach can be followed in the same manner.
The technical efficiency change can be given as the ratio of the efficiency in the newest period to the previous period. Recall that for a particular unit, say, *k*, the output-oriented efficiency score could be denoted as $\phi_k$ but we need to incorporate time. We could extend this to be $\phi_{k,t_1}^{t_2}$ to reflect unit *k*'s efficiency using data from time period $t_1$ against a frontier from time $t_2$. When $t_1=t_2$ we would have the regular efficiency at time $t_1$.
The Malmquist literature uses the term distance function as a more generalized term rather than efficiency score. Let's modify this notation to be $D_k^t(x^{t+1},y^{t+1})$ to reflect the distance of DMU *k* from the efficiency frontier of time *t* using *k*'s input and output data from time *t+1*.
We can now calculate the changing distance from the frontier from the respective distance functions.
$$
TE=\frac{D_k^{t+1}(x^{t+1},y^{t+1})}{D_k^t(x^{t},y^{t})}
$$ Using the figure earlier, we can substitute locations on the vertical axis for these respective distances.
$$
TE=\frac{D_k^{t+1}(x^{t+1},y^{t+1})}{D_k^t(x^{t},y^{t})}
=\frac{OB/OA}{OF/OE}
=\frac{18/20}{6/10}=\frac{180}{120}=1.50
$$
This can be interpreted as the unit, *k*, making a 50% improvement in technical (or operational) efficiency separate from the general overall improving operating conditions.
Next, we want to calculate the technology progress that has occurred. This is done by comparing the relative closeness of unit at a particular time to those in both time periods. More formally, we can use the following.
$$
\begin{split}
\begin{aligned}
P=\sqrt {\frac{D_k^{t}(x^{t+1},y^{t+1})}{D_k^{t+1}(x^{t+1},y^{t+1})}
\cdot\frac{D_k^{t}(x^{t},y^{t})}{D_k^{t+1}(x^{t},y^{t})}}
\end{aligned}
\end{split}
$$
The first ratio is the relative closeness of unit *k* in time period *t+1* to frontiers in time periods *t* and *t+1* respectively. The second ratio is the relative distance of unit *k* in time period *t* to the frontiers. Again, let's use the graphical example to illustrate this.
$$
\begin{split}
\begin{aligned}
P & = \sqrt {\frac{D_k^{t}(x^{t+1},y^{t+1})}{D_k^{t+1}(x^{t+1},y^{t+1})}
\cdot\frac{D_k^{t}(x^{t},y^{t})}{D_k^{t+1}(x^{t},y^{t})}} \\
& =\sqrt {\frac{OB/OD}{OB/OA}
\cdot \frac {OF/OE}{OF/OC}}
=\sqrt {\frac{OA}{OD}
\cdot\frac{OC}{OE}} \\
& =\sqrt {\frac{20}{14}
\cdot \frac {15}{10} }
=\sqrt {\frac {15}{7}} = {1.46}
\end{aligned}
\end{split}
$$
This indicates that about 46% of the unit's improved performance is due to the general progress of technology or improving operating conditions (rising tide effect).
The Malmquist Productivity Index would be the geometric means of the ratios for the distance from each frontier to the respective points.
$$
\begin{split}
\begin{aligned}
MPI &= \sqrt {\frac{D_k^{t}(x^{t+1},y^{t+1})}{D_k^{t}(x^{t},y^{t})}
\cdot\frac{D_k^{t+1}(x^{t+1},y^{t+1})}{D_k^{t+1}(x^{t},y^{t})}} \\
&=\sqrt {\frac{OB/OD}{OF/OE}
\cdot\frac{OB/OA}{OF/OC}}
=\sqrt {\frac{18/14}{6/10}
\cdot\frac{18/20}{6/15}}
\approx {2.196}
\end{aligned}
\end{split}
$$
Through some algebraic manipulation, *MPI* can also then be seen to be the product of the two terms, *TC* and *P*.
$$
\begin{split}
\begin{aligned}
MPI & =TC\cdot P \\
& = 1.5\cdot 1.46\approx{2.19}
\end{aligned}
\end{split}
$$
```{r displaymalm4, out.width="45%", fig.show='hold', fig.align='center', fig.cap="Detailed Graphical Example of Malmquist"}
knitr::include_graphics("images/ch10_malm4.PNG")
```
Let's now take a look at conducting the analysis using the \index{deaR package} `deaR` package.
First, we'll create a dataset.
```{r creating_graphical_data}
malm_gr_data<-tibble(Year=c(1,1,1,2,2,2),
DMU=c(1,2,3,1,2,3),
X=c(10, 28,15.2, 7, 29, 24.2),
Y=c(7.7, 16.1,6, 11.1,22.4,18 ))
```
```{r, display-gr-malm-data, echo=FALSE}
kbl (malm_gr_data, caption="Data for graphical example of the Malmquist Productivity Index with unit being studied as DMU 3.",
booktabs = T, digits = 4, align = 'c') |>
kable_styling(latex_options = c("HOLD_position"))
```
This table is a pretty simple and compact format but MPI requires more organization. Specifically, even a simple dataset requires us to specify inputs and outputs just like in other DEA applications but it also needs time periods and a way to match each DMU from period to period. The \index{deaR package} `deaR` package has a function, `make_malmquist` to convert a dataset into the appropriate format.
```{r}
#Process data into the format required by deaR
malm_gr_data_pro <-
deaR::make_malmquist(malm_gr_data,
percol = 1, # Use Col1 for period (year)
dmus = 2, # Col2 has DMU identifier
inputs = 3, # Col3 has the input
outputs = 4, # Col4 has the output
arrangement = "vertical")
```
Let's look at the data structure that the \index{deaR package} `deaR` package is using. It creates two sets of DEA datasets ready for processing in the context of the `deaR` package marked by time periods 1 and 2 (`` malm_gr_data_pro$`1` `` and `` malm_gr_data_pro$`2` ``). They look similar except for the period and the data.
```{r}
malm_gr_data_pro$`1`
```
The input (`` malm_gr_data_pro$`1`$input ``) and output data (`` malm_gr_data_pro$`1`$output ``) are transposed. The `deaR` package implements models that allow for inputs and outputs outside of the traditional DEA model which assumes that each DMU has control over and is accountable for their use of inputs to achieve higher levels of outputs. The `nc_`prefix refers to non-controllable inputs and outputs. Therefore, `` malm_gr_data_pro$`1`$nc_inputs=1 ``) would then indicate the second input is a non-controllable input. The `nd_` prefix refers to non-discretionary inputs and outputs. The `ud_` prefix refers to undesirable inputs or outputs.
```{r, eval=FALSE, echo=FALSE}
# Leaving this code chunk out
library (pander) # Use pander since kable does not support lists
pander(malm_gr_data_pro,
caption="Data structure used by deaR package for Malmquist")
#kbl (malm_gr_data_pro, booktabs=T,
# caption="Data structure used by deaR package for Malmquist")
```
```{r}
result_gr <- deaR::malmquist_index(malm_gr_data_pro,
orientation = "oo",
rts = "vrs",
type1 = "cont",
type2 = "fgnz")
mi_gr <- result_gr$mi
res_mi_gr <- t(rbind(result_gr$mi, result_gr$pech,
result_gr$tc, result_gr$sech ))
colnames(res_mi_gr) <- c("MPI", "PEch", "TC", "SEch")
kbl (res_mi_gr, booktabs = T, digits = 4, align = 'c',
caption="Malmquist Productivity Analysis results using deaR") |>
kable_styling(latex_options = c("HOLD_position"))
```
The results don't match what I expect for this example. I need to do a little more digging. I expect that it is due to one or more of the following factors:
- characterizing efficiency as 0 to 1 vs, 1 and higher
- Returns to scale (VRS vs. CRS)
- Particular variation of Malmquist and decomposition
- Labeling of terms (TC for technological change vs. pure technical change)
## MPI for Griffell and Lovell 1999
Let's revisit an example from the literature using the \index{deaR package} `deaR` package.
```{r MPI-Griffell-Lovell-1999}
data("Grifell_Lovell_1999")
kbl (Grifell_Lovell_1999, booktabs=T)
data_example <- make_malmquist(Grifell_Lovell_1999,
percol = 1,
dmus = 2,
inputs = 3,
outputs = 4,
arrangement = "vertical")
result_fgnz <- malmquist_index(data_example,
orientation = "oo",
rts = "vrs",
type1 = "cont",
type2 = "fgnz")
mi_fgnz <- result_fgnz$mi
res_mi_fgnz <- t(rbind(result_fgnz$mi, result_fgnz$pech,
result_fgnz$sech,
result_fgnz$tc ))
colnames(res_mi_fgnz) <- c("MPI", "PEch", "SEch", "TC")
kbl (res_mi_fgnz, booktabs = T, digits = 4, align = 'c') |>
kable_styling(latex_options = c("HOLD_position"))
# Using Benchmarking package
# MQ <- Benchmarking::malmquist(X=mx, Y=my, ID=mID, TIME=mt)
# Says "Error '==' only defined for equally-sized data frames
```
This example is drawn from Grifell and Lovell, 1999.
To Be Added
## Using MPI to Assess Industry Maturation
Let's revisit our baseball example from earlier.
```{r ch10_draw_bb_io_diagram}
XFigNames <- "PA (Plate Appearances)"
YFigNames <- c("BB (Base on Balls or Walks)",
"1B (Singles)",
"2B (Doubles)",
"3B (Triples)",
"HR (Home Runs)")
Figure<-DrawIOdiagram(XFigNames,YFigNames, '"\n\nBatter\n\n "')
tmp<-capture.output(rsvg_png(charToRaw(export_svg(Figure)),
'BaseballBatting.png'))
```
```{r}
batting<-Lahman::Batting
MinPA <- 400 # Minimum number of plate appearances
# Might cause issues for shortened seasons
# batting <- dplyr::filter(batting,PA>MinPA)
batting <- batting |>
select(playerID,yearID,teamID,lgID,AB,H,X2B,X3B,HR,BB) |>
mutate(PA=as.numeric(AB+BB)) |>
mutate(X1B=as.numeric(H-HR-X3B-X2B)) |>
transform(BB=as.numeric(BB)) |>
transform(X2B=as.numeric(X2B)) |>
transform(X3B=as.numeric(X3B)) |>
transform(HR=as.numeric(HR)) |>
filter(yearID>1900, PA>MinPA) |>
select(playerID,yearID,teamID,lgID,PA,BB,X1B,X2B,X3B,HR)
kbl (head(batting), booktabs = T, digits = 4,
caption="Results of Preparing Data using dplyr") |>
kable_styling(latex_options = c("HOLD_position"))
```
```{r preparing-BB-IO-1919-data}
inputs <- c("PA")
outputs <- c("BB", "X1B", "X2B", "X3B", "HR")
batting1919AL <- batting |> filter(yearID==1919, lgID=="AL")
x19 <- batting1919AL |> select(PA)
row.names(x19)<-batting1919AL[,1]
y19 <- batting1919AL |> select(BB, X1B, X2B, X3B, HR)
row.names(y19)<-batting1919AL[,1]
t19 <- batting1919AL |> select(yearID)
row.names(t19)<-batting1919AL[,1]
kbl (head (cbind(x19,y19)),
caption="Data for the 1919AL",
booktab= T, digits = 4, align='c') |>
kable_styling(latex_options = c("HOLD_position"))
#Data for DJL's Malmquist function
# batting1919_20AL <- batting |> filter((yearID>1918)
# & (yearID<1921), lgID=="AL")
```
```{r preparing-BB-IO-1920-data}
batting1920AL <- batting %>% filter(yearID==1920, lgID=="AL")
x20 <- batting1920AL %>% select( PA)
row.names(x20)<-batting1920AL[,1]
y20 <- batting1920AL %>% select(BB, X1B, X2B, X3B, HR)
row.names(y20)<-batting1920AL[,1]
t20 <- batting1920AL %>% select(yearID)
row.names(t20)<-batting1920AL[,1]
kbl (head (cbind(x20,y20)),
caption="Data for the 1920AL",
booktab= T, digits = 4, align='c') |>
kable_styling(latex_options = c("HOLD_position"))
#Data for DJL's Malmquist function
# batting1919_20AL <- batting %>% filter((yearID>1918)
# & (yearID<1921), lgID=="AL")
```
```{r}
# From Auro's help file for Malmquist function (MPI)
da_f <- data.frame(x= c(11, 29, 31, 61, 13, 27, 17, 61), # Inputs
y= c( 6, 8, 11, 16, 7, 9, 10, 16), # Outputs
d= c( 1, 2, 3, 4, 1, 2, 3, 4), # DMU
p= c( 1, 1, 1, 1, 2, 2, 2, 2)) # Period
mpi_r <- MultiplierDEA::MPI(Dataset = da_f, DMU_colName = "d",
IP_colNames = "x", OP_ColNames = "y",
Period_ColName = "p", Periods = c(1,2),
rts = "vrs", orientation = "input", scale = TRUE)
# Examine the MPI for DMUs
mpi_r$m.vrs
```
```{r}
# Now let's run it on our baseball data for 1919 and 1920.
da_bb19 <- cbind(x19, y19, rownames(x19), rep(1919,nrow(x19)))
colnames(da_bb19) <- c("PA", "BB", "X1B", "X2B", "X3B", "HR","Player", "Year")
da_bb20 <- cbind(x20, y20, rownames(x20), rep(1920,nrow(x20)))
colnames(da_bb20) <- c("PA", "BB", "X1B", "X2B", "X3B", "HR","Player", "Year")
da_bb19_20 <- rbind (da_bb19,da_bb20) # stack together the two years
kbl(head( da_bb19_20), booktabs=T, caption="Example of Batting Data") |>
kable_styling(latex_options = c("HOLD_position"))
mpi_bb <- MultiplierDEA::MPI(Dataset = da_bb19_20, DMU_colName = "Player",
IP_colNames = "PA",
OP_ColNames = c( "BB", "X1B", "X2B", "X3B", "HR"),
Period_ColName = "Year", Periods = c(1919, 1920),
rts = "crs", orientation = "output", scale = TRUE)
# Note that running MPI on this requires a little time.
# Essentially does DEA 4 times for each player
```
```{r}
res_bb <- cbind(as.numeric(mpi_bb$et1t1.crs),
as.numeric(mpi_bb$et1t2.crs),
as.numeric(mpi_bb$et2t2.crs),
as.numeric(mpi_bb$et2t1.crs),
as.numeric(mpi_bb$tec),
as.numeric(mpi_bb$tc),
as.numeric(mpi_bb$m.crs))
colnames(res_bb) <- c("E19-T19", "E19-T20",
"E20-T20", "E20-T19", "TEC", "TC", "MPI")
rownames(res_bb)<-mpi_bb$DMU
kbl(head(res_bb, 20), digits = 4, booktabs=T,
caption="Malmquist Productivity Index Results from MultiplierDEA Package")|>
kable_styling(latex_options = c("HOLD_position"))
```
\*\* I want to dig into the numbers in more detail but this is my first pass at interpretation!\*\*
Some observations, the `NA` indicates that a player does not have data in one of the two years. For example, `ainsmed01` is efficient in the year 1919 using his batting statistics from 1919 since the E19-T19 column is 1.0. His 1919 batting statistics are below the efficiency frontier in 1920 though since E20-T19 is less than 1.0 (0.9817). In other words, the efficiency frontier has expanded in 1920 such that his 1919 performance would be slightly inefficient. He does not have data from 1920 though and therefore the two columns that would use 1920 data are filled with `NA`.
Interesting that the `NA` seems to only cover players that played in 1919 and not 1920. I'll need to doublecheck in the full dataset as there should be a roughly equal number of `NA` values in both sides.
Let's shift our attention to talk about the best players by sorting by efficiency in 1919 and using the head function to display the top 20 batters.
```{r}
# Now let's grab just a few rows of specific players to discuss
#kbl(res_bb[c("cobbty01", "collied01", "ruthba01"),], digits=4, booktabs=T,
# caption="MPI results for Selected Batters")
kbl(head(res_bb
[order(res_bb[,3], decreasing=TRUE),],20),
digits=5, booktabs=T,
caption="Malmquist Productivity Index Results for Baseball Batting, 1919-1920") |>
kable_styling(latex_options = c("HOLD_position", "scale_down")) |>
footnote (general="Top 20 Efficient 1920 American League Batters")
```
Ty Cobb's values are interesting. His value for *E19-T19* is *1* and therefore he was efficient in 1919 using his 1919 batting statistics. He is no longer efficient in 1920 as indicated by *E20-T20* being *0.9165.* While his 1919 performance was efficient relative to 1919, it would not have been efficient against the frontier in 1920 (E19-T20 \< 1.0). The *TC* value indicates that his efficiency declined relative to their respective efficiency frontiers. The TEC value though is larger than 1 indicating that the efficiency frontier expanded relative to the regions of the frontier that Ty Cobb was producing. If we had been looking at managerial performance of DMUs instead of batting performance of hitters, we would say that Ty Cobb was not as efficient in using his input to produce outputs relative to the changing efficiency frontier. The efficiency frontier expanded and Ty Cobb fell off the efficiency frontier.
In contrast, Babe Ruth was efficient in both 1919 and 1920. His efficiency score in 1919 relative to the 1920 frontier was much larger than *1* indicating that he far exceeded the frontier in his region of the frontier of 1920. His 1920 performance also exceeded the 1919 frontier. His values indicate that the efficiency frontier is rising by *26%* in his area of the frontiers (*TC=1.26*).
### MPI with Ordinal Weight Restrictions
Now let's try examining this with weight restrictions. Recall that in chapter 8 we indicated that Home Runs should be at least as valuable as triples, triples are at least as valuable as doubles, etc. The `MultiplierDEA` package's MPI function does not include a parameter to pass weight restrictions. Fortunately, we can accommodate ordinal weight restrictions by making a simple data transformation on the outputs. Each output is converted into the sum of that output and "better" outputs. Home Runs are unchanged, but Triples are redefined as the number of Triples + Home Runs. Doubles are converted to be Doubles+Triples+Home Runs. Specifically, we have the following:
- Home Runs or longer: (Home Runs+Triples+Doubles+Singles)
- Triples or longer: (Triples)
- Doubles or longer: (Doubles+Triples+Home Runs)
- Singles or longer: (Singles+Doubles+Triples+Home Runs)
- Walks or longer: (Walks+Doubles+Triples+Home Runs)
Note that we don't specify a relationship between singles and walks (or base on balls) as they are similar but different. A single has more potential for advancing other runners while a walk typically has a higher pitch count which may help wear out the opposing pitcher.
We'll use the `m` prefix to represent the modified data and results. Let's use a new object, `my19` to reflect these modified (accumulated) outputs.
```{r}
my19 <- y19
my19 <- my19 |> dplyr::mutate(X3B=X3B+HR) |>
dplyr::mutate(X2B=X2B+X3B) |>
dplyr::mutate(X1B=X1B+X2B) |>
dplyr::mutate(BB=BB+X2B)
```
```{r}
kbl(head(cbind(y19, my19)), booktabs=T,
caption="Modified Outputs for Ordinal Weight Restrictions") |>
add_header_above( c(" " = 1, "Original Outputs" = 5,
"Modified Outputs" = 5)) |>
kable_styling(latex_options = c("HOLD_position")) |>
footnote (general="First six batters from 1919 American League")
```
We repeat the same process to create `my20`.
```{r, echo=FALSE}
my20 <- y20
my20 <- my20 |> dplyr::mutate(X3B=X3B+HR) |>
dplyr::mutate(X2B=X2B+X3B) |>
dplyr::mutate(X1B=X1B+X2B) |>
dplyr::mutate(BB=BB+X2B)
```
```{r}
mda_bb19 <- cbind(x19, my19, rownames(x19), rep(1919,nrow(x19)))
colnames(mda_bb19) <- c("PA", "BB", "X1B", "X2B", "X3B", "HR",
"Player", "Year")
mda_bb20 <- cbind(x20, my20, rownames(x20), rep(1920,nrow(x20)))
colnames(mda_bb20) <- c("PA", "BB", "X1B", "X2B", "X3B", "HR",
"Player", "Year")
mda_bb19_20 <- rbind (mda_bb19,mda_bb20) # stack together the two years
mmpi_bb <- MultiplierDEA::MPI(Dataset = mda_bb19_20, DMU_colName = "Player",
IP_colNames = "PA",
OP_ColNames = c( "BB", "X1B", "X2B", "X3B", "HR"),
Period_ColName = "Year", Periods = c(1919, 1920),
rts = "crs", orientation = "output", scale = TRUE)
```
```{r}
mres_bb <- cbind(as.numeric(mmpi_bb$et1t1.crs),
as.numeric(mmpi_bb$et1t2.crs),
as.numeric(mmpi_bb$et2t2.crs),
as.numeric(mmpi_bb$et2t1.crs),
as.numeric(mmpi_bb$tec),
as.numeric(mmpi_bb$tc),
as.numeric(mmpi_bb$m.crs))
colnames(mres_bb) <- c("E19-T19", "E19-T20",
"E20-T20", "E20-T19", "TEC", "TC", "MPI")
rownames(mres_bb)<-mmpi_bb$DMU
kbl(head(mres_bb [order(mres_bb[,1], decreasing=TRUE),],20),
digits=5, booktabs=T,
caption="Malmquist Productivity Index Results with Ordinal Weight Restrictions") |>
kable_styling(latex_options = c("HOLD_position", "scale_down")) |>
footnote (general="Top 20 Efficient 1919 American League Batters")
```
Note that in this case there are only four efficient batters in 1919 (Ty Cobb, Babe Ruth, George Sisler, and Bobby Veach). In 1920, only two batters are efficient (Babe Ruth and George Sisler)
**Results do not appear to match the weight restriction model results from chapter 8!**
- Take a look at Tris Speaker. His 1920 efficiency score with ordinal weight restrictions is 0.95140 but in chapter 8 with weight restrictions it is 0.9853.
- On the other hand, Joe Dugan gets the same score of 0.81610 in both analyses.
- The weight restriction results appear to be consistent for both Speaker and Dugan.
- The accumulation of outputs appears to be correct here so as to incorporate ordinal weight restrictions.
### Testing Weights from Accumulated (Ordinal) Approach
```{r}
res20test<-MultiplierDEA::DeaMultiplierModel(x20, my20, "crs", "output")
res20tab <- cbind (res20test$Efficiency,
res20test$vx, res20test$uy)
```
```{r}
kbl(t(format( res20tab["speaktr01",], digits=5)),
booktabs=T,
caption="Results from Multiplier DEA Using Ordinal (Accumulated) Weight Restrictions")|>
kable_styling(latex_options = c("HOLD_position"))
```
The results match those found along the way in the Malmquist analysis but do not agree with those from earlier. Let's unpack the weights given the structure of
```{r}
mres20uyBB <- res20tab[,3]
mres20uy1B <- res20tab[,4]
mres20uy2B <- res20tab[,5] +res20tab[,4] + res20tab[,3]
mres20uy3B <- res20tab[,6] + res20tab[,5] + res20tab[,4] + res20tab[,3]
mres20uyHR <- res20tab[,7] + res20tab[,6] + res20tab[,5] + res20tab[,4] + res20tab[,3]
mres20uy <- cbind(mres20uyBB, mres20uy1B, mres20uy2B, mres20uy3B, mres20uyHR)
mres20uy ["speaktr01",]
```
Perhaps there is an issue in the aggregation approach for single and walks.
\index{Malmquist productivity index|)}