---
title: Function operators
layout: default
---
# Function operators
In this chapter, you'll learn about function operators: functions that take one (or more) functions as input and return a function as output. Function operators are an FP technique related to functionals. But where functionals are used as substitutes for common loop structures, function operators are used as substitutes for the common uses of anonymous functions. Like functionals, they don't add functionality. However, they can make your code more readable and expressive, and they can help you write your code faster.
An example of a function operator (FO) is a simple function `chatty()`, which shows its input and output (in a naive way). It's useful because it provides a window into functionals: in this case, we use it to see the differences between how `lapply()` and `mclapply()` execute code. (We'll explore this theme in more detail below with the fully-featured `tee()` function.)
```{r, eval = FALSE}
library(parallel)
chatty <- function(f) {
  function(x) {
    res <- f(x)
    cat(format(x), " -> ", format(res, digits = 3), "\n", sep = "")
    res
  }
}
s <- c(0.4, 0.3, 0.2, 0.1)
x2 <- lapply(s, chatty(Sys.sleep))
#> 0.4 -> NULL
#> 0.3 -> NULL
#> 0.2 -> NULL
#> 0.1 -> NULL
x2 <- mclapply(s, chatty(Sys.sleep))
#> 0.3 -> NULL
#> 0.4 -> NULL
#> 0.1 -> NULL
#> 0.2 -> NULL
```
In the last chapter, we saw that most built-in functionals, like `Reduce`, `Filter` and `Map`, have very few arguments, so we used anonymous functions to modify how they worked. In this chapter, we'll replace standard anonymous functions with specialised equivalents that allow us to communicate our intent more clearly. For example, in the last chapter we used an anonymous function with `Map` to supply fixed arguments:
```{r, eval = FALSE}
Map(function(x, y) f(x, y, zs), xs, ys)
```
Later in this chapter, we'll learn about partial application using the `partial()` function. Partial application encapsulates the use of an anonymous function to supply default arguments, and allows us to write succinct code:
```{r, eval = FALSE}
Map(partial(f, zs = zs), xs, ys)
```
This is an important use of FOs: by transforming the input function, you eliminate parameters from a functional. In fact, as long as the inputs and outputs of the function remain the same, this approach allows your functionals to be more extensible, often in ways you haven't thought of.
In this chapter, we'll explore four types of function operators (FOs):
* __Behavioural FOs__. While leaving the function otherwise unchanged, this type can do things like automatically log when the function is run, ensure that a function is run only once, and delay the operation of a function.
* __Output FOs__. This type can return a different value when a function throws an error, or negate the result of a logical predicate.
* __Input FOs__. This type can modify a function's inputs: it can partially evaluate a function, convert a function that takes multiple arguments to one that takes a list, or automatically vectorise a function.
* __Combining FOs__. This type can combine the results of predicate functions with boolean operators, or compose multiple function calls.
For each type, we'll show you some useful FOs, and how you can use them as another way of describing a function's tasks: as combinations of multiple functions instead of combinations of arguments. The goal is not to provide an exhaustive list of every possible FO, but to show a selection that demonstrates how well they work with functionals and other FOs. For your own work, you'll need to think about and experiment with how function operators can help you solve recurring problems.
The examples in this chapter come from five years of creating function operators in different R packages (particularly plyr), and from reading about useful operators in other languages.
### In other languages
Function operators are used extensively in FP languages like Haskell, and commonly in Lisp, Scheme and Clojure. They are also an important part of modern JavaScript programming, for example in the [underscore.js](http://underscorejs.org/) library. They are particularly common in CoffeeScript because its syntax for anonymous functions is so concise. In stack-based languages like Forth and Factor, function operators are almost exclusively used because it's rare to refer to variables by name. Python's decorators are just function operators by a [different name](http://stackoverflow.com/questions/739654/). In Java, they are very rare because it's difficult to manipulate functions (although possible if you wrap them up in strategy-type objects). They are also rare in C++ because, while it's possible to create objects that work like functions ("functors") by overloading the `()` operator, modifying these objects with other functions is not a common programming technique. That said, C++11 includes partial application (`std::bind`) as part of the standard library.
## Behavioural FOs
Behavioural FOs leave the inputs and outputs of a function unchanged, but add some extra behaviour. In this section, we'll look at functions which implement four possible behaviours:
* log to disk every time a function is run
* add a delay to avoid swamping a server with requests
* print to console every n invocations to check on a long-running process
* cache previous computations to improve performance
To motivate these use cases, imagine we want to download a long vector of urls with `download.file()`. That's pretty simple with `lapply()`:
```{r, eval = FALSE}
lapply(urls, download.file, quiet = TRUE)
```
(This example ignores the fact that `download.file` also needs a file name, so pretend, for the purposes of exposition, it has a useful default.)
For a variety of reasons, we might want to add some behaviours to this function. Because the list is long, we might want to print a `.` every ten urls so we know the function's still working. To avoid hammering the server, we might want to add a small delay between each request. Implementing these behaviours in a for loop is rather complicated (for example, we can no longer use `lapply()` because we need an external counter):
```{r, eval = FALSE}
i <- 0
for (url in urls) {
  i <- i + 1
  # Print a dot every ten urls
  if (i %% 10 == 0) cat(".")
  # Pause for a second between requests
  Sys.sleep(1)
  download.file(url, quiet = TRUE)
}
```
Reading this code is quite hard because we are using low-level functions and because it's not obvious (without some thought) what the overall objective is. In the remainder of this chapter we'll create FOs that encapsulate each of the behaviours described above. Ultimately, this will allow us to write code like:
```{r, eval = FALSE}
lapply(urls, dot_every(10, delay_by(1, download.file)), quiet = TRUE)
```
### Useful behavioural FOs
Implementing `delay_by` is straightforward, and follows the same basic template that we'll see for the majority of FOs in this chapter:
```{r}
delay_by <- function(delay, f) {
  function(...) {
    Sys.sleep(delay)
    f(...)
  }
}
system.time(runif(100))
system.time(delay_by(1, runif)(100))
```
`dot_every` is a little bit more complicated because it needs to modify state in the parent environment using `<<-`. If it's not clear how this works, you might want to re-read the mutable state section in [Functional programming](#functional-programming).
```{r}
dot_every <- function(n, f) {
  i <- 1
  function(...) {
    if (i %% n == 0) cat(".")
    i <<- i + 1
    f(...)
  }
}
x <- lapply(1:100, runif)
x <- lapply(1:100, dot_every(10, runif))
```
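Because `i` lives in the enclosing environment, it keeps counting across calls. One way to check the counting logic is to capture what `cat()` prints during 100 calls and count the dots. With `n = 10` we would expect ten dots. This sketch uses `capture.output()`, with `identity()` standing in for a real function.

```{r}
# Same definition as above, repeated so this chunk runs on its own
dot_every <- function(n, f) {
  i <- 1
  function(...) {
    if (i %% n == 0) cat(".")
    i <<- i + 1
    f(...)
  }
}
# Capture what cat() prints instead of letting it reach the console
out <- capture.output(invisible(lapply(1:100, dot_every(10, identity))))
n_dots <- nchar(paste(out, collapse = ""))
n_dots
```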
Notice that I've made the function the last argument in each FO. This will make it easier to read when we compose multiple function operators. If the function were the first argument, then instead of:
```{r, eval = FALSE}
download <- dot_every(10, delay_by(1, download.file))
```
we'd have
```{r, eval = FALSE}
download <- dot_every(delay_by(download.file, 1), 10)
```
This is a little harder to follow because the argument of `dot_every()` is far away from its call. This is sometimes called the [Dagwood sandwich](http://en.wikipedia.org/wiki/Dagwood_sandwich) problem: you have too much filling (too many long arguments) between your slices of bread (parentheses). I've also tried to give my FOs descriptive names: delay by 1 (second), (print a) dot every 10 (invocations). The more clearly your code expresses your intent through the names of its functions, the easier it will be for others (and future you) to read and understand the code.
Two other tasks that you can solve with a behavioural FO are:
* Logging a time stamp and message to a file every time a function is run:
```{r}
log_to <- function(path, message, f) {
  stopifnot(file.exists(path))
  function(...) {
    # format() gives a readable time stamp; "\n" puts each entry on its own line
    cat(format(Sys.time()), ": ", message, "\n", sep = "", file = path,
      append = TRUE)
    f(...)
  }
}
```
* Ensuring that if the first input is `NULL` then the output is `NULL`. (The name is inspired by Haskell's `Maybe` monad, which plays a similar role: it makes it possible for any function to work with a `NULL` argument.)
```{r}
maybe <- function(f) {
  function(x, ...) {
    if (is.null(x)) return(NULL)
    f(x, ...)
  }
}
```
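This sketch shows `log_to()` in action by writing to a temporary file, so no real log is touched. It repeats the definition so the chunk runs on its own. The time stamp is formatted with `format()`, and each entry ends with a newline so that every call gets its own line. The message text and the choice of `sqrt()` are only for illustration.

```{r}
# log_to() with one formatted, newline-terminated entry per call
log_to <- function(path, message, f) {
  stopifnot(file.exists(path))
  function(...) {
    cat(format(Sys.time()), ": ", message, "\n", sep = "",
      file = path, append = TRUE)
    f(...)
  }
}
log_file <- tempfile(fileext = ".log")
invisible(file.create(log_file))  # log_to() requires an existing file
logged_sqrt <- log_to(log_file, "sqrt called", sqrt)
res <- vapply(c(1, 4, 9), logged_sqrt, numeric(1))
res
readLines(log_file)
```

The wrapped function returns the same values as `sqrt()`, and the log gains one line per call.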
### Memoisation
Another thing you might worry about when downloading multiple files is accidentally downloading the same file multiple times. You could avoid this by calling `unique` on the list of input URLs, or manually managing a data structure that mapped the URL to the result. An alternative approach is to use memoisation: a way of modifying a function to automatically cache its results.
```{r}
library(memoise)
```
```{r, cache = TRUE}
slow_function <- function(x) {
  Sys.sleep(1)
  10
}
system.time(slow_function())
system.time(slow_function())
fast_function <- memoise(slow_function)
system.time(fast_function())
system.time(fast_function())
```
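Here is a minimal sketch of what a memoiser does, for functions of a single argument. `memo1()` is a hypothetical helper and not the memoise package's implementation. It uses the `deparse()`d argument as a cache key and stores results in an environment. A counter shows that the underlying function only runs for inputs it hasn't seen before.

```{r}
# memo1() is a hypothetical helper, not part of the memoise package
memo1 <- function(f) {
  cache <- new.env(parent = emptyenv())  # maps keys to stored results
  function(x) {
    key <- paste(deparse(x), collapse = "")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(x), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}
calls <- 0
counted_sqrt <- function(x) {
  calls <<- calls + 1  # record every real evaluation
  sqrt(x)
}
msqrt <- memo1(counted_sqrt)
msqrt(16)
msqrt(16)  # served from the cache
msqrt(25)
calls
```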
Memoisation is an example of the classic tradeoff between memory and speed in computer science. A memoised function uses more memory because it stores all of the previous inputs and outputs, but it can run much faster when it is called again with an input it has already seen.
One realistic use case is computing the Fibonacci series. The Fibonacci series is defined recursively: the first two values are 1 and 1, then f(n) = f(n - 1) + f(n - 2). A naive version implemented in R would be very slow because, for example, `fib(10)` computes `fib(9)` and `fib(8)`, and `fib(9)` computes `fib(8)` and `fib(7)`, and so on. As a result, each value in the series gets computed many, many times. Memoising `fib()` makes the implementation much faster because each value is computed only once, and then saved.
```{r, cache = TRUE}
fib <- function(n) {
  if (n < 2) return(1)
  fib(n - 2) + fib(n - 1)
}
system.time(fib(23))
system.time(fib(24))
fib2 <- memoise(function(n) {
  if (n < 2) return(1)
  fib2(n - 2) + fib2(n - 1)
})
system.time(fib2(23))
system.time(fib2(24))
```
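To make "computed many, many times" concrete, this sketch adds a hypothetical call counter, `n_calls`, to the naive `fib()`. Every call increments it, including the base cases.

```{r}
n_calls <- 0
fib_counted <- function(n) {
  n_calls <<- n_calls + 1  # count every call, base cases included
  if (n < 2) return(1)
  fib_counted(n - 2) + fib_counted(n - 1)
}
fib_counted(10)
n_calls
```

The naive recursion makes 177 calls to compute `fib(10)`. A memoised version only needs to evaluate the function body once for each of the 11 distinct inputs from 0 to 10.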
It doesn't make sense to memoise all functions. The example below shows that a memoised random number generator is no longer random:
```{r}
runifm <- memoise(runif)
runifm(5)
runifm(5)
```
Once we understand `memoise()`, it's straightforward to apply to our problem:
```{r, eval = FALSE}
download <- dot_every(10, memoise(delay_by(1, download.file)))
```
This gives a function that we can easily use with `lapply()`. However, if something goes wrong with the loop inside `lapply()`, it can be difficult to tell what's going on. The next section will show how we can use FOs to pull back the curtain and look inside.
### Capturing function invocations
One challenge with functionals is that it can be hard to see what's going on inside. It's not easy to pry open their internals like it is with a for loop. However, we can use FOs to help us. The `tee` function, defined below, has three arguments, all functions: `f`, the original function; `on_input`, a function that's called with the inputs to `f`; and `on_output`, a function that's called with the output from `f`.
```{r}
ignore <- function(...) NULL
tee <- function(f, on_input = ignore, on_output = ignore) {
  function(...) {
    input <- if (nargs() == 1) c(...) else list(...)
    on_input(input)
    output <- f(...)
    on_output(output)
    output
  }
}
```
(The function is inspired by the Unix `tee` shell command, which splits a stream of output so that you can both display what's happening and save intermediate results to a file. The command is named after the T-connector in plumbing.)
We can use `tee` to look into how `uniroot` finds where `x` and `cos(x)` intersect:
```{r, echo = FALSE}
old <- options(digits = 6, scipen = 9)
```
```{r}
g <- function(x) cos(x) - x
zero <- uniroot(g, c(-5, 5))
# The location where the function is evaluated
zero <- uniroot(tee(g, on_input = print), c(-5, 5))
# The value of the function
zero <- uniroot(tee(g, on_output = print), c(-5, 5))
```
```{r, echo = FALSE}
options(old)
```
While using `print()` allows us to see what's happening as the function runs, it doesn't give us a way to work with the values. To do that, we could capture the sequence of calls by creating a function, `remember()`, that records the arguments of every call and retrieves them when coerced into a list. (The small amount of S3 magic that makes this simple is explained in [S3](#s3).)
```{r}
remember <- function() {
  memory <- list()
  f <- function(...) {
    # This is inefficient!
    memory <<- append(memory, list(...))
    invisible()
  }
  structure(f, class = "remember")
}
as.list.remember <- function(x, ...) {
  environment(x)$memory
}
print.remember <- function(x, ...) {
  cat("Remembering...\n")
  str(as.list(x))
}
```
Now we can see exactly how `uniroot()` zeroes in on the final answer:
```{r, uniroot-explore}
locs <- remember()
vals <- remember()
zero <- uniroot(tee(g, locs, vals), c(-5, 5))
# FIXME: plain as.list() should be enough, but the knitr environment
# seems to prevent S3 from finding the right method
x <- sapply(as.list.remember(locs), "[[", 1)
error <- sapply(as.list.remember(vals), "[[", 1)
plot(x, type = "b"); abline(h = 0.739, col = "grey50")
plot(error, type = "b"); abline(h = 0, col = "grey50")
```
### Exercises
* What does the following function do? What would be a good name for it?
```{r}
f <- function(g) {
  result <- NULL
  function(...) {
    if (is.null(result)) {
      result <<- g(...)
    }
    result
  }
}
runif2 <- f(runif)
runif2(5)
runif2(5)
```
* Modify `delay_by()` so that instead of delaying by a fixed amount of time, it ensures that a certain amount of time has elapsed since the function was last called. That is, if you called `g <- delay_by(1, f); g(); Sys.sleep(2); g()` there shouldn't be an extra delay.
* Write `wait_until()` which delays execution until a specific time. Or write `run_after()` which only runs a function after a specified time, returning `NULL` otherwise.
* There are three places we could have added a memoise call: why did we choose the one we did?
```{r, eval = FALSE}
download <- memoise(dot_every(10, delay_by(1, download.file)))
download <- dot_every(10, memoise(delay_by(1, download.file)))
download <- dot_every(10, delay_by(1, memoise(download.file)))
```
* Why is the `remember()` function inefficient? How could you implement it in a more efficient way?
## Output FOs
The next step up in complexity is to modify the output of a function. This could be quite simple, or it could fundamentally change the operation of the function by returning something completely different to its usual output. In this section you'll learn about two simple modifications, `Negate()` and `failwith()`, and two fundamental modifications, `capture_it()` and `time_it()`.
### Minor modifications
`base::Negate` and `plyr::failwith` offer two minor, but useful, modifications of a function that are particularly handy in conjunction with functionals.
`Negate()` takes a function that returns a logical vector (a predicate function), and returns the negation of that function. This can be a useful shortcut when the function you have returns the opposite of what you need. The essence of `Negate()` is very simple:
```{r}
Negate <- function(f) {
  function(...) !f(...)
}
(Negate(is.null))(NULL)
```
I often use this idea to make a function, `compact()`, that removes all `NULL` elements from a list:
```{r}
compact <- function(x) Filter(Negate(is.null), x)
```
`plyr::failwith()` turns a function that throws an error into a function that returns a default value when there's an error. Again, the essence of `failwith()` is simple: it's just a wrapper around `try()`, the function that captures errors and allows execution to continue. (If you haven't seen `try()` before, it's discussed in more detail in [exceptions and debugging](#ignore-errors-with-try).)
```{r, error = TRUE}
failwith <- function(default = NULL, f, quiet = FALSE) {
  function(...) {
    out <- default
    try(out <- f(...), silent = quiet)
    out
  }
}
log("a")
failwith(NA, log)("a")
failwith(NA, log, quiet = TRUE)("a")
```
`failwith()` is very useful in conjunction with functionals: instead of the failure propagating and terminating the higher-level loop, you can complete the iteration and then find out what went wrong. For example, imagine you're fitting a set of generalised linear models (GLMs) to a list of data frames. While GLMs can sometimes fail because of optimisation problems, you'd still want to be able to try to fit all the models, and later look back at those that failed:
```{r, eval = FALSE}
# If any model fails, all models fail to fit.
# glm()'s second argument is `family`, so it is named here; each data
# frame is then matched by position to `data`.
models <- lapply(datasets, glm, formula = y ~ x1 + x2 * x3,
  family = gaussian)
# If a model fails, it will get a NULL value
models <- lapply(datasets, failwith(NULL, glm),
  formula = y ~ x1 + x2 * x3, family = gaussian)
# Remove failed models (NULLs) with compact()
ok_models <- compact(models)
# Use a logical index to extract the datasets corresponding to failed models
failed_data <- datasets[vapply(models, is.null, logical(1))]
```
I think this is a great example of the power of combining functionals and function operators: it makes it easy to succinctly express what you need to do to solve a common data analysis problem.
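The same pattern can be tried on a small scale without fitting any models. This sketch repeats `failwith()` and `compact()` from above so that it runs on its own. It applies `log()` to a list in which one element, the string `"a"`, is not numeric.

```{r}
failwith <- function(default = NULL, f, quiet = FALSE) {
  function(...) {
    out <- default
    try(out <- f(...), silent = quiet)
    out
  }
}
compact <- function(x) Filter(Negate(is.null), x)
inputs <- list(1, "a", 100)
results <- lapply(inputs, failwith(NULL, log, quiet = TRUE))
# Successful results, with the NULL placeholder dropped
ok <- compact(results)
# Inputs whose call failed
failed <- inputs[vapply(results, is.null, logical(1))]
failed
```

The loop completes even though one call fails. The `NULL` placeholder then tells us which input caused the problem.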
### Changing what a function does
Other output function operators can have a more profound effect on the operation of the function. Instead of returning the original return value, we can return some other effect of the function evaluation. Here are two examples:
* Return text that the function `print()`ed:
```{r}
capture_it <- function(f) {
  function(...) {
    capture.output(f(...))
  }
}
str_out <- capture_it(str)
str(1:10)
str_out(1:10)
```
* Return how long a function took to run:
```{r}
time_it <- function(f) {
  function(...) {
    system.time(f(...))
  }
}
```
`time_it()` allows us to rewrite some of the code from the functionals chapter:
```{r}
compute_mean <- list(
  base = function(x) mean(x),
  sum = function(x) sum(x) / length(x)
)
x <- runif(1e6)
# Instead of using an anonymous function to time execution
lapply(compute_mean, function(f) system.time(f(x)))
# We can compose function operators
call_fun <- function(f, ...) f(...)
lapply(compute_mean, time_it(call_fun), x)
```
In this example, there's not a huge benefit to using function operators, because the composition is simple and we're applying the same operator to each function. Generally, using function operators is most effective when you are using multiple operators or if the gap between creating them and using them is large.
### Exercises
* Create a `negative` function that flips the sign of the output of the function to which it's applied.
* The `evaluate` package makes it easy to capture all the outputs (results, text, messages, warnings, errors and plots) from an expression. Create a function like `capture_it()` that also captures the warnings and errors generated by a function.
* Create a FO that tracks files created or deleted in the working directory (Hint: use `setdiff()` and `dir()`). What other global effects of functions might you want to track?
* Modify the final example to use `fapply()` from the [looping patterns](#looping-patterns) chapter instead of `lapply()`.
## Input FOs
The next step up in complexity is to modify the inputs of a function. Again, you can modify how a function works in a minor way (e.g., setting default argument values), or in a major way (e.g. converting inputs from scalars to vectors, or vectors to matrices).
### Prefilling function arguments: partial function application
A common use of anonymous functions is to make a variant of a function that has certain arguments "filled in" already. This is called "partial function application", and is implemented by `pryr::partial`. (Once you have read the computing on the language chapter, I encourage you to read the source code for `partial` and figure out how it works - it's only 5 lines of code!)
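For intuition, here is a minimal sketch of partial application in base R. The hypothetical `partial_eager()` is not how `pryr::partial()` works. It evaluates the fixed arguments immediately and stores them in a list, then uses `do.call()` to combine them with the arguments supplied later.

```{r}
# partial_eager() is a hypothetical helper, not pryr::partial()
partial_eager <- function(f, ...) {
  fixed <- list(...)  # fixed arguments are evaluated right away
  function(...) {
    # Fixed arguments first, then whatever the caller supplies
    do.call(f, c(fixed, list(...)))
  }
}
dash_paste <- partial_eager(paste, sep = "-")
dash_paste("a", "b", "c")
```

Because the fixed arguments are evaluated when `partial_eager()` is called, this design avoids questions about lazy evaluation. The cost is that those arguments are always evaluated, even if `f` never uses them.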
`partial()` allows us to replace code like
```{r, eval = FALSE}
f <- function(a) g(a, b = 1)
compact <- function(x) Filter(Negate(is.null), x)
Map(function(x, y) f(x, y, zs), xs, ys)
```
with
```{r, eval = FALSE}
f <- partial(g, b = 1)
compact <- partial(Filter, Negate(is.null))
Map(partial(f, zs = zs), xs, ys)
```
We can use this idea to simplify the code used when working with lists of functions. Instead of:
```{r}
funs2 <- list(
  sum = function(x, ...) sum(x, ..., na.rm = TRUE),
  mean = function(x, ...) mean(x, ..., na.rm = TRUE),
  median = function(x, ...) median(x, ..., na.rm = TRUE)
)
```
We can write:
```{r}
library(pryr)
funs2 <- list(
  sum = partial(sum, na.rm = TRUE),
  mean = partial(mean, na.rm = TRUE),
  median = partial(median, na.rm = TRUE)
)
```
But if you look closely, you'll notice we're just applying a function to every element in a list. Since that's the job of `lapply`, we can further reduce the code:
```{r}
funs <- c(sum = sum, mean = mean, median = median)
funs2 <- lapply(funs, partial, na.rm = TRUE)
```
Next, let's think about a similar, but subtly different case. Say we have a numeric vector and we want to generate a list of means that are variously trimmed. Writing `lapply(trims, partial, mean)` won't work, because `lapply()` passes each trim as the first argument of `partial()`, where we want the `mean` function. Since matching by name takes precedence over positional matching, we could instead try supplying `mean` by name. But that won't work either, because each trim then becomes the first argument of `mean` (its `x`) rather than `trim`:
```{r, error = TRUE}
(trims <- seq(0, 0.9, length = 5))
funs3 <- lapply(trims, partial, `_f` = mean)
sapply(funs3, call_fun, c(1:100, (1:50) * 100))
```
Instead we could use an anonymous function:
```{r}
funs4 <- lapply(trims, function(t) partial(mean, trim = t))
funs4[[1]]
sapply(funs4, call_fun, c(1:100, (1:50) * 100))
```
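But that too won't work because each function gets a promise to evaluate `t`, and that promise isn't evaluated until all of the functions are run, when `t = 0.9`. (This describes R versions before 3.2.0. Since R 3.2.0, `lapply()` and the other apply functions force the arguments they pass on, so `funs4` also gives the expected results there. Forcing explicitly is still a good habit.) To make it work reliably, you need to manually force the evaluation of `t`: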
```{r}
funs5 <- lapply(trims, function(t) {
  force(t)
  partial(mean, trim = t)
})
funs5[[1]]
sapply(funs5, call_fun, c(1:100, (1:50) * 100))
```
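This pitfall doesn't depend on `partial()` or `lapply()`. In the sketch below, closures from a hypothetical `make_adder()` are created in a `for` loop. Each `n` is a promise for the loop variable `i`, and it isn't evaluated until the closure is first called. By then the loop has finished.

```{r}
make_adder <- function(n) function(x) x + n
adders <- list()
for (i in 1:3) adders[[i]] <- make_adder(i)
adders[[1]](10)  # n was forced only now, when i is already 3

make_adder_forced <- function(n) {
  force(n)  # evaluate the promise while i still has the intended value
  function(x) x + n
}
adders_forced <- list()
for (i in 1:3) adders_forced[[i]] <- make_adder_forced(i)
adders_forced[[1]](10)
```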
When writing functionals, you could instead expect your users to know about `partial()`, and omit the `...` argument. For example, instead of implementing `lapply()` like:
```{r}
lapply2 <- function(x, f, ...) {
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]], ...)
  }
  out
}
unlist(lapply2(1:5, log, base = 10))
```
we could implement it as:
```{r}
lapply3 <- function(x, f) {
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]])
  }
  out
}
unlist(lapply3(1:5, partial(log, base = 10)))
```
Using partial function application is a straightforward task in many functional programming languages. But it's not entirely clear how it should interact with R's lazy evaluation rules. The approach `pryr::partial` takes is to create a function that is as similar as possible to the analogous anonymous function you'd create by hand. Peter Meilstrup takes a different approach in his [ptools package](https://github.com/crowding/ptools/). If you're interested in the topic, you might want to read about the binary operators he created: `%()%`, `%>>%` and `%<<%`.
### Changing input types
Instead of a minor change to the function's inputs, it's also possible to make a major change like making a function work with fundamentally different types of data. There are a few existing functions that work along these lines:
* `base::Vectorize` converts a scalar function to a vector function. `Vectorize` takes a non-vectorised function and vectorises it with respect to the arguments specified in the `vectorize.args` argument. This doesn't give you any magical performance improvements, but it's useful if you want a quick and dirty way of making a vectorised function.
A mildly useful extension of `sample` would be to vectorise it with respect to `size`. Doing so would allow you to generate multiple samples in one call.
```{r}
sample2 <- Vectorize(sample, "size", SIMPLIFY = FALSE)
sample2(1:5, c(1, 1, 3))
sample2(1:5, 5:3)
```
In this example we have used `SIMPLIFY = FALSE` to ensure that our newly vectorised function always returns a list. This is usually what you want.
* `splat` converts a function that takes multiple arguments to one that takes a single list of arguments.
```{r}
splat <- function(f) {
  function(args) {
    do.call(f, args)
  }
}
```
This is useful if you want to invoke a function with varying arguments:
```{r}
x <- c(NA, runif(100), 1000)
args <- list(
  list(x),
  list(x, na.rm = TRUE),
  list(x, na.rm = TRUE, trim = 0.1)
)
lapply(args, splat(mean))
```
* `plyr::colwise()` converts a vector function to one that works with data frames:
```{r, error = TRUE}
median(mtcars)
median(mtcars$mpg)
plyr::colwise(median)(mtcars)
```
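`Vectorize()` is handy for functions written with scalar-only constructs such as `if` and `&&`. In this sketch, the hypothetical `in_range()` only works on a single number. Vectorising it over `x` lets it handle a whole vector, while `lo` and `hi` are passed through unchanged to every call.

```{r}
# A hypothetical scalar-only function: `if` and `&&` need length-one inputs
in_range <- function(x, lo, hi) {
  if (x >= lo && x <= hi) "in" else "out"
}
in_range_v <- Vectorize(in_range, "x")
in_range_v(c(1, 5, 10), lo = 2, hi = 8)
```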
### Exercises
* Our previous `download()` function only downloads a single file. How can you use `partial()` and `lapply()` to create a function that downloads multiple files at once? What are the pros and cons of using `partial()` vs. writing a function by hand?
* Read the source code for `plyr::colwise()`. How does the code work? What are `colwise`'s three main tasks? How could you make `colwise` simpler by implementing each task as a function operator? (Hint: think about `partial`)
* Write FOs that convert a function to return a matrix instead of a data frame, or a data frame instead of a matrix. (If you already know [S3](#s3), make these methods of `as.data.frame` and `as.matrix`)
* You've seen five functions that modify a function to change its output from one form to another. What are they? Draw a table of the various combinations of types of outputs: what should go in the rows and what should go in the columns? What function operators might you want to write to fill in the missing cells? Come up with example use cases.
* Look at all the examples of using an anonymous function to partially apply a function in this and the previous chapter. Replace the anonymous function with `partial`. What do you think of the result? Is it easier or harder to read?
## Combining FOs
Besides just operating on single functions, function operators can take multiple functions as input. One simple example of this is `plyr::each()`. It takes a list of vectorised functions and combines them into a single function that, when called, applies each of them to its input and returns all of the results together:
```{r}
summaries <- plyr::each(mean, sd, median)
summaries(1:10)
```
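Here is a rough base-R sketch of the same idea. It assumes every function is passed by name and returns a single number. The hypothetical `each_sketch()` stores the functions in a list, labels them with their names, and returns a function that applies each one to its input. It is not plyr's implementation.

```{r}
# each_sketch() is a hypothetical helper, not plyr::each()
each_sketch <- function(...) {
  fs <- list(...)
  # Label each function with the name it was passed as
  names(fs) <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
  function(x, ...) vapply(fs, function(f) f(x, ...), numeric(1))
}
summaries2 <- each_sketch(mean, sd, median)
summaries2(1:10)
```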
Two more complicated examples are combining functions through composition, or through boolean algebra. These capabilities are the glue that allows us to join multiple functions together.
### Function composition
An important way of combining functions is through composition: `f(g(x))`. Composition takes a list of functions and applies them sequentially to the input. It's a replacement for the common pattern of using an anonymous function to chain multiple functions together to get the result you want:
```{r}
sapply(mtcars, function(x) length(unique(x)))
```
A simple version of compose looks like this:
```{r}
compose <- function(f, g) {
  function(...) f(g(...))
}
```
(`pryr::compose()` provides a more full-featured alternative that can accept multiple functions).
This allows us to write:
```{r}
sapply(mtcars, compose(length, unique))
```
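`compose()` only takes two functions, but composing pairs repeatedly handles any number of them. This sketch uses `Reduce()` to fold a list of functions with a two-function compose. The names `compose2()` and `compose_all()` are hypothetical. As with `compose()`, the rightmost function is applied first.

```{r}
compose2 <- function(f, g) {
  force(f)  # capture the functions now rather than lazily
  force(g)
  function(...) f(g(...))
}
compose_all <- function(...) Reduce(compose2, list(...))
n_unique <- compose_all(length, unique)
sapply(mtcars, n_unique)
# sum first, then abs, then sqrt
compose_all(sqrt, abs, sum)(c(-10, -6))
```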
Mathematically, function composition is often denoted with the infix operator ∘, as in (f ∘ g)(x). Haskell, a popular functional programming language, uses `.` to the same end. In R, we can create our own infix composition function:
```{r}
"%.%" <- compose
sapply(mtcars, length %.% unique)
sqrt(1 + 8)
compose(sqrt, `+`)(1, 8)
(sqrt %.% `+`)(1, 8)
```
`compose()` also allows for a very succinct implementation of `Negate`, which is just a partially applied version of `compose()`:
```{r}
Negate <- partial(compose, `!`)
```
We could also implement the (population) standard deviation as a composition of simpler functions:
```{r}
square <- function(x) x^2
deviation <- function(x) x - mean(x)
sd <- sqrt %.% mean %.% square %.% deviation
sd(1:10)
```
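The composed `sd()` averages the squared deviations, so it divides by n and gives the population standard deviation. `stats::sd()` divides by n - 1 instead. This sketch restates the pieces as an ordinary nested call (`pop_sd()` is a hypothetical name) and checks that rescaling by sqrt(n / (n - 1)) recovers `stats::sd()`.

```{r}
square <- function(x) x^2
deviation <- function(x) x - mean(x)
# Same computation as the composed version, written as a nested call
pop_sd <- function(x) sqrt(mean(square(deviation(x))))
v <- 1:10
n <- length(v)
pop_sd(v)
all.equal(pop_sd(v) * sqrt(n / (n - 1)), stats::sd(v))
```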
This type of programming is called tacit or point-free programming. (The term point-free comes from the use of "point" to refer to values in topology; this style is also derogatorily known as pointless.) In this style of programming, you don't explicitly refer to variables. Instead, you focus on the high-level composition of functions rather than the low-level flow of data. The focus is on what's being done, not on the objects it's being done to. Also, since we're using only functions and not parameters, we use verbs and not nouns. This style is common in Haskell, and is the typical style in stack-based programming languages like Forth and Factor. It's not a terribly natural or elegant style in R, but it is a useful tool to have in your toolbox.
`compose()` is particularly useful in conjunction with `partial()`, because `partial()` allows you to supply additional arguments to the functions being composed. One nice side effect of this style of programming is that it keeps the function's arguments near the function's name. This is important because, as the size of the chunk of code you have to hold in your head grows, code becomes harder to understand.
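For example, rounding column means to one decimal place needs an extra `digits` argument for `round()`. The sketch below doesn't rely on pryr: `partial_simple()` is a hypothetical, minimal stand-in for `partial()` that stores named arguments and passes them along with `do.call()`, and `compose()` is the two-argument version from above.

```r
compose <- function(f, g) {
  function(...) f(g(...))
}

# Hypothetical minimal partial application: store some (named) arguments now;
# call-time arguments come first, stored arguments are appended after them
partial_simple <- function(f, ...) {
  preset <- list(...)
  function(...) do.call(f, c(list(...), preset))
}

# round()'s digits argument sits right next to round() itself
round_mean <- compose(partial_simple(round, digits = 1), mean)
round_mean(c(1.26, 2.31, 3.05))  # 2.2
sapply(mtcars, round_mean)
```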
Below I take the example from the first section of the chapter and modify it to use the two styles of function composition described above. Both rewrites are longer than the original code, but they may be easier to understand because each function and its arguments are closer together. Note that we still have to read them from right to left (bottom to top): the first function applied is the last one written. We could define `compose()` to work in the opposite direction, but in the long run this is likely to lead to confusion, since we'd create a small part of the language that reads differently from every other part.
```{r, eval = FALSE}
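# Nested function calls
download <- dot_every(10, memoise(delay_by(1, download.file)))

# pryr::compose(): combine the operators, then apply them to download.file
download <- pryr::compose(
  partial(dot_every, 10),
  memoise,
  partial(delay_by, 1)
)(download.file)

# The %.% infix operator, again applied to download.file
download <- (partial(dot_every, 10) %.%
  memoise %.%
  partial(delay_by, 1))(download.file)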
```
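A tiny check of that right-to-left order, using two made-up helpers (`add_one()` and `times_two()`) so that the order of application is visible in the result:

```r
compose <- function(f, g) {
  function(...) f(g(...))
}
"%.%" <- compose

add_one <- function(x) x + 1
times_two <- function(x) x * 2

# The right-most function runs first
(add_one %.% times_two)(3)  # add_one(times_two(3)) = 7
(times_two %.% add_one)(3)  # times_two(add_one(3)) = 8
```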
### Logical predicates and boolean algebra
When I use `Filter()` and other functionals that work with logical predicates, I often find myself using anonymous functions to combine multiple conditions:
```{r, eval = FALSE}
Filter(function(x) is.character(x) || is.factor(x), iris)
```
As an alternative, we could define function operators that combine logical predicates:
```{r}
and <- function(f1, f2) {
  function(...) {
    f1(...) && f2(...)
  }
}
or <- function(f1, f2) {
  function(...) {
    f1(...) || f2(...)
  }
}
not <- function(f1) {
  function(...) {
    !f1(...)
  }
}
```
This would allow us to write:
```{r, eval = FALSE}
Filter(or(is.character, is.factor), iris)
```
More generally, these operators let us build predicates for arbitrarily complicated boolean expressions without writing anonymous functions.
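Because each operator returns an ordinary function, they nest. The sketch below repeats compact versions of `and()`, `or()` and `not()` so it runs on its own, and uses a small made-up data frame instead of `iris` so that every branch of the expression matters:

```r
and <- function(f1, f2) function(...) f1(...) && f2(...)
or  <- function(f1, f2) function(...) f1(...) || f2(...)
not <- function(f1) function(...) !f1(...)

df <- data.frame(
  id    = 1:3,                      # integer
  score = c(2.5, 3.1, 4.0),         # double
  group = factor(c("a", "b", "a")), # factor
  label = c("x", "y", "z"),         # character
  stringsAsFactors = FALSE
)

# Keep factor columns, plus numeric columns that aren't integers
keep <- or(is.factor, and(is.numeric, not(is.integer)))
names(Filter(keep, df))  # "score" "group"
```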
### Exercises
* Implement your own version of `compose` using `Reduce` and `%.%`. For bonus points, do it without calling `function`.
* Extend `and()` and `or()` to deal with any number of input functions. Can you do it with `Reduce()`? Can you keep them lazy (e.g. for `and()`, the function returns once it sees the first `FALSE`)?
* Implement the `xor()` binary operator. Implement it using the existing `xor()` function. Implement it as a combination of `and()` and `or()`. What are the advantages and disadvantages of each approach? Also think about what you'll call the resulting function, and how you might need to change the names of `and()`, `not()` and `or()` in order to keep them consistent.
* Above, we implemented boolean algebra for functions that return a logical vector. Implement elementary algebra (`plus()`, `minus()`, `multiply()`, `divide()`, `exponentiate()`, `log()`) for functions that return numeric vectors.
## The common pattern and a subtle bug
Most function operators we've seen follow a similar pattern:
```{r}
funop <- function(f, otherargs) {
  function(...) {
    # maybe do something
    res <- f(...)
    # maybe do something else
    res
  }
}
```
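However, there's a subtle problem with this implementation: `f` is lazily evaluated, so it isn't looked up until the returned function is first called. In versions of R before 3.2.0, this meant the pattern didn't work well with `lapply()`: if you gave `lapply()` a list of functions and a function operator to apply to each of them, every resulting function behaved like the last function in the list. (Since R 3.2.0, `lapply()` and the other apply functions force the arguments they pass on, so current versions of R give the expected answer here.)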
```{r}
wrap <- function(f) {
  function(...) f(...)
}
fs <- list(sum = sum, mean = mean, min = min)
gs <- lapply(fs, wrap)
gs$sum(1:10)
environment(gs$sum)$f
```
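A `for` loop is one way to see the lazy-evaluation trap directly, because nothing in it forces the argument. Each closure holds an unevaluated promise for `g`, and by the time any of them runs, `g` refers to the last function in the loop:

```r
wrap <- function(f) {
  function(...) f(...)
}
fs <- list(sum = sum, mean = mean, min = min)

gs <- list()
for (nm in names(fs)) {
  g <- fs[[nm]]
  gs[[nm]] <- wrap(g)  # f is a promise for `g`, not yet evaluated
}

# The promises are forced only now, after the loop, when g is min
gs$sum(1:10)   # 1, not 55
gs$mean(1:10)  # 1, not 5.5
```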
Another problem with this design is that we have to pass a function object rather than the name of a function, which is often more convenient. We can solve both problems by using `match.fun()`: it forces evaluation of `f`, and it finds the function object if given its name:
```{r}
wrap2 <- function(f) {
  f <- match.fun(f)
  function(...) f(...)
}
fs <- c(sum = "sum", mean = "mean", min = "min")
hs <- lapply(fs, wrap2)
hs$sum(1:10)
environment(hs$sum)$f
```
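If you only ever pass function objects, `force()` is a lighter-weight fix: it evaluates the promise for `f` as soon as the operator is called, so each closure captures the function it was given, even in a `for` loop. Unlike `match.fun()`, it won't look up a function from its name. A sketch (the name `wrap3` is made up):

```r
wrap3 <- function(f) {
  force(f)  # evaluate f now, not when the returned function is first called
  function(...) f(...)
}

fs <- list(sum = sum, mean = mean, min = min)
gs <- list()
for (nm in names(fs)) {
  g <- fs[[nm]]
  gs[[nm]] <- wrap3(g)
}
gs$sum(1:10)   # 55
gs$mean(1:10)  # 5.5
```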
### Exercises
* Why does the following code (from [Stack Overflow](http://stackoverflow.com/questions/8440675)) not do what you expect?
```{r}
a <- list(0, 1)
b <- list(0, 1)
# return a linear function with slope a and intercept b.
f <- function(a, b) function(x) a * x + b
# create a list of functions with different parameters.
fs <- Map(f, a, b)
fs[[1]](3)
```
How can you modify `f` so that it works correctly?