# S3 {#s3}
S3 is R's first and simplest OO system. S3 is informal and ad hoc, but it has a certain elegance in its minimalism: you can't take away any part of it and still have a useful OO system. For these reasons, S3 should be your default choice for OO programming: you should use it unless you have a compelling reason otherwise. S3 is the only OO system used in the base and stats packages, and it's the most commonly used system in CRAN packages. \index{S3} \index{objects!S3|see{S3}}
S3 is a very flexible system: it allows you to do a lot of things that are quite ill-advised. If you're coming from a strict environment like Java, this will seem pretty frightening (and it is!), but it does give R programmers a tremendous amount of freedom. While it's very difficult to prevent someone from doing something you don't want them to do, your users will never be held back because there is something you haven't implemented yet. Since S3 has few built-in constraints, the key to its successful use is applying the constraints yourself. This chapter will teach you the conventions you should (almost) always adhere to in order to use S3 safely.
We'll use the sloop package to fill in some missing pieces when it comes to S3.
```{r setup, message = FALSE}
# install.packages("sloop")
library(sloop)
```
## Basics {#s3-basics}
An S3 object is built on top of a base type with the "class" attribute set. The base type is typically a vector, although we'll see later that S3 objects can be built on other base types too. For example, take the factor: it is built on top of an integer vector, the value of its class attribute is "factor", and it stores information about the "levels" in another attribute.
```{r}
f <- factor("a")
typeof(f)
attributes(f)
```
An S3 object behaves differently from its underlying base type because of __generic functions__, or generics for short. A generic executes different code depending on the class of one of its arguments, almost always the first. You can see this difference with the most important generic function: `print()`.
```{r}
print(f)
print(unclass(f))
```
`unclass()` strips the class attribute from its input, so it is a useful tool for seeing what special behaviour an S3 class adds.
`str()` shows the internal structure of S3 objects. Be careful when using `str()`: some S3 classes provide a custom `str()` method which can hide the underlying details. For example, take the `POSIXlt` class, which is one of the two classes used to represent date-time data:
<!-- Jenny: When I really want to see what's going on, I use dput(). Is that a strategy worth mentioning? -->
```{r}
time <- strptime("2017-01-01", "%Y-%m-%d")
str(time)
str(unclass(time), list.len = 5)
```
A __generic__ and its __methods__ are functions that operate on classes. The role of a generic is to find the right method for the arguments that it is provided, a process called __method dispatch__. A method is a function that implements the generic behaviour for a specific class. In other words, the job of the generic is to find the right method; the job of the method is to do the work.
S3 methods are functions with a special naming scheme, `generic.class()`. For example, the Date method for the `mean()` generic is called `mean.Date()`, and the factor method for `print()` is called `print.factor()`. This is the reason that most modern style guides discourage the use of `.` in function names: it makes them look like S3 methods. For example, is `t.test()` the `t` method for `test` objects?
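A minimal sketch of the naming scheme in action. The `temperature` class and its constructor `new_temperature()` are hypothetical, not part of base R. When the method lives in the global environment, defining a function called `print.temperature()` is all it takes for `print()` to use it; there is no separate registration step.

```{r}
# Hypothetical class: a double vector tagged with the class "temperature"
new_temperature <- function(x) {
  stopifnot(is.double(x))
  structure(x, class = "temperature")
}

# Naming the function print.temperature() is enough for print() to find it
print.temperature <- function(x, ...) {
  cat(format(unclass(x)), "degrees C\n")
  invisible(x)
}

t1 <- new_temperature(21.5)
print(t1)          # uses print.temperature()
print(unclass(t1)) # class removed, so print.default() is used
```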
You can find some S3 methods (those in the base package and those that you've created) by typing their names. However, this will not work with most packages because S3 methods are not exported: they live only inside the package, and are not available from the global environment. Instead, you can use `getS3method()`, which will work regardless of where the method lives:
```{r}
# Only works because the method is in the base package
mean.Date
# Always works
getS3method("mean", "Date")
```
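`getS3method()` (from the utils package, which is attached by default) can also serve as an existence check. With `optional = TRUE`, it returns `NULL` instead of throwing an error when no method is found. In the short sketch below, `"no_such_class"` is a made-up class name:

```{r}
# Retrieve a method's definition, wherever it lives
mean_date <- utils::getS3method("mean", "Date")
is.function(mean_date)
# optional = TRUE returns NULL instead of an error when no method exists
is.null(utils::getS3method("mean", "no_such_class", optional = TRUE))
```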
<!-- Jenny: What is the specific goal above? To find out whether an S3 method exists? Or to see its definition? -->
### Exercises
1. The most important S3 objects in base R are factors, data frames,
and date/times (Dates, POSIXct, POSIXlt). You've already seen the
attributes and base type that factors are built on. What base types and
attributes are the others built on?
1. Describe the difference in behaviour in these two calls.
```{r}
set.seed(1014)
some_days <- as.Date("2017-01-31") + sample(10, 5)
mean(some_days)
mean(unclass(some_days))
```
1. Draw a Venn diagram illustrating the relationships between
functions, generics, and methods.
1. What does the `as.data.frame.data.frame()` method do? Why is
it confusing? How should you avoid this confusion in your own
code?
1. What does the following code return? What base type is it built on?
What attributes does it use?
```{r}
x <- ecdf(rpois(100, 10))
x
```
## Classes
S3 is a simple and ad hoc system, and has no formal definition of a class. To make an object an instance of a class, you simply take an existing object and set the __class attribute__. You can do that during creation with `structure()`, or after the fact with `class<-()`: \index{S3!classes} \index{classes!S3}
```{r}
# Create and assign class in one step
foo <- structure(list(), class = "foo")
# Create, then set class
foo <- list()
class(foo) <- "foo"
```
You can determine the class of any object using `class(x)`, and see if an object inherits from a specific class using `inherits(x, "classname")`. \index{attributes!class}
```{r}
class(foo)
inherits(foo, "foo")
```
The class name can be any string, but I recommend using only letters and `_`. Avoid `.`. Opinion is mixed on whether to use underscores (`my_class`) or CamelCase (`MyClass`) for multi-word class names. Pick one convention and stick with it.
It's possible to provide a vector of class names, which allows S3 to implement a basic style of inheritance. This reduces your workload by letting classes share code where possible. We'll come back to this idea in [inheritance].
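A small sketch, using made-up class names: `inherits()` matches any element of the class vector, and `which = TRUE` reports where each requested name appears in it (0 if it doesn't appear).

```{r}
# Hypothetical object whose class vector lists a subclass before its parent
obj <- structure(list(), class = c("child_class", "parent_class"))
class(obj)
inherits(obj, "parent_class")
# Position of each name in the class vector; 0 means "not present"
inherits(obj, c("parent_class", "other_class"), which = TRUE)
```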
S3 has no checks for correctness. This means you can change the class of existing objects:
```{r, error = TRUE}
# Create a linear model
mod <- lm(log(mpg) ~ log(disp), data = mtcars)
class(mod)
print(mod)
# Turn it into a data frame (?!)
class(mod) <- "data.frame"
# Unsurprisingly this doesn't work very well
print(mod)
```
If you've used other OO languages, this might make you feel queasy. But surprisingly, this flexibility causes few problems: while you _can_ change the class of an object, you never _should_. R doesn't protect you from yourself: you can easily shoot yourself in the foot. As long as you don't aim the gun at your foot and pull the trigger, you won't have a problem.
To avoid foot-bullet intersections when creating your own class, you should always provide:
* A __constructor__, `new_x()`, that efficiently creates new objects with the
correct structure.
For more complicated classes, you may also want to provide:
* A __validator__, `validate_x()`, that performs more expensive checks that the
object has correct values.
* A __helper__, `x()`, that provides a convenient and neatly parameterised way
for others to construct and validate (create) objects of this class.
### Constructors
S3 doesn't provide a formal definition of a class, so it has no built-in way to ensure that all objects of a given class have the same structure (i.e. same attributes with the same types). Instead, you should enforce a consistent structure yourself by using a __constructor__. A constructor is a function whose job it is to create objects of a given class, ensuring that they always have the same structure.
There are three rules that a constructor should follow. It should:
1. Be called `new_class_name()`.
1. Have one argument for the base object, and one for each attribute.
1. Check the types of the base object and each attribute.
Base R generally does not provide constructors (three exceptions are the internal `.difftime()`, `.POSIXct()`, and `.POSIXlt()`) so we'll demonstrate constructors by filling in some missing pieces in base. (If you want to use these constructors in your own code, you can use the versions exported by the S3 package, which complete a few details that we skip here in order to focus on the core issues.)
We'll start with one of the simplest S3 classes in base R: Date, which is just a double with a class attribute. The constructor rules lead to the slightly awkward name `new_Date()`, because the existing base class uses a capital letter. I recommend using lower case class names to avoid this problem.
```{r}
new_Date <- function(x) {
  stopifnot(is.double(x))
  structure(x, class = "Date")
}
new_Date(c(-1, 0, 1))
```
You can use the `new_s3_*()` helpers provided by the S3 package to make this even simpler. They are wrappers around `structure()` that require a `class` argument and check the base type of `x`.
```{r}
new_Date <- function(x) {
  S3::new_s3_dbl(x, class = "Date")
}
```
The purpose of the constructor is to help the developer (you). That means you can keep it simple, and you don't need to optimise the error messages for user friendliness. If you expect others to create your objects, you should also create a friendly helper function, called `class_name()`, that we'll describe shortly.
A slightly more complicated example is `POSIXct`, which is used to represent date-times. It is again built on a double, but has an attribute that specifies the time zone, a length 1 character vector. R defaults to using the local time zone, which is represented by the empty string. To create the constructor, we need to make sure each attribute of the class gets an argument to the constructor. This gives us:
```{r}
new_POSIXct <- function(x, tzone = "") {
  stopifnot(is.double(x))
  stopifnot(is.character(tzone), length(tzone) == 1)
  structure(x,
    class = c("POSIXct", "POSIXt"),
    tzone = tzone
  )
}
new_POSIXct(1)
new_POSIXct(1, tzone = "UTC")
```
The constructor checks that `x` is a double and that `tzone` is a length 1 character vector. We use `stopifnot()` here because the constructor is a developer-focussed function, so its error messages don't need to be particularly friendly. Note that POSIXct uses a class _vector_; we'll come back to what that means in [inheritance].
Generally, the constructor should not check that the values are valid because such checks are often expensive. For example, our `new_POSIXct()` constructor does not check that `tzone` is a valid value, and we get a warning when the object is printed.
```{r}
x <- new_POSIXct(1, "Auckland NZ")
x
```
### Validators
More complicated classes will require more complicated checks for validity. Take factors, for example. The constructor function only checks that the structure is correct:
```{r}
new_factor <- function(x, levels) {
  stopifnot(is.integer(x))
  stopifnot(is.character(levels))
  structure(
    x,
    levels = levels,
    class = "factor"
  )
}
```
So it's possible to use this to create invalid factors:
```{r, error = TRUE}
new_factor(1:5, "a")
new_factor(0:1, "a")
```
Rather than encumbering the constructor with complicated checks, it's better to put them in a separate function. This is a good idea because it allows you to cheaply create new objects when you know that the values are correct, and to re-use the checks in other places.
```{r, error = TRUE}
validate_factor <- function(x) {
  values <- unclass(x)
  levels <- attr(x, "levels")
  if (!all(!is.na(values) & values > 0)) {
    stop(
      "All `x` values must be non-missing and greater than zero",
      call. = FALSE
    )
  }
  if (length(levels) < max(values)) {
    stop(
      "There must be at least as many `levels` as possible values in `x`",
      call. = FALSE
    )
  }
  x
}
validate_factor(new_factor(1:5, "a"))
validate_factor(new_factor(0:1, "a"))
```
This function is called primarily for its side-effects (throwing an error if the object is invalid), so you'd expect it to invisibly return its primary input. However, unlike most functions called for their side effects, it's useful for validators to return visibly, as we'll see next.
### Helpers
If you want others to construct objects from your class, you should also provide a helper function that makes their life as easy as possible. This should have the same name as the class, and should be parameterised in a convenient way. `factor()` is a good example of this as well: you want to automatically derive the internal representation from a vector. The simplest possible implementation looks something like this:
```{r}
factor <- function(x, levels = unique(x)) {
  ind <- match(x, levels)
  validate_factor(new_factor(ind, levels))
}
factor(c("a", "a", "b"))
```
The validator prevents the construction of invalid objects, but for a real helper you'd spend more time creating user friendly error messages.
```{r, error = TRUE}
factor(c("a", "a", "b"), levels = "a")
```
In base R, neither `Date` nor `POSIXct` has a helper function. Instead there are two ways to construct them:
* By coercing from another type with `as.Date()` and `as.POSIXct()`. These
  functions are S3 generics, so we'll come back to them in [coercion].
* With a helper function that either parses a string (`strptime()`) or
  creates a date-time from individual components (`ISOdatetime()`).
These missing helpers mean that there's no obvious default way to create a date or date-time in R. We can fill in those missing pieces with a couple of helpers:
```{r}
Date <- function(year, month, day) {
  # ISOdate() defaults to noon; using UTC means as.Date() keeps the same day
  as.Date(ISOdate(year, month, day, tz = "UTC"))
}
POSIXct <- function(year, month, day, hour, minute, sec, tzone = "") {
  ISOdatetime(year, month, day, hour, minute, sec, tz = tzone)
}
```
These helpers fill a useful role, but are not computationally efficient: behind the scenes `ISOdatetime()` works by pasting the components into a string and then using `strptime()`. More efficient equivalents are `lubridate::make_datetime()` and `lubridate::make_date()`.
### Object styles
S3 gives you the freedom to build a new class on top of any existing base type. So far, we've focussed on the vector style, where you take an existing vector type and add some attributes. Importantly, a single vector-style object represents multiple values. There are two other important styles: scalar-style and data-frame-style.
Each __scalar__-style object represents a single "value" and is built on top of a named list. This is the style that you are most likely to use in practice. The constructor for the scalar style is slightly different because the arguments become named elements of the list, rather than attributes.
```{r}
new_scalar_class <- function(x, y, z) {
  structure(
    list(
      x = x,
      y = y,
      z = z
    ),
    class = "scalar_class"
  )
}
```
(For a real constructor, you'd also check that the `x`, `y`, and `z` fields are the types that you expect.)
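A sketch of what those checks might look like, using a hypothetical `person` class with two fields; the field names and the specific checks are illustrative assumptions:

```{r}
# Hypothetical scalar-style class: one person record stored as a named list
new_person <- function(name, age) {
  stopifnot(is.character(name), length(name) == 1)
  stopifnot(is.numeric(age), length(age) == 1)
  structure(
    list(name = name, age = age),
    class = "person"
  )
}

p <- new_person("Ada", 36)
typeof(p)  # the underlying base type is a list
p$name
length(p)  # number of fields, not number of values
```

Note that `length()` counts fields here: a single scalar-style object represents one value, however many fields it has.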
In base R, the most important example of this style is `lm`, the class returned when you fit a linear model:
```{r}
mod <- lm(mpg ~ wt, data = mtcars)
typeof(mod)
names(mod)
```
The __data-frame-style__ builds on top of a data frame (a named list where each element is a vector of the same length), and adds additional attributes to store important metadata. A data-frame-style constructor looks like:
```{r}
new_df_class <- function(df, attr1, attr2) {
  stopifnot(is.data.frame(df))
  structure(
    df,
    attr1 = attr1,
    attr2 = attr2,
    class = c("df_class", "data.frame")
  )
}
```
The most common data-frame-style class is the tibble, a modern reimagining of the data frame provided by the tibble package, and used extensively within the tidyverse.
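To see that a data-frame-style object keeps behaving like a data frame, here is a sketch that follows the same pattern as `new_df_class()`. The `tagged_df` class and its single `source` attribute are hypothetical stand-ins for `attr1` and `attr2`:

```{r}
# Hypothetical data-frame-style class with one metadata attribute
new_tagged_df <- function(df, source) {
  stopifnot(is.data.frame(df))
  stopifnot(is.character(source), length(source) == 1)
  structure(
    df,
    source = source,
    class = c("tagged_df", "data.frame")
  )
}

tdf <- new_tagged_df(data.frame(x = 1:3, y = c("a", "b", "c")), source = "survey")
class(tdf)
nrow(tdf)            # data frame methods still apply
attr(tdf, "source")  # the extra metadata travels with the object
```

Because `"data.frame"` comes last in the class vector, any generic without a `tagged_df` method falls back to the data frame method.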
Collectively, we'll call the attributes of a vector-style or data-frame-style class and the names of a scalar-style class the __fields__ of an object.
When creating your own classes, you should pick the vector style if your class closely resembles an existing vector type. Otherwise, use the scalar (list) style. The scalar style is generally easier to work with because implementing a full range of convenient vectorised methods is usually a lot of work. It's typically obvious when you need the data-frame style.
### Exercises
1. Categorise the objects returned by `lm()`, `factor()`, `table()`,
   `as.Date()`, `ecdf()`, `ordered()`, `I()` into "vector", "scalar", and
   "other".

1. Write a constructor for `difftime` objects. What base type are they
   built on? What attributes do they use? You'll need to consult the
   documentation, read some code, and perform some experiments.

1. Write a constructor for `data.frame` objects. What base type is a data
   frame built on? What attributes does it use? What are the restrictions
   placed on the individual elements? What about the names?

1. Enhance our `factor()` helper to have better behaviour when one or
   more values in `x` are not found in `levels`. What does `base::factor()`
   do in this situation?

1. Carefully read the source code of `base::factor()`. What does it do that
   our constructor does not?

1. What would a constructor function for `lm` objects, `new_lm()`, look like?
   Why is a constructor function less useful for linear models?
## Generics and methods
The job of an S3 generic is to perform method dispatch, i.e. find the function designed to work specifically for the given class. S3 generics have a simple structure: they call `UseMethod()`, which then calls the right method. `UseMethod()` takes two arguments: the name of the generic function (required), and the argument to use for method dispatch (optional). If you omit the second argument, it will dispatch based on the first argument, which is what I generally advise. \indexc{UseMethod()} \index{S3!new generic}
```{r}
# Dispatches on x
generic <- function(x, y, ...) {
  UseMethod("generic")
}
# Dispatches on y
generic2 <- function(x, y, ...) {
  UseMethod("generic2", y)
}
```
Note that you don't pass any of the arguments of the generic to `UseMethod()`; it uses black magic to pass them on automatically. Generally, you should avoid doing any computation in a generic, because the semantics are complicated and few people know the details. In general, any modifications to the arguments of the generic will be undone, leading to much confusion.
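To make the second-argument form concrete, here is a minimal sketch with a hypothetical generic, `describe_y()`. The class of `x` plays no part in dispatch; only `y` selects the method, and both arguments are still passed on to it.

```{r}
# Hypothetical generic that dispatches on its second argument
describe_y <- function(x, y, ...) {
  UseMethod("describe_y", y)
}
describe_y.character <- function(x, y, ...) "y is character"
describe_y.default <- function(x, y, ...) "y is something else"

describe_y(1, "a")  # y is a character vector
describe_y("a", 1)  # x is character, but only y matters
```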
A generic isn't useful without some methods, which are just functions that follow a naming scheme (`generic.class`). Because a method is just a function with a special name, you _can_ call methods directly, but you generally _shouldn't_. The main reason to call the method directly is that it sometimes leads to considerable performance improvements. See [performance](#be-lazy) for an example.
```{r}
generic.foo <- function(x, y, ...) {
  message("foo method")
}
generic(structure(list(), class = "foo"))
```
You can see all the methods defined for a generic with `s3_methods_generic()`:
```{r}
s3_methods_generic("generic")
```
Note the false positive: `generic.skeleton()` is not a method for our generic but an existing function in the methods package. It's picked up because method definition relies only on a naming convention. This is another reason that you should avoid using `.` in non-method function names.
Remember that apart from methods that you've created, and those defined in the base package, most S3 methods will not be directly accessible. You'll need to use `getS3method("generic", "class")` to see their source code.
### Coercion
Many S3 objects can be naturally created from an existing object through __coercion__. If this is the case for your class, you should provide a coercion function, an S3 generic called `as_class_name`. Base R generally does not follow this convention, which can cause problems as illustrated by `as.factor()`:
* The name is confusing, since `as.factor()` is not the `factor` method of the
  `as()` generic.
* `as.factor()` is not a generic, which means that if you create a new class
  that could be usefully converted to a factor, you cannot extend
  `as.factor()`.
We can fix these issues by creating a new generic coercion function and providing it with some methods:
```{r}
as_factor <- function(x, ...) {
  UseMethod("as_factor")
}
```
Every `as_y()` generic should have a `y` method that returns its input unchanged:
```{r}
as_factor.factor <- function(x, ...) x
```
This ensures that `as_factor()` works if the input is already a factor.
Two useful methods would be for character and integer vectors.
```{r}
as_factor.character <- function(x, ...) {
  factor(x, levels = unique(x))
}
as_factor.integer <- function(x, ...) {
  factor(x, levels = as.character(unique(x)))
}
```
Typically the coercion methods will either call the constructor or the helper; pick the function that makes the code simpler. Here the helper is simplest. If you use the constructor, remember to also call the validator function.
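One consequence of the design choice in `as_factor.character()` above is worth seeing directly: `levels = unique(x)` keeps the levels in order of first appearance, whereas `base::as.factor()` sorts them. This sketch calls `base::factor()` explicitly so that it doesn't depend on the `factor()` helper defined earlier:

```{r}
chr <- c("b", "a", "b")
# Levels in order of first appearance, as in as_factor.character() above
levels(base::factor(chr, levels = unique(chr)))
# base::as.factor() sorts the levels instead
levels(base::as.factor(chr))
```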
If you think your coercion function will be frequently used, it's worth providing a default method that gives a better error message. Default methods are called when no other method is appropriate, and are discussed in more detail in [inheritance].
```{r, error = TRUE}
as_factor(1)
as_factor.default <- function(x, ...) {
  stop(
    "Don't know how to coerce object of class ",
    paste(class(x), collapse = "/"), " into a factor",
    call. = FALSE
  )
}
as_factor(1)
```
### Arguments
Methods should always have the same arguments as their generics. This is not usually enforced, but it is good practice because it will avoid confusing behaviour. If you do eventually turn your code into a package, R CMD check will enforce it, so it's good to get into the habit now.
There is one exception to this rule: if the generic has `...`, the method must still have all the same arguments (including `...`), but can also have its own additional arguments. This is important because you don't know what additional arguments a method for someone else's class might need. The downside of using `...`, however, is that any misspelled arguments will be silently swallowed.
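A sketch of both sides of this trade-off. It uses a hypothetical generic, `round_mean()`, whose numeric method adds a `digits` argument that the generic itself doesn't have:

```{r}
# Hypothetical generic; only the numeric method knows about `digits`
round_mean <- function(x, ...) {
  UseMethod("round_mean")
}
round_mean.numeric <- function(x, ..., digits = 2) {
  round(mean(x), digits)
}

round_mean(c(1.234, 5.678), digits = 1)
# The typo `digit` is absorbed by `...`, so the default digits = 2 is used
round_mean(c(1.234, 5.678), digit = 1)
```

No error or warning is raised for the misspelled call; the only symptom is the unexpected number of digits.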
### Exercises
1. Read the source code for `t()` and `t.test()` and confirm that
   `t.test()` is an S3 generic and not an S3 method. What happens if
   you create an object with class `test` and call `t()` with it? Why?

   ```{r}
   x <- structure(1:10, class = "test")
   t(x)
   ```

1. Carefully read the documentation for `UseMethod()` and explain why the
   following code returns the results that it does. What two usual rules
   of function evaluation does `UseMethod()` violate?

   ```{r}
   g <- function(x) {
     x <- 10
     y <- 10
     UseMethod("g")
   }
   g.default <- function(x) c(x = x, y = y)
   x <- 1
   y <- 1
   g(x)
   ```
## Method dispatch
At a high-level, S3 method dispatch is simple, and revolves around two functions, `UseMethod()` and `NextMethod()`. You'll learn about these two functions below, and then we'll come back to some of the additional wrinkles in [dispatch details].
### `UseMethod()`
The purpose of `UseMethod()` is to find the appropriate method to call given a generic and a class. It does this by creating a vector of function names, `paste0("generic", ".", c(class(x), "default"))`, and looking for each method in turn. As soon as it finds a matching method, it calls it. If no matching method is found, it throws an error. To explore dispatch, we'll use `sloop::s3_dispatch()`. You give it a call to an S3 generic, and it lists all the possible methods, noting which ones exist. For example, what happens when you try to print a `POSIXct` object?
```{r}
x <- Sys.time()
s3_dispatch(print(x))
```
`print()` will look for three possible methods, of which two exist, and one, `print.POSIXct()`, will be called. The last method is always the "default" method. This doesn't correspond to a specific class, so it is a useful catch-all.
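The search that `UseMethod()` performs can be sketched in plain R. This is a deliberate simplification for illustration, and the helper names `candidate_methods()` and `find_method()` are hypothetical. Real dispatch also considers the implicit class of base types (for example `c("double", "numeric")` for a double vector) and methods registered inside package namespaces. This sketch handles neither.

```{r}
# Simplified model of UseMethod()'s search, for illustration only
candidate_methods <- function(generic, x) {
  paste0(generic, ".", c(class(x), "default"))
}

find_method <- function(generic, x) {
  for (name in candidate_methods(generic, x)) {
    # exists() only sees functions on the search path, not unexported methods
    if (exists(name, mode = "function")) {
      return(name)
    }
  }
  NULL
}

candidate_methods("print", factor("a"))
find_method("print", factor("a"))
find_method("print", structure(list(), class = "no_such_class"))
```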
### `NextMethod()`
Method dispatch usually terminates as soon as a matching method is found. However, methods can explicitly choose to call the next available method using `NextMethod()`. This is useful because it allows you to rely on code that others have already written, which we'll come back to in [inheritance]. Let's make `NextMethod()` concrete with an example. Here, I define a new generic ("showoff") with three methods. Each method signals that it's been called, and then calls the "next" method:
```{r}
showoff <- function(x) {
  UseMethod("showoff")
}
showoff.default <- function(x) {
  message("showoff.default")
  TRUE
}
showoff.a <- function(x) {
  message("showoff.a")
  NextMethod()
}
showoff.b <- function(x) {
  message("showoff.b")
  NextMethod()
}
```
Let's create a dummy object with classes "b" and "a". `s3_dispatch()` shows that all three potential methods are available:
```{r}
x <- structure(list(), class = c("b", "a"))
s3_dispatch(showoff(x))
```
When you call `NextMethod()` it finds and calls the next available method in the dispatch list. When we call `showoff()`, the method for `b` forwards to the method for `a`, which forwards to the default method.
```{r}
showoff(x)
```
Like `UseMethod()`, the precise semantics of `NextMethod()` are complex. It doesn't actually work with the class attribute of the object. Instead, it uses a special variable (`.Class`), set up in the method's evaluation frame, to keep track of which method to call next. This means that modifying the argument that is dispatched upon has no impact on which method is called next, so you should avoid modifying the object that is being dispatched on.
Generally, you call `NextMethod()` without any arguments. However, if you do give arguments, they are passed on to the next method, as if they'd been supplied to the generic.
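A minimal sketch of passing an argument to `NextMethod()`, using a hypothetical `shout()` generic. A named argument given to `NextMethod()` replaces the matching argument in the call to the next method, so the default method receives the upper-cased text.

```{r}
# Hypothetical generic with a `text` argument
shout <- function(x, text, ...) {
  UseMethod("shout")
}
shout.default <- function(x, text, ...) text
# Named arguments to NextMethod() replace the matching arguments
shout.loud <- function(x, text, ...) {
  NextMethod(text = toupper(text))
}

shout(structure(list(), class = "loud"), "hello")
shout(structure(list(), class = "quiet"), "hello")  # no "quiet" method
```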
### Exercises
1. Which base generic has the greatest number of defined methods?

1. Explain what is happening in the following code.

   ```{r}
   generic2 <- function(x) UseMethod("generic2")
   generic2.a1 <- function(x) "a1"
   generic2.a2 <- function(x) "a2"
   generic2.b <- function(x) {
     class(x) <- "a1"
     NextMethod()
   }
   generic2(structure(list(), class = c("b", "a2")))
   ```
## Inheritance
The class attribute is not limited to a single string, but can be a character vector. This, along with S3 method dispatch and `NextMethod()`, gives a surprising amount of flexibility that can be used creatively to reduce code duplication. However, this flexibility can also lead to code that is hard to understand or reason about, so you are best off constraining yourself to simple styles of inheritance. Here we will focus on defining subclasses that inherit their fields, and some behaviour, from a parent class.
Subclasses use a character __vector__ for the class attribute. There are two examples of subclasses that you might have come across in base R:
* Generalised linear models are a generalisation of linear models that allow
the error term to belong to a richer set of distributions, not just the normal
distribution like the linear model. This is a natural case for the use of
inheritance and indeed, in R, `glm()` returns objects of class
`c("glm", "lm")`.
* Ordered factors are used when the levels of a factor have some intrinsic
ordering, like `c("Good", "Better", "Best")`. Ordered factors are produced
by `ordered()` which returns an object with class `c("ordered", "factor")`.
You can think of the glm class "inheriting" behaviour from the lm class, and the ordered class inheriting behaviour from the factor class because of the way method dispatch works. If there is a method available for the subclass, R will use it, otherwise it will fall back to the "parent" class. For example, if you "plot" a glm object, it falls back to the lm method, but if you compute the ANOVA, it uses a glm-specific method.
```{r}
mod1 <- glm(mpg ~ wt, data = mtcars)
s3_dispatch(plot(mod1))
s3_dispatch(anova(mod1))
```
### Constructors
There are three principles to adhere to when creating a subclass:
* A subclass should be built on the same base type as its parent.
* The `class()` of the subclass should be of the form
`c(subclass, parent_class)`
* The fields of the subclass should include the fields of the parent.
And these properties should be enforced by the constructor.
When you create a class, you need to decide if you want to allow subclasses, because it requires changes to the constructor and careful thought in your methods. To allow subclasses, the parent constructor needs to have `...` and `subclass` arguments:
```{r}
new_my_class <- function(x, y, ..., subclass = NULL) {
  stopifnot(is.numeric(x))
  stopifnot(is.logical(y))
  structure(
    x,
    y = y,
    ...,
    class = c(subclass, "my_class")
  )
}
```
Then the implementation of the subclass constructor is simple: it checks the types of the new fields, then calls the parent constructor.
```{r}
new_subclass <- function(x, y, z) {
  stopifnot(is.character(z))
  # Pass the new field by name: every attribute given to structure() must be named
  new_my_class(x, y, z = z, subclass = "subclass")
}
```
If you wanted to allow this subclass to be further subclassed, you'd need to include `...` and `subclass` arguments:
```{r}
new_subclass <- function(x, y, z, ..., subclass = NULL) {
  stopifnot(is.character(z))
  new_my_class(x, y, z = z, ..., subclass = c(subclass, "subclass"))
}
```
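As a sanity check, here is a minimal sketch of one more level of inheritance. It repeats the two constructors above so that it runs on its own, and adds a hypothetical `new_subsubclass()` with an extra numeric field `w`; the names and values are illustrative only.

```{r}
# Parent and subclass constructors, as described above
new_my_class <- function(x, y, ..., subclass = NULL) {
  stopifnot(is.numeric(x))
  stopifnot(is.logical(y))
  structure(x, y = y, ..., class = c(subclass, "my_class"))
}
new_subclass <- function(x, y, z, ..., subclass = NULL) {
  stopifnot(is.character(z))
  new_my_class(x, y, z = z, ..., subclass = c(subclass, "subclass"))
}

# Hypothetical third level: checks its own field, then defers to the parent
new_subsubclass <- function(x, y, z, w) {
  stopifnot(is.numeric(w))
  new_subclass(x, y, z, w = w, subclass = "subsubclass")
}

obj <- new_subsubclass(c(1.5, 2.5), y = TRUE, z = "a", w = 10)
class(obj)   # most specific class first, "my_class" last
attributes(obj)
```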
If your subclass is more complicated, you'd also provide validator and helper functions, as described previously.
### Coercion
You also need to make sure that there's some way to convert the subclass back to the parent class. The best way to do that is to add a method to the parent's coercion generic (here, `as_my_class()`). Generally, this method should call the parent constructor:
```{r}
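as_my_class.subclass <- function(x) {
  # The data is x itself, not an attribute; as.vector() strips the class and
  # the subclass-only fields, so only the parent's field `y` is carried over
  new_my_class(as.vector(x), attr(x, "y"))
}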
```
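A usage sketch, assuming a coercion generic `as_my_class()` defined with `UseMethod()` (the text only shows its method) and the constructors from above. Converting back to the parent keeps `y`, drops the subclass-only field `z`, and leaves `"my_class"` as the only class.

```{r}
# Constructors as described above, repeated so this chunk runs on its own
new_my_class <- function(x, y, ..., subclass = NULL) {
  stopifnot(is.numeric(x))
  stopifnot(is.logical(y))
  structure(x, y = y, ..., class = c(subclass, "my_class"))
}
new_subclass <- function(x, y, z, ..., subclass = NULL) {
  stopifnot(is.character(z))
  new_my_class(x, y, z = z, ..., subclass = c(subclass, "subclass"))
}

# Assumed coercion generic; only its method appears in the text
as_my_class <- function(x, ...) UseMethod("as_my_class")
as_my_class.subclass <- function(x, ...) {
  new_my_class(as.vector(x), attr(x, "y"))
}

s1 <- new_subclass(c(1, 2, 3), y = FALSE, z = "extra")
parent <- as_my_class(s1)
class(parent)       # "my_class"
attr(parent, "z")   # NULL: the subclass-only field is gone
```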
### Methods
The goal of creating a subclass is to reuse as much code as possible from the parent class. This means that you should not have to define every method that the parent class provides (if you do, reconsider whether you actually need a subclass!). Generally, defining new methods is straightforward: you simply create a new method (`generic.subclass`) whenever the parent method doesn't do quite the right thing. In many cases, the new method will be able to call `NextMethod()` in order to take advantage of the computation done in the parent.
One wrinkle arises when you have methods that return the same type of object as the primary input. For example, dplyr has many functions (`arrange()`, `summarise()`, `mutate()`, ...) that take a data frame (or data frame-like object) as input and return a modified version of that data frame. Imagine you want to store the provenance of each data frame, i.e. who created it and when. To do so, you might create a data frame subclass called `provenance`:
```{r}
new_provenance <- function(data, author, date = Sys.Date()) {
  stopifnot(is.data.frame(data))
  stopifnot(is.character(author), length(author) == 1)
  # Base R has no is.Date(), so check the class directly
  stopifnot(inherits(date, "Date"), length(date) == 1)
  structure(
    data,
    author = author,
    date = date,
    class = c("provenance", "data.frame")
  )
}
```
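Before bringing dplyr in, a base R sketch shows the underlying problem. A function that builds a new data frame from a provenance object's columns knows nothing about the extra attributes, so they are silently dropped. The author name and date below are made-up values.

```{r}
new_provenance <- function(data, author, date = Sys.Date()) {
  stopifnot(is.data.frame(data))
  stopifnot(is.character(author), length(author) == 1)
  stopifnot(inherits(date, "Date"), length(date) == 1)
  structure(data, author = author, date = date, class = c("provenance", "data.frame"))
}

# "alice" and the date are illustrative values
p <- new_provenance(head(mtcars[1:2], 3), author = "alice", date = as.Date("2017-01-01"))
class(p)
attr(p, "author")

# Rebuilding a data frame from the reversed columns loses the subclass
rebuilt <- data.frame(lapply(p, rev))
class(rebuilt)           # just "data.frame"
attr(rebuilt, "author")  # NULL
```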
And now you want to make this class work with dplyr. The class doesn't change any of the computation related to the data frame, it just needs to preserve the attributes, which dplyr doesn't know anything about. That means you need to provide a method for each dplyr generic. The computation is unchanged, so you can use `NextMethod()` to do all the hard work, but you need to manually reconstruct the provenance object.
```{r}
arrange.provenance <- function(.data, ...) {
  new_provenance(
    NextMethod(),
    author = attr(.data, "author"),
    date = attr(.data, "date")
  )
}
mutate.provenance <- function(.data, ...) {
  new_provenance(
    NextMethod(),
    author = attr(.data, "author"),
    date = attr(.data, "date")
  )
}
```
Doing this for all the dplyr generics would require a lot of copying and pasting. Let's reduce some of that duplication by taking advantage of `S3::reconstruct()`. `reconstruct()` is a generic function designed to reconstruct a subclass from an instance of the parent class, typically created by `NextMethod()`, and the original subclass. In other words, the job of a reconstructor is to take an object of the parent class and copy over the attributes of the subclass. (Note that `reconstruct()` is unusual in that it dispatches on the second argument. This allows a more natural specification.)
```{r}
reconstruct.provenance <- function(new, old) {
  new_provenance(
    new,
    author = attr(old, "author"),
    date = attr(old, "date")
  )
}
```
Now we can rewrite the methods to minimise the amount of duplicated code:
```{r}
arrange.provenance <- function(.data, ...) {
  reconstruct(NextMethod(), .data)
}
mutate.provenance <- function(.data, ...) {
  reconstruct(NextMethod(), .data)
}
```
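The same pattern can be tried without dplyr. In this sketch `reconstruct()` is a local stand-in, assumed to behave as described above (dispatching on its second argument), and `rev_rows()` is a hypothetical generic whose data frame method returns a plain data frame, just as a parent method that knows nothing about subclasses would.

```{r}
# Local stand-in for reconstruct(): dispatches on its *second* argument
reconstruct <- function(new, old) UseMethod("reconstruct", old)

new_provenance <- function(data, author, date = Sys.Date()) {
  stopifnot(is.data.frame(data))
  stopifnot(is.character(author), length(author) == 1)
  stopifnot(inherits(date, "Date"), length(date) == 1)
  structure(data, author = author, date = date, class = c("provenance", "data.frame"))
}
reconstruct.provenance <- function(new, old) {
  new_provenance(new, author = attr(old, "author"), date = attr(old, "date"))
}

# Hypothetical generic; the parent method returns a plain data frame
rev_rows <- function(x) UseMethod("rev_rows")
rev_rows.data.frame <- function(x) data.frame(lapply(x, rev))

# Subclass method: the parent does the work, reconstruct() restores the subclass
rev_rows.provenance <- function(x) reconstruct(NextMethod(), x)

p <- new_provenance(head(mtcars[1:2], 3), author = "alice", date = as.Date("2017-01-01"))
out <- rev_rows(p)
class(out)
attr(out, "author")
```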
This duplicated code could be avoided completely if `arrange.data.frame()`, provided by dplyr, called `reconstruct()` for you. And indeed, a future version of that function will.
When designing a class that can be subclassed, you need to carefully think through these issues. Generally, whenever you implement a method that returns the same type of object as the primary input, you should call `reconstruct()` to ensure that it also works for subclasses. That way implementors of a subclass will only need to provide methods when the computation is actually different.
### Exercises
1. The `ordered` class is a subclass of `factor`, but it's implemented in
   a very ad hoc way in base R. Implement it in a principled way by
   building a constructor and an `as_ordered` generic.
   ```{r}
   f1 <- factor("a", c("a", "b"))
   as.factor(f1)
   as.ordered(f1) # loses levels
   ```
1. What classes have a method for the `Math` group generic in base R? Read
the source code. How do the methods work?
1. R has two classes for representing date time data, `POSIXct` and
`POSIXlt`, which both inherit from `POSIXt`. Which generics have
different behaviours for the two classes? Which generics share the same
behaviour?
## Dispatch details
This chapter concludes with a few additional details about method dispatch that are not well documented elsewhere. It is safe to skip these details if you're new to S3.
### Environments and namespaces
The precise rules for where a generic looks for methods are a little complicated because there are two paths for discovery:
1. In the environment in which the generic is called.
1. In the special `.__S3MethodsTable__.` object in the function environment of
   the generic (the environment where it was defined). Every package has a
   `.__S3MethodsTable__.` which lists all the S3 methods registered by the
   package (via `S3method()` in its `NAMESPACE`).
These details are not usually important, but are necessary in order for S3 generics to find the correct method when the generic and method are in different packages.
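To see both paths at work, here is a small sketch using a hypothetical generic `greet()`. A method defined inside a function is found only when the generic is called from that function, while `registerS3method()` (a base R function) stores a method in the methods table of the environment where `greet()` was defined (the global environment, when run at top level).

```{r}
# A hypothetical generic with only a default method
greet <- function(x) UseMethod("greet")
greet.default <- function(x) "default method"

foo_obj <- structure(1, class = "foo")

# Path 1: a method in the environment where the generic is called
call_with_local_method <- function() {
  greet.foo <- function(x) "method found in the calling environment"
  greet(foo_obj)
}
call_with_local_method()
greet(foo_obj)  # greet.foo() isn't visible here, so the default is used

# Path 2: a method registered in the S3 methods table of greet()'s environment
registerS3method("greet", "bar", function(x) "registered method")
greet(structure(1, class = "bar"))
```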
### Base types
What happens when you call an S3 generic with a non-S3 object, i.e. an object that doesn't have the class attribute set? You might think it would dispatch on what `class()` returns:
```{r}
class(matrix(1:5))
```
But unfortunately dispatch actually occurs on the __implicit class__, which has three components:
* "array" or "matrix" (if the object has dimensions).
* `typeof()` (with a few minor tweaks).
* If it's "integer" or "double", "numeric".
Older versions of base R have no function that computes the implicit class (R 4.0.0 added `.class2()`), but you can use a helper from the sloop package: \index{implicit class} \index{base types!implicit class}
```{r}
s3_class(matrix(1:5))
```
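To make the three rules concrete, here is a simplified sketch of the computation. `implicit_class()` is a hypothetical helper, not part of sloop or base R; it handles only the function-type tweak, and it assumes R 4.0.0 or later, where a matrix's implicit class includes both "matrix" and "array".

```{r}
# Hypothetical helper: a simplified version of the implicit class rules
implicit_class <- function(x) {
  # Objects with a class attribute dispatch on that attribute instead
  if (is.object(x)) {
    return(class(x))
  }
  # 1. Dimensions: a 2-d object is a matrix (and, in R >= 4.0, an array)
  n_dim <- length(dim(x))
  dims <- if (n_dim == 2) c("matrix", "array") else if (n_dim > 0) "array" else NULL
  # 2. typeof(), with one tweak: all function types become "function"
  type <- typeof(x)
  if (type %in% c("closure", "builtin", "special")) {
    type <- "function"
  }
  # 3. Integer and double vectors are also "numeric"
  num <- if (type %in% c("integer", "double")) "numeric" else NULL
  c(dims, type, num)
}

implicit_class(1:5)
implicit_class(matrix(1:5))
implicit_class(sum)
```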
`s3_dispatch()` knows about the implicit class, so use it if you're ever in doubt about method dispatch:
```{r}
s3_dispatch(print(matrix(1:5)))
```
Note that this can lead to different dispatch for objects that look similar:
```{r}
x1 <- 1:5
class(x1)
s3_dispatch(mean(x1))
x2 <- structure(x1, class = "integer")
class(x2)
s3_dispatch(mean(x2))
```
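The difference matters as soon as a method for "numeric" exists. In this sketch with the hypothetical generic `kind()`, the plain integer vector reaches the numeric method through its implicit class, while the object whose class attribute is just "integer" skips it and falls through to the default.

```{r}
# `kind()` is a hypothetical generic with a "numeric" method and a default
kind <- function(x) UseMethod("kind")
kind.numeric <- function(x) "numeric method"
kind.default <- function(x) "default method"

x1 <- 1:5
kind(x1)  # implicit class is c("integer", "numeric"), so kind.numeric() runs

x2 <- structure(x1, class = "integer")
kind(x2)  # explicit class is only "integer", so kind.default() runs
```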
### Internal generics
Some S3 generics, like `[`, `sum()`, and `cbind()`, don't call `UseMethod()` because they are implemented in C. Instead, they dispatch from C code, for example via the C functions `DispatchGroup()` or `DispatchOrEval()`. Generics like these are called __internal generics__, because they do dispatch internally, in C code. Internal generics only exist in base R, so you cannot create an internal generic in a package.
`s3_dispatch()` shows internal generics by including the name of the generic at the bottom of the method list. If this method is called, all the work happens in C code, typically by switching on the base type of the object.
```{r}
s3_dispatch(Sys.time()[1])
```
For performance reasons, internal generics do not dispatch to methods unless the class attribute has been set (`is.object()` is true). This means that internal generics do not use the implicit class. Again, if you're confused, rely on `s3_dispatch()` to show you the difference.
```{r}
x <- sample(10)
class(x)
s3_dispatch(x[1])
class(mtcars)
s3_dispatch(mtcars[1])
```
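A sketch of the consequence, using a throwaway `length()` method for the class "integer" (hypothetical, defined only for this example and removed at the end): the internal generic `length()` ignores it for a plain integer vector, but uses it once the class attribute is set.

```{r}
# Hypothetical method, defined only to show when internal dispatch happens
length.integer <- function(x) 99L

plain <- 1:3
classed <- structure(1:3, class = "integer")

len_plain <- length(plain)      # no class attribute: no dispatch, so 3
len_classed <- length(classed)  # is.object() is TRUE: the method is used
c(len_plain, len_classed)

# Remove the method so it can't affect later code
rm(length.integer)
```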
### Group generics
Group generics are the most complicated part of S3 method dispatch because they involve both `NextMethod()` and internal generics. Group generics are worth learning about, however, because they allow you to implement a whole swath of methods with one function. Like internal generics, they only exist in base R, and you cannot define your own group generic.
Base R has four group generics, which are made up of the following generics: \index{group generics} \index{S3!group generics}
* __Math__: `abs`, `sign`, `sqrt`, `floor`, `cos`, `sin`, `log`, `exp`, ...
* __Ops__: `+`, `-`, `*`, `/`, `^`, `%%`, `%/%`, `&`, `|`, `!`, `==`, `!=`, `<`,
`<=`, `>=`, `>`
* __Summary__: `all`, `any`, `sum`, `prod`, `min`, `max`, `range`
* __Complex__: `Arg`, `Conj`, `Im`, `Mod`, `Re`
Defining a single method for a group generic overrides the default behaviour for all of the members of the group. Methods for group generics are looked for only if no method for the specific generic exists:
```{r}
s3_dispatch(sum(Sys.time()))
```
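The sketch below defines a hypothetical `"measure"` class with a single `Math` group method that rebuilds the object after `NextMethod()` has done the computation. It also defines a method for the specific generic `floor()`, which takes precedence over the group method.

```{r}
# A hypothetical class that stores a unit alongside the numbers
new_measure <- function(x, unit) {
  stopifnot(is.numeric(x), is.character(unit), length(unit) == 1)
  structure(x, unit = unit, class = "measure")
}

# One group method covers abs(), sqrt(), exp(), and the rest of Math
Math.measure <- function(x, ...) {
  # NextMethod() does the computation; strip its attributes, then rebuild
  new_measure(as.vector(NextMethod()), attr(x, "unit"))
}

# A method for a specific member of the group wins over the group method
floor.measure <- function(x) "specific floor() method"

m <- new_measure(c(-1.5, 2.25), unit = "m")
abs(m)    # uses Math.measure()
floor(m)  # uses floor.measure()
```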
Most group generic methods involve a call to `NextMethod()`. For example, take `difftime` objects. If you look at the method dispatch for `abs()`, you'll see there's a `Math` group method defined.
```{r}
y <- as.difftime(10, units = "mins")
s3_dispatch(abs(y))
```
`Math.difftime` basically looks like this:
```{r}
Math.difftime <- function(x, ...) {
  new_difftime(NextMethod(), units = attr(x, "units"))
}
```
It dispatches to the next method, here the internal default, to perform the actual computation, then copies the class and attributes back over.
Note that inside a group generic method, the special variable `.Generic` contains the name of the generic function that was actually called. This is useful when producing error messages, and occasionally when you need to manually re-call the generic with different arguments.
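As a sketch of the error-message use, the hypothetical `"tally"` class below only allows the rounding-style members of the `Math` group and uses `.Generic` to name the offending function in its error.

```{r}
# A hypothetical class for integer counts
new_tally <- function(x) {
  stopifnot(is.integer(x))
  structure(x, class = "tally")
}

Math.tally <- function(x, ...) {
  # .Generic holds the name of the generic that was called, e.g. "sqrt"
  allowed <- c("abs", "sign", "floor", "ceiling", "trunc", "round", "signif")
  if (!.Generic %in% allowed) {
    stop("`", .Generic, "()` is not meaningful for tally objects", call. = FALSE)
  }
  # as.integer() drops the attributes left by NextMethod() before rebuilding
  new_tally(as.integer(NextMethod()))
}

abs(new_tally(-2L))
try(sqrt(new_tally(4L)))
```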
### Double dispatch
Generics in the "Ops" group, which includes the two-argument mathematical and logical operators like `-` and `&`, implement a special type of method dispatch. They dispatch on the classes of _both_ arguments, so-called __double dispatch__. This is necessary to preserve the commutative property of many operators, i.e. `a + b` should equal `b + a`. Take the following simple example:
```{r}
date <- as.Date("2017-01-01")
integer <- 1L
date + integer
integer + date
```
If `+` dispatched only on the first argument, it would return different values for the two cases. To overcome this problem, generics in the Ops group use a slightly different strategy from usual. Rather than doing a single method dispatch, they do two, one for each input. There are three possible outcomes of this lookup:
* The methods are the same, so it doesn't matter which method is used.
* The methods are different, and R falls back to the internal method with a warning.
* One method is internal, in which case R calls the other method.
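The three outcomes can be reproduced with two hypothetical classes, each with its own `Ops` method. The methods just return a label so you can see which one ran; a real method would do arithmetic.

```{r}
# Two hypothetical classes, each with its own Ops method
Ops.apple <- function(e1, e2) "Ops.apple() was called"
Ops.pear <- function(e1, e2) "Ops.pear() was called"

apple1 <- structure(1, class = "apple")
pear1 <- structure(2, class = "pear")

apple1 + apple1  # same method for both arguments
1 + apple1       # only one argument has a method, so that method is used
apple1 + pear1   # different methods: warning, then the internal `+` is used
```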
For the example above, we can look at the possible methods for each argument, taking advantage of the fact that we can call `+` with a single argument. In this case, the second argument would dispatch to the internal `+` function, so R will call `+.Date`.
```{r}
s3_dispatch(+date)
s3_dispatch(+integer)
```
Let's take a look at another case. What happens if you try to add a date to a factor? There is no method in common, so R calls the internal `+` method (which preserves the attributes of the LHS), with a warning.
```{r, error = TRUE}
factor <- factor("a")
s3_dispatch(+factor)
date + factor
factor + date
```
Finally, what happens if we try to subtract a POSIXlt from a POSIXct? A common `-.POSIXt` method is found and called.
```{r}
dt1 <- as.POSIXct(date)
dt2 <- as.POSIXlt(date)
s3_dispatch(-dt1)
s3_dispatch(-dt2)
dt1 - dt2
```
### Exercises
1. `Math.difftime()` is more complicated than I described. Why?