diff --git a/vignettes/translation-function.Rmd b/vignettes/translation-function.Rmd index 5d403d795..bbb31e314 100644 --- a/vignettes/translation-function.Rmd +++ b/vignettes/translation-function.Rmd @@ -30,7 +30,7 @@ con <- simulate_dbi() translate_sql((x + y) / 2, con = con) ``` -`translate_sql()` takes an optional `con` parameter. If not supplied, this causes dplyr to generate (approximately) SQL-92 compliant SQL. If supplied, dplyr uses `sql_translation()` to look up a custom environment which makes it possible for different databases to generate slightly different SQL: see `vignette("new-backend")` for more details. You can use the various simulate helpers to see the translations used by different backends: +`translate_sql()` takes an optional `con` parameter. If not supplied, this causes dbplyr to generate (approximately) SQL-92 compliant SQL. If supplied, dbplyr uses `sql_translation()` to look up a custom environment which makes it possible for different databases to generate slightly different SQL: see `vignette("new-backend")` for more details. You can use the various simulate helpers to see the translations used by different backends: ```{r} translate_sql(x ^ 2L, con = con) @@ -38,7 +38,7 @@ translate_sql(x ^ 2L, con = simulate_sqlite()) translate_sql(x ^ 2L, con = simulate_access()) ``` -Perfect translation is not possible because databases don't have all the functions that R does. The goal of dplyr is to provide a semantic rather than a literal translation: what you mean, rather than precisely what is done. In fact, even for functions that exist both in databases and R, you shouldn't expect results to be identical; database programmers have different priorities than R core programmers. For example, in R in order to get a higher level of numerical accuracy, `mean()` loops through the data twice. R's `mean()` also provides a `trim` option for computing trimmed means; this is something that databases do not provide. 
+Perfect translation is not possible because databases don't have all the functions that R does. The goal of dbplyr is to provide a semantic rather than a literal translation: what you mean, rather than precisely what is done. In fact, even for functions that exist both in databases and in R, you shouldn't expect results to be identical; database programmers have different priorities than R core programmers. For example, in R in order to get a higher level of numerical accuracy, `mean()` loops through the data twice. R's `mean()` also provides a `trim` option for computing trimmed means; this is something that databases do not provide. If you're interested in how `translate_sql()` is implemented, the basic techniques that underlie the implementation of `translate_sql()` are described in ["Advanced R"](https://adv-r.hadley.nz/translation.html). @@ -63,7 +63,7 @@ The following examples work through some of the basic differences between R and ``` * R and SQL have different defaults for integers and reals. - In R, 1 is a real, and 1L is an integer. In SQL, 1 is an integer, and 1.0 is a real + In R, 1 is a real, and 1L is an integer. In SQL, 1 is an integer, and 1.0 is a real. ```{r} translate_sql(1, con = con) @@ -104,7 +104,7 @@ dbplyr no longer translates `%/%` because there's no robust cross-database trans ### Aggregation -All database provide translation for the basic aggregations: `mean()`, `sum()`, `min()`, `max()`, `sd()`, `var()`. Databases automatically drop NULLs (their equivalent of missing values) whereas in R you have to ask nicely. The aggregation functions warn you about this important difference: +All databases provide translation for the basic aggregations: `mean()`, `sum()`, `min()`, `max()`, `sd()`, `var()`. Databases automatically drop NULLs (their equivalent of missing values) whereas in R you have to ask nicely. 
The aggregation functions warn you about this important difference: ```{r} translate_sql(mean(x), con = con) @@ -119,7 +119,7 @@ translate_sql(mean(x, na.rm = TRUE), window = FALSE, con = con) ### Conditional evaluation -`if` and `switch()` are translate to `CASE WHEN`: +`if` and `switch()` are translated to `CASE WHEN`: ```{r} translate_sql(if (x > 5) "big" else "small", con = con) @@ -135,7 +135,7 @@ translate_sql(switch(x, a = 1L, b = 2L, 3L), con = con) ## Unknown functions -Any function that dplyr doesn't know how to convert is left as is. This means that database functions that are not covered by dplyr can often be used directly via `translate_sql()`. +Any function that dbplyr doesn't know how to convert is left as is. This means that database functions that are not covered by dbplyr can often be used directly via `translate_sql()`. ### Prefix functions @@ -145,7 +145,7 @@ Any function that dbplyr doesn't know about will be left as is: translate_sql(foofify(x, y), con = con) ``` -Because SQL functions are general case insensitive, I recommend using upper case when you're using SQL functions in R code. That makes it easier to spot that you're doing something unusual: +Because SQL functions are generally case insensitive, I recommend using upper case when you're using SQL functions in R code. That makes it easier to spot that you're doing something unusual: ```{r} translate_sql(FOOFIFY(x, y), con = con) @@ -153,7 +153,7 @@ translate_sql(FOOFIFY(x, y), con = con) ### Infix functions -As well as prefix functions (where the name of the function comes before the arguments), dbplyr also translates infix functions. That allows you to use expressions like `LIKE` which does a limited form of pattern matching: +As well as prefix functions (where the name of the function comes before the arguments), dbplyr also translates infix functions. 
That allows you to use expressions like `LIKE`, which does a limited form of pattern matching: ```{r} translate_sql(x %LIKE% "%foo%", con = con) @@ -190,7 +190,7 @@ mf %>% ### Error for unknown translations -If needed, you can also force dbplyr to error if it doesn't know how to translate a function with the `dplyr.strict_sql` option: +If needed, you can also use the `dplyr.strict_sql` option to force dbplyr to error if it doesn't know how to translate a function: ```{r} #| error = TRUE @@ -245,16 +245,16 @@ Things get a little trickier with window functions, because SQL's window functio knitr::include_graphics("windows.png", dpi = 300) ``` - Of the many possible specifications, there are only three that commonly + Of the many possible specifications, only three are commonly used. They select between aggregation variants: - * Recycled: `BETWEEN UNBOUND PRECEEDING AND UNBOUND FOLLOWING` + * Recycled: `BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING` - * Cumulative: `BETWEEN UNBOUND PRECEEDING AND CURRENT ROW` + * Cumulative: `BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` - * Rolling: `BETWEEN 2 PRECEEDING AND 2 FOLLOWING` + * Rolling: `BETWEEN 2 PRECEDING AND 2 FOLLOWING` - dplyr generates the frame clause based on whether your using a recycled + dbplyr generates the frame clause based on whether you're using a recycled aggregate or a cumulative aggregate. To see how individual window functions are translated to SQL, we can again use `translate_sql()`: @@ -266,14 +266,14 @@ translate_sql(ntile(G, 2), con = con) translate_sql(lag(G), con = con) ``` -If the tbl has been grouped or arranged previously in the pipeline, then dplyr will use that information to set the "partition by" and "order by" clauses. 
For interactive exploration, you can achieve the same effect by setting the `vars_group` and `vars_order` arguments to `translate_sql()` +If the tbl has been grouped or arranged previously in the pipeline, then dbplyr will use that information to set the "partition by" and "order by" clauses. For interactive exploration, you can achieve the same effect by setting the `vars_group` and `vars_order` arguments to `translate_sql()`: ```{r} translate_sql(cummean(G), vars_order = "year", con = con) translate_sql(rank(), vars_group = "ID", con = con) ``` -There are some challenges when translating window functions between R and SQL, because dplyr tries to keep the window functions as similar as possible to both the existing R analogues and to the SQL functions. This means that there are three ways to control the order clause depending on which window function you're using: +There are some challenges when translating window functions between R and SQL, because dbplyr tries to keep the window functions as similar as possible to both the existing R analogues and to the SQL functions. This means that there are three ways to control the order clause depending on which window function you're using: * For ranking functions, the ordering variable is the first argument: `rank(x)`, `ntile(y, 2)`. If omitted or `NULL`, will use the default ordering associated