---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# readfoundry
readfoundry provides a simple wrapper around Foundry to aid in the development
of reproducible analytical pipelines (RAPs) within DHSC.
readfoundry can either be used to run raw SparkSQL against the Foundry SQL
server or be used with `dbplyr` to generate the SQL for you.
## Installation
You can install the readfoundry package like so:
``` r
if (!requireNamespace("librarian", quietly = TRUE)) install.packages("librarian")
librarian::shelf(DataS-DHSC/readfoundry)
```
## Usage
To use the package you will need the path to the dataset you want to query
([Identify a dataset's RID or filepath • Palantir](https://www.palantir.com/docs/foundry/analytics-connectivity/identify-dataset-rid/)) as well as a bearer token from your Foundry session that has permission to
export this dataset ([User-generated tokens • Palantir](https://www.palantir.com/docs/foundry/platform-security-third-party/user-generated-tokens/)). This token should be
saved in your project's `.Renviron` file under the key "BEARER" (please do not
add your `.Renviron` file to version control).
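For example, the relevant line in your project's `.Renviron` file might look
like the sketch below (the token value is a placeholder):
```
# .Renviron - do not commit this file to version control
BEARER=your-foundry-token-here
```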
To then run raw SQL call:
``` r
# create connection
fapi <- readfoundry::FoundryReader$new(Sys.getenv("BEARER"))
# SparkSQL query to run
sql_query <-
paste(
"SELECT",
"*",
"FROM \"/path/to/dataset\"",
"WHERE metric_name IN ('example', 'metric', 'names')"
)
# get query results
df <- fapi$run_query(sql_query)
```
To use the `dbplyr` wrapper, start your pipe chain with a call to
`fapi$get_dataset("path/to/dataset")` and, when you have finished transforming
your data, collect the results by calling `fapi$collect()`, e.g.:
``` r
# attach dplyr for filter() and the pipe operator
library(dplyr)
# create connection
fapi <- readfoundry::FoundryReader$new(Sys.getenv("BEARER"))
# get details of metrics available
df_stats <- fapi$get_metrics("path/to/dataset")
# get transformed data
df <- fapi$get_dataset("path/to/dataset") %>%
filter(metric_name == "example_name") %>%
fapi$collect()
```
### Notes
SparkSQL uses single quotes around values, with the exception of the double
quotes required around the dataset path. Back ticks are also not supported for
variable names.
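For example, a query string that follows these rules might look like the
sketch below (the dataset path and column names are placeholders):
``` r
# single quotes around string values, double quotes around the dataset path,
# and no back ticks around column names
sql_query <- paste(
  "SELECT metric_name, value",
  "FROM \"/path/to/dataset\"",
  "WHERE metric_name = 'example_name'"
)
```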
For more information on the available SQL functions, see the [Palantir Spark SQL Reference](https://www.palantir.com/docs/foundry/transforms-sql/spark-reference/).
To change the log level of the wrapper, use:
``` r
futile.logger::flog.threshold(
futile.logger::INFO,
name = fapi$get_log_name()
)
```
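The same call works with the other `futile.logger` thresholds, for example
`futile.logger::DEBUG` for more detailed output or `futile.logger::WARN` for
less.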
## Licence
Unless stated otherwise, the codebase is released under the MIT License. This
covers both the codebase and any sample code in the documentation. The
documentation is © Crown copyright and available under the terms of the
[Open Government Licence v3.0](https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/).