📖 Paper update.
JonasMoss committed Dec 3, 2019
1 parent 68fbda9 commit 17f341f
Showing 3 changed files with 56 additions and 114 deletions.
93 changes: 31 additions & 62 deletions paper/paper.md
@@ -12,20 +12,24 @@ authors:
affiliations:
- name: University of Oslo
index: 1
date: 25 November 2019
output:
github_document:
html_preview: true
bibliography: paper.bib
date: 25 November 2019
---

# Summary

`univariateML` is an R [@r] package for univariate maximum likelihood estimation [@lecam1990ml].
It supports more than 20 densities, the most popular generic functions such as `plot`, `AIC`, and `confint`, and a simple parametric bootstrap [@efron1994introduction] interface.
`univariateML` is an R [@r] package for user-friendly univariate maximum
likelihood estimation [@lecam1990ml]. It supports more than 20 densities,
the most popular generic functions such as `plot`, `AIC`, and `confint`,
and a simple parametric bootstrap [@efron1994introduction] interface.

When looking at univariate data it is natural to ask if there is a known
parametric density that fits the data well. The following example uses the
`egypt` [@pearson1902egypt] data set included in the package and a plot of the Weibull and Gamma
densities [@johnson1970continuous, Chapter 17 & 21].

`egypt` [@pearson1902egypt] data set included in the package and a plot of
the Weibull and Gamma densities [@johnson1970continuous, Chapter 17 & 21].

``` r
# install.packages("univariateML")
@@ -38,7 +42,8 @@ lines(mlgamma(egypt$age), col = "red") # Plots a Gamma fit.
![](paper_files/figure-gfm/figure-1.png)<!-- -->

A natural question to ask is which among several models fits the data best.
This can be done using tools of model selection such as the `AIC` [@akaike1998information].
This can be done using tools of model selection such as the `AIC`
[@akaike1998information].

``` r
AIC(mlweibull(egypt$age),
@@ -49,60 +54,24 @@ AIC(mlweibull(egypt$age),
## mlweibull(egypt$age) 2 1230.229
## mlgamma(egypt$age) 2 1234.772
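
To make the comparison above concrete, here is a minimal base-R sketch of what an AIC comparison computes: fit each model by maximum likelihood, then evaluate $2k - 2\log L$ with $k$ the number of parameters. It does not use `univariateML` and is not the package's implementation; the simulated data stand in for `egypt$age`, and the helper names `negll_weibull`, `negll_gamma`, and `aic_by_hand` are hypothetical.

```r
set.seed(1)
# Simulated stand-in data (an assumption; not the egypt data set).
x <- rweibull(200, shape = 2, scale = 30)

# Negative log-likelihoods, parametrized on the log scale so the optimizer
# can never propose a non-positive parameter.
negll_weibull <- function(lp) {
  -sum(dweibull(x, shape = exp(lp[1]), scale = exp(lp[2]), log = TRUE))
}
negll_gamma <- function(lp) {
  -sum(dgamma(x, shape = exp(lp[1]), rate = exp(lp[2]), log = TRUE))
}

# Hypothetical helper: minimize the negative log-likelihood numerically
# (Nelder-Mead is the optim default) and return AIC = 2k - 2 * logLik.
aic_by_hand <- function(negll, start) {
  fit <- optim(start, negll)
  2 * length(start) + 2 * fit$value
}

aic <- c(
  weibull = aic_by_hand(negll_weibull, log(c(1, mean(x)))),
  gamma   = aic_by_hand(negll_gamma, log(c(1, 1 / mean(x))))
)
aic  # the model with the smaller value is preferred
```

Both models have two parameters here, so the comparison reduces to comparing maximized log-likelihoods.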


Problems involving estimation of univariate densities are common in statistics.
Estimation of univariate densities is used in, for instance, exploratory data analysis,
in the estimation of copulas [@ko2019focused],
as parametric starts in density estimation [@hjort_glad_1995; @moss2019kdensity],
and is of interest in and of itself.

`univariateML` exists to simplify the whole process of doing inference with univariate densities.
For most densities implemented in `R` the maximum likelihood estimates can easily
be computed using numerical optimization functions such as `stats::nlm` and
`stats::optim` on the negative log-likelihood, but there are three problems
with this solution strategy:

1. It takes much time to program, especially if we want to try out many densities, to
make density plots, and calculate the AIC for all of them.
2. It is bug prone.
3. The estimation itself can be slow when the sample size is large. The time lost quickly adds up
when doing the parametric bootstrap or another procedure requiring repeated calls to
the estimating function.

In short, it is inconvenient to program these solutions by hand.

`univariateML` has custom made optimizers for almost every supported density.
This is in contrast to the `mle` function in the built-in `R` package `stats4`,
which supports far more general maximum likelihood estimation through numerical
optimization on a supplied negative log-likelihood function.

Analytic formulas for the maximum likelihood estimates are used whenever
they exist. Most estimators without analytic solutions have a custom made
Newton-Raphson solver.

`Rfast` [@Rfast] is an `R` package with many univariate density estimators
implemented with custom Newton-Raphson. `univariateML` and `Rfast` differ
mainly in focus: While `univariateML` aims to be well-tested and safe, and is
focused on univariate density estimation only, `Rfast` aims to have the fastest
possible implementations of many kinds of functions.

The speedup involved in using either can be substantial, as seen in the following
Gamma distribution example:

``` r
set.seed(313)
x <- rgamma(500, 2, 7)

microbenchmark::microbenchmark(
  univariateML = univariateML::mlgamma(x),
  naive = nlm(function(p) -sum(dgamma(x, p[1], p[2], log = TRUE)),
              p = c(1, 1)))
```

## Unit: microseconds
## expr min lq mean median uq max neval
## univariateML 606.2 792.75 1093.802 991.60 1073.65 7346.8 100
## naive 26276.8 28647.45 30616.039 29209.85 30378.25 69477.2 100

Estimation of univariate densities is used in, for instance, exploratory data
analysis, in the estimation of copulas [@ko2019focused], as parametric starts
in density estimation [@hjort_glad_1995; @moss2019kdensity], and is of interest
in and of itself.

Analytic formulas for the maximum likelihood estimates are used whenever they
exist. Most estimators without analytic solutions have a custom made
Newton-Raphson solver. This is in contrast to the `mle` function
in the built-in `R` package `stats4`, which supports more general maximum
likelihood estimation through numerical optimization on a supplied negative
log-likelihood function.
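
As an illustration of the contrast drawn above, the following is a minimal sketch of a Newton-Raphson solver for the Gamma distribution, compared with generic numerical optimization of the negative log-likelihood. It is not the code used by `univariateML` or `Rfast`; the function name `gamma_mle_newton`, the closed-form starting value, and the stopping rule are assumptions made for the example. The rate is profiled out as shape / mean(x), which leaves the one-dimensional score equation log(shape) - digamma(shape) = log(mean(x)) - mean(log(x)).

```r
# Hypothetical helper, not the univariateML implementation.
gamma_mle_newton <- function(x, tol = 1e-10, max_iter = 100) {
  stopifnot(is.numeric(x), all(x > 0))
  # s > 0 by Jensen's inequality unless all observations are equal.
  s <- log(mean(x)) - mean(log(x))
  # Closed-form approximation to the root, used as the starting value.
  shape <- (3 - s + sqrt((s - 3)^2 + 24 * s)) / (12 * s)
  for (i in seq_len(max_iter)) {
    score <- log(shape) - digamma(shape) - s   # profile score (per observation)
    hess  <- 1 / shape - trigamma(shape)       # its derivative, always negative
    step  <- score / hess
    shape <- shape - step
    if (abs(step) < tol * shape) break
  }
  c(shape = shape, rate = shape / mean(x))
}

set.seed(313)
x <- rgamma(500, 2, 7)
fit_newton <- gamma_mle_newton(x)

# Generic alternative: box-constrained numerical minimization.
negloglik <- function(p) -sum(dgamma(x, shape = p[1], rate = p[2], log = TRUE))
fit_optim <- optim(c(1, 1), negloglik, method = "L-BFGS-B",
                   lower = c(1e-8, 1e-8))$par

rbind(newton = fit_newton, optim = fit_optim)
```

The two approaches should agree to within the optimizer's tolerance. The Newton-Raphson version needs only a handful of `digamma` and `trigamma` evaluations on two summary statistics, whereas the generic optimizer re-evaluates the full likelihood at every step.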

`Rfast` [@Rfast] is an `R` package with fast Newton-Raphson implementations of
many univariate density estimators. `univariateML` differs from `Rfast`
mainly in focus: While `univariateML` is focused on user-friendly univariate
density estimation, `Rfast` aims to have the fastest possible implementations
of many kinds of functions.

# References

# References
77 changes: 25 additions & 52 deletions paper/paper.rmd
@@ -21,13 +21,15 @@ date: 25 November 2019

# Summary

`univariateML` is an R [@r] package for univariate maximum likelihood estimation [@lecam1990ml].
It supports more than 20 densities, the most popular generic functions such as `plot`, `AIC`, and `confint`, and a simple parametric bootstrap [@efron1994introduction] interface.
`univariateML` is an R [@r] package for user-friendly univariate maximum
likelihood estimation [@lecam1990ml]. It supports more than 20 densities,
the most popular generic functions such as `plot`, `AIC`, and `confint`,
and a simple parametric bootstrap [@efron1994introduction] interface.

When looking at univariate data it is natural to ask if there is a known
parametric density that fits the data well. The following example uses the
`egypt` [@pearson1902egypt] data set included in the package and a plot of the Weibull and Gamma
densities [@johnson1970continuous, Chapter 17 & 21].
`egypt` [@pearson1902egypt] data set included in the package and a plot of
the Weibull and Gamma densities [@johnson1970continuous, Chapter 17 & 21].

```{r figure, height = 9, width = 9}
# install.packages("univariateML")
@@ -38,60 +40,31 @@ lines(mlgamma(egypt$age), col = "red") # Plots a Gamma fit.
```

A natural question to ask is which among several models fits the data best.
This can be done using tools of model selection such as the `AIC` [@akaike1998information].
This can be done using tools of model selection such as the `AIC`
[@akaike1998information].

```{r, AIC}
AIC(mlweibull(egypt$age),
mlgamma(egypt$age))
```

Problems involving estimation of univariate densities are common in statistics.
Estimation of univariate densities is used in for instance exploratory data analysis,
in the estimation of copulas [@ko2019focused],
as parametric starts in density estimation [@hjort_glad_1995; @moss2019kdensity],
and is of interest in and of itself.

`univariateML` exists to simplify the whole process of doing inference with univariate densities.
For most densities implemented in `R` the maximum likelihood estimates can easily
be computed using numerical optimization functions such as `stats::nlm` and
`stats::optim` on the negative log-likelihood, but there are three problems
with this solution strategy:

1. It takes much time to program, especially if we want to try out many densities, to
make density plots, and calculate the AIC for all of them.
2. It is bug prone.
3. The estimation itself can be slow when the sample size is large. The time lost quickly adds up
when doing the parametric bootstrap or another procedure requiring repeated calls to
the estimating function.

In short, it is inconvenient to program these solutions by hand.

`univariateML` has custom made optimizers for each supported density.
This is in contrast to the `mle` function in the built-in `R` package `stats4`,
which supports far more general maximum likelihood estimation through numerical
optimization on a supplied negative log-likelihood function.

Analytic formulas for the maximum likelihood estimates are used whenever
they exist. Most estimators without analytic solutions have a custom made
Newton-Raphson solver.

`Rfast` [@Rfast] is an `R` package with many univariate density estimators
implemented with custom Newton-Raphson and has faster implementations than
`univariateML`. The two packages differ mainly in focus: While `univariateML`
is focused on user friendly univariate density estimation, `Rfast` aims to have the
fastest possible implementations of many kinds of functions.

The speedup can be substantial, as seen in the following Gamma distribution example.

```{R, Gamma, warning = FALSE, cache = TRUE}
set.seed(313)
x <- rgamma(500, 2, 7)
microbenchmark::microbenchmark(
univariateML = univariateML::mlgamma(x),
Rfast = Rfast::gammamle(x),
naive = nlm(function(p) -sum(dgamma(x, p[1], p[2], log = TRUE)), p = c(1, 1)))
```
Estimation of univariate densities is used in, for instance, exploratory data
analysis, in the estimation of copulas [@ko2019focused], as parametric starts
in density estimation [@hjort_glad_1995; @moss2019kdensity], and is of interest
in and of itself.

Analytic formulas for the maximum likelihood estimates are used whenever they
exist. Most estimators without analytic solutions have a custom made
Newton-Raphson solver. This is in contrast to the `mle` function
in the built-in `R` package `stats4`, which supports more general maximum
likelihood estimation through numerical optimization on a supplied negative
log-likelihood function.

`Rfast` [@Rfast] is an `R` package with fast Newton-Raphson implementations of
many univariate density estimators. `univariateML` differs from `Rfast`
mainly in focus: While `univariateML` is focused on user-friendly univariate
density estimation, `Rfast` aims to have the fastest possible implementations
of many kinds of functions.

# References
Binary file modified paper/paper_files/figure-gfm/figure-1.png