Skip to content

Commit

Permalink
Release 0.2.0 (#25)
Browse files Browse the repository at this point in the history
* consistent behavior for ordered variables (#19)

* consistent behavior for ordered variables

* add unit tests

* Fixes (#20)

fix minimum bw and bug in kde fit

* improve stability of density estimates near boundary (#21)

* check whether data lies within boundaries

* stabilize trafo by shifting boundaries in dkde1d

* add unit tests

* make probit trafo adjustemnt less aggressive

* remove left overs

* allow x outside of bounds in d/pkde1d

* fix density computation for discrete variables (#23)

* Prepare 0.2.0 (#24)

* bump version

* update NEWS

* update documentation

* improve docs

* improve docs ++
  • Loading branch information
tnagler authored May 7, 2018
1 parent 0e3d439 commit f900e6d
Show file tree
Hide file tree
Showing 27 changed files with 210 additions and 148 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -2,3 +2,4 @@
.Rhistory
.RData
.Ruserdata
revdep
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Package: kde1d
Type: Package
Title: Univariate Kernel Density Estimation
Version: 0.1.2
Version: 0.2.0
Authors@R: c(
person("Thomas", "Nagler",, "mail@tnagler.com", role = c("aut", "cre")),
person("Thibault", "Vatter",, "thibault.vatter@gmail.com", role = c("aut"))
Expand Down
17 changes: 17 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,20 @@
# kde1d 0.2.0

NEW FEATURES

* improved stability of density estimates near a boundary (#21).

BUG FIXES

* consistent behavior when `dkde1d()` and `pkde1d()` are called with
non-`ordered` input although data are discrete (#19).

* fixed bug in computation of kernel density estimates (#20).

* adapt minimum `bw` allowed for discrete data to truncated Gaussian kernel
(#20).


# kde1d 0.1.2

NEW FEATURES
Expand Down
60 changes: 6 additions & 54 deletions R/kde1d.R
Original file line number Diff line number Diff line change
Expand Up @@ -50,33 +50,31 @@
#' Statistical Society, Series B, 53, 683–690.
#'
#' @examples
#' ## For reproducibility
#' set.seed(0)
#'
#' ## unbounded data
#' x <- rnorm(100) # simulate data
#' x <- rnorm(500) # simulate data
#' fit <- kde1d(x) # estimate density
#' dkde1d(1000, fit) # evaluate density estimate
#' dkde1d(0, fit) # evaluate density estimate
#' summary(fit) # information about the estimate
#' plot(fit) # plot the density estimate
#' curve(dnorm(x), add = TRUE, # add true density
#' col = "red")
#'
#' ## bounded data, log-linear
#' x <- rgamma(100, shape = 1) # simulate data
#' x <- rgamma(500, shape = 1) # simulate data
#' fit <- kde1d(x, xmin = 0, deg = 1) # estimate density
#' dkde1d(1000, fit) # evaluate density estimate
#' dkde1d(seq(0, 5, by = 1), fit) # evaluate density estimate
#' summary(fit) # information about the estimate
#' plot(fit) # plot the density estimate
#' curve(dgamma(x, shape = 1), # add true density
#' add = TRUE, col = "red",
#' from = 1e-3)
#'
#' ## discrete data
#' x <- rbinom(100, size = 5, prob = 0.5) # simulate data
#' x <- rbinom(500, size = 5, prob = 0.5) # simulate data
#' x <- ordered(x, levels = 0:5) # declare as ordered
#' fit <- kde1d(x) # estimate density
#' dkde1d(2, fit) # evaluate density estimate
#' dkde1d(sort(unique(x)), fit) # evaluate density estimate
#' summary(fit) # information about the estimate
#' plot(fit) # plot the density estimate
#' points(ordered(0:5, 0:5), # add true density
Expand Down Expand Up @@ -109,49 +107,3 @@ kde1d <- function(x, xmin = NaN, xmax = NaN, mult = 1, bw = NA, deg = 2) {
class(fit) <- "kde1d"
fit
}

#' check and pre-process arguments passed to kde1d()
#' @noRd
check_arguments <- function(x, mult, xmin, xmax, bw, deg) {
stopifnot(NCOL(x) == 1)

if (!is.ordered(x) & is.factor(x))
stop("Factors not allowed; use kdevine::kdevine() or cctools::cckde().")

stopifnot(mult > 0)

if (is.ordered(x) & (!is.nan(xmin) | !is.nan(xmax)))
stop("xmin and xmax are not meaningful for x of type ordered.")

if (!is.nan(xmax) & !is.nan(xmin)) {
if (xmin > xmax)
stop("xmin is larger than xmax.")
if (any(x < xmin) || any(x > xmax))
stop("Not all data are contained in the interval [xmin, xmax].")
} else if (!is.nan(xmin)) {
if (any(x < xmin))
stop("Not all data are larger than xmin.")
} else if (!is.nan(xmax)) {
if (any(x > xmax))
stop("Not all data are samller than xmax.")
}

if (!(deg %in% 0:2))
stop("deg must be either 0, 1, or 2.")
}

#' adjusts observations and evaluation points for boundary effects
#' @importFrom stats qnorm
#' @noRd
boundary_transform <- function(x, xmin, xmax) {
if (!is.nan(xmin) & !is.nan(xmax)) { # two boundaries
x <- qnorm((x - xmin) / (xmax - xmin + 1e-1))
} else if (!is.nan(xmin)) { # left boundary
x <- log(x - xmin + 1e-3)
} else if (!is.nan(xmax)) { # right boundary
x <- log(xmax - x + 1e-3)
}

x
}

51 changes: 31 additions & 20 deletions R/kde1d_methods.R
Original file line number Diff line number Diff line change
Expand Up @@ -38,13 +38,21 @@ dkde1d <- function(x, obj) {
if (!is.ordered(x))
stopifnot(!is.factor(x))

x <- expand_as_numeric(x)
fhat <- dkde1d_cpp(x, obj)
# adjust grid to stabilize estimates
rng <- diff(range(obj$grid_points))
if (!is.nan(obj$xmin))
obj$grid_points[1] <- obj$xmin - 0.1 * rng
if (!is.nan(obj$xmax))
obj$grid_points[length(obj$grid_points)] <- obj$xmax + 0.1 * rng

if (length(obj$jitter_info$i_disc) == 1 & !is.ordered(x))
x <- ordered(x, obj$jitter_info$levels$x)

fhat <- dkde1d_cpp(expand_as_numeric(x), obj)

if (length(obj$jitter_info$i_disc) == 1) {
# for discrete variables we can normalize
x_all_num <- expand_as_numeric(as.ordered(obj$jitter_info$levels$x))
f_all <- dkde1d_cpp(x_all_num, obj)
f_all <- dkde1d_cpp(seq_along(obj$jitter_info$levels$x) - 1, obj)
fhat <- fhat / sum(f_all)
}

Expand All @@ -60,15 +68,14 @@ pkde1d <- function(q, obj) {
if (!is.ordered(q))
stopifnot(!is.factor(q))

q <- expand_as_numeric(q)

if (length(obj$jitter_info$i_disc) != 1) {
p <- pkde1d_cpp(q, obj)
} else {
# for discrete variables we have to add the missing probability mass
x_all_num <- expand_as_numeric(as.ordered(obj$jitter_info$levels$x))
f_all <- dkde1d(x_all_num, obj)
p <- sapply(q, function(y) sum(f_all[x_all_num <= y]))
if (!is.ordered(q))
q <- ordered(q, obj$jitter_info$levels$x)
x_all <- as.ordered(obj$jitter_info$levels$x)
p_all <- dkde1d(x_all, obj)
p <- sapply(q, function(y) sum(p_all[x_all <= y]))
p <- pmin(pmax(p, 0), 1)
}

Expand All @@ -81,19 +88,17 @@ pkde1d <- function(q, obj) {
qkde1d <- function(p, obj) {
stopifnot(all(na.omit(p) > 0.0) & all(na.omit(p) < 1.0))
if (length(obj$jitter_info$i_disc) != 1) {
## qkde1d_cpp for continuous variables
q <- qkde1d_cpp(p, obj)
} else {
## for discrete variables compute quantile from the density
x_all_num <- expand_as_numeric(as.ordered(obj$jitter_info$levels$x))
x_all <- as.ordered(obj$jitter_info$levels$x)

# pdf at all possible values of x
dd <- dkde1d(x_all_num, obj)
pp <- c(cumsum(dd)) / sum(dd)
pp <- pkde1d(x_all, obj)

# generalized inverse
q <- x_all_num[vapply(p, function(y) which(y <= pp)[1], integer(1))]
q <- ordered(obj$jitter_info$levels$x[q + 1],
levels = obj$jitter_info$levels$x)
q <- x_all[vapply(p, function(y) which(y <= pp)[1], integer(1))]
}

q
Expand Down Expand Up @@ -156,13 +161,19 @@ plot.kde1d <- function(x, ...) {
ev <- ordered(x$jitter_info$levels$x,
levels = x$jitter_info$levels$x)
plot_type <- "h" # for discrete variables, use a histrogram
x$values <- dkde1d(ev, x)
x$grid_points <- ev
} else {
# adjust grid if necessary
ev <- x$grid_points
if (!is.nan(x$xmin))
ev[1] <- x$xmin
if (!is.nan(x$xmax))
ev[length(ev)] <- x$xmax
}
vals <- dkde1d(ev, x)

pars <- list(
x = x$grid_points,
y = x$values,
x = ev,
y = vals,
type = plot_type,
xlab = "x",
ylab = "density",
Expand Down
53 changes: 53 additions & 0 deletions R/tools.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
#' check if data violate boundaries
#' @noRd
check_boundary_violations <- function(x, xmin, xmax) {
if (!is.nan(xmax) & !is.nan(xmin)) {
if (any(x < xmin) || any(x > xmax))
stop("Not all data are contained in the interval [xmin, xmax].")
} else if (!is.nan(xmin)) {
if (any(x < xmin))
stop("Not all data are larger than xmin.")
} else if (!is.nan(xmax)) {
if (any(x > xmax))
stop("Not all data are samller than xmax.")
}
}

#' check and pre-process arguments passed to kde1d()
#' @noRd
check_arguments <- function(x, mult, xmin, xmax, bw, deg) {
stopifnot(NCOL(x) == 1)

if (!is.ordered(x) & is.factor(x))
stop("Factors not allowed; use kdevine::kdevine() or cctools::cckde().")

stopifnot(mult > 0)

if (is.ordered(x) & (!is.nan(xmin) | !is.nan(xmax)))
stop("xmin and xmax are not meaningful for x of type ordered.")

if (!is.nan(xmax) & !is.nan(xmin)) {
if (xmin > xmax)
stop("xmin is larger than xmax.")
}
check_boundary_violations(x, xmin, xmax)

if (!(deg %in% 0:2))
stop("deg must be either 0, 1, or 2.")
}

#' adjusts observations and evaluation points for boundary effects
#' @importFrom stats qnorm
#' @noRd
boundary_transform <- function(x, xmin, xmax) {
if (!is.nan(xmin) & !is.nan(xmax)) { # two boundaries
x <- qnorm((x - xmin) / (xmax - xmin + 1e-1))
} else if (!is.nan(xmin)) { # left boundary
x <- log(x - xmin + 1e-3)
} else if (!is.nan(xmax)) { # right boundary
x <- log(xmax - x + 1e-3)
}

x
}

6 changes: 3 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ devtools::install_github("tnagler/kde1d")
``` r
x <- rnorm(100) # simulate data
fit <- kde1d(x) # estimate density
dkde1d(1000, fit) # evaluate density estimate
dkde1d(0, fit) # evaluate density estimate
summary(fit) # information about the estimate
plot(fit) # plot the density estimate
curve(dnorm(x), add = TRUE, # add true density
Expand All @@ -48,7 +48,7 @@ curve(dnorm(x), add = TRUE, # add true density
``` r
x <- rgamma(100, shape = 1) # simulate data
fit <- kde1d(x, xmin = 0, deg = 1) # estimate density
dkde1d(1000, fit) # evaluate density estimate
dkde1d(seq(0, 5, by = 1), fit) # evaluate density estimate
summary(fit) # information about the estimate
plot(fit) # plot the density estimate
curve(dgamma(x, shape = 1), # add true density
Expand All @@ -61,7 +61,7 @@ curve(dgamma(x, shape = 1), # add true density
x <- rbinom(100, size = 5, prob = 0.5) # simulate data
x <- ordered(x, levels = 0:5) # declare as ordered
fit <- kde1d(x) # estimate density
dkde1d(2, fit) # evaluate density estimate
dkde1d(sort(unique(x)), fit) # evaluate density estimate
summary(fit) # information about the estimate
plot(fit) # plot the density estimate
points(ordered(0:5, 0:5), # add true density
Expand Down
25 changes: 2 additions & 23 deletions cran-comments.md
Original file line number Diff line number Diff line change
@@ -1,34 +1,13 @@
## New submission after package has been archived

Problem 1: ASAN failures
Solution: Fixed heap buffer overflows on clang-ASAN related to max-comparison
of a size_t and a negative number.

Problem 2: Failed without long doubles
Solution: Fixed cdf calculations for discrete data which made /tests fail.

Problem 3: Threw a C++ exception under valgrind (fatal to the R process)
Solution: This is a known issue of valgrind + boost; fixed by adding
'#define BOOST_MATH_PROMOTE_DOUBLE_POLICY false' in /inst/include/stats.hpp
(as suggested in http://svn.boost.org/trac10/ticket/10005).

We are sorry for the delay in fixing these issues, reproducing the bugs reported
by clang-ASAN was way harder than expected.

## Test environments
* Debian Linux with clang-4.0-ASAN enabled on rocker/r-devel-ubsan-clang (devel)
* Ubuntu 16.04 with clang-7.0.0-ASAN enabled (devel)
* Ubuntu 16.04 without long double support (devel)
* Ubuntu 16.04 with valgrind (devel)
* local OS X install (release)
* ubuntu 12.04 on travis-ci (oldrel, release, devel)
* ubuntu 12.04 on travis-ci (release, devel)
* win-builder (devel)
* Windows Server 2012 R2 x64 + x86 (release)

## R CMD check results

0 errors | 0 warnings | 0 notes

## Reverse dependencies

This is a new release, so there are no reverse dependencies.
There are no reverse dependencies.
2 changes: 1 addition & 1 deletion docs/LICENSE-text.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion docs/authors.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 4 additions & 4 deletions docs/index.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading

0 comments on commit f900e6d

Please sign in to comment.