Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update based on some recent developments #379

Merged
merged 46 commits into from
Feb 16, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
46 commits
Select commit Hold shift + click to select a range
f93b77b
bookkeeping
osorensen Jan 19, 2024
f9f55db
removed unnecessary call
osorensen Jan 19, 2024
84a2230
marking as const
osorensen Jan 19, 2024
7959b6c
adding more consts
osorensen Jan 19, 2024
2a639d6
additional safety measures
osorensen Jan 19, 2024
7391954
parallelization working (#355)
osorensen Jan 19, 2024
a8bdc0d
correcting an error in docs
osorensen Jan 19, 2024
2d69ada
Merge branch 'develop' of https://github.com/ocbe-uio/BayesMallows in…
osorensen Jan 19, 2024
83248f6
update cran-comments
osorensen Jan 24, 2024
506d5c4
updating comments
osorensen Jan 24, 2024
a5d84e4
Merge branch 'master' into develop
osorensen Jan 26, 2024
d25a0f2
updating news
osorensen Jan 26, 2024
243b8a3
deleting submission file
osorensen Jan 26, 2024
16a93ba
Adding support for sampling from prior (#360)
osorensen Jan 26, 2024
eddbd22
Merge branch 'develop' of https://github.com/ocbe-uio/BayesMallows in…
osorensen Jan 26, 2024
9b7d4dc
fixed error in update_mallows.SMCMallows
osorensen Jan 26, 2024
236a21d
updated failing test and added namespace qualifier
osorensen Jan 26, 2024
144a451
added long-running SMC test from prior
osorensen Jan 26, 2024
a62545e
Take care of item names properly (#363)
osorensen Jan 26, 2024
ec9500c
Can now deal with a single vector of input data (#364)
osorensen Jan 26, 2024
db8a31b
updating news
osorensen Jan 26, 2024
78e4f29
updated set_priors function
osorensen Feb 7, 2024
86723fc
closes #370
osorensen Feb 7, 2024
2a61f42
Added a gamma prior (#371)
osorensen Feb 7, 2024
45e1eeb
Merge branch 'priors-issue-370' of https://github.com/ocbe-uio/BayesM…
osorensen Feb 7, 2024
9f4cc7f
Had forgot to implement the change... (#372)
osorensen Feb 7, 2024
defa9db
Merge branch 'develop' of https://github.com/ocbe-uio/BayesMallows in…
osorensen Feb 8, 2024
41cb4ab
Added lag option (#373)
osorensen Feb 8, 2024
9fddacd
ready for the change
osorensen Feb 8, 2024
ee08f70
Merge branch 'develop' of https://github.com/ocbe-uio/BayesMallows in…
osorensen Feb 8, 2024
701bf6b
Resampling issue 365 (#376)
osorensen Feb 9, 2024
bd74eb5
resolving conflict and bumping dev version
osorensen Feb 9, 2024
0747676
removed git conflict marker
osorensen Feb 9, 2024
ce0128a
added a line shift
osorensen Feb 15, 2024
d3bbc55
removing const-ref from built-in types
osorensen Feb 15, 2024
0af8731
refactoring limits functions for pairwise augmentation
osorensen Feb 15, 2024
7c55d63
increasing test strictness
osorensen Feb 15, 2024
ea86e61
added some more work
osorensen Feb 16, 2024
30d7774
moving distance code into implementation file
osorensen Feb 16, 2024
2d71e7c
moved partition function code into cpp files
osorensen Feb 16, 2024
4395859
added code for reproducing Liu et al 2019 review
osorensen Feb 16, 2024
4f2608f
changed updating frequency for pkgdown. closes #380
osorensen Feb 16, 2024
a80b0d6
adding ignore to codecov
osorensen Feb 16, 2024
2fe90c2
fixing #381 (#382)
osorensen Feb 16, 2024
9924131
Heatplot issue 381 (#383)
osorensen Feb 16, 2024
84e5671
updated news
osorensen Feb 16, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 0 additions & 4 deletions .github/workflows/pkgdown.yaml
Original file line number Diff line number Diff line change
@@ -1,10 +1,6 @@
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
push:
branches: [master]
pull_request:
branches: [master]
release:
types: [published]
workflow_dispatch:
Expand Down
4 changes: 2 additions & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Package: BayesMallows
Type: Package
Title: Bayesian Preference Learning with the Mallows Rank Model
Version: 2.0.1
Version: 2.0.1.9003
Authors@R: c(person("Oystein", "Sorensen",
email = "oystein.sorensen.1985@gmail.com",
role = c("aut", "cre"),
Expand Down Expand Up @@ -46,7 +46,7 @@ BugReports: https://github.com/ocbe-uio/BayesMallows/issues
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.2.3
RoxygenNote: 7.3.1
Depends: R (>= 3.5.0)
Imports: Rcpp (>= 1.0.0),
ggplot2 (>= 3.1.0),
Expand Down
4 changes: 4 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -6,12 +6,15 @@ S3method(compute_consensus,BayesMallows)
S3method(compute_consensus,SMCMallows)
S3method(compute_posterior_intervals,BayesMallows)
S3method(compute_posterior_intervals,SMCMallows)
S3method(generate_initial_ranking,BayesMallowsIntransitive)
S3method(generate_initial_ranking,BayesMallowsTransitiveClosure)
S3method(plot,BayesMallows)
S3method(plot,SMCMallows)
S3method(print,BayesMallows)
S3method(print,BayesMallowsMixtures)
S3method(print,SMCMallows)
S3method(update_mallows,BayesMallows)
S3method(update_mallows,BayesMallowsPriorSamples)
S3method(update_mallows,SMCMallows)
export(assess_convergence)
export(assign_cluster)
Expand All @@ -33,6 +36,7 @@ export(plot_elbow)
export(plot_top_k)
export(predict_top_k)
export(sample_mallows)
export(sample_prior)
export(set_compute_options)
export(set_initial_values)
export(set_model_options)
Expand Down
21 changes: 21 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,26 @@
# BayesMallows (development versions)

* Fixed a bug in heat_plot() when the model has been estimated with
rho_thinning > 1, causing the probabilities to be unnormalized. Issue #381.
Thanks to Marta Crispino for discovering the bug.
* Added stratified, systematic, and residual resampling to the sequential
Monte Carlo algorithm. These distributions should in general be preferred to
multinomial resampling, which was the only available option until now.
* The move step of the SMC algorithm now allows a user-defined lag for the
sampling of latent ranks, specified in the "latent_sampling_lag" argument
to set_smc_options().
* Prior for precision parameter alpha is now a gamma distribution. Until now
an exponential distribution has been assumed. Since the exponential is a special
case of the gamma with shape parameter equal to 1 (the default), this is not
a breaking change. However, it adds flexibility when it comes to specifying the prior.
* setup_rank_data() now accepts a single vector of rankings, silently converting a to matrix with a single row.
* Sequential Monte Carlo algorithm can now start from a sample from the prior
distribution, see the sample_prior() function for an example.
* Added support for parallelism under-the-hood with oneTBB.
wleoncio marked this conversation as resolved.
Show resolved Hide resolved

# BayesMallows 2.0.1

* Edits to C++ code fixing memory leaks.
* Edits to unit tests which caused issues on CRAN.

# BayesMallows 2.0.0
Expand Down
15 changes: 2 additions & 13 deletions R/compute_mallows.R
Original file line number Diff line number Diff line change
Expand Up @@ -79,21 +79,10 @@ compute_mallows <- function(
validate_class(initial_values, "BayesMallowsInitialValues")
validate_rankings(data)
validate_preferences(data, model_options)
validate_rankings(data)
validate_initial_values(initial_values, data)

pfun_values <- tryCatch(
prepare_partition_function(model_options$metric, data$n_items),
error = function(e) {
if (is.null(pfun_estimate)) {
stop(
"Exact partition function not known. Please provide an ",
"estimate in argument pfun_estimate."
)
} else {
return(NULL)
}
}
)
pfun_values <- extract_pfun_values(model_options, data, pfun_estimate)

if (is.null(cl)) {
lapplyfun <- lapply
Expand Down
16 changes: 16 additions & 0 deletions R/estimate_partition_function.R
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,22 @@ estimate_partition_function <- function(
matrix(c(power, stats::lm(form, data = estimate)$coefficients), ncol = 2)
}

extract_pfun_values <- function(model_options, data, pfun_estimate) {
tryCatch(
prepare_partition_function(model_options$metric, data$n_items),
error = function(e) {
if (is.null(pfun_estimate)) {
stop(
"Exact partition function not known. Please provide an ",
"estimate in argument pfun_estimate."
)
} else {
return(NULL)
}
}
)
}

prepare_partition_function <- function(metric, n_items) {
if (metric %in% c("cayley", "hamming", "kendall")) {
return(NULL)
Expand Down
8 changes: 2 additions & 6 deletions R/generate_initial_ranking.R
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ generate_initial_ranking <- function(
UseMethod("generate_initial_ranking")
}

#' @export
generate_initial_ranking.BayesMallowsTransitiveClosure <- function(
preferences, cl = NULL, shuffle_unranked = FALSE, random = FALSE,
random_limit = 8L) {
Expand Down Expand Up @@ -34,12 +35,7 @@ generate_initial_ranking.BayesMallowsTransitiveClosure <- function(
}
}


.S3method(
"generate_initial_ranking", "BayesMallowsTransitiveClosure",
generate_initial_ranking.BayesMallowsTransitiveClosure
)

#' @export
generate_initial_ranking.BayesMallowsIntransitive <- function(
preferences, cl = NULL, shuffle_unranked = FALSE,
random = FALSE, random_limit = 8L) {
Expand Down
4 changes: 2 additions & 2 deletions R/heat_plot.R
Original file line number Diff line number Diff line change
Expand Up @@ -36,14 +36,14 @@ heat_plot <- function(model_fit, burnin = model_fit$burnin, ...) {
drop = FALSE
]
posterior_ranks$probability <- 1
posterior_ranks$iteration <- NULL

heatplot_data <- aggregate(posterior_ranks[, "probability", drop = FALSE],
by = list(
cluster = posterior_ranks$cluster,
item = posterior_ranks$item,
value = posterior_ranks$value
),
FUN = function(x) sum(x) / (model_fit$nmc - burnin)
FUN = function(x) sum(x) / length(unique(posterior_ranks$iteration))
)

heatplot_data$item <- factor(heatplot_data$item, levels = item_order)
Expand Down
30 changes: 30 additions & 0 deletions R/sample_prior.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
#' Sample from prior distribution
#'
#' Function to obtain samples from the prior distributions of the Bayesian
#' Mallows model. Intended to be given to [update_mallows()].
#'
#' @param n An integer specifying the number of samples to take.
#' @param n_items An integer specifying the number of items to be ranked.
#' @param priors An object of class "BayesMallowsPriors" returned from
#' [set_priors()].
#'
#' @return An object of class "BayesMallowsPriorSample", containing `n`
#' independent samples of \eqn{\alpha} and \eqn{\rho}.
#'
#' @export
#'
#' @family modeling
#' @example /inst/examples/sample_prior_example.R
sample_prior <- function(n, n_items, priors = set_priors()) {
validate_positive(n)
validate_positive(n_items)
ret <- list(
alpha = stats::rgamma(n, shape = priors$gamma, rate = priors$lambda),
rho = replicate(n, sample(n_items, n_items)),
priors = priors,
n_items = n_items,
items = seq_len(n_items)
)
class(ret) <- "BayesMallowsPriorSamples"
ret
}
12 changes: 10 additions & 2 deletions R/set_priors.R
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,13 @@
#' @description Set values related to the prior distributions for the Bayesian
#' Mallows model.
#'
#' @param gamma Strictly positive numeric value specifying the shape parameter
#' of the gamma prior distribution of \eqn{\alpha}. Defaults to `1`, thus
#' recovering the exponential prior distribution used by
#' \insertCite{vitelli2018}{BayesMallows}.
#'
#' @param lambda Strictly positive numeric value specifying the rate parameter
#' of the truncated exponential prior distribution of \eqn{\alpha}. Defaults
#' of the gamma prior distribution of \eqn{\alpha}. Defaults
#' to `0.001`. When `n_cluster > 1`, each mixture component \eqn{\alpha_{c}}
#' has the same prior distribution.
#'
Expand All @@ -26,10 +31,13 @@
#' [update_mallows()].
#' @export
#'
#' @references \insertAllCited{}
#'
#' @family preprocessing
#'
set_priors <- function(lambda = 0.001, psi = 10, kappa = c(1, 3)) {
set_priors <- function(gamma = 1, lambda = 0.001, psi = 10, kappa = c(1, 3)) {
stopifnot(length(kappa) == 2)
validate_positive(gamma)
validate_positive(lambda)
validate_integer(psi)
validate_positive(psi)
Expand Down
57 changes: 55 additions & 2 deletions R/set_smc_options.R
Original file line number Diff line number Diff line change
@@ -1,20 +1,73 @@
#' @title Set SMC compute options
#'
#' @description Sets the SMC compute options to be used in [update_mallows.BayesMallows()].
#' @description Sets the SMC compute options to be used in
#' [update_mallows.BayesMallows()].
#'
#' @param n_particles Integer specifying the number of particles.
#' @param mcmc_steps Number of MCMC steps to be applied in the resample-move
#' step.
#' @param resampler Character string defining the resampling method to use. One
#' of "stratified", "systematic", "residual", and "multinomial". Defaults to
#' "stratified". While multinomial resampling was used in
#' \insertCite{steinSequentialInferenceMallows2023;textual}{BayesMallows},
#' stratified, systematic, or residual resampling typically give lower Monte
#' Carlo error \insertCite{Douc2005,Hol2006,Naesseth2019}{BayesMallows}.
#' @param latent_sampling_lag Parameter specifying the number of timesteps to go
#' back when resampling the latent ranks in the move step. See Section 6.2.3
#' of \insertCite{Kantas2015}{BayesMallows} for details. The \eqn{L} in their
#' notation corresponds to `latent_sampling_lag`. See more under Details.
#' Defaults to `NA`, which means that all latent ranks from previous timesteps
#' are resampled. If set to `0`, no move step is applied to the latent ranks.
#'
#' @return An object of class "SMCOptions".
#'
#'
#' @section Lag parameter in move step:
#'
#' The parameter `latent_sampling_lag` corresponds to \eqn{L} in
#' \insertCite{Kantas2015}{BayesMallows}. Its use in this package is can be
#' explained in terms of Algorithm 12 in
#' \insertCite{steinSequentialInferenceMallows2023}{BayesMallows}. The
#' relevant line of the algorithm is:
#'
#' **for** \eqn{j = 1 : M_{t}} **do** \cr
#' **M-H step:** update \eqn{\tilde{\mathbf{R}}_{j}^{(i)}} with proposal
#' \eqn{\tilde{\mathbf{R}}_{j}' \sim q(\tilde{\mathbf{R}}_{j}^{(i)} |
#' \mathbf{R}_{j}, \boldsymbol{\rho}_{t}^{(i)}, \alpha_{t}^{(i)})}.\cr
#' **end**
#'
#' Let \eqn{L} denote the value of `latent_sampling_lag`. With this parameter,
#' we modify for algorithm so it becomes
#'
#' **for** \eqn{j = M_{t-L+1} : M_{t}} **do** \cr
#' **M-H step:** update \eqn{\tilde{\mathbf{R}}_{j}^{(i)}} with proposal
#' \eqn{\tilde{\mathbf{R}}_{j}' \sim q(\tilde{\mathbf{R}}_{j}^{(i)} |
#' \mathbf{R}_{j}, \boldsymbol{\rho}_{t}^{(i)}, \alpha_{t}^{(i)})}.\cr
#' **end**
#'
#' This means that with \eqn{L=0} no move step is performed on any latent
#' ranks, whereas \eqn{L=1} means that the move step is only applied to the
#' parameters entering at the given timestep. The default,
#' `latent_sampling_lag = NA` means that \eqn{L=t} at each timestep, and hence
#' all latent ranks are part of the move step at each timestep.
#'
#'
#' @export
#' @references \insertAllCited{}
#'
#' @family preprocessing
set_smc_options <- function(
n_particles = 1000,
mcmc_steps = 5) {
mcmc_steps = 5,
resampler = c("stratified", "systematic", "residual", "multinomial"),
latent_sampling_lag = NA_integer_) {
validate_integer(n_particles)
validate_integer(mcmc_steps)
if (!is.na(latent_sampling_lag)) validate_integer(latent_sampling_lag)
resampler <- match.arg(
resampler,
c("stratified", "systematic", "residual", "multinomial")
)

ret <- as.list(environment())
class(ret) <- "SMCOptions"
Expand Down
42 changes: 38 additions & 4 deletions R/setup_rank_data.R
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,9 @@
#' optional initial value of the rankings. If `rankings` has column names,
#' these are assumed to be the names of the items. `NA` values in rankings are
#' treated as missing data and automatically augmented; to change this
#' behavior, see the `na_action` argument to [set_model_options()].
#' behavior, see the `na_action` argument to [set_model_options()]. A vector
#' length `n_items` is silently converted to a matrix of length `1 x n_items`,
#' and names (if any), are used as column names.
#'
#' @param preferences A data frame with one row per pairwise comparison, and
#' columns `assessor`, `top_item`, and `bottom_item`. Each column contains the
Expand Down Expand Up @@ -80,6 +82,12 @@
#' when all possible orderings are computed, i.e., when `random=TRUE`.
#' Defaults to `8L`.
#'
#' @param timepoint Integer vector specifying the timepoint. Defaults to `NULL`,
#' which means that a vector of ones, one for each observation, is generated.
#' Used by [update_mallows()] to identify data with a given iteration of the
#' sequential Monte Carlo algorithm. If not `NULL`, must contain one integer
#' for each row in `rankings`.
#'
#' @note Setting `random=TRUE` means that all possible orderings of each
#' assessor's preferences are generated, and one of them is picked at random.
#' This can be useful when experiencing convergence issues, e.g., if the MCMC
Expand Down Expand Up @@ -126,7 +134,8 @@ setup_rank_data <- function(
cl = NULL,
shuffle_unranked = FALSE,
random = FALSE,
random_limit = 8L) {
random_limit = 8L,
timepoint = NULL) {
na_action <- match.arg(na_action, c("augment", "fail", "omit"))

if (is.null(rankings) && is.null(preferences)) {
Expand All @@ -138,10 +147,19 @@ setup_rank_data <- function(
if (na_action == "fail" && any(is.na(rankings))) {
stop("rankings matrix contains NA values")
}
if (!is.matrix(rankings)) {
rankings <- matrix(rankings,
nrow = 1,
dimnames = list(NULL, names(rankings))
)
}

if (na_action == "omit" && any(is.na(rankings))) {
keeps <- apply(rankings, 1, function(x) !any(is.na(x)))
print(paste("Omitting", sum(!keeps), "row(s) from rankings due to NA values"))
print(paste(
"Omitting", sum(!keeps),
"row(s) from rankings due to NA values"
))
rankings <- rankings[keeps, , drop = FALSE]
}
} else {
Expand All @@ -153,7 +171,10 @@ setup_rank_data <- function(
if (!is.null(observation_frequency)) {
validate_positive_vector(observation_frequency)
if (nrow(rankings) != length(observation_frequency)) {
stop("observation_frequency must be of same length as the number of rows in rankings")
stop(
"observation_frequency must be of same ",
"length as the number of rows in rankings"
)
}
} else {
observation_frequency <- rep(1, nrow(rankings))
Expand All @@ -164,6 +185,19 @@ setup_rank_data <- function(
stop("invalid permutations provided in rankings matrix")
}
n_items <- ncol(rankings)

if (!is.null(colnames(rankings))) {
items <- colnames(rankings)
} else {
items <- paste("Item", seq(from = 1, to = n_items, by = 1))
}

if (is.null(timepoint)) timepoint <- rep(1, nrow(rankings))
if (length(timepoint) != nrow(rankings)) {
stop("must have one timepoint per row")
}


constraints <- generate_constraints(preferences, n_items, cl)
consistent <- matrix(integer(0))

Expand Down
Loading
Loading