Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Probability is missing? #407

Closed
osorensen opened this issue Mar 22, 2024 · 2 comments
Closed

Probability is missing? #407

osorensen opened this issue Mar 22, 2024 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@osorensen
Copy link
Collaborator

Leap-and-shift for latent rankings based on pairwise preference data is symmetric, but here it is treated as uniform on the constraint set. Is that correct? This does not matter for MCMC, but it does matter for the particle weights in SMC.

RankProposal PairwiseLeapAndShift::propose(
const vec& current_rank, const doubly_nested& items_above,
const doubly_nested& items_below
) {
int n_items = current_rank.n_elem;
ivec a = Rcpp::sample(n_items, 1) - 1;
int item = a(0);
int lower_limit = find_lower_limit(item, items_above[item], current_rank);
int upper_limit = find_upper_limit(item, items_below[item], current_rank);
Rcpp::IntegerVector b = Rcpp::seq(lower_limit, upper_limit);
ivec d = Rcpp::sample(b, 1);
int proposed_rank = d(0);
RankProposal rp{current_rank};
rp.rankings(item) = proposed_rank;
rp = shift(rp, current_rank, item);
return rp;
}

@osorensen osorensen added the bug Something isn't working label Mar 22, 2024
@osorensen osorensen self-assigned this Mar 22, 2024
@osorensen
Copy link
Collaborator Author

I ran the following code lines:

library(BayesMallows)
set.seed(3)
par(mfrow = c(3, 1))
for(k in 1:3) {
  dat <- subset(beach_preferences, assessor <= 2)
  mod_init <- compute_mallows(
    data = setup_rank_data(
      preferences = dat
    ),
    compute_options = set_compute_options(nmc = 3000, burnin = 1000)
  )
  
  # Next we provide assessors 21 to 60 one at a time.
  alpha <- numeric()
  mod <- mod_init
  for (i in 3:60) {
    print(i)
    mod <- update_mallows(
      model = mod,
      new_data = setup_rank_data(
        preferences = subset(beach_preferences, assessor == i),
        user_ids = i
      ),
      smc_options = set_smc_options(
        n_particles = 2000, latent_sampling_lag = 1)
    )
    alpha <- c(alpha, mean(mod$alpha_samples))
  }
  
  plot(alpha)
  
}

Results with version 2.1.1

image

Results with the version I'm currently developing

image

The alpha parameter certainly gets higher, and hence closer to the MCMC results.

@osorensen
Copy link
Collaborator Author

Some confirmation that my way of computing probabilities is correct:

# Validate pairwise leap-and-shift proposal
devtools::load_all()
library(tidyverse)
library(furrr)

dat <- subset(beach_preferences, assessor == 3 & bottom_item < 7 & top_item < 12)
n_items <- 15
dd <- setup_rank_data(preferences = dat, n_items = n_items)
constraints <- dd$constraints[[1]]

current_rank <- dd$rankings

plan("multisession")
rankings <- future_map_dfr(1:1000000, function(i) {
  u <- sample(n_items, 1)
  if(u %in% constraints$constrained_items) {
    ia <- constraints$items_above[[u]]
    ib <- constraints$items_below[[u]]
    if(length(ia) > 0) {
      l <- max(current_rank[ia]) + 1
    } else {
      l <- 1
    }
    if(length(ib) > 0) {
      r <- min(current_rank[ib]) - 1
    } else {
      r <- n_items
    }
  } else {
    l <- 1
    r <- n_items
  }
  support <- seq(from = l, to = r, by = 1)
  r_prime <- current_rank
  r_prime[[u]] <- sample(support, 1)

  for(j in seq_len(n_items)) {
    if(current_rank[[u]] < current_rank[[j]] && current_rank[[j]] <= r_prime[[u]]) {
      r_prime[[j]] <- current_rank[[j]] - 1
    } else if(current_rank[[u]] > current_rank[[j]] && current_rank[[j]] >= r_prime[[u]]) {
      r_prime[[j]] <- current_rank[[j]] + 1
    }
  }
  stopifnot(BayesMallows:::validate_permutation(r_prime))
  tibble(
    iteration = i,
    probability = 1 / length(support) / n_items,
    ranking = list(as.numeric(r_prime))
  )
}, .options = furrr_options(seed = 1L))

empirical_probs <- rankings %>%
  mutate(ranking = map_chr(ranking, paste, collapse = "")) %>%
  group_by(ranking) %>%
  summarise(n = n(), .groups = "drop") %>%
  mutate(proportion = n / sum(n))

calculated_probs <- rankings %>%
  mutate(ranking = map_chr(ranking, paste, collapse = "")) %>%
  distinct(ranking, probability) %>%
  group_by(ranking) %>%
  summarise(
    probability = sum(probability), .groups = "drop"
  )

empirical_probs %>%
  full_join(calculated_probs, by = "ranking") %>%
  ggplot(aes(x = proportion, y = probability)) +
  geom_point() +
  geom_abline()

First with this row:

dat <- subset(beach_preferences, assessor == 3 & bottom_item < 12 & top_item < 12)

image

Next with this:

dat <- subset(beach_preferences, assessor == 3 & bottom_item < 7 & top_item < 12)

image

osorensen added a commit that referenced this issue Mar 22, 2024
osorensen added a commit that referenced this issue Mar 22, 2024
osorensen added a commit that referenced this issue Apr 19, 2024
* incrementing dev version

* updating news

* Allow different initial values across particles also when using pairwise preference data (#406)

* simplifications

* used Cpp for all_topological_sorts. Much much faster, since it is recursive code

* converted preferences to matrix for SMC

* starting to set up preferences

* done

* styling

* removed shuffle_unranked argument

* restructured arguments to setup_rank_data for pairwise preferences

* fixing some errors

* removing unnecessary statement

* fixed bug in augmented rankings for existing users

* made a better progress reporter

* updated news and description

* fixed #407 (#408)

* Fixing bug in Ulam distance (#411)

* fixed bug and added test

* simplifying

* styling

* Exporting exact partition function (#412)

* fixing documentation typo

* fixing #409

* Consistency checks with pairwise preferences (#414)

* fixed issue with updated users with pairwise preferences

* fixed #404

* improved cpp code for topological sorts

* incrementing dev version

* generating all topological sorts in random order

* adding save=TRUE argument where necessary

* incrementing and updating

* updating cran comments

* fixing a long-running example

* fixing CodeFactor issues
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant