Probability is missing? #407

osorensen · 2024-03-22T08:56:47Z

Leap-and-shift for latent rankings based on pairwise preference data is symmetric, but here it is treated as uniform on the constraint set. Is that correct? This does not matter for MCMC, but it does matter for the particle weights in SMC.

BayesMallows/src/rank_proposal.cpp

Lines 142 to 159 in 6f6b85d

    
           RankProposal PairwiseLeapAndShift::propose( 
        
               const vec& current_rank, const doubly_nested& items_above, 
        
               const doubly_nested& items_below 
        
           ) { 
        
             int n_items = current_rank.n_elem; 
        
             ivec a = Rcpp::sample(n_items, 1) - 1; 
        
             int item = a(0); 
        
             int lower_limit = find_lower_limit(item, items_above[item], current_rank); 
        
             int upper_limit = find_upper_limit(item, items_below[item], current_rank); 
        
             Rcpp::IntegerVector b = Rcpp::seq(lower_limit, upper_limit); 
        
             ivec d = Rcpp::sample(b, 1); 
        
             int proposed_rank = d(0); 
        
             RankProposal rp{current_rank}; 
        
             rp.rankings(item) = proposed_rank; 
        
             rp = shift(rp, current_rank, item); 
        
             return rp; 
        
           }

osorensen · 2024-03-22T10:55:08Z

I ran the following code lines:

library(BayesMallows)
set.seed(3)
par(mfrow = c(3, 1))
for(k in 1:3) {
  dat <- subset(beach_preferences, assessor <= 2)
  mod_init <- compute_mallows(
    data = setup_rank_data(
      preferences = dat
    ),
    compute_options = set_compute_options(nmc = 3000, burnin = 1000)
  )
  
  # Next we provide assessors 21 to 60 one at a time.
  alpha <- numeric()
  mod <- mod_init
  for (i in 3:60) {
    print(i)
    mod <- update_mallows(
      model = mod,
      new_data = setup_rank_data(
        preferences = subset(beach_preferences, assessor == i),
        user_ids = i
      ),
      smc_options = set_smc_options(
        n_particles = 2000, latent_sampling_lag = 1)
    )
    alpha <- c(alpha, mean(mod$alpha_samples))
  }
  
  plot(alpha)
  
}

Results with version 2.1.1

Results with the version I'm currently developing

The alpha parameter certainly gets higher, and hence closer to the MCMC results.

osorensen · 2024-03-22T12:06:52Z

Some confirmation that my way of computing probabilities is correct:

# Validate pairwise leap-and-shift proposal
devtools::load_all()
library(tidyverse)
library(furrr)

dat <- subset(beach_preferences, assessor == 3 & bottom_item < 7 & top_item < 12)
n_items <- 15
dd <- setup_rank_data(preferences = dat, n_items = n_items)
constraints <- dd$constraints[[1]]

current_rank <- dd$rankings

plan("multisession")
rankings <- future_map_dfr(1:1000000, function(i) {
  u <- sample(n_items, 1)
  if(u %in% constraints$constrained_items) {
    ia <- constraints$items_above[[u]]
    ib <- constraints$items_below[[u]]
    if(length(ia) > 0) {
      l <- max(current_rank[ia]) + 1
    } else {
      l <- 1
    }
    if(length(ib) > 0) {
      r <- min(current_rank[ib]) - 1
    } else {
      r <- n_items
    }
  } else {
    l <- 1
    r <- n_items
  }
  support <- seq(from = l, to = r, by = 1)
  r_prime <- current_rank
  r_prime[[u]] <- sample(support, 1)

  for(j in seq_len(n_items)) {
    if(current_rank[[u]] < current_rank[[j]] && current_rank[[j]] <= r_prime[[u]]) {
      r_prime[[j]] <- current_rank[[j]] - 1
    } else if(current_rank[[u]] > current_rank[[j]] && current_rank[[j]] >= r_prime[[u]]) {
      r_prime[[j]] <- current_rank[[j]] + 1
    }
  }
  stopifnot(BayesMallows:::validate_permutation(r_prime))
  tibble(
    iteration = i,
    probability = 1 / length(support) / n_items,
    ranking = list(as.numeric(r_prime))
  )
}, .options = furrr_options(seed = 1L))

empirical_probs <- rankings %>%
  mutate(ranking = map_chr(ranking, paste, collapse = "")) %>%
  group_by(ranking) %>%
  summarise(n = n(), .groups = "drop") %>%
  mutate(proportion = n / sum(n))

calculated_probs <- rankings %>%
  mutate(ranking = map_chr(ranking, paste, collapse = "")) %>%
  distinct(ranking, probability) %>%
  group_by(ranking) %>%
  summarise(
    probability = sum(probability), .groups = "drop"
  )

empirical_probs %>%
  full_join(calculated_probs, by = "ranking") %>%
  ggplot(aes(x = proportion, y = probability)) +
  geom_point() +
  geom_abline()

First with this row:

dat <- subset(beach_preferences, assessor == 3 & bottom_item < 12 & top_item < 12)

Next with this:

dat <- subset(beach_preferences, assessor == 3 & bottom_item < 7 & top_item < 12)

* incrementing dev version * updating news * Allow different initial values across particles also when using pairwise preference data (#406) * simplifications * used Cpp for all_topological_sorts. Much much faster, since it is recursive code * converted preferences to matrix for SMC * starting to set up preferences * done * styling * removed shuffle_unranked argument * restructured arguments to setup_rank_data for pairwise preferences * fixing some errors * removing unnecessary statement * fixed bug in augmented rankings for existing users * made a better progress reporter * updated news and description * fixed #407 (#408) * Fixing bug in Ulam distance (#411) * fixed bug and added test * simplifying * styling * Exporting exact partition function (#412) * fixing documentation typo * fixing #409 * Consistency checks with pairwise preferences (#414) * fixed issue with updated users with pairwise preferences * fixed #404 * improved cpp code for topological sorts * incrementing dev version * generating all topological sorts in random order * adding save=TRUE argument where necessary * incrementing and updating * updating cran comments * fixing a long-running example * fixing CodeFactor issues

osorensen added the bug Something isn't working label Mar 22, 2024

osorensen self-assigned this Mar 22, 2024

osorensen added a commit that referenced this issue Mar 22, 2024

fixed #407

2caae5d

osorensen added a commit that referenced this issue Mar 22, 2024

fixed #407 (#408)

83c505a

osorensen closed this as completed in ad4b935 Mar 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Probability is missing? #407

Probability is missing? #407

osorensen commented Mar 22, 2024

osorensen commented Mar 22, 2024

osorensen commented Mar 22, 2024

Probability is missing? #407

Probability is missing? #407

Comments

osorensen commented Mar 22, 2024

osorensen commented Mar 22, 2024

Results with version 2.1.1

Results with the version I'm currently developing

osorensen commented Mar 22, 2024