Baxter's comment in the source about issues with `impute_confidence` for continuous values should be documented somewhere more stable and visible (i.e., here).
# The confidence in continuous imputation is "the probability that
# there exists a unimodal summary" which is defined as the proportion of
# probability mass in the largest mode of a DPMM inferred from the simulated
# samples. We use crosscat on the samples for a given number of iterations,
# then calculate the proportion of mass in the largest mode.
#
# NOTE: The definition of confidence and its implementation do not agree.
# The probability of a unimodal summary is P(k=1|X), where k is the number
# of components in some infinite mixture model. I would describe the
# current implementation as "Is there a mode with enough mass
# that we can ignore the other modes". If this second formulation is to be
# used, it means that we need to not use the median of all the samples as
# the imputed value, but the median of the samples of the summary mode,
# because the summary (the imputed value) should come from the summary
# mode.
#
# There are a lot of problems with this second formulation.
#0. SLOW. Like, for real.
#1. Non-deterministic. The answer will be different given the same
# samples.
#2. Inaccurate. Approximate inference about approximate inferences.
# In practice, confidences on the same samples could be significantly
# different because the Gibbs sampler that underlies crosscat is
# susceptible to getting stuck in local maxima. Of course, this could be
# mitigated to some extent by using more chains, but things are slow
# enough as it is.
#3. Confidence (interval) has a distinct meaning to the people who will
# be using this software. A unimodal summary does not necessarily mean
# that inferences are within an acceptable range. We are going to need to
# be loud about this. Maybe there should be a notion of tolerance?
#
# An alternative: mutual predictive coverage
# ------------------------------------------
# Divide the number of samples in the intersection of the 90% CI's of each
# component model by the number of samples in the union of the 90% CI's of
# each component model.
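
The mutual predictive coverage idea in the comment could be sketched roughly as follows. This is not code from the repository; it is a minimal illustration under several assumptions: each "component model" is represented by its own list of simulated samples, the "90% CI" is taken as the central empirical interval between the 5th and 95th percentiles, and the samples being counted are the pooled samples from all models. The names `central_interval` and `mutual_predictive_coverage` are hypothetical.

```python
def central_interval(samples, mass=0.90):
    """Return (lo, hi) for the central `mass` interval of `samples`.

    Assumption: nearest-rank empirical quantiles are good enough here.
    """
    xs = sorted(samples)
    n = len(xs)
    tail = (1.0 - mass) / 2.0
    lo = xs[int(round(tail * (n - 1)))]
    hi = xs[int(round((1.0 - tail) * (n - 1)))]
    return lo, hi


def mutual_predictive_coverage(model_samples, mass=0.90):
    """Ratio of pooled samples inside every model's interval to pooled
    samples inside at least one model's interval.

    `model_samples` is a list with one list of simulated values per model.
    Pooling the samples across models is an assumption of this sketch.
    """
    intervals = [central_interval(s, mass) for s in model_samples]
    pooled = [x for s in model_samples for x in s]

    # Samples that fall in the intersection of all intervals.
    in_all = sum(
        all(lo <= x <= hi for lo, hi in intervals) for x in pooled
    )
    # Samples that fall in the union of the intervals.
    in_any = sum(
        any(lo <= x <= hi for lo, hi in intervals) for x in pooled
    )
    return in_all / in_any if in_any else 0.0


# Two models whose predictive samples partially overlap.
model_a = list(range(0, 101))
model_b = list(range(50, 151))
coverage = mutual_predictive_coverage([model_a, model_b])
```

Under these assumptions the result is deterministic for a fixed set of samples and needs no additional inference pass, which addresses points 0 and 1 above. A value near 1 means the models agree on where the value lies, and a value near 0 means their intervals barely overlap.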