Issues with impute_confidence. #54

fsaad · 2015-09-20T15:20:28Z

Baxter's comment in the source about issues with impute_confidence for continuous values should be documented somewhere more stable and visible (i.e. here).

    # The confidence in continuous imputation is "the probability that
    # there exists a unimodal summary" which is defined as the proportion of
    # probability mass in the largest mode of a DPMM inferred from the simulate
    # samples. We use crosscat on the samples for a given number of iterations,
    # then calculate the proportion of mass in the largest mode.
    #
    # NOTE: The definition of confidence and its implementation do not agree.
    # The probability of a unimodal summary is P(k=1|X), where k is the number
    # of components in some infinite mixture model. I would describe the
    # current implementation as "Is there a mode with enough mass that we can
    # ignore the other modes?" If this second formulation is to be
    # used, it means that we need to not use the median of all the samples as
    # the imputed value, but the median of the samples of the summary mode,
    # because the summary (the imputed value) should come from the summary
    # mode.
    #
    # There are a lot of problems with this second formulation.
    #0. SLOW. Like, for real.
    #1. Non-deterministic. The answer will be different given the same
    #   samples.
    #2. Inaccurate. Approximate inference about approximate inferences.
    #   In practice confidences on the same samples could be significantly
    #   different because the Gibbs sampler that underlies crosscat is
    #   susceptible to getting stuck in local maxima. Of course, this could be
    #   mitigated to some extent by using more chains, but things are slow
    #   enough as it is.
    #3. Confidence (interval) has a distinct meaning to the people who will
    #   be using this software. A unimodal summary does not necessarily mean
    #   that inferences are within an acceptable range. We are going to need to
    #   be loud about this. Maybe there should be a notion of tolerance?
    #
    # An alternative: mutual predictive coverage
    # ------------------------------------------
    # Divide the number of samples in the intersection of the 90% CI's of each
    # component model by the number of samples in the union of the 90% CI's of
    # each component model.
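
To make the two quantities in the comment concrete, here is a rough sketch, not the bayeslite implementation. It assumes the samples arrive with a component label each (for the "largest mode" formulation) or already grouped by component model (for mutual predictive coverage), and that a "90% CI" means the empirical central 90% interval of a component's samples. The function names `largest_mode_summary`, `central_interval`, and `mutual_predictive_coverage` are made up for illustration, and the comment leaves room for other readings of the coverage ratio.

```python
import math
import statistics
from collections import Counter


def largest_mode_summary(samples, labels):
    """Return (mass of the largest mode, median of that mode's samples).

    labels[i] is the (hypothetical) mixture component assigned to samples[i].
    """
    if not samples or len(samples) != len(labels):
        raise ValueError("samples and labels must be non-empty and aligned")
    mode_label, mode_count = Counter(labels).most_common(1)[0]
    mode_samples = [x for x, z in zip(samples, labels) if z == mode_label]
    # The imputed value comes from the summary mode, not from all samples.
    return mode_count / len(samples), statistics.median(mode_samples)


def central_interval(samples, mass=0.90):
    """Empirical central interval; indices are rounded outward."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("each component needs at least one sample")
    tail = (1.0 - mass) / 2.0
    last = len(ordered) - 1
    lo = ordered[math.floor(tail * last)]
    hi = ordered[math.ceil((1.0 - tail) * last)]
    return lo, hi


def mutual_predictive_coverage(samples_by_component, mass=0.90):
    """Samples inside every component's interval, divided by samples
    inside at least one component's interval (assumed reading)."""
    intervals = [central_interval(s, mass) for s in samples_by_component]
    pooled = [x for s in samples_by_component for x in s]
    in_all = sum(
        1 for x in pooled if all(lo <= x <= hi for lo, hi in intervals)
    )
    in_any = sum(
        1 for x in pooled if any(lo <= x <= hi for lo, hi in intervals)
    )
    return in_all / in_any if in_any else 0.0
```

For example, `largest_mode_summary([1, 2, 3, 100, 101], [0, 0, 0, 1, 1])` returns `(0.6, 2)`, while the median of all five samples is 3, which is the mismatch the NOTE describes. Unlike the DPMM-based confidence, the coverage ratio is deterministic given the samples: it is 1.0 when the component models agree exactly and 0.0 when their intervals do not overlap.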


raxraxraxraxrax · 2015-12-22T20:08:59Z

This should be documented somewhere for 0.1.4, but I'm not sure where it should be documented.

fsaad mentioned this issue Oct 22, 2015

Define imputation (predict_confidence) probcomp/bayeslite#271

Open

raxraxraxraxrax added the 014 high label Dec 22, 2015

gregory-marton assigned raxraxraxraxrax Jan 3, 2016
