Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with impute_confidence. #54

Open
fsaad opened this issue Sep 20, 2015 · 1 comment
Open

Issues with impute_confidence. #54

fsaad opened this issue Sep 20, 2015 · 1 comment
Assignees
Labels

Comments

@fsaad
Copy link
Collaborator

fsaad commented Sep 20, 2015

Baxter's comment in the source about issues with impute_confidence for continuous values should be documented somewhere more stable + visible (ie here).

    # The confidence in continuous imputation is "the probability that
    # there exists a unimodal summary" which is defined as the proportion of
    # probability mass in the largest mode of a DPMM inferred from the simulate
    # samples. We use crosscat on the samples for a given number of iterations,
    # then calculate the proportion of mass in the largest mode.
    #
    # NOTE: The definition of confidence and its implementation do not agree.
    # The probability of a unimodal summary is P(k=1|X), where k is the number
    # of components in some infinite mixture model. I would describe the
    # current implementation as "Is there a mode with sufficient enough mass
    # that we can ignore the other modes". If this second formulation is to be
    # used, it means that we need to not use the median of all the samples as
    # the imputed value, but the median of the samples of the summary mode,
    # because the summary (the imputed value) should come from the summary
    # mode.
    #
    # There are a lot of problems with this second formulation.
    #0. SLOW. Like, for real.
    #1. Non-deterministic. The answer will be different given the same
    #   samples.
    #2. Inaccurate. Approximate inference about approximate inferences.
    #   In practice confidences on the sample samples could be significantly
    #   different because the Gibbs sampler that underlies crosscat is
    #   susceptible to getting stuck in local maximum. Of course, this could be
    #   mitigated to some extent by using more chains, but things are slow
    #   enough as it is.
    #3. Confidence (interval) has a distinct meaning to the people who will
    #   be using this software. A unimodal summary does not necessarily mean
    #   that inferences are within an acceptable range. We are going to need to
    #   be loud about this. Maybe there should be a notion of tolerance?
    #
    # An alternative: mutual predictive coverage
    # ------------------------------------------
    # Divide the number of samples in the intersection of the 90% CI's of each
    # component model by the number of samples in the union of the 90% CI's of
    # each component model.
@raxraxraxraxrax
Copy link

This should be documented somewhere for 0.1.4, but I'm not sure where it should be documented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants