Adopt AbstractMCMC.jl interface #259

Merged
merged 42 commits into from
Jul 15, 2021

Conversation

torfjelde
Member

@torfjelde torfjelde commented Apr 6, 2021

Things to discuss

  • AFAIK, the way to customize the logging in AbstractMCMC.jl is to pass progress=false to the underlying AbstractMCMC.mcmcsample and then use the callback keyword argument to log the progress. So the question is: should we do this so as to preserve the current logging functionality?
  • To replicate the current summarization functionality (e.g. inform the user of average acceptance rates and EBFMI) as a post-sample step, we can overload StatsBase.sample and then perform this step after the call to AbstractMCMC.mcmcsample. Should we do this?
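
The two points above can be sketched with a toy sampling loop. This is only an illustration under stated assumptions: `toy_mcmcsample`, `make_logging_callback` and `toy_sample` are hypothetical names, the callback signature is invented for the sketch, and none of this is AbstractMCMC.jl's or AdvancedHMC.jl's actual code. It only mirrors the pattern of disabling the default progress output, logging from a callback, and summarizing after sampling.

```julia
using Random, Statistics

# Toy stand-in for a sampling loop that accepts `progress` and `callback`
# keywords. Hypothetical; not AbstractMCMC's implementation.
function toy_mcmcsample(rng::AbstractRNG, n::Integer; progress::Bool=true, callback=nothing)
    samples = Vector{NamedTuple{(:θ, :accepted),Tuple{Float64,Bool}}}(undef, n)
    θ = 0.0
    for i in 1:n
        proposal = θ + randn(rng)
        # Metropolis accept/reject for a standard normal target.
        accepted = log(rand(rng)) < (θ^2 - proposal^2) / 2
        θ = accepted ? proposal : θ
        samples[i] = (θ=θ, accepted=accepted)
        progress && i % 100 == 0 && @info "default progress" iteration = i
        callback === nothing || callback(samples[i], i)
    end
    return samples
end

# Custom logging lives in a callback; the closure keeps a running acceptance count.
function make_logging_callback(messages::Vector{String})
    n_accepted = Ref(0)
    return function (sample, i)
        n_accepted[] += sample.accepted
        i % 250 == 0 && push!(messages, "iteration $i: acceptance rate $(round(n_accepted[] / i; digits=2))")
        return nothing
    end
end

# Wrapper: disable default progress, install the callback, then summarize.
function toy_sample(rng::AbstractRNG, n::Integer)
    messages = String[]
    samples = toy_mcmcsample(rng, n; progress=false, callback=make_logging_callback(messages))
    summary = (acceptance_rate=mean(s.accepted for s in samples),)
    return samples, messages, summary
end

samples, messages, summary = toy_sample(MersenneTwister(1), 1000)
```

With this layout the inner loop stays generic, while the package-specific logging and the post-sample summary both live in the outer wrapper.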

@torfjelde torfjelde requested a review from xukai92 April 6, 2021 18:42
@@ -211,7 +213,7 @@ nsteps(τ::Trajectory{TS, I, TC}) where {TS, I, TC<:FixedIntegrationTime} =
## Kernel interface
##

struct HMCKernel{R, T<:Trajectory} <: AbstractMCMCKernel
struct HMCKernel{R, T<:Trajectory} <: AbstractMCMCKernel
Member

Just to double-check: is this just the removal of a space?

Member Author

Yep. But also, now AbstractMCMCKernel is AbstractMCMC.AbstractSampler (I'll remove this alias, and make it explicit).

Member

Is there any side effect it might cause?

Member Author

Don't think so. Anything particular in mind or just general question?

The only thing is that if we remove AbstractMCMCKernel completely, not even aliasing, e.g. CoupledHMC.jl won't work.

Maybe best approach is to just make AbstractMCMCKernel <: AbstractMCMC.AbstractSampler?

Member

I think it's fine as long as CoupledHMC is fine (with stuff like mixture kernels still working).
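
The subtyping option raised in this thread can be sketched as follows. The types here are local stand-ins (the abstract type `AbstractSampler` is redefined so the snippet runs without any package, and `ToyHMCKernel`, `ToyMixtureKernel` and `describe_kernel` are hypothetical), so this is an assumption-laden illustration rather than the code in the PR.

```julia
# Stand-in for AbstractMCMC.AbstractSampler so the sketch runs without the package.
abstract type AbstractSampler end

# Keep the old abstract type, but make it a subtype of the sampler type.
abstract type AbstractMCMCKernel <: AbstractSampler end

# Placeholder kernel; the real HMCKernel carries type parameters and fields.
struct ToyHMCKernel <: AbstractMCMCKernel end

# A downstream-style kernel (e.g. a mixture) that subtypes the old name.
struct ToyMixtureKernel{K<:Tuple} <: AbstractMCMCKernel
    kernels::K
end

# Methods written against either abstract type keep dispatching.
describe_kernel(::AbstractSampler) = "sampler"
describe_kernel(::AbstractMCMCKernel) = "kernel"

mixture = ToyMixtureKernel((ToyHMCKernel(), ToyHMCKernel()))
```

Compared with a plain alias, this keeps `AbstractMCMCKernel` as a distinct name that downstream packages can continue to subtype and dispatch on.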

src/sampler.jl Outdated
return AbstractMCMC.step(rng, model, spl, state; kwargs...)
end

function AbstractMCMC.step(
Member

I think we should keep the old signature, which would construct HamiltonianModel and HMCState then call this function.

Member Author

Hmm, maybe. We have already introduced breaking changes on master with the introduction of HMCKernel, etc., so it seems like it's as good a time as any to also transition completely over to AbstractMCMC.jl interface?

Member Author

And it's not like there's anything in the current impl that isn't supported by the new implementation:)

Member

There weren't actually any breaking changes to the outermost interfaces (e.g. those that need to call sample).

Member Author

Hmm, what do you mean? E.g. the example in the README doesn't work now since proposal isn't constructed in the same way (might also be other issues, but that's the immediate one that came to mind).

Member Author

Sure, I'm cool with adding the old sample back and deprecating:) That seems like a clean approach 👍

Also it's a bit cumbersome to write sample(..., HamiltonianModel(hamiltonian), ..) instead of sample(..., hamiltonian, ..), no?

I'm also cool with either of the following:

  1. Making AbstractHamiltonian <: AbstractMCMC.AbstractModel, which IMO seems sensible.
  2. Additional overload that will automagically wrap hamiltonian in HamiltonianModel.
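
A minimal sketch of the two options, using local stand-in types so it runs on its own. `AbstractModel` is redefined here in place of `AbstractMCMC.AbstractModel`, and `SubtypedHamiltonian`, `PlainHamiltonian` and `toy_entry` are hypothetical names chosen for illustration only.

```julia
abstract type AbstractModel end  # stand-in for AbstractMCMC.AbstractModel

# Option 1: the Hamiltonian type is itself a model, so it can be passed directly.
abstract type AbstractHamiltonian <: AbstractModel end
struct SubtypedHamiltonian{F} <: AbstractHamiltonian
    logdensity::F
end

# Option 2: the Hamiltonian is not a model; an extra method wraps it on the way in.
struct PlainHamiltonian{F}
    logdensity::F
end
struct HamiltonianModel{H} <: AbstractModel
    hamiltonian::H
end

# Hypothetical entry point that only accepts models.
toy_entry(model::AbstractModel, n::Integer) = "sampling $n draws from $(nameof(typeof(model)))"
# Convenience overload for option 2: wrap, then forward.
toy_entry(h::PlainHamiltonian, n::Integer) = toy_entry(HamiltonianModel(h), n)

logp(x) = -x^2 / 2
msg1 = toy_entry(SubtypedHamiltonian(logp), 10)
msg2 = toy_entry(PlainHamiltonian(logp), 10)
```

Either way the caller writes the short form; the difference is whether the type hierarchy or an extra method carries the conversion.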

Member

It seems that option 1 is more sensible, as we are already making the kernel a subtype of the sampler type from AbstractMCMC.

Member Author

Yeaah I've realized I need to make quite a few changes. Because everything is mutated somehow, we essentially need to put everything in the sampler state.

Sooo the question is what we actually use for the model, etc. As of right now they would just be "fake" in the sense that they only contain the initial Hamiltonian and the initial AbstractHMCKernel 😕 We could mutate them, but that seems very much against the AbstractMCMC.jl interface...

Member

Hmm. Do you mean it actually only makes sense to treat the log density and gradient as the model, while treating the metric differently?

Member Author

@torfjelde torfjelde Apr 7, 2021

"Only" is maybe strong, but yeah, I think it makes more sense to keep the hamiltonian and the metric separate. Potentially do away completely with the hamiltonian and only keep metric or, alternatively, make hamiltonian nothing but a wrapper around metric (in case there are future cases we might want to cover) and just make a DifferentiableDensityModel <: AbstractModel (could even move this + DensityModel from AbstractMH.jl into AbstractMCMC.jl at some point, so these can be re-used e.g. for MALA which is present in AdvancedHMC.jl)

EDIT: I think it's best to keep hamiltonian, but just reconstruct it from a DifferentiableDensityModel at every call to step. And we just pass the "state" of hamiltonian around, e.g. metric.
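
The idea in the EDIT (an immutable model, a Hamiltonian rebuilt on every step, and the metric carried in the sampler state) might look roughly like the sketch below. All types and functions here (`ToyDensityModel`, `ToyHamiltonian`, `ToyState`, `toy_step`, `run_toy_chain`) are hypothetical and the step is a bare fixed-length leapfrog HMC update with no adaptation; it is not the implementation in this PR.

```julia
using Random

# Hypothetical immutable model: only the log density and its gradient.
struct ToyDensityModel{L,G}
    logdensity::L
    gradient::G
end

# Hypothetical Hamiltonian: the model's functions plus a diagonal inverse mass matrix.
struct ToyHamiltonian{L,G}
    logdensity::L
    gradient::G
    inv_mass::Vector{Float64}
end

# Everything that changes between iterations lives in the state.
struct ToyState
    θ::Vector{Float64}
    inv_mass::Vector{Float64}
end

energy(h::ToyHamiltonian, θ, r) = -h.logdensity(θ) + sum(abs2.(r) .* h.inv_mass) / 2

function toy_step(rng::AbstractRNG, model::ToyDensityModel, state::ToyState; ϵ=0.1, n_leapfrog=10)
    # Rebuild the Hamiltonian from the immutable model and the metric in the state.
    h = ToyHamiltonian(model.logdensity, model.gradient, state.inv_mass)
    θ = copy(state.θ)
    r = randn(rng, length(θ)) ./ sqrt.(h.inv_mass)  # momentum drawn from N(0, M)
    H0 = energy(h, θ, r)
    for _ in 1:n_leapfrog
        r .+= (ϵ / 2) .* h.gradient(θ)
        θ .+= ϵ .* h.inv_mass .* r
        r .+= (ϵ / 2) .* h.gradient(θ)
    end
    accepted = log(rand(rng)) < H0 - energy(h, θ, r)
    θ_new = accepted ? θ : state.θ
    # The model is never mutated; a new state is returned instead.
    return θ_new, ToyState(θ_new, state.inv_mass)
end

function run_toy_chain(rng, model, state, n)
    for _ in 1:n
        _, state = toy_step(rng, model, state)
    end
    return state
end

model = ToyDensityModel(θ -> -sum(abs2, θ) / 2, θ -> -θ)
final_state = run_toy_chain(MersenneTwister(42), model, ToyState(zeros(2), ones(2)), 100)
```

An adaptation scheme would slot in by returning a `ToyState` with an updated `inv_mass`, which is why the metric sits in the state rather than in the model.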

src/sampler.jl Outdated
rng::AbstractRNG,
model::HamiltonianModel,
spl::HMCKernel;
init_params, # TODO: implement this. Just do `rand`? Need dimensionality though.
Member

I can see multiple ways to do this but it's unclear what's the best.
Can you file a PR/discussion to chat on this?

Member Author

Will do! But for now I'll just extract the dimensionality from the metric and use rand. But yeah, I'll open an issue after this PR so we can figure out potential better defaults.
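
The interim default described here (take the dimensionality from the metric and draw with rand) could be sketched like this. `ToyDiagMetric` and `default_init_params` are hypothetical names, and the assumption that the metric exposes its shape through `size` is made only for the sketch.

```julia
using Random

# Hypothetical stand-in for a metric that knows its own dimensionality.
struct ToyDiagMetric
    inv_mass::Vector{Float64}
end
Base.size(m::ToyDiagMetric) = size(m.inv_mass)

# Use the user-supplied initial parameters if given; otherwise draw
# uniformly on [0, 1) with the dimensionality taken from the metric.
function default_init_params(rng::AbstractRNG, metric; init_params=nothing)
    init_params === nothing || return init_params
    return rand(rng, size(metric)...)
end

metric = ToyDiagMetric(ones(3))
θ_default = default_init_params(MersenneTwister(0), metric)
θ_given = default_init_params(MersenneTwister(0), metric; init_params=[1.0, 2.0, 3.0])
```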

@xukai92
Member

xukai92 commented Apr 6, 2021

So the question is: should we do this so as to preserve the current logging functionality?

I think we should do this so as to keep the functionality unchanged.
We can decide later whether it's hard to maintain or not that useful.

To replicate the current summarization functionality ...

Does it make sense to add this functionality in AbstractMCMC first? Seems to be a useful thing to have in general.

@cpfiffer
Member

cpfiffer commented Apr 6, 2021

To replicate the current summarization functionality (e.g. inform the user of average acceptance rates and EBFMI) as a post-sample step, we can overload StatsBase.sample and then perform this step after the call to AbstractMCMC.mcmcsample. Should we do this?

Yes, that's the intended use case. 👍

AFAIK, the way to customize the logging in AbstractMCMC.jl is to pass progress=false to the underlying AbstractMCMC.mcmcsample and then use the callback keyword argument to log the progress. So the question is: should we do this so as to preserve the current logging functionality?

Yeah, I think the AHMC logging stuff should be moved to a callback.

@torfjelde
Member Author

Does it make sense to add this functionality in AbstractMCMC first? Seems to be a useful thing to have in general.

Not needed; with TuringLang/AbstractMCMC.jl#56 merged we can implement this as a callback very easily, and just make this callback the default.

Project.toml Outdated
@@ -3,6 +3,7 @@ uuid = "0bf59076-c3b1-5ca4-86bd-e02cd72cde3d"
version = "0.2.28"
Member

Please also increment the version here.

Member Author

Should we make a minor version bump? Seems like quite a significant change given the introduction of HMCKernel too

Member

Yeah a minor version bump sounds suitable.

@torfjelde
Member Author

Btw, waiting for TuringLang/Turing.jl#1579 since tests depend on Turing.

@HarrisonWilde HarrisonWilde marked this pull request as ready for review April 28, 2021 14:46
@HarrisonWilde HarrisonWilde marked this pull request as draft April 28, 2021 14:47
@yebai
Member

yebai commented May 25, 2021

Is there anything missing from this PR? If there is nothing major, let's try to merge this PR since it will benefit several downstream tasks.

@HarrisonWilde
Member

I've been working off the branch for this PR to support tempering on AHMC via the new MCMCTempering package, and noticed that after sampling finishes, even with using MCMCChains in place, I still just get back a vector of transitions. I think it would be nice to add a function, similar to the one in Turing/Inference, that implements AbstractMCMC.bundle_samples to actually make use of Chains. What do others think, @torfjelde @yebai?

@torfjelde
Member Author

Is there anything missing from this PR? If there is nothing major, let's try to merge this PR since it will benefit several downstream tasks.

A couple of things:

  • Not happy with how we have a DifferentiableModel in this repo. This should preferably go somewhere else, e.g. AMCMC. Alternatively we can make use of LogDensityProblems.jl, which might be a good approach (me and @devmotion discussed this on slack a couple of weeks ago).
  • I've noticed something that might be a bug, so need to debug that first.

I've been working off the branch for this PR to support tempering on AHMC via the new MCMCTempering package, and noticed that after sampling finishes, even with using MCMCChains in place, I still just get back a vector of transitions. I think it would be nice to add a function, similar to the one in Turing/Inference, that implements AbstractMCMC.bundle_samples to actually make use of Chains. What do others think, @torfjelde @yebai?

I'm also using this branch heavily in some work, and have an example of such a method implemented locally. But my implementation uses information from VarInfo after a model-run in Turing, so it's not independent of Turing.jl. This is because the MCMCChains.jl framework is mainly nice if you have several different variables, while in AHMC there's no separation/labelling of the different parameters, and thus we don't get much from MCMCChains other than statistics (which are already implemented for arrays, i.e. no need to make a Chain first). The issue is that we can't really put this in AHMC, as we don't want to depend on MCMCChains here.

How are you using it? Are you using it with Turing or nah?

@HarrisonWilde
Member

HarrisonWilde commented May 25, 2021

I see, yes, that is a fair assessment. I am using just AHMC and haven't tried with Turing, but I agree it perhaps makes less sense to have something specific here for working with AHMC. On another note, I agree with Tor's comments on the current state of the model; it would perhaps be better to rely on some standard definition in AMCMC.

What is the minimal expectation for the return from sample though? As currently it appears to be this vector of transitions rather than the more standard vector of thetas you see on master?

@torfjelde
Member Author

What is the minimal expectation for the return from sample though? As currently it appears to be this vector of transitions rather than the more standard vector of thetas you see on master?

Glad you brought this up! I'm not certain tbh 😕 @xukai92 @yebai What do you think? Should it return PhasePoint or just thetas? I'm fine with either, but made it return PhasePoint initially.

@xukai92
Member

xukai92 commented May 25, 2021

What do you think? Should it return PhasePoint or just thetas?

I think we should not change the return at least for this PR.
Plus now stuff like ComponentArrays is supported, you can even have simple "variable bindings".

Not happy with how we have a DifferentiableModel in this repo. This should preferably go somewhere else, e.g. AMCMC. Alternatively we can make use of LogDensityProblems.jl, which might be a good approach (me and @devmotion discussed this on slack a couple of weeks ago).

Both are OK, but I don't think we have to make it perfect within this PR as long as it's used behind the scenes (i.e. users don't need to create it explicitly).
I just hope this PR doesn't keep growing beyond the interface binding.

  • I've noticed something that might be a bug, so need to debug that first.

What's the potential bug?

@xukai92
Member

xukai92 commented Jul 15, 2021

The CI workflow also fails here. Any idea?

@torfjelde
Member Author

The CI workflow also fails here. Any idea?

It might just be that the settings I have for the sampler are bad, because it passes most of the time.

@xukai92
Member

xukai92 commented Jul 15, 2021

It might just be that the settings I have for the sampler are bad, because it passes most of the time.

They were fine in the other PR so I suppose you mean it's due to randomness?

Seems like there are also conflicts.
Let's resolve them and see if the CI passes with a re-run.

@torfjelde torfjelde marked this pull request as ready for review July 15, 2021 00:27
@torfjelde
Member Author

Btw, one annoying thing is that find_good_stepsize isn't supported by DifferentiableDensityModel yet. There's a bit too much functionality that uses the hamiltonian in there atm, so I haven't bothered trying to refactor that part yet. But this is something we can always do later anyways.

@xukai92
Member

xukai92 commented Jul 15, 2021

Btw, one annoying thing is that find_good_stepsize isn't supported by DifferentiableDensityModel yet. There's a bit too much functionality that uses the hamiltonian in there atm, so I haven't bothered trying to refactor that part yet. But this is something we can always do later anyways.

Doing it in another PR sounds more reasonable. Can you create an issue for this, along with the other things missing from DifferentiableDensityModel or even other alternatives (I believe we discussed a few)?

@xukai92
Member

xukai92 commented Jul 15, 2021

How could IntegrationTests also pass even though Turing is not compatible?

@torfjelde
Member Author

Doing it in another PR sounds more reasonable. Can you create an issue for this, along with the other things missing from DifferentiableDensityModel or even other alternatives (I believe we discussed a few)?

Will do 👍

How could IntegrationTests also pass even though Turing is not compatible?

Because this is a new minor version for AHMC, which won't be compat with Turing.jl since it has compat-bounds on AHMC + it indicates it's a breaking release and so Turing.jl should be expected to have to make some changes to be compatible with the new version. If you look at the integration tests you see Info: Not compatible with this release. No problem..

So it's just a question of whether we make the tests pass or fail in this case; I went with letting it pass since this is also what we do in other Turing.jl-packages, e.g. DPPL.

@torfjelde
Member Author

Okay this is just strange. It fails on macOS but seems to do fine otherwise.

@xukai92
Member

xukai92 commented Jul 15, 2021

So it's just a question of whether we make the tests pass or fail in this case; I went with letting it pass since this is also what we do in other Turing.jl-packages, e.g. DPPL.

I see. Makes sense to me now.

Okay this is just strange. It fails on macOS but seems to do fine otherwise.

Could be randomness again but I feel using the same parameters as the non-AMCMC version is enough. Otherwise it might indicate some issues.

If I give the same initial position and the same random seed, should I expect to get exactly the same samples from the interface implementation? If so, maybe we can add a test to check that the samples are exactly the same.
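
A check of this kind could be sketched as below, assuming both code paths consume the random number generator in the same order. `toy_chain` is a hypothetical random-walk Metropolis sampler used only to show the shape of such a test; it is not the AHMC sampler.

```julia
using Random

# Toy random-walk Metropolis chain targeting a standard normal.
function toy_chain(rng::AbstractRNG, θ0::Float64, n::Integer)
    θ = θ0
    out = Vector{Float64}(undef, n)
    for i in 1:n
        proposal = θ + randn(rng)
        if log(rand(rng)) < (θ^2 - proposal^2) / 2
            θ = proposal
        end
        out[i] = θ
    end
    return out
end

# Same seed and same initial position: the draws should match exactly.
a = toy_chain(MersenneTwister(2021), 0.0, 500)
b = toy_chain(MersenneTwister(2021), 0.0, 500)
# A different seed should give a different chain.
c = toy_chain(MersenneTwister(2022), 0.0, 500)
```

In a real test the two calls would be the old entry point and the new interface implementation, compared with `==` rather than an approximate tolerance.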

@torfjelde
Member Author

I realized what was causing the issue: I was including the adapted samples in the mean-computation:)

Now that those are dropped, it works. But now there's a different issue? This one seems completely unrelated to the changes in this PR though. You got any clue @xukai92 ?
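
The effect of including adaptation draws in the mean can be illustrated with synthetic data. The chain below is fabricated purely for illustration (a drifting warm-up phase followed by draws from the target); `n_adapts` and `n_samples` are hypothetical settings, not the values used in the test suite.

```julia
using Random, Statistics

rng = MersenneTwister(0)
n_adapts, n_samples = 200, 1_000

# Synthetic chain: a biased warm-up phase drifting from 5.0 towards the
# target mean 0.0, followed by draws from the target itself.
warmup = [5.0 * (1 - i / n_adapts) + randn(rng) for i in 1:n_adapts]
post = randn(rng, n_samples)
chain = vcat(warmup, post)

mean_all = mean(chain)                       # contaminated by adaptation draws
mean_post = mean(chain[(n_adapts + 1):end])  # adaptation draws dropped
```

Only `mean_post` should be compared against the target mean in a test; `mean_all` is pulled towards the warm-up phase.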

@xukai92
Member

xukai92 commented Jul 15, 2021

Looks like the tests are fine now. Is turning off the progress (the last commit) the fix?

@torfjelde
Member Author

Looks like the tests are fine now. Is turning off the progress (the last commit) the fix?

No, that was just to make the logs nicer. It seems like one of the adaptation tests can fail if we're unlucky with the random seed, but as I said, this has nothing to do with this PR so IMO we merge and I'll make an issue reminding us that the adaptation test is fickle.

@torfjelde torfjelde merged commit 7cad9f0 into master Jul 15, 2021
@delete-merged-branch delete-merged-branch bot deleted the tor/abstractmcmc branch July 15, 2021 13:49
7 participants