Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug when calculating landscape analysis agglomerative clustering mean volumes #423

Open
michal-g opened this issue Nov 18, 2024 · 0 comments
Assignees

Comments

@michal-g
Copy link
Collaborator

michal-g commented Nov 18, 2024

@alkinkaz found that a recent change to analyze_landscape introduced a bug in how mean volumes were being calculated for the classes recovered through agglomerative clustering of sketched volumes (saved under e.g. landscape.39/clustering_L2_average_10/state_<i>_mean.mrc and state_<i>_std.mrc):

for i in range(M):
vol_i = np.where(labels == i)[0]
logger.info(f"State {i}: {len(vol_i)} volumes")
if vol_ind is not None:
vol_i = np.arange(K)[vol_ind][vol_i]
vol_i_all = torch.stack(
[
ImageSource.from_file(
f"{outdir}/kmeans{K}/vol_{vol_start_index+i:03d}.mrc"
).images()
for i in vol_i
]
)
nparticles = np.array([kmeans_counts[i] for i in vol_i])
vol_i_mean = np.average(vol_i_all, axis=0, weights=nparticles)
vol_i_std = (
np.average((vol_i_all - vol_i_mean) ** 2, axis=0, weights=nparticles) ** 0.5
)
write_mrc(
f"{subdir}/state_{i}_mean.mrc", vol_i_mean.astype(np.float32), Apix=Apix
)
write_mrc(
f"{subdir}/state_{i}_std.mrc", vol_i_std.astype(np.float32), Apix=Apix
)

became

for i in range(M):
vol_i = np.where(labels == i)[0]
logger.info(f"State {i}: {len(vol_i)} volumes")
if vol_ind is not None:
vol_i = np.arange(K)[vol_ind][vol_i]
vol_fl = os.path.join(outdir, f"kmeans{K}", f"vol_{vol_start_index+i:03d}.mrc")
vol_i_all = torch.stack([ImageSource.from_file(vol_fl).images() for i in vol_i])
nparticles = np.array([kmeans_counts[i] for i in vol_i])
vol_i_mean = np.average(vol_i_all, axis=0, weights=nparticles)
vol_i_std = (
np.average((vol_i_all - vol_i_mean) ** 2, axis=0, weights=nparticles) ** 0.5
)
write_mrc(
f"{subdir}/state_{i}_mean.mrc", vol_i_mean.astype(np.float32), Apix=Apix
)
write_mrc(
f"{subdir}/state_{i}_std.mrc", vol_i_std.astype(np.float32), Apix=Apix
)

Thus vol_i_all became an iteration over the same volume vol_fl indexed by the current i, instead of iterating over all the volumes in the cluster as indexed by vol_i. This manifests itself most clearly in producing an empty volume under state_<i>_std.mrc.

This has been fixed at 6a1e93e, where we also improved the clarity of the code to make it more robust against future bugs:

for cluster_i in range(M):
vol_indices = np.where(labels == cluster_i)[0]
logger.info(f"State {cluster_i}: {len(vol_indices)} volumes")
if vol_ind is not None:
vol_indices = np.arange(K)[vol_ind][vol_indices]
vol_fls = [
os.path.join(kmean_dir, f"vol_{vol_start_index + vol_i:03d}.mrc")
for vol_i in vol_indices
]
vol_i_all = torch.stack(
[torch.Tensor(parse_mrc(vol_fl)[0]) for vol_fl in vol_fls]
)
nparticles = np.array([kmeans_counts[vol_i] for vol_i in vol_indices])
vol_i_mean = np.average(vol_i_all, axis=0, weights=nparticles)
vol_i_std = (
np.average((vol_i_all - vol_i_mean) ** 2, axis=0, weights=nparticles) ** 0.5
)

These changes will be part of the upcoming v3.4.3 release and are presently available through our beta release channel:

pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ 'cryodrgn<=3.4.3' --pre
@michal-g michal-g self-assigned this Nov 18, 2024
@michal-g michal-g changed the title Bug when calculating landscape analysis k-means mean volumes Bug when calculating landscape analysis agglomerative clustering mean volumes Nov 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant