Subset pixel channel averaging to remove bottleneck #823

Merged: 31 commits, merged on Dec 7, 2022. The diff below shows the changes from 12 of these commits.

Commits:
6660f25 Initial implementation of pixel subsetting for channel averaging (alex-l-kong, Nov 10, 2022)
9106b99 Polish up remaining tests that rely on count checks for channel avera… (alex-l-kong, Nov 10, 2022)
5067dde Add comment in compute_pixel_cluster_channel_avg documentation to cla… (alex-l-kong, Nov 10, 2022)
376874c Merge branch 'main' into subset_channel_avg (alex-l-kong, Nov 11, 2022)
c360fc9 Merge branch 'main' into subset_channel_avg (alex-l-kong, Nov 15, 2022)
c3bccd6 Update FOV subset logic (alex-l-kong, Nov 15, 2022)
2d90461 Merge remote-tracking branch 'origin/subset_channel_avg' into subset_… (alex-l-kong, Nov 15, 2022)
493b1e1 Include check for fov_subset_proportion too low (alex-l-kong, Nov 15, 2022)
f4fb94d Final checks (alex-l-kong, Nov 15, 2022)
426dbbb Fix test for remapping (alex-l-kong, Nov 15, 2022)
f01943f Increase FOV subset proportion (alex-l-kong, Nov 15, 2022)
dfb6a99 Add random seed to pytest (alex-l-kong, Nov 15, 2022)
86dad65 Merge branch 'main' into subset_channel_avg (alex-l-kong, Nov 17, 2022)
6f36cbd Update pixel channel average function to take max subset of FOVs and … (alex-l-kong, Nov 21, 2022)
9ffb268 Add tests for subsetting more FOVs than there exist (alex-l-kong, Nov 21, 2022)
0120c40 Add test to verify warning is thrown if final number of clusters is l… (alex-l-kong, Nov 21, 2022)
146dd11 Merge branch 'main' into subset_channel_avg (alex-l-kong, Nov 21, 2022)
4d89145 Fix fov_subset_proportion to num_fovs_subset (alex-l-kong, Nov 21, 2022)
ef60b82 Clarify the warning message (alex-l-kong, Nov 22, 2022)
507748d Merge branch 'main' into subset_channel_avg (alex-l-kong, Nov 28, 2022)
5262030 Initial commit of updated notebook process with summary file generati… (alex-l-kong, Nov 29, 2022)
9481753 Merge branch 'subset_channel_avg' of https://github.com/angelolab/ark… (alex-l-kong, Nov 29, 2022)
529ab6e Purge new pipeline of bugs (alex-l-kong, Nov 29, 2022)
c2f56ea Define stubs of new functions (alex-l-kong, Nov 30, 2022)
896b381 Get tests for the broken up pixel clustering functions in (alex-l-kong, Nov 30, 2022)
1555c89 Adjust existing meta cluster remapping tests so they don't include av… (alex-l-kong, Nov 30, 2022)
d9c2390 Documentation fix (alex-l-kong, Nov 30, 2022)
e30bf83 PYCODESTYLE (alex-l-kong, Nov 30, 2022)
4754b4a Adjust test in notebooks_test.py for broken up pixel remapping function (alex-l-kong, Nov 30, 2022)
40b697f Update error message for lost clusters in channel subsetting (alex-l-kong, Dec 6, 2022)
60be1a3 Merge branch 'main' into subset_channel_avg (alex-l-kong, Dec 6, 2022)
2 changes: 1 addition & 1 deletion .travis.yml
@@ -27,7 +27,7 @@ jobs:
     - stage: pytest_run
       script:
         - python -m pip install --editable .
-        - python -m pytest --cov=ark --pycodestyle ark
+        - python -m pytest --cov=ark --pycodestyle ark --randomly-seed=24 --randomly-dont-reorganize
     - stage: test_pypi_deploy
       if: tag IS present
       script:
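The CI change pins pytest-randomly's seed (`--randomly-seed=24`) and turns off its test reordering (`--randomly-dont-reorganize`). pytest-randomly reseeds Python's global `random` module at the start of each test, so the unseeded `random.sample` call added in this PR should pick the same FOVs on every CI run. The sketch below only imitates that effect by calling `random.seed` directly; the FOV names and the helper `sample_with_seed` are made up for illustration.

```python
import random

# Hypothetical FOV names, for illustration only
fovs = [f"fov{i}" for i in range(10)]


def sample_with_seed(seed, k):
    """Reseed the global RNG, then draw k FOVs (mimics a per-test reseed)."""
    random.seed(seed)
    return random.sample(fovs, k)


first = sample_with_seed(24, 3)
second = sample_with_seed(24, 3)

# Same seed, same subset: the draw is reproducible
assert first == second
print(first)
```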
43 changes: 36 additions & 7 deletions ark/phenotyping/pixel_cluster_utils.py
@@ -8,6 +8,7 @@
 import feather
 import numpy as np
 import pandas as pd
+import random
 import scipy.ndimage as ndimage
 from pyarrow.lib import ArrowInvalid
 from skimage.io import imread, imsave
@@ -288,8 +289,11 @@ def filter_with_nuclear_mask(fovs, tiff_dir, seg_dir, channel,


 def compute_pixel_cluster_channel_avg(fovs, channels, base_dir, pixel_cluster_col,
-                                      pixel_data_dir='pixel_mat_data', keep_count=False):
-    """Compute the average channel values across each pixel SOM cluster
+                                      pixel_data_dir='pixel_mat_data',
+                                      fov_subset_proportion=0.1, keep_count=False):
+    """Compute the average channel values across each pixel SOM cluster.
+
+    To improve performance, number of FOVs is downsampled by `fov_subset_proportion`

     Args:
         fovs (list):
@@ -302,6 +306,8 @@ def compute_pixel_cluster_channel_avg(fovs, channels, base_dir, pixel_cluster_co
             Name of the column to group by
         pixel_data_dir (str):
             Name of the directory containing the pixel data with cluster labels
+        fov_subset_proportion (float):
+            The proportion of FOVs to take, truncated to nearest int
         keep_count (bool):
             Whether to keep the count column when aggregating or not
             This should only be set to `True` for visualization purposes
@@ -317,10 +323,16 @@ def compute_pixel_cluster_channel_avg(fovs, channels, base_dir, pixel_cluster_co
         valid_cluster_cols=['pixel_som_cluster', 'pixel_meta_cluster']
     )

-    # define the cluster averages DataFrame
-    cluster_avgs = pd.DataFrame()
+    # define a list to hold the cluster averages for each FOV
+    fov_cluster_avgs = []

-    for fov in fovs:
+    # subset number of FOVs per fov_subset_proportion
+    fovs_sub = random.sample(fovs, int(len(fovs) * fov_subset_proportion))
+
+    if len(fovs_sub) == 0:
+        raise ValueError("fov_subset_proportion is too low, please increase")
+
+    for fov in fovs_sub:
         # read in the fovs data
         try:
             fov_pixel_data = feather.read_dataframe(
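The subsetting step samples `int(len(fovs) * fov_subset_proportion)` FOVs without replacement and raises if that number truncates to zero. Because `int()` truncates toward zero, "truncated to nearest int" in the docstrings means rounded down for these non-negative values. A standalone sketch of the same arithmetic, using a hypothetical helper `subset_fovs` and assuming three FOVs (consistent with the tests below, where 1/3 and 1/2 each keep one FOV and 1/10 keeps none):

```python
import random


def subset_fovs(fovs, fov_subset_proportion):
    """Hypothetical helper mirroring the subsetting step in this diff."""
    # int() truncates toward zero: 3 * 0.5 = 1.5 becomes 1, 3 * 0.1 ≈ 0.3 becomes 0
    num_fovs = int(len(fovs) * fov_subset_proportion)

    # Sample without replacement; random.sample raises ValueError if num_fovs > len(fovs)
    fovs_sub = random.sample(fovs, num_fovs)

    if len(fovs_sub) == 0:
        raise ValueError("fov_subset_proportion is too low, please increase")

    return fovs_sub


# Assumed three-FOV setup
fovs = ["fov0", "fov1", "fov2"]

print(subset_fovs(fovs, 1 / 3))  # one FOV, chosen at random
print(subset_fovs(fovs, 1 / 2))  # still one FOV after truncation

try:
    subset_fovs(fovs, 1 / 10)
except ValueError as err:
    print(f"ValueError: {err}")
```

A proportion above 1 would make `random.sample` itself raise `ValueError`; the commit list above includes later work on subsetting more FOVs than exist.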
@@ -343,8 +355,10 @@ def compute_pixel_cluster_channel_avg(fovs, channels, base_dir, pixel_cluster_co
             sum_by_cluster, count_by_cluster, left_index=True, right_index=True
         ).reset_index()

-        # concat the results together
-        cluster_avgs = pd.concat([cluster_avgs, agg_results])
+        # append the result to cluster_avgs
+        fov_cluster_avgs.append(agg_results)
+
+    cluster_avgs = pd.concat(fov_cluster_avgs)

     # reset the index of cluster_avgs for consistency
     cluster_avgs = cluster_avgs.reset_index(drop=True)
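This hunk replaces growing `cluster_avgs` with `pd.concat` inside the loop by appending each FOV's result to a list and concatenating once after the loop. Concatenating inside the loop copies every row accumulated so far on each iteration, so total copying grows quadratically with the number of FOVs; a single `pd.concat` copies each row once. The sketch below uses small made-up per-FOV frames (the column values are illustrative) and checks that both patterns give the same result.

```python
import pandas as pd

# Hypothetical per-FOV aggregation results (illustrative values only)
per_fov_results = [
    pd.DataFrame({
        "pixel_som_cluster": [1, 2],
        "chan0": [0.1 * i, 0.2 * i],
        "count": [10 * i, 20 * i],
    })
    for i in range(1, 4)
]

# Old pattern: grow one DataFrame inside the loop, copying all prior rows each time.
# (Starts from None instead of pd.DataFrame() so the comparison does not depend on
# how a given pandas version handles an empty starting frame.)
cluster_avgs_old = None
for agg_results in per_fov_results:
    if cluster_avgs_old is None:
        cluster_avgs_old = agg_results
    else:
        cluster_avgs_old = pd.concat([cluster_avgs_old, agg_results])

# New pattern: collect each FOV's result in a list, concatenate once after the loop
fov_cluster_avgs = []
for agg_results in per_fov_results:
    fov_cluster_avgs.append(agg_results)
cluster_avgs_new = pd.concat(fov_cluster_avgs)

# Reset the index as the function does, then compare
cluster_avgs_old = cluster_avgs_old.reset_index(drop=True)
cluster_avgs_new = cluster_avgs_new.reset_index(drop=True)
pd.testing.assert_frame_equal(cluster_avgs_old, cluster_avgs_new)
```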
@@ -912,6 +926,7 @@ def cluster_pixels(fovs, channels, base_dir, data_dir='pixel_mat_data',
                    norm_vals_name='post_rowsum_chan_norm.feather',
                    weights_name='pixel_weights.feather',
                    pc_chan_avg_som_cluster_name='pixel_channel_avg_som_cluster.csv',
+                   fov_subset_proportion=0.1,
                    multiprocess=False, batch_size=5, ncores=multiprocessing.cpu_count() - 1):
     """Uses trained weights to assign cluster labels on full pixel data
     Saves data with cluster labels to `cluster_dir`. Computes and saves the average channel
@@ -932,6 +947,9 @@ def cluster_pixels(fovs, channels, base_dir, data_dir='pixel_mat_data',
             The name of the weights file created by `train_pixel_som`
         pc_chan_avg_som_cluster_name (str):
             The name of the file to save the average channel expression across all SOM clusters
+        fov_subset_proportion (float):
+            The proportion of FOVs to take for SOM cluster channel averaging,
+            truncated to nearest int
         multiprocess (bool):
             Whether to use multiprocessing or not
         batch_size (int):
@@ -1043,6 +1061,7 @@ def cluster_pixels(fovs, channels, base_dir, data_dir='pixel_mat_data',
         base_dir,
         'pixel_som_cluster',
         data_dir,
+        fov_subset_proportion=fov_subset_proportion,
         keep_count=True
     )

@@ -1058,6 +1077,7 @@ def pixel_consensus_cluster(fovs, channels, base_dir, max_k=20, cap=3,
                             pc_chan_avg_som_cluster_name='pixel_channel_avg_som_cluster.csv',
                             pc_chan_avg_meta_cluster_name='pixel_channel_avg_meta_cluster.csv',
                             clust_to_meta_name='pixel_clust_to_meta.feather',
+                            fov_subset_proportion=0.1,
                             multiprocess=False, batch_size=5,
                             ncores=multiprocessing.cpu_count() - 1, seed=42):
     """Run consensus clustering algorithm on pixel-level summed data across channels
@@ -1085,6 +1105,9 @@ def pixel_consensus_cluster(fovs, channels, base_dir, max_k=20, cap=3,
             Name of file to save the channel-averaged results across all meta clusters to
         clust_to_meta_name (str):
             Name of file storing the SOM cluster to meta cluster mapping
+        fov_subset_proportion (float):
+            The proportion of FOVs to take for meta cluster channel averaging,
+            truncated to nearest int
         multiprocess (bool):
             Whether to use multiprocessing or not
         batch_size (int):
@@ -1160,6 +1183,7 @@ def pixel_consensus_cluster(fovs, channels, base_dir, max_k=20, cap=3,
         base_dir,
         'pixel_meta_cluster',
         data_dir,
+        fov_subset_proportion=fov_subset_proportion,
         keep_count=True
     )

@@ -1243,6 +1267,7 @@ def apply_pixel_meta_cluster_remapping(fovs, channels, base_dir,
                                         pixel_remapped_name,
                                         pc_chan_avg_som_cluster_name,
                                         pc_chan_avg_meta_cluster_name,
+                                        fov_subset_proportion=0.1,
                                         multiprocess=False, batch_size=5):
     """Apply the meta cluster remapping to the data in `pixel_consensus_dir`.

@@ -1268,6 +1293,9 @@ def apply_pixel_meta_cluster_remapping(fovs, channels, base_dir,
             Name of the file containing the channel-averaged results across all SOM clusters
         pc_chan_avg_meta_cluster_name (str):
             Name of the file containing the channel-averaged results across all meta clusters
+        fov_subset_proportion (float):
+            The proportion of FOVs to take for SOM cluster channel averaging,
+            truncated to nearest int
         multiprocess (bool):
             Whether to use multiprocessing or not
         batch_size (int):
@@ -1386,6 +1414,7 @@ def apply_pixel_meta_cluster_remapping(fovs, channels, base_dir,
         base_dir,
         'pixel_meta_cluster',
         pixel_data_dir,
+        fov_subset_proportion=fov_subset_proportion,
         keep_count=True
     )
     pixel_channel_avg_meta_cluster['pixel_meta_cluster_rename'] = \
46 changes: 25 additions & 21 deletions ark/phenotyping/pixel_cluster_utils_test.py
@@ -633,13 +633,20 @@ def test_compute_pixel_cluster_channel_avg(cluster_col, keep_count, corrupt):
         num_repeats = 10
         result = np.repeat(np.array([[0.1, 0.2, 0.3]]), repeats=num_repeats, axis=0)

+        # test fov_subset_proportion too low
+        with pytest.raises(ValueError):
+            pixel_cluster_utils.compute_pixel_cluster_channel_avg(
+                fovs, chans, temp_dir, cluster_col,
+                'pixel_mat_consensus', fov_subset_proportion=1 / 10, keep_count=keep_count
+            )
+
         # compute pixel cluster average matrix
         cluster_avg = pixel_cluster_utils.compute_pixel_cluster_channel_avg(
             fovs, chans, temp_dir, cluster_col,
-            'pixel_mat_consensus', keep_count=keep_count
+            'pixel_mat_consensus', fov_subset_proportion=1 / 3, keep_count=keep_count
         )

-        # define the columns to check in cluster_avg, count may also be included
+        # define the columns to check in cluster_avg
         cluster_avg_cols = chans[:]

         # verify the provided channels and the channels in cluster_avg are exactly the same
@@ -648,20 +655,13 @@ def test_compute_pixel_cluster_channel_avg(cluster_col, keep_count, corrupt):
             provided_chans=chans
         )

-        # if keep_count is true then add the counts
-        # NOTE: subtract out the corrupted counts if specified
+        # assert count column adds up to just one FOV sampled
         if keep_count:
-            if cluster_col == 'pixel_som_cluster':
-                counts = 20 if corrupt else 30
-            else:
-                counts = 200 if corrupt else 300
-
-            count_col = np.expand_dims(np.repeat(counts, repeats=result.shape[0]), axis=1)
-            result = np.append(result, count_col, 1)
-
-            cluster_avg_cols.append('count')
+            assert cluster_avg['count'].sum() == 1000

-        # assert all elements of cluster_avg and the actual result are equal
+        # assert all the rows equal [0.1, 0.2, 0.3]
+        num_repeats = cluster_avg.shape[0]
+        result = np.repeat(np.array([[0.1, 0.2, 0.3]]), repeats=num_repeats, axis=0)
         assert np.array_equal(result, np.round(cluster_avg[cluster_avg_cols].values, 1))

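The updated test asserts that the `count` column sums to 1000: when exactly one FOV is sampled, the per-cluster pixel counts must add up to that FOV's pixel total, however the pixels split across clusters. The sketch below checks that invariant on synthetic data (three hypothetical FOVs of 1000 pixels each, made-up cluster labels and channel values). It mirrors the `sum_by_cluster`/`count_by_cluster` merge visible in the diff; the `groupby` calls used to build those two frames are an assumption, since that part of `compute_pixel_cluster_channel_avg` is not shown here.

```python
import random

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic pixel data: 3 hypothetical FOVs with 1000 pixels each
fov_data = {
    f"fov{i}": pd.DataFrame({
        "pixel_som_cluster": rng.integers(1, 11, size=1000),
        "chan0": rng.random(1000),
    })
    for i in range(3)
}

# Sample FOVs the same way as the diff (1/3 of 3 FOVs -> 1 FOV)
random.seed(24)
fovs_sub = random.sample(list(fov_data), int(len(fov_data) * (1 / 3)))

fov_cluster_avgs = []
for fov in fovs_sub:
    data = fov_data[fov]
    # Per-cluster channel sums and pixel counts for this FOV (assumed construction)
    sum_by_cluster = data.groupby("pixel_som_cluster").sum()
    count_by_cluster = data.groupby("pixel_som_cluster").size().to_frame("count")
    agg_results = pd.merge(
        sum_by_cluster, count_by_cluster, left_index=True, right_index=True
    ).reset_index()
    fov_cluster_avgs.append(agg_results)

cluster_avgs = pd.concat(fov_cluster_avgs).reset_index(drop=True)

# The per-cluster counts add up to the pixels in the single sampled FOV
assert cluster_avgs["count"].sum() == 1000
```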
@@ -1541,7 +1541,7 @@ def test_pixel_consensus_cluster(mocker):

         # compute averages by cluster, this happens before call to R
         cluster_avg = pixel_cluster_utils.compute_pixel_cluster_channel_avg(
-            fovs, chans, temp_dir, 'pixel_som_cluster'
+            fovs, chans, temp_dir, 'pixel_som_cluster', fov_subset_proportion=1 / 3
         )

         # save the DataFrame
@@ -1564,10 +1564,11 @@ def test_pixel_consensus_cluster(mocker):
                 'pixel_mat_data',
                 fov + '.feather'))

-            # assert we didn't modify the cluster column in the consensus clustered results
-            assert np.all(
-                fov_data[fov]['pixel_som_cluster'].values ==
-                fov_consensus_data['pixel_som_cluster'].values
+            # assert all assigned SOM cluster values contained in original fov-data
+            # NOTE: can't test exact values because of randomization of channel averaging
+            misc_utils.verify_in_list(
+                assigned_som_values=fov_consensus_data['pixel_som_cluster'].unique(),
+                valid_som_values=fov_data[fov]['pixel_som_cluster']
             )

             # assert we didn't assign any cluster 20 or above
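Because the channel averages now depend on which FOVs were sampled, this test no longer compares the SOM cluster columns element by element; it checks, via ark's `misc_utils.verify_in_list`, that every assigned SOM cluster already appears in the original FOV data. A pandas-only sketch of the same membership idea, using `Series.isin` in place of the ark helper and made-up labels:

```python
import pandas as pd

# Hypothetical SOM cluster labels for one FOV before consensus clustering
original_som = pd.Series([1, 2, 2, 5, 7, 7, 9])
# Hypothetical SOM cluster labels read back after consensus clustering
consensus_som = pd.Series([2, 5, 7, 7, 1])

assigned = pd.Series(consensus_som.unique())
# Membership check: every assigned label must already exist in the original data
unexpected = assigned[~assigned.isin(original_som)]
assert unexpected.empty, f"unexpected SOM clusters: {list(unexpected)}"
```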
@@ -1843,6 +1844,7 @@ def test_apply_pixel_meta_cluster_remapping_base(multiprocess):
             'sample_pixel_remapping.csv',
             'sample_pixel_som_cluster_chan_avgs.csv',
             'sample_pixel_meta_cluster_chan_avgs.csv',
+            fov_subset_proportion=1 / 3,
             multiprocess=multiprocess
         )

@@ -1903,8 +1905,9 @@ def test_apply_pixel_meta_cluster_remapping_base(multiprocess):
             np.round(sample_pixel_channel_avg_meta_cluster[chans].values, 1) == result
         )

-        # assert the counts data has been updated correctly
-        assert np.all(sample_pixel_channel_avg_meta_cluster['count'].values == 150)
+        # assert the total counts add up to 1000 (number in 1 FOV)
+        # NOTE: we can't test specific count values due to randomization of channel averaging
+        assert sample_pixel_channel_avg_meta_cluster['count'].sum() == 1000

         # assert the correct metacluster labels are contained
         sample_pixel_channel_avg_meta_cluster = sample_pixel_channel_avg_meta_cluster.sort_values(
@@ -1968,6 +1971,7 @@ def test_apply_pixel_meta_cluster_remapping_temp_corrupt(multiprocess, capsys):
             'sample_pixel_remapping.csv',
             'sample_pixel_som_cluster_chan_avgs.csv',
             'sample_pixel_meta_cluster_chan_avgs.csv',
+            fov_subset_proportion=1 / 2,
             multiprocess=multiprocess
         )

16 changes: 15 additions & 1 deletion ark/utils/notebooks_test.py
@@ -319,7 +319,21 @@ def test_pixel_apply_remap(self):

         notebooks_test_utils.create_pixel_remap_files(self.base_dir, pixel_meta_cluster_remap)

-        self.tb.execute_cell("pixel_apply_remap")
+        remap_inject = """
+            pixel_cluster_utils.apply_pixel_meta_cluster_remapping(
+                fovs,
+                channels,
+                base_dir,
+                pixel_data_dir,
+                pixel_meta_cluster_remap_name,
+                pc_chan_avg_som_cluster_name,
+                pc_chan_avg_meta_cluster_name,
+                multiprocess=multiprocess,
+                fov_subset_proportion=0.5,
+                batch_size=batch_size
+            )
+        """
+        self.tb.inject(remap_inject, "pixel_apply_remap")

     def test_pixel_cmap_gen(self):
         self.tb.execute_cell("pixel_cmap_gen")
1 change: 1 addition & 0 deletions requirements-test.txt
@@ -7,6 +7,7 @@ pytest-cases>=3.6.0,<4
 pytest-cov>=3.0.0,<4
 pytest-mock<4.0.0
 pytest-pycodestyle>=2.3.0,<3.0
+pytest-randomly>=3.12.0,<4.0
 pytest-asyncio>=0.18.1,<1.0
 six>=1.16.0,<2.00
 testbook>=0.4.2,<1.0