Remove intermediate _clustered/_consensus directories/files from pixel and cell clustering pipeline #586

Merged
merged 20 commits into from
Jun 10, 2022
Changes from 13 commits
Commits
f4fc600
Remove pixel_mat_clustered and pixel_mat_consensus directories
alex-l-kong Jun 3, 2022
6e3a6bc
Update cell clustering to not have duplicate cluster and consensus files
alex-l-kong Jun 3, 2022
78679c4
First pass of updating tests
alex-l-kong Jun 3, 2022
e6de9f8
Final pass updating tests to account for one home directory for pixel…
alex-l-kong Jun 3, 2022
dd57a86
Merge branch 'master' into condence_cluster_data
alex-l-kong Jun 3, 2022
2023f57
Remove unused argument from docstring of cell_consensus_cluster
alex-l-kong Jun 3, 2022
2cc1cb0
Merge branch 'condence_cluster_data' of https://github.com/angelolab/…
alex-l-kong Jun 3, 2022
37025d4
Same docfix needed for pixel_consensus_cluster
alex-l-kong Jun 3, 2022
9443175
Update notebook tests to account for new directory structure
alex-l-kong Jun 3, 2022
1fc1b26
Merge branch 'master' into condence_cluster_data
alex-l-kong Jun 6, 2022
1103fd0
Merge branch 'master' of https://github.com/angelolab/ark-analysis in…
alex-l-kong Jun 6, 2022
4ea01f9
Merge branch 'condence_cluster_data' of https://github.com/angelolab/…
alex-l-kong Jun 6, 2022
deeecbb
Commit updated cell clustering pipeline
alex-l-kong Jun 7, 2022
a6f0ffa
Reset cell_cluster_prefix back to None
alex-l-kong Jun 7, 2022
ccbc09d
Remove old commented code in example_cell_clustering.ipynb
alex-l-kong Jun 8, 2022
b802c64
Merge branch 'master' into condence_cluster_data
alex-l-kong Jun 8, 2022
8c482f6
Merge branch 'master' into condence_cluster_data
alex-l-kong Jun 8, 2022
9239fc0
Merge branch 'master' into condence_cluster_data
alex-l-kong Jun 9, 2022
b3b36d1
Fix merge conflict (pre_dir to data_dir)
alex-l-kong Jun 9, 2022
5615666
Remove extraneous string formatter for norm vals name
alex-l-kong Jun 9, 2022
20 changes: 8 additions & 12 deletions ark/phenotyping/cell_consensus_cluster.R
@@ -2,14 +2,13 @@
# defined as the mean counts of each SOM pixel/meta cluster across all cell SOM clusters in each fov
# (m x n table, where m is the number of cell SOM/meta clusters and n is the number of pixel SOM/meta clusters).

-# Usage: Rscript {pixelClusterCol} {maxK} {cap} {cellClusterPath} {clusterAvgPath} {cellConsensusPath} {clustToMeta} {seed}
+# Usage: Rscript {pixelClusterCol} {maxK} {cap} {cellMatPath} {clusterAvgPath} {clustToMeta} {seed}

# - pixelClusterCol: the prefix of the columns defining pixel SOM/meta cluster counts per cell
# - maxK: number of consensus clusters
# - cap: maximum z-score cutoff
-# - cellClusterPath: path to the cell-level data containing the counts of each SOM pixel/meta clusters per cell, labeled with cell SOM clusters
+# - cellMatPath: path to the cell-level data containing the counts of each SOM pixel/meta clusters per cell, labeled with cell SOM clusters
# - clusterAvgPath: path to the averaged cell data table (as defined above)
-# - cellConsensusPath: path to file where the cell consensus cluster results will be written
# - clustToMeta: path to file where the SOM cluster to meta cluster mapping will be written
# - seed: random factor

@@ -29,20 +28,17 @@ maxK <- strtoi(args[2])
# get z-score scaling factor
cap <- strtoi(args[3])

-# get the cell cluster path
-cellClusterPath <- args[4]
+# get the path to the cell data (with SOM labels)
+cellMatPath <- args[4]

# get path to the averaged cluster data
clusterAvgPath <- args[5]

-# get consensus cluster write path
-cellConsensusPath <- args[6]
-
# get the clust to meta write path
-clustToMeta <- args[7]
+clustToMeta <- args[6]

# set the random seed
-seed <- strtoi(args[8])
+seed <- strtoi(args[7])
set.seed(seed)

print("Reading cluster averaged data")
@@ -66,9 +62,9 @@ names(som_to_meta_map) <- clusterAvgs$cell_som_cluster

# append cell_meta_cluster to data
print("Writing consensus clustering")
-cellClusterData <- arrow::read_feather(cellClusterPath)
+cellClusterData <- arrow::read_feather(cellMatPath)
cellClusterData$cell_meta_cluster <- som_to_meta_map[as.character(cellClusterData$cell_som_cluster)]
-arrow::write_feather(as.data.table(cellClusterData), cellConsensusPath)
+arrow::write_feather(as.data.table(cellClusterData), cellMatPath)

# save the mapping from cell_som_cluster to cell_meta_cluster
print("Writing SOM to meta cluster mapping table")
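The in-place update that this file now performs can be sketched in base R as follows. This is only an illustration under stated assumptions, not the script itself: the helper name `append_meta_cluster` and the toy data are hypothetical, and an RDS file stands in for the feather file so the sketch does not need `arrow`.

```r
# Hypothetical helper: read a table, add the meta cluster label, and write
# the result back to the same path (RDS stands in for feather here).
append_meta_cluster <- function(cellMatPath, som_to_meta_map) {
    cellClusterData <- readRDS(cellMatPath)

    # a named vector acts as a lookup table: names are SOM cluster ids
    # (as strings), values are the assigned meta cluster ids
    cellClusterData$cell_meta_cluster <- unname(
        som_to_meta_map[as.character(cellClusterData$cell_som_cluster)]
    )

    # overwrite the input file instead of writing a separate consensus file
    saveRDS(cellClusterData, cellMatPath)
    invisible(cellClusterData)
}

# toy data, made up for illustration
cellMatPath <- tempfile(fileext = ".rds")
saveRDS(data.frame(cell_som_cluster = c(1L, 2L, 2L, 3L)), cellMatPath)

som_to_meta_map <- c(10L, 20L, 10L)
names(som_to_meta_map) <- c("1", "2", "3")

append_meta_cluster(cellMatPath, som_to_meta_map)
result <- readRDS(cellMatPath)
```

After the call, the same file holds both the `cell_som_cluster` and `cell_meta_cluster` columns, so no second copy of the table is kept on disk.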
23 changes: 9 additions & 14 deletions ark/phenotyping/pixel_consensus_cluster.R
@@ -1,14 +1,13 @@
# Runs consensus clustering on the pixel data averaged across all channels

-# Usage: Rscript {fovs} {markers} {maxK} {cap} {pixelClusterDir} {clusterAvgPath} {pixelMatConsensus} {clustToMeta} {seed}
+# Usage: Rscript {fovs} {markers} {maxK} {cap} {pixelMatDir} {clusterAvgPath} {clustToMeta} {seed}

# - fovs: list of fovs to cluster
# - markers: list of channel columns to use
# - maxK: number of consensus clusters
# - cap: max z-score cutoff
-# - pixelClusterDir: path to the pixel data with SOM clusters
+# - pixelMatDir: path to the pixel data with SOM clusters
# - clusterAvgPath: path to the averaged cluster data
-# - pixelMatConsensus: path to file where the consensus cluster results will be written
# - clustToMeta: path to file where the SOM cluster to meta cluster mapping will be written
# - seed: random factor

@@ -31,20 +30,17 @@ maxK <- strtoi(args[3])
# get z-score scaling factor
cap <- strtoi(args[4])

-# get path to the clustered pixel data
-pixelClusterDir <- args[5]
+# get path to the pixel data
+pixelMatDir <- args[5]

# get path to the averaged channel data
clusterAvgPath <- args[6]

-# get consensus clustered write path
-pixelMatConsensus <- args[7]
-
# get the clust to meta write path
-clustToMeta <- args[8]
+clustToMeta <- args[7]

# set the random seed
-seed <- strtoi(args[9])
+seed <- strtoi(args[8])
set.seed(seed)

# read cluster averaged data
@@ -70,15 +66,14 @@ print("Writing consensus clustering results")
for (i in 1:length(fovs)) {
# read in pixel data, we'll need the cluster column for mapping
fileName <- file.path(fovs[i], "feather", fsep=".")
-matPath <- file.path(pixelClusterDir, fileName)
+matPath <- file.path(pixelMatDir, fileName)
fovPixelData <- arrow::read_feather(matPath)

# assign hierarchical cluster labels
fovPixelData$pixel_meta_cluster <- som_to_meta_map[as.character(fovPixelData$pixel_som_cluster)]

-# write consensus clustered data
-clusterPath <- file.path(pixelMatConsensus, fileName)
-arrow::write_feather(as.data.table(fovPixelData), clusterPath)
+# write consensus clustered data, overwrite original data with the same data with meta cluster label
+arrow::write_feather(as.data.table(fovPixelData), matPath)

# print an update every 10 fovs
if (i %% 10 == 0) {
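The per-FOV path handling in the loop above can be illustrated with a short base R sketch. The FOV names and the directory below are hypothetical; the point is only that `file.path` with `fsep="."` produces `<fov>.feather`, and that the read path and the write path are now the same file.

```r
# hypothetical FOV names and data directory, for illustration only
fovs <- c("fov0", "fov1")
pixelMatDir <- file.path(tempdir(), "pixel_mat_data")

# with fsep = ".", file.path joins its arguments with a dot instead of "/"
fileNames <- file.path(fovs, "feather", fsep = ".")

# one path per FOV; each file is read, labeled, and written back in place
matPaths <- file.path(pixelMatDir, fileNames)

for (i in seq_along(fovs)) {
    print(matPaths[i])
}
```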
10 changes: 5 additions & 5 deletions ark/phenotyping/run_cell_som.R
@@ -1,10 +1,10 @@
# Assigns cluster labels to cell data using a trained SOM weights matrix

-# Usage: Rscript run_cell_som.R {clusterCountsNormPath} {cellWeightsPath} {cellClusterNormPath}
+# Usage: Rscript run_cell_som.R {clusterCountsNormPath} {cellWeightsPath} {cellMatNormPath}

# - clusterCountsNormPath: path to file with counts of unique cells (rows) by unique pixel SOM/meta clusters (columns), with counts normalized by cell size
# - cellWeightsPath: path to the SOM weights file
-# - cellClusterNormPath: the path to write the normalized pixel SOM/meta cluster count data (normalized) with cell SOM labelss. This will be used for consensus clustering.
+# - cellMatNormPath: the path to write the normalized pixel SOM/meta cluster count data (normalized) with cell SOM labelss. This will be used for consensus clustering.

library(arrow)
library(data.table)
@@ -19,8 +19,8 @@ clusterCountsPath <- args[1]
# get the weights write path
cellWeightsPath <- args[2]

-# get the cluster write path (normalized)
-cellClusterPathNorm <- args[3]
+# get the data write path (normalized)
+cellMatPathNorm <- args[3]

# read the cluster counts data (norm)
print("Reading the cluster counts data")
@@ -62,4 +62,4 @@ clusterCountsNorm$cell_som_cluster <- as.integer(clusters[,1])

# write to feather
print("Writing clustered data")
-arrow::write_feather(as.data.table(clusterCountsNorm), cellClusterPathNorm)
+arrow::write_feather(as.data.table(clusterCountsNorm), cellMatPathNorm)
11 changes: 3 additions & 8 deletions ark/phenotyping/run_pixel_som.R
@@ -1,12 +1,11 @@
# Assigns cluster labels to pixel data using a trained SOM weights matrix

-# Usage: Rscript run_pixel_som.R {fovs} {pixelMatDir} {normValsPath} {pixelWeightsPath} {pixelClusterDir}
+# Usage: Rscript run_pixel_som.R {fovs} {pixelMatDir} {normValsPath} {pixelWeightsPath}

# - fovs: list of fovs to cluster
# - pixelMatDir: path to directory containing the complete pixel data
# - normValsPath: path to the 99.9% normalization values file (created during preprocessing)
# - pixelWeightsPath: path to the SOM weights file
-# - pixelClusterDir: path to directory where the clustered data will be written to

library(arrow)
library(data.table)
@@ -27,9 +26,6 @@ normValsPath <- args[3]
# get path to the weights
pixelWeightsPath <- args[4]

-# get the cluster write path directory
-pixelClusterDir <- args[5]
-
# read the weights
somWeights <- as.matrix(arrow::read_feather(pixelWeightsPath))

@@ -64,9 +60,8 @@ for (i in 1:length(fovs)) {
# assign cluster labels column to pixel data
fovPixelData$pixel_som_cluster <- as.integer(clusters[,1])

-# write to feather
-clusterPath <- file.path(pixelClusterDir, fileName)
-arrow::write_feather(as.data.table(fovPixelData), clusterPath)
+# write to feather, overwrite original data with the same data with SOM cluster label
+arrow::write_feather(as.data.table(fovPixelData), matPath)

# print an update every 10 fovs
# TODO: find a way to capture sprintf to the console
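Because these scripts read their arguments by position, a caller that still passes the removed fifth argument would not fail on its own. One possible guard is sketched below. The helper `parse_pixel_som_args` is hypothetical and not part of this PR, and it assumes the FOV list arrives as a single comma-separated string, which this diff does not show.

```r
# Hypothetical helper: validate and name the four positional arguments
# of run_pixel_som.R after this change.
parse_pixel_som_args <- function(args) {
    if (length(args) != 4) {
        stop("expected 4 arguments: fovs, pixelMatDir, normValsPath, pixelWeightsPath")
    }

    list(
        # assumption: fovs are passed as one comma-separated string
        fovs = unlist(strsplit(args[1], split = ",")),
        pixelMatDir = args[2],
        normValsPath = args[3],
        pixelWeightsPath = args[4]
    )
}

# example values are made up; in a script this would be
# parse_pixel_som_args(commandArgs(trailingOnly = TRUE))
parsed <- parse_pixel_som_args(
    c("fov0,fov1", "pixel_mat_data", "norm_vals.feather", "weights.feather")
)
```

With a check like this, an old five-argument invocation stops with an error instead of silently ignoring the extra path.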