[BPT-155] Spatial Data Refactor #104

Merged · 71 commits · Apr 5, 2024
Changes from 56 commits
Commits (71)
f0f5329
Update README.md
ckmah Jul 12, 2023
7d3ec21
Merge pull request #103 from ckmah/download-badge
ckmah Jul 12, 2023
1d582ac
Delete Non-refactored Files
dylanclam12 Aug 8, 2023
22e1141
SpatialData Modified Query Script
dylanclam12 Aug 8, 2023
a7ff251
SpatialData Formatting in _io.py and _geometry.py
dylanclam12 Aug 8, 2023
efda092
_shape_features.py Refactor
dylanclam12 Aug 8, 2023
f1912d5
_point_features.py Refactor
dylanclam12 Aug 8, 2023
7ff7f69
_lp.py Refactor Part 1
dylanclam12 Aug 11, 2023
ed6e1e9
_lp.py Refactor Part 2
dylanclam12 Aug 15, 2023
e47a265
_neighborhoods.py Addition
dylanclam12 Aug 15, 2023
f83ff83
_utils.py sync_points function and minor fixes
dylanclam12 Aug 17, 2023
e724986
_flux.py Refactor
dylanclam12 Aug 22, 2023
cb29d2d
Removed Dask Functionality
dylanclam12 Aug 24, 2023
55766cf
_flux_enrichment.py Refactor
dylanclam12 Aug 28, 2023
7068c43
_colocation.py, _decomposition.py, _composition.py Refactor
dylanclam12 Aug 29, 2023
0a33e1b
_plotting.py, _colors.py, _layers.py, _utils.py Refactor
dylanclam12 Oct 5, 2023
f17bf6e
_lp.py, _multidimensional.py Plotting Refactor
dylanclam12 Oct 13, 2023
31e2fc5
Addressing review comments on PR [BPT-155]
dylanclam12 Jan 16, 2024
10f1b33
Parse refactor for _utils.py, _geometry.py, and _io.py
dylanclam12 Jan 30, 2024
49f81f8
Parse refactor for _shape_features.py
dylanclam12 Jan 30, 2024
0585f38
Parse refactor for _points_features.py
dylanclam12 Jan 30, 2024
b28217a
Parse refactor for _flux.py and _flux_enrichment.py
dylanclam12 Jan 30, 2024
5e8cee9
_signatures.py Refactor
dylanclam12 Jan 30, 2024
f1a3980
_colocation.py minor fix
dylanclam12 Jan 31, 2024
8e82f13
Remove cell_boundaries_key parameters
dylanclam12 Feb 20, 2024
49512aa
cleanup typos, delete misc code, comments
ckmah Feb 22, 2024
1562b8d
cleanup formatting, fix docstring return info
ckmah Feb 22, 2024
0d5f7e7
shape sjoin index added to cell_boundaries df; use full shape names
ckmah Feb 22, 2024
dbdc570
cleanup format with ruff
ckmah Feb 22, 2024
ae5ef79
sjoins require key for cell shape
ckmah Feb 22, 2024
2ad8165
use full cell shape key for prefixing shape feature keys
ckmah Feb 22, 2024
35d07ae
sync sjoin data types, restrict points to 2D
ckmah Feb 28, 2024
0e6a929
remove shape prefixing
ckmah Feb 28, 2024
f790681
remove import unused to_tensor fn
ckmah Feb 28, 2024
b48f167
use instance_key to track cell shape
ckmah Feb 29, 2024
ec9ce51
test_io seems to work, test_shape_features wip, commented out other t…
ckmah Mar 1, 2024
a24e171
cast index to str always
ckmah Mar 19, 2024
26b1785
fixed point and shape sync (untested)
ckmah Mar 19, 2024
e90076a
save instance_key to proper points attr field
ckmah Mar 19, 2024
f9d3ccb
Merge remote-tracking branch 'origin/sjoin-index-type-fix' into spati…
dylanclam12 Mar 19, 2024
e13e73e
_geometry.py setters and getters
dylanclam12 Mar 20, 2024
d1a253d
test_io.py
dylanclam12 Mar 20, 2024
f234817
_shape_feature.py remove overwrite
dylanclam12 Mar 20, 2024
d7b7424
_shape_features.py remove overwrite
dylanclam12 Mar 20, 2024
c966c93
Merge remote-tracking branch 'origin/spatialdata_refactor' into spati…
dylanclam12 Mar 20, 2024
f9d5b7d
_shape_features.py Tests
dylanclam12 Mar 20, 2024
e73470b
_points_features.py: shape_names --> shape_keys
dylanclam12 Mar 22, 2024
06b615a
_points_features.py: instance_key
dylanclam12 Mar 22, 2024
abc9f51
_points_features.py: shape_key_index
dylanclam12 Mar 22, 2024
0470639
_point_features.py: analyze_points refactor
dylanclam12 Mar 22, 2024
495fdec
test_point_features.py
dylanclam12 Mar 22, 2024
cc2113e
simplifying test_point_features.py
dylanclam12 Mar 22, 2024
4eb7435
simplifying test_shape_features.py
dylanclam12 Mar 22, 2024
f0dfe93
Change PATTERN_FEATURES
dylanclam12 Mar 22, 2024
bf5f276
_lp.py: instance_key
dylanclam12 Mar 22, 2024
ebf3e36
_lp.py Tests
dylanclam12 Mar 22, 2024
9fa35a3
_geometry.py: set_metadata documentation
dylanclam12 Mar 28, 2024
79cb21e
_flux.py: instance_key
dylanclam12 Mar 28, 2024
f6e44af
_neighborhoods.py: feature_name
dylanclam12 Mar 28, 2024
0fbd7a0
_flux.py Tests
dylanclam12 Mar 28, 2024
cc18e88
_flux_enrichment.py: set_points_metadata
dylanclam12 Apr 1, 2024
a179937
_flux_enrichement.py: instance_key
dylanclam12 Apr 1, 2024
9034dd5
_flux_enrichment.py Tests
dylanclam12 Apr 1, 2024
dad41c4
remove import random test_flux.py and test_flux_enrichment.py
dylanclam12 Apr 1, 2024
285ee56
_colocation.py: colocation instance_key and feature_key
dylanclam12 Apr 1, 2024
48c8d62
_colocation.py: coloc_quotient keys
dylanclam12 Apr 1, 2024
411f62f
_colocation.py Tests
dylanclam12 Apr 1, 2024
30ecb71
_geometry.py Tests
dylanclam12 Apr 5, 2024
ee1f007
cleanup misc constants
ckmah Apr 5, 2024
8fdfdf1
Merge branch 'v2.1' into spatialdata_refactor
ckmah Apr 5, 2024
fcb72ec
more merge conflicts
ckmah Apr 5, 2024
.github/workflows/publish_pypi.yml: file mode changed 100755 → 100644 (no content changes)
.github/workflows/python-package.yml: file mode changed 100755 → 100644 (no content changes)
.gitignore: file mode changed 100755 → 100644 (no content changes)
.readthedocs.yml: file mode changed 100755 → 100644 (no content changes)
MANIFEST.in: file mode changed 100755 → 100644 (no content changes)
1 change: 1 addition & 0 deletions README.md
100755 → 100644
@@ -2,6 +2,7 @@
[![PyPI version](https://badge.fury.io/py/bento-tools.svg)](https://badge.fury.io/py/bento-tools)
[![codecov](https://codecov.io/gh/ckmah/bento-tools/branch/master/graph/badge.svg?token=XVHDKNDCDT)](https://codecov.io/gh/ckmah/bento-tools)
[![Documentation Status](https://readthedocs.org/projects/bento-tools/badge/?version=latest)](https://bento-tools.readthedocs.io/en/latest/?badge=latest)
[![Downloads](https://static.pepy.tech/badge/bento-tools)](https://pepy.tech/project/bento-tools)
![PyPI - Downloads](https://img.shields.io/pypi/dm/bento-tools)
[![GitHub stars](https://badgen.net/github/stars/ckmah/bento-tools)](https://github.com/Naereen/ckmah/bento-tools)

3 changes: 1 addition & 2 deletions bento/__init__.py
100755 → 100644
@@ -1,8 +1,7 @@
from . import datasets as ds
from . import io
from . import plotting as pl
from . import tools as tl
from . import _utils as ut
from . import geometry as geo
from . import query as qy
from .plotting import _colors as colors
from ._utils import sync
14 changes: 7 additions & 7 deletions bento/_constants.py
@@ -2,17 +2,17 @@
 PATTERN_NAMES = ["cell_edge", "cytoplasmic", "none", "nuclear", "nuclear_edge"]
 PATTERN_PROBS = [f"{p}_p" for p in PATTERN_NAMES]
 PATTERN_FEATURES = [
-    "cell_inner_proximity",
-    "nucleus_inner_proximity",
-    "nucleus_outer_proximity",
-    "cell_inner_asymmetry",
-    "nucleus_inner_asymmetry",
-    "nucleus_outer_asymmetry",
+    "cell_boundaries_inner_proximity",
+    "nucleus_boundaries_inner_proximity",
+    "nucleus_boundaries_outer_proximity",
+    "cell_boundaries_inner_asymmetry",
+    "nucleus_boundaries_inner_asymmetry",
+    "nucleus_boundaries_outer_asymmetry",
     "l_max",
     "l_max_gradient",
     "l_min_gradient",
     "l_monotony",
     "l_half_radius",
     "point_dispersion_norm",
-    "nucleus_dispersion_norm",
+    "nucleus_boundaries_dispersion_norm",
 ]
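For downstream code or saved results still keyed by the old feature names, the rename in this hunk amounts to prefixing with the full shape element key. A minimal, hypothetical migration sketch (the results table is illustrative, not part of this PR):

import pandas as pd

# Old -> new PATTERN_FEATURES keys, taken from the hunk above
RENAMED_FEATURES = {
    "cell_inner_proximity": "cell_boundaries_inner_proximity",
    "nucleus_inner_proximity": "nucleus_boundaries_inner_proximity",
    "nucleus_outer_proximity": "nucleus_boundaries_outer_proximity",
    "cell_inner_asymmetry": "cell_boundaries_inner_asymmetry",
    "nucleus_inner_asymmetry": "nucleus_boundaries_inner_asymmetry",
    "nucleus_outer_asymmetry": "nucleus_boundaries_outer_asymmetry",
    "nucleus_dispersion_norm": "nucleus_boundaries_dispersion_norm",
}

old_results = pd.DataFrame(columns=list(RENAMED_FEATURES))  # placeholder table
new_results = old_results.rename(columns=RENAMED_FEATURES)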
322 changes: 0 additions & 322 deletions bento/_utils.py
@@ -1,322 +0,0 @@
import inspect
import warnings
import geopandas as gpd
import pandas as pd
import seaborn as sns
from anndata import AnnData
from functools import wraps
from typing import Iterable
from shapely import wkt


def get_default_args(func):
signature = inspect.signature(func)
return {
k: v.default
for k, v in signature.parameters.items()
if v.default is not inspect.Parameter.empty
}
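For illustration (not part of this diff), a quick usage sketch of the helper above, assuming get_default_args as defined in this file; the toy function is made up:

def scale(x, factor=2.0, offset=0.0):
    return x * factor + offset

print(get_default_args(scale))  # {'factor': 2.0, 'offset': 0.0}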


def track(func):
"""
Track changes in AnnData object after applying function.

1. First remembers a shallow list of AnnData attributes by listing keys from obs, var, etc.
2. Perform arbitrary task
3. List attributes again, perform simple diff between list of old and new attributes
4. Print to user added and removed keys

Parameters
----------
func : function
"""

@wraps(func)
def wrapper(*args, **kwds):
kwargs = get_default_args(func)
kwargs.update(kwds)

if type(args[0]) == AnnData:
adata = args[0]
else:
adata = args[1]

old_attr = list_attributes(adata)

if kwargs["copy"]:
out_adata = func(*args, **kwds)
new_attr = list_attributes(out_adata)
else:
func(*args, **kwds)
new_attr = list_attributes(adata)

# Print differences between new and old adata
out = ""
out += "AnnData object modified:"

if old_attr["n_obs"] != new_attr["n_obs"]:
out += f"\nn_obs: {old_attr['n_obs']} -> {new_attr['n_obs']}"

if old_attr["n_vars"] != new_attr["n_vars"]:
out += f"\nn_vars: {old_attr['n_vars']} -> {new_attr['n_vars']}"

modified = False
for attr in old_attr.keys():
if attr == "n_obs" or attr == "n_vars":
continue

removed = list(old_attr[attr] - new_attr[attr])
added = list(new_attr[attr] - old_attr[attr])

if len(removed) > 0 or len(added) > 0:
modified = True
out += f"\n {attr}:"
if len(removed) > 0:
out += f"\n - {', '.join(removed)}"
if len(added) > 0:
out += f"\n + {', '.join(added)}"

if modified:
print(out)

return out_adata if kwargs["copy"] else None

return wrapper
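For context, a minimal sketch of how this decorator was typically applied; the decorated function and toy data below are made up for illustration and assume track and get_default_args as defined in this file:

import numpy as np
from anndata import AnnData

@track
def add_dummy_layer(data, copy=False):
    # Toy function: add a layer so the tracker has something to report
    adata = data.copy() if copy else data
    adata.layers["dummy"] = adata.X.copy()
    return adata if copy else None

adata = AnnData(np.zeros((3, 2)))
add_dummy_layer(adata)
# Prints a summary along the lines of:
# AnnData object modified:
#  layers:
#  + dummy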


def list_attributes(adata):
"""Traverse AnnData object attributes and list keys.

Parameters
----------
adata : AnnData
AnnData object

Returns
-------
dict
Dictionary of keys for each AnnData attribute.
"""
found_attr = dict(n_obs=adata.n_obs, n_vars=adata.n_vars)
for attr in [
"obs",
"var",
"uns",
"obsm",
"varm",
"layers",
"obsp",
"varp",
]:
keys = set(getattr(adata, attr).keys())
found_attr[attr] = keys

return found_attr


def pheno_to_color(pheno, palette):
"""
Maps list of categorical labels to a color palette.
Input values are first sorted alphanumerically least to greatest before mapping to colors.
This ensures consistent colors regardless of input value order.

Parameters
----------
pheno : pd.Series
Categorical labels to map
palette: None, string, or sequence, optional
Name of palette or None to return current palette.
If a sequence, input colors are used but possibly cycled and desaturated.
Taken from sns.color_palette() documentation.

Returns
-------
dict
Mapping of label to color in RGBA
tuples
List of converted colors for each sample, formatted as RGBA tuples.

"""
if isinstance(palette, str):
palette = sns.color_palette(palette)

values = list(set(pheno))
values.sort()
palette = sns.color_palette(palette, n_colors=len(values))
study2color = dict(zip(values, palette))
sample_colors = [study2color[v] for v in pheno]
return study2color, sample_colors
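A usage sketch for pheno_to_color, assuming the function as defined above (the labels are made up):

import pandas as pd

labels = pd.Series(["B cell", "T cell", "B cell", "NK cell"])
label_to_color, point_colors = pheno_to_color(labels, palette="muted")
# label_to_color maps each sorted unique label to a seaborn color tuple;
# point_colors holds one color per entry in `labels`, in order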


def sync(data, copy=False):
"""
Sync existing point sets and associated metadata with data.obs_names and data.var_names

Parameters
----------
data : AnnData
Spatial formatted AnnData object
copy : bool, optional
"""
adata = data.copy() if copy else data

if "point_sets" not in adata.uns.keys():
adata.uns["point_sets"] = dict(points=[])

# Iterate over point sets
for point_key in adata.uns["point_sets"]:
points = adata.uns[point_key]

# Subset for cells
cells = adata.obs_names.tolist()
in_cells = points["cell"].isin(cells)

# Subset for genes
in_genes = [True] * points.shape[0]
if "gene" in points.columns:
genes = adata.var_names.tolist()
in_genes = points["gene"].isin(genes)

# Combine boolean masks
valid_mask = (in_cells & in_genes).values

# Sync points using mask
points = points.loc[valid_mask]

# Remove unused categories for categorical columns
for col in points.columns:
if points[col].dtype == "category":
points[col].cat.remove_unused_categories(inplace=True)

adata.uns[point_key] = points

# Sync point metadata using mask
for metadata_key in adata.uns["point_sets"][point_key]:
if metadata_key not in adata.uns:
warnings.warn(
f"Skipping: metadata {metadata_key} not found in adata.uns"
)
continue

metadata = adata.uns[metadata_key]
# Slice DataFrame if not empty
if isinstance(metadata, pd.DataFrame) and not metadata.empty:
adata.uns[metadata_key] = metadata.loc[valid_mask, :]

# Slice Iterable if not empty
elif isinstance(metadata, list) and any(metadata):
adata.uns[metadata_key] = [
m for i, m in enumerate(metadata) if valid_mask[i]
]
elif isinstance(metadata, Iterable) and metadata.shape[0] > 0:
adata.uns[metadata_key] = adata.uns[metadata_key][valid_mask]
else:
warnings.warn(f"Metadata {metadata_key} is not a DataFrame or Iterable")

return adata if copy else None
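A self-contained sketch of the intended workflow for sync (toy data, assuming sync as defined above): register a point set, subset the AnnData object, then call sync to drop points that belong to removed cells.

import numpy as np
import pandas as pd
from anndata import AnnData

adata = AnnData(np.zeros((2, 2)))
adata.obs_names = ["cell1", "cell2"]
adata.var_names = ["geneA", "geneB"]
adata.uns["points"] = pd.DataFrame({
    "x": [0.0, 1.0, 2.0],
    "y": [0.0, 1.0, 2.0],
    "cell": ["cell1", "cell2", "cell2"],
    "gene": ["geneA", "geneB", "geneB"],
})
adata.uns["point_sets"] = {"points": []}  # register the point set with no extra metadata

subset = adata[["cell1"]].copy()  # drop cell2
sync(subset)                      # in place: points belonging to cell2 are removed
assert (subset.uns["points"]["cell"] == "cell1").all()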


def _register_points(data, point_key, metadata_keys):
required_cols = ["x", "y", "cell"]

if point_key not in data.uns.keys():
raise ValueError(f"Key {point_key} not found in data.uns")

points = data.uns[point_key]

if not all([col in points.columns for col in required_cols]):
raise ValueError(
f"Point DataFrame must have columns {', '.join(required_cols)}"
)

# Check for valid cells
cells = data.obs_names.tolist()
if not points["cell"].isin(cells).all():
raise ValueError("Invalid cells in point DataFrame")

# Initialize/add to point registry
if "point_sets" not in data.uns.keys():
data.uns["point_sets"] = dict()

if point_key not in data.uns["point_sets"].keys():
data.uns["point_sets"][point_key] = []

if len(metadata_keys) < 0:
return

# Register metadata
for key in metadata_keys:
# Check for valid metadata
if key not in data.uns.keys():
raise ValueError(f"Key {key} not found in data.uns")

n_points = data.uns[point_key].shape[0]
metadata_len = data.uns[key].shape[0]
if metadata_len != n_points:
raise ValueError(
f"Metadata {key} must have same length as points {point_key}"
)

# Add metadata key to registry
if key not in data.uns["point_sets"][point_key]:
data.uns["point_sets"][point_key].append(key)


def register_points(point_key: str, metadata_keys: list):
"""Decorator function to register points to the current `AnnData` object.
This keeps track of point sets and keeps them in sync with `AnnData` object.

Parameters
----------
point_key : str
Key where points are stored in `data.uns`
metadata_keys : list
Keys where point metadata are stored in `data.uns`
"""

def decorator(func):
@wraps(func)
def wrapper(*args, **kwds):
kwargs = get_default_args(func)
kwargs.update(kwds)

func(*args, **kwds)
data = args[0]
# Check for required columns
return _register_points(data, point_key, metadata_keys)

return wrapper

return decorator
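A minimal sketch of decorating a point-producing function with register_points; the function, point data, and key names are hypothetical and assume register_points, _register_points, and get_default_args as defined in this file:

import numpy as np
import pandas as pd
from anndata import AnnData

@register_points("detected_points", metadata_keys=[])
def detect_points(data):
    # Hypothetical detection step: store results where the decorator expects them
    data.uns["detected_points"] = pd.DataFrame({
        "x": [0.5],
        "y": [0.5],
        "cell": [data.obs_names[0]],
    })

adata = AnnData(np.zeros((1, 1)))
adata.obs_names = ["cell1"]
detect_points(adata)
# adata.uns["point_sets"]["detected_points"] now exists (an empty list of metadata keys)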


def sc_format(data, copy=False):
"""
Convert data.obs GeoPandas columns to string for compatibility with scanpy.
"""
adata = data.copy() if copy else data

shape_names = data.obs.columns.str.endswith("_shape")

for col in data.obs.columns[shape_names]:
adata.obs[col] = adata.obs[col].astype(str)

return adata if copy else None


def geo_format(data, copy=False):
"""
Convert data.obs scanpy columns to GeoPandas compatible types.
"""
adata = data.copy() if copy else data

shape_names = adata.obs.columns[adata.obs.columns.str.endswith("_shape")]

adata.obs[shape_names] = adata.obs[shape_names].apply(
lambda col: gpd.GeoSeries(
col.astype(str).apply(lambda val: wkt.loads(val) if val != "None" else None)
)
)

return adata if copy else None
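A round-trip sketch for the two converters above (toy geometry; assumes sc_format and geo_format as defined in this file):

import numpy as np
import geopandas as gpd
from anndata import AnnData
from shapely.geometry import Polygon

adata = AnnData(np.zeros((2, 1)))
adata.obs["cell_shape"] = gpd.GeoSeries(
    [Polygon([(0, 0), (1, 0), (1, 1)])] * 2, index=adata.obs_names
)

sc_format(adata)   # geometries become WKT strings, safe to write via scanpy
geo_format(adata)  # WKT strings are parsed back into shapely geometries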
5 changes: 0 additions & 5 deletions bento/datasets/__init__.py

This file was deleted.
