1566 shape feature extract #381

Merged
merged 7 commits into from
Aug 24, 2023

Conversation

raylim
Contributor

@raylim raylim commented Jul 31, 2023

No description provided.

@raylim raylim requested a review from armaank July 31, 2023 21:55
@armaank
Collaborator

armaank commented Aug 3, 2023

This looks very good; good call on computing log ratios.

A few items to address:

  1. You can remove all features extracted for Detection probability; they are unneeded and can be dropped from the feature extraction phase.
  2. You can remove features associated with unclassified regions; those can also be dropped from the feature extraction phase.
  3. The whole slide features are redundant in this output format: they are the same no matter the cell type, so we end up with double the number of whole slide features.
  4. We probably don't need many of the whole slide features. The following don't need to be extracted: anything related to bounding boxes (bbox), anything related to centroids, anything related to inertia tensors, the whole slide label, anything related to moments, and whole slide area in microns (we already have whole slide area in pixels, and it makes sense to keep all of our measurements in pixels rather than converting only area to microns).
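As a rough illustration of items 1, 2, and 4, the drop list could be expressed as a name-based filter. This is only a sketch: the substrings below (`detection_probability`, `area_um`, `whole_slide_label`, and so on) are hypothetical stand-ins, and the actual feature names produced by `extract_tile_shape_features` may differ.

```python
# Hypothetical name fragments for the features listed above; the real
# column names in the extraction output may be spelled differently.
DROP_SUBSTRINGS = (
    "detection_probability",
    "unclassified",
    "bbox",
    "centroid",
    "inertia_tensor",
    "moments",
    "area_um",
)
# Dropped only on an exact match, so other label-derived names are untouched.
DROP_EXACT = ("whole_slide_label",)


def keep_feature(name: str) -> bool:
    """Return True if a feature name is not on the drop list."""
    lowered = name.lower()
    if lowered in DROP_EXACT:
        return False
    return not any(fragment in lowered for fragment in DROP_SUBSTRINGS)


def filter_features(features: dict) -> dict:
    """Keep only the features that survive the drop list."""
    return {name: value for name, value in features.items() if keep_feature(name)}
```

Filtering by name at extraction time, rather than after the fact, keeps the unwanted columns out of every downstream file.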

I also think it makes sense to restructure the output format. Even though it'll be a little wide, it's more convenient for downstream use to have all of these features in a single row per slide, with the columns named `variable_parent_class`. That way, for an entire cohort, we can concat all of the single-row results into a dataframe suitable for downstream use in a scikit-learn or pytorch model.
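A minimal sketch of that layout, assuming each slide's features arrive as long-format records with hypothetical keys `variable`, `parent`, `class`, and `value` (the real intermediate format in luna may look different). Each slide is flattened into one wide row, and the rows are then aligned on a shared column set:

```python
def to_wide_row(slide_id, records):
    """Flatten long-format feature records into one wide row for a slide."""
    row = {"slide_id": slide_id}
    for rec in records:
        # Column naming follows the proposed variable_parent_class pattern.
        column = f"{rec['variable']}_{rec['parent']}_{rec['class']}"
        row[column] = rec["value"]
    return row


def build_cohort_table(slides):
    """Stack one wide row per slide, filling features a slide lacks with None."""
    rows = [to_wide_row(slide_id, recs) for slide_id, recs in slides.items()]
    columns = sorted({col for row in rows for col in row if col != "slide_id"})
    return [
        {"slide_id": row["slide_id"], **{col: row.get(col) for col in columns}}
        for row in rows
    ]
```

The resulting list of dicts can be passed to `pandas.DataFrame(...)` to get the cohort-level dataframe described above.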

@tomp tomp self-requested a review August 24, 2023 15:33
Contributor

@tomp tomp left a comment

Approved

raylim and others added 7 commits August 24, 2023 09:21
chore: 377, 375, 374, 379, 382 package versions

feat: extract_tile_shape_features: add options for limiting variables in final output, clean up output a bit

feat: add extra features to extract_tile_shape_features
This is just Armaan's original notebook, updated for the current
state of luna.  It still uses the CLIs.
1. Added function to pull output file names from metadata.yml files.
2. Cast Path objects to strings in a couple of places.
3. Check for both `aperio.MPP` and `openslide.mpp-x` in the slide
   image properties, and log a warning if neither is found.
4. Added new notebook to the tutorial docs
5. Renamed notebook to spatial_stats.ipynb
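For reference, the MPP check in item 3 might look roughly like the following. This is a sketch, not the code from the commit: the function name `get_slide_mpp` is hypothetical, the lookup order is an assumption, and `properties` stands for any dict-like mapping of slide image properties (such as the one OpenSlide exposes).

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger(__name__)

# Property keys named in the commit message; the order they are tried in
# is an assumption made for this sketch.
MPP_KEYS = ("aperio.MPP", "openslide.mpp-x")


def get_slide_mpp(properties: Mapping[str, str]) -> Optional[float]:
    """Return microns per pixel from slide properties, or None if absent."""
    for key in MPP_KEYS:
        value = properties.get(key)
        if value is not None:
            return float(value)
    logger.warning("No MPP found in slide properties (tried %s)", ", ".join(MPP_KEYS))
    return None
```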
This lets us run a docs server on one of the compute servers and view
the docs site on our laptops.
@raylim raylim merged commit 7f37682 into dev Aug 24, 2023
2 checks passed
3 participants