Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert output of cross-product bridge to be used in downstream functions #472

Merged
merged 14 commits into from
Jan 24, 2025

Conversation

dtopouza
Copy link
Contributor

Title: Convert output of cross-product bridge to be used in downstream functions

Problem: Output of olink_normalization_product is not suitable for input to downstream OA functions

Solution: Added a function to format cross-product bridging output for downstream analysis, and added an argument to olink_normalization to toggle it.

Key Features:

  • olink_normalization_product_format function

    • Input is olink_normalization_product result
    • Removes internal and external controls
    • Replaces SampleID with the concatenation of SampleID and Project to make unique sample IDs for downstream analysis
    • Removes non-bridgeable assays
    • Replaces the NPX values of the non-reference project by the Median Centered or QS Normalized NPX, according to the Bridging Recommendation.
    • Replaces OlinkID by the concatenation of the Explore HT and Explore 3072 OlinkIDs to record the OlinkIDs from both projects.
    • Removes BridgingRecommendation, MedianCenteredNPX, QSNormalizedNPX, OlinkID_E3072 columns
  • olink_normalization function

    • Adds the format argument, a Boolean that controls whether the normalized dataset will be formatted for input to downstream analysis. Only applicable for cross-product bridge normalization.

Checklist

  • Make sure you are requesting to pull a feature/bugfix branch (right side). This should not be main or develop.
  • Make sure you are make a pull request against either main or develop (left side). (Requesting to main should be reserved for bugfixes and new releases)
  • Add or update unit tests (if applicable)
  • Check your code with any unit tests (Run devtools::check() locally)
  • Add neccessary documentation (if applicable)

Type of changes

What type of changes does your code introduce?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue) (link the issue on the right)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Code style update (formatting, renaming)
  • Documentation Update
  • Other (explain)

dtopouza and others added 6 commits November 12, 2024 16:52
…k_normalization_product_format is run. Add argument to olink_normalization to toggle whether formatting is applied.
…k-Proteomics/OlinkRPackage into develop_product_bridging_downstream
dplyr::mutate(SampleID = paste0(.data[["SampleID"]],
"_",
.data[["Project"]])) |>
dplyr::filter(.data[["BridgingRecommendation"]] != "NotBridgeable") |>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removes NotBridgeable datapoints as expected per PR notes.

Question: do we want to remove these? Or can we keep the NPX as is in these cases and keep these lines in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NotBridgeable assays don't have a recommendation for MedianCenteredNPX or QSNormalizedNPX, I'm not sure how to include them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kathy suggested to keep them with the 3k NPX values and 3k OlinkID and the HT ones with just the HT OlinkID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not get this to work - downstream tests like t.test would throw an error, since the not bridgeable assay HT and 3072 OlinkIDs are only present in some samples and not others.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved, NotBridgeable and NotOverlapping assays are retained

OlinkAnalyze/R/olink_normalization_product.R Show resolved Hide resolved
OlinkAnalyze/R/olink_normalization_product.R Show resolved Hide resolved
.data[["BridgingRecommendation"]] == "QuantileSmoothing" ~
.data[["QSNormalizedNPX"]],
.default = .data[["NPX"]])) |>
dplyr::filter(.data[["AssayType"]] == "assay") |>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removes internal controls as appropriate. Looks good.


### Keep the data following BridgingRecommendation
df_format <- df |>
dplyr::filter(.data[["SampleType"]] == "SAMPLE") |> # Remove controls
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removes external controls as appropriate. Looks good.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to leave in SCs? Might be a more complicated conversation that Kathy, Amrita, and I have touched base on. May be added functionality for a later release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SCs have NA values for their QSNormalizedNPX, not sure how to include them. We would have to change the QS normalization function as well - may need to consult Kathy/Klev

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kathy suggested that we keep SCs out for now, and potentially include them in a future release once we update the QC normalization function

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted, the current code reflects that external controls, including SCs, are removed from the formatted DF. Looks good.

OlinkAnalyze/R/olink_normalization_product.R Show resolved Hide resolved
dplyr::mutate(OlinkID = paste0(.data[["OlinkID"]],
"_",
.data[["OlinkID_E3072"]])) |>
dplyr::select(!c(.data[["BridgingRecommendation"]],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bridging rec, median normalized NPX, QS normalized NPX, and 3K OID columns removed as appropriate. HT OID column replaced with concatenated HT OID and 3K OID. Looks good.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All tests in this file (old and new) passed on my end - great job!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All tests (old and new) have once again passed, looks good.

.data[["QSNormalizedNPX"]],
.data[["OlinkID_E3072"]]))# Remove extra columns

return(df_format)
df_full <- rbind(df_format,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion / optional change: does it make sense to arrange the combined dataframe by Project or Project-SampleID? Not sure if this is super important to implement or something we could push to a later release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

#' OlinkID by the concatenation of the Explore HT and Explore 3072 OlinkIDs to
#' record the OlinkIDs from both projects for bridgeable assays. Assays that are
#' NotBridgeable or NotOverlapping retain their original non-reference OlinkIDs
#' and NPX values. Replaces SampleID with the concatenation of SampleID and
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that we reincorporate the NotOverlapping datapoints! I can make a mention of this in the 3K-HT vignette. No action needed on your end.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed that NotOverlapping assays are incorporated into the formatted DF as expected, looks good.

@dtopouza dtopouza merged commit 2f285ae into develop Jan 24, 2025
14 of 16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants