-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Enrichment and pathway scatter-plots #136
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThis pull request updates the environment dependencies and extends visualization functionalities in the workflow. The Changes
Sequence Diagram(s)sequenceDiagram
participant R as Snakemake Rule
participant E as Conda Environment (pystats.yaml)
participant S as enrichment_pathway_scatter.py
participant O as Scatter Plot JSON
R->>E: Activate environment
E->>S: Execute scatter plot generation (TSV input)
S->>O: Produce JSON output
Suggested reviewers
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
workflow/scripts/enrichment_pathway_scatter.py (1)
60-61
: Remove unnecessary f-string.The f-string prefix is used but there are no placeholders in the string.
- color=f"{name}", + color=name,🧰 Tools
🪛 Ruff (0.8.2)
61-61: f-string without any placeholders
Remove extraneous
f
prefix(F541)
workflow/envs/pystats.yaml (1)
9-9
: Add newline at end of file.YAML files should end with a newline character.
Add a newline after the last line.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 9-9: no new line character at the end of file
(new-line-at-end-of-file)
workflow/resources/datavzrd/spia-template.yaml (2)
122-127
: New Scatter Plot View: Validate Configuration and Correct Typo.
The newpathways_scatter
view is successfully added to enable an interactive scatter plot visualization based on the SPIA table dataset. The configuration—including the dataset assignment and therender-plot
spec-path—is consistent with the intended purpose.
Minor Suggestion: In the description, consider correcting the typographical error “pertubation” to “perturbation” for clarity.
128-128
: Clean Up Extra Blank Line.
YAMLlint flagged an extra blank line on line 128. Please remove this unnecessary blank line to adhere to file formatting guidelines.🧰 Tools
🪛 YAMLlint (1.35.1)
[warning] 128-128: too many blank lines
(1 > 0) (empty-lines)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
workflow/envs/pystats.yaml
(1 hunks)workflow/resources/datavzrd/go-enrichment-template.yaml
(1 hunks)workflow/resources/datavzrd/spia-template.yaml
(1 hunks)workflow/rules/datavzrd.smk
(3 hunks)workflow/scripts/enrichment_pathway_scatter.py
(1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.35.1)
workflow/envs/pystats.yaml
[error] 9-9: no new line character at the end of file
(new-line-at-end-of-file)
workflow/resources/datavzrd/spia-template.yaml
[warning] 128-128: too many blank lines
(1 > 0) (empty-lines)
🪛 Ruff (0.8.2)
workflow/scripts/enrichment_pathway_scatter.py
61-61: f-string without any placeholders
Remove extraneous f
prefix
(F541)
121-121: Undefined name snakemake
(F821)
123-123: Undefined name snakemake
(F821)
124-124: Undefined name snakemake
(F821)
125-125: Undefined name snakemake
(F821)
177-177: Undefined name snakemake
(F821)
🔇 Additional comments (6)
workflow/scripts/enrichment_pathway_scatter.py (3)
10-45
: LGTM! Well-designed visualization configuration.The plot configuration is well thought out with:
- Appropriate logarithmic scales for both axes
- Interactive point highlighting
- Clear tooltips with all relevant information
121-129
: LGTM! Clear and efficient data processing.The data processing logic is well-structured:
- Limiting to top 50 rows helps maintain visualization clarity
- Clear separation of positive and negative effects
🧰 Tools
🪛 Ruff (0.8.2)
121-121: Undefined name
snakemake
(F821)
123-123: Undefined name
snakemake
(F821)
124-124: Undefined name
snakemake
(F821)
125-125: Undefined name
snakemake
(F821)
133-177
: LGTM! Robust chart generation logic.The chart generation handles both single and dual chart cases well, with appropriate layout configuration and scale resolution.
🧰 Tools
🪛 Ruff (0.8.2)
177-177: Undefined name
snakemake
(F821)
workflow/rules/datavzrd.smk (1)
50-78
: LGTM! Well-structured new rules for scatter plot generation.The new rules are well-designed with:
- Clear input/output paths
- Appropriate conda environment specification
- Well-defined parameters for plot customization
workflow/envs/pystats.yaml (1)
7-9
: LGTM! Appropriate dependency updates for visualization features.The dependency updates are well-chosen:
- Altair 5.3.0 provides necessary visualization capabilities
- vl-convert-python is required for saving Altair charts
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 9-9: no new line character at the end of file
(new-line-at-end-of-file)
workflow/resources/datavzrd/go-enrichment-template.yaml (1)
136-141
: LGTM! Well-documented scatter plot view configuration.The new go_terms_scatter view is well-configured with:
- Clear description of the visualization
- Appropriate dataset reference
- Well-defined plot specification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
workflow/scripts/enrichment_pathway_scatter.py (1)
8-8
:⚠️ Potential issueAdd missing
name
parameter to function signature.The
name
parameter is used in the function but not defined in the signature.-def plot(df_plot, effect_x, effect_y, title, selector, color_scheme): +def plot(df_plot, effect_x, effect_y, title, selector, color_scheme, name):
🧹 Nitpick comments (7)
workflow/scripts/enrichment_pathway_scatter.py (7)
5-5
: Use a context manager for file handling.Use a context manager (
with
statement) to ensure proper file cleanup:-sys.stderr = open(snakemake.log[0], "w") +with open(snakemake.log[0], "w") as log_file: + sys.stderr = log_file🧰 Tools
🪛 Ruff (0.8.2)
5-5: Use a context manager for opening files
(SIM115)
5-5: Undefined name
snakemake
(F821)
9-10
: Consider performance implications and improve shape group calculation.
- The
disable_max_rows()
call could lead to memory issues with large datasets.- The shape group calculation could be more descriptive.
- alt.data_transformers.disable_max_rows() - df_plot["shape_group"] = df_plot.index % 5 # 5 verschiedene Formen + # Set a reasonable max rows limit instead of disabling + alt.data_transformers.enable(max_rows=10000) + # Use a named constant for clarity + NUM_SHAPES = 5 + df_plot["shape_group"] = df_plot.index % NUM_SHAPES
64-64
: Remove unnecessary f-string.The f-string doesn't contain any placeholders.
- shape=f"shape_group:N", + shape="shape_group:N",🧰 Tools
🪛 Ruff (0.8.2)
64-64: f-string without any placeholders
Remove extraneous
f
prefix(F541)
71-122
: Move color scheme to a configuration file.Consider moving the color scheme to a separate configuration file for better maintainability and reusability.
Create a new file
config/visualization.yaml
and move the color scheme there. Then load it using:with open("config/visualization.yaml") as f: config = yaml.safe_load(f) color_scheme = config["color_scheme"]
124-126
: Document data filtering rationale.Add comments explaining why only the top 50 entries are selected and how this might affect the visualization.
df = pd.read_csv(snakemake.input[0], sep="\t") +# Limit to top 50 entries to ensure readability of the scatter plot +# and prevent overplotting df = df.head(50)🧰 Tools
🪛 Ruff (0.8.2)
124-124: Undefined name
snakemake
(F821)
126-126: Undefined name
snakemake
(F821)
130-132
: Document data transformation rationale.Add comments explaining why negative values are transformed and how this affects interpretation.
+# Split data into positive and negative effects for separate visualization df_positive = df[df[f"{effect_x}"] >= 0] df_negative = df[df[f"{effect_x}"] < 0] +# Take absolute values of negative effects for log scale visualization df_negative[f"{effect_x}"] = df_negative[f"{effect_x}"].abs()
136-180
: Parameterize chart layout configuration.Consider moving the padding values and other layout configurations to parameters for better flexibility.
+# Define chart layout parameters +CHART_PADDING = {"left": 0, "top": 0, "right": 400, "bottom": 0} if df_negative.empty: scatter = plot(df_positive, effect_x, effect_y, "", point_selector, color_scheme) legend = plot_legend(df_positive.copy(), name, point_selector) chart = ( alt.hconcat( - scatter, legend, padding={"left": 0, "top": 0, "right": 400, "bottom": 0} + scatter, legend, padding=CHART_PADDING )🧰 Tools
🪛 Ruff (0.8.2)
180-180: Undefined name
snakemake
(F821)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
workflow/rules/datavzrd.smk
(3 hunks)workflow/scripts/enrichment_pathway_scatter.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
workflow/scripts/enrichment_pathway_scatter.py
5-5: Use a context manager for opening files
(SIM115)
5-5: Undefined name snakemake
(F821)
64-64: f-string without any placeholders
Remove extraneous f
prefix
(F541)
124-124: Undefined name snakemake
(F821)
126-126: Undefined name snakemake
(F821)
127-127: Undefined name snakemake
(F821)
128-128: Undefined name snakemake
(F821)
180-180: Undefined name snakemake
(F821)
🔇 Additional comments (3)
workflow/rules/datavzrd.smk (3)
50-64
: LGTM! Well-structured rule for enrichment scatter plots.The rule follows Snakemake best practices with clear input/output definitions and appropriate conda environment specification.
67-81
: LGTM! Well-structured rule for pathway scatter plots.The rule follows Snakemake best practices and maintains consistency with the enrichment scatter plot rule.
96-96
: LGTM! Proper integration of scatter plots in reports.The scatter plot outputs are correctly integrated into the existing report generation rules.
Also applies to: 159-159
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (7)
workflow/scripts/enrichment_pathway_scatter.py (6)
5-5
: Use context manager for file handling.Use a context manager (
with
statement) to ensure proper file cleanup:-sys.stderr = open(snakemake.log[0], "w") +with open(snakemake.log[0], "w") as log: + sys.stderr = log🧰 Tools
🪛 Ruff (0.8.2)
5-5: Use a context manager for opening files
(SIM115)
5-5: Undefined name
snakemake
(F821)
10-11
: Consider making shape configuration more flexible.The number of shapes is hardcoded to 5. Consider making this configurable:
- df_plot["shape_group"] = df_plot.index % 5 # 5 verschiedene Formen - shapes = ["triangle-up", "triangle-down", "square", "diamond", "cross"] + SHAPE_TYPES = ["triangle-up", "triangle-down", "square", "diamond", "cross"] + df_plot["shape_group"] = df_plot.index % len(SHAPE_TYPES)
60-60
: Consider making labelLimit configurable.The labelLimit is hardcoded to 700. Consider making this configurable based on the data:
- title="", ticks=False, domain=False, labelLimit=700, orient="right" + title="", ticks=False, domain=False, labelLimit=snakemake.params.get("label_limit", 700), orient="right"
71-122
: Consider moving color scheme to configuration.The color scheme is hardcoded in the script. Consider moving it to a configuration file for better maintainability and reusability:
-color_scheme = [ - "#1f78b4", - ... - "#d95f02", -] +with open(snakemake.params.get("color_scheme_path", "default_colors.json")) as f: + color_scheme = json.load(f)
124-125
: Make row limit configurable.The row limit is hardcoded to 50. Consider making this configurable:
-df = df.head(50) +df = df.head(snakemake.params.get("max_rows", 50))🧰 Tools
🪛 Ruff (0.8.2)
124-124: Undefined name
snakemake
(F821)
144-144
: Make padding configuration flexible.The padding values are hardcoded. Consider making them configurable:
- scatter, legend, padding={"left": 0, "top": 0, "right": 400, "bottom": 0} + scatter, legend, padding=snakemake.params.get("padding", {"left": 0, "top": 0, "right": 400, "bottom": 0})Also applies to: 184-184
workflow/envs/pystats.yaml (1)
10-10
: Fix YAML formatting issues.The file has formatting issues:
- Missing newline at end of file
- Trailing whitespace
- vl-convert-python=1.7.0 - + vl-convert-python=1.7.0🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 10-10: no new line character at the end of file
(new-line-at-end-of-file)
[error] 10-10: trailing spaces
(trailing-spaces)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
workflow/envs/pystats.yaml
(1 hunks)workflow/scripts/enrichment_pathway_scatter.py
(1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.35.1)
workflow/envs/pystats.yaml
[error] 10-10: no new line character at the end of file
(new-line-at-end-of-file)
[error] 10-10: trailing spaces
(trailing-spaces)
🪛 Ruff (0.8.2)
workflow/scripts/enrichment_pathway_scatter.py
5-5: Use a context manager for opening files
(SIM115)
5-5: Undefined name snakemake
(F821)
124-124: Undefined name snakemake
(F821)
126-126: Undefined name snakemake
(F821)
127-127: Undefined name snakemake
(F821)
128-128: Undefined name snakemake
(F821)
191-191: Undefined name snakemake
(F821)
🔇 Additional comments (2)
workflow/scripts/enrichment_pathway_scatter.py (1)
9-9
: Consider adding a max rows limit.Disabling the max rows limit could impact performance with large datasets. Consider setting an appropriate limit instead of disabling it completely.
workflow/envs/pystats.yaml (1)
7-9
: Verify compatibility of package versions.The updated Altair version and new vl-convert-python dependency should be verified for compatibility:
✅ Verification successful
🌐 Web query:
What is the recommended vl-convert-python version for Altair 5.3.0?
💡 Result:
The recommended `vl-convert-python` version for **Altair 5.3.0** is **1.1.0 or higher**. This requirement was introduced in Altair **5.2.0** to support features like offline HTML/PDF exports and Vega editor URL generation[1][8]. Since Altair **5.x** versions maintain backward compatibility with dependency specifications unless explicitly changed, this requirement persists in **5.3.x** releases[1][8]. ### Key Context from Search Results: - Altair **5.x** replaced `altair_saver` with `vl-convert-python` starting in **v5.2**[2][8]. - Version **1.x** of `vl-convert-python` introduced critical features like HTML/PDF export localization and Vega editor URL compression[1][3]. - The dependency requirement (`>=1.x`) isn't relaxed in subsequent minor releases like **5.x.x**, as no breaking changes were mentioned[1][8]. For optimal functionality with Altair **5.x**, ensure `vl-convert-python` is updated to at least **v1.x**. Citations: - 1: https://github.com/vega/altair/releases - 2: https://github.com/vega/altair/issues/2866 - 3: https://pypi.org/project/vl-convert-python/ - 4: https://formulae.brew.sh/cask/ - 5: https://altair-viz.github.io/altair-viz-v4/releases/changes.html - 6: https://pypi.org/project/altair/ - 7: https://altair-viz.github.io/user_guide/display_frontends.html - 8: https://data.safetycli.com/packages/pypi/altair/changelog - 9: https://github.com/altair-viz/altair_viewer - 10: https://gricad-pages.univ-grenoble-alpes.fr/galerat/open-research-software-monitor/-/blob/auth_mode/requirements.txt
Compatibility Verified: Altair 5.3.0 and vl-convert-python 1.7.0 are compatible.
- Altair 5.3.0 requires a vl-convert-python version of 1.1.0 or higher.
- The dependency specified in workflow/envs/pystats.yaml (vl-convert-python=1.7.0) meets this requirement.
- No compatibility issues were found with the given versions.
This PR adds new scatter plot views to the pathway and enrichment datavzrd reports:
Changes & Features
Summary by CodeRabbit
New Features
Chores