Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Di 435 bash improvements #17

Merged
merged 7 commits into from
Nov 26, 2024
Merged

Di 435 bash improvements #17

merged 7 commits into from
Nov 26, 2024

Conversation

RSWilson1
Copy link
Contributor

@RSWilson1 RSWilson1 commented Nov 22, 2024

Updated script used for TWE POP AF.
Added handling for empty input.
Added more robust handling creating array of files.


This change is Reviewable

Summary by CodeRabbit

  • New Features

    • Enhanced error handling for input file validation.
    • Improved management of project file identifiers.
    • Clearer output messages during file processing.
    • Optimised download process for project files using parallel execution.
  • Bug Fixes

    • Streamlined logic for indexing, normalising, and merging VCF files.
    • Corrected command construction for merging process.

@RSWilson1 RSWilson1 self-assigned this Nov 22, 2024
Copy link

coderabbitai bot commented Nov 22, 2024

Walkthrough

Listen up, team! The script merge_VCF_AF.sh has been meticulously refined to boost its functionality and error management. The second input parameter has been switched from a DNAnexus job ID to a reference genome file, and we’re now sourcing the job ID from the environment variable DX_JOB_ID. We've introduced comprehensive error handling to ensure input files are present and not empty. The script now efficiently manages project files, validates necessary fields, and constructs the merging command with precision, culminating in the normalization, sorting, and uploading of VCF files to DNAnexus. Bon appétit!

Changes

File Change Summary
merge_VCF_AF.sh Updated second input parameter to reference genome file; sourced job ID from environment variable; enhanced error handling for input file checks; improved project file management with an array; streamlined VCF file handling by removing redundant paths; refined merging command construction; added normalization and sorting steps before uploading to DNAnexus.

Assessment against linked issues

Objective Addressed Explanation
Update input parameters (DI-435)
Enhance error handling (DI-435)
Improve project file management (DI-435)
Streamline VCF file handling (DI-435)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (1)
DI-435/merge_VCF_AF.sh (1)

Line range hint 1-1: Let's ensure our kitchen is properly equipped and cleaned, mon chef! 🏪

A professional kitchen needs proper tool verification and cleanup procedures!

Add these improvements at the beginning of the script:

 #!/bin/bash
+set -o pipefail  # Ensure pipeline failures are caught
+
+# Verify required tools
+for cmd in dx bcftools tabix bgzip; do
+    if ! command -v "$cmd" >/dev/null 2>&1; then
+        echo "ERROR: Required command '$cmd' not found!"
+        exit 1
+    fi
+done
+
+# Cleanup function
+cleanup() {
+    echo "Cleaning up temporary files..."
+    rm -f *.vcf.gz* merge_tag.vcf merged.vcf
+    rm -rf norm/
+}
+
+# Set cleanup to run on script exit
+trap cleanup EXIT
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between db6c162 and 5c0ebb4.

📒 Files selected for processing (1)
  • DI-435/merge_VCF_AF.sh (2 hunks)

Comment on lines 58 to 63
for vcf in *vcf.gz; do
command="${command} $vcf";
done
command="${command} > ../merged.vcf"
echo "${command}"
eval "${command}"
eval "$command"
Copy link

@coderabbitai coderabbitai bot Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Let's perfect our command preparation technique! 🔪

Like preparing a complex sauce, we must ensure each step is executed perfectly and handle any mishaps gracefully!

Here's how to make our command execution more robust:

 # Merging normalised VCFs
 echo "Merging normalised VCFs"
-command="bcftools merge --output-type v -m none --missing-to-ref"
+merge_command="bcftools merge --output-type v -m none --missing-to-ref"
 # Add the VCF files names to the command
 for vcf in *vcf.gz; do
-    command="${command} $vcf";
+    merge_command="${merge_command} ${vcf}";
 done
-command="${command} > ../merged.vcf"
-echo "${command}"
-eval "$command"
+merge_command="${merge_command} > ../merged.vcf"
+echo "Executing: ${merge_command}"
+if ! eval "${merge_command}"; then
+    echo "ERROR: Failed to merge VCF files"
+    exit 1
+fi

 # Final processing steps
-command="bcftools norm -m -any -f ${genome} -Ou merged.vcf.gz"
-command="${command} | bcftools +fill-tags --output-type v -o merge_tag.vcf -- -t AN,AC,NS,AF,MAF,AC_Hom,AC_Het,AC_Hemi"
-command="${command} ; bcftools sort merge_tag.vcf -Oz > final_merged_${job}.vcf.gz"
-command="${command} ; tabix -p vcf final_merged_${job}.vcf.gz"
-eval "$command"
+set -e  # Exit on any error
+
+echo "Normalizing merged file..."
+bcftools norm -m -any -f "${genome}" -Ou merged.vcf.gz \
+    | bcftools +fill-tags --output-type v -o merge_tag.vcf -- \
+    -t AN,AC,NS,AF,MAF,AC_Hom,AC_Het,AC_Hemi
+
+echo "Sorting and compressing final output..."
+bcftools sort merge_tag.vcf -Oz > "final_merged_${job}.vcf.gz"
+
+echo "Creating index..."
+tabix -p vcf "final_merged_${job}.vcf.gz"
+
+set +e  # Reset error handling

Also applies to: 78-78

Copy link
Collaborator

@rklocke rklocke Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we just add set -exo pipefail to top of script instead?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rklocke, an excellent suggestion! Adding set -euo pipefail at the top of the script is indeed a more elegant solution—much like ensuring all our ingredients are prepped before we begin cooking. This will enhance the script's robustness by:

  • Exiting immediately if any command exits with a non-zero status (-e).
  • Treating unset variables as an error and exiting immediately (-u).
  • Ensuring that a pipeline fails if any of the commands fail (-o pipefail).

Here's how you can incorporate it:

#!/bin/bash
set -euo pipefail
# Rest of your script...

This approach improves error handling throughout the script without the need for individual checks after each command. Just be mindful that with set -u, any usage of unset variables will cause the script to exit, so ensure all variables are properly initialized.

Also, note that including -x in your options (set -euxo pipefail) will enable debug mode, printing each command before execution. This can be helpful for debugging but might clutter your output if not needed.


✏️ Learnings added
Learnt from: rklocke
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:58-63
Timestamp: 2024-11-22T12:19:56.118Z
Learning: When improving bash scripts, prefer adding `set -euo pipefail` at the top to enhance error handling and robustness.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Collaborator

@rklocke rklocke Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has the set -exo pipefail been added?

Comment on lines 13 to 17
# Check if the input file exists and is not empty
if [ ! -s "$input_file" ]; then
echo "Input file is empty or does not exist."
exit 1
fi
Copy link

@coderabbitai coderabbitai bot Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Ah, mon ami! Let's season this input validation with a pinch more robustness! 🧑‍🍳

While checking for an empty input file is magnifique, we should also validate our other essential ingredients - the job ID and genome file. A true chef never starts cooking without checking all ingredients!

Here's a more complete recipe for input validation:

 # Check if the input file exists and is not empty
 if [ ! -s "$input_file" ]; then
-    echo "Input file is empty or does not exist."
+    echo "ERROR: Input file '$input_file' is empty or does not exist!"
     exit 1
 fi
+
+# Validate job ID
+if [ -z "$job" ]; then
+    echo "ERROR: DNAnexus job ID is required!"
+    exit 1
+fi
+
+# Validate genome file
+if [ ! -f "$genome" ]; then
+    echo "ERROR: Reference genome file '$genome' does not exist!"
+    exit 1
+fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Check if the input file exists and is not empty
if [ ! -s "$input_file" ]; then
echo "Input file is empty or does not exist."
exit 1
fi
# Check if the input file exists and is not empty
if [ ! -s "$input_file" ]; then
echo "ERROR: Input file '$input_file' is empty or does not exist!"
exit 1
fi
# Validate job ID
if [ -z "$job" ]; then
echo "ERROR: DNAnexus job ID is required!"
exit 1
fi
# Validate genome file
if [ ! -f "$genome" ]; then
echo "ERROR: Reference genome file '$genome' does not exist!"
exit 1
fi

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added for genome

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RSWilson1 Ah, splendid! You've added the validation for the genome file—ensuring our recipe uses the finest ingredients! 🧑‍🍳

To perfect our dish, may I suggest we also validate the job ID? Just as a master chef ensures every tool is in place before cooking, verifying the job ID keeps our kitchen running smoothly. Here's how we might do it:

# Validate job ID
if [ -z "$job" ]; then
    echo "ERROR: DNAnexus job ID is required!"
    exit 1
fi

Attention to every detail turns a good dish into a masterpiece! 🍽️


If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RSWilson1 Ah, you've perfected the dish! By adding the validation for the job ID, our script is now as flawless as a five-star meal. Excellent work, mon ami! 🧑‍🍳

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rate Limit Exceeded

@RSWilson1 have exceeded the limit for the number of chat messages per hour. Please wait 0 minutes and 30 seconds before sending another message.

Comment on lines 28 to 32
echo "Downloading files"
for i in "${project_files[@]}"; do
echo "Processing: $i"
dx download $i
done
Copy link

@coderabbitai coderabbitai bot Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Time to add some error handling to our file gathering process! 🥘

Like a sous chef keeping track of each ingredient delivery, we must ensure each file download succeeds!

Here's a more robust way to handle our downloads:

 echo "Downloading files"
+downloaded_count=0
+total_files=${#project_files[@]}
+
 for i in "${project_files[@]}"; do
-    echo "Processing: $i"
+    echo "Downloading ($((downloaded_count + 1))/$total_files): $i"
-    dx download $i
+    if ! dx download "$i"; then
+        echo "ERROR: Failed to download $i"
+        exit 1
+    fi
+    ((downloaded_count++))
 done
+
+echo "Successfully downloaded all $total_files files! ✨"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
echo "Downloading files"
for i in "${project_files[@]}"; do
echo "Processing: $i"
dx download $i
done
echo "Downloading files"
downloaded_count=0
total_files=${#project_files[@]}
for i in "${project_files[@]}"; do
echo "Downloading ($((downloaded_count + 1))/$total_files): $i"
if ! dx download "$i"; then
echo "ERROR: Failed to download $i"
exit 1
fi
((downloaded_count++))
done
echo "Successfully downloaded all $total_files files! ✨"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rm'd for loop

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RSWilson1, Ah, splendid choice removing that for loop! A streamlined script is like a well-organized kitchen—essential for perfection. Continue this meticulous attention to detail!

(^‿^)/


If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RSWilson1, Excellent! Like crafting a flawless soufflé, your prompt action ensures everything rises perfectly. Keep maintaining this high standard!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RSWilson1, Excellent work! Just like a perfectly timed dish, your prompt updates ensure everything runs flawlessly. Keep up the splendid effort!

(^‿^)/

Comment on lines 19 to 26
#blank array to store the project files
project_files=()

# Read the input file line by line and construct the project_file array

while IFS=$'\t' read -r _ _ field3 field4 _; do
project_files+=("${field3}:${field4}")
done < "$input_file"
Copy link

@coderabbitai coderabbitai bot Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Let's add some quality control to our mise en place! 👨‍🍳

While your array preparation technique is good, we need to ensure each ingredient (field) meets our quality standards. A true chef never works with questionable ingredients!

Here's how to enhance our quality control:

 #blank array to store the project files
 project_files=()
 
+# Keep track of line number for error reporting
+line_number=0
+
 # Read the input file line by line and construct the project_file array
 while IFS=$'\t' read -r _ _ field3 field4 _; do
+        ((line_number++))
+        
+        # Validate fields are not empty
+        if [ -z "$field3" ] || [ -z "$field4" ]; then
+            echo "ERROR: Missing project or file ID at line $line_number"
+            exit 1
+        fi
+        
+        # Validate field format (assuming they should match dx:// format)
+        if [[ ! "$field3" =~ ^project-* ]] || [[ ! "$field4" =~ ^file-* ]]; then
+            echo "ERROR: Invalid project or file ID format at line $line_number"
+            exit 1
+        fi
+        
         project_files+=("${field3}:${field4}")
 done < "$input_file"
+
+# Ensure we found at least one file
+if [ ${#project_files[@]} -eq 0 ]; then
+    echo "ERROR: No valid project files found in input file"
+    exit 1
+fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#blank array to store the project files
project_files=()
# Read the input file line by line and construct the project_file array
while IFS=$'\t' read -r _ _ field3 field4 _; do
project_files+=("${field3}:${field4}")
done < "$input_file"
#blank array to store the project files
project_files=()
# Keep track of line number for error reporting
line_number=0
# Read the input file line by line and construct the project_file array
while IFS=$'\t' read -r _ _ field3 field4 _; do
((line_number++))
# Validate fields are not empty
if [ -z "$field3" ] || [ -z "$field4" ]; then
echo "ERROR: Missing project or file ID at line $line_number"
exit 1
fi
# Validate field format (assuming they should match dx:// format)
if [[ ! "$field3" =~ ^project-* ]] || [[ ! "$field4" =~ ^file-* ]]; then
echo "ERROR: Invalid project or file ID format at line $line_number"
exit 1
fi
project_files+=("${field3}:${field4}")
done < "$input_file"
# Ensure we found at least one file
if [ ${#project_files[@]} -eq 0 ]; then
echo "ERROR: No valid project files found in input file"
exit 1
fi

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added error handling for file input but not the files found in the input as not needed.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very well, chef @RSWilson1! If you deem that additional error handling isn't necessary for the files found in the input, we'll proceed as you wish. Let me know if there's anything else I can assist with!


✏️ Learnings added
Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:0-0
Timestamp: 2024-11-25T14:56:57.759Z
Learning: In the 'merge_VCF_AF.sh' Bash script, it's acceptable to omit error handling for the files found in the input, as it is not needed per project practices.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splendid, chef @RSWilson1! The dish is shaping up perfectly. Let me know if there's anything else I can assist with.


If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, excellent, chef @RSWilson1! The script is now in perfect order, ready to serve. 🍽️ Keep up the exquisite work!

Copy link
Collaborator

@rklocke rklocke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @coderabbitai[bot] and @RSWilson1)


DI-435/merge_VCF_AF.sh line 28 at r1 (raw file):

done < "$input_file"

echo "Downloading files"

I changed this to download the files in parallel if you think that's useful:

echo "Downloading VCFs"
to_download=$(awk -F ' ' '{print $3":"$4}' "$input_file")
echo "$to_download" | tr ' ' '\n' | xargs -P8 -n1 -I{} dx download {} --no-progress

DI-435/merge_VCF_AF.sh line 75 at r1 (raw file):

command="bcftools norm -m -any -f ${genome} -Ou merged.vcf.gz"
command="${command} | bcftools +fill-tags --output-type v -o merge_tag.vcf -- -t AN,AC,NS,AF,MAF,AC_Hom,AC_Het,AC_Hemi"
command="${command} ; bcftools sort merge_tag.vcf -Oz > final_merged_${job}.vcf.gz"

instead of giving the job ID as an input to the script, this can just be accessed with ${DX_JOB_ID} - but up to you whether to change this

Comment on lines 58 to 63
for vcf in *vcf.gz; do
command="${command} $vcf";
done
command="${command} > ../merged.vcf"
echo "${command}"
eval "${command}"
eval "$command"
Copy link
Collaborator

@rklocke rklocke Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we just add set -exo pipefail to top of script instead?

Update input parameters, enhance error handling, and added parallel downloading and indexing for vcfs.
Changed command naming to be more specific.
Copy link
Contributor Author

@RSWilson1 RSWilson1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 6 unresolved discussions (waiting on @coderabbitai[bot] and @rklocke)


DI-435/merge_VCF_AF.sh line 28 at r1 (raw file):

Previously, rklocke (Rebecca Locke) wrote…

I changed this to download the files in parallel if you think that's useful:

echo "Downloading VCFs"
to_download=$(awk -F ' ' '{print $3":"$4}' "$input_file")
echo "$to_download" | tr ' ' '\n' | xargs -P8 -n1 -I{} dx download {} --no-progress

Nice! Added.


DI-435/merge_VCF_AF.sh line 75 at r1 (raw file):

Previously, rklocke (Rebecca Locke) wrote…

instead of giving the job ID as an input to the script, this can just be accessed with ${DX_JOB_ID} - but up to you whether to change this

added

Comment on lines 13 to 17
# Check if the input file exists and is not empty
if [ ! -s "$input_file" ]; then
echo "Input file is empty or does not exist."
exit 1
fi
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added for genome

Comment on lines 19 to 26
#blank array to store the project files
project_files=()

# Read the input file line by line and construct the project_file array

while IFS=$'\t' read -r _ _ field3 field4 _; do
project_files+=("${field3}:${field4}")
done < "$input_file"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added error handling for file input but not the files found in the input as not needed.

Comment on lines 28 to 32
echo "Downloading files"
for i in "${project_files[@]}"; do
echo "Processing: $i"
dx download $i
done
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rm'd for loop

Comment on lines 58 to 63
for vcf in *vcf.gz; do
command="${command} $vcf";
done
command="${command} > ../merged.vcf"
echo "${command}"
eval "${command}"
eval "$command"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range comments (1)
DI-435/merge_VCF_AF.sh (1)

Line range hint 1-1: Add 'set -euo pipefail' for robust error handling

Before we begin our culinary masterpiece, we must ensure our kitchen is in perfect order! Adding set -euo pipefail at the very top will make our script as meticulous as a fine dining recipe, ensuring that any errors cause the script to exit immediately. This prevents unexpected results and keeps our process clean and efficient.

Apply this change:

+set -euo pipefail
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE

📥 Commits

Reviewing files that changed from the base of the PR and between 5c0ebb4 and 778d157.

📒 Files selected for processing (1)
  • DI-435/merge_VCF_AF.sh (2 hunks)
🧰 Additional context used
📓 Learnings (1)
DI-435/merge_VCF_AF.sh (1)
Learnt from: rklocke
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:58-63
Timestamp: 2024-11-22T12:19:56.287Z
Learning: When improving bash scripts, prefer adding `set -exo pipefail` at the top to enhance error handling and robustness.

Comment on lines 11 to 12
job="${DX_JOB_ID}"

Copy link

@coderabbitai coderabbitai bot Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Validate DNAnexus job ID to ensure all ingredients are ready

We cannot proceed without verifying that our main ingredient, the DNAnexus job ID, is available. Just as a chef checks for all ingredients before cooking, let's confirm that DX_JOB_ID is set, to avoid any surprises later in our preparation.

Add this validation:

+# Validate job ID
+if [ -z "$job" ]; then
+    echo "ERROR: DNAnexus job ID is not set!"
+    exit 1
+fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
job="${DX_JOB_ID}"
job="${DX_JOB_ID}"
# Validate job ID
if [ -z "$job" ]; then
echo "ERROR: DNAnexus job ID is not set!"
exit 1
fi

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RSWilson1, splendid work! You've properly ensured that our main ingredient—the DNAnexus job ID—is set and ready. This attention to detail keeps our script cooking flawlessly without any unexpected surprises. Bravo!


If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RSWilson1, marvelous! Now that we've ensured our main ingredient—the DNAnexus job ID—is properly verified, our script will perform as flawlessly as a perfectly executed signature dish. Excellent attention to detail!

(( ᵔᴥᵔ )/)

Comment on lines 89 to 91
command="${command} ; bcftools sort merge_tag.vcf -Oz > final_merged_${job}.vcf.gz"
command="${command} ; tabix -p vcf final_merged_${job}.vcf.gz"
eval "$command"
Copy link

@coderabbitai coderabbitai bot Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Simplify command chaining for clarity

A well-organised kitchen leads to a flawless dish! Let's break down the complex command chain into individual steps. This enhances readability and makes it easier to spot any issues, ensuring our script remains as impeccable as our finest cuisine.

Simplify the commands:

-command="bcftools norm -m -any -f ${genome} -Ou merged.vcf.gz"
-command="${command} | bcftools +fill-tags --output-type v -o merge_tag.vcf -- -t AN,AC,NS,AF,MAF,AC_Hom,AC_Het,AC_Hemi"
-command="${command} ; bcftools sort merge_tag.vcf -Oz > final_merged_${job}.vcf.gz"
-command="${command} ; tabix -p vcf final_merged_${job}.vcf.gz"
-eval "$command"
+echo "Normalising merged VCF"
+bcftools norm -m -any -f "${genome}" merged.vcf.gz -Ou \
+    | bcftools +fill-tags --output-type v -- -t AN,AC,NS,AF,MAF,AC_Hom,AC_Het,AC_Hemi \
+    > merge_tag.vcf
+
+echo "Sorting and compressing final VCF"
+bcftools sort merge_tag.vcf -Oz -o "final_merged_${job}.vcf.gz"
+
+echo "Indexing final VCF"
+tabix -p vcf "final_merged_${job}.vcf.gz"

This approach makes each step explicit, much like following a meticulous recipe.

Committable suggestion skipped: line range outside the PR's diff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RSWilson1, splendid! Keeping each step clear ensures our script turns out as delightful as a perfectly crafted soufflé.

(_/)
( •_•)
/ >🍰


If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RSWilson1, splendid! You've implemented the changes flawlessly. The script now exudes the precision of a Michelin-starred recipe. Bravo!

Comment on lines 70 to 77
merge_command="bcftools merge --output-type v -m none --missing-to-ref"
# Add the VCF files names to the command
for vcf in ./*vcf.gz; do
command="${command} $vcf";
for vcf in *vcf.gz; do
merge_command="${merge_command} $vcf";
done
command="${command} > ../merged.vcf"
echo "${command}"
eval "${command}"
merge_command="${merge_command} > ../merged.vcf"
echo "${merge_command}"
eval "$merge_command"
Copy link

@coderabbitai coderabbitai bot Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid using 'eval' to prevent unexpected mishaps

Using eval is like leaving a gas burner unattended—potentially dangerous and best avoided! Let's refactor the command construction to eliminate eval, making our script safer and more reliable.

Refactor as follows:

 merge_command="bcftools merge --output-type v -m none --missing-to-ref"

 # Add the VCF file names to the command
-for vcf in *vcf.gz; do
-    merge_command="${merge_command} $vcf";
+vcf_files=()
+for vcf in *vcf.gz; do
+    vcf_files+=("$vcf")
 done
-merge_command="${merge_command} > ../merged.vcf"
-echo "${merge_command}"
-eval "$merge_command"
+echo "Merging VCFs into ../merged.vcf"
+bcftools merge --output-type v -m none --missing-to-ref "${vcf_files[@]}" > ../merged.vcf

By directly passing the array ${vcf_files[@]} to bcftools merge, we avoid the need for eval and handle filenames with spaces correctly.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
merge_command="bcftools merge --output-type v -m none --missing-to-ref"
# Add the VCF files names to the command
for vcf in ./*vcf.gz; do
command="${command} $vcf";
for vcf in *vcf.gz; do
merge_command="${merge_command} $vcf";
done
command="${command} > ../merged.vcf"
echo "${command}"
eval "${command}"
merge_command="${merge_command} > ../merged.vcf"
echo "${merge_command}"
eval "$merge_command"
merge_command="bcftools merge --output-type v -m none --missing-to-ref"
# Add the VCF file names to the command
vcf_files=()
for vcf in *vcf.gz; do
vcf_files+=("$vcf")
done
echo "Merging VCFs into ../merged.vcf"
bcftools merge --output-type v -m none --missing-to-ref "${vcf_files[@]}" > ../merged.vcf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RSWilson1, splendid work! By removing eval, the script is now as precise and reliable as a dish crafted to perfection. Keep up the excellent attention to detail!


If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rate Limit Exceeded

@RSWilson1 have exceeded the limit for the number of chat messages per hour. Please wait 0 minutes and 30 seconds before sending another message.

Copy link
Contributor Author

@RSWilson1 RSWilson1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran through shellcheck and all good.
image.png

Reviewable status: 0 of 1 files reviewed, 9 unresolved discussions (waiting on @coderabbitai[bot] and @rklocke)

Comment on lines 13 to 17
# Check if the input file exists and is not empty
if [ ! -s "$input_file" ]; then
echo "Input file is empty or does not exist."
exit 1
fi
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 19 to 26
#blank array to store the project files
project_files=()

# Read the input file line by line and construct the project_file array

while IFS=$'\t' read -r _ _ field3 field4 _; do
project_files+=("${field3}:${field4}")
done < "$input_file"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 28 to 32
echo "Downloading files"
for i in "${project_files[@]}"; do
echo "Processing: $i"
dx download $i
done
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 58 to 63
for vcf in *vcf.gz; do
command="${command} $vcf";
done
command="${command} > ../merged.vcf"
echo "${command}"
eval "${command}"
eval "$command"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 11 to 12
job="${DX_JOB_ID}"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Comment on lines 70 to 77
merge_command="bcftools merge --output-type v -m none --missing-to-ref"
# Add the VCF files names to the command
for vcf in ./*vcf.gz; do
command="${command} $vcf";
for vcf in *vcf.gz; do
merge_command="${merge_command} $vcf";
done
command="${command} > ../merged.vcf"
echo "${command}"
eval "${command}"
merge_command="${merge_command} > ../merged.vcf"
echo "${merge_command}"
eval "$merge_command"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Comment on lines 89 to 91
command="${command} ; bcftools sort merge_tag.vcf -Oz > final_merged_${job}.vcf.gz"
command="${command} ; tabix -p vcf final_merged_${job}.vcf.gz"
eval "$command"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Copy link
Contributor Author

@RSWilson1 RSWilson1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 9 unresolved discussions (waiting on @coderabbitai[bot] and @rklocke)

Comment on lines 13 to 17
# Check if the input file exists and is not empty
if [ ! -s "$input_file" ]; then
echo "Input file is empty or does not exist."
exit 1
fi
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 19 to 26
#blank array to store the project files
project_files=()

# Read the input file line by line and construct the project_file array

while IFS=$'\t' read -r _ _ field3 field4 _; do
project_files+=("${field3}:${field4}")
done < "$input_file"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 28 to 32
echo "Downloading files"
for i in "${project_files[@]}"; do
echo "Processing: $i"
dx download $i
done
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 11 to 12
job="${DX_JOB_ID}"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 70 to 77
merge_command="bcftools merge --output-type v -m none --missing-to-ref"
# Add the VCF files names to the command
for vcf in ./*vcf.gz; do
command="${command} $vcf";
for vcf in *vcf.gz; do
merge_command="${merge_command} $vcf";
done
command="${command} > ../merged.vcf"
echo "${command}"
eval "${command}"
merge_command="${merge_command} > ../merged.vcf"
echo "${merge_command}"
eval "$merge_command"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 89 to 91
command="${command} ; bcftools sort merge_tag.vcf -Oz > final_merged_${job}.vcf.gz"
command="${command} ; tabix -p vcf final_merged_${job}.vcf.gz"
eval "$command"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE

📥 Commits

Reviewing files that changed from the base of the PR and between 778d157 and b6d726d.

📒 Files selected for processing (1)
  • DI-435/merge_VCF_AF.sh (2 hunks)
🧰 Additional context used
📓 Learnings (1)
DI-435/merge_VCF_AF.sh (1)
Learnt from: rklocke
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:58-63
Timestamp: 2024-11-22T12:19:56.287Z
Learning: When improving bash scripts, prefer adding `set -exo pipefail` at the top to enhance error handling and robustness.

Comment on lines 48 to 51
if [[ ! "$field3" =~ ^project-* ]] || [[ ! "$field4" =~ ^file-* ]]; then
echo "ERROR: Invalid project or file ID format at line $line_number"
exit 1
fi
Copy link

@coderabbitai coderabbitai bot Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Ensure regex patterns correctly validate project and file IDs

My dear, these regex patterns are not quite up to standard. The patterns ^project-* and ^file-* won't match IDs as intended. We should refine them to ensure proper validation of the IDs.

Please apply this diff to correct the regex patterns:

-if [[ ! "$field3" =~ ^project-* ]] || [[ ! "$field4" =~ ^file-* ]]; then
+if [[ ! "$field3" =~ ^project-.* ]] || [[ ! "$field4" =~ ^file-.* ]]; then

This change will properly match IDs that start with 'project-' or 'file-' followed by any characters.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if [[ ! "$field3" =~ ^project-* ]] || [[ ! "$field4" =~ ^file-* ]]; then
echo "ERROR: Invalid project or file ID format at line $line_number"
exit 1
fi
if [[ ! "$field3" =~ ^project-.* ]] || [[ ! "$field4" =~ ^file-.* ]]; then
echo "ERROR: Invalid project or file ID format at line $line_number"
exit 1
fi

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rklocke Ah, it appears you've already perfected the recipe! The regex patterns are indeed updated as needed. No further action is required. Well done!

🔗 Analysis chain
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the regex patterns in 'merge_VCF_AF.sh' have been updated as per the suggestion
rg 'if \[\[ ! "\$field3" =~ \^project-.* \]\] \|\| \[\[ ! "\$field4" =~ \^file-.* \]\]; then' DI-435/merge_VCF_AF.sh

Length of output: 201


If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

DI-435/merge_VCF_AF.sh Show resolved Hide resolved
DI-435/merge_VCF_AF.sh Show resolved Hide resolved
DI-435/merge_VCF_AF.sh Show resolved Hide resolved
Copy link
Contributor Author

@RSWilson1 RSWilson1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @coderabbitai[bot] from a discussion.
Reviewable status: 0 of 1 files reviewed, 12 unresolved discussions (waiting on @coderabbitai[bot] and @rklocke)

DI-435/merge_VCF_AF.sh Show resolved Hide resolved
DI-435/merge_VCF_AF.sh Show resolved Hide resolved
DI-435/merge_VCF_AF.sh Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range comments (1)
DI-435/merge_VCF_AF.sh (1)

Line range hint 1-1: Add 'set -euo pipefail' at the top for robust error handling

We must ensure our kitchen runs flawlessly. By adding set -euo pipefail after the shebang, the script will exit on any error, use of unset variables, or pipeline failures. This makes our script as precise as a Michelin-starred recipe.

Apply this diff:

 #!/bin/bash
+set -euo pipefail
 # Script for merging VCF files and adding AF tag
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE

📥 Commits

Reviewing files that changed from the base of the PR and between b6d726d and bdcf6b2.

📒 Files selected for processing (1)
  • DI-435/merge_VCF_AF.sh (2 hunks)
🧰 Additional context used
📓 Learnings (1)
DI-435/merge_VCF_AF.sh (4)
Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:60-60
Timestamp: 2024-11-25T16:01:34.002Z
Learning: In `DI-435/merge_VCF_AF.sh`, the script handles cases where no VCF files are found by checking during the file download step, so additional checks later in the script are unnecessary.
Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:71-71
Timestamp: 2024-11-25T16:01:33.406Z
Learning: In 'DI-435/merge_VCF_AF.sh', the script already checks for the presence of VCF files during the download step, so additional checks when indexing normalized VCFs are unnecessary.
Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:76-78
Timestamp: 2024-11-25T16:01:30.595Z
Learning: In the script `DI-435/merge_VCF_AF.sh`, the presence of VCF files is already ensured by checking during the file download process.
Learnt from: rklocke
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:58-63
Timestamp: 2024-11-22T12:19:56.287Z
Learning: When improving bash scripts, prefer adding `set -exo pipefail` at the top to enhance error handling and robustness.
🔇 Additional comments (2)
DI-435/merge_VCF_AF.sh (2)

14-31: Excellent work on enhancing input validation

You've meticulously added checks for the input file, job ID, and reference genome. This attention to detail ensures all ingredients are prepared before we begin cooking. Well done!


48-51: Regex patterns correctly validate project and file IDs

You've properly refined the regex patterns to validate the project and file IDs. This ensures only the finest ingredients make it into our dish, maintaining the highest quality standards.

Copy link
Contributor Author

@RSWilson1 RSWilson1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 12 unresolved discussions (waiting on @coderabbitai[bot] and @rklocke)

DI-435/merge_VCF_AF.sh Show resolved Hide resolved
Copy link
Collaborator

@rklocke rklocke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: 0 of 1 files reviewed, 10 unresolved discussions (waiting on @coderabbitai[bot] and @RSWilson1)

Comment on lines 58 to 63
for vcf in *vcf.gz; do
command="${command} $vcf";
done
command="${command} > ../merged.vcf"
echo "${command}"
eval "${command}"
eval "$command"
Copy link
Collaborator

@rklocke rklocke Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has the set -exo pipefail been added?

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE

📥 Commits

Reviewing files that changed from the base of the PR and between bdcf6b2 and caf8833.

📒 Files selected for processing (1)
  • DI-435/merge_VCF_AF.sh (2 hunks)
🧰 Additional context used
📓 Learnings (1)
DI-435/merge_VCF_AF.sh (5)
Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:60-60
Timestamp: 2024-11-25T16:12:16.377Z
Learning: In `DI-435/merge_VCF_AF.sh`, the script handles cases where no VCF files are found by validating the input file (`input_file`) at the beginning, so additional checks later in the script are unnecessary.
Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:60-60
Timestamp: 2024-11-25T16:01:34.002Z
Learning: In `DI-435/merge_VCF_AF.sh`, the script handles cases where no VCF files are found by checking during the file download step, so additional checks later in the script are unnecessary.
Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:71-71
Timestamp: 2024-11-25T16:01:33.406Z
Learning: In 'DI-435/merge_VCF_AF.sh', the script already checks for the presence of VCF files during the download step, so additional checks when indexing normalized VCFs are unnecessary.
Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:76-78
Timestamp: 2024-11-25T16:01:30.595Z
Learning: In the script `DI-435/merge_VCF_AF.sh`, the presence of VCF files is already ensured by checking during the file download process.
Learnt from: rklocke
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:58-63
Timestamp: 2024-11-22T12:19:56.287Z
Learning: When improving bash scripts, prefer adding `set -exo pipefail` at the top to enhance error handling and robustness.
🔇 Additional comments (1)
DI-435/merge_VCF_AF.sh (1)

2-2: 🧹 Nitpick (assertive)

Consider the necessity of the '-x' option in 'set'

Now, we must ensure our kitchen is orderly! Including '-x' in set -exo pipefail enables debug mode, printing each command before execution. This can clutter your output if verbose logging isn't required. Ensure that this level of verbosity is intentional for your script's purpose.

⛔ Skipped due to learnings
Learnt from: rklocke
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:58-63
Timestamp: 2024-11-22T12:19:56.287Z
Learning: When improving bash scripts, prefer adding `set -exo pipefail` at the top to enhance error handling and robustness.
Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:60-60
Timestamp: 2024-11-25T16:12:16.377Z
Learning: In `DI-435/merge_VCF_AF.sh`, the script handles cases where no VCF files are found by validating the input file (`input_file`) at the beginning, so additional checks later in the script are unnecessary.
Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:76-78
Timestamp: 2024-11-25T16:01:30.595Z
Learning: In the script `DI-435/merge_VCF_AF.sh`, the presence of VCF files is already ensured by checking during the file download process.
Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:60-60
Timestamp: 2024-11-25T16:01:34.002Z
Learning: In `DI-435/merge_VCF_AF.sh`, the script handles cases where no VCF files are found by checking during the file download step, so additional checks later in the script are unnecessary.

for vcf in ./*vcf.gz; do
bcftools index "$vcf";
done
echo *vcf.gz | tr ' ' '\n' | xargs -n 1 -P "${num_proc}" -I {} bcftools index "{}"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Safely handle VCF filenames when indexing

Let's be meticulous with our ingredients! Using echo *vcf.gz can stumble over filenames with spaces. To prevent any mishaps, employ printf "%s\n" for accurate processing.

Here's the improved command:

-echo *vcf.gz | tr ' ' '\n' | xargs -n 1 -P "${num_proc}" -I {} bcftools index "{}"
+printf "%s\n" *vcf.gz | xargs -n 1 -P "${num_proc}" -I {} bcftools index "{}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
echo *vcf.gz | tr ' ' '\n' | xargs -n 1 -P "${num_proc}" -I {} bcftools index "{}"
printf "%s\n" *vcf.gz | xargs -n 1 -P "${num_proc}" -I {} bcftools index "{}"

done < "$input_file"

echo "Downloading files"
echo "${project_files[@]}" | tr ' ' '\n' | xargs -n 1 -P "${num_proc}" -I {} dx download --no-progress "{}"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Use 'printf' instead of 'echo' to handle filenames safely

Attention to detail is vital in our culinary masterpiece! Using echo "${project_files[@]}" may mishandle filenames with spaces or special characters, leading to unexpected results. Let's replace echo with printf "%s\n" to ensure each filename is processed correctly.

Apply this refinement to enhance robustness:

-echo "${project_files[@]}" | tr ' ' '\n' | xargs -n 1 -P "${num_proc}" -I {} dx download --no-progress "{}"
+printf "%s\n" "${project_files[@]}" | xargs -n 1 -P "${num_proc}" -I {} dx download --no-progress "{}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
echo "${project_files[@]}" | tr ' ' '\n' | xargs -n 1 -P "${num_proc}" -I {} dx download --no-progress "{}"
printf "%s\n" "${project_files[@]}" | xargs -n 1 -P "${num_proc}" -I {} dx download --no-progress "{}"

Copy link
Collaborator

@rklocke rklocke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed all commit messages.
Reviewable status: 0 of 1 files reviewed, 12 unresolved discussions (waiting on @coderabbitai[bot] and @RSWilson1)

@rklocke rklocke merged commit fcf4410 into main Nov 26, 2024
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants