Di 435 bash improvements #17

RSWilson1 · 2024-11-22T09:55:35Z

Updated script used for TWE POP AF.
Added handling for empty input.
Added more robust handling creating array of files.

This change is

Summary by CodeRabbit

New Features
- Enhanced error handling for input file validation.
- Improved management of project file identifiers.
- Clearer output messages during file processing.
- Optimised download process for project files using parallel execution.
Bug Fixes
- Streamlined logic for indexing, normalising, and merging VCF files.
- Corrected command construction for merging process.

…g for empty input file.

coderabbitai · 2024-11-22T09:55:45Z

Walkthrough

Listen up, team! The script merge_VCF_AF.sh has been meticulously refined to boost its functionality and error management. The second input parameter has been switched from a DNAnexus job ID to a reference genome file, and we’re now sourcing the job ID from the environment variable DX_JOB_ID. We've introduced comprehensive error handling to ensure input files are present and not empty. The script now efficiently manages project files, validates necessary fields, and constructs the merging command with precision, culminating in the normalization, sorting, and uploading of VCF files to DNAnexus. Bon appétit!

Changes

File	Change Summary
merge_VCF_AF.sh	Updated second input parameter to reference genome file; sourced job ID from environment variable; enhanced error handling for input file checks; improved project file management with an array; streamlined VCF file handling by removing redundant paths; refined merging command construction; added normalization and sorting steps before uploading to DNAnexus.

Assessment against linked issues

Objective	Addressed	Explanation
Update input parameters (DI-435)	✅
Enhance error handling (DI-435)	✅
Improve project file management (DI-435)	✅
Streamline VCF file handling (DI-435)	✅

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (1)

DI-435/merge_VCF_AF.sh (1)

Line range hint 1-1: Let's ensure our kitchen is properly equipped and cleaned, mon chef! 🏪

A professional kitchen needs proper tool verification and cleanup procedures!

Add these improvements at the beginning of the script:

 #!/bin/bash
+set -o pipefail  # Ensure pipeline failures are caught
+
+# Verify required tools
+for cmd in dx bcftools tabix bgzip; do
+    if ! command -v "$cmd" >/dev/null 2>&1; then
+        echo "ERROR: Required command '$cmd' not found!"
+        exit 1
+    fi
+done
+
+# Cleanup function
+cleanup() {
+    echo "Cleaning up temporary files..."
+    rm -f *.vcf.gz* merge_tag.vcf merged.vcf
+    rm -rf norm/
+}
+
+# Set cleanup to run on script exit
+trap cleanup EXIT

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between db6c162 and 5c0ebb4.

📒 Files selected for processing (1)

DI-435/merge_VCF_AF.sh (2 hunks)

coderabbitai · 2024-11-22T09:57:50Z

DI-435/merge_VCF_AF.sh

+for vcf in *vcf.gz; do
    command="${command} $vcf";
 done
 command="${command} > ../merged.vcf"
 echo "${command}"
-eval "${command}"
+eval "$command"


🛠️ Refactor suggestion

Let's perfect our command preparation technique! 🔪

Like preparing a complex sauce, we must ensure each step is executed perfectly and handle any mishaps gracefully!

Here's how to make our command execution more robust:

# Merging normalised VCFs echo "Merging normalised VCFs" -command="bcftools merge --output-type v -m none --missing-to-ref" +merge_command="bcftools merge --output-type v -m none --missing-to-ref" # Add the VCF files names to the command for vcf in *vcf.gz; do - command="${command} $vcf"; + merge_command="${merge_command} ${vcf}"; done -command="${command} > ../merged.vcf" -echo "${command}" -eval "$command" +merge_command="${merge_command} > ../merged.vcf" +echo "Executing: ${merge_command}" +if ! eval "${merge_command}"; then + echo "ERROR: Failed to merge VCF files" + exit 1 +fi # Final processing steps -command="bcftools norm -m -any -f ${genome} -Ou merged.vcf.gz" -command="${command} | bcftools +fill-tags --output-type v -o merge_tag.vcf -- -t AN,AC,NS,AF,MAF,AC_Hom,AC_Het,AC_Hemi" -command="${command} ; bcftools sort merge_tag.vcf -Oz > final_merged_${job}.vcf.gz" -command="${command} ; tabix -p vcf final_merged_${job}.vcf.gz" -eval "$command" +set -e # Exit on any error + +echo "Normalizing merged file..." +bcftools norm -m -any -f "${genome}" -Ou merged.vcf.gz \ + | bcftools +fill-tags --output-type v -o merge_tag.vcf -- \ + -t AN,AC,NS,AF,MAF,AC_Hom,AC_Het,AC_Hemi + +echo "Sorting and compressing final output..." +bcftools sort merge_tag.vcf -Oz > "final_merged_${job}.vcf.gz" + +echo "Creating index..." +tabix -p vcf "final_merged_${job}.vcf.gz" + +set +e # Reset error handling

Also applies to: 78-78

could we just add set -exo pipefail to top of script instead?

@rklocke, an excellent suggestion! Adding set -euo pipefail at the top of the script is indeed a more elegant solution—much like ensuring all our ingredients are prepped before we begin cooking. This will enhance the script's robustness by:

Exiting immediately if any command exits with a non-zero status (-e).

Treating unset variables as an error and exiting immediately (-u).

Ensuring that a pipeline fails if any of the commands fail (-o pipefail).

Here's how you can incorporate it:

#!/bin/bash set -euo pipefail # Rest of your script...

This approach improves error handling throughout the script without the need for individual checks after each command. Just be mindful that with set -u, any usage of unset variables will cause the script to exit, so ensure all variables are properly initialized.

Also, note that including -x in your options (set -euxo pipefail) will enable debug mode, printing each command before execution. This can be helpful for debugging but might clutter your output if not needed.

✏️ Learnings added

Learnt from: rklocke PR: eastgenomics/RD_requests#17 File: DI-435/merge_VCF_AF.sh:58-63 Timestamp: 2024-11-22T12:19:56.118Z Learning: When improving bash scripts, prefer adding `set -euo pipefail` at the top to enhance error handling and robustness.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

has the set -exo pipefail been added?

coderabbitai · 2024-11-22T09:57:50Z

DI-435/merge_VCF_AF.sh

+# Check if the input file exists and is not empty
+if [ ! -s "$input_file" ]; then
+    echo "Input file is empty or does not exist."
+    exit 1
+fi


🛠️ Refactor suggestion

Ah, mon ami! Let's season this input validation with a pinch more robustness! 🧑‍🍳

While checking for an empty input file is magnifique, we should also validate our other essential ingredients - the job ID and genome file. A true chef never starts cooking without checking all ingredients!

Here's a more complete recipe for input validation:

# Check if the input file exists and is not empty if [ ! -s "$input_file" ]; then - echo "Input file is empty or does not exist." + echo "ERROR: Input file '$input_file' is empty or does not exist!" exit 1 fi + +# Validate job ID +if [ -z "$job" ]; then + echo "ERROR: DNAnexus job ID is required!" + exit 1 +fi + +# Validate genome file +if [ ! -f "$genome" ]; then + echo "ERROR: Reference genome file '$genome' does not exist!" + exit 1 +fi

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Check if the input file exists and is not empty

if [ ! -s "$input_file" ]; then

echo "Input file is empty or does not exist."

exit 1

fi

# Check if the input file exists and is not empty

if [ ! -s "$input_file" ]; then

echo "ERROR: Input file '$input_file' is empty or does not exist!"

exit 1

fi

# Validate job ID

if [ -z "$job" ]; then

echo "ERROR: DNAnexus job ID is required!"

exit 1

fi

# Validate genome file

if [ ! -f "$genome" ]; then

echo "ERROR: Reference genome file '$genome' does not exist!"

exit 1

fi

added for genome

@RSWilson1 Ah, splendid! You've added the validation for the genome file—ensuring our recipe uses the finest ingredients! 🧑‍🍳

To perfect our dish, may I suggest we also validate the job ID? Just as a master chef ensures every tool is in place before cooking, verifying the job ID keeps our kitchen running smoothly. Here's how we might do it:

# Validate job ID if [ -z "$job" ]; then echo "ERROR: DNAnexus job ID is required!" exit 1 fi

Attention to every detail turns a good dish into a masterpiece! 🍽️

If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

@RSWilson1 Ah, you've perfected the dish! By adding the validation for the job ID, our script is now as flawless as a five-star meal. Excellent work, mon ami! 🧑‍🍳

Rate Limit Exceeded

@RSWilson1 have exceeded the limit for the number of chat messages per hour. Please wait 0 minutes and 30 seconds before sending another message.

coderabbitai · 2024-11-22T09:57:50Z

DI-435/merge_VCF_AF.sh

+echo "Downloading files"
+for i in "${project_files[@]}"; do
+    echo "Processing: $i"
+    dx download $i
+done


🛠️ Refactor suggestion

Time to add some error handling to our file gathering process! 🥘

Like a sous chef keeping track of each ingredient delivery, we must ensure each file download succeeds!

Here's a more robust way to handle our downloads:

echo "Downloading files" +downloaded_count=0 +total_files=${#project_files[@]} + for i in "${project_files[@]}"; do - echo "Processing: $i" + echo "Downloading ($((downloaded_count + 1))/$total_files): $i" - dx download $i + if ! dx download "$i"; then + echo "ERROR: Failed to download $i" + exit 1 + fi + ((downloaded_count++)) done + +echo "Successfully downloaded all $total_files files! ✨"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

echo "Downloading files"

for i in "${project_files[@]}"; do

echo "Processing: $i"

dx download $i

done

echo "Downloading files"

downloaded_count=0

total_files=${#project_files[@]}

for i in "${project_files[@]}"; do

echo "Downloading ($((downloaded_count + 1))/$total_files): $i"

if ! dx download "$i"; then

echo "ERROR: Failed to download $i"

exit 1

fi

((downloaded_count++))

done

echo "Successfully downloaded all $total_files files! ✨"

rm'd for loop

@RSWilson1, Ah, splendid choice removing that for loop! A streamlined script is like a well-organized kitchen—essential for perfection. Continue this meticulous attention to detail!

(^‿^)/

If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

@RSWilson1, Excellent! Like crafting a flawless soufflé, your prompt action ensures everything rises perfectly. Keep maintaining this high standard!

@RSWilson1, Excellent work! Just like a perfectly timed dish, your prompt updates ensure everything runs flawlessly. Keep up the splendid effort!

(^‿^)/

coderabbitai · 2024-11-22T09:57:50Z

DI-435/merge_VCF_AF.sh

+#blank array to store the project files
+project_files=()
+
+# Read the input file line by line and construct the project_file array
+
+while IFS=$'\t' read -r _ _ field3 field4 _; do
+        project_files+=("${field3}:${field4}")
+done < "$input_file"


🛠️ Refactor suggestion

Let's add some quality control to our mise en place! 👨‍🍳

While your array preparation technique is good, we need to ensure each ingredient (field) meets our quality standards. A true chef never works with questionable ingredients!

Here's how to enhance our quality control:

#blank array to store the project files project_files=() +# Keep track of line number for error reporting +line_number=0 + # Read the input file line by line and construct the project_file array while IFS=$'\t' read -r _ _ field3 field4 _; do + ((line_number++)) + + # Validate fields are not empty + if [ -z "$field3" ] || [ -z "$field4" ]; then + echo "ERROR: Missing project or file ID at line $line_number" + exit 1 + fi + + # Validate field format (assuming they should match dx:// format) + if [[ ! "$field3" =~ ^project-* ]] || [[ ! "$field4" =~ ^file-* ]]; then + echo "ERROR: Invalid project or file ID format at line $line_number" + exit 1 + fi + project_files+=("${field3}:${field4}") done < "$input_file" + +# Ensure we found at least one file +if [ ${#project_files[@]} -eq 0 ]; then + echo "ERROR: No valid project files found in input file" + exit 1 +fi

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

#blank array to store the project files

project_files=()

# Read the input file line by line and construct the project_file array

while IFS=$'\t' read -r _ _ field3 field4 _; do

project_files+=("${field3}:${field4}")

done < "$input_file"

#blank array to store the project files

project_files=()

# Keep track of line number for error reporting

line_number=0

# Read the input file line by line and construct the project_file array

while IFS=$'\t' read -r _ _ field3 field4 _; do

((line_number++))

# Validate fields are not empty

if [ -z "$field3" ] || [ -z "$field4" ]; then

echo "ERROR: Missing project or file ID at line $line_number"

exit 1

fi

# Validate field format (assuming they should match dx:// format)

if [[ ! "$field3" =~ ^project-* ]] || [[ ! "$field4" =~ ^file-* ]]; then

echo "ERROR: Invalid project or file ID format at line $line_number"

exit 1

fi

project_files+=("${field3}:${field4}")

done < "$input_file"

# Ensure we found at least one file

if [ ${#project_files[@]} -eq 0 ]; then

echo "ERROR: No valid project files found in input file"

exit 1

fi

Added error handling for file input but not the files found in the input as not needed.

Very well, chef @RSWilson1! If you deem that additional error handling isn't necessary for the files found in the input, we'll proceed as you wish. Let me know if there's anything else I can assist with!

✏️ Learnings added

Learnt from: RSWilson1 PR: eastgenomics/RD_requests#17 File: DI-435/merge_VCF_AF.sh:0-0 Timestamp: 2024-11-25T14:56:57.759Z Learning: In the 'merge_VCF_AF.sh' Bash script, it's acceptable to omit error handling for the files found in the input, as it is not needed per project practices.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Splendid, chef @RSWilson1! The dish is shaping up perfectly. Let me know if there's anything else I can assist with.

If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

Ah, excellent, chef @RSWilson1! The script is now in perfect order, ready to serve. 🍽️ Keep up the exquisite work!

rklocke

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @coderabbitai[bot] and @RSWilson1)

DI-435/merge_VCF_AF.sh line 28 at r1 (raw file):

done < "$input_file"

echo "Downloading files"

I changed this to download the files in parallel if you think that's useful:

echo "Downloading VCFs"
to_download=$(awk -F ' ' '{print $3":"$4}' "$input_file")
echo "$to_download" | tr ' ' '\n' | xargs -P8 -n1 -I{} dx download {} --no-progress

DI-435/merge_VCF_AF.sh line 75 at r1 (raw file):

command="bcftools norm -m -any -f ${genome} -Ou merged.vcf.gz"
command="${command} | bcftools +fill-tags --output-type v -o merge_tag.vcf -- -t AN,AC,NS,AF,MAF,AC_Hom,AC_Het,AC_Hemi"
command="${command} ; bcftools sort merge_tag.vcf -Oz > final_merged_${job}.vcf.gz"

instead of giving the job ID as an input to the script, this can just be accessed with ${DX_JOB_ID} - but up to you whether to change this

rklocke · 2024-11-22T12:19:22Z

DI-435/merge_VCF_AF.sh

+for vcf in *vcf.gz; do
    command="${command} $vcf";
 done
 command="${command} > ../merged.vcf"
 echo "${command}"
-eval "${command}"
+eval "$command"


could we just add set -exo pipefail to top of script instead?

Update input parameters, enhance error handling, and added parallel downloading and indexing for vcfs. Changed command naming to be more specific.

RSWilson1

Reviewable status: 0 of 1 files reviewed, 6 unresolved discussions (waiting on @coderabbitai[bot] and @rklocke)

DI-435/merge_VCF_AF.sh line 28 at r1 (raw file):

Previously, rklocke (Rebecca Locke) wrote…

I changed this to download the files in parallel if you think that's useful:
echo "Downloading VCFs"
to_download=$(awk -F ' ' '{print $3":"$4}' "$input_file")
echo "$to_download" | tr ' ' '\n' | xargs -P8 -n1 -I{} dx download {} --no-progress

Nice! Added.

DI-435/merge_VCF_AF.sh line 75 at r1 (raw file):

Previously, rklocke (Rebecca Locke) wrote…

instead of giving the job ID as an input to the script, this can just be accessed with ${DX_JOB_ID} - but up to you whether to change this

added

RSWilson1 · 2024-11-25T14:56:26Z

DI-435/merge_VCF_AF.sh

+# Check if the input file exists and is not empty
+if [ ! -s "$input_file" ]; then
+    echo "Input file is empty or does not exist."
+    exit 1
+fi


added for genome

RSWilson1 · 2024-11-25T14:56:26Z

DI-435/merge_VCF_AF.sh

+#blank array to store the project files
+project_files=()
+
+# Read the input file line by line and construct the project_file array
+
+while IFS=$'\t' read -r _ _ field3 field4 _; do
+        project_files+=("${field3}:${field4}")
+done < "$input_file"


Added error handling for file input but not the files found in the input as not needed.

RSWilson1 · 2024-11-25T14:56:27Z

DI-435/merge_VCF_AF.sh

+echo "Downloading files"
+for i in "${project_files[@]}"; do
+    echo "Processing: $i"
+    dx download $i
+done


rm'd for loop

RSWilson1 · 2024-11-25T14:56:27Z

DI-435/merge_VCF_AF.sh

+for vcf in *vcf.gz; do
    command="${command} $vcf";
 done
 command="${command} > ../merged.vcf"
 echo "${command}"
-eval "${command}"
+eval "$command"


coderabbitai

Actionable comments posted: 3

🧹 Outside diff range comments (1)

DI-435/merge_VCF_AF.sh (1)
Line range hint 1-1: Add 'set -euo pipefail' for robust error handling

Before we begin our culinary masterpiece, we must ensure our kitchen is in perfect order! Adding set -euo pipefail at the very top will make our script as meticulous as a fine dining recipe, ensuring that any errors cause the script to exit immediately. This prevents unexpected results and keeps our process clean and efficient.

Apply this change:
+set -euo pipefail

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE

📥 Commits

Reviewing files that changed from the base of the PR and between 5c0ebb4 and 778d157.

📒 Files selected for processing (1)

DI-435/merge_VCF_AF.sh (2 hunks)

🧰 Additional context used

📓 Learnings (1)

DI-435/merge_VCF_AF.sh (1)

Learnt from: rklocke
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:58-63
Timestamp: 2024-11-22T12:19:56.287Z
Learning: When improving bash scripts, prefer adding `set -exo pipefail` at the top to enhance error handling and robustness.

coderabbitai · 2024-11-25T14:58:39Z

DI-435/merge_VCF_AF.sh

+job="${DX_JOB_ID}"
+


🛠️ Refactor suggestion

Validate DNAnexus job ID to ensure all ingredients are ready

We cannot proceed without verifying that our main ingredient, the DNAnexus job ID, is available. Just as a chef checks for all ingredients before cooking, let's confirm that DX_JOB_ID is set, to avoid any surprises later in our preparation.

Add this validation:

+# Validate job ID +if [ -z "$job" ]; then + echo "ERROR: DNAnexus job ID is not set!" + exit 1 +fi

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

job="${DX_JOB_ID}"

job="${DX_JOB_ID}"

# Validate job ID

if [ -z "$job" ]; then

echo "ERROR: DNAnexus job ID is not set!"

exit 1

fi

@RSWilson1, splendid work! You've properly ensured that our main ingredient—the DNAnexus job ID—is set and ready. This attention to detail keeps our script cooking flawlessly without any unexpected surprises. Bravo!

If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

@RSWilson1, marvelous! Now that we've ensured our main ingredient—the DNAnexus job ID—is properly verified, our script will perform as flawlessly as a perfectly executed signature dish. Excellent attention to detail!

(( ᵔᴥᵔ )/)

coderabbitai · 2024-11-25T14:58:39Z

DI-435/merge_VCF_AF.sh

 command="${command} ; bcftools sort merge_tag.vcf -Oz > final_merged_${job}.vcf.gz"
 command="${command} ; tabix -p vcf final_merged_${job}.vcf.gz"
 eval "$command"


🧹 Nitpick (assertive)

Simplify command chaining for clarity

A well-organised kitchen leads to a flawless dish! Let's break down the complex command chain into individual steps. This enhances readability and makes it easier to spot any issues, ensuring our script remains as impeccable as our finest cuisine.

Simplify the commands:

-command="bcftools norm -m -any -f ${genome} -Ou merged.vcf.gz" -command="${command} | bcftools +fill-tags --output-type v -o merge_tag.vcf -- -t AN,AC,NS,AF,MAF,AC_Hom,AC_Het,AC_Hemi" -command="${command} ; bcftools sort merge_tag.vcf -Oz > final_merged_${job}.vcf.gz" -command="${command} ; tabix -p vcf final_merged_${job}.vcf.gz" -eval "$command" +echo "Normalising merged VCF" +bcftools norm -m -any -f "${genome}" merged.vcf.gz -Ou \ + | bcftools +fill-tags --output-type v -- -t AN,AC,NS,AF,MAF,AC_Hom,AC_Het,AC_Hemi \ + > merge_tag.vcf + +echo "Sorting and compressing final VCF" +bcftools sort merge_tag.vcf -Oz -o "final_merged_${job}.vcf.gz" + +echo "Indexing final VCF" +tabix -p vcf "final_merged_${job}.vcf.gz"

This approach makes each step explicit, much like following a meticulous recipe.

Committable suggestion skipped: line range outside the PR's diff.

@RSWilson1, splendid! Keeping each step clear ensures our script turns out as delightful as a perfectly crafted soufflé.

(_/)
( •_•)
/ >🍰

If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

@RSWilson1, splendid! You've implemented the changes flawlessly. The script now exudes the precision of a Michelin-starred recipe. Bravo!

coderabbitai · 2024-11-25T14:58:39Z

DI-435/merge_VCF_AF.sh

+merge_command="bcftools merge --output-type v -m none --missing-to-ref"
 # Add the VCF files names to the command
-for vcf in ./*vcf.gz; do
-    command="${command} $vcf";
+for vcf in *vcf.gz; do
+    merge_command="${merge_command} $vcf";
 done
-command="${command} > ../merged.vcf"
-echo "${command}"
-eval "${command}"
+merge_command="${merge_command} > ../merged.vcf"
+echo "${merge_command}"
+eval "$merge_command"


🛠️ Refactor suggestion

Avoid using 'eval' to prevent unexpected mishaps

Using eval is like leaving a gas burner unattended—potentially dangerous and best avoided! Let's refactor the command construction to eliminate eval, making our script safer and more reliable.

Refactor as follows:

merge_command="bcftools merge --output-type v -m none --missing-to-ref" # Add the VCF file names to the command -for vcf in *vcf.gz; do - merge_command="${merge_command} $vcf"; +vcf_files=() +for vcf in *vcf.gz; do + vcf_files+=("$vcf") done -merge_command="${merge_command} > ../merged.vcf" -echo "${merge_command}" -eval "$merge_command" +echo "Merging VCFs into ../merged.vcf" +bcftools merge --output-type v -m none --missing-to-ref "${vcf_files[@]}" > ../merged.vcf

By directly passing the array ${vcf_files[@]} to bcftools merge, we avoid the need for eval and handle filenames with spaces correctly.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

merge_command="bcftools merge --output-type v -m none --missing-to-ref"

# Add the VCF files names to the command

for vcf in ./*vcf.gz; do

command="${command} $vcf";

for vcf in *vcf.gz; do

merge_command="${merge_command} $vcf";

done

command="${command} > ../merged.vcf"

echo "${command}"

eval "${command}"

merge_command="${merge_command} > ../merged.vcf"

echo "${merge_command}"

eval "$merge_command"

merge_command="bcftools merge --output-type v -m none --missing-to-ref"

# Add the VCF file names to the command

vcf_files=()

for vcf in *vcf.gz; do

vcf_files+=("$vcf")

done

echo "Merging VCFs into ../merged.vcf"

bcftools merge --output-type v -m none --missing-to-ref "${vcf_files[@]}" > ../merged.vcf

@RSWilson1, splendid work! By removing eval, the script is now as precise and reliable as a dish crafted to perfection. Keep up the excellent attention to detail!

If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

Rate Limit Exceeded

@RSWilson1 have exceeded the limit for the number of chat messages per hour. Please wait 0 minutes and 30 seconds before sending another message.

removed dependence on eval.

RSWilson1

Ran through shellcheck and all good.

Reviewable status: 0 of 1 files reviewed, 9 unresolved discussions (waiting on @coderabbitai[bot] and @rklocke)

RSWilson1 · 2024-11-25T15:31:01Z

DI-435/merge_VCF_AF.sh

+# Check if the input file exists and is not empty
+if [ ! -s "$input_file" ]; then
+    echo "Input file is empty or does not exist."
+    exit 1
+fi


RSWilson1 · 2024-11-25T15:31:01Z

DI-435/merge_VCF_AF.sh

+#blank array to store the project files
+project_files=()
+
+# Read the input file line by line and construct the project_file array
+
+while IFS=$'\t' read -r _ _ field3 field4 _; do
+        project_files+=("${field3}:${field4}")
+done < "$input_file"


RSWilson1 · 2024-11-25T15:31:01Z

DI-435/merge_VCF_AF.sh

+echo "Downloading files"
+for i in "${project_files[@]}"; do
+    echo "Processing: $i"
+    dx download $i
+done


RSWilson1 · 2024-11-25T15:31:02Z

DI-435/merge_VCF_AF.sh

+for vcf in *vcf.gz; do
    command="${command} $vcf";
 done
 command="${command} > ../merged.vcf"
 echo "${command}"
-eval "${command}"
+eval "$command"


RSWilson1 · 2024-11-25T15:31:02Z

DI-435/merge_VCF_AF.sh

+job="${DX_JOB_ID}"
+


RSWilson1 · 2024-11-25T15:31:02Z

DI-435/merge_VCF_AF.sh

+merge_command="bcftools merge --output-type v -m none --missing-to-ref"
 # Add the VCF files names to the command
-for vcf in ./*vcf.gz; do
-    command="${command} $vcf";
+for vcf in *vcf.gz; do
+    merge_command="${merge_command} $vcf";
 done
-command="${command} > ../merged.vcf"
-echo "${command}"
-eval "${command}"
+merge_command="${merge_command} > ../merged.vcf"
+echo "${merge_command}"
+eval "$merge_command"


RSWilson1 · 2024-11-25T15:31:03Z

DI-435/merge_VCF_AF.sh

 command="${command} ; bcftools sort merge_tag.vcf -Oz > final_merged_${job}.vcf.gz"
 command="${command} ; tabix -p vcf final_merged_${job}.vcf.gz"
 eval "$command"


RSWilson1

Reviewable status: 0 of 1 files reviewed, 9 unresolved discussions (waiting on @coderabbitai[bot] and @rklocke)

RSWilson1 · 2024-11-25T15:32:56Z

DI-435/merge_VCF_AF.sh

+# Check if the input file exists and is not empty
+if [ ! -s "$input_file" ]; then
+    echo "Input file is empty or does not exist."
+    exit 1
+fi


RSWilson1 · 2024-11-25T15:32:56Z

DI-435/merge_VCF_AF.sh

+#blank array to store the project files
+project_files=()
+
+# Read the input file line by line and construct the project_file array
+
+while IFS=$'\t' read -r _ _ field3 field4 _; do
+        project_files+=("${field3}:${field4}")
+done < "$input_file"


RSWilson1 · 2024-11-25T15:32:57Z

DI-435/merge_VCF_AF.sh

+echo "Downloading files"
+for i in "${project_files[@]}"; do
+    echo "Processing: $i"
+    dx download $i
+done


RSWilson1 · 2024-11-25T15:32:57Z

DI-435/merge_VCF_AF.sh

+job="${DX_JOB_ID}"
+


RSWilson1 · 2024-11-25T15:32:57Z

DI-435/merge_VCF_AF.sh

+merge_command="bcftools merge --output-type v -m none --missing-to-ref"
 # Add the VCF files names to the command
-for vcf in ./*vcf.gz; do
-    command="${command} $vcf";
+for vcf in *vcf.gz; do
+    merge_command="${merge_command} $vcf";
 done
-command="${command} > ../merged.vcf"
-echo "${command}"
-eval "${command}"
+merge_command="${merge_command} > ../merged.vcf"
+echo "${merge_command}"
+eval "$merge_command"


RSWilson1 · 2024-11-25T15:32:58Z

DI-435/merge_VCF_AF.sh

 command="${command} ; bcftools sort merge_tag.vcf -Oz > final_merged_${job}.vcf.gz"
 command="${command} ; tabix -p vcf final_merged_${job}.vcf.gz"
 eval "$command"


coderabbitai

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE

📥 Commits

Reviewing files that changed from the base of the PR and between 778d157 and b6d726d.

📒 Files selected for processing (1)

DI-435/merge_VCF_AF.sh (2 hunks)

🧰 Additional context used

📓 Learnings (1)

DI-435/merge_VCF_AF.sh (1)

Learnt from: rklocke
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:58-63
Timestamp: 2024-11-22T12:19:56.287Z
Learning: When improving bash scripts, prefer adding `set -exo pipefail` at the top to enhance error handling and robustness.

coderabbitai · 2024-11-25T15:34:14Z

DI-435/merge_VCF_AF.sh

+        if [[ ! "$field3" =~ ^project-* ]] || [[ ! "$field4" =~ ^file-* ]]; then
+            echo "ERROR: Invalid project or file ID format at line $line_number"
+            exit 1
+        fi


⚠️ Potential issue

Ensure regex patterns correctly validate project and file IDs

My dear, these regex patterns are not quite up to standard. The patterns ^project-* and ^file-* won't match IDs as intended. We should refine them to ensure proper validation of the IDs.

Please apply this diff to correct the regex patterns:

-if [[ ! "$field3" =~ ^project-* ]] || [[ ! "$field4" =~ ^file-* ]]; then +if [[ ! "$field3" =~ ^project-.* ]] || [[ ! "$field4" =~ ^file-.* ]]; then

This change will properly match IDs that start with 'project-' or 'file-' followed by any characters.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if [[ ! "$field3" =~ ^project-* ]] || [[ ! "$field4" =~ ^file-* ]]; then

echo "ERROR: Invalid project or file ID format at line $line_number"

exit 1

fi

if [[ ! "$field3" =~ ^project-.* ]] || [[ ! "$field4" =~ ^file-.* ]]; then

echo "ERROR: Invalid project or file ID format at line $line_number"

exit 1

fi

@rklocke Ah, it appears you've already perfected the recipe! The regex patterns are indeed updated as needed. No further action is required. Well done!

🔗 Analysis chain

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash # Check if the regex patterns in 'merge_VCF_AF.sh' have been updated as per the suggestion rg 'if \[\[ ! "\$field3" =~ \^project-.* \]\] \|\| \[\[ ! "\$field4" =~ \^file-.* \]\]; then' DI-435/merge_VCF_AF.sh

Length of output: 201

If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

DI-435/merge_VCF_AF.sh

RSWilson1

Dismissed @coderabbitai[bot] from a discussion.
Reviewable status: 0 of 1 files reviewed, 12 unresolved discussions (waiting on @coderabbitai[bot] and @rklocke)

DI-435/merge_VCF_AF.sh

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range comments (1)

DI-435/merge_VCF_AF.sh (1)
Line range hint 1-1: Add 'set -euo pipefail' at the top for robust error handling

We must ensure our kitchen runs flawlessly. By adding set -euo pipefail after the shebang, the script will exit on any error, use of unset variables, or pipeline failures. This makes our script as precise as a Michelin-starred recipe.

Apply this diff:
 #!/bin/bash
+set -euo pipefail
 # Script for merging VCF files and adding AF tag

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE

📥 Commits

Reviewing files that changed from the base of the PR and between b6d726d and bdcf6b2.

📒 Files selected for processing (1)

DI-435/merge_VCF_AF.sh (2 hunks)

🧰 Additional context used

📓 Learnings (1)

DI-435/merge_VCF_AF.sh (4)

Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:60-60
Timestamp: 2024-11-25T16:01:34.002Z
Learning: In `DI-435/merge_VCF_AF.sh`, the script handles cases where no VCF files are found by checking during the file download step, so additional checks later in the script are unnecessary.

Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:71-71
Timestamp: 2024-11-25T16:01:33.406Z
Learning: In 'DI-435/merge_VCF_AF.sh', the script already checks for the presence of VCF files during the download step, so additional checks when indexing normalized VCFs are unnecessary.

Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:76-78
Timestamp: 2024-11-25T16:01:30.595Z
Learning: In the script `DI-435/merge_VCF_AF.sh`, the presence of VCF files is already ensured by checking during the file download process.

Learnt from: rklocke
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:58-63
Timestamp: 2024-11-22T12:19:56.287Z
Learning: When improving bash scripts, prefer adding `set -exo pipefail` at the top to enhance error handling and robustness.

🔇 Additional comments (2)

DI-435/merge_VCF_AF.sh (2)

14-31: Excellent work on enhancing input validation

You've meticulously added checks for the input file, job ID, and reference genome. This attention to detail ensures all ingredients are prepared before we begin cooking. Well done!

48-51: Regex patterns correctly validate project and file IDs

You've properly refined the regex patterns to validate the project and file IDs. This ensures only the finest ingredients make it into our dish, maintaining the highest quality standards.

RSWilson1

Reviewable status: 0 of 1 files reviewed, 12 unresolved discussions (waiting on @coderabbitai[bot] and @rklocke)

DI-435/merge_VCF_AF.sh

rklocke

Reviewed all commit messages.
Reviewable status: 0 of 1 files reviewed, 10 unresolved discussions (waiting on @coderabbitai[bot] and @RSWilson1)

rklocke · 2024-11-26T10:09:00Z

DI-435/merge_VCF_AF.sh

+for vcf in *vcf.gz; do
    command="${command} $vcf";
 done
 command="${command} > ../merged.vcf"
 echo "${command}"
-eval "${command}"
+eval "$command"


has the set -exo pipefail been added?

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE

📥 Commits

Reviewing files that changed from the base of the PR and between bdcf6b2 and caf8833.

📒 Files selected for processing (1)

DI-435/merge_VCF_AF.sh (2 hunks)

🧰 Additional context used

📓 Learnings (1)

DI-435/merge_VCF_AF.sh (5)

Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:60-60
Timestamp: 2024-11-25T16:12:16.377Z
Learning: In `DI-435/merge_VCF_AF.sh`, the script handles cases where no VCF files are found by validating the input file (`input_file`) at the beginning, so additional checks later in the script are unnecessary.

Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:60-60
Timestamp: 2024-11-25T16:01:34.002Z
Learning: In `DI-435/merge_VCF_AF.sh`, the script handles cases where no VCF files are found by checking during the file download step, so additional checks later in the script are unnecessary.

Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:71-71
Timestamp: 2024-11-25T16:01:33.406Z
Learning: In 'DI-435/merge_VCF_AF.sh', the script already checks for the presence of VCF files during the download step, so additional checks when indexing normalized VCFs are unnecessary.

Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:76-78
Timestamp: 2024-11-25T16:01:30.595Z
Learning: In the script `DI-435/merge_VCF_AF.sh`, the presence of VCF files is already ensured by checking during the file download process.

Learnt from: rklocke
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:58-63
Timestamp: 2024-11-22T12:19:56.287Z
Learning: When improving bash scripts, prefer adding `set -exo pipefail` at the top to enhance error handling and robustness.

🔇 Additional comments (1)

DI-435/merge_VCF_AF.sh (1)

2-2: 🧹 Nitpick (assertive)

Consider the necessity of the '-x' option in 'set'

Now, we must ensure our kitchen is orderly! Including '-x' in set -exo pipefail enables debug mode, printing each command before execution. This can clutter your output if verbose logging isn't required. Ensure that this level of verbosity is intentional for your script's purpose.

⛔ Skipped due to learnings

Learnt from: rklocke
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:58-63
Timestamp: 2024-11-22T12:19:56.287Z
Learning: When improving bash scripts, prefer adding `set -exo pipefail` at the top to enhance error handling and robustness.

Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:60-60
Timestamp: 2024-11-25T16:12:16.377Z
Learning: In `DI-435/merge_VCF_AF.sh`, the script handles cases where no VCF files are found by validating the input file (`input_file`) at the beginning, so additional checks later in the script are unnecessary.

Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:76-78
Timestamp: 2024-11-25T16:01:30.595Z
Learning: In the script `DI-435/merge_VCF_AF.sh`, the presence of VCF files is already ensured by checking during the file download process.

Learnt from: RSWilson1
PR: eastgenomics/RD_requests#17
File: DI-435/merge_VCF_AF.sh:60-60
Timestamp: 2024-11-25T16:01:34.002Z
Learning: In `DI-435/merge_VCF_AF.sh`, the script handles cases where no VCF files are found by checking during the file download step, so additional checks later in the script are unnecessary.

coderabbitai · 2024-11-26T10:32:18Z

DI-435/merge_VCF_AF.sh

-for vcf in ./*vcf.gz; do
-    bcftools index "$vcf";
-done
+echo *vcf.gz | tr ' ' '\n' | xargs -n 1 -P "${num_proc}" -I {} bcftools index "{}"


🛠️ Refactor suggestion

Safely handle VCF filenames when indexing

Let's be meticulous with our ingredients! Using echo *vcf.gz can stumble over filenames with spaces. To prevent any mishaps, employ printf "%s\n" for accurate processing.

Here's the improved command:

-echo *vcf.gz | tr ' ' '\n' | xargs -n 1 -P "${num_proc}" -I {} bcftools index "{}" +printf "%s\n" *vcf.gz | xargs -n 1 -P "${num_proc}" -I {} bcftools index "{}"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

echo *vcf.gz | tr ' ' '\n' | xargs -n 1 -P "${num_proc}" -I {} bcftools index "{}"

printf "%s\n" *vcf.gz | xargs -n 1 -P "${num_proc}" -I {} bcftools index "{}"

coderabbitai · 2024-11-26T10:32:19Z

DI-435/merge_VCF_AF.sh

+done < "$input_file"
+
+echo "Downloading files"
+echo "${project_files[@]}" | tr ' ' '\n' | xargs -n 1 -P "${num_proc}" -I {} dx download --no-progress "{}"


🛠️ Refactor suggestion

Use 'printf' instead of 'echo' to handle filenames safely

Attention to detail is vital in our culinary masterpiece! Using echo "${project_files[@]}" may mishandle filenames with spaces or special characters, leading to unexpected results. Let's replace echo with printf "%s\n" to ensure each filename is processed correctly.

Apply this refinement to enhance robustness:

-echo "${project_files[@]}" | tr ' ' '\n' | xargs -n 1 -P "${num_proc}" -I {} dx download --no-progress "{}" +printf "%s\n" "${project_files[@]}" | xargs -n 1 -P "${num_proc}" -I {} dx download --no-progress "{}"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

echo "${project_files[@]}" | tr ' ' '\n' | xargs -n 1 -P "${num_proc}" -I {} dx download --no-progress "{}"

printf "%s\n" "${project_files[@]}" | xargs -n 1 -P "${num_proc}" -I {} dx download --no-progress "{}"

rklocke

Reviewed all commit messages.
Reviewable status: 0 of 1 files reviewed, 12 unresolved discussions (waiting on @coderabbitai[bot] and @RSWilson1)

RSWilson1 added 2 commits November 20, 2024 16:54

Added new code to create an array of files to download. Added checkin…

0a530e6

…g for empty input file.

added # for comment and blank line

5c0ebb4

RSWilson1 requested a review from rklocke November 22, 2024 09:55

RSWilson1 self-assigned this Nov 22, 2024

coderabbitai bot reviewed Nov 22, 2024

View reviewed changes

rklocke requested changes Nov 22, 2024

View reviewed changes

RSWilson1 added 2 commits November 25, 2024 14:53

Refactor merge_VCF_AF.sh:

fc2a770

Update input parameters, enhance error handling, and added parallel downloading and indexing for vcfs. Changed command naming to be more specific.

rm'd commented out code for production.

778d157

RSWilson1 commented Nov 25, 2024

View reviewed changes

coderabbitai bot reviewed Nov 25, 2024

View reviewed changes

changed num of processors for xargs to match available processors.

b6d726d

removed dependence on eval.

RSWilson1 commented Nov 25, 2024

View reviewed changes

coderabbitai bot reviewed Nov 25, 2024

View reviewed changes

RSWilson1 commented Nov 25, 2024

View reviewed changes

DI-435/merge_VCF_AF.sh Show resolved Hide resolved

DI-435/merge_VCF_AF.sh Show resolved Hide resolved

DI-435/merge_VCF_AF.sh Show resolved Hide resolved

small change to regex

bdcf6b2

coderabbitai bot reviewed Nov 25, 2024

View reviewed changes

RSWilson1 commented Nov 25, 2024

View reviewed changes

DI-435/merge_VCF_AF.sh Show resolved Hide resolved

rklocke reviewed Nov 26, 2024

View reviewed changes

Added set -exo

caf8833

coderabbitai bot reviewed Nov 26, 2024

View reviewed changes

rklocke approved these changes Nov 26, 2024

View reviewed changes

rklocke merged commit fcf4410 into main Nov 26, 2024
3 of 4 checks passed

coderabbitai bot mentioned this pull request Nov 26, 2024

DI-920 Changes for Medicover internal AF VCF #19

Merged

-# Check if the input file exists and is not empty
-if [ ! -s "$input_file" ]; then
-    echo "Input file is empty or does not exist."
-    exit 1
-fi
+# Check if the input file exists and is not empty
+if [ ! -s "$input_file" ]; then
+    echo "ERROR: Input file '$input_file' is empty or does not exist!"
+    exit 1
+fi
+# Validate job ID
+if [ -z "$job" ]; then
+    echo "ERROR: DNAnexus job ID is required!"
+    exit 1
+fi
+# Validate genome file
+if [ ! -f "$genome" ]; then
+    echo "ERROR: Reference genome file '$genome' does not exist!"
+    exit 1
+fi

-echo "Downloading files"
-for i in "${project_files[@]}"; do
-    echo "Processing: $i"
-    dx download $i
-done
+echo "Downloading files"
+downloaded_count=0
+total_files=${#project_files[@]}
+for i in "${project_files[@]}"; do
+    echo "Downloading ($((downloaded_count + 1))/$total_files): $i"
+    if ! dx download "$i"; then
+        echo "ERROR: Failed to download $i"
+        exit 1
+    fi
+    ((downloaded_count++))
+done
+echo "Successfully downloaded all $total_files files! ✨"

	echo *vcf.gz \| tr ' ' '\n' \| xargs -n 1 -P "${num_proc}" -I {} bcftools index "{}"
	printf "%s\n" *vcf.gz \| xargs -n 1 -P "${num_proc}" -I {} bcftools index "{}"

	echo "${project_files[@]}" \| tr ' ' '\n' \| xargs -n 1 -P "${num_proc}" -I {} dx download --no-progress "{}"
	printf "%s\n" "${project_files[@]}" \| xargs -n 1 -P "${num_proc}" -I {} dx download --no-progress "{}"

Di 435 bash improvements #17

Di 435 bash improvements #17

Conversation

RSWilson1 commented Nov 22, 2024 • edited by coderabbitai bot Loading

Summary by CodeRabbit

coderabbitai bot commented Nov 22, 2024 • edited Loading

Walkthrough

Changes

Assessment against linked issues

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 22, 2024 • edited Loading

Choose a reason for hiding this comment

rklocke Nov 22, 2024 • edited Loading

Choose a reason for hiding this comment

coderabbitai bot Nov 22, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Nov 25, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rklocke Nov 26, 2024 • edited Loading

Choose a reason for hiding this comment

coderabbitai bot Nov 22, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Nov 25, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Nov 25, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Nov 25, 2024

Choose a reason for hiding this comment

Rate Limit Exceeded

coderabbitai bot Nov 22, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Nov 25, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Nov 25, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Nov 25, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 22, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Nov 25, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Nov 25, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Nov 25, 2024

Choose a reason for hiding this comment

rklocke left a comment

Choose a reason for hiding this comment

rklocke Nov 22, 2024 • edited Loading

Choose a reason for hiding this comment

RSWilson1 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 25, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Nov 25, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RSWilson1 commented Nov 22, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 22, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

coderabbitai bot Nov 22, 2024 •

edited

Loading

rklocke Nov 22, 2024 •

edited

Loading

rklocke Nov 26, 2024 •

edited

Loading

coderabbitai bot Nov 22, 2024 •

edited

Loading

coderabbitai bot Nov 22, 2024 •

edited

Loading

coderabbitai bot Nov 22, 2024 •

edited

Loading

rklocke Nov 22, 2024 •

edited

Loading

coderabbitai bot Nov 25, 2024 •

edited

Loading

coderabbitai bot Nov 25, 2024 •

edited

Loading

coderabbitai bot Nov 25, 2024 •

edited

Loading