Underscore nullifies/ignored in replacement between matched groups #34

Closed
ammaraziz opened this issue Mar 12, 2024 · 8 comments

Comments

@ammaraziz

Hi Shenwei,

Thanks for the awesome tool. I came across a weird issue while trying to rename files matching this pattern:

Input:

1234_OC43_S17_R1.fastq.gz
1234_OC43_S17_R2.fastq.gz
1235_OC43_S24_R1.fastq.gz
...
etc

Expected output:

S17_1234_OC43_R1.fastq.gz
S17_1234_OC43_R2.fastq.gz
S24_1235_OC43_R1.fastq.gz
...
etc

I want to move the S\d\d pattern from the middle to the start of the string, e.g. XXX_XXX_S17_R1.fastq.gz becomes S17_XXX_XXX_R1.fastq.gz.

Command I used:

brename -p "(.*)_(S\d+)_(.*)" -r '$2_$1_$3' -d

Output:

  [OK] 1234_OC43_S17_R1.fastq.gz -> R1.fastq.gz
  [OK] 1234_OC43_S17_R2.fastq.gz  -> R2.fastq.gz
  [overwriting newly renamed path] 1235_OC43_S24_R1.fastq.gz -> R1.fastq.gz

Odd! The _ seems to completely nullify the group string, e.g. $1.

Replacing underscores with - fixes this issue:

brename -p "(.*)_(S\d+)_(.*)" -r '$2-$1-$3' -d

To me the issue seems to be related to the regex groups, because when I then run this to get the desired result, it works:

brename -p "-" -r '_' -d

Thanks!

Versions:

brename v2.14.0
LSB Version:    core-11.1.0ubuntu4-noarch:security-11.1.0ubuntu4-noarch
Distributor ID: Linuxmint
Description:    Linux Mint 21
Release:        21
Codename:       vanessa
@shenwei356
Owner

It's a common issue. See #33.
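
The reported behaviour can be reproduced outside brename. The following is a minimal sketch in Go, assuming that brename hands the -r string to Go's standard regexp package unchanged (an assumption, not something stated in this thread). The rename helper is hypothetical and exists only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// pattern is the regular expression from the report above.
var pattern = regexp.MustCompile(`(.*)_(S\d+)_(.*)`)

// rename is a hypothetical helper: it applies a replacement template to a
// file name using Go's expansion rules for $name and ${name} references.
func rename(name, template string) string {
	return pattern.ReplaceAllString(name, template)
}

func main() {
	name := "1234_OC43_S17_R1.fastq.gz"

	// "$2_" and "$1_" are read as references to groups named "2_" and "1_",
	// which do not exist, so both expand to the empty string.
	fmt.Println(rename(name, "$2_$1_$3")) // R1.fastq.gz

	// Braces end each reference explicitly, so the underscores stay literal.
	fmt.Println(rename(name, "${2}_${1}_${3}")) // S17_1234_OC43_R1.fastq.gz

	// A hyphen cannot be part of a reference name, so "$2-$1-$3" also works.
	fmt.Println(rename(name, "$2-$1-$3")) // S17-1234_OC43-R1.fastq.gz
}
```

In Go's template syntax a reference name is the longest run of letters, digits, and underscores after the $, which is why the underscore is absorbed into the name while the hyphen is not.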

@ammaraziz
Author

Apologies, I missed that issue when I did a quick search.

@shenwei356
Owner

No worries, it happens a lot. I really have to add this to the attention notes or FAQs.

@shenwei356
Owner

Added:

Special cases of replacement string:
  1. Capture variables better be in the format of '${1}'.
    a). If the capture variable is followed by a space or other symbols, it's OK:
            -r '$1 abc'
    b). If it is followed by digits, letters, or an underscore, the reference is ambiguous:
            -r '$1abc' actually refers to the variable '1abc', please use '${1}abc'.
            -r '$2_$1' actually refers to the variable '2_', please use '${2}_${1}'.
  2. To replace with a literal '$' character:
    a). If using '{kv}', you need to use '$$$$' instead of a single '$':
            -r '{kv}' -k <(sed 's/\$/$$$$/' kv.txt)
    b). If not, use '$$'. e.g., adding '$' to all numbers:
            -p '(\d+)' -d -r '$$${1}'
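
Cases 1a, 1b, and 2b above can be checked with a short Go sketch. It assumes, as a working hypothesis rather than a documented fact, that brename expands replacement strings with Go's standard regexp package. The expand helper and the sample input "v7" are made up for illustration, and the '{kv}' case (2a) is specific to brename and is not covered.

```go
package main

import (
	"fmt"
	"regexp"
)

// digits matches runs of decimal digits, as in the '(\d+)' example above.
var digits = regexp.MustCompile(`(\d+)`)

// expand is a hypothetical helper: it replaces every run of digits in s
// using the given replacement template.
func expand(s, template string) string {
	return digits.ReplaceAllString(s, template)
}

func main() {
	s := "v7"

	// Case 1a: a space ends the reference name, so $1 is group 1.
	fmt.Println(expand(s, "$1 abc")) // v7 abc

	// Case 1b: "$1abc" names a group called "1abc", which does not exist.
	fmt.Println(expand(s, "$1abc")) // v

	// Braces make the reference unambiguous.
	fmt.Println(expand(s, "${1}abc")) // v7abc

	// Case 2b: "$$" yields a literal dollar sign in front of group 1.
	fmt.Println(expand(s, "$$${1}")) // v$7
}
```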

@ammaraziz
Author

Thank you @shenwei356 !

If I may offer a suggestion on the wording:

Capture variables better be in the format of '${1}'.

[it] better be is an idiom in English that conveys an angry/threatening tone. I suggest changing the wording to something along the lines of:

Capture variables should be in the format of '${1}' to reduce errors

I hope you don't mind the correction.

Thank you again for taking the time to respond to my issue so quickly.

@shenwei356
Owner

shenwei356 commented Mar 13, 2024

Thank you Ammar very much!
I never realized that, because the tones of these two phrases are reversed in Chinese ...
I'll correct that.

shenwei356 added a commit that referenced this issue Mar 13, 2024
@ammaraziz
Author

That's super interesting. I can see how the word 'should' can be a bit forward.

I generally avoid the word as it's ambiguous but I couldn't think of another way to phrase it.

Thanks again!

@shenwei356
Owner

"recommended to be" might be a good alternative.
