Different Results vs. VBScript? #19

Open
jpbro opened this issue Sep 6, 2017 · 7 comments

@jpbro
Owner

jpbro commented Sep 6, 2017

With Multiline = True and Global = True (Global applies to VBScript only; it is not applicable to my wrapper), consider the following subject:

"File1.zip.exe" & vbCrLf & "File2.com" & vbCrLf & "File 3"

And the following regex:

.*$

VBScript returns 6 matches, but my wrapper returns only 2. Who is right?
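
For reference, here is a minimal sketch of the same subject and pattern run through Python's `re` module. Python is neither VBScript's RegExp nor PCRE2, so this only illustrates where a `.*$` pattern can match, including empty matches. The variable names and the use of `re.MULTILINE` as the counterpart of Multiline = True are assumptions made for illustration.

```python
import re

# Same subject as above; vbCrLf is a carriage return followed by a line feed.
subject = "File1.zip.exe\r\nFile2.com\r\nFile 3"

# Assumption: re.MULTILINE plays the role of Multiline = True.
pattern = re.compile(r".*$", re.MULTILINE)

# finditer is Python's counterpart of a global search.
matches = [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(subject)]

for start, end, text in matches:
    # repr() makes empty matches and stray "\r" characters visible.
    print(start, end, repr(text))
```

In Python 3.7 or later the listing contains both non-empty and empty matches, because `.*` can succeed on zero characters right before each line feed and at the end of the subject. Python's `.` also matches `\r`, so the first two non-empty matches carry a trailing carriage return; other engines may treat CR differently.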

@jpbro jpbro added the question label Sep 6, 2017
@jpbro jpbro self-assigned this Sep 6, 2017
jpbro added a commit that referenced this issue Sep 6, 2017
…2 wrapper.

Issue: #19

Signed-off-by: Jason Peter Brown <jason@bitspaces.com>
@jpbro
Owner Author

jpbro commented Sep 6, 2017

See the changes to the modTests.TestRegex2 method in commit 66c88a1 for a demonstration.

@dragokas

dragokas commented Sep 6, 2017

I see.

Strange that PCRE2 in fact produces only one significant result, File1.zip.exe, instead of three.

However, I think a regexp like .*$ is incorrect in the first place. It is much like the regexp (about)?: you are trying to find an empty string as one of the valid results. If you enter such a regexp into, e.g., some online Java regexp tester, it will produce an error, meaning that a regexp should not allow an empty string as one of its valid results.

From this point of view, the difference in results between VBS and PCRE2 is only a matter of their internal error handling, which is implemented differently.

So, in practice, .+$ should be used instead of .*$.
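
As a rough illustration of that suggestion, the sketch below compares the two patterns using Python's `re` module. This is an assumption-laden stand-in, not VBScript or PCRE2, and the names `star` and `plus` are made up for the example.

```python
import re

subject = "File1.zip.exe\r\nFile2.com\r\nFile 3"

# .* may match zero characters, so empty strings can appear in the results.
star = re.findall(r".*$", subject, re.MULTILINE)

# .+ needs at least one character, so it cannot produce an empty match.
plus = re.findall(r".+$", subject, re.MULTILINE)

# Python's dot also matches "\r", so strip it before displaying the lines.
print(len(star), [s.rstrip("\r") for s in star])
print(len(plus), [s.rstrip("\r") for s in plus])
```

With `.+$` every result is a real line of text, which avoids the question of how an engine should count empty matches.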

In conclusion, I personally believe it is not necessary to change this behavior. If I were to change anything, I would detect a regexp that allows an empty result and raise an error instead of returning a result string.
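
One possible way to sketch that detection idea, again in Python rather than in the wrapper's own language: test whether the pattern matches the empty string and raise an error if it does. The helpers `allows_empty_match` and `compile_strict` are hypothetical names, not part of any wrapper or library API, and the check is only a heuristic.

```python
import re

def allows_empty_match(pattern: str, flags: int = 0) -> bool:
    """Heuristic: report whether the pattern matches the empty string."""
    # Hypothetical helper. It misses patterns such as r"\b" that can only
    # match emptily inside a non-empty subject.
    return re.compile(pattern, flags).fullmatch("") is not None

def compile_strict(pattern: str, flags: int = 0) -> "re.Pattern":
    """Compile the pattern, refusing ones that can match nothing at all."""
    if allows_empty_match(pattern, flags):
        raise ValueError(f"pattern {pattern!r} can match an empty string")
    return re.compile(pattern, flags)

for candidate in (r".*$", r"(about)?", r".+$"):
    print(candidate, allows_empty_match(candidate, re.MULTILINE))
```

Rejecting such patterns outright is a design choice with a cost: many everyday patterns legitimately allow an empty match, so a warning may be more practical than an error.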

@dragokas

dragokas commented Sep 6, 2017

That said, since VBS already produces the most complete result, I would not object if PCRE2 produced the same result, in keeping with the goal of PCRE2 being an analogue of VBS.Regexp, so that at least these 3 lines are shown for .*$:

  • File1.zip.exe
  • File2.com
  • File 3

But I don't know how you could track such cases without breaking anything else.

@jpbro
Owner Author

jpbro commented Sep 6, 2017

Yeah it's a bit of a weird one - interesting that some online regexp sites produce an error, but PCRE2 and VBScript produce results (albeit different). Makes it hard to know what the best approach is.

It might be that there is a PCRE2 option flag to handle this situation; I'll have to look at them all more closely (or maybe my Global matching loop just needs to work a bit differently to produce the same results as VBScript).

I don't have time to look closer right now, but I will try soon.
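
To make the point about the Global matching loop concrete, here is a hedged Python sketch of two ways such a loop could treat an empty match. It is not the wrapper's actual code, and nothing in this thread confirms that the wrapper behaves like either variant; `global_matches` and `stop_on_empty` are hypothetical names.

```python
import re

def global_matches(pattern, subject, stop_on_empty=False):
    """Collect (start, end, text) for each match, scanning left to right."""
    results = []
    pos = 0
    while pos <= len(subject):
        m = pattern.search(subject, pos)
        if m is None:
            break
        results.append((m.start(), m.end(), m.group(0)))
        if m.end() > m.start():
            # Non-empty match: resume right after it.
            pos = m.end()
        elif stop_on_empty:
            # Variant A: treat the first empty match as the end of the search.
            break
        else:
            # Variant B: an empty match consumes nothing, so step forward
            # one character by hand to keep the loop moving.
            pos = m.end() + 1
    return results

subject = "File1.zip.exe\r\nFile2.com\r\nFile 3"
pattern = re.compile(r".*$", re.MULTILINE)

print(len(global_matches(pattern, subject)))
print(len(global_matches(pattern, subject, stop_on_empty=True)))
```

With this subject, stepping past empty matches collects six matches, while stopping at the first empty one collects two. For PCRE2 itself, the bundled pcre2demo program handles the same situation by retrying at the same offset with PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED before advancing.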

@dragokas

dragokas commented Sep 6, 2017

According to my tests, none of the options predefined in your class changes this behavior, except:

  • Dollar Matches End of string Only
  • Dot matches All Characters
    which cause all of the text to fall into the first match, like:
    Match Count: 2
    
    #1: File1.zip.exe
    File2.com
    File 3
                                  Sub.#1: 
    #2: 
                                  Sub.#2: 
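
The output above has a close analogue in Python's `re` module, sketched here under the assumption that `re.DOTALL` corresponds to "Dot matches All Characters". Python is used purely as a stand-in engine.

```python
import re

subject = "File1.zip.exe\r\nFile2.com\r\nFile 3"

# Assumption: re.DOTALL mirrors the "Dot matches All Characters" option.
dotall = re.findall(r".*$", subject, re.MULTILINE | re.DOTALL)

print(len(dotall))
for text in dotall:
    # repr() shows the line breaks swallowed by the first match.
    print(repr(text))
```

Once the dot can cross line breaks, the greedy `.*` consumes the whole subject in one match, leaving only an empty match at the very end.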

@skacurt

skacurt commented Feb 16, 2019

Who is right?

Both results are correct. What is wrong here is your expectation.

Multiline = True in VBScript's RegExp simply means that ^ and $ match at line breaks. For PCRE this is an option that must be set explicitly (as you did in VBScript), namely PCRE2_MULTILINE.

So it seems OK: you changed the default behavior for VBScript but not for PCRE in your test.
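
The effect of a multiline flag can be sketched with Python's `re` module, whose `re.MULTILINE` is assumed here to be analogous to PCRE2_MULTILINE. This shows how one engine behaves with and without the flag; it does not establish what the wrapper actually passes to PCRE2.

```python
import re

subject = "File1.zip.exe\r\nFile2.com\r\nFile 3"

# Without the flag, $ only matches at the very end of the subject
# (or just before a final trailing newline).
without_flag = re.findall(r".*$", subject)

# With the flag, $ also matches before each line feed inside the subject.
with_flag = re.findall(r".*$", subject, re.MULTILINE)

print(len(without_flag), [s.rstrip("\r") for s in without_flag])
print(len(with_flag), [s.rstrip("\r") for s in with_flag])
```

Without the flag only the last line can satisfy `$`, so a low match count is what a missing multiline option would look like in this engine.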

@skacurt
Copy link

skacurt commented Feb 16, 2019

Oh, I forgot to mention. I've never used your wrapper.
If you're sure that the PCRE2_MULTILINE flag is set in your wrapper, then the problem lies in your wrapper or in PCRE. VBScript's RegExp works as it should in this case.

3 participants