-
-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve operator detection #150
Comments
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
Hi, i have the same problem
Quick solution, add /(?<special>!=|[=%*/\-+,;:<>&\|\^])/ |
Hey, sorry for not replying here. Simply put I haven't had that much time the last few weeks due to various reasons. But next week is vacation, maybe I'll have some time then 😄 No promises though |
That works for me, but could you do a PR? |
Wait, that doesn't address the main problem I listed in this issue. Why not just go with what I originally suggested?
|
As @wkeese mentioned in the description a fix for this issue might (or more likely will) affect #149. I'm very keen on resolving the number detection issue, and I think that coming up with a good solution to that issue first would be the better approach here. Then we could look at how to adapt the matching logic introduced there to work with multi character operators. To add my five cents to the conversation regarding different regexes, without having actually tested these myself I believe implementing the more generic regex suggested by @wkeese would generally be a better approach since it takes away a bit of the burden maintaining that character list. That being there might be a risk it's a bit too greedy, so the tests will have to tell if it will actually work. |
You made a lot of good points. ISTM the root issue, probably impractical to solve, is that + and - can be unary or binary operators. So regardless of whether we have an explicit list of operators, this can be tokenized two ways: x<>-5 And so can this: x-5 But those are edge cases due to the lack of spaces. I do think it makes sense for number matching to take priority as that will do the right thing for common expressions like x > -5 |
PS: I added a PR #149 for the numbers. I points to the |
This fixes the "special" segments to include some missing operators from https://www.w3schools.com/sql/sql_operators.asp. It also keeps all the symbols previously specified as "special" even though some of them aren't listed on https://www.w3schools.com/sql/sql_operators.asp, specifically: != , ; : . Fixes scriptcoded#150
This fixes the "special" segments to include some missing operators from https://www.w3schools.com/sql/sql_operators.asp and other sites. I took the conservative approach of listing all the operators, as opposed to the general regexp /(?<special>[^\w\s"'`]+)/, because the conservative approach works better in certain cases such as "x>-5". The downside is that you need to list all the operators, so it's a bit fragile, especially since the exact operators vary by SQL version (MySQL, TransactSQL, Postgres, etc.). I kept all the symbols previously specified as "special" even though some of them aren't operators , specifically: , ; : . Fixes scriptcoded#150
This fixes the "special" segments to include some missing operators from https://www.w3schools.com/sql/sql_operators.asp and other sites. I took the conservative approach of listing all the operators, as opposed to the general regexp /(?<special>[^\w\s"'`]+)/, because the conservative approach works better in certain cases such as "x>-5". The downside is that you need to list all the operators, so it's a bit fragile, especially since the exact operators vary by SQL version (MySQL, TransactSQL, Postgres, etc.). I kept all the symbols previously specified as "special" even though some of them aren't operators, specifically: , ; : . Fixes scriptcoded#150
I added #193, now we match all operators. Actually, it's using "special" as the bucket for anything that doesn't match the other regexps (number, identifier, string, etc.). But I added lots of tests and I've convinced myself that it isn't too greedy. In particular, that minus, plus, or dot before a digit is treated as part of the number. |
This fixes the "special" segments to include some missing operators from https://www.w3schools.com/sql/sql_operators.asp and other sites. I took the conservative approach of listing all the operators, as opposed to the general regexp /(?<special>[^\w\s"'`]+)/, because the conservative approach works better in certain cases such as "x>-5". The downside is that you need to list all the operators, so it's a bit fragile, especially since the exact operators vary by SQL version (MySQL, TransactSQL, Postgres, etc.). I kept all the symbols previously specified as "special" even though some of them aren't operators, specifically: , ; : . Fixes #150
This fixes the "special" segments to include some missing operators from https://www.w3schools.com/sql/sql_operators.asp and other sites. I took the conservative approach of listing all the operators, as opposed to the general regexp /(?<special>[^\w\s"'`]+)/, because the conservative approach works better in certain cases such as "x>-5". The downside is that you need to list all the operators, so it's a bit fragile, especially since the exact operators vary by SQL version (MySQL, TransactSQL, Postgres, etc.). I kept all the symbols previously specified as "special" even though some of them aren't operators, specifically: , ; : . Fixes scriptcoded#150
This fixes the "special" segments to include some missing operators from https://www.w3schools.com/sql/sql_operators.asp and other sites. I took the conservative approach of listing all the operators, as opposed to the general regexp /(?<special>[^\w\s"'`]+)/, because the conservative approach works better in certain cases such as "x>-5". The downside is that you need to list all the operators, so it's a bit fragile, especially since the exact operators vary by SQL version (MySQL, TransactSQL, Postgres, etc.). I kept all the symbols previously specified as "special" even though some of them aren't operators, specifically: , ; : . Fixes #150
The fix for this in #193 has been merged to the beta branch and will be released very soon. I'll close this a bit prematurely, but feel free to reopen if you feel it's not fixed. |
This fixes the "special" segments to include some missing operators from https://www.w3schools.com/sql/sql_operators.asp and other sites. I took the conservative approach of listing all the operators, as opposed to the general regexp /(?<special>[^\w\s"'`]+)/, because the conservative approach works better in certain cases such as "x>-5". The downside is that you need to list all the operators, so it's a bit fragile, especially since the exact operators vary by SQL version (MySQL, TransactSQL, Postgres, etc.). I kept all the symbols previously specified as "special" even though some of them aren't operators, specifically: , ; : . Fixes #150
# [5.0.0](v4.4.2...v5.0.0) (2024-07-02) * chore!: add support for Node 22 ([9478bf1](9478bf1)) ### Bug Fixes * improve number detection ([02d459a](02d459a)), closes [#149](#149) * improve operator detection ([183a4fb](183a4fb)), closes [#150](#150) * typo in unknown segments ([70af287](70af287)), closes [#148](#148) [#178](#178) [#148](#148) ### Features * add way to style identifiers ([25677d4](25677d4)), closes [#147](#147) * release 5.1.0 ([cb0c0f1](cb0c0f1)) ### BREAKING CHANGES * The `default` segment has been split into `identifier` and `whitespace` segments. There's also a new `unknown` segment that will only show up for malformed SQL such as an unclosed string. However, the highlight() function works largely the same as before, both normal mode and HTML mode, except for the bug fix to stop classifying identifiers as strings. In other words, SQL like select * from EMP where NAME="John Smith" will get highlighted the same as before, i.e. no syntax highlighting for EMP or NAME. * drop support for Node 14.
# [5.0.0](v4.4.2...v5.0.0) (2024-07-02) * chore!: add support for Node 22 ([9478bf1](9478bf1)) ### Bug Fixes * improve number detection ([02d459a](02d459a)), closes [#149](#149) * improve operator detection ([183a4fb](183a4fb)), closes [#150](#150) * typo in unknown segments ([70af287](70af287)), closes [#148](#148) [#178](#178) [#148](#148) ### Features * add way to style identifiers ([25677d4](25677d4)), closes [#147](#147) ### BREAKING CHANGES * The `default` segment has been split into `identifier` and `whitespace` segments. There's also a new `unknown` segment that will only show up for malformed SQL such as an unclosed string. However, the highlight() function works largely the same as before, both normal mode and HTML mode, except for the bug fix to stop classifying identifiers as strings. In other words, SQL like select * from EMP where NAME="John Smith" will get highlighted the same as before, i.e. no syntax highlighting for EMP or NAME. * drop support for Node 14.
# [6.0.0](v5.0.0...v6.0.0) (2024-07-02) ### Bug Fixes * improve number detection ([02d459a](02d459a)), closes [#149](#149) * improve operator detection ([183a4fb](183a4fb)), closes [#150](#150) * typo in unknown segments ([70af287](70af287)), closes [#148](#148) [#178](#178) [#148](#148) ### Features * add way to style identifiers ([25677d4](25677d4)), closes [#147](#147) * release 5.1.0 ([3a58def](3a58def)) ### BREAKING CHANGES * The `default` segment has been split into `identifier` and `whitespace` segments. There's also a new `unknown` segment that will only show up for malformed SQL such as an unclosed string. However, the highlight() function works largely the same as before, both normal mode and HTML mode, except for the bug fix to stop classifying identifiers as strings. In other words, SQL like select * from EMP where NAME="John Smith" will get highlighted the same as before, i.e. no syntax highlighting for EMP or NAME.
🎉 This issue has been resolved in version 6.0.0 🎉 The release is available on: Your semantic-release bot 📦🚀 |
There are multiple multi-character operators in SQL such as
>=
,<=
,<>
, and maybe others. Currently, each of them except for!=
is split into multiple segments, for example:This precludes some (likely uncommon) HTML formatting, such as drawing a box around each operator.
Ideally, these multi-character operators would be classified as a single segment, i.e.
For example (if building on top of my previous PR #148):
The current regexp for "special" segments is:
Besides the problem/inefficiency I listed above, another problem with that regexp is that it misses some single-character operators, for example
&
and|
. See https://www.w3schools.com/sql/sql_operators.asp for a list.For those two reasons, I wonder if instead of the current regexp, we should just look for all sequences of punctuation, excluding the quotation marks that start strings and identifiers:
Admittedly as I wrote in #149 it's sometimes hard to differentiate between subtraction and negative numbers, and the regexp above would exacerbate that problem in corner cases like
AGE*-5
. It would work fine with spaces though, i.e.AGE * -5
.The text was updated successfully, but these errors were encountered: