-
Notifications
You must be signed in to change notification settings - Fork 24.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Added support for string manipulating functions with more than one parameter #32356
Conversation
…rameter: CONCAT, LEFT, RIGHT, REPEAT, POSITION, LOCATE, REPLACE, SUBSTRING, INSERT
Pinging @elastic/es-search-aggs |
import static org.elasticsearch.xpack.sql.expression.function.scalar.script.ScriptTemplate.formatTemplate; | ||
|
||
/** | ||
* A binary string function that abstracts away the second parameter of a binary string function and its result. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think "abstracts away" is quite right here.
/** | ||
* A binary string function that abstracts away the second parameter of a binary string function and its result. | ||
* This class is the base for binary functions that have the first parameter a string, the second parameter a number | ||
* or a string and the result can be a string or a number. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The javadoc says the result can be a string or a number but the dataType
function below is always a KEYWORD
which makes me think it is always a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. dataType
should be implemented in the concrete classes where the specific type should be specified.
|
||
@Override | ||
public int hashCode() { | ||
return Objects.hash(left(), right(), operation()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think operation
belongs in equals or hashCode.
super(location, left, right); | ||
} | ||
|
||
protected abstract BiFunction<String, T, R> operation(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be protected abstract R operation(String, R)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see why this is a function ref - so that the toString
generates the right method to invoke. That feels a little brittle but I understand what is up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add some javadoc here explaining this.
} | ||
|
||
public enum BinaryStringStringOperation implements BiFunction<String, String, Number> { | ||
//not implemented yet. Relies on soundex similarity for computation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should throw an exception here rather than return 0? Do we really plan on implementing this? I don't think we should drag the phonetic analysis plugin as a depndency of SQL. I worry that implementing these as scripts is fairly likely to send the wrong message about how to do fancy searching with Elasticsearch-SQL. Namely, you should use Elasticsearch queries for fancy string searches rather than SQL things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right @nik9000 . We discussed about this before and the fancy searches should be done with Elasticsearch directly. I'll remove the function from there.
|
||
int startInt = ((Number) start).intValue() - 1; | ||
int realStart = startInt < 0 ? 0 : startInt; | ||
StringBuilder sb = new StringBuilder(source instanceof String ? (String) source : ((Character) source).toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same deal as above with the instanceof
StringBuilder sb = new StringBuilder(source instanceof String ? (String) source : ((Character) source).toString()); | ||
String replString = (replacement instanceof String ? (String) replacement : ((Character) replacement).toString()); | ||
|
||
if (startInt > sb.length()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is worth double checking the bounds here because you already subtract one from it. It is sneaky enough that we need to have a tests case for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also maybe check this before building sb
and stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nik9000 good point about building sb
AFTER the possible return
.
But I'm failing to see the issue with startInt
s comparison with the length. Indeed, startInt
can be negative, but when it happens realStart
(the one used with StringBuilder) will be 0 and it shouldn't be an issue I think...
return source; | ||
} | ||
|
||
return sb.replace(realStart, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need sb
at all, actually? I think String.replace
works fine here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nik9000 not sure String.replace
will work here. String
s method is about replacing a certain sequence with another one in the given string. And doing this for each matching substring. The SQL insert
function is about replacing whatever characters (not having a certain template to search for) are found starting at a certain position within the target string, for length characters and put back in the replacement string.
@@ -28,3 +28,366 @@ len:i | first_name:s | |||
72 |Prasadram | |||
88 |Sreekrishna | |||
; | |||
|
|||
selectConcatWithOrderBy | |||
SELECT first_name f, last_name l, CONCAT(first_name,last_name) cct FROM test_emp ORDER BY CONCAT(first_name,last_name) LIMIT 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CONCAT(first_name, ' ', last_name)`? Feels a little more like what folks will do.
selectAsciiOfConcatWithGroupByOrderByCount | ||
SELECT ASCII(CONCAT("first_name","last_name")) ascii, COUNT(*) count FROM "test_emp" GROUP BY ASCII(CONCAT("first_name","last_name")) ORDER BY ASCII(CONCAT("first_name","last_name")) DESC LIMIT 10; | ||
|
||
ascii: i | count: l |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we usually have the space between the :
and the type?
…earch for concat function, small refactoring in the binary string functions.
@nik9000 thank you for the valuable feedback. I've made another commit with some changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to merge in master to get the BWC tests to pass. I left a few nitpicking comments, but otherwise LGTM.
|
||
import java.io.IOException; | ||
import java.util.function.BiFunction; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Javadocs would be really appreciated here.
|
||
import java.io.IOException; | ||
import java.util.function.BiFunction; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Javadocs would be helpful here.
String substring(String, int, int) | ||
String replace(String, String, String) | ||
Integer locate(String,String) | ||
Integer locate(String,String,Integer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: could you add spaces after commas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
super(location, left, right); | ||
} | ||
|
||
protected abstract BiFunction<String, T, R> operation(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add some javadoc here explaining this.
…ne parameter (#32356) Added support for string manipulating functions with more than one parameter: CONCAT, LEFT, RIGHT, REPEAT, POSITION, LOCATE, REPLACE, SUBSTRING, INSERT
* 6.x: Fix scriptdocvalues tests with dates Correct minor typo in explain.asciidoc for HLRC Fix painless whitelist and warnings from backporting #31441 Build: Add elastic maven to repos used by BuildPlugin (#32549) Scripting: Conditionally use java time api in scripting (#31441) [ML] Improve error when no available field exists for rule scope (#32550) [ML] Improve error for functions with limited rule condition support (#32548) [ML] Remove multiple_bucket_spans [ML] Fix thread leak when waiting for job flush (#32196) (#32541) Painless: Clean Up PainlessField (#32525) Add @AwaitsFix for #32554 Remove broken @link in Javadoc Add AwaitsFix to failing test - see #32546 SQL: Added support for string manipulating functions with more than one parameter (#32356) [DOCS] Reloadable Secure Settings (#31713) Fix compilation error introduced by #32339 [Rollup] Remove builders from TermsGroupConfig (#32507) Use hostname instead of IP with SPNEGO test (#32514) Switch x-pack rolling restart to new style Requests (#32339) [DOCS] Small fixes in rule configuration page (#32516) Painless: Clean up PainlessMethod (#32476) SQL: Add test for handling of partial results (#32474) Docs: Add missing migration doc for logging change Build: Remove shadowing from benchmarks (#32475) Docs: Add all JDKs to CONTRIBUTING.md Logging: Make node name consistent in logger (#31588) High-level client: fix clusterAlias parsing in SearchHit (#32465) REST high-level client: parse back _ignored meta field (#32362) backport fix of reduceRandom fix (#32508) Add licensing enforcement for FIPS mode (#32437) INGEST: Clean up Java8 Stream Usage (#32059) (#32485) Improve the error message when an index is incompatible with field aliases. (#32482) Mute testFilterCacheStats Scripting: Fix painless compiler loader to know about context classes (#32385) [ML][DOCS] Fix typo applied_to => applies_to Mute SSLTrustRestrictionsTests on JDK 11 Changed ReindexRequest to use Writeable.Reader (#32401) Increase max chunk size to 256Mb for repo-azure (#32101) Mute KerberosAuthenticationIT fix no=>not typo (#32463) HLRC: Add delete watch action (#32337) Fix calculation of orientation of polygons (#27967) [Kerberos] Add missing javadocs (#32469) Fix missing JavaDoc for @throws in several places in KerberosTicketValidator. Make get all app privs requires "*" permission (#32460) Ensure KeyStoreWrapper decryption exceptions are handled (#32472) update rollover to leverage write-alias semantics (#32216) [Kerberos] Remove Kerberos bootstrap checks (#32451) Switch security to new style Requests (#32290) Switch security spi example to new style Requests (#32341) Painless: Add PainlessConstructor (#32447) Update Fuzzy Query docs to clarify default behavior re max_expansions (#30819) Remove > from Javadoc (fatal with Java 11) Tests: Fix convert error tests to use fixed value (#32415) IndicesClusterStateService should replace an init. replica with an init. primary with the same aId (#32374) auto-interval date histogram - 6.x backport (#32107) [CI] Mute DocumentSubsetReaderTests testSearch [TEST] Mute failing InternalEngineTests#testSeqNoAndCheckpoints TEST: testDocStats should always use forceMerge (#32450) TEST: Avoid deletion in FlushIT AwaitsFix IndexShardTests#testDocStats Painless: Add method type to method. (#32441) Remove reference to non-existent store type (#32418) [TEST] Mute failing FlushIT test Fix ordering of bootstrap checks in docs (#32417) Wrong discovery.type for azure in breaking changes (#32432) Mute ConvertProcessorTests failing tests TESTS: Move netty leak detection to paranoid level (#32354) (#32425) Upgrade to Lucene-7.5.0-snapshot-608f0277b0 (#32390) [Kerberos] Avoid vagrant update on precommit (#32416) TEST: Avoid triggering merges in FlushIT [DOCS] Fixes formatting of scope object in job resource Switch x-pack/plugin to new style Requests (#32327) Release requests in cors handle (#32410) Remove BouncyCastle dependency from runtime (#32402) Copy missing segment attributes in getSegmentInfo (#32396) Rest HL client: Add put license action (#32214) Docs: Correcting a typo in tophits (#32359) Build: Stop double generating buildSrc pom (#32408) Switch x-pack full restart to new style Requests (#32294) Painless: Clean Up PainlessClass Variables (#32380) [ML] Consistent pattern for strict/lenient parser names (#32399) Add Restore Snapshot High Level REST API Update update-settings.asciidoc (#31378) Introduce index store plugins (#32375) Rank-Eval: Reduce scope of an unchecked supression Make sure _forcemerge respects `max_num_segments`. (#32291)
* master: HLRC: Move commercial clients from XPackClient (#32596) Add cluster UUID to Cluster Stats API response (#32206) Security: move User to protocol project (#32367) [TEST] Test for shard failures, add debug to testProfileMatchesRegular Minor fix for javadoc (applicable for java 11). (#32573) Painless: Move Some Lookup Logic to PainlessLookup (#32565) TEST: Avoid merges in testSeqNoAndCheckpoints [Rollup] Remove builders from HistoGroupConfig (#32533) Mutes failing SQL string function tests due to #32589 fixed elements in array of produced terms (#32519) INGEST: Enable default pipelines (#32286) Remove cluster state initial customs (#32501) Mutes LicensingDocumentationIT due to #32580 [ML] Remove multiple_bucket_spans (#32496) [ML] Rename JobProvider to JobResultsProvider (#32551) Correct minor typo in explain.asciidoc for HLRC Build: Add elastic maven to repos used by BuildPlugin (#32549) Clarify the error message when a pipeline agg is used in the 'order' parameter. (#32522) Revert "[test] turn on host io cache for opensuse (#32053)" Enable packaging tests on suse boxes [ML] Improve error when no available field exists for rule scope (#32550) [ML] Improve error for functions with limited rule condition support (#32548) Painless: Clean Up PainlessField (#32525) Add @AwaitsFix for #32554 Remove broken @link in Javadoc Scripting: Conditionally use java time api in scripting (#31441) [ML] Fix thread leak when waiting for job flush (#32196) (#32541) Add AwaitsFix to failing test - see #32546 Core: Minor size reduction for AbstractComponent (#32509) SQL: Added support for string manipulating functions with more than one parameter (#32356) [DOCS] Reloadable Secure Settings (#31713) Watcher: Reenable HttpSecretsIntegrationTests#testWebhookAction test (#32456) [Rollup] Remove builders from TermsGroupConfig (#32507) Use hostname instead of IP with SPNEGO test (#32514) Switch x-pack rolling restart to new style Requests (#32339) NETWORKING: Fix Netty Leaks by upgrading to 4.1.28 (#32511) [DOCS] Small fixes in rule configuration page (#32516) Painless: Clean up PainlessMethod (#32476) Build: Remove shadowing from benchmarks (#32475) Docs: Add all JDKs to CONTRIBUTING.md Add licensing enforcement for FIPS mode (#32437) SQL: Add test for handling of partial results (#32474) Mute testFilterCacheStats [ML][DOCS] Fix typo applied_to => applies_to Scripting: Fix painless compiler loader to know about context classes (#32385)
This PR is the second part fix for #31604. It covers all the functions with more than one parameter: CONCAT, LEFT, RIGHT, REPEAT, POSITION, LOCATE, REPLACE, SUBSTRING, INSERT.