-
Notifications
You must be signed in to change notification settings - Fork 609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(strings): make StringValue.capitalize()
consistent across backends
#8270
Conversation
5e16716
to
3591c99
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening this Nick!
Without doing a survey of what's more common, I personally would find capitalizing anything but the first character (and lowercasing the remaining) in the string surprising. I'd vote to follow Python's str.capitalize
behavior here. It's also easier to implement using more common string slice/concat/upper/lower primatives for backends that don't have this functionality.
@jcrist I wrote this when I thought the intended behavior was like |
896b977
to
2cf9885
Compare
Wow, after the sqlglot refactor this is SO much cleaner. Love it, thanks so much for your work on that big rewrite! |
2cf9885
to
15d12c7
Compare
This might deserve a callout fairly prominently in the release notes, because although it might technically be a fix, that is only due to a poor definition of what capitalize was supposed to do. I bet some users' results are going to change as a result of this. There was even a test in the postgres backend testing for the old semantics that I had to deleted. |
cfe6bfd
to
fa05985
Compare
It appears that exasol treats |
I filed a support ticket with exasol and asked them to chime in here. Curious if they will fix their implementation, or if we will need to have a custom workaround forever. |
fa05985
to
df81627
Compare
also, it appears that many backends have that "bug" that exasol does, so maybe it's not a bug. |
Except for BigQuery and Snowflake, where there are real credentials that are necessarily secret, you can find the credentials for all the backends in their respective |
I try
so I am ignoring the minio error (maybe a separate issue?), since exasol looks to be working, and then try and then get a bunch of
I am on an M1 mac, is that an issue? Am I doing something else wrong? |
You won't be able to run every service on macos. Some of these services use Linux-specific APIs. |
So yeah, as Phillip mentioned, not all of these work out of the box on mac. I believe @ncclementi has at least some of them working using |
OK, thanks. Do you think either of you could finish this PR up then in the meantime? |
I got the postgress one up and running, and I was able to run tests. But I was trying to see if I could get up the exasol container to run the tests but I didn't make it pass. I couldn't get the container up and healthy. I tried also using colima's virtualization but I get
When I check the logs I get
any idea what is wrong? Maybe this is because exasol only supports linux, and their image says https://github.com/EXASOL/docker-db?tab=readme-ov-file#host-os |
hmm, it seems to me that MacOS is the most likely cause of exasol not working, and doesn't look like there are easy workarounds. |
Yes, it seems people have been asking for a while for support of the exasol image for M1 but no answers. I bypass the error above providing more memory but the container is always "unhealthy", I hit this same error https://github.com/exasol/docker-db/issues/77 (it's been open for a while) and not sure what's the reason behind. |
Improve docstring, add some more test cases. Also remove some backend-specific test cases that are now redundant. I tried compiling the operation directly, but since differenct backedns have different semantics around concatenating null strings, there were a lot of compat code. Since StringConcat already has that compat built in already, let's leverage that.
df81627
to
444adbb
Compare
The amount of additional code generated here doesn't seem worth the trouble. For Oracle we end up generating this monstrosity to capitalize the empty string: SELECT
CASE
WHEN LENGTH('') = 0
THEN ''
ELSE CASE
WHEN UPPER(
CASE
WHEN (
0 + 1
) >= 1
THEN SUBSTR('', 0 + 1, 1)
ELSE SUBSTR('', 0 + 1 + LENGTH(''), 1)
END
) IS NULL
OR LOWER(
CASE
WHEN (
1 + 1
) >= 1
THEN SUBSTR('', 1 + 1, CASE WHEN (
LENGTH('') - 1
) < 0 THEN 0 ELSE LENGTH('') - 1 END)
ELSE SUBSTR(
'',
1 + 1 + LENGTH(''),
CASE WHEN (
LENGTH('') - 1
) < 0 THEN 0 ELSE LENGTH('') - 1 END
)
END
) IS NULL
THEN NULL
ELSE CONCAT(
UPPER(
CASE
WHEN (
0 + 1
) >= 1
THEN SUBSTR('', 0 + 1, 1)
ELSE SUBSTR('', 0 + 1 + LENGTH(''), 1)
END
),
LOWER(
CASE
WHEN (
1 + 1
) >= 1
THEN SUBSTR('', 1 + 1, CASE WHEN (
LENGTH('') - 1
) < 0 THEN 0 ELSE LENGTH('') - 1 END)
ELSE SUBSTR(
'',
1 + 1 + LENGTH(''),
CASE WHEN (
LENGTH('') - 1
) < 0 THEN 0 ELSE LENGTH('') - 1 END
)
END
)
)
END
END AS "Capitalize('')" And it's still not correct. |
Alright, it turns out the mishandling of the empty string is reproducible in the |
StringValue.capitalize()
consistent across backends
82b16e2
to
f34300e
Compare
Gosh, what a hellscape this problem is. |
You're a hero @cpcloud ! Thank you! |
…nds (ibis-project#8270) Fixes ibis-project#8271. BREAKING CHANGE: Backends that previously used initcap (analogous to str.title) to implement StringValue.capitalize() will produce different results when the input string contains multiple words (a word's definition being backend-specific).
Fixes #8271.
BREAKING CHANGE: Backends that previouslly used
initcap
(analogous tostr.title
) to implementStringValue.capitalize()
will produce different results when the input string contains multiple words (a word's definition being backend-specific).