Add parser option enable_options_value_normalization #11330
Thanks @xinlifoobar for working on this issue. What I suggest is that we let people define their own normalizations. Rather than converting all values to lowercase, people can declare their own normalization methods for specific option keys. If we choose to do that in this PR, I believe the normalization should be done during the assignment of table options. |
The APIs that could be used in the datafusion/datafusion/expr/src/planner.rs Lines 33 to 83 in 2f02c43
|
If the key is not found, use the default normalization or do nothing (according to your |
IMO it won't be a good idea... I thought the *configs are static values. What should you do if you would keep a config file for function values? |
I wonder what the use case for a user-defined normalization is? It may be more complicated to code. I like the approach proposed in this PR. As I understand it, it follows what we do for SQL: by default apply some hard-coded normalization (e.g. to lowercase) and let users opt out of that normalization either by:
|
Hi again @xinlifoobar. The idea in my mind was something like this: synnada-ai@341b484. I haven't been able to work on it much, and it can't be compiled right now, but you can understand what I mean as an idea. By removing the lowercase transformations we made while parsing the statement, just like in this PR, it seems that we can easily normalize the fields we want in the desired way. cc @alamb |
Hi, I got the idea and am okay to make the change if we agree. My remaining question is whether normalization should go beyond the config itself. I am still unsure about this design because the normalize function applies to only a small set of configs. In the future, if we have multiple functions for different purposes, the macro might become huge and difficult to read. |
Hey @berkaysynnada @alamb are we all ok with this design? I plan to update this PR tomorrow. |
Hi @xinlifoobar -- I haven't had a chance to review this PR unfortunately. Perhaps @tinfoil-knight has some time to review the design as the original filer of #10853 |
PR looks good overall. Some minor changes requested. I prefer the current implementation that changes the normalization for all option values.
@berkaysynnada 's solution allows option-specific normalization but I don't think that kind of fine-grained control over option values is needed.
Sidenote:
We could add traits later on that would allow users to specify custom normalization behavior for identifiers and option values through SqlToRel::new_with_options.
// Example
pub trait Normalizer {
    fn normalize(&self, key: &str, value: Value) -> Result<String>;
}
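For illustration only, here is a minimal sketch of how a trait shaped like the one above could be implemented. It is not code from this PR: `Value` is a simplified stand-in for the parser's literal type, the error type is a plain `String` rather than DataFusion's error, and `LowercaseAll`, `KeepCaseFor`, and the option keys are hypothetical names.

```rust
/// Simplified stand-in for the parser's literal type (assumption, not the real type).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    SingleQuotedString(String),
    Number(String),
    Boolean(bool),
}

/// Same shape as the trait sketched in the comment, with a plain String error.
pub trait Normalizer {
    fn normalize(&self, key: &str, value: Value) -> Result<String, String>;
}

/// Hypothetical normalizer: lowercases every string value, regardless of key.
pub struct LowercaseAll;

impl Normalizer for LowercaseAll {
    fn normalize(&self, _key: &str, value: Value) -> Result<String, String> {
        Ok(match value {
            Value::SingleQuotedString(s) => s.to_ascii_lowercase(),
            Value::Number(n) => n,
            Value::Boolean(b) => b.to_string(),
        })
    }
}

/// Hypothetical normalizer: preserves case for the listed keys and
/// lowercases string values for every other key.
pub struct KeepCaseFor(pub Vec<&'static str>);

impl Normalizer for KeepCaseFor {
    fn normalize(&self, key: &str, value: Value) -> Result<String, String> {
        match value {
            // Case-sensitive keys (for example paths or secrets) pass through untouched.
            Value::SingleQuotedString(s) if self.0.contains(&key) => Ok(s),
            other => LowercaseAll.normalize(key, other),
        }
    }
}
```

With this shape, the "normalize everything" behavior and a key-aware behavior are both just implementations of the same trait, so the planner would only need to hold one `Normalizer`.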
@alamb The current design and PR look good to me. Can we get this merged?
I agree -- this looks great to me. Thank you @xinlifoobar and @tinfoil-knight
I'll plan to merge this tomorrow unless there are other comments.
> @berkaysynnada 's solution allows option-specific normalization but I don't think that kind of fine-grained control over option values is needed.
We actually do need this in our use case. Let's do the general solution to avoid API churn on such a user-visible part of DataFusion.
@xinlifoobar and @tinfoil-knight, do you have time to apply @berkaysynnada's suggestion? It's OK if not, we can work on it instead.
Hey, since this has been pending for a long time, I could do this as a follow-up after this is merged. |
Sure -- let's just make sure to avoid API churn and complete the work 🙂 |
How about we file a ticket explaining what else is needed prior to merging this PR. I can do this but I want to make sure I have the use case down correctly @berkaysynnada says:
I think the code in synnada-ai@341b484 shows a way to add an annotation to a built-in config setting value that controls how it will be normalized. Is the use case to
Something else? |
Both. We would like to have key-level control on what normalization means for config options, built-in and extension. |
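As a rough sketch of what key-level control could look like (not the design in synnada-ai@341b484 or in #11650), a registry can map option keys to normalization functions and fall back to a default for unregistered keys. All names here (`OptionNormalizers`, `register`, `lowercase`, `identity`) and the option keys are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// A normalization function takes the raw option value and returns the stored form.
type NormalizeFn = fn(&str) -> String;

/// Hypothetical registry: per-key normalizers with a default fallback.
/// Built-in and extension keys are treated the same way.
pub struct OptionNormalizers {
    per_key: HashMap<String, NormalizeFn>,
    default: NormalizeFn,
}

impl OptionNormalizers {
    /// Creates a registry that applies `default` to every unregistered key.
    pub fn new(default: NormalizeFn) -> Self {
        Self { per_key: HashMap::new(), default }
    }

    /// Overrides the normalization for one option key.
    pub fn register(&mut self, key: &str, f: NormalizeFn) {
        self.per_key.insert(key.to_string(), f);
    }

    /// Normalizes `value` with the key-specific function if one is registered,
    /// otherwise with the default.
    pub fn normalize(&self, key: &str, value: &str) -> String {
        let f = self.per_key.get(key).copied().unwrap_or(self.default);
        f(value)
    }
}

/// Default used in this sketch: lowercase the value.
pub fn lowercase(v: &str) -> String {
    v.to_ascii_lowercase()
}

/// Opt-out used in this sketch: keep the value exactly as written.
pub fn identity(v: &str) -> String {
    v.to_string()
}
```

Under these assumptions, registering `identity` for a case-sensitive key such as a credential keeps its value intact, while every other key still gets the default lowercase treatment.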
In the interest of keeping the code moving, I filed #11650 to track the additional feature and I merged up from main to resolve a conflict |
Sounds good -- this is good to go from my perspective. |
Clippy failure seems unrelated to this PR. I think it is #11651 |
Merged up to get the clippy fix |
🚀 thanks and sorry for the delay @xinlifoobar |
Which issue does this PR close?
Closes #10853
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?