make format prefix optional for format options in COPY #9723
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -850,7 +850,16 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { | |
return plan_err!("Unsupported Value in COPY statement {}", value); | ||
} | ||
}; | ||
options.insert(key.to_lowercase(), value_string.to_lowercase()); | ||
if !(&key.contains('.')) { | ||
// If config does not belong to any namespace, assume it is | ||
// a format option and apply the format prefix for backwards | ||
// compatibility. | ||
|
||
let renamed_key = format!("format.{}", key); | ||
options.insert(renamed_key.to_lowercase(), value_string.to_lowercase()); | ||
} else { | ||
options.insert(key.to_lowercase(), value_string.to_lowercase()); | ||
} | ||
} | ||
Note for @devinjdangelo: In the previous similar PR, we only renamed keys when the file type was JSON, CSV or PARQUET. I decided not to do this now since it'd seem weird if some file types need a [...] Let me know if you think differently. Sidenote: Other file formats don't currently have any supported format options. |
||
|
||
let file_type = if let Some(file_type) = statement.stored_as { | ||
|
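The key handling in the diff above can be sketched in isolation as follows. This is a minimal, standalone approximation rather than the actual `SqlToRel` code: the helper names `normalize_copy_option_key` and `collect_copy_options` are hypothetical, and the option names in the usage note are only illustrative.

```rust
use std::collections::HashMap;

/// Hypothetical helper: lowercases a COPY option key and, if the key has no
/// namespace (no '.'), assumes it is a format option and adds the `format.`
/// prefix, mirroring the branch added in the diff.
fn normalize_copy_option_key(key: &str) -> String {
    let key = key.to_lowercase();
    if key.contains('.') {
        key
    } else {
        format!("format.{key}")
    }
}

/// Hypothetical helper: builds the options map the way the diff does.
/// Keys are prefixed where needed, and both keys and values are lowercased.
fn collect_copy_options(raw: &[(&str, &str)]) -> HashMap<String, String> {
    raw.iter()
        .map(|(k, v)| (normalize_copy_option_key(k), v.to_lowercase()))
        .collect()
}
```

With this sketch, an un-namespaced key such as `DELIMITER` ends up as `format.delimiter`, while a key that already contains a dot, such as `format.compression`, is only lowercased.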
Hi @tinfoil-knight @alamb, I'd like to know the history of why we lowercase the value string. I am running into this case issue in another PR, as described at https://github.com/apache/datafusion/pull/10792/files#r1632877093. Thanks :)
The change was originally added in this PR: #9382
It seems like the author of that PR was trying to consolidate case changes (using `to_lowercase`) in multiple places to just a single place.
For now, you could add an exclusion for the access token option that is case sensitive in your PR so you can continue your work.
Ideally, we should probably stop lower-casing the options and handle case-sensitivity in the specific reader.
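A rough sketch of the suggested exclusion, assuming a fixed list of case-sensitive keys. The constant `CASE_SENSITIVE_KEYS`, the key `format.access_token`, and the helper `insert_option` are all hypothetical names used for illustration, not existing DataFusion items.

```rust
use std::collections::HashMap;

/// Hypothetical list of option keys whose values must keep their original case.
const CASE_SENSITIVE_KEYS: &[&str] = &["format.access_token"];

/// Hypothetical helper: always lowercases the key, but lowercases the value
/// only when the key is not on the exclusion list.
fn insert_option(options: &mut HashMap<String, String>, key: &str, value: &str) {
    let key = key.to_lowercase();
    let value = if CASE_SENSITIVE_KEYS.contains(&key.as_str()) {
        value.to_string()
    } else {
        value.to_lowercase()
    };
    options.insert(key, value);
}
```

This keeps the current behaviour for every other option, so it works as a stopgap, but each new case-sensitive option would need another entry in the list.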
Having keys lowercased makes sense for standardization purposes, but I'm not sure why we are also lowercasing option values. It may be either (1) because of the same standardization concerns, or (2) a mistake.
The most flexible way would be to make value (not key) standardization logic customizable, with the default being lowercase. If we had that, users like @xinlifoobar would be able to avoid standardization modifications. Other users who want other kinds of standardizations would be able to override it etc.
@tinfoil-knight, would you be interested in thinking about how we can do this?
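One possible shape for such a customizable hook, sketched under the assumption that it is modeled as a trait. The names `ValueNormalizer`, `Lowercase`, `Verbatim`, and `collect_options` are hypothetical and do not refer to anything that exists in DataFusion.

```rust
use std::collections::HashMap;

/// Hypothetical hook: decides how a raw option value is standardized.
trait ValueNormalizer {
    fn normalize(&self, key: &str, value: &str) -> String;
}

/// Default behaviour, matching the current code: lowercase every value.
struct Lowercase;

impl ValueNormalizer for Lowercase {
    fn normalize(&self, _key: &str, value: &str) -> String {
        value.to_lowercase()
    }
}

/// Opt-out: hand the original value to the integration untouched.
struct Verbatim;

impl ValueNormalizer for Verbatim {
    fn normalize(&self, _key: &str, value: &str) -> String {
        value.to_string()
    }
}

/// Keys are always lowercased; values go through the chosen normalizer.
fn collect_options(
    raw: &[(&str, &str)],
    normalizer: &dyn ValueNormalizer,
) -> HashMap<String, String> {
    raw.iter()
        .map(|(k, v)| {
            let key = k.to_lowercase();
            let value = normalizer.normalize(&key, v);
            (key, value)
        })
        .collect()
}
```

Passing the key to `normalize` is a design choice in this sketch: it would let an implementation lowercase most values while leaving specific ones, such as tokens, unchanged.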
I've opened an issue here: #10853
Making the value standardization logic customizable makes sense. Integrations should receive the original value and be able to decide what to do with it.
I'll attempt a fix for the issue this weekend if it hasn't already been picked up by then.