add function & test for parsing table_or_uri #1138
Signed-off-by: Marijn Valk <marijncv@hotmail.com>
Thank you for doing this. Would you be willing to add Path support to DeltaTable too? Then we can clean up most of the unit tests that have to do the Path to str conversion.
data = pa.table({"vals": pa.array(["1", "2", "3"])})
table_or_uri = str(tmp_path / "delta_table")
write_deltalake(table_or_uri, data)
delta_table = DeltaTable(table_or_uri)
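A minimal sketch of what accepting a Path could look like, assuming the constructor only needs to hand the Rust side a plain string. `normalize_table_uri` is a hypothetical helper name, not part of the deltalake API.

```python
import os
from pathlib import Path
from typing import Union


def normalize_table_uri(table_uri: Union[str, os.PathLike]) -> str:
    """Hypothetical helper: turn a str or path-like table location into a str."""
    # os.fspath returns str (and bytes) unchanged, calls __fspath__ on
    # os.PathLike objects such as pathlib.Path, and raises TypeError otherwise.
    return os.fspath(table_uri)


# A test could then pass a Path straight through instead of wrapping it in str().
local_uri = normalize_table_uri(Path("tables") / "delta_table")
```

Using `os.fspath` rather than `str()` keeps arbitrary objects from being silently stringified into a bogus location.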
I see. We need DeltaTable.__init__ to support Path too. Would you be willing to add that as well?
Added it there! And also in try_get_deltatable
python/deltalake/writer.py
# Non-existent local paths are only accepted as fully-qualified URIs
table_uri = "file://" + str(Path(table_or_uri).absolute())
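For illustration only (this comparison is not part of the PR): building the URI by concatenation works for POSIX paths but not for Windows drive paths, whereas pathlib's `as_uri()` produces a well-formed file URI for both. The pure path classes are used so the comparison runs on any OS.

```python
from pathlib import PurePosixPath, PureWindowsPath

posix = PurePosixPath("/data/delta_table")
windows = PureWindowsPath("C:/data/delta_table")

# Concatenation, as in the line above (after .absolute() has been applied):
naive_posix = "file://" + str(posix)      # 'file:///data/delta_table'
naive_windows = "file://" + str(windows)  # 'file://C:\\data\\delta_table' (not a valid file URI)

# pathlib builds a proper file URI for both path flavours:
proper_posix = posix.as_uri()      # 'file:///data/delta_table'
proper_windows = windows.as_uri()  # 'file:///C:/data/delta_table'
```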
By now we have actually extended the path handling on the Rust side to also support relative and scheme-less paths, and to create the directory if it does not exist yet.
Lines 353 to 376 in 67b512f
pub(crate) fn ensure_table_uri(table_uri: impl AsRef<str>) -> DeltaResult<Url> {
    let table_uri = table_uri.as_ref();
    if let Ok(path) = std::fs::canonicalize(table_uri) {
        return Url::from_directory_path(path)
            .map_err(|_| DeltaTableError::InvalidTableLocation(table_uri.to_string()));
    }
    if let Ok(url) = Url::parse(table_uri) {
        return Ok(match url.scheme() {
            "file" => url,
            _ => {
                let mut new_url = url.clone();
                new_url.set_path(url.path().trim_end_matches('/'));
                new_url
            }
        });
    }
    // The table uri still might be a relative path that does not exist.
    std::fs::create_dir_all(table_uri)
        .map_err(|_| DeltaTableError::InvalidTableLocation(table_uri.to_string()))?;
    let path = std::fs::canonicalize(table_uri)
        .map_err(|_| DeltaTableError::InvalidTableLocation(table_uri.to_string()))?;
    Url::from_directory_path(path)
        .map_err(|_| DeltaTableError::InvalidTableLocation(table_uri.to_string()))
}
Would it be a good idea to rely on the same logic on the Python side as well, so that we are always consistent in the supported paths? Of course, supporting Path objects will always be Python territory :).
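To make the Rust function's decision order concrete, here is a rough Python transcription. It is an illustrative, hypothetical port (`ensure_table_uri` and `_directory_uri` here are not deltalake Python APIs); it raises `ValueError` where the Rust code returns `DeltaTableError::InvalidTableLocation`, and it adds one guard the Rust code lacks (single-letter schemes are treated as Windows drive letters). The suggestion in the review is to reuse the Rust logic rather than duplicate it; the sketch only shows what that logic does.

```python
import os
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit


def _directory_uri(path: Path) -> str:
    # Like Url::from_directory_path: an absolute file URL with a trailing slash.
    uri = path.as_uri()
    return uri if uri.endswith("/") else uri + "/"


def ensure_table_uri(table_uri: str) -> str:
    """Hypothetical Python port of the Rust ensure_table_uri shown above."""
    # 1. An existing local path is canonicalized (symlinks and ".." resolved).
    try:
        return _directory_uri(Path(table_uri).resolve(strict=True))
    except OSError:
        pass

    # 2. Anything with a URL scheme is kept; non-file URLs lose trailing slashes.
    #    Addition not in the Rust code: single-letter "schemes" are assumed to be
    #    Windows drive letters and fall through to step 3.
    parts = urlsplit(table_uri)
    if len(parts.scheme) > 1:
        if parts.scheme == "file":
            return table_uri
        return urlunsplit(parts._replace(path=parts.path.rstrip("/")))

    # 3. Otherwise treat it as a (possibly relative) local path that does not
    #    exist yet: create the directory, then canonicalize it.
    try:
        os.makedirs(table_uri, exist_ok=True)
        return _directory_uri(Path(table_uri).resolve(strict=True))
    except OSError as err:
        raise ValueError(f"Invalid table location: {table_uri}") from err
```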
Thanks for the pointer! That makes the Python code a bit simpler.
python/deltalake/writer.py
) -> Tuple[Optional[DeltaTable], str]:
    """Parses `table_or_uri` and returns `table` & `table_uri`.

    Raises a ValueError if `table_or_uri` is not of type `str`, `Path` or `DeltaTable`
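A self-contained sketch of the parsing this docstring describes. `parse_table_or_uri` is a hypothetical name, and the `DeltaTable` class below is a stub standing in for `deltalake.DeltaTable`; the version in the PR may also look up an existing table for `str`/`Path` inputs, which is left out here.

```python
from pathlib import Path
from typing import Optional, Tuple, Union


class DeltaTable:
    """Stub standing in for deltalake.DeltaTable so the sketch runs on its own."""

    def __init__(self, table_uri: str) -> None:
        self.table_uri = table_uri


def parse_table_or_uri(
    table_or_uri: Union[str, Path, DeltaTable],
) -> Tuple[Optional[DeltaTable], str]:
    """Parses `table_or_uri` and returns `table` & `table_uri` (hypothetical sketch)."""
    if isinstance(table_or_uri, DeltaTable):
        # An existing table already knows its own location.
        return table_or_uri, table_or_uri.table_uri
    if isinstance(table_or_uri, (str, Path)):
        # A str or Path only names a location; looking up an existing
        # table there is omitted in this sketch.
        return None, str(table_or_uri)
    # Any other type gets an explicit, specific error.
    raise ValueError("table_or_uri must be a str, Path or DeltaTable")
```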
Should we maybe write this in numpy docstring format? https://numpydoc.readthedocs.io/en/latest/format.html#raises
I am very sorry, but I said something wrong. We are not using the numpy format... we are using the reST format. I'm not sure how raises are documented there. @wjones127, do you know? I could not find good docs for that.
Ah yes, the format shown here looks very similar to what is used in other places in the codebase, and it also specifies a way to describe a raise. I can rewrite it in that format if it's the correct one to use.
Yes that's the format. Could you rewrite that?
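A reST docstring for the parsing function might look roughly like this, assuming Sphinx's info field lists (`:param:`, `:return:`, `:raises:`). The function name is hypothetical and the body is omitted; only the docstring format is illustrated.

```python
import inspect
from pathlib import Path
from typing import Optional, Tuple, Union


def parse_table_or_uri(
    table_or_uri: Union[str, Path, "DeltaTable"],
) -> Tuple[Optional["DeltaTable"], str]:
    """Parses `table_or_uri` and returns `table` & `table_uri`.

    :param table_or_uri: URI of a table, a local path, or a DeltaTable object.
    :return: a tuple of the table (None if there is none) and its URI.
    :raises ValueError: if `table_or_uri` is not of type `str`, `Path` or `DeltaTable`.
    """
    raise NotImplementedError  # body omitted: only the docstring format is shown


docstring = inspect.getdoc(parse_table_or_uri)
```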
Thanks! This is a great improvement :)
Signed-off-by: Marijn Valk <marijncv@hotmail.com>
# Description
Now that #1138 is merged, we can use `pathlib.Path` instead of `str` in `write_deltalake()` and `DeltaTable.__init__()`.
---------
Signed-off-by: Marijn Valk <marijncv@hotmail.com>
# Description
Adds a more specific ValueError if a wrong `table_or_uri` is provided to the `write_deltalake` function. It also adds support for a pathlib `Path` object.
# Related Issue(s)
- closes delta-io#1123
# Documentation
---------
Signed-off-by: Marijn Valk <marijncv@hotmail.com>