Update to use polars 0.19.1 and arrow2 0.9 #234
Thanks @glennpierce for the PR! You can use …
The reason I didn't run cargo fmt in the end was that I was getting errors on things I hadn't changed. For example …
I ran cargo fmt on a different machine instead; my install on the other one may be in an odd state.
…on, update python test for arrow2
// let schema = &self.arrow_schema.clone();
let (rbs, schema): (Vec<Chunk<ArrayRef>>, Arc<Schema>) = self.arrow()?;
let fields: &[arrow2::datatypes::Field] = schema.fields.as_slice();
let rb: Chunk<ArrayRef> = rbs[0].clone();
This looks weird to me; it seems like only the first chunk would be converted to a polars DataFrame.
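A minimal, std-only sketch of the concern (it does not use polars or arrow2): a query result is modeled as a list of chunks, where each chunk is a hypothetical `Chunk` of equal-length `i64` columns. The helper names below are illustrative only. Reading just `rbs[0]`, as the snippet above does, sees only the first batch's rows.

```rust
/// Hypothetical stand-in for an arrow2 `Chunk`: a list of columns,
/// all of the same length (one record batch).
type Chunk = Vec<Vec<i64>>;

/// Number of rows in one chunk (the length of its first column).
fn chunk_len(chunk: &Chunk) -> usize {
    chunk.first().map_or(0, |col| col.len())
}

/// What converting only `chunks[0]` covers.
fn rows_in_first_chunk(chunks: &[Chunk]) -> usize {
    chunks.first().map_or(0, chunk_len)
}

/// What a full conversion should cover: every chunk.
fn rows_in_all_chunks(chunks: &[Chunk]) -> usize {
    chunks.iter().map(chunk_len).sum()
}
```

For two chunks holding 2 and 1 rows, `rows_in_first_chunk` returns 2 while `rows_in_all_chunks` returns 3, so one row would silently go missing.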
You are right: the TryFrom impl in polars only takes one Chunk, which is wrong. The correct fix is to change it there, which I will do. In the meantime, I suggest the fix below. Should I create another PR for this, or can you put it in?
Or should we wait until I get it into polars git?
#[throws(Arrow2DestinationError)]
pub fn polars(self) -> DataFrame {
    use polars::prelude::NamedFrom;
    let (rbs, schema): (Vec<Chunk<ArrayRef>>, Arc<Schema>) = self.arrow()?;
    let fields: &[arrow2::datatypes::Field] = schema.fields.as_slice();

    // Convert every chunk (not just the first) into one Series per field.
    fn try_from(chunks: (&[Chunk<ArrayRef>], &[ArrowField])) -> std::result::Result<DataFrame, PolarsError> {
        let mut series: Vec<Series> = vec![];
        for chunk in chunks.0.iter() {
            let columns_results: std::result::Result<Vec<Series>, PolarsError> = chunk
                .columns()
                .iter()
                .zip(chunks.1)
                .map(|(arr, field)| Series::try_from((field.name.as_ref(), arr.clone())))
                .collect();
            let columns = columns_results?;
            if series.is_empty() {
                // The first chunk seeds the output Series...
                for col in columns.iter() {
                    let name = col.name().to_string();
                    series.push(Series::new(&name, col));
                }
                // ...and must not also be appended below, or its rows would appear twice.
                continue;
            }
            // Later chunks are appended column by column; append can fail, so propagate the error.
            for (i, col) in columns.into_iter().enumerate() {
                series[i].append(&col)?;
            }
        }
        DataFrame::new(series)
    }
    try_from((&rbs, fields)).unwrap()
}
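The merge logic in the proposed `try_from` can be checked in isolation with a std-only sketch. `Column` is a hypothetical stand-in for a polars `Series` and `merge_chunks` is an illustrative name, not part of connectorx or polars; the column-count and name checks are assumptions added for the sketch. As in the proposal, the first chunk seeds the output and later chunks are appended, and the `continue` is what keeps the first chunk from being appended a second time.

```rust
/// Hypothetical stand-in for a polars `Series`: a named column of values.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    name: String,
    values: Vec<i64>,
}

/// Merge per-chunk columns into one column per field.
/// Mirrors the proposed `try_from`: the first chunk seeds the output,
/// every later chunk is appended column by column.
fn merge_chunks(chunks: &[Vec<Column>]) -> Result<Vec<Column>, String> {
    let mut merged: Vec<Column> = Vec::new();
    for columns in chunks {
        if merged.is_empty() {
            // Seed with the first chunk and skip the append step below;
            // falling through would append these rows a second time.
            merged = columns.clone();
            continue;
        }
        // A later chunk must have the same number of columns as the first.
        if columns.len() != merged.len() {
            return Err(format!(
                "expected {} columns, got {}",
                merged.len(),
                columns.len()
            ));
        }
        for (out, col) in merged.iter_mut().zip(columns) {
            // Columns are matched by position, so also check the names agree.
            if out.name != col.name {
                return Err(format!("column `{}` does not match `{}`", col.name, out.name));
            }
            out.values.extend_from_slice(&col.values);
        }
    }
    Ok(merged)
}
```

Removing the `continue` makes the first column come out as `[1, 2, 1, 2, 3]` for the chunks used in a quick check like `[[a: 1, 2], [a: 3]]`, which is exactly the duplication the seed step has to avoid.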
Cool! I think we can put it in this PR (and have an alpha version for it if you need one right now). After the TryFrom impl is updated in polars, we can update this code and release another version.
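For the upstream change discussed here, a conversion that accepts every chunk could take a slice of chunks rather than a single one. The sketch below is std-only and hypothetical: `Frame` and `Chunk` are not polars or arrow2 types, and this is not the actual polars impl. It sizes the output from the first chunk's column count, which avoids the seed-then-`continue` special case.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for one record batch: equal-length i64 columns.
type Chunk = Vec<Vec<i64>>;

/// Hypothetical stand-in for a DataFrame: one contiguous Vec per column.
#[derive(Debug, PartialEq)]
struct Frame {
    columns: Vec<Vec<i64>>,
}

/// Convert all chunks at once instead of a single chunk.
impl<'a> TryFrom<&'a [Chunk]> for Frame {
    type Error = String;

    fn try_from(chunks: &'a [Chunk]) -> Result<Self, Self::Error> {
        // Size the output from the first chunk; an empty input gives an empty frame.
        let width = chunks.first().map_or(0, |chunk| chunk.len());
        let mut columns: Vec<Vec<i64>> = vec![Vec::new(); width];
        for (i, chunk) in chunks.iter().enumerate() {
            if chunk.len() != width {
                return Err(format!(
                    "chunk {} has {} columns, expected {}",
                    i,
                    chunk.len(),
                    width
                ));
            }
            // Append this chunk's rows to the matching output column.
            for (out, col) in columns.iter_mut().zip(chunk) {
                out.extend_from_slice(col);
            }
        }
        Ok(Frame { columns })
    }
}
```

With an impl of this shape, a caller would write `Frame::try_from(rbs.as_slice())` instead of converting `rbs[0]`.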
I have pushed this fix. Does this mean I need to open a new pull request? (Sorry, I don't have much experience working with GitHub PRs.)
Also, I have left Cargo.toml pointing at the latest polars git and pinned arrow2 to the same version polars uses, since mismatched versions often cause issues.
We don't need to create a new PR. I have fixed a small bug and also set arrow2 to the same version in connectorx-python. It passes the tests now, so we can merge this PR. Thanks for the fix!
Hi @glennpierce, I have added JSON support for arrow2 (#235) and arrow2 0.9 support for the Python binding. I also left a comment on getting the polars DataFrame; can you take a look? (It might be the cause of the error in …)
Update to use polars 0.19.1 and arrow2 0.9.