Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to use polars 0.19.1 and arrow2 0.9 #234

Merged
merged 11 commits into from
Feb 15, 2022
Merged

Conversation

glennpierce
Copy link
Contributor

Update to use polars 0.19.1 and arrow2 0.9.

@wangxiaoying
Copy link
Contributor

Thanks @glennpierce for the PR! You can use cargo fmt to format the code so it could pass the check in ci.

@glennpierce
Copy link
Contributor Author

Thanks @glennpierce for the PR! You can use cargo fmt to format the code so it could pass the check in ci.

The reason I didn't run cargo fmt in the end was that I was getting errors on things I hadn't changed. For example

error: expected one of `,` or `=`, found `::`
 --> /home/glenn/devel/connector-x/connectorx/src/lib.rs:3:16
  |
3 | #![allow(clippy::upper_case_acronyms)]
  |                ^^

error: expected one of `=` or `]`, found `::`
 --> /home/glenn/devel/connector-x/connectorx/src/lib.rs:3:16
  |
3 | #![allow(clippy::upper_case_acronyms)]
  |                ^^

error: expected one of `,` or `as`, found `::`
 --> /home/glenn/devel/connector-x/connectorx/tests/test_mysql.rs:2:10
  |
2 |     array::{Float64Array, Int64Array, LargeStringArray},
  |          ^^

@glennpierce
Copy link
Contributor Author

Thanks @glennpierce for the PR! You can use cargo fmt to format the code so it could pass the check in ci.

The reason I didn't run cargo fmt in the end was that I was getting errors on things I hadn't changed. For example

error: expected one of `,` or `=`, found `::`
 --> /home/glenn/devel/connector-x/connectorx/src/lib.rs:3:16
  |
3 | #![allow(clippy::upper_case_acronyms)]
  |                ^^

error: expected one of `=` or `]`, found `::`
 --> /home/glenn/devel/connector-x/connectorx/src/lib.rs:3:16
  |
3 | #![allow(clippy::upper_case_acronyms)]
  |                ^^

error: expected one of `,` or `as`, found `::`
 --> /home/glenn/devel/connector-x/connectorx/tests/test_mysql.rs:2:10
  |
2 |     array::{Float64Array, Int64Array, LargeStringArray},
  |          ^^

I ran cargo fmt on a different machine. My install on the other my be in a odd state

// let schema = &self.arrow_schema.clone();
let (rbs, schema): (Vec<Chunk<ArrayRef>>, Arc<Schema>) = self.arrow()?;
let fields: &[arrow2::datatypes::Field] = schema.fields.as_slice();
let rb: Chunk<ArrayRef> = rbs[0].clone();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks weird to me, seems like only the first chunk could be converted to polars DataFrame.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right the TryFrom impl in polars only takes one Chucnk which is wrong the correct fix is to fix it there which I will do, However, I suggest the fix here is below. Should I create another pr for this or can you put it in ?
Or should we wait until I get it into polars git ?

#[throws(Arrow2DestinationError)]
    pub fn polars(self) -> DataFrame {
  
        use polars::prelude::NamedFrom;

        let (rbs, schema): (Vec<Chunk<ArrayRef>>, Arc<Schema>) = self.arrow()?;
        let fields: &[arrow2::datatypes::Field] = schema.fields.as_slice();

        fn try_from(chunks: (&[Chunk<ArrayRef>], &[ArrowField])) -> std::result::Result<DataFrame, PolarsError> {

            let mut series: Vec<Series> = vec![]; 

            for chunk in chunks.0.iter() {

                let columns_results: std::result::Result<Vec<Series>, PolarsError> = chunk
                    .columns()
                    .iter()
                    .zip(chunks.1)
                    .map(|(arr, field)| Series::try_from((field.name.as_ref(), arr.clone())))
                    .collect();

                let mut columns = columns_results?;

                if series.is_empty() {
                    for col in columns.iter() {
                        let name = col.name().to_string();
                        series.push(Series::new(&name, col));
                    }
                }

                for (i, col) in columns.into_iter().enumerate() {
                    series[i].append(&col);
                }
            }
    
            DataFrame::new(series)
        }

        try_from((&rbs, fields)).unwrap()
    }

Copy link
Contributor

@wangxiaoying wangxiaoying Feb 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! I think we can put it in this PR (and have an alpha version for it if you need one right now). After the TryFrom impl is updated in polars we can update this code and release another version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have pushed this fix. Does it mean I need to re initiate a new pull request ? (Sorry I haven't much experience working with github PR's)

Also I have left the toml pointing at the latest polars git and set arrow2 pointing to the same version polars does as when different they often cause issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to create a new PR. I have fixed a little bug and also add the version of arrow2 the same in connectorx-python. It has passed the tests now. We can merge this PR now. Thanks for the fix!

@wangxiaoying
Copy link
Contributor

wangxiaoying commented Feb 14, 2022

Hi @glennpierce , I have added the json support for arrow2 (#235) and arrow2 0.9 support for the python binding. I also left a comment on getting the polars DataFrame, can you take a look? (might be the reason that cause the error in connectorx/tests/test_polars.rs: https://github.com/sfu-db/connector-x/runs/5177624204?check_suite_focus=true)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants