Add an IbisInterface #4517

Merged
merged 68 commits into from
Nov 20, 2020

Conversation

philippjfr
Member

@philippjfr philippjfr commented Jul 10, 2020

Adds an IbisInterface built on the Ibis project which will let us interface directly with various database backends.

So far this is a basic skeleton which we can fill in bit by bit. The first item for discussion is what object a HoloViews Dataset should wrap. To me it seems most appropriate for it to wrap any Expr object which represents some expression on a database, e.g. the selection of an individual Table or any other operation. The question then is what types of expressions should be supported: can or should we support a ColumnExpr, or must the basic unit always represent a table?

Implementation status

Core methods

  • init: Basic constructor
  • range: Computes the range along a column
  • values: Returns all values along a column (or only unique values)
  • length: Returns number of rows in the table
  • nonzero: Whether there are any rows in the table
  • dtype: Return the dtype of the column
  • isscalar: Checks whether column value has a single unique value
  • select: Select given boolean mask or given scalar, range or list selection(s)
  • groupby: Returns one or more Datasets grouped by the values along one or more columns
  • redim: Rename columns in the table
  • iloc: Integer indexing of rows and columns
  • sort: Sort by one or more columns ascending/descending
  • aggregate: Aggregate along one or more columns with given function (e.g. mean, min, max)
  • sample: A list of selections
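
As a rough illustration of how read-only methods such as length, range and values can be computed by the database instead of on loaded data, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than Ibis, and the LazyTable class and the points table are hypothetical names invented for this example; they are not part of HoloViews or of this PR.

```python
import sqlite3


class LazyTable:
    """Hypothetical stand-in for a lazily evaluated table (not the HoloViews API)."""

    def __init__(self, connection, table):
        self.connection = connection
        # The table name is interpolated into SQL below, so it must be trusted.
        self.table = table

    def _query(self, sql):
        return self.connection.execute(sql).fetchall()

    def length(self):
        # Number of rows, counted by the database.
        return self._query(f"SELECT COUNT(*) FROM {self.table}")[0][0]

    def range(self, column):
        # (min, max) along a column without fetching the column itself.
        return self._query(
            f"SELECT MIN({column}), MAX({column}) FROM {self.table}"
        )[0]

    def values(self, column, unique=False):
        # All values along a column, or only the distinct ones.
        distinct = "DISTINCT " if unique else ""
        rows = self._query(f"SELECT {distinct}{column} FROM {self.table}")
        return [row[0] for row in rows]


# Small in-memory table to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, label TEXT)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?)",
    [(1.0, "a"), (3.5, "b"), (2.0, "a")],
)

table = LazyTable(conn, "points")
```

Only the small results (a count, a min/max pair, a list of values) cross the connection; the table itself stays in the database until values is called.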

Write methods (these methods write back to the data)

  • add_dimension: Add a column to the table (can this be supported?)
  • mask: Mask out values given a boolean mask
  • assign: Add or overwrite one or more columns

Conversion methods

  • array: Return a NumPy array of requested columns

Tests Status

Passing: 73
Error: 34
Fail: 0
Skip: 16

Cc: @kcpevey @tonyfast

@philippjfr philippjfr changed the title Add a IbisInterface Add an IbisInterface Jul 10, 2020
@MarcSkovMadsen
Collaborator

MarcSkovMadsen commented Jul 10, 2020

I have been thinking about HoloViews/Panel and Ibis lately on the back of a blog post by QuantStack.

I cannot tell from the above why this is being added, so I hope you will write a bit about the motivation or vision.

For me the vision would be to make it very easy to use a wide range of data sources within HoloViz, similar to how easy it is to use BI tools like Tableau with all kinds of data sources.

And maybe this interface could be used to load only the data needed to display in the plot, or to do the aggregation on the backend server.

Unfortunately for me, Ibis does not support SQL Server or Snowflake, which means I cannot use it in my daily work.

@jbednar
Member

jbednar commented Jul 27, 2020

Yes, the vision is to be able to instantiate a HoloViews element where the "data" is lazy, just a query and not the actual data, with the data fetched dynamically when it is actually needed for plotting or analysis. That does indeed set HoloViews up to do some more of the things that BI tools do.

If you have any funding, I would imagine that Quansight would add support to Ibis for SQL Server, and maybe Snowflake...

@kcpevey
Collaborator

kcpevey commented Jul 27, 2020

@MarcSkovMadsen Adding additional support in Ibis is definitely something we at Quansight would be open to discussing.

@tonyfast
Collaborator

There is some WIP on a Microsoft SQL Server interface: ibis-project/ibis#1997

@philippjfr
Member Author

This is starting to look good. I've got a few comments but will hold off until you tell me it's ready.

@tonyfast
Collaborator

tonyfast commented Aug 4, 2020

I'd love to take some comments soon. I'm nearing the end of the skeleton and starting to understand the model better.

Is my PR what is breaking the Travis tests?

@philippjfr
Member Author

philippjfr commented Aug 4, 2020 via email

@philippjfr
Member Author

Will have a go at this now.

@philippjfr
Member Author

Huge amount of progress, 73 tests passing and 34 errors.

@philippjfr
Member Author

Okay the only real problems remaining:

  • iloc doesn't work
  • groupby and aggregate do not preserve ordering (this is okay, just needs to be documented since the same is true for cuDFs).

@tonyfast
Collaborator

Any ideas why the AppVeyor CI is getting mad at the SQLite connections? I can't seem to find a solution.

@tonyfast
Collaborator

tonyfast commented Aug 25, 2020

I don't think it is possible to avoid sorting for some Ibis backends. What should be done in this case? Is it likely that test_dataset_groupby_second_dim is failing because of sorting problems?

@philippjfr
Member Author

That's fine, we just have to document it and update the test. The cuDF interface has the same problem currently.
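
One possible way to update such a test is to compare grouped results after normalizing their order, so that a backend which returns groups in a different order still passes. The sketch below is plain Python and only illustrates the idea; group_rows and normalize are hypothetical helpers, not functions from the HoloViews test suite.

```python
from collections import defaultdict


def group_rows(rows, key_index):
    """Group tuples by the value at key_index, keeping first-seen key order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return dict(groups)


def normalize(groups):
    """Sort group keys and the rows in each group so ordering differences vanish."""
    return sorted((key, sorted(members)) for key, members in groups.items())


# The same rows as two backends might return them, in different orders.
rows_backend_a = [("a", 1), ("b", 2), ("a", 3)]
rows_backend_b = [("b", 2), ("a", 3), ("a", 1)]

groups_a = group_rows(rows_backend_a, 0)
groups_b = group_rows(rows_backend_b, 0)

# Key order differs between the two results, but the normalized forms are equal.
same_content = normalize(groups_a) == normalize(groups_b)
```

A test written this way asserts on content only, which matches the documented behaviour that groupby and aggregate do not preserve ordering.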

@dharhas
Member

dharhas commented Aug 31, 2020

Looks like travis_wait 30 doit develop_install $CHANS_DEV -o $HV_REQUIREMENTS is failing. If I understand right, this waits 30 minutes and then dies, so 30 minutes isn't long enough for this command to execute. Not sure why this command is taking longer than 30 minutes, though.

@philippjfr
Member Author

> Looks like travis_wait 30 doit develop_install $CHANS_DEV -o $HV_REQUIREMENTS is failing. If I understand right, this waits 30 minutes and then dies, so 30 minutes isn't long enough for this command to execute. Not sure why this command is taking longer than 30 minutes, though.

I'll rebase; I fixed that on master. Pulling from conda-forge made it very slow.

@goanpeca

> I'll rebase; I fixed that on master. Pulling from conda-forge made it very slow.

Hi @philippjfr, I am looking with @tonyfast into the failing tests. Had to reset a commit.

@philippjfr
Member Author

I had dropped conda-forge because it was taking forever to solve in the Py2 build. Will see if we can get around that somehow, otherwise let's just increase the timeout. Looks like we're still getting the NoWindowOp errors, still the wrong version of Ibis?

@tonyfast
Collaborator

tonyfast commented Sep 1, 2020

Nah, I'm doing something wrong. I had to add Ibis builds from conda-forge because I couldn't get it from any of the other channels. I've got to keep trying things; otherwise I'll likely have to duck some of the iloc tests until a new version of Ibis shows up.

@philippjfr
Member Author

> I'll likely have to duck some of the iloc tests until a new version of Ibis shows up.

Totally fine, I'd suggest adding a decorator that checks for a specific version:

from unittest import skipIf
ibis_skip = skipIf(ibis is None or ibis_version < '...', 'Ibis not available or wrong version')
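
One caveat with the snippet above is that comparing version strings with < is lexicographic, so "1.10.0" sorts before "1.4.0". A minimal sketch of a version-gated skip that compares numeric tuples instead is shown here. It uses only the standard library; parse_version, needs_version, the test class and the version numbers are all hypothetical and chosen for illustration, since the thread does not say which Ibis version is required.

```python
import unittest
from itertools import takewhile


def parse_version(text):
    """Turn '1.10.2' into (1, 10, 2), stopping at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def needs_version(installed, minimum):
    """Skip a test when the package is missing (None) or older than minimum."""
    missing_or_old = (
        installed is None or parse_version(installed) < parse_version(minimum)
    )
    return unittest.skipIf(missing_or_old, "package not available or wrong version")


class IlocTests(unittest.TestCase):
    # Version strings are placeholders; in practice 'installed' would be
    # detected from the imported package.
    @needs_version("1.10.0", "1.4.0")
    def test_runs_on_new_version(self):
        self.assertTrue(True)

    @needs_version("1.3.0", "1.4.0")
    def test_skipped_on_old_version(self):
        self.fail("should have been skipped")

    @needs_version(None, "1.4.0")
    def test_skipped_when_missing(self):
        self.fail("should have been skipped")
```

Where the packaging library is available, its version parser is likely a more robust choice than the simplified parse_version used here.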

@kcpevey
Collaborator

kcpevey commented Sep 2, 2020

The sister PR to add rowid support in Ibis has been merged: ibis-project/ibis#2345

@philippjfr
Member Author

Thanks for all your efforts here @kcpevey and @tonyfast! I'm merging now and hoping that we can get some docs in before release.

@philippjfr philippjfr merged commit 13e4417 into master Nov 20, 2020
@jbednar jbednar added this to the v1.14.0 milestone Nov 20, 2020
@kcpevey kcpevey deleted the ibis_interface branch April 22, 2022 14:54

8 participants