feat: generic support for "pseudo" or "hidden" columns #10049

Open
jcrist opened this issue Sep 6, 2024 · 0 comments


jcrist (Member) commented Sep 6, 2024

Some databases have "hidden" columns that can be referenced in a query, but won't show up (by default) in a SELECT *. Oracle calls these "pseudocolumns", and that name seems to have stuck with other databases too. Personally I think "hidden" is a more descriptive name, but 🤷
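SQLite's implicit rowid is an easy-to-run illustration of this behavior. The sketch below uses Python's built-in sqlite3 module. SQLite is an assumption here: it is not one of the backends this issue is about and is used only because it needs no setup.

```python
import sqlite3

# SQLite tables (unless declared WITHOUT ROWID) carry an implicit "rowid"
# column: it can be referenced in a query but is not part of SELECT *.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a TEXT)")
con.executemany("INSERT INTO t (a) VALUES (?)", [("x",), ("y",)])

# SELECT * returns only the declared column.
star_cols = [d[0] for d in con.execute("SELECT * FROM t").description]
print(star_cols)  # ['a']

# The hidden column can still be used in a filter.
filtered = con.execute("SELECT a FROM t WHERE rowid > 1").fetchall()
print(filtered)  # [('y',)]
```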

For example:

In #9375 a (pragmatic) hack was added to the BigQuery backend to support filtering on _TABLE_SUFFIX for wildcard (date-sharded) tables. This hack is unfortunate in a few ways:

  • The schema of the table expression won't always match the resulting schema (in the common case, the ibis schema has _TABLE_SUFFIX, while the result doesn't).
  • If a user intentionally tries to select _TABLE_SUFFIX, the resulting table still won't include it since we unconditionally drop the value. For example, t.select("_TABLE_SUFFIX", "a").execute() will just have "a".
  • It messes up the result-handling pipeline, requiring a bunch of special casing (see #10048, "fix(bigquery): fix column name mismatches and support _TABLE_SUFFIX in all to_* methods").
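To make the first two problems concrete, here is a heavily simplified, hypothetical model of an unconditional drop. The names HIDDEN, table_schema and execute_select are illustrative only and do not correspond to anything inside ibis.

```python
# Hypothetical, heavily simplified model of the special case described above.
# None of these names are ibis internals.
HIDDEN = "_TABLE_SUFFIX"

# The table expression advertises the pseudocolumn so users can filter on it.
table_schema = {"a": "int64", HIDDEN: "string"}

raw_rows = [
    {"a": 1, HIDDEN: "20240101"},
    {"a": 2, HIDDEN: "20240102"},
]


def execute_select(rows, columns):
    """Project `columns` from `rows`, then apply the unconditional drop."""
    projected = [{c: row[c] for c in columns} for row in rows]
    # The special case: strip the pseudocolumn no matter what was requested.
    return [{k: v for k, v in r.items() if k != HIDDEN} for r in projected]


# The user explicitly asks for the pseudocolumn...
result = execute_select(raw_rows, [HIDDEN, "a"])
print(list(table_schema))  # ['a', '_TABLE_SUFFIX']
print(result)              # [{'a': 1}, {'a': 2}]  (...but it is gone)
```

The schema claims a column that the result never contains, and an explicit request for that column is silently ignored.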

I propose we drop this special case in favor of a generic mechanism. This won't be as convenient for users trying to access _TABLE_SUFFIX, but it will be generic and less of a pain to maintain.

I think the easiest way to do this would be to add a method to Table. I'll call it hidden here, but I'm not attached to the name; it could also be pseudo, pseudo_col, pseudocol, etc.

```python
# Proposed API: Table.hidden does not exist yet.
# When called, this method takes a name and an optional type.
# If no type is given, the type is `unknown` and will require a cast
# to do much with it.
t.filter(t.hidden("_TABLE_SUFFIX", "string") > "abc")

# With no type specified, the type defaults to unknown
t.filter(t.hidden("_TABLE_SUFFIX").cast("string") > "abc")
```
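The proposed semantics can be prototyped without a database. The sketch below is a toy model, not ibis: Table, Column, the table name "events", and the rendered SQL strings are hypothetical stand-ins. It only shows that hidden() references a column by name without adding it to the schema, and that a column of unknown type must be cast before it can be compared.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Column:
    """Toy column reference; not an ibis type."""
    name: str
    dtype: str = "unknown"

    def cast(self, dtype: str) -> "Column":
        return Column(self.name, dtype)

    def __gt__(self, value: object) -> str:
        # Mirror "an unknown type requires a cast to do much with it".
        if self.dtype == "unknown":
            raise TypeError(f"cast {self.name!r} to a concrete type first")
        return f"{self.name} > {value!r}"  # toy SQL predicate


@dataclass
class Table:
    """Toy table; `schema` holds only the visible columns."""
    name: str
    schema: dict = field(default_factory=dict)

    def hidden(self, name: str, dtype: str = "unknown") -> Column:
        # Reference the column by name without adding it to the schema.
        return Column(name, dtype)

    def filter(self, predicate: str) -> str:
        return f"SELECT * FROM {self.name} WHERE {predicate}"


t = Table("events", {"a": "int64"})
q1 = t.filter(t.hidden("_TABLE_SUFFIX", "string") > "abc")
q2 = t.filter(t.hidden("_TABLE_SUFFIX").cast("string") > "abc")
print(q1)  # SELECT * FROM events WHERE _TABLE_SUFFIX > 'abc'
```

Both spellings produce the same predicate, and the table's schema never gains the hidden column.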

One nice thing about this (besides dropping the special casing) is that it still allows users to include these columns in the result set if they explicitly ask for them:

```python
expr = t.mutate(t.hidden("_TABLE_SUFFIX"))
expr.to_pandas()  # includes _TABLE_SUFFIX; today it is always dropped
```
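In SQL terms, explicitly asking for a hidden column amounts to appending it to the projection. The sketch below uses SQLite's implicit rowid as a stand-in for _TABLE_SUFFIX. That substitution is an assumption, and the query ibis would actually generate for a given backend may differ. It shows that the hidden column appears in the result once it is named:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a TEXT, b INTEGER)")
con.execute("INSERT INTO t VALUES ('x', 10)")

# Roughly the shape a proposed t.mutate(t.hidden("rowid")) might compile to:
# every visible column, plus the explicitly requested hidden one.
cur = con.execute("SELECT *, rowid AS rowid FROM t")
cols = [d[0] for d in cur.description]
rows = cur.fetchall()
print(cols)  # ['a', 'b', 'rowid']
print(rows)  # [('x', 10, 1)]
```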
jcrist changed the title from feat: generic support for "pseudocolumns" to feat: generic support for "pseudo" or "hidden" columns on Sep 6, 2024
cpcloud added this to the 10.0 milestone on Sep 9, 2024
cpcloud removed this from the 10.0 milestone on Dec 30, 2024
Projects
Status: backlog

2 participants