Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: a lot of tables listed by default in Postgres #8763

Closed
1 task done
lostmygithubaccount opened this issue Mar 25, 2024 · 13 comments · Fixed by #8655
Closed
1 task done

feat: a lot of tables listed by default in Postgres #8763

lostmygithubaccount opened this issue Mar 25, 2024 · 13 comments · Fixed by #8655
Assignees
Labels
feature Features or general enhancements postgres The PostgreSQL backend

Comments

@lostmygithubaccount
Copy link
Member

Is your feature request related to a problem?

a lot (247) of tables are listed in con.list_tables() for a default postgres connection after running just up postgres:

[ins] In [1]: import ibis

[ins] In [2]: con = ibis.postgres.connect(host="localhost", user="postgres", password="postgres", database="ibis_testing")

[ins] In [3]: len(con.list_tables())
Out[3]: 247

most of these seem to be system tables (though I'm not that familiar w/ Postgres details):

See all tables
[ins] In [4]: con.list_tables()
Out[4]:
['_pg_foreign_data_wrappers',
 '_pg_foreign_servers',
 '_pg_foreign_table_columns',
 '_pg_foreign_tables',
 '_pg_user_mappings',
 'addr',
 'addrfeat',
 'administrable_role_authorizations',
 'applicable_roles',
 'attributes',
 'bg',
 'character_sets',
 'check_constraint_routine_usage',
 'check_constraints',
 'collation_character_set_applicability',
 'collations',
 'column_column_usage',
 'column_domain_usage',
 'column_options',
 'column_privileges',
 'column_udt_usage',
 'columns',
 'constraint_column_usage',
 'constraint_table_usage',
 'county',
 'county_lookup',
 'countysub_lookup',
 'cousub',
 'data_type_privileges',
 'direction_lookup',
 'domain_constraints',
 'domain_udt_usage',
 'domains',
 'edges',
 'element_types',
 'enabled_roles',
 'faces',
 'featnames',
 'foreign_data_wrapper_options',
 'foreign_data_wrappers',
 'foreign_server_options',
 'foreign_servers',
 'foreign_table_options',
 'foreign_tables',
 'geocode_settings',
 'geocode_settings_default',
 'geography_columns',
 'geometry_columns',
 'information_schema_catalog_name',
 'key_column_usage',
 'layer',
 'loader_lookuptables',
 'loader_platform',
 'loader_variables',
 'pagc_gaz',
 'pagc_lex',
 'pagc_rules',
 'parameters',
 'pg_aggregate',
 'pg_am',
 'pg_amop',
 'pg_amproc',
 'pg_attrdef',
 'pg_attribute',
 'pg_auth_members',
 'pg_authid',
 'pg_available_extension_versions',
 'pg_available_extensions',
 'pg_backend_memory_contexts',
 'pg_cast',
 'pg_class',
 'pg_collation',
 'pg_config',
 'pg_constraint',
 'pg_conversion',
 'pg_cursors',
 'pg_database',
 'pg_db_role_setting',
 'pg_default_acl',
 'pg_depend',
 'pg_description',
 'pg_enum',
 'pg_event_trigger',
 'pg_extension',
 'pg_file_settings',
 'pg_foreign_data_wrapper',
 'pg_foreign_server',
 'pg_foreign_table',
 'pg_group',
 'pg_hba_file_rules',
 'pg_ident_file_mappings',
 'pg_index',
 'pg_indexes',
 'pg_inherits',
 'pg_init_privs',
 'pg_language',
 'pg_largeobject',
 'pg_largeobject_metadata',
 'pg_locks',
 'pg_matviews',
 'pg_namespace',
 'pg_opclass',
 'pg_operator',
 'pg_opfamily',
 'pg_parameter_acl',
 'pg_partitioned_table',
 'pg_policies',
 'pg_policy',
 'pg_prepared_statements',
 'pg_prepared_xacts',
 'pg_proc',
 'pg_publication',
 'pg_publication_namespace',
 'pg_publication_rel',
 'pg_publication_tables',
 'pg_range',
 'pg_replication_origin',
 'pg_replication_origin_status',
 'pg_replication_slots',
 'pg_rewrite',
 'pg_roles',
 'pg_rules',
 'pg_seclabel',
 'pg_seclabels',
 'pg_sequence',
 'pg_sequences',
 'pg_settings',
 'pg_shadow',
 'pg_shdepend',
 'pg_shdescription',
 'pg_shmem_allocations',
 'pg_shseclabel',
 'pg_stat_activity',
 'pg_stat_all_indexes',
 'pg_stat_all_tables',
 'pg_stat_archiver',
 'pg_stat_bgwriter',
 'pg_stat_database',
 'pg_stat_database_conflicts',
 'pg_stat_gssapi',
 'pg_stat_progress_analyze',
 'pg_stat_progress_basebackup',
 'pg_stat_progress_cluster',
 'pg_stat_progress_copy',
 'pg_stat_progress_create_index',
 'pg_stat_progress_vacuum',
 'pg_stat_recovery_prefetch',
 'pg_stat_replication',
 'pg_stat_replication_slots',
 'pg_stat_slru',
 'pg_stat_ssl',
 'pg_stat_subscription',
 'pg_stat_subscription_stats',
 'pg_stat_sys_indexes',
 'pg_stat_sys_tables',
 'pg_stat_user_functions',
 'pg_stat_user_indexes',
 'pg_stat_user_tables',
 'pg_stat_wal',
 'pg_stat_wal_receiver',
 'pg_stat_xact_all_tables',
 'pg_stat_xact_sys_tables',
 'pg_stat_xact_user_functions',
 'pg_stat_xact_user_tables',
 'pg_statio_all_indexes',
 'pg_statio_all_sequences',
 'pg_statio_all_tables',
 'pg_statio_sys_indexes',
 'pg_statio_sys_sequences',
 'pg_statio_sys_tables',
 'pg_statio_user_indexes',
 'pg_statio_user_sequences',
 'pg_statio_user_tables',
 'pg_statistic',
 'pg_statistic_ext',
 'pg_statistic_ext_data',
 'pg_stats',
 'pg_stats_ext',
 'pg_stats_ext_exprs',
 'pg_subscription',
 'pg_subscription_rel',
 'pg_tables',
 'pg_tablespace',
 'pg_timezone_abbrevs',
 'pg_timezone_names',
 'pg_transform',
 'pg_trigger',
 'pg_ts_config',
 'pg_ts_config_map',
 'pg_ts_dict',
 'pg_ts_parser',
 'pg_ts_template',
 'pg_type',
 'pg_user',
 'pg_user_mapping',
 'pg_user_mappings',
 'pg_views',
 'place',
 'place_lookup',
 'referential_constraints',
 'role_column_grants',
 'role_routine_grants',
 'role_table_grants',
 'role_udt_grants',
 'role_usage_grants',
 'routine_column_usage',
 'routine_privileges',
 'routine_routine_usage',
 'routine_sequence_usage',
 'routine_table_usage',
 'routines',
 'schemata',
 'secondary_unit_lookup',
 'sequences',
 'spatial_ref_sys',
 'sql_features',
 'sql_implementation_info',
 'sql_parts',
 'sql_sizing',
 'state',
 'state_lookup',
 'street_type_lookup',
 'tabblock',
 'tabblock20',
 'table_constraints',
 'table_privileges',
 'tables',
 'topology',
 'tract',
 'transforms',
 'triggered_update_columns',
 'triggers',
 'udt_privileges',
 'usage_privileges',
 'user_defined_types',
 'user_mapping_options',
 'user_mappings',
 'view_column_usage',
 'view_routine_usage',
 'view_table_usage',
 'views',
 'zcta5',
 'zip_lookup',
 'zip_lookup_all',
 'zip_lookup_base',
 'zip_state',
 'zip_state_loc']

Describe the solution you'd like

if possible, avoid listing system tables

What version of ibis are you running?

main

What backend(s) are you using, if any?

Postgres; would be nice to do this for others if it applies

Code of Conduct

  • I agree to follow this project's Code of Conduct
@lostmygithubaccount lostmygithubaccount added the feature Features or general enhancements label Mar 25, 2024
@lostmygithubaccount lostmygithubaccount added the postgres The PostgreSQL backend label Mar 25, 2024
@gforsyth
Copy link
Member

Can you try out my no schema branch? I think I fixed this over there

@lostmygithubaccount
Copy link
Member Author

lostmygithubaccount commented Mar 25, 2024

mostly fixed! down to 3:

[ins] In [1]: import ibis

[ins] In [2]: con = ibis.postgres.connect(host="localhost", user="postgres", password="postgres", database="ibis_testing")

[ins] In [3]: con.list_tables()
Out[3]: ['geography_columns', 'geometry_columns', 'spatial_ref_sys']

if it's possible/you want to remove those you can leave this open, or perhaps just mark your PR as closing this issue?

@gforsyth
Copy link
Member

I'll self-assign this and tackle it as a followup after #8655 is merged.

@gforsyth gforsyth self-assigned this Mar 25, 2024
@gforsyth
Copy link
Member

Hmm, well, I'm not sure what to do about those tables -- it's two views and one table and they are in the public "schema" (grrr).

ibis_testing=# \dt public.*
              List of relations
 Schema |      Name       | Type  |  Owner   
--------+-----------------+-------+----------
 public | spatial_ref_sys | table | postgres
(1 row)

ibis_testing=# \dv public.*
              List of relations
 Schema |       Name        | Type |  Owner   
--------+-------------------+------+----------
 public | geography_columns | view | postgres
 public | geometry_columns  | view | postgres
(2 rows)

We can add special handling to remove them from the output of list_tables but I think we'd want to know that users wouldn't ever need to see them.

@jcrist
Copy link
Member

jcrist commented Mar 25, 2024

We can add special handling to remove them from the output of list_tables but I think we'd want to know that users wouldn't ever need to see them.

IMO I think the optimal behavior for "system" tables like these would be:

  • They don't show up in things like list_tables cluttering the output
  • Users can still explicitly get a reference to them by doing con.table(name) to access them if they need to

Filtering them from list_tables seems fine to me.

@gforsyth
Copy link
Member

Currently on the kill-schema branch, if no catalog or database is provided, we use the default values of self.current_catalog and self.current_database -- which for the Ibis testing container should be ibis_testing.public.

I think filtering out the above system tables for a "clean" con.list_tables() sounds good.

Do we want to be less prescriptive if the user is explicitly asking for things in ibis_testing.public?

e.g. do we want something like this? ( I genuinely don't know -- I think I don't like this but I'm waffling)

>>> con.list_tables()
[]
>>> con.list_tables(database=("ibis_testing", "public"))
['geography_columns', 'geometry_columns', 'spatial_ref_sys']

@cpcloud
Copy link
Member

cpcloud commented Mar 25, 2024

I think if we consider non-system tables to be any table that wasn't created by postgres itself, that was created using a CREATE TABLE statement or the corresponding C API, then we should include tables created by extensions in the output of list_tables().

@jcrist
Copy link
Member

jcrist commented Mar 25, 2024

I think we should be uniform in how we're filtering, so if "system" tables can appear in other databases/catalogs then we should also filter there.

I could also go with Phillip's definition of system tables above, although I think the geo-extension is common enough (and creates some known tables) so special-casing and excluding those here also seems fine to me.

@cpcloud
Copy link
Member

cpcloud commented Mar 25, 2024

It seems like extension tables are probably placed into the public schema for a reason, so listing them doesn't strike me as a horrible choice

@cpcloud
Copy link
Member

cpcloud commented Mar 25, 2024

Postgres' default behavior is not as simple as "look in public", it's got something analogous to PATH but for the schema search space, called search_path, further complicating things.

@lostmygithubaccount
Copy link
Member Author

from a UX perspective, I mainly just want to easily see the tables I create in the database. that's hard with 247 but fine with 3 in there by default. no strong opinion from me on whether those 3 should be listed or not but it's fine for my use cases at least

@NickCrews
Copy link
Contributor

I vote don't do anything clever, show everything in public. If you don't want clutter (which is reasonable) then this should get fixed at the geospatial extension level.

@gforsyth
Copy link
Member

I vote don't do anything clever
that's a good guiding principle.

Ok, I'm going set #8655 to close this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements postgres The PostgreSQL backend
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants