Postgres: make sure table from the public schema doesn't get merged with table from other schemas #4224

Merged: 2 commits, merged Oct 10, 2019

@arikfr (Member) commented Oct 7, 2019

What type of PR is this? (check all applicable)

  • Bug Fix

Description

This pull request addresses an edge case in Postgres' schema handling:

By default we omit the public schema name from the table name. But there are edge cases where this can cause conflicts. For example:

  • We have a schema named "main" with table "users".
  • We have a table named "main.users" in the public schema.

(While this seems unlikely, it actually happened.)

In this case, if we omit the schema name for the public table, we get a conflict. This pull request makes sure that if such a collision exists, we name the public schema table public."main.users" (the quotes are needed because of the dot).
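To make the naming rule concrete, here is a minimal sketch of the logic described above, using the `build_schema(query_result, schema)` signature from the diff below. The body of `full_table_name` (quoting names that contain a dot), the `column_name` key, and the sample rows are illustrative assumptions, not the merged code.

```python
def full_table_name(schema, name):
    # Assumption: quote the table name when it contains a dot, so that
    # ("public", "main.users") renders as public."main.users".
    if "." in name:
        name = '"{}"'.format(name)
    return "{}.{}".format(schema, name)


def build_schema(query_result, schema):
    # Every fully qualified name in the result, used to detect collisions.
    table_names = {
        full_table_name(r["table_schema"], r["table_name"])
        for r in query_result["rows"]
    }

    for row in query_result["rows"]:
        table_name = row["table_name"]
        # Keep the schema prefix for non-public tables, and also for public
        # tables whose bare name would collide with a qualified name.
        if row["table_schema"] != "public" or table_name in table_names:
            table_name = full_table_name(row["table_schema"], row["table_name"])

        if table_name not in schema:
            schema[table_name] = {"name": table_name, "columns": []}
        # Assumption: each row also carries a "column_name" key.
        schema[table_name]["columns"].append(row["column_name"])


rows = [
    {"table_schema": "main", "table_name": "users", "column_name": "id"},
    {"table_schema": "public", "table_name": "main.users", "column_name": "id"},
    {"table_schema": "public", "table_name": "orders", "column_name": "id"},
]
schema = {}
build_schema({"rows": rows}, schema)
print(sorted(schema))  # → ['main.users', 'orders', 'public."main.users"']
```

The public `orders` table keeps its short name because nothing collides with it; only the public `main.users` table gets the explicit `public.` prefix.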

This is probably relevant to other data sources, but for now limiting the change to the Postgres data source to make the change smaller and simpler.

@arikfr arikfr added the Backend label Oct 7, 2019
@arikfr arikfr requested a review from rauchy October 7, 2019 08:23
@arikfr arikfr merged commit 9d8812a into master Oct 10, 2019
@arikfr arikfr deleted the dudeup-schema branch October 10, 2019 10:02


def build_schema(query_result, schema):
    # By default we omit the public schema name from the table name. But there are
Contributor

Can you convert this to a heredoc or simply link to this PR's description?

    # (while this feels unlikely, this actually happened)
    # In this case if we omit the schema name for the public table, we will have
    # a conflict.
    table_names = set(map(lambda r: full_table_name(r['table_schema'], r['table_name']), query_result['rows']))
Contributor

IIRC, as part of the Python 3 PR, there's a massive conversion of maps to comprehensions. While we can argue about which is nicer (hey, I'm a map guy!), I do feel we should consider consolidating styles.
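For reference, the two styles produce the same set in this case. A minimal sketch; the simplified `full_table_name` stand-in (no quoting) and the sample rows are hypothetical:

```python
def full_table_name(schema, name):
    # Hypothetical stand-in for the helper used in the diff.
    return "{}.{}".format(schema, name)


rows = [
    {"table_schema": "main", "table_name": "users"},
    {"table_schema": "public", "table_name": "orders"},
]

# map + lambda, as written in the PR
via_map = set(map(lambda r: full_table_name(r["table_schema"], r["table_name"]), rows))

# Equivalent set comprehension
via_comprehension = {full_table_name(r["table_schema"], r["table_name"]) for r in rows}

assert via_map == via_comprehension == {"main.users", "public.orders"}
```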

Member Author

@NicolasLM is this something 2to3 does on its own?

Until python-3 hits the master branch, and as we merge new code, I would like to reduce the amount of manual changes you need to make. So if there is anything like this that you need to update manually, please let us know so we can keep it in mind.

Contributor

Yes, 2to3 replaces maps with list comprehensions (but only when the first argument given to map is a lambda). This is because in Python 3 map returns a lazy iterator while in Python 2.7 it returns a list, so 2to3 tries to preserve the return type. A similar thing happens with dict.keys() or dict.values().

In this specific example it does not matter since everything is wrapped in a set just after, so it will work in both Python 2.7 and Python 3.

I don't think you need to worry too much about this until the python-3 branch is merged, as these incompatibilities are usually caught early by the tests after a rebase.
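The Python 3 behavior described above can be seen directly: `map` returns a single-pass iterator, so a second pass over it yields nothing, while wrapping it in `set()` (as the PR does) materializes the values once. A minimal Python 3 sketch with made-up data:

```python
names = ["users", "orders"]

# In Python 3, map() returns a lazy, single-pass iterator rather than a list.
lazy = map(str.upper, names)
assert not isinstance(lazy, list)
first_pass = list(lazy)
second_pass = list(lazy)  # the iterator is already exhausted
assert first_pass == ["USERS", "ORDERS"]
assert second_pass == []

# Wrapping the call in set() consumes the iterator once, so the result
# supports repeated membership tests under both Python 2.7 and Python 3.
table_names = set(map(str.upper, names))
assert "USERS" in table_names
assert "USERS" in table_names

# dict.keys() is likewise a view in Python 3, not a list.
d = {"a": 1}
assert not isinstance(d.keys(), list)
```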

Comment on lines +84 to +90
        if row['table_schema'] != 'public':
            table_name = full_table_name(row['table_schema'], row['table_name'])
        else:
            if row['table_name'] in table_names:
                table_name = full_table_name(row['table_schema'], row['table_name'])
            else:
                table_name = row['table_name']
Contributor

Since this is an edge case, maybe it's best to make its handling less distracting?

Suggested change (replaces the lines quoted above):
        table_name = row['table_name']
        if row['table_schema'] != 'public' or table_name in table_names:
            table_name = full_table_name(row['table_schema'], row['table_name'])
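Whether the suggested rewrite is equivalent can be checked mechanically: both versions branch only on whether the schema is `public` and whether the bare name appears in `table_names`, so comparing them over all four combinations covers every case. A minimal sketch; the function names `original` and `suggested` and the simplified `full_table_name` stand-in are hypothetical:

```python
from itertools import product


def full_table_name(schema, name):
    # Hypothetical stand-in for the real helper (no quoting).
    return "{}.{}".format(schema, name)


def original(row, table_names):
    # Nested branches, as in the PR.
    if row["table_schema"] != "public":
        table_name = full_table_name(row["table_schema"], row["table_name"])
    else:
        if row["table_name"] in table_names:
            table_name = full_table_name(row["table_schema"], row["table_name"])
        else:
            table_name = row["table_name"]
    return table_name


def suggested(row, table_names):
    # Single condition, as in the suggested change.
    table_name = row["table_name"]
    if row["table_schema"] != "public" or table_name in table_names:
        table_name = full_table_name(row["table_schema"], row["table_name"])
    return table_name


# Exhaust the two inputs that drive the decision.
for schema_name, collides in product(["public", "main"], [True, False]):
    row = {"table_schema": schema_name, "table_name": "main.users"}
    table_names = {"main.users"} if collides else set()
    assert original(row, table_names) == suggested(row, table_names)
```

Because the flattened `or` condition short-circuits the same way as the nested `if`/`else`, the two are interchangeable; the shorter form just keeps the edge case from dominating the loop body.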

Member Author

I remember having some issue with doing it this way, but looking at it now it seems legit. 🤷‍♂️

3 participants