Postgres: make sure table from the public schema doesn't get merged with table from other schemas #4224

Merged
merged 2 commits into from
Oct 10, 2019
44 changes: 33 additions & 11 deletions redash/query_runner/pg.py
@@ -63,6 +63,38 @@ def _wait(conn, timeout=None):
             raise psycopg2.OperationalError("select.error received")


+def full_table_name(schema, name):
+    if '.' in name:
+        name = u'"{}"'.format(name)
+
+    return u'{}.{}'.format(schema, name)
+
+
+def build_schema(query_result, schema):
+    # By default we omit the public schema name from the table name. But there are
Contributor

Can you convert this to a heredoc or simply link to this PR's description?

+    # edge cases, where this might cause conflicts. For example:
+    # * We have a schema named "main" with table "users".
+    # * We have a table named "main.users" in the public schema.
+    # (while this feels unlikely, this actually happened)
+    # In this case if we omit the schema name for the public table, we will have
+    # a conflict.
+    table_names = set(map(lambda r: full_table_name(r['table_schema'], r['table_name']), query_result['rows']))
Contributor

IIRC, as part of the Python 3 PR, there's a massive conversion of maps to comprehensions. While we can argue about which is nicer (hey, I'm a map guy!), I do feel we should consider consolidating styles.

Member Author

@NicolasLM is this something 2to3 does on its own?

Until python-3 hits the master branch, and as we merge new code, I would like to reduce the number of manual changes you need to make. So if there is anything like this that you need to update manually, please let us know so we are mindful of it.

Contributor

Yes, 2to3 replaces maps with list comprehensions (but only when the first argument given to map is a lambda). This is because in Python 3 map returns a lazy iterator while in Python 2.7 it returns a list, so 2to3 tries to preserve the return type. A similar thing happens with dict.keys() or dict.values().

In this specific example it does not matter since everything is wrapped in a set just after, so it will work in both Python 2.7 and Python 3.

I don't think you need to bother too much about this until the python-3 branch is merged as these incompatibilities are usually caught early on in tests after a rebase.
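
The return-type point above can be checked with a small sketch. It assumes Python 3, where `map` is lazy; the `rows` sample is made up for illustration, and `full_table_name` is a local copy of the helper from this PR with the `u''` prefixes dropped.

```python
# Illustrative rows; not taken from a real database.
rows = [
    {'table_schema': 'public', 'table_name': 'main.users'},
    {'table_schema': 'main', 'table_name': 'users'},
]


def full_table_name(schema, name):
    # Local copy of the PR's helper: quote names that contain a dot.
    if '.' in name:
        name = '"{}"'.format(name)
    return '{}.{}'.format(schema, name)


# map() form, as written in the PR. In Python 3 this is a lazy iterator.
lazy = map(lambda r: full_table_name(r['table_schema'], r['table_name']), rows)

# Comprehension form, the shape 2to3 produces for a lambda-based map.
as_list = [full_table_name(r['table_schema'], r['table_name']) for r in rows]

# set() consumes either one, so the resulting sets are equal.
names_from_map = set(lazy)
names_from_list = set(as_list)
same = names_from_map == names_from_list
print(same)
```

Because the iterator is consumed immediately by `set()`, the lazy/eager difference never becomes visible to the rest of `build_schema`.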


+    for row in query_result['rows']:
+        if row['table_schema'] != 'public':
+            table_name = full_table_name(row['table_schema'], row['table_name'])
+        else:
+            if row['table_name'] in table_names:
+                table_name = full_table_name(row['table_schema'], row['table_name'])
+            else:
+                table_name = row['table_name']
Comment on lines +84 to +90
Contributor

Since this is an edge case, maybe it'll be best if it has less distraction?

Suggested change
-        if row['table_schema'] != 'public':
-            table_name = full_table_name(row['table_schema'], row['table_name'])
-        else:
-            if row['table_name'] in table_names:
-                table_name = full_table_name(row['table_schema'], row['table_name'])
-            else:
-                table_name = row['table_name']
+        table_name = row['table_name']
+        if row['table_schema'] != 'public' or table_name in table_names:
+            table_name = full_table_name(row['table_schema'], row['table_name'])

Member Author

I remember having some issue with doing it this way, but looking at it now it seems legit. 🤷‍♂️
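
One way to gain confidence that the suggested form behaves like the original is a brute-force comparison. This is only a sketch: the function names `nested` and `flattened`, and the small input grid, are hypothetical and chosen for illustration; `full_table_name` is a local copy of the PR's helper.

```python
from itertools import product


def full_table_name(schema, name):
    # Local copy of the PR's helper: quote names that contain a dot.
    if '.' in name:
        name = '"{}"'.format(name)
    return '{}.{}'.format(schema, name)


def nested(row, table_names):
    # Branching as originally written in the PR.
    if row['table_schema'] != 'public':
        table_name = full_table_name(row['table_schema'], row['table_name'])
    else:
        if row['table_name'] in table_names:
            table_name = full_table_name(row['table_schema'], row['table_name'])
        else:
            table_name = row['table_name']
    return table_name


def flattened(row, table_names):
    # Branching from the suggested change.
    table_name = row['table_name']
    if row['table_schema'] != 'public' or table_name in table_names:
        table_name = full_table_name(row['table_schema'], row['table_name'])
    return table_name


# Small illustrative grid of schemas, table names, and known-name sets.
schemas = ['public', 'main']
names = ['users', 'main.users']
name_sets = [set(), {'main.users'}, {'users'}, {'main.users', 'users'}]

# Collect every input on which the two versions disagree.
mismatches = [
    (s, n, ts)
    for s, n, ts in product(schemas, names, name_sets)
    if nested({'table_schema': s, 'table_name': n}, ts)
    != flattened({'table_schema': s, 'table_name': n}, ts)
]
print(mismatches)
```

An empty `mismatches` list on this grid supports the "seems legit" reading, though it is not a proof for all inputs.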


+        if table_name not in schema:
+            schema[table_name] = {'name': table_name, 'columns': []}
+
+        schema[table_name]['columns'].append(row['column_name'])
+
+
 class PostgreSQL(BaseSQLQueryRunner):
     noop_query = "SELECT 1"

@@ -112,17 +144,7 @@ def _get_definitions(self, schema, query):

         results = json_loads(results)

-        for row in results['rows']:
-            if row['table_schema'] != 'public':
-                table_name = u'{}.{}'.format(row['table_schema'],
-                                             row['table_name'])
-            else:
-                table_name = row['table_name']
-
-            if table_name not in schema:
-                schema[table_name] = {'name': table_name, 'columns': []}
-
-            schema[table_name]['columns'].append(row['column_name'])
+        build_schema(results, schema)

     def _get_tables(self, schema):
         '''
23 changes: 23 additions & 0 deletions tests/query_runner/test_pg.py
@@ -0,0 +1,23 @@

+from unittest import TestCase
+from redash.query_runner.pg import build_schema
+
+
+class TestBuildSchema(TestCase):
+    def test_handles_dups_between_public_and_other_schemas(self):
+        results = {
+            'rows': [
+                {'table_schema': 'public', 'table_name': 'main.users', 'column_name': 'id'},
+                {'table_schema': 'main', 'table_name': 'users', 'column_name': 'id'},
+                {'table_schema': 'main', 'table_name': 'users', 'column_name': 'name'},
+            ]
+        }
+
+        schema = {}
+
+        build_schema(results, schema)
+
+        self.assertIn('main.users', schema.keys())
+        self.assertListEqual(schema['main.users']['columns'], ['id', 'name'])
+        self.assertIn('public."main.users"', schema.keys())
+        self.assertListEqual(schema['public."main.users"']['columns'], ['id'])
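
For contrast, here is a standalone sketch of what the pre-PR logic does with the same rows. The wrapper name `old_build_schema` is hypothetical; its body mirrors the loop this PR removes from `_get_definitions`, with the `u''` prefixes dropped.

```python
def old_build_schema(rows, schema):
    # Pre-PR logic: prefix non-public tables, leave public tables bare.
    for row in rows:
        if row['table_schema'] != 'public':
            table_name = '{}.{}'.format(row['table_schema'], row['table_name'])
        else:
            table_name = row['table_name']

        if table_name not in schema:
            schema[table_name] = {'name': table_name, 'columns': []}

        schema[table_name]['columns'].append(row['column_name'])


# Same rows as the test above.
rows = [
    {'table_schema': 'public', 'table_name': 'main.users', 'column_name': 'id'},
    {'table_schema': 'main', 'table_name': 'users', 'column_name': 'id'},
    {'table_schema': 'main', 'table_name': 'users', 'column_name': 'name'},
]

merged = {}
old_build_schema(rows, merged)

# Both tables collapse into one 'main.users' entry and their columns are mixed.
print(sorted(merged))
print(merged['main.users']['columns'])
```

Under the old logic both tables map to the single key `main.users`, which is the merge the test in this PR guards against.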