Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify column name length in PostgreSQL and SQL Server #13742

Merged
merged 4 commits into from
Aug 24, 2022

Conversation

ebyhr
Copy link
Member

@ebyhr ebyhr commented Aug 19, 2022

Description

Fixes #12882

Documentation

(x) No documentation is needed.

Release notes

(x) Release notes entries required with the following suggested text:

# PostgreSQL
* Prevent using column names with a name which is longer
  than the max length supported by PostgreSQL. Previously, long names were
  truncated to the max length. ({issue}`13742`)
# SQL Server
* Prevent renaming a column to a name which is longer
  than the max length supported by SQL Server. Previously, long names were
  truncated to the max length. ({issue}`13742`)

@cla-bot cla-bot bot added the cla-signed label Aug 19, 2022
@ebyhr ebyhr force-pushed the ebi/long-column-name branch 4 times, most recently from 4bb1cb8 to 4c10f34 Compare August 19, 2022 09:34
@ebyhr ebyhr marked this pull request as ready for review August 19, 2022 23:14
@ebyhr ebyhr force-pushed the ebi/long-column-name branch from 4c10f34 to 88dae14 Compare August 19, 2022 23:21
@ebyhr ebyhr requested a review from hashhar August 20, 2022 03:10
@findepi
Copy link
Member

findepi commented Aug 23, 2022

@ebyhr can you please rebase? there are conflicts.

@ebyhr ebyhr force-pushed the ebi/long-column-name branch from 88dae14 to 831a156 Compare August 23, 2022 21:05
@ebyhr
Copy link
Member Author

ebyhr commented Aug 23, 2022

Rebased on upstream to resolved conflicts.

@@ -150,6 +150,8 @@
// An empty character means that the table doesn't have a comment in MariaDB
private static final String NO_COMMENT = "";

private static final int PARSE_ERROR = 1064;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it documented? can you link to docs?

@@ -457,7 +459,7 @@ public void renameColumn(ConnectorSession session, JdbcTableHandle handle, JdbcC
execute(connection, sql);
}
catch (TrinoException e) {
if (e.getCause() instanceof SQLSyntaxErrorException) {
if (e.getCause() instanceof SQLSyntaxErrorException syntaxError && syntaxError.getErrorCode() == PARSE_ERROR) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQLSyntaxErrorException contains invalid column name.

add a code comment

// Note: SQLSyntaxErrorException can be throw also when column name is invalid

BTW do we need this complicated logic?
why not "drop support" for the older MariaDB version and simplify this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think throwing an MariaDB exception without conversion is also fine. Let me handle in a follow-up PR.

Comment on lines +795 to +799
jdbcColumn.getColumnName(),
newRemoteColumnName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren't these quoted?

(i know they weren't quoted before the refactor)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching that. Let me fix in a follow-up PR.

// PostgreSQL truncates table name to 63 chars silently
// PostgreSQL driver caches the max column name length in a DatabaseMetaData object. The cost to call this method per column is low.
if (columnName.length() > databaseMetadata.getMaxColumnNameLength()) {
throw new TrinoException(NOT_SUPPORTED, format("Column name must be shorter than or equal to '%s' characters but got '%s'", databaseMetadata.getMaxColumnNameLength(), columnName.length()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quote the contents of the columnName in the message besides their length.

It's important for CREATE TABLE statement, where there can be multiple potentially problematic columns in play

// SQL Server truncates table name to the max length silently when renaming a column
// SQL Server driver doesn't communicate with a server in getMaxColumnNameLength. The cost to call this method per column is low.
if (columnName.length() > databaseMetadata.getMaxColumnNameLength()) {
throw new TrinoException(NOT_SUPPORTED, format("Column name must be shorter than or equal to '%s' characters but got '%s'", databaseMetadata.getMaxColumnNameLength(), columnName.length()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quote the contents of the columnName in the message besides their length

@ebyhr ebyhr force-pushed the ebi/long-column-name branch from 831a156 to b01e763 Compare August 24, 2022 10:05
@ebyhr
Copy link
Member Author

ebyhr commented Aug 24, 2022

CI hit #13825 & #13199

@ebyhr ebyhr merged commit fe16cae into master Aug 24, 2022
@ebyhr ebyhr deleted the ebi/long-column-name branch August 24, 2022 22:04
@ebyhr ebyhr mentioned this pull request Aug 24, 2022
@github-actions github-actions bot added this to the 394 milestone Aug 24, 2022
@hashhar
Copy link
Member

hashhar commented Sep 10, 2023

Today while in looking into something related I saw a very disturbing behaviour from postgres. Directly in Postgres:

tpch=# select * from (select name as zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz, regionkey as zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz from tpch.region) t;
NOTICE:  identifier "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz" will be truncated to "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
NOTICE:  identifier "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz" will be truncated to "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
 zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz | zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
-----------------------------------------------------------------+-----------------------------------------------------------------
 AFRICA                                                          |                                                               0
 AMERICA                                                         |                                                               1
 ASIA                                                            |                                                               2
 EUROPE                                                          |                                                               3
 MIDDLE EAST                                                     |                                                               4
(5 rows)

tpch=# select zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz from (select name as zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz, regionkey as zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz from tpch.region) t;
NOTICE:  identifier "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz" will be truncated to "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
NOTICE:  identifier "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz" will be truncated to "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
ERROR:  column reference "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz" is ambiguous
LINE 1: select zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz...
               ^
tpch=# select zzzzzzzzzzzzzzzzzzzzzzzzzzzzz from (select name as zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz, regionkey as zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz from tpch.region) t;
NOTICE:  identifier "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz" will be truncated to "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
NOTICE:  identifier "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz" will be truncated to "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
ERROR:  column "zzzzzzzzzzzzzzzzzzzzzzzzzzzzz" does not exist
LINE 1: select zzzzzzzzzzzzzzzzzzzzzzzzzzzzz from (select name as zz...

This means that Postgres truncates ALL occurences of long identifiers and the truncated values are treated equivalently.
i.e. If postgres only allowed 2 char identifiers then zzz == zzzz == zzzzzzz and any other combination.

However when selecting from a relation which produces zzzz and zzz both get truncated to zz and the SELECT complains about ambiguity.

So it seems we're still "safe" in the sense that if for example join pushdown caused truncated identifiers then the worst thing that can happen would be query failure because postgres would complain of ambiguity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

PostgreSQL connector truncates long identifiers silently
3 participants