Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for case sensitive identifiers #2863

Open
cberner opened this issue May 4, 2015 · 35 comments
Open

Add support for case sensitive identifiers #2863

cberner opened this issue May 4, 2015 · 35 comments

Comments

@cberner
Copy link
Contributor

cberner commented May 4, 2015

  • We need to record the fact that identifiers are quoted/not-quoted in the sql
  • We need to extend the SPI, so that connectors can specify whether table/column..etc identifiers are case sensitive
  • This should work for fields in row types also
  • Rule should be that if either side is case insensitive then the match is done ignoring case
  • For case insensitive matching, if more than one item matches, throw an ambiguity exception
@jamiemccrindle
Copy link

+1

@electrum
Copy link
Contributor

I don't think the last one is correct. Case insensitive matching should only happen if the target is the correct case.

You can have multiple identifiers differing only in case. Case insensitive matching should match at most one.

@dain
Copy link
Contributor

dain commented May 12, 2015

@electrum correct. I updated the description.

@XiLongZheng
Copy link

Is there any plan to get this fixed so I could use Upper case in table name?

@dain
Copy link
Contributor

dain commented Jul 27, 2015

I took a look at this a few weeks back. The problem is the change is spread throughout the codebase and in parts that are actively being changed. After @martint's planner changes are in and @haozhun's changes to field dereferencing, the change should be doable.

@ashish6976
Copy link

Is there any other way to use upper case in database name and table name while querying from presto

@dain
Copy link
Contributor

dain commented Aug 18, 2015

@ashish6976 This issues is about changing the current Presto behavior to support full case sensitive identifiers.

From the users perspective, the current behavior of only supporting case insensitive identifiers should just work, unless you happen to have an existing system containing identifiers that only differ in case. If you are seeing failures when there is only one identifier, please file a new issue since that is a bug.

@ashish6976
Copy link

@dain Here are the scenarios in which I am facing isssues

My MYSQL Server is running on Centos.
I am using presto to query my MYSQL database using MYSQL connector where my catalog name is mysql.

Scenario 1 - DataBase name and Table name is combination of upper case and lower case letters

Database Name - TestDB
Table Names - EmployeeDetails, EmployeeTable
Query 1 - show schemas from mysql;
Output -
Schema

information_schema
performance_schema
testdb
(3 rows)

Query 20150818_064410_00003_837eu, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:00 [3 rows, 61B] [25 rows/s, 524B/s]

Query 2 - show tables from mysql.testdb;
Output -
Table

(0 rows)

Query 20150818_064532_00004_837eu, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

In this case presto is not able to Fetch the table names which are present in database TestDB.

The Mysql output
mysql> show tables from TestDB;
+------------------+
| Tables_in_TestDB |
+------------------+
| EmployeeDetails |
| EmployeeTable |
+------------------+
2 rows in set (0.00 sec)

Scenario 2 - DataBase name is in lower case , Table name is combination of upper case and lower case letters

Database Name - lowercasedb
Table Names - TableOne, TableTwo
Query 1 - show schemas from mysql;
Output -
Schema

information_schema
lowercasedb
performance_schema
testdb
(4 rows)

Query 20150818_065347_00005_837eu, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:00 [4 rows, 77B] [27 rows/s, 522B/s]

Query 2 - show tables from mysql.lowercasedb;
Output -
Table

tableone
tabletwo
(2 rows)

Query 20150818_065432_00006_837eu, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:00 [2 rows, 66B] [15 rows/s, 505B/s]

Query 3 - select * from mysql.lowercasedb.tableone;
Output -
Query 20150818_065535_00007_837eu failed: Table mysql.lowercasedb.tableone does not exist

In this scenario presto is able to fetch the table names but when I am accessing the table the its giving me an error as shown above.

The Mysql output
mysql> select * from lowercasedb.TableOne;
+-----------+-----------+
| ColumnOne | ColumnTwo |
+-----------+-----------+
| 1 | Row 1 |
| 2 | Row 2 |
+-----------+-----------+
2 rows in set (0.00 sec)

Scenario 3 - DataBase name and Table name is in lower case letters

Database Name - lowercasedb
Table Names - tableone, tabletwo
Query 1 - show schemas from mysql;
Output -
Schema

information_schema
lowercasedb
lowercasetabledb
performance_schema
testdb
(5 rows)

Query 20150818_070234_00008_837eu, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:00 [5 rows, 98B] [30 rows/s, 597B/s]

Query 2 - show tables from mysql.lowercasetabledb;
Output -
Table

tableone
tabletwo
(2 rows)

Query 20150818_070253_00009_837eu, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:00 [2 rows, 76B] [17 rows/s, 652B/s]

Query 3 - select * from mysql.lowercasetabledb.tableone;
Output -
columnone | columntwo
-----------+-----------
1 | Row 1
2 | Row 2
(2 rows)

Query 20150818_070319_00010_837eu, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:00 [2 rows, 0B] [8 rows/s, 0B/s]

In this scenario I am able to access the tables in the database.

@dain
Copy link
Contributor

dain commented Aug 18, 2015

@ashish6976 This is a bug in the mysql connector. Please open a new issue for the mysql connector.

@saileshmittal
Copy link

Any progress on this?

@samanthrao
Copy link

I have the same issue as @ashish6976 but in my case the database is MongoDB. Some of the database names and collections are in title case per company standards. When Presto loads the catalog, it converts everything to lowercase and results in zero records being processed. The only thing that works is, when a collection name or database name is lowercase.

Are there any alternative to overcome this?

@umnya
Copy link

umnya commented Feb 20, 2017

Any update about this ?

@knoguchi
Copy link

knoguchi commented Feb 22, 2017

I've read the source a bit. There are places in the Presto code like this
https://github.com/prestodb/presto/blob/master/presto-spi/src/main/java/com/facebook/presto/spi/ColumnMetadata.java#L49

I think we have to remove those toLowerCase().

@img22
Copy link

img22 commented May 19, 2017

Is there a fix for this? I'm not able to query upper case tables

@faisal00813
Copy link

Any update about the fix?

@RameshByndoor
Copy link
Contributor

I am using 0.180 version of presto and issue is still present. We have all our mysql table names in UPPERCASE and any alternative or patches available..?

@MichaelAlo
Copy link

Hi there,
Is the issue about tables in upper case in MySQL and a Presto connection fixed? :)
Cheers

@Drizzt321
Copy link

I'm testing out Presto with MySQL, running into this exact issue. Is there any will, at all, to fix this rather big issue? Clearly there's more than 1 or 2 people, and this would prevent us from using Presto as our tables are managed by Hibernate, which creates them. I have seen https://dev.mysql.com/doc/refman/5.7/en/identifier-case-sensitivity.html, which setting it to 2 might work. I'm having trouble getting my MySQL instance to get that set properly though, but I'll report back if I manage to and if that works.

@Drizzt321
Copy link

So, sadly I can't get it set to '2', I suppose it's because I'm running on Linux, and so mysql really doesn't like it so it keeps reverting to 0. I can get it set to 1, which simply lowercases all table names and stores them on disk as lowercased, but that's not so useful for lots of existing tables.

@martint
Copy link
Contributor

martint commented Aug 4, 2017

For reference, here are the relevant parts from the SQL spec:

<delimited identifier> ::=
  <double quote> <delimited identifier body> <double quote>

<delimited identifier body> ::=  <delimited identifier part>...
<delimited identifier part> ::=
    <nondoublequote character>
  | <doublequote symbol>

<Unicode delimited identifier> ::=
  U <ampersand> <double quote> <Unicode delimiter body> <double quote>
      <Unicode escape specifier>
<Unicode escape specifier> ::=
  [ UESCAPE <quote> <Unicode escape character> <quote> ]
<Unicode delimiter body> ::=
  <Unicode identifier part>...
<Unicode identifier part> ::=
    <delimited identifier part>
  | <Unicode escape value>
24) For every <identifier body> IB there is exactly one corresponding case-normal form CNF. CNF is an <identifier body> derived from IB as follows:
Let n be the number of characters in IB. For i ranging from 1 (one) to n, the i-th character Mi of IB is transliterated into the corresponding character 
or characters of CNF as follows:
Case:
   a) If Mi is a lower case character or a title case character for which an equivalent upper case sequence U is de ned by Unicode, then let j be th
       e number of characters in U; the next j characters of CNF are U.
   b) Otherwise, the next character of CNF is Mi.
25) The case-normal form of the <identifier body> of a <regular identifier> is used for purposes such as and including determination of identifier 
      equivalence, representation in the Definition and Information Schemas, and representation in diagnostics areas.

...

27) Two <regular identifier>s are equivalent if the case-normal forms of their <identifier body>s, considered as the repetition of a <character string literal> 
that specifies a <character set specification> of SQL_IDENTIFIER and an implementation-defined collation IDC that is sensitive to case, compare equally 
according to the comparison rules in Subclause 8.2, “<comparison predicate>”.

28) A <regular identifier> and a <delimited identifier> are equivalent if the case-normal form of the <identifier body> of the <regular identifier> and the 
<delimited identifier body> of the <delimited identifier> (with all occurrences of <quote> replaced by <quote symbol> and all occurrences of 
<doublequote symbol> replaced by <double quote>), considered as the repetition of a <character string literal> that specifies a <character set specification>
 of SQL_IDENTIFIER and IDC, compare equally according to the comparison rules in Subclause 8.2, “<comparison predicate>”.


29) Two<delimited identifier>s are equivalent if their <delimited identifierbody>s,considered as the repetition of a <character string literal> that specifies
 a <character set specification> of SQL_IDENTIFIER and an implementation-defined collation that is sensitive to case, compare equally according to the
 comparison rules in Subclause 8.2, “<comparison predicate>”.

30) Two <Unicode delimited identifier>s are equivalent if their <Unicode delimiter body>s, considered as the repetition of a <character string literal> that
 specifies a <character set specification> of SQL_IDENTIFIER and an implementation-defined collation that is sensitive to case, compare equally according
 to the comparison rules in Subclause 8.2, “<comparison predicate>”.

31) A <Unicode delimited identifier> and a <delimited identifier> are equivalent if their <Unicode delimiter body> and <delimited identifier body>, 
respectively, each considered as the repetition of a <character string literal> that specifies a <character set specification> of SQL_IDENTIFIER and 
an implementation-defined collation that is sensitive to case, compare equally according to the comparison rules in Subclause 8.2, “<comparison predicate>”.

32) A <regular identifier> and a <Unicode delimited identifier> are equivalent if the case-normal form of the <identifier body> of the <regular identifier> 
and the <Unicode delimiter body> of the <Unicode delimited identifier> considered as the repetition of a <character string literal>, each specifying a
 <character set specification> of SQL_IDENTIFIER and an implementation-defined collation that is sensitive to case, compare equally according to the 
comparison rules in Subclause 8.2, “<comparison predicate>”.

@ghost
Copy link

ghost commented Apr 16, 2018

+1
Is there any progess on this? We are useing version 0.192 and the problem with Scenario 2 is still there.

@Drizzt321
Copy link

I have an open PR #8674 that got stalled waiting for some replies from maintainers, and then I got busy with work and unable to continue it. I don't see any time in the foreseeable future for me to pick this back up.

@nihalbtq8
Copy link

This is still an issue currently. Is there any alternative available?

@binque
Copy link

binque commented Aug 28, 2018

Any update ?

@kokosing
Copy link
Contributor

Any update ?

Unfortunately, no ;(

@dain
Copy link
Contributor

dain commented Aug 28, 2018

@hgschmie told me the other day he was looking at this

@SanjayJosh
Copy link

Any update on this? All the tables in my project are by default in upper case.
Or can any pointers be given in the code as to what can be changed so that a custom build for the same can be made?

@tooptoop4
Copy link

any update? hive metastore tables in mysql like DBS, PARTITIONS can't be queried.

presto> select * from mysql.metastore.DBS;
Query 20180913_015448_00010_774as failed: line 1:15: Table mysql.metastore.dbs does not exist

@ciscoring
Copy link
Contributor

Any updae on this issue? My collections in mongodb have names with upper case. Presto can't read these collections because it always change lower case.

@alvespat
Copy link

I'm getting the same issue with Sqlserver connectors and tables defined in uppercase... ( only be able to query tables where names are defined in lower case.... :-( (presto 0.203)

@cberner
Copy link
Contributor Author

cberner commented Jun 20, 2019

I no longer am actively working on Presto

@mizunno
Copy link

mizunno commented Dec 31, 2019

Any update on this? Thanks in advance.

@xqliang
Copy link

xqliang commented Oct 21, 2021

Fixed at Release 0.225.

JDBC Changes

Match schema and table names case insensitively. This behavior can be enabled by setting the case-insensitive-name-matching catalog configuration option to true.

@yys12138
Copy link

yys12138 commented Nov 18, 2021

I have a table that that only differ by case,for example ,columns ‘toPkgVErsName ' and ‘ toPkGVersName‘ . I can execute with 'select topkgversname from ......' correctly when i use hive.But the same sql in presto will be wrong (I use hue to connect with hive and presto).
and the error message is :

Multiple entries with same key: [topkgversname] required binary topkgversname (STRING)=min: , max: , num_nulls: 0 and [topkgversname] required binary topkgversname (STRING)=min: , max: walleve-1.0.269.171227.eaddc8f, num_nulls: 0 at io.prestosql.jdbc.AbstractPrestoResultSet.resultsException(AbstractPrestoResultSet.java:1731) at io.prestosql.jdbc.PrestoResultSet$ResultsPageIterator.computeNext(PrestoResultSet.java:216) at io.prestosql.jdbc.PrestoResultSet$ResultsPageIterator.computeNext(PrestoResultSet.java:176) at io.prestosql.jdbc.$internal.guava.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141) at io.prestosql.jdbc.$internal.guava.collect.AbstractIterator.hasNext(AbstractIterator.java:136) at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1811) at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:295) at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:207) at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:162) at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:301) at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681) at io.prestosql.jdbc.PrestoResultSet$AsyncIterator.lambda$new$0(PrestoResultSet.java:122) at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

Is there any setting that could fix this problem?

@ebyhr
Copy link
Contributor

ebyhr commented Nov 18, 2021

@yys12138 It seems you are using Trino formerly PrestoSQL. The right repository is https://github.com/trinodb/trino and you can ask questions in the community Slack. https://trino.io/slack.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests