-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Column type mismatch between partition schema and table schema- table schema is string & partition schema is decimal(4,0) #2817
Comments
The Hive connector should support any conversion that is supported consistently in Hive (certain conversions work different for different file formats which is harder). Any numeric type to string should work. It starts with This commit is a good starting point to see what needs to be changed and where the tests go: f0eb4f3 |
decimtal instead of decimal ? @anirbanch |
Is anyone looking into this issue? Also 1 qq, why Presto do check for partition metadata for all columns? For the above reported issue, user is not using any problematic column in the query anywhere but still Presto reports an issue as it compares the full table/partition metadata. We noticed that hive also reports the issue for same column but only when the column is used in the query. Now since the data type was changed for only 1-2 columns unsupported by Presto, all our users who are accessing this table are paying the penalty as none of our users are able to query this table now. This is our detail table which is used very heavily. Until this issue is fixed, as a workaround, we are thinking of creating another external table with the old structure on the older set of partitions or historical data specific to our Presto users to unblock them but would be great if the code can be fixed ASAP to support this. |
The reason is that Presto connectors need to return metadata for all columns in the table. The engine doesn’t ask for metadata about specific columns. |
The table/partition mismatch is actually a different issue. This happens at runtime during scheduling and it might be possible to only do it for used columns, but I don’t know the complexity of that change. |
Is there a workaround to this problem? We are getting the same issue:
All we are trying to do is to get the data for the previous day of the current date using code We are not sure why it's looking into |
I'm willing to pick this up starting the coming weekend. This is a pain point for me too. The existing workarounds are to create a new external table with the problematic column removed but it would be nice to support such conversions where possible. |
@electrum As a first step will allowing coercion from any numeric type to string be useful? I've started work on this in #4426. Having Presto be silent in case of schema mismatch on an unused column would require changes to the SPI as you said and I'm not sure it's a better decision than failing earlier with a clear message rather than people discovering the issue so late that they cannot even fix it. |
Our general rule for the Hive connector is that it should have the same behavior as Hive. If Hive can read the data in this evolution case, then we should as well (returning the same result). |
@degoyal No. I hadn't been able to spend a lot of time on this. Pending things are to verify if Hive allows implicit conversion between ANY numeric type and Strings. If someone could create a matrix of supported conversions it would help since I don't have a Hive setup handy and only devote weekends to this. |
For Hive setup, you can use this. Hive 1.2
Hive 3.1
|
any progress about this issue? @hashhar |
@cyofeiyue I had a PR open at #4426 but I ran into issues regarding all the possible type coercions that can be performed within Hive (since they are undocumented). I'm not actively working on it at the moment - whoever picks this up should focus on just a single coercion within a single pull-request otherwise it's unlikely that you'll be able to finish the work. |
After modifying the fields, the msck repair table is still inaccessible. The correct way is to delete the table, rebuild it, and then repair the partition, but it may be troublesome for internal tables |
Fixed by #14172 |
Query fails when table schema is defined as string but partition schema is decimal. Error message below:
Query failed (#1111): There is a mismatch between the table and partition schemas. The types are incompatible and cannot be coerced. The column 'aaaa' in table 'ttttt' is declared as type 'string', but partition ‘pppp’ declared column 'aaaa' as type 'decimtal(4,0)'. [DB Errorcode=16777224]
The Hive Metastore has this column aaaa defined as type string but older partitions (like pppp) of parquet schema has that column defined as type decimal. Hive is able to read from the older partitions.
Ended up in this situation as the field type had to be changed from decimal to string, for business reasons.
Can Presto cast decimal to type string in above situation to return a result? (Not asking for string to decimal casting)
The text was updated successfully, but these errors were encountered: